Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention

Haotian Yan; Chuang Zhang; Ming Wu
DOI: https://doi.org/10.48550/arxiv.2201.01615
OpenAlex ID: https://openalex.org/W4221153029
Type: preprint (submitted version)
Publication date: 2022-01-01
Language: English
Source: arXiv (Cornell University), https://arxiv.org/abs/2201.01615
PDF: http://arxiv.org/pdf/2201.01615
Also indexed at: DataCite API, https://api.datacite.org/dois/10.48550/arxiv.2201.01615
Open access: yes (green); license: other-oa
Cited by: 42 (2022: 7; 2023: 19; 2024: 13); in the top 1% by normalized citation percentile
Citing works: https://api.openalex.org/works?filter=cites:W4221153029
Record created: 2022-04-03; last updated: 2024-12-25

Author ORCIDs (no affiliations are recorded):
- Haotian Yan (first author): https://orcid.org/0000-0003-0049-8331
- Chuang Zhang: https://orcid.org/0000-0001-6685-7048
- Ming Wu (last author): https://orcid.org/0000-0002-3582-4881

Topics (all in Computer Vision and Pattern Recognition, Computer Science, Physical Sciences):
- Advanced Neural Network Applications (primary topic, score 0.9992)
- Multimodal Machine Learning Applications (score 0.9976)
- Advanced Image and Video Retrieval Techniques (score 0.9923)

Keywords: Pooling (score 0.68)

Concepts (score): Computer science (0.77); Segmentation (0.70); Pooling (0.68); Transformer (0.66); Encoder (0.57); Artificial intelligence (0.50); Convolutional neural network (0.46); Pattern recognition (psychology) (0.37); Engineering (0.07); Voltage (0.0); Electrical engineering (0.0); Operating system (0.0)

Sustainable Development Goals: Sustainable cities and communities (score 0.58)

Related works (OpenAlex IDs):
- https://openalex.org/W4390975304
- https://openalex.org/W4287804464
- https://openalex.org/W3211292372
- https://openalex.org/W3103989898
- https://openalex.org/W3022252430
- https://openalex.org/W2953234277
- https://openalex.org/W2900413183
- https://openalex.org/W2810679507
- https://openalex.org/W2626256601
- https://openalex.org/W147410782

Abstract:
Multi-scale representations are crucial for semantic segmentation. The community has witnessed the flourishing of semantic segmentation convolutional neural networks (CNNs) exploiting multi-scale contextual information. Motivated by the fact that the vision transformer (ViT) is powerful in image classification, some semantic segmentation ViTs have recently been proposed, most of them attaining impressive results but at a cost of computational economy. In this paper, we succeed in introducing multi-scale representations into semantic segmentation ViT via a window attention mechanism and further improve the performance and efficiency. To this end, we introduce large window attention, which allows the local window to query a larger area of context window at only a little computation overhead. By regulating the ratio of the context area to the query area, we enable the large window attention to capture the contextual information at multiple scales. Moreover, the framework of spatial pyramid pooling is adopted to collaborate with the large window attention, which presents a novel decoder named large window attention spatial pyramid pooling (LawinASPP) for semantic segmentation ViT. Our resulting ViT, Lawin Transformer, is composed of an efficient hierarchical vision transformer (HVT) as encoder and a LawinASPP as decoder. The empirical results demonstrate that Lawin Transformer offers an improved efficiency compared to the existing method. Lawin Transformer further sets new state-of-the-art performance on Cityscapes (84.4% mIoU), ADE20K (56.2% mIoU) and COCO-Stuff datasets. The code will be released at https://github.com/yan-hao-tian/lawin
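
The abstract describes large window attention only in outline: each local query window attends to a larger context area, and the ratio of context area to query area sets the scale. The sketch below is one possible reading of that idea, not the authors' implementation. It rests on several assumptions that the abstract does not state: the context patch is average-pooled back to the query window size so the number of keys stays fixed, queries, keys and values use identity projections, there is a single head, and borders are zero-padded. All function names (`context_patch`, `avg_pool`, `large_window_attention`) and the parameters `p` and `ratio` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_patch(feat, top, left, p, ratio):
    """Cut the (ratio*p x ratio*p) patch centred on the p x p query window.

    Positions outside the feature map are zero-padded (an assumption).
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    size = ratio * p
    off = (size - p) // 2
    patch = []
    for i in range(top - off, top - off + size):
        row = []
        for j in range(left - off, left - off + size):
            inside = 0 <= i < h and 0 <= j < w
            row.append(feat[i][j] if inside else [0.0] * c)
        patch.append(row)
    return patch


def avg_pool(patch, ratio):
    """Average-pool by `ratio` so the context has p x p tokens (an assumption)."""
    size, c = len(patch), len(patch[0][0])
    pooled = []
    for i in range(0, size, ratio):
        row = []
        for j in range(0, size, ratio):
            cell = [0.0] * c
            for di in range(ratio):
                for dj in range(ratio):
                    for k in range(c):
                        cell[k] += patch[i + di][j + dj][k]
            row.append([v / (ratio * ratio) for v in cell])
        pooled.append(row)
    return pooled


def large_window_attention(feat, p, ratio):
    """Single-head attention from each p x p window to its pooled context.

    `feat` is an H x W x C nested list; H and W must be divisible by p.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    out = [[None] * w for _ in range(h)]
    for top in range(0, h, p):
        for left in range(0, w, p):
            ctx = avg_pool(context_patch(feat, top, left, p, ratio), ratio)
            keys = [tok for row in ctx for tok in row]  # always p*p tokens
            for i in range(top, top + p):
                for j in range(left, left + p):
                    q = feat[i][j]
                    scores = [
                        sum(a * b for a, b in zip(q, k)) / math.sqrt(c)
                        for k in keys
                    ]
                    weights = softmax(scores)
                    out[i][j] = [
                        sum(wt * k[d] for wt, k in zip(weights, keys))
                        for d in range(c)
                    ]
    return out


# Usage: an 8 x 8 map with 2 channels, 2 x 2 query windows, context ratio 2.
feat = [[[1.0, 2.0] for _ in range(8)] for _ in range(8)]
out = large_window_attention(feat, p=2, ratio=2)
```

Under these assumptions each query still attends to `p * p` keys whatever the value of `ratio`, which is one way the context could grow while the attention cost stays roughly constant. Running the function with several ratios and combining the outputs would give a multi-scale, pyramid-style arrangement in the spirit of the LawinASPP decoder the abstract mentions.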