Efficient Large Scale Language Modeling with Mixtures of Experts

Mikel Artetxe; Shruti Bhosale; Naman Goyal; Todor Mihaylov; Myle Ott; Sam Shleifer; Xi Victoria Lin; Jingfei Du; Srinivasan Iyer; Ramakanth Pasunuru; Giridharan Anantharaman; Xian Li; Shuohui Chen; Halil Akın; Mandeep Baines; Louis Martin; Xing Zhou; Punit Singh Koura; Brian O’Horo; Jeffrey Wang; Luke Zettlemoyer; Mona Diab; Zornitsa Kozareva; Veselin Stoyanov
Venue: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Publication year: 2022
Type: proceedings article
Language: English
DOI: https://doi.org/10.18653/v1/2022.emnlp-main.804
OpenAlex ID: https://openalex.org/W4385567093

Open access: yes (status: hybrid)
License: CC BY
Published version (PDF): https://aclanthology.org/2022.emnlp-main.804.pdf
Submitted version (arXiv): https://arxiv.org/abs/2112.10684

Cited by: 32 (record updated 2024-09-09)
Citations by year: 2022: 5; 2023: 22; 2024: 5
Field-weighted citation impact (FWCI): 5.901
Citation percentile: top 1% (normalized percentile 0.99997)
Referenced works: 49
Retracted: no

Author ORCIDs listed in the record:
Naman Goyal: https://orcid.org/0000-0002-7565-4303
Srinivasan Iyer: https://orcid.org/0000-0002-6186-2603
Xian Li: https://orcid.org/0000-0001-5714-3940
Shuohui Chen: https://orcid.org/0000-0002-1006-2964
Halil Akın: https://orcid.org/0000-0002-6651-7403
Louis Martin: https://orcid.org/0000-0002-5168-8904
Xing Zhou: https://orcid.org/0000-0001-8701-3856
Jeffrey Wang: https://orcid.org/0000-0002-6900-8268
Mona Diab: https://orcid.org/0000-0002-7696-1436

Primary topic: Natural Language Processing (score 0.9872; subfield: Artificial Intelligence; field: Computer Science; domain: Physical Sciences)
Other topics: Computational Text Analysis in Social Sciences (score 0.9611); Statistical Machine Translation and Natural Language Processing (score 0.9537)
Keywords: Machine Translation; Neural Machine Translation; Natural Language Processing; Multilingual Neural Machine Translation; Language Modeling
Concepts: Computer science; Scale (ratio); Natural language processing; Artificial intelligence; Art history; Art; Cartography; Geography
Sustainable Development Goal: Quality education (score 0.8)