Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model

Yoshua Bengio; Jean-Sébastien Senécal
Published in: IEEE Transactions on Neural Networks, vol. 19, no. 4, pp. 713-722 (Institute of Electrical and Electronics Engineers)
Publication date: 2008-04-01
Type: journal article
DOI: https://doi.org/10.1109/tnn.2007.912312
PubMed: https://pubmed.ncbi.nlm.nih.gov/18390314
OpenAlex ID: https://openalex.org/W2152808281
Open access: green; submitted version at http://publications.idiap.ch/downloads/reports/2003/rr-03-35.pdf
Affiliation (Yoshua Bengio): Department of IRO, Université de Montréal, Montreal, Canada
Cited by: 223 (record updated 2024-09-13)
Referenced works: 38
Topics: Natural Language Processing; Statistical Machine Translation and Natural Language Processing; Speech Recognition Technology
Keywords: Language Modeling; Speedup; Statistical Language Modeling; Topic Modeling; Neural Machine Translation; Syntax-based Translation Models; Feedforward neural network; Backpropagation

Abstract

Previous work on statistical language modeling has shown that it is possible to train a feedforward neural network to approximate probabilities over sequences of words, resulting in significant error reduction when compared to standard baseline models based on n-grams. However, training the neural network model with the maximum-likelihood criterion requires computations proportional to the number of words in the vocabulary. In this paper, we introduce adaptive importance sampling as a way to accelerate training of the model. The idea is to use an adaptive n-gram model to track the conditional distributions produced by the neural network. We show that a very significant speedup can be obtained on standard problems.
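The cost issue named in the abstract can be illustrated with a small sketch. Under a softmax output, the maximum-likelihood gradient contains an expectation over the whole vocabulary; importance sampling replaces it with a weighted average over a few words drawn from a cheaper proposal distribution. The code below is only an illustration under stated assumptions, not the paper's algorithm: it uses a self-normalized importance-sampling estimator, the proposal is a fixed list of weights standing in for the adaptive n-gram model, and the function names `exact_expectation` and `sampled_expectation` are hypothetical.

```python
import math
import random


def exact_expectation(scores, feature):
    """E_P[feature(v)] under the softmax P of `scores`; visits every word."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e / z * feature(v) for v, e in enumerate(exps))


def sampled_expectation(scores, proposal, feature, k, rng):
    """Self-normalized importance-sampling estimate of E_P[feature(v)].

    Only the k sampled words are scored, so the work grows with k rather
    than with the vocabulary size. `proposal` holds positive sampling
    weights (an assumed stand-in for an adaptive n-gram model).
    """
    vocab = list(range(len(scores)))
    samples = rng.choices(vocab, weights=proposal, k=k)
    m = max(scores[v] for v in samples)
    # Importance weight: unnormalized target density over proposal weight.
    weights = [math.exp(scores[v] - m) / proposal[v] for v in samples]
    total = sum(weights)
    return sum(w / total * feature(v) for w, v in zip(weights, samples))


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [math.sin(v) for v in range(1000)]  # toy network outputs
    proposal = [1.0] * len(scores)               # uniform stand-in proposal
    feature = lambda v: float(v % 5)             # toy per-word quantity
    print("exact  :", exact_expectation(scores, feature))
    print("sampled:", sampled_expectation(scores, proposal, feature, 200, rng))
```

The estimate is biased for finite k but tightens as k grows or as the proposal moves closer to the softmax distribution, which is the motivation for letting the proposal adapt during training.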