KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition

Dong Yu; Kaisheng Yao; Hang Su; Gang Li; Frank Seide
DOI: https://doi.org/10.1109/icassp.2013.6639201
OpenAlex ID: https://openalex.org/W1989549063
Published: 2013-05-01 (proceedings article)
Language: English
Access: closed
Citations: 407 (OpenAlex record updated 2024-09-20)

Affiliations

Dong Yu: Microsoft Research, Redmond, WA, USA
Kaisheng Yao: Online Services Div., Microsoft Corp., Redmond, WA, USA
Hang Su: Microsoft Research Asia, Beijing, China
Gang Li: Microsoft Research Asia, Beijing, China
Frank Seide: Microsoft Research Asia, Beijing, China

Topics: Speech Recognition Technology; Speech Enhancement Techniques; Audio Signal Classification and Analysis

Keywords: Hidden Markov Models; Divergence (linguistics); Backpropagation; Kullback–Leibler divergence; Regularization (linguistics); Initialization; Dictation; Deep neural networks

Abstract

We propose a novel regularized adaptation technique for context dependent deep neural network hidden Markov models (CD-DNN-HMMs). The CD-DNN-HMM has a large output layer and many large hidden layers, each with thousands of neurons. The huge number of parameters in the CD-DNN-HMM makes adaptation a challenging task, esp. when the adaptation set is small. The technique developed in this paper adapts the model conservatively by forcing the senone distribution estimated from the adapted model to be close to that from the unadapted model. This constraint is realized by adding Kullback-Leibler divergence (KLD) regularization to the adaptation criterion. We show that applying this regularization is equivalent to changing the target distribution in the conventional backpropagation algorithm. Experiments on Xbox voice search, short message dictation, and Switchboard and lecture speech transcription tasks demonstrate that the proposed adaptation technique can provide 2%-30% relative error reduction against the already very strong speaker independent CD-DNN-HMM systems using different adaptation sets under both supervised and unsupervised adaptation setups.
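The abstract states that adding KLD regularization is equivalent to changing the target distribution in conventional backpropagation. The sketch below illustrates that equivalence for a single frame. It is a minimal illustration and not the authors' implementation: the interpolation weight `rho`, the function names, and the toy three-senone logits are all assumptions introduced here, and the KL term is written as a cross-entropy against the unadapted (speaker-independent) posterior, which differs from the KL divergence only by a constant that does not depend on the adapted model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross-entropy -sum_y target(y) * log predicted(y)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


def interpolated_target(one_hot, si_posterior, rho):
    """Mix the hard label with the unadapted model's senone posterior.

    rho is a hypothetical regularization weight in [0, 1]:
    rho = 0 ignores the unadapted model, rho = 1 trusts it completely.
    """
    return [(1.0 - rho) * t + rho * p for t, p in zip(one_hot, si_posterior)]


def kld_regularized_loss(one_hot, si_posterior, adapted_posterior, rho):
    """Weighted sum of the usual loss and the regularizer (up to a constant)."""
    return ((1.0 - rho) * cross_entropy(one_hot, adapted_posterior)
            + rho * cross_entropy(si_posterior, adapted_posterior))


if __name__ == "__main__":
    # Toy frame with three senones; all numbers are made up for illustration.
    si = softmax([2.0, 1.0, 0.1])        # unadapted model posterior
    adapted = softmax([1.5, 1.2, 0.3])   # adapted model posterior
    one_hot = [0.0, 1.0, 0.0]            # alignment label for this frame
    rho = 0.5

    target = interpolated_target(one_hot, si, rho)
    explicit = kld_regularized_loss(one_hot, si, adapted, rho)
    via_target = cross_entropy(target, adapted)

    # The two losses agree, so ordinary backpropagation against the
    # interpolated target optimizes the regularized criterion.
    print(abs(explicit - via_target) < 1e-12)
```

Because the two losses agree, an existing cross-entropy training loop would only need its targets swapped for the interpolated ones. How `rho` should be chosen for a given adaptation set size is not stated in the abstract.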