VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Quan Wang; Hannah Muckenhirn; Kevin Wilson; Prashant Sridhar; Zelin Wu; John R. Hershey; Rif A. Saurous; Ron J. Weiss; Ye Jia; Ignacio López Moreno
DOI: https://doi.org/10.21437/interspeech.2019-1101
OpenAlex ID: https://openalex.org/W2973062255
Type: proceedings article
Venue: Interspeech 2019 (conference)
Publication date: 2019-09-13
Language: en
Open access: green; submitted version on arXiv at https://arxiv.org/abs/1810.04826 (PDF: https://arxiv.org/pdf/1810.04826)
Cited by: 293 (citation-normalized percentile 0.999, top 1%; record updated 2024-12-23)
Citations by year: 2019: 27; 2020: 46; 2021: 75; 2022: 45; 2023: 54; 2024: 32
Referenced works: 21

Affiliations:
Quan Wang (Google Inc., USA; ORCID 0000-0002-2508-6301)
Hannah Muckenhirn (EPFL, Switzerland; Idiap Research Institute, Switzerland)
Kevin Wilson (Google Inc., USA; ORCID 0000-0001-9141-2219)
Prashant Sridhar (Google Inc., USA)
Zelin Wu (Google Inc., USA)
John R. Hershey (Google Inc., USA)
Rif A. Saurous (Google Inc., USA)
Ron J. Weiss (Google Inc., USA; ORCID 0000-0003-2010-4053)
Ye Jia (Google Inc., USA; ORCID 0000-0002-8000-4911)
Ignacio López Moreno (Google Inc., USA; ORCID 0000-0002-0900-3473)

Topics: Speech and Audio Processing (score 0.9995; subfield Signal Processing); Speech Recognition and Synthesis (score 0.9974; subfield Artificial Intelligence); Advanced Data Compression Techniques (score 0.9606; subfield Computer Vision and Pattern Recognition). Field: Computer Science. Domain: Physical Sciences.
Keywords: Spectrogram (score 0.907)
Concepts: Spectrogram (0.907); Speech recognition (0.710); Computer science (0.564); Masking (illustration) (0.552); Art (0.067); Visual arts (0.0)
Sustainable Development Goals: Reduced inequalities (score 0.73)

Abstract: In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings; (2) A spectrogram masking network that takes both noisy spectrogram and speaker embedding as input, and produces a mask. Our system significantly reduces the speech recognition WER on multi-speaker signals, with minimal WER degradation on single-speaker signals.
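
The abstract describes a masking network that takes a noisy spectrogram and a speaker embedding and produces a mask. The following is a minimal sketch of that data flow only, not the authors' model. It assumes a magnitude spectrogram stored as a list of frames, conditions on the speaker by appending the same embedding to every frame, and uses a single randomly initialised sigmoid layer as a stand-in for a trained network. The names `predict_mask` and `apply_mask`, the layer shape, and all sizes are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here to squash mask values into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_mask(noisy_spec, speaker_emb, weights, bias):
    """Toy stand-in for a masking network (hypothetical and untrained).

    noisy_spec:  T frames, each a list of F non-negative magnitudes
    speaker_emb: list of D floats for the reference (target) speaker
    weights:     F rows, each of length F + D
    bias:        list of F floats
    Returns a T x F mask.
    """
    mask = []
    for frame in noisy_spec:
        # Speaker conditioning: append the same embedding to every frame.
        features = frame + speaker_emb
        row = []
        for w_row, b in zip(weights, bias):
            z = sum(w * x for w, x in zip(w_row, features)) + b
            row.append(sigmoid(z))
        mask.append(row)
    return mask


def apply_mask(noisy_spec, mask):
    """Element-wise product of the mask and the noisy magnitude spectrogram."""
    return [[m * s for m, s in zip(m_row, s_row)]
            for m_row, s_row in zip(mask, noisy_spec)]


# Toy sizes and random, untrained parameters (all assumptions for illustration).
NUM_FRAMES, NUM_BINS, EMB_DIM = 5, 4, 3
rng = random.Random(0)
noisy = [[rng.random() for _ in range(NUM_BINS)] for _ in range(NUM_FRAMES)]
speaker_emb = [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(NUM_BINS + EMB_DIM)]
           for _ in range(NUM_BINS)]
bias = [0.0] * NUM_BINS

mask = predict_mask(noisy, speaker_emb, weights, bias)
enhanced = apply_mask(noisy, mask)
print(len(enhanced), len(enhanced[0]))
```

Because every mask value lies between 0 and 1, the masked spectrogram never exceeds the noisy input in any time-frequency bin. In a real system the mask would come from a trained network, and turning the masked magnitudes back into audio would need a further resynthesis step that is not shown here.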