Language Set Identification in Noisy Synthetic Multilingual Documents

Tommi Jauhiainen; Krister Lindén; Heidi Jauhiainen
Affiliation (all authors): Department of Modern Languages, University of Helsinki, Helsinki, Finland
Corresponding author: Tommi Jauhiainen
ORCID: Tommi Jauhiainen, https://orcid.org/0000-0002-6474-3570; Krister Lindén, https://orcid.org/0000-0003-2337-303X; Heidi Jauhiainen, https://orcid.org/0000-0002-8227-5627

Type: book chapter
Published in: Lecture Notes in Computer Science (Springer Science+Business Media), 2015, pp. 633-643
ISSN: 0302-9743, 1611-3349
DOI: https://doi.org/10.1007/978-3-319-18111-0_48
Open access (green, accepted version): https://helda.helsinki.fi/bitstream/10138/159361/1/CICLing2015.pdf
Repository record: http://hdl.handle.net/10138/159361 (Helda, University of Helsinki)
OpenAlex record: https://openalex.org/W851785710 (updated 2024-08-14)

Cited by: 17 works (2015: 2, 2016: 2, 2018: 2, 2019: 3, 2020: 2, 2021: 2, 2024: 4)
Referenced works: 19
Topics: Statistical Machine Translation and Natural Language Processing; Authorship Attribution and User Profiling in Text; Automatic Text Simplification and Readability Assessment
Keywords: Language identification; Natural Language Processing

Abstract

In this paper, we reconsider the problem of language identification of multilingual documents. Automated language identification algorithms have been improving steadily from the seventies until recent years. The current state-of-the-art language identifiers are quite efficient even with only a few characters and this gives us enough reason to again evaluate the possibility to use existing language identifiers for monolingual text to detect the language set of a multilingual document. We are using a previously developed language identifier for monolingual documents with the multilingual documents from the WikipediaMulti dataset published in a recent study. Our method outperforms previous methods tested with the same data, achieving an F1-score of 97.6 when classifying between 44 languages.
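The abstract describes reusing a monolingual language identifier to detect the language set of a multilingual document, and reports the result as an F1-score. The record does not give the details of the authors' procedure, so the following is only a minimal sketch of the general idea, not their method. It assumes a hypothetical `identify` callable that returns one language label for a piece of text, and illustrative `window` and `step` sizes. `set_f1` computes the standard F1 between a predicted and a gold language set for a single document; how the paper averages scores over documents is not stated here.

```python
from typing import Callable, Set


def language_set(
    text: str,
    identify: Callable[[str], str],  # hypothetical monolingual identifier
    window: int = 100,               # illustrative window size, in characters
    step: int = 50,                  # illustrative stride between windows
) -> Set[str]:
    """Collect the labels assigned to overlapping windows of the text."""
    if not text:
        return set()
    last_start = max(len(text) - window, 0)
    # Regular strides, plus a final window so the tail of the text is covered.
    starts = set(range(0, last_start + 1, step)) | {last_start}
    return {identify(text[s:s + window]) for s in starts}


def set_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between a predicted and a gold language set for one document."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting three languages for a document that truly contains two of them gives a precision of 2/3 and a recall of 1, so the F1 for that document is 0.8. A real system would likely also filter out labels that are supported by only a few windows, which this sketch does not attempt.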