Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction

Darja Fišer; Nikola Ljubešić; Špela Vintar; Senja Pollak
Publication Information

Venue: Meeting of the Association for Computational Linguistics (conference proceedings article)
Published: 24 June 2011
Pages: 19-26
URL: https://www.aclweb.org/anthology/W/W11/W11-1204.pdf
DOI: none listed
Open access: closed (per OpenAlex)
Language: English

Authors

Darja Fišer, University of Ljubljana, Slovenia (ORCID 0000-0002-9956-1689)
Nikola Ljubešić, University of Zagreb, Croatia (ORCID 0000-0001-7169-9152)
Špela Vintar, University of Ljubljana, Slovenia (ORCID 0000-0003-1934-0200)
Senja Pollak, University of Ljubljana, Slovenia (ORCID 0000-0002-4380-0863)

Topics (assigned by OpenAlex)

Statistical Machine Translation and Natural Language Processing; Natural Language Processing; Biomedical Ontologies and Text Mining

Keywords (assigned by OpenAlex)

Language Modeling; Semantic Similarity; Natural Language Processing; Corpus Linguistics; Syntax-based Translation Models

Citations

Cited by 15 works (OpenAlex record updated 2024-09-01); field-weighted citation impact 3.13; 19 referenced works.

Abstract

This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation of the results shows that a 2-way translation of context vectors significantly improves precision of the extracted translation equivalents. We also show that it is sufficient to increase the corpus for one language in order to obtain a higher recall, and that the increase of the number of new words is linear in the size of the corpus. Finally, we demonstrate that by lowering the frequency threshold for context vectors, the drop in precision is much slower than the increase of recall.
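The paper extracts translation equivalents by comparing context vectors across English and Slovene, and reports that a 2-way translation of context vectors improves precision, while lowering the frequency threshold for context vectors costs precision more slowly than it gains recall. The sketch below illustrates that general family of methods on a toy English-Slovene example; it is not the authors' implementation. It assumes raw co-occurrence counts in a fixed token window, a hypothetical seed dictionary `SEED` for projecting vectors between languages, cosine similarity for ranking, a `min_freq` cut-off on which words receive context vectors, and a `two_way` option that simply averages the similarity computed in each translation direction. All function names, parameters, and data here are illustrative.

```python
from collections import Counter
import math

# Toy comparable corpora: tokenized, lowercased sentences (hypothetical data).
EN_SENTS = [["doctor", "treats", "disease"],
            ["patient", "has", "disease"],
            ["doctor", "treats", "patient"]]
SL_SENTS = [["zdravnik", "zdravi", "bolezen"],
            ["pacient", "ima", "bolezen"],
            ["zdravnik", "zdravi", "pacient"]]
# Hypothetical seed lexicon (English -> Slovene); "disease" is deliberately left out.
SEED = {"doctor": "zdravnik", "treats": "zdravi", "patient": "pacient", "has": "ima"}


def context_vectors(sentences, window=2, min_freq=1):
    """Count co-occurrences within +/- `window` tokens for every word whose
    corpus frequency is at least `min_freq` (assumed form of the frequency threshold)."""
    freq = Counter(tok for sent in sentences for tok in sent)
    vectors = {}
    for sent in sentences:
        for i, head in enumerate(sent):
            if freq[head] < min_freq:
                continue  # words below the threshold get no context vector
            vec = vectors.setdefault(head, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vec[sent[j]] += 1
    return vectors


def translate(vec, lexicon):
    """Project a context vector into the other language; context words
    missing from the lexicon are simply dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in lexicon:
            out[lexicon[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_translations(word, src_vecs, tgt_vecs, seed, two_way=False):
    """Rank every target-language word as a translation candidate for `word`.
    Assumption: with two_way=True the target vector is also translated back
    into the source language and the two cosines are averaged."""
    inverse = {t: s for s, t in seed.items()}  # assumes a one-to-one seed lexicon
    src_in_tgt = translate(src_vecs[word], seed)
    scores = {}
    for cand, tgt_vec in tgt_vecs.items():
        score = cosine(src_in_tgt, tgt_vec)
        if two_way:
            back = cosine(src_vecs[word], translate(tgt_vec, inverse))
            score = (score + back) / 2
        scores[cand] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


en_vecs = context_vectors(EN_SENTS)
sl_vecs = context_vectors(SL_SENTS)
print(rank_translations("disease", en_vecs, sl_vecs, SEED)[0])  # ('bolezen', 1.0)
print(rank_translations("disease", en_vecs, sl_vecs, SEED, two_way=True)[0])  # ('bolezen', 1.0)
```

Raw counts keep the example short; context-vector approaches commonly weight counts with an association measure (for example log-likelihood or TF-IDF) and rely on a much larger seed dictionary. Raising `min_freq` shrinks the set of words that receive vectors at all, which is the recall side of the trade-off described in the abstract.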