Named Entity Disambiguation: A Hybrid Statistical and Rule-Based Incremental Approach

Hien T. Nguyen; Tru H. Cao
Type: book chapter
Published: 2008-11-13 in Lecture Notes in Computer Science (Springer Science+Business Media), pp. 420-433
ISSN: 0302-9743, 1611-3349
DOI: https://doi.org/10.1007/978-3-540-89704-0_29
OpenAlex: https://openalex.org/W2149154720
Language: en
Access: closed (no open-access copy recorded)
Citations: cited by 25 (2012: 5, 2013: 3, 2015: 1, 2016: 5, 2023: 1); 27 referenced works
Record updated: 2024-08-31

Authors (names as printed on the paper, with the record's display names in parentheses):
Hien T. Nguyen (Hien Nguyen), Ton Duc Thang University, Vietnam. ORCID: https://orcid.org/0000-0001-8940-3778
Tru H. Cao (Jinli Cao), Ho Chi Minh City University of Technology, Vietnam. ORCID: https://orcid.org/0000-0002-0221-6361

Topics: Natural Language Processing; Data Quality Assessment and Improvement; Web Data Extraction and Crawling Techniques

Keywords: Named Entity Recognition; Web Data Extraction; Entity linking; Heuristics; Page Segmentation; Web Crawling; Information Retrieval; Rank (graph theory)

Abstract: The rapidly increasing use of large-scale data on the Web makes named entity disambiguation become one of the main challenges to research in Information Extraction and development of Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the right entities in Wikipedia. The method is hybrid, containing two phases of which the first one utilizes some heuristics and patterns to narrow down the candidates, and the second one employs the vector space model to rank the ambiguous cases to choose the right candidate. The novelty is that the disambiguation process is incremental and includes several rounds that filter the candidates, by exploiting previously identified entities and extending the text by those entity attributes every time they are successfully resolved in a round. We test the performance of the proposed method in disambiguation of names of people, locations and organizations in texts of the news domain. The experiment results show that our approach achieves high accuracy and can be used to construct a robust named entity disambiguation system.
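
The incremental idea described in the abstract (narrow the candidates, rank the ambiguous ones with a vector space model, then feed the attributes of each resolved entity back into the text for the next round) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the stopword list, the `margin` threshold, the toy knowledge base, and the use of plain bag-of-words cosine similarity are all assumptions made for illustration, and the "single candidate" rule merely stands in for the paper's heuristics and patterns, which the abstract does not spell out.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; any real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "in", "of", "on", "and", "to", "with", "will"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if t and t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(text, mentions, kb, margin=0.05, max_rounds=5):
    """Resolve mentions over several rounds (illustrative sketch only).

    kb maps a surface name to {entity_id: description}. A mention is
    resolved when it has one candidate, or when the best candidate beats
    the runner-up by at least `margin`. Each resolved entity's description
    is added to the context used in later rounds.
    """
    context = Counter(tokenize(text))
    resolved, pending = {}, list(mentions)
    for _ in range(max_rounds):
        progress, still_pending = False, []
        for mention in pending:
            candidates = kb.get(mention, {})
            if not candidates:
                still_pending.append(mention)  # name unknown to the toy KB
                continue
            scored = sorted(
                ((cosine(context, Counter(tokenize(desc))), entity)
                 for entity, desc in candidates.items()),
                reverse=True,
            )
            best_score, best_entity = scored[0]
            runner_up = scored[1][0] if len(scored) > 1 else 0.0
            if len(scored) == 1 or best_score - runner_up >= margin:
                resolved[mention] = best_entity
                # Extend the text with the resolved entity's attributes.
                context.update(tokenize(candidates[best_entity]))
                progress = True
            else:
                still_pending.append(mention)
        pending = still_pending
        if not pending or not progress:
            break
    return resolved, pending


# Toy data, invented for this example.
kb = {
    "Georgia": {
        "Georgia_(country)": "country in the Caucasus region",
        "Georgia_(U.S._state)": "state in the southeastern United States",
    },
    "Tbilisi": {"Tbilisi": "capital city of a country in the Caucasus"},
}
text = "Officials arrived in Tbilisi on Monday. Georgia will host the summit."
resolved, unresolved = disambiguate(text, ["Georgia", "Tbilisi"], kb)
```

In this toy run "Georgia" cannot be decided in the first round because neither description shares a content word with the text. Once the unambiguous "Tbilisi" is resolved and its description is appended to the context, the second round picks `Georgia_(country)`. Limiting the call to `max_rounds=1` leaves "Georgia" unresolved, which is the behaviour the incremental rounds are meant to improve on.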