Measuring text similarity based on structure and word embedding

Mamdouh Farouk
Computer Science Dept., Assiut University, Assiut, Egypt (corresponding author)

Journal: Cognitive Systems Research (Elsevier BV), volume 63, pages 1-10
ISSN: 1389-0417, 2214-4366
Published: 2020-10-01
Type: journal article
Language: English
DOI: https://doi.org/10.1016/j.cogsys.2020.04.002
OpenAlex ID: https://openalex.org/W3023321459
Open access: closed (no open-access copy listed)
Citations: 37 (2020: 1, 2021: 8, 2022: 13, 2023: 9, 2024: 5); field-weighted citation impact 4.037; in the top 1% by normalized citation percentile
References: 29 works cited

Topics: Natural Language Processing; Statistical Machine Translation and Natural Language Processing; Biomedical Ontologies and Text Mining
Keywords: Similarity (geometry); Word Representation; Semantic Similarity; Natural Language Processing; Word order; Language Modeling; Text Mining; Word embedding; Representation (politics); Similarity measure
Related works (OpenAlex IDs): W4301351852, W4286432911, W3134737443, W3099449837, W2966570129, W2912737652, W2911655849, W2774861092, W2622845166, W2107397692

Abstract

The problem of finding the similarity between natural language sentences is crucial for many applications in Natural Language Processing (NLP). An accurate calculation of similarity between sentences is highly needed. Many approaches depend on word-to-word similarity to measure sentence similarity. This paper proposes a new approach to improve the accuracy of the sentence similarity calculation. The proposed approach combines different similarity measures in the calculation of sentence similarity. In addition to the traditional word-to-word similarity measure, the proposed approach exploits sentence semantic structure. Discourse representation structure (DRS), which is a semantic representation for natural sentences, is generated and used to calculate structure similarity. Furthermore, word order similarity is measured to consider the order of words in sentences. Experiments show that exploiting structural information achieves good results. Moreover, the proposed method outperforms the current approaches on a standard benchmark dataset, achieving 0.8813 Pearson correlation with human similarity.
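The abstract describes combining a word-to-word similarity measure with structure similarity and word order similarity, and evaluating the result by Pearson correlation with human judgments. The sketch below illustrates that general idea in Python; it is not the paper's implementation. Everything specific in it is an assumption made for illustration: the toy vectors in `TOY_VECTORS`, the best-match averaging in `word_to_word_similarity`, the word order formula (a commonly used formulation, not confirmed to be the one in the paper), the weights in `combined_similarity`, and the `structure_sim` argument, which stands in for a DRS-based score that would have to come from a semantic parser not shown here.

```python
import math

# Hypothetical 3-dimensional vectors; a real system would load pretrained embeddings.
TOY_VECTORS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.8, 0.2, 0.1),
    "bites": (0.1, 0.9, 0.0),
    "man": (0.0, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2, vectors):
    """1.0 for identical words, embedding cosine if both are known, else 0.0."""
    if w1 == w2:
        return 1.0
    if w1 in vectors and w2 in vectors:
        return cosine(vectors[w1], vectors[w2])
    return 0.0


def word_to_word_similarity(s1, s2, vectors):
    """Average best-match word similarity, symmetrised over both directions."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        return sum(max(word_similarity(x, y, vectors) for y in b) for x in a) / len(a)

    return 0.5 * (directed(s1, s2) + directed(s2, s1))


def word_order_similarity(s1, s2):
    """1 - ||r1 - r2|| / ||r1 + r2||, where r holds 1-based positions of the
    joint vocabulary in each sentence (0 when a word is absent)."""
    joint = list(dict.fromkeys(s1 + s2))  # unique words, first-seen order
    r1 = [s1.index(w) + 1 if w in s1 else 0 for w in joint]
    r2 = [s2.index(w) + 1 if w in s2 else 0 for w in joint]
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


def combined_similarity(s1, s2, vectors, structure_sim=None, weights=(0.6, 0.2, 0.2)):
    """Weighted mix of word, order, and (optional, precomputed) structure scores.
    The weights are illustrative; they are renormalised when structure_sim is None."""
    w_word, w_order, w_struct = weights
    score = w_word * word_to_word_similarity(s1, s2, vectors)
    score += w_order * word_order_similarity(s1, s2)
    total = w_word + w_order
    if structure_sim is not None:
        score += w_struct * structure_sim
        total += w_struct
    return score / total


def pearson(xs, ys):
    """Pearson correlation between system scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In this sketch, "dog bites man" and "man bites dog" receive a perfect word-to-word score because every word has an exact match, while the word order term is lower, which is the kind of distinction the abstract attributes to order and structure information. A benchmark evaluation would call `pearson` on the list of system scores and the list of human ratings for the same sentence pairs.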