Using Topic Keyword Clusters for Automatic Document Clustering

Hsi-Cheng Chang; Chiun‐Chieh Hsu
Publication date: 2005-08-03
Type: proceedings article
DOI: https://doi.org/10.1109/icita.2005.303
OpenAlex ID: https://openalex.org/W2154463838
Language: English
Open access: closed
Cited by: 24 (OpenAlex record updated 2024-09-18)
Referenced works: 17

Affiliations
Hsi-Cheng Chang: Department of Electronic Engineering, Hwa-Hsia College of Technology and Commerce, Taipei, Taiwan
Chiun‐Chieh Hsu: Department of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan

Topics: Data Mining Techniques and Applications (primary); Trajectory Data Mining and Analysis; Automatic Keyword Extraction from Textual Data
Keywords: Document clustering; Clustering Algorithms; Textual Data; Top-k Query Processing

Abstract

Data clustering is a technique for grouping similar data items together for convenient understanding. Conventional data clustering methods, including agglomerative hierarchical clustering and partitional clustering algorithms, frequently perform unsatisfactorily on large collections of text articles, and their computational complexity increases very quickly with the number of data items. This paper presents a system for automatic document clustering by identifying topic keyword clusters of the text corpus. The proposed system adopts a multi-stage process. First, an aggressive data cleaning approach is employed to reduce the noise in the free text and further identify the topic keywords within the documents. All extracted keywords are then grouped into topic keyword clusters using the k-nearest neighbor graph approach and the keyword clustering function. Finally, all documents in the corpus are clustered based on the topic keyword clusters. The proposed method was assessed against conventional data clustering methods on a Web news collection, indicating that the proposed method is an efficient and effective clustering approach.
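The multi-stage process described in the abstract (cleaning, keyword extraction, a k-nearest neighbor graph over keywords, keyword clusters, then document assignment) can be illustrated with a minimal sketch. This is not the authors' implementation. The stop list, the frequency-based keyword extractor, the co-occurrence cosine similarity, the use of connected components as a stand-in for the paper's "keyword clustering function", and all function names and sample documents are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop list; the paper's actual cleaning rules are not reproduced here.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are",
             "for", "with", "after", "as"}


def tokenize(doc):
    """Crude cleaning: lowercase, keep alphabetic tokens, drop stopwords and short words."""
    return [t for t in re.findall(r"[a-z]+", doc.lower())
            if t not in STOPWORDS and len(t) > 2]


def extract_keywords(doc, top_n=5):
    """Stand-in keyword extractor (assumption): the top_n most frequent cleaned tokens."""
    return [w for w, _ in Counter(tokenize(doc)).most_common(top_n)]


def keyword_clusters(docs, k=3, top_n=5):
    """Group keywords with a k-nearest-neighbor graph; clusters are its connected components."""
    occurs = defaultdict(set)  # keyword -> ids of documents it was extracted from
    for i, doc in enumerate(docs):
        for w in extract_keywords(doc, top_n):
            occurs[w].add(i)
    words = sorted(occurs)

    def sim(a, b):
        # Cosine similarity between binary document-occurrence vectors.
        return len(occurs[a] & occurs[b]) / math.sqrt(len(occurs[a]) * len(occurs[b]))

    graph = defaultdict(set)
    for a in words:
        scored = [(sim(a, b), b) for b in words if b != a]
        # Keep the k most similar keywords with nonzero similarity; ties break alphabetically.
        nearest = sorted((p for p in scored if p[0] > 0),
                         key=lambda p: (-p[0], p[1]))[:k]
        for _, b in nearest:
            graph[a].add(b)  # undirected edge if either keyword lists the other
            graph[b].add(a)

    clusters, seen = [], set()
    for w in words:
        if w in seen:
            continue
        component, stack = set(), [w]
        while stack:  # depth-first traversal of one connected component
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(graph[cur] - component)
        seen |= component
        clusters.append(component)
    return clusters


def cluster_documents(docs, clusters):
    """Assign each document to the keyword cluster it overlaps most (-1 if none)."""
    labels = []
    for doc in docs:
        tokens = set(tokenize(doc))
        overlaps = [len(tokens & c) for c in clusters]
        best = max(range(len(clusters)), key=overlaps.__getitem__, default=-1)
        labels.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return labels


# Made-up sample documents, not taken from the paper's Web news collection.
docs = [
    "The football team won the match after a late goal",
    "A late goal decided the football match for the home team",
    "The central bank raised interest rates to curb inflation",
    "Inflation fears grow as the bank keeps interest rates high",
]
clusters = keyword_clusters(docs)
labels = cluster_documents(docs, clusters)
print(labels)
```

On this toy input the two sports articles share one label and the two finance articles share another. The point of the design is that similarity is computed between keywords rather than between every pair of documents, so the final document assignment is a single pass over the corpus.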