Document clustering using nonnegative matrix factorization

Farial Shahnaz; Michael W. Berry; V. Paúl Pauca; Robert J. Plemmons
Journal: Information Processing & Management (Elsevier BV), volume 42, issue 2, pages 373-386
Published: 2006-03-01
Type: journal article (closed access)
DOI: https://doi.org/10.1016/j.ipm.2004.11.005
OpenAlex: https://openalex.org/W2113359929
Citations: 581 (OpenAlex record updated 2024-09-12); 18 referenced works

Affiliations:
Farial Shahnaz, Michael W. Berry: Department of Computer Science, University of Tennessee, Knoxville, TN 37996-3450, USA
V. Paúl Pauca, Robert J. Plemmons: Department of Computer Science, Wake Forest University, Winston-Salem, NC 27109, USA

Keywords: Non-negative Matrix Factorization; Document Clustering; Semi-supervised Clustering; Clustering Algorithms; Feature Selection

Abstract

A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need to use subtractive basis vector and encoding calculations present in other techniques such as principal component analysis for semantic feature abstraction. Existing techniques for nonnegative matrix factorization are reviewed and a new hybrid technique for nonnegative matrix factorization is proposed. Performance evaluations of the proposed method are conducted on a few benchmark text collections used in standard topic detection studies.
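
The encoding described in the abstract can be illustrated with a minimal sketch. It factors a nonnegative term-by-document matrix A into nonnegative factors W (basis vectors, read as topics) and H (per-document encodings), then assigns each document to the topic with the largest encoding weight. The sketch uses the standard multiplicative update rule as a stand-in; it is not the hybrid technique the paper proposes, whose details are not given in this record. The function names, the toy matrix, and the parameters `k`, `iters`, `eps`, and `seed` are all assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (terms x documents) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    This is a generic NMF routine, not the paper's hybrid method.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random start keeps every update well defined.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so W and H never become negative (no subtractive encoding).
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_documents(H):
    """Assign each document (column of H) to its dominant basis vector."""
    return np.argmax(H, axis=0)

# Hypothetical toy data: 4 terms x 6 documents, two disjoint vocabularies.
A = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

W, H = nmf_multiplicative(A, k=2)
labels = cluster_documents(H)
print(labels)
```

On this toy matrix the first three documents should share one label and the last three the other; which integer each group receives depends on the random initialization. Real collections would normally be weighted (for example with term frequency weights) before factoring, which this sketch omits.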