Proper imputation techniques for missing values in data sets

Tahani Aljuaid; Sreela Sasi
Affiliation: Department of Computer and Information Science, Gannon University, Erie, PA, USA
Published: 2016-08-01 (proceedings article)
DOI: https://doi.org/10.1109/icdse.2016.7823957
OpenAlex: https://openalex.org/W2574666645
Access: closed (not open access)
Cited by: 75 (OpenAlex record updated 2024-09-15)
Primary topic: Data Mining Techniques and Applications
Keywords: Imputation (statistics), Categorical variable, Data Mining, Statistical Modeling, Temporal Data Mining, Statistical Analysis, Data Visualization

Abstract

Data mining requires a pre-processing task in which the data are prepared and cleaned to ensure quality. A missing value occurs when no data value is stored for a variable in an observation. This has a significant effect on the results, especially when it leads to biased parameter estimates. It not only diminishes the quality of the results but can also make the data unfit for analysis. Hence there are risks associated with missing values in a dataset. Imputation is a technique for replacing missing data with substituted values. This research presents a comparison of imputation techniques for missing data: Mean/Mode, K-Nearest Neighbor, Hot-Deck, Expectation Maximization, and C5.0. The choice of a proper imputation method is based on datatypes, missing data mechanisms, patterns, and methods. The datatype can be numerical, categorical, or mixed. The missing data mechanism can be missing completely at random, missing at random, or not missing at random. Patterns of missing data can be with respect to cases or attributes. A method can be a pre-replace method or an embedded method. These five imputation techniques are used to impute artificially created missing data from different data sets of varying sizes. The performance of these techniques is compared based on classification accuracy, and the results are presented.
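
The simplest of the compared techniques, Mean/Mode imputation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mask_mcar` and `impute_mean_mode`, the toy records, and the choice of a missing-completely-at-random mask for creating artificial gaps are all assumptions made for illustration. Numeric columns are filled with the mean of the observed values and all other columns with the most frequent observed value.

```python
import random
from collections import Counter
from statistics import mean


def mask_mcar(rows, rate, rng):
    """Return a copy of rows where each cell is set to None with
    probability `rate`, independently of every other value (an assumed
    missing-completely-at-random mechanism)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def is_number(value):
    """True for int or float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def impute_mean_mode(rows):
    """Fill None cells column by column: mean for numeric columns,
    mode (most frequent observed value) for all other columns."""
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            fills.append(None)  # nothing observed, so the gap stays
        elif all(is_number(v) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(Counter(observed).most_common(1)[0][0])
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Hypothetical mixed-type records: (age, favourite colour).
rows = [
    [25, "red"],
    [35, "blue"],
    [None, "red"],
    [30, None],
]
filled = impute_mean_mode(rows)

# Artificially remove roughly 30% of the cells, then impute them again.
masked = mask_mcar(filled, 0.3, random.Random(0))
reimputed = impute_mean_mode(masked)
```

The paper scores each technique by the classification accuracy obtained on the imputed data; this sketch stops at the imputation step and does not reproduce that evaluation.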