Efficient Sparse Matrix-Vector Multiplication on CUDA

Nathan Bell; Michael Garland

Abstract

The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In this paper we discuss data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. Given the memory-bound nature of SpMV, we emphasize memory bandwidth efficiency and compact storage formats. We consider a broad spectrum of sparse matrices, from those that are well-structured and regular to highly irregular matrices with large imbalances in the distribution of nonzeros per matrix row. We develop methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity. On structured, grid-based matrices we achieve performance of 36 GFLOP/s in single precision and 16 GFLOP/s in double precision on a GeForce GTX 280 GPU. For unstructured finite-element matrices, we observe performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double precision respectively. These results compare favorably to prior state-of-the-art studies of SpMV methods on conventional multicore processors. Our double precision SpMV performance is generally two and a half times that of a Cell BE with 8 SPEs and more than ten times greater than that of a quad-core Intel Clovertown system.

Publication year: 2008
Affiliation: Nvidia
Source: http://mgarland.org/files/papers/nvr-2008-004.pdf
OpenAlex record: https://openalex.org/W2124007994
Primary topic: Matrix Algorithms and Iterative Methods
Keywords: Linear algebra; GPU Computing; Double-precision floating-point format; Parallel Computing; Sparse Linear Systems; Matrix Computations; High-Performance Computing; Memory bandwidth
Cited by: 592 (OpenAlex record updated 2024-09-10)
Referenced works: 16