Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors

Kaushik Datta
Published: 2009-07-24
Source: Lawrence Berkeley National Laboratory (host organization: United States Department of Energy)
Landing page: https://escholarship.org/uc/item/2gm4v579.pdf
OpenAlex record: https://openalex.org/W1547833503
Primary topic: Parallel Computing and Performance Optimization

Authors as listed in the paper: Kaushik Datta†, Shoaib Kamil∗†, Samuel Williams∗†, Leonid Oliker∗, John Shalf∗, Katherine Yelick∗†

Abstract. Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. In this paper, we explore the impact of trends in memory subsystems on a variety of stencil optimization techniques and develop performance models to analytically guide our optimizations. Our work targets cache reuse methodologies across single and multiple stencil sweeps, examining cache-aware algorithms as well as cache-oblivious techniques on the Intel Itanium2, AMD Opteron, and IBM Power5. Additionally, we consider stencil computations on the heterogeneous multi-core design of the Cell processor, a machine with an explicitly-managed memory hierarchy. Overall our work represents one of the most extensive analyses of stencil optimizations and performance modeling to date. Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations. We also show that a cache-aware implementation is significantly faster than a cache-oblivious approach, while the explicitly managed memory on Cell enables the highest overall efficiency: Cell attains 88% of algorithmic peak while the best competing cache-based processor only achieves 54% of algorithmic peak performance.

Key words. Stencil computations, cache blocking, time-skewing, cache-oblivious algorithms, performance modeling, performance evaluation, Intel Itanium2, AMD Opteron, IBM Power5, STI Cell.

AMS subject classifications. 65Y10, 65Yxx, 35R99, 68M20

1. Introduction. Partial differential equation (PDE) solvers constitute a large fraction of scientific applications in such diverse areas as heat diffusion, electromagnetics, and fluid dynamics. These applications are often implemented using iterative finite-difference techniques, which sweep over a spatial grid, performing nearest neighbor computations called stencils. In a stencil operation, each point in a multidimensional grid is updated with weighted contributions from a subset of its neighbors in both time and space, thereby representing the coefficients of the PDE for that data element. These operations are then used to build solvers that range from simple Jacobi iterations to complex multigrid and adaptive mesh refinement methods [3].

Stencil calculations perform global sweeps through data structures that are typically much larger than the capacity of the available data caches. As a result, these computations generally achieve a low fraction of theoretical peak performance, since data from main memory cannot be transferred fast enough to avoid stalling the computational units on modern microprocessors. Reorganizing these computations to take full advantage of memory hierarchies has been the subject of much investigation over the years. These have principally focused on tiling optimizations [11, 16, 17] that attempt to exploit locality by performing operations on cache-sized blocks of data before moving to the next block. Whereas many tiling optimizations use domain decomposition to improve spatial locality, more recent studies have focused attention on exploiting locality in the time dimension [6, 13, 19, 24]. In this work, we re-examine stencil computations on current microprocessors in light of the growing performance gap between processors and memory, as well as …

Preliminary versions of this article appeared in [8, 9].
∗ CRD/NERSC, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720.
† Computer Science Department, University of California, Berkeley, CA, 94720.
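
To make the stencil operation and the tiling idea from the Introduction concrete, the following is a minimal sketch, not the authors' code. It assumes a 2D grid stored as a list of lists, a 5-point stencil, and hypothetical weights `alpha` and `beta`; the function names and the block sizes `block_y` and `block_x` are also illustrative assumptions. The first function performs one out-of-place Jacobi-style sweep; the second computes the same values while visiting the grid one rectangular block at a time, which is the loop structure that spatial cache blocking relies on.

```python
def jacobi_sweep(src, alpha=0.5, beta=0.125):
    """One out-of-place sweep of a 5-point stencil over the grid interior.

    alpha weights the center point and beta each of the four neighbors;
    both are hypothetical values chosen only for illustration.
    """
    ny, nx = len(src), len(src[0])
    dst = [row[:] for row in src]  # boundary points are copied unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dst[j][i] = alpha * src[j][i] + beta * (
                src[j - 1][i] + src[j + 1][i] + src[j][i - 1] + src[j][i + 1]
            )
    return dst


def jacobi_sweep_blocked(src, block_y=4, block_x=4, alpha=0.5, beta=0.125):
    """Same sweep as jacobi_sweep, traversed block by block.

    The outer two loops pick a block of at most block_y x block_x interior
    points; the inner two loops update every point inside that block.
    """
    ny, nx = len(src), len(src[0])
    dst = [row[:] for row in src]
    for jj in range(1, ny - 1, block_y):
        for ii in range(1, nx - 1, block_x):
            for j in range(jj, min(jj + block_y, ny - 1)):
                for i in range(ii, min(ii + block_x, nx - 1)):
                    dst[j][i] = alpha * src[j][i] + beta * (
                        src[j - 1][i] + src[j + 1][i]
                        + src[j][i - 1] + src[j][i + 1]
                    )
    return dst


def make_grid(ny, nx):
    """Build a small deterministic test grid (arbitrary values)."""
    return [[float((3 * j + 7 * i) % 11) for i in range(nx)] for j in range(ny)]


grid = make_grid(10, 13)
naive = jacobi_sweep(grid)
blocked = jacobi_sweep_blocked(grid)
print(naive == blocked)  # both traversal orders compute the same update
```

Because the sweep is out of place, every point depends only on the input grid, so reordering the traversal does not change the result. In pure Python the blocked version is not expected to run faster; the sketch only shows the loop restructuring, and any performance benefit depends on the language, the block sizes, and the memory system.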