Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

Johannes Habich; Thomas Zeiser; Georg Hager; Gerhard Wellein
Journal: Advances in Engineering Software (Elsevier BV), volume 42, issue 5, pages 266-272
Publication date: 2011-05-01
Type: journal article
Language: English
DOI: https://doi.org/10.1016/j.advengsoft.2010.10.007
OpenAlex ID: https://openalex.org/W1996820070
ISSN: 0965-9978, 1873-5339
Access: closed (no open-access copy listed)
Affiliation (all authors): Erlangen Regional Computing Center (RRZE), University of Erlangen-Nuremberg, Martensstr. 1, 91058 Erlangen, Germany
Cited by: 50 works (OpenAlex record updated 2024-08-15)
References: 11 works
Primary topic: Lattice Boltzmann Method for Complex Flows
Keywords: x86, Solver, Lattice Boltzmann Method, Kernel (algebra), Speedup

Abstract: This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.
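
The abstract's idea of deriving an upper performance limit from a STREAM-style bandwidth measurement can be illustrated with a short back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a simple traffic model in which each lattice cell update loads and stores all 19 distribution values once (no write-allocate traffic, no extra data such as cell flags), and the 100 GB/s bandwidth figure is a hypothetical placeholder rather than a measured value. The function name `lbm_upper_limit_mlups` is likewise made up for illustration.

```python
def lbm_upper_limit_mlups(bandwidth_gb_per_s, bytes_per_value=4, q=19):
    """Bandwidth-bound estimate of lattice updates per second, in MLUPS.

    Assumed traffic model (not from the paper): each cell update reads
    q distribution values and writes q distribution values, nothing else.
    """
    bytes_per_update = 2 * q * bytes_per_value  # q loads + q stores per cell
    updates_per_second = bandwidth_gb_per_s * 1e9 / bytes_per_update
    return updates_per_second / 1e6  # million lattice updates per second


if __name__ == "__main__":
    measured_bandwidth = 100.0  # GB/s, hypothetical STREAM-like result
    single = lbm_upper_limit_mlups(measured_bandwidth, bytes_per_value=4)
    double = lbm_upper_limit_mlups(measured_bandwidth, bytes_per_value=8)
    print(f"single precision limit: {single:.1f} MLUPS")
    print(f"double precision limit: {double:.1f} MLUPS")
```

Under this model the estimate scales linearly with the attainable bandwidth and halves when moving from single to double precision, which is why a memory-bound kernel of this kind is usually compared against a bandwidth benchmark rather than against peak floating-point throughput.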