Title: Parallel Graph Processing on Modern Multi-core Servers: New Findings and Remaining Challenges
Abstract: Big Data analytics and new problems in social networks, computational biology, and web connectivity have led to renewed research interest in graph processing. Due to the "irregularity" of graph computations, efficient parallel graph processing faces a set of software and hardware challenges that are debated in the literature. In this paper, using hardware performance counters, we characterize system bottlenecks, resource usage, and the efficiency of popular graph applications on modern commodity hardware. We analyze selected graph applications (implemented in the Galois framework) on a variety of graph datasets, both scale-free graphs and meshes. Our profiling shows that, as the number of cores increases, the analyzed graph applications achieve good speedup, which is highly correlated with utilized memory bandwidth. Contrary to long-held assumptions, we find that graph applications benefit significantly from hardware prefetchers. Moreover, the use of transparent huge pages (THP) has a "double win" impact: 1) THP significantly decrease TLB misses and page walk durations, and 2) THP boost the hardware prefetchers' performance. These insights shed light on the performance of emerging systems with large memories. Our profiling framework reports hardware counter values over time. It reveals the danger of using averages for bottleneck and resource usage analysis: many applications exhibit time-varying behavior and push the usage of system resources to their peak. We discuss the new insights and the remaining challenges to guide the design of future hardware and software components for efficient graph processing.
Publication Year: 2016
Publication Date: 2016-09-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 11