Title: Scaling number of cores in GPGPU: A comparative performance analysis
Abstract: Graphics Processing Units (GPUs), based on the Single Instruction Multiple Thread (SIMT) architecture, are emerging as more efficient than Multiple Instruction Multiple Data (MIMD) architectures in exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs), and the threads within a CTA are further grouped into warps. Although all warps of a CTA are scheduled for execution on the same core, hardware constraints allow only one warp to execute at a time. A GPU therefore also exploits parallelism by employing multiple shader cores to execute multiple warps simultaneously. We explore this latter form of parallelism by increasing the number of cores and studying its impact on different types of applications. We first categorize a number of general-purpose GPU workloads into those that consume less DRAM bandwidth (type-L) and those with heavier bandwidth requirements (type-H). We observe that type-L workloads gain performance as the number of cores increases, whereas type-H workloads suffer performance degradation. The maximum performance gain, in terms of instructions per cycle (IPC), is 2.03x for type-L workloads. We then examine the impact of scaling on the percentage of good cycles for all workloads. Our results show that the additional bandwidth pressure caused by scaling the number of shader cores is detrimental to type-H workloads but boosts type-L workloads, at the cost of a reduced percentage of good cycles in both types.
Publication Year: 2015
Publication Date: 2015-08-01
Language: en
Type: article
Indexed In: ['crossref']