Title: The Synchronization Treatment in Implementing Data-Parallel Programming Languages on CPUs
Abstract: When implementing data-parallel programming languages such as CUDA, OpenCL on CPUs, synchronization must be simulated correctly. The basic method is thread-based, which means all thread must execute one instruction in turn before execute the next one. In this paper, we propose function splitting to treat synchronization in a co routine style but not just thread-based. It splits the data-parallel function presented by low-level intermediate representation into several parts by simulating synchronization. We evaluate our method in translating PTX kernels to multi-core CPUs, the result of which shows this method could promotes performance by 15% compared to thread-based method. Our main contribution is a generous synchronization treatment that performs on low-level intermediate code given by a control flow graph in SSA form.
Publication Year: 2013
Publication Date: 2013-11-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot