Title: I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors
Abstract: The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures.As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework.The obtained algorithms were efficient but not optimal.In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of O(sort ( ) + / ) for a number of problems on axisaligned objects; denotes the number of cores/processors, denotes the number of elements that fit in a cache line, and denote the sizes of the input and output, respectively, and sort ( ) denotes the I/O complexity of sorting items using processors in the PEM model.To obtain the above improvement, we present a new onedimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of O(( + )/ ), where is the sum of the counts of all the ranges.The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.