Title: Accelerated diffusion-based recommendation algorithm on tripartite graphs with GPU clusters
Abstract: Exorbitant computation cost hinders the practical application of recommendation algorithm, especially in time-critical application scenario. Although experiments show that recommendation algorithm based on an integrated diffusion on user-item-tag tripartite graphs can significantly improve accuracy, diversification and novelty of recommendation, it is also very time-consuming. Therefore, a parallel solution is frequently needed to improve the performance of the algorithm. This paper explicitly presents the parallel implementation and optimizations of diffusion-based recommendation on weighted tripartite graphs algorithm using Compute Unified Device Architecture (CUDA) and related optimization solutions including shared memory, stream scheduling and GPU cluster optimization. Compared to the algorithm running on a single CPU core, the unoptimized GPU kernel can achieve 153.9 speedup on average with the input dataset consists of 30000 records on GTX 980. With shared memory applied, the time cost on memory access saves about 50% on dataset of 90000 records and with 2 way streams scheduling, the kernel's performance improves about 7% ~ 13%. Based on the optimized kernel, we evaluate the performance of the algorithm with customized socket communication mechanism on GPU clusters. And compared to a single GPU node, we achieve 7.55 speedup on clusters of 9 GPUs when recommending for 8000 users. Besides this, the speedup of GPU clusters is also 26.1 times of the speedup of our CPU clusters of 9 nodes and 1586.28 times of serial algorithm on one CPU core. It proves that GPU technology can dramatically improve the algorithm's performance.
Publication Year: 2016
Publication Date: 2016-05-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot