Title: Accelerating Science with the NERSC Burst Buffer Early User Program
Abstract: Wahid Bhimji∗, Debbie Bard∗, Melissa Romanus∗†, David Paul∗, Andrey Ovsyannikov∗, Brian Friesen∗, Matt Bryson∗, Joaquin Correa∗, Glenn K. Lockwood∗, Vakho Tsulaia∗, Suren Byna∗, Steve Farrell∗, Doga Gursoy‡, Chris Daley∗, Vince Beckner∗, Brian Van Straalen∗, David Trebotich∗, Craig Tull∗, Gunther Weber∗, Nicholas J. Wright∗, Katie Antypas∗, Prabhat∗
∗ Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA, Email: [email protected]
† Rutgers Discovery Informatics Institute, Rutgers University, Piscataway, NJ, USA
‡ Advanced Photon Source, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA

Abstract—NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base of over 6500 users in 700 different projects spanning a wide variety of scientific computing applications. The use-cases of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here performance measurements and lessons learned from the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its capability to enable new scientific advancements. To the best of our knowledge, this is the first time a Burst Buffer has been stressed at scale by diverse, real user workloads, and therefore these lessons will be of considerable benefit in shaping the developing use of Burst Buffers at HPC centers.

Index Terms—Nonvolatile memory, Data storage systems, Burst Buffer, Parallel I/O, High Performance Computing

I. INTRODUCTION

HPC faces a growing I/O challenge. One path forward is a fast storage layer, close to the compute, termed a Burst Buffer [1]. Such a layer was deployed with the first phase of the Cori Cray XC40 System at NERSC in the latter half of 2015, providing around 900 TB of NVRAM-based storage. This system not only employs state-of-the-art SSD hardware, but also a new approach to on-demand filesystems through Cray's DataWarp software. In order to enable scientific applications to utilize this new layer in the storage hierarchy, NERSC is running the Burst Buffer Early User Program, focused on real science applications and workflows that can benefit from the accelerated I/O the system provides. The program is providing a means to test and debug the new technology as well as drive new science results.

In this paper we first briefly review the motivation for Burst Buffers and the range of potential use-cases for NERSC's diverse scientific workload. We then provide a brief overview of the architecture deployed at NERSC in Section II-B before outlining the Early User Program and the projects selected. We then focus on five specific projects and describe in detail their workflow, initial results and performance measurements. We conclude with several important lessons learned from this first application of Burst Buffers at scale for HPC.

A. The I/O Hierarchy

Recent hardware advancements in HPC systems have enabled scientific simulations and experimental workflows to tackle larger problems than ever before. The increase in scale and complexity of the applications and scientific instruments has led to a corresponding increase in data exchange, interaction, and communication. The efficient management of I/O has become one of the biggest challenges in accelerating the time-to-discovery for science.

Historically, the memory architecture of HPC machines has involved compute nodes with on-node memory (DRAM), a limited number of I/O subsystem nodes for handling I/O requests, and a disk-based storage appliance exposed as a parallel file system. DRAM node-memory is an expensive commodity with limited capacity but fast read/write access, while disk-based storage systems provide a relatively inexpensive way to store and persist large amounts of data, but with considerably lower bandwidth and higher latency.

This traditional HPC architecture is often unable to meet the I/O coordination and communication needs of the applications that run on it, particularly at extreme scale. In order to address this I/O bottleneck, system architects have explored ways of offering cost-effective memory and filesystem solutions that can offer faster performance than parallel filesystems on disk-based storage. A natural extension of this work has been to explore ways of deepening the memory hierarchies on HPC machines to include multiple storage layers in-between DRAM and disk. These proposed solutions leverage technology advancements like solid-state devices (SSDs), as well as other flash-based and/or NVRAM offerings.

Therefore, some state-of-the-art HPC systems now include a new tier of 'intermediate' storage between the compute nodes and the hard disk storage, known as a 'Burst Buffer'. This layer is slower (but higher capacity) than on-node memory, but faster (and lower capacity) than HDD-based storage.
Publication Year: 2016
Publication Date: 2016-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 55