Title: Managing consistency for big data applications : tradeoffs and self-adaptiveness
Abstract: In the era of Big Data, data-intensive applications handle extremely large volumes of data while requiring fast processing times. A large number of such applications run in the cloud in order to benefit from cloud elasticity, easy on-demand deployments, and cost-efficient Pays-As-You-Go usage. In this context, replication is an essential feature in the cloud in order to deal with Big Data challenges. Therefore, replication therefore, enables high availability through multiple replicas, fast data access to local replicas, fault tolerance, and disaster recovery. However, replication introduces the major issue of data consistency across different copies. Consistency management is a critical for Big Data systems. Strong consistency models introduce serious limitations to systems scalability and performance due to the required synchronization efforts. In contrast, weak and eventual consistency models reduce the performance overhead and enable high levels of availability. However, these models may tolerate, under certain scenarios, too much temporal inconsistency. In this Ph.D thesis, we address this issue of consistency tradeoffs in large-scale Big Data systems and applications. We first, focus on consistency management at the storage system level. Accordingly, we propose an automated self-adaptive model (named Harmony) that scale up/down the consistency level at runtime when needed in order to provide as high performance as possible while preserving the application consistency requirements. In addition, we present a thorough study of consistency management impact on the monetary cost of running in the cloud. Hereafter, we leverage this study in order to propose a cost efficient consistency tuning (named Bismar) in the cloud. In a third direction, we study the consistency management impact on energy consumption within the data center. According to our findings, we investigate adaptive configurations of the storage system cluster that target energy saving. In order to complete our system-side study, we focus on the application level. Applications are different and so are their consistency requirements. Understanding such requirements at the storage system level is not possible. Therefore, we propose an application behavior modeling that apprehend the consistency requirements of an application. Based on the model, we propose an online prediction approach- named Chameleon that adapts to the application specific needs and provides customized consistency.