Title: FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning
Abstract: Applying federated learning (FL) on Internet of Things (IoT) devices is necessitated by the large volumes of data they produce and by growing concerns about data privacy. However, three challenges must be addressed to make FL efficient: 1) execution on devices with limited computational capabilities; 2) accounting for stragglers caused by the computational heterogeneity of devices; and 3) adaptation to changing network bandwidth. This article presents FedAdapt, an adaptive offloading FL framework that mitigates these challenges. FedAdapt accelerates local training on computationally constrained devices by offloading layers of deep neural networks (DNNs) to servers. Furthermore, FedAdapt adopts reinforcement learning (RL)-based optimization and clustering to adaptively identify which layers of the DNN should be offloaded from each individual device to a server, tackling the challenges of computational heterogeneity and changing network bandwidth. Experimental studies carried out on a lab-based testbed demonstrate that, by offloading a DNN from the device to the server, FedAdapt reduces the training time of a typical IoT device by over half compared to classic FL. The training time of extreme stragglers and the overall training time can be reduced by up to 57%. Furthermore, with changing network bandwidth, FedAdapt is demonstrated to reduce the training time by up to 40% compared to classic FL, without sacrificing accuracy.
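
To make the trade-off behind layer offloading concrete, the following is a minimal sketch of how the choice of offloading point depends on device speed and network bandwidth. It is not the RL-based method described in the abstract: it replaces it with a plain exhaustive search over split points. The cost model and all names and numbers (`estimate_time`, `best_split`, the per-layer timings and activation sizes) are illustrative assumptions, not values or code from the article.

```python
from typing import Sequence


def estimate_time(split: int,
                  device_times: Sequence[float],
                  server_times: Sequence[float],
                  output_bytes: Sequence[int],
                  bandwidth_bps: float) -> float:
    """Estimate one training iteration (seconds) when the first `split`
    layers run on the device and the remaining layers run on the server.

    Assumed cost model (hypothetical): device compute + server compute +
    transfer of the split layer's activations to the server and of an
    equally sized gradient back to the device.
    """
    n = len(device_times)
    device = sum(device_times[:split])
    server = sum(server_times[split:])
    if split == n:
        comm = 0.0  # every layer stays on the device, nothing is sent
    else:
        comm = 2 * output_bytes[split - 1] * 8 / bandwidth_bps
    return device + server + comm


def best_split(device_times: Sequence[float],
               server_times: Sequence[float],
               output_bytes: Sequence[int],
               bandwidth_bps: float) -> int:
    """Exhaustively pick the split point with the lowest estimated time.

    At least one layer is kept on the device (assumption), so the raw
    training data never leaves it.
    """
    n = len(device_times)
    return min(range(1, n + 1),
               key=lambda s: estimate_time(s, device_times, server_times,
                                           output_bytes, bandwidth_bps))


# Hypothetical 3-layer model: seconds per layer on device and server,
# and the size in bytes of each layer's output.
device_times = [0.4, 0.4, 0.2]
server_times = [0.04, 0.04, 0.02]
output_bytes = [1_000_000, 250_000, 4_000]

for mbps in (75, 25, 5):
    split = best_split(device_times, server_times, output_bytes, mbps * 1e6)
    print(f"{mbps} Mbps: keep the first {split} layer(s) on the device")
```

With these made-up numbers, the preferred split moves from one layer on the device at 75 Mbps, to two layers at 25 Mbps, to no offloading at all at 5 Mbps. This illustrates why an offloading decision needs to adapt as bandwidth changes, which the abstract addresses with RL-based optimization rather than a fixed cost model like this one.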