Title: How to steal a machine learning classifier with deep learning
Abstract: This paper presents an exploratory machine learning attack based on deep learning to infer the functionality of an arbitrary classifier by polling it as a black box and using the returned labels to build a functionally equivalent machine. Typically, it is costly and time-consuming to build a classifier, because this requires collecting training data (e.g., through crowdsourcing), selecting a suitable machine learning algorithm (through extensive tests and domain-specific knowledge), and optimizing the underlying hyperparameters (applying a good understanding of the classifier's structure). In addition, all of this information is typically proprietary and should be protected. With the proposed black-box attack approach, an adversary can use deep learning to reliably infer the necessary information from labels previously obtained from the classifier under attack, and build a functionally equivalent machine learning classifier without knowing the type, structure, or underlying parameters of the original classifier. Results for a text classification application demonstrate that deep learning can infer Naive Bayes and SVM classifiers with high accuracy and steal their functionality. This new attack paradigm introduces additional security challenges for online machine learning algorithms and raises the need for novel mitigation strategies to counteract the high-fidelity inference capability of deep learning.
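The query-then-imitate loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual setup: the paper attacks text classifiers (Naive Bayes, SVM) with a deep network, while here a tiny synthetic linear "victim" and a logistic-regression substitute stand in, and all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Secret victim classifier: an unknown linear decision rule the
# adversary cannot inspect (stand-in for the paper's NB/SVM victims).
w_secret = rng.normal(size=5)

def victim_predict(X):
    """Black-box oracle: returns labels only, never parameters."""
    return (X @ w_secret > 0).astype(int)

# Step 1: the adversary polls the black box on its own query set.
X_query = rng.normal(size=(2000, 5))
y_stolen = victim_predict(X_query)

# Step 2: fit a substitute model on the (query, stolen-label) pairs
# via plain gradient descent on the logistic loss.
w_sub = np.zeros(5)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_query @ w_sub)))
    w_sub -= 0.1 * X_query.T @ (p - y_stolen) / len(y_stolen)

# Step 3: measure functional equivalence as label agreement with the
# victim on fresh, previously unseen inputs.
X_test = rng.normal(size=(1000, 5))
agreement = np.mean((X_test @ w_sub > 0).astype(int) == victim_predict(X_test))
print(f"functional agreement: {agreement:.2f}")
```

The substitute never sees the victim's weights, only its output labels, yet its predictions closely track the victim's on new inputs; this agreement rate is the sense in which the stolen model is "functionally equivalent."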
Publication Year: 2017
Publication Date: 2017-04-01
Language: en
Type: article
Indexed In: ['crossref']
Cited By Count: 78