Title: Performance Evaluation of Different Feature Encoding Schemes on Cybersecurity Logs
Abstract: Many cybersecurity logs contain a substantial volume of textual data regarding security events. This data needs to be converted to numerical types before any machine learning (ML) algorithms can be applied. Feature encoding is the process of transforming textual data into numerical values so that ML algorithms can be applied to them, resulting in improved model accuracy. Researchers have used many approaches to convert textual data into numerical values, such as “Label Encoding”, “One Hot Encoding”, and “Binary Encoding”. These approaches are useful encoding schemes for dealing with large-scale text data. We examine the application of these methods to cybersecurity datasets to determine which encoding scheme performs best when used with a classification ML algorithm for intrusion detection. Experimental results show that label encoding performed the best, whereas one hot encoding was least effective.
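As an illustration of the three encoding schemes named in the abstract, the following is a minimal Python sketch using only the standard library. It is not the authors' implementation: the function names, the sample event strings, and the choice to assign integer codes in sorted order starting from 0 are all assumptions made for this example.

```python
def label_encode(values):
    """Label encoding: map each distinct category to one integer.

    Assumption: codes are assigned in sorted order, starting at 0.
    """
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """One hot encoding: one column per category, a single 1 per row."""
    codes, mapping = label_encode(values)
    width = len(mapping)
    return [[1 if j == c else 0 for j in range(width)] for c in codes]


def binary_encode(values):
    """Binary encoding: write each integer code as a fixed-width bit vector."""
    codes, mapping = label_encode(values)
    # Number of bits needed for the largest code (at least one column).
    width = max(1, (len(mapping) - 1).bit_length())
    return [[(c >> bit) & 1 for bit in reversed(range(width))] for c in codes]


# Hypothetical log event types, invented for illustration only.
events = ["login_failed", "port_scan", "login_ok", "port_scan"]

labels, mapping = label_encode(events)
one_hot = one_hot_encode(events)
binary = binary_encode(events)
```

For k distinct categories, label encoding yields one column, one hot encoding yields k columns, and binary encoding yields roughly log2(k) columns, which is one reason the schemes can behave differently on high-cardinality log fields.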
Publication Year: 2019
Publication Date: 2019-04-01
Language: en
Type: article
Indexed In: Crossref
Access and Citation
Cited By Count: 30