Title: WAF‐Based Chinese Character Recognition for Spam Image Filtering
Abstract: Chinese Journal of Electronics, Volume 27, Issue 5, pp. 1050-1055
Siyuan LI (Corresponding Author), Beijing University of Posts and Telecommunication, Beijing, 100876, China; National Institute of Network and Information Security of China, Beijing, 100029, China
Ruiguang LI, National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing, 100029, China
Yuan XU, National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing, 100029, China
Hao ZHOU, National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing, 100029, China
Hanbing YAN, National Institute of Network and Information Security of China, Beijing, 100029, China; National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing, 100029, China
Bin XU, Beijing University of Posts and Telecommunication, Beijing, 100876, China; National Institute of Network and Information Security of China, Beijing, 100029, China
Honggang ZHANG, Beijing University of Posts and Telecommunication, Beijing, 100876, China
First published: 01 September 2018. DOI: https://doi.org/10.1049/cje.2018.06.014

Abstract
We address the problem of filtering image spam, a rapidly spreading kind of spam in which the text is embedded in images to defeat text-based spam filters. In particular, we focus on image spam whose embedded text is Chinese, which is a more challenging task. A popular way to detect image spam is to use an optical character recognition (OCR) system, which detects and recognizes the embedded text, followed by a text classifier that discriminates spam from ham (legitimate messages).
However, spammers have started to obscure image text to prevent OCR systems from recovering the spam text. To compensate for this shortcoming of OCR, we propose a novel method that is essentially a keyword reconstruction algorithm based on the word activation force (WAF) model. The method is effective at discovering keywords, which benefits the later classification stage and notably improves image spam filtering performance. Experimental results on our own data set of spam images (publicly available) validate the effectiveness of the approach, which outperforms the original OCR system in practical use on spam images with complex backgrounds.
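The abstract names the word activation force (WAF) model but gives neither its formula nor the reconstruction procedure. As an illustration only, the Python sketch below assumes the WAF definition from the original WAF work as we understand it: waf(i→j) = (f_ij/f_i)(f_ij/f_j)/d_ij², where f_i is the frequency of unit i, f_ij counts how often i precedes j within a window, and d_ij is their mean distance. Here the statistics are computed over Chinese characters. Everything else is hypothetical and is not the authors' algorithm: the gap-filling step (`reconstruct`), the candidate list, the keyword check (`contains_keyword`), the window size, and the toy corpus.

```python
from collections import defaultdict


def word_activation_forces(sentences, window=5):
    """Estimate WAF for every ordered pair (a -> b) seen within `window`.

    Assumed definition: waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab**2,
    where f_a is the frequency of a, f_ab counts how often a precedes b
    within the window, and d_ab is the mean distance of those occurrences.
    """
    freq = defaultdict(int)
    co = defaultdict(int)
    dist_sum = defaultdict(int)
    for tokens in sentences:
        for k, a in enumerate(tokens):
            freq[a] += 1
            # Look ahead at most `window` positions (WAF is directional).
            for d in range(1, window + 1):
                if k + d >= len(tokens):
                    break
                pair = (a, tokens[k + d])
                co[pair] += 1
                dist_sum[pair] += d
    waf = {}
    for (a, b), f_ab in co.items():
        d_ab = dist_sum[(a, b)] / f_ab
        waf[(a, b)] = (f_ab / freq[a]) * (f_ab / freq[b]) / d_ab ** 2
    return waf


def reconstruct(ocr_tokens, candidates, waf):
    """Hypothetical gap filler: each unreadable position (None) gets the
    candidate with the largest summed WAF to its left and right neighbours."""
    out = list(ocr_tokens)
    for k, tok in enumerate(out):
        if tok is not None:
            continue
        left = out[k - 1] if k > 0 else None
        right = out[k + 1] if k + 1 < len(out) else None
        # Missing pairs (including those with a None neighbour) score 0.
        out[k] = max(
            candidates,
            key=lambda c: waf.get((left, c), 0.0) + waf.get((c, right), 0.0),
        )
    return out


def contains_keyword(tokens, keywords):
    """Toy stand-in for the classification stage: flag known spam keywords."""
    text = "".join(t for t in tokens if t is not None)
    return any(kw in text for kw in keywords)


if __name__ == "__main__":
    # Tiny illustrative corpus of character sequences (not from the paper).
    corpus = [list("代开发票"), list("代开发票优惠"), list("开发软件")]
    waf = word_activation_forces(corpus)
    ocr_output = ["代", "开", None, "票"]  # third character unreadable
    fixed = reconstruct(ocr_output, ["发", "软", "优"], waf)
    print("".join(fixed))                         # 代开发票
    print(contains_keyword(ocr_output, ["发票"]))  # False
    print(contains_keyword(fixed, ["发票"]))       # True
```

In this toy run, the co-occurrence statistics restore the unreadable third character. That makes the illustrative keyword 发票 ("invoice") visible to the downstream keyword check. Greedy left-to-right filling is used only for brevity and may be suboptimal when several adjacent characters are missing.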