Title: Mining features for Web NER model construction based on distant learning
Abstract: In this paper, we study the problem of developing a WIDM NER tool that prepares training corpora from the Web for custom named entity recognition (NER) models via distant learning. We consider two major issues: efficient automatic labelling, and effective feature mining for training accurate NER models with sequence labelling techniques. While the idea of collecting training sentences from search snippets via known entities (seeds) is not new, efficient automatic labelling becomes an issue when there is a large number of seeds (e.g. 500K) and sentences (e.g. 2M). The second issue concerns the mining of interesting terms or k-grams as features for supervised learning. We conduct experiments on four entity types, namely Chinese person names, food names, location names, and points of interest (POI), to demonstrate the improvement in efficiency and effectiveness achieved by the proposed Web NER model construction tool.
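The abstract does not say how automatic labelling is made efficient for hundreds of thousands of seeds. One common approach, shown here only as a minimal sketch and not as the authors' method, is to load all seeds into a character-level trie and tag each sentence by greedy longest match. The cost per sentence then grows with the sentence length times the longest seed length, not with the number of seeds. The BIO tag scheme, the helper names `build_trie` and `label_sentence`, and the `$end` sentinel are hypothetical choices; an Aho-Corasick automaton would be an alternative with similar goals.

```python
from typing import Dict, List

# Hypothetical sentinel key marking "a seed ends here". It is longer than one
# character, so it can never collide with a single-character trie edge.
_END = "$end"


def build_trie(seeds: List[str]) -> Dict:
    """Build a character-level trie from known entity strings (seeds)."""
    root: Dict = {}
    for seed in seeds:
        node = root
        for ch in seed:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def label_sentence(sentence: str, trie: Dict, ent_type: str) -> List[str]:
    """Assign character-level BIO tags via greedy longest match (assumed scheme)."""
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        node = trie
        j = i
        longest = 0  # length of the longest seed that starts at position i
        while j < len(sentence) and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            if _END in node:
                longest = j - i
        if longest:
            tags[i] = "B-" + ent_type
            for k in range(i + 1, i + longest):
                tags[k] = "I-" + ent_type
            i += longest  # skip past the matched entity; matches do not overlap
        else:
            i += 1
    return tags


if __name__ == "__main__":
    trie = build_trie(["王小明", "小明"])
    print(label_sentence("我叫王小明。", trie, "PER"))
```

Running the usage example tags the three characters of 王小明 as `B-PER, I-PER, I-PER` and the rest as `O`; the longer seed wins over the embedded seed 小明 because the scan keeps the longest match found at each start position.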
Publication Year: 2017
Publication Date: 2017-12-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 1