Title: Committee-Based Sampling For Training Probabilistic Classifiers
Abstract: In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper proposes a general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples. The method, committee-based sampling, evaluates the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned on the training set selected so far (Monte-Carlo sampling). The method is particularly attractive because it evaluates the expected information gain from a training example implicitly, making the method both easy to implement and generally applicable. We further show how to apply committee-based sampling for training Hidden Markov Model classifiers, which are commonly used for complex classification tasks. The method was implemented and tested for the task of tagging words in natural language sentences with parts-of-speech. Experimental evaluation of committee-based sampling versus standard sequential training showed a substantial improvement in training efficiency.
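To make the abstract's selection loop concrete, the sketch below illustrates committee-based sampling for a simple multinomial Naive Bayes classifier: committee members are drawn by sampling model parameters from Dirichlet posteriors conditioned on the labeled data seen so far, and an incoming example is queried for a label only if the committee's votes disagree sufficiently (measured here by vote entropy). All function names, the choice of classifier, and the threshold are illustrative assumptions, not taken from the paper, which applies the idea to Hidden Markov Model taggers.

```python
# Minimal sketch of committee-based (selective) sampling, assuming a
# multinomial Naive Bayes classifier. Names and thresholds are hypothetical.
import math
import random
from collections import defaultdict

def sample_committee_member(labeled, n_classes, n_features, alpha=1.0):
    """Draw one model variant: class priors and per-class feature distributions
    sampled from Dirichlet posteriors conditioned on the current training set."""
    class_counts = [alpha] * n_classes
    feat_counts = [[alpha] * n_features for _ in range(n_classes)]
    for x, y in labeled:                 # x is a list of feature indices, y a class index
        class_counts[y] += 1
        for f in x:
            feat_counts[y][f] += 1
    # Sample a Dirichlet vector by normalizing independent Gamma draws.
    priors = [random.gammavariate(c, 1.0) for c in class_counts]
    s = sum(priors)
    priors = [p / s for p in priors]
    likelihoods = []
    for y in range(n_classes):
        g = [random.gammavariate(c, 1.0) for c in feat_counts[y]]
        t = sum(g)
        likelihoods.append([v / t for v in g])
    return priors, likelihoods

def classify(model, x, n_classes):
    """Return the most probable class for example x under one sampled model."""
    priors, likelihoods = model
    scores = []
    for y in range(n_classes):
        score = math.log(priors[y])
        for f in x:
            score += math.log(likelihoods[y][f])
        scores.append(score)
    return max(range(n_classes), key=lambda y: scores[y])

def vote_entropy(votes):
    """Disagreement measure: entropy of the committee's vote distribution."""
    counts = defaultdict(int)
    for v in votes:
        counts[v] += 1
    k = len(votes)
    return -sum((c / k) * math.log(c / k) for c in counts.values())

def select_for_labeling(x, labeled, n_classes, n_features,
                        committee_size=5, threshold=0.5):
    """Query a label for x only if committee disagreement is high enough."""
    committee = [sample_committee_member(labeled, n_classes, n_features)
                 for _ in range(committee_size)]
    votes = [classify(m, x, n_classes) for m in committee]
    return vote_entropy(votes) >= threshold
```

In a streaming setting, `select_for_labeling` would be called on each unlabeled example in turn; examples on which the sampled model variants agree are skipped, so labeling effort concentrates on the examples expected to carry the most information.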
Publication Year: 1995
Publication Date: 1995-01-01
Language: en
Type: book-chapter
Indexed In: Crossref
Access and Citation
Cited By Count: 488