Abstract: In many machine learning tasks, unlabeled data abounds, but expert-generated labels are scarce. Consider the task of learning a classifier for the Sloan Digital Sky Survey (http://www.sdss.org/) so that each astronomical observation may be assigned its class (e.g. “pinwheel galaxy”, “globular galaxy”, “quasar”, “colliding galaxies”, “nebula”, etc.). The SDSS contains 230 million astronomical objects, of which professional astronomers have manually classified less than one tenth of 1 percent. Or consider classifying web pages into subject-matter-based taxonomies, such as the Yahoo taxonomy or the Dewey library catalog system. Whereas there are many billions of web pages, less than 0.001% have reliable topic or subject categories.
Publication Year: 2010
Publication Date: 2010-01-01
Language: en
Type: book-chapter
Indexed In: ['crossref']
Access and Citation
Cited By Count: 22