Title: Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process
Abstract: We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the topics). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results show that sparseTMs give better predictive performance with simpler inferred models.
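To make the selector-variable idea concrete, here is a minimal generative sketch of a single sparse topic. It assumes a finite vocabulary, a fixed per-term selection probability `pi`, and a symmetric Dirichlet smoothing parameter `gamma`. These names, the fixed (rather than inferred) `pi`, and the guard against an empty topic are illustrative assumptions. The hierarchical Dirichlet process layer and the paper's Gibbs sampler are not shown. The point is only that sparsity (which terms are selected) and smoothness (the Dirichlet spread over the selected terms) are controlled by separate parameters.

```python
import random


def sample_sparse_topic(vocab_size, pi, gamma, rng):
    """Draw one sparse topic (hypothetical helper, not from the paper).

    Each term v gets a Bernoulli(pi) selector b_v. The topic is a
    symmetric Dirichlet(gamma) draw restricted to the selected terms,
    so unselected terms get exactly zero probability.
    """
    # Sparsity: Bernoulli selectors decide which terms the topic may use.
    selectors = [1 if rng.random() < pi else 0 for _ in range(vocab_size)]
    if not any(selectors):
        # Sketch-specific guard (an assumption, not from the source):
        # keep at least one term so the topic is a valid distribution.
        selectors[rng.randrange(vocab_size)] = 1
    # Smoothness: Dirichlet draw via normalized Gamma(gamma, 1) variates,
    # computed only on the selected subset of the vocabulary.
    weights = [rng.gammavariate(gamma, 1.0) if b else 0.0 for b in selectors]
    total = sum(weights)
    topic = [w / total for w in weights]
    return selectors, topic


if __name__ == "__main__":
    rng = random.Random(0)
    selectors, topic = sample_sparse_topic(vocab_size=20, pi=0.3, gamma=0.5, rng=rng)
    print(sum(selectors), "of 20 terms selected")
    print([round(p, 3) for p in topic])
```

Lowering `pi` shrinks the topic's support without changing how evenly mass is spread over the surviving terms. Changing `gamma` does the reverse, which mirrors the decoupling the abstract describes.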
Publication Year: 2009
Publication Date: 2009-12-07
Language: en
Type: article
Access and Citation
Cited By Count: 116