Title: Unsupervised learning of finite mixture models with deterministic annealing for large-scale data analysis
Abstract: The finite mixture model is one of the most fundamental tools in data mining and machine learning for uncovering the essential structure of observed random sample data. It builds a probabilistic model in which the sample data are described by a probability distribution expressed as a mixture of other distributions, called latent components. The finite mixture model thus provides a convenient way to explain the random phenomena behind observed sample data as a generative process of finite mixtures.
The main challenges in the finite mixture model are (i) to search for an optimal set of model parameters in a large problem space and (ii) to find a model that generalizes well and avoids overfitting. The standard method for fitting a finite mixture model is the Expectation-Maximization (EM) algorithm. However, an EM-based algorithm finds only locally optimal solutions, so the quality of the answer depends heavily on the initial condition (the local optimum problem). Moreover, it can lead to overfitting. We address these problems by using Deterministic Annealing (DA), an optimization heuristic that has proven successful at avoiding local optima and has been widely used in many data mining algorithms. More specifically, in this thesis we focus on two well-known data mining algorithms based on the finite mixture model: Generative Topographic Mapping (GTM) for dimension reduction and data visualization, and Probabilistic Latent Semantic Analysis (PLSA) for text mining and information retrieval. These two algorithms have been widely used in data visualization and text mining but still suffer from the local optimum problem because their original formulations rely on the EM algorithm. We extend both algorithms with DA to improve the quality of their parameter estimates and to help avoid overfitting. We present a range of experimental results that demonstrate the improvements.
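To illustrate the idea behind replacing plain EM with DA, here is a minimal sketch, not the thesis's GTM or PLSA implementation. It fits only the component means of a one-dimensional Gaussian mixture, and it assumes equal mixing weights and a known shared variance. The function names, the inverse-temperature schedule `betas`, and the small nudge applied to the means are all hypothetical choices made for this illustration. The E-step raises each component's likelihood to the power `beta` before normalizing, and `beta` is increased toward 1, where the update coincides with the ordinary EM update.

```python
import math


def tempered_responsibilities(x, means, variance, beta):
    """E-step at inverse temperature beta (equal weights, shared variance).

    Each responsibility is proportional to N(x | mean_k, variance) ** beta.
    """
    logits = [-beta * (x - m) ** 2 / (2.0 * variance) for m in means]
    top = max(logits)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def da_em_means(data, k, variance=1.0,
                betas=(0.01, 0.03, 0.1, 0.3, 1.0), iters=50):
    """Anneal beta upward, running EM mean updates at each temperature.

    The schedule and iteration count are illustrative assumptions.
    """
    center = sum(data) / len(data)
    means = [center] * k  # every component starts at the data mean
    for beta in betas:
        # Nudge the means apart (deterministically) so that they can
        # separate once beta is large enough for a split to be stable.
        means = [m + 1e-3 * (j - (k - 1) / 2.0) for j, m in enumerate(means)]
        for _ in range(iters):
            resp = [tempered_responsibilities(x, means, variance, beta)
                    for x in data]
            # M-step: responsibility-weighted average of the data.
            means = [
                sum(r[j] * x for r, x in zip(resp, data))
                / sum(r[j] for r in resp)
                for j in range(k)
            ]
    return means


if __name__ == "__main__":
    sample = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
    print(da_em_means(sample, k=2))
```

At small `beta` the responsibilities are nearly uniform, so the result does not depend on a random starting point. The components separate only as `beta` grows, which is the mechanism DA uses to reduce sensitivity to initialization.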
Publication Year: 2012
Publication Date: 2012-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 1