Title: Innovative algorithms and evaluation methods for biological motif finding
Abstract: Biological motifs are short patterns that recur in biological systems more often than expected. Sequence motifs, structural motifs, and network motifs are examples. Because searching for motifs is computationally expensive, many biological motif finding algorithms have focused on the efficiency of discovery. However, there is no comprehensive benchmark for validating the biological significance of the "candidate motifs" that are discovered computationally from sequential or structural similarity. Some sequence motifs have been verified through their structural similarities or their functional roles in DNA or protein sequences and are stored in databases. The biological role of network motifs, in contrast, has not yet been validated, and no databases of them exist.
In this dissertation, we place more emphasis on the biological meaning of motifs. We provide an efficient way to incorporate biological information into clustering analysis methods. For example, a sparse nonnegative matrix factorization (SNMF) method is combined with Chou-Fasman parameters for protein motif finding, and biological network motifs are searched with various clustering algorithms using Gene Ontology (GO) information. In addition, the proposed algorithms can serve as alternatives to existing approximation algorithms and parallel search algorithms. Experimental results show that they perform better than existing algorithms, producing a larger number of high-quality biological motifs.
Additionally, biological network motifs are applied to the prediction of essential proteins in two ways. First, we design a more robust and biologically meaningful centrality algorithm, named MCGO, to rank proteins in a protein-protein interaction (PPI) network, and show that MCGO achieves the highest detection rate among the centrality algorithms compared. Second, MCGO is combined with other centrality algorithms, and their scores are used as features for a machine learning algorithm that predicts essential proteins in a network.
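As a rough illustration of ranking proteins by centrality and collecting several centrality scores as per-protein features, the following minimal Python sketch may help. It is not MCGO: the abstract does not specify that algorithm, so the sketch substitutes two simple stand-ins, degree centrality and a count of the triangles (a 3-node motif) each protein takes part in. The protein names, the toy edge list, and all function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Unnormalized degree: the number of interaction partners of each protein."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def triangle_centrality(adj):
    """Number of triangles (a simple 3-node motif) each protein belongs to.

    This is only a stand-in for a motif-based centrality; it is NOT MCGO.
    """
    counts = {node: 0 for node in adj}
    for node, neighbors in adj.items():
        # Every pair of neighbors that is itself connected closes a triangle.
        for u, v in combinations(sorted(neighbors), 2):
            if v in adj[u]:
                counts[node] += 1
    return counts


def feature_table(adj):
    """Collect both scores per protein as a (degree, triangles) feature tuple."""
    deg = degree_centrality(adj)
    tri = triangle_centrality(adj)
    return {node: (deg[node], tri[node]) for node in adj}


# Hypothetical toy PPI network, invented for illustration only.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
adj = build_adjacency(edges)
features = feature_table(adj)

# Rank by triangle count, then degree, then name (for a deterministic order).
ranking = sorted(adj, key=lambda n: (-features[n][1], -features[n][0], n))
print(ranking)
```

In a setting like the one described, feature tuples of this kind would be paired with known essential/non-essential labels and passed to a classifier; the choice of classifier and of the actual centrality measures is left open here.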
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated with the analysis; 2) more attention is given to the biological meaning of motifs, both by adding biological knowledge to the algorithms and by suggesting biologically relevant evaluation methods; 3) biological network motifs are successfully applied to a practical problem, the prediction of essential proteins.
INDEX WORDS: Biological network motif, Clustering analysis, Gene ontology, Essential protein, Machine learning.
Publication Year: 2012
Publication Date: 2012-01-01
Language: en
Type: article