Title: DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation
Abstract: As of July 2017, UniProtKB had collected more than 88 million protein sequences. Fewer than 0.2% of these proteins, however, have experimental GO annotations. To reduce this huge gap, automatic protein function prediction (AFP) is becoming increasingly important. Results on the CAFA (Critical Assessment of protein Function Annotation algorithms) benchmark demonstrate that sequence homology-based methods are highly competitive in AFP. One imperative issue is incorporating information sources other than sequence into AFP. In contrast to the BOW (bag of words) representation used in traditional text-based AFP, we propose a new method called DeepText2GO that improves large-scale AFP by using deep semantic text representation instead. Furthermore, DeepText2GO integrates both text-based and sequence homology-based methods through a consensus approach. Extensive experiments on a benchmark dataset extracted from UniProt/SwissProt demonstrate that DeepText2GO significantly outperforms both text-based and sequence homology-based methods, validating its superiority.
Publication Year: 2017
Publication Date: 2017-11-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 7