Title: Probabilistic computational methods for structural alignment of RNA sequences
Abstract: In this thesis, the problem of structural alignment
of homologous RNA sequences is addressed.
The structural alignment
of a given set of RNA sequences is a secondary structure
for each
sequence, such that the structures are similar to each other, and a
sequence alignment
between the sequences that is conforming with
the secondary structures. A solution
to this problem was proposed
by Sankoff as a dynamic programming algorithm whose time
and
memory complexities are polynomial in the length of shortest
sequence and exponential
in the number of input sequences,
respectively. Variants of Sankoff’s method employ
constraints that
reduce the computation by restricting the allowed alignments or
structures.
In the first part of the thesis, a new methodology is
presented for the purpose of
establishing alignment constraints
based on nucleotide alignment and insertion posterior
probabilities. Using a hidden Markov model, posterior probabilities
of alignment and insertion
are computed and these probabilities
are additively combined to obtain probabilities
of co-incidence.
The constraints on alignments are computed by adaptively
thresholding
these probabilities to determine co-incidence
constraints for pruning of computations that
hold with high
probability. The proposed constraints are implemented into
Dynalign, a free
energy minimization algorithm for structural
alignment. Compared with prior non-adaptive
approaches, the
probabilistic constraints offer a significant reduction in
computation time
along with a marginal increase in base pair
prediction accuracy.
Next, a novel algorithm for structural
alignment of two RNA sequences is presented.
The similarity of the
structures is imposed by matched helical regions in the structural
alignnment.
A matched helical region represents a conserved helix
in each structure. Compared
to the structural alignment models of
previous methods, the matched helical regions extend
the
possibilities for alignment of paired nucleotides, which enables
the new structural
alignment space to better accomodate the
structural variability within a sequence family. A probability
distribution over the space of structural alignments is proposed
based on
pseudo-free energy changes, that account for both
stability of structures and plausibility of
the sequence
alignment. Three different problems are addressed for structural
alignment
prediction as inferences from the distribution: 1.
Estimation of the maximum a posteriori
(MAP) structural alignment,
i.e., the structural alignment with highest probability in
the
space, 2. Computation of base pairing probabilities of nucleotides
in each sequence,
3. Sampling the structural alignment space for
analysis of modes of the distribution over
structural alignments.
Finally, the problem of structure prediction for an arbitrary
number of homologous
sequences is addressed. To circumvent the
computational complexity, the prediction is
broken down into an
iterative computation of base pairing probabilities for each
sequence
using extrinsic…
Publication Year: 2010
Publication Date: 2010-01-01
Language: en
Type: article
Access and Citation
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot