Programming Research Group
Research Report RR-02-01
Improving the sensitivity of multiple-sequence alignments by incorporating prior knowledge
Sumedha Gunewardena and Peter Jeavons
January 2002, 20pp.
Abstract
In this paper, we present efficient modifications to the well-established progressive alignment
algorithm for biological sequences. These modifications are designed to allow the user to
incorporate prior knowledge about the sequences and so greatly improve the sensitivity
of the resulting alignments. The first modification increases the probability that
certain biologically distinguishable structures are preserved during the alignment
process. The second modification increases the probability that specified sequence
segments will align with each other.
We have implemented both of these modifications in an interactive multiple-sequence
alignment tool (IMSA). IMSA takes a two-stage approach to the alignment process. The
initial, or pre-processing, stage takes as input sets of sequence segments defined on DNA,
RNA or protein sequences. These sets of segments represent biologically distinguishable
features, which could be derived from known homologies or from known structural or
functional elements. The sequences to be aligned are efficiently annotated based on this
additional information. In the second stage, the program computes an alignment which is
adjusted to take account of this annotation.
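The idea of increasing the probability that specified segments align with each other can be illustrated with a minimal sketch. The code below is not the IMSA implementation and is not taken from the report. It assumes a plain pairwise Needleman-Wunsch global alignment rather than the full progressive multiple-sequence algorithm, and it assumes that prior knowledge is supplied as a set of index pairs which receive a score bonus. The function name align_with_anchors, the anchors parameter and all scoring values are hypothetical choices made for illustration.

```python
def align_with_anchors(a, b, anchors=frozenset(),
                       match=1, mismatch=-1, gap=-2, bonus=20):
    """Global pairwise alignment with a score bonus for anchored pairs.

    anchors is a set of 0-based (i, j) pairs meaning "position i of a is
    expected to align with position j of b". This is an assumed way of
    encoding prior knowledge, not the annotation scheme used by IMSA.
    """
    def pair_score(i, j):
        # Substitution score for a[i] against b[j], plus the anchor bonus.
        s = match if a[i] == b[j] else mismatch
        return s + bonus if (i, j) in anchors else s

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),  # pair a[i-1], b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1)):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Without prior knowledge the two sequences align without gaps.
plain = align_with_anchors("AAAT", "TAAA")
# Anchoring the T of the first sequence to the T of the second forces
# those two positions into the same column.
anchored = align_with_anchors("AAAT", "TAAA", anchors={(3, 0)})
```

In this sketch a large bonus makes the anchored pair dominate the score, so the alignment is rearranged around it. A smaller bonus would only bias the result, which is closer in spirit to increasing the probability that the segments align than to forcing them to do so.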
This paper is available as a 416,668-byte gzipped PostScript file.