This page contains links to software packages for various bioinformatics tasks written by members of the Computational Biology group and students.
Contents
- IMSA - Interactive Multiple Sequence Alignment
- PromView - DNA sequence viewer and pattern finder
- ConsensusMaker - Simple alignment tool to build consensus sequences
- TScan - A new approach to finding binding sites in DNA
- NOSA - Finding Near-Optimal Sequence Alignments
IMSA
"Improving the Sensitivity of Multiple-Sequence Alignments by Incorporating
Prior Knowledge",
Sumedha Gunewardena and Peter Jeavons,
Oxford University Computing Laboratory, Technical Report PRG-RR-02-01,
January 2002.
Supplementary information about this article is available on a separate
web page.
IMSA is a multiple sequence alignment tool that allows users to
input, as prior knowledge, sets of sequences they know to be
homologous or sequences they know to have particular structural or
functional properties. The program annotates the input sequences based
on this knowledge, and the annotations are then used to guide the
alignment of the sequences. The program tries to capture two
biologically reasonable conjectures that can vastly improve the
sensitivity of the alignments. The first is the need to preserve
certain biologically distinguishable structures during the alignment
process. The second is the need to align residues of certain
distinguishable segments of sequence with each other with higher
probability than the substitution matrix alone would specify.
The multiple sequence alignment algorithm used in IMSA is modified
from a standard iterative pair-wise alignment algorithm. We use what
we call 'sequence tags' to tag the input sequence. This is an efficient
and robust method to tag biological sequences that was developed for
this application.
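The second conjecture above, that residues from matching tagged segments should be favoured beyond what the substitution matrix gives, can be illustrated with a minimal sketch. This is not IMSA's modified alignment equation, which is defined in the report cited above. The function name, the tag labels, the toy substitution table and the fixed bonus are all assumptions made for illustration.

```python
# Hypothetical illustration only: IMSA's actual equations are in the cited report.
def tagged_score(a, b, tag_a, tag_b, subst, bonus=2):
    """Score residues a and b, adding a bonus when both carry the same tag.

    subst is a dict keyed by residue pairs; tag_a and tag_b are the
    (assumed) tag labels of the two residues, or None if untagged.
    """
    # Look the pair up in either order; fall back to -1 for unknown pairs.
    base = subst.get((a, b), subst.get((b, a), -1))
    if tag_a is not None and tag_a == tag_b:
        return base + bonus
    return base


if __name__ == "__main__":
    toy_subst = {("A", "A"): 4, ("A", "G"): 0, ("G", "G"): 6}
    print(tagged_score("A", "G", "helix", "helix", toy_subst))  # 2
    print(tagged_score("A", "G", "helix", None, toy_subst))     # 0
```

A score of this kind could replace the plain substitution lookup inside any pair-wise dynamic programming step, which is one way prior knowledge might be fed into an otherwise standard alignment.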
IMSA is written in ANSI C. You are free to incorporate the modified
alignment equations used in IMSA and the implementation of sequence
tags in third-party code provided you cite the above reference. You
may download IMSA version 1.00 from here.
Online documentation for IMSA is available here.
To help us keep track of how many people use IMSA, we would greatly appreciate
hearing from you
if you make any use of the program. Any comments, suggestions, or bug reports
may also be mailed to the author.
PromView
by Peter Jeavons
PromView is a viewer for DNA sequences that allows the user to:
- manually adjust the alignment of the sequences
- highlight selected features
- highlight all occurrences of selected patterns
- modify the patterns to obtain the maximum number of hits
- identify and highlight tandem repeats and palindromes
- zoom in and out of the displayed sequences
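Two of the features listed above, highlighting every occurrence of a pattern and identifying palindromes, can be sketched as follows. This is not PromView's code. The function names are hypothetical, patterns are assumed to use IUPAC nucleotide codes, and a palindrome is taken here to mean a segment equal to its own reverse complement.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(seq, pattern):
    """Return 0-based start positions of all matches, including overlapping ones."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    # A lookahead matches without consuming, so overlapping hits are reported.
    regex = re.compile("(?=" + body + ")")
    return [m.start() for m in regex.finditer(seq.upper())]


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_palindromes(seq, length):
    """Return start positions of segments equal to their own reverse complement."""
    seq = seq.upper()
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]


if __name__ == "__main__":
    print(find_pattern("TATATA", "TATA"))       # [0, 2]
    print(find_palindromes("AAGAATTCAA", 6))    # [2]
```

Returning start positions rather than matched text keeps the result easy to map back onto display coordinates, which is what a highlighting viewer would need.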
PromView is written in Java. The class files are available for download
here. To run PromView you need to have installed a copy of the Java 2
Runtime Environment (version 1.3 or later); this can be downloaded here.
Once you have the Java run-time environment installed, you can run
PromView with the command "java -jar PromView.jar".
This program is still under development. To help us keep track of how
many people use PromView, we would greatly appreciate hearing from you if you make
any use of the program. Any comments, suggestions, or bug reports may also
be mailed to the author.
ConsensusMaker
by Peter Jeavons
ConsensusMaker
is a simple alignment tool for sequences that allows the
user to build up a consensus sequence from a collection of input sequences.
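As a rough illustration of what building a consensus involves, the sketch below takes sequences that have already been aligned to the same length and picks the most frequent residue in each column. It is not ConsensusMaker's method, which is interactive. The function name, the gap character and the alphabetical tie-break are assumptions.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Column-wise majority consensus of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        # Gaps do not vote; a column of only gaps stays a gap.
        counts = Counter(c for c in column if c != gap)
        if not counts:
            result.append(gap)
            continue
        best = max(counts.values())
        # Ties are broken alphabetically so the output is deterministic.
        result.append(min(c for c, n in counts.items() if n == best))
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACGT-", "ACCTA", "AGGTA"]))  # ACGTA
```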
ConsensusMaker is written in Java and will run as an applet
in a Java-enabled browser. Alternatively, the class files are available for
download here.
To run ConsensusMaker you need to have installed a copy of the Java 2
Runtime Environment (version 1.3 or later); this can be downloaded
here. Once you have the Java run-time environment installed, you can run
ConsensusMaker with the command "java -jar ConsensusMaker.jar".
To help us keep track of how
many people use ConsensusMaker, we would greatly appreciate hearing from you if you make
any use of the program. Any comments, suggestions, or bug reports may also
be mailed to the author.
TScan
"Finding Transcription Factor Binding Sites in DNA Sequences: A
Template Based Approach",
Sumedha Gunewardena and Peter Jeavons,
Oxford University Computing Laboratory, Research Report
PRG-RR-03-21,
August 2003.
A problem faced by many algorithms for finding transcription factor binding
sites is the high number of false positive hits that accompanies any increase
in the sensitivity of their predictions. A main contributing factor is the
short and degenerate nature of these sites, which results in a low
signal-to-noise ratio. To counter this problem one needs to look beyond the
assumption that the bases of a site are independent of one another.
TScan is a software package written for discriminating motif patterns
in genomic sequences. It was primarily written to identify transcription factor
binding sites in DNA sequences. TScan is based on templates designed to capture,
for discrimination, not only the vertical consensus but also the correlations
between each base and the other bases of the site.
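One generic way to see whether two positions of a binding site depart from base independence is to measure the mutual information between the corresponding columns of a set of aligned sites. The sketch below does this. It is not TScan's template model, which is described in the report cited above, and the function name and input format (a list of equal-length site strings) are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(sites, i, j):
    """Mutual information, in bits, between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)           # base counts at position i
    col_j = Counter(s[j] for s in sites)           # base counts at position j
    joint = Counter((s[i], s[j]) for s in sites)   # counts of base pairs
    return sum(
        (c / n) * log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    # Positions that always vary together carry 1 bit of shared information.
    print(mutual_information(["AT", "AT", "CG", "CG"], 0, 1))
    # Positions that vary independently carry none.
    print(mutual_information(["AT", "AG", "CT", "CG"], 0, 1))
```

A position weight matrix scores each column separately and so cannot use this shared information, which is the limitation the paragraph above refers to.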
A prototype version of TScan has been written in Matlab and can be downloaded
from here. Online documentation for TScan is available
here.
This version of TScan has been written only to evaluate the performance
of our template models.
To help us keep track of how many people use TScan, we would greatly appreciate
hearing from you
if you make any use of the program. Any comments, suggestions, or bug reports
may also be mailed to the author.
NOSA
by Francis Tsang
An optimal sequence alignment is not necessarily the biologically "correct"
sequence alignment. In particular, when two sequences are evolutionarily distant,
their optimal sequence alignment(s) may fail to identify the essential biological
phenomena that should be captured. On the other hand, a set of sequence alignments
whose scores are close to the optimum may reveal useful information that
is missing in the optimal one(s). This leads to the development of algorithms
that produce optimal and near-optimal sequence alignments. As the number
of near-optimal sequence alignments grows exponentially, it is impractical
to enumerate all of them. Near-Optimal Sequence Aligner (NOSA) is a program
written in Java that produces optimal and near-optimal sequence alignments
and shows all of them in the same graph. The algorithm used in NOSA is
modified from the one proposed by Naor and Brutlag [Naor & Brutlag 1994
(J. Comp. Bio., 1:349-366)].
We introduce three new ideas. Firstly, we quantify the significance of every
residue pair in all optimal and near-optimal sequence alignments. Secondly,
NOSA allows a user to specify how some part of one sequence must align
against some part of the other sequence. Any such pre-aligned regions are kept
intact during the alignment process. Finally, as the algorithm proposed by
Naor and Brutlag was based on a simple scoring scheme, we extend the algorithm
to cover the affine gap weight model, in which the score of a gap is computed as
an affine function of the gap length.
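The idea of identifying residue pairs that lie on some near-optimal alignment, without enumerating the alignments themselves, can be sketched with a forward and a backward dynamic programming pass. The sketch uses a simple scoring scheme with a constant per-residue gap penalty, not the affine gap extension described above, and it is not NOSA's implementation. The function names, the default scores and the tolerance parameter delta are assumptions.

```python
def global_dp(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch matrix: F[i][j] is the best score of a[:i] against b[:j]."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F


def near_optimal_pairs(a, b, delta=0, match=1, mismatch=-1, gap=-1):
    """Return (optimum, pairs): pairs maps each 1-based residue pair (i, j)
    to the best score of any alignment that aligns a[i-1] with b[j-1],
    keeping only pairs whose score is within delta of the optimum."""
    n, m = len(a), len(b)
    F = global_dp(a, b, match, mismatch, gap)
    # Suffix scores come from aligning the reversed sequences:
    # R[n - i][m - j] is the best score of a[i:] against b[j:].
    R = global_dp(a[::-1], b[::-1], match, mismatch, gap)
    optimum = F[n][m]
    pairs = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            through = F[i - 1][j - 1] + s + R[n - i][m - j]
            if through >= optimum - delta:
                pairs[(i, j)] = through
    return optimum, pairs


if __name__ == "__main__":
    best, pairs = near_optimal_pairs("ACGT", "ACGT")
    print(best, sorted(pairs))  # 4 [(1, 1), (2, 2), (3, 3), (4, 4)]
```

The score stored for each pair gives one crude measure of how close to optimal an alignment through that pair can be. Raising delta admits more pairs, and both passes take time proportional to the product of the sequence lengths.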
The Java source code for NOSA can be downloaded from here. A set of example data files
can be downloaded from here.
To help us keep track of how many people use NOSA, we would greatly appreciate
hearing from you if you make any
use of the program. Any comments, suggestions, or bug reports may also be
mailed to the author.