TScan V1.00 (Prototype version)
(Online Documentation)
Sumedha Gunewardena,
Oxford University Computing Laboratory,
Wolfson Building, Parks Road,
Oxford, OX1 3QD , UK.
Sumedha.Gunewardena@comlab.ox.ac.uk
Introduction
A problem faced by many algorithms for finding transcription factor binding sites is the high number of false positive hits that accompanies any increase in prediction sensitivity. A main contributing factor is the short and degenerate nature of these sites, which results in a low signal-to-noise ratio. To counter this problem, one needs to look beyond the base independence assumption. TScan is a template-based program designed to capture not only the vertical consensus but also the correlation of each base with the other bases of the site.
This version of TScan has been written only to evaluate the performance of our template models.
Downloading and Installing TScan
This prototype version of TScan is written in Matlab. It consists of the files 'web_main.m', 'web_transfacfilter.m', 'web_mserror.m', 'web_lda.m', the Matlab mex file 'stpara.c', its compiled version 'stpara.dll', and the example files 'train.txt', 'classify.txt' and 'test.txt'. These files can be downloaded from here. Download these files to your Matlab working directory. The file 'stpara.dll' is a version of 'stpara.c' compiled under Windows 98; it will not work on other platforms. If your platform is not Windows 98, you need to compile 'stpara.c' yourself by running the following Matlab command: 'mex -g stpara.c'.
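The compile step can be guarded so that it only runs when no usable mex file is found. This is a minimal sketch, assuming the commands are run from the Matlab working directory that holds 'stpara.c'; the check itself is not part of TScan.

```matlab
% exist(...,'file') returns 3 when a compiled mex file named 'stpara'
% is available for the current platform.
if exist('stpara', 'file') ~= 3
    % Same command as given in the documentation.
    mex -g stpara.c
end
```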
Open the 'web_main.m' file in your Matlab editor. Go to the top of this file. You need to enter the path and file names of the train, classify and test files at the place indicated. You also need to enter the number of binding sites in the 'classify' file and 'test' file in the variables 'Clp' and 'Tep' as indicated.
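As an illustration, the edited section at the top of 'web_main.m' might look like the sketch below. Only 'Clp' and 'Tep' are names taken from this documentation. The three file variables are hypothetical names chosen for the example, and the counts are placeholders, so use the names already present in your copy of the file and the real counts for your data.

```matlab
% Hypothetical variable names for the three input files;
% keep whatever names web_main.m actually uses at the place indicated.
trainFile    = 'train.txt';     % binding sites used to model the template
classifyFile = 'classify.txt';  % binding sites followed by non-binding sites
testFile     = 'test.txt';      % binding sites followed by non-binding sites

% Placeholder counts: replace with the real numbers for your files.
Clp = 50;   % number of binding sites in the classify file
Tep = 50;   % number of binding sites in the test file
```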
TScan Input File Format
There are three input files to TScan. The first file contains a set of binding sites used to model the template. This file does not contain any non-binding sites. The second file contains a set of binding sites and a set of non-binding sites. The binding sites should come first in this file, followed by the non-binding sites. This file is used to train the classifier. The third file also contains a set of binding sites and a set of non-binding sites. For evaluation purposes (to build statistics of the performance of TScan), the binding sites should again come first in this file, followed by the non-binding sites. The sequences in all three files should be of the same length.
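The equal-length requirement can be checked before running TScan. The helper below is a sketch and not part of the TScan distribution: the function name 'check_equal_length' is hypothetical, and it assumes each file holds one sequence per line, which this documentation does not state. Save it as 'check_equal_length.m' in the working directory.

```matlab
function len = check_equal_length(filename)
% Returns the common sequence length of a file, or raises an error.
% Assumption: one sequence per line; blank lines are ignored.
fid = fopen(filename, 'r');
if fid == -1
    error('Cannot open file %s.', filename);
end
seqs = {};
line = fgetl(fid);
while ischar(line)              % fgetl returns -1 at end of file
    line = strtrim(line);
    if ~isempty(line)
        seqs{end+1} = line;     %#ok<AGROW>
    end
    line = fgetl(fid);
end
fclose(fid);
if isempty(seqs)
    error('No sequences found in %s.', filename);
end
lens = cellfun(@length, seqs);
if any(lens ~= lens(1))
    error('Sequences in %s differ in length.', filename);
end
len = lens(1);
end
```

Calling the helper on 'train.txt', 'classify.txt' and 'test.txt' and comparing the three returned lengths confirms that all three files agree.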
Running TScan
Once you have downloaded all the files to your Matlab working directory, compiled the 'stpara.c' file, and made the appropriate changes to the 'web_main.m' file, you can run TScan by typing 'web_main' at the Matlab command prompt.
Please send questions, comments, and bug reports to the author.