GenomeThreader Gene Prediction Software
GenomeThreader is a software tool to compute gene structure predictions.
The gene structure predictions are calculated using a similarity-based approach
where additional cDNA/EST and/or protein sequences are used to predict gene
structures via spliced alignments.
GenomeThreader was motivated by disabling limitations in
GeneSeqer, a popular gene prediction program which is widely used
for plant genome annotation.
Features
-
Intron Cutout Technique:
The intron cutout technique overcomes the time and space
limitations of the dynamic programming (DP) algorithms used in
GeneSeqer, particularly when they are applied to organisms with long introns.
-
Bayesian Splice Site Models (BSSMs):
BSSMs assign probabilities to GT donor, GC donor, and AG acceptor sites.
The DP uses these probabilities to place exon/intron boundaries
precisely.
-
Combination of cDNA/EST Based Spliced Alignments with Protein Based Spliced
Alignments:
After computing spliced alignments of the supplied cDNAs/ESTs and protein
sequences against the genomic template, GenomeThreader computes consensus
spliced alignments. A consensus spliced alignment combines several spliced
alignments to resolve the complete gene structure and to uncover alternative
splicing.
-
Incremental Updates:
When the cDNA/EST or protein database in use is updated, the common approach
is to redo the complete mapping. With GenomeThreader, you can combine
newly computed spliced alignments with precomputed spliced alignments to
quickly recompute consensus spliced alignments.
-
XML:
GenomeThreader's additional XML output conforms to our gthXML
standard (GenomeThreader.rng.txt). The included script XML2GFF.py
converts gthXML output to the GFF format.
A variety of gthXML-specific tools can be found
here.
-
gthDB:
We also provide
a schema and load script for gthDB, which permits storage
and query of GenomeThreader output in a relational format.
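The idea behind the intron cutout technique can be illustrated with a small sketch. This is not GenomeThreader's implementation; it assumes that seed matches between a cDNA/EST and the genomic template are already known, and the names keep_regions, cutouts, and margin, as well as the coordinates, are hypothetical. The sketch pads each match, merges overlapping regions, and treats the gaps between the merged regions as presumed intron interiors that the DP does not need to process.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the genomic template


def keep_regions(matches: List[Interval], genome_len: int, margin: int) -> List[Interval]:
    """Pad each seed match by `margin` and merge overlapping intervals."""
    padded = sorted(
        (max(0, start - margin), min(genome_len, end + margin))
        for start, end in matches
    )
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def cutouts(kept: List[Interval]) -> List[Interval]:
    """Gaps between consecutive kept regions (presumed intron interiors)."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(kept, kept[1:])]


# Hypothetical seed matches on a 130,000 bp genomic template.
seeds = [(1000, 1200), (1150, 1400), (126500, 126800)]
kept = keep_regions(seeds, genome_len=130_000, margin=50)
removed = cutouts(kept)
dp_columns = sum(end - start for start, end in kept)

print(kept)        # regions handed to the DP
print(removed)     # regions cut out before the DP
print(dp_columns)  # genomic positions the DP still has to cover
```

In this toy case the DP covers a few hundred genomic positions instead of the full template, which is the kind of saving that matters for genes with very long introns.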
References have been omitted for brevity; you can find them and more details on
the implementation in the GenomeThreader
paper.
How to take advantage of these features and many more is described in depth in
the GenomeThreader manual.
All mentioned files and scripts are also part of the GenomeThreader
distribution (see below).
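As a rough illustration of the incremental update and consensus features, the following sketch merges precomputed spliced alignments with newly computed ones and groups them by genomic overlap. It is a simplified stand-in under stated assumptions, not GenomeThreader's consensus algorithm: the SplicedAlignment class, the cluster_by_overlap function, and all coordinates are hypothetical, and strand is ignored.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the genomic template


@dataclass(frozen=True)
class SplicedAlignment:
    ref_id: str                  # cDNA/EST or protein identifier (hypothetical)
    exons: Tuple[Interval, ...]  # exon intervals, sorted by position

    @property
    def span(self) -> Interval:
        return self.exons[0][0], self.exons[-1][1]

    @property
    def introns(self) -> Tuple[Interval, ...]:
        return tuple((a[1], b[0]) for a, b in zip(self.exons, self.exons[1:]))


def cluster_by_overlap(alignments: List[SplicedAlignment]) -> List[List[SplicedAlignment]]:
    """Group alignments whose genomic spans overlap into one cluster."""
    clusters: List[List[SplicedAlignment]] = []
    current_end = -1
    for aln in sorted(alignments, key=lambda a: a.span):
        start, end = aln.span
        if clusters and start < current_end:
            clusters[-1].append(aln)
            current_end = max(current_end, end)
        else:
            clusters.append([aln])
            current_end = end
    return clusters


# Precomputed alignments are reused; only the new ones had to be computed.
precomputed = [
    SplicedAlignment("est1", ((100, 200), (400, 500))),
    SplicedAlignment("est2", ((2000, 2100),)),
]
newly_computed = [SplicedAlignment("prot1", ((150, 200), (400, 450), (700, 800)))]

clusters = cluster_by_overlap(precomputed + newly_computed)
cluster_ids = [[aln.ref_id for aln in cluster] for cluster in clusters]

# Distinct introns supported within the first cluster.
first_introns: Set[Interval] = {i for aln in clusters[0] for i in aln.introns}

print(cluster_ids)
print(sorted(first_introns))
```

Keeping the alignments themselves, rather than only the final gene structures, is what makes the recomputation cheap in this sketch: the expensive alignment step runs only for the new sequences, and the grouping step is rerun over the combined set.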
Availability
GenomeThreader is available free of charge for non-commercial research
institutions. To obtain a copy, please fill out the
license agreement and send it to us by fax or
email (as described in the document). As soon as we receive your license
agreement, we will send you a username and password by email, which you can
use to download GenomeThreader.
Examples
-
Evaluation cases described in Gremme et
al. 2005 (see below)
-
A 16.6 kb rice gene structure that is tractable with GenomeThreader
(both with and without the intron cutout technique)
but beyond
GeneSeqer's limits.
-
A human gene structure containing a 125 kb intron.
-
Small samples of gzip'ed
plain text and
XML
GenomeThreader output.
Users
The following sites use GenomeThreader. This list is not intended to be
comprehensive.
If you want to appear on this list, please drop me a
note.
Developers
GenomeThreader is being actively developed by the following individuals:
Publications
Please cite the following article in publications about research using
GenomeThreader: