GenomeThreader is a software tool to compute gene structure predictions.
The gene structure predictions are calculated using a similarity-based approach
where additional cDNA/EST and/or protein sequences are used to predict gene
structures via spliced alignments.
GenomeThreader was motivated by disabling limitations in
GeneSeqer, a popular gene prediction program which is widely used
for plant genome annotation.
Features
Intron Cutout Technique:
The intron cutout technique makes it possible to overcome the time and space
limitations of the dynamic programming (DP) algorithms used in GeneSeqer,
in particular when applied to organisms containing long introns.
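The general idea can be sketched as follows. This is only an illustration, not GenomeThreader's implementation: it assumes that seed matches between the reference sequence and the genomic template are already known, and the names kept_regions, reduced_length, to_genomic, flank and min_cut are hypothetical. Long stretches of the genomic sequence without seed matches are cut out, DP runs on the shorter remainder, and positions are mapped back afterwards.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in genomic coordinates


def kept_regions(seeds: List[Interval], genome_len: int,
                 flank: int, min_cut: int) -> List[Interval]:
    """Pad each seed match by `flank` and merge neighbours that are closer
    than `min_cut`; everything between the returned intervals is cut out."""
    padded = sorted((max(0, s - flank), min(genome_len, e + flank))
                    for s, e in seeds)
    kept: List[Interval] = []
    for start, end in padded:
        if kept and start - kept[-1][1] < min_cut:
            # Gap too short to be worth cutting: extend the previous interval.
            kept[-1] = (kept[-1][0], max(kept[-1][1], end))
        else:
            kept.append((start, end))
    return kept


def reduced_length(kept: List[Interval]) -> int:
    """Length of the sequence that the DP has to process after the cutout."""
    return sum(end - start for start, end in kept)


def to_genomic(pos: int, kept: List[Interval]) -> int:
    """Map a position in the reduced sequence back to a genomic coordinate."""
    if pos < 0:
        raise IndexError("position must be non-negative")
    for start, end in kept:
        length = end - start
        if pos < length:
            return start + pos
        pos -= length
    raise IndexError("position lies outside the reduced sequence")


# Hypothetical seed matches on a 20,000-base genomic template.
seeds = [(1000, 1200), (15000, 15150), (15300, 15500)]
kept = kept_regions(seeds, genome_len=20000, flank=50, min_cut=500)
print(kept, reduced_length(kept))
```

With these made-up values the DP would see 900 positions of the genomic template instead of 20,000, since its cost grows with the length of the sequence it has to process.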
Bayesian Splice Site Models (BSSMs):
With BSSMs it is possible to assign probabilities to GT donor, GC donor,
and AG acceptor sites. This information is used in the DP to get the exact
exon/intron boundaries right.
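How such a probability can be obtained is sketched below with Bayes' rule. This is a toy model under stated assumptions, not the BSSM format or parameterisation used by GenomeThreader: it scores a short window next to a candidate site with independent per-position base frequencies, and all numbers as well as the name posterior_true_site are invented for illustration.

```python
import math
from typing import Dict, List

Column = Dict[str, float]  # base -> probability at one window position


def posterior_true_site(window: str, true_model: List[Column],
                        false_model: List[Column], prior_true: float) -> float:
    """Posterior probability that `window` flanks a true splice site."""
    log_true = math.log(prior_true)
    log_false = math.log(1.0 - prior_true)
    for column_true, column_false, base in zip(true_model, false_model, window):
        log_true += math.log(column_true[base])
        log_false += math.log(column_false[base])
    # Normalise in log space to avoid underflow on longer windows.
    top = max(log_true, log_false)
    weight_true = math.exp(log_true - top)
    weight_false = math.exp(log_false - top)
    return weight_true / (weight_true + weight_false)


# Invented frequencies for three positions following a candidate GT donor.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
true_model = [
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
]
false_model = [uniform, uniform, uniform]

for window in ("AAG", "CCC"):
    print(window, round(posterior_true_site(window, true_model, false_model, 0.5), 3))
```

A DP step could then favour exon/intron boundaries at sites with a high posterior and penalise those with a low one.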
Combination of cDNA/EST Based Spliced Alignments with Protein Based Spliced
Alignments:
After computing spliced alignments of the supplied cDNAs/ESTs and protein
sequences against the genomic template, GenomeThreader computes consensus spliced
alignments. Consensus spliced alignments combine several spliced alignments
to resolve the complete gene structure and to uncover alternative splicing.
Incremental Updates:
When the cDNA/EST or protein database in use is updated, a common approach
has been to redo the complete mapping. With GenomeThreader, you can combine
newly computed spliced alignments with precomputed spliced alignments to
quickly recompute consensus spliced alignments.
XML:
The additional GenomeThreader XML output conforms to our gthXML
standard GenomeThreader.rng.txt. With
the included script XML2GFF.py, it is possible to convert gthXML output to the
GFF format.
A variety of gthXML-specific tools can be found
here.
gthDB:
We also provide
a schema and load script for gthDB, which permits storage
and query of GenomeThreader output in a relational format.
References have been omitted for brevity; you can find them and more details on
the implementation in the GenomeThreader paper.
How to take advantage of these features and many more is described in depth in
the GenomeThreader manual.
Please consult the FAQ page for frequently asked
questions.
All mentioned files and scripts are also part of the GenomeThreader
distribution (see below).
Availability
GenomeThreader is available free of charge for non-commercial research
institutions. To obtain a copy, please fill out the
license agreement and send it to us by fax or
email (as described in the document). As soon as we receive your license
agreement, we will send you a username and password by email, which you can use
to download GenomeThreader.
Examples
Evaluation cases described in Gremme et
al. 2005 (see below)
A 16.6 kb rice gene structure that is tractable with GenomeThreader (both with
and without the intron cutout technique), but beyond GeneSeqer's
limits.