MetaTISA Input Format
CDS Annotation Format
Two input formats are accepted for CDS annotation. The first is the
output of a MetaGene prediction. The second is our own MED format: a
sequence fragment id beginning with ">", followed by the coordinates of
all CDSs in the fragment, one CDS per line. The coordinates of a CDS
define the nucleotide region. Translation of the CDS to an amino acid
sequence starts from the first position for the positive strand and from
the second position for the negative strand. To help users post-process
results from other gene prediction tools, we also provide several tools
that convert the output of a gene finder to the MED format (converting
formats).
Example (MED format):
>NC_000913_1
2 109 -
124 699 -
>NC_000913_2
3 305 +
539 700 +
>NC_000913_3
1 411 -
442 699 -
>NC_000913_4
2 685 -
>NC_000913_5
1 699 +
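The MED layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of MetaTISA: the function names parse_med and translation_start are hypothetical, and the sketch assumes that the three fields on each CDS line are separated by whitespace.

```python
from typing import Dict, List, Tuple

# (first coordinate, second coordinate, strand) as written on a MED line
CDS = Tuple[int, int, str]


def parse_med(text: str) -> Dict[str, List[CDS]]:
    """Parse MED-format text into {fragment id: [CDS, ...]}.

    Assumption: fields on a CDS line are whitespace-separated.
    """
    records: Dict[str, List[CDS]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: everything after ">" is the fragment id.
            current = line[1:].strip()
            records.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("CDS line found before any '>' header")
        first, second, strand = line.split()
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand: {strand!r}")
        records[current].append((int(first), int(second), strand))
    return records


def translation_start(cds: CDS) -> int:
    """Return the coordinate from which translation begins:
    the first position on '+', the second position on '-'."""
    first, second, strand = cds
    return first if strand == "+" else second
```

For instance, parse_med applied to the example above would map NC_000913_4 to a single CDS (2, 685, "-"), whose translation start is position 685.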
Sequence Format
The metagenome sequences are in FASTA format. The first line of each
record is recognized as the unique id for the sequence fragment.
Example:
>mgutLn1_U_BL_aaa09a05_b1 Mouse Gut Community PT3 : mgutLn1_U_BL_aaa09a05_b1
AAATCTCGCCCTGTGGTGGATTCCTTTTCCCATTGCCCGATCTTATTTTT
ATCTTCCAAAAATGACGAACTGGACAAGATCCTGGGTCTTTCGGTGGGCG
GGGATGATTACGTGGCAAAGCCGTTCAGCCCGAAGGAGATCGCGTATCGG
GTCAAGGCGCAGCTCCGGCGGGCCGCGTATCAGCAAGACCCGTCGGAGGA
GGAGCTCATAAAAACAGGGGAATTGGAAATTGACGTGGAGGGCTGCAGGG
TCACAAAAGGCGGCAGCCCCATAGAACTGACCGCGCGGGAATTTGAAATC
CTGCGGTATCTGGCGGAAAATCAAGGCCGGGTCATCAGCCGCGAACGCTT
ATATGAAACCATCTGGGGCGAGGACAGCTTCGGGTGCGACAATACGGTCA
TGGTGCATATCCGGCATCTGCGTGAAAAAATAGAGGACGATCCCGCGGCG
CCCCGATACATCATCACGATGAAAGGATTAGGCTATAAGCTGGTGGACCC
TTATGAAGAATAAAAGCGATCTCAATCTGTTTTTTCGTTCGTTCGGCATT
GTCGTGATTGTGATCTTCGCGGCCATTGCAGCGGGGATATGCCTGTTTTA
TTATGTGTTCGCGATTCCGGCGCGGGAGGGACTCAGCCTGGCCTCATGGC
CAGACGTGTATACAGACAATTTTTCCCTTCAGCTTGAAGAAGAACAGGGA
GAGCTTAAAGTAAAAGAATTCGGGATTGAAGATCTGGACCGGTATGGCTT
ATGGCTGCAGGTGATCGATGAAACGGGACAGGAGTTTTTTCACACAATAA
GCCGGAGACCTGTCCCAACAGCTATACGGCCTCGAGCTTTTGGCATTCGG
GTACGAACGTTTA
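A FASTA file of this shape can be split into (id, sequence) pairs as sketched below. This is an illustration only: read_fasta is a hypothetical helper, and it keeps the whole header line (without the ">") as the id. Whether MetaTISA itself uses the whole line or only its first word is not stated here.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Assumption: the full header line (minus '>') serves as the id.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because the function is a generator, large files can be processed one fragment at a time by feeding it the file contents and iterating over the result.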
MetaTISA Output Format
MED Format
The MED format gives a sequence id and the corresponding CDS
annotations, as described above (see the example under CDS Annotation
Format).
GFF Format
The GFF (General Feature Format) output follows the specification of the
Sanger Institute:
<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]
Example:
##gff-version 3
##MetaTISA
##metagenome sequence name: NC_000913_45
NC_000913_45 MetaTISA CDS 3 395 . - .
##gff-version 3
##MetaTISA
##metagenome sequence name: NC_000913_255
NC_000913_255 MetaTISA CDS 2 298 . - .
NC_000913_255 MetaTISA CDS 320 616 . - .
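The GFF output can be loaded back into a program along the following lines. This is a rough sketch under stated assumptions: parse_gff is a hypothetical helper, the field names are taken from the column list above, lines beginning with "#" are treated as comments, and fields are split on any whitespace (tabs or spaces).

```python
from typing import Dict, List, Union

# Column names as listed in the format line above.
GFF_FIELDS = ("seqname", "source", "feature", "start",
              "end", "score", "strand", "frame")


def parse_gff(text: str) -> List[Dict[str, Union[str, int]]]:
    """Parse GFF feature lines into a list of dictionaries.

    Assumptions: '#' lines are comments/directives; fields are
    whitespace-separated; anything after the eighth field is kept
    verbatim as 'attributes'.
    """
    rows: List[Dict[str, Union[str, int]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '##' directives
        parts = line.split(None, 8)
        if len(parts) < 8:
            raise ValueError(f"expected at least 8 fields: {line!r}")
        row: Dict[str, Union[str, int]] = dict(zip(GFF_FIELDS, parts[:8]))
        row["start"] = int(parts[3])
        row["end"] = int(parts[4])
        row["attributes"] = parts[8] if len(parts) > 8 else ""
        rows.append(row)
    return rows
```

Applied to the example above, this would return three CDS rows, one for NC_000913_45 and two for NC_000913_255.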