|
Genome Re-annotation of Escherichia coli CFT073 |
|
The genome of uropathogenic Escherichia coli strain CFT073, a human pathogen, was sequenced and published in 2002, a landmark for the study of uropathogenic infections (Welch et al. 2002). However, the current RefSeq annotation of this pathogen is partly outdated: some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation by combining automated annotation tools with manual effort to provide a more comprehensive understanding of the virulence of CFT073. Because public DNA sequence databases such as DDBJ, EMBL/EBI and GenBank accept annotation updates only from the original submitters, third-party annotators are advised to seek alternative ways to make genome reannotations publicly accessible to the research community (Salzberg 2007). This website is devoted to that purpose. It includes three sections: 1) a brief overview of the reannotation methods, 2) links to browse the reannotation, and 3) links for data download.
Citation: Chengwei Luo, Gang-Qing Hu and Huaiqiu Zhu: Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics, 2009, 10:552. |
|
Methods
All open reading frames (ORFs) longer than 60 bp were extracted from the genome sequence of the CFT073 strain downloaded from RefSeq. We first searched the ORFs against Swiss-Prot with blastp and against the Conserved Domain Database (CDD) with rps-blast (e-value < 1E-5 and identity > 30%), then considered the results from four gene finders: EasyGene 1.2, GeneMark.hmm, Glimmer 3.02 and MED 2.0. Specifically, we included only genes co-predicted by at least three of the tools. In addition, to give a more complete picture, the reannotation includes genes with known functions from the original annotation. We commented on gene functions using the CDD/Swiss-Prot BLAST results as well as the original functional annotation, if available. RefSeq's original annotations of tRNAs, rRNAs and cryptic prophages are retained in this reannotation. Notably, most of the small RNAs (sRNAs) known so far in Escherichia coli are missing from the original annotation. To correct this systematic defect, we combined Rfam 9.0 predictions with a literature survey for sRNA annotation. For gene start annotation, we used the ProTISA pipeline, which provides high-quality annotations of gene starts supported by a variety of evidence, including experiments, conserved domain searches, N-terminal sequence alignments among orthologous genes, and predictions from state-of-the-art tools. |
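The selection criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the reannotation: the function names, constants, and data layout are hypothetical, and only the thresholds (ORF length > 60 bp, e-value < 1E-5, identity > 30%, agreement of at least three of the four gene finders) are taken from the text. How the homology evidence and the gene-finder votes were combined in practice is not specified here, so the checks are kept separate.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not the code used for the reannotation.

GENE_FINDERS = ("EasyGene", "GeneMark.hmm", "Glimmer", "MED")

MIN_ORF_LENGTH_BP = 60    # ORFs must be longer than this
MAX_EVALUE = 1e-5         # a hit must have an e-value below this
MIN_IDENTITY_PCT = 30.0   # a hit must have identity above this
MIN_FINDER_VOTES = 3      # at least three of the four tools must agree


def is_long_enough(orf_length_bp):
    """True if the ORF is longer than 60 bp."""
    return orf_length_bp > MIN_ORF_LENGTH_BP


def passes_homology_filter(evalue, identity_pct):
    """True if a BLAST hit meets both the e-value and identity thresholds."""
    return evalue < MAX_EVALUE and identity_pct > MIN_IDENTITY_PCT


def has_finder_consensus(predicted_by):
    """True if at least three of the four gene finders predicted the gene.

    `predicted_by` is a set of tool names (hypothetical representation).
    """
    votes = sum(1 for tool in GENE_FINDERS if tool in predicted_by)
    return votes >= MIN_FINDER_VOTES
```

For example, under these assumptions a gene predicted by EasyGene, Glimmer and MED would satisfy `has_finder_consensus`, while one predicted by only two of the tools would not.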
Browser
This section includes links to the reannotation of the coding sequences of the genome. It gives the kind of information found in files archived in the GenBank and RefSeq databases, for instance those with the extension "ptt": location, strand, length, PID, comments on function, etc. In addition, it provides evidence from BLAST results against Swiss-Prot, CDD, and VFDB (for virulence factors), the name of the pathogenicity island that the gene belongs to (if available), prediction support from the four gene finders, and the category of the gene start annotated by the ProTISA pipeline. To speed up browsing, we split the annotations into 21 files of ~250 entries each, sorted by gene location. Please click one of the links to browse (each file is named after the location of the first gene it contains):
The column names of the file are explained below:
|
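The file-splitting scheme used for the browser section (about 250 entries per file, sorted by gene location, each file named after the location of its first gene) can be sketched as follows. This is only an illustration: the function name, the `start` field, and the use of the start coordinate as the file name are assumptions, not the site's actual implementation.

```python
# Illustrative sketch only: the entry layout and naming scheme are assumptions.

def split_into_pages(entries, page_size=250):
    """Sort entries by gene start and group them into pages.

    `entries` is a list of dicts with a hypothetical "start" key.
    Returns a dict mapping a page name (the start coordinate of the
    first gene on that page, as a string) to the list of its entries.
    """
    ordered = sorted(entries, key=lambda entry: entry["start"])
    pages = {}
    for offset in range(0, len(ordered), page_size):
        chunk = ordered[offset:offset + page_size]
        pages[str(chunk[0]["start"])] = chunk
    return pages
```

With a page size of 250, any annotation set of 5001 to 5250 entries would yield 21 pages, the last one possibly shorter than the others.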
Downloads
|
References
|
|
Last updated July 2009. Copyright (C) 2009, All Rights Reserved |