A Web-based High-Throughput Tool for Next-Generation Sequence Annotation

Technical Report | Accession Number: ADA572794

Abstract:

The availability of a large number of genome sequences, resulting from inexpensive, high-throughput next-generation sequencing platforms, has created the need for an integrated, fully automated, rapid, and high-throughput annotation capability that is also easy to use. Here, we present a web-based software application, Annotation of Genome Sequences (AGeS), which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. The current version of AGeS provides annotations for bacterial genome sequences and serves as a readily accessible resource for Department of Defense (DoD) scientists to store, annotate, and visualize genomes of newly sequenced pathogens of interest. The AGeS system is composed of two major components. The first component is a web-based application that provides a graphical user interface for managing users' input genomes, submitting annotation jobs, and visualizing results. Sequence contigs are uploaded as a multi-FASTA input file and submitted for annotation, and the resulting annotations are visualized through GBrowse. The input genome sequences and the annotation results are stored in a secure, customized database. The second component is a high-throughput annotation pipeline that finds the genomic regions coding for proteins, RNAs, and other genomic elements through a Do-It-Yourself Annotation framework. The pipeline also functionally annotates the protein-coding regions using an in-house-developed high-throughput pipeline, the Pipeline for Protein Annotation. The annotation pipeline has been deployed on the Mana Linux cluster at the Maui High Performance Computing Center. The two components are connected through the DoD user interface toolkit application programming interface. The AGeS system was evaluated for the scaling of its parallel execution and for its annotation performance. AGeS scaled with super-linear speedup for up to 128 processors.
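
The abstract states that sequence contigs are uploaded as a multi-FASTA input file. As an illustration of that input format only, the following is a minimal Python sketch of splitting multi-FASTA text into individual contigs. It is not taken from AGeS; the function name read_multi_fasta and the example contig names and sequences are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into a {contig_id: sequence} mapping.

    Hypothetical helper for illustration; not part of AGeS.
    A later record with a repeated identifier overwrites the earlier one.
    """
    contigs: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if current_id is not None:
                contigs[current_id] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            # The identifier is the first whitespace-delimited token after '>'.
            current_id = fields[0]
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalize case.
            chunks.append(line.upper())

    if current_id is not None:
        contigs[current_id] = "".join(chunks)
    return contigs


# Hypothetical two-contig input, for illustration only.
example = """>contig_1 example description
ACGTAC
GTTA
>contig_2
ttgaca
"""
contigs = read_multi_fasta(example.splitlines())
for name, seq in contigs.items():
    print(name, len(seq))
```

Each header line beginning with ">" starts a new contig, and the wrapped sequence lines that follow are joined into one string, so a single upload can carry every contig of a draft genome.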

Security Markings

DOCUMENT & CONTEXTUAL SUMMARY

Distribution: Approved For Public Release
Distribution Statement: Approved For Public Release; Distribution Is Unlimited.

RECORD

Collection: TR