‘BRAKE-ing’ The Monotony of Gene Annotation

02/05/2016
Jesse Jenkins

Researchers showcase new computational software to help analyze DNA sequences faster through automated gene finding.


While genome sequencing has become more accessible for smaller labs, the process of genome annotation is still lengthy and labor intensive. It often takes many experts manually analyzing different regions of raw DNA sequences to identify genes for adding to genome databases.

Now, researchers have introduced new software, called BRAKER1, capable of assessing DNA sequences immediately after assembly and finding new genes automatically. The researchers credit the software’s state-of-the-art GeneMark-ET and AUGUSTUS algorithms with helping create a more accurate alternative to current methods of automatic gene prediction.

“We have combined two of the most powerful algorithms that exist in the field of gene finding at this time, and this created a very versatile and accurate tool for groups around the world that work with eukaryotic genomes,” said study author Mark Borodovsky from Georgia Tech and the Moscow Institute of Physics and Technology.

“Genomic scientists can now find genes for hundreds or thousands of genomes at a time without having to spend manual time on each genome,” added co-author Mario Stanke from the University of Greifswald in Germany. “We just feed the raw sequence and alignment data into the program and it does everything automatically, and better than currently used competing pipelines.”

The team tested BRAKER1’s performance by comparing its prediction accuracy against that of MAKER2, the most commonly used gene prediction pipeline. For the comparison, the team collected nuclear genomes, reference annotations, and RNA-Seq libraries from databases for four model organisms whose genomes are already annotated with high accuracy: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Schizosaccharomyces pombe. On average, the team found BRAKER1’s prediction accuracy to be more than 10% higher than MAKER2’s in terms of gene prediction sensitivity and specificity.
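For readers unfamiliar with these two measures, the sketch below shows one common way they are computed at the gene level. It assumes the usual gene-finding convention, in which sensitivity is the fraction of reference genes predicted exactly and specificity is the fraction of predicted genes that exactly match a reference gene. The article does not spell out the definitions used in the study, and the function name and toy coordinates here are invented for illustration only.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) counting only exact gene matches.

    Assumed convention (not taken from the study itself):
      sensitivity = correct predictions / number of reference genes
      specificity = correct predictions / number of predicted genes
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical toy genes: (sequence name, strand, exon coordinates).
reference = {
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "-", ((1000, 1500),)),
    ("chr2", "+", ((50, 120), (180, 260))),
    ("chr2", "+", ((900, 1100),)),
}
predicted = {
    ("chr1", "+", ((100, 200), (300, 400))),  # exact match
    ("chr1", "-", ((1000, 1500),)),           # exact match
    ("chr2", "+", ((50, 120), (180, 250))),   # one exon boundary is off
    ("chr2", "-", ((2000, 2300),)),           # no counterpart in the reference
    ("chr3", "+", ((10, 90),)),               # no counterpart in the reference
}

sensitivity, specificity = gene_level_accuracy(predicted, reference)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

In this toy case two of the four reference genes are recovered exactly, and two of the five predictions are correct. A gene with a single misplaced exon boundary counts as a miss under this strict criterion, which is one reason gene-level figures tend to be lower than exon-level ones.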

According to Stanke, the BRAKER1 software has been downloaded 1,200 times since its launch in January 2015, an average of about 100 downloads per month from labs around the world. He and his collaborators are now working on modifications that will further improve the software’s accuracy.

“If we look at the fruit fly tests we did, for 65% of the genes we find 1 correct version of the gene, leaving 35% of the genes where we make a mistake…so there are definite improvements to be made,” said Stanke. “With modifications and other ideas we have in mind, I think we could get this to something like 80-90% accuracy.”

Reference

Katharina J. Hoff, Simone Lange, Alexandre Lomsadze, Mark Borodovsky, and Mario Stanke. BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS. Bioinformatics, first published online November 11, 2015. doi:10.1093/bioinformatics/btv661.