Jeffrey Perkel, Ph.D.
BioTechniques, Vol. 60, No. 2, February 2016, pp. 56–60

Looking for the perfect sequencing instrument for your next experiment? As Jeffrey Perkel finds out, you might just need more than one.

Hardly a day goes by without the announcement of a newly sequenced genome. Whatever the DNA source—animal or plant, fungus or bacterium—genome sequencing has become so commonplace, the feat no longer amazes. It may shock younger readers to learn that not so long ago completed genomes merited journal covers and press conferences.

Today, the source of our groundswell of genomic data, next-generation sequencing (NGS), is regarded as largely routine. And why not? Prices have plummeted, while access has skyrocketed. Illumina's newly launched MiniSeq, priced at just $49,500, promises 7.5 Gb of data per run—not enough to decode a human genome, but definitely enough for targeted sequencing applications (see “New instruments, more options” sidebar). At the University of Edinburgh, Scotland, undergraduate honors students can even take a course where Illumina NGS technology and advanced bioinformatics are used to decipher bacterial genomes.

Perhaps lost in all this familiarity and access is the recognition that, a decade after the first NGS instruments went online and more than a decade since the completion of the Human Genome Project (HGP), sequence assembly remains a challenge. Even the small 5.2-Mb Bacteroides fragilis genome, the subject of the University of Edinburgh's 2013 undergraduate course, confounds assemblers, thanks to repetitive and mobile regulatory elements called “shufflons” and “invertible promoters.”

A recent editorial in Nature Genetics crystallized the problem when it likened some particularly challenging genomic regions to “the disrupted runs of shuffled playing cards from multiple decks or the compound folded layers of butter and dough in Danish pastry” (1). Yet researchers are learning there is a way to untangle those regions by blending different, orthogonal sequencing technologies.

Nurseries of evolution

In theory, bacterial genomes should resolve into single contigs. Because they are gene-rich and relatively small, they carry little extraneous genetic material to trip up sequence assembly tools. According to Judith Risse, a bioinformatics specialist at Edinburgh Genomics who processed the data for the Edinburgh undergraduate course, researchers usually don't get that lucky; a small number of contigs is routine. Still, when the Edinburgh students turned their attention to B. fragilis, they got a mess.

“If you try to assemble the B. fragilis genome from Illumina short reads, it will not assemble into a single contig, because it has small bits of DNA that can change orientation within the genome,” Risse explains. Feed those snippets into an assembly tool, and the software effectively throws up its hands. Instead of a single contig, the students in Edinburgh's Junior Honors “Genomes and Genomics 3” course were faced with more than 20. And eukaryotic chromosomes pose a far greater challenge.

Evan Eichler is a professor of genome sciences at the University of Washington School of Medicine and a Howard Hughes Medical Institute Investigator. He studies regions of the genome often overlooked in traditional genome sequencing projects.

“I am a child of the genome era, and I study both human disease and evolution from the perspective of looking at the genome first,” he says.

Eichler concentrates on regions of genomic instability. Flanked by long segmental duplications, these regions can misalign during meiosis, leading to either deletion or duplication of the intervening sequence. “Think of these regions as land mines,” Eichler explains. “Once you create this architecture where these duplications are separated by lots of unique sequence, that interspersed architecture then predisposes to higher rates of copy-number variation, creating instability.”

Eichler says there is good evidence that these regions represent nurseries of gene evolution, and they may even explain some of the biology that makes humans, well, human. The SRGAP2 gene, for instance, is present in one copy in many higher primates, but appears in triplicate on human chromosome 1, and the two additional, truncated copies could play a role in neocortex development and spine density in the brain.

But these complex regions can also lead to disease, such as intellectual disabilities and autism. 17q21.31 is a large region that is inverted relative to the reference genome in about 20%–30% of individuals of Northern European or Middle Eastern descent. While the inversion correlates with increased fecundity and recombination in female carriers, children of those carriers have a much higher rate of genetic microdeletions associated with neurological deficits, a constellation of genetic defects called Koolen-de Vries syndrome. And it's a region that the HGP, which relied on Sanger sequencing of bacterial artificial chromosome (BAC) clone libraries, and most subsequent genome-sequencing efforts got wrong.
