The parasite Plasmodium falciparum causes millions of deaths due to malaria each year. To fight the illness, researchers believe new drugs and vaccines that exploit weaknesses indicated by the parasite’s genome could be useful. But sequencing its genome has been particularly challenging.
PCR problems don’t just plague studies of the malaria parasite either. The same issue shows up when preparing PCR samples of DNA from Staphylococcus aureus, a bacterium that causes boils and food poisoning, and also DNA from species of Clostridium, including the one that causes botulism.
Now, researchers are developing alternative sample preparation methods that don’t depend on PCR. These methods, according to Daniel Peiffer, Illumina’s market manager for DNA applications, produce sequencing libraries that provide “the most comprehensive, unbiased view of the genome.”
So far, DNA sequencing companies Illumina and IonTorrent have begun marketing and testing their PCR-free sample prep methods. The techniques promise to eliminate common problems with PCR such as bias and duplications, but these PCR-free sample prep methods come with their own limitations.
Before joining Oxford Nanopore, Daniel Turner spent several years at the Wellcome Trust Sanger Institute trying to understand the limitations of PCR, particularly when amplifying AT- and GC-rich sequences. He worked on new methods to circumvent PCR biases and to build a better P. falciparum genome.
Standard PCR is used to create a library from random genomic fragments, which have a wide range of base compositions. Some templates have a strong secondary structure while others have low thermal stability—both of these factors affect PCR amplification efficiency. In the end, not all of the genomic sequences are represented equally in the PCR-amplified library. And when a genome is rich in adenine-thymine (AT) or guanine-cytosine (GC) base pairs, biases in sequence coverage can be exacerbated during PCR amplification.
P. falciparum’s genome is a perfect example. AT content makes up more than 75% of the coding regions and up to 100% of sequences in the intergenic and non-coding regions. S. aureus has just over 67% AT content. “The highest I’ve come across is Carsonella ruddii, which is just over 83% AT,” says Turner.
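To make these percentages concrete, AT content is simply the share of A and T bases in a sequence. The following is a minimal Python sketch, not taken from the cited studies; the function names, window sizes, and toy sequence are assumptions made for illustration.

```python
def at_fraction(sequence: str) -> float:
    """Fraction of A/T bases among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return at / (at + gc)


def windowed_at(sequence: str, window: int = 100, step: int = 50):
    """Return (start, AT fraction) for each full window along the sequence."""
    return [
        (start, at_fraction(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy example (hypothetical sequence, not real genome data).
toy = "ATATATTTAAGCATATTTAA"
print(f"AT content: {at_fraction(toy):.0%}")
```

Scanning a genome in windows like this is one simple way to flag the stretches most likely to be underrepresented after PCR.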
When scientists sequence genetic material from organisms with such AT-rich content, they have to do additional sequencing to see which regions are underrepresented. With high AT content, the problems come from low thermodynamic stability and from the repetitive nature of the sequences. “It’s hard for a polymerase to copy a template if the strand it’s trying to extend does not want to stay hybridized, and repetitive sequences can lead to the polymerase slipping, which introduces errors,” Turner says.
High-GC genomes, such as those of Streptomyces coelicolor (about 72% GC) and Thermus thermophilus (about 70% GC), pose their own challenges. The high GC content gives these genomes high thermostability.
“Even if you are using universal primers, which anneal to GC-neutral adapters, it is hard to get a very high GC template to denature fully, and even when it does, strands are likely to form strong intramolecular hydrogen bonds when the temperature is lowered for the annealing step of the PCR, which will make things difficult for the polymerase,” Turner explains.
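A rough way to see the effect Turner describes is the Wallace rule of thumb, which estimates the melting temperature of a short oligonucleotide as 2 °C per A or T base plus 4 °C per G or C base. The sketch below is only illustrative: the rule applies to short oligos (roughly 14 to 20 bases) rather than full library fragments, and the function name and example sequences are assumptions, not anything used in the studies discussed here.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate Tm in degrees C using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G, and T")
    return 2 * at + 4 * gc


# Two hypothetical 16-base oligos at opposite extremes of base composition.
at_rich = "ATATTTAATATATTAA"
gc_rich = "GCGGCCGCGGGCCGCC"
print(wallace_tm(at_rich), wallace_tm(gc_rich))
```

Under this approximation, a pure-GC oligo melts at twice the estimated temperature of a pure-AT oligo of the same length, which is why a single set of cycling temperatures suits neither extreme well.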
And both high-AT and high-GC genomes tend to be repetitive, which can lead to errors. In extreme cases, some regions are barely represented at all, while fragments that do amplify are overrepresented in the library. "These are duplicate sequences, which do not provide any useful information, meaning that throughput is wasted," Turner says.
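The wasted throughput Turner mentions can be expressed as a duplicate fraction. The sketch below is a deliberately simplified illustration in Python: it treats reads with identical sequence as duplicates, whereas production pipelines typically identify duplicates from mapping coordinates. The function name and the toy reads are assumptions.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that are exact copies of another read in the list."""
    if not reads:
        raise ValueError("no reads supplied")
    unique_reads = len(Counter(reads))
    return 1 - unique_reads / len(reads)


# Toy library (hypothetical reads): one fragment was amplified twice.
library = ["ACGT", "ACGT", "TTTT", "GGGG"]
print(f"Duplicate fraction: {duplicate_fraction(library):.0%}")
```

A higher duplicate fraction means more of the sequencer's output is spent re-reading fragments that were already sampled.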
Adapting and Amplifying
In a paper published in BMC Genomics in 2012, Turner and colleagues tested PCR and non-PCR amplification techniques to determine the best way to sequence the genome of P. falciparum (1). Earlier studies had shown that PCR-free methods that replace PCR with other amplification steps during Illumina library preparation could improve read distribution and produce more even genome coverage than PCR-based methods (2).
PCR-free cluster amplification works similarly to PCR but has several key differences that help prevent the introduction of bias during amplification. The biggest difference is the structure of the adapters. Cluster amplification adapters contain additional sequences that help them hybridize to short, single-stranded DNA or RNA molecules already attached to the flowcell surface. In this approach, scientists use the flowcell itself to select for fully ligated template molecules: cluster amplification only amplifies template strands that carry, at both their 5′ and 3′ ends, a unique adapter sequence complementary to the flowcell adapters. As a result, the cluster amplification step performs the enrichment that PCR would normally do and reduces amplification bias.
But cluster amplification is not the only way to get around PCR. There are also isothermal amplification methods, including recombinase polymerase amplification. This process uses genetic recombination enzymes to pair primers with double-stranded DNA based on homology followed by strand displacement amplification. There is no thermal or chemical melting of the DNA, and the reaction amplifies the genetic material from just a few target copies (3).
Because of the promise of these PCR-free sample prep methods, sequencing companies are starting to bring products that use the technology to the market. At the 2013 Advances in Genome Biology & Technology (AGBT) conference, IonTorrent CEO Jonathan Rothberg showcased the company’s Avalanche system, which uses a temperature-incubation-based amplification system. Rothberg showed the audience sequence data consisting of nearly 600 bases that were read with high accuracy in about an hour. The Avalanche system is slated for release in 2014.
Meanwhile, researchers can already order Illumina’s TruSeq DNA PCR-free sample preparation kit. The kit promises a bead-based workflow designed to be completed in less than a day and to accommodate growing read lengths on the company’s sequencing platforms. "Given the increasing popularity of whole-genome sequencing, we’ve seen that researchers have a desire for the most complete view of the genome and PCR-free preps allow for a dramatic improvement in the sequencing coverage of challenging regions," Peiffer says.
Still Not Perfect
Although the PCR-free approach improves sequence read distribution and produces more even genome coverage (2), it requires a large amount of starting DNA material, which is often difficult to obtain, especially when dealing with clinical isolates. For example, there just isn’t enough P. falciparum DNA in clinical isolates for the PCR-free methods to be most efficient, says Turner.
He and his colleagues found that the best way to sequence limited amounts of P. falciparum DNA was to carefully select the polymerases used for PCR and to add TMAC (tetramethylammonium chloride) to the standard Illumina library amplification. TMAC normalizes the melting temperatures of the different library fragments. The scientists found that it inhibits most common polymerases but works with Platinum Pfx and the Kapa enzymes (1). When they compared genome coverage from libraries made under these alternative PCR conditions with coverage from standard PCR and PCR-free libraries, they found they could reduce amplification bias and retain library complexity across the parasite’s extremes of base composition (1).
And, there are other issues with PCR-free methods, according to Turner. For example, when scientists perform a sequencing reaction on a clonally amplified template, the sequencing reaction is not very efficient. Some strands within the cluster will not be extended during each cycle and will become out of phase with the rest, limiting the read length of the system, he says.
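The way phasing caps read length can be illustrated with a very simple model. Assume, purely for illustration, that every strand in a cluster independently fails to extend with the same probability in each cycle; the fraction of strands still in phase after n cycles is then (1 - p)^n. The Python sketch below uses hypothetical failure rates and a hypothetical signal threshold, and it ignores other error sources that a real instrument has to deal with.

```python
def in_phase_fraction(p_fail: float, cycles: int) -> float:
    """Fraction of strands still in phase after a number of cycles."""
    if not 0 <= p_fail <= 1:
        raise ValueError("p_fail must be between 0 and 1")
    return (1 - p_fail) ** cycles


def cycles_until_below(p_fail: float, threshold: float) -> int:
    """Smallest cycle count at which the in-phase fraction drops below threshold."""
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    cycles, fraction = 0, 1.0
    while fraction >= threshold:
        fraction *= 1 - p_fail
        cycles += 1
    return cycles


# Hypothetical per-cycle failure rates, chosen only to show the trend.
for p in (0.01, 0.005, 0.001):
    print(p, cycles_until_below(p, threshold=0.5))
```

Even a small per-cycle failure rate compounds over hundreds of cycles, which is why the in-phase signal eventually becomes too weak to call bases reliably.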
To work around this problem, Illumina recently bought sequencing startup Moleculo, Inc., which has developed new technology for improving the accuracy of long reads. Used with an Illumina HiSeq 2500 sequencer, the technology produced localized assemblies of individual, long fragments of 8-10 kilobases, which could better resolve repetitive sequences, says Geoff Waldbieser, a molecular biologist at the U.S. Department of Agriculture Catfish Genetics Research Unit.
At the 2013 Plant and Animal Genome conference, Waldbieser showed how the Illumina-acquired technology was used to produce a more contiguous genome assembly for the blue catfish, Ictalurus furcatus, which has many repetitive sequences in its DNA (4).
"The ultimate extrapolation of PCR-free methods is to be able to sequence without any amplification at all," says Turner.
This statement is not surprising coming from the director of applications at Oxford Nanopore. The company’s third-generation sequencing system eliminates DNA amplification entirely, an advance that could produce more accurate genome reads for P. falciparum and other pathogens, leading to the development of better drugs and vaccines to fight malaria and other illnesses.
1. Oyola, S., T. Otto, Y. Gu, G. Maslen, M. Manske, S. Campino, D. Turner, B. MacInnis, D. Kwiatkowski, H. Swerdlow, and M. Quail. 2012. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics 13(1):1+.
2. Kozarewa, I., Z. Ning, M. A. Quail, M. J. Sanders, M. Berriman, and D. J. Turner. 2009. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nature Methods 6(4):291-295.
3. Piepenburg, O., C. Williams, D. Stemple, and N. A. Armes. 2006. DNA detection using recombination proteins. PLoS Biol. 4(7):e204.
4. Waldbieser, G. et al. 2013. Production of long (1.5kb – 15.0kb), accurate, DNA sequencing reads using an Illumina HiSeq2000 to support de novo assembly of the blue catfish genome. Plant & Animal Genome XXI. Poster.