Improving sequencing quality from PCR products containing long mononucleotide repeats
 
Aron J. Fazekas, Royce Steeves, and Steven G. Newmaster
Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada
BioTechniques, Vol. 48, No. 4, April 2010, pp. 277–285
Abstract

Stutter products are a common artifact in the PCR amplification of frequently used genetic markers that contain mononucleotide simple sequence repeats. Despite the importance of accurate determination of nucleotide sequence and allele size, there has been little progress toward decreasing the formation of stutter products during PCR. In this study, we tested the effects of lowered extension temperatures, inclusion of co-solutes in PCR, PCR cycle number, and the use of different polymerases on sequence quality for a set of sequences containing mononucleotide A/T repeats of 10–17 bp. Our analyses showed that sequence quality of mononucleotide repeats ≤15 bp is greatly improved with the use of proofreading DNA polymerases fused to nonspecific dsDNA binding domains. Our findings also suggest that the number of nucleotides with which the DNA polymerase interacts may be the most important factor in the reduction of slipped-strand mispairings in vitro.

Introduction

Simple sequence repeats (SSRs), or microsatellites, are repetitive nucleotide sequences built from motifs of 1–6 bases and are found in both organellar and nuclear genomes. Their highly repetitive structure makes microsatellites particularly prone to mutation via slipped-strand mispairing (1,2). Their relatively rapid rate of mutation, high number of alleles, and high frequency in genomes have made SSRs popular markers for population genetics, linkage mapping, genetic fingerprinting, and taxonomic study (3,4).

The natural process of slipped-strand mispairing that results in SSR mutation in vivo also occurs in vitro during PCR-mediated DNA replication. Mutations at SSR sites during in vitro enzymatic replication of SSRs are usually the result of insertion or deletion of repeats in the extending, or nascent, DNA strand sequence (5). In order for slipped-strand mispairing to occur, the DNA polymerase enzyme first stalls and dissociates from the dsDNA complex during replication of the repeated motif. If base pairing is disrupted after polymerase dissociation, then a loop of one or more repeat units may form in either the nascent or the template strand prior to re-association and cause the insertion or deletion of one or more units, respectively, in the newly formed DNA strand (1,2). Deletion mutations are believed to be more common as they require fewer nucleotides of the dsDNA to dissociate and therefore are more energetically favorable than insertion mutations (5,6,7).
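
The relationship between loop position and mutation type described above can be written as a toy model. This is only an illustrative sketch: the function name and its arguments are hypothetical, and it tracks nothing beyond the repeat count of the newly synthesized strand after a single slippage event.

```python
def slipped_product_length(repeat_units: int, looped_units: int, loop_strand: str) -> int:
    """Repeat count of the new strand after one slippage event (toy model).

    repeat_units: number of repeat units in the template.
    looped_units: number of units looped out before the polymerase re-associates.
    loop_strand:  "nascent" or "template", the strand carrying the loop.
    """
    if looped_units < 0 or looped_units > repeat_units:
        raise ValueError("looped_units must be between 0 and repeat_units")
    if loop_strand == "nascent":
        # Looped-out units in the nascent strand get copied again: insertion.
        return repeat_units + looped_units
    if loop_strand == "template":
        # Looped-out units in the template are skipped: deletion.
        return repeat_units - looped_units
    raise ValueError("loop_strand must be 'nascent' or 'template'")


# A 12-unit repeat with a single looped-out unit.
print(slipped_product_length(12, 1, "nascent"))   # 13 (insertion)
print(slipped_product_length(12, 1, "template"))  # 11 (deletion)
```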

In PCR, the frequency of these slippage artifacts increases with the number of repeats and decreases with the size of the repeat motif (8,9,10). They occur at the greatest rate in mononucleotide repeats (also known as homopolymer runs) composed of eight or more nucleotides (9), which is the estimated number of bases that fill the active site of Taq DNA polymerase and many other DNA polymerases (11,12,13). It is assumed that when the active site of a polymerase is full of identical nucleotides, the enzyme is more likely to dissociate (9) and give the DNA strands an opportunity to misalign.
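
To screen a target region for runs at or above the eight-nucleotide threshold mentioned above, a short script along these lines could be used. It is a minimal sketch, not part of the published method: the function name, the default threshold argument, and the example sequence are assumptions made for illustration.

```python
import re


def find_mononucleotide_runs(sequence: str, min_length: int = 8):
    """Return (start, base, length) for each homopolymer run >= min_length.

    start is the 0-based position of the first base of the run.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One base followed by at least (min_length - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


# Hypothetical sequence: a 10-base T run and a 7-base A run.
example = "GCA" + "T" * 10 + "GGC" + "A" * 7 + "C"
print(find_mononucleotide_runs(example))  # [(3, 'T', 10)]
```

Only the T run is reported, because the seven-base A run falls below the default threshold. Lowering `min_length` would report it as well.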

These mutations can produce stutter products in microsatellite images or sequence chromatograms and may confound determination of the true repeat number, because stutter products can be generated in similar or even greater proportions than the true product (9,14). Additionally, the quality of sequence data following a mononucleotide repeat of 10 or more bases is often greatly reduced, sometimes to the point of being unreadable.

This issue is regularly encountered by those who sequence PCR products derived from genomic DNA. In particular, A/T-rich regions such as intergenic spacers of the plastid genome, widely used by investigators of plant phylogenetics (15), often contain mononucleotide repeats. While contigs can usually be generated by sequencing both strands, this doubles the sequencing effort needed to obtain a minimum of 2-fold coverage. Confounding the problem further are samples that have two or more regions of mononucleotide repeats within targets of interest. Determining the nucleotide sequence between these regions with any confidence is usually impossible without the added step of designing internal sequencing primers (16). Slipped-strand mispairing does seem to be primarily (although not wholly) a function of the PCR process. Kieleczawa (16) reported that sequences containing A/T repeats up to 50 bp can be readily sequenced from plasmids, indicating that slippage during the sequencing reaction is not the main source of reduced sequence quality.

Here we report our efforts to improve the quality of sequence data generated from PCR-amplified genomic regions containing mononucleotide repeats. We first attempted to reduce the degree of stutter generated with Taq DNA polymerase-mediated PCR by varying the PCR conditions to increase the affinity of the polymerase. We then tested new-generation DNA polymerases that are designed to have increased performance relative to Taq DNA polymerase.

Materials and methods

For our experiments, we focused on a particular genomic region and a set of 25 plant samples (Supplementary Table S1) that we knew contained mononucleotide repeats that negatively impacted sequence quality. Total genomic DNA was extracted from dried leaf material of each sample using the Plant II DNA extraction kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany) according to the manufacturer's specifications, yielding DNA concentrations of 40–150 ng/µL per sample. As a starting point for PCR amplification, we used the following standard conditions, from which we subsequently altered various parameters: 20-µL reaction volumes containing 1 U AmpliTaq Gold Polymerase with 1× GeneAmp PCR Buffer II (100 mM Tris-HCl pH 8.3, 500 mM KCl) (Applied Biosystems, Foster City, CA, USA), 2 mM MgCl₂, 0.2 mM dNTPs, 0.2 µM each primer, and 20 ng genomic DNA. Thermal cycling was performed on a Veriti PCR thermal cycler (Applied Biosystems) as follows: initial denaturation at 95°C for 3 min; 35 cycles of 95°C for 1 min, 58°C for 30 s, and 72°C for 1 min; hold at 72°C for 5 min; and an indefinite hold at 4°C. We targeted the plastid psbA-trnH intergenic spacer using the primer psbAF (17) (5′-GTTATGCATGAACGTAATGCTC-3′) and a modified version (2 base pairs shorter) of the primer trnH2 (18) (5′-CGCATGGTGGATTCACAATCC-3′).
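
Some of the arithmetic implied by these standard conditions can be checked with a short script. This is a rough sketch that uses only values stated above. The function names are hypothetical, and the cycling time counts programmed hold times only, ignoring ramping between temperatures and the final 4°C hold.

```python
def template_volume_ul(target_ng: float, stock_ng_per_ul: float) -> float:
    """Volume of undiluted DNA extract that delivers target_ng of template."""
    return target_ng / stock_ng_per_ul


def programmed_hold_minutes(initial_min, cycles, step_seconds, final_min):
    """Sum of programmed hold times in minutes (ramp times are not included)."""
    return initial_min + cycles * sum(step_seconds) / 60 + final_min


# 20 ng of template from extracts at the reported 40-150 ng/uL.
print(template_volume_ul(20, 40))              # 0.5 uL
print(round(template_volume_ul(20, 150), 3))   # 0.133 uL

# 3 min initial denaturation, 35 cycles of 60 s + 30 s + 60 s, 5 min final hold.
print(programmed_hold_minutes(3, 35, (60, 30, 60), 5))  # 95.5 min

# Primer sequences as given in the text, 5' to 3'.
primers = {
    "psbAF": "GTTATGCATGAACGTAATGCTC",
    "trnH2 (modified)": "CGCATGGTGGATTCACAATCC",
}
for name, seq in primers.items():
    print(name, len(seq), "nt")
```

The sub-microliter template volumes suggest that extracts would in practice be diluted before pipetting, although the text does not say how the 20 ng was dispensed.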
