2Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia
Introduction
Whole-genome sequencing is a powerful approach for obtaining reference sequence information for multiple organisms. The whole-genome shotgun (WGS) strategy has been applied to the study of several genomes. This approach generates enough sequence data to cover the genome several times over, and the overlapping sequence reads are then assembled computationally into contigs. Contig assembly is achieved by aligning and orienting reads based on regions of shared identity; redundant reads are then used to produce a consensus sequence (1). However, in large genomes, repetitive sequences constitute a considerable proportion of total genome size, and their presence makes the WGS approach impractical. Effective strategies for eliminating these repetitive DNA sequences can facilitate sequencing analyses of large genomes by limiting redundant sequencing of repetitive elements.
Several approaches have been proposed for enrichment of genomic DNA with single-copy sequences and elimination of repetitive elements. Methylation filtration (MF), development of hypomethylated partial restriction (HMPR) libraries, and methylation-spanning linker libraries (MSLL) have been proposed for the analysis of plant genomes. These methods are based on the tendency of repetitive sequences to be hypermethylated in higher plants (2,3,4). Thus, low-methylated, gene-rich fragments can be separated from most repetitive sequences by digestion with restriction enzymes sensitive to the 5′-methylation of cytosine (2,3) or by constructing genomic libraries in Escherichia coli strains with a McrBC restriction-modification system that prevents propagation of heavily methylated DNA (4). Unfortunately, these methods are clearly only applicable to a limited number of plant genomes that have the appropriate methylation character.
Another approach, high-C0t analysis (where “C0” is DNA concentration at time zero and “t” is re-association time), is based on DNA renaturation kinetics. Because low-copy DNA sequences renature more slowly than repetitive sequences, a gene-enriched ssDNA fraction (high-C0t DNA) can be obtained under suitable conditions (5,6,7). In this technique, genomic DNA is sheared, heat-denatured, and slowly re-annealed. The double-stranded (repetitive) DNA is then separated from the single-stranded (low-copy) DNA by hydroxyapatite (HAP) chromatography. High-C0t analysis can be applied to any genome; however, practitioners generally need a firm understanding of DNA reassociation kinetics and relatively advanced expertise in spectrophotometry.
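The dependence of renaturation on C0t can be illustrated with the textbook ideal second-order model, in which the fraction of a sequence class remaining single-stranded is 1 / (1 + k·C0t). The following is a minimal sketch under that assumption only; the function name and both rate constants are hypothetical values chosen for illustration and are not taken from this study.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t).

    c0t: product of initial nucleotide concentration and time (mol*s/L).
    k:   reassociation rate constant of the sequence class (L/(mol*s)).
    """
    if c0t < 0 or k <= 0:
        raise ValueError("c0t must be >= 0 and k must be > 0")
    return 1.0 / (1.0 + k * c0t)


# Hypothetical rate constants, for illustration only (not measured values).
K_REPETITIVE = 10.0    # fast-renaturing, high-copy class
K_SINGLE_COPY = 0.001  # slow-renaturing, low-copy class

for c0t in (0.1, 1.0, 10.0, 100.0):
    rep = fraction_single_stranded(c0t, K_REPETITIVE)
    low = fraction_single_stranded(c0t, K_SINGLE_COPY)
    print(f"C0t={c0t:>6}: repetitive {rep:.3f} ss, low-copy {low:.3f} ss")
```

In this idealized picture, half of a sequence class has renatured when C0t equals 1/k, so a class with a large k is almost entirely double-stranded at C0t values where a low-copy class is still mostly single-stranded.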
In this study, we investigated the applicability of duplex-specific nuclease (DSN) normalization technology for eukaryotic genomic DNA. DSN normalization is a simple and effective method previously suggested for cDNA normalization before transcriptome sequencing (8,9). The method is based on the properties of a DSN isolated from the Kamchatka crab. DSN is thermostable and specific to dsDNA (10).
Like high-C0t DNA fractionation, DSN normalization is based on hybridization kinetics, but does not involve physical separation of ssDNA and dsDNA fractions. After re-association of denatured DNA, dsDNA comprising repetitive sequences is hydrolyzed by DSN and the ssDNA fraction containing low-copy molecules is amplified by PCR (8,9). DSN normalization has been successfully used for normalization of cDNA from different sources before sequencing of expressed sequence tags (ESTs) using both standard and high-throughput sequencing techniques (11). In addition, DSN normalization has been suggested for use in post-amplification normalization of microbial genomes obtained from individual cells using non–PCR-based, whole-genome amplification (WGA) methods (12).
Materials and methods
DNA preparation
A total of 50 µg human genomic DNA (Clontech, Mountain View, CA, USA) was sheared using a Cole–Parmer CP750 Ultrasonic processor (Vernon Hills, IL, USA; 20 3-s cycles with 10 s between cycles) to a mean length of 500 kb (range 200–750 kb), as determined by agarose gel electrophoresis. The sheared DNA was purified by phenol:chloroform extraction, precipitated with ethanol, and redissolved in 50 µL 20 mM Tris-HCl (pH 8.0). A 1-µL aliquot of sheared DNA solution was treated with T4 DNA polymerase (Fermentas, Burlington, Canada) according to the manufacturer's instructions to create blunt ends, and then purified by phenol:chloroform extraction, precipitated with ethanol, and redissolved in 10 µL 20 mM Tris-HCl (pH 8.0) to a final DNA concentration of 100 ng/µL. The 5′ ends of DNA fragments were phosphorylated by incubating 8 µL blunt-ended DNA sample with 1 µL 10× ligase buffer (Fermentas), riboATP (3 mM final concentration), and 0.7 µL T4 polynucleotide kinase (5 U/µL; Fermentas) at 37°C for 15 min, followed by heat inactivation of the enzyme at 60°C for 20 min. Phosphorylated DNA was ligated to the Not1S adapter (5′-GGT CGC GGC CGA GGT-3′) by incubating 4 µL phosphorylation reaction mixture with 1 µL 10× ligation buffer (Promega, Madison, WI, USA), 2 µL 10 µM adapter, 2 µL Milli-Q water (Millipore, Billerica, MA, USA), and 1 µL T4 DNA ligase (3 U/µL, Promega) at 14°C overnight. The ligation mixture was then diluted 10× in Milli-Q water, and a 1-µL aliquot of the diluted sample was amplified by PCR using an Encyclo PCR kit (Evrogen, Moscow, Russia) and Not1S primer in a total reaction volume of 100 µL, as described by the kit manufacturer. After preheating at 72°C for 2 min, 14 PCR cycles were performed on an MJ Research PTC-200 DNA Thermal Cycler (MJ Research, Inc., Waltham, MA, USA); each cycle consisted of 95°C for 7 s, 62°C for 20 s, and 72°C for 1 min.
PCR products were purified using a PCR purification kit (Qiagen, Valencia, CA, USA), precipitated with ethanol, and diluted in Milli-Q water to a final DNA concentration of 75 ng/µL.
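The concentrations and volumes quoted in this subsection can be cross-checked with simple arithmetic. The sketch below assumes complete DNA recovery through each purification step, which the text does not state; the helper name is hypothetical.

```python
def concentration(amount: float, volume_ul: float) -> float:
    """Return amount per microliter (units follow the 'amount' argument)."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount / volume_ul


# 50 µg (50,000 ng) sheared DNA redissolved in 50 µL buffer.
stock = concentration(50_000, 50)            # ng/µL

# A 1-µL aliquot, blunt-ended and redissolved in 10 µL
# (assumes no loss during extraction and precipitation).
blunt = concentration(stock * 1.0, 10)       # ng/µL

# Ligation mix: 4 µL DNA + 1 µL 10x buffer + 2 µL adapter
# + 2 µL water + 1 µL ligase.
ligation_total = 4 + 1 + 2 + 2 + 1           # µL
adapter_final_um = 10 * 2 / ligation_total   # µM, from a 10 µM stock

print(f"sheared stock: {stock:.0f} ng/µL")
print(f"blunt-ended DNA: {blunt:.0f} ng/µL")
print(f"ligation volume: {ligation_total} µL, adapter {adapter_final_um:.1f} µM")
```

Under the full-recovery assumption, the 1-µL aliquot carries 1 µg DNA, which matches the stated 100 ng/µL after redissolving in 10 µL, and the 10-µL ligation volume brings the 10× ligation buffer to 1×.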
DNA normalization
A 2-µL aliquot of DNA solution containing ~250 ng DNA was mixed with 1 µL 4× hybridization buffer (200 mM HEPES pH 7.5, 2 M NaCl, 0.8 mM EDTA), and 1 µL human Cot-1 DNA (1 µg/µL; Invitrogen, Carlsbad, CA, USA). For normalizations performed without the Cot-1 fraction, 1 µL Milli-Q water was added in place of the Cot-1 fraction. Reaction mixtures were overlaid with mineral oil, denatured at 98°C for 3 min, and allowed to renature at 68°C for 5 h (C0t ~50). After incubation, 5 µL 2× DSN buffer (100 mM Tris-HCl pH 8.0, 10 mM MgCl2, 2 mM DTT), preheated to 70°C, was added to the reaction mixture. Next, 1 µL DSN solution (1 U/µL) was added to the reaction, and incubation was continued for 20 min at 65°C. DSN was then inactivated by addition of 10 µL 5 mM EDTA, and the reaction was diluted with 20 µL Milli-Q water.
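The working concentrations implied by these volumes can be sketched as follows. This is a rough check that assumes the volumes are simply additive and that no evaporation occurs under the mineral oil; the helper name is hypothetical.

```python
def final_conc(stock_conc: float, stock_vol_ul: float, total_vol_ul: float) -> float:
    """Concentration of a component after dilution into total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul


# Hybridization: 2 µL DNA + 1 µL 4x buffer + 1 µL Cot-1 DNA (or water).
hyb_total = 2 + 1 + 1  # µL
hyb_buffer_4x_mm = {"HEPES": 200.0, "NaCl": 2000.0, "EDTA": 0.8}
hyb_mm = {name: final_conc(c, 1, hyb_total) for name, c in hyb_buffer_4x_mm.items()}

# DSN digestion: hybridization mix + 5 µL 2x DSN buffer + 1 µL DSN.
dsn_total = hyb_total + 5 + 1  # µL
dsn_buffer_2x_mm = {"Tris-HCl": 100.0, "MgCl2": 10.0, "DTT": 2.0}
dsn_mm = {name: final_conc(c, 5, dsn_total) for name, c in dsn_buffer_2x_mm.items()}

# Volume after adding 10 µL EDTA and 20 µL water.
final_total = dsn_total + 10 + 20  # µL

print("hybridization (mM):", hyb_mm)
print("DSN digestion (mM):", dsn_mm)
print("final volume (µL):", final_total)
```

With these volumes, both the hybridization buffer and the DSN buffer end up at 1× working strength (for example, 0.5 M NaCl during renaturation and 5 mM MgCl2 during digestion), and the stopped, diluted reaction has a volume of 40 µL.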
Amplification of normalized DNA before sequencing
The normalized ssDNA fraction remaining after DSN treatment (5-µL aliquots) was amplified in five reaction tubes by PCR using an Encyclo PCR kit and Not1S primers in a total reaction volume of 50 µL. Thermocycling conditions used were 18 cycles of 95°C for 7 s, 62°C for 20 s, and 72°C for 1 min. PCR products were purified using a PCR purification kit (Qiagen), precipitated with ethanol, and diluted in Milli-Q water to a final DNA concentration of 50 ng/µL.
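For orientation, the cycling program above can be summarized numerically. The sketch below computes the total programmed hold time and the theoretical upper bound on amplification, assuming perfect doubling in every cycle and ignoring ramp times; real reactions fall short of this bound. Function names are hypothetical.

```python
def program_hold_seconds(cycles: int, step_seconds: tuple[int, ...]) -> int:
    """Total programmed hold time, excluding temperature ramping."""
    return cycles * sum(step_seconds)


def max_fold_amplification(cycles: int) -> int:
    """Upper bound on amplification, assuming ideal doubling per cycle."""
    return 2 ** cycles


# 18 cycles of 95°C for 7 s, 62°C for 20 s, 72°C for 60 s.
steps = (7, 20, 60)
hold = program_hold_seconds(18, steps)
fold = max_fold_amplification(18)

print(f"hold time: {hold} s ({hold / 60:.1f} min)")
print(f"ideal maximum amplification: {fold}-fold")
```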
DNA library construction for conventional and automated sequencing
For DNA plasmid library construction, freshly amplified normalized and non-normalized DNA (~200 ng each) was cloned into the pGEM-T-Easy vector (Promega) as described by the manufacturer, and transformed into E. coli (XL-1 Blue strain) using electroporation with a Bio-Rad Micropulser (Hercules, CA, USA). From each library, 300 randomly selected white clones were used for plasmid purification and conventional sequencing.
For automated sequencing, ~3-µg aliquots of each DNA population were used. DNA fragmentation and adapter ligation were performed as described (13), and fragments were sequenced at the Centre “Bioengineering” RAS using a 454 GS FLX Standard series automated sequencer (Roche, Basel, Switzerland). Sequences were analyzed using RepeatMasker software (www.repeatmasker.org/cgi-bin/WEBRepeatMasker).
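One simple way to summarize repeat content from masked output is to count masked bases. The sketch below is an illustration only and is not the analysis performed in this study: it assumes a masked FASTA in which repeats appear as N (hard masking) or as lowercase letters (soft masking), and the function name and example reads are hypothetical. Any N already present in the raw reads would be counted as masked.

```python
def masked_fraction(fasta_text: str) -> float:
    """Fraction of bases that are masked (N or lowercase) in FASTA text."""
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += sum(1 for base in line if base in "Nn" or base.islower())
    if total == 0:
        raise ValueError("no sequence data found")
    return masked / total


# Made-up reads: one hard-masked, one soft-masked.
example = ">read1\nACGTNNNNACGT\n>read2\nacgtACGT\n"
print(f"masked fraction: {masked_fraction(example):.2f}")
```

Comparing this fraction between the normalized and non-normalized read sets would give a crude estimate of how far repeat content was reduced.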
PCR with gene-specific primers
Non-normalized and normalized DNA were diluted in PCR mixture to a final concentration of 15–20 pg/µL and amplified using gene-specific primer pairs (Supplementary Table S2) and an Encyclo PCR kit, according to the manufacturer's instructions. Depending on the gene being amplified, 28–34 PCR cycles were performed on an MJ Research PTC-200 DNA Thermal Cycler; each cycle consisted of 95°C for 7 s, 63°C for 20 s, and 72°C for 20 s.
Results and discussion
The normalization procedure is illustrated in Figure 1. Following DNA denaturation and re-association, the double-stranded fraction of repetitive sequences is degraded using DSN, and the ssDNA fraction enriched for low-copy sequences is amplified by PCR. Because DSN normalization includes a PCR step to amplify the ssDNA fraction, a specific sequence is introduced into the ends of sheared DNA by ligation of double-stranded adapters.
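The equalizing effect of this procedure can be illustrated with a toy model. The sketch assumes ideal second-order kinetics in which each sequence class renatures only with itself, complete DSN removal of all duplexes, and unbiased PCR; none of these simplifications is claimed by the study, and the abundances and the lumped rate parameter are hypothetical.

```python
def single_stranded_after(c0: float, kt: float) -> float:
    """Remaining ssDNA for a class with initial abundance c0.

    Ideal second-order decay: c(t) = c0 / (1 + k*t*c0),
    with kt the lumped product of rate constant and time.
    """
    return c0 / (1.0 + kt * c0)


# Hypothetical relative abundances (arbitrary units).
initial = {"repeat": 1000.0, "low_copy": 1.0}
KT = 0.1  # hypothetical lumped rate-time parameter

# ssDNA surviving DSN treatment under the model assumptions.
remaining = {name: single_stranded_after(c0, KT) for name, c0 in initial.items()}

ratio_before = initial["repeat"] / initial["low_copy"]
ratio_after = remaining["repeat"] / remaining["low_copy"]

print(f"repeat:low-copy ratio before: {ratio_before:.0f}")
print(f"repeat:low-copy ratio after:  {ratio_after:.1f}")
```

In this model, abundant classes lose a far larger share of their single-stranded molecules than rare ones, so the surviving ssDNA pool that serves as PCR template is much more even than the starting DNA.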