2Babraham Institute, Cambridge, UK
We have developed a sequencing method on the Pacific Biosciences RS sequencer (the PacBio) for small DNA molecules that avoids the need for a standard library preparation. To date this approach has been applied to sequencing single-stranded and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA-modification analysis, and linear DNA fragments covering an entire bacterial genome. Using direct sequencing it is possible to generate sequence data from as little as 1 ng of DNA, offering a significant advantage over current protocols, which typically require 400–500 ng of sheared DNA for the library preparation.
Pacific Biosciences (Menlo Park, CA, USA) have developed a platform that will sequence a single molecule of DNA in real-time via the polymerization of that strand with a single polymerase (1-6). This technique has many benefits over multi-molecule (clonal) sequencing technologies (7, 8); one such potential advantage is that it may not be absolutely necessary to make a library (i.e., create SMRT bells (9)) to generate sequence data. The only molecular input requirement for sequencing is a primed piece of DNA; both single-stranded and double-stranded molecules will work. The highly processive polymerase starts from a location on the DNA at which it can bind, i.e., a free 3′-OH group. We decided to test whether any primed DNA molecules, lacking any other features of a PacBio SMRT bell, could be used directly in a sequencing reaction. The bound complex (DNA-primer-polymerase), although lacking PacBio adapter sequences, can still be sequenced on the PacBio platform. The present efficiency of this process, in terms of the numbers of reads generated and Mb yield per SMRT cell, is considerably less than that using standard libraries. With standard methods a typical SMRT cell will yield 35,000–50,000 reads and 100–160 Mb of mapped bases. The direct sequencing method described here has generated up to 3,000 reads per SMRT cell and therefore its utility is limited to small genomes. However, this approach enables one to acquire sequence data from comparatively low amounts of DNA, even less than 1 ng of input, and within eight hours from receiving the sample. The time saving over the 12 h required for standard library preparation is slight and is not the main advantage, though it does offer a route from sample to sequence within an average working day. This protocol may be of benefit to the direct sequencing of plasmids, single-stranded or double-stranded viruses, mitochondrial DNA, and microbial pathogens in a clinical setting.
Materials and methods
M13mp18 viral DNA (both single-stranded and double-stranded; catalog no. N4040S and N4018S, respectively) and M13 forward (5′-GTTTTCCCAGTCACGAC-3′) and reverse sequencing primers (5′-AACAGCTATGACCATG-3′) were from New England Biolabs (Hitchin, UK). Methicillin-resistant Staphylococcus aureus (MRSA) plasmids were purified from a solution prep of S. aureus TW20 using a Qiagen (Crawley, UK) Plasmid Midi Kit with Qiagen Genomic-tip 100/G following the manufacturer's “very low-copy plasmid/cosmid purification protocol” from a 500 mL culture. Plasmid Safe DNase (Epicentre Biotechnologies, Madison, WI, USA) was used to reduce the amount of linear single- and double-stranded molecules from the TW20 plasmid prep. Random hexamer primers from Roche (Welwyn Garden City, UK) were used, as provided in the Transcriptor First Strand cDNA Synthesis Kit. pET28a plasmid vectors encoding EcoDamI methyltransferase (Dam constructs) expressed in dam-/dcm Escherichia coli cells were prepared in-house. Components from the DNA/Polymerase Binding Kit 2.0 from Pacific Biosciences were used during the annealing and binding reactions. The Annealing and Binding Calculator (version 1.3.1) provided by Pacific Biosciences was used to calculate the concentration of bound complex to be loaded onto the sample plate for the instrument. An MJ PTC-225 thermocycler from MJ Research (Watertown, MA, USA) was used for the annealing and binding reactions. The PacBio DNA Sequencing Kit 2.0 (8Rxn) and SMRT Cell 8Pac v2 (8 Cells) were used for sequencing. Sequence analysis was performed with SMRT Portal, SMRT Pipe, and SMRT View, version 1.3.1, and Motif Finder, version 0807, all from Pacific Biosciences.
Standard library preparation was omitted; the DNA templates were used directly in the annealing reaction. For each experiment, a quantity of DNA between 1 ng and 100 ng was annealed with suitable primers. With ssDNA, the annealing reaction used the standard PacBio protocol; i.e., 2 min at 80°C followed by cooling at 0.1°C/s to 25°C. With dsDNA, a different annealing protocol was used; the reaction was heated to 95°C for 5 min, then immediately snap-cooled on wet ice. As an example, when using ds M13mp18 DNA, 2.2 µL of DNA at 46 ng/µL (∼100 ng), 0.9 µL PacBio Primer Buffer (10×), 0.9 µL forward primer (10 µM), and 0.9 µL reverse primer (10 µM) were mixed in a final annealing reaction volume of 9 µL. The final concentration of DNA template was therefore ∼2.5 nM, with each primer at 1000 nM (∼400×). In order to use the PacBio Annealing and Binding Calculator, we assumed that denatured M13mp18 DNA is comparable to a SMRT bell, with half the original double-stranded M13mp18 molecule's nominal length; i.e., one double-stranded 7.2-kb molecule, when denatured, becomes two 3.6-kb SMRT bells. A 2-fold dilution series of DNA was used to create additional annealing reactions in the range of 0.8–100 ng of DNA. There was a very large excess of forward and reverse primers at the lower concentrations of DNA in these reactions.
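The concentrations quoted for the annealing reaction can be checked with a short calculation. The Python sketch below is illustrative only. It assumes an average mass of 650 g/mol per base pair of dsDNA and a template length of 7,200 bp (rounded from the 7.2 kb given in the text); neither value is stated in the protocol, and the function names are hypothetical. Under these assumptions, 100 ng of input corresponds to roughly 2.4 nM of double-stranded template and an approximately 420-fold excess of each primer, in line with the rounded figures in the text.

```python
# Illustrative check of the annealing-reaction arithmetic.
# Assumptions (not from the source): ~650 g/mol per bp of dsDNA and a
# 7,200-bp template, rounded from the "7.2 kb" quoted in the text.
AVG_BP_MASS = 650.0        # g/mol per base pair (common approximation)
TEMPLATE_BP = 7200         # bp
ANNEAL_VOLUME_UL = 9.0     # final annealing volume from the text
PRIMER_STOCK_UM = 10.0     # stock concentration of each primer
PRIMER_VOLUME_UL = 0.9     # volume of each primer added


def dsdna_fmol(mass_ng, length_bp=TEMPLATE_BP):
    """Convert a mass of dsDNA (ng) to femtomoles of double-stranded molecules."""
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def annealing_summary(mass_ng):
    """Return (template nM, primer nM, primer:template molar excess)."""
    template_nm = dsdna_fmol(mass_ng) / ANNEAL_VOLUME_UL  # fmol/uL equals nM
    primer_nm = PRIMER_STOCK_UM * 1000.0 * PRIMER_VOLUME_UL / ANNEAL_VOLUME_UL
    return template_nm, primer_nm, primer_nm / template_nm


def dilution_series(start_ng=100.0, steps=8):
    """Two-fold dilution series of input mass, starting at start_ng."""
    return [start_ng / 2 ** i for i in range(steps)]


if __name__ == "__main__":
    for mass in dilution_series():
        template_nm, primer_nm, excess = annealing_summary(mass)
        print(f"{mass:8.3f} ng  template {template_nm:6.3f} nM  "
              f"primer {primer_nm:6.0f} nM  excess {excess:8.0f}x")
```

Because the primer amount is fixed in this sketch, the molar excess doubles at each step of the series, which illustrates why the primer excess becomes very large at the lowest inputs.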
Binding reaction, loading, and sequencing
In the binding reaction, the ratio of polymerase to template DNA used was 3:1. First, 1.5 µL of polymerase (1600 nM) was combined with 25 µL of binding buffer, giving a 90 nM polymerase solution. Four µL of a 1:1:1 DTT:dNTP:binding buffer mix (each from the PacBio Binding Kit) was added to the annealed template DNA, followed by 1.5 µL of the 90 nM polymerase. This was mixed gently by pipetting and then incubated at 30°C for 4 h.
The bound complex was loaded at 1 nM onto the instrument. Typically this is achieved by diluting the bound complexes with a mixture of 1:10 DTT:Complex Dilution Buffer. In this experiment, however, it was only possible to achieve a 1 nM loading concentration for the samples containing 100 ng and 50 ng input DNA. For the other samples in the 2-fold dilution series, the calculated concentration was <1 nM before dilution. The total volume of 14.6 µL of binding reaction was therefore loaded directly into the sample plate wells for each of these dilute samples.
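The binding and loading figures can be worked through in the same way. The Python sketch below is a rough check rather than a reproduction of the PacBio Annealing and Binding Calculator. It assumes 650 g/mol per base pair, a 7,200-bp template, that each denatured strand counts as one template (as described for the calculator input), and that every template strand ends up in a bound complex; the function names are hypothetical. Under these assumptions the polymerase working solution is about 90.6 nM, the polymerase-to-template ratio for 100 ng of input is about 3.2:1, and only the 100 ng and 50 ng samples reach 1 nM in the 14.6 µL binding reaction.

```python
# Illustrative check of the binding-reaction and loading arithmetic.
# Assumptions (not from the source): ~650 g/mol per bp, a 7,200-bp template,
# two template strands per denatured molecule, and complete complex formation.
AVG_BP_MASS = 650.0          # g/mol per base pair (common approximation)
TEMPLATE_BP = 7200           # bp, rounded from "7.2 kb"
STRANDS_PER_MOLECULE = 2     # each denatured strand treated as one template

POL_STOCK_NM = 1600.0        # polymerase stock concentration
POL_STOCK_UL = 1.5           # stock volume diluted into binding buffer
BINDING_BUFFER_UL = 25.0     # binding buffer used for the dilution
POL_ADDED_UL = 1.5           # diluted polymerase added to the reaction
BINDING_VOLUME_UL = 14.6     # total binding reaction volume from the text
TARGET_LOADING_NM = 1.0      # loading concentration from the text


def strand_fmol(mass_ng):
    """Femtomoles of template strands in mass_ng of denatured dsDNA."""
    return STRANDS_PER_MOLECULE * mass_ng * 1e6 / (TEMPLATE_BP * AVG_BP_MASS)


def polymerase_working_nm():
    """Concentration after diluting the stock into binding buffer (C1V1 = C2V2)."""
    return POL_STOCK_NM * POL_STOCK_UL / (POL_STOCK_UL + BINDING_BUFFER_UL)


def polymerase_to_template_ratio(mass_ng):
    """Molar ratio of polymerase added to template strands present."""
    return polymerase_working_nm() * POL_ADDED_UL / strand_fmol(mass_ng)


def complex_nm(mass_ng):
    """Upper-bound complex concentration (nM) in the undiluted binding reaction."""
    return strand_fmol(mass_ng) / BINDING_VOLUME_UL  # fmol/uL equals nM


def reaches_target(mass_ng):
    """True if the undiluted reaction is at or above the loading target."""
    return complex_nm(mass_ng) >= TARGET_LOADING_NM


def dilution_series(start_ng=100.0, steps=8):
    """Two-fold dilution series of input mass, starting at start_ng."""
    return [start_ng / 2 ** i for i in range(steps)]


if __name__ == "__main__":
    print(f"polymerase working solution: {polymerase_working_nm():.1f} nM")
    print(f"polymerase:template at 100 ng: {polymerase_to_template_ratio(100.0):.1f}:1")
    for mass in dilution_series():
        action = "dilute to 1 nM" if reaches_target(mass) else "load undiluted"
        print(f"{mass:8.3f} ng  {complex_nm(mass):6.3f} nM  {action}")
```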
Two 45-min sequencing movies were acquired for each sample in this study. Mapping, de novo assembly, and modification analysis were carried out with PacBio's SMRT Analysis pipeline run via the SMRT Portal interface. PacBio's Motif Finder was used in the final step of analysis for the pET28a plasmid vector to characterize the sequence-specific motif at which base modifications were observed.
Results and discussion
At the outset of this study, an experiment using single-stranded M13mp18 viral DNA and the M13 forward sequencing primer (5′-GTTTTCCCAGTCACGAC-3′) showed that it was possible to generate sequence data directly from circular DNA molecules without library preparation; i.e., fragmentation, end repair, and adapter ligation. From 25 ng of ssDNA and 100-fold molar excess of primer, it was possible to map the data generated against the 7.2 kb M13mp18 reference sequence, calling 100% of the bases with 100% consensus accuracy. We next attempted to sequence double-stranded circular molecules of M13mp18 using both forward and reverse primers to obtain information from both strands in a single run. The sequencing of dsDNA molecules should have much wider application; for example, plasmids, phages, and ultimately larger genomes. Proving the ability to generate sequence data for both strands was therefore an important step in the development of this technique, especially considering the future application of PacBio for epigenetics (including hemi-methylation patterning); the ultimate goal was to sequence fragmented linear dsDNA, e.g., any sheared genomic sample, and generate enough useful data for future applications. We denatured the double-stranded DNA at 95°C for 5 min and snap-cooled (see Materials and Methods) in the presence of excess primer to successfully prime the two strands. Snap-cooling and the large primer concentration were used to give maximum opportunity for priming each strand while minimizing re-annealing of the genomic DNA. Alternative annealing conditions were tested as well: (i) following the standard PacBio recommended protocol of slowly cooling from 80°C to 25°C, (ii) snap-cooling from 95°C then raising the temperature to 45°C for 2 min, and (iii) cooling as quickly as possible on a thermocycler from 95°C to 45°C. In each of these three cases, far fewer reads were generated in the sequencing run. Snap-cooling on ice from 95°C was used subsequently for each dsDNA sample.
Figure 1 shows the difference in coverage profile of the M13mp18 genome when sequenced as ssDNA with the M13 forward sequencing primer, and the dsDNA sequenced with both the M13 forward and reverse sequencing primers (Figure 1, middle panel). The uneven coverage profile arises because the mapped reads on each strand have a distribution of read lengths, while most of the reads start from approximately the same position on the genome. With PacBio sequencing at present there is a “dark-time” of several minutes prior to the start of movie acquisition, which is the time it takes from initiation of the sequencing reaction to alignment of the SMRT cell in the correct position. Although the priming sequence for a given strand is in the same position on each molecule, variation in the polymerization speed and longevity accounts for some of the observed distribution. Additionally, the SMRT Pipe software might have difficulty in mapping some reads, especially those that extend beyond the end of the linear FASTA reference. The DNA molecules sequenced were circular, but the reference used is a single linear sequence. Therefore, a number of the reads generated in these runs will, in fact, extend beyond the artificially imposed boundaries of the reference file. Some of the longest reads will also span the entire circular genome, further complicating the automatic analysis. The SMRT Analysis software is not designed to deal with reads of this nature, although the initial filtering of the data is unaffected, as it is based on read-quality metrics only. None of the reads have PacBio adapters that signal the end of a DNA template fragment, so the standard re-sequencing (mapping) protocol in SMRT Portal possibly contributes to the uneven coverage profiles generated (mapping thresholds were a maximum of one hit per read, 30% maximum divergence, and minimum anchor size of 12).
Some reads were longer than the entire genome, as evidenced by the maximum read lengths in the SMRT Portal raw read-length histogram (i.e., any reads >7.2 kb). These very long reads could be observed using PacBio's SMRT View software by concatenating two M13mp18 references in tandem into a single FASTA file (Figure 2).
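The tandem reference described above is straightforward to construct. The Python sketch below shows one possible way to do it and is not the procedure used in this study; the function names, the header suffix, and the 60-character line width are all illustrative choices. It takes the text of a single-record FASTA file and returns a new record in which the sequence is repeated head to tail, so that reads running past the end of the linear reference can continue into the second copy.

```python
# Illustrative construction of a tandem (head-to-tail) reference from a
# single-record FASTA. Names, header suffix and line width are arbitrary choices.

def read_single_fasta(text):
    """Parse FASTA text holding exactly one record; return (name, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected a single FASTA record")
    fields = lines[0][1:].split()
    name = fields[0] if fields else "reference"  # keep only the first header token
    return name, "".join(lines[1:])


def tandem_reference(text, copies=2, width=60):
    """Return FASTA text with the sequence repeated `copies` times in tandem."""
    if copies < 1 or width < 1:
        raise ValueError("copies and width must be positive")
    name, seq = read_single_fasta(text)
    repeated = seq * copies
    body = "\n".join(repeated[i:i + width] for i in range(0, len(repeated), width))
    return f">{name}_x{copies}_tandem\n{body}\n"


if __name__ == "__main__":
    # Toy sequence standing in for the real reference file.
    toy = ">toy_circle example\nACGTACGTAC\nGGTT\n"
    print(tandem_reference(toy, copies=2, width=10), end="")
```

A read that starts near the end of the first copy and continues around the circle then aligns contiguously across the junction between the two copies. Positions in the second copy map back to the original coordinates by subtracting the length of one unit.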