Of the seven RNA-seq library preparation methods compared in a recent study (15), most involve some form of mRNA fragmentation before adapter attachment. Of the two that do not, one uses a hexamer priming method (16). The other, the SMARTer Ultra Low RNA Kit (Clontech, Mountain View, CA) (17), synthesizes full-length cDNA with fixed 3′ and 5′ sequences added, so that the entire cDNA library (average 2 kb in length) can be amplified by long-distance PCR (LD-PCR). This amplified double-stranded cDNA is then fragmented to the appropriate size by acoustic shearing and used in a standard Illumina library preparation: end repair and kination (5′ phosphorylation), A-tailing and adapter ligation, followed by further amplification by PCR.
A second, post-library-construction sizing step is commonly used to refine library size and remove adapter dimers and other library preparation artifacts. Adapter dimers result from adapters ligating to each other with no library insert between them. These dimers form clusters very efficiently and consume valuable space on the flow cell without generating any useful data. We therefore typically either clean up libraries with magnetic beads or purify the products on agarose gels. Bead-based cleanup works in most instances when sufficient starting material is available. When sample input is limiting, however, more adapter dimer products are often generated; in our experience, bead-based methods may not perform optimally in this situation, and combining bead-based and agarose gel purifications may be necessary.
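Because an adapter dimer has no insert, a read generated from it starts immediately with adapter read-through sequence. The sketch below uses that property to estimate what fraction of a run was spent on dimers. It is illustrative only: the 13-base prefix `AGATCGGAAGAGC` is assumed here as the prefix shared by common Illumina TruSeq-style adapters (substitute the adapter from your own kit), the one-mismatch tolerance is an arbitrary choice, and all function names are hypothetical.

```python
from typing import Iterable, Iterator

# Assumed adapter read-through prefix (common to TruSeq-style adapters);
# replace with the sequence of the adapter actually used.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (second) line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def is_adapter_dimer(seq: str, adapter: str = ADAPTER_PREFIX,
                     max_mismatches: int = 1) -> bool:
    """True if the read begins directly with adapter sequence (no insert)."""
    if len(seq) < len(adapter):
        return False
    return hamming(seq[:len(adapter)].upper(), adapter) <= max_mismatches


def adapter_dimer_fraction(lines: Iterable[str]) -> float:
    """Fraction of reads in a FASTQ stream that look like adapter dimers."""
    seqs = list(fastq_sequences(lines))
    if not seqs:
        return 0.0
    return sum(is_adapter_dimer(s) for s in seqs) / len(seqs)


# Usage on a tiny in-memory FASTQ: one dimer read, one read with an insert.
example = [
    "@read1", "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC", "+", "I" * 34,
    "@read2", "GATTACAGATTACAGATTACAGATTACAGATTAC", "+", "I" * 34,
]
print(adapter_dimer_fraction(example))  # 0.5
```

A high fraction from a test run is a signal that an additional gel-based or combined bead/gel cleanup may be worthwhile before deeper sequencing.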
In the case of microRNA (miRNA)/small RNA library preparation, the desired product is only 20–30 bases larger than the 120 bp adapter dimers. It is therefore critical to perform gel size selection to enrich the libraries as much as possible for the desired product; beads cannot resolve fragments this close in size. At the other extreme, for de novo assembly of bacterial genomes we often create large library inserts (1 kb) combined with longer reads (2 × 300 base paired-end) and no PCR amplification. To get the most value from these data for de novo assembly, careful gel-based size selection is necessary to ensure uniform insert size.
NGS library construction using fragmented/size-selected DNA
There are several important considerations when preparing libraries from DNA samples, including the amount of starting material and whether the application is resequencing (in which a reference sequence is available for aligning reads) or de novo sequencing (in which the reads must be assembled to create a new reference sequence). Library preparation can be susceptible to bias when genomes have unusually high or low GC content. Approaches developed to address this include careful selection of PCR polymerases, thermocycling conditions, and buffers (18–21).
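Before choosing a polymerase or cycling protocol, it can help to check how much of a reference genome falls at GC extremes. The sketch below computes GC fraction in non-overlapping windows and flags windows outside a chosen range. The 100 bp window and the 0.3/0.7 cutoffs are hypothetical defaults for illustration, not values taken from the studies cited above, and the function names are illustrative.

```python
from typing import List, Optional, Tuple


def gc_fraction(seq: str) -> Optional[float]:
    """GC fraction over unambiguous bases (A/C/G/T); None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None  # e.g. a window made entirely of N
    return gc / (gc + at)


def gc_windows(seq: str, window: int = 100) -> List[Tuple[int, Optional[float]]]:
    """(start, GC fraction) for consecutive full-length windows; a trailing
    partial window is ignored."""
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, window)]


def flag_extreme_gc(seq: str, window: int = 100,
                    low: float = 0.3, high: float = 0.7) -> List[Tuple[int, float]]:
    """Windows whose GC fraction falls outside [low, high] (hypothetical cutoffs)."""
    return [(start, gc) for start, gc in gc_windows(seq, window)
            if gc is not None and (gc < low or gc > high)]


# Usage: an AT-rich block, a GC-rich block, and a balanced block.
toy = "AT" * 50 + "GC" * 50 + "ATGC" * 25
print(flag_extreme_gc(toy))  # [(0, 0.0), (100, 1.0)]
```

If a large share of windows is flagged, the GC-robust polymerases and cycling conditions described above are more likely to matter for that genome.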
Library preparation from DNA samples for sequencing whole genomes, targeted regions within genomes (for example, exome sequencing), ChIP-seq experiments, or PCR amplicons (see below) follows the same general workflow. Ultimately, for any application, the goal is to make the libraries as complex as possible, that is, to capture as many unique starting molecules as possible (see below).
Numerous kits for making sequencing libraries from DNA are commercially available from a variety of vendors, and competition has steadily driven prices down and quality up. Kits are available for starting quantities ranging from micrograms down to picograms. Keep in mind, however, the general principle that more starting material requires less amplification and thus yields better library complexity.
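Library complexity can be estimated after a shallow test run from the duplicate rate. The sketch below assumes the Lander-Waterman relationship C/X = 1 - exp(-N/X), where N is the number of read pairs sequenced, C the number of distinct pairs, and X the number of unique molecules in the library. Tools such as Picard's EstimateLibraryComplexity solve this same equation; what follows is a minimal independent bisection, not Picard's code, and the function name is hypothetical.

```python
import math


def estimate_library_size(total_pairs: int, unique_pairs: int) -> float:
    """Estimate the number of unique molecules X in a library.

    Solves C / X = 1 - exp(-N / X) by bisection, with N = total_pairs
    (read pairs sequenced) and C = unique_pairs (distinct pairs observed).
    """
    if not 0 < unique_pairs < total_pairs:
        raise ValueError("need 0 < unique_pairs < total_pairs")
    n, c = float(total_pairs), float(unique_pairs)

    def f(x: float) -> float:
        # Positive when x is below the solution, negative above it.
        return c / x - 1.0 + math.exp(-n / x)

    lo, hi = 1.0, 100.0  # search bounds, expressed as multiples of c
    while f(hi * c) >= 0:
        hi *= 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid * c) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 * c


# Usage: 1,000,000 pairs sequenced, 786,939 distinct (about 21% duplicates).
size = estimate_library_size(1_000_000, 786_939)
print(round(size))  # roughly 2,000,000 unique molecules
```

With the same sequencing depth, a library made with more PCR cycles from less input would show more duplicates and therefore a smaller estimated X, which is the complexity loss described above.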
With the exception of Illumina's Nextera prep, which uses a transposase to fragment the DNA and insert adapter sequences in a single step, library preparation generally entails: (i) fragmentation, (ii) end repair, (iii) phosphorylation of the 5′ ends, (iv) A-tailing of the 3′ ends to facilitate ligation to sequencing adapters, (v) adapter ligation, and (vi) a number of PCR cycles to enrich for products that have adapters ligated to both ends (1) (Figure 1). The primary difference in the Ion Torrent workflow is the use of blunt-end ligation to different adapter sequences.
Once the starting DNA has been fragmented, the fragment ends are blunted and 5′ phosphorylated using a mixture of three enzymes: T4 polynucleotide kinase, T4 DNA polymerase, and Klenow Large Fragment. Next, the 3′ ends are A-tailed using either Taq polymerase or Klenow Fragment (exo-). Taq is more efficient at A-tailing, but Klenow (exo-) can be used for applications where heating is not desired, such as preparing mate-pair libraries. In the adapter ligation reaction, the optimal adapter:fragment ratio is ~10:1, calculated on the basis of copy number (molarity) rather than mass. Excess adapter favors the formation of adapter dimers, which can be difficult to remove and can dominate the subsequent PCR amplification. Bead- or column-based cleanups can be performed after the end-repair and A-tailing reactions, but after ligation we find bead-based cleanups more effective at removing excess adapter dimers.
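Because the ~10:1 ratio is molar, the amount of adapter depends on both the mass and the length of the fragmented DNA. The sketch below converts ng of double-stranded DNA to pmol using the common approximation of 660 g/mol per base pair, then scales by the target ratio. The function names and the 15 µM adapter stock in the usage lines are hypothetical; use the concentration stated for your adapter.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
AVG_MASS_PER_BP = 660.0


def dsdna_pmol(ng: float, length_bp: float) -> float:
    """Convert a mass of dsDNA in ng to picomoles of molecules."""
    # ng * 1e-9 g/ng / (length_bp * 660 g/mol) = mol; * 1e12 pmol/mol
    return ng * 1000.0 / (length_bp * AVG_MASS_PER_BP)


def adapter_pmol_needed(insert_ng: float, mean_fragment_bp: float,
                        ratio: float = 10.0) -> float:
    """Picomoles of adapter for a given adapter:fragment molar ratio."""
    return ratio * dsdna_pmol(insert_ng, mean_fragment_bp)


def adapter_volume_ul(insert_ng: float, mean_fragment_bp: float,
                      adapter_stock_um: float, ratio: float = 10.0) -> float:
    """Volume (uL) of adapter stock to add; 1 uM equals 1 pmol/uL."""
    return adapter_pmol_needed(insert_ng, mean_fragment_bp, ratio) / adapter_stock_um


# Usage: 100 ng of ~300 bp fragments, hypothetical 15 uM adapter stock.
print(round(dsdna_pmol(100, 300), 3))             # 0.505
print(round(adapter_pmol_needed(100, 300), 2))    # 5.05
print(round(adapter_volume_ul(100, 300, 15), 2))  # 0.34
```

Since the required adapter scales with fragment molarity, a fixed adapter volume suited to a microgram-scale prep would be a large molar excess at nanogram inputs, favoring the adapter dimers described above.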