2Roche Diagnostics, Roche Applied Science, 9115 Hague Road, Indianapolis, IN, 46250, USA
The Genome Sequencer FLX System from Roche and 454 Life Sciences™ is a versatile sequencing platform suitable for a wide range of applications, including de novo sequencing and assembly of genomic DNA, transcriptome sequencing, metagenomics analysis, and amplicon sequencing. The Genome Sequencer FLX enables long sequence reads separated by kilobase distances of genomic DNA. These Long-Tag Paired End reads enable improved de novo assemblies and genomic structural variation studies.
454 Life Sciences has developed and commercially released a new protocol for generating a library of paired-end fragments to determine the orientation and relative positions of contigs produced by de novo shotgun sequencing and assembly. This 3K Long-Tag Paired End protocol (Figure 1) can also be used to identify genomic structural variations (1) and their associated breakpoints. Structural variation of the genome, involving large, kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements, is widespread in humans and is presumably responsible for a considerable amount of phenotypic variation. Each DNA fragment in a 3K Long-Tag Paired End library is approximately 250 bp long and consists of a 44-mer adaptor sequence in the middle, flanked on each side by a sequence of 100 bases on average. The two flanking 100-bp sequences are segments of DNA that were originally located approximately 3 kb apart in the genome of interest. In addition to the 3-kb paired-end protocol, initial results are presented from an unreleased protocol that generates flanking reads separated by 16 kb (Figure 2). The 16-kb protocol uses a different chemistry from the 3-kb protocol described here.
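The fragment layout described above (flanking tag, 44-mer linker, flanking tag) suggests how a read could be split back into its two tags. The following is a minimal sketch, not the Genome Sequencer software's actual method: the function name `split_at_linker` is hypothetical, the linker is a placeholder 44-mer because the real linker sequence is not given here, and only an exact match is searched for, whereas real reads would need error-tolerant matching.

```python
from typing import Optional, Tuple

# Placeholder 44-mer standing in for the linker; NOT the real linker sequence.
LINKER = "ACGT" * 11


def split_at_linker(read: str, linker: str) -> Optional[Tuple[str, str]]:
    """Split a read into (left_tag, right_tag) around the first exact
    occurrence of the linker. Return None if the linker is absent."""
    idx = read.find(linker)
    if idx == -1:
        return None
    return read[:idx], read[idx + len(linker):]


# Toy read: two 100-base tags around the 44-mer linker (244 bases in total).
left_tag = "AAC" * 33 + "A"
right_tag = "GGT" * 33 + "G"
read = left_tag + LINKER + right_tag

tags = split_at_linker(read, LINKER)
print(len(read), tags is not None)
```

A read without the linker returns `None`, which lets a caller treat it as an ordinary shotgun read rather than a paired-end read.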
Traditional approaches to the sequencing of paired-end reads rely upon inserting a DNA fragment into a vector, such as a BAC or fosmid, cloning into bacteria, and subsequently generating two sequences, one from each end of the insert. These methods entail weeks of laboratory work, and preparing the libraries needed for Sanger sequencing can cost several hundred thousand dollars. The Genome Sequencer FLX method presented here, which requires no cloning, generates up to 200,000 paired-end reads from a single Genome Sequencer FLX instrument run, with a total elapsed time (from genomic DNA to result) of less than four days.
Sample Preparation Protocol
The preparation of a 3K Long-Tag Paired End library is depicted schematically in Figure 1.
The protocol begins with fragmentation of the high molecular weight DNA sample by hydrodynamic shearing (HydroShear, Genomic Solutions); the size distribution of the fragments (on average 3 kb) determines the distance between the Paired End sequencing tags. After purification using size exclusion beads (AMPure, Agencourt), the fragments are protected from EcoRI cleavage by methylation with EcoRI Methylase, and their ends are polished: blunted with T4 DNA polymerase and 5′-phosphorylated with T4 polynucleotide kinase (T4 PNK).
Hairpin Adaptors (biotinylated, and containing non-methylated EcoRI recognition sites; provided in the GS Paired End Adaptor Kit) are ligated onto both ends, and all DNA species that are not protected by hairpins are removed by exonuclease digestion. The small, unwanted molecular species are removed with AMPure beads. The remaining long insert fragments are digested with EcoRI to remove the terminal hairpin structures, which leaves cohesive ends, and are then circularized by ligation of those ends. The resulting 3-kb circular fragments contain the 44-bp linker (the remainder of the two Hairpin Adaptors), joining the two ends of the fragmented DNA.
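The geometry of the circularized molecule can be illustrated in silico. The sketch below is a toy model under stated assumptions: the function name `window_across_linker` is hypothetical, the linker is a placeholder 44-mer (the real sequence is not given here), and the insert is an artificial 3,000-base string whose first and last 100 bases are made easy to recognize. It shows that any linear piece of the circle spanning the linker carries the tail of the original fragment on one side and its head on the other.

```python
# Placeholder 44-mer standing in for the linker; NOT the real linker sequence.
LINKER = "ACGT" * 11


def window_across_linker(insert: str, linker: str, flank: int) -> str:
    """Model the circle as insert + linker with the two ends joined, and
    return the linear piece made of `flank` bases before the linker, the
    linker itself, and `flank` bases after it (wrapping around the circle)."""
    circle = insert + linker  # circular: the last linker base neighbors insert[0]
    start = len(insert) - flank
    length = flank + len(linker) + flank
    return "".join(circle[(start + i) % len(circle)] for i in range(length))


# Artificial 3-kb insert: head of 100 A's, body of 2,800 C's, tail of 100 G's.
head, body, tail = "A" * 100, "C" * 2800, "G" * 100
insert = head + body + tail

piece = window_across_linker(insert, LINKER, flank=100)
print(len(piece), piece[:5], piece[-5:])
```

In this model the piece reads tail, linker, head, so the two tags come from the opposite ends of the original 3-kb fragment, which is what makes them informative about long-range distance.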
The DNA circles are then fragmented by nebulization, generating molecules that are a few hundred base pairs in length, with random-sized paired ends flanking the linker (plus other random fragments from the circles). After polishing the fragment ends (as above, with T4 DNA polymerase and T4 polynucleotide kinase), the Paired End library fragments are immobilized onto streptavidin beads (Dynal M-270, Invitrogen) using the biotin tags incorporated into the 44-bp linker, resulting in the enrichment of linker-positive fragments. The Long Paired End Adaptors (sequences shown below) are ligated to the ends of the linker-positive fragments. The Adaptors provide priming sequences for both amplification and sequencing of the Paired End library fragments, as well as the “sequencing key,” a short sequence of four nucleotides that the Genome Sequencer System software uses for base calling and to recognize legitimate library reads.
Long Paired End Adaptor A:
5′–GCCTCCCTCGCGCCATCAG–3′
3′–GGGAGCGCGGTAGTC–5′
Long Paired End Adaptor B:
5′–GCCTTGCCAGCCCGCTCAG–3′
3′–ACGGTCGGGCGAGTC–5′
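The adaptor sequences listed above can be sanity-checked with a few lines of code. This is a minimal sketch: the helper name `bottom_pairs_with_top` is hypothetical, and the check simply confirms that each bottom strand, as printed in the 3′ to 5′ direction, is the base-by-base complement of the 3′ end of its top strand. Both top strands also end in the same four bases (TCAG), which is consistent with the four-nucleotide sequencing key, although the text above does not spell the key out.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (top strand 5'->3', bottom strand 3'->5'), copied from the sequences above.
ADAPTORS = {
    "A": ("GCCTCCCTCGCGCCATCAG", "GGGAGCGCGGTAGTC"),
    "B": ("GCCTTGCCAGCCCGCTCAG", "ACGGTCGGGCGAGTC"),
}


def bottom_pairs_with_top(top: str, bottom_3to5: str) -> bool:
    """True if the bottom strand (written 3'->5') is the complement of the
    3' end of the top strand. No reversal is needed because the bottom
    strand is already written antiparallel to the top strand."""
    return top[-len(bottom_3to5):].translate(COMPLEMENT) == bottom_3to5


for name, (top, bottom) in ADAPTORS.items():
    overhang = len(top) - len(bottom)  # unpaired bases at the 5' end of the top strand
    print(name, bottom_pairs_with_top(top, bottom), overhang, top[-4:])
```

For both adaptors the check passes, and the top strand carries a four-base unpaired 5′ extension (19 bases against 15).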



