Three Hacks to Get More Data from Your Illumina Sequencer

Lauren Arcuri Ware

Ever wanted to squeeze more information out of your Illumina sequencer? Here are three labs that have modified their systems to meet their own needs.

As the cost of DNA sequencing instruments has dropped precipitously, more labs than ever before are getting comfortable with DNA sequencing platforms. With this higher level of familiarity, it was inevitable that researchers would open the hood and start tweaking these machines to fit their individual needs. Indeed, several labs have modded their sequencers in unique and interesting ways.

Stuart Levine and colleagues designed a method that allows researchers to use the Illumina Genome Analyzer II with fewer than the standard eight lanes. Source: MIT

Jerrod Schwartz and colleagues hacked an Illumina sequencer to get sequence information cost-efficiently from long molecules instead of short ones. Source: University of Washington

Chris Burge and colleagues designed a method to analyze protein-DNA interactions on an Illumina flowcell. Source: MIT / Donna Coveney

The hacks below run the gamut from easy adjustments almost any lab can do to advanced modifications that require a higher level of expertise. But if you attempt any of these hacks, just don’t look to Illumina for any technical support.

Longer Reads from an Illumina Sequencer

The Hack: In a standard Illumina workflow, DNA is fragmented into short pieces before it is anchored on the surface of the flowcell. This hack lets researchers flow longer pieces of DNA onto the flowcell and perform the final library preparation steps in the flowcell itself, so that sequence information can be obtained cost-efficiently from long molecules instead of short ones.

Who: Jay Shendure's lab at the University of Washington in Seattle. Shendure’s lab is focused on targeted ways to do resequencing and looking at large cohorts to tease out variants. They're also interested in synthetic biology and fetal DNA sequencing. "We like tinkering with machines and instruments out there to do what we want them to do," says Jerrod Schwartz, a biologist in the Shendure lab.

Why: Short reads can make genome assembly difficult and are less useful than long reads for sequencing complex regions or analyzing structural variation. Long reads can also reveal haplotype variation. Other current methods for generating this kind of information are labor-intensive, limited to less than 40 kilobases, and have poorer data quality.

How: Shendure's lab adds adaptors, complementary to the adaptors already on the flowcell, to the ends of long DNA strands. When one end of the DNA molecule hybridizes to the surface, the other end hybridizes either close to it, if the molecule is a random coil, or a given distance away, if the molecule has been stretched by an electric field applied to the flowcell. To make the long DNA pieces short enough for sequencing, the team flows in a transposase that binds to the long DNA and breaks it into two short pieces, with the other adaptor now attached at the cleaved ends. Both pieces can then be bridge-amplified, clustered, and sequenced. The sequences of clusters that are either near one another (for the random-coiled DNA) or a specified distance apart (for the stretched DNA) are then matched up, giving paired-end reads for the original DNA molecules.
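The matching step at the end of this process can be illustrated with a short sketch. This is not the Shendure lab's software; it is a minimal illustration, assuming each cluster is reported as a hypothetical (id, x, y) record and that two clusters belong to the same original molecule when their separation falls inside a chosen distance window. The function name, the record format, and the distance values are all assumptions made for the example.

```python
from math import hypot


def pair_clusters(clusters, max_dist, min_dist=0.0):
    """Pair clusters whose separation lies in [min_dist, max_dist].

    clusters: list of (cluster_id, x, y) tuples (hypothetical format).
    Returns a list of (id_a, id_b, distance) tuples, closest pairs first.
    """
    # Collect every candidate pair inside the distance window.
    # This is O(n^2); real flowcell data would need a spatial index.
    candidates = []
    for i, (id_a, xa, ya) in enumerate(clusters):
        for id_b, xb, yb in clusters[i + 1:]:
            d = hypot(xa - xb, ya - yb)
            if min_dist <= d <= max_dist:
                candidates.append((d, id_a, id_b))

    # Greedily accept the closest candidates; each cluster is used once.
    used = set()
    pairs = []
    for d, id_a, id_b in sorted(candidates):
        if id_a not in used and id_b not in used:
            used.update((id_a, id_b))
            pairs.append((id_a, id_b, d))
    return pairs


# Illustrative coordinates only (arbitrary units).
clusters = [
    ("c1", 0.0, 0.0),
    ("c2", 1.0, 0.0),
    ("c3", 50.0, 50.0),
    ("c4", 50.0, 51.5),
]
pairs = pair_clusters(clusters, max_dist=2.0)
```

For the random-coil case, min_dist stays at zero and max_dist is small. For the stretched case, a window such as min_dist=8.0, max_dist=12.0 would select clusters a set distance apart instead.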

Difficulty: The approach is still a work in progress. "We've got it working up to 10 kilobases or so, and now we want to push it up to larger molecules," says Schwartz. But the actual technique, he says, is "fairly straightforward biochemistry." It involves modifying the standard Illumina library preparation, "tweaking the plumbing," and introducing an electrical field as needed. "Anyone that has a power supply in their lab could potentially do it," says Schwartz.

Paper: Schwartz, J. J., C. Lee, J. B. Hiatt, A. Adey, and J. Shendure. 2012. Capturing native long-range contiguity by in situ library construction and optical sequencing. Proceedings of the National Academy of Sciences 109(46):18749-18754.

Protein-DNA Interaction Analysis on a Flowcell

The Hack: High-throughput sequencing-fluorescent ligand interaction profiling, or HiTS-FLIP for short. The method adds fluorescently tagged proteins directly to an Illumina flowcell to analyze protein-DNA interactions.

Who: Chris Burge's lab at the Massachusetts Institute of Technology (MIT). The Burge lab is interested in gene regulation, including alternative splicing and microRNA targeting.

Why: One of the Burge lab's interests is the binding affinity spectra of RNA-binding proteins, a key to understanding RNA splicing. "We were talking at lunch, and said, what could we pump onto the sequencer besides the standard fluorophore-tagged dNTPs?" says Burge. "And then we thought, wouldn't it be cool if you could synthesize RNA onto the flowcell directly, then put a fluorescent-tagged protein onto the flowcell and see where it bound?" Because that would be technically difficult to do, they decided first to attempt a similar but less challenging experiment with DNA-binding proteins.

How: First the group clusters and sequences a library of random 25-base oligonucleotides to obtain the location and sequence of every cluster. After rebuilding the double-stranded DNA, they pump a fluorescently tagged DNA-binding transcription factor (TF) onto the flowcell and image it again. The TF is tagged with mOrange, a fluorescent protein with the right excitation and emission wavelengths for analysis using the sequencer's built-in laser and camera.

The resulting image shows where each cluster is and how strongly it binds the TF. Clusters that bind the TF well give off lots of fluorescence, while those that bind poorly give little or no signal. "Instead of the Milky Way you get when you do sequencing, as a result of dense packing of clusters, what you get looks more like constellations in the night sky," says Burge.

The binding intensities of each cluster can then be associated with the sequence of that cluster to give a readout of the binding affinities of roughly 100 million different sequences, more measurements than can be performed using current alternative technologies. By measuring binding at a range of different protein concentrations, the dissociation constant (Kd) for each oligonucleotide up to a size of about 11 bases can be measured in a single experiment.
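To make the Kd measurement concrete, here is a minimal sketch of how a dissociation constant could be estimated from fluorescence measured at several protein concentrations. It assumes a simple single-site binding curve, F = Fmax * [P] / (Kd + [P]), which may differ from the model used in the published analysis. The function name, the search range, and the synthetic data are all assumptions made for illustration.

```python
def fit_kd(concentrations, intensities, kd_min=1e-3, kd_max=1e4, steps=2000):
    """Estimate (Kd, Fmax) for F = Fmax * c / (Kd + c) by least squares.

    Scans Kd over a log-spaced grid. For each trial Kd, the best Fmax
    has a closed form because the model is linear in Fmax.
    """
    ratio = (kd_max / kd_min) ** (1.0 / (steps - 1))
    best = None
    kd = kd_min
    for _ in range(steps):
        # Fraction of sites occupied at each concentration for this Kd.
        occupancy = [c / (kd + c) for c in concentrations]
        denom = sum(g * g for g in occupancy)
        fmax = sum(f * g for f, g in zip(intensities, occupancy)) / denom
        # Sum of squared residuals for this (Kd, Fmax) pair.
        sse = sum((f - fmax * g) ** 2 for f, g in zip(intensities, occupancy))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
        kd *= ratio
    _, kd_fit, fmax_fit = best
    return kd_fit, fmax_fit


# Synthetic, noise-free data for one hypothetical sequence
# (true Kd = 20, true Fmax = 1000; concentration units are arbitrary).
concs = [1.0, 5.0, 25.0, 125.0, 625.0]
signal = [1000.0 * c / (20.0 + c) for c in concs]
kd_fit, fmax_fit = fit_kd(concs, signal)
```

A grid search keeps the sketch dependency-free. In practice a nonlinear least-squares routine would be the more usual choice, and the fit would be repeated for every sequence of interest.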

Difficulty: HiTS-FLIP is moderately difficult to implement overall. Rewriting the recipes on the GAII is the easy part, while optimizing conditions takes some time. Analyzing the resulting data requires the most effort, but does not require development of “fundamentally new algorithms,” says Burge.

Since the paper was published, Burge's lab has worked with Bruce Baker's lab at Janelia Farm to make the method faster and more economical. In addition, Burge's former grad student Robin Friedman has improved the analysis component, "so readers may want to stay tuned for HiTS-FLIP 2.0," says Burge.

Paper: Nutiu, R., R. C. Friedman, S. Luo, I. Khrebtukova, D. Silva, R. Li, L. Zhang, G. P. Schroth, and C. B. Burge. 2011. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotechnology 29(7):659-664.

Lane-by-Lane Sequencing

The Hack: Lane-by-lane sequencing with an Illumina Genome Analyzer II. Modifications allow researchers to use the GAII with fewer than the standard eight lanes.

Who: The BioMicroCenter, a genomics core facility, at MIT, directed by Stuart Levine. The BMC's focus is providing support to roughly one hundred labs in the Biology and Biological Engineering departments, the Koch Institute for Integrative Cancer Research, and the MIT Center for Environmental Health Sciences.

Why: Smaller-scale experiments with unusual design parameters can be run on the GAII. "You can't normally choose to run one or two lanes of data," says Levine. There are plenty of experiments that don't require more than 20 million reads, which can easily be done on one lane. "If you don't have someone to fill out the other lanes, you can't fill out the flowcell. Then you wait, and science is not getting done."

How: Each lane has its own syringe to pull the reagents across the lane. A bar runs across all the lanes to keep everything flowing at the same time. The simplest way to run fewer lanes, says Levine, is to "simply unscrew the lanes you aren't going to use." This disengages their syringes. However, it also changes the volumes being pulled, so the team needed to correct for dead volume, the volume in the single inlet tube between the reagents and the flowcell. They did this by making changes to the script that controls the volume. "We wrote a small Python script that allows us to reliably make sure we catch all the different places where we need to make those changes," he says.
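The arithmetic behind a dead-volume correction can be sketched as follows. This is not the BioMicroCenter's script, and the article does not give their formula. The sketch assumes a simple model: every active syringe draws through one shared inlet tube, the tube's dead volume must be displaced before fresh reagent reaches the flowcell, and the goal is to deliver the same amount of fresh reagent per lane as in a standard eight-lane run. The function names, the step format, and all numbers are hypothetical.

```python
STANDARD_LANES = 8


def corrected_volume(standard_ul, dead_ul, active_lanes):
    """Per-syringe volume that keeps fresh reagent per lane unchanged.

    Assumed model: with n syringes each pulling v, fresh reagent per
    lane is v - dead_ul / n. Setting this equal to the eight-lane value
    gives v_n = v_8 + dead_ul * (1/n - 1/8).
    """
    if not 1 <= active_lanes <= STANDARD_LANES:
        raise ValueError("active_lanes must be between 1 and 8")
    return standard_ul + dead_ul * (1.0 / active_lanes - 1.0 / STANDARD_LANES)


def correct_steps(steps, dead_ul, active_lanes):
    """Apply the correction to every (reagent, volume_ul) step."""
    return [
        (name, corrected_volume(vol, dead_ul, active_lanes))
        for name, vol in steps
    ]


# Hypothetical pumping steps and dead volume, for illustration only.
steps = [("wash_buffer", 100.0), ("incorporation_mix", 75.0)]
two_lane = correct_steps(steps, dead_ul=80.0, active_lanes=2)
```

Under this model, fewer active lanes means each remaining syringe must pull more, because the same dead volume is shared among fewer syringes. Applying the change through one function reflects the point Levine makes about catching every place a volume appears.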

Difficulty: This was tricky to design, says Levine, requiring a fair bit of trial and error. However, the protocol itself is not complicated to execute. It doesn't require special equipment or knowledge of programming. He adds one important caveat: don't expect Illumina to help you out. "We don't do this on our instruments that are still under service contract," he says.

Paper: Gravina, M. T., J. H. Lin, and S. S. Levine. 2013. Lane-by-lane sequencing using Illumina's Genome Analyzer II. BioTechniques 54(5):265-269.

Keywords:  sequencing