Aptamer selection by high-throughput sequencing and informatic analysis
Shawn Hoon1, Bin Zhou3,4, Kim D. Janda3,4, Sydney Brenner1,2, and Jonathan Scolnick1,2
1Agency for Science Technology and Research, Molecular Engineering Lab, Singapore
2Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA
3Departments of Chemistry and Immunology and Microbial Science, The Skaggs Institute for Chemical Biology
4The Worm Institute for Research and Medicine
BioTechniques, Vol. 51, No. 6, December 2011, pp. 413–416

Traditional methods for selecting aptamers require multiple rounds of selection and optimization in order to identify aptamers that bind with high affinity to their targets. Here we describe an assay that requires only one round of positive selection followed by high-throughput DNA sequencing and informatic analysis in order to select high-affinity aptamers. The assay is flexible, requires less hands-on time, and can be used by laboratories with minimal expertise in aptamer biology to quickly select high-affinity aptamers to a target of interest. This assay has been used successfully to identify aptamers that bind to thrombin with dissociation constants in the nanomolar range.

Aptamers are nucleic acid oligomers that are synthesized to bind to target molecules with high affinity and selectivity (1, 2). Aptamers have been described that bind to small molecules, proteins, and cells, making them important reagents with a wide range of uses including medical diagnostics, imaging, and therapeutics (2–6).

Aptamers against a particular target are most often identified through a process referred to as systematic evolution of ligands by exponential enrichment (SELEX) (1, 2). In the SELEX process, a library of randomized nucleic acid sequences, typically 40–60 bases in length, is exposed to a target molecule. Those sequences that bind the target are retained, amplified, and used in further rounds of selection. Following multiple rounds of selection, individual aptamers are cloned and sequenced in order to identify the highest-affinity aptamers in the pool. These lead aptamers are further modified through a series of deletions in order to optimize the aptamer sequence for the highest affinity to the target. While powerful, the overall SELEX process is time-consuming, taking weeks to move from random library to optimized aptamer.

Recently, Cho et al. (7) and Kupakuwana et al. (8) published studies that used high-throughput sequencing to shorten the initial aptamer selection time. However, while both groups were able to identify aptamers that bind to the intended target, they either relied on multiple rounds of selection and sequencing (7) or limited the flexibility available in the sequence space by only analyzing fixed-length sequences (8).

Here we describe a method for selecting high-affinity DNA aptamers based on a single round of selection followed by high-throughput sequencing and bioinformatic analysis. Aptamer sequences were identified by selecting variable k-mer length sequences that are enriched in the sequenced library. Using this method to identify DNA aptamers that bind to thrombin, we found both known and novel thrombin-binding aptamers. Our method, Aptamer Selection by K-mer Analysis of Sequences (ASKAS), provides a tool for researchers to discover high-affinity aptamers of varying lengths with minimal hands-on time.

Materials and methods

Aptamer library

A single-stranded DNA oligonucleotide library was made by Integrated DNA Technologies (Coralville, Iowa, USA). The library consisted of 33 random bases flanked by fixed regions corresponding to the Illumina GA adapter sequences 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and 5'-AGATCGGAAGAGCTC-3'. These flanking sequences form a stable 13 bp stem so that the random library is presented as a 33 bp hairpin. The library had a theoretical diversity of 7 × 10^19 sequences. Prior to use, the library was diluted in thrombin binding buffer (9) consisting of 50 mM Tris pH 7.5, 100 mM NaCl, 1 mM MgCl2. A positive control aptamer described in Tasset et al. (9), 5'-GAGTCCGTGGTAGGGCAGGTTGGGGTGACTTCGTGGAA-3', was made and flanked by the same fixed regions as the random library.
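The library figures quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the hairpin forms by pairing the 3' flank back onto the end of the 5' flank nearest the random region; the function names are illustrative and not part of the published method.

```python
# Flanking sequences and random-region length as given in the text.
FIVE_PRIME_FLANK = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
THREE_PRIME_FLANK = "AGATCGGAAGAGCTC"
RANDOM_LENGTH = 33

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def stem_length(five_prime, three_prime):
    """Count contiguous Watson-Crick pairs, starting at the bases that
    border the random region and moving outward along both flanks."""
    partner = reverse_complement(three_prime)
    paired = 0
    for a, b in zip(reversed(five_prime), reversed(partner)):
        if a != b:
            break
        paired += 1
    return paired

# Four possible bases at each of the 33 random positions.
diversity = 4 ** RANDOM_LENGTH

print(f"theoretical diversity: {diversity:.1e}")
print(f"stem length: {stem_length(FIVE_PRIME_FLANK, THREE_PRIME_FLANK)} bp")
```

Under these assumptions the calculation reproduces both the diversity of roughly 7 × 10^19 and the 13 bp stem stated in the text.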

Thrombin preparation

Human thrombin (1790 U; Sigma-Aldrich, St. Louis, MO, USA) was diluted in 130 µl of PBS, giving a concentration of 11.9 µM. Of this, 112 µl was added to 160 µl of pre-washed epoxy-coated magnetic beads (Life Technologies, Carlsbad, CA, USA) with 48 µl of 3 M (NH4)2SO4 and incubated overnight at 37°C, followed by three washes in thrombin binding buffer. The beads were resuspended in 160 µl of thrombin binding buffer, and 10 µl (approximately 80 pmol of thrombin) was used for the aptamer selection. Negative selection beads were treated as above, but without thrombin protein.
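The figure of approximately 80 pmol per selection follows from the volumes and concentration given above. This sketch assumes that all of the thrombin added to the beads was coupled and that none was lost during washing, which is an idealization rather than a measured value.

```python
# Values taken from the text.
STOCK_CONCENTRATION_UM = 11.9   # thrombin stock, µM
VOLUME_COUPLED_UL = 112         # stock volume added to the beads, µl
RESUSPENSION_VOLUME_UL = 160    # final bead suspension volume, µl
ALIQUOT_UL = 10                 # bead volume used per selection, µl

def picomoles(concentration_um, volume_ul):
    """µM multiplied by µl gives pmol (1e-6 mol/l x 1e-6 l = 1e-12 mol)."""
    return concentration_um * volume_ul

# Assumption: 100% coupling efficiency and no loss during the washes.
total_pmol = picomoles(STOCK_CONCENTRATION_UM, VOLUME_COUPLED_UL)
pmol_per_selection = total_pmol * ALIQUOT_UL / RESUSPENSION_VOLUME_UL

print(f"thrombin on beads: {total_pmol:.0f} pmol")
print(f"thrombin per selection: {pmol_per_selection:.1f} pmol")
```

The result, about 83 pmol, is an upper bound under the full-coupling assumption and is consistent with the approximate value quoted in the text.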

Aptamer selection

Approximately 10^12 molecules of aptamer library and 6000 copies of positive control aptamer were separately heated to 95°C followed by 5 min on ice and a 5 min incubation at room temperature. The two were then combined for selection in 50 µl of thrombin binding buffer plus 0.1% BSA. A negative selection was performed by adding the library to the blank beads for 30 min. Following the negative selection, a magnet was used to capture the beads and the solution containing unbound aptamer was transferred to a tube containing thrombin-coated beads. Aptamers were incubated at room temperature for 30 min with the thrombin-coated beads, at which point the unbound aptamers were removed and fresh buffer was added to the tube. Two more 30 min incubations, each followed by a buffer exchange, were performed. After the third incubation, beads were quickly washed in buffer one more time before PCR was performed directly on the beads. Twelve cycles of PCR amplification were conducted using KOD Hot Start polymerase (Merck Biosciences, Gibbstown, NJ, USA) with Illumina amplification primers 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' and 5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-3', which hybridize to the fixed sequences of the aptamers. The PCR product was gel extracted and quantitated on a Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) using the High Sensitivity DNA kit. The library was then sequenced on an Illumina GA2x (Illumina, San Diego, CA, USA) following the manufacturer's protocols for single-pass 36-cycle sequencing using the Illumina Genomic DNA Sequencing Primer, 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'. A second aptamer library that had not been selected for thrombin binders was also sequenced following the same methodology.
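To put the input quantities in perspective, the sketch below estimates how sparsely the library samples the sequence space and how many positive control reads would be expected if selection had no effect. It assumes uniform sampling with no PCR or sequencing bias, and it uses the 11.5M post-filter reads reported for the selected library in the Results; both are simplifying assumptions for illustration only.

```python
# Values taken from the text.
LIBRARY_MOLECULES = 1e12   # aptamer library input
CONTROL_COPIES = 6000      # positive control aptamer input
RANDOM_LENGTH = 33         # random bases per library molecule
SELECTED_READS = 11.5e6    # post-filter reads, selected library

# Fraction of the theoretical sequence space present in the input.
sequence_space = 4 ** RANDOM_LENGTH
fraction_of_space = LIBRARY_MOLECULES / sequence_space

# Assumption: with no enrichment, reads are drawn in proportion to input.
control_fraction = CONTROL_COPIES / LIBRARY_MOLECULES
expected_control_reads = control_fraction * SELECTED_READS

print(f"fraction of sequence space sampled: {fraction_of_space:.1e}")
print(f"control reads expected without enrichment: {expected_control_reads:.3f}")
```

Under these assumptions, fewer than one positive control read would be expected by chance, so recovering the control sequence repeatedly would indicate enrichment during selection.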

Sequence analysis

Sequence reads were preprocessed first by removing sequences that were >80% homopolymeric. Sequences that did not contain any aptamer sequence were also removed. After analyzing the results of the positive control aptamer, further analysis was performed with a data set in which all sequences containing 13 contiguous bases matching the positive control aptamer sequence were removed. K-mer analysis was conducted using the freely available Tallymer software (10). K-mer values of 15–33 were used. K-mers were clustered (70% identity) using the cluster_seq program, which is driven by the uclust clustering engine (11). The 70% identity level was chosen for clustering in order to compress the top ten sequences down to five for the determination of dissociation constants by surface plasmon resonance.
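The preprocessing and k-mer counting steps can be illustrated in plain Python. This is a minimal sketch, not the Tallymer or cluster_seq workflow used in the study: it swaps in a simple in-memory counter, interprets ">80% homopolymeric" as a single base making up more than 80% of the read (an assumption), and omits the clustering step. All function names are hypothetical.

```python
from collections import Counter

# Positive control aptamer sequence as given in the text.
POSITIVE_CONTROL = "GAGTCCGTGGTAGGGCAGGTTGGGGTGACTTCGTGGAA"

def is_homopolymeric(read, threshold=0.8):
    """True if one base accounts for more than `threshold` of the read.
    This reading of 'homopolymeric' is an assumption."""
    if not read:
        return True
    return max(Counter(read).values()) / len(read) > threshold

def shares_window(read, reference, window=13):
    """True if any `window` contiguous bases of `reference` occur in `read`."""
    ref_windows = {reference[i:i + window]
                   for i in range(len(reference) - window + 1)}
    return any(read[i:i + window] in ref_windows
               for i in range(len(read) - window + 1))

def preprocess(reads, control=POSITIVE_CONTROL):
    """Drop homopolymeric reads and reads matching the positive control."""
    return [r for r in reads
            if not is_homopolymeric(r) and not shares_window(r, control)]

def count_kmers(reads, k_min=15, k_max=33):
    """Count every k-mer of length k_min..k_max across all reads."""
    counts = Counter()
    for read in reads:
        for k in range(k_min, min(k_max, len(read)) + 1):
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
    return counts

# Toy input: one ordinary read, one homopolymeric read, one control-like read.
toy_reads = [
    "ACGTACGTACGTACGTA",
    "A" * 30 + "CGT",
    POSITIVE_CONTROL[5:20] + "ACGT",
]
kept = preprocess(toy_reads)
kmer_counts = count_kmers(kept)
print(kept)
print(kmer_counts.most_common(3))
```

An in-memory counter like this will not scale to millions of reads across nineteen k-mer lengths, which is one reason an index-based tool such as Tallymer is used in practice.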

Dissociation constants of aptamers and thrombin were measured using surface plasmon resonance (Biacore Flexchip, GE Life Sciences, Piscataway, NJ, USA). A streptavidin coated chip was spotted with biotinylated aptamers (IDT) and 50mM Thrombin in thrombin binding buffer was used to make the measurements.

Quantitative PCR

Quantitative PCR was carried out using the QuantiTect SYBR Green PCR kit (Qiagen) and analyzed on an ABI 7900 FAST (Life Technologies). Primers used were the Illumina amplification primers described above.

Results and discussion

Aptamer selection procedure

Traditional SELEX experiments require multiple rounds of binding aptamers to the target in order to increase the probability that the small number of individually cloned sequences will represent high-affinity aptamers and not artifacts of the SELEX experimental design. In contrast, we chose to use high-throughput sequencing, which provides millions of sequences as a readout of the selection process.

An aptamer library was prepared and subjected to selection as described in Materials and Methods. Previous data from Berezovski et al. (12) showed that three rounds of partitioning were sufficient to increase the overall library affinity to the target by 5 orders of magnitude. Our own preliminary experiments also suggested that three washes were sufficient to remove much of the non-thrombin-binding aptamer (data not shown); hence, three washes of the aptamer library bound to thrombin were performed. The selected aptamers were amplified and sequenced as described in Materials and Methods.

Two libraries were sequenced: the library selected for thrombin-binding aptamers and a second, non-selected (naïve) library as a test for the randomization of the original aptamer pool. We obtained sequencing reads from both libraries and, after preprocessing the sequences to remove poor-quality sequencing reads (see Materials and Methods), 11.5M and 15M reads, respectively, were used for further analysis.

Selection for thrombin-binding aptamers

An initial analysis was performed to determine whether or not our aptamer selection procedure altered the distribution of sequences obtained from the two libraries. In the naïve library, 98.9% of the sequences were found to be unique, giving us confidence that our initial aptamer pool did not have any strong sequence biases. After filtering out the positive control sequences (see Materials and Methods), we compared the distribution of sequences from the naïve library to that of the thrombin-selected library. Figure 1 shows a clear shift in the distributions such that more aptamers were sequenced multiple times in the selected library compared with the naïve, unselected library. For example, in the naïve library only one 33mer was sequenced five times, whereas in the thrombin-selected library 84,627 33mer sequences were counted five or more times, with one sequence appearing 31 times. The larger number of repeated sequences in the selected library exists despite the higher number of overall sequencing reads in the naïve library. These data suggest that our selection protocol did enrich for sequences that bind to thrombin.
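The comparison described above amounts to tabulating how many distinct sequences were observed once, twice, and so on in each library. The following is a minimal sketch of that tabulation on toy data; the function names are hypothetical, and "unique" is taken here to mean the fraction of reads whose sequence occurs exactly once, which is an assumption about how the 98.9% figure was computed.

```python
from collections import Counter

def copy_number_histogram(reads):
    """Map copy number -> number of distinct sequences seen that many times."""
    per_sequence = Counter(reads)
    return Counter(per_sequence.values())

def fraction_unique(reads):
    """Fraction of reads whose sequence occurs exactly once (assumed definition)."""
    if not reads:
        return 0.0
    singletons = copy_number_histogram(reads)[1]
    return singletons / len(reads)

def sequences_with_at_least(reads, min_copies):
    """Number of distinct sequences observed `min_copies` or more times."""
    histogram = copy_number_histogram(reads)
    return sum(n for copies, n in histogram.items() if copies >= min_copies)

# Toy libraries standing in for the naive and selected read sets.
naive = ["AAC", "CCG", "GGT", "TTA", "ACG", "ACG"]
selected = ["ACG"] * 5 + ["CCG"] * 2 + ["GGT", "TTA"]

for name, reads in (("naive", naive), ("selected", selected)):
    print(name,
          dict(copy_number_histogram(reads)),
          round(fraction_unique(reads), 3),
          sequences_with_at_least(reads, 5))
```

Applied to real data, a shift of the histogram toward higher copy numbers in the selected library relative to the naïve library is the signature of enrichment that the text describes.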
