
Figure 7.
PCR amplification of DNA encrypting text, image, and music, using specific primers.
Unique sequencing primers for information retrieval
The sequencing primers (Figures 1 and 5) were flanked by 5′-CGC and 3′-GCC (sense) or 5′-GGC and 3′-GCG (anti-sense), both for easy identification and to create a triple-GC clamp at the 3′ end for tight hybridization. The middle of each primer was reserved for coding the plasmid number and the primer number. Primer number 1 marks the 5′ segment of the information insert, and primer number 0 marks its 3′ end. Other primers are numbered in ascending order, to a maximum of 20 in a 10,000-bp information insert. Here we specified the plasmid and primer numbers as single digits; however, when more than 10 plasmids are included in the library, a space character (GAT) should be inserted to distinguish the plasmid number from the primer number. Importantly, the encoded plasmid and primer numbers were flanked on each side by a random seven-base sequence (14 random bases per primer in total) to provide primer specificity when used for sequencing in either the sense or anti-sense orientation, or for PCR amplification.
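The primer layout described above can be illustrated with a minimal Python sketch. It is not the authors' design tool. The function names and the `code` placeholder (standing in for the encoded plasmid and primer numbers, whose actual codons are defined elsewhere in the paper) are hypothetical. The sketch also assumes that an anti-sense primer is simply the reverse complement of the corresponding sense primer, which is consistent with the stated flanks: the reverse complement of 5′-CGC…GCC-3′ is 5′-GGC…GCG-3′.

```python
import random

# Watson-Crick complement table for upper-case DNA
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_sense_primer(code: str, rng: random.Random) -> str:
    """Assemble 5'-CGC + 7 random bases + code + 7 random bases + GCC-3'.

    `code` is a hypothetical placeholder for the encoded plasmid and
    primer numbers; the real codons are not reproduced here.
    """
    left = "".join(rng.choice("ACGT") for _ in range(7))
    right = "".join(rng.choice("ACGT") for _ in range(7))
    return "CGC" + left + code + right + "GCC"


def orientation(primer: str) -> str:
    """Classify a primer by its flanking triplets."""
    if primer.startswith("CGC") and primer.endswith("GCC"):
        return "sense"
    if primer.startswith("GGC") and primer.endswith("GCG"):
        return "anti-sense"
    return "unknown"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    sense = build_sense_primer("ACGTAC", rng)  # "ACGTAC" is illustrative only
    antisense = reverse_complement(sense)
    print(sense, orientation(sense))
    print(antisense, orientation(antisense))
```

Because both orientations end in G/C triplets at the 3′ end, either primer keeps the triple-GC clamp the text describes.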
Information retrieval by sequencing
In general, sequencing reactions do not provide sequence information immediately adjacent to the sequencing primer. Moreover, a read may occasionally be shorter than 500 bp, or a base call may be inconclusive. Therefore, sequencing in both orientations may be required. With the unique design of our primers, sequencing could be achieved with a high degree of accuracy. A good practice for retrieving information from an information library might be to use the sense primers first, together with anti-sense primer 2 to retrieve the information at the 5′ end; the other anti-sense primers can then be used as required. In our particular example, we used sense primers 1 and 2 and anti-sense primers 2 and 0. Sequencing with primer 1 yielded a 1276-bp read. Since our insert was 844 bp, the read also included plasmid sequence downstream of the insert. However, it was missing 41 bp just downstream of primer 1; this sequence was retrieved with anti-sense primer 2. Together, these two sequencing reactions retrieved the original information with 100% accuracy. We recognize that a sequencing reaction cannot always retrieve 1276 bp, as was the case with primer 1, and therefore acknowledge that additional sequencing may be required. As mentioned above, we also retrieved the insert information by sequencing with sense primer 2 and anti-sense primer 0, again with 100% accuracy compared with the originally designed insert. We then decoded the information obtained by DNA sequencing using the guidelines provided in Figures 2–4, and were able to reconstruct the text, music, and image encrypted in the DNA. An example of part of a sequencing chromatogram obtained with sense primer 1 is shown in Supplementary Figure 2. The sequence shown encodes the rectangle that forms the image of the lamb's tail (bases 647–674 on the chromatogram) and matches the original sequence with 100% accuracy.
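The reasoning about combining reads from both orientations can be sketched as a simple coverage and identity check. This is a hedged illustration, not the procedure used in the study: the function names are hypothetical, and the read coordinates in the usage example are invented for illustration (only the 844-bp insert length and the 41-bp gap come from the text).

```python
def uncovered(insert_len, reads):
    """Return 1-based inclusive gaps of the insert not covered by any read.

    `reads` is a list of (start, end) intervals in insert coordinates;
    portions of a read that extend beyond the insert are ignored.
    """
    gaps = []
    pos = 1  # next insert position still waiting to be covered
    for start, end in sorted(reads):
        start, end = max(start, 1), min(end, insert_len)
        if start > end:
            continue  # read lies entirely outside the insert
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= insert_len:
        gaps.append((pos, insert_len))
    return gaps


def percent_identity(designed, retrieved):
    """Percentage of matching positions between two equal-length sequences."""
    if not designed or len(designed) != len(retrieved):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(designed, retrieved))
    return 100.0 * matches / len(designed)


if __name__ == "__main__":
    INSERT_LEN = 844  # insert length reported in the text
    # Hypothetical coordinates: the sense read misses the first 41 bp and
    # runs on into plasmid sequence; the anti-sense read covers the 5' end.
    sense_read = (42, 1276)
    antisense_read = (1, 300)
    print(uncovered(INSERT_LEN, [sense_read]))
    print(uncovered(INSERT_LEN, [sense_read, antisense_read]))
```

With a single sense read the 5′ gap remains; adding an anti-sense read that spans it leaves no uncovered positions, after which the assembled sequence can be compared with the designed insert using `percent_identity`.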
In addition to the inherent advantages of information storage in DNA, our improved Huffman coding method, which uses unambiguous DNA coding for archiving, offers economical and easy pattern recognition and message retrieval through specially designed primers. As DNA synthesis and sequencing become faster and cheaper (Genome Analyzer Sequencing System, Illumina, San Diego, CA, USA; 454 Sequencing, Roche, Branford, CT, USA), information storage in DNA becomes even more attractive.

