An improved Huffman coding method for archiving text, images, and music characters in DNA


Figure 6. Image of a lamb encrypted in DNA.



Figure 7. PCR amplification, with specific primers, of the DNA encrypting text, image, and music.


Unique sequencing primers for information retrieval

The sequencing primers (Figures 1 and 5) were flanked by 5′ CGC and 3′ GCC (sense) or 5′ GGC and 3′ GCG (anti-sense), both for easy identification and to create a triple-GC clamp at the 3′ end for tight hybridization. The middle of each primer was reserved for coding the plasmid number and the primer number. Primer 1 marks the 5′ segment of the information insert, and primer 0 marks its 3′ end. The remaining primers are numbered in ascending order, up to a maximum of 20 in a 10,000-bp information insert. Here we specified the plasmid and primer numbers as single digits. However, when more than 10 plasmids are included in the library, a space character (GAT) should be inserted to separate the plasmid number from the primer number. Importantly, the encoded plasmid and primer numbers were designed to be flanked on each side by a random seven-base sequence (14 random bases per primer), which gives each primer its specificity when used for sequencing in both the sense and anti-sense orientations, or for PCR amplification.
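The primer layout described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' tool: the digit-to-DNA codes in `DIGIT_CODE` are hypothetical placeholders (the article's own digit codes are not reproduced here), the element order (flank, random spacer, numbers, random spacer, flank) is inferred from the description, and the anti-sense primer is assumed to be the reverse complement of the sense primer at the same site, which agrees with the 5′ GGC / 3′ GCG anti-sense flanks. All function names are illustrative.

```python
import random

# HYPOTHETICAL digit codes, for illustration only; the article's own
# digit codes are not reproduced here.
DIGIT_CODE = {
    "0": "ACA", "1": "ACT", "2": "AGA", "3": "AGT", "4": "TCA",
    "5": "TCT", "6": "TGA", "7": "TGT", "8": "CAC", "9": "CTC",
}
SPACE = "GAT"                    # space character separating the two numbers
SENSE_5, SENSE_3 = "CGC", "GCC"  # sense flanks; GCC forms the 3' GC clamp
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def encode_number(n: int) -> str:
    """Encode a non-negative integer digit by digit."""
    return "".join(DIGIT_CODE[d] for d in str(n))


def random_bases(rng: random.Random, k: int = 7) -> str:
    """Random k-base spacer that makes each primer sequence unique."""
    return "".join(rng.choice("ACGT") for _ in range(k))


def sense_primer(plasmid: int, primer: int, library_size: int,
                 rng: random.Random) -> str:
    """5' CGC + 7 random + plasmid no. [+ GAT] + primer no. + 7 random + GCC 3'."""
    core = encode_number(plasmid)
    if library_size > 10:        # multi-digit plasmid numbers need a separator
        core += SPACE
    core += encode_number(primer)
    return SENSE_5 + random_bases(rng) + core + random_bases(rng) + SENSE_3


def antisense_primer(sense: str) -> str:
    """ASSUMPTION: anti-sense primer = reverse complement of the sense primer."""
    return reverse_complement(sense)


rng = random.Random(7)
s = sense_primer(plasmid=1, primer=2, library_size=3, rng=rng)
a = antisense_primer(s)
print(s, len(s))   # 26 bases with these 3-base placeholder codes
print(a)           # starts with GGC and ends with GCG
```

Reverse-complementing the sense primer turns the 5′ CGC / 3′ GCC flanks into 5′ GGC / 3′ GCG automatically, so both orientations keep a GC-rich 3′ end without a separate design step.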

Information retrieval by sequencing

In general, sequencing reactions do not provide sequence information immediately adjacent to the sequencing primer. Moreover, a read may occasionally be shorter than 500 bp, or a base call may be inconclusive. Therefore, sequencing in both orientations may be required. Because of the unique design of our primers, such sequencing can be achieved with a high degree of accuracy. A good practice for retrieving information from an information library might be to start with the sense primers only, together with anti-sense primer 2 to retrieve the information at the 5′ end, and then to use the other anti-sense primers as required.

In our particular example, we sequenced with sense primers 1 and 2 and anti-sense primers 2 and 0. Sequencing with primer 1 yielded a 1276-bp read. Since our insert was 844 bp, this read also extended into the plasmid sequence downstream of the insert. However, it lacked the 41 bp immediately downstream of primer 1; this sequence was retrieved with anti-sense primer 2. Together, these two sequencing reactions retrieved the original information with 100% accuracy. Because a sequencing reaction cannot always deliver a read as long as the 1276 bp obtained with primer 1, additional sequencing may be required. We therefore also retrieved the insert information with sense primer 2 and anti-sense primer 0, again with 100% accuracy relative to the designed insert.

We then decoded the information obtained by DNA sequencing following the guidelines in Figures 2–4, and were able to reconstruct the text, music, and image encrypted in the DNA. Part of a sequencing chromatogram obtained with sense primer 1 is shown in Supplementary Figure 2. The stretch shown (bases 647–674 on the chromatogram) encodes the rectangle that forms the lamb's tail in the image, and it matches the original sequence with 100% accuracy.
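Combining reads from several primers, as described above, amounts to placing each read on the sense strand and checking that every insert position is covered and that overlapping reads agree. The following is a minimal sketch under stated assumptions: read start positions are taken as already known, anti-sense reads are reverse-complemented before placement, `N` marks an inconclusive base call, and the 844-bp insert, plasmid tail, and anti-sense primer 2 span are synthetic stand-ins, not the article's data. The function name `merge_reads` is illustrative.

```python
import random
from typing import List, Optional, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_reads(insert_len: int,
                reads: List[Tuple[int, str, bool]]) -> Tuple[str, List[int], List[int]]:
    """Combine reads into one insert sequence.

    Each read is (start, bases, antisense). `start` is the 0-based insert
    position of the read's leftmost base once it is on the sense strand;
    anti-sense reads are reverse-complemented before placement. Returns the
    merged sequence ('N' where nothing was called), the uncovered positions,
    and the positions where two reads disagree.
    """
    calls: List[Optional[str]] = [None] * insert_len
    conflicts: Set[int] = set()
    for start, bases, antisense in reads:
        sense = reverse_complement(bases) if antisense else bases
        for offset, base in enumerate(sense):
            pos = start + offset
            if not 0 <= pos < insert_len:
                continue  # flanking plasmid sequence, not part of the insert
            if base == "N":
                continue  # inconclusive base call: leave it to another read
            if calls[pos] is None:
                calls[pos] = base
            elif calls[pos] != base:
                conflicts.add(pos)
    gaps = [i for i, b in enumerate(calls) if b is None]
    merged = "".join(b if b is not None else "N" for b in calls)
    return merged, gaps, sorted(conflicts)


# Illustration with a synthetic 844-bp insert (random bases, not real data).
rng = random.Random(1)
insert = "".join(rng.choice("ACGT") for _ in range(844))
plasmid = "".join(rng.choice("ACGT") for _ in range(473))

# Sense primer 1: first 41 insert bases missing, read runs into the plasmid.
read_s1 = (41, insert[41:] + plasmid, False)   # 803 + 473 = 1276 bases
# Anti-sense primer 2: HYPOTHETICAL span covering insert bases 0-299.
read_a2 = (0, reverse_complement(insert[:300]), True)

_, gaps_s1, _ = merge_reads(844, [read_s1])
print(len(gaps_s1))                        # 41: the bases next to primer 1
merged, gaps, conflicts = merge_reads(844, [read_s1, read_a2])
print(merged == insert, gaps, conflicts)   # True [] []
```

Reporting gaps and conflicts separately mirrors the two reasons given above for sequencing in both orientations: missing sequence next to a primer, and base calls that need confirmation.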

In addition to the inherent advantages of storing information in DNA, our improved Huffman coding method provides unambiguous DNA coding for archiving, together with economical and easy pattern recognition and message retrieval through specially designed primers. As DNA synthesis and sequencing become faster and cheaper (e.g., Genome Analyzer Sequencing System, Illumina, San Diego, CA, USA; 454 Sequencing, Roche, Branford, CT, USA), information storage in DNA becomes even more attractive.
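Unambiguous decoding of a Huffman-coded DNA sequence rests on the code being prefix-free: no codeword is the start of another, so bases can be read left to right and a symbol emitted as soon as a codeword is complete. The sketch below illustrates this with a generic 4-ary Huffman construction over the letters A, C, G, and T. It is an assumption-labeled illustration, not the article's improved code: the actual code tables (Figures 2–4) are not reproduced here, and the sample text, frequencies, and function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Mapping

BASES = "ACGT"


def quaternary_huffman(freqs: Mapping[str, float]) -> Dict[str, str]:
    """Build a generic 4-ary Huffman code whose code letters are A, C, G, T."""
    tie = count()  # unique tie-breaker so the heap never compares payloads
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    # Pad with zero-weight dummy leaves so every merge takes exactly four nodes.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(4)]
        heapq.heappush(heap, (sum(c[0] for c in children), next(tie), children))

    code: Dict[str, str] = {}

    def assign(node, prefix: str) -> None:
        payload = node[2]
        if isinstance(payload, list):      # internal node: one branch per base
            for base, child in zip(BASES, payload):
                assign(child, prefix + base)
        elif payload is not None:          # real symbol (dummies are skipped)
            code[payload] = prefix or BASES[0]

    assign(heap[0], "")
    return code


def encode(text: str, code: Dict[str, str]) -> str:
    return "".join(code[ch] for ch in text)


def decode(dna: str, code: Dict[str, str]) -> str:
    """Greedy left-to-right decoding; correct because the code is prefix-free."""
    lookup = {v: k for k, v in code.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bases {buf!r} do not form a codeword")
    return "".join(out)


text = "ARCHIVING TEXT IN DNA"               # hypothetical sample message
code = quaternary_huffman(Counter(text))
dna = encode(text, code)
print(dna)
print(decode(dna, code) == text)             # round trip succeeds
```

Frequent symbols receive shorter codewords, which is what makes Huffman coding economical in synthesized bases; the zero-weight padding is the standard adjustment for a 4-letter code alphabet.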

Acknowledgments

This study was supported by the Canadian Institutes of Health Research (CIHR; grant no. 37779).

The authors declare no competing interests.

Correspondence
Address correspondence to Menachem Ailenberg, St. Michael's Hospital, 30 Bond Street, 16CC-044, Toronto, Ontario, Canada M5B 1W8. Email: m.ailenberg@utoronto.ca.

References
1.) Bancroft, C., T. Bowler, B. Bloom, and K.T. Clelland. 2001. Long-term storage of information in DNA. Science 293:1763-1765.

2.) Cox, J.P.L. 2001. Long-term data storage in DNA. Trends Biotechnol. 19:247-250.

3.) Wong, P.C., K.-K. Wong, and H. Foote. 2003. Organic data memory using the DNA approach. Commun. ACM 46:95-98.

4.) Smith, G.C., C.C. Fiddes, J.P. Hawkins, and J.P.L. Cox. 2003. Some possible codes for encrypting data in DNA. Biotechnol. Lett. 25:1125-1130.

5.) Yachie, N., K. Sekiyama, J. Sugahara, Y. Ohashi, and M. Tomita. 2007. Alignment-based approach for durable data storage into living organisms. Biotechnol. Prog. 23:501-505.

6.) Clelland, C.T., V. Risca, and C. Bancroft. 1999. Hiding messages in DNA microdots. Nature 399:533-534.

7.) Huffman, D.A. 1952. A method for the construction of minimum-redundancy codes. Proc. IRE 40:1098-1101.

8.) Doig, A.J. 1997. Improving the efficiency of the genetic code by varying the codon length-the perfect genetic code. J. Theor. Biol. 188:355-360.

9.) Yachie, N., Y. Ohashi, and M. Tomita. 2008. Stabilizing synthetic data in the DNA of living organisms. Syst. Synth. Biol. 2:19-25.

10.) Hughes, C.W. 1980. American hymns old and new: notes on the hymns and biographies of the authors and composers, Columbia University Press, NY.

11.) Lewand, R.E. 2000. Cryptological Mathematics, The Mathematical Association of America, Washington, DC.
