In May 2009, Suneet Agarwal was part of a Harvard Medical School research group that discovered a new nucleotide called hydroxymethylcytosine (hmC). Since then, he’s been trying to figure out exactly what it does.
Because of its low abundance, the weak affinity of antibodies for it, and its structural similarity to the modified nucleotide 5-methylcytosine (5mC), hmC has been difficult to map in DNA. Now researchers are exploiting the products of chemical reactions involving hmC in the hope that they will eventually be able to generate genome-wide, nucleotide-resolution maps with the help of third-generation sequencing. These maps will help to uncover what hmC is doing in the cell.
Hidden in plain sight
In May 2009, two separate research groups published back-to-back papers in Science describing a new DNA nucleotide called 5-hydroxymethylcytosine (1–2). “hmC may be a base modification that’s created in a regulated fashion by a set of enzymes that have an interesting pattern of tissue-specific expression,” says Agarwal. “What it leads to or what the functional significance of the modification isn’t clear yet.”
hmC’s interesting patterns of tissue-specific expression include higher abundance in stem cells, suggesting a role in development and differentiation. hmC has also been found in higher concentrations in brain, liver, and kidney tissues. In contrast, cancer cells have almost no hmC.
Being a pediatric oncologist, Agarwal is particularly interested in the link between hmC and childhood leukemia. The genes for TET enzymes—which are associated with the production of hmC—are also associated with the translocation of the gene MLL in childhood mixed-lineage leukemia. One member of the TET enzyme family—TET2—is frequently deleted or loses function in myeloid malignancies and pre-malignant conditions.
“One hypothesis that one could entertain is that there’s a link in terms of epigenetic dysregulation during blood stem cell development that leads to disorder development and then malignancy,” Agarwal says.
At the molecular level, researchers have several theories about what hmC does. hmC is typically found in lower abundance than 5mC, suggesting that it’s an intermediate in a larger process. “One of the most attractive hypotheses is that it’s an intermediate to ultimately demethylating methylcytosine,” Agarwal says. “That process might actually be an explanation for some observations, but they need to be taken with a grain of salt and need to be reinterpreted now.”
Those observations were published in Nature in 2000 by researchers from the Max Planck Institute for Molecular Genetics in Berlin, Germany, and Joseph Fourier University in La Tronche, France. Staining the zygotic paternal genome of a mouse with an anti-5mC antibody, the team showed that the genome is demethylated within hours of fertilization, before the start of replication (3).
Demethylation is known to occur in two ways: passively through replication or actively through demethylation enzymes. In replication, a polymerase replicates only the A, G, C, and T bases, leaving the subsequent task of adding epigenetic modifications to other enzymes. In PCR, these modifying enzymes are not present, so the resulting amplified DNA contains no modifications. The other option, enzymes that actively demethylate 5mC by removing the methyl group to convert it to C, has been identified in plant systems but has not been validated in a mammalian system.
Rather than direct demethylation via an enzyme, another plausible explanation for observed demethylation is that 5mC is being masked or converted into something else, possibly hmC, in a multiple-step demethylation process.
An alternative hypothesis for hmC function is that hmC has its own role, recruiting its own proteins or inhibiting the recruitment of 5mC-binding proteins to a particular genetic locus. As a result, it would affect the function of neighboring genes somehow. But, at the moment, all researchers have are theories.
Hard to handle
To reveal the function of hmC, epigenetic researchers first need to map and track the modification to see how it changes as the cell changes. But the detection of hmC has been rather challenging. For one thing, it is present at a low frequency in the genome, requiring a highly sensitive detection technique that can pick up single modifications.
As a result, one of the mainstays of molecular biological research techniques, antibody-based detection, is problematic because it is dependent on the concentration of a base at a particular location. Antibodies can detect hmC with specificity but work much better when multiple modifications are grouped together because of relatively weak affinity of the antibodies for hmC. “While it may not inhibit some level of profiling of methylcytosine, which often occurs in clusters since it’s a relatively predominant modification, it may truly inhibit good, robust mapping of hmC,” Agarwal says.
On the other hand, researchers have actually been detecting hmC for quite some time now, without knowing it. Two methods used to profile 5mC—restriction enzymes and bisulfite sequencing—have been inadvertently picking up hmC as well. The two modifications have similar chemical structures; the difference between the two is that a hydrogen atom in 5mC is substituted with a hydroxyl group in hmC.
At the Institute of Genetics and Molecular Medicine at the Western General Hospital in Edinburgh, Scotland, Donncha S. Dunican and Colm Nestor, researchers in the lab of Richard Meehan, decided to find out if these techniques could discriminate between the two. “We were really curious because we thought this new modification was going to be really hot,” Dunican says.
Restriction enzymes digest DNA at a specific recognition sequence, but for certain restriction enzymes, the sequence is resistant to digestion if it is methylated at a cytosine. Analyzing the BRCA1 gene promoter, which is known to be hypermethylated in cancer, Meehan's team found that both methylated and hydroxymethylated BRCA1 templates are resistant to digestion by these methyl-sensitive restriction enzymes. Previous 5mC maps generated using 5mC-sensitive enzymes may therefore have also included hmC.
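The digestion result can be illustrated with a minimal sketch. The article does not name the enzymes or the template sequence, so the recognition site CCGG, the cut position, the toy template, and the function name digest are all assumptions made for illustration only.

```python
# Toy model of a methyl-sensitive restriction digest (illustrative only).
# Assumptions: the site "CCGG", the cut offset, and the template are hypothetical.

def digest(seq, mods, site="CCGG", cut_offset=1):
    """Cut seq at each unmodified site.

    mods maps a 0-based position to a modification label ("5mC" or "hmC").
    A site carrying any modification is treated as resistant to cutting.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        blocked = any(p in mods for p in range(pos, pos + len(site)))
        if not blocked:
            cut = pos + cut_offset
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

template = "AATTCCGGTTAA"
plain = digest(template, {})             # unmodified template is cut
methyl = digest(template, {5: "5mC"})    # methylated site resists cutting
hydroxy = digest(template, {5: "hmC"})   # hydroxymethylated site also resists
```

In this sketch the methylated and hydroxymethylated templates both come back as a single uncut fragment, so the digest pattern alone cannot say which modification protected the site.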
In bisulfite sequencing, sodium bisulfite is used to convert cytosines into uracils, but 5mC is resistant to the conversion. Through sequencing, researchers can identify 5mC because in bisulfite-treated DNA those bases are retained as cytosines, whereas unmodified cytosines are read as uracils. But Meehan’s team found that when treated with bisulfite, hmC is converted to cytosine 5-methylenesulfonate (CMS), which is indistinguishable from 5mC in subsequent sequencing. These results were published in BioTechniques in April 2010 (4).
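A small sketch makes the ambiguity concrete. It is a toy model rather than a real analysis pipeline: the token names "5mC" and "hmC", the function name bisulfite_read, and the six-base example strands are assumptions, and the converted uracil is shown as T because that is how it is reported after amplification.

```python
# Toy model of a bisulfite sequencing readout (illustrative only).
# A strand is a list of tokens: plain bases plus "5mC" and "hmC".

def bisulfite_read(bases):
    """Return the sequence reported after bisulfite treatment and sequencing."""
    readout = []
    for base in bases:
        if base == "C":
            # Unmodified cytosine is converted to uracil, reported as T.
            readout.append("T")
        elif base in ("5mC", "hmC"):
            # 5mC resists conversion; hmC becomes CMS. Both are reported as C.
            readout.append("C")
        else:
            readout.append(base)
    return "".join(readout)

unmodified = ["A", "C", "G", "T", "C", "G"]
methylated = ["A", "5mC", "G", "T", "C", "G"]
hydroxymethylated = ["A", "hmC", "G", "T", "C", "G"]

read_plain = bisulfite_read(unmodified)
read_5mc = bisulfite_read(methylated)
read_hmc = bisulfite_read(hydroxymethylated)
```

The methylated and hydroxymethylated strands produce identical reads, which is why a retained C in existing bisulfite data could reflect either modification.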
“Organizations like at the National Institutes of Health have spent so much money on epigenome sequencing in a lot of different human samples, and now we are realizing that the methods previously used don’t pick up the latest kid on the block,” Dunican says.
As a result, previous methylation map datasets must be re-analyzed because of the potential presence of hmC. So now the task is to develop a way to reinterpret these methylation maps with bioinformatics or technology that can differentiate between 5mC and hmC, or to redo these experiments, which carries a hefty price tag.
“This is a real challenge for the epigenetics field because of the potential importance of hmC,” Dunican says. “We basically need to develop better sequencing chemistry.”
Back at the Daley laboratory, Agarwal had already begun developing hmC-specific profiling methods. In particular, Agarwal was interested in exploiting the bisulfite conversion of hmC to CMS. The group was able to raise antibodies against CMS that had higher affinity than antibodies raised against hmC itself (5). “Sulfonate and arsenates form good antigens, and even a polyclonal antibody is pretty good,” Agarwal says. “So that’s one way an antibody that might be better than an hmC antibody after a relative simple chemical modification.”
But Agarwal’s group did not stop there. “One of the difficulties is that when you are trying to profile any modification, in this case a relatively new and not-well-studied modification in a genome, you need some way of measuring your degree of coverage and your degree of success,” says Agarwal.
So, at the same time, his team also pursued an alternative method that exploited a phage enzyme that adds a glucose molecule to hmC (5). Using a chemical reaction, the group then converted this sugar molecule into an aldehyde, a functional group not typically found in biopolymers such as DNA. Using a sensitive and specific aldehyde-reactive probe, two biotin molecules were added to each hmC to permit the pull-down of a single hmC with streptavidin.
Meanwhile, related techniques have been developed at other labs interested in hmC. For example, Chuan He and colleagues at the University of Chicago have developed an hmC detection technique that also uses a glucose molecule to facilitate biotin pull-down for detection and sequencing of hmC DNA fragments, and have found hmC in several new human cell lines (6). These complementary methods, which use different techniques to analyze the same genome, are providing additional reassurance that hmC distribution patterns are accurate. “All of these methods are going to be important to compare side-by-side and with subsequent methods,” Agarwal says.
As a result, these hmC-specific profiling methods are providing a glimpse of the modification’s location and function. hmC seems to be associated with poised genes, which have relatively low levels of expression in embryonic stem cells. These genes will ultimately be activated or repressed upon differentiation, but prior to that, they remain in a poised state, waiting for a go or stop signal. Peculiarly, these genes have epigenetic marks associated with activation as well as ones associated with inactivation. “These genes are in a dynamic state of being repressed. Like having the gas pedal and the brakes applied at the same time, so it’ll be ready to go just by removing one of those influences,” Agarwal says.
But proving that functionally may require generating these hmC maps at nucleotide resolution, something none of these methods can currently do in a high-throughput manner. “All of these methods say in this area that there’s an hmC, but they don’t say what base the hmC is on,” Agarwal says. “So the next best hope would be to couple these techniques with another technique that provides real-time single-molecule sequencing.”
Nucleotide resolution and beyond
“Right now, epigenetics and genomics are completely separate people,” says Pacific Biosciences CEO Stephen Turner. “Nobody will look into a bisulfite-converted sample and say ‘I’ve made a genomic discovery, but I’m missing all the cytosine bases and am basically looking at a 3-letter alphabet.’ Nobody’s going to do that.”
Unlike next-generation instruments, the PacBio RS system observes a polymerase molecule in real time as it synthesizes DNA, recording the incorporation of fluorescently labeled nucleotides without the need for amplification. This lack of amplification is key for epigenetic applications because amplification erases modifications. When the polymerase in the system encounters these preserved modifications, it slows down, providing a signal for detection.
“We see them in the same way that you could infer the presence of a speed bump in a road because cars would slow down for it,” says Turner.
But it’s not just a one-dimensional signal that simply detects the presence of a modification. The polymerase actually touches about ten bases, providing ten different opportunities for the tempo and rhythm of nucleotide incorporation to be altered. These altered kinetics create a signature for each different chemical modification, allowing researchers to distinguish each modification. In a paper published in Nature Methods in May 2010, Turner and colleagues used the kinetics test not only to detect methylation but also to distinguish between 5mC and hmC (7).
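The idea of reading a modification from polymerase timing can be sketched in a few lines. This is a simplified illustration, not Pacific Biosciences' algorithm: the function names, the timing values, and the threshold of 2.0 are all made up, and the sketch looks at one position at a time rather than the roughly ten-base signature described above.

```python
# Toy kinetic detection: compare the time between nucleotide incorporations
# in a sample against an unmodified control (all numbers are invented).
from statistics import mean

def duration_ratios(sample, control):
    """Per-position ratio of mean incorporation delay, sample over control."""
    return [mean(s) / mean(c) for s, c in zip(sample, control)]

def flag_positions(ratios, threshold=2.0):
    """Return positions where the polymerase slowed by at least the threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]

# Each inner list holds repeated delay measurements (seconds) at one position.
control = [[0.10, 0.12], [0.11, 0.09], [0.10, 0.10], [0.12, 0.08]]
sample = [[0.11, 0.11], [0.10, 0.10], [0.45, 0.55], [0.10, 0.10]]

ratios = duration_ratios(sample, control)
flagged = flag_positions(ratios)
```

Here only the third position (index 2) is flagged, the equivalent of the speed bump in Turner's analogy. Telling 5mC from hmC would require comparing the pattern of ratios across neighboring positions, which this sketch does not attempt.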
Since then, Pacific Biosciences has been working with several groups—including He’s lab at the University of Chicago—to develop target enrichment strategies to select DNA regions that contain hmC and enrich them for single-molecule real-time sequencing on their system.
“These targeted enrichment strategies have been published in conjunction with second generation sequencing technologies, but you don’t get the base-resolution. You don’t get real sequencing,” says Pacific Biosciences principal scientist Jonas Korlach.
Incidentally, these targeted enrichment techniques will make modification detection more robust. Following enrichment, the modification is made much larger, increasing the size of the kinetic signature while not disrupting nucleotide incorporation by the polymerase. In the end, target enrichment should provide more confident hmC calls with lower coverage.
“The kinetic molecule is so big that we can detect it on just one molecule. That’s unprecedented,” says Turner.
And the PacBio system may extend beyond the detection of hmC. For example, there are modified DNA nucleotides associated with DNA damage, such as 8-oxoguanine (8-oxoG), a common lesion that has been implicated in aging, Parkinson’s disease, and Alzheimer’s disease, as well as other diseases. Currently no sequencing technique can distinguish 8-oxoG. But Pacific Biosciences claims that it has already mapped 8-oxoG across the mitochondrial genome using its system.
The PacBio system may also pave the way for the discovery of new modifications. When Pacific Biosciences chief scientific officer Eric Schadt finished sequencing the bacterium Ranunculaveae polustrus, he decided to try out the new modification detection feature and found thousands of chemical modifications. Some made sense, such as N6-methylation of GATC motifs, but Schadt had no idea what about half of them were or what they were doing. There may be many more DNA modifications hidden in genomes that are important but invisible to our current detection techniques.
“The conception of DNA only having four bases, that picture of DNA may disappear,” says Turner. “This goes beyond just hmC.”
1. Kriaucionis, S. and N. Heintz. 2009. The nuclear DNA base 5-hmC is present in Purkinje neurons and the brain. Science 324:929-930.
2. Tahiliani, M., K.P. Koh, Y. Shen, W.A. Pastor, H. Bandukwala, Y. Brudno, S. Agarwal, L.M. Iyer, et al. 2009. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324:930-935.
3. Mayer, W., A. Niveleau, J. Walter, R. Fundele, and T. Haaf. 2000. Demethylation of the zygotic paternal genome. Nature 403:501-2.
4. Nestor, C., A. Ruzov, R. Meehan, and D. Dunican. 2010. Enzymatic approaches and bisulfite sequencing cannot distinguish between 5mC and 5-hmC in DNA. BioTechniques 48:317-9.
5. Pastor, W.A., U.J. Pape, Y. Huang, H.R. Henderson, R. Lister, M. Ko, E.M. McLoughlin, Y. Brudno, et al. 2011. Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 473:394-7.
6. Song, C.X., K.E. Szulwach, Y. Fu, Q. Dai, C. Yi, X. Li, Y. Li, C.H. Chen, W. Zhang, X. Jian, et al. 2011. Selective chemical labeling reveals the genome-wide distribution of 5-hmC. Nat Biotechnol. 29:68-72.
7. Flusberg, B.A., D.R. Webster, J.H. Lee, K.J. Travers, E.C. Olivares, T.A. Clark, J. Korlach, and S.W. Turner. 2010. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7:461-5.