Segmenting the Genome

Lauren Arcuri Ware

Researchers have defined different mutational states of the genome, work that should enhance biomedical data mining in the future. Lauren Ware finds out how they did it.

Pinning down the mutation rate in a given organism has proven extraordinarily difficult for researchers. But in a recent study published in Proceedings of the National Academy of Sciences, Kateryna Makova and her colleagues at Pennsylvania State University shed some light on why this might be.

Segmentation by chromosome. Source: PNAS

While past studies have focused on regional variations in mutation rates, Makova and her team looked at four different mutation types simultaneously, something that hadn’t been done before in a single analysis: small insertions, small deletions, nucleotide substitutions, and alterations in mononucleotide microsatellite repeat number. Using a statistical technique called hidden Markov models, or HMMs, they searched for contiguous stretches of the genome that had similar mutation rates and grouped these together. The analysis revealed six different divergence profiles, dubbed “hot”, “del/sub-warm”, “ins-warm”, “cold auto”, “cold X”, and “microsatellite”.
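The segmentation idea can be illustrated with a toy example. The sketch below is not the study's model: it assumes a single divergence rate per genomic window (the study analyzed four mutation types jointly), three hypothetical states with made-up Gaussian emission parameters, and an assumed "sticky" transition probability. It uses the standard Viterbi algorithm to pick the most likely state for each window, then merges neighboring windows that share a state into segments.

```python
import math

# Hypothetical states with Gaussian emission parameters (mean, sd) for a
# per-window divergence rate. All numbers are illustrative, not from the study.
STATES = ["cold", "warm", "hot"]
EMISSION = {"cold": (0.010, 0.003), "warm": (0.020, 0.003), "hot": (0.030, 0.003)}
STAY = 0.9  # assumed probability of remaining in the same state


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_trans(a, b):
    """Log transition probability: sticky, remainder split evenly."""
    return math.log(STAY if a == b else (1 - STAY) / (len(STATES) - 1))


def viterbi(rates):
    """Return the most likely state path for a sequence of window rates."""
    if not rates:
        return []
    # Uniform start distribution over states.
    score = {s: -math.log(len(STATES)) + log_gauss(rates[0], *EMISSION[s])
             for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at window t + 1
    for x in rates[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_gauss(x, *EMISSION[s])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (start, end_exclusive, state) runs."""
    runs = []
    for i, s in enumerate(path):
        if runs and runs[-1][2] == s:
            runs[-1] = (runs[-1][0], i + 1, s)
        else:
            runs.append((i, i + 1, s))
    return runs


if __name__ == "__main__":
    # Made-up window rates: fast at both ends, slow in the middle.
    rates = [0.031, 0.029, 0.030, 0.011, 0.009, 0.010, 0.010, 0.030, 0.031]
    print(segments(viterbi(rates)))
```

Because staying in a state is far more probable than switching, a single slightly unusual window does not break a segment. That is what lets an HMM report contiguous stretches rather than a window-by-window classification.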

“The HMMs identified six regions that can explain most of the variability in the mutation rates across the genome,” says Francesca Chiaromonte, a co-author on the paper. “What we see is that we have hot and warm regions toward the tips of chromosomes, near telomeres, and cold regions either in the middle of chromosomes or on chromosome X, which is very cold. The only state that seems to be random is the microsatellite state.”

This means that mutations accumulate more quickly near the ends of chromosomes, while in the middle mutation rates are depressed. What’s more, some types of mutation occur more rapidly than others. The team also correlated the mutation types with a long list of parameters in the genomic landscape.

“We tried to relate the location of genes and functional marks to our states, and what we found is extremely interesting and somewhat unexpected,” says Chiaromonte. “We find that genes, as well as functional marks, predicted enhancers, and promoters, are overrepresented in hot regions and insertion-warm regions. So we have these hot regions that are functionally important and mutate fast, as if there would be some advantage for functional regions to mutate more quickly.”

The study offers a possible explanation as to why there have been disputes about the accuracy of mutation rate estimates. Typically, pedigree studies show higher rates of mutation than phylogenetic studies. “What we are finding is that pedigree studies have this very high enrichment of SNPs in our hot regions. That could explain why they give higher rates,” notes Chiaromonte, explaining that both types of studies are right; they are just looking at different parts of the genome.

One direct application of the group’s findings could be improved screening for disease-related gene variants. Right now, variant screens result in many false positives. But if scientists know that a particular variant occurs in a “hot” region (as determined by HMM analysis), it is more likely to be a false positive than a variant in a “cold” region, because the background mutation rate there is higher.

The new approach can also be used to further understand the genome by identifying functional regions that are not located in genes. Currently, researchers focus on areas containing genes, which are evolutionarily more conserved between species, to identify functional elements. The accuracy of this type of analysis can be improved for non-genic regions by accounting for the variations in mutation rates across the genomic landscape.


Don et al. 2013. “Segmenting the human genome based on states of neutral genetic divergence.” Proc. Natl. Acad. Sci. doi: 10.1073/pnas.1221792110

Keywords:  Genome