If you want to understand chromosome biology, you’ll need to understand the centromere. First, though, you have to figure out precisely where it is. For some yeast species, centromeres may be as short as 125 bases, which can be surprisingly difficult to pinpoint.
Hi-C was originally developed to map three-dimensional chromosomal architecture, but researchers have also applied it to other complex problems such as de novo chromosome assembly and dissection of metagenomic datasets. As it turns out, centromeres in certain yeast, including Saccharomyces cerevisiae, cluster tightly together, producing spikes of inter-chromosomal contact in Hi-C datasets. On a heat map, that pattern is visible as a series of distinct “dots,” and Centurion is an algorithm designed to find them.
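To make the "dots" idea concrete, here is a minimal sketch of how a spike of inter-chromosomal contact can point to a centromere. It is not Centurion itself: the contact matrix is simulated, the peak is found by simple marginal sums rather than by the published method, and every name in it (simulate_trans_contacts, call_centromere_bins, BIN_SIZE_BP, the bin counts and bump parameters) is a hypothetical chosen for illustration.

```python
import math
import random


def simulate_trans_contacts(n_bins_a, n_bins_b, cen_a, cen_b, seed=0):
    """Build a toy contact matrix between two chromosomes.

    Assumption: a flat, noisy background plus a Gaussian bump centred on
    the bin pair where the two (hypothetical) centromeres meet.
    """
    rng = random.Random(seed)
    matrix = []
    for i in range(n_bins_a):
        row = []
        for j in range(n_bins_b):
            background = rng.uniform(0.0, 1.0)
            dist_sq = (i - cen_a) ** 2 + (j - cen_b) ** 2
            bump = 60.0 * math.exp(-dist_sq / 2.0)  # the "dot" on the heat map
            row.append(background + bump)
        matrix.append(row)
    return matrix


def call_centromere_bins(matrix):
    """Return the (row, column) bins with the largest total contact.

    Naive stand-in for a real caller: sum each row and each column and
    take the bin with the highest sum on each chromosome.
    """
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    best_row = max(range(len(row_sums)), key=row_sums.__getitem__)
    best_col = max(range(len(col_sums)), key=col_sums.__getitem__)
    return best_row, best_col


BIN_SIZE_BP = 10_000  # assumed Hi-C bin size; the call is only as fine as this

contacts = simulate_trans_contacts(40, 50, cen_a=12, cen_b=31)
bin_a, bin_b = call_centromere_bins(contacts)
print("Chromosome A centromere near", bin_a * BIN_SIZE_BP, "bp")
print("Chromosome B centromere near", bin_b * BIN_SIZE_BP, "bp")
```

Because the call is made per bin, its precision in this sketch is capped by the bin size, which is one way to see why finer resolution would be expected to sharpen the estimate.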
The team applied Centurion to Hi-C datasets from yeast with both mapped and unmapped centromeres, as well as to the malaria parasite, Plasmodium falciparum. They also studied pooled yeast genomes, a simulacrum of a metagenomic dataset. In the best cases, the algorithm placed centromeres within a kilobase of their known positions, and even the less accurate calls came within about 5 to 7 kilobases, close enough to make experimental follow-up practical. That was true even with relatively low-resolution Hi-C datasets and poor sequencing coverage, although in general, higher resolution and greater sequencing depth produced more accurate calls.
Dunham said the method may be used to drive comparative genomics studies of centromeric sequences and build more effective plasmid vectors for biotechnology applications. “Centromeres allow plasmids to be inherited stably,” Dunham explained, but for many organisms, including the biotechnologically useful Pichia pastoris, they had yet to be mapped prior to this study. Dunham plans to use the method to map other genomic elements, including replication origins.
Centurion is not the first algorithm designed to exploit the colocalization of yeast centromeres. In 2014, Romain Koszul of the Institut Pasteur in Paris and colleagues published their own algorithm for centromere finding. When Noble and Dunham compared that algorithm with Centurion, they found Centurion was more accurate, with errors four- to tenfold lower.
“Chromosomal 3D signatures can be used in many ways to solve or improve genomic limitations,” Koszul noted in an email to BioTechniques. “The key point was to realize that 3D contacts provide a robust, quantitative and objective measure that allows the characterization of these elements, and that could help many teams struggling with centromeric annotation.”
Job Dekker, co-director of the Program in Systems Biology at the University of Massachusetts Medical School, who developed Hi-C, called Centurion an exciting new “off-label” application of the technology. Relatively few organisms exhibit the kind of centromeric clustering that was exploited here, he noted. But Centurion may also be used more broadly to investigate other genomic features, even in organisms that cannot easily be cultured or for which genomic annotation is lacking.
“It’s amazing how many types of cis elements may be present in genomes that we are not aware of, because we have no functional assay for them,” Dekker said. “Who knows what else is to be discovered?”
Varoquaux, N, et al., “Accurate identification of centromere locations in yeast genomes using Hi-C,” Nucleic Acids Res, doi:10.1093/nar/gkv424, 2015.
Marie-Nelly, H, et al., “Filling annotation gaps in yeast genomes using genome-wide contact maps,” Bioinformatics, 30:2105–13, 2014.