Capturing the Perfect Reference Genome
Andrew S. Wiecek
So, in an attempt to improve the reference sequence for their community, Hassanin and his colleagues have decided to try adding a new link to each accession number in the nucleotide database, which they call “external expertise”: continual updates by non-anonymous researchers that either validate good-quality data or point out problems with a particular sequence.

But Gaudet is not so sure that these comments on data quality will be prominent enough. “It's interesting, but you have to kind of happen by chance on it,” she says. “It's not indexed anywhere, so it's kind of a dead-end way to handle this.” The challenge, she says, is to make such evaluations more powerful than just a comment.

There will always be imperfect, unannotated data, so Gaudet believes that biocurators will need some way to represent the quality of the sequence, assembly, and annotation in the future. She suggests that, once enough data has been compiled, it may be possible to create a confidence index to assess the quality of a new submission. “I haven't seen anyone do this yet,” says Gaudet. “There's a lot of poorly sequenced, poorly annotated data out there, so we're going to need to have a way to prove this a lot better than we do right now.”

As for the goat genome, it is not clear whether the community will be large enough to support depositing and annotating its genomic sequences in RefSeq. But there is some hope on the horizon: as the cost of sequencing continues to drop, coverage of these now-outlier organisms should expand, ideally providing better-quality references in the future.

