In 2000, the parlor game among geneticists around the globe was guessing the number of protein-coding genes in the human genome. The fruit fly genome had just been decoded, revealing 14,000 protein-coding genes. Guesses for the human genome ranged anywhere from 30,000 to over 100,000. But when the full human genome sequence was finally published in 2001 (1,2), researchers got quite a surprise—there were 22,000 protein-coding genes, only a few thousand more than the tiny fruit fly.
The 2001 draft sequence roughly established the overall number of protein-coding genes in the human genome. But even before its debut, others had started looking more closely at specific chromosomes to understand the relationships between gene structure and function. Two years before the complete draft sequences were produced, a group of researchers from the Sanger Centre in Cambridge, UK mapped human chromosome 22 and published the work in Nature in 1999 (3). This gave researchers their first opportunity to survey the landscape of an entire chromosome. Coupled with the emergence of DNA microarrays, it made it possible for the first time to explore transcriptional patterns along a complete human chromosome. Two groups published measurements of the global expression patterns along chromosome 22 in 2003, confirming that many protein-coding genes were expressed as previously predicted (4,5). But something else showed up in these studies: there were twice as many transcribed bases as previously reported, and many of them mapped to regions previously annotated as introns or to other portions of the genome not associated with protein-coding genes.
Starting point
John Rinn's primary lab is located in a new 18-floor building on the Harvard Medical School campus in Boston, MA; he also has lab space across town in Cambridge at the Broad Institute. He is young and energetic, and constantly on the move between the two locations. Rinn was a first author on one of the chromosome 22 papers in 2003. Although he is now an assistant professor at Beth Israel Deaconess Medical Center in Boston and an associate of the Broad Institute (his first faculty position), Rinn was just a second-year graduate student in Mike Snyder's Yale University lab in New Haven, CT when he began working through chromosome 22 in 2000.
“What we found was that there was RNA all over the place,” he recalls, unable to hide his delight. At the same time, other researchers were also picking up and cataloguing these numerous RNAs. Tom Gingeras's group (then at Affymetrix in Santa Clara, CA) published the other 2003 paper in Science, reporting that 49% of the observed transcription from chromosomes 21 and 22 fell outside any known annotation. Microarrays had demonstrated that more transcription was occurring than researchers had predicted.
While the papers intrigued a number of scientists, most dismissed those intron-transcribed regions as transcriptional noise or junk—in essence, mistakes that did not serve a functional purpose. Although criticism of his work and its significance mounted, Rinn was not deterred. “Whenever anyone says something is wrong, I am intrigued.”
Rinn finished his graduate studies in Snyder's lab by cloning a few of the longer RNA transcripts that were not associated with protein-coding genes, in an effort to study their actual sequences and exonic structures. He and his colleagues found sequence conservation with other mammalian species (which can be an indicator of functional significance) as well as other hallmarks of function, but questions remained for others looking at this work. Could all of these long RNAs be artifacts? How could so many RNAs with functional roles in the cell have been missed for so long? Skepticism mounted despite the evidence of conservation. “I heard that an artifact is an artifact, no matter how many times you show it,” recalls Rinn. He knew he would need more data to show the significance of these long noncoding RNAs.
Xisting data
In 1992, Huntington Willard at Stanford University in Palo Alto, CA identified a human gene called X-inactive specific transcript (Xist), which produces a 17-kb RNA transcript (6). The gene did not appear to encode a protein; instead, the Xist RNA seemed to have a structural role since it was localized to the nucleus.
Researchers went on to show that Xist is expressed exclusively from the inactivated X chromosome and mediates the silencing of one X chromosome in females by coating the chromosome. The long Xist transcript was one of the first long noncoding RNAs to which a specific function could be ascribed. But as the work of Rinn, Gingeras, and others found more and more long noncoding RNAs transcribed in the cell, the question in 2003 was: are there others out there like Xist?

