Flu Epidemic Hits GenBank

Jim Kling

Over 10,000 influenza genomes are now in GenBank. How was this accomplished? Find out...

Right now, hospitals are turning away visitors with influenza symptoms, pharmacies are struggling to meet the demand for vaccines, and the Centers for Disease Control and Prevention has officially declared a flu epidemic. The strains of influenza currently infecting thousands of Americans are clearly straining our nation’s health care system.

The Influenza Virus Genome Sequencing Project has deposited 10,000 influenza virus genomes in GenBank. Source: CDC

To combat this threat, geneticists working on the National Institute of Allergy and Infectious Diseases’ Influenza Virus Genome Sequencing Project (IGSP) have collected extensive viral sequence data, depositing the 10,000th influenza genome in GenBank earlier this month.

The influenza genome contains about 13,500 nucleotides, and each replication introduces, on average, about one mutation per 10,000 nucleotides. As a result, a newly copied genome carries more than one new mutation on average (about 1.35), generating a tremendous amount of variation (1). In waterfowl alone, for example, there are 16 subtypes of the gene encoding hemagglutinin, the surface glycoprotein that helps the virus bind to host cells.
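The arithmetic behind that claim can be checked with a short Python sketch. It uses the two figures quoted above (a genome of about 13,500 nucleotides and about one mutation per 10,000 nucleotides per replication) and adds a simplifying assumption that is not in the article: mutations arise independently and at random, so the number per genome copy follows a Poisson distribution.

```python
import math

GENOME_LENGTH = 13_500        # nucleotides, figure quoted in the article
MUTATION_RATE = 1 / 10_000    # mutations per nucleotide per replication, quoted figure

# Expected number of new mutations in one copied genome
expected = GENOME_LENGTH * MUTATION_RATE

# Assumption: independent, rare mutations -> Poisson-distributed count,
# so P(no mutation) = exp(-expected)
p_at_least_one = 1 - math.exp(-expected)

# Same quantity without the Poisson approximation: every site must copy correctly
p_exact = 1 - (1 - MUTATION_RATE) ** GENOME_LENGTH

print(f"expected mutations per genome copy: {expected:.2f}")
print(f"P(at least one mutation), Poisson:  {p_at_least_one:.2f}")
print(f"P(at least one mutation), exact:    {p_exact:.2f}")
```

Under these assumptions a newly copied genome carries about 1.35 new mutations on average, and roughly three in four copies carry at least one. Real mutation rates vary by segment, strain, and measurement method, so treat these numbers as a rough illustration only.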

Influenza’s genome comprises eight distinct RNA segments, so when two strains infect the same cell, even strains that originated in different species, they can readily swap whole segments, a process known as reassortment. “An excellent example is the 2009 H1N1 pandemic, which had segments from avian, human, and swine viruses that associated to create a single virus,” said David E. Wentworth, director of viral programs at the J. Craig Venter Institute, whose group has sequenced about 75% of the genomes deposited by the IGSP so far. “[With the IGSP,] we can better understand how that recombination is involved in the generation of novel pandemics. Often we see that more than one gene is important in that process,” explained Wentworth.
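Segmentation is what makes reassortment so productive. As an idealized illustration (a hypothetical calculation, not one performed by the IGSP), the Python sketch below counts every genotype that could arise if each of the eight segments were inherited independently from any one of the viruses infecting the same cell. The segment names are the standard influenza A segments; the parent labels are placeholders.

```python
from itertools import product

# The eight influenza A genome segments
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortants(parents):
    """All genotypes possible if each segment is drawn independently
    from one of the co-infecting parent viruses (idealized model)."""
    return list(product(parents, repeat=len(SEGMENTS)))


two_way = reassortants(["avian", "swine"])
three_way = reassortants(["avian", "human", "swine"])

print(len(two_way))    # 2**8 = 256
print(len(three_way))  # 3**8 = 6561

# One example genotype, shown segment by segment
print(dict(zip(SEGMENTS, three_way[1])))
```

Real reassortment is far more constrained, since segment packaging signals and fitness costs limit which combinations are viable, so these counts are best read as upper bounds.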

Before sequencing, viral strains must be isolated, primers must be designed for each of the eight segments, and the genomes must be amplified. To simplify these steps, Wentworth’s team developed the multisegment reverse transcriptase PCR (m-RTPCR) approach, which amplifies all eight segments in a single reaction (2).

The method exploits a feature every influenza genome segment needs in order to replicate: 12 nucleotides at the 3’ end and 13 nucleotides at the 5’ end that pair with each other to form a partially double-stranded “panhandle” structure. Because these terminal sequences are required for replication, they are strongly conserved across all eight segments, making them ideal annealing sites for a single set of primers that amplifies every segment at once. The amplified DNA copies are then ready for sequencing.
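The logic of a single universal primer pair can be sketched in Python. The toy segments below are hypothetical: each has a made-up interior flanked by conserved termini. The 12- and 13-nucleotide sequences used here (written in the DNA alphabet) are those commonly reported for influenza A, but treat them as an assumption for illustration; the primers actually used for m-RTPCR (2) may differ, for example by added tag sequences or variant bases.

```python
# Hypothetical sketch: one conserved primer pair matches every segment.
# Assumed conserved termini of influenza A segments (DNA alphabet):
UNI12 = "AGCAAAAGCAGG"   # 5' end of the positive-sense strand; anneals to the vRNA 3' end
UNI13 = "AGTAGAAACAAGG"  # 5' end of the vRNA; anneals to the positive-sense 3' end

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplifiable(segment: str, fwd: str = UNI12, rev: str = UNI13) -> bool:
    """True if the forward primer matches the 5' end of a positive-sense
    segment and the reverse primer's binding site (its reverse complement)
    sits at the segment's 3' end."""
    return segment.startswith(fwd) and segment.endswith(revcomp(rev))


# Hypothetical positive-sense segments: made-up interiors, conserved ends
interiors = {"PB2": "ATGGAAAGAATA", "HA": "ATGAAGGCAATA", "NS": "ATGGATCCAAAC"}
segments = {name: UNI12 + core + revcomp(UNI13) for name, core in interiors.items()}

for name, seq in segments.items():
    print(name, amplifiable(seq))  # each prints True

# A sequence lacking the conserved ends is not matched by this primer pair
print("control", amplifiable("ATGGAAAGAATA"))  # prints False
```

Because the same two primers match every segment’s ends, one reaction can, in principle, copy all eight segments, which is the property the m-RTPCR approach relies on.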

Because genomes can be amplified directly from the original samples, the method also eliminates the need to grow the virus in culture before selecting subtypes for sequencing. Culturing can distort the picture: “You often select a variant that may have been under-represented in the whole population [in the patient],” said Wentworth.

The IGSP is an ongoing project that has become an important resource for basic researchers, vaccine producers, and antiviral drug developers. Ultimately, the database’s value is hard to define precisely. “So many people can do different things with the information,” said Wentworth.


  1. Nelson, M. I., S. E. Detmer, D. E. Wentworth, Y. Tan, A. Schwartzbard, R. A. Halpin, T. B. Stockwell, X. Lin, A. L. Vincent, M. R. Gramer, and E. C. Holmes. 2012. Genomic reassortment of influenza A virus in North American swine, 1998–2011. Journal of General Virology 93(Pt 12):2584-2589.
  2. Zhou, B., M. E. Donnelly, D. T. Scholes, K. St George, M. Hatta, Y. Kawaoka, and D. E. Wentworth. 2009. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and swine origin human influenza A viruses. Journal of Virology 83(19):10309-10313.