2Department of Infectious Diseases, King's College London School of Medicine, United Kingdom
The inverted terminal repeats (ITRs) of adeno-associated virus (AAV) are notoriously difficult to sequence owing to their high GC content (70%) and palindromic sequences, which form a very stable, 125 bp T-shaped hairpin structure. Here we evaluate the performance of two widely used next-generation sequencing platforms, 454 GS FLX (Roche) and MiSeq Benchtop Sequencer (Illumina), in sequencing ITRs by comparing their results on linear amplification-mediated PCR (LAM-PCR) amplicons derived from AAV concatemeric structures. While our data indicate that both platforms can sequence complete ITRs, they differed in efficiency (MiSeq: 0.11% of sequence reads; 454: 0.02% of reads), frequency (MiSeq: 171 full ITRs; 454: 3 full ITRs), and rate of deviation from the derived ITR consensus sequence (MiSeq: 0.8%–1.3%; 454: 0.5%). These results suggest that next-generation sequencing platforms can be used to specifically detect ITR mutations and sequence complete ITRs.
Recombinant AAV vectors (rAAVs) represent a promising tool for therapeutic transgene delivery (1). The recent success of rAAV vectors as the first gene therapy pharmacological agent of the Western world underscores the importance and potential of rAAV-mediated gene therapy (2, 3). To date, AAV gene therapy has been successfully used for the treatment of Leber's congenital amaurosis (4, 5), α-1 antitrypsin deficiency (6), hemophilia B (7), and heart failure (8). However, two recent studies have also raised concerns regarding the safety of AAV-mediated gene therapy by reporting the development of hepatocellular carcinoma in murine models. Such findings emphasize the need for comprehensive sequence analysis of AAV vector integration and persistence (9, 10).
Both wildtype AAV (wtAAV) and rAAV genomes are flanked by 145 bp ITRs, which are crucial for virus replication and packaging. These ITRs have a GC content of 70% and form 125 bp T-shaped hairpin structures (11). These properties make it difficult both to verify ITR integrity and to sequence the ITRs. Intact and mutation-free ITRs are required to achieve high-titer AAV vector production (12). Importantly, an inability to accurately sequence ITRs could impair the identification of AAV integration site (IS) sequences, in particular if viral integrants possess full, or almost complete, ITR sequences. Previously, complete ITR sequences could only be obtained using the Maxam-Gilbert sequencing approach (13, 14) or Sanger sequencing with the addition of betaine or 7-deaza-dGTP (15). Amplification and sequencing of AAV ITRs is required for studying AAV biology and for IS analyses that assess AAV safety in translational research (16-18).
We recently demonstrated that ITR PCR amplification can be accomplished with an adequate efficiency (2), and full ITR sequencing is possible using the 454 next-generation sequencing platform (19). However, a comparison of the abilities of different next-generation sequencing platforms to sequence the ITR GC-rich hairpin structures has not been conducted to date. Here, we evaluate the performance of the 454 and MiSeq next-generation sequencing systems for sequencing ITRs.
We infected HeLa cells and human dermal fibroblasts (HDFs) in triplicate with wtAAV2 at a multiplicity of infection (MOI) of 10^4. The cells were passaged 5 times, total DNA was subsequently harvested, and LAM-PCR was performed on 0.5 µg of DNA (20). LAM-PCR primers that bind close to the AAV 5′ ITR were used. LAM-PCR amplicons were sequenced with the MiSeq and 454 sequencing platforms, and raw sequence reads were processed as previously described (Figure 1A) (2). Samtools and Integrative Genomics Viewer were used to visualize sequencing coverage data (21, 22). LAM-PCR amplicons contain vector-genome junctions and vector-vector junctions. The genomic fragments of vector-genome junctions can be unambiguously mapped to the human genome to determine viral ISs. For our analysis of ITR sequencing integrity, we investigated a total of 14,768 sequence reads from 7 LAM-PCR libraries sequenced on the 454 platform and 419,209 MiSeq reads from 28 LAM-PCR libraries, derived from 2 cell types (HeLa cells and HDFs).
The high GC content and T-hairpin structures in AAV inverted terminal repeats (ITRs) have made the successful sequencing of complete ITRs challenging. We performed LAM-PCR on DNA from wtAAV2-infected cells and directly compared the performance of two next-generation sequencing (NGS) platforms, GS FLX (Roche) and MiSeq (Illumina). Though efficiencies, frequencies, and rates of deviation varied, both NGS platforms can be used to specifically detect ITR mutations and sequence complete ITRs.
We were able to successfully sequence complete ITRs with both sequencing platforms. However, the MiSeq platform sequenced complete ITRs with an efficiency more than 5 times that of the 454 device (0.11% versus 0.02%) (Figure 1C, Table 1). Considering the higher data output of the MiSeq, the MiSeq platform is capable of providing 106-fold more ITR sequences per run [(13.5 × 10^6 reads / 0.7 × 10^6 reads) × (0.11% / 0.02%)] (Figure 1D). Notably, the cost per sequenced nucleotide can be up to two orders of magnitude lower with the MiSeq platform than with the 454 system.
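The 106-fold figure follows directly from the two ratios quoted above. As a quick check of that arithmetic, the short sketch below multiplies the per-run read output ratio by the ratio of full-ITR efficiencies; the variable names are illustrative and not taken from the study.

```python
# Values quoted in the text; variable names are illustrative only.
miseq_reads_per_run = 13.5e6
gs454_reads_per_run = 0.7e6
miseq_full_itr_fraction = 0.11 / 100   # 0.11% of reads contain a full ITR
gs454_full_itr_fraction = 0.02 / 100   # 0.02% of reads contain a full ITR

# Ratio of raw read output per run (MiSeq relative to 454).
throughput_ratio = miseq_reads_per_run / gs454_reads_per_run

# Ratio of the fraction of reads that contain a complete ITR.
efficiency_ratio = miseq_full_itr_fraction / gs454_full_itr_fraction

# Expected fold difference in complete ITR sequences per run.
fold_more_itrs = throughput_ratio * efficiency_ratio

print(f"throughput ratio: {throughput_ratio:.1f}")
print(f"efficiency ratio: {efficiency_ratio:.1f}")
print(f"complete ITRs per run, MiSeq vs 454: {fold_more_itrs:.0f}-fold")
```

The read-output ratio (about 19) contributes more to the overall difference than the efficiency ratio (5.5).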
We next performed a multiple alignment of concatemer-derived ITR sequences to confirm that the derived consensus sequence was identical to our wtAAV plasmid map. Individual ITR sequences from both platforms did deviate from the derived consensus sequence (Figure 1E). Deviations may have been due to ITR recombination, erroneous template amplification, or sequencing errors. The Roche 454 reads deviated from the consensus sequence at a rate of 0.5% per ITR base pair, while the Illumina MiSeq reads showed a slightly higher deviation rate of 1.1% (0.3–1.4; 25th, 75th percentile) per ITR base pair, corresponding to an average of 1.6 and 0.7 deviating base pairs per ITR for MiSeq and 454, respectively (Table 1). These data indicate that high-quality, complete ITR sequences can be obtained using both platforms.
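The per-read deviation rate described above can be illustrated with a minimal sketch. It assumes that the reads are already aligned to equal length and that the consensus is taken by simple majority vote per alignment column; the alignment tool and consensus rule are not specified here, so both are assumptions. The function names and the toy reads are hypothetical and are not real ITR sequence.

```python
from collections import Counter

ITR_LENGTH = 145  # bp, ITR length as stated in the text


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads (assumed rule)."""
    columns = zip(*aligned_reads)
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def deviation_rate(read, reference):
    """Fraction of aligned positions at which the read differs from the reference."""
    mismatches = sum(a != b for a, b in zip(read, reference))
    return mismatches / len(reference)


def expected_deviating_bases(rate_per_bp, length=ITR_LENGTH):
    """Convert a per-base deviation rate into deviating bases per ITR."""
    return rate_per_bp * length


# Toy, hypothetical reads: two identical, two with one substitution each.
toy_reads = ["GGCCACTCCC", "GGCCACTCCC", "GGCCTCTCCC", "GGCCACTCCG"]
toy_consensus = consensus(toy_reads)
toy_rates = [deviation_rate(r, toy_consensus) for r in toy_reads]

print(toy_consensus, toy_rates)
print(round(expected_deviating_bases(0.011), 1))  # MiSeq rate from the text
print(round(expected_deviating_bases(0.005), 1))  # 454 rate from the text
```

Multiplying the quoted rates by the 145 bp ITR length gives about 1.6 (MiSeq) and 0.7 (454) deviating base pairs per ITR, in line with the averages reported above.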
We also performed direct MiSeq sequencing of the wtAAV plasmid to assess the limitations of direct ITR sequencing. Sequence coverage was significantly reduced in ITR regions (Figure 2A) when using direct sequencing. Subsequently, we analyzed the distribution of sequence deviations from complete ITRs acquired by MiSeq sequencing of the wtAAV plasmid (22,844 sequence reads) (Figure 2B) and from LAM-amplicons (726 sequence reads) (Figure 1D). The 726 LAM-amplicon sequence reads represented 82 biologically existent concatemeric structures. When analyzing the locations of the sequence deviations, we noticed varying rates of sequence deviation in different parts of the ITR, which could be overlaid to show a gradual decline in sequencing quality over length. In general, LAM-amplicon reads displayed lower rates of deviation than direct plasmid sequencing reads. Unambiguous sequencing of ITRs requires a minimal sequencing depth that depends on the deviation percentage.
We developed a statistical model to estimate the minimal ITR coverage required to retrieve a correct ITR consensus sequence. If the ITR coverage is N, the probability that the consensus is correct is given by the equation:
where p_i is the probability that the base at ITR position i deviates from the correct sequence, j is the number of deviating sequences at a given position, and q_i = 1 - p_i (Supplementary Material). A 60 bp moving average was computed from the sequence deviation distribution in order to extract p_i. We determined the minimum ITR coverage for error-free sequencing (P value < 0.01) to be 5 sequences for LAM-amplicons sequenced by the MiSeq sequencing platform. It should be noted that the sequencing depth needed for unambiguous sequence determination may vary depending on sequencing techniques and thus deviation percentages.
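To make the reasoning behind such a coverage estimate concrete, the sketch below implements one model that is consistent with the symbols defined above. It is an assumption, not the equation from the Supplementary Material: the consensus at position i is taken to be correct when fewer than half of the N reads deviate there, positions are treated as independent, and all deviating reads are conservatively assumed to agree on the same wrong base. Under these assumptions, P(N) = prod_i sum_{j=0}^{floor((N-1)/2)} C(N, j) p_i^j q_i^(N-j). All function names and the `alpha` and `n_max` parameters are hypothetical.

```python
from math import comb, prod


def moving_average(values, window=60):
    """Moving average over `window` positions, truncated at the sequence ends."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def position_correct(p_i, n):
    """P(fewer than half of n reads deviate at one position); assumed majority rule."""
    q_i = 1.0 - p_i
    max_wrong = (n - 1) // 2  # ties at even n are counted as failures
    return sum(comb(n, j) * p_i**j * q_i**(n - j) for j in range(max_wrong + 1))


def consensus_correct(p, n):
    """P(consensus correct at every position), assuming independent positions."""
    return prod(position_correct(p_i, n) for p_i in p)


def minimal_coverage(p, alpha=0.01, n_max=200):
    """Smallest coverage n whose consensus error probability is below alpha."""
    for n in range(1, n_max + 1):
        if 1.0 - consensus_correct(p, n) < alpha:
            return n
    return None


# Illustrative input: a flat 1.1% deviation rate across a 145 bp ITR.
flat_profile = moving_average([0.011] * 145, window=60)
print(minimal_coverage(flat_profile))
```

In practice the per-position profile would come from the observed deviation distribution rather than a flat rate. For the flat 1.1% profile the sketch returns a coverage of 5, but that agreement with the figure quoted above does not by itself validate the assumed form of the model.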
Our data demonstrate that ITR mutations can easily be detected at the sequencing depth normally achieved when using the MiSeq platform, despite a nonhomogeneous distribution of sequence deviations in the ITR. Since ITR mutations can be reliably detected by next-generation sequencing platforms, these instruments provide an easy, time-saving method to screen AAV genome integrity or AAV production plasmids for mutations. Importantly, AAV integrants with complete or almost complete ITRs can be amplified and sequenced, and AAV vector genome fusion sequences can be detected reliably, which has important implications for comprehensive AAV persistence analyses, AAV biology research, and risk evaluation associated with AAV-mediated gene therapy.
Author contributions
K.P. performed experiments, analyzed the data, and wrote the manuscript. R.G., C.K., and A.N. provided valuable discussion and intellectual input. R.F. supplied valuable bioinformatics analysis. E.H. and R.M.L. gave valuable material and conceptual advice. M.S. set up the experimental design and wrote the manuscript.
K.P. is the holder of a stipend (No. 110410) from the German Cancer Aid. This study has been supported by funds from the German Cancer Aid and from the European Commission project AIPGENE (grant FP7-HEALTH-2010-261506).
The authors declare no competing interests.
Address correspondence to Manfred Schmidt, Department of Translational Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany. E-mail: [email protected]
1.) Mingozzi, F., and K.A. High. 2011. Therapeutic in vivo gene transfer for genetic disease using AAV: progress and challenges. Nat. Rev. Genet. 12:341-355.
2.) Kaeppel, C., S.G. Beattie, R. Fronza, R. van Logtenstein, F. Salmon, S. Schmidt, S. Wolf, A. Nowrouzi. 2013. A largely random AAV integration profile after LPLD gene therapy. Nat. Med. 19:889-891.
3.) Gaudet, D., J. Methot, S. Dery, D. Brisson, C. Essiembre, G. Tremblay, K. Tremblay, J. de Wal. 2013. Efficacy and long-term safety of alipogene tiparvovec (AAV1-LPLS447X) gene therapy for lipoprotein lipase deficiency: an open-label trial. Gene Ther. 20:361-369.
4.) Maguire, A.M., F. Simonelli, E.A. Pierce, E.N. Pugh, F. Mingozzi, J. Bennicelli, S. Banfi, K.A. Marshall. 2008. Safety and efficacy of gene transfer for Leber's congenital amaurosis. N. Engl. J. Med. 358:2240-2248.
5.) Bainbridge, J.W., A.J. Smith, S.S. Barker, S. Robbie, R. Henderson, K. Balaggan, A. Viswanathan, G.E. Holder. 2008. Effect of gene therapy on visual function in Leber's congenital amaurosis. N. Engl. J. Med. 358:2231-2239.
6.) Flotte, T.R., B.C. Trapnell, M. Humphries, B. Carey, R. Calcedo, F. Rouhani, M. Campbell-Thompson, A.T. Yachnis. 2011. Phase 2 clinical trial of a recombinant adeno-associated viral vector expressing alpha1-antitrypsin: interim results. Hum. Gene Ther. 22:1239-1247.
7.) Nathwani, A.C., E.G. Tuddenham, S. Rangarajan, C. Rosales, J. McIntosh, D.C. Linch, P. Chowdary, A. Riddell. 2011. Adenovirus-associated virus vector-mediated gene transfer in hemophilia B. N. Engl. J. Med. 365:2357-2365.
8.) Jessup, M., B. Greenberg, D. Mancini, T. Cappola, D.F. Pauly, B. Jaski, A. Yaroshinsky, K.M. Zsebo. 2011. Calcium Upregulation by Percutaneous Administration of Gene Therapy in Cardiac Disease (CUPID): a phase 2 trial of intracoronary gene therapy of sarcoplasmic reticulum Ca2+-ATPase in patients with advanced heart failure. Circulation 124:304-313.
9.) Donsante, A., D.G. Miller, Y. Li, C. Vogler, E.M. Brunt, D.W. Russell, and M.S. Sands. 2007. AAV vector integration sites in mouse hepatocellular carcinoma. Science 317:477.
10.) Rosas, L.E., J.L. Grieves, K. Zaraspe, K.M. La Perle, H. Fu, and D.M. McCarty. 2012. Patterns of scAAV vector insertion associated with oncogenic events in a mouse model for genotoxicity. Mol. Ther. 20:2098-2110.
11.) Henckaerts, E., and R.M. Linden. 2010. Adeno-associated virus: a key to the human genome? Future Virol. 5:555-574.
12.) Wang, X.S., S. Ponnazhagan, and A. Srivastava. 1996. Rescue and replication of adeno-associated virus type 2 as well as vector DNA sequences from recombinant plasmids containing deletions in the viral inverted terminal repeats: selective encapsidation of viral genomes in progeny virions. J. Virol. 70:1668-1677.
13.) Srivastava, A., E.W. Lusby, and K.I. Berns. 1983. Nucleotide sequence and organization of the adeno-associated virus 2 genome. J. Virol. 45:555-564.
14.) Duan, D., Z. Yan, Y. Yue, and J.F. Engelhardt. 1999. Structural analysis of adeno-associated virus transduction circular intermediates. Virology 261:8-14.
15.) Mroske, C., H. Rivera, T. Ul-Hasan, S. Chatterjee, and K.K. Wong. 2012. A capillary electrophoresis sequencing method for the identification of mutations in the inverted terminal repeats of adeno-associated virus. Hum. Gene Ther. Methods 23:128-136.
16.) Hüser, D., A. Gogol-Doring, T. Lutter, S. Weger, K. Winter, E.M. Hammer, T. Cathomen, K. Reinert, and R. Heilbronn. 2010. Integration preferences of wildtype AAV-2 for consensus rep-binding sites at numerous loci in the human genome. PLoS Pathog. 6:e1000985.
17.) Li, H., N. Malani, S.R. Hamilton, A. Schlachterman, G. Bussadori, S.E. Edmonson, R. Shah, V.R. Arruda. 2011. Assessing the potential for AAV vector genotoxicity in a murine model. Blood 117:3311-3319.
18.) Wang, Z., L. Lisowski, M.J. Finegold, H. Nakai, M.A. Kay, and M. Grompe. 2012. AAV vectors containing rDNA homology display increased chromosomal integration and transgene persistence. Mol. Ther. 20:1902-1911.
19.) Nowrouzi, A., M. Penaud-Budloo, C. Kaeppel, U. Appelt, C. Le Guiner, P. Moullier, C. von Kalle, R.O. Snyder, and M. Schmidt. 2012. Integration frequency and intermolecular recombination of rAAV vectors in non-human primate skeletal muscle and liver. Mol. Ther. 20:1177-1186.
20.) Schmidt, M., K. Schwarzwaelder, C. Bartholomae, K. Zaoui, C. Ball, I. Pilz, S. Braun, H. Glimm, and C. von Kalle. 2007. High-resolution insertion-site analysis by linear amplification-mediated PCR (LAM-PCR). Nat. Methods 4:1051-1057.
21.) Robinson, J.T., H. Thorvaldsdottir, W. Winckler, M. Guttman, E.S. Lander, G. Getz, and J.P. Mesirov. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24-26.
22.) Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.