The average number of total raw sequenced reads required to correctly identify the CODIS STR alleles with greater than 99% confidence in each sample was less than one million. In addition, as few as 18,500 reads mapped to the entire in silico reference were sufficient to make the same calls. However, the Monte Carlo model further illustrated the limitations in determining longer alleles of D21S11 and the potential for allele drop-out at D18S51, as increasing the number of reads failed to provide 100% accurate genotyping for these loci in this sample set (Figure 2). Similar results were obtained when subsets (10% and 1%) of the total sequence reads were submitted to the reference aligner (Supplementary Table S4). These data showed that significantly fewer reads than were initially generated were required to make STR genotyping calls by the NGS approach. Consequently, to maximize the number of individuals included in a single sequencing run and thereby reduce the cost of analyzing STRs by this method, we recommend multiplexing samples within a large cohort of individuals.
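The read-depth sensitivity analysis described above can be illustrated with a minimal Monte Carlo downsampling sketch. This is not the model used in the study: the read pool, the locus and allele labels, the `min_reads` and `het_ratio` calling thresholds, and the function names are all hypothetical and chosen only to show the idea of repeatedly subsampling reads and checking whether the full-depth genotype is recovered.

```python
import random
from collections import Counter

def call_genotype(alleles, min_reads=2, het_ratio=0.3):
    """Call a genotype from the allele labels of the reads at one locus.

    Hypothetical rule: the top allele needs at least `min_reads` reads; a
    second allele is called only if it also has `min_reads` reads and at
    least `het_ratio` times the depth of the top allele.
    """
    counts = Counter(alleles).most_common(2)
    if not counts or counts[0][1] < min_reads:
        return None  # too few reads to make a call
    top_allele, top_depth = counts[0]
    if len(counts) == 2:
        second_allele, second_depth = counts[1]
        if second_depth >= min_reads and second_depth >= het_ratio * top_depth:
            return tuple(sorted((top_allele, second_allele)))
    return (top_allele, top_allele)

def concordance(reads, truth, n_reads, trials=200, seed=0):
    """Fraction of random subsamples of size n_reads that reproduce `truth`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        by_locus = {}
        for locus, allele in rng.sample(reads, n_reads):
            by_locus.setdefault(locus, []).append(allele)
        calls = {locus: call_genotype(by_locus.get(locus, [])) for locus in truth}
        hits += calls == truth
    return hits / trials

# Toy read pool: (locus, allele) per read; (None, None) marks reads that do
# not map to any STR reference. All counts here are invented for illustration.
reads = ([("D18S51", 12)] * 40 + [("D18S51", 15)] * 40 +
         [("TH01", 7)] * 80 + [(None, None)] * 1840)
truth = {"D18S51": (12, 15), "TH01": (7, 7)}

for depth in (50, 200, 1000, len(reads)):
    print(depth, concordance(reads, truth, depth))
```

Under these assumptions, a heterozygous locus drops out whenever the subsample contains too few reads of one allele, so the smallest depth at which concordance exceeds a chosen threshold (for example 99%) gives an estimate of the reads required per sample.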
Here we have demonstrated the potential of using a widely adopted NGS platform for accurately genotyping the CODIS STR loci. Because of the short sequence reads produced by this platform, sequencing STR loci was previously assumed to be impractical given their relatively large repeat size (14). The D21S11 locus was the only problematic STR in this regard, as repeat regions extending beyond the 150 bp maximum read length were not resolved. Paired-end (PE) sequencing with overlapping reads could possibly be beneficial for the Illumina GAIIx platform, potentially generating assembled reads longer than 150 bp. However, with simple repeats, the degree of overlap between the paired ends cannot be determined with confidence. Furthermore, STR analysis would not profit from nonoverlapping PE reads with unsequenced inserts, since the reference alignment method requires a contiguous sequence covering the entire STR repeat and some of the unique flanking regions both up- and downstream of the repeat. A final benefit of using the Illumina NGS platform in STR genotyping analyses is the high level of coverage obtained from one sequencing run. Not only did the large volume of sequence data allow alleles at STR loci to be called with a high probability of accuracy, it is also amenable to genotyping a large number of individuals. Based on the sensitivity analysis for determining minimum read lengths necessary for accurately calling alleles for a given STR locus, we estimate that >300 individuals could theoretically be STR-genotyped using eight lanes (one flow cell) of an Illumina GAIIx, thereby providing significant cost and time savings. Alternatively, the massive sequencing bandwidth could be used to evaluate other forensically relevant genetic features, such as ancestry informative markers or forensic DNA phenotypic markers.
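The difficulty of merging overlapping paired-end reads across a simple repeat can be made concrete with a small sketch. The flanking sequences, the repeat unit, the read layout, and the function names below are hypothetical and are not taken from the study; the point is only that when both reads end inside the repeat, several overlap lengths fit equally well and each implies a different allele.

```python
def candidate_overlaps(fwd, rev_rc, min_overlap=8):
    """Return every overlap length k where the last k bases of the forward
    read exactly match the first k bases of the reverse-complemented mate."""
    longest = min(len(fwd), len(rev_rc))
    return [k for k in range(min_overlap, longest + 1)
            if fwd[-k:] == rev_rc[:k]]

def merge(fwd, rev_rc, k):
    """Join the two reads assuming they overlap by k bases."""
    return fwd + rev_rc[k:]

# Hypothetical locus: unique flanks on either side of a tetranucleotide repeat.
unit = "TCTA"
left_flank, right_flank = "GACCTGAA", "CCATGGTT"
fwd = left_flank + unit * 6       # read 1 runs from the left flank into the repeat
rev_rc = unit * 6 + right_flank   # read 2 (reverse-complemented) runs out of the repeat

overlaps = candidate_overlaps(fwd, rev_rc)
repeat_counts = []
for k in overlaps:
    merged = merge(fwd, rev_rc, k)
    n_units = (len(merged) - len(left_flank) - len(right_flank)) // len(unit)
    repeat_counts.append(n_units)
    print(f"overlap {k:2d} bp -> merged read implies {n_units} repeat units")
```

In this toy case every overlap that is a multiple of the repeat unit is an exact match, so the same read pair is consistent with alleles of 6 to 10 repeats, and a longer allele with no true overlap at all cannot be excluded either.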
This research was funded by an internal research and development program at Battelle Memorial Institute. PSY and BR were supported by NCI Comprehensive Cancer Center Support Grant P30 CA016058.
The authors declare no competing interests. The authors and their institutions do not specifically endorse any of the third-party products or technologies described in this report.
Address correspondence to Seth A. Faith, Battelle Memorial Institute, 505 King Avenue, Columbus, OH, USA. e-mail: [email protected]
1.) Budowle, B., A. Masibay, S.J. Anderson, C. Barna, L. Biega, S. Brenneke, B.L. Brown, J. Cramer. 2001. STR primer concordance study. Forensic Sci. Int. 124:47-54.
2.) Jobling, M.A., and P. Gill. 2004. Encoded evidence: DNA in forensic analysis. Nat. Rev. Genet. 5:739-751.
3.) Butler, J.M. 2007. Short tandem repeat typing technologies used in human identity testing. BioTechniques 43:ii-v.
4.) Ellegren, H. 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5:435-445.
5.) Budowle, B., T.R. Moretti, S.J. Niezgoda, and B.L. Brown. 1998. CODIS and PCR-based short tandem repeat loci: law enforcement tools, p. 73-88. In Proceedings of the Second European Symposium on Human Identification (June 1998, Innsbruck, Austria). Promega Corporation, Madison, WI.
6.) Butler, J.M., E. Buel, F. Crivellente, and B.R. McCord. 2004. Forensic DNA typing by capillary electrophoresis using the ABI Prism 310 and 3100 genetic analyzers for STR analysis. Electrophoresis 25:1397-1412.
7.) Fregeau, C.J., and R.M. Fourney. 1993. DNA typing with fluorescently tagged short tandem repeats: a sensitive and accurate approach to human identification. BioTechniques 15:100-119.
8.) Kimpton, C.P., P. Gill, A. Walton, A. Urquhart, E.S. Millican, and M. Adams. 1993. Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Methods Appl. 3:13-22.
9.) Edwards, A., A. Civitello, H.A. Hammond, and C.T. Caskey. 1991. DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet. 49:746-756.
10.) Ge, J., B. Budowle, and R. Chakraborty. 2010. Interpreting Y chromosome STR haplotype mixture. Leg Med (Tokyo) 12:137-143.
11.) Pitterl, F., K. Schmidt, G. Huber, B. Zimmermann, R. Delport, S. Amory, B. Ludes, H. Oberacher, and W. Parson. 2010. Increasing the discrimination power of forensic STR testing by employing high-performance mass spectrometry, as illustrated in indigenous South African and Central Asian populations. Int. J. Legal Med. 124:551-558.
12.) Zietkiewicz, E., E. Zietkiewicz, M. Witt, P. Daca, J. Zebracka-Gala, M. Goniewicz, and B. Jarzab. 2012. Current genetic methodologies in the identification of disaster victims and in forensic analysis. J. Appl. Genet. 53:41-60.
13.) Divne, A.M., H. Edlund, and M. Allen. 2010. Forensic analysis of autosomal STR markers using Pyrosequencing. Forensic Sci. Int. Genet. 4:122-129.
14.) Fordyce, S.L., M.C. Avila-Arcos, E. Rockenbauer, C. Borsting, R. Frank-Hansen, F.T. Petersen, E. Willerslev, A.J. Hansen. 2011. High-throughput sequencing of core STR loci for forensic genetic investigations using the Roche Genome Sequencer FLX platform. BioTechniques 51:127-133.
15.) McGuigan, J., N. Shouyong, S. Liu, E. Bouton, M. Manion, C.S.J. Liu, and M. Holland. 2011. Human Identity Analysis with Next GENe Software Using Pyrosequencing Reads, SoftGenetics. State College, PA.
16.) Hert, D.G., C.P. Fredlake, and A.E. Barron. 2008. Advantages and limitations of next-generation sequencing technologies: a comparison of electrophoresis and non-electrophoresis methods. Electrophoresis 29:4618-4626.
18.) Greenspoon, S.A., J.D. Ban, L. Pablo, C.A. Crouse, F.G. Kist, C.S. Tomsey, A.L. Glessner, L.R. Mihalacki. 2004. Validation and implementation of the PowerPlex16 BIO System STR multiplex for forensic casework. J. Forensic Sci. 49:71-80.
19.) Langmead, B., C. Trapnell, M. Pop, and S.L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10:R25.
20.) Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics 25:2078-2079.
21.) Quinlan, A.R., and I.M. Hall. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842.
22.) Ihaka, R., and R. Gentleman. 1996. R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5:299-314.
23.) Budowle, B., B. Shea, S. Niezgoda, and R. Chakraborty. 2001. CODIS STR loci data from 41 sample populations. J. Forensic Sci. 46:453-489.
24.) Griffiths, R.A., M.D. Barber, P.E. Johnson, S.M. Gillbard, M.D. Haywood, C.D. Smith, J. Arnold, T. Burke. 1998. New reference allelic ladders to improve allelic designation in a multiplex STR system. Int. J. Legal Med. 111:267-272.
25.) Weir, B.S., C.M. Triggs, L. Starling, L.I. Stowell, K.A. Walsh, and J. Buckleton. 1997. Interpreting DNA mixtures. J. Forensic Sci. 42:213-222.
26.) Evett, I.W., C. Buffery, G. Willott, and D. Stoney. 1991. A guide to interpreting single locus profiles of DNA mixtures in forensic cases. J. Forensic Sci. Soc. 31:41-47.
27.) Budowle, B., A.J. Onorato, T.F. Callaghan, A. Della Manna, A.M. Gross, R.A. Guerrieri, J.C. Luttman, and D.L. McClure. 2009. Mixture interpretation: defining the relevant features for guidelines for the assessment of mixed DNA profiles in forensic casework. J. Forensic Sci. 54:810-821.