Lineage-based genetic markers are quite useful since the analysis of missing persons and unidentified human remains sometimes involves complex kinship scenarios. Typically, such analyses utilize genetic markers residing on the maternally inherited mitochondrial genome and the paternally inherited Y chromosome. Unfortunately, these lineage-based systems lack a high power of discrimination, so identity cannot be assigned with a high degree of confidence. Existence of haploblock structures in the human genome, revealed by using data from the International HapMap project (50,51,52,53,54), offers an opportunity to explore novel panels of SNP markers that would perform a similar function to non-autosomal lineage-based genetic markers. Several tightly linked autosomal SNPs that are inherited together form a haplotype block. As a unit, the haploblock has higher discrimination power for kinship analysis than the individual SNPs within the block. Ge et al. (55) described selection criteria for candidate haploblocks to include linkage disequilibrium of SNPs comprising the block, low and high levels of population heterogeneity, and haploblock conformance to Hardy-Weinberg equilibrium expectations. Several haploblocks have been identified, and it is likely more will be identified through additional research efforts. Haploblock panels will enable highly discriminating assays best suited for relationship testing, familial assessment, and admixture analysis.
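The claim that a haploblock is more discriminating as a unit than any single SNP within it can be illustrated with a short calculation. The sketch below assumes Hardy-Weinberg proportions and defines the power of discrimination as one minus the sum of squared genotype frequencies. The allele and haplotype frequencies are hypothetical values chosen for illustration and are not taken from Ge et al. (55).

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for a list of allele
    (or haplotype) frequencies that sum to 1."""
    freqs = []
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        if i == j:
            # Homozygote: p_i squared
            freqs.append(allele_freqs[i] ** 2)
        else:
            # Heterozygote: 2 * p_i * p_j
            freqs.append(2 * allele_freqs[i] * allele_freqs[j])
    return freqs


def power_of_discrimination(allele_freqs):
    """One minus the probability that two random, unrelated individuals
    carry the same genotype at this marker."""
    match_probability = sum(g * g for g in genotype_frequencies(allele_freqs))
    return 1.0 - match_probability


# Hypothetical single SNP with two equally frequent alleles.
pd_snp = power_of_discrimination([0.5, 0.5])

# Hypothetical haploblock with four equally frequent haplotypes.
pd_block = power_of_discrimination([0.25, 0.25, 0.25, 0.25])
```

Under these assumed frequencies the single SNP yields a power of discrimination of 0.625, whereas the four-haplotype block yields roughly 0.891. Real haploblocks will have uneven haplotype frequencies, so the gain would need to be estimated from population data.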
Investigative information
Phenotypic information from a DNA sample
When there is no suspect, SNPs that describe phenotypic traits would enable a genetic prediction of appearance for investigative leads to identify the perpetrator of a crime (3). If an individual's pigmentation, facial features and height can be predicted, investigators may be able to eliminate potential suspects, focus their search, and at a minimum help confirm or refute the more refractory eyewitness description. Once a suspect is identified, his or her reference DNA profile can be compared with the evidence profile using standard DNA markers for inculpation or exculpation. The same phenotypic SNPs could be used to facilitate facial reconstructions for identifying missing persons. The research to discover phenotypic SNPs has identified a few good candidates, but more genome-wide scans will be needed to develop a battery of informative SNPs. Because these SNPs may reside anywhere within or around a gene, analytical technologies different from those used for identity-testing and lineage SNPs will be needed. The position of the informative SNP that may infer a particular phenotype may not always be known a priori, so the assay(s) will need to be able to determine the full sequence of the gene(s) of interest; the same demand will likely hold for pharmacogenetic SNPs (see next section, below). It is likely that sequencing-based technologies will be better suited for scanning genes for informative phenotypic SNPs in forensic evidence. This sequencing approach, combined with whole-genome screens and association studies, will be used to identify the causal SNPs or SNPs in high linkage disequilibrium (LD) that can be used to predict phenotype.
Pharmacogenetic information from a DNA sample (molecular autopsy)
Genetic variation and its effects on metabolism can be applied to postmortem analysis to help resolve some cases initially believed to be suicides or classified as sudden, unexplained deaths, especially in cases where poisoning, incapacitation, inebriation, or certain diseases where pharmacotherapy is an essential treatment (such as epilepsy, depression, cardiac disease, or diabetes) are factors in the cause of death. Individuals vary in their response to drugs or physical exertion (44,45,56,57,58). Some people, for example, can metabolize a drug better than others due to pharmacogenetic SNPs in or around specific encoded enzymes (such as those in the human cytochrome P450 monooxygenase superfamily) (44,45,56,58). Those who have a genetic makeup that enables very rapid metabolism of a drug may receive no benefit from a certain administered dose. In contrast, those individuals who cannot metabolize the drug may be poisoned by accumulation or overdose. Rodriguez-Calvo et al. (59) recently reviewed the potential role for genetic analysis into the cause of sudden cardiac death (SCD). SCD is one of the most common causes of death in developed countries, and even though it is a highly heterogeneous group of diseases with variable penetrance, some elucidation of genetic associations with cardiac disease is emerging. For example, there may be explanations for sudden death in some cases of apparently healthy young people (i.e., <35 years of age). Scientists are exploring these metabolic differences among individuals and how they impact the cause of drug-related or unexplained deaths. Pharmacogenetic SNPs will eventually make their way into molecular autopsy protocols in pathology laboratories. An additional benefit is that pharmacogenetic analysis can help determine the cause and manner of death and may provide health information (certainly only via proper ethical disclosure practices) to at-risk relatives.
Expression analysis to determine tissue type
A matching DNA profile comprised of the core set of STR loci is very strong evidence regarding the source of a sample. Additional information regarding the tissue source of that sample can be useful: for example, determining whether the source of the DNA was from semen instead of saliva can help reconstruct how a sexual assault transpired. Crime scenes are rarely pristine and stains that are apparent may be human in origin or could be from other organic or inorganic sources. Being able to screen these samples for human origin and tissue specificity can reduce unnecessary DNA typing. Most presumptive and confirmatory serological tests for species specificity and tissue origin (limited to blood, semen, and saliva) are based on immunological or catalytic assays. Conventional serological methods of tissue identification are laborious, use diverse techniques, consume significant amounts of sample, and are costly. While the DNA in each tissue is essentially the same, the mRNA and protein profiles are substantially different. The differences in the proteins, which are the target of serological assays, account for the distinctive properties of the tissues. There are no confirmatory tests for some of the typically encountered tissues, such as saliva and vaginal secretions, making a serological approach to tissue identification problematic.
An alternative approach would be the use of low- to medium-density expression profiling for typing of the presence of mRNA species that are tissue-specific (60,61). Multiplex reverse transcription PCR (RT-PCR) methods for tissue identification for blood, saliva, semen, and vaginal secretions have some appeal because they can be assayed using the same platforms used for current DNA typing assays, can provide specificity for tissues of interest, and RNA can be recovered during DNA extraction thus reducing sample consumption. Work will continue on selecting genes that are expressed in only one tissue, developing assays that parallel DNA diagnostic methods, and determining the degree of stability of mRNA in aged and environmentally exposed samples. Alternatively, a tissue's proteomic profiles could be determined by mass spectrometry. The method of choice will likely depend on which species (RNA or protein) is more stable and more abundant in forensic samples.
Database searches
A number of countries have established DNA databanks that contain DNA profiles from, at a minimum, convicted offenders and forensic samples from unsolved cases (18,19). These databases are designed to help solve future crimes or identify missing persons by providing genetic investigative leads. The United States’ CODIS databank houses the largest number of DNA profiles compared with any other offender/forensic DNA database (Table 1). There are indices for crime-scene evidence, individuals convicted of felonies, arrestees (in some states), missing persons, human remains, and family members. Because of their success in providing investigative leads, these databases continue to increase in size and may provide additional information other than solely direct matching of DNA profiles for investigative leads.
In order for DNA profile databanks to be useful at a national (or international) level, standardization of the genetic markers used among laboratories was essential. In order to ensure comparability of DNA profiles across the United States, for example, the STR loci for characterizing DNA reference samples and forensic samples were standardized (18). Thirteen autosomal STR loci were selected as core markers for CODIS. (They are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11.) The number and specific loci vary between some countries but a core set is common to all forensic DNA databases. A DNA profile comprised of these thirteen STR loci often yields a strong investigative lead when a direct-match hit is obtained through a database search. However, the full capability of developing investigative leads is not exploited by searching only for direct matches.
Although the database-searching algorithms are designed to facilitate obtaining direct matches (e.g., matching a forensic sample profile with a convicted-felon reference profile) some have sought to use the large profile archive to develop potential investigative leads by identifying possible relatives of the source of an evidentiary sample through kinship or familial inferences (62). On average, close relatives (i.e., parents, offspring, and siblings) share more alleles than do unrelated individuals. Therefore, despite a lack of a direct match in a database search, a partially matching profile still may be informative. While the use of this form of database searching—known as familial searching—is being debated on legislative and civil-rights grounds, some states and countries have proceeded and identifications of relatives have been obtained. It is likely that if more kinship associations ultimately result in solving crime, there will be more motivation to further exploit familial searching.
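The observation that close relatives share more alleles than unrelated individuals can be made concrete with a simple allele-sharing count. The following is a minimal sketch, not a description of any operational database search algorithm. Kinship inference in practice generally relies on likelihood ratios rather than raw counts. The function names and the three-locus profiles are hypothetical, and each genotype is assumed to be recorded as a pair of repeat numbers.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Number of alleles (0, 1, or 2) that two genotypes at a single
    locus have in common, respecting homozygosity."""
    overlap = Counter(genotype_a) & Counter(genotype_b)
    return sum(overlap.values())


def sharing_summary(profile_a, profile_b):
    """Summarize allele sharing over the loci typed in both profiles."""
    loci = sorted(set(profile_a) & set(profile_b))
    per_locus = {
        locus: shared_alleles(profile_a[locus], profile_b[locus])
        for locus in loci
    }
    return {
        "total_shared": sum(per_locus.values()),
        "loci_with_zero_shared": sum(1 for n in per_locus.values() if n == 0),
        "per_locus": per_locus,
    }


# Hypothetical profiles; allele values are illustrative only.
evidence = {"D3S1358": (15, 16), "VWA": (17, 18), "FGA": (21, 24)}
possible_relative = {"D3S1358": (15, 15), "VWA": (18, 19), "FGA": (22, 24)}
unrelated = {"D3S1358": (17, 18), "VWA": (14, 16), "FGA": (21, 21)}

relative_summary = sharing_summary(evidence, possible_relative)
unrelated_summary = sharing_summary(evidence, unrelated)
```

In this toy example the possible relative shares one allele at every locus, the pattern expected of a parent and child in the absence of mutation, while the unrelated profile shares no allele at two of the three loci.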
In addition to the need for better searching algorithms, certain limitations exist regarding the use of familial searching. When searching large databases with the thirteen core STR loci there will be a large number of fortuitous hits (possibly hundreds) that cannot be excluded as potential relatives. Moreover, it is likely that the top hits (i.e., the strongest associations) often will be with unrelated individuals. This phenomenon is due to the fact that the thirteen STR loci are not sufficiently resolving to be an efficient screen for one-to-one kinship analysis. Another exacerbating factor is that a mutation in one of the true relatives could appear as an exclusion. Searches have to tolerate a degree of “mismatch” and thus allow for more fortuitous candidates. To overcome this limitation, additional genetic markers are needed for more efficient searching of candidates. Since most individuals represented in these DNA databanks are males and close male relatives are being sought with familial searching, the use of genetic markers on the paternally inherited Y chromosome will substantially reduce the number of candidate hits. If familial searching becomes routine, then reference samples should be typed for Y STRs in addition to the core autosomal STRs (63). To further reduce the list of candidates, identity- and kinship-testing SNPs would be good markers for additional genetic characterization. SNPs also have very low mutation rates and because of their smaller amplicon size they may provide data on substantially degraded samples. A battery of SNPs with a high power of discrimination is desired. Commercial kits are needed that provide the reagents necessary to multiplex Y chromosome STRs with the core CODIS STRs or possibly combine a large suite of SNPs (as would be developed for missing persons identifications) with the core CODIS STR loci. 
Being able to multiplex these markers would be an economic boon, enabling one analysis for both sets of markers and thus reducing labor and cost for typing reference database samples.
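The way Y-chromosome markers could prune a familial-search candidate list can be sketched as a second-stage filter. This is an illustrative sketch only: the function names are hypothetical, the allele values are invented, and the `max_mismatches` parameter is an assumption introduced here to reflect the point above that searches must tolerate some mismatch because of mutation. The locus names are real Y-STR markers used purely as labels.

```python
def y_haplotype_mismatches(hap_a, hap_b):
    """Count Y-STR loci typed in both haplotypes whose alleles differ."""
    shared_loci = set(hap_a) & set(hap_b)
    return sum(1 for locus in shared_loci if hap_a[locus] != hap_b[locus])


def filter_by_y_haplotype(evidence_y, candidates, max_mismatches=0):
    """Return the names of candidates whose Y-STR haplotype differs from
    the evidence haplotype at no more than max_mismatches loci."""
    kept = []
    for name, candidate_y in candidates.items():
        if y_haplotype_mismatches(evidence_y, candidate_y) <= max_mismatches:
            kept.append(name)
    return sorted(kept)


# Hypothetical haplotypes; allele values are illustrative only.
evidence_y = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}
candidates = {
    "candidate_A": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13},
    "candidate_B": {"DYS19": 15, "DYS390": 23, "DYS391": 11, "DYS393": 13},
    "candidate_C": {"DYS19": 14, "DYS390": 25, "DYS391": 10, "DYS393": 13},
}

strict_hits = filter_by_y_haplotype(evidence_y, candidates)
tolerant_hits = filter_by_y_haplotype(evidence_y, candidates, max_mismatches=1)
```

With no mismatches allowed only candidate_A remains. Allowing one mismatch also retains candidate_C, which shows the trade-off between tolerating mutation and admitting additional fortuitous candidates.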
Microbial forensics and high-throughput sequencing
The threat of terrorist or criminal use of microorganisms and their toxins is a great concern for biodefense and biosecurity worldwide (64,65). The anthrax-letters attack of 2001 demonstrated the public's vulnerability to such attacks and the U.S. government's inability to forensically investigate the evidence for attribution purposes (66). This resulted in the birth of the field of microbial forensics. Microbial forensics is an evolving subdiscipline of forensic science for analyzing evidence from a bioterrorism act, biocrime, hoax, or an inadvertent release for attribution purposes (65). In many ways, microbial forensics is not a novel field; its bases and practices are derived from similar approaches established in public health and epidemiology (67). The difference between microbial forensics and epidemiology is that the former desires to further individualize a sample. Nonetheless, microbial forensic analyses must encompass sample handling, collection, preservation, method selection, casework analysis, interpretation of results, validation, and quality assurance.
Molecular genetics, genomics and informatics will be central to species/strain identification, virulence determination, pathogenicity characterization, and source attribution. The ultimate in source attribution is to be able to individualize a sample such that it can be traced to a unique source. That is unlikely with current capabilities and may not be possible in many cases because of the nature of microbiological samples. Epidemiologic investigations tend to focus on species and strain level resolution, which are helpful information for a microbial forensics investigation (67). However, forensic science endeavors to individualize samples: for the anthrax-letters attack, a multi-locus variable-number tandem repeat (VNTR) analysis technique was used to identify the Bacillus anthracis bacteria as that of the Ames strain. While the strain data appropriately focused the investigation toward laboratory sources, differentiating closely-related laboratory samples of the same strain was far more challenging. For future cases, technology is needed to facilitate identification of those unique SNPs, duplications, deletions, insertions, or rearrangements—if they exist—that will better individualize samples and help focus an investigation (68). Unlike human identification, where a standardized core set of loci can be used to differentiate individuals, the microbial forensic marker(s) for individualization will be unknown and case-specific. Whole-genome sequencing is the preferred method for discovering genetic variation of forensic value (68,69). The most effective approach for comprehensive genetic variation discovery, which was used in the anthrax-letter investigations (69), has been by high-throughput shotgun sequencing exploiting Sanger sequencing (70). Though considered to be the gold standard of sequencing technology, this method is laborious, costly, has relatively low coverage, and exhibits sample bias problems. 
If whole-genome resequencing were desired for a repository of samples (of a few to thousands) the cost would be prohibitive.
Therefore, advances in sequencing technology are needed that increase accuracy and speed, reduce cost, and maximize efficiency for forensic analysis. Hybridization resequencing [such as the chip technology developed by Affymetrix (Santa Clara, CA, USA) (71)] enables an extremely large number of probings to be carried out simultaneously and would provide fast turnaround for typing results (71,72,73). But hybridization chip technology may not have the sensitivity of detection required for forensic applications. Matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF), which exploits the absolute mass of a nucleic acid molecule as an intrinsic property, offers advantages over hybridization and electrophoresis approaches (41,42): MALDI-TOF is not subject to the vagaries of electrophoretic anomalies and DNA secondary structure, and does not require labeling molecules for detection. However, it cannot be used to sequence whole genomes. Second-generation massively parallel sequencing technology, such as SOLiD (Applied Biosystems, Foster City, CA, USA), Genome Analyzer (Illumina, San Diego, CA, USA), and 454 GS-FLX (454 Life Sciences, Roche, Branford, CT, USA), and other related technologies offer rapid resequencing of whole bacterial genomes with high coverage (74,75,76,77,78,79,80,81,82,83). Third-generation single-molecule sequencing technologies, such as that in development by Pacific Biosciences (Menlo Park, CA, USA) and Helicos BioSciences Corporation (Cambridge, MA, USA), may supplant the current novel sequencing technologies (84). However, single-molecule approaches can have sampling issues that will need to be addressed and the current technologies, being more in the proof-of-concept phase, are far from robust. 
It is difficult to predict what technologies will be selected for microbial forensics, but low-cost, high-coverage, low-error, high-throughput sequencing of whole microorganism genomes will be a necessity for supporting development of the most effective microbial forensics attribution assays.
Automation
The demands of generating, entering, and maintaining DNA profiles in a national DNA database have driven developments in automation. The number of reference samples from convicted felons, arrestees, detainees, and missing persons continues to increase, and the burden is such that these samples cannot continue to be typed and reviewed manually. Robotics and modified chemistries more amenable to automated processes have been developed to increase throughput and efforts will continue to improve automation efficiency (85,86,87,88,89,90,91).
Automation offers quality control, consistent results, and data management with lower operational costs. By removing the human component from the process, results tend to be more consistent and high-quality. Error is reduced primarily by minimizing the chance of sample switching and carryover contamination. Software developments enable tracking of sample handling throughout the process. Lower reagent volumes translate into fewer consumables and less waste.
Most automation has focused on the extraction of DNA from standard reference samples and some have extended the application to casework samples such as bone, hair, teeth, cigarette butts, and sperm. The robotic platforms vary and include the Tecan Genesis RSP 150/8 robotic workstation and the Tecan Freedom EVO liquid handling stations (Tecan, Mannedorf, Switzerland), the Biomek 2000 automation workstation (Beckman Coulter, Fullerton, CA, USA), the Plato 3000 robotic system (Rosys/Anthos AG, Hombrechtikon, Switzerland), and the BioRobot EZ1 System and BioRobot 8000 workstation (Qiagen, Dusseldorf, Germany), to name a few (85,86,87,88,89,90). The development and implementation of robotic workstations require alternative chemistries for extraction. Some parts of a manual extraction are not accommodated readily by a robotic system, such as organic solvent extraction, centrifugation and boiling. Solid-phase extraction chemistries, such as the DNA IQ (Promega, Madison, WI, USA) (92) and the Qiagen EZ1, QIAsymphony Investigator Kit, and QIAamp Investigator BioRobot Kit (Qiagen) have been adopted to facilitate automation of extraction (85, 87,88,89).
It is important to quantify the DNA and normalize the amount that is used in PCR to obtain more consistent typing results. Greenspoon et al. (87) used the same robotic platform (the Biomek 2000 Automation Workstation) for extraction, DNA quantitation, and PCR setup, thus automating three parts of the process prior to the PCR step. In addition, the remaining DNA extracts are transferred directly to storage tubes for long-term archiving. Automation has been and will continue to be developed for the protocols encountered in the forensic laboratory. However, these robotic systems are macroscale approaches that have yet to capitalize on the potential benefits of microscale technologies (see “In-field testing” section).
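The normalization step mentioned above amounts to a simple dilution calculation at PCR setup. The sketch below is a hypothetical illustration and is not the protocol of Greenspoon et al. (87). The target input mass, the maximum template volume, and the function name are all assumptions chosen for the example.

```python
def pcr_setup_volumes(concentration_ng_per_ul, target_ng, max_template_ul):
    """Return (extract_ul, diluent_ul, delivered_ng) for one PCR.

    The extract volume is chosen to deliver target_ng of DNA, capped at
    max_template_ul; diluent fills the remainder of the template volume.
    """
    if concentration_ng_per_ul <= 0:
        raise ValueError("quantitation value must be positive")
    extract_ul = min(target_ng / concentration_ng_per_ul, max_template_ul)
    diluent_ul = max_template_ul - extract_ul
    delivered_ng = extract_ul * concentration_ng_per_ul
    return extract_ul, diluent_ul, delivered_ng


# Hypothetical values: aim for 1.0 ng in at most 10 uL of template.
typical = pcr_setup_volumes(0.5, target_ng=1.0, max_template_ul=10.0)
dilute_sample = pcr_setup_volumes(0.05, target_ng=1.0, max_template_ul=10.0)
```

For the 0.5 ng/µL extract the calculation calls for 2 µL of extract and 8 µL of diluent. For the 0.05 ng/µL extract the full 10 µL is used and only about 0.5 ng is delivered, the kind of shortfall a laboratory would want flagged before amplification.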
Forensic analysis of mitochondrial DNA (mtDNA) often provides results for samples where nuclear autosomal marker analyses are difficult or impossible (such as old bones, teeth, and hair shafts) (21). Typing generally involves the PCR amplification of two short regions of mtDNA called hypervariable regions 1 and 2 (HV1 and HV2), followed by direct sequencing of the PCR products by Sanger sequencing. This process is laborious, time-consuming, and costly. Additionally, data analysis can be confounded by sequence artifacts, electrophoretic anomalies, the presence of heteroplasmy (i.e., the presence of more than one mitochondrial genome variant within an individual) and limited ability to quantify the components of a mixed sample. Recently, multiplex PCR followed by electrospray ionization time-of-flight mass spectrometry (ESI-TOF-MS) was demonstrated to be applicable for typing the hypervariable regions of human mtDNA, expanding the discriminating potential of an assay beyond that of specific SNP targeting (40). Additionally, heteroplasmic samples can be analyzed and the relative quantity of the components of mixed samples can be determined (Figure 2). The T5000 Biosensor (Ibis Biosciences Inc., Carlsbad, CA, USA) (39) combines robotic workstations for PCR and sample cleanup (i.e., de-salting) with mass spectrometry, so mtDNA typing can be performed with at least a 10-fold increase in throughput and a 5-fold decrease in reagent cost, with no loss in sensitivity and little or no loss in information compared with traditional sequencing. This platform holds promise for readily accommodating other genetic marker assays.
The areas where automation has yet to improve throughput sufficiently are at the front and back ends of the analysis. For reference samples, some success has been achieved because the sample format can be standardized. However, for casework, the sample types and the substrates on which they reside vary substantially, making it difficult to standardize the initial sample preparation. A blood stain may reside on a non-porous car bumper or on porous wood, a bone sample requires pulverization, a semen sample may reside on a vaginal swab, a clothing item, and so on. Automating the sample preparation of casework materials is likely to be the most challenging endeavor for forensic scientists.
The back end of the process is the interpretation of results. Algorithms will be needed to facilitate this very labor-intensive step. Currently, two qualified scientists are required to manually read a DNA profile (whether STRs or sequences). Expert systems are being developed to replace one if not both of the scientists for typing STR reference samples entered into DNA databanks (93,94). Such efforts will continue and the design of expert systems will be attempted for interpreting the more challenging casework samples (93,94,95).
The platforms currently used in forensic laboratories serve their purposes but are still macrofluidics-based systems requiring relatively large volumes of reagents and generating relatively large volumes of waste. Additionally, they tend to be modular. Most of these robotic systems automate parts of the analytical process. An individual is needed to move the microplate from one robotic system carrying out one function (e.g., extraction) to another (e.g., PCR), and eventually to the capillary electrophoresis instrument. Integration of all facets from extraction to detection has yet to be realized. However, micro-fabricated devices offer the possibility of automating the entire analytical process and freeing the analyst to carry out other tasks.
In-field testing
There is some interest in the ability to perform DNA diagnostics at the crime scene. For microbial forensics and public health, the need is paramount to be able to determine the presence of microorganisms that are harmful to humans. A biocrime scene requires investigators to wear protective equipment, making it difficult to work for prolonged periods of time. If the scene could first be determined safe (as in the case of a hoax, for example), this onerous requirement for sample collection could be omitted. The instrumentation for pathogen detection should be portable, not just transportable. The diagnostic capability should have a high degree of sensitivity of detection and be able to detect a wide range of known harmful pathogens, as well as the genes that confer pathogenicity, in case genetic engineering was used to modify an otherwise harmless microorganism. Microfluidics has appeal because it enables molecular biology analyses to be carried out on miniaturized platforms that integrate all aspects of the analysis from sample preparation to nucleic acid typing [e.g., the lab-on-a-chip concept (96,97)]. Additional potential benefits of microfluidics include reduced sample consumption and reagents (lowering cost), less waste, better thermodynamics during the PCR (that possibly could reduce stochastic effects with limited template), and less contamination (being an integrated closed system) (96,97,98). It is conceivable that throughput would increase by decreasing analysis time and exploiting parallel processing. Analysts would also be freed from some manual processes that are still encumbered with macrofluidic manipulations.
Development in this area is exemplified by the research efforts at the Landers laboratory (University of Virginia, Charlottesville, VA, USA), which has demonstrated that a wide range of samples and even differential extraction (i.e., isolating sperm DNA from DNA from other cell types) can be accommodated in a microfluidic format. Thus, the macroscale of samples and the microscale of extraction analysis requirements can be bridged. In addition, they have developed an integrated system that enables the entire process from sample extraction through electrophoresis to be carried out therein (96,97,99,100,101) (Figure 3).
There are advocates for a portable field-testing microfluidic device for performing human identification DNA typing at the crime scene to rapidly identify suspects. Presumably, this would be by generating a profile and immediately searching a DNA database for developing an investigative lead. It certainly would not be used for eliminating lingering suspects: even if the perpetrator remained at the crime scene, obtaining a reference sample would require probable cause and thus is not amenable to rapid response. A significant concern would be the possible contamination of evidence by reference samples in a suboptimally controlled environment. The crime scene is a chaotic environment and it is important to control the scene, and efforts should be focused on proper collection of evidence and on minimizing its contamination. If DNA typing were performed at the crime scene, then qualified practitioners would have to be deployed, since expertise is required for DNA typing and interpretation of the generated DNA profiles. This deployment would reduce the throughput of an already backlogged laboratory: scientists would be occupied going to and from crime scenes and could only work one case at a time, and there would not be enough qualified personnel to analyze DNA at multiple, simultaneous crime scenes. One solution is for scientists to remain remote and for the profiles to be transmitted electronically. This approach still does not address the need for properly trained practitioners to process the samples and carry out the analytical portion of the assay. Again, the front-end sample preparation is perhaps the biggest hurdle to overcome, since crime-scene samples present themselves in myriad manners and these macro-samples may not be readily amenable to microfluidic processing. However, a microdevice may be useful at the crime scene for collection and sample storage.
Rarely is the time to move a sample from the crime scene to the laboratory an impediment. However, casework continues to increase while manpower does not, concomitantly. Fully integrated automated systems hold promise for increasing throughput by freeing an analyst from several manual aspects of the process so he or she can focus on other, more demanding processes. Automation is essential to address the increasing demand for casework analysis and for generating and entering samples into national DNA databases.
Conclusion
There still are a number of gaps that need to be addressed in forensic biology. We have identified some of these areas where further development is needed: improving the current limits on typing samples of low quantity and quality; improving the efficiency of sample recovery and extraction; converting current STRs to mini-STRs; selecting and validating new mini-STRs; selecting and validating a variety of SNPs for different applications; enhancing multiplexing; developing automation for high throughput; developing expert systems for data interpretation; developing sequencing capabilities for screening microorganism genomes; and field testing. There are likely other gaps as well. We did not address identification beyond that of humans and microorganisms. Plant and animal forensic genetics may have additional requirements that molecular biology may resolve. The future of molecular biology for forensic science will be exciting and dynamic. There is still much to achieve and molecular biology developments will be essential for assisting in solving crimes and identifying missing persons.



