What is DNA sequencing? Answering some of the most frequently asked questions

16 Sep 2019

Written by Abigail Sawyer (Senior Editor)

As part of our In Focus on DNA sequencing, we are looking at some of the most frequently asked questions surrounding DNA sequencing to provide some clarification and further insight into this much-loved process.

What is DNA sequencing?

DNA sequencing refers to the process of identifying the order of bases in a strand of DNA. DNA is composed of nucleotides, with each nucleotide consisting of a sugar, a phosphate group and a base. Within double-stranded DNA (dsDNA), the bases on one single-stranded DNA (ssDNA) molecule form pairs with the bases on the other. This pairing occurs because hydrogen bonds form between complementary bases.

In DNA, the bases are cytosine (C), thymine (T), adenine (A) and guanine (G). In double-stranded DNA, A pairs with T and C pairs with G, with hydrogen bonds holding each base pair together. Once these bonds have formed between the two strands, the classic double helix structure results.
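The pairing rules above can be written out as a small Python sketch. The function names are illustrative only, and the reversal step assumes the standard convention that the two strands of a double helix run in opposite directions, which this article does not otherwise cover.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction.

    Assumption: the two strands are antiparallel, so the partner
    strand is read in the opposite direction to the input strand.
    """
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Any base other than A, T, C or G raises a KeyError here, which is a deliberate simplification; real sequence data often contains ambiguity codes such as N.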

So, why do we sequence DNA? The sequence of DNA can reveal lots of genetic information, helping identify genes that code for proteins, regulatory instructions that can instruct genes to turn on or off, as well as mutations that can cause disease.

How do we sequence DNA?

There are many different methods for DNA sequencing. One of the original DNA sequencing methods that prevailed for many years is Sanger sequencing – also known as the chain-termination method. This was developed by Frederick Sanger in 1977 and was the DNA sequencing method of choice from the 1980s to the mid-2000s.

Sanger sequencing was known for its reliability and ease of use, which led to its automation. Over time, advances in technology reduced the time and cost of Sanger sequencing, leading to its use in sequencing a whole human genome in the Human Genome Project, which ran from 1990 to 2003.

Further advances in DNA sequencing technology drove the time and cost of sequencing a whole human genome down even more, paving the way for next-generation sequencing (NGS) technologies.

Now, there are many different sequencing technologies in use, and different ones are preferred depending on budget or application. DNA sequencing and reading DNA sequences are mostly automated processes.

What is high-throughput DNA sequencing?

NGS methods can also be referred to as high-throughput DNA sequencing. Their invention was accelerated by the Human Genome Project and they are generally rapid, easy and cheap techniques for sequencing DNA.

Nanopore and microfluidic sequencing are just two examples of these novel, high-throughput methods. Key players in high-throughput sequencing methods include Illumina Inc (CA, USA), Roche Holding AG (Basel, Switzerland), Agilent Technologies Inc (CA, USA), Oxford Nanopore Technologies (Oxford, UK), Pacific Biosciences (CA, USA) and Life Technologies (CA, USA).

Where can I get my DNA sequenced?

There are many different services that offer whole genome sequencing to the general public, as a quick Google search will tell you. For those on a tighter budget, there are companies that will look at and sequence specific genes associated with health and fitness, as opposed to sequencing the whole genome.

Public health services, such as the UK’s Genomics England (London, UK), are working towards sequencing 5 million whole genomes of people with genetic diseases and cancers, in order to build up an extensive database that will enable better diagnosis and prediction, and inform precision medicine treatments.

In terms of sequencing DNA for a research project, many universities and institutions have their own DNA sequencers. It is worth looking for a collaboration or recruiting an experienced research group to sequence the DNA for you.

If this is likely to be a longer-term project, there may be the option to purchase a DNA sequencer. It is worth considering the different options in terms of your particular application and what different companies offer, and speaking to a more experienced colleague for advice.

Why is PCR used in the process of DNA sequencing?

When the DNA sample is too small, PCR can be used to amplify it by producing more copies of the required DNA sequence. PCR is often a required step in Sanger sequencing; however, some of the NGS technologies have such good sensitivity that amplification is not required, even for a small sample.
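To give a feel for how quickly amplification adds up, here is a rough Python sketch. It assumes the idealised textbook model in which every cycle multiplies the number of copies by (1 + efficiency), with an efficiency of 1.0 meaning perfect doubling. The function name, the starting copy numbers and the cycle counts are illustrative and not taken from any particular protocol.

```python
def pcr_copies(starting_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the copy number after a number of PCR cycles.

    Assumption: an idealised model where each cycle multiplies the
    copy number by (1 + efficiency). Real reactions plateau as
    reagents run out, so this overestimates at high cycle counts.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


# Ten starting molecules, perfect doubling each cycle.
for n in (10, 20, 30):
    print(n, "cycles:", f"{pcr_copies(10, n):.3g}", "copies")
```

Lowering the efficiency argument shows how sensitive the final yield is to small per-cycle losses.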

What can cause changes in DNA sequences?

Changes – or mutations – in DNA sequences can be caused by many different factors. There are different types of gene mutations and they can affect anything from a single base pair to a large section of DNA that incorporates many different genes. Mutations can be either hereditary (germline) or acquired (somatic). Hereditary mutations are inherited from a parent; they are therefore present from the point of conception and occur in every cell in the offspring’s body.

Acquired mutations occur at some point in the person’s life and will therefore be present in only some cells, as opposed to every cell in the body. These acquired mutations can often be caused by environmental factors, such as UV radiation, or by a mistake in DNA replication during cell division. These mutations cannot be passed on to the next generation, unless the mutation has occurred in a person’s gametes.

What are the different types of DNA mutation?

Different types of mutations include base substitutions, insertions and deletions. Substitutions occur when one base pair changes to another base pair. Missense substitutions lead to a change in the amino acid that is coded for and nonsense substitutions lead to a stop signal, meaning that protein production is stopped short, often resulting in a non-functional protein. If there is no effect on the amino acid and therefore the protein that is produced, then this is a silent mutation.
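The three kinds of substitution can be told apart by translating the codon before and after the change. The Python sketch below does this using the standard genetic code, built from its usual compact 64-letter form; the function name and the example codons are illustrative assumptions rather than anything taken from a specific tool.

```python
from itertools import product

# Standard genetic code in compact form: codons are ordered with bases
# T, C, A, G (first base changing slowest), and "*" marks a stop signal.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as silent, nonsense or missense.

    Simplification: a change that turns a stop codon into an amino
    acid is also reported as "missense" by this sketch.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", "GAG"))  # silent   (E -> E)
print(classify_substitution("GAA", "GTA"))  # missense (E -> V)
print(classify_substitution("GAA", "TAA"))  # nonsense (E -> stop)
```

The sketch works on coding-strand DNA codons, so no conversion to RNA is needed.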

Insertion mutations occur when a piece of DNA is added into the sequence where it shouldn’t be, changing the length of the gene and potentially affecting the resultant protein’s function. Deletion mutations remove a section of DNA from a gene and can also affect the resultant protein’s function. Sections of DNA can also be abnormally copied, leading to duplications within a DNA sequence and potential production of a non-functioning protein.

The addition or loss of base pairs can lead to a frameshift mutation, in which a gene’s reading frame is altered. Genes are read in groups of three bases, with each group coding for one amino acid. If the number of base pairs added, deleted or duplicated is not a multiple of three, the change has a knock-on effect throughout the rest of the gene, shifting the grouping of the bases and the sequence of amino acids that is produced.
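A short Python sketch makes the knock-on effect easier to see. The sequence below is made up for illustration, and the helper simply splits a sequence into groups of three from the first base onwards.

```python
def codons(sequence: str) -> list[str]:
    """Split a sequence into complete groups of three bases, dropping any leftover."""
    usable_length = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable_length, 3)]


# Hypothetical sequence, chosen only to illustrate the effect.
original = "ATGGCCTTAGGA"

# Insert a single base after the fourth position: every later codon changes.
inserted = original[:4] + "T" + original[4:]

# Insert three bases after the first codon: later codons are unchanged.
in_frame = original[:3] + "AAA" + original[3:]

print(codons(original))  # ['ATG', 'GCC', 'TTA', 'GGA']
print(codons(inserted))  # ['ATG', 'GTC', 'CTT', 'AGG']
print(codons(in_frame))  # ['ATG', 'AAA', 'GCC', 'TTA', 'GGA']
```

The single-base insertion changes every codon downstream of it, whereas the three-base insertion adds one codon and leaves the rest of the reading frame intact.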

What is the difference between a mutation and a SNP?

Single nucleotide polymorphisms (SNPs) are variations in a base pair that occur at a particular location in a DNA sequence in more than 1% of the population. When there is a variation in a DNA sequence that occurs in less than 1% of the population, this is considered to be a mutation. Since SNPs are common and there is no standard version of the allele in question, alternative sequences where SNPs are present will also be considered to be ‘normal’.

Most SNPs have no effect on a person’s health; however, these genetic variations can be of use to scientists studying genetic prediction or association of a disease.
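The 1% cut-off described above amounts to a simple frequency check, sketched here in Python. The function name and the counts are hypothetical, and because the definition above does not say how a variant at exactly 1% should be treated, this sketch arbitrarily labels it a mutation.

```python
def classify_variant(carriers: int, population_size: int, threshold: float = 0.01) -> str:
    """Label a single-base variant by how common it is in a population.

    Assumption: a frequency of exactly 1% is treated as a mutation,
    since the definition only covers "more than" and "less than" 1%.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    frequency = carriers / population_size
    return "SNP" if frequency > threshold else "mutation"


print(classify_variant(30, 1000))  # SNP      (3% of the population)
print(classify_variant(5, 1000))   # mutation (0.5% of the population)
```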

How do you find a mutation in a DNA sequence?

Once the DNA is sequenced, the Ensembl BLAST/BLAT tools can be used to match the DNA to a gene by finding its location in a published reference genome, so that any differences between the two sequences can be seen. Checking these differences against a database of known SNPs, such as dbSNP, then allows common variants to be distinguished from possible mutations. Programs including SIFT and PolyPhen can then be used to predict the potential effect of any mutation on the relevant protein and indicate whether this could be disease-causing.
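The comparison step can be illustrated with a toy Python sketch. It assumes the sample has already been aligned to the reference and is the same length, which is the job that tools such as BLAST do in practice. The sequences and the set of known SNPs are made up for illustration, and nothing here calls Ensembl, dbSNP, SIFT or PolyPhen.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) for each mismatch.

    Assumption: both sequences are already aligned and equal in
    length, so only substitutions can be detected. Positions are
    1-based, counting from the first base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical data, for illustration only.
reference = "ATGGCCTTAGGA"
sample = "ATGGCTTTAGGC"
known_snps = {(6, "C", "T")}  # stand-in for a lookup in a SNP database

differences = find_differences(reference, sample)
candidates = [d for d in differences if d not in known_snps]

print(differences)  # [(6, 'C', 'T'), (12, 'A', 'C')]
print(candidates)   # [(12, 'A', 'C')]
```

Whatever remains after filtering out known SNPs is only a candidate mutation; predicting its effect on the protein is the separate step handled by the prediction programs mentioned above.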