How long-read sequencing is scaling beyond the specialist lab

Written by Aaron Wenger (Principal Scientist at PacBio)

Here, Aaron Wenger, Principal Scientist – Bioinformatics at PacBio (CA, USA), explores how advances in accuracy, throughput and cost are making long-read sequencing more accessible at scale.  

Advances in genomic sequencing have led to significant gains in our understanding of human, plant, animal and microbial biology. Yet each technological breakthrough that uncovers a new layer of genomic complexity also reminds us how much remains to be understood.

For example, breakthroughs in cytogenetics during the 1950s enabled scientists to visualize whole chromosomes for the first time, but researchers quickly realized that the genomic resolution was low and most variants remained hidden.

Over the following decades, successive innovations expanded the depth of biology researchers could study, but the final 8% of the human genome was not resolved until 2022, thanks to advances in software algorithms and whole-genome sequencing (WGS) technologies such as HiFi long-read sequencing. Once reserved for specialist projects, this level of genomic resolution is now becoming accessible at scale as the throughput of long-read sequencing increases and costs decline.

Short-read vs long-read sequencing: how do these WGS techniques compare?

While both techniques aim to give scientists a view of whole genomes, short-read sequencing lacks the context and completeness of long reads. This is because short-read sequencing splits genomes into fragments of 100-300 base pairs, which provide limited information and require alignment to a reference genome for interpretation. Although effective for identifying single-nucleotide changes and small insertions or deletions, short reads struggle to characterize repetitive or highly variable regions of the genome because the fragments are too short to span them.
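The spanning argument can be made concrete with a minimal Python sketch. It counts how many start positions let a read cover a repeat together with some unique flanking sequence on each side. The function names, the 20-base anchor, the 1,000-base repeat and the 15,000-base long-read length are illustrative assumptions, not figures from any platform specification, and chromosome ends are ignored.

```python
def spans_region(read_start, read_len, region_start, region_end, anchor=20):
    """Return True if the read covers the whole region plus `anchor`
    unique bases on each side. Coordinates are 0-based, half-open."""
    read_end = read_start + read_len
    return (read_start <= region_start - anchor
            and read_end >= region_end + anchor)


def count_spanning_starts(read_len, region_start, region_end, anchor=20):
    """Count the read start positions that span the region with anchors.
    Chromosome boundaries are ignored (an assumption of this toy model)."""
    needed = (region_end - region_start) + 2 * anchor
    return max(0, read_len - needed + 1)


if __name__ == "__main__":
    repeat_start, repeat_end = 5_000, 6_000  # hypothetical 1,000-base repeat
    for read_len in (150, 15_000):           # short read vs. assumed long read
        n = count_spanning_starts(read_len, repeat_start, repeat_end)
        print(f"read length {read_len}: {n} spanning start positions")
```

In this toy model a 150-base read can never span the repeat regardless of where it starts, while a read of many thousands of bases can do so from a wide range of positions.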

In contrast, native long-read sequencing captures thousands of bases in a single, continuous read. This means long reads can span complex structural variations and genomic regions associated with disease that short reads miss. By preserving genomic context, long reads also enable phasing, which allows scientists to link genetic changes to the maternal or paternal chromosome they originate from to assess how combinations of mutations may influence disease.
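As a rough illustration of phasing, the Python sketch below decides whether two heterozygous variants sit on the same chromosome copy (cis) or on opposite copies (trans) by looking only at reads that cover both sites. The data layout (each read as a dictionary mapping a position to "ref" or "alt") and the function name are hypothetical simplifications; production phasing tools use far more sophisticated models.

```python
from collections import Counter


def phase_pair(reads, pos_a, pos_b):
    """Vote on the phase of two heterozygous sites.

    `reads` is a list of dicts mapping position -> "ref" or "alt".
    Returns "cis", "trans", or None when no read links the sites
    or the votes are tied.
    """
    votes = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            # Both alt or both ref on one read means the alt alleles
            # share a chromosome copy; a mismatch means opposite copies.
            same = (read[pos_a] == "alt") == (read[pos_b] == "alt")
            votes["cis" if same else "trans"] += 1
    if not votes or votes["cis"] == votes["trans"]:
        return None
    return "cis" if votes["cis"] > votes["trans"] else "trans"


if __name__ == "__main__":
    # Hypothetical variant positions 100 and 9_000 on the same chromosome.
    long_reads = [
        {100: "alt", 9_000: "ref"},
        {100: "ref", 9_000: "alt"},
        {100: "alt", 9_000: "ref"},
    ]
    short_reads = [{100: "alt"}, {9_000: "alt"}, {100: "ref"}]
    print(phase_pair(long_reads, 100, 9_000))
    print(phase_pair(short_reads, 100, 9_000))
```

The short reads each see only one site, so they carry no linkage information, whereas every long read that covers both sites contributes a vote.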

Beyond genomic data, the most advanced long-read sequencing can now capture additional biological information, such as epigenetic DNA methylation patterns. This insight helps researchers understand regulatory changes linked to disease risk and progression. Analyzing multiple ‘omes previously required a separate assay for each, but native long reads now routinely provide multiomic insights in a single experiment, saving both time and money.




Scaling long reads for research

For many years, the completeness of long reads came with a trade-off of lower throughput and higher costs. Now, improvements in sequencing chemistry and platform efficiency are changing that equation, delivering thousands of genomes per instrument per year at a few hundred dollars per genome at scale. This has made long-read sequencing feasible for larger studies and routine use. Here are three research areas where those benefits are expected to drive breakthroughs in discovery.

  1. Population genomics research

Generating high-resolution data from thousands of people allows researchers to explore genetic variants that influence biological function, often revealing entirely new pathways involved in health and disease. Long-read sequencing strengthens these studies by improving detection of structural variants, resolving complex and repetitive regions, and enabling haplotype phasing, all of which remain challenging for short-read approaches. These capabilities are critical for capturing forms of variation that have historically been missed, particularly in underrepresented populations.

Studying population-specific genetics has already demonstrated its value in drug discovery. For example, research into a rare mutation in the SOST gene in a small Afrikaner population revealed a mechanism regulating bone density. While the condition itself is extremely rare, understanding this biology enabled the development of therapies targeting sclerostin for osteoporosis. Similarly, studies of individuals in Iceland carrying PCSK9 loss-of-function mutations uncovered a protective mechanism against cardiovascular disease, leading to a new class of cholesterol-lowering drugs.

  2. Rare diseases research

Many rare diseases are driven by specific and complex changes in the genome. However, much of this variation remains poorly understood due to the limitations of traditional sequencing approaches. For example, conditions caused by repeat expansions, such as fragile X syndrome, are difficult to resolve with short-read technologies. As a result, not only do 60% of patients with rare diseases remain undiagnosed, but large areas of rare disease biology also remain unexplored for R&D purposes.

Long-read sequencing enables researchers to interrogate the full spectrum of genomic variation, as well as epigenetic modifications such as DNA methylation – chemical changes that regulate whether genes are switched on or off without altering the underlying DNA sequence. These signals are increasingly recognized as important contributors to rare disease biology, particularly in disorders involving imprinting or gene regulation. This multiomic view allows scientists to identify novel variant types and better characterize how they disrupt gene function.

  3. Resolving complex tumor genomes

Cancer is fundamentally a disease of the genome, driven by genetic and epigenetic alterations. Tumor genomes are structurally complex, featuring rearrangements, fusions, copy number changes and heterogeneous cell populations. Long reads allow scientists to capture the complete view needed to understand tumor biology and inform R&D direction.

On a wider scale, increases in the throughput of long-read sequencing are making it practical for larger cohorts of cancer patients and tumor samples to be analyzed. This scale is enabling researchers to identify new biomarkers, study tumor evolution and better understand how structural and regulatory variation influences disease progression and treatment response.

Solving biological mysteries

With improvements in accuracy, throughput and cost bringing long-read sequencing within reach at scale, researchers can now revisit longstanding biological questions with a far more complete view of the genome. Now that the benefits of long reads have been demonstrated, the next step is to move beyond proof of concept and integrate long reads into settings with real impact.

This shift will enable routine detection of structural and regulatory variation that has remained difficult to access with earlier approaches, and support more comprehensive, scalable genomic analysis across diverse applications. In doing so, long-read sequencing is positioned to transition from a specialized tool to a foundational technology in genomics, translating deeper genomic insight into meaningful advances in biology and drug discovery.


No potential competing interest was reported by the contributor to this feature.

The opinions expressed in this article are those of the author and do not necessarily reflect the views of BioTechniques or Taylor & Francis Group.

 

 

