Integrating computational and experimental techniques to decipher neuronal heterogeneity
The brain is an extremely heterogeneous tissue. Within one small piece, there are neurons transmitting information, oligodendrocytes and astrocytes acting as support cells and microglia functioning as part of the immune system. Within the major classes of neurons, there are – depending on who you ask – hundreds to thousands of different subclasses. Determining each neuron’s specific function is essential to understanding the distinct roles these cells play in behaviors and neural systems.
In this interview, Andreas Pfenning – Associate Professor in the Computational Biology Department in the School of Computer Science and a member of the Neuroscience Institute at Carnegie Mellon University (CMU; PA, USA) – shares the experimental and computational techniques he’s using to investigate cell heterogeneity in the brain. Additionally, we learn about the targeted therapeutics he’s developing, what’s next for his research and what we can expect from the session he’s part of at ABRF 2026 (28–31 March; PA, USA).
What techniques and technologies do you use to investigate cell heterogeneity in the brain?
We can break this down into computational techniques and experimental techniques. In my lab, we really span and try to integrate both disciplines. On the experimental side, we started off using droplet-based sequencing methods to conduct single-nucleus RNA-seq. We used single-nucleus – as opposed to single-cell – because the neurons and cell subtypes of the brain are very different shapes and sizes; if you’re using droplet-based techniques, where you’re flowing cells through a microfluidic device, cells of different shapes might flow at different rates or even get stuck, so using single-cell RNA-seq can produce biased results.
We do single-nucleus RNA-seq to get a sense of what the cell identities are, determining the levels of different genes in the different cells. We also use single-nucleus ATAC-seq to determine the open chromatin regions, which can reveal how the epigenetic landscape is different across different subtypes of cells. Together, these techniques provide a broad genomic picture.
Whenever you have new experimental techniques, you need corresponding computational techniques to make sense of that data. Many of the computational tools we use are very standard for processing the reads, mapping them to the genome and identifying clusters of cells.
However, sometimes we need to use additional techniques to make sense of more specific information. For example, if you see differences in genes at a single-cell level, it might be because there’s a continuous population of cells – think a dorsal to ventral cross-section of the brain – and there might be a gradient of expression across these cell types. On the other hand, you might be seeing discrete cell type differences – one specific cell type dorsally and another specific type ventrally. In the lab, we applied a statistical method called regression discontinuity to this type of data. This statistical test essentially lets us know whether the population of cells could better be described continuously as a gradient or as two discrete cell types.
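The gradient-versus-discrete question above can be framed as a model comparison: does expression along a spatial axis fit one continuous trend, or a trend with a jump at a boundary? The sketch below is a minimal illustration of that idea using a BIC comparison; it is not the lab’s actual implementation, and the function name and cutoff argument are hypothetical.

```python
import numpy as np

def gradient_or_discrete(position, expression, cutoff):
    """Compare two fits for expression along a spatial axis:
    (1) a single continuous linear gradient, vs.
    (2) the same trend plus a jump (discontinuity) at `cutoff`.
    Returns the BIC of each model; a lower BIC for the piecewise
    model suggests two discrete populations rather than a gradient."""
    n = len(position)

    def bic(residuals, k):
        rss = np.sum(residuals ** 2)
        return n * np.log(rss / n) + k * np.log(n)

    # Model 1: one continuous linear trend (intercept + slope)
    X1 = np.column_stack([np.ones(n), position])
    beta1, *_ = np.linalg.lstsq(X1, expression, rcond=None)
    bic_gradient = bic(expression - X1 @ beta1, k=2)

    # Model 2: same trend plus an intercept shift at the cutoff
    step = (position >= cutoff).astype(float)
    X2 = np.column_stack([np.ones(n), position, step])
    beta2, *_ = np.linalg.lstsq(X2, expression, rcond=None)
    bic_discrete = bic(expression - X2 @ beta2, k=3)

    return bic_gradient, bic_discrete
```

For simulated data with a clear expression jump halfway along a dorsal–ventral axis, the discrete model should win the BIC comparison.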
This is where spatial transcriptomics comes in. One of the areas we’re exploring as part of the BRAIN Armamentarium Consortium is figuring out what discrete subtypes of neurons do, and how to target them to treat a specific disease or disorder. In mouse models, people tend to use Cre-driver lines to genetically insert something into the mouse genome that will label one cell subtype versus another. If you’re trying to study the role of cell types in mouse behavior, these Cre-driver lines often work well, but they’re expensive to generate. You can imagine that if you’re trying to turn it into a therapeutic where you’re specifically manipulating or accessing a certain cell type, you don’t necessarily want to directly genetically engineer the human genome to be able to do that.
In the BRAIN Armamentarium Consortium, what we’re doing is basically trying to design regulatory elements, so enhancers or promoters, that will only activate in the cell type of interest. Just like you would in a mouse with the Cre-driver line, you can selectively inhibit one population of cells and then look at how the mouse’s behavior changes. We want to build a set of tools that can do this. The first step is identifying the cell-type heterogeneity in the first place, and then we want to start to manipulate or systematically study these different cell populations. To do this, we’re finding enhancers that will selectively drive expression only in a particular cell population.
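One simple way to think about that enhancer search is to rank open-chromatin peaks by how much more accessible they are in the target cell type than in any other. The toy function below sketches that intuition only; the consortium’s real pipelines use machine learning on single-cell ATAC data, and all names here are hypothetical.

```python
import numpy as np

def rank_candidate_enhancers(peak_signal, cell_types, target):
    """Toy specificity ranking for ATAC peaks.
    peak_signal: array of shape (peaks, cell_types) with accessibility
    per peak per cell type. A peak is a promising enhancer candidate
    for `target` when it is open there but closed everywhere else.
    Score = target signal minus the strongest off-target signal."""
    cols = {ct: j for j, ct in enumerate(cell_types)}
    t = cols[target]
    off = [j for j in range(len(cell_types)) if j != t]
    scores = peak_signal[:, t] - peak_signal[:, off].max(axis=1)
    # Peak indices ordered from most to least cell-type-specific
    return np.argsort(scores)[::-1]
```

A peak that is strongly open in every cell type scores poorly here, even if its absolute signal in the target type is high – which matches the goal of driving expression only in the population of interest.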
What technical challenges have you found frustrating? What approach do you take to overcome these?
I’m primarily trained as a computer scientist; I did my postdoc at the Computer Science and Artificial Intelligence Laboratory at MIT (MA, USA). Although I have this background, I now run a wet lab and a mouse colony at CMU. I’m a bit of an anomaly in this respect; you often hear about wet lab researchers utilizing AI for data analysis, but it’s rarer to find computational researchers turning to the lab bench for answers. However, I think that experimental techniques are really now the major bottleneck for scientific discovery and that the way forward is a tighter integration of AI and wet lab experimentation, rather than AI slowly replacing the experimental side or supplementing it simply for data analysis.
My lab developed AI methods to be able to analyze the single-cell open chromatin data to more effectively design enhancers and promoters to target cells of interest. We used AI to label specific cell populations to design these sequences, and then followed this up by individually testing each one, which was a laborious process. Now, people are trying to develop experimental techniques to conduct high-throughput screening of these sequences, using barcoding techniques to measure the specificity of many different regulatory elements in one experiment. However, the technical challenge being encountered is crosstalk, where the delivered enhancers and barcodes can chain together to form concatemers that confound the results of traditional droplet-based techniques. This makes it difficult to discern whether a certain enhancer was specific to the cell type of interest. This is where sensitive spatial technologies become exceptionally useful because they can provide a complete picture of which enhancers and barcodes are present. We use the 10x Xenium to accomplish this.
Then, once we know all of the barcodes present in a given cell, we can run a deconvolution step, which recovers the signals coming from the different barcodes to infer the specificity of a given enhancer. The combination of spatial transcriptomics, which is really sensitive, and computational methods to deconvolve the data allows us, in one experiment, to determine whether a given enhancer is specific to a given cell type.
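At its simplest, that inference step amounts to asking what fraction of each enhancer’s barcode signal lands in the intended cell type. The sketch below is a deliberately minimal illustration of that bookkeeping, not the lab’s actual deconvolution method; the data layout and names are assumptions.

```python
import numpy as np

def enhancer_specificity(barcode_counts, cell_labels, barcode_to_enhancer):
    """barcode_counts: (cells x barcodes) count matrix from the spatial
    readout; cell_labels: one cell-type label per cell;
    barcode_to_enhancer: which enhancer each barcode column reports.
    Returns, per enhancer, the fraction of its total barcode signal
    found in each cell type - a simple proxy for specificity."""
    labels = np.array(cell_labels)
    result = {}
    for enh in sorted(set(barcode_to_enhancer)):
        # Sum counts across all barcodes belonging to this enhancer
        bc_idx = [i for i, e in enumerate(barcode_to_enhancer) if e == enh]
        per_cell = barcode_counts[:, bc_idx].sum(axis=1)
        total = per_cell.sum()
        result[enh] = {
            ct: float(per_cell[labels == ct].sum() / total)
            for ct in sorted(set(cell_labels))
        }
    return result
```

An enhancer whose barcode counts fall entirely in, say, chronic pain neurons would score 1.0 for that type and 0.0 elsewhere, flagging it as a specific candidate.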
What kind of targeted therapeutics are you and your lab developing using AI and spatial transcriptomics?
We’re furthest along on a project with Becky Seal (Department of Neurobiology, University of Pittsburgh, PA, USA), in which we’re developing targeted therapeutics for the spinal cord in mouse models. The spinal cord is another heterogeneous tissue; there are cells that are very close to each other involved in different sensory behaviors. For instance, the cells for normal touch or sensation are located near cells involved in different types of pain, such as acute pain, chronic pain or neuropathy, as well as cells important for things like movement and breathing. The major challenge here is you want to selectively target or disrupt the cells involved in chronic pain, but you don’t want to disrupt the cells in normal touch or breathing.
To work out how to selectively disrupt chronic pain cells, we first built a cross-species atlas of different cell types in the spinal cord and linked them to specific neural circuits and pain behaviors via functional studies. We then used machine learning analysis of the single-cell datasets to design enhancers that would only activate in the populations of cells linked to chronic pain. Xenium technology was used to screen those enhancers for specificity, identifying some promising enhancer candidates. The Seal Lab paired those enhancer candidates with chemogenetics, which is a technique where the enhancer drives a particular receptor and you can deliver a drug to the animal that inhibits all of the cells that express that receptor. With this cell-type-specific enhancer, you can shut down neural activity in any cell that has an active enhancer – such as chronic pain cells – while leaving the other cells active and functioning normally.
With this work, the Seal Lab has shown that inhibiting those populations can either completely block or substantially inhibit multiple forms of chronic pain and even other types of sensation like itch, yet leave normal sensation, motor behavior and even some forms of acute pain intact.
Do you have any advice for using spatial technologies to investigate the brain?
From a computational biology perspective, it pays to dig deeply into the tools and techniques being used to analyze data. There is so much work being done to create better methods for understanding spatial data that allow you to: use spatial patterns of information to distinguish new cell types; make cleaner inferences about what populations of cells are there; and integrate new information coming from spatial technology with information coming from traditional experimental tools like droplet-based technology. By understanding those new analysis tools and how they work, you can truly get the deepest insight into your data, rather than using only the basic out-of-the-box software systems.
What’s next for your research?
That’s a good question; we’re going in a lot of different directions.
One direction takes us into manipulating specific cell types. For instance, in our spinal cord work, we’ve been able to show how designing tools and manipulating specific cell types can impact or abate specific behaviors. We’re interested in starting to apply that technology to different areas, including Parkinson’s disease. In the past, we’ve used AI combined with electrophysiological techniques to selectively activate cells that would improve Parkinson’s symptoms without affecting cells performing other important motor functions. We think that there are diseases and disorders, like Parkinson’s and addiction, where targeting or manipulating specific cell populations could have a real impact on treatment.
Another aspect of my work is directing the comparative genomics working group in the Vertebrate Genomes Project. This is an international consortium of researchers who are trying to sequence the genomes of every vertebrate species on earth, which is an ambitious project. We’re trying to use those genomes to make sense of very specific biological problems and create resources for biologists worldwide.
Data can be a computational bottleneck. If you’re only using data from the human genome, mouse genome and rhesus macaque genome, you’re inherently limited in the sophistication of the models you can build. There are only so many bases in the genome and only so many ways you can annotate those genomes to use the information. What we’re trying to do is use the genomes of the Vertebrate Genomes Project to more tightly trace the evolution of different parts of the genome and use that to inform human disease biology. This work also lends itself to identifying completely new model organisms that haven’t been brought into the lab before; this framework could help design tools to study their brains and their behavior.
Finally, in the future, we plan to more tightly integrate experimental and computational techniques through automation. CMU has an impressive automated lab, which provides a foundation for using programming to integrate experiments that are being conducted – in this case by things like liquid handling robots – with computation and machine learning. In other words, these automated systems would be able to take the results of those experiments and make decisions about the next round of experiments to conduct or how to optimize different tools or technologies. There’s still a lot of work to do in this area, but I’m excited by the potential of these facilities and techniques to make significant leaps in biology by intelligently integrating experimental and computational approaches.
Please tell me about the session you’re a part of at ABRF 2026: Spatial transcriptomics – advances and applications.
The session is being organized by Amanda Poholek at the University of Pittsburgh. Within the Pittsburgh area, we’re trying to build more of a community around spatial biology. Whenever there is a new technology like spatial transcriptomics, especially at the kind of resolution now available, a lot of questions come with it, such as: how can you leverage this new technology to answer questions? How can you make it available and educate the broader community about how they can use it?
Amanda’s group uses spatial technology to study immune activation and how different tissues interact with the immune system. In this research, it’s essential to have spatial context because they’re trying to understand how cells interact. In my case, we’re primarily interested in the sensitivity of the technology and the spatial patterns that can help us discern relevant cell types and inform how we engineer better technologies.
If people can see the broad set of uses for spatial transcriptomics, then they can better envision how it might be helpful in their research. Furthermore, from the core facility perspective, we’re trying to form a community of researchers utilizing spatial transcriptomics in diverse ways, who are facing similar problems when using these techniques.
What are you most looking forward to at ABRF 2026?
I’m particularly interested in tightly integrating computational and experimental approaches. There’s a lot of room for innovation in this space; we’re at the stage where solving problems using computation or experimentation alone isn’t enough. At ABRF, I’ll be interested to learn how different core facilities are integrating AI with experimental infrastructure, within different areas of biology but especially genomics. I want to see what’s working and what needs improvement and use this intel to figure out how we can build new tools.
The interviewee has not disclosed any competing interests.
The opinions expressed in this interview are those of the interviewee and do not necessarily reflect the views of BioTechniques or Taylor & Francis Group.