Genetics Unzipped is the podcast from the Genetics Society - one of the oldest learned societies dedicated to promoting research, training, teaching and public engagement in all areas of genetics. Find out more and apply to join at genetics.org.uk

David Goldstein: Mining the genome for medical insights

Image Credit: David Goldstein, Photo courtesy of David Goldstein

Kat: Increasingly, as Steve explained, the first inklings of an interesting new target are coming from genomics - trawling through billions of ‘letters’ of DNA from hundreds or even thousands of people in search of genetic variations that might be linked to disease and could point towards an exciting new target for drug development - and, importantly, identifying the right patients who might benefit from them.

Kat: David Goldstein is a Professor of Genetics at the Columbia University Medical Center in New York and chief advisor for AstraZeneca’s Centre for Genomics Research, providing advice on how best to turn the complex information within the human genome into insights that can lead to better targets and therefore better therapies. Given that last year saw the 20th anniversary of the draft sequence of the human genome, I started by asking David how far we’ve come in unlocking the potential of genomics in how drugs are developed and used.

David: I would say that some things have moved faster than many expected in terms of the application of genomics clinically and in drug development. And some things have moved more slowly. So one of the really exciting developments is that we really now can systematically trace down the causes of strongly genetic diseases. And that's really turned into a remarkably powerful clinical application. I think it's also fair to say that genetics has had a more modest impact on drug development so far than we might have hoped. My own orientation is that that is probably set to change. The economists like to talk about a kind of J-shaped curve in the impact of innovation. And I think that probably applies in terms of genomics and drug development, where we really needed to figure out how to effectively use genomics in drug development. And it wasn't an easy thing to do. And I would say that the focus of how to use genetics and genomics is shifting from the idea of using it to kind of discover and validate targets towards the idea of using omic approaches to elucidate targets, to really understand better where to apply therapeutics in what disease areas, as opposed to really pointing the way to a target that no one ever thought about before.

Kat: So let's start unpacking this a little bit. So how much of the genome do you think we really have explored in terms of finding targets, understanding diseases? We know some of the big genes for disease like cystic fibrosis, we know the cystic fibrosis gene, but how much else of the genome have we really started to get to grips with?

David: Well, we know a little bit about most of the genes in the genome, some genes of course we know a lot about, but we know at least a little bit about most of them. We have information about where a lot of genes are expressed, we have genetic association evidence for lots and lots of genes connecting them to both rare diseases and to common diseases. I think the real challenge now is trying to really work out what all the information we have is actually telling us about exactly how specific targets are relevant to what diseases. So I would say it's really now a matter of effectively leveraging the really complex emerging data sets that we have.

Kat: So now, given where we are in terms of the genomic data that we have access to and the technologies that we have access to, how are you starting to really mine this information to find new targets, to find the genes that are relevant in disease? How do you go about it?

David: So I think I would actually reformulate it a little bit and I think we need to move a little bit away from the idea of finding novel targets and validating targets, and I do think we need to move more towards the idea of elucidating targets to understand how to use them therapeutically. And of course, one way to put that is we know almost all of the genes in the genome that encode protein and, as I was saying earlier, for a lot of those, we know something about them. You're really not frequently now going to be in a situation where somebody performs a genetic study and ends up with a contribution to drug development where the genetics points to a target that people hadn't really thought about and the genetics tells you exactly how to modulate that target. That does happen, but it really is very much the exception.

David: Instead, what we have is really bewilderingly complex data showing that certain genes go up a little bit in expression in the context of a disease state, or they'll go down a little bit in expression in the context of a disease state, or certain genes have modest associations for a range of different complex traits and very strong associations for a range of different Mendelian ones, and you have to actually sift through all of that and decide what the right indication is for modulators of those targets. So I really think it's becoming much more of a data analysis and interpretation challenge in order to figure out really fundamentally the right indications to consider for modulators of specific targets. We know about lots and lots of targets that are connected to a whole broad range of phenotypes. But you have to actually guess right in terms of where to test them because the trials are very expensive to run.

David: So you need to decide for a particular target, do you want to go after heart failure, chronic kidney disease? And once you decide whether you want to go after these broad indications, you have to think about the appropriate way to stratify these highly heterogeneous conditions. For a given therapeutic it might be generally applicable in that disease area, or it might only be applicable to individuals with a particular underlying cause of disease. And those, I think, are the ways in which we have to figure out how to use genetics. And we really are only at the beginning. I mean, it's really striking to me that we are still running trials in diseases we know are massively heterogeneous, like heart failure, like chronic kidney disease, like many, many others. And we are really not stratifying those populations in meaningful ways. And it's almost a certainty that many of the treatments we're considering will work better in some groups than others. So it's really a primary focus, I think, going forward, it needs to be.

Kat: So what do we need to solve this challenge? Obviously, lots and lots of genomic data is one and clever computers are another. So what really needs to happen to put this together?

David: One thing that needs to happen that's obvious is that we really need to have large paired datasets for patients that combine genomic and clinical data. Right now, by and large, patients are enrolled into clinical trials without reference to underlying genomic data. Lots of genomic data are being collected as part of clinical care, but by and large, when you look for patients for trials, it's not done as a function of underlying genomic data. And it needs to be, even if you're not considering a treatment that is targeted to a genetic form of disease.

David: It really is a very reasonable expectation that many treatments will work better for some subgroups in comparison to others and if we ignore those underlying stratifiers, we won't discover those connections. And so what I would say, a very obvious, really quite urgent priority is to develop the data sets that would allow trials for common complex diseases to be run in a way that is stratified. So I would say that that's one really high priority.

David: The other of course is really generating an effective data commons so that we can really understand connections between genes and complex human phenotypes. It's not just, is this gene involved in a disease? It really is, how is the gene involved in a disease? And how can we make effective use of omic data to tell us that and really figure out the right ways to think about modulating targets in different disease states? That's really actually, I think, the biggest challenge we face.

Kat: I've always described it as the black box between genotype and phenotype. What genes you've got, what variations you've got and then how you actually come out and the impact that it has on your health? And what sort of data can we now get access to that helps us to open that black box? Genomics is one, but you talk about this sort of idea of multi-omics. What sort of information do we need about people and their genes and their health to start really figuring out what disease is actually like?

David: I think it's clear that we really do need to go beyond looking at inherited genomic variation. We really need to think about systematically characterising all the omic levels that can be systematically characterised and we need to do that in a variety of disease contexts. As perhaps the most obvious example, we now are actually quite good at characterising cell specific variation in gene expression, and that gives us a lot of information about how genes are connected to disease. And we clearly need to really dramatically increase the amount of gene expression data that we have for patients at different stages of disease. It's not good enough to just look at gene expression late in the course of disease, because we have really no way then of unpacking which are the expression changes that influence disease development as opposed to secondary changes that are the result of having disease that actually are of no utility in therapeutic interventions.

David: So certainly one of the things that we need to do is really get much more systematic about generating gene expression data. But I would also say that wherever we can make comprehensive characterisations we need to. And that increasingly applies to metabolomic and proteomic data types. And really right now we've gotten to the point where we really can think about sequence variation systematically, and try to relate that to phenotypes. We need a lot more data, but we sort of know how to do it. We need to get to that point for these other omic data types as well.

Kat: And finally, what gets you excited about the future? Where do you think we'll be in five years' time with the tools, the technologies, the datasets, the advances in computing? Where do you want to be?

David: So I think the thing that really excites me the most right now is the prospect of truly having a molecular taxonomy of disease to underpin our drug development efforts. This is something that we've been talking about for a long time in the community that genetics and other omic approaches would really finally break these very heterogeneous, very complex, disease areas up into subgroups where we understand a lot more about why patients have disease and can target our treatments to those subgroups, where we have a mechanistic understanding of disease.

David: This has been the hope for a very long time and I now see signs that we really are finally moving in that direction. And I think once we develop a really systematic molecular taxonomy of disease, it will give us pointers to targets that are relevant to the different subgroups of the different complex diseases. I think that's what's exciting. And it's why I think we might be finally moving towards the steep part of the J curve in terms of genomics, really making a contribution to drug development.

Kat: David Goldstein, from Columbia University Medical Center.

Dave Michalovich: Using AI to seek out new drug targets
