Dave Michalovich: Using AI to seek out new drug targets
Kat: The kind of datasets that David was talking about are growing ever larger and more complex, taking them far beyond the capability of a human brain, or even a simple computer, to analyse. So it’s lucky that alongside the exponential rise in genomics and all the other ‘omics’ we’ve seen a similar expansion in computing capacity, with the development of sophisticated machine learning algorithms and artificial intelligence, or AI as it’s more usually known.
Kat: Dave Michalovich is vice president of Precision Medicine at Benevolent AI, a technology-based drug discovery company that has teamed up with AstraZeneca to smoosh all this data together, figuring out what might be going on in disease and homing in on the best targets for new therapies. So, just how much data are we talking about here?
Dave: What is amazing is really the scale, the depth and the breadth of the information we're confronted with, the multiple evidence streams coming through into the computational systems that we have to deal with. I think there's a really nice paraphrase of the Douglas Adams quote about space, how big it is. Biological space is big. It's really big, it's just vastly, hugely, mind-bogglingly big, and that's what we've found we have to deal with.
Dave: And I think if you think about what data we're handling now, it's large patient cohorts with deep clinical phenotyping information; genetics, both the genotyping and sequence information; epigenetic data that tells us how genes are regulated; expression data from RNA-Seq and now single-cell RNA-Seq; and then outputs from functional genomic screens, so genome-wide CRISPR screens that Steve may have talked about earlier; proteomics, metabolomics and microbiome data as well.
Dave: So it's really vast amounts of biological data from different evidence streams that we need to bring together. And I think each of these evidence streams Has a depth and scale that requires significant resources to manage the data, analyse and present these results. So each of these data streams really needs kind of individual teams and departments to analyse it. And that this has a tendency to cause data silos really. And I think when we want to really understand human health and disease, we really need to bring all these different data streams together and to understand the real kind of jigsaw of disease mechanisms and ultimately select the right targets. And I think given the scale of the data, there's only really one way forward, which is really using advanced computational methodologies, like artificial intelligence and machine learning approaches to really develop knowledge graphs over this space and surface the clinically relevant findings from the data.
Kat: And that's what I always say about genetics, it's not just about the genome, it's not just about what you've got, it's what you do with it that counts. It's about understanding all these outputs and putting them together to build a picture of what's going on in biology, whether that's at the level of a cell or a tissue or an organ or a whole person, and it's really complicated. And I kind of want to find out a bit more about what's actually changed in terms of the computing tools. What's changed in terms of the technology we have that enables us to even handle, process and smoosh together these very complex datasets?
Dave: Where we're seeing real advancements now is what people are describing as deep learning. These are massive neural networks, almost recapitulating how our neurons work in the brain, and requiring a large amount of computing power underneath. And the difference here is that we can provide these tools with much more raw data, as it were, and they will find the patterns within that data and surface them. And I think that's the really exciting area in the field.
Dave: And I guess it's great having those tools, but you obviously have to aim them at something, and the approach we've taken at Benevolent is to develop what are called knowledge graphs. These are ways of representing relationships between different entities, different data types, and for us our knowledge graph is built up of entities related to human health and drug discovery. So we have entities related to disease, so listed disease terms; entities which relate to genes and proteins; entities related to drugs, biological processes, cell types and tissues. And we generate this sort of network of relationships based on causal information: functional information that may link a gene to a disease through a knockout experiment or expression data or a genetic signal, for instance, or a drug to a gene, in that the drug acts on the protein that gene produces.
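The kind of knowledge graph Dave describes can be sketched as a set of (entity, relation, entity) triples. This is a toy illustration only, with invented entities, relations and links, not BenevolentAI's actual schema or any real finding:

```python
# Toy knowledge graph: (subject, relation, object) triples linking
# genes, drugs, diseases and biological processes.
# All entities and edges below are invented for illustration.
triples = [
    ("GENE_A", "associated_with", "chronic kidney disease"),
    ("GENE_A", "involved_in", "fibrosis"),
    ("DRUG_X", "targets", "GENE_A"),
    ("GENE_B", "involved_in", "fibrosis"),
    ("GENE_B", "upregulated_in", "chronic kidney disease"),
]

def neighbours(entity):
    """Return every (relation, other_entity) edge touching `entity`."""
    edges = []
    for subj, rel, obj in triples:
        if subj == entity:
            edges.append((rel, obj))
        elif obj == entity:
            edges.append((rel, subj))
    return edges

# What evidence touches GENE_A?
print(neighbours("GENE_A"))
```

In a real system the triples would be extracted at scale from literature, experiments and databases, and the graph queried with far richer tooling, but the underlying representation is this simple.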
Kat: So you're kind of saying, all right, we've got some data that is, here's a load of genes, and here's some effects when these genes are on or off or present or missing. We've got a load of diseases, we've got a load of processes like inflammation or fibrosis. And then we've got a load of drugs that we know work in this way or that way. And you're sort of squishing it all together to see, well, what new things can we find here?
Dave: You make a good point there. It's great that you've got this knowledge graph, but what do you do with it? It's the ability to apply some of these machine learning tools over it. There's a range of programs that allow you to run what we call inference: you've got these relationships, but can you infer further relationships based on that prior network of connections? A great, maybe more real-world example of this is how we use streaming media for our favourite movies. You like your movies, and then you find that you're suggested other movies, and that's based on a knowledge graph. There the entities are you as a user, but there are lots of other users; there will be the movies, the actors in the movies, the type of movie it is, the genre, whether that movie contains a 40-foot giant radioactive lizard, all these sorts of pieces that come together. So when you say you like something, what's offered back to you is really an inference of what you may like based on that knowledge graph. And those are some of the approaches that we take over our drug discovery biology knowledge graph to surface novel findings.
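The inference Dave describes can be caricatured as simple link prediction over shared neighbours: suggest a new gene-disease link when a gene shares biological processes with genes already linked to that disease. Again, all names and links are invented, and real systems use learned graph embeddings rather than raw overlap counts:

```python
# Minimal link-inference sketch in the recommender spirit:
# rank unlinked genes by how many biological processes they share
# with genes already linked to the disease. Entities are invented.
known_gene_disease = {("GENE_B", "chronic kidney disease")}
gene_process = {
    "GENE_A": {"fibrosis", "inflammation"},
    "GENE_B": {"fibrosis"},
    "GENE_C": {"cell cycle"},
}

def infer_links(disease):
    """Score unlinked genes by process overlap with linked genes."""
    linked = {g for g, d in known_gene_disease if d == disease}
    linked_processes = set().union(*(gene_process[g] for g in linked))
    scores = {}
    for gene, processes in gene_process.items():
        if gene in linked:
            continue  # already a known link, nothing to infer
        overlap = len(processes & linked_processes)
        if overlap:
            scores[gene] = overlap
    # Highest-overlap genes first: candidate new disease links
    return sorted(scores, key=scores.get, reverse=True)

print(infer_links("chronic kidney disease"))
```

Here GENE_A is surfaced because it shares the fibrosis process with GENE_B, which is already linked to the disease, which is exactly the "people who liked X also liked Y" pattern transplanted onto biology.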
Kat: I love that this is kind of like the Netflix recommender of drug discovery. It's like, "you liked these targets in heart disease, so you might like these kinds of things". So let's see if we can find some more out of that.
Dave: Well, it's a simplistic view, but these are the sorts of approaches we can leverage once you've got the knowledge into this sort of framework. And certainly as a scientist, you're often confronted with a list of genes at the end of an experiment. It might be a genetic study and you've got a list of loci to look at, or you've run a CRISPR screen and you've got a set of hits, or you've done a phenotypic screen. You're often confronted with a list of genes, and you've got to say, which of these genes are going to be the targets I take forward? I think we do our best to maximise the use of data in that triaging process, but I've always got that feeling I'm missing some knowledge here, I don't know enough about C10orf57 or something that's a bit more abstract. Really having these tools, the knowledge graph, to surface all the relevant information as you go through these processes is so key, and that's why I'm really excited about this space as an extension of what I've been doing in the past. It seems a great place to explore.
Kat: So what can we expect to see coming down the pipeline? It does feel like all of this is really accelerating. We're just getting so much more data. We're able to go down to the level of single cells, we're able to do high-throughput, large-scale experiments, and the computing tools are just accelerating all the time. So what's going to be coming down the pipeline in this kind of area?
Dave: Yeah. So I think I take quite a pragmatic view. I don't think we're looking at one great big red button to hit that's going to answer all drug discovery needs in one go. But I think we're going to see aspects of drug discovery, starting with patient phenotyping and electronic healthcare record analysis, moving into genetics and genomics, and through into the work we've been doing with the knowledge graph around target discovery, being augmented and enhanced by AI and ML approaches, and that's something we're working on.
Kat: This may be a bit of a cheeky question, but we hear a lot about AI and ML and it's all very cool. Is this just a trend, is this actually going to be the way that we do biology now, or is it somewhere in between? It feels like there is a transition, in that we're actually starting from the data side of things and moving forward. Is that the way it's going to be done? Or is there still going to be a place for the old-school way of doing things?
Dave: I think we're in a transition period as we see these tools coming online and picking off problems, so I do believe it's a trajectory we're now on. I think the important piece is having that domain knowledge, understanding how the different data types hang together, and making sure we bring that into the equation. But yeah, coming back to the earlier points about the vastness of the data we're having to handle, this is the only way forward, really. We have to develop these approaches. We have to make sure we're not missing the key information, the biological signals that we want to understand in disease. So I do feel this is the way forward, and I think you can see it across the industry: more and more companies have invested in AI and ML approaches, both at the biotech level and at the pharma level. But I think it's really about making it real, so putting it into practice, making it work, testing, showing that the outputs we're getting are real, and moving forward from that. So yes, I do believe it's the way forward, but with a pragmatic approach: testing, validating and advancing, really.
Kat: Yeah. I mean, the one thing I do know about data science is that it still always comes down to garbage in, garbage out. You can have amazing tools, but if you're not putting in quality data then you're not going to get quality answers.
Dave: Yeah. And I think that's where Benevolent's model is fantastic. Within the company we have deep expertise in different aspects of drug discovery, like chemistry, biology, genomics and genetics, which we're bringing together with the AI and ML scientists as well. And I think it's that augmented world, where we're combining our expertise and scientific insights, which allows us to make sure what we do is real, can be validated and is applicable. So yeah, I do think it's the way forward, but the best scientific advances are often at interfaces, really, and I think having a well-connected interface such as we have at Benevolent puts us in a great position to advance these technologies.
Kat: Dave Michalovich from Benevolent AI. And you might be interested to know that just a couple of months ago saw the announcement that BenevolentAI’s collaboration with AstraZeneca has already borne fruit, with their sophisticated knowledge graphs revealing a new target in chronic kidney disease which has now entered the AstraZeneca portfolio. Chronic kidney disease is a complicated condition with huge unmet patient needs so any advances in this area will be hugely welcome.