Plenge Lab
Date posted: January 7, 2022

Categories: Drug Discovery, Embedded Genomics, Human Genetics, Precision Medicine

[ I am an employee of Bristol Myers Squibb. The views expressed here are my own, assuming I am real and not a humanoid. ]

In the original Blade Runner (1982), Harrison Ford’s character, Deckard, implements a fictitious Voight-Kampff test to measure bodily functions such as heart rate and pupillary dilation in response to emotionally provocative questions. The purpose: to establish “truth”, i.e., determine whether an individual is a human or a bioengineered humanoid known as a replicant.

While the Voight-Kampff test was used to establish truth for humans vs replicants, the concept of “truth” is central to neural networks used in machine learning and artificial intelligence (AI). And for AI to be effective in drug discovery and development, it is critical to ask a fundamental question: what is “truth” in drug discovery and development?

 

INTRODUCTION

I recently read the book Genius Makers by Cade Metz and was reminded of the long history of machine learning, neural networks, and artificial intelligence (AI). This is a field more than 60 years in the making, with slow growth for the first 50 years – AI was founded as an academic discipline in 1956 – and exponential growth in the last 10. The original mathematical framework of neural networks was created in the 1950s (the perceptron) and the 1960s and 1970s (backpropagation), but went largely unappreciated outside of academia, as practical applications were few and far between. Yet as new algorithms, massive datasets (Google, Facebook, Amazon), and expanded computing power (GPUs, or graphics processing units) emerged, so did the applications of AI in the tech industry.

In this blog, I compare and contrast AI in the tech sector (e.g., gaming, imaging, natural language understanding, self-driving cars) vs. AI in drug discovery and early development (e.g., new targets, chemical screens, lead optimization, proof-of-concept clinical trials). The key message is that AI requires massive, curated datasets of true positive and negative results, or “truths”, to train, test, and refine algorithms. Such datasets exist in tech but are less prevalent in drug research and early development (R&ED). However, there are emerging opportunities to establish “truths” in drug R&ED that will likely unlock AI in the next decade (e.g., human genetics and population-scale biobanks combined with functional genomics to link genotype and function to clinical outcomes).

[A number of articles and blogs have been written on AI in drug discovery; see here, here, here, here.]

 

THE RISE OF AI IN TECH VS BIOLOGY

AI was not very impactful in the tech industry until just 10 years ago, according to Genius Makers. Google started the tech-based AI gold rush in 2012 with the hiring of Ray Kurzweil, followed by the acquisition of DNNresearch, a University of Toronto AI company, in March 2013. Facebook announced its new AI lab in December 2013. Google expanded its AI capabilities with the acquisition of DeepMind in 2014. Indeed, if you want to see the rise of AI graphically, just plot the stock price of NVIDIA (the maker of GPUs) over the last 10 years.

The last 10 years have seen the widespread adoption of AI in tech, so much so that AI is part of our vernacular. Everyone is aware of AI applications in voice recognition software (“Hey, Siri, when were you born?”), self-driving cars (e.g., listen to Malcolm Gladwell’s podcast on Waymo here), text recognition (e.g., autocomplete in Gmail), recommendation systems (“How did Netflix know I wanted to watch Blade Runner and Westworld?”), and many other practical real-world applications.

AI has been applied in biology for many years, but the impact has been relatively modest. I am sure many will argue with me on this statement, as AI seems to be prevalent in so many areas of the biological sciences. One prominent example has been the success of AI in protein folding (e.g., AlphaFold; here, here, here, including the NewCo Isomorphic Laboratories). A number of AI-based drug discovery companies have also been formed (links here, here). Indeed, there is so much attention on AI in biology that it provides fodder for one of my favorite jokes:

Question: How do you know if someone does AI in drug discovery and development?  

Answer: Don’t worry, they will tell you!

 

EXAMPLES OF AI IN BIOLOGY – DEFINING “TRUTH” TO TRAIN ALGORITHMS

Before introducing AI in drug R&ED, I want to share two personal examples that reinforce the key point of this blog: the key to AI is having curated datasets (i.e., true positives and true negatives) to establish truths, and then larger datasets to refine algorithms through reinforcement learning. The two biology examples I provide in this section are not pure neural-network examples of AI; instead, they represent components of AI such as natural language processing, text mining, and machine learning.

The first biology example is the application of AI and other computational approaches to define clinical phenotypes in electronic health records (EHR; review here). This was done collaboratively as part of the Informatics for Integrating Biology and the Bedside (i2b2) project. The AI and computational specialists on this project included Zak Kohane, Tianxi Cai, Peter Szolovits, Guergana Savova, Soumya Raychaudhuri, and others. We started a project on rheumatoid arthritis (RA) and other autoimmune diseases in 2007. The goal was to use all available EHR data to define patients with rheumatoid arthritis in one health care system – including unstructured data mined with natural language processing – and to link these data to genetic data to derive new biological insights. Our first EHR publication was in 2010 (link here), with subsequent publications over the next several years (here, here, here), including the demonstration of portability of the algorithm across health care systems (work led by Josh Denny, link here).

At the start of the EHR project, the biggest challenge was the definition of truth – patients who had rheumatoid arthritis and patients who did not – in order to train prediction algorithms. We had to manually curate EHR data to build training and test sets of rheumatoid arthritis patients. This took many, many hours of chart reviews. As I was reviewing medical charts, I frequently asked myself whether this investment of time was worth it. Am I spending dozens of hours to define a training / test set of 1,000 RA patients only to apply the algorithm and identify five times more RA patients? Is that really a breakthrough? Over the last several years, improvements have been made to speed up the establishment of gold-standard truths for AI in EHR research (e.g., PheCAP, which uses machine learning for algorithm training, see here)…but the process is still not optimized.

To be fair, and as acknowledged above, the EHR algorithms weren’t neural networks. We tried several AI-based approaches and found they were no better than basic logistic regression. With larger curated datasets, I suspect these AI approaches could have been superior.
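To make the pattern concrete, here is a minimal sketch (in Python, on synthetic data) of the general approach: train a simple classifier on a small set of chart-review “gold standard” labels, using structured codes and NLP-derived note features, then evaluate it on held-out patients. The features and numbers are purely illustrative; this is not the actual i2b2 or PheCAP pipeline.

```python
# Minimal sketch of EHR phenotyping: train on a small, manually curated
# "truth" set, then evaluate on held-out chart-reviewed patients.
# All features and labels below are synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000  # chart-reviewed patients: the expensive, manually curated "truth" set

# Illustrative EHR features: structured codes plus NLP-derived note concepts.
X = np.column_stack([
    rng.poisson(3, n),   # count of RA ICD codes
    rng.poisson(1, n),   # count of DMARD prescriptions
    rng.poisson(2, n),   # NLP mentions of "rheumatoid arthritis" in notes
    rng.poisson(1, n),   # count of RF / anti-CCP lab orders
])

# Synthetic chart-review labels that depend on the features (demo only).
logits = 0.8 * X[:, 0] + 0.6 * X[:, 1] + 0.5 * X[:, 2] - 4.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict_proba(X_test)[:, 1]
print(f"AUC on held-out chart-reviewed patients: {roc_auc_score(y_test, pred):.2f}")
```

The expensive part is not the model – it is the 1,000 labels.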

The second example is the use of text mining to interpret results from genome-wide association studies (GWAS). A team led by Soumya Raychaudhuri and Mark Daly developed a method to identify the most likely causal gene from GWAS by mining the text of the abstracts of published research papers (here). We seeded the algorithm with a set of true positive associations to find patterns in PubMed abstracts; the more true positives we added, the better the predictions became. At the time, we only had 15 genome-wide significant SNPs outside of the Major Histocompatibility Complex (MHC) locus. To increase the amount of input information, we expanded the list to 370 SNPs from 179 independent loci with P < 0.001 in an RA GWAS of 3,393 cases and 12,462 controls. To establish true negatives, we randomly selected SNPs outside of these regions. This method, termed GRAIL, was very effective at nominating causal genes, as demonstrated by a prospective study in which we used GRAIL to prioritize SNPs for replication in independent RA case-control samples (here).
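For readers who want a concrete picture, here is a simplified sketch of the idea behind literature-based gene prioritization: represent each gene by text drawn from its abstracts and score candidate genes at an unresolved locus by their similarity to a seed set of true-positive genes. This illustrates the concept only – it is not the published GRAIL method – and the gene names and abstract snippets are placeholders.

```python
# Simplified illustration of literature-based gene prioritization (not GRAIL):
# score candidate genes by text similarity to a seed set of true positives.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

gene_abstracts = {
    "TYK2":   "tyrosine kinase cytokine signaling interferon autoimmune",
    "IL6R":   "interleukin 6 receptor cytokine signaling inflammation",
    "IL2RA":  "interleukin 2 receptor alpha regulatory T cells autoimmunity",
    "GENE_X": "lipid metabolism hepatocyte cholesterol transport",    # candidate 1 (placeholder)
    "GENE_Y": "cytokine receptor signaling T cell activation",        # candidate 2 (placeholder)
}

seed_genes = ["TYK2", "IL6R", "IL2RA"]   # "true positive" associations
candidates = ["GENE_X", "GENE_Y"]        # genes at an unresolved locus

vec = TfidfVectorizer()
tfidf = vec.fit_transform(list(gene_abstracts.values()))
idx = {g: i for i, g in enumerate(gene_abstracts)}

for gene in candidates:
    sims = cosine_similarity(tfidf[idx[gene]], tfidf[[idx[g] for g in seed_genes]])
    print(f"{gene}: mean similarity to seed genes = {sims.mean():.2f}")
```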

Again, the key message from these two examples: For AI-based approaches to be most impactful, it is important to seed algorithms with true positives and true negatives, and then apply the algorithms to extremely large datasets to refine the models through reinforcement learning. Without such “truths” and large datasets to learn from over time, AI is of limited value.

 

AI IN DRUG RESEARCH AND EARLY DEVELOPMENT

Establishing truth in early AI tech applications included mastering games such as Atari’s Breakout in 2013 and the board game Go in 2016 (links here, here). The neural networks learned by playing rather than by being fed an exhaustive, curated database of true positives and true negatives. Nonetheless, the “truths” were easy to establish: the outcomes of the games and the scores received. With each game, the deep reinforcement learning network used the points it scored as a feedback mechanism, which led to mastery of the games over time.
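Conceptually, the loop looks something like the toy sketch below: the only feedback the agent ever receives is the score returned by the environment. This is a bare-bones Q-learning illustration on a made-up game, not the DeepMind DQN or AlphaGo implementations.

```python
# Toy illustration of score-as-truth reinforcement learning: the agent's only
# feedback is the score returned by a made-up environment.
import random
from collections import defaultdict

class ToyGame:
    """A stand-in for an Atari-style environment: 5 states, 2 actions."""
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state = min(self.state + action, 4)
        reward = 1 if action == 1 else 0   # the game score is the "truth"
        done = self.state == 4
        return self.state, reward, done

q = defaultdict(float)                      # Q(state, action) estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1

env = ToyGame()
for episode in range(500):
    s, done = env.reset(), False
    while not done:
        if random.random() < epsilon:
            a = random.choice([0, 1])                            # explore
        else:
            a = max([0, 1], key=lambda act: q[(s, act)])         # exploit
        s2, r, done = env.step(a)
        # Update the value estimate using only the score as feedback.
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
        s = s2

print("Learned preference in the start state:", q[(0, 1)] > q[(0, 0)])
```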

A more complicated AI application came when a human-like robot hand learned to solve a Rubik’s cube (here). In this tech AI example, the game (Rubik’s cube) was combined with a physical machine (robot hand). The neural networks were trained entirely in simulation, using the same kind of reinforcement learning described above for Breakout and Go. Truth was defined as solving the cube.

And an even more complicated problem is self-driving cars. Here, sensors are required to evaluate the surroundings and move safely with little or no human input. Truth is defined as successfully navigating streets and avoiding obstacles such as other cars.

But what is “truth” in drug discovery and early development?

Is “truth” an approved drug? If so, there are relatively few approvals: fewer than 3,000 approved drugs in the history of the FDA, dating back to its inception in 1930, with “only” 948 new approvals since 1993 (here). While this represents a large number of approved therapies, it is a very small number for training AI algorithms.

Is “truth” the identification of a validated target or pathway before a tool molecule is ever generated? If so, how does one know that the target is truly validated until a drug is developed and tested in clinical trials?

Or is “truth” an intermediate step between target identification and development candidate nomination, such as a readout from a surrogate assay in a high-content small molecule screen or a step in chemical synthesis? Again, how do we know that the assays reflect causal human biology or that the chemical substrates represent drug-like molecules?

Or is “truth” speeding up steps to screen initial molecules or optimize physical properties of a drug-like molecule? Alas, speed is only part of the problem faced in drug R&D – the real problem is success in the clinic (see @DerekLowe blog here).

Moreover, even if “truth” is established, it is important to understand the causal relationship of the biological events that lead to the outcome of interest (see here). As described in a Nature Machine Intelligence article: “Precision medicine, however, is not only about predicting risks and outcomes, but also about weighing interventions. Interventional clinical predictive models require the correct specification of cause and effect, and the calculation of so-called counterfactuals, that is, alternative scenarios.”

In the remainder of the blog, I focus on four key steps in drug R&ED that I believe could benefit from AI-based approaches, assuming there is sufficient confidence in the definition of “truth” and an understanding of the causal relationship between biology and outcome at each step. And as with the Rubik’s cube robot (game and robot hand) and self-driving cars (sensors and decision-making), a key element will be not just solving one component of the problem, but linking multiple components together to drive end-to-end drug R&ED via an “innovation stack” (a concept borrowed from a book by Jim McKelvey, co-founder of Square).

1. Target ID and validation

The first step is to define causal human biology (CHB) and then to turn CHB into actionable readouts for AI-based high-content screens (as per the next step). One approach is to start with large-scale genetic studies as the source of CHB, as we know that human genetic support predicts which targets become approved drugs (here, here; link to my 2021 ASHG presentation here). I use a recent rheumatoid arthritis trans-ancestry GWAS and a new computational method (sc-linker) as two examples. There have been other interesting human genetic studies, too (e.g., recent GWAS and exome sequencing studies of lipids [here, here], IL6 vs IL6R Mendelian randomization [here], the Open Targets Genetics portal [here]).

A recent RA trans-ancestry GWAS provides a good test case of the general approach (medRxiv pre-print here). It represents the state-of-the-art in GWAS: large sample sizes; rigorous statistics to define true positive associations; integration of functional annotations to refine biological hypotheses (e.g., cell-specific chromatin marks, expression QTLs, splicing QTLs, cell-type-specific transcription factor binding sites); trans-ancestry assessments; and polygenic risk score analysis. The study, which includes 35,871 RA patients and 240,149 control individuals, identified 122 genome-wide significant loci, of which 34 were novel. The study used these genetic hits to drill into the causal human biology of RA. Insights include the role of cytokines and related signaling molecules (e.g., TYK2, IL6R), CD4+ T cells (e.g., Tbet annotations), Tregs (e.g., IL2RA/CD25, LEF1), citrullination (e.g., PADI4, PADI2), B cell antibody production (e.g., CCR6), and joint tissue and bone biology (e.g., CILP2, TNFRSF11A/RANK, WISP1). The study provides non-overlapping areas of biology to explore further in functional studies, together with specific targets that are associated with those areas of biology.

Another recent example is the development of a framework, termed “sc-linker”, to establish causal relationships between genetics and specific cell types (bioRxiv pre-print here). More specifically, the study integrates scRNA-seq data, epigenomic maps, and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. Ulcerative colitis served as a use case, where sc-linker identified multiple pathogenic cell types and processes: enterocytes and M cells; the complement cascade in plasma, B cells, enterocytes, and fibroblasts; and MHC-II antigen presentation. Of these, the study highlights M cells, which are rare cells in the colon that search for pathogens in the gut lumen and play a key role in immune–microbiome homeostasis.
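To make the underlying question concrete – are GWAS-implicated genes enriched among the marker genes of a candidate cell type? – here is a minimal illustration using a hypergeometric test. This is not the sc-linker framework itself, which integrates scRNA-seq data, epigenomic maps, and GWAS summary statistics in a far more sophisticated way; the gene counts below are invented.

```python
# Minimal stand-in for the cell-type question (not the sc-linker method):
# are GWAS-implicated genes enriched among the markers of a candidate cell type?
from scipy.stats import hypergeom

genome_genes = 20000      # background gene universe
celltype_markers = 500    # marker genes for, e.g., M cells (illustrative)
gwas_genes = 120          # genes implicated by GWAS loci (illustrative)
overlap = 12              # GWAS genes that are also cell-type markers

# P(observing >= overlap marker genes among the GWAS genes by chance)
p = hypergeom.sf(overlap - 1, genome_genes, celltype_markers, gwas_genes)
print(f"Enrichment p-value: {p:.2e}")
```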

But as the size and complexity of these genetic studies continue to increase, it will be even more difficult to digest the results using traditional methods – as demonstrated by two genetic studies of lipids and a fine-mapping manuscript from Open Targets. A recent Global Lipids Consortium study included 1.65 million individuals with GWAS data, leading to 941 genome-wide significant loci and 1,486 independent variants for at least one lipid trait (here). A second study performed exome sequencing in >170,000 individuals across multiple ancestries and identified 35 genes with rare variants and/or genetic burden signals associated with blood lipid levels (here). The Open Targets Genetics portal used machine learning to develop a “Locus-2-Gene score”, which represents one of the best approaches to identifying a causal gene from GWAS (Nature Genetics manuscript link here). Thus, as human genetic datasets become larger, more complex statistical approaches – including machine learning and AI – will likely be beneficial in distilling complex genetic signals into actionable biology.

The key is to reduce the genetic signals into actionable biology – for example, individual targets with an allelic series (e.g., TYK2, IL2RA, IL6R in RA) or cell types and readouts for phenotypic screens (e.g., M cells in ulcerative colitis). For the pathogenic cell types, it should be possible to use functional interrogation of the implicated genes from GWAS (e.g., gene editing of the disease-associated variant, saturation mutagenesis) to confirm that the assay used for a phenotypic screen is relevant to the biology of disease.

That is, it should be possible to use human genetic studies to define “truth” and causality for targets and relevant readouts as starting points for drug discovery efforts. Moreover, human genetics can validate the relevance of assays amenable to AI-based high-throughput screening approaches, as described in the next section. 

2. Screens to identify chemical leads

While validating a target or a pathway is not easy, it is not the hardest step in drug R&ED. Equally important is translating the target or pathway into actionable biology that can be used to triage early tool molecules. In most drug discovery programs, a great deal of bespoke work drives these early stages of assay development for high-throughput small molecule screens. (Note that I will focus on small molecules rather than biotherapeutics, cell therapy, or gene therapy, as the early discovery stages for these other modalities are different.) For example, if an assay is designed to find an orthosteric inhibitor of a kinase, then that is what the screen will deliver. Yet there are many aspects of target biology that are more complex than orthosteric inhibition – and this is where human genetics, complex cellular readouts, and AI can have an impact.

I will focus on cell-based phenotypic screens, as I believe the output from these screens serves as input into AI-based algorithms. Further, human genetics can establish the relevance of these assays to human physiology – that is, human genetics can establish the “truth” of the readouts for drug discovery and early development.

The physiological relevance of most cell-based phenotypic screens is questionable. However, if the cell type is validated by human genetics (e.g., see above for cell types implicated in the RA trans-ancestral GWAS and M cells in the sc-linker study of ulcerative colitis), then the individual genes and variants implicated from genetics can be used to validate the cellular readout through functional genomics. For example, bone and cartilage biology are implicated by RA genetics, and specific genes (e.g., CILP2, TNFRSF11A, WISP1) and variants within those genes can be assigned to these cell types. Through saturation mutagenesis (see example here for how this was done recently for SARS-CoV-2), CRISPR-based gene editing, and other cellular perturbations (e.g., Perturb-seq), the cellular readouts can be validated as disease-relevant.

In the RA example, a validated cellular readout would be influenced by RA-associated genes and variants. If an RA-associated gene is knocked out or an RA-associated variant is introduced, then the cellular readout should change. The goal of a phenotypic screen would be to convert a mutant cellular phenotype to a wild-type cellular readout through the introduction of a small molecule.
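The screening logic can be sketched as follows: quantify the shift in a readout caused by introducing a disease-associated variant, then score each compound by how far it moves the edited cells back toward the wild-type readout. The data and compounds below are invented for illustration; no real assay is implied.

```python
# Sketch of the "rescue" screening logic described above (illustrative data,
# not a real assay): score compounds by how far they move variant-edited cells
# back toward the wild-type cellular readout.
import numpy as np

rng = np.random.default_rng(1)

wild_type = rng.normal(loc=10.0, scale=1.0, size=200)   # readout in unedited cells
variant   = rng.normal(loc=14.0, scale=1.2, size=200)   # readout after introducing the variant

def rescue_score(treated, wt=wild_type, mut=variant):
    """Fraction of the variant-induced shift reversed by a compound (1.0 = full rescue)."""
    shift = mut.mean() - wt.mean()
    return (mut.mean() - treated.mean()) / shift

# Hypothetical compounds: one partial rescue, one essentially inactive.
compound_a = rng.normal(loc=11.0, scale=1.1, size=200)
compound_b = rng.normal(loc=13.8, scale=1.2, size=200)

for name, data in [("compound_a", compound_a), ("compound_b", compound_b)]:
    print(f"{name}: rescue score = {rescue_score(data):.2f}")
```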

I will note that this approach likely will require development of new cellular assays – human organoids (not to be confused with humanoids!), imaging of live cells, automated microscopy, and other image-based approaches – where the readouts are amenable to AI algorithms.

That is, it should be possible to use human genetics to validate cellular readouts in a data format suitable for AI-based approaches. If this can be done, then – like solving the Rubik’s cube with a robotic hand and fully autonomous cars navigating the streets of Phoenix – it should be possible to create an integrated neural network with reinforcement learning where “truth” can be established for a cellular readout. Ideally, such an integrated system could be semi-automated, where a computer could pick the next set of small molecules to test, based on results from the previous experiments. 
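A minimal sketch of such a semi-automated loop might look like this: a surrogate model is retrained after each plate of results and proposes the next batch of molecules to test. The molecular descriptors and the “assay” oracle here are placeholders; a real system would use learned chemical representations and the genetics-validated cellular readout described above.

```python
# Minimal active-learning loop: retrain a surrogate model after each plate and
# let it pick the next batch of molecules. Descriptors and the assay oracle
# are placeholders for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
library = rng.normal(size=(5000, 16))        # placeholder molecular descriptors

def run_assay(features):
    """Placeholder for the validated cellular readout (higher = better rescue)."""
    return features[:, 0] - 0.5 * features[:, 1] + rng.normal(0, 0.3, len(features))

tested_idx = list(rng.choice(len(library), 100, replace=False))   # initial random plate
results = list(run_assay(library[tested_idx]))

for cycle in range(5):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(library[tested_idx], results)

    untested = np.setdiff1d(np.arange(len(library)), tested_idx)
    predictions = model.predict(library[untested])

    # Greedy acquisition: pick the molecules the model predicts will score best.
    next_batch = untested[np.argsort(predictions)[-50:]]
    tested_idx.extend(next_batch)
    results.extend(run_assay(library[next_batch]))
    print(f"cycle {cycle}: best readout so far = {max(results):.2f}")
```

In practice the acquisition step would balance exploitation against exploration, but the essential feature is the closed loop between assay results and the next experiment.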

3. Lead optimization

Once a tool molecule has been tested for biological activity in vitro and in vivo, the next step is to optimize the pharmacological properties of the small molecule. This requires not just improvements in selectivity and potency, but also improvements in the drug-like physical properties that govern oral bioavailability, cell permeability, protein binding, metabolism, pharmacokinetics, formulation, off-target toxicity, and other features. These are the properties that must be optimized to turn a tool compound into a development candidate that can be tested in humans.

One challenge is establishing multi-parameter metrics that capture complex features in a single score to drive lead optimization. Optimizing one feature (e.g., potency, selectivity) may lead to unfavorable changes in another (e.g., absorption, metabolism). This is where AI can have a big impact, as there are many features that must be considered at each step – like the number of possible moves in a Go game. That is, AI can facilitate the creation of a single “optimization score” that can drive active learning in the delivery of a development candidate for clinical testing.
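One common way to construct such a score – offered here only as an illustration, not a recommended scheme – is a weighted desirability function over potency, selectivity, and ADME-type properties, where a geometric mean prevents one excellent property from masking a fatal flaw. The property names, target ranges, and weights below are made up.

```python
# Illustrative multi-parameter "optimization score" built from weighted
# desirability functions. Properties, ranges, and weights are placeholders.
import numpy as np

def desirability(value, low, high, higher_is_better=True):
    """Map a raw property value to [0, 1] relative to a target range."""
    d = np.clip((value - low) / (high - low), 0.0, 1.0)
    return d if higher_is_better else 1.0 - d

def optimization_score(compound, weights):
    d = {
        "potency":      desirability(compound["pIC50"], 6.0, 9.0),
        "selectivity":  desirability(compound["selectivity_fold"], 10, 1000),
        "permeability": desirability(compound["caco2_papp"], 1.0, 20.0),
        "stability":    desirability(compound["microsome_clint"], 5.0, 50.0,
                                     higher_is_better=False),
        "herg_margin":  desirability(compound["herg_ic50_uM"], 1.0, 30.0),
    }
    # Weighted geometric mean: one very poor property cannot be fully offset.
    vals = np.array([max(d[k], 1e-6) ** weights[k] for k in d])
    return vals.prod() ** (1.0 / sum(weights.values()))

weights = {"potency": 2, "selectivity": 1, "permeability": 1, "stability": 1, "herg_margin": 1}
compound = {"pIC50": 7.8, "selectivity_fold": 300, "caco2_papp": 12.0,
            "microsome_clint": 20.0, "herg_ic50_uM": 15.0}
print(f"optimization score: {optimization_score(compound, weights):.2f}")
```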

At this stage in drug discovery, it is imperative that the features of the optimization score reflect “truth”, as described in previous sections. Many features are well-established in medicinal chemistry and generalizable to small molecules regardless of the biological target (e.g., hERG liability for cardiac toxicity, Caco-2 efflux for intestinal absorption, microsome stability for metabolism). However, biological assays of potency and selectivity are specific to a biological target. Thus, the validated assays established for early screens (see above) should be used to drive optimization.

Finally, in vivo testing is important to confirm the biological activity of a drug, establish estimates of the predicted human dose, and assess potential toxicities. To channel a commonly used aphorism in statistics, “all models are wrong, but some are useful”. Similarly, pre-clinical in vivo efficacy models are notorious for failing to predict human efficacy…but some are useful. The focus should not be on clinical efficacy, per se, but on establishing reliable PK-PD relationships to enable human dose projection and biomarkers for decision-making in clinical trials.
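As an illustration of what “establishing PK-PD relationships” buys you, the sketch below uses a one-compartment oral PK model driving an Emax target-engagement model to ask which dose keeps engagement above a threshold across the dosing interval. All parameters are invented placeholders, not projections for any real program.

```python
# Illustrative PK-PD sketch: one-compartment oral PK driving an Emax
# target-engagement model. All parameters are made-up placeholders.
import numpy as np

# Hypothetical human PK parameters (in practice scaled from pre-clinical data)
F, ka, CL, V = 0.5, 1.0, 10.0, 50.0        # bioavailability, 1/h, L/h, L
ke = CL / V                                 # elimination rate constant (1/h)
EC50 = 0.05                                 # mg/L, concentration for 50% engagement

def concentration(dose_mg, t_h):
    """Plasma concentration after a single oral dose (one-compartment model)."""
    return (F * dose_mg * ka) / (V * (ka - ke)) * (np.exp(-ke * t_h) - np.exp(-ka * t_h))

def engagement(conc):
    """Emax model of target engagement (fraction of maximum)."""
    return conc / (EC50 + conc)

t = np.linspace(0, 24, 241)                 # one dosing interval (hours)
for dose in (10, 30, 100):
    eng = engagement(concentration(dose, t))
    coverage = (eng > 0.8).mean() * 100     # % of interval above 80% engagement
    print(f"{dose:>4} mg: {coverage:.0f}% of the interval above 80% target engagement")
```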

Thus, a multi-parameter “optimization score”, where the biology readouts are based on validated assays (as per steps 1 and 2 above), can be used to drive AI-based active learning for the delivery of a development candidate for clinical testing. In vivo testing is performed to establish PK-PD relationships, not to test “efficacy” in a pre-clinical model.

4. Biomarkers for clinical trials

The ultimate goal of drug discovery is to deliver development candidates into the clinic with an increased probability of success for improving patient outcomes. It is not just about doing optimization faster – it is about driving optimization to better therapies. As Derek Lowe puts it in his 2021 blog on AI in drug discovery:

“We come up with mechanistic biochemical rationales, cell assays, animal assays, evaluation schemes for compound structures and physical properties, all sorts of things to try to increase our chances for success when the curtain goes up and the real show starts. Which is human dosing. These proxies generate heaps of numerical data, so it’s understandable that computational approaches use them to try to make better predictions. But in the end, they’re all still just proxies.”

To increase probability of success in the clinic, it is important to have validated biomarkers that reflect the “truths” established at the time of target validation. While the biomarkers should be based on human data, the biomarkers should be tested in pre-clinical models to establish PK-PD relationships to enable decision-making in humans. This may seem like a subtle point – or perhaps an obvious point. Nonetheless, it is important to emphasize, as a tremendous amount of time can be spent on in vivo testing in pre-clinical models, often due to lack of focus on the role of these pre-clinical models.

An important source of human biomarker data is the wealth of data being generated by living biobanks (see Tweet and replies here, Global Biobank meta-analysis pre-print here, a16z podcast on real-world data here, and UK Biobank circulating metabolic biomarker pre-print here). These living biobanks provide the human data to come full circle in drug R&ED: start with human genetics to nominate targets and pathways, develop assays for AI-based screens that are sensitive to genetic perturbations, test for PK-PD relationships in pre-clinical models for human dose projection, and test therapeutic hypotheses in humans through biomarkers that are linked with the same human genetic perturbations used to seed “truth” at the very beginning!

For example, imagine every possible mutation in a protein and the effect of these mutations on (1) a series of molecular in vitro assays that can serve as readouts for each drug screen (experimentally generated via saturation mutagenesis), (2) biomarkers measured in millions of humans, and (3) clinical phenotypes in millions of humans. Given the number of biobanks that are emerging – with genetic, clinical and molecular phenotypes – it is not unreasonable to think that we will have millions of naturally occurring mutations across the allele frequency spectrum (from ultra-rare to low frequency to common) linked to biomarkers and clinical phenotypes.
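The data join this implies might look something like the sketch below: each variant in a target gene is linked to its saturation-mutagenesis readout and to biomarker and clinical outcome data from biobank carriers. All tables, variant names, and values are invented placeholders.

```python
# Sketch of the variant-to-biobank data join implied above.
# All tables and values are invented placeholders.
import pandas as pd

# (1) Saturation mutagenesis: functional readout per variant (illustrative)
assay = pd.DataFrame({
    "variant": ["p.R130Q", "p.G210S", "p.L55F"],
    "assay_activity": [0.15, 0.95, 0.40],        # fraction of wild-type activity
})

# (2) Biobank carriers: biomarker and clinical outcome per variant (illustrative)
biobank = pd.DataFrame({
    "variant": ["p.R130Q", "p.R130Q", "p.G210S", "p.L55F"],
    "crp_mg_L": [1.1, 0.9, 3.2, 2.0],            # circulating biomarker
    "ra_case":  [0, 0, 1, 1],                    # clinical phenotype
})

# (3) Join and summarize: does lower activity in the assay track with lower
#     biomarker levels and lower disease risk in carriers?
summary = (biobank.groupby("variant")
                  .agg(mean_crp=("crp_mg_L", "mean"), ra_prevalence=("ra_case", "mean"))
                  .reset_index()
                  .merge(assay, on="variant"))
print(summary.sort_values("assay_activity"))
```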

Thus, biomarker readouts in a clinical trial are directly linked to the human genetic perturbations used to seed truth at the time of target identification and validation. Pre-clinical in vivo models are used to gain confidence that the PK-PD relationships can be measured in human clinical trials.

 

IN SUMMARY

Just as Deckard implements the Voight-Kampff test to establish truth in Blade Runner, establishing truth is a key concept in AI. Unfortunately, there is no Voight-Kampff test for “truth” in drug discovery and development, but it should be possible to use human genetics to guide AI-based approaches throughout multiple stages of drug R&ED to get as close to “truth” as possible.

The concept of AI-driven drug discovery is not that different from how AI has been used in the tech sector. As curated datasets – sources of truth – have grown in the last decade, so too have tech-based AI applications in gaming, imaging, and natural language understanding. Notable examples include addressing focused albeit complex problems such as winning computer games (e.g., Atari’s Breakout) and beating humans in board games (e.g., AlphaGo).

Here, I argue that AI-based drug discovery is where AI in technology was ten years ago. However, just as AI in tech has integrated multiple technologies to solve more complex problems (e.g., solving a Rubik’s cube with a robotic hand, fully autonomous self-driving cars), AI in drug discovery and early development is poised to integrate multiple steps – target and pathway validation, early chemical screens, lead optimization, and biomarker-driven clinical trials. The key, however, is to establish biological “truth” and a causal relationship early in drug discovery, to carry such “truths” through the delivery of development candidates, and to test these “truths” in clinical trials. As summarized by Andreas Bender and Isidro Cortés-Ciriano in their review article: “addressing the questions of which data to generate and which end points to model will be key to improving clinically relevant decision-making in the future.”

That is, the goal of AI in drug discovery should not be to speed up a flawed process (e.g., high-throughput screens or optimizing development candidates based on biological assays of questionable physiological relevance). The goal of AI in drug discovery and early development is to create a more efficient and effective process to deliver transformative medicines to patients.

AI may be able to speed up individual steps such as protein folding or molecule optimization, but until AI improves biology – and integrates multiple steps into a semi-automated system – AI will have only a minor impact on drug R&ED…and be relegated to science fiction stories such as Blade Runner.

The essential features for AI-driven drug discovery are:

  • Causal human biology (e.g., human genetics) to establish “truth” and causality.
  • Validated assays that capture “truth” and are amenable to AI-based readouts.
  • Integrated infrastructure (biological, computational, operational) to carry out end-to-end testing – from early screens to the selection of development candidates and testing in human clinical trials.
  • Ability to repeat the approach for other targets and pathways based on learning from previous programs.

I am sure others will have counterpoints to the ones I have made here. In the spirit of reinforcement learning, I welcome feedback on these ideas!
