Exceed a Substandard Status Quo in Drug Discovery
Improving the speed and quality of early preclinical drug discovery pipelines is directly related to unlocking new therapies to ultimately improve patient outcomes and save lives. Traditional drug discovery is time-consuming and expensive. After a drug target is identified and optimized, the likelihood of getting to market successfully is less than 10 percent.
Even small advancements in time-to-lead optimization and improvements in the likelihood of clinical success are important to addressing the thousands of diseases that today have no known treatment or cure.
Large Language Models are Learning the Language of Biology
Today’s generative AI models understand the language of biology, chemistry and genomics, leveraging many of the tools that have been developed over the past several years that led to the advent of ChatGPT and other LLMs. Although these models had been applied to biopharma sequence data before (for example, evolutionary scale modeling trained on large databases of protein sequences), the demonstrated success of Google DeepMind’s AlphaFold was the seminal moment demonstrating the promise of these tools.
Since then, the exponential growth of research and development in the space has been hard to ignore: Virtually every week, there are new state-of-the-art architectures, models and approaches published and made available, from academia to industry. For example, DiffDock was recently released, which demonstrated the first viable AI-based small molecule protein docking tool. Where applicable, this model speeds up orders for docking workflows, leading to cheaper and more efficient small molecule screening workflows.
EXPLORE: What types of AI are being used in healthcare?
Generative AI Is a Prescription for Healthcare Success
Every step of the AI drug discovery pipeline is being accelerated with AI. LLMs sift through literature to discover novel druggable targets. Generative models trained on small molecule and protein data provide tools to virtually generate drug candidates with defined properties (and, in a recent joint paper between NVIDIA and Evozyne, have been synthesized and validated in the lab).
Representation learning workflows, such as those supported by evolutionary scale models in protein design, provide the highest-quality property prediction tools yet known in the field and directly contribute to the performance of structure prediction tools such as AlphaFold and ESMFold. Applications of these models to genomics have led to the first true generalizable foundation model for tasks such as gene expression prediction, recently published by NVIDIA, the Technical University of Munich and InstaDeep (acquired this year by BioNTech).