The DPhil in Computational Discovery is a multidisciplinary programme spanning projects in Advanced Molecular Simulations, Machine Learning and Quantum Computing to develop new tools and methodologies for life sciences discovery. This innovative course has been developed in close partnership between Oxford University and IBM Research. Each research project has been co-developed by Oxford academics working with IBM scientists. Students will have one or more named IBM supervisors and many opportunities for collaboration with IBM throughout the studentship.
The scientific focus of the programme is at the interface between Physical and Life Sciences. By bringing together advances in data and computing science with large, complex sets of experimental data, more realistic and predictive computational models can be developed. These new tools and methodologies for computational discovery can drive advances in our understanding of fundamental cellular biology and drug discovery. Projects will span the emerging fields of Advanced Molecular Simulations, Machine Learning and Quantum Computing, addressing fundamental questions both within each of these fields and at their interfaces. Students will benefit from the interdisciplinary nature of the course cohort as well as the close interactions with IBM scientists.
There are 16 projects available, and you may nominate up to three projects for consideration in your application. The details of Project 5 are listed below.
There is no application fee for this course. For information on how to apply and entry requirements, please see https://www.ox.ac.uk/admissions/graduate/courses/dphil-computational-discovery
Project 5
An approach will be developed that integrates machine learning with experimental measurements for the rapid design of bioactive compounds that are suitable as chemical probes. Chemical probes are often used as the starting point for drug discovery campaigns and can help elucidate the mechanism of molecule-target interactions. Machine learning techniques will be developed that use structural data as input to suggest potent, synthetically tractable molecules to make within this workflow. Techniques that better deconvolve signal from noise in biophysical assays will also be investigated.
This project aims to generate an algorithmic formalism that achieves rapid design of bioactive compounds suitable as chemical probes. We will develop a machine learning approach that iteratively integrates experimental data from low-cost robotic organic synthesis, high-throughput crystallography (XChem), and rapid sensor-based biophysical measurements (Grating-Coupled Interferometry). The engine will be able to suggest new molecules that are potent, synthetically tractable and have good pharmacological properties. This approach builds on methodological discoveries made in the successful COVID Moonshot initiative, an open science consortium that Prof von Delft co-founded, which delivered preclinical candidates against the SARS-CoV-2 main protease from fragment hits in 18 months for less than £1m [1].
Chemical probes are enormously powerful reagents: as potent and selective small-molecule modulators of a protein’s function, they enable one to answer detailed mechanistic and phenotypic questions about their biological targets [2]. They are also often starting points for drug discovery campaigns. However, only a small fraction of the human proteome has an associated chemical probe [3], so chemical probes currently contribute little to unravelling the fundamental chemical biology of the vast number of genotype-phenotype correlations revealed by genome sequencing. The limiting factor is the high discovery cost and scientific challenge of designing potent and selective ligands.
This project aims to resolve these limitations by bringing together advances in automation, actuated by machine learning.
Recent work by Prof von Delft demonstrates the feasibility of using robotic synthesis as part of a fragment-based probe discovery campaign. The key insight is that biophysical assays and crystallography can be used to analyse crude reaction mixtures directly [4]. This sidesteps purification, which is the rate-limiting step in chemical synthesis: protein crystallography directly confirms the ligand’s chemical structure, while sensor-based biophysics provides binding kinetics.
The synthesis planning methodology is driven by machine learning, possibly using IBM RXN [5], a computer-aided synthesis planning tool already developed by IBM. One possible avenue is to apply synthetic accessibility scores computed with IBM RXN as a post hoc filter after compound generation. Alternatively, CoPriNet, a deep learning model for compound price prediction, could be used in place of the computationally expensive retrosynthesis-based synthetic accessibility score, enriching our compound generation approach with real-world compound prices [6].
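As a rough illustration of the post hoc filtering idea above, the sketch below drops generated candidates whose accessibility (or price) score exceeds a threshold and ranks the remainder by predicted potency. It does not call IBM RXN or CoPriNet: the `Candidate` fields, the `post_hoc_filter` function, the threshold and all numbers are hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str               # generated molecule (illustrative strings only)
    predicted_potency: float  # hypothetical model output, higher is better
    access_score: float       # hypothetical accessibility or price score, lower is better


def post_hoc_filter(candidates, max_access_score, top_n):
    """Keep candidates at or below the score threshold, then rank by potency."""
    tractable = [c for c in candidates if c.access_score <= max_access_score]
    ranked = sorted(tractable, key=lambda c: c.predicted_potency, reverse=True)
    return ranked[:top_n]


# Made-up candidates; none of these values come from a real model or assay.
candidates = [
    Candidate("CCO", 5.1, 1.2),
    Candidate("c1ccccc1C(=O)N", 6.8, 2.5),
    Candidate("CC(C)(C)c1ccc(cc1)C2CC2N", 7.9, 6.4),  # potent but scored as hard to make
    Candidate("CC(=O)Nc1ccccc1", 6.2, 1.9),
]

shortlist = post_hoc_filter(candidates, max_access_score=3.0, top_n=2)
for c in shortlist:
    print(c.smiles, c.predicted_potency, c.access_score)
```

Because the filter only needs one number per molecule, a retrosynthesis-based score and a predicted price could be swapped for one another without changing the rest of the workflow.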
This project will address the two interrelated questions that must be answered for these proof-of-concept successes to become an effective platform for probe discovery. First, how can machine learning take structural biology data as input to suggest new molecules to make? Second, how can we deconvolve signal from noise in biophysical assays when the input is a crude reaction mixture?
We will investigate different techniques to describe protein-ligand complexes, ranging from protein-ligand interaction fingerprints to machine learning-based docking scores and full free energy calculations. Using the exponential increase in structural biology throughput possible at Prof von Delft’s XChem facility, we will develop a new class of descriptors. This work will utilize open source software, including tools developed within our teams, and new developments can be contributed to the open source community.
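To make the simplest of these descriptors concrete, the sketch below encodes a protein-ligand interaction fingerprint as a fixed-length bit vector and compares two ligands by Tanimoto similarity. The residue labels, interaction types and function names are placeholders assumed for illustration; in practice the interactions would be extracted from crystal structures by dedicated software.

```python
# Hypothetical feature list: (residue, interaction type) pairs that a
# fingerprint tracks. The labels are placeholders, not real residues.
FEATURES = [
    ("RES1", "pi_stacking"),
    ("RES2", "hbond"),
    ("RES3", "hbond"),
    ("RES4", "hydrophobic"),
    ("RES5", "hbond"),
]


def fingerprint(interactions):
    """Encode a set of observed interactions as a fixed-length bit vector."""
    observed = set(interactions)
    return [1 if feature in observed else 0 for feature in FEATURES]


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


ligand_a = fingerprint([("RES1", "pi_stacking"), ("RES3", "hbond"), ("RES4", "hydrophobic")])
ligand_b = fingerprint([("RES3", "hbond"), ("RES4", "hydrophobic"), ("RES5", "hbond")])

print(ligand_a)                      # [1, 0, 1, 1, 0]
print(ligand_b)                      # [0, 0, 1, 1, 1]
print(tanimoto(ligand_a, ligand_b))  # 0.5
```

A vector like this can be fed directly to a standard machine learning model, which is one reason fingerprints sit at the inexpensive end of the range of descriptors listed above.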
A related question is how to deconvolute biophysical assay data, as the assays are performed directly on crude reaction mixtures rather than purified product. We will investigate approaches such as Bayesian statistics to rigorously determine the number of chemical components in the system, and will explore ideas such as using the binding affinity predicted by the structural biology-based machine learning model described above as a prior.
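The idea of using a predicted affinity as a prior can be illustrated with a deliberately simplified model: a normal prior on a single affinity value, updated with noisy measurements whose noise level is assumed known. This sketch covers only the prior-updating step, not the harder problem of determining the number of components, and every number and name in it is hypothetical.

```python
import math


def posterior_normal(prior_mean, prior_sd, observations, noise_sd):
    """Conjugate update for a normal mean with known measurement noise.

    Returns the posterior mean and posterior standard deviation.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(observations) / noise_sd**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + sum(observations) / noise_sd**2) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical values: a model-predicted affinity (on a log scale) as the prior,
# followed by three noisy assay readings from a crude reaction mixture.
mean, sd = posterior_normal(prior_mean=6.0, prior_sd=1.0,
                            observations=[7.0, 7.4, 6.6], noise_sd=1.0)
print(round(mean, 3), round(sd, 3))  # 6.75 0.5
```

The posterior mean lies between the prediction and the average of the measurements, weighted by their precisions, so a confident structural model pulls noisy readings towards its prediction while a vague one lets the data dominate.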
The von Delft group and the Oxford Protein Informatics Group (OPIG), led by Professor Deane, have complementary expertise, and a collaboration is essential to achieving the project goals. The von Delft group has a track record in high-throughput structural biology, robotic synthesis and biophysical assays. OPIG is a world-leading group in the area of AI for small molecule design, and all code from OPIG is made available as open source. IBM researchers will contribute skills at the intersection of molecular discovery, computational chemistry and AI to this collaboration. Dr. Cornell leads the drug discovery strategy within IBM Accelerated Discovery Research, and Dr. Morrone develops computational methods to combine structural data and simulation with artificial intelligence.