
Using machine learning to identify aggregation-resistant biopharmaceuticals

   Faculty of Biological Sciences

This project is no longer listed on FindAPhD.com and may not be available.

Prof David Brockwell, Prof David Westhead, Prof S E Radford
No more applications being accepted
Competition Funded PhD Project (European/UK Students Only)

About the Project

The UK is a major stakeholder in biopharmaceutical development and production, a sector that had sales of $228 billion in 2016. Aggregation is a major hurdle to the manufacture of biopharmaceuticals, resulting in the failure of promising candidate biologics even at very late stages in the development pipeline. The ability to identify sequences likely to aggregate during production, transport or storage is of crucial importance to the biologics industry. This is currently beyond our capability, both for mAbs and for the arsenal of advanced therapies (antibody-drug conjugates, etc.) that have the potential to revolutionise medicine in the future.
Together with AstraZeneca (AZ), we have developed an in vivo selection method in E. coli that quantifies the aggregation propensity of biotherapeutics, including mAbs, by linking aggregation to antibiotic resistance. We have shown that the assay can be used to screen for aggregation-resistant proteins of therapeutic importance with different protein scaffolds (reference 1; a previous BBSRC CASE student project with Avacta/AZ) and, most recently, have combined it with directed evolution to generate new proteins with enhanced bioprocessing capability (under review).
Excitingly, in addition to isolating inherently developable therapeutics, this combined approach allows the isolation of thousands of protein sequences with known aggregation properties, opening the door to using machine learning (ML) to identify the key drivers of aggregation (whether during ageing and neurodegeneration or during advanced therapy manufacture) from such highly complex datasets.

Objectives. Together with our industrial collaborators at AstraZeneca we will:
1. Generate a large dataset of protein sequences with improved (positive selection) and worsened (negative selection) aggregation propensity. This will be achieved by performing directed evolution on five single-chain Fv (scFv) sequences that have low sequence identity but poor biophysical behaviour, identified from the literature and by our industrial partner.
2. Use these data as training sets for the development of ML algorithms to identify aggregation-resistant sequences.
3. Validate the machine learning outputs by quantifying the aggregation properties of a test set of sequences ranked by the optimised ML algorithm.
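One way objective 3 could be scored is with a rank correlation between the ML ranking and the measured behaviour of the test set. The sketch below is illustrative only: the project description does not name a validation metric, so the choice of Spearman's rank correlation, the function names and the toy numbers are all assumptions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical test set: ML scores (higher = predicted more resistant) and a
# made-up measured readout (e.g. fraction of protein remaining monomeric).
predicted_score = [0.9, 0.7, 0.4, 0.2]
measured_monomer_fraction = [0.95, 0.80, 0.85, 0.30]

rho = spearman(predicted_score, measured_monomer_fraction)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 would indicate that the model orders the test sequences much as the experiments do; other metrics (for example ROC AUC on a binary label) would be equally reasonable choices.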

Novelty and timeliness
The ability to identify aggregation-resistant protein therapeutics early in development, without the need for large-scale purification, is both novel and timely, especially as more complex protein therapeutics are currently in development. Additionally, our novel evolution platform will be used as a high-throughput screen, enabling the generation of large datasets that will be used in a ‘big data’ approach to understand the complex multi-factorial mechanisms underlying selection. This will ultimately lead to novel predictors of aggregation and an understanding of the fundamental mechanisms.

Experimental Approach
Molecular biology (error-prone PCR and Golden Gate cloning) will be used to generate libraries of mutated scFv. High-throughput sequencing and high-throughput aggregation assays will be used to construct a large dataset of sequences with known aggregation behaviour.
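As an illustration of how sequencing reads from a selection experiment might be turned into per-variant labels, the sketch below computes a log2 enrichment score for each variant from read counts before and after selection. This scoring scheme, the function name `log2_enrichment`, the pseudocount and the toy reads are assumptions made for the example; the project description does not specify how aggregation behaviour will be assigned to sequences.

```python
import math
from collections import Counter


def log2_enrichment(pre_reads, post_reads, pseudocount=1.0):
    """Return {variant: log2(frequency after / frequency before selection)}.

    A pseudocount is added to every variant so that variants missing from
    one of the two pools do not cause a division by zero.
    """
    pre = Counter(pre_reads)
    post = Counter(post_reads)
    variants = set(pre) | set(post)
    n_pre = sum(pre.values()) + pseudocount * len(variants)
    n_post = sum(post.values()) + pseudocount * len(variants)
    scores = {}
    for seq in variants:
        f_pre = (pre[seq] + pseudocount) / n_pre
        f_post = (post[seq] + pseudocount) / n_post
        scores[seq] = math.log2(f_post / f_pre)
    return scores


# Toy reads (made up): two variants, equally abundant before selection.
pre_reads = ["AAA"] * 50 + ["AAC"] * 50
post_reads = ["AAA"] * 90 + ["AAC"] * 10

scores = log2_enrichment(pre_reads, post_reads)
for seq, score in sorted(scores.items()):
    print(seq, round(score, 2))
```

Under this assumed scheme, variants with positive scores were enriched by the selection and variants with negative scores were depleted, which could serve as the positive and negative classes of a training set.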
These data will be used to carry out ML, initially within Python using scikit-learn, with the aim of generating new predictive methods for protein aggregation.
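A minimal sketch of what a first scikit-learn model on such data might look like is given below. It assumes aligned, equal-length sequences, one-hot encoding and logistic regression; none of these choices comes from the project description, and the six-residue peptides and their labels are made up purely so the example runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as a flat vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


# Hypothetical training data: label 1 = aggregation-resistant, 0 = aggregation-prone.
train = [
    ("KDEKDE", 1), ("EKDEKD", 1), ("KEDKEK", 1), ("DKEDKE", 1),
    ("VIVLIV", 0), ("LIVVIL", 0), ("IVLIVV", 0), ("FVILVI", 0),
]
X = np.array([one_hot(seq) for seq, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score two unseen toy sequences; column 1 is the probability of class 1.
queries = ["KDEKEK", "VILVIV"]
proba = clf.predict_proba(np.array([one_hot(s) for s in queries]))[:, 1]
for seq, p in zip(queries, proba):
    print(seq, round(float(p), 2))
```

A linear model on one-hot features is only a baseline; its per-position weights are easy to inspect, which may help when looking for sequence drivers of aggregation, but real scFv data would likely need held-out evaluation and richer models.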
The predictive power of the optimised classifier will be verified by expressing a range of optimised sequences in the full IgG scaffold (the student will do these experiments at AZ) and assessing their properties using methods employed in industry (e.g. accelerated stability assays, SEC and AC-SINS).

Work during placement
It is envisaged that several short visits to AstraZeneca’s Cambridge site in years 1 and 2 will precede a longer visit in year 3. The aim of the visits in years 1 and 2 will be to construct, express, purify and characterise the “wild-type” IgG sequences that will be subjected to directed evolution at Leeds. In year 3, similar work will be undertaken on a larger number of constructs to quantify the prediction accuracy of the developed algorithm. Proteins will be characterised using the panoply of methods used in industry, e.g. SEC, AC-SINS, DSC, IEF and MS. The project will form part of a true collaboration with AZ, and further visits to AZ will be organised as the science dictates while the project develops.

Funding Notes

BBSRC White Rose Mechanistic Biology DTP CASE 4 year studentship.
The studentship covers UK/EU fees and a stipend (c. £15,009) for 4 years, starting in Oct 2020. Applicants should have, or be expecting, at least a 2.1 Hons. degree in a relevant subject. EU candidates require 3 years of UK residency in order to receive a full studentship. English language requirements may apply.
Apply online: https://studentservices.leeds.ac.uk/pls/banprod/bwskalog_uol.P_DispLoginNon
The course is PhD in Biological Sciences, and we require a CV and transcripts.


1. An in vivo platform for identifying inhibitors of protein aggregation. Saunders, J., Young, L., Mahood, R., Jackson, M., Revill, C., Foster, R., Smith, A., Ashcroft, A., Brockwell, D. and Radford, S. (2016) Nat Chem Biol. 12:94-101.