EASTBIO Machine learning driven codon optimisation for heterologous protein expression

School of Biological Sciences

This project is no longer listed and may not be available.

Dr G Stracquadanio
No more applications being accepted
Competition Funded PhD Project (Students Worldwide)

About the Project

BACKGROUND. Expression of proteins requires the transcription of DNA into RNA, followed by its translation into amino acid sequences. Each amino acid is encoded by triplets of nucleotides, called codons, which are nearly universal in nature. However, an amino acid can be encoded by different codons, a phenomenon known as degeneracy of the genetic code, and the choice of one codon over another affects downstream protein abundance. Interestingly, although the synthesis machinery is relatively conserved across species, synonymous codon usage varies across species and even across genes, as a function of a number of factors, including GC content, recombination rates, mRNA stability and codon position [Novoa et al, 2019]. Moreover, it has been shown that once a given codon is used, subsequent codons encoding the same amino acid are not randomly picked but follow complex combinatorial patterns [Cannarozzi et al, 2010].
Despite the wealth of knowledge generated by high-throughput sequencing and proteomics experiments, the rules underpinning codon usage are mostly unknown.
From an industrial biotechnology perspective, this knowledge gap limits our ability to efficiently express heterologous proteins and to optimise properties for end-user applications, such as solubility [Pellizza et al, 2018].
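
As an illustration of the degeneracy described above, the following minimal Python sketch builds the standard genetic code, groups synonymous codons by amino acid, and computes relative synonymous codon usage (RSCU) for a coding sequence. It is not part of the project itself: the function name `rscu` and the toy sequence are hypothetical, and the input is assumed to be an in-frame sequence containing only A, C, G and T.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... so that they line up with AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons: amino acid (or "*" for stop) -> list of codons.
SYNONYMOUS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS[aa].append(codon)


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU = observed codon count divided by the count expected if all
    synonymous codons of that amino acid were used equally often.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {
        codon: n * len(SYNONYMOUS[CODON_TABLE[codon]]) / per_aa[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


# Hypothetical toy sequence: Met, Leu, Leu, Leu, Leu, stop.
toy_cds = "ATGCTGCTGCTACTGTAA"
usage = rscu(toy_cds)
print(len(SYNONYMOUS["L"]))        # leucine has 6 synonymous codons
print(usage["CTG"], usage["CTA"])  # CTG is over-represented relative to CTA
```

An RSCU above 1 means a codon is used more often than expected under uniform synonymous usage. Real analyses would aggregate counts over many genes rather than a single sequence.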

AIMS AND OBJECTIVES. In collaboration with Fujifilm Diosynth Biotechnologies UK (FDBK), we propose to learn codon usage rules by rephrasing protein synthesis as a language modelling problem. We will then use deep learning to capture complex epistatic and evolutionary patterns associated with highly expressed genes and with optimal solubility. Ultimately, these models will be validated in silico and in vivo.

WORKPLAN. The project is structured in 3 work packages.
- WP1 – the student will collect transcriptomic data for E. coli from public repositories and generate a dataset of curated transcripts and associated protein sequences.
- WP2 – the student will develop a neural language model to convert amino acid sequences into DNA sequences, by taking into account evolutionary information and protein function.
- WP3 – experimental validation of models’ effectiveness, by synthesising, building and expressing codon optimised proteins in E. coli and performing downstream comparison against wild-type variants and genes optimised with existing methods.
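
To make the amino-acid-to-DNA task in WP2 concrete, here is a minimal Python sketch of a simple frequency-based baseline: each amino acid is back-translated to the codon seen most often in a set of reference coding sequences. This is an assumption for illustration only, not the neural language model the project proposes; the function names and the two toy reference sequences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def preferred_codons(reference_cds):
    """Map each amino acid to its most frequent codon in the reference set."""
    counts = defaultdict(Counter)
    for cds in reference_cds:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            counts[CODON_TABLE[codon]][codon] += 1
    # Fallback for amino acids absent from the references:
    # the alphabetically first codon that encodes them.
    table = {}
    for codon, aa in sorted(CODON_TABLE.items()):
        table.setdefault(aa, codon)
    for aa, codon_counts in counts.items():
        table[aa] = codon_counts.most_common(1)[0][0]
    return table


def back_translate(protein, table):
    """Convert an amino acid sequence into DNA, one codon per residue."""
    return "".join(table[aa] for aa in protein.upper())


# Hypothetical toy reference set; real input would be curated transcripts.
references = ["ATGCTGAAACTGTAA", "ATGCTGAAGCTATAA"]
table = preferred_codons(references)
dna = back_translate("MKL", table)
print(dna, translate(dna))
```

A baseline like this ignores codon order and sequence context, which is the kind of pattern a learned sequence model would aim to capture.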

TRAINING PROGRAMME. The student will receive training in machine learning, statistical learning and deep learning, and will build a competitive profile in biological sequence modelling and design. The student will also be introduced to the emerging field of synthetic biology and will learn modern DNA cloning and assembly techniques and the use of protein expression systems at scale. We also put a strong emphasis on reproducible research; the student will receive training in advanced research software engineering and in reproducible workflows for data analyses.

Fujifilm Diosynth Biotechnologies UK supervisor - Christopher Lennon

The School of Biological Sciences is committed to Equality & Diversity:

How to Apply:
The “Institution Website” button will take you to our Online Application checklist. Complete each step and download the checklist, which will provide a list of funding options and guide you through the application process.

Funding Notes

This 4-year PhD project is part of a competition funded by the EASTBIO BBSRC Doctoral Training Partnership. This opportunity is open to UK and international students and provides funding to cover the stipend and UK-level tuition fees. The fee difference will be covered by the University of Edinburgh for successful international applicants. Please refer to the UKRI website and Annex B of the UKRI Training Grant Terms and Conditions for full eligibility criteria.


1. Novoa, Eva Maria, et al. "Elucidation of codon usage signatures across the domains of life." Molecular biology and evolution 36.10 (2019): 2328-2339.
2. Cannarozzi, Gina, et al. "A role for codon order in translation dynamics." Cell 141.2 (2010): 355-367.
3. Pellizza, Leonardo, et al. "Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli." Scientific reports 8.1 (2018): 1-12.
FindAPhD. Copyright 2005-2021
All rights reserved.