
Understanding transfer learning through the calculation of chemical reaction barriers


   Centre for Accountable, Responsible and Transparent AI

Dr Pranav Singh  Applications accepted all year round  Competition Funded PhD Project (Students Worldwide)

Bath, United Kingdom | Applied Chemistry, Artificial Intelligence, Chemical Physics, Computational Chemistry, Data Science, Machine Learning, Pharmaceutical Chemistry, Software Engineering

About the Project

Transfer learning (TL) is a promising means of creating high-quality machine learning (ML) models. In TL, a model trained on an initial dataset (the source dataset) is repurposed for another dataset (the target dataset) by making a relatively small number of alterations to the old model (for instance, replacing and retraining a few layers in a deep network). The source dataset may be larger but of lower fidelity, or specific to a certain domain, and the target dataset may be of higher quality than the source dataset or concern a different (but related) domain. Successfully transferring learning from one task or dataset to another can enable broader applicability or a lower cost of creation than single-domain models. However, the mechanisms by which TL improves predictions are not well understood, and TL can be ineffective when the source and target datasets are too disparate.

In recent years, there have been significant developments in ML techniques for problems that traditionally fall in the domain of scientific computation, such as weather prediction, fluid dynamics simulations, protein folding, and predicting properties of molecules and chemical reactions. Training ML models on these problems requires large datasets that are generated using existing computational methods. Typically, a wide range of computational methods is available with different and well-understood accuracies, and arbitrary levels of accuracy can often be achieved at the expense of a larger computational cost. Unlike problems in image classification, for instance, where source and target domains may differ significantly, in these applications it is possible to have source and target datasets that differ in a continuous manner.

The first part of this project will investigate TL when the source and target systems to be learned are the same, but the source dataset is generated by a lower-accuracy method and the target dataset by a higher-accuracy method; this could be referred to as "vertical" TL. By varying the relative and absolute levels of accuracy of the source and target datasets, the project aims to understand the effects of dataset quality on the success and stability of TL.
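The "vertical" setting can be illustrated with a toy sketch in plain Python. Everything in it is an assumption made for illustration and not the project's method: `low_fidelity` and `high_fidelity` are hypothetical stand-ins for a cheap and an expensive computational method, the source model is a simple piecewise-linear interpolant, and the transfer step retrains only a two-parameter output layer on three target points while the source model stays frozen.

```python
import math

def low_fidelity(x):
    # Hypothetical cheap method: right shape, systematically biased values.
    return math.sin(x)

def high_fidelity(x):
    # Hypothetical expensive method: same shape, rescaled and shifted.
    return 1.2 * math.sin(x) + 0.3

def fit_line(us, ys):
    """Closed-form least-squares fit of y ~ slope * u + intercept."""
    n = len(us)
    mean_u, mean_y = sum(us) / n, sum(ys) / n
    var = sum((u - mean_u) ** 2 for u in us)
    cov = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    slope = cov / var
    return slope, mean_y - slope * mean_u

def train_source_model(xs, ys):
    """Piecewise-linear interpolant through the source data (the frozen part)."""
    pts = sorted(zip(xs, ys))
    def model(x):
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
    return model

# Source dataset: many cheap, low-accuracy labels on [0, 3].
source_xs = [i * 3.0 / 49 for i in range(50)]
source_model = train_source_model(source_xs, [low_fidelity(x) for x in source_xs])

# Target dataset: only three expensive, high-accuracy labels.
target_xs = [0.2, 1.5, 2.8]
target_ys = [high_fidelity(x) for x in target_xs]

# Transfer: keep the source model fixed, retrain only an affine output layer.
head_slope, head_intercept = fit_line([source_model(x) for x in target_xs], target_ys)
def transfer_model(x):
    return head_slope * source_model(x) + head_intercept

# Baseline: a straight line trained from scratch on the same three points.
scratch_slope, scratch_intercept = fit_line(target_xs, target_ys)
def scratch_model(x):
    return scratch_slope * x + scratch_intercept

def mean_abs_error(model):
    grid = [i * 0.03 for i in range(101)]
    return sum(abs(model(x) - high_fidelity(x)) for x in grid) / len(grid)

transfer_mae = mean_abs_error(transfer_model)
scratch_mae = mean_abs_error(scratch_model)
print("transfer:", transfer_mae, "from scratch:", scratch_mae)
```

In this contrived case the two methods differ only by an affine correction, so retraining the output layer is enough. Making the source method progressively less accurate, for example by distorting its shape, is one way to probe where such a transfer stops helping.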

The second part of the project will consider applications of TL to transfer between different but closely related systems that are governed by the same laws of physics. The datasets for these will be computed using the same computational method and at the same level of accuracy, and thus this can be termed "horizontal" TL. Further experiments could involve transferring between both different levels of dataset accuracy and different systems governed by a common law of physics ("diagonal" TL).
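The three settings can be summarised as a small classification over (system, method) pairs. The sketch below is only an illustration: the `Dataset` class, its field names, and the example labels are hypothetical, and the function checks only whether the system and the method match. It does not verify that the source is the less accurate dataset or that two systems are closely related.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    system: str           # what is being modelled (hypothetical label)
    level_of_theory: str  # computational method used to generate the labels

def classify_transfer(source: Dataset, target: Dataset) -> str:
    """Name the TL setting implied by a source/target pair."""
    same_system = source.system == target.system
    same_level = source.level_of_theory == target.level_of_theory
    if same_system and same_level:
        return "none"        # identical setting, nothing to transfer across
    if same_system:
        return "vertical"    # same system, different accuracy
    if same_level:
        return "horizontal"  # different system, same method and accuracy
    return "diagonal"        # both the system and the accuracy differ

# Example with made-up labels.
source = Dataset("reaction_family_A", "cheap_method")
target = Dataset("reaction_family_A", "accurate_method")
print(classify_transfer(source, target))
```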

These experiments will not only create successful TL models but, crucially, will also find the limits of TL in the context of scientific computing, creating destabilised models along the way and establishing guidelines for the selection and creation of suitable source and target datasets for TL. These considerations may also lead to better training strategies that enable faster and/or more robust learning in the domain of scientific computation. The project will also explore architectural designs for TL in these applications that reflect the multiscale nature of the datasets and the shared physical laws, allowing a more straightforward interpretation of the contributions of various components of the network at different scales.

As a concrete application, the project will focus on the prediction of the properties of chemical reactions such as the activation energy. The field of computational chemistry provides a wide range of methods and approximations for computing molecular properties, usually referred to as the different "levels of theory". There are also countless chemical reactions of interest, of varying degrees of structural similarity. Overall, predicting activation energies presents a vast supply of source and target combinations, allowing for an extensive exploration and interpretation of TL as the source and target datasets vary in terms of application domain and accuracy.

This project is associated with the UKRI Centre for Doctoral Training (CDT) in Accountable, Responsible and Transparent AI (ART-AI).

Applicants should hold, or expect to receive, a first or upper-second class honours degree in a relevant subject. Applicants should have taken a mathematics unit or a quantitative methods course at university or have at least grade B in A level maths or international equivalent. Experience with coding (any language) is desirable.

Formal applications should be accompanied by a research proposal and made via the University of Bath’s online application form. Enquiries about the application process should be sent to .

Start date: 3 October 2022.


Funding Notes

ART-AI CDT studentships are available on a competition basis and applicants are advised to apply early as offers are made from January onwards. Funding will cover tuition fees and maintenance at the UKRI doctoral stipend rate (£15,609 per annum in 2021/22, increased annually in line with the GDP deflator) for up to 4 years.
We also welcome applications from candidates who can source their own funding.
