
Data Engineering in Automatic Machine Learning

   Department of Computing Science

This project is no longer listed and may not be available.

Dr Mingjun Zhong, Dr Y Sripada
Applications accepted all year round
Self-Funded PhD Students Only

About the Project

Artificial intelligence (AI) based on machine learning (ML) is widely used in industries such as finance, engineering, health care, energy and chemistry. However, building and deploying the ML models embedded in AI systems requires humans to prepare data in a form that the models can accept. It is estimated that data scientists may spend 80% of their time on data preparation, a process more formally called data engineering (DE) [1]. Yet more work in the field of ML is devoted to model fitting than to data engineering, despite compelling evidence that ML projects that overlook DE fail to meet their goals.

Automatic machine learning (AML) is the main research topic of this PhD project, which will focus on automating the development of AI systems end to end. Specifically, we want to develop an AML pipeline in which the AI system reads data directly, prepares it for modelling, builds models for user tasks and produces the task results, all without significant human intervention. This means that the AML system will complete the DE task automatically.

We will develop DE methods to enable AML. We will work with various existing big data sets, including the recent household energy consumption data [2] as one of our case studies. Specifically, we will develop machine learning methods to 1) automatically recognise the data schema by labelling the data features and names, 2) identify and unify feature names, 3) recognise and correct data quality issues and 4) perform feature engineering to produce the data structures required by ML. Our aim is an AML model that can read data and produce analysis results without significant human intervention.
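As an illustration only, the first three steps listed above could be sketched in Python as follows. This is a hand-written, rule-based toy and not the project's method, which aims to learn these steps with ML. The synonym table, the set of missing-value markers, the median fill and all function names are assumptions made for the example; feature engineering (step 4) is left out.

```python
import re
from datetime import datetime
from statistics import median

# Hypothetical synonym table; a real system would learn this mapping.
SYNONYMS = {"elec": "electricity", "temp": "temperature", "ts": "timestamp"}
# Assumed markers for missing values in the raw data.
MISSING = {"", "na", "n/a", "null", "?"}


def unify_feature_name(name):
    """Step 2: normalise a raw column name to snake_case and expand synonyms."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)  # split camelCase
    tokens = re.findall(r"[A-Za-z]+|\d+", spaced)
    return "_".join(SYNONYMS.get(t.lower(), t.lower()) for t in tokens)


def _parses(parser, text):
    """Return True if parser(text) succeeds without raising ValueError."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Step 1: guess a column type from its non-missing string values."""
    present = [v for v in values if v.strip().lower() not in MISSING]
    if present and all(_parses(float, v) for v in present):
        return "numeric"
    if present and all(_parses(datetime.fromisoformat, v) for v in present):
        return "datetime"
    return "text"


def clean_numeric(values):
    """Step 3: parse numbers and fill missing entries with the column median."""
    parsed = [None if v.strip().lower() in MISSING else float(v) for v in values]
    fill = median(x for x in parsed if x is not None)
    return [fill if x is None else x for x in parsed]


def prepare(table):
    """Run steps 1-3 on a dict mapping raw column names to lists of strings."""
    out = {}
    for raw_name, values in table.items():
        kind = infer_type(values)
        cleaned = clean_numeric(values) if kind == "numeric" else values
        out[unify_feature_name(raw_name)] = (kind, cleaned)
    return out


raw = {
    "ElecUsage": ["1.5", "NA", "2.5"],
    "ts": ["2021-01-01", "2021-01-02", "2021-01-03"],
}
result = prepare(raw)
```

Here `result` maps the unified names `electricity_usage` and `timestamp` to their inferred types and cleaned values. Fixed rules like these break down quickly on real data, which is the gap that learned DE methods are meant to close.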

Selection will be made on the basis of academic merit. The successful candidate should have, or expect to obtain, a UK Honours degree at 2.1 or above (or equivalent) in computer science or related subjects.

Formal applications can be completed online:

• Apply for Degree of Doctor of Philosophy in Computing Science

• State name of the lead supervisor as the Name of Proposed Supervisor

• State ‘Self-funded’ as Intended Source of Funding

• State the exact project title on the application form

When applying please ensure all required documents are attached:

• All degree certificates and transcripts (Undergraduate AND Postgraduate MSc, officially translated into English where necessary)

• Detailed CV, Personal Statement/Motivation Letter and Intended source of funding

Informal enquiries can be made to Dr M Zhong ([Email Address Removed]), accompanied by a copy of your curriculum vitae and cover letter. All general enquiries should be directed to the Postgraduate Research School ([Email Address Removed]).

Funding Notes

This PhD project has no funding attached and is therefore available to students (UK/International) who are able to seek their own funding or sponsorship. Supervisors will not be able to respond to requests to source funding. Details of the cost of study can be found by visiting


[1] Nazabal, A., Williams, C.K., Colavizza, G., Smith, C.R. and Williams, A., 2020. Data engineering for data analytics: a classification of the issues, and case studies. arXiv preprint arXiv:2004.12929.
[2] Pullinger, M., Kilgour, J., Goddard, N., Berliner, N., Webb, L., Dzikovska, M., Lovell, H., Mann, J., Sutton, C., Webb, J. and Zhong, M., 2021. The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes. Scientific Data, 8(1), pp.1-18.