
  Bayesian Learning for Sparse High-Dimensional Data


   School of Electrical Engineering, Electronics and Computer Science

Supervisor: Prof S Maskell · No more applications being accepted · Funded PhD Project (UK Students Only)

About the Project

This PhD project is part of the University of Liverpool's Centre for Doctoral Training in Distributed Algorithms (CDT): The What, How and Where of Next-Generation Data Science.

The CDT works in partnership with the STFC Hartree Centre and 20+ external partners from the manufacturing, defence and security sectors. Together they provide an innovative 4-year PhD training programme that will equip over 60 students with the essential skills to become future leaders in distributed algorithms, the technical and professional networks to launch a career in next-generation data science and future computing, and the confidence to make a positive difference in society, the economy and beyond.

The successful PhD student will be co-supervised by Professor Simon Maskell and work alongside our external partner QinetiQ.

This project is focussed on understanding uncertainty in machine learning models trained on limited datasets. There are many problems where the number of data points is small relative to the number of features. Typical solutions assume independence of features or use dimensionality reduction to learn a maximum likelihood projection of the data. For small data sets, learnt models are critically dependent on the actual data points used. The project will investigate whether Bayesian methods can be used to characterise the uncertainty of estimated parameters efficiently when developing machine learning models for sensor signal time series.
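To make the contrast concrete, the following sketch (illustrative only, not the project's method) uses conjugate Bayesian linear regression on a tiny synthetic dataset where the number of data points is barely larger than the number of features. Unlike a maximum-likelihood fit, the Gaussian posterior over the weights directly quantifies how poorly the small dataset pins the parameters down; the dataset, prior precision `alpha` and noise precision `beta` are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative small dataset: 10 points, 5 features (n small relative to d).
n, d = 10, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Conjugate Bayesian linear regression: Gaussian prior w ~ N(0, alpha^-1 I),
# Gaussian noise with precision beta. The posterior over w is Gaussian, so
# parameter uncertainty comes for free instead of a single point estimate.
alpha, beta = 1.0, 1.0 / 0.25
S_inv = alpha * np.eye(d) + beta * X.T @ X   # posterior precision
S = np.linalg.inv(S_inv)                     # posterior covariance
m = beta * S @ X.T @ y                       # posterior mean

# Predictive variance at a new input combines observation noise with
# the remaining uncertainty in the weights.
x_new = rng.normal(size=d)
pred_mean = x_new @ m
pred_var = 1.0 / beta + x_new @ S @ x_new
print(pred_mean, pred_var)
```

With so few points, the second term of `pred_var` is non-negligible: ignoring parameter uncertainty would make the model look more confident than the data justify.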

QinetiQ is a defence and security company with specialist expertise in Artificial Intelligence (AI), Analytics, and Advanced Computing, and works with an array of industry and academic partners to further the research, develop leading-edge solutions and services, and provide advice to customers in these fields.

QinetiQ’s Data Science experts work on a range of projects using a variety of Machine Learning techniques to discover patterns in, analyse, classify, and verify data. Statistical methods are widely used by our mathematicians, physicists, computer scientists, and engineers, often in conjunction with the Human Performance team to help customers solve a wide range of engineering problems. QinetiQ leads in Explainable AI for safety-critical and National Security domains, specifically providing “diagnostic” information to a user on why a particular decision has been taken. This is essential to building trust for Human-Machine Teams, particularly where decisions need to be justified. Adversarial AI is a vital area for defence and security customers, where enemies increasingly try to subvert or “game” AI systems. QinetiQ has several projects ongoing that use these techniques.

Much recent progress in machine learning has relied on the availability of large datasets, which allows the development of complex models. However, many problems in defence and security do not have access to such data, either because they require use of less widely studied sensors (such as sonar) or they relate to adversaries, who strive to limit data about their activities. Most published models rely on point estimates of parameters, achieved through algorithms such as maximum likelihood or stochastic gradient descent. However, when this type of model is applied in situations with limited data, the uncertainty associated with parameter estimates is usually not taken into account, either when integrating machine learning models into wider systems, or when assessing performance to predict how the model might behave in operational scenarios. Even when other approaches to deal with limited datasets are used, such as transfer learning, uncertainty characterisation is still important as there is often a mismatch between the distribution of the pre-training and training datasets.
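The gap between a point estimate and a full posterior is easiest to see in a toy setting. The sketch below (a hypothetical example, unrelated to any sponsor data) estimates a detection probability from five binary observations: the maximum-likelihood estimate is a single number, whereas a Beta posterior also reports how little those five points constrain it.

```python
import numpy as np
from math import sqrt

# Toy example: estimating a detection probability from 5 binary outcomes.
data = np.array([1, 0, 1, 1, 0])
k, n = int(data.sum()), len(data)

# Point estimate (maximum likelihood): a single number, no uncertainty.
p_mle = k / n  # 0.6

# Bayesian treatment: Beta(1, 1) prior -> Beta(1 + k, 1 + n - k) posterior.
a, b = 1 + k, 1 + (n - k)
post_mean = a / (a + b)
post_std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print(p_mle, post_mean, post_std)
```

Here `post_std` is roughly 0.17: any downstream system consuming `p_mle` alone would be blind to that spread, which is exactly the kind of information an operational assessment needs.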

This project aims to investigate to what extent Bayesian methods can be used to characterise the uncertainty of estimated parameters when dealing with sparse but potentially high-dimensional data sets, and how this can be implemented in a distributed computing setting.

The expected outcome of the project is the development of suitable Bayesian algorithms, along with a software implementation, and an analysis of algorithm performance on relevant datasets.

The research will start with a literature review into appropriate approaches, which could include Variational Bayesian methods, Markov Chain Monte Carlo (MCMC), Sequential Monte Carlo (SMC), Approximate Bayesian Computation (ABC), and other approximate methods. Consideration will be given to the computational feasibility of the algorithms, including the extent to which computing can be distributed to multiple processors or virtual machines in a cloud infrastructure, and the transparency (confidence) and performance improvements the various approaches could provide. Suitable innovative techniques will be developed, assessed, and compared against baseline approaches. The algorithms will be applied to a number of sponsor-supplied datasets, such as sonar sensor or electrical device measurement time series. The research will determine the extent to which the uncertainty representation accommodates operational data that may not have the same distribution as the training data. Based on discussions with the sponsor and an analysis of the results, industrially relevant scenarios where the algorithms can be used will be identified.
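Of the families listed above, MCMC is perhaps the simplest to sketch. The following random-walk Metropolis sampler (a minimal illustration with an assumed one-dimensional Gaussian model and synthetic data, not one of the project's algorithms) draws samples from a posterior instead of optimising to a single point, so the sample spread estimates parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta, data, sigma=1.0):
    # Standard normal prior on theta plus a Gaussian likelihood
    # with known noise standard deviation sigma.
    return -0.5 * theta**2 - 0.5 * np.sum((data - theta) ** 2) / sigma**2

data = rng.normal(loc=2.0, scale=1.0, size=8)  # small synthetic dataset

# Random-walk Metropolis: propose a perturbed theta, accept with
# probability min(1, posterior ratio).
theta, lp = 0.0, log_post(0.0, data)
samples = []
for _ in range(5000):
    prop = theta + 0.5 * rng.normal()        # random-walk proposal
    lp_prop = log_post(prop, data)
    if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
        theta, lp = prop, lp_prop
    samples.append(theta)

burned = np.array(samples[1000:])             # discard burn-in
print(burned.mean(), burned.std())            # posterior mean and spread
```

In this conjugate case the exact posterior is known (mean `sum(data)/9`, variance `1/9`), which makes the sampler easy to sanity-check; more realistic models from the project would need many likelihood evaluations, which is where the distributed-computing questions above become relevant.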

This project is due to commence on 1 October 2023.

Students will be based at the University of Liverpool and will be part of the CDT and Signal Processing research community - a large, social and creative research group that works together solving tough research problems. Students are supervised by two academic supervisors and an industrial partner who provides project direction, placements and the opportunity to work on real-world challenges. In addition, students attend technical and professional training to gain the skills needed to work at the interface of academia and industry.

The CDT is committed to providing an inclusive environment in which diverse students can thrive. The CDT particularly encourages applications from women, disabled and Black, Asian and Minority Ethnic candidates, who are currently under-represented in the sector. We can also consider part-time PhD students. We also encourage applications from talented individuals from a range of backgrounds with a UG or MSc degree in a numerate subject, and from people with ambition and an interest in making a difference.

The studentship is open to: UK nationals capable of gaining SC clearance.

Contact the supervisors (named above) in the first instance or visit the CDT website for Director, Student Ambassador and Centre Manager details.

Application Web Address:

Visit the CDT website for application instructions, FAQs, interview timelines and guidance.



Funding Notes

This project is a funded studentship for 4 years in total and will provide UK tuition fees and maintenance at the UKRI Doctoral Stipend rate (£18,622 per annum, 2023/24 rate).
You must enter the following information:
Admission Term: 2023-24
Application Type: Research Degree (MPhil/PhD/MD) – Full time
Programme of Study: Electrical Engineering and Electronics – Doctor in Philosophy (PhD)
The remainder of the guidance is found in the CDT application instructions on our website: https://www.liverpool.ac.uk/distributed-algorithms-cdt/apply/
