Recent advances in machine learning have achieved human-level, and in some cases superhuman, performance in a wide range of applications. This success relies on the availability of large labelled training datasets. However, in many applications it is hard or impossible to collect clean labelled data, i.e. data in which all labels are correct. In such cases, data points with incorrect labels (outliers) must be filtered out before any machine learning algorithm can be applied. In other applications, finding outliers is the goal in itself. Examples include fraud detection, cyber-attack detection and security surveillance for dangerous situations. Outlier detection is also known as anomaly, novelty, or out-of-distribution detection.
Existing methods make different assumptions about label availability. The supervised setting assumes that labels for both inliers and outliers are available at training time, and methods are based on binary classification. The semi-supervised setting assumes that the training data contain no outliers, and at test time methods flag anything that deviates from the training data as an outlier. In the unsupervised setting, no labels are available, and the data are a mixture of inliers and outliers at both training and test time. Unsupervised outlier detection is the most challenging of the three settings.
This PhD project will investigate unsupervised outlier detection for high-dimensional data such as images, which lies beyond the reach of classic methods for this problem. Very few existing methods tackle outlier detection in this formulation [1], yet it is the most general one and the one most commonly required in practice, owing to the lack of labelled data. The project will therefore require the development of novel machine learning methods, e.g. based on deep learning, that can process high-dimensional data. By tackling a currently open problem, these methods are expected to have significant impact.
In addition to delivering impact through a novel general methodology, the project will have direct social impact through a collaboration with UNICEF. Project partners will provide data from real-world applications that require unsupervised outlier detection, and UNICEF will deploy the outcomes of the project.
This research project will be carried out as part of the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI (ART-AI). Detecting outliers in the data is essential for building trustworthy and reliable machine learning algorithms for any downstream task: an algorithm can only be as good as its training data, and to trust the algorithm we must first trust the data it is trained on. Further details about ART-AI can be found at: https://cdt-art-ai.ac.uk/.
Informal enquiries about the project should be directed to Dr Isupova ([Email Address Removed]).
Candidates are expected to have, or be near completion of, an MSc or MEng in Computer Science, Mathematics, Statistics or a related area. A strong mathematical background and programming experience are desirable.
Enquiries about the application process should be sent to [Email Address Removed].
Formal applications should be made via the University of Bath’s online application form.
Start date: 4 October 2021.