Sparsity and structures in large-scale machine learning problems.

Project Description

The term “big data” generally refers to datasets that are too large or complex for traditional data-processing techniques to handle adequately. Applying modern machine learning techniques to such datasets therefore raises many theoretical and numerical challenges. The numerical complexity of processing a dataset generally grows polynomially with its size, which in practice compromises the analysis of very large datasets. In addition, the treatment of complex datasets often results in models involving a large number of parameters, which makes them difficult to train, increases the risk of overfitting and limits their interpretability. Since such large-scale datasets are increasingly common, their efficient processing is of great importance, not only at a purely scientific level but also for many industrial and real-life applications.

Alongside high-performance computing solutions (e.g., parallelisation, computation on graphics processing units), many alternatives exist to try to overcome the difficulties inherent to the learning-with-big-data framework. For instance, problems related to the size of the datasets might be addressed through sample-size and dimension reduction techniques, while feature extraction, low-dimensional approximation or sparsity-inducing penalisation techniques might be used to prevent the model complexity from exploding. Such operations must however be applied with great care, since they might have a significant impact on the quality of the final model and their effects are often intrinsically connected. To make matters worse, the existing theory surrounding such approximation techniques is generally quite modest.

The main objective of this project is to investigate the design of efficient approaches to scale up and improve state-of-the-art machine learning techniques, while providing theoretical guarantees on their behaviour. Special emphasis will be placed on sample-size reduction and feature extraction procedures based on the notion of kernel discrepancy (also referred to as maximum mean discrepancy, or MMD). Thanks to its ability to characterise representative samples, this notion has recently emerged as a powerful concept in machine learning, statistics and approximation theory (cf. reference 1 in section 4.2); combined with auto-encoder techniques, it is for instance at the core of recent developments in Generative Adversarial Networks (the MMD-GAN method). Investigating to what extent this type of approach can be generalised is one of the main motivations behind this project.
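
As an illustration of the kernel discrepancy mentioned above, the following minimal sketch computes the standard biased estimate of the squared maximum mean discrepancy between two samples. The Gaussian kernel, the bandwidth value and the function names are assumptions made for this example; they are not taken from the project itself.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats.

    The kernel choice and the default bandwidth are illustrative assumptions.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared kernel discrepancy.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs of points, including identical ones.
    """
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical one-dimensional data: a full sample and two candidate subsamples.
full_sample = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
spread_subsample = [(0.0,), (2.0,), (4.0,)]     # covers the whole range
clustered_subsample = [(0.0,), (0.5,), (1.0,)]  # concentrated at one end

spread_score = mmd_squared(full_sample, spread_subsample)
clustered_score = mmd_squared(full_sample, clustered_subsample)
```

A smaller value indicates that the subsample represents the full sample more faithfully with respect to the chosen kernel, which is the sense in which this quantity can guide sample-size reduction. The double loop costs a number of kernel evaluations proportional to the product of the two sample sizes, so this sketch is meant for small examples only.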

Funding Notes

In the funding section, please confirm that you have sourced your own funding by selecting 'YES' to self-funding.


Applicants should submit an application for postgraduate study via the online application service.

In the research proposal section of your application, please specify the project title and supervisors of this project and copy the project description in the text box provided.

How good is research at Cardiff University in Mathematical Sciences?

FTE Category A staff submitted: 24.05

Research output data provided by the Research Excellence Framework (REF)

FindAPhD. Copyright 2005-2019
All rights reserved.