Sparsity and structures in large-scale machine learning problems.

Cardiff School of Mathematics

About the Project

The term “big data” generally refers to datasets that are too large or complex for traditional data-processing techniques to handle adequately. Applying modern machine learning techniques to such datasets therefore raises many theoretical and numerical challenges. The numerical cost of processing a dataset generally grows polynomially with its size, which in practice compromises the analysis of very large datasets. In addition, the treatment of complex datasets often results in models involving a large number of parameters, making such models difficult to train while increasing the risk of overfitting and limiting their interpretability. Since such large-scale datasets are increasingly common, their efficient processing is of great importance, not only at a purely scientific level, but also for many industrial and real-life applications.

Alongside high-performance computing solutions (e.g., parallelisation, computation on graphics processing units), many alternatives exist to try to overcome the difficulties inherent to the learning-with-big-data framework. For instance, problems related to the size of the datasets might be addressed through sample-size and dimension reduction techniques, while feature extraction, low-dimensional approximation or sparsity-inducing penalisation techniques might be used to prevent the model complexity from exploding. Such operations must however be applied with great care, since they can have a significant impact on the quality of the final model, and their effects are often intrinsically connected. To make matters worse, the existing theory surrounding such approximation techniques is generally quite limited.

The main objective of this project is to investigate the design of efficient approaches to scale up and improve state-of-the-art machine learning techniques, while providing theoretical guarantees on their behaviour. Special emphasis will be placed on sample-size reduction and feature extraction procedures based on the notion of kernel discrepancy (also referred to as maximum mean discrepancy). Thanks to its ability to characterise representative samples, this notion has recently emerged as a powerful concept in machine learning, statistics and approximation theory (cf. reference 1. in section 4.2); combined with auto-encoder techniques, it is for instance at the core of recent developments in Generative Adversarial Networks (the MMD-GAN method). Investigating to what extent this type of approach can be generalised is one of the main motivations behind this project.
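
As a purely illustrative sketch (not the method the project will develop), the squared maximum mean discrepancy between two samples can be estimated from kernel evaluations alone. The example below assumes a Gaussian kernel and uses the biased (V-statistic) estimator; the function names `gaussian_kernel` and `mmd_squared` and the `bandwidth` parameter are hypothetical choices made for this example. It compares a large sample with a small random subsample, which is the kind of comparison a sample-size reduction procedure would aim to make small.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y.

    Assumption: a Gaussian kernel with a fixed bandwidth; other kernels
    could be substituted without changing mmd_squared below.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 <a, b>.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Illustrative usage on synthetic data (assumed, not from the project):
# compare a full sample with a uniformly drawn subsample of 50 points.
rng = np.random.default_rng(0)
full_sample = rng.normal(size=(1000, 2))
subsample = full_sample[rng.choice(1000, size=50, replace=False)]
print("squared MMD, full vs subsample:", mmd_squared(full_sample, subsample))
```

A smaller value indicates that the subsample represents the full sample better with respect to the chosen kernel. Forming the full kernel matrices costs quadratic time and memory in the sample size, which is one instance of the polynomial growth in numerical cost discussed above.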


Applicants should submit an application for postgraduate study via the Cardiff University online application service.

In the research proposal section of your application, please specify the project title and supervisors of this project.

Funding Notes

We are interested in pursuing this project and welcome applications if you are self-funded or have funding from other sources, including government sponsorships or your employer.

In the funding section, please select the ‘self-funding’ option and specify the project title.
