A Fully-Parallel Alternative to MCMC (industrial CASE award with IBM Research)

This project is no longer listed in the FindAPhD
database and may not be available.

  • Full or part time
  • Supervisors: Prof S Maskell, Dr J Thiyagalingam
  • Application Deadline: No more applications being accepted
  • Funded PhD Project (European/UK Students Only)

Project Description

The aim of this PhD is to develop methods for implementing state-of-the-art Bayesian techniques in ways that fully exploit the computational power of modern and next-generation many-core architectures and systems (such as multicore CPUs, GPUs, Xeon Phis and super-computing clusters). The project will link closely to a large current research project (“Big Hypotheses”, see here: http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/R018537/1) and draw on previous work related to high performance computing, Big Data and Bayesian statistics. The aim is to solve difficult problems in applications relevant to the IBM Research lab at Daresbury.

Markov chain Monte Carlo (MCMC) is a numerical Bayesian method that allows high-fidelity physical models to be combined with data to make inferences in the presence of pronounced uncertainty. Improvements to MCMC have historically focused on algorithmic advances, for example using local gradient information or gradually migrating from an easy reference problem to the problem of interest. Particularly with these improvements, MCMC is an effective solution to the vast number of problems that can be posed as inferences involving data using statistical models. In the context of any one problem, bespoke optimisation can be used to exploit the available (parallel) computational resources. However, because MCMC fundamentally uses the evolution of a single Markov chain to convey uncertainty, such optimisation is necessarily problem-specific. There is therefore little scope to develop a generic MCMC implementation that fully exploits parallel processing architectures. As a result, the ability of MCMC to provide solutions to next-generation problems is limited.
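The sequential dependence described above can be illustrated with a minimal sketch of random-walk Metropolis, one of the simplest MCMC algorithms. It is not taken from the project itself: the function names, the standard normal target and the step size are all assumptions chosen for illustration. The point to notice is that each iteration of the loop needs the state produced by the previous one, so the loop over iterations cannot simply be split across cores.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of a standard normal (illustrative target only)."""
    return -0.5 * x * x


def random_walk_metropolis(n_steps, step_size=1.0, x0=0.0, seed=0):
    """Run a single random-walk Metropolis chain and return every visited state."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        # The proposal is centred on the current state, so step t depends on step t-1.
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_accept)).
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            x = proposal
        chain.append(x)
    return chain


if __name__ == "__main__":
    samples = random_walk_metropolis(5000)
    print(sum(samples) / len(samples))
```

In practice the expensive part is usually the evaluation of the target density, and that is where problem-specific parallelism tends to be applied; the chain itself remains a serial loop.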

Sequential Monte Carlo (SMC) samplers can solve the same problems as MCMC. In contrast to MCMC, SMC samplers use the diversity of a population of samples to convey uncertainty. For most of the operation of an SMC sampler, each sample is processed independently, which makes most of the sampler trivial to parallelise. However, at a specific point in the SMC sampler, it becomes necessary to perform a “resampling” step. A textbook implementation of this resampling step is impossible to parallelise in a scalable fashion. However, previous research has rearticulated the resampling operation as a divide-and-conquer algorithm, which makes it possible to parallelise the resampling step. More recent work has identified that, by carefully considering data locality and pipelining and by making appropriate use of middleware (e.g., MPI and OpenMP), it is possible to implement the resampling algorithm in such a way that using more cores results in faster operation.
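To make the contrast concrete, the following is a minimal sketch of one reweight-and-resample step on a population of samples, using textbook systematic resampling. It is an illustration under stated assumptions, not the divide-and-conquer formulation referred to above: the function names, the N(0, 2^2) proposal and the N(1, 1) target are all hypothetical. The weight computation touches each sample independently, whereas the resampling step relies on a cumulative sum over all the weights, which in this serial form couples every sample to every other.

```python
import math
import random
from itertools import accumulate


def log_weight(x):
    """Log importance weight of target N(1, 1) over proposal N(0, 2^2), up to a constant."""
    return -0.5 * (x - 1.0) ** 2 + 0.5 * (x / 2.0) ** 2


def normalise(log_weights):
    """Exponentiate and normalise log-weights, subtracting the maximum for stability."""
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def systematic_resample(weights, rng):
    """Return the indices of the samples that survive systematic resampling."""
    n = len(weights)
    # The cumulative sum is the step that couples all samples together.
    cdf = list(accumulate(weights))
    u0 = rng.random() / n
    indices = []
    j = 0
    for i in range(n):
        u = u0 + i / n
        # Advance to the first sample whose cumulative weight reaches u.
        while j < n - 1 and cdf[j] < u:
            j += 1
        indices.append(j)
    return indices


def reweight_and_resample(n_particles=2000, seed=0):
    """Draw samples from the proposal, weight them, and resample to equal weights."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    # Each weight depends only on its own sample, so this part parallelises trivially.
    weights = normalise([log_weight(x) for x in particles])
    indices = systematic_resample(weights, rng)
    return [particles[i] for i in indices]


if __name__ == "__main__":
    resampled = reweight_and_resample()
    print(sum(resampled) / len(resampled))
```

A full SMC sampler would repeat steps like this across a sequence of intermediate distributions and move the samples between resampling steps; those parts are omitted here to keep the sketch short.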

There is a need for this research to be developed to provide implementations of an SMC sampler that fully exploit multicore CPUs, GPUs, Xeon Phis and super-computing clusters. The aim is to have implementations which can dramatically outperform MCMC and to use these implementations to solve pertinent problems.

The PhD will therefore comprise three main strands of research: (i) understanding the problem and solution space (i.e., applying off-the-shelf MCMC algorithms to problems relevant to IBM Research); (ii) enhancing existing implementations of SMC samplers that fully exploit some exemplar many-core architectures (as being developed by “Big Hypotheses”); and (iii) using these implementations (embodied in frameworks such as Stan) to solve some exemplar problems that are directly relevant to IBM Research.

The PhD includes components that draw on Computer Science, Statistics, and Engineering and sits at the intersection of these three academic disciplines. The successful applicant will have experience in one of these domains and will gain experience in the others. It is also anticipated that the successful applicant will gain experience with a number of HPC technologies (e.g., MPI, CUDA and OpenMP) and that the project will enable the student to enhance valuable programming skills (e.g., in Python, Java and C++).

Funding Notes

The PhD will be funded for 4 years by an industrial CASE award and includes a top-up of £4500 per year over and above fees and the stipend associated with a standard EPSRC-funded PhD. To be eligible, applicants must have British or other EU nationality. Extensive collaboration with IBM Research is expected.
