This project focuses on using machine learning to emulate computationally expensive calculations. The emulator can then be used to answer pertinent questions that would otherwise be impossible to address. The aim is to mirror successes achieved with similar approaches in, for example, chemical formulation (where a two-thirds reduction in the compute required has been reported) and drug discovery (where a 95% reduction in the compute required for a certain objective has been reported). This project has been co-defined with, and will be co-supervised by, Dstl.
The specific motivation relates to hydrocodes: high-fidelity (and highly optimised) simulations of fluid dynamics that involve computationally expensive calculations pertaining to the underlying chemistry and physics. Individual simulations can take days, even on supercomputers. Were it possible to use historic simulations to learn to emulate the calculations involved, the emulator could then be used to perform offline sensitivity analyses with respect to, for example, the parameters of the chemistry and physics. Such sensitivity analyses are, at best, limited today, making it very challenging to identify opportunities to, for example, reduce the number of inputs to the hydrocode. Since the parameters are not known precisely, it is also desirable for any online use of the hydrocode to account for the uncertainty in those parameters when calculating any prediction. However, such Uncertainty Quantification (UQ) would demand multiple runs of the hydrocode. Given that even a single simulation is infeasible in an online setting, an emulator is a vital component of any online UQ.
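To make the UQ argument concrete, the sketch below propagates parameter uncertainty through a cheap surrogate by Monte Carlo sampling. The `emulator` function and the Gaussian priors on the parameters are purely illustrative stand-ins (nothing here reflects an actual hydrocode); the point is that thousands of surrogate evaluations replace thousands of day-long simulation runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def emulator(theta):
    """Hypothetical cheap surrogate standing in for one hydrocode run.

    theta: array of uncertain physics/chemistry parameters.
    Returns a scalar quantity of interest.
    """
    return np.sin(theta[0]) + 0.5 * theta[1] ** 2

# Uncertain parameters: Gaussian priors assumed purely for illustration.
theta_samples = rng.normal(loc=[1.0, 0.3], scale=[0.1, 0.05],
                           size=(10_000, 2))

# Propagate the parameter uncertainty through the emulator; each call
# costs microseconds rather than days of supercomputer time.
predictions = np.array([emulator(t) for t in theta_samples])

print(f"mean = {predictions.mean():.3f}, std = {predictions.std():.3f}")
```

The resulting spread of `predictions` is exactly the kind of uncertainty estimate that would be unobtainable if each sample required a full simulation.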
While one could use, for example, a Gaussian Process (GP) to implement the emulator, it is not clear which kernel should be used, i.e. how the emulator should interpolate between the input-output pairs associated with the hydrocode. The statistical inference of the kernel is challenging, particularly in this setting, where there is a need to combine the information in the historic simulations with the prior knowledge (albeit incomplete and imprecise) of kernels implied by an understanding of the physics and chemistry. Emerging numerical Bayesian inference algorithms (specifically Sequential Monte Carlo samplers) make it possible to capitalise on high-performance computing without compromising the fidelity of the inference process.
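The role of the kernel can be sketched with a minimal GP regression in plain NumPy. The squared-exponential kernel, its lengthscale, and the toy training data are all assumptions for illustration; a different kernel choice would encode different smoothness beliefs and yield different interpolants between the same input-output pairs, which is precisely why kernel inference matters here.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel: one of many possible choices, each
    encoding different assumptions about the function's smoothness."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Toy stand-ins for historic simulation input-output pairs.
x_train = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_train = np.sin(x_train)

# GP posterior mean at new inputs, with a small jitter for stability;
# the kernel fully determines how the GP interpolates between the data.
x_test = np.linspace(0.0, 2.0, 21)
K = rbf_kernel(x_train, x_train) + 1e-6 * np.eye(len(x_train))
K_star = rbf_kernel(x_test, x_train)
mean = K_star @ np.linalg.solve(K, y_train)
```

Inferring the kernel (its form and hyperparameters such as `lengthscale`) from data, while respecting physics-informed priors, is where the Sequential Monte Carlo machinery mentioned above comes in.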
The aim of this PhD is to take a specific hydrocode and examine how these approaches can be used to expedite analysis. The broader goal is to develop a single integrated approach to analysing and speeding up UQ on complex systems, underpinned by a synergistic understanding of computer science and statistics. The anticipation is that this integrated approach would be sufficiently generic and transferable to be readily applied to other, similar problems.
This project is part of the EPSRC-funded CDT in Distributed Algorithms: The What, How and Where of Next-Generation Data Science. https://www.liverpool.ac.uk/research/research-themes/digital/cdt-distributed-algorithms/
The University of Liverpool is working in partnership with the STFC Hartree Centre and other industrial partners from the manufacturing, defence and security sectors to provide an innovative four-year PhD training course that will equip over 60 students with the essential skills needed to become future leaders in data science, be it in academia or industry.
Every project within the centre is offered in collaboration with an industrial partner who, as well as providing co-supervision, will offer students the unique opportunity to access state-of-the-art computing platforms and to work on real-world problems, benchmarking and data. Our graduates will gain unparalleled experience working across academic disciplines in highly sought-after topic areas, answering industry need.
As well as learning from academic and industrial world leaders, students benefit from the centre's dedicated programme of interdisciplinary research training, including the opportunity to undertake modules at the global pinnacle of data science teaching. Many events and training sessions are undertaken as a cohort of PhD students, allowing you to build personal and professional relationships that we hope will lead to research collaboration, now or in the future.
The learning nurtured at this centre will be based upon anticipation of the hardware resources that will be arriving on students' desks after they graduate, rather than the hardware available today.