About the Project
Large amounts of diverse data, from multiple sources and from different study designs, are increasingly available in all areas of science, including infectious disease epidemiology, health technology assessment and 'omics (genomics, proteomics, etc.). Using all available information, rather than just the "best quality" subset, typically provides more precise results and minimises the chance of selection biases skewing the results.
However, it may be unwise to use a joint "big model" of several sources of evidence, at least directly. For instance, it can be hard to diagnose problems, such as poor model fit, in large models. Computation may also be time-consuming or even infeasible. Instead, a modular approach is often preferable, in which separate submodels are specified for smaller, more manageable parts of the available data. Each submodel is simpler (lower-dimensional) than the "big model" and so will be easier to work with.
Two different approaches can then be taken to join these submodels into overall results. We may wish to join the submodels into a standard joint Bayesian model, so that all data and uncertainty are fully accounted for, using an approach we recently developed called Markov melding, which builds on ideas from the graphical models literature (https://arxiv.org/abs/1607.06779). However, it remains an open question how best to join the submodels computationally. Alternatively, we may not be comfortable assuming that all the submodels represent the "truth", as we do in Markov melding. Instead, we may wish to regulate the influence of poor-quality data and/or submodels to prevent unreliable submodels "contaminating" more reliable submodels. However, a formal framework for this approach is currently lacking.
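As a rough illustration of one ingredient of joining submodels: when two submodels each imply a different prior for a quantity they share, those priors have to be reconciled into a single one. The sketch below shows two standard pooling rules, linear and logarithmic pooling, evaluated on a grid. The Gaussian priors, the equal weights, the grid and all function names are illustrative assumptions, not taken from the project or from any particular software.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid(values, step):
    """Trapezoidal rule on an equally spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def pool_priors(grid, densities, weights, method="linear"):
    """Pool several prior densities for a shared quantity.

    linear: weighted arithmetic mean of the densities.
    log:    weighted geometric mean of the densities.
    The result is renormalised numerically to integrate to 1 on the grid.
    """
    step = grid[1] - grid[0]
    pooled = []
    for i in range(len(grid)):
        if method == "linear":
            value = sum(w * d[i] for w, d in zip(weights, densities))
        elif method == "log":
            value = math.exp(
                sum(w * math.log(d[i]) for w, d in zip(weights, densities))
            )
        else:
            raise ValueError("method must be 'linear' or 'log'")
        pooled.append(value)
    total = trapezoid(pooled, step)
    return [v / total for v in pooled]


def grid_mean(grid, density):
    """Mean of a density tabulated on an equally spaced grid."""
    step = grid[1] - grid[0]
    return trapezoid([x * p for x, p in zip(grid, density)], step)


# Hypothetical example: two submodels imply different priors for the
# same shared quantity (assumed values, for illustration only).
grid = [-8.0 + 0.01 * i for i in range(1601)]
prior_1 = [normal_pdf(x, 0.0, 1.0) for x in grid]
prior_2 = [normal_pdf(x, 2.0, 0.5) for x in grid]

linear_pool = pool_priors(grid, [prior_1, prior_2], [0.5, 0.5], "linear")
log_pool = pool_priors(grid, [prior_1, prior_2], [0.5, 0.5], "log")

print("linear pooling mean:", round(grid_mean(grid, linear_pool), 3))
print("log pooling mean:", round(grid_mean(grid, log_pool), 3))
```

Linear pooling keeps both priors as a mixture, whereas logarithmic pooling concentrates mass where the priors agree and so leans towards the more precise one. Which rule is appropriate depends on how much each submodel is trusted.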
This PhD project would suit a student interested in methodological and computational statistics, since there is considerable scope for new methodology and algorithms in this area. The PhD will involve developing, implementing and assessing promising approaches. There is the potential to draw upon and extend ideas in the connected literatures that are developing in this area, including divide-and-conquer/parallel computation methods for large-n "big data" (including "tall data"); newly developed approximate methods for estimating the ratio of two densities; connections to sequential Monte Carlo; and metrics for assessing and comparing models when none are the "truth". There is also scope to study the application of these methods in substantive application areas, including network meta-analysis.
Funding Notes
The MRC Biostatistics Unit offers four full-time PhD studentships funded by the Medical Research Council for commencement in October 2018.
Academic and Residence eligibility criteria apply.
More details are available at https://www.mrc-bsu.cam.ac.uk/training/phd/
Informal enquiries are welcome to [Email Address Removed] .
To be formally considered, all applicants must also complete a University of Cambridge application form; full details are available here (https://www.mrc-bsu.cam.ac.uk/training/phd/).
Projects will remain open until the studentships are filled, but priority will be given to applications received by 4 January 2018.