Log-linear modelling is the standard approach for investigating the full joint dependence structure between categorical variables. Applications include studying the relation between phenotypes and environmental, anthropometric or genetic risk factors. Graphical log-linear models make complex dependence structures easier to discern (Papathomas and Richardson, 2016), which can lead to the identification of functionally important pathways. Another application is estimating the size of hidden populations, such as victims of modern slavery (Cruyff, Overstall, Papathomas and McCrea, 2020). The number of cells in the associated contingency table increases rapidly with the number of variables, so the table becomes sparse and contains zero cell counts even when the number of subjects is large. Zero cell counts can make some model parameters non-estimable, also referred to as non-identifiable (Sharifi Far, Papathomas and King, 2019). Non-identifiability is a major impediment to evaluating how factors interact and to understanding important biological mechanisms. Problems associated with identifiability are currently not well understood and have not been addressed in a systematic manner. The aim of this project is to develop methods that use information on the Bayesian identifiability of interaction parameters to choose the best log-linear model given the data.
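As a rough illustration of the two issues described above, the following Python sketch first counts the empty cells in a table of simulated binary variables, and then shows how a single zero count leaves the interaction in a 2x2 table without a finite estimate. The function names, the simulated data and the choice of corner-point (reference-cell) constraints are assumptions made for this example only; they are not taken from the cited papers or from the project itself.

```python
import math
import random
from collections import Counter


def count_empty_cells(n_subjects, n_variables, seed=0):
    """Simulate independent binary variables and count empty table cells.

    Hypothetical helper: each subject falls into one of 2**n_variables cells,
    and a cell is empty when no simulated subject falls into it.
    """
    rng = random.Random(seed)
    counts = Counter(
        tuple(rng.randint(0, 1) for _ in range(n_variables))
        for _ in range(n_subjects)
    )
    n_cells = 2 ** n_variables
    return n_cells, n_cells - len(counts)


def log_odds_ratio(table):
    """Interaction estimate for a 2x2 table under the saturated model.

    Assumes corner-point constraints, under which the maximum likelihood
    estimate of the interaction is the log odds ratio of the observed counts.
    Returns None when a zero count means no finite estimate exists.
    """
    (a, b), (c, d) = table
    if min(a, b, c, d) == 0:
        return None
    return math.log((a * d) / (b * c))


if __name__ == "__main__":
    # The cell count doubles with each added variable; the sample size does not.
    for k in (5, 10, 15):
        n_cells, n_empty = count_empty_cells(n_subjects=1000, n_variables=k)
        print(f"{k} variables: {n_cells} cells, {n_empty} empty")

    print(log_odds_ratio([[10, 5], [5, 10]]))  # finite estimate
    print(log_odds_ratio([[10, 0], [5, 10]]))  # None: a zero cell count
```

With 1,000 subjects and 15 binary variables the table has 32,768 cells, so at least 31,768 of them must be empty whatever the data. The second function covers only the simplest case; which parameters stay estimable in larger sparse tables is the kind of question the project addresses.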
For more information, please see the School's Postgraduate Research page, and in particular the information about Statistics PhD opportunities.