Background: Until recently, detailed investigations of genetic regulatory processes have been restricted to a small number of genes. With next-generation DNA sequencing and techniques such as RNA-seq and ChIP-seq, we now have the technology to measure both gene expression and regulatory inputs at whole-genome scale. These data present the challenge of understanding how the expression of the complete genome is regulated, which would be impossible without mathematical and statistical modeling. This project will address this challenge using data from our collaborative projects on blood cell differentiation and leukaemia, together with public data such as that from the ENCODE project.
Objectives: The project will use statistical and machine learning methods to build models that relate gene expression outputs to regulatory inputs and that serve as mechanistic hypotheses for regulatory processes, which could then be investigated further in more detailed experiments on single genes or gene groups. We will develop novel clustering methods that aim to cluster genes simultaneously by expression pattern or dynamics and by shared regulatory inputs, together with linear models and other machine learning methods that relate regulatory inputs to expression outputs. We will also use penalized regression methods to identify key regulatory variables within defined gene sets, and develop novel methods whereby different models can be fitted simultaneously to gene sets with different regulatory codes.
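As a rough illustration of the penalized regression idea only, the sketch below fits a lasso (L1-penalized linear regression) by cyclic coordinate descent to simulated data, in which the expression of each gene depends on two of eight regulatory inputs. It is not the project's method: the function names, the penalty value, the simulated "regulatory inputs" and the choice of the lasso as the penalty are all assumptions made for the example, and real RNA-seq or ChIP-seq data would need normalization and a data-driven choice of penalty.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - Xb||^2 + penalty * ||b||_1 by cyclic coordinate descent.

    X is a list of rows (genes) by columns (regulatory inputs); y is expression.
    No intercept is fitted, so the data are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, since coef starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero input carries no information
            # Covariance of input j with the residual that excludes input j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


def simulate(n_genes=200, n_features=8, seed=0):
    """Hypothetical data: only inputs 0 and 3 truly influence expression."""
    rng = random.Random(seed)
    true_coef = [0.0] * n_features
    true_coef[0] = 2.0
    true_coef[3] = -1.5
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_genes)]
    y = [
        sum(x * b for x, b in zip(row, true_coef)) + rng.gauss(0, 0.1)
        for row in X
    ]
    return X, y, true_coef


X, y, true_coef = simulate()
coef = lasso_coordinate_descent(X, y, penalty=0.2)
# Inputs with a non-zero coefficient are the candidate key regulatory variables.
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected inputs:", selected)
print("coefficients:", [round(b, 3) for b in coef])
```

The L1 penalty sets the coefficients of uninformative inputs exactly to zero, which is what makes this family of methods suitable for variable selection; the retained coefficients are shrunk towards zero, so they should be read as a ranking of candidate regulators rather than as unbiased effect sizes.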
Novelty: To date, work in this field has been largely descriptive, with the complexity and scale of the data preventing the development of clear mechanistic hypotheses. This project aims to move the field forward significantly, towards genome-scale predictive models of gene expression.
Lichtinger, M., Ingram, R., Hannah, R., Muller, D., Clarke, D., Assi, S.A., Lie-A-Ling, M., Noailles, L., Vijayabasker, M.S., Wu, M., Tenen, D.G., Westhead, D.R., Kouskoff, V., Lacaud, G., Gottgens, B., Bonifer, C. 2012. RUNX1 reshapes the epigenetic landscape at the onset of hematopoiesis. EMBO Journal doi:10.1038/emboj.2012.275.
Ptasinska, A., Assi, S.A., Mannari, D., James, S.R., Williamson, D., Dunne, J., Hoogenkamp, M., Mengchu, W., Care, M., McNeill, H., Cauchy, P., Cullen, M., Tooze, R., Tenen, D.G., Young, B.D., Cockerill, P.N., Westhead, D.R., Heidenreich, O., Bonifer, C. 2012. Depletion of RUNX1/ETO in t(8;21) AML cells leads to genome-wide changes in chromatin structure and transcription factor binding. Leukemia 26(8):1829-41.
Yu, J., Guo, M., Needham, C.J., Huang, Y., Cai, L., Westhead, D.R. Simple sequence-based kernels do not predict protein-protein interactions. Bioinformatics 26(20):2610-2614.
Whitaker, J.W., McConkey, G.A. and Westhead, D.R. The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome Biology 10: R36.