Rationale
New technologies now allow us to measure thousands or millions of molecular variables in parallel in an individual sample. Applying these technologies to large population samples such as the Avon Longitudinal Study of Parents and Children (ALSPAC) is creating extremely powerful data resources for understanding the mechanisms of health and disease. In addition, the wealth of molecular data available in public databases enables rich annotation and enhancement of population-based studies. However, integrating and analysing these high-dimensional datasets present specific computational challenges, and new methods are required to realise their full potential.
Aims & Objectives
The aim of this bioinformatics project is to develop new approaches to data integration and visualisation in high-dimensional omics datasets. The project will be based on one disease area (to be agreed with the student), and will develop methods for predicting outcomes, discovering aetiological mechanisms and ranking molecular pathways by relevance.
Methods
Beginning with methylation, genetic, expression and metabolomic data generated on samples from the ALSPAC cohort, the student will develop methods to integrate these datasets, identify interesting patterns in the data and relate these to data from public databases. The project will initially focus on one specific mechanism or phenotype (to be agreed with the student), but the methods developed are expected to be widely applicable.
The specific methods will be developed during the project, but candidate approaches include kernel methods, graph-based data integration, GPGPU (general-purpose computing on graphics processing units) approaches, and data federation and consolidation.
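To illustrate one of the kernel methods listed above, the sketch below shows a simple form of kernel-based data integration. It builds one Gaussian (RBF) kernel per omics layer over the same set of samples and combines the layers as a weighted sum. This is a minimal sketch under stated assumptions, not the method the project will develop. The function names `rbf_kernel` and `combine_kernels`, the bandwidth `gamma`, the fixed weights and the toy data are all hypothetical. The sketch also assumes that every layer has already been matched to the same samples in the same order and suitably scaled.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gaussian (RBF) kernel between every pair of samples (rows of X).

    K[i][j] = exp(-gamma * ||x_i - x_j||^2), so identical samples score 1.0
    and dissimilar samples score close to 0.0.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-layer kernel matrices (hypothetical helper).

    Non-negative weights keep the result a valid (positive semi-definite) kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: three samples measured on two omics layers.
    methylation = [[0.1, 0.8], [0.2, 0.7], [0.9, 0.1]]  # e.g. two CpG sites
    metabolites = [[1.0], [1.1], [3.0]]                 # e.g. one metabolite
    K = combine_kernels(
        [rbf_kernel(methylation, gamma=1.0), rbf_kernel(metabolites, gamma=1.0)],
        weights=[0.5, 0.5],
    )
    for row in K:
        print([round(v, 3) for v in row])
```

A non-negative weighted sum of valid kernels is itself a valid kernel. The combined matrix could therefore be passed to any kernel-based learner, such as a kernel SVM or kernel ridge regression, for outcome prediction. Learning the weights from the data instead of fixing them is the subject of multiple kernel learning.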