Improved data management solutions for stratified and personalised medicine.
Prof R Dalgleish
Prof A J Brookes
No more applications being accepted
Competition Funded PhD Project (European/UK Students Only)
Stratified and personalised medicine combines clinical and research knowledge towards targeted diagnostics and treatments (“bench-to-bedside”) in subgroups and individuals respectively. The challenge and the opportunity at the heart of this goal both lie in managing and exploiting vastly diverse and dispersed sources of large-scale input data. We have experience handling these data in local, national and international projects. Progress entails harnessing a plethora of resources from basic research (including omics), observational studies (many thousands of patients), and clinical data, including information derived from hospital equipment (e.g. CT scans, radiotherapy) and biomedical assays (e.g. qPCR, immunoassays, microbiomics, apoptosis assays). Ethico-legal and governance considerations are critical in ensuring that data, which are often personal and sensitive, are used in accordance with given consents, align with institutional needs, protect patient privacy and afford maximal utility within consortia and beyond.
We have the necessary critical mass of skilled post-docs to provide support to a PhD student in the various aspects of bioinformatics that will be appropriate to his or her studies. Two key areas of unmet need have been identified where innovation and PhD student training would be attractive:
1) Better connecting healthcare and research data
Open source platforms exist for clinical and biomedical data, e.g. i2b2 (http://i2b2.org) & tranSMART (http://transmartfoundation.org), supported by developer communities. However, both have major shortcomings, reflecting developers’ focus on research rather than healthcare objectives. Specifically, the systems share a data model based on a core entity-attribute-value table. We and others have found that this: fails to satisfactorily model complex temporal data and relationships (which are key to making sense of clinical information); cannot efficiently handle large data sets such as genotypic observations; and poorly supports diverse data federation for co-analysis of different database content.
Therefore, this PhD project will use datasets available to us via collaborations and co-funded projects to explore how this data model can be adapted and connected to more abstract information management systems (such as “star-schema” data marts and NoSQL technologies) to solve the above problems. An initial focus will be on sharing observational datasets more effectively, yet safely, in federated networks, with emphasis on the time and causality relationships between data items. Additionally, we will test formats and technologies for storing and integrating genomics data, especially considering ways to index these data at minimal computational expense.
This work will be done in concert with open-source communities, and so there will be an onus on the student to build on existing relationships in a strong network with other like-minded developers.
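To make the entity-attribute-value limitation described under area 1 concrete, the following is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration only, not the i2b2 or tranSMART schema: the table name, column names and sample rows are all hypothetical. It shows that even a simple temporal question ("was the treatment recorded after the diagnosis?") needs a self-join on the single observation table.

```python
import sqlite3


def build_eav_demo():
    """Create an in-memory EAV-style table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE observation (
               patient_id  INTEGER,
               attribute   TEXT,
               value       TEXT,
               observed_on TEXT   -- ISO 8601 date, so string order = date order
           )"""
    )
    rows = [
        (1, "diagnosis", "asthma", "2016-01-10"),
        (1, "treatment", "salbutamol", "2016-02-01"),
        (2, "treatment", "salbutamol", "2015-12-01"),
        (2, "diagnosis", "asthma", "2016-03-15"),
    ]
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    return conn


def treated_after_diagnosis(conn, diagnosis, treatment):
    """Return patient ids whose treatment row is dated after their diagnosis row.

    Each extra attribute in the question adds another self-join, because
    every fact lives in the same generic table.
    """
    sql = """
        SELECT DISTINCT d.patient_id
        FROM observation AS d
        JOIN observation AS t ON t.patient_id = d.patient_id
        WHERE d.attribute = 'diagnosis' AND d.value = ?
          AND t.attribute = 'treatment' AND t.value = ?
          AND t.observed_on > d.observed_on
        ORDER BY d.patient_id
    """
    return [row[0] for row in conn.execute(sql, (diagnosis, treatment))]


if __name__ == "__main__":
    demo = build_eav_demo()
    print(treated_after_diagnosis(demo, "asthma", "salbutamol"))
```

With these sample rows only patient 1 qualifies, because patient 2's treatment predates the diagnosis. Questions involving longer event sequences would need one further self-join per event, which is one way of seeing why temporal relationships are awkward in this layout.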
2) Making biomedical data and their conditions of use optimally discoverable
Biomedical data need to be made available to third parties, with conditions of use being transparently stated and fully respected. Systems that enable data ‘discovery’ via summary-level characteristics, with the ability to impose conditions and limitations of use, are essential to this process.
We will develop our existing discovery models and software to handle the full spectrum of stratified and personalised medicine data. This will incorporate the use of a new internationally-developed (GA4GH) standard, in which our team took the lead, for stating conditions of use. It will also exploit flexible, modular graphical interfaces for searching and displaying discovery results, without exposing personally sensitive or identifiable information, and tools for the validation of gene variant data based on our in-house VariantValidator web service. This is a leading area of research that we can advance rapidly, achieving significant impact on the field.
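As a rough illustration of summary-level discovery as described under area 2, the sketch below answers a query with per-dataset match counts and the stated conditions of use, never with individual records. Everything in it is an assumption made for illustration: the `Dataset` class, the `discover` function, the free-text condition strings and the small-count threshold are hypothetical, and it does not represent our existing software or the GA4GH standard's actual format.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: counts below it are masked to reduce the risk
# of identifying individuals from very small groups.
MIN_REPORTABLE_COUNT = 5


@dataclass
class Dataset:
    name: str
    records: list                                         # list of dicts, one per participant
    use_conditions: list = field(default_factory=list)    # free-text conditions (hypothetical)


def discover(datasets, predicate, min_count=MIN_REPORTABLE_COUNT):
    """Return summary-level matches only; no record-level data leaves this function."""
    results = []
    for ds in datasets:
        n = sum(1 for rec in ds.records if predicate(rec))
        if n == 0:
            continue  # dataset has nothing relevant to this query
        results.append(
            {
                "dataset": ds.name,
                # Mask small counts rather than report them exactly.
                "matches": n if n >= min_count else f"<{min_count}",
                "use_conditions": list(ds.use_conditions),
            }
        )
    return results


if __name__ == "__main__":
    cohort_a = Dataset(
        "cohort_a",
        [{"has_variant": True} for _ in range(6)],
        ["research use only"],
    )
    cohort_b = Dataset(
        "cohort_b",
        [{"has_variant": True}, {"has_variant": False}],
        ["no commercial use"],
    )
    for hit in discover([cohort_a, cohort_b], lambda r: r["has_variant"]):
        print(hit)
```

The design choice shown here is that the discovery layer returns only aggregate counts together with the conditions attached to each dataset, so a third party can judge whether to request access without seeing sensitive or identifiable information.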
The ideal applicant will have an outstanding understanding of Computer Science or Bioinformatics with a degree at BSc or MSc level (preferably the latter). Applicants with degrees in Biological Sciences are discouraged from applying unless they can demonstrate advanced computing skills, including creating relational databases, and the ability to code original applications in languages commonly used in bioinformatics, such as Python, Perl or C/C++. We know from experience that we can teach the necessary genetics skills to academically bright students with little or no prior knowledge of genetics.
The studentships, jointly financed by the Department and the College of Medicine, Biological Sciences and Psychology, are open to Home/EU students and offered at the standard Research Council UK rate.
Applicants should expect to hold a 1st or 2.1 BSc in a relevant field by the end of September 2017, when the studentship will commence. Those holding a 2.2 degree plus a Master’s degree or more than 3 years of relevant postgraduate experience may be eligible. Candidates with degrees from abroad may be eligible if their qualifications are deemed equivalent.