Improved open-source data management solutions for translational medicine in the genomics era.
No more applications being accepted
Competition Funded PhD Project (European/UK Students Only)
The Science of the Project:
Translational medicine seeks to combine clinical data with basic-research knowledge towards novel diagnostics and treatments (“bench-to-bedside”) that are effective in subgroups (“stratified medicine”) or single individuals (“personalised medicine”). The key challenge that the field faces, but also one of its main opportunities, is the vastly diverse and dispersed array of possible input data. We have experience handling these data in local, national and international projects. In general, it entails harnessing a plethora of data from basic research (including omics content), observational studies (thousands of patients), and clinical data, including information derived from hospital equipment (e.g. CT scans, radiotherapy) and biomedical assays (e.g. qPCR, immunoassays, microbiomics, apoptosis assays). Ethico-legal and governance considerations are critical in ensuring that data are used in accordance with given consents, align with institutional needs, protect patient privacy and afford maximal utility within consortia and beyond.
Two key areas of unmet need have been identified where innovation and PhD student training would be attractive:
1) Better connecting healthcare and research data
Open-source platforms exist for clinical and biomedical data, e.g. i2b2 (http://i2b2.org) and tranSMART (http://transmartfoundation.org), supported by developer communities. However, both have major shortcomings, reflecting developers’ focus on research rather than the nature of healthcare data. Specifically, the systems share a common data model with a core entity-attribute-value table. We and others have found that this design: fails to satisfactorily model complex temporal data and relationships (which are key to making sense of clinical information); cannot handle very large data sets such as genotypic observations; and poorly supports diverse data federation arrangements for bringing different database instances together.
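To illustrate the temporal limitation, here is a minimal sketch of an entity-attribute-value table, assuming a deliberately simplified layout. The table name, column names and sample rows are hypothetical and are not the actual i2b2 or tranSMART schema. It shows that even a simple question about the order of two clinical events requires joining the table to itself.

```python
import sqlite3


def build_eav_demo():
    """Create an in-memory, simplified EAV table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        " patient_id INTEGER, attribute TEXT, value TEXT, observed_on TEXT)"
    )
    # Invented sample rows: one row per (entity, attribute, value) fact.
    rows = [
        (1, "diagnosis", "asthma", "2015-01-10"),
        (1, "medication", "salbutamol", "2015-01-12"),
        (2, "medication", "salbutamol", "2015-02-01"),
        (2, "diagnosis", "asthma", "2015-03-05"),
    ]
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    return conn


def treated_after_diagnosis(conn, diagnosis, medication):
    """Return patients who received `medication` after `diagnosis`.

    Each event is a separate row, so relating two events in time needs a
    self-join; ISO-formatted dates compare correctly as text.
    """
    sql = """
        SELECT DISTINCT d.patient_id
        FROM observation AS d
        JOIN observation AS m ON m.patient_id = d.patient_id
        WHERE d.attribute = 'diagnosis' AND d.value = ?
          AND m.attribute = 'medication' AND m.value = ?
          AND m.observed_on > d.observed_on
        ORDER BY d.patient_id
    """
    return [row[0] for row in conn.execute(sql, (diagnosis, medication))]
```

Each additional event in a temporal pattern adds another self-join, which is one way the modelling difficulty described above can show up in practice.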
Therefore, this PhD project will use datasets available to us via collaborations and co-funded projects to explore how this data model can be adapted and connected to more abstract information management systems (such as “star-schema” data marts and NoSQL technologies) in order to solve the above problems. An initial focus will be on i2b2/tranSMART for observational studies, particularly with respect to the timings of, and relationships between, clinical events. Additionally, we will test formats and technologies for storing and integrating genomics data, especially considering methods to index these data at minimal computational expense.
This work will be done in concert with open-source communities, so the student will be expected to build on our existing relationships and to develop a strong network with like-minded developers.
2) Making translational data maximally discoverable and shareable
There is a fundamental need to make translational medicine consortia data more widely available and exploitable. This means making it easier to mine potentially useful data, and to establish whether the data may be re-purposed under given conditions.
Through this PhD project, we intend to upgrade our existing and widely adopted discovery data models and software to handle full translational medicine data. Given data sensitivities, this will incorporate a standard matrix that specifies the pertinent consent and data-use conditions and makes them computer-readable (the work of a Global Alliance for Genomics and Health Task Team that we lead). It will also exploit graphical interfaces for searching and displaying discovery results without exposing personally sensitive or identifiable information. This is an untapped area of research that we can advance rapidly with our existing collaborators, to have a significant impact on the field.
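As a rough illustration of what computer-readable consent and data-use conditions could look like, the sketch below encodes a few conditions per dataset and filters a catalogue against a proposed use. All field names, categories and dataset names are hypothetical; this is not the Global Alliance for Genomics and Health matrix itself, only an assumed toy version of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUseConditions:
    """Hypothetical machine-readable conditions attached to one dataset."""
    research_use_allowed: bool
    commercial_use_allowed: bool
    permitted_countries: frozenset  # empty set means no geographic limit


@dataclass(frozen=True)
class UseRequest:
    """Hypothetical description of how a requester intends to use data."""
    commercial: bool
    country: str


def is_use_permitted(conditions, request):
    """Check a proposed use against a dataset's recorded conditions."""
    if not conditions.research_use_allowed:
        return False
    if request.commercial and not conditions.commercial_use_allowed:
        return False
    if (conditions.permitted_countries
            and request.country not in conditions.permitted_countries):
        return False
    return True


def discoverable_datasets(catalogue, request):
    """Return names of datasets whose conditions allow the request.

    Only dataset names are returned, so no patient-level data is exposed.
    """
    return sorted(
        name for name, conditions in catalogue.items()
        if is_use_permitted(conditions, request)
    )
```

For example, a catalogue holding one dataset limited to non-commercial use in the UK and another with no restrictions would return only the unrestricted dataset for a commercial request from outside the UK.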
The ideal applicant will have an outstanding understanding of Computer Science or Bioinformatics, with a degree at BSc or MSc level (preferably the latter). Applicants with degrees in Biological Sciences are discouraged from applying unless they can demonstrate advanced computing skills, including creating relational databases and coding original applications in languages commonly used in bioinformatics, such as Python, Perl or C/C++. We know from experience that we can teach the necessary genetics skills to academically bright students with little or no prior knowledge of that subject.
The University of Leicester is providing funding for a small number of PhD studentships. There will probably be more applicants than studentships, and the successful candidates will be chosen in competition with one another on the basis of academic merit. Only UK and EU applicants are eligible.