
Strategies for handling missing data in a federated data network #NDORMS 2020/9

About This PhD Project

Project Description

Electronic healthcare data are increasingly being proposed as a source of evidence to support drug development and regulatory decision making, and also to understand the pathogenesis of diseases.
The use of multiple electronic healthcare databases is recommended, not only to increase sample size but also to investigate country-specific differences and differences by type of database (e.g. primary vs. secondary care), and to replicate findings. A main challenge lies in the heterogeneity between databases with regard to their underlying structures and semantic mapping. The interoperability of the data can be strongly improved by standardising the data to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM).

The European Health Data & Evidence Network (EHDEN) Consortium is a project funded by the Innovative Medicines Initiative (IMI). We aim to provide a new paradigm for the discovery, standardization and analysis of health data in Europe by building a large-scale, sustainable federated network of data sources mapped to the OMOP-CDM. This includes implementing a platform with tools for data standardization, analytical pipelines, sharing of study results, and stakeholder engagement and training. Analytical tools incorporated in the EHDEN platform will be based on code that has been developed and used by the Observational Health Data Sciences and Informatics (OHDSI) community.

This EHDEN-funded DPhil project aims to develop strategies for handling missing electronic health record (EHR) data on top of the OMOP-CDM, and to develop R code for missing data imputation on the EHDEN/OHDSI platform. The project will involve a literature review on handling missing EHR data in multiple-database studies, followed by an assessment of the completeness of OMOP-CDM data, i.e. identifying the sources of missing data. The DPhil candidate will implement R packages for handling missing data on top of existing OHDSI analytical methods, e.g. patient-level prediction and risk effect estimation, including propensity score methods.

This DPhil studentship presents an exciting opportunity to join the interdisciplinary EHDEN consortium and other OHDSI collaborators, including software developers, clinical researchers, epidemiologists, data scientists and statisticians. The project will be suitable for a candidate who wishes to develop analytical tools for real-world medical decision making.

Essential and Desired Qualifications/Experience

Essential Criteria:
• Hold or be about to obtain a first or upper second class BSc degree or a Master's degree (or equivalent) in a subject relevant to statistics, maths or data science.
• Proficient in R programming.

Additional Qualifications:
• Should have a commitment to research in the applied health sciences.
• A good team player who is also able to work independently.
• Experience of developing statistical methods/software tools in R would be an advantage.
• An understanding of electronic healthcare data would be an advantage.

Details of the Research Group

The DPhil will be jointly supervised by Prof Prieto-Alhambra (Professor of Pharmaco- and Device Epidemiology and Theme Lead for Observational Research), Dr Victoria Strauss and Dr Antonella Delmestri, all members of the Centre for Statistics in Medicine, NDORMS, University of Oxford; and by Prof Irene Petersen, University College London. The research will be conducted with the Pharmaco- and Device Epidemiology Research Group at the Botnar Research Centre in Oxford, UK. Supervision meetings with Prof Petersen will be organized regularly and held either in Oxford or in London. The DPhil candidate will also work closely with other members of the EHDEN consortium (both academic institutions and industry) and with OHDSI collaborators, which will provide a cutting-edge environment in which to develop their career.

Prof Daniel Prieto-Alhambra has published extensively in the field of pharmaco-epidemiology, and is recognised internationally as an authority on the use of routine data for pharmaco- and device epidemiology and related methods. He leads EHDEN work package one, on the development of the evidence workflow. He will be the primary supervisor and will oversee the guidance of the DPhil student.

Dr Victoria Strauss is a Senior Statistician. She has extensive expertise in the use, validation and development of statistical methods, both for the analysis of routinely collected data and in randomized clinical trial settings. She has been involved in the EHDEN project from the beginning. She will provide close supervision under the guidance of Prof Prieto-Alhambra.

Dr Antonella Delmestri is a Senior Database Manager with a background as a computer scientist and software engineer. She has been involved in mapping electronic healthcare data to the OMOP-CDM. She will provide support on the OMOP-CDM.

Prof Irene Petersen (University College London) is a highly experienced researcher in the field of handling missing data in routinely collected health data. She will provide expert guidance for this DPhil project.

Current DPhil Students within the pharmaco-epidemiology research group: 6


Training will be provided in relevant research methodology, including the handling and analysis of large datasets, OHDSI analytical tools and the OMOP-CDM structure. Attendance at formal training courses will be encouraged, and will include the "Real world epidemiology Oxford summer school" directed by Prof Prieto-Alhambra, and the pre-conference courses offered by the International Society for Pharmacoepidemiology, amongst others.

A core curriculum of lectures organized departmentally will be taken in the first term to provide a solid foundation in a broad range of subjects including epidemiology, health economics, and data analysis.

How to apply

The department accepts applications throughout the year, but it is recommended that, in the first instance, you contact the relevant supervisors or the Graduate Studies Officer, Sam Burnell, who will be able to advise you of the essential requirements.

Interested applicants should have or expect to obtain a first or upper second class BSc degree or equivalent, and will also need to provide evidence of English language competence. The application guide and form are found online and the DPhil will commence in October 2020.

For further information, please contact Dr Victoria Strauss or Prof Prieto-Alhambra.


