1st Supervisor - Dr Jorge Cardoso 2nd Supervisor - Professor Richard Dobson
This project aims to leverage the EHR data collection infrastructure developed at KCH, together with imaging and non-imaging data, to provide diagnostic and prognostic information and operational outcomes in real time. On the technical side, this project will develop algorithms that can learn directly from multiple sources, can handle missing information, and can provide an explanation of their findings and conclusions. From an application point of view, the main aims are the diagnosis and prognosis of stroke patients, and improving stroke patient workflow.
Background KCH and the Maudsley pioneered, 20 years ago, the transition to a fully digital patient Electronic Healthcare Record (EHR). This was followed by a digital consolidation and a protocol bridging phase, culminating in unprecedented access to patient care and clinical workflow data for auditing and research purposes. This digital research backbone has recently been refactored through the development of CogStack and SemanticEHR, a platform that captures and analyses clinical and text data live with the aim of improving patient care. While clinical events, patient letters and radiological reports have been tagged and stored in databases, the information has not yet been converted into actionable live predictions that impact patient care.
This project aims to combine the multiple sources of available EHR information and the contents of neuro-imaging data to inform patient diagnosis, prognosis and disease progression, with a specific focus on stroke, and provide more precise models of how patients receive care in the NHS.
Key challenges EHR data is naturally multimodal. While imaging data is very rich, it is also unstructured and complex. Non-imaging data, on the other hand, is summary in nature, but requires extensive tagging, curation and distillation to become informative and algorithmically tractable. To make this information intelligible and jointly analysable, three primary challenges need to be addressed during this PhD: 1) developing novel models that can jointly analyse imaging and non-imaging EHR data while dealing with missing information; 2) using these models to diagnose and prognose neurological conditions, with a focus on stroke; 3) understanding how EHR data can predict the operational patient workflow, bed usage and resource allocation.
PhD plan Year 1) The SemanticEHR tagging system will be extended to deal with imaging data. A joint model for image and text tagging will be developed, exploiting the commonalities in the representation. This model will make use of an attention model to explicitly link radiological reports, patient letters, patient ICD-10 codes and the spatially informative nature of neurological images.
Year 2) Use the models trained in the highly multimodal setup to predict disease diagnosis and disease progression from single- and multi-source data. Use the model's attention to explain the predictions, working towards auditable machine learning.
Year 3) Explore ways to integrate predictions into the clinical workflow by targeting operations (prioritisation and dashboarding), safety (warning and helper systems), and by producing clinically useful predictions (measurements, diagnosis and prognosis).
Year 3.5) Algorithm deployment within CogStack and thesis write-up.
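
As a rough illustration of the Year 1 and Year 2 ideas (attention across data sources, tolerance to missing information, and attention weights as an explanation), the following minimal sketch fuses one embedding per source with masked softmax attention. It is not the project's model: the function name `masked_attention_fusion`, the four example sources, the random embeddings and the fixed `query` vector are all hypothetical stand-ins, and a real system would learn the embeddings and the query from data.

```python
import numpy as np


def masked_attention_fusion(embeddings, present, query):
    """Fuse per-source embeddings with softmax attention, skipping missing sources.

    embeddings: (n_sources, dim) array; rows of missing sources may hold anything (even NaN).
    present:    (n_sources,) boolean array, True where the source was observed.
    query:      (dim,) scoring vector (hypothetical; learned in a real model).
    Returns (fused, weights): the fused (dim,) vector and the attention weights.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    present = np.asarray(present, dtype=bool)
    if not present.any():
        raise ValueError("at least one source must be present")

    # Scaled dot-product score per source.
    scores = embeddings @ query / np.sqrt(embeddings.shape[1])
    # Missing sources get -inf, so they receive exactly zero weight.
    scores = np.where(present, scores, -np.inf)
    # Subtract the largest observed score for numerical stability.
    scores = scores - scores[present].max()
    weights = np.exp(scores)
    weights /= weights.sum()

    # Zero out missing rows so NaN placeholders cannot leak into the result.
    clean = np.where(present[:, None], embeddings, 0.0)
    fused = weights @ clean
    return fused, weights


# Hypothetical example: four sources, with the patient letter missing.
sources = ["image", "radiology_report", "patient_letter", "icd10_codes"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
emb[2] = np.nan  # the missing source carries no usable values
observed = np.array([True, True, False, True])
q = rng.normal(size=8)

fused_vec, attn = masked_attention_fusion(emb, observed, q)
for name, w in zip(sources, attn):
    print(f"{name}: {w:.3f}")
```

The printed weights sum to one over the observed sources and are zero for the missing one, so they can be read as a crude indication of how much each source contributed to the fused representation. Whether attention weights of this kind amount to a faithful explanation is itself an open question that the project would need to examine.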