This project concerns the use of artificial intelligence (AI) to correct errors in individual medical records. There has been a great deal of interest in the application of data analytics (“big data”) to such records. With millions of healthcare records potentially available, there is clear interest in spotting patterns between symptoms, diagnoses, and treatments. However, there is plenty of evidence that a large proportion of individual medical records contain errors, rendering the data analytics prone to “garbage in, garbage out”.
NHS Digital have investigated whether data quality failures could be detected in national data returns, using the diagnosis of dementia as an example. The data show that, of the 317,000 patients diagnosed with dementia between April 2010 and March 2015, only 51% had a recorded diagnosis of dementia when admitted to hospital during the year 2014/15. Clearly the dementia had not gone away, so the records were flawed. A separate study of emergency admissions data by Dr Tom Hughes of John Radcliffe Hospital found that 40% of patients have no recorded diagnosis at all and that, of the diagnoses that are recorded, nearly half are meaningless, vague or merely a symptom.
In this context, big data approaches are important but insufficient for extracting useful information. The data need to be checked for inconsistencies and repaired. This is a challenging problem for an AI system. While some data records may be impossible, such as the ‘cured’ dementia, many others will lie on a spectrum from unusual to implausible. Despite the scale of the challenge, there is recent work to draw upon in correcting both free-text data (primarily for spelling) and structured data records. The data records for this project will come from research-community sources such as the MIMIC-III database, managed by MIT.
It is proposed that a multi-stage (and possibly multi-agent) approach is adopted to repair the free text, repair the structured data, and finally cross-check between the two forms of data. Data-driven and knowledge-driven approaches will be explored. In a data-driven approach, any records that show a unique pattern among a dataset of millions would be considered suspect. In a knowledge-driven approach, fuzzy rules might propose likely combinations of symptoms and diagnoses, based on medical knowledge, and recognise any improbable combinations in the data records.
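As a rough illustration of the two approaches described above, the following Python sketch flags records whose symptom/diagnosis pattern is unique in a dataset (data-driven) and scores records against a small table of plausibility rules (knowledge-driven). It is a minimal sketch under stated assumptions, not the project's method: the record layout, the function names, the rule table and every numeric value are hypothetical and chosen purely for illustration, and the fuzzy combination shown (taking the minimum over symptoms) is only one of several possible choices.

```python
from collections import Counter

# Hypothetical toy records; a real dataset would hold millions of entries.
records = [
    {"symptoms": ["cough", "fever"], "diagnosis": "influenza"},
    {"symptoms": ["fever", "cough"], "diagnosis": "influenza"},
    {"symptoms": ["cough"], "diagnosis": "fractured wrist"},
]

# Hypothetical plausibility degrees in [0, 1] for (symptom, diagnosis) pairs.
# The values are invented for illustration and carry no medical authority.
rules = {
    ("cough", "influenza"): 0.9,
    ("fever", "influenza"): 0.9,
    ("cough", "fractured wrist"): 0.05,
}


def flag_unique_patterns(records, max_count=1):
    """Data-driven check: return the indices of records whose
    (symptom set, diagnosis) pattern occurs at most max_count times."""
    patterns = [(frozenset(r["symptoms"]), r["diagnosis"]) for r in records]
    counts = Counter(patterns)
    return [i for i, p in enumerate(patterns) if counts[p] <= max_count]


def plausibility(record, rules, default=0.5):
    """Knowledge-driven check: combine the plausibility of each symptom with
    the recorded diagnosis using a fuzzy AND (minimum). Pairs missing from
    the rule table, and records with no symptoms, receive the default."""
    degrees = (
        rules.get((symptom, record["diagnosis"]), default)
        for symptom in record["symptoms"]
    )
    return min(degrees, default=default)


suspect = flag_unique_patterns(records)
scores = [plausibility(r, rules) for r in records]
print(suspect)  # indices of records with a unique pattern
print(scores)   # one plausibility degree per record
```

In this toy dataset the third record is flagged by both checks. In a dataset this small almost any pattern is unique, so the data-driven check only becomes meaningful at the scale of millions of records mentioned above.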
How to apply:
We welcome applications from highly motivated prospective students who are committed to developing outstanding research outcomes. You can apply online at http://www.port.ac.uk/applyonline. Please quote project code CCTS4190218 in your application form.
Applications should include:
- a full CV including personal details, qualifications, educational history and, where applicable, any employment or other experience relevant to the application
- contact details for two referees able to comment on your academic performance
- a research proposal of 1,000 words outlining the main features of a research design you would propose to meet the stated objectives, identifying the challenges this project might present and discussing how the work will build on or challenge existing research in the above field
- proof of English language proficiency (for EU and international students)
All the above must be submitted by the 11th of February 2018.