Extracting knowledge from longitudinal data: A probabilistic approach that represents longitudinal data as continuous-time stochastic processes
In recent years, electronic transactions and record keeping, together with cheap data storage, have resulted in the accumulation of massive amounts of data containing valuable information. Examples of such vast data sets are electronic patient records in the health sector and loyalty card data in the retail sector. The wealth of information in these data sets is currently under-explored because analysis and inference are hampered by their size.
To address this issue, machine learning techniques in biomedical, consumer data and other fields have seen rapid advances in algorithms and their implementations (Herland2014). However, very little progress has been made in the analysis of longitudinal data, i.e., data with events and observations spaced irregularly over time (Bellazzi2011). Existing statistical and machine learning methods cannot be applied directly because the data are heterogeneous, with a varying number of events per record. Available algorithms are computationally intensive and do not scale to large data sets (see, e.g., Moskovitch2014). This project will attempt to change this by looking at the data from a new point of view: a probabilistic approach that represents longitudinal data as continuous-time stochastic processes and draws on the extensive theory of stochastic processes. This modeling approach promises to be:
• robust with respect to temporal inaccuracies often encountered in this type of data
• scalable, allowing for exploration of large data sets
The aim of this project is to realise this potential. You will develop new mathematical techniques and new algorithms and apply them to medical and consumer data sets available at the Leeds Institute for Data Analytics (www.lida.leeds.ac.uk).
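As a rough illustration of what it can mean to represent a record as a continuous-time stochastic process, the sketch below treats each record's irregularly spaced event times as a counting process N(t) and fits a homogeneous Poisson rate to it. This is only one simple instance chosen for illustration, not the method the project will develop. The record names, event times, observation window and the helper functions `counting_process` and `poisson_rate_mle` are all hypothetical.

```python
from bisect import bisect_right


def counting_process(event_times):
    """Return N(t), the number of events observed up to and including time t."""
    times = sorted(event_times)

    def n(t):
        # bisect_right counts how many sorted event times are <= t
        return bisect_right(times, t)

    return n


def poisson_rate_mle(event_times, observation_end):
    """Maximum-likelihood rate of a homogeneous Poisson process on [0, observation_end].

    Assumption: events arrive at a constant rate, which real records rarely satisfy.
    """
    if observation_end <= 0:
        raise ValueError("observation_end must be positive")
    return len(event_times) / observation_end


# Hypothetical records: event times in days since enrolment.
# Spacing is irregular and the number of events differs per record.
records = {
    "record_a": [2.0, 3.5, 40.0],
    "record_b": [10.0],
    "record_c": [1.0, 1.5, 2.0, 30.0, 31.0, 90.0],
}
observation_end = 100.0  # assumed common follow-up period, in days

# Every record maps to a single rate parameter, whatever its event count.
rates = {
    name: poisson_rate_mle(times, observation_end)
    for name, times in records.items()
}

# The counting process can be evaluated at any time point, not only at event times.
n_a = counting_process(records["record_a"])
```

Because each record is summarised by the parameters of a process rather than by a fixed-length vector of observations, records with different numbers of events become directly comparable, and the fitting cost here is linear in the number of events.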
Bellazzi, R., et al. (2011). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1, 416-430.
Herland, M., et al. (2014). Journal of Big Data, 1(1), Article 2.
Moskovitch, R. and Shahar, Y. (2014). Data Mining and Knowledge Discovery, 1-43.