Accurately understanding and forecasting human behaviour is a core challenge in many disciplines. Predicting whether people will be infected by a disease, buy a product or choose to travel more sustainably is the focus of many researchers, who develop mathematical and statistical models that are as accurate as possible in order to inform health, marketing and transport policies, to name a few. Such models represent the outcomes of behaviours observed in data and are ultimately used to predict how those behaviours will change in response to specific interventions, such as vaccines or the introduction of autonomous vehicles.
Much of the data collected to develop such behavioural models is “cross-sectional”, i.e. it records the state of the world at a single point in time and therefore offers little insight into causal mechanisms. Capturing time-related dynamics requires panel data, in which outcomes are measured repeatedly over time for the same person.
While the advantages of panel data are well known to many scholars and practitioners, their exploitation is limited. For example, transport researchers have used such data to investigate car ownership over the life course using mainly qualitative, descriptive or relatively simple statistical methods. One of the concepts on which the literature mainly relies is state dependence, i.e. the idea that current car ownership can be explained by car ownership in the past. While this is found to be a significant factor in modelling results, such an approach does not help us understand why the observed levels of car ownership have been maintained over the years. Indeed, the factors that are more likely to explain long-term car use (such as personal characteristics or attitudes) are generally used only to explain car ownership at each point in the panel, not the overall trend. More advanced methods providing alternative frameworks have been developed in the last decade, but their complexity and the steep learning curve involved in applying them have resulted in limited uptake.
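As a purely illustrative sketch of what state dependence looks like in panel data, the snippet below compares the share of people owning a car in a given wave, split by whether they owned one in the previous wave. The panel, the person identifiers and the function name `transition_rates` are hypothetical and not drawn from any dataset used in this project.

```python
from collections import defaultdict

# Hypothetical panel: one entry per person, one 0/1 car-ownership flag per wave.
panel = {
    "p1": [1, 1, 1, 1],
    "p2": [0, 0, 1, 1],
    "p3": [0, 0, 0, 0],
    "p4": [1, 1, 0, 0],
}

def transition_rates(panel):
    """Return the share owning a car at wave t, keyed by ownership at wave t-1."""
    # previous state -> [number of transitions, number owning at wave t]
    counts = defaultdict(lambda: [0, 0])
    for waves in panel.values():
        for prev, curr in zip(waves, waves[1:]):
            counts[prev][0] += 1
            counts[prev][1] += curr
    return {prev: owned / n for prev, (n, owned) in counts.items()}

rates = transition_rates(panel)
print(f"P(own at t | owned at t-1)     = {rates[1]:.2f}")
print(f"P(own at t | not owned at t-1) = {rates[0]:.2f}")
```

A large gap between the two shares is consistent with state dependence, but a raw comparison of this kind cannot separate it from persistent personal characteristics or attitudes, which is the limitation described above.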
Despite these limitations, the methods used in transport research are grounded in well-established behavioural frameworks (such as Random Utility Theory), ensuring the ability to derive economic measures highly valuable to policy makers, such as willingness to pay for goods and services. For example, a longitudinal dataset of rail journeys including ticket prices could be used to evaluate whether people would be willing to pay a given amount for a given trip in the future.
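To make the link between a utility-based model and willingness to pay concrete, the following minimal sketch assumes a linear-in-parameters utility with a cost and a travel-time term and a binary logit choice rule. The coefficient values, the two trips and all function names are hypothetical and chosen only for illustration; they are not estimates from any real dataset.

```python
import math

# Hypothetical coefficients (assumed, not estimated):
# utility = BETA_COST * cost + BETA_TIME * time + error
BETA_COST = -0.10  # utility per currency unit of ticket price
BETA_TIME = -0.02  # utility per minute of travel time

def willingness_to_pay(beta_attr, beta_cost):
    """Marginal willingness to pay for a one-unit reduction in an attribute."""
    return beta_attr / beta_cost

def utility(cost, time):
    """Systematic (observed) part of utility for one trip option."""
    return BETA_COST * cost + BETA_TIME * time

def logit_probability(v_option, v_alternative):
    """Binary logit probability of choosing the option over the alternative."""
    return 1.0 / (1.0 + math.exp(v_alternative - v_option))

# Currency units a traveller would pay to save one minute of travel time.
wtp_per_minute = willingness_to_pay(BETA_TIME, BETA_COST)

# Hypothetical choice: a faster, dearer rail trip against a slower, cheaper one.
p_fast = logit_probability(utility(cost=20, time=60), utility(cost=10, time=90))

print(f"WTP per minute saved: {wtp_per_minute:.2f}")
print(f"P(choose faster trip): {p_fast:.3f}")
```

The ratio of coefficients is what makes economic measures directly available in this class of models: because utility is expressed partly in cost terms, any other attribute can be converted into money.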
By contrast, longitudinal data models are well developed in mathematics. Mathematicians do not limit themselves to model structures derived from recognised theories of behaviour; instead, they produce flexible structures aimed at tackling the specific research question at hand. Such models are used, for example, to predict disease outcomes and hospital admissions, and they show good forecasting performance, but economic measures cannot easily be derived from them.
On the other hand, forecasts from such models are believed to be more accurate. Not only is it common practice to consider exogenous factors (i.e. factors external to the elements captured in the model), but a range of further techniques is also used, such as treating longitudinal data as functional data (a curve, rather than a data point) and using these curves to make predictions for the near future.
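The idea of treating one person's repeated measurements as a curve and extrapolating it can be sketched as follows. This is a deliberately simplified stand-in: it uses a straight line fitted by ordinary least squares, whereas functional data analysis would typically use richer bases such as splines. The data and the function names `fit_line` and `forecast` are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares intercept and slope for one person's trajectory."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope

def forecast(times, values, horizon):
    """Evaluate the fitted curve shortly after the last observed time point."""
    intercept, slope = fit_line(times, values)
    return intercept + slope * (times[-1] + horizon)

# Hypothetical data: weekly car trips for one person over five annual waves.
waves = [0, 1, 2, 3, 4]
trips = [10.0, 9.0, 8.5, 7.0, 6.5]

next_wave = forecast(waves, trips, horizon=1)
print(f"Forecast for wave 5: {next_wave:.2f}")
```

The point of the sketch is the change of unit: the object being modelled is the whole trajectory rather than each observation separately, and the forecast is read off the fitted curve.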
Against this backdrop, this project aims to incorporate some of the flexibility and forecasting tools of mathematical models into transport models without departing from the behavioural foundations of the latter, so that economic measures for policy-making can still be obtained. This will lead to a better methodological toolkit for modelling panel data in transport and in other fields where forecasting human behaviour is a priority, such as environment and health.
The PhD candidate will first produce a detailed review of the existing literature on modelling panel and longitudinal data. This review will allow the candidate to identify the most desirable aspects of the different approaches and to work on a framework that incorporates them. Two case studies in different fields of application (e.g. transport and health) will then be used to demonstrate the new methods, using open-access datasets that have been identified by the supervision team.