Large language models (LLMs) are an artificial intelligence approach that has recently shown very promising ability in tasks such as conversing with humans, summarising text and extracting information from it. This project will investigate the opportunities and challenges of using LLMs in epidemiological research and, as an example, explore the use of these models for deriving health traits from Twitter data.
This project will investigate the opportunities and challenges of using large language models (LLMs) to assist epidemiological research and, as an example, explore applications of LLMs for deriving health phenotypes from Twitter data.
Background: LLMs are an artificial intelligence approach; they typically have a very large number of parameters (e.g. millions or billions) and are trained on extremely large datasets. In recent years these models have gained substantial attention, as they have demonstrated very promising performance on a variety of tasks, and they are set to disrupt the way tasks are conducted across many areas of life. They are already being adopted to help scientists work more efficiently, for example by writing code or helping to write research publications. In epidemiological research there are also opportunities to exploit these pre-trained models, for example to assist with summarisation, information extraction or prediction using textual data.
Objective 1: To review the literature and the availability of LLMs, in order to determine the opportunities and challenges of using LLMs to assist epidemiological research. This could include consideration of: (1) the LLMs that are available and the differences between them (e.g. in terms of performance/capability and environmental impact); (2) limitations of using LLMs for epidemiological research, for example hallucinations and model interpretability; (3) ethical considerations, for example the use of LLMs with sensitive epidemiological data and the potential bias in these models; (4) the broad tasks that LLMs have been used for in epidemiology and the performance they have achieved on these tasks.
Objective 2: To determine the extent to which LLMs can be used to summarise or extract information from Twitter data. The student will investigate using LLMs to derive topic information from tweets (e.g. politics or social life), and the output can be compared with other topic analysis approaches. The student will also explore using LLMs to extract sentiment or mood from tweets, and this can be compared with existing sentiment analysis approaches. The student will be able to suggest other health-relevant information that may be inferred from tweets and explore approaches to do this. This objective will use open-source LLMs that can be downloaded and run on the University’s compute services, because they will be applied in Objective 3 to sensitive ALSPAC data that cannot be transferred to external services.
Objective 3: To explore the association of the LLM-derived data with mental health traits in ALSPAC. The student will set up an analytical pipeline that uses an LLM to derive phenotypes from the Twitter data available for 750 ALSPAC participants. This will use the approaches developed as part of Objective 2 to derive phenotypes and explore the relationship of these phenotypes with mental health traits, such as anxiety and depression. The specific mental health traits to be explored can be chosen depending on the student’s interests.
Lead Supervisor Name: Dr Louise Millard
Affiliation: Bristol
College/Faculty: Faculty of Health Sciences
Department/School: Bristol Medical School
Email Address: [Email Address Removed]
Co-Supervisors: Professor Tom Gaunt, Dr Oliver Davies, Professor Frances Rice
How to apply:
Please read the full application guidelines before applying.
This project is part of the GW4 BioMed2 MRC DTP projects.
Please complete an application to the GW4 BioMed2 MRC DTP for an ‘offer of funding’. If successful, you will also need to make an application for an ‘offer to study’ to your chosen institution.
Please complete the online application form linked from our website by 5.00pm on Wednesday, 1st November 2023. If you are shortlisted for interview, you will be notified from Tuesday 19th December 2023. Interviews will be held virtually on 24th and 25th January 2024. Studentships will start on 1st October 2024.
For enquiries regarding the application procedures please contact [Email Address Removed].
For enquiries specifically about this project, please contact Dr Louise Millard (details above).