This project will develop methods for the automated extraction of data and metadata from social science and biomedical questionnaires from the UK and Finnish Data Archives.
Studentship group name
Digital Resilience
Department/School
School of Computer Science and Electronic Engineering
Research group(s)
Distributed and networked systems
Project Description
Social science and biomedical data from large-scale longitudinal surveys has been available through the UK Data Archive (UKDA) for over 50 years. These longitudinal questionnaires are a rich source of data for analysing changes in society and population health, and their question banks can be reused by researchers. Automatically extracting structured data and metadata from these questionnaires calls for the design of novel Natural Language Processing (NLP) algorithms that can handle the complex nature of such longitudinal text documents. Recent advances in deep neural networks have significantly improved performance on many language processing tasks, such as automatic text annotation and classification. However, how well these models capture critical features depends on a large amount of training data being available for each annotation concept. The longitudinal nature of the questions adds further complexity, both in the language used for different age groups and in the shifts in meaning that can occur over several decades.
This project will research deep learning techniques, including graph neural networks, for automatically extracting data and metadata elements from the non-uniformly structured text of the questionnaires and annotating them with a defined vocabulary. The developed methods will extend the state of the art in NLP by taking into account the semantic structure of the text, so that they can be applied to longitudinal text.
This project is an exciting collaboration between academics from the Department of Computer Science and social scientists from the Social Research Institute at UCL. As such, the student will benefit from working closely with domain experts in the Cohort and Longitudinal Studies Enhancement Resources (CLOSER) Discovery team at UCL. The project also has scope for extension to multilingual text, in collaboration with Tampere University (Tampereen korkeakoulusäätiö sr)/Finnish Social Science Data Archive.
How to Apply
Applications should be submitted via the Computer Science PhD programme page. In place of a research proposal, you should upload a document stating the titles of the projects (up to two) that you wish to apply for and the name(s) of the relevant supervisor(s). You must also upload your full CV and transcripts of any previous academic qualifications. You should enter 'Faculty Funded Competition' under funding type.
Funding
The studentship will provide a stipend at UKRI rates (currently £17,668 for 2022/23) and tuition fees for 3.5 years. An additional bursary of £1,700 per annum for the duration of the studentship will be offered to exceptional candidates.