Despite significant performance gains across a range of tasks in both vision and Natural Language Processing (NLP), deep neural models are far from being able to reason effectively. One reason for this is that the majority of datasets used for training and evaluating models are limited to a single modality (e.g., text, images, or sound). As such, deep neural models are often able to exploit spurious statistical cues when making predictions, rather than forming effective abstractions of the input data. Additionally, these large performance gains have come, in large part, from the increased use of deep neural models consisting of hundreds of millions (and often billions) of parameters. Unfortunately, this has led to methods which are fundamentally opaque and nearly impossible to interpret [2, 3].
On the other hand, certain problems fundamentally require the incorporation of data from multiple modalities. For instance, online hate speech often makes use of multiple modalities to minimise the chance of being flagged by platforms.
This project aims to address these shortcomings through the development of multimodal models which fuse information from images and text. Additionally, the project emphasises the need for explainability in such models, a relatively recent avenue of research. Importantly, the models developed will be analysed to ensure diversity and the absence of bias. Finally, in addition to developing explainable multimodal neural models, this project will investigate the impact of such models on social media data, for example, hate speech corpora.
Information pertaining to the different modalities can be fused in different ways and at different stages. For example, fusion can be either “early” (combining modality representations before prediction) or “late” (combining the predictions of separate per-modality models). Similarly, input to the models can either be completely abstract (e.g., embeddings) or consist of features associated with a particular modality (such as a textual description of an image). This project aims to explore all of these methods, with a focus on identifying and developing methods which are, to the extent possible, transparent, diverse, and unbiased.
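The early/late fusion distinction can be sketched with a minimal, hypothetical NumPy example. The encoders, embedding dimensions, and the random "classifier" below are stand-ins for illustration only, not part of the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings for a single example
# (dimensions chosen arbitrarily; in practice these would come
# from trained vision and text encoders).
image_emb = rng.standard_normal(512)
text_emb = rng.standard_normal(768)

def classifier(x, n_classes=2, seed=1):
    """Stand-in for a trained classifier head: a fixed random
    linear map followed by a softmax over class scores."""
    w = np.random.default_rng(seed).standard_normal((x.shape[0], n_classes))
    logits = x @ w
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

# Early fusion: concatenate the modality embeddings into one joint
# representation, then make a single prediction from it.
early_probs = classifier(np.concatenate([image_emb, text_emb]))

# Late fusion: predict from each modality separately, then combine
# the per-modality predictions (here, a simple average).
late_probs = (classifier(image_emb) + classifier(text_emb)) / 2

print(early_probs, late_probs)
```

Early fusion lets the model capture interactions between modalities, while late fusion keeps each modality's contribution separable, which can make the combined prediction easier to inspect.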
This project is associated with the UKRI Centre for Doctoral Training (CDT) in Accountable, Responsible and Transparent AI (ART-AI). We value people from different life experiences with a passion for research. The CDT's mission is to graduate diverse specialists with the perspectives needed to go out into the world and make a difference.
Informal enquiries are strongly encouraged and should be directed to Dr Harish Tayyar Madabushi (email@example.com).
Candidates should have a good first degree or a Master’s degree in computer science, mathematics, a related discipline, or equivalent industrial experience. Good programming skills are essential. A strong mathematical background and previous machine learning experience are highly desirable. Familiarity with Bash, Linux, and using GPUs for high-performance computing is a plus.
Formal applications should be accompanied by a research proposal and made via the University of Bath’s online application form. Enquiries about the application process should be sent to firstname.lastname@example.org.
Start date: 2 October 2023.