Explainability in Multimodal Deep Learning: Transparent Fusion of Text and Images

   Centre for Accountable, Responsible and Transparent AI

Dr Harish Tayyar Madabushi, Prof Neill Campbell
No more applications being accepted
Self-Funded PhD Students Only

About the Project

Despite the significant performance gains across a range of tasks in both vision and Natural Language Processing (NLP), deep neural models are far from being able to reason effectively [1]. One of the reasons for this is that the majority of the datasets used for training and evaluating models tend to be limited to a single modality (e.g., text, images or sound). As such, deep neural models are often able to exploit spurious statistical cues in making predictions, instead of being able to generate effective abstractions of input data. Additionally, these large performance gains have been, in large part, due to the increased use of deep neural models which consist of hundreds of millions (and often billions) of parameters. Unfortunately, this has led to methods which are fundamentally opaque and near impossible to interpret [2, 3].

On the other hand, certain problems fundamentally require the incorporation of data from multiple modalities. For instance, online hate speech often makes use of multiple modalities to minimise the chance of being flagged by platforms [4].

This project aims to address these shortcomings by developing multimodal models that fuse information from images and text. It also emphasises the need for explainability in such models, a relatively recent avenue of research [2]. Importantly, the models developed will be analysed to ensure diversity and the absence of bias. Finally, in addition to developing explainable multimodal neural models, this project will investigate the impact of such models on social media data, for example hate speech corpora [4].

Information from the different modalities can be fused in different ways and at different stages. For example, fusion can be either “early” or “late”. Similarly, input to the models can either be completely abstract (e.g. embeddings) or consist of features associated with a particular modality (for example, a textual description of an image). This project aims to explore all of these methods, with a focus on identifying and developing methods which are, to the extent possible, transparent, diverse and unbiased.
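
To make the “early” versus “late” distinction concrete, the following is a minimal sketch rather than the project's method. It assumes the common reading in which early fusion concatenates per-modality features before a single joint classifier, while late fusion scores each modality separately and then combines the scores. All names (`early_fusion`, `late_fusion`, `text_weight`), the logistic scoring function and the example weights are hypothetical and chosen only for illustration.

```python
import math


def linear_score(weights, bias, features):
    """Weighted sum of the features plus a bias term."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion(text_features, image_features, weights, bias):
    """Early fusion (assumed form): concatenate the modality features
    first, then apply one joint classifier to the fused vector."""
    fused = list(text_features) + list(image_features)
    return sigmoid(linear_score(weights, bias, fused))


def late_fusion(text_features, image_features, text_model, image_model,
                text_weight=0.5):
    """Late fusion (assumed form): score each modality with its own
    classifier, then mix the two scores with a fixed weight.

    Returns the combined score together with the per-modality scores,
    so that each modality's contribution can be inspected directly."""
    text_w, text_b = text_model
    image_w, image_b = image_model
    p_text = sigmoid(linear_score(text_w, text_b, text_features))
    p_image = sigmoid(linear_score(image_w, image_b, image_features))
    combined = text_weight * p_text + (1.0 - text_weight) * p_image
    return combined, p_text, p_image


if __name__ == "__main__":
    # Hypothetical toy inputs: a 2-d text embedding and a 1-d image embedding.
    text_features = [0.2, -0.4]
    image_features = [0.9]
    print(early_fusion(text_features, image_features, [0.5, -1.0, 2.0], 0.0))
    print(late_fusion(text_features, image_features,
                      ([0.5, -1.0], 0.0), ([2.0], 0.0)))
```

In this sketch the late-fusion function exposes the per-modality scores alongside the combined one, which is one reason late fusion is sometimes regarded as easier to inspect; whether that carries over to deep models is among the questions a project like this would examine.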

This project is associated with the UKRI Centre for Doctoral Training (CDT) in Accountable, Responsible and Transparent AI (ART-AI). We value people with different life experiences and a passion for research. The CDT's mission is to graduate diverse specialists with perspectives who can go out into the world and make a difference.

Informal enquiries are strongly encouraged and should be directed to Dr Harish Tayyar Madabushi ([Email Address Removed]). 

Candidates should have a good first degree or a Master’s degree in computer science, maths, a related discipline, or equivalent industrial experience. Good programming skills are essential. A strong mathematical background and previous machine learning experience are highly desirable. Familiarity with Bash, Linux and using GPUs for high-performance computing is a plus.

Formal applications should be accompanied by a research proposal and made via the University of Bath’s online application form. Enquiries about the application process should be sent to [Email Address Removed].

Start date: 2 October 2023.


[1] Tayyar Madabushi, Ramisch, Idiart, Villavicencio. COLING Tutorial: Psychological, Cognitive and Linguistic BERTology (Part 1), COLING https://sites.google.com/view/coling2022tutorial/
[2] Joshi, G., Walambe, R. and Kotecha, K., 2021. A review on explainability in multimodal deep neural nets. IEEE Access, 9, pp.59800-59821.
[3] Bayoudh, K., Knani, R., Hamdaoui, F. and Mtibaa, A., 2022. A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. The Visual Computer, 38(8), pp.2939-2970.
[4] Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Fitzpatrick, C.A., Bull, P., Lipstein, G., Nelli, T., Zhu, R. and Muennighoff, N., 2021, August. The hateful memes challenge: competition report. In NeurIPS 2020 Competition and Demonstration Track (pp. 344-360). PMLR.
