Emotion recognition is the task of identifying human emotion and is an increasingly important problem in the field of human-computer interaction (HCI), with applications in healthcare, well-being, gaming, safety, and security. Emotion recognition can be achieved using a variety of input modalities, including video, images, voice, and text. Combining modalities such as vocal signal, vision, and verbal content has been shown to have great potential for representing a wide variety of emotions. Traditional approaches have used feature-level fusion in combination with machine learning algorithms with some success; in recent years, however, deep learning has shown remarkable capabilities in learning detailed representations of high-dimensional image and video data, as well as audio spectral features, to achieve better classification performance. Unlike techniques that rely on a single input modality, multimodal techniques take into consideration a range of input data types, such as visual and audio, without the need for more intrusive physiological signal recordings (electrocardiogram (ECG), galvanic skin response (GSR), etc.). Recent studies utilising multimodal deep learning approaches have shown promising results; however, due to the complex nature of multimodal emotion recognition, further challenges remain.
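The feature-level (early) fusion mentioned above can be illustrated with a minimal sketch: per-modality feature vectors are simply concatenated into one joint vector before being passed to a classifier. The feature types and dimensions below (MFCC statistics, landmark distances) are illustrative assumptions, not drawn from a specific system.

```python
import numpy as np

def feature_level_fusion(audio_feats, visual_feats):
    """Early fusion: concatenate per-modality feature vectors
    into a single joint vector prior to classification."""
    return np.concatenate([audio_feats, visual_feats])

# Hypothetical hand-crafted features (dimensions chosen for illustration):
audio = np.random.rand(13)   # e.g. 13 MFCC means from the speech signal
visual = np.random.rand(20)  # e.g. 20 facial-landmark distances from video
fused = feature_level_fusion(audio, visual)
print(fused.shape)  # (33,)
```

The fused vector would then be fed to a conventional classifier (e.g. an SVM or random forest), which is the pattern the traditional approaches above follow.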
This project aims to explore deep learning architectures and to propose, design, and develop a multimodal deep learning pipeline for emotion recognition. This project will address the following research questions:
What are the state-of-the-art deep learning models for image and audio emotion recognition?
How can we utilise audio representation learning with deep learning models?
What deep learning models can be used to extract features from high-dimensional data, such as video and images, that generalise across emotional states?
What model-level fusion strategies can be used with image and audio feature representations to achieve accurate emotion recognition?
How can we incorporate representative data sources into the development and evaluation of deep learning emotion recognition models?
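The model-level fusion asked about above can be sketched as follows: each modality is encoded by its own branch (standing in for a pretrained CNN or spectrogram encoder), the resulting embeddings are concatenated, and a joint layer classifies from the fused representation. All dimensions, weights, and the 7-class output are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """One modality branch: linear projection + ReLU, a stand-in
    for a learned encoder (e.g. a CNN over frames or spectrograms)."""
    return np.maximum(w @ x, 0.0)

# Hypothetical dimensions: 128-d audio features and 512-d image features,
# each projected to a 64-d embedding before fusion.
w_audio = rng.standard_normal((64, 128)) * 0.1
w_image = rng.standard_normal((64, 512)) * 0.1
w_joint = rng.standard_normal((7, 128)) * 0.1   # 7 emotion classes (assumed)

audio_in = rng.standard_normal(128)
image_in = rng.standard_normal(512)

# Model-level fusion: concatenate learned embeddings (not raw features),
# then classify from the joint 128-d representation.
joint = np.concatenate([branch(audio_in, w_audio), branch(image_in, w_image)])
logits = w_joint @ joint
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)  # (7,)
```

The contrast with the early-fusion baseline is that fusion happens on learned representations, so each branch can be trained (or fine-tuned) per modality before the joint layer combines them.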
In answering these research questions, the project aims to advance the state of the art in deep learning models for multimodal emotion recognition. Outputs from this research could have impact in the application of emotion recognition across different fields, including healthcare, mental health, and well-being.