Speech enhancement using audio and visual speech information within a deep learning framework (MILNERBU20SCIO)

  • Full or part time
  • Application Deadline
    Thursday, January 30, 2020
  • Competition Funded PhD Project (Students Worldwide)

Project Description

This project is concerned with speech enhancement and speaker separation. In everyday environments a speech signal may be contaminated by a variety of acoustic noises, from sound sources such as traffic and machinery to other speakers, ranging from a single competing speaker to the babble produced by a group of speakers. This noise reduces the quality of the received speech and makes it less intelligible.

The aim of this project is to generate an improved speech signal that exhibits both higher quality and better intelligibility. Many approaches have been developed for speech enhancement and speaker separation, but these are generally restricted to using only the audio speech signal. In this work, the challenge will be to investigate how visual speech information can be exploited and combined within the speech enhancement/speaker separation framework. Deep learning-based approaches to extracting useful visual and audio features will be examined.

Most methods of speech enhancement and speaker separation operate by filtering the noisy speech using filters created from estimates of the local signal-to-noise ratios. Methods based on this approach will be investigated. An alternative, less well explored approach is to reconstruct a clean speech signal using a model of speech production, driven by a set of parameters estimated from the noisy audio and visual speech signals. The task then becomes how to exploit deep learning architectures to estimate the model parameters from the noisy signals.
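
As an illustration of the filtering approach, the following is a minimal sketch of a Wiener-style gain computed from a local signal-to-noise ratio estimate in each frequency bin. It is not the project's method: the function names, the gain floor, the Hann window and the assumption that a noise power estimate is already available are all choices made here for illustration. In the project itself the SNR estimates would presumably come from learned audio and visual features.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Per-bin Wiener-style gain from a local SNR estimate (illustrative)."""
    # Estimate the local SNR by power subtraction, clipped at zero.
    snr = np.maximum(noisy_power - noise_power, 0.0) / np.maximum(noise_power, 1e-12)
    # Wiener gain: close to 1 where speech dominates, close to 0 where noise dominates.
    gain = snr / (1.0 + snr)
    # A gain floor (assumed value) limits over-suppression artefacts.
    return np.maximum(gain, gain_floor)

def enhance_frame(noisy_frame, noise_power, gain_floor=0.05):
    """Filter one time-domain frame; noise_power has one value per rfft bin."""
    window = np.hanning(len(noisy_frame))
    spectrum = np.fft.rfft(noisy_frame * window)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_power, gain_floor)
    # Apply the real-valued gain to the complex spectrum and return to the time domain.
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))
```

A complete system would process overlapping frames and recombine them by overlap-add; that step is omitted here to keep the sketch short.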

Throughout the project, objective evaluations of quality and intelligibility will be carried out. Towards the end of the project, a thorough subjective evaluation will be performed using human listeners to reveal the importance of using visual speech information, determine the best methods of feature extraction, and establish whether filtering or reconstruction is better and under what noise conditions.
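
To give a sense of what an objective evaluation can look like, the sketch below computes segmental SNR, one simple objective measure that compares an enhanced signal against a clean reference frame by frame. The project description does not say which measures will be used; the function name, frame length and clipping limits here are assumptions chosen for illustration.

```python
import numpy as np

def segmental_snr_db(clean, estimate, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and an estimate."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    n_frames = min(len(clean), len(estimate)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = 1e-12  # avoids division by zero and log of zero
    snrs = []
    for i in range(n_frames):
        ref = clean[i * frame_len:(i + 1) * frame_len]
        err = ref - estimate[i * frame_len:(i + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(err ** 2) + eps))
        # Clip each frame so silent or perfectly matched frames do not dominate the mean.
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```

A higher value indicates that the estimate is closer to the clean reference. A measure of this kind requires the clean signal to be available, so it applies to experiments where noise is added artificially.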

For more information on the supervisor for this project, please see https://people.uea.ac.uk/b_milner

Type of Programme: PhD
Start Date: October 2020
Mode of Study: Full-time

Entry requirements:
Acceptable first degree in computing science, electronic engineering or mathematics.

Funding Notes

This PhD project is in a competition for a Faculty of Science funded studentship. Funding is available to UK/EU applicants and comprises home/EU tuition fees and an annual stipend of £15,009 for 3 years. Overseas applicants may apply but are required to fund the difference between home/EU and overseas tuition fees (the 2019-20 fees are detailed on the University’s fees pages). Please note that tuition fees are subject to an annual increase.

FindAPhD. Copyright 2005-2020
All rights reserved.