Prof Thomas Hain
Applications accepted all year round
Self-Funded PhD Students Only
About the Project
Over the past two decades, speech technology has relied heavily on concatenative structures, in particular hidden Markov models (HMMs), to represent the acoustics of speech. Although these models have significant and well-established shortcomings, they remain dominant because they are simple to use in practice. Technology has moved on dramatically, and to make these models work in the complex conditions that speech or speaker recognition (or any other speech classification task) requires, many algorithms have been developed to alleviate the known deficiencies. As a result, it now takes many years to develop practical recognition systems, and even then many applications still perform very poorly. One of the main shortcomings is the temporal modelling in HMMs.
In this project we will work on a new model that, instead of treating speech as a concatenation of elements, represents it as dynamic sequences that overlap in time. One can show that such modelling can be substantially more powerful than existing concepts, allowing new models to address different aspects of speech such as intra-speaker variation. The project will be conducted in the Speech and Hearing Group at the Department of Computer Science at Sheffield University. The group is at the forefront of international research into speech recognition, and the project can naturally draw on the extensive infrastructure and knowledge available in the group.