
  Towards linguistically-informed automatic speaker recognition


   Department of Language and Linguistic Science

This project is no longer listed on FindAPhD.com and may not be available.

  Dr V Hughes  No more applications being accepted  Competition Funded PhD Project (European/UK Students Only)

About the Project

Understanding and modelling the human voice in all its complexity is a key issue in both the humanities (linguistics, phonetics) and the sciences (engineering, computer science). Different disciplines approach this problem in fundamentally different ways. Linguistics adopts a componential approach: speech is broken down into its constituent parts (vowels, consonants, pitch, etc.) via analytical listening and acoustic measurements. Speech technology works holistically, extracting abstract features from longer speech samples to build a mathematical model of the voice. Despite the clear overlap in interests, very little work has been situated at the intersection between linguistics and speech technology.

The proposed project focuses on speaker recognition, i.e. identifying individuals from their voice. Banks and other institutions (e.g. HMRC) use automatic speaker recognition (ASR) systems to verify the identity of customers attempting to access accounts. Such systems are also used by forensic scientists (e.g. the Spanish police) to provide evidence in cases involving recordings of unknown criminal voices. It is of central importance that such systems make very few errors, both to prevent imposters from gaining access to bank accounts and to protect innocent people from prosecution.

ASR is widely perceived to be a black box, whereby the inner workings of the system are opaque to users, and in some cases even to developers. State-of-the-art ASR systems are now increasingly based on artificial intelligence. This means that the systems themselves learn the best way of structuring and analysing the data. These approaches have shown dramatic improvements in performance, achieving almost perfect recognition under certain conditions. However, as the underlying algorithms get more complex, understanding their inner workings becomes increasingly difficult. This is now arguably the biggest problem for the field of ASR.

This project contributes to the small but emerging body of research investigating how linguistic information can help us better understand ASR systems. The project will be hosted by the University of York and Aculab, and will address three key questions: (1) To what extent do ASR systems capture tangible linguistic properties of a voice? (2) By understanding what information is captured by ASR systems, can we predict which speakers will be problematic for the system? (3) Can linguistic information be used to improve the performance of ASR?

The project will use collections of voice recordings available at York and Aculab. Voices will be processed using Aculab’s VoiSentry ASR system and the output will be compared with linguistic data extracted from the same voices. The linguistic approach will involve auditory and acoustic analysis of features commonly used in forensic voice casework, e.g. vowels, consonants, and pitch. The voice samples will also be manipulated systematically to alter certain linguistic features (such as pitch and timbre) in order to evaluate the effects on the output of the system. As there is no methodological precedent for how to approach this issue, there will be considerable scope for the student to mould the project.

The collaboration between York and Aculab is unique. York is home to a world-leading research centre for forensic speech science. The project builds on our AHRC-funded Voice and Identity project, one of the few large-scale investigations into how speaker-specific information is encoded in the voice, which compares the output of ASR systems with linguistic approaches to assess their efficacy and complementarity. Aculab is a leading commercial developer of ASR systems used by institutions such as banks. The student will have access to the underlying code of the VoiSentry system, allowing research into the inner workings of ASR in ways that were not previously possible. Aculab will host the student, providing employment-related skills through experience in a commercial environment.


Funding Notes

The selection of Collaborative Doctoral Award applicants is a two-stage process:

STAGE 1
Please send the following information directly to [Email Address Removed]: a CV detailing your academic record (no more than 3 pages); an academic transcript from your university; names and contact details for two referees who will be able to provide a reference by the interview on 3rd December at the latest, if required; an 800-word statement of purpose.

Deadline for STAGE 1 applications: 14 December 2018

Date of interviews: 8-9 January 2019

STAGE 2
Apply for funding to WRoCAH by the deadline of 5pm on Wednesday 23 January 2019.
