Synthetic spoofed speech, commonly known as deepfake audio, has become a threat to online communication and automatic speaker verification systems: the deep learning methods used for synthetic voice modelling can reproduce anyone’s voice as long as a sample of the authentic voice is available to work from. Given the plethora of media on YouTube, Facebook, BBC News online and similar platforms, it seems impossible to protect people from having their voice “stolen” through being downloaded and used to train a generative AI deepfake system. Synthetically generated speech may also be used in crimes such as telephone banking fraud.
Existing audio detection methods have been developed almost exclusively for English, reflecting a lack of data resources in other languages. Cross-lingual synthetic speech detection, a critical but rarely explored area, requires further study, as do English regional accents. The proposed work will address this by examining the frequency characteristics produced by a human voice versus a synthetic one. A further shortcoming of previous work is its lack of consideration of noisy environments: most audio recordings in criminal investigations come from environments with background noise such as conversation or traffic. The proposed work will embrace noise.
The project will investigate the use of a range of variables to create a pseudo audio-fingerprint for a person, combined with a developed framework for forensic practitioners to follow when reviewing submitted audio evidence. The end-point will be an indication of the level of authenticity of a potentially synthetic audio recording.
Whilst there are many AI tools available online to detect synthetic media, each has a success rate far below 100%. There is also limited literature on how to detect synthetic audio manually. The advantage of developing the proposed detection approach and framework is that they can be presented in court as an approved, quality-assured methodology that can be explained more easily to a judge, jury and lawyers than simply stating that “an AI tool has been used”.
In particular, this cross-faculty project (Faculty of Engineering and Technology and Faculty of Science) aims to support the Police in developing and enhancing their capabilities in audio forensics.
Additional subject-specific training and opportunities include:
- A comprehensive transferable skills development programme from the university’s Researcher Development Programme.
- Coaching and/or mentoring support.
- Specialised software training.
- Liaison with UK police forces and international audio/video forensic software companies.
- Additional subject-specific training relevant to the research project.
Applicants require a first degree and a master’s degree in a relevant subject (e.g. audio forensics, audio engineering); specific knowledge or experience of working with synthetic media is desirable.
FORRI supports the enhancement of diversity and inclusion in forensic research, and applicants from diverse backgrounds are therefore encouraged to apply. This TDP is only open to UK students.
For an informal discussion about this opportunity, please email Dr Sebastian Chandler-Crnigoj for more information [Email Address Removed]. Applicants should email a CV, a covering letter detailing their suitability for the project, and the contact details of two referees to: [Email Address Removed]
Applicants must be available for an online interview on 5th January 2024.