
Learning to Play: Assessing Music Playing Skill from AudioVisual Data

   UKRI Centre for Doctoral Training in Socially Intelligent Artificial Agents


  Mr Jared de Bruin
  No more applications being accepted
  Funded PhD Project (Students Worldwide)

About the Project

For instructions on how to apply, please see: PhD Studentships: UKRI Centre for Doctoral Training in Socially Intelligent Artificial Agents.


Supervisors:

  • Tanaya Guha: School of Computing Science
  • Subarna Tripathi: Intel Labs San Diego

Motivation & novelty:

Humans can often assess how well someone performs a given task simply by watching (and hearing) them in action. Automating this ‘skill assessment’ task could enable assistive technology that lets people learn and practice independently and work towards mastery. Although several learning apps and tools are available these days, few offer automated feedback on the learner’s skill level.

Aims and methodology:

This project will develop a multimodal (audiovisual) AI tool to assess human skills from video streams accompanied by audio. In particular, our aim is to assess the skill of a learner playing a musical instrument using both audio and video as inputs. This is a fine-grained video understanding problem, where the input videos show similar actions while the audio may differ. The project will develop new deep learning models that combine information from the two modalities, using attention to focus on the relevant spatial regions and time segments in each. A relevant database will need to be curated from YouTube and labeled in a semi-automated fashion.
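
As a rough illustration of the idea of attending over time in each modality and then combining the two, here is a minimal sketch in plain Python. It is not the project's model: the function names (`attention_pool`, `skill_score`), the `params` dictionary, and the toy feature values are all hypothetical, the weights are fixed by hand rather than learned, and only temporal (not spatial) attention is shown. It assumes each modality has already been turned into a sequence of per-time-step feature vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(frames, query):
    """Temporal attention pooling.

    Each time step gets a weight softmax(frame . query); the pooled
    feature is the weighted sum of the frames.
    """
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def skill_score(video_frames, audio_frames, params):
    """Late fusion: pool each modality, concatenate, apply a linear head."""
    v_pooled, v_weights = attention_pool(video_frames, params["video_query"])
    a_pooled, a_weights = attention_pool(audio_frames, params["audio_query"])
    fused = v_pooled + a_pooled  # list concatenation of the two pooled vectors
    score = dot(fused, params["head"]) + params["bias"]
    return score, v_weights, a_weights


# Toy inputs (hypothetical): 3 video time steps with 2-D features,
# 2 audio time steps with 1-D features.
video_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
audio_frames = [[0.2], [0.9]]

# Hand-picked parameters; in a real model these would be learned.
params = {
    "video_query": [2.0, 0.0],
    "audio_query": [1.0],
    "head": [1.0, 1.0, 1.0],
    "bias": 0.0,
}

score, v_weights, a_weights = skill_score(video_frames, audio_frames, params)
print("skill score:", score)
print("video attention weights:", v_weights)
print("audio attention weights:", a_weights)
```

The attention weights show which time steps contributed most to the score, which is one reason attention is attractive for feedback to learners. A learned model would replace the fixed queries and linear head with trained parameters, and could let each modality's attention depend on the other.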

Alignment with industrial interests:

Multimodal sensing and sense-making technologies are at the heart of Intel’s effort to build smart and personalized learning spaces. For example, Intel’s deployment of such technologies in ‘Kid Space’ showed encouraging results in terms of students’ engagement and learning effectiveness.


This is envisioned as a full-time PhD project involving the following activities: Literature survey, database curation, baseline model development, new model development, testing and evaluation, dissemination of results (e.g., publication, presentation) and thesis writing.

Desired skills:

Python, Machine Learning, prior experience of working with video/audio.

