About the Project
With the development of deep learning approaches, and convolutional neural networks (CNNs) in particular, the task of recognising objects in an image has become associated with the ability to train a network using a large number of labelled images for each class of interest [He16]. The creation of large labelled datasets such as ImageNet [Ru15] has permitted the development of increasingly high-performing architectures. To extend the set of classes whose objects can be recognised without requiring access to a large number of labelled images of the new classes, few-shot learning (FSL) and even one-shot learning were proposed: in such frameworks, the initial training set is complemented with a support set containing only a few images to define each additional object class [Su18]. Eventually, reliance on training images for new classes was removed altogether; instead, textual attributes proved sufficient. By learning a mapping between object images and textual attributes using a large training set, an object from a new class defined only by its textual attributes can be recognised. While zero-shot learning (ZSL) initially focused on identifying images belonging to the unseen classes of interest, generalised ZSL offers predictions for both the seen and unseen classes [Ji21]. Although textual attributes can be retrieved automatically from the known labels using resources such as word2vec [Ch17] and Wikipedia, ZSL is essentially a retrieval problem, as the labels of all classes of interest must be known. Unfortunately, such a constraint can only be met in a limited number of scenarios, such as those associated with existing specialised datasets dedicated to specific applications, e.g., ‘bird watching’ [Wa11]. Here, it is proposed to replace this constraint with a less limiting one, i.e., access to an internet connection (or at least an electronic copy of an encyclopaedia).
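The attribute-based recognition described above can be illustrated with a minimal sketch: a learned linear map projects a visual feature vector into the textual attribute space, and an unseen class is predicted by comparing the projection against each class's attribute vector. All data below (the map `W`, the attribute vectors, the image feature) are random placeholders standing in for trained components and curated attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

n_vis, n_attr = 512, 64                           # visual / attribute dimensions
W = rng.standard_normal((n_attr, n_vis)) * 0.01   # placeholder for a trained map

# Attribute vectors for two unseen classes (normally curated by hand or
# mined from text resources such as word2vec or Wikipedia).
class_attrs = {
    "platypus": rng.standard_normal(n_attr),
    "beaver": rng.standard_normal(n_attr),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_label(visual_feature):
    projected = W @ visual_feature                # image -> attribute space
    scores = {c: cosine(projected, a) for c, a in class_attrs.items()}
    return max(scores, key=scores.get)            # best-matching unseen class

image_feature = rng.standard_normal(n_vis)        # e.g. a CNN embedding
print(zero_shot_label(image_feature))
```

Note that no image of either class is ever seen at training time; only the attribute vectors define the classes, which is what makes the label set a hard prerequisite in standard ZSL.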
In such a scenario, in principle, a picture of any object, not only unseen but even unidentified, can be processed and labelled. For example, if a system were able to characterise an unidentified object in an image as ‘a beaver with a duck beak’, a simple query using one’s favourite search engine would be sufficient to label the object as a ‘platypus’.
The aim of the project is to develop an efficient deep learning-based pipeline allowing the annotation of any object present in an image without prior knowledge by the system of its existence. By taking advantage of extracted visual features to create an Unidentified Featured Object (UFO), and of an existing mapping between visual and textual features, natural language processing techniques can then be applied to generate a list of putative annotations. Finally, these can be analysed within an FSL-based framework to identify the most likely label.
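The proposed pipeline can be summarised as a composition of four stages. The sketch below is purely schematic: every function is a stub with a toy return value standing in for a learned model, and all names are hypothetical.

```python
def extract_visual_features(image):
    """Stage 1: CNN backbone producing the UFO feature vector (stub)."""
    return [0.12, 0.87, 0.45]

def map_visual_to_textual(features):
    """Stage 2: learned visual-to-textual mapping (stub)."""
    return ["duck beak", "fur", "flat tail"]

def generate_candidate_labels(attributes):
    """Stage 3: NLP step turning attributes into putative annotations (stub)."""
    return ["platypus", "beaver", "duck"]

def select_label_fsl(candidates, features):
    """Stage 4: FSL-style comparison picking the most likely label (stub)."""
    return candidates[0]

def annotate(image):
    feats = extract_visual_features(image)
    attrs = map_visual_to_textual(feats)
    candidates = generate_candidate_labels(attrs)
    return select_label_fsl(candidates, feats)

print(annotate("photo.jpg"))  # -> "platypus" with these toy stubs
```

The key design point is that no stage requires the target label to exist in the training vocabulary: stage 3 may query external text resources, which is what relaxes the standard ZSL constraint to mere internet (or encyclopaedia) access.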
Successful completion of the project requires addressing the following scientific objectives:
1. Design of a CNN-based architecture able to convert the photograph of any object into a set of relevant features to create a UFO
2. Design of a suitable visual-to-textual features mapping function
3. Application of natural language processing and FSL techniques to label the UFO
4. Significant reduction of the size of the deep learning solution, while maintaining its performance, by using differential equations as neural network layers; the use of efficient solvers should result in an overall decrease in computational complexity
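Objective 4 follows the neural-ODE idea: a stack of residual layers is replaced by a single dynamics function integrated over a time interval, so the parameter count no longer grows with depth. The sketch below uses a basic forward Euler solver and random placeholder weights; in practice an adaptive solver would trade the number of steps against accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
Wf = rng.standard_normal((dim, dim)) * 0.1   # parameters of the dynamics f

def f(h, t):
    """dh/dt = f(h, t): one weight matrix defines the whole 'depth'."""
    return np.tanh(Wf @ h)

def ode_layer(h0, t0=0.0, t1=1.0, steps=10):
    """Forward Euler integration of the hidden state from t0 to t1.

    Each Euler step plays the role of one residual layer, but all steps
    share the same parameters Wf, which is where the size reduction
    comes from.
    """
    h, dt = h0.copy(), (t1 - t0) / steps
    for k in range(steps):
        h = h + dt * f(h, t0 + k * dt)
    return h

x = rng.standard_normal(dim)     # input feature vector
out = ode_layer(x)
print(out.shape)                 # (8,)
```

Doubling `steps` refines the integration without adding a single parameter, whereas doubling the depth of a conventional residual network doubles its layer weights.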
Applicants should have, at least, an Honours Degree at 2.1 or above (or equivalent) in Computer Science or related disciplines. In addition, they should have a good mathematical background, excellent programming skills in Python, and an interest in machine learning.
References
[He16] K. He et al., Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
[Ji21] Y. Jin et al., Zero-Shot Video Event Detection with High-Order Semantic Concept Discovery and Matching, IEEE Transactions on Multimedia, 2021
[Ru15] O. Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), 2015
[Su18] F. Sung et al., Learning to Compare: Relation Network for Few-Shot Learning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[Wa11] C. Wah et al., The Caltech-UCSD Birds-200-2011 Dataset, California Institute of Technology, Tech. Rep. CNS-TR-2011-001, 2011