Deep learning relies on training with huge amounts of data. However, in many applications such data are not available, for various reasons, not least privacy concerns. In such cases we would like one artificial system to generate data in order to train another artificial system. At the same time, the generated data should be realistic. One way to achieve this is through Knowledge Distillation (KD) [1,2,3,4]. KD is used in dual Teacher-Student [5,6,7] deep learning systems, in which a secondary, smaller network, called the Student, learns from data generated by a primary, usually larger, network called the Teacher. Essential in Teacher-Student networks is defining how the Teacher generates data so as to comprehensively model the information it has learnt. Being able to learn multiple tasks simultaneously is very important in many applications [7,8]. Visual attention through transformers has been shown to be a promising criterion for KD. Collaborative teaching by sets of several models may be required for complex tasks. Criteria based on KD for defining lifelong-learning architectures have also been analysed.
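As a concrete illustration, the classic distillation loss of Hinton et al. [1] combines a soft term, matching the Student's temperature-softened predictions to the Teacher's, with a hard cross-entropy term on the ground-truth label. The sketch below is a minimal dependency-free version; the temperature T and weight alpha are illustrative hyperparameters, not values prescribed by this project.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T softens the distribution,
    # exposing the Teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    # Soft targets from the Teacher, softened Student predictions.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL divergence between the two softened distributions,
    # scaled by T^2 as in the standard formulation.
    soft_loss = (T ** 2) * sum(
        pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student)
    )
    # Ordinary cross-entropy with the ground-truth (hard) label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

When the Student's logits match the Teacher's, the soft term vanishes and only the supervised term remains; designing improved variants of this trade-off is one direction the project's "optimal loss function" objective could take.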
During this project you will develop an optimal knowledge distillation loss function and the corresponding architecture.
Objectives: defining loss functions for Teacher-Student models for optimal transfer of information; defining the architecture of artificial systems teaching each other; studying the diversity of knowledge that can be transferred between artificial systems.
Research areas: Deep Learning; Neural Networks; Computer Vision and Image Processing.
Applications: Learning successive databases without forgetting (continual – lifelong learning), robotics, AI-AI interaction.
The candidate should be familiar with, or willing to learn, deep learning tools such as PyTorch or TensorFlow.