
Artificial Systems Teaching Each Other

   Department of Computer Science

This project is no longer listed and may not be available.

Supervisor: Dr A Bors
Applications accepted all year round
Self-Funded PhD Students Only

About the Project

Deep learning relies on training with huge amounts of data. However, in many applications such data are not available, not least because of privacy concerns. In such cases we would like one artificial system to generate data in order to train another artificial system, and the generated data should be realistic. One way to achieve this is through Knowledge Distillation (KD) [1,2,3,4]. KD is used in dual Teacher-Student deep learning systems [5,6,7], in which a secondary, usually smaller, network called the Student learns from data generated by a primary, usually larger, network called the Teacher. Essential in Teacher-Student networks is defining how the Teacher generates data so as to comprehensively model the information it has learnt. Being able to learn multiple tasks simultaneously is important in many applications [7,8]. Visual attention through transformers has been shown to be a promising criterion for KD [9]. Collaborative teaching by sets of several models may be required for complex tasks [10]. KD-based criteria for defining lifelong-learning architectures were analysed in [11].
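To make the Teacher-Student idea concrete, the following minimal, framework-free Python sketch implements the classic temperature-scaled distillation loss of Hinton et al. [1]: the Student is trained to match the Teacher's softened output distribution. All names, logit values, and the temperature setting here are illustrative, not taken from the project.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened Teacher and Student distributions,
    scaled by T^2 as proposed by Hinton et al. [1]."""
    p = softmax(teacher_logits, temperature)     # soft targets from the Teacher
    q = softmax(student_logits, temperature)     # Student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))       # → 0.0 (Student matches Teacher exactly)
print(distillation_loss([0.1, 2.5, 0.3], teacher) > 0)  # → True (mismatch is penalised)
```

A higher temperature softens both distributions, exposing the Teacher's relative confidence over incorrect classes ("dark knowledge"); in practice this KD term is combined with a standard cross-entropy loss on the true labels.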

During this project you will develop an optimal knowledge distillation loss function and the corresponding architecture.

Objectives:
- defining loss functions for Teacher-Student models for the optimal transfer of information;
- defining the architecture of artificial systems teaching each other;
- studying the knowledge diversity that can be taught between artificial systems.

Research areas: Deep Learning; Neural Networks; Computer Vision and Image Processing.

Applications: learning successive databases without forgetting (continual / lifelong learning), robotics, AI-AI interaction.

The candidate should be familiar with, or willing to learn, deep learning tools such as PyTorch or TensorFlow.


[1] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, Proc. NIPS Workshop, 2015.
[2] J. Gou et al., Knowledge distillation: a survey, 2020.
[3] Y. Zhang et al., Deep mutual learning, Proc. CVPR, 2018, pp. 4320-4328.
[4] M. Phuong, C. H. Lampert, Towards understanding knowledge distillation, Proc. ICML, PMLR 97, 2019, pp. 5142-5151.
[5] Y. Liu et al., Search to distill: pearls are everywhere but not the eyes, Proc. CVPR, 2020, pp. 7539-7548.
[6] X. Chen et al., Explaining knowledge distillation by quantifying the knowledge, Proc. CVPR, 2020, pp. 12925-12935.
[7] F. Ye, A. G. Bors, Lifelong Teacher-Student Network Learning, IEEE Trans. on Pattern Analysis and Machine Intelligence, 2021, DOI: 10.1109/TPAMI.2021.3092677.
[8] W.-H. Li, H. Bilen, Knowledge distillation for multi-task learning, 2020.
[9] H. Touvron et al., Training data-efficient image transformers & distillation through attention, 2020.
[10] Q. Guo et al., Online knowledge distillation via collaborative learning, Proc. CVPR, 2020, pp. 11020-11029.
[11] F. Ye, A. G. Bors, Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process, Proc. ICCV, 2021.

How good is research at University of York in Computer Science and Informatics?

Research output data provided by the Research Excellence Framework (REF)
