
Self-supervised Machine Learning from Multiple Sensory Data


   School of Computer Science

Supervisor: Dr Jianbo Jiao
Funding: Funded PhD Project (UK Students Only)
Application status: No more applications being accepted

About the Project

Machine Learning, or more specifically Deep Learning, has made great progress in many areas, including but not limited to computer vision, natural language processing, audio processing, healthcare and science, in some cases demonstrating performance better than that of human experts. Many of these deep learning-based techniques have been successfully applied in everyday life, e.g. face recognition, background replacement in online meetings, autonomous driving, virtual assistants, drug development and medical diagnosis.

However, this success relies heavily on ground-truth data manually annotated by human experts to train the deep models. Obtaining high-quality labelled data demands enormous manpower and financial resources, and in many situations requires domain knowledge; this is difficult to scale, yet the success of deep neural networks is powered by large-scale datasets. Such limitations also restrict a trained deep model to a particular application scenario and prevent its power from being generalised or transferred to other applications.

An ability to learn without large amounts of manual annotation is crucial for generalised representation learning and could be a key step towards general artificial intelligence: we humans rarely rely on dense annotations. Self-supervised learning, in which the model learns purely from the data itself, without reference to external human annotations, is one path towards this goal. It has already shown its effectiveness in image [1] and video [2, 3] understanding, medical data analysis [4, 5], and beyond.
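As a concrete illustration, below is a minimal sketch (in PyTorch) of a contrastive pretext task in the style of SimCLR. This is one common instantiation of self-supervised learning, not necessarily the method this project will adopt; the encoder outputs are stood in for by random tensors, and all names and dimensions are illustrative. Two random augmentations of the same image form a free positive pair, so the loss needs no human labels.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two random augmentations of the same
    # N images; matching rows are positives, all other rows are negatives.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)             # (2N, D)
    sim = z @ z.t() / temperature              # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)  # a sample is not its own pair
    sim = sim.masked_fill(mask, float('-inf'))
    # The positive for row i is row i + N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(8, 128, requires_grad=True)   # stand-ins for encoder outputs
z2 = torch.randn(8, 128, requires_grad=True)
loss = nt_xent_loss(z1, z2)
loss.backward()                                # no labels were needed anywhere

The key design choice is that the supervision signal comes from the data's own structure (here, invariance to augmentation) rather than from annotations.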

Although we humans acquire most of our information through vision, as we all know and experience, the world around us offers many other data sources that are essential to perceiving and understanding it: sound, text/language, touch and smell, to name a few. How to adequately leverage information from multiple data modalities is a key question on the path to a more natural learning framework and to more general intelligence. A preliminary study [6] has shown the possibility of learning self-supervised representations from multi-modal data, but the area remains under-explored, with many challenging problems still to address.
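The same contrastive idea extends naturally across modalities: naturally co-occurring pairs, such as the frames and soundtrack of one video clip, can supervise each other. The sketch below uses placeholder linear encoders and illustrative feature dimensions (assumptions for the example, not part of the project description) to show a symmetric cross-modal InfoNCE objective in the spirit of [6].

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Placeholder encoders: a real system might use a CNN/transformer for
        # frames and a spectrogram network for audio (dims are illustrative).
        self.visual_enc = nn.Linear(2048, dim)
        self.audio_enc = nn.Linear(512, dim)

    def forward(self, frames, audio):
        v = F.normalize(self.visual_enc(frames), dim=1)
        a = F.normalize(self.audio_enc(audio), dim=1)
        return v, a

def cross_modal_loss(v, a, temperature=0.07):
    # Row i of v and row i of a come from the same clip; that natural
    # pairing is the only supervision signal, with no human labels involved.
    logits = v @ a.t() / temperature
    targets = torch.arange(v.size(0))
    # Symmetric InfoNCE: match video-to-audio and audio-to-video.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

model = CrossModalModel()
frames = torch.randn(16, 2048)   # e.g. pooled frame features
audio = torch.randn(16, 512)     # e.g. pooled spectrogram features
v, a = model(frames, audio)
loss = cross_modal_loss(v, a)
loss.backward()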

This project aims to study and explore the potential of learning general, transferable representations from multi-modal data in a self-supervised manner. Multi-modal data here means data from multiple sensors: potential modalities include image, video, audio, text, 3D depth, multi-view imagery, geographical information and other metadata. General, transferable representations means knowledge learned by the deep model that transfers well to downstream tasks: for example, a model pre-trained on a large-scale dataset for task A can then be applied to tasks B, C, etc. without requiring additional annotation effort on data from those tasks. This is important because it can greatly reduce the cost of building AI models. Multi-modal data is also beneficial for self-supervised representation learning, as it provides additional constraints and consistency signals across modalities.

The student is expected to start by working on public datasets available in the community, developing deep learning models that take multi-modal data as input and generate the high-quality representations described above. At a later stage, a new dataset will be constructed, and novel algorithms will be developed on it to answer the challenging questions within this topic and beyond.
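To make the pre-train-then-transfer workflow concrete, the hedged sketch below freezes a hypothetical pre-trained encoder and trains only a small task-specific head on the downstream labels; a "linear probe" of this kind is a standard way to assess how transferable learned representations are. All modules, dimensions and data here are illustrative stand-ins.

import torch
import torch.nn as nn
import torch.nn.functional as F

# A stand-in for a backbone pre-trained with self-supervision on task A.
encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())
for p in encoder.parameters():
    p.requires_grad = False          # reuse the learned representations as-is

head = nn.Linear(512, 10)            # new classifier for downstream task B
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(32, 2048)            # task-B inputs (illustrative features)
y = torch.randint(0, 10, (32,))      # task-B labels (only the head needs them)
with torch.no_grad():
    feats = encoder(x)               # frozen backbone: no gradients computed
loss = F.cross_entropy(head(feats), y)
loss.backward()
opt.step()

Because only the head is trained, adapting the model to tasks B, C, etc. costs a small fraction of pre-training, which is precisely the economy described above.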

Essential Knowledge and Skills:

Strong programming skills (Python, C, Matlab, etc.)

Strong mathematical knowledge and background

Strong communication skills, both written and spoken, in English

Strong motivation for research and a desire to address challenging problems

A team player, willing to work with graduate students

Good time management skills

Patience and determination

Desirable Experiences:

Practical experience with deep learning frameworks (e.g. PyTorch and TensorFlow)

Experience with Computer Vision projects

Experience with audio or natural language processing

Knowledge of self-supervised learning, CNNs, transformers, geometry, computer graphics rendering models

Experience with data collection

Experience with scientific paper writing (e.g. publication or submission)

We want our PhD student cohorts to reflect our diverse society. UoB is therefore committed to widening the diversity of our PhD student cohorts. UoB studentships are open to all and we particularly welcome applications from under-represented groups, including, but not limited to, BAME, disabled and neuro-diverse candidates. We also welcome applications for part-time study.

Eligibility: First or Upper Second Class Honours undergraduate degree and/or postgraduate degree with Distinction (or an international equivalent). We also consider applicants from diverse backgrounds that have provided them with equally rich relevant experience and knowledge. Full-time and part-time study modes are available.

If your first language is not English and you have not studied in an English-speaking country, you will have to provide an English language qualification.

We will consider applications from students wishing to start during the 2022-23 academic year or who wish to begin their studies in autumn 2023.


Funding Notes

The position offered is for three and a half years of full-time study. The award comprises a stipend of £16,602 per annum and tuition fees of £4,596 per annum. Awards are usually incremented on 1 October each following year.

References

[1] Ge, C., Liang, Y., Song, Y., Jiao, J., Wang, J., & Luo, P. (2021). Revitalizing CNN attention via transformers in self-supervised visual representation learning. Advances in Neural Information Processing Systems, 34, 4193-4206.
[2] Wang, J., Jiao, J., Bao, L., He, S., Liu, W., & Liu, Y. H. (2021). Self-supervised video representation learning by uncovering spatio-temporal statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Wang, J., Jiao, J., & Liu, Y. H. (2020, August). Self-supervised video representation learning by pace prediction. In European conference on computer vision (pp. 504-521). Springer, Cham.
[4] Jiao, J., Namburete, A. I., Papageorghiou, A. T., & Noble, J. A. (2020). Self-supervised ultrasound to MRI fetal brain image synthesis. IEEE Transactions on Medical Imaging, 39(12), 4413-4424.
[5] Jiao, J., Droste, R., Drukker, L., Papageorghiou, A. T., & Noble, J. A. (2020, April). Self-supervised representation learning for ultrasound video. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) (pp. 1847-1850). IEEE.
[6] Jiao, J., Cai, Y., Alsharid, M., Drukker, L., Papageorghiou, A. T., & Noble, J. A. (2020, October). Self-supervised contrastive video-speech representation learning for ultrasound. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 534-543). Springer, Cham.
