Foundational Multimodal Representation Learning Models for Proteins

   School of Medicine

  Prof Denis Shields

About the Project

IBM-UCD Pre-doc Fellowship in Machine Learning

We are delighted to offer an opening for a 4-year paid (salary + full fees) PhD position in the area of artificial intelligence (AI) as part of a new initiative between IBM Research and the Science Foundation Ireland Centre for Research Training in Genomics Data Science at University College Dublin (UCD). The PhD project will be jointly supervised by Professor Denis Shields at the Conway Institute of Biomolecular and Biomedical Research at UCD and Dr. Thanh Lam Hoang at IBM Research, Dublin Lab, in the AI for Health and Social Care team led by Dr. Vanessa Lopez Garcia.

The selected student will be employed by IBM for the duration of the PhD and be a registered PhD student at UCD, resulting in the following benefits:

• Access to resources and expertise both at IBM Research and UCD

• Research experience in both private and public sectors

• A substantial PhD salary (>40,000 euro)

• Full UCD PhD programme fees (EU or non-EU level) paid

Scientific discovery often deals with problems where the space of possible candidate solutions is too large for human evaluation alone. Recent advances in AI research focus on developing large foundational deep learning models pretrained on large unlabelled datasets using self-supervised learning methods. These models are then fine-tuned to narrow the search space and significantly accelerate scientific discovery. This project will leverage protein data fused from public databases and knowledge graphs, covering diverse information and modalities including protein sequences, protein 3D structure, gene ontology, and protein interactions, to build pretrained foundational multimodal representation models. The protein representations produced by these pretrained models will be applied to various predictive, descriptive, and generative protein learning tasks, such as predicting the cleavage sites of proteases in their substrates. This task has a wide range of potential applications, for example characterizing gut disorders (such as Inflammatory Bowel Disease) and understanding the complex process of nutritional release during seed germination.


  • Good programming skills in at least one of the following languages: Python, Java, C, C++.
  • Familiarity with a deep learning framework (PyTorch, Keras, or TensorFlow) and with basic machine learning concepts.
  • An MSc in Computer Science, Engineering, Mathematics or a related area is preferable but not essential.
  • A clear motivation to work on biological sequence, structure and functional data, and an appreciation of the complexity of existing and emerging large molecular biological datasets and their relationships.

To apply, please email a single PDF containing a cover letter and a full CV, with the names and contact details of two referees, to both [Email Address Removed] and [Email Address Removed], using the subject line “IBM-UCD-Fellowship”, by October 25th, 2022.

About IBM

IBM Research in Ireland has a proven record of creating and executing R&D projects, contributing to the creation of innovative research approaches with potential to advance existing products, publications in top-tier journals and conferences, patents and open sourcing. You can expect to contribute to ground-breaking research seeking to advance the state of the art in knowledge representation, natural language processing and machine learning technologies, considering aspects such as scalability and interpretability. You will be part of a cross-disciplinary team defining and testing new technologies on real-world problems; discovering new growth opportunities; and collaborating with scientists, business units, industries, and universities. Our culture is to drive ideas all the way from research to impact. We are seeking top talent with the drive, vision, deep curiosity and desire to learn the skills required to research and implement cutting-edge research solutions.

About UCD Conway Institute

UCD Conway Institute is located on the 300-acre Belfield campus of University College Dublin (UCD), the largest university in Ireland. At UCD Conway Institute, we provide a dedicated research environment for over 400 research staff from across the University and its associated teaching hospitals. Through interdisciplinary research programmes, the close collaboration of our expert scientists with clinicians and industry partners underpins the translational nature of our research. Our approach combines human potential and innovative technologies to develop creative solutions for exploring the underlying mechanisms of chronic disease. Our goal is to translate this knowledge into preventative strategies, novel diagnostics and therapeutics that will benefit patients and ultimately impact positively on society.

Funding Notes

This position is funded by IBM, which provides a competitive salary and pays the fees.
