
  Deep learning-based classification of proteins represented by Structural Alphabets to identify potential drug targets

   Faculty of Engineering, Computing and the Environment

  Dr Jad Abbass  Applications accepted all year round  Self-Funded PhD Students Only

About the Project

- This project does not require any prior knowledge of biology -

Proteins are the building blocks of the cells of all living organisms. Many types of cancer, as well as Alzheimer's disease, Parkinson's disease, mad cow disease and many other conditions, are associated with protein misfolding. Everyone recognises the ‘logo’ of COVID-19: a grey Styrofoam ball dotted with red spikes. Those red spikes are the most ‘famous’ part of the virus, since they are crucial both to its ‘attack job’ in the human body and as the target of most vaccines. They are simply proteins. Therefore, any new insight into the structural features of proteins is invaluable to biologists and to drug and vaccine designers.

All protein structures deposited in the Protein Data Bank (PDB) are supposed to be classified by architecture and fold. However, annotation is not straightforward: most of the time it has to be carried out manually and ‘visually’, and even automated approaches do not produce reliable results. Consequently, not all proteins deposited in the PDB have proper structural annotations. One of the main reasons for annotating protein structures is to gain insight into the molecular basis of their functions. In addition, such annotations may reveal possible evolutionary relationships between different proteins, a crucial step toward fully understanding any protein. In the same context, Structural Alphabets (SA) have been introduced to represent protein structures. A structural alphabet is a set of sequence-independent fragments that is assumed to be able to mimic any protein structure, so that a three-dimensional structure can be written as a one-dimensional string of letters.
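To make the idea of a structural alphabet concrete, here is a minimal sketch of how a backbone could be turned into a string of letters. It is a toy under stated assumptions, not the alphabet this project will use: the three prototype letters, their (phi, psi) angles and the `encode` helper are all hypothetical, and real structural alphabets typically use longer fragments with prototypes learned from known structures.

```python
import math

# Hypothetical toy alphabet: each letter is a prototype (phi, psi) pair in degrees.
# Real structural alphabets use longer fragments and prototypes learned from data.
PROTOTYPES = {
    "H": (-60.0, -45.0),   # helix-like region
    "E": (-120.0, 130.0),  # strand-like region
    "L": (60.0, 45.0),     # left-handed / turn-like region
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(dihedrals):
    """Map each (phi, psi) pair to the letter of its nearest prototype."""
    letters = []
    for phi, psi in dihedrals:
        nearest = min(
            PROTOTYPES,
            key=lambda k: math.hypot(
                angle_diff(phi, PROTOTYPES[k][0]),
                angle_diff(psi, PROTOTYPES[k][1]),
            ),
        )
        letters.append(nearest)
    return "".join(letters)


# Five residues with made-up dihedral angles (illustrative values only).
backbone = [(-62, -41), (-58, -47), (-115, 125), (-130, 140), (55, 40)]
sa_string = encode(backbone)
print(sa_string)
```

Once a structure is a string like this, it can be handled with tools built for sequences, which is what makes the representation attractive for learning-based classification.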

Clustering a set of proteins is also a very common task in protein bioinformatics. Current clustering tools are CPU- and RAM-intensive, rely on the standard k-means algorithm and can only handle sets of protein structures that have the same length. These limitations show how ‘standard’ and restrictive current clustering techniques are.

Using SA, we will create a novel way to cluster protein structures with deep learning, not only globally but also ‘locally’. In other words, by learning the global fold of proteins as well as local regions and motifs, we will be able to detect regions shared by several thousand proteins, despite differences in their overall structures, in a relatively short processing time. The technique could be applied to cluster proteins of different lengths, that is, totally different proteins. This will help assign unannotated proteins to their proper topology, architecture or fold. Since the PDB currently holds around 188,000 protein structures, deep learning algorithms should, in principle, be able to produce excellent results.
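One way to see how SA strings of different lengths could be compared is a one-dimensional convolution followed by global max pooling, a common building block in deep learning for sequences. The sketch below is only an illustration under assumptions: the three-letter alphabet, the kernel width, the number of filters and the random weights (stand-ins for weights that would be learned) are all hypothetical, and it is not the architecture this project will adopt.

```python
import random

ALPHABET = "HEL"  # hypothetical 3-letter structural alphabet
KERNEL = 3        # window width, in letters
N_FILTERS = 4     # length of the output embedding

rng = random.Random(0)
# FILTERS[f][offset][letter] -> weight. Random stand-ins for learned weights.
FILTERS = [
    [{c: rng.uniform(-1, 1) for c in ALPHABET} for _ in range(KERNEL)]
    for _ in range(N_FILTERS)
]


def embed(sa_string):
    """1D convolution over a one-hot-encoded SA string, then global max pooling.

    With one-hot input, the dot product at each position reduces to looking up
    the weight of the letter found there, so no explicit one-hot matrix is built.
    """
    if len(sa_string) < KERNEL:
        raise ValueError("sequence shorter than kernel width")
    embedding = []
    for filt in FILTERS:
        activations = [
            sum(filt[j][sa_string[i + j]] for j in range(KERNEL))
            for i in range(len(sa_string) - KERNEL + 1)
        ]
        # Global max pooling: keep the strongest local match, wherever it occurs.
        embedding.append(max(activations))
    return embedding


short = embed("HHHEEL")            # 6 letters
long = embed("LLEEHHHEELLLEEEE")   # 16 letters, contains "HHHEEL" as a local region
print(len(short), len(long))
```

Both strings map to a vector of the same length, so they can be clustered together despite their different sizes. Because the longer string contains the shorter one as a local region, each pooled value for the longer string is at least as large as the corresponding value for the shorter one, which hints at how shared local motifs could be picked up.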


