
  Interpretation and Explanation of Language Model Outputs [Self-funded students only]


   Cardiff School of Computer Science & Informatics

This project is no longer listed on FindAPhD.com and may not be available.

Supervisors: Dr Mohammad Taher Pilehvar, Dr Jose Camacho Collados
Applications accepted all year round
Self-Funded PhD Students Only

About the Project

Project description:

Present-day deep learning models often operate as formidable black boxes, excelling on performance metrics but lacking transparency in their decision-making processes. The field of interpretability aims to illuminate the inner workings of these black boxes. Interpretability is crucial for two main reasons: firstly, it offers insights into the limitations of existing models, guiding research directions; and secondly, it aids in building models that are more resilient to adversarial attacks, which exploit a model's vulnerabilities.

The aims of this project are twofold:

(1) To develop techniques for enhanced interpretation of model decisions, specifically in Transformer-based models. This can take the form of token attribution analysis, i.e., assessing which parts of the input were responsible for the final decision, or other probing experiments that shed light on the inner workings of these models.
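To illustrate the kind of analysis aim (1) describes, the sketch below computes gradient-times-input token attributions for a toy linear classifier over mean-pooled embeddings. The model, vocabulary, and weights are all invented for the example; a real study on a Transformer would obtain the gradients via autograd (e.g. through frameworks such as Captum) rather than the closed form used here.

```python
import numpy as np

# Toy sketch of gradient-x-input token attribution.
# Everything here (vocabulary, embeddings, classifier) is a made-up
# stand-in for a trained model, purely to show the attribution recipe.
rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
emb = rng.normal(size=(len(vocab), 8))   # token embedding table
W = rng.normal(size=(2, 8))              # linear classifier over the mean embedding

tokens = ["the", "movie", "was", "great"]
X = emb[[vocab[t] for t in tokens]]      # (n_tokens, 8) input embeddings
logits = W @ X.mean(axis=0)              # forward pass
pred = int(np.argmax(logits))

# For this linear model, d logit_pred / d X[i] = W[pred] / n_tokens,
# so the gradient is identical for every token position.
grads = np.tile(W[pred] / len(tokens), (len(tokens), 1))
scores = (grads * X).sum(axis=1)         # one attribution score per token

for tok, s in zip(tokens, scores):
    print(f"{tok:>6}: {s:+.3f}")
```

A useful sanity check in the linear case: the per-token scores sum exactly to the predicted logit, so the attribution fully decomposes the model's decision across input tokens.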

(2) To use interpretation/explanation techniques to delve deeper into issues commonly associated with LLMs, such as their lack of understanding of numerical concepts, semantic ambiguity, or common sense. This can be either an analytical study explaining the reasons behind these shortcomings, or a technical contribution that improves these models with respect to them.

Deliverables:

The outputs of this project will mostly be published at NLP and AI conferences and in journals. Successful techniques may also be integrated into existing interpretation frameworks, such as Inseq and Captum.

Contact for more information on the project: Dr Taher Pilehvar; [Email Address Removed]

Academic criteria: A 2:1 Honours undergraduate degree or a master's degree, in computing or a related subject. Applicants with appropriate professional experience are also considered. Degree-level mathematics (or equivalent) is required for research in some project areas. 

Applicants for whom English is not their first language must demonstrate proficiency by obtaining an IELTS score of at least 6.5 overall, with a minimum of 6.0 in each skills component. 

How to apply:  

Please contact the supervisors of the project prior to submitting your application to discuss and develop an individual research proposal that builds on the information provided in this advert. Once you have developed the proposal with support from the supervisors, please submit your application following the instructions provided below.

Please submit your application via Computer Science and Informatics - Study - Cardiff University 

In order to be considered, candidates must submit the following information:

  • Supporting statement
  • CV
  • In the ‘Research Proposal’ section of the application, enter the name of the project you are applying to and upload your individual research proposal, as described above
  • Qualification certificates and transcripts
  • Two references
  • Proof of English language proficiency (if applicable)

If you have any questions on the application process, please contact [Email Address Removed] 


References

Recent publications on the topic:
A. Modarressi, M. Fayyaz, E. Aghazadeh, Y. Yaghoobzadeh, and M. T. Pilehvar: DecompX: Explaining Transformers Decisions by Propagating Token Decomposition, ACL 2023.
A. Modarressi, H. Amirkhani, M. T. Pilehvar: Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities, EACL 2023.
A. Modarressi, M. Fayyaz, Y. Yaghoobzadeh, and M. T. Pilehvar: GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers, NAACL 2022.
A. Modarressi, H. Mohebbi, M. T. Pilehvar: AdapLeR: Speeding up Inference by Adaptive Length Reduction, ACL 2022.

