
Developing and analysing document understanding and natural language processing systems in the context of business information extraction.

   School of Electronics, Electrical Engineering and Computer Science

This project is no longer listed and may not be available.

Dr Barry Devereux, Dr KR Rafferty
No more applications being accepted
Competition Funded PhD Project (Students Worldwide)

About the Project

Project Introduction:

This proposed research aligns to the new Advanced Research and Engineering Centre (ARC) within Northern Ireland. This Centre will drive future innovations in technology and enhance our capabilities in important research areas such as robotic process automation (RPA), workflow automation, visualisation, data analytics and artificial intelligence (AI). The Centre brings together expertise from PwC, University of Ulster and Queen’s University Belfast.

This research project aligns to the workflow and AI streams within the Centre. A selection process will determine the strongest candidates across a range of projects, who may then be offered funding for their chosen project. Approximately £6,000 per year is payable to the sponsored student in addition to the normal stipend.

The automation of repetitive information processing tasks has the potential to realise enormous advances in productivity and user satisfaction across a range of business services and solutions. Deep learning approaches, using large scale neural network models, have recently been successfully applied to many information processing tasks, including knowledge discovery and information extraction, text summarization, and text generation. Such methods have been used to generate powerful models in the legal and commercial domain; for example, state-of-the-art Natural Language Processing models have been applied to the analysis and summarization of legal documents (Elwany et al 2019), legal textual entailment (Rosa et al 2021; Yoshioka et al 2021) and modelling the structure of commercial contracts (Hegel et al 2021). Moreover, document image understanding models have shown promising utility in extracting relevant information from structured commercial documents. The time is therefore ripe to develop and build systems that automate textual data analysis and text generation tasks for a range of real-world, commercially orientated problems.

Project Description:

In this project, the goal is to build on recent progress in deep learning and natural language processing to develop methods and systems for processing information in commercial document data, and to use the resulting representations to generate useful, task-relevant knowledge, for example through text summarization, text generation, and question answering. The project will build on modern foundational models and architectures in machine learning, in particular transformer models such as BERT (Devlin et al 2018) and RoBERTa (Liu et al 2019).

Particular problems to be tackled within the business services domain include defining Service-Level Agreements (SLAs), a stage in the finalisation of a contract between a service provider and a client. SLAs are defined at different levels and are used by organisations internally as well as in supplier/customer relationships. All SLAs use shared terminology and a common vocabulary to ensure the same quality of service across different units in an organisation, as well as across multiple locations and subcontracted work. Because of their ubiquity and importance in business services, verifying SLAs and ensuring compliance is costly in terms of staff time. At the same time, the structure, content and meaning of SLAs tend to follow particular patterns, and this kind of statistical regularity makes them an interesting candidate for deep learning-based text analysis, text generation, and other forms of automated processing.

A key challenge in this project is to develop models that process textual data in a way that is transparent, understandable and accountable. The large-scale transformer models that are now ubiquitous in Natural Language Processing research are essentially “black boxes”: neither their representations of linguistic meaning nor the basis for their output decisions can be readily explained to an end user. There is therefore intensive ongoing research into statistical methods and other analysis techniques that better quantify and explain such models (Rogers et al 2020). For example, an important consideration is identifying the components of a document (sentences or words) that are critically relevant to the task at hand (Ormerod et al 2019).
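As a rough illustration of what identifying task-relevant sentences can look like, the sketch below uses leave-one-out (occlusion) scoring: each sentence is removed in turn and the change in a model's score is recorded. This is one simple technique among many and is not necessarily the method the project would adopt. The `toy_score` function, its keyword list and the example document are hypothetical stand-ins for a trained model and real data.

```python
from typing import Callable, List, Tuple

# Hypothetical keyword list; a real system would use a trained model instead.
SLA_KEYWORDS = {"availability", "uptime", "penalty", "response"}


def toy_score(sentences: List[str]) -> float:
    """Hypothetical stand-in for a model: fraction of words that are SLA keywords."""
    words = [w.strip(".,;:%").lower() for s in sentences for w in s.split()]
    if not words:
        return 0.0
    return sum(w in SLA_KEYWORDS for w in words) / len(words)


def sentence_importance(
    sentences: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Rank sentences by how much the score drops when each one is removed."""
    baseline = score(sentences)
    results = []
    for i, sentence in enumerate(sentences):
        reduced = sentences[:i] + sentences[i + 1:]
        results.append((sentence, baseline - score(reduced)))
    # Largest drop first: these sentences matter most to the score.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Invented example document, for illustration only.
doc = [
    "The provider guarantees 99.9% uptime.",
    "Lunch is served at noon.",
    "A penalty applies if response times exceed four hours.",
]
ranked = sentence_importance(doc, toy_score)
for sentence, delta in ranked:
    print(f"{delta:+.3f}  {sentence}")
```

A negative value means that removing the sentence raises the score, which suggests the sentence is irrelevant to the task under this toy scoring function.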

Alongside the development of deep learning architectures and model explainability methods, a further component of this project is to develop an assessment metric for the implemented intelligent systems and to benchmark them against current practices. In particular, this will involve evaluating error rates and making recommendations for the depth of deployment of intelligent agents in practice.
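To make the idea of evaluating error rates concrete, the following minimal sketch compares fields extracted from a document against a hand-labelled reference and reports precision, recall, F1 and a field-level error rate. The metric definitions are standard, but using them for this project is an assumption, as are the field names and values, which are invented for illustration.

```python
from typing import Dict


def extraction_metrics(predicted: Dict[str, str], gold: Dict[str, str]) -> Dict[str, float]:
    """Compare extracted field values with a reference annotation."""
    # A field counts as correct only if it is present and its value matches exactly.
    correct = sum(1 for key, value in predicted.items() if gold.get(key) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Assumed definition: share of reference fields that were missed or wrong.
        "error_rate": 1.0 - recall,
    }


# Invented reference annotation and system output, for illustration only.
gold_fields = {
    "provider": "Acme Ltd",
    "uptime": "99.9%",
    "response_time": "4 hours",
    "penalty": "5%",
}
predicted_fields = {
    "provider": "Acme Ltd",
    "uptime": "99.5%",
    "response_time": "4 hours",
}
metrics = extraction_metrics(predicted_fields, gold_fields)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Exact string matching is the strictest possible criterion; a practical benchmark would probably normalise values (for example dates and percentages) before comparing them.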


Start Date: 01 / 10 / 2023

Application Closing date: 28 / 02 / 2023

Funding Notes

International: We welcome applications from international candidates. For candidates who do not meet the DfE funding residency requirements, a small number of international studentships may be available from the School. These are awarded via a competitive selection process, which will determine the strongest candidates across a range of School projects, who may then be offered funding for their chosen project.

UK/ROI: Candidates from the UK and ROI are eligible for consideration for a DfE Studentship. As this is an industry-sponsored PhD, approximately £6,000 per year is payable to the successful sponsored student in addition to the annual DfE stipend. Full eligibility information for UK/ROI candidates can be viewed via: View Website

Academic Requirements:

The minimum academic requirement for admission is normally an Upper Second Class Honours degree from a UK or ROI Higher Education provider in a relevant discipline, or an equivalent qualification acceptable to the University.

To apply, please complete an application through the Direct Applications Portal.


References:

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Elwany, E., Moore, D., & Oberoi, G. (2019). BERT goes to law school: Quantifying the competitive advantage of access to large legal corpora in contract understanding. arXiv preprint arXiv:1911.00473.
Hegel, A., Shah, M., Peaslee, G., Roof, B., & Elwany, E. (2021). The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues. arXiv preprint arXiv:2107.08128.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Ormerod, M., Martínez-del-Rincón, J., Robertson, N., McGuinness, B., & Devereux, B. (2019, August). Analysing representations of memory impairment in a clinical notes classification model. In Proceedings of the 18th BioNLP Workshop and Shared Task (pp. 48-57).
Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8, 842-866.
Rosa, G. M., Rodrigues, R. C., de Alencar Lotufo, R., & Nogueira, R. (2021, June). To tune or not to tune? Zero-shot models for legal case entailment. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law (pp. 295-300).
Yoshioka, M., Aoki, Y., & Suzuki, Y. (2021, June). BERT-based ensemble methods with data augmentation for legal textual entailment in COLIEE statute law task. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law (pp. 278-284).

Project Key Words: Artificial Intelligence, Deep Learning, Natural Language Processing, Document Image Understanding.