
  Characterising Automatically Generated Text

   School of Computing, Engineering & the Built Environment

This project is no longer listed and may not be available.

Supervisor: Dr Peter Barclay
No more applications being accepted
Funded PhD Project (Students Worldwide)

About the Project

Research is needed to understand the differences between human- and machine-generated text. Prior research has focused on the identification of deceptive text, such as phishing attempts or bot-generated tweets. There is, however, less work on identifying other forms of generated text, such as automatic translations, or text that has been reworded by ‘essay assistant’ software. This project would focus on characterising the differences between human-generated and machine-generated text, for example by comparing originally authored material with automatically rewritten versions.
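As an illustrative sketch only (not part of the project specification), one way such a comparison might begin is by computing simple stylometric features for an original text and its rewritten version; the feature set and function below are hypothetical examples, not a prescribed method:

```python
import re

def stylometric_profile(text):
    """Compute a few simple, illustrative stylometric features:
    mean sentence length (in words), type-token ratio (vocabulary
    richness), and mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return {"mean_sentence_len": 0.0,
                "type_token_ratio": 0.0,
                "mean_word_len": 0.0}
    return {
        # average number of words per sentence
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        # distinct words divided by total words
        "type_token_ratio": len(set(words)) / len(words),
        # average length of a word in characters
        "mean_word_len": sum(len(w) for w in words) / len(words),
    }

# Hypothetical example pair: an original sentence and a rewritten version.
original = "The cat sat on the mat. It watched the birds outside."
rewritten = "A cat was sitting on a mat and watching birds through the window."

for name, text in [("original", original), ("rewritten", rewritten)]:
    print(name, stylometric_profile(text))
```

In practice, characterising generated text would involve far richer feature sets and statistical testing over large corpora, as in the stylometry literature cited below; this sketch only shows the shape of a feature-comparison pipeline.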

Academic qualifications 

A first-class honours degree, or a distinction at master's level, or equivalent achievement, ideally in Computer Science, Data Science, or Linguistics. 

English language requirement 

If your first language is not English, you must comply with the University's English language requirements for research degree programmes.

Application process 

Prospective applicants are encouraged to contact the supervisor, Dr Peter J Barclay ([Email Address Removed]) to discuss the content of the project and the fit with their qualifications and skills before preparing an application.  

Contact details 

Should you need more information, please email [Email Address Removed]

The application must include:  

A research project outline of two pages (excluding the list of references). The outline should cover: 

  • Background and motivation, explaining the importance of the project, supported by relevant literature. You may also discuss the expected applications of the project results. 
  • Research questions or hypotheses. 
  • Methodology: types of data to be used, approach to data collection, and data analysis methods. 
  • List of references 

The outline must be created solely by the applicant. Supervisors can only offer general discussions about the project idea without providing any additional support. 

  • A statement of no more than one page describing your motivation and fit with the project. 
  • A recent and complete curriculum vitae. The CV must include a declaration of the candidate's English language qualifications. 
  • Supporting documents, which must be submitted by successful candidates. 
  • Two academic references (if you have been out of education for more than three years, you may submit one academic and one professional reference), using the form that can be downloaded here.

Applications can be submitted here. To be considered, the application must use: 

  • “SCEBE1123” as the project code 
  • the advertised title as the project title 

All applications must be received by 3rd December 2023. Applicants who have not been contacted by the 8th March 2024 should assume that they have been unsuccessful. Projects are anticipated to start on 1st October 2024. 

Download a copy of the project details here



References

Afroz, S., Brennan, M., & Greenstadt, R. (2012). Detecting hoaxes, frauds, and deception in writing style online. 2012 IEEE Symposium on Security and Privacy, 461–475.
Boudin, F., Mougard, H., & Cram, D. (2016). How document pre-processing affects keyphrase extraction performance. ArXiv Preprint ArXiv:1610.07809.
Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N., & Choi, Y. (2022). Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 7250–7274.
Hancock, J. T., Curry, L. E., Goorha, S., & Woodworth, M. (2007). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes, 45(1), 1–23.
Ippolito, D., Duckworth, D., Callison-Burch, C., & Eck, D. (2019). Automatic detection of generated text is easiest when humans are fooled. ArXiv Preprint ArXiv:1911.00650.
Jawahar, G., Abdul-Mageed, M., & Lakshmanan, L. V. (2020). Automatic detection of machine generated text: A critical survey. ArXiv Preprint ArXiv:2011.01314.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5), 665–675.
Varshney, L. R., Keskar, N. S., & Socher, R. (2020). Limits of detecting text generated by large-scale language models. 2020 Information Theory and Applications Workshop (ITA), 1–5.