The field of stylometry uses statistical techniques to analyse literature and answer questions about authorship. A typical question would be “Given two different pieces of writing, is it possible to determine whether both were written by the same author or by two different authors?” This is often formulated as a supervised learning problem: the goal is to build a statistical or machine learning model from a training set of known works by each candidate author, and then to use this model to infer the probability that each candidate wrote the new text.
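As a rough illustration of the supervised formulation above, the following minimal sketch fits a multinomial naive Bayes model to function-word counts and returns a posterior probability for each candidate author. It is only one of many possible approaches and is not the methodology this project will develop. The word list, the function names, the smoothing parameter `alpha`, the uniform prior over authors, and the toy training texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words used as style markers.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "with"]


def count_function_words(text):
    """Return the count of each function word, in FUNCTION_WORDS order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return [counts[w] for w in FUNCTION_WORDS]


def fit_author_profiles(training_texts, alpha=1.0):
    """Estimate smoothed log word probabilities for each candidate author.

    training_texts maps an author name to a list of known works.
    alpha is an additive (Laplace) smoothing constant, assumed here.
    """
    profiles = {}
    for author, texts in training_texts.items():
        totals = [0] * len(FUNCTION_WORDS)
        for text in texts:
            for i, c in enumerate(count_function_words(text)):
                totals[i] += c
        denom = sum(totals) + alpha * len(FUNCTION_WORDS)
        profiles[author] = [math.log((c + alpha) / denom) for c in totals]
    return profiles


def authorship_posterior(profiles, new_text):
    """Posterior probability of each author, assuming a uniform prior."""
    counts = count_function_words(new_text)
    log_lik = {
        author: sum(c * lp for c, lp in zip(counts, log_probs))
        for author, log_probs in profiles.items()
    }
    # Subtract the maximum before exponentiating, for numerical stability.
    m = max(log_lik.values())
    weights = {author: math.exp(v - m) for author, v in log_lik.items()}
    z = sum(weights.values())
    return {author: w / z for author, w in weights.items()}


# Toy training set (invented text, for illustration only).
training_texts = {
    "author_a": [
        "the cat and the dog and the bird sat in the sun",
        "the end of the day and the start of the night",
    ],
    "author_b": [
        "it is to be that it is to go with a friend",
        "to see a play with a friend is to be happy",
    ],
}

profiles = fit_author_profiles(training_texts)
posterior = authorship_posterior(profiles, "the king and the queen and the court")
print(posterior)
```

In practice the training set would hold full-length works, the feature set would be far richer, and the independence assumption behind naive Bayes would itself be open to question.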
Such techniques have previously been used for the analysis of literary works, such as detecting forgeries when a newly discovered work is claimed to have been written by some famous author (e.g. Shakespeare). Recently, there has been an increased interest in applying these techniques to the analysis of social media data. Questions here might include:
- If a person claims that their social media account has been hacked, is it possible to determine whether posts that have been made after the hack were really written by the original author?
- If we suspect that two user accounts on a platform are controlled by the same person, is it possible to confirm this using statistical analysis?
This project aims to develop new methodology for the analysis of writing, and to apply it to both literary and social media data. There are many potential projects in this area. Possible methodological topics include the use of hierarchical modelling or regularisation to help scale traditional stylometric methods up to large social media datasets, nonparametric modelling of authorship style, and unsupervised learning where we do not have a training set for each author.
• A Bachelor’s degree in Statistics, Mathematics, Physics, Computer Science or similar (a First Class or good Upper Second Class Honours degree, or the equivalent from an overseas university);
• Strong verbal and written communication skills in English.
This project is funded by a University of Edinburgh scholarship which fully covers the cost of tuition fees and provides an annual stipend. This scholarship is open to home, EU, and overseas students.