This project aims to explore the affordances and limitations of an ensemble approach to topic modelling, and the opportunities and challenges it brings to the visualisation of such models for non-expert end-users. While powerful in their ability to construct complex thematic models of large text portfolios, probabilistic topic modelling methods are stochastic by nature. This trait often leads to instability in the computed model, which in turn creates several interpretability and explainability challenges for non-expert end-users.
This project's primary objective is to create a learning framework that takes an ensemble approach to topic modelling. While the aim of such an approach is primarily to stabilise topic discovery, the project would also seek to use it to improve metrics such as coherence and perplexity.
Research and evaluation would be performed to study suitable output fusion methodologies. From this, the project will investigate and compare the effects of an ensemble learning approach on the further data analysis processes that stem from topic modelling, for example topic similarity, document relevance ranking, or trend analysis. A topic stability metric could greatly impact the traditional approaches to such analyses, typically by creating an importance ranking of topics.
In the end, the project should see the development of an intuitive and interactive Topic Mapping visualisation dashboard that allows non-expert end-users to explore their portfolio and gain greater insights from the ensemble model. The final objective of this project will be to evaluate the interpretability and explainability of these stochastic ensemble models and their visualisations.
Prospective applicants are encouraged to contact the Supervisor before submitting their applications. Applications should clearly state the project you are applying for and the name of the supervisor(s).
A first degree (at least a 2.1) ideally in Computer Science or related areas, with a good fundamental knowledge of Data Science or Machine Learning.
English language requirement
IELTS score must be at least 6.5 (with not less than 6.0 in each of the four components). Other, equivalent qualifications will be accepted. Full details of the University’s policy are available online.
- Experience of fundamental concepts in Data Analytics and Visualisation
- Knowledge of Machine Learning methods and user studies
- Good written and oral communication skills
- Strong motivation, with evidence of independent research skills relevant to the project
- Good time management
- Experience with building data analysis pipelines and/or developing interactive visualisation dashboards for the web (e.g., D3.js) is a plus.
For enquiries about the content of the project, please email Dr Pierre Le Bras at P.LeBras@napier.ac.uk
For information about how to apply, please visit our website https://www.napier.ac.uk/research-and-innovation/research-degrees/how-to-apply
To apply, please select the link for the PhD Computing FT application form