The rise of the web and digital media has brought an explosion in the volume of unstructured text data available for analysis. One exciting opportunity is to use such data to identify relationships between people, organisations, places and events. Natural language processing (NLP) can extract entities and relations from text documents, which can then be represented as complex networks of interaction between actors (i.e. people and organisations). Analysis of such networks has the potential to identify influential actors within domains such as business, politics, finance and security. For example, networks linking individuals to organisations have been used to trace corporate funding of the climate-sceptic lobby in US politics, while the ongoing investigation into leaked documents from the law firm Mossack Fonseca (the so-called “Panama Papers”) is based on a massive network of relationships linking individuals, corporate entities and offshore banking facilities.
Gaining insight from these complex networks demands advanced mathematical and computational techniques for large-scale data processing and analysis. Most work in network science has considered networks with a single type of node linked by a single type of relationship. Yet the actor-relationship networks created from unstructured text can be both multipartite (having many types of node) and multiplex (having many types of relation). Methods for analysing these kinds of network are at the forefront of progress in network science.
This PhD project will develop new methods for identifying key actors in multipartite and multiplex networks derived from unstructured text documents. The student will have the opportunity to work on all aspects of the problem: from the use of NLP methods to extract entities and relations from large collections of unstructured text documents, to network construction based on these entities and relationships, to the application of network statistics and machine learning to identify influential actors. Various datasets and networks are available to underpin this research, as well as a range of tools and software libraries.
This fully funded project is co-sponsored by a fast-growing commercial data science startup with offices in London and Bristol. The student will be based in the computer science department on the Streatham Campus of the University of Exeter. They will interact with the vibrant data science research community at Exeter, working with colleagues in mathematics, computer science and relevant quantitative social science disciplines, as appropriate. They will join the “Networks, Data and Complex Systems” research group led by Dr Hywel Williams and receive additional supervision from the commercial sponsor, with opportunities to gain industrial training and experience with real-world big data problems.
Candidates should have a strong background in a quantitative discipline, with programming skills and experience of data analysis. They will learn a variety of techniques in network analysis, machine learning and natural language processing, with excellent opportunities for research publications and further employment in both academic and industrial settings.
Interested candidates are encouraged to contact Dr Hywel Williams ([Email Address Removed]) for further information.
The University of Exeter’s College of Life and Environmental Sciences, in partnership with Adarga Ltd, is inviting applications for a fully funded PhD studentship to commence in April 2017 or as soon as possible thereafter (flexible start date). For eligible students, the studentship will cover UK/EU tuition fees plus an annual tax-free stipend of £14,296 for 4 years. The student will be based in Biosciences in the College of Life and Environmental Sciences at the Streatham Campus in Exeter.