Data Lakes are emerging as data management infrastructures for storing data in various schemata and structural forms. Their goal is to serve as a single entry point for the data analysis process across highly heterogeneous datasets, supporting analytical tasks following a schema-on-read approach, in which data is discovered and integrated when it is to be used. Due to their semantic and structural heterogeneity, Data Lakes bring integration challenges to a new scale of complexity.
The Information Management Group at the University of Manchester invites applications from prospective PhD candidates in the area of data integration and exploration on Data Lakes. PhD projects in this area will explore how contemporary techniques in Natural Language Processing (such as Open Information Extraction, Distributional Semantics and Semantic Parsing) can be used as a foundation to support exploratory data analysis on real-world Data Lakes.
Examples of research challenges include:
How to scale the integration of unstructured, semi-structured and structured datasets.
How to support end-users in exploratory data analysis (for example, using natural language questions).
How to use information embedded in large-scale corpora to support data integration.
How to use contemporary techniques in one-shot machine learning to support data integration.
Applicants are expected to have:
An excellent undergraduate degree in Computer Science or Mathematics (or a related discipline) and, preferably, a relevant M.Sc. degree.
Confidence and independence in programming complex systems in Java or Python.
Previous academic or industry experience in Natural Language Processing or Data Science (desired).
Excellent report writing and presentation skills.
Please note that applicants must additionally satisfy the standard requirements for postgraduate studies at the University of Manchester, such as a first-class or high upper-second-class degree (or an equivalent international qualification) and English language qualifications, as stated in the PGR guidelines.
Qualified applicants are strongly encouraged to informally contact Norman Paton ([email protected]) and Andre Freitas ([email protected]) to discuss the application prior to applying.