The goal of this project is to capture visual information from real-world environments and reconstruct it into editable geometric models with semantic context.
Scene understanding involves information capture and analysis at both geometric and semantic levels. The former focuses on extracting geometric entities/primitives from a scene, as well as the interactions between them. The latter mainly learns dense semantic labels for each of the geometric primitives obtained from a scene. Properly understanding a scene is an important prerequisite for richer industrial applications, including autonomous systems, navigation, mapping, and localisation. The ability to understand a scene depicted in a set of static images, together with other multi-sensory information, has long been an essential computer vision problem in practice. However, this level of understanding is rather inadequate, since real-world scenes are often dynamic and noisy: unknown objects may move independently, and visual properties such as illumination and texture may change over time.
The aim of this project is to develop explainable deep learning techniques that:
• enable real-time estimation of both geometric and semantic information from a real-world scene;
• create editable 3D content using the geometric and semantic information obtained from a scene;
• provide human-understandable explanations and visualisations of the learning process.
In particular, the project is expected to achieve a degree of transparency in a deep learning system and thereby narrow the gap between it and human understanding. This should comprise real-time 3D reconstruction of a real-world scene under a series of real-world challenges, e.g. dynamic objects, illumination changes and large textureless regions. In addition, a set of high-quality labels should be produced for such a raw 3D model, e.g. semantic labels and information about the shape and pose of objects and the layout of the real-world scene. The raw model is then converted into a representation that an average user can further edit in an interactive visual environment. The successful candidate is expected to work closely with experts from Electronic Engineering, as well as external collaborators from Facebook Oculus, Lambda Labs, Kujiale.com and Imperial College London.
This project is associated with the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI and will be in line with the themes of the centre. The project will develop a deep learning solution for indoor scene understanding from large-scale data. This raises research questions about the security and privacy of machine-based data collection and selection, which will in turn prompt further discussion of how a learning framework should be held accountable. In addition, the proposed solution is expected to be explainable at a human-understandable level. This will help interpret the black-box nature of typical deep learning and move towards a transparent AI framework for more general practice.
Further details about the UKRI CDT in ART-AI can be found at: http://www.bath.ac.uk/centres-for-doctoral-training/ukri-centre-for-doctoral-training-in-accountable-responsible-and-transparent-ai/
Applicants should hold, or expect to receive, a First or Upper Second Class Honours degree. A master's level qualification would also be advantageous. Desirable qualities in candidates include intellectual curiosity, a strong background in mathematics, and programming experience.
Informal enquiries about the project should be directed to Dr Wenbin Li: [email protected]
Enquiries about the application process should be sent to [email protected]
Formal applications should be made via the University of Bath’s online application form: https://samis.bath.ac.uk/urd/sits.urd/run/siw_ipp_lgn.login?process=siw_ipp_app&code1=RDUCM-FP02&code2=0002
Start date: 28 September 2020.
• Wenbin Li, Sajad Saeedi, John McCormac, Ronald Clark, Dimos Tzoumanikas, Qing Ye, Yuzhong Huang, Rui Tang, Stefan Leutenegger. InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset. British Machine Vision Conference, BMVC 2018.
• Binbin Xu, Wenbin Li, Dimos Tzoumanikas, Michael Bloesch, Andrew J Davison, Stefan Leutenegger. MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM. International Conference on Robotics and Automation, ICRA 2019.
• Sajad Saeedi, Eduardo da Costa Carvalho, Wenbin Li, Dimos Tzoumanikas, Stefan Leutenegger, Paul H J Kelly, Andrew J Davison. Characterizing Visual Localization and Mapping Datasets. International Conference on Robotics and Automation, ICRA 2019.