Semantic image segmentation is the process of associating each pixel of an image with a class label (such as flower, person, road or sky). Its purpose is to distil the complexity of a high-resolution image containing millions of pixels into a more compact representation from which the image can be understood. Historically, single-image segmentation has been performed with probabilistic models, although the current state of the art uses deep learning. Many recent datasets for automotive vision contain hand-labelled semantic segmentations, in which images are labelled in terms of road, road markings, buildings, vehicles, pedestrians, cyclists and street furniture.
The typical approach is to train a network to reproduce these labels on unseen images. However, very little work attempts to build the dynamics of the scene into the segmentation process. For example, the motion of a walking person is easily identifiable to the human eye, even when the person occupies only a few pixels. At such a low resolution, even a human would struggle to identify a person in a single static image. In video, however, context and the recognisable pattern of moving pixels allow us to infer that a person is present.
This project will focus on real-time semantic segmentation of complex urban scenes, using the temporal evolution of the scene to increase segmentation accuracy. Beyond improving segmentation, the scene dynamics can also be used to predict how the scene will evolve and what other road users will do next. Importantly, this would allow an AI vision system to process an incoming video stream in real time, break the scene into its constituent components and answer questions such as "what do we expect the scene to look like in 5 seconds?".
The project will investigate spatiotemporal semantic segmentation and the prediction of other road users' actions in the context of SAE Level 4-5 autonomous vehicles, specifically for inner-city/urban driving. It will focus on computer vision tools for segmentation and for reasoning about scene content and motion, with integration into ROS and testing on our autonomous testbed.
The PhD is based in the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey, and will involve close collaboration with, and internship opportunities at, Jaguar Land Rover in Warwick. CVSSP is an internationally recognised leader in audio-visual machine perception research. With a diverse community of more than 150 researchers, we are one of the largest audio and vision research groups in the UK. You will join around 50 other postgraduate research students working across a broad range of research areas in vision and deep learning.
This is a 4-year project, starting in October 2019.
Entry requirements:
• A first-class or 2:1 honours degree (or equivalent overseas qualification) in an appropriate discipline (e.g. engineering, computer science, signal processing, applied mathematics or physics)
• You should be able to demonstrate excellent mathematical, analytical and programming skills
• Previous experience in computer vision, machine/deep learning, or augmented reality would be advantageous.
• IELTS 6.5 or above (or equivalent), with no sub-test score below 6.0
How to apply:
Applications should be submitted through the PhD course page: https://www.surrey.ac.uk/postgraduate/vision-speech-and-signal-processing-phd
Please state the studentship title clearly in your application. For enquiries, contact Prof Richard Bowden ([email protected]), indicating your areas of interest and including your CV with details of your qualifications (copies of transcripts and certificates).