This project aims to develop a new approach for joint syntactic and statistical data representation modelling. Image or video data representing a scene can be modelled by a graph whose nodes are represented by Convolutional Neural Networks (CNNs), such as ResNet-50, or by relationships between Transformer networks, resulting in Graph Convolutional Networks (GCNs) or Graphs of Transformer Networks (GTNs). The scene representation can be built following identification of objects or regions, or following scene segmentation. The relationships between these regions are modelled by the edges connecting the nodes, while the associated edge weights indicate the degree of connectivity [5,6] or of interaction, either static or through movement [7,8]. The inter-dependencies between the features extracted by each CNN can be defined through adjacency tensors [5,8]. An orthogonal decomposition can then be used to extract a syntactic image representation [4,9]. Data representations can be defined hierarchically [2,4,7], with the most significant scene elements modelled by the upper nodes. The proposed approach will have to define how the networks in the GCN interact with each other towards the common goal. A generative tree model, GAN-Tree, has also been proposed. An interesting direction would be to use Transformers such as the Swin Transformer for defining GTNs.
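As a concrete illustration of the graph modelling described above, the following is a minimal sketch of a single graph-convolution layer in which each node holds a region descriptor (for example, features from a CNN backbone) and the adjacency matrix encodes the pairwise relationships between regions. It uses the standard symmetrically normalized propagation rule with self-loops; the feature dimensions, toy adjacency structure, and weights are all hypothetical, chosen only to make the sketch runnable.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    H: (n, d_in) node features, e.g. per-region CNN descriptors
    A: (n, n) adjacency matrix; weights encode region connectivity
    W: (d_in, d_out) learnable projection
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # symmetric normalization
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)             # ReLU non-linearity

# Toy scene: 4 regions with 8-dim descriptors, chain-connected
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = rng.normal(size=(8, 3))
out = gcn_layer(H, A, W)
print(out.shape)  # (4, 3): one 3-dim embedding per region
```

Stacking such layers lets information propagate between increasingly distant regions, which is what allows the upper nodes of a hierarchical representation to summarize the most significant scene content.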
This project will develop a graph representation for images/video by defining criteria for assigning processing nodes to specific scene regions and by optimizing the modelling of the interconnections between such regions.
Objectives: Define a graph convolutional network (GCN) providing jointly syntactic and statistical image representations, together with a cost function for training the GCN that minimizes the total number of parameters and the computational complexity required.
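One way to read the efficiency objective above is as a composite training loss: a task term plus penalties on model size and compute. The sketch below is purely illustrative; the penalty coefficients `lam_p` and `lam_c` and the example figures are hypothetical, not values from the project.

```python
def composite_loss(task_loss, num_params, flops, lam_p=1e-7, lam_c=1e-10):
    """Illustrative composite objective: task loss plus regularization
    terms penalizing parameter count and computational cost.
    All coefficients here are hypothetical placeholders."""
    return task_loss + lam_p * num_params + lam_c * flops

# Example: a model with 25M parameters and 4 GFLOPs per forward pass
loss = composite_loss(task_loss=0.42, num_params=25_000_000, flops=4_000_000_000)
print(loss)
```

In practice the parameter and FLOP counts are fixed per architecture, so such penalties usually guide an architecture-search or pruning loop rather than gradient descent directly.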
Applications: Scene representation and understanding, person and vehicle re-identification, retrieval, scene synthesis, matching occluded images.
Research areas: Computer Vision and Image Processing; Machine Learning; Neural Networks.
Applications: Syntactic scene representation from images/video, recognition and classification, content-based image retrieval, vehicle/object re-identification.
The candidate should be familiar with, or willing to learn, deep learning tools such as PyTorch or TensorFlow.