Networks arise naturally in many areas of life: supermarkets use networks of customers to propose specific deals to targeted groups; banks orchestrate complex systems of transactions between themselves and their clients; terrorists organise themselves in networks spread across countries; media and social networks dominate our lives; and inside each living being, genes express and co-regulate themselves via complex networks. Graphs are mathematical objects that describe networks in terms of a set of nodes (vertices) and their interconnections (edges).
The goal of network clustering is to partition the network into clusters, or communities, of related nodes. Many algorithms exist that can automatically infer such clusters under various assumptions, and a range of validation measures can be used to determine the quality of the resulting partition. Most of these algorithms consider only the topology of the network under study. However, additional information about each node, i.e. metadata, is often available, and we might be able to use it to further validate the partition and improve our understanding of the network structure.
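One simple way to check how well node metadata agree with a partition is normalised mutual information (NMI) between the cluster labels and a categorical metadata attribute. The sketch below is illustrative only and is not the method proposed in this project; the function names and the toy labels are hypothetical, and the arithmetic-mean normalisation is one of several conventions.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long label lists."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def normalised_mutual_information(a, b):
    """NMI using the arithmetic mean of the two entropies as normaliser."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both labellings are constant, hence trivially identical
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy data: cluster label and one metadata value per node.
partition = [0, 0, 0, 1, 1, 1]
metadata = ["a", "a", "a", "b", "b", "b"]
score = normalised_mutual_information(partition, metadata)
```

A score close to 1 indicates that the metadata attribute mirrors the partition, while a score close to 0 indicates that the two are unrelated.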
For example, in biological gene networks, proteins or cellular functions are expected to correlate with the gene clusters. Similarly, in a social network such as Facebook, we expect gender or age to correlate with clusters of people. Moreover, in many complex systems the exact relationship between nodes is unobserved or unknown. In some cases, we may instead observe interdependent signals from the nodes, such as time series.
We aim to use such signals to infer the missing relationships. This project will consider a range of approaches to explore the relation between network metadata and a given network partition when network edges are unobserved. The automatic embedding of metadata alongside class label inference will be investigated using a hierarchical Bayesian modelling approach. The most important stage of this research will be to automatically model the mismatch between metadata, predicted edges and ground truth.
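As a rough illustration of inferring unobserved edges from node signals, the sketch below links two nodes whenever the absolute Pearson correlation of their time series exceeds a threshold. This is a deliberately simple stand-in, not the hierarchical Bayesian approach described above; the function names, the threshold of 0.8 and the toy signals are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (norm_x * norm_y)


def infer_edges(signals, threshold=0.8):
    """Return undirected edges (u, v) whose |correlation| meets the threshold."""
    edges = set()
    for u, v in combinations(sorted(signals), 2):
        if abs(pearson(signals[u], signals[v])) >= threshold:
            edges.add((u, v))
    return edges


# Hypothetical toy signals: one short time series per node.
signals = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = infer_edges(signals)
```

Here only g1 and g2 move together closely enough to be linked. A hard threshold ignores uncertainty in the estimated correlations, which is one motivation for a probabilistic treatment of predicted edges.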
The output of this research is expected to have a high impact in biomedical science, where there is a need for fast integration of partial information on biological samples and for class prediction.