Large amounts of multi-modal data are already stored in many collections, yet this information cannot be accessed or exploited unless it is organized efficiently enough to extract its semantics. The proposed research will show how semantics can be learned from loosely related multi-modal data. Learning from such data is important because it is available in large quantities, whereas tightly related data is very hard to obtain, since it can only be produced by manual labelling.
This project aims to develop a new approach for multi-modal data sets, focusing on image and video collections with associated textual information. Learning the relationships between visual and textual information is an interesting example of multimedia data mining, particularly because standard data mining techniques are difficult to apply directly to collections of images and videos. The proposed system will enable efficient retrieval and browsing, as well as interesting applications including auto-annotation, auto-illustration, and auto-documentary.
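To make the auto-annotation idea concrete, the sketch below illustrates one simple baseline: transferring tags from visually similar labelled images to an unlabelled query image. This is only an illustrative toy in Python (one of the languages listed below), not the method the project will develop; the annotate function, the feature vectors, and the similarity-weighted tag voting are all assumptions made for the example.

    import numpy as np

    def annotate(query_feature, train_features, train_tags, k=3):
        """Suggest tags for an unlabelled image by transferring tags
        from its k visually most similar labelled images."""
        # Normalize so dot products equal cosine similarity.
        q = query_feature / np.linalg.norm(query_feature)
        t = train_features / np.linalg.norm(train_features, axis=1, keepdims=True)
        sims = t @ q                       # similarity of query to each training image
        neighbors = np.argsort(-sims)[:k]  # indices of the k most similar images
        # Vote: each neighbor contributes its similarity to each of its tags.
        votes = {}
        for i in neighbors:
            for tag in train_tags[i]:
                votes[tag] = votes.get(tag, 0.0) + sims[i]
        return sorted(votes, key=votes.get, reverse=True)

    # Toy usage: three "labelled" images with 4-dimensional visual features.
    train_features = np.array([[1.0, 0.0, 0.0, 0.1],
                               [0.9, 0.1, 0.0, 0.0],
                               [0.0, 1.0, 0.9, 0.0]])
    train_tags = [["sky", "sea"], ["sky", "sun"], ["grass", "tree"]]
    print(annotate(np.array([1.0, 0.05, 0.0, 0.05]), train_features, train_tags, k=2))
    # -> ['sky', 'sea', 'sun']: tags transferred from the two nearest images

In a real system the hand-made feature vectors would be replaced by features extracted from the images themselves, and the tag vocabulary would come from the loosely associated text; the point here is only to show how visual similarity can propagate textual annotations.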
Students with a background in image processing and expertise in any of the following programming languages (Matlab, Python, C, C++, C#) are welcome. The scope of this project can be adjusted to the qualifications and interests of the student.