Content-based Information Retrieval Framework Considering both Relevant and Diverse Representations of User Query

   Faculty of Engineering and Informatics

Dr Kulvinder Panesar  Applications accepted all year round  Self-Funded PhD Students Only

About the Project

Multimedia items account for a significant share of the data distributed and searched for on the Internet. In particular, geographic queries for tourism locations represent a substantial portion of users’ queries. Current video/photo search technology relies mainly on text information to provide users with accurate results for their queries. Retrieval capabilities, however, still fall short of the needs of the typical user, mainly because of the limitations of the content descriptors. For example, textual tags tend to be noisy or inaccurate (e.g., people may tag an entire collection with a single tag), automatic visual descriptors fail to provide a high-level understanding of the scene, and GPS coordinates capture the position of the photographer, not necessarily the position of the queried location.

Until recently, research focused mainly on improving the relevance of the results. However, an effective information retrieval system should also summarize and rank search results so that it surfaces results that are both relevant and cover different aspects of a query (e.g., providing different views of London Bridge rather than duplicates of the same perspective). This project introduces a novel framework for this emerging area of information retrieval, aiming to improve both the relevance and the diversification of search results, with an explicit focus on the social media context. The work is intended to support the related areas of machine analysis, human-based computation (e.g., crowdsourcing), and hybrid approaches (e.g., relevance feedback, machine-crowd integration).
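To make the trade-off between relevance and diversity concrete, the sketch below uses Maximal Marginal Relevance (MMR), a well-known greedy re-ranking scheme. It is offered only as an illustration and is not necessarily the method this project will adopt. The relevance scores, feature vectors, and the `trade_off` parameter are hypothetical values chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(relevance, features, k, trade_off=0.5):
    """Greedily pick k items, balancing relevance against redundancy.

    trade_off=1.0 ranks purely by relevance; lower values penalise items
    that resemble results already selected.
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any item already chosen.
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: items 0 and 1 are near-duplicate views of the same
# landmark, while item 2 shows a different perspective.
relevance = [0.9, 0.85, 0.6]
features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]

diverse_ranking = mmr_rerank(relevance, features, k=2)
relevance_only = mmr_rerank(relevance, features, k=2, trade_off=1.0)
```

With these toy inputs, the relevance-only ranking keeps the two near-duplicates, whereas the balanced setting replaces the second one with the different view.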

The proposed framework divides the multimedia data into two data streams: 1) text and 2) visual data. Text analysis is considered from two perspectives: lexical analysis, to retrieve the desired content, and sentiment analysis, to determine the attitude of the writer (of video descriptions, tags, etc.) towards a topic, or the overall contextual polarity of a document. This attitude reflects the writer’s evaluation, their emotional state while writing, and the impact on readers. Appraisal theory in psychology holds that human cognition is complex: people appraise events against various criteria, and their feelings and emotions follow from those appraisals. Extracting a writer’s attitude draws on the same idea. Human attention models are an effective means of affective content extraction, and viewer attention is based on visual perception. Next, an aggregated attention curve is generated by an intra- and inter-modality fusion mechanism. Finally, the relevant and diverse content is extracted, taking into account the sentiment and objectivity of the user’s query. This multimedia fusion provides a bridge that links the digital representation of multimedia with the user’s perceptions. The proposed system could make it more convenient for users and tourists to search for tourist locations and information, with fewer restrictions.
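As a rough illustration of the inter-modality fusion step, the sketch below combines per-segment scores from a text stream and a visual stream into one aggregated curve. It assumes a simple weighted average of min-max normalised curves; the actual fusion mechanism is not specified in this description. The curve values, weights, and function names are hypothetical.

```python
def min_max_normalise(curve):
    """Rescale a curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(value - lo) / (hi - lo) for value in curve]


def fuse_curves(curves, weights):
    """Weighted average of normalised, equal-length per-modality curves."""
    if len(curves) != len(weights):
        raise ValueError("one weight is required per curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same length")
    total_weight = sum(weights)
    normalised = [min_max_normalise(curve) for curve in curves]
    return [
        sum(w * curve[t] for w, curve in zip(weights, normalised)) / total_weight
        for t in range(length)
    ]


# Hypothetical per-segment scores on different scales (assumed inputs).
text_sentiment = [0.1, 0.4, 0.2, 0.9]
visual_attention = [10.0, 30.0, 50.0, 40.0]

fused = fuse_curves([text_sentiment, visual_attention], weights=[1.0, 1.0])

# Index of the segment with the highest aggregated score.
peak_segment = max(range(len(fused)), key=fused.__getitem__)
```

Normalising each curve first stops a modality with a larger numeric range from dominating the average. The weights are a placeholder for whatever modality weighting the framework eventually uses.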

Funding Notes

This is a self-funded PhD project; applicants will be expected to pay their own fees or have a suitable source of third-party funding. A bench fee may also apply to this project, in addition to the tuition fees. UK students may be able to apply for a Doctoral Loan from Student Finance for financial support.


