Multimedia Indexing and Retrieval


SIGIR '99 Workshop Summary

Workshop chairs:

    Rohini K. Srihari, Zhongfei Zhang

    SUNY at Buffalo

    R. Manmatha, Chandu Ravela

    University of Massachusetts

The workshop on Multimedia Indexing and Retrieval (MMIR) was held on 19 August 1999, after the SIGIR '99 conference in Berkeley, CA, USA. The workshop started with a presentation by the organizers that gave an overview of the general issues in MMIR without concentrating on specific media. One expected outcome of the workshop was a consensus on the role of SIGIR in multimedia information retrieval research. The user need for semantic retrieval in different domains was stressed, and this theme reappeared in the panel discussions held during the workshop. The overview was followed by a session of technical paper presentations. The papers covered combining text and image information for content labeling of images, a framework for information retrieval over multimedia documents, and experiences with a real-time system for multidimensional browsing of video archives.

The workshop included three panel discussions, one of which focused on video and image databases. The morning session included a panel discussion on Adaptive Multimedia Search Agents. The focus of this discussion was on providing truly user-friendly search mechanisms in the context of multimedia. Some real examples were introduced to emphasize the need for multimodal search capabilities and a single query space for all modalities. The user may have portions of the information in different modalities, and a multimedia system should be able to capture the essence of the user need from these parts. The difference between adaptable (user-initiated) and adaptive (system-initiated) systems was introduced, and some issues in adaptive multimedia search were identified: multimodality and scalability, and user and usage modeling. User integration in adaptive systems involves learning the user's behavior and maintaining user information or profiles.

The workshop participants, drawing on instances from their own experience, identified the gap between user needs and the results presented by retrieval systems. They noted significant problems in inferring semantics from user queries and in representing those semantics in different modalities. The problem of image understanding was given as an example. Because this is a difficult problem to solve, it was suggested that effort be directed towards identifying specific applications in multimedia indexing and retrieval that can yield short-term results. MMIR applications expected in the next few years include searching clipart and image databases for professional presentations and filtering web documents for specific visual content.

The second panel discussion was on Video and Image Databases. This included a presentation by Paul Over on NIST's plans to build a video test collection of digitized videos and their transcripts. Researchers in content-based information retrieval and related fields could use it as a collection on which search, retrieval, and analysis can be performed. The objective of the presentation was to get feedback from the workshop participants on the test collection: how it can be improved, who can use it, and what its characteristics should be. Rohini Srihari presented the details of a similar effort undertaken at SUNY at Buffalo for an image test collection. This session also included a presentation on the status of the MPEG-7 standard. The workshop participants provided some suggestions on the type of data that could constitute the video test collection. It was agreed that efforts such as video and image test collections are required to compare and evaluate multimedia indexing systems.

The last panel discussion was on Multimedia Query Processing. The focus of this discussion was on expressing and interpreting multimedia queries. While SQL-like representations have been suggested for multimodal queries, the panel and participants agreed that they are inadequate for representing the low-level mechanisms used in multimedia processing. The need to provide MMIR systems with a multimodal query interface was recognized; however, there was no consensus on the method of capturing user queries.

The participants agreed that the workshop should continue to be held in conjunction with SIGIR. It was felt that a focus on a single task or a single medium, such as images, might make the workshop more beneficial.