Riding the Multimedia Big Data Wave
Dr. John R. Smith, IBM T.J. Watson Research Center
Across generations of information retrieval and data management involving structured and unstructured data, the explosion of multimedia data is now the biggest wave of all. Today, multimedia makes up 60% of internet traffic, 70% of mobile phone traffic and 70% of all available unstructured data. Huge amounts of video are being both generated and consumed. Web users are uploading 72 hours of video to YouTube every minute. On an average day, social media users are posting 300 million photos to Facebook. Consumers using mobile phones and digital cameras are taking 500B photos per year, which is 78 per person on the planet. Specialized domains are participating too. Medical institutions are acquiring 1B radiology images per year. Cities are installing hundreds of millions of video cameras worldwide for safety, security and law enforcement.
Multimedia is “big data”, but not just because there is a lot of it. Multimedia is big data because it is increasingly becoming a valuable source of insights and information. It can tell us about things happening in the world, give clues about a person’s preferences or experiences, point out places, people or events of interest, provide evidence about activities that have taken place and even capture a rolling log of human history. However, the challenge with multimedia big data is that images and video require much more sophisticated algorithms for content analysis than previous waves of structured and unstructured data. This is spurring a tremendous amount of research and development on techniques for “bridging the semantic gap” to enable effective multimedia information extraction and retrieval.
In this talk we describe multiple industry problems requiring effective analysis and retrieval of images and video across safety and security, medical, Web, social media and mobile domains. We present a multi-layer approach of feature extraction, machine learning and semantic modeling that provides a powerful framework for classifying and retrieving the contents of image and video data. We show how modeling and extraction can be mapped to “big data” distributed computing platforms to enable massive-scale image and video analytics. We describe multiple efforts at IBM Research on image and video analysis and retrieval, including the IBM Multimedia Analysis and Retrieval System (IMARS) and the IBM Smart Vision Suite (SVS), and demonstrate recent results. We conclude with future directions for improving multimedia analytics through automated and interactive techniques for learning effective visual features and semantic models.

Dr. John R. Smith is Senior Manager, Intelligent Information Management Dept., IBM T.J. Watson Research Center. He leads IBM’s research in multimedia information retrieval, including image/video content extraction, multimedia content-based search, video event detection and retrieval, and social media analysis. Dr. Smith is currently principal investigator for the IBM Multimedia Analysis and Retrieval System (IMARS), which has been recognized by multiple awards, including a Wall St. Journal innovation award. Dr. Smith is a long-time participant in the NIST TRECVID video retrieval evaluation and co-led the development of the Large Scale Concept Ontology for Multimedia (LSCOM), which has been incorporated into multiple TRECVID tasks since 2006. Dr. Smith earlier served as Chair of the MPEG Multimedia Description Schemes Group from 2000 to 2004 and led the development of multiple parts of the MPEG-7 Multimedia Content Description Standard and MPEG-21 Digital Framework Standard. Dr. Smith also previously served as co-Project Editor of the MPEG-7 Multimedia Description Schemes and MPEG-7 Conformance specifications. While a student at Columbia University in the mid-1990s, Dr. Smith conducted some of the earliest work on content-based image search (VisualSEEk) and Web image/video search (WebSEEk), which has been highly influential for researchers and practitioners. Dr. Smith has published more than two hundred papers in leading journals and conferences (14K citations, h-index of 55, i-index of 164). Dr. Smith is currently Editor-in-Chief of IEEE Multimedia and is a Fellow of IEEE.