Internet Video Search

Cees G.M. Snoek and Arnold W.M. Smeulders
University of Amsterdam
Science Park 904
1098 XH Amsterdam, The Netherlands
Centrum Wiskunde & Informatica
Science Park 123
1098 XG Amsterdam, The Netherlands



Course Description

In this half-day tutorial we focus on the computer vision challenges in internet video search, present methods for achieving state-of-the-art performance while maintaining efficient execution, and indicate how to obtain spatiotemporal improvements in the near future. Moreover, we give an overview of the latest developments and future trends in the field on the basis of the TRECVID competition, the leading competition for video search engines run by NIST, where we have achieved consistent top-2 performance over the years, including the 2008, 2009, 2010 and 2011 editions. The tutorial is especially meant for researchers and practitioners who are new to the field of video search (introductory), people who have started in this direction (intermediate), or people who are interested in a summary of the state of the art in this exciting area (general interest).

The scientific topic of video search is dominated by five major challenges:

The gaps are bridged by forming a dictionary of visual detectors for concepts and events. The largest dictionaries to date consist of thousands of concepts, which rules out concept-tailored algorithms: designing one per concept would simply take too long. Instead, we come closer to the ideal of one computer vision algorithm tailored automatically to the purpose at hand by employing example data to learn from. We discuss the advantages and limitations of a machine learning approach from examples, and show for what type of semantics the approach is likely to succeed or fail. In compensation for the absence of concept-specific (geometric or appearance) models, we emphasize the importance of good feature sets. They form the basis of the observational model: all possible color, shape, texture or structure invariant features help to characterize the concept or event at hand. Apart from good features, the other essential component is state-of-the-art machine learning in order to get the most out of the learning data. We integrate the features and machine learning aspects into a complete internet video search engine, which has successfully competed in TRECVID.

Throughout the tutorial we follow the video data as they flow through the efficient computational processes. We start from fundamental visual features, covering local shape, texture, color, motion and the crucial need for invariance. Then, we explain how invariant features can be used in concert with difference coding and kernel-based supervised learning methods to arrive at an object, concept or event detector. We end our component-wise decomposition of video search engines by explaining the complexities involved in delivering a limited set of uncertain concept detectors to an impatient online user. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits.
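The flow from local features through difference coding to a kernel-based detector can be illustrated with a toy sketch. It is not the MediaMill implementation: the two-word codebook, the synthetic 2-D descriptors, the helper names, and the RBF kernel are all assumptions made for illustration. "Difference coding" is read here as VLAD-style residual coding, and a simple kernel perceptron stands in for the kernel machines used in practice.

```python
import math
import random

# Hypothetical toy codebook of two 2-D "visual words" (illustration only).
CODEBOOK = [(0.0, 0.0), (10.0, 10.0)]


def nearest(codebook, d):
    """Index of the codeword closest to descriptor d (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], d)))


def difference_code(descriptors, codebook):
    """Sum residuals (descriptor - nearest codeword) per codeword, then L2-normalize."""
    dim = len(codebook[0])
    code = [0.0] * (len(codebook) * dim)
    for d in descriptors:
        k = nearest(codebook, d)
        for j in range(dim):
            code[k * dim + j] += d[j] - codebook[k][j]
    norm = math.sqrt(sum(v * v for v in code)) or 1.0
    return [v / norm for v in code]


def rbf(x, y, gamma=1.0):
    """Radial basis function kernel between two equally long vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def score(x, X, y, alpha, kernel=rbf):
    """Detector output for x: weighted kernel similarity to the training examples."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel=rbf, epochs=20):
    """Kernel perceptron: raise the weight of every example that is misclassified."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * score(xi, X, y, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def toy_video(offset, rng, n=20):
    """Synthetic local descriptors: each lies near a codeword, shifted by `offset` in x."""
    descs = []
    for i in range(n):
        cx, cy = CODEBOOK[i % len(CODEBOOK)]
        descs.append((cx + offset + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)))
    return descs


rng = random.Random(0)
# Five positive and five negative toy "videos" for one hypothetical concept.
X = [difference_code(toy_video(+1.0, rng), CODEBOOK) for _ in range(5)]
X += [difference_code(toy_video(-1.0, rng), CODEBOOK) for _ in range(5)]
y = [1] * 5 + [-1] * 5
alpha = train_kernel_perceptron(X, y)

score_pos = score(difference_code(toy_video(+1.0, rng), CODEBOOK), X, y, alpha)
score_neg = score(difference_code(toy_video(-1.0, rng), CODEBOOK), X, y, alpha)
print(score_pos > 0, score_neg < 0)
```

In a real system the descriptors would come from invariant local features, the codebook would be learned from data, and the sign of the score would be replaced by a ranking of videos by detector confidence.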

Comparative evaluation of methods and systems is imperative to appreciate progress. We discuss the data, tasks, and results of TRECVID, the leading benchmark. In addition, we discuss the many derived community initiatives in creating annotations, baselines, and software for repeatable experiments. We conclude the course with our perspective on the many challenges and opportunities ahead for the visual search community.
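To make comparative evaluation concrete, the sketch below scores a ranked result list with average precision, a measure commonly used for ranked retrieval. The function names and the toy relevance lists are assumptions for illustration, and the sketch assumes every relevant item appears somewhere in the ranked list; official benchmark tooling may use variants such as estimates from sampled judgments.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list.

    ranked_relevance: sequence of truthy/falsy flags, best-ranked result first.
    Assumes every relevant item occurs in the list (a simplifying assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant result
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the per-concept average precisions."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical ranked results for two concept detectors (1 = relevant shot).
concept_a = [1, 0, 1, 0]
concept_b = [0, 1, 1, 1]
ap_a = average_precision(concept_a)
ap_b = average_precision(concept_b)
map_score = mean_average_precision([concept_a, concept_b])
print(round(ap_a, 3), round(ap_b, 3), round(map_score, 3))
```

Because the measure rewards relevant results near the top of the list, two detectors that retrieve the same items in a different order can receive different scores.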

Lecture Topics

The technical content of our short course on video search engines is organized as follows:

Lecture Material

The lecture slides are available: lecture 1, lecture 2, lecture 3.

Several relevant papers are listed on our publication server.

Instructor Bios

Cees G.M. Snoek received the M.Sc. degree in business information systems (2000) and the Ph.D. degree in computer science (2005), both from the University of Amsterdam, Amsterdam, The Netherlands. He is currently an Assistant Professor in the Intelligent Systems Lab at the University of Amsterdam. He was a visiting scientist at Carnegie Mellon University, Pittsburgh, PA (2003) and at the University of California, Berkeley, CA (2010-2011). His research interest is video and image search. He has published over 100 refereed book chapters, journal and conference papers, and serves on the program committees of the major conferences in multimedia, computer vision, and information retrieval. Dr. Snoek is the lead researcher of the MediaMill Semantic Video Search Engine, which is a consistent top performer in the yearly NIST TRECVID evaluations. He is a co-initiator and co-organizer of the VideOlympics, co-chair of the SPIE Multimedia Content Access conference, and member of the editorial boards for IEEE MultiMedia and IEEE Transactions on Multimedia. He lectures post-doctoral courses at international conferences and European summer schools. He is the recipient of an NWO Veni award (2008), a Fulbright Junior Scholarship (2010), an NWO Vidi award (2012), and the Netherlands Prize for ICT Research (2012), all for research excellence. Several of his Ph.D. students have won best paper awards, including the IEEE Transactions on Multimedia Prize Paper Award.

Arnold W.M. Smeulders graduated from the Technical University of Delft in physics in 1977 (M.Sc.) and from Leyden University in medicine in 1982 (Ph.D.), on the topic of visual pattern analysis. In 1994, he became full professor in visual information analysis at the University of Amsterdam. He has an interest in cognitive vision, content-based image retrieval and the picture-language question. He has written over 350 papers in refereed journals and conferences and has been cited 11,000 times. He received a Fulbright grant at Yale University in 1987, and has held visiting professorships at the City University of Hong Kong; Tsukuba, Japan; Modena, Italy; and Cagliari, Italy. He was elected Fellow of the International Association for Pattern Recognition. He was an associate editor of IEEE Transactions on PAMI and of IJCV. Currently, he is with the national research institute CWI, scientific director of the large public-private COMMIT research program in the Netherlands, and chair of the policy committee for ICT research in the Netherlands. He has supervised 40 Ph.D. students to graduation.