Part I - The Basics
Donald Metzler (Yahoo!)
Victor Lavrenko (University of Edinburgh)
Abstract (Part I – The Basics):
This half-day tutorial will cover the basics of probabilistic models for information retrieval. Most of today's state-of-the-art retrieval models, including BM25 and language modeling, are grounded in probabilistic principles. A working understanding of these principles can help researchers better understand existing retrieval models, and can show industrial practitioners how such models can be applied to real-world problems.
The tutorial will cover elements of the classical probabilistic model, including the probability ranking principle, the binary independence model, the 2-Poisson model, and the widely used BM25 model. It will also cover the language modeling approach to information retrieval, including various distributional assumptions and smoothing techniques. Several document and query expansion approaches, including translation models and cluster-based smoothing, will also be covered.
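To make the two families of models above concrete, here is a minimal sketch that scores a toy collection with BM25 and with query likelihood under a Dirichlet-smoothed document language model. Everything in it is an illustrative assumption rather than material from the tutorial: the three-document corpus, the function names `bm25` and `dirichlet_ql`, the commonly used parameter values k1 = 1.2, b = 0.75 and mu = 2000, and the choice of the non-negative idf variant log((N - df + 0.5) / (df + 0.5) + 1).

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration only (not from the tutorial).
DOCS = {
    "d1": "probabilistic models for information retrieval".split(),
    "d2": "language models with smoothing for retrieval".split(),
    "d3": "image annotation and event tracking in news".split(),
}


def bm25(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of `doc` for `query`; k1 and b are assumed common defaults."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs.values() if term in d)
        if df == 0 or tf[term] == 0:
            continue  # term contributes nothing to this document
        # idf variant with +1 inside the log so weights stay non-negative
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        # saturating tf component with document-length normalization
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score


def dirichlet_ql(query, doc, docs, mu=2000.0):
    """Log query likelihood under a Dirichlet-smoothed document language model."""
    coll = Counter(t for d in docs.values() for t in d)
    coll_len = sum(coll.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll[term] / coll_len  # maximum-likelihood collection model
        if p_coll == 0:
            continue  # simplification: skip terms unseen in the whole collection
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


query = "retrieval smoothing".split()
for name, scorer in [("BM25", bm25), ("QL-Dirichlet", dirichlet_ql)]:
    ranking = sorted(DOCS, key=lambda d: scorer(query, DOCS[d], DOCS), reverse=True)
    print(name, ranking)  # both print ['d2', 'd1', 'd3'] for this toy query
```

On this toy query both models rank d2 (which contains both query terms) first and d3 (which contains neither) last. BM25 simply ignores absent terms, while the smoothed language model still assigns them collection-based probability mass; that difference is one way to see the role smoothing plays in the language modeling approach.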
Attendees should have a basic understanding of probability and statistics. A brief refresher of basic concepts, including random variables, event spaces, conditional probabilities, and independence, will be given at the beginning of the tutorial. In addition to slides, hands-on exercises and examples of real-world applications of the models will be used throughout the tutorial.
Bios:
Donald Metzler
Donald Metzler is a Research Scientist in the Search and Computational Advertising group at Yahoo! Research. He obtained his Ph.D. from the University of Massachusetts. His research interests include formal information retrieval models, web search, advertising, and machine learning. He has published research papers at major information retrieval venues, including SIGIR, CIKM, and WWW, and is a co-author of the book Search Engines: Information Retrieval in Practice. He is currently serving as co-chair of the SIGIR 2009 poster track.
More information can be found at http://research.yahoo.com/Don_Metzler.
Victor Lavrenko
Victor Lavrenko is a Lecturer in Informatics at the University of Edinburgh. He received his Ph.D. in Computer Science from the University of Massachusetts Amherst in 2004, and worked as a language technology consultant for the Credit Suisse Group prior to his appointment at Edinburgh. He served as a co-chair of an HLT/NAACL 2003 student workshop and gave a tutorial on language modeling techniques at the SIGIR 2003 conference. Victor has published research papers in, and reviewed for, the SIGIR, CIKM, NAACL/HLT, KDD, and NIPS conferences. His research interests include formal models for searching text in multiple languages, annotating and retrieving images, and detecting and tracking novel events in the news.
Additional information can be found at http://homepages.inf.ed.ac.uk/vlavrenk/.