Tutorials

All SIGIR tutorials occur on Sunday, July 19, and will run either a half day or the full day. They will be held at Shillman Hall on the campus of Northeastern University, approximately 0.9 miles (1.4 kilometers) from the conference hotel. You should allot approximately 20 minutes to reach the tutorial site from the Sheraton, on foot or by train. (Those of you staying in Northeastern University housing can reach the tutorial site on foot in 3 minutes or less.) In addition to transit time, you should allot 10 to 15 minutes to register, obtain your tutorial materials, and proceed to your tutorial room.

Shillman Hall is Building 30 on the Northeastern University main campus maps. Upon entering the building, check in at the registration desk to obtain your tutorial materials and your exact room assignment.

All morning tutorials start at 9:00am and end at 12:30pm. All afternoon tutorials start at 1:30pm and end at 5:00pm or 5:30pm, depending on the tutorial. Specific hours for each tutorial are listed below.

Lunch is provided for all tutorial attendees and will run from 12:30 to 1:30, so morning attendees should stay afterward and afternoon attendees should arrive early.

Full day tutorials

T1: Data-Intensive Text Processing with MapReduce (9:00-5:00)

Summary:
The emphasis of this tutorial is scalability and the tradeoffs associated with distributed processing of large datasets. The tutorial will cover "core" information retrieval topics (e.g., inverted index construction) as well as related topics in the broader area of human language technologies (e.g., distributed parameter estimation, graph algorithms).

Presenter:
Jimmy Lin (University of Maryland)

T2 and T5: Probabilistic Models for Information Retrieval: Part I and Part II (9:00-5:00)

Summary:
Part I, 9:00-12:30
The tutorial will cover elements of the classical probabilistic model, including the probability ranking principle, the binary independence model, the 2-Poisson model, and the widely used BM25 model...

Part II, 1:30-5:00
This half-day tutorial will cover advanced topics in probabilistic models for information retrieval. The tutorial will cover dependence assumptions in the classical probabilistic model and the language modeling framework for information retrieval...

Presenters:
Donald Metzler (Yahoo!)
Victor Lavrenko (University of Edinburgh)
Morning tutorial: Part I - The Basics
Afternoon tutorial: Part II - Advanced Topics
You may register for Part I, Part II, or both parts

Morning tutorials

T2: Probabilistic Models for Information Retrieval: Part I (9:00-12:30)

Summary:
The tutorial will cover elements of the classical probabilistic model, including the probability ranking principle, the binary independence model, the 2-Poisson model, and the widely used BM25 model...

Presenters:
Donald Metzler (Yahoo!)
Victor Lavrenko (University of Edinburgh)
Morning tutorial: Part I - The Basics
Afternoon tutorial: Part II - Advanced Topics
You may register for Part I, Part II, or both parts

~~T4: If you like The Beatles you might like... A tutorial on Music Recommendation~~ (cancelled)

Summary:
In this tutorial we look at the current state-of-the-art in music recommendation. We examine current commercial and research systems, focusing on the advantages and the disadvantages of the various recommendation strategies.

Presenters:
Òscar Celma (Barcelona Music and Audio Technologies)
Paul Lamere (The Echo Nest)

T6: IR Prototypes and Web Search Hacks with Open Source Tools (9:00-12:30)

Summary:
Web search is a public-facing industry application of IR research. One of the best ways to gather data about web search behavior is to build your own search system. Prototype IR and web search systems can be used to gather user interaction data and test the applicability of research ideas. Open source tools and services can greatly speed up the implementation of these systems, allowing for quick evaluation. We will give detailed overviews of several open source tools, providing examples of search and IR algorithms and systems implemented using them, as well as discussing how evaluation can be carried out using these tools.

Presenters:
Rosie Jones (Yahoo!)
Vik Singh (Yahoo!)

Afternoon tutorials

T3: Web Query Log Mining (1:30-5:30)

Summary:
The Web continues to grow and evolve very fast, changing our daily lives. It can be considered the unique result of the collaborative work of the millions of institutions and people that contribute content to the Web as well as the one billion people that use it. In this ocean of contributed data there is a huge amount of both explicit and implicit information and knowledge. Web Mining is the task of analyzing this data and extracting information and knowledge for many different purposes. This tutorial will review studies analyzing how users interact with search engine systems, how a query can be considered correctly answered, and so on. In particular, the main objective of this tutorial is to give participants a unified view of the literature on query log analysis.

Presenters:
Ricardo Baeza-Yates (Yahoo!)
Raffaele Perego (ISTI - CNR)
Fabrizio Silvestri (ISTI - CNR)

T5: Probabilistic Models for Information Retrieval: Part II (1:30-5:00)

Summary:
This half-day tutorial will cover advanced topics in probabilistic models for information retrieval. The tutorial will cover dependence assumptions in the classical probabilistic model and the language modeling framework for information retrieval...

Presenters:
Donald Metzler (Yahoo!)
Victor Lavrenko (University of Edinburgh)
Morning tutorial: Part I - The Basics
Afternoon tutorial: Part II - Advanced Topics
You may register for Part I, Part II, or both parts

~~T7: Speech Search: Techniques and Tools for Spoken Content Retrieval~~ (cancelled)

Summary:
This tutorial will provide researchers in information retrieval with an introduction to the challenges and technologies of spoken content search. It is designed for a broad audience, including students undertaking PhD research in spoken content access as well as more experienced researchers looking to extend or update their knowledge. The tutorial will review the history of spoken content search to date, its component technologies (including a summary introduction to speech recognition), its relationship to text information retrieval, critical system design issues, domains of application, and issues of interaction with spoken content to support efficient access to specific content within spoken content.

Presenters:
Gareth J. F. Jones (Dublin City University)
Martha Larson (Delft University of Technology)

T8: Logistic Regression for Information Retrieval (1:30-5:00)

Summary:
Logistic regression is a flexible, effective approach to supervised learning of classifiers. It is most notably and widely used in text classification and text mining, but has seen application to almost every information retrieval and natural language processing problem. I will present logistic regression from several points of view (statistics, machine learning, neural networks, and maximum entropy), so that attendees can draw on good ideas and good software from each field. I will also point out how the basic ideas of logistic regression have been adapted to produce structured approaches such as conditional random fields and probabilistic relational models, giving the attendee a leg up on understanding these techniques.

Presenter:
David D. Lewis (www.DavidDLewis.com)

The 32nd Annual ACM SIGIR Conference July 19-23 2009
