Tutorials

Half-Day Tutorials (Monday morning, August 7, 2017)
- Statistical Significance Testing in Information Retrieval: Theory and Practice
- Candidate Selection for Large Scale Personalized Search and Recommender Systems
Half-Day Tutorials (Monday afternoon, August 7, 2017)
- A/B Testing at Scale: Accelerating Software Innovation
- Probabilistic Topic Models for Text Data Retrieval and Analysis
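Full-Day Tutorials (Monday morning and afternoon, August 7, 2017)
- From Design to Analysis: Conducting Controlled Laboratory Experiments with Users
- Building Test Collections: An Interactive Guide for Students and Others Without Their Own Evaluation Conference Series
- Neural Networks for Information Retrieval (NN4IR)
- SIGIR 2017 Tutorial on Health Search (HS2017) – A Full-day from Consumers to Clinicians (cancelled)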

Candidate Selection for Large Scale Personalized Search and Recommender Systems

August 7 (Monday) 9:30-12:20, 42F Takao

Presenters:
Dhruv Arya (LinkedIn), Ganesh Venkataraman (LinkedIn), Aman Grover (LinkedIn), Krishnaram Kenthapadi (LinkedIn)

Abstract:
Modern-day social media search and recommender systems require complex query formulation that incorporates both the user's context and their explicit search query. Users expect these systems to be fast and to return results relevant to both their query and their context. With millions of documents to choose from, these systems use a multi-pass scoring function to narrow down the results and surface the most relevant ones. Candidate selection sifts through all the documents in the index and selects a relevant few to be ranked by subsequent scoring functions. The key challenge is to shrink the document set substantially while retaining the relevant documents in the resulting set. In this tutorial we survey various candidate selection techniques and take a deep dive into case studies from a large-scale social media platform. In the latter half we provide a hands-on session in which we build candidate selection models on a real-world dataset and examine how to balance the tradeoff between relevance and latency.
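The multi-pass pipeline described above can be sketched as follows. This is a minimal illustration, not the presenters' system: the term-overlap first pass (`term_overlap_score`), the function `two_pass_rank`, and the toy index and second-pass scores are all hypothetical stand-ins chosen for readability.

```python
import heapq
from typing import Callable, Dict, List, Set


def term_overlap_score(query_terms: Set[str], doc_terms: Set[str]) -> float:
    """Hypothetical cheap first-pass score: fraction of query terms in the document."""
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def two_pass_rank(
    query_terms: Set[str],
    index: Dict[str, Set[str]],
    expensive_score: Callable[[str], float],
    k_candidates: int,
    k_final: int,
) -> List[str]:
    """Candidate selection followed by re-ranking (illustrative sketch)."""
    # Pass 1 (candidate selection): score every document cheaply and keep
    # the top k_candidates that match at least one query term.
    scored = ((term_overlap_score(query_terms, terms), doc_id)
              for doc_id, terms in index.items())
    candidates = [doc_id for score, doc_id in heapq.nlargest(k_candidates, scored)
                  if score > 0]
    # Pass 2: apply the expensive scorer only to the surviving candidates.
    return sorted(candidates, key=expensive_score, reverse=True)[:k_final]


if __name__ == "__main__":
    index = {"d1": {"data", "scientist", "python"},
             "d2": {"java", "engineer"},
             "d3": {"data", "engineer"}}
    # Hypothetical second-pass scores standing in for a learned ranking model.
    model = {"d1": 0.9, "d2": 0.2, "d3": 0.7}
    query = {"data", "engineer"}
    print(two_pass_rank(query, index, model.get, k_candidates=1, k_final=1))
    print(two_pass_rank(query, index, model.get, k_candidates=3, k_final=3))
```

With a single candidate, the second pass never sees `d1`, which the expensive model would rank first; widening the candidate set recovers it at the cost of scoring more documents. That is the relevance/latency tradeoff in miniature.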

From Design to Analysis: Conducting Controlled Laboratory Experiments with Users

August 7 (Monday) 9:00-17:20, 43F Comet

Presenters:
Diane Kelly (University of Tennessee), Anita Crescenzi (University of North Carolina at Chapel Hill)

Abstract:
This full-day tutorial provides general instruction on the design of controlled laboratory experiments conducted to better understand human information interaction and retrieval. Different data collection methods and procedures are described, with an emphasis on self-report measures and scales. The tutorial also introduces statistical power analysis for sample size estimation, and introduces and demonstrates two data analysis procedures, Multilevel Modeling and Structural Equation Modeling, that allow examination of the whole set of variables present in interactive information retrieval (IIR) experiments, along with their various effect sizes. The goals of the tutorial are to increase participants’ (1) understanding of the uses of controlled laboratory experiments with human participants; (2) understanding of the technical vocabulary and procedures associated with such experiments; and (3) confidence in conducting and evaluating IIR experiments. Ultimately, we hope our tutorial will increase research capacity and research quality in IR by providing instruction about best practices to those contemplating interactive IR experiments.
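As a rough illustration of power analysis for sample size estimation, the sketch below uses the common normal-approximation formula for a two-sided, two-group (between-subjects) comparison of means: n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2 participants per group, where d is the standardized effect size (Cohen's d). The function name and defaults are illustrative assumptions; the tutorial itself may use other tools, and an exact t-test calculation gives a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size (Cohen's d). Illustrative sketch only.
    """
    if effect_size <= 0 or not (0 < alpha < 1) or not (0 < power < 1):
        raise ValueError("effect_size must be > 0; alpha and power must lie in (0, 1)")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)                 # round up: participants come in whole numbers


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):  # Cohen's conventional small/medium/large effects
        print(d, sample_size_per_group(d))
```

Because n scales with 1/d^2, halving the expected effect size roughly quadruples the number of participants required, which is why studies expecting small effects need many more participants.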

Building Test Collections: An Interactive Guide for Students and Others Without Their Own Evaluation Conference Series

August 7 (Monday) 9:00-17:20, 42F Mitake

Presenters:
Ian Soboroff (NIST)

Abstract:
Test collections are vital to information retrieval research and experimentation. However, test collections exist for only a limited number of genres, data types, and search tasks. Since many graduate students would like to explore IR in novel areas, they will likely need to build their own test collections to get the best measurements for their research.

Authors who roll their own test collections face a high bar because reviewers prefer test collections that emerge from large community evaluations such as TREC. However, a significant body of recent research has made it possible for small teams not only to build their own test collections but also to support their use by measuring the properties of those collections and reporting these figures in their work.

The intended audience is advanced students who find themselves in need of a test collection, or who are already in the process of building one, to support their own research. Not everyone can talk TREC, CLEF, INEX, or NTCIR into running a track to build the collection they need. The goal of this tutorial is to lay out the issues, procedures, pitfalls, and practical advice.

Attendees should come with a specific current need for data, and/or details on their in-progress collection building effort. The structure of the tutorial will include a lecture component covering history, techniques, and research questions, and an interactive discussion component during which we will collaboratively work through problems the attendees are currently working on.

Neural Networks for Information Retrieval (NN4IR)

August 7 (Monday) 9:00-17:20, 5F Concord Ballroom C

Presenters:
Tom Kenter (University of Amsterdam), Alexey Borisov (University of Amsterdam), Christophe Van Gysel (University of Amsterdam), Mostafa Dehghani (University of Amsterdam), Maarten de Rijke (University of Amsterdam), Bhaskar Mitra (Microsoft, University College London)

Abstract:
Machine learning plays an important role in many aspects of modern IR systems, and deep learning is being applied to all of them. The fast pace of research into deep learning has given rise to many different approaches to many different IR problems. What are the key underlying technologies, and what insights into IR problems can they give us? This full-day tutorial gives a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research and our understanding of IR problems. Additionally, we peek into the future by examining recently introduced paradigms as well as current challenges. Expect to learn about neural networks for semantic matching, ranking, user interaction, and response generation in a highly interactive tutorial.

A/B Testing at Scale: Accelerating Software Innovation

August 7 (Monday) 14:00-17:20, 42F Fuji

Presenters:
Alex Deng (Microsoft), Pavel Dmitriev (Microsoft), Somit Gupta (Microsoft), Ron Kohavi (Microsoft), Paul Raff (Microsoft), Lukas Vermeer (Booking.com)

Abstract:
The Internet provides developers of connected software, including web sites, applications, and devices, an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, dating back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications presents many pitfalls and new research challenges.
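The simple theory mentioned above can be made concrete with the textbook analysis of a two-variant experiment on a conversion-style metric: a two-sided, two-proportion z-test. This sketch assumes users are independently randomized and each contributes a single binary outcome; the function name and counts are hypothetical, and production experimentation platforms handle many complications (for example, ratio metrics and sample-ratio checks) that it ignores.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: control and treatment rates are equal."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 10,000 users per variant, 10% vs. 11% conversion.
    z, p = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

For these toy counts the test gives z of about 2.31 and a two-sided p-value of about 0.02.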

In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and mobile and desktop apps.

Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.

SIGIR 2017 Tutorial on Health Search (HS2017) – A Full-day from Consumers to Clinicians

August 7 (Monday) 9:00-17:20, 43F Orion
Cancelled

Presenters:
Guido Zuccon (Queensland University of Technology), Bevan Koopman (CSIRO)

Abstract:
The HS2017 tutorial at SIGIR will cover topics from an area of information retrieval with significant societal impact — health search. Whether it is searching patient records, helping medical professionals find best-practice evidence, or helping the public locate reliable and readable health information online, health search is a challenging area for IR research with an actively growing community and many open problems. This tutorial will provide attendees with a full stack of knowledge on health search, from understanding users and their problems to practical, hands-on sessions on current tools and techniques, current campaigns and evaluation resources, as well as important open questions and future directions.

Probabilistic Topic Models for Text Data Retrieval and Analysis

August 7 (Monday) 14:00-17:20, 42F Takao

Presenters:
Chengxiang Zhai (University of Illinois at Urbana-Champaign)

Abstract:
Text data include all kinds of natural language text such as web pages, news articles, scientific literature, emails, enterprise documents, and social media posts. As text data continue to grow quickly, it is increasingly important to develop intelligent systems that help people manage and make use of vast amounts of text data (“big text data”). As a new family of effective general approaches to text data retrieval and analysis, probabilistic topic models, notably Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and their many extensions, have been studied actively in the past decade and have found widespread applications. These topic models are powerful tools for extracting and analyzing latent topics contained in text data; they also provide a general and robust latent semantic representation of text data, thus improving many applications in information retrieval and text mining. Since they are general and robust, they can be applied to text data in any natural language and on any topic. This tutorial will systematically review the major research progress in probabilistic topic models and discuss their applications in text retrieval and text mining. The tutorial will provide (1) an in-depth explanation of the basic concepts, underlying principles, and the two basic topic models (i.e., PLSA and LDA) that have widespread applications; (2) a broad overview of all the major representative topic models (which are usually extensions of PLSA or LDA); and (3) a discussion of major challenges and future research directions. The tutorial should appeal to anyone who would like to learn about topic models, how and why they work, their widespread applications, and the research challenges that remain to be solved, especially graduate students, researchers who want to develop new topic models, and practitioners who want to apply topic models to solve application problems.
Attendees are expected to have basic knowledge of probability and statistics.
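To make the basic PLSA model concrete, here is a minimal expectation-maximization (EM) sketch in plain Python. PLSA models each document d as a mixture over k topics, p(w | d) = sum over z of p(z | d) p(w | z). This is an illustration under simplifying assumptions (a small dense document-term count matrix, a fixed number of iterations, no smoothing or convergence check), not the presenter's implementation; `plsa` and the toy matrix are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize(row: List[float]) -> List[float]:
    total = sum(row)
    return [x / total for x in row]


def plsa(counts: List[List[int]], k: int, iters: int = 50, seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    """Fit PLSA by EM. counts[d][w] is how often word w occurs in document d.

    Returns p(w|z) as a k x V matrix, p(z|d) as a D x k matrix, and the
    log-likelihood sum_{d,w} c(d,w) log p(w|d) computed before each update.
    """
    if not counts or k < 1:
        raise ValueError("need at least one document and one topic")
    n_docs, n_words = len(counts), len(counts[0])
    if any(len(row) != n_words or sum(row) == 0 for row in counts):
        raise ValueError("rows must have equal length and every document needs words")
    rng = random.Random(seed)
    # Random positive initialization so no topic starts with zero mass.
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(k)]
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(k)]) for _ in range(n_docs)]
    log_liks: List[float] = []
    for _ in range(iters):
        # Expected counts accumulated during the E-step.
        exp_w_z = [[0.0] * n_words for _ in range(k)]
        exp_z_d = [[0.0] * k for _ in range(n_docs)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: posterior p(z|d,w) is proportional to p(z|d) * p(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(k)]
                p_w_d = sum(joint)
                ll += c * math.log(p_w_d)
                for z in range(k):
                    share = c * joint[z] / p_w_d
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: re-estimate both distributions from the expected counts.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
        log_liks.append(ll)
    return p_w_z, p_z_d, log_liks


if __name__ == "__main__":
    # Toy corpus: documents 0-1 use words 0-1, documents 2-3 mostly use words 2-3.
    counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 5, 2], [0, 1, 3, 4]]
    p_w_z, p_z_d, ll = plsa(counts, k=2, iters=30)
    print([round(x, 2) for x in p_z_d[0]], ll[0] <= ll[-1])
```

EM guarantees that the log-likelihood never decreases, which the returned trace lets you check. LDA places Dirichlet priors on these distributions and is usually fit by variational inference or Gibbs sampling instead.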

Statistical Significance Testing in Information Retrieval: Theory and Practice

August 7 (Monday) 9:00-12:20, 42F Fuji

Presenters:
Ben Carterette (University of Delaware)

Abstract:
The past 25 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC/CLEF/NTCIR, and the increased practice of statistical hypothesis testing to determine whether measured improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work in information retrieval (IR) increasingly cannot be published unless it has been evaluated using a well-constructed test collection and shown to produce a statistically significant improvement over a good baseline.

But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Researchers and developers tend to treat them as a “black box”: evaluation results go in, a p-value comes out, and if that p-value is less than 0.05, it’s good. This tutorial will present a detailed guide to understanding what statistical significance tests are telling us, and how we can use that understanding to develop better tests for the specific experimental problems we face in IR. It will touch on issues from statistical power to multiple testing to generalized linear models to Bayesian testing models, all from the perspective of someone interested in knowing whether their IR system can beat a baseline.
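As a small illustration of what a p-value from a paired test measures, the sketch below runs an exact two-sided paired randomization (permutation) test, one common choice in IR, on per-topic effectiveness scores of two systems. Under the null hypothesis the systems are interchangeable, so each topic's score difference is equally likely to carry either sign; the p-value is the fraction of the 2^n sign assignments whose mean difference is at least as extreme as the observed one. The function name and toy scores are hypothetical, and exhaustive enumeration is only feasible for small n; with many topics one would sample sign assignments at random instead.

```python
import itertools
from typing import Sequence


def paired_randomization_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired randomization test on the mean per-topic difference."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    # Enumerate all 2^n ways of flipping the sign of each topic's difference.
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-topic scores (e.g., average precision) on five topics.
    baseline = [0.20, 0.30, 0.40, 0.50, 0.60]
    improved = [0.25, 0.35, 0.45, 0.55, 0.65]
    print(paired_randomization_test(improved, baseline))
```

Even though the "improved" system wins on every topic, the smallest achievable two-sided p-value with five topics is 2/32 = 0.0625, so this test cannot report p < 0.05. The result reflects too little data (low statistical power), not evidence that the systems are equivalent.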

Please note that SIGIR 2017 does not offer a registration option for workshop-only or tutorial-only participation.