Tutorials

Tutorials will follow the main conference flipped-classroom format, with a pre-recorded lecture available on the conference platform before the conference. In addition, each tutorial will have two live sessions on Sunday, July 11 (EDT), listed below as either “Live Q&A” or “Live Q&A + Practicum”. Attendees are expected to watch the recorded lectures in preparation for the live sessions.

Presenters: Ruoyuan Gao and Chirag Shah

Abstract: Search systems have unprecedented influence on how and what information people access. These gateways to information on the one hand provide easy and universal access to online information, and on the other hand create biases that have been shown to cause knowledge disparity and poor decisions for information seekers. Most algorithms for indexing, retrieval, and ranking are heavily driven by underlying data that is itself biased. In addition, the ordering of search results creates position bias and exposure bias, due to its considerable focus on relevance and user satisfaction. These and other forms of bias that are implicitly, and sometimes explicitly, woven into search systems are becoming increasing threats to information seeking and sense-making processes. In this tutorial, we will introduce the issues of bias in data, in algorithms, and in search processes overall, and show how we can think about and create systems that are fairer, more diverse, and more transparent. Specifically, the tutorial will present several fundamental concepts such as relevance, novelty, diversity, bias, and fairness using socio-technical terminology taken from various communities, and dive deeper into metrics and frameworks that allow us to understand, extract, and materialize them. The tutorial will cover some of the most recent work in this area and show how this interdisciplinary research has opened up new challenges and opportunities for communities such as SIGIR.
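To make the notion of position and exposure bias concrete, one common illustrative model (not taken from this tutorial) assigns each rank a logarithmically decaying examination weight, so that items placed lower in a ranking receive sharply less exposure; the function names below are our own.

```python
import math

def exposure(rank):
    """Examination weight at a 1-based rank, using the logarithmic
    discount familiar from DCG-style metrics (an illustrative choice)."""
    return 1.0 / math.log2(rank + 1)

def group_exposure(ranking, group_of):
    """Total exposure each group receives from a single ranked list.
    `ranking` is a list of item ids; `group_of` maps item id -> group."""
    totals = {}
    for rank, item in enumerate(ranking, start=1):
        g = group_of[item]
        totals[g] = totals.get(g, 0.0) + exposure(rank)
    return totals
```

Under such a model, two groups with equally relevant items can still receive very unequal exposure, which is one way the fairness metrics discussed in the tutorial quantify ranking bias.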

Times:

  • 12:00-13:30 (July 11; Live Q&A + Practicum)
  • 23:00-00:30 (July 11; Live Q&A + Practicum)

Presenters: Liang Pang, Qingyao Ai and Jun Xu

Abstract: The Probability Ranking Principle (PRP), which assumes that each document has a unique and independent probability of satisfying a particular information need, is one of the fundamental principles of ranking. Traditionally, heuristic ranking features and well-known learning-to-rank approaches have been designed following the PRP. Recently, neural IR models, which adopt deep learning to enhance ranking performance, have also obeyed the PRP. Though it has been widely used for nearly five decades, in-depth analysis shows that the PRP is not an optimal principle for ranking, due to its assumption that each document is independent of the other candidates. Counterexamples include pseudo-relevance feedback, interactive information retrieval, search result diversification, etc. To address this problem, researchers have recently proposed modeling the dependencies among documents when designing ranking models. A number of such models have been proposed and have achieved state-of-the-art ranking performance. This tutorial aims to give a comprehensive survey of these recently developed ranking models that go beyond the PRP. The tutorial categorizes these models by their intrinsic assumptions: that documents are independent, sequentially dependent, or globally dependent. In this way, we hope to give researchers working on ranking in search and recommendation a new perspective on the design of ranking models, and thereby stimulate ideas for developing novel ones.
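One classic example of dependent ranking mentioned above is search result diversification; a well-known greedy strategy of this kind is Maximal Marginal Relevance (MMR), sketched minimally below with toy relevance scores and a toy similarity function of our own (this is an illustration, not code from the tutorial).

```python
def mmr_rank(candidates, relevance, similarity, lam=0.5, k=2):
    """Greedy Maximal Marginal Relevance: each candidate's score depends
    on its similarity to documents already selected, so documents are
    no longer scored independently (unlike under the PRP)."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With a near-duplicate of the top document in the pool, a PRP-style ranker would place the duplicate second, while MMR demotes it in favor of a novel document.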

Times:

  • 11:30-13:00 (July 11; Live Q&A)
  • 22:30-00:00 (July 11; Live Q&A)

Presenters: Lingfei Wu, Yu Chen, Heng Ji and Bang Liu

Abstract: Due to its great power in modeling non-Euclidean data such as graphs and manifolds, deep learning on graphs (i.e., Graph Neural Networks (GNNs)) has opened a new door to solving challenging graph-related NLP problems. There has been a surge of interest in applying deep learning on graph techniques to NLP, with considerable success in many NLP tasks, ranging from classification tasks such as sentence classification, semantic role labeling, and relation extraction, to generation tasks such as machine translation, question generation, and summarization. Despite these successes, deep learning on graphs for NLP still faces many challenges, including automatically transforming original text sequences into highly graph-structured data, and effectively modeling complex data that involves mappings between graph-based inputs and other highly structured outputs such as sequences, trees, and graphs with multiple node and edge types. This tutorial will cover relevant and interesting topics on applying deep learning on graph techniques to NLP, including automatic graph construction for NLP, graph representation learning for NLP, advanced GNN-based models (e.g., graph2seq, graph2tree, and graph2graph) for NLP, and the applications of GNNs in various NLP tasks (e.g., machine translation, natural language generation, information extraction, and semantic parsing). In addition, hands-on demonstration sessions will be included to help the audience gain practical experience in applying GNNs to challenging NLP problems using our recently developed open source library – Graph4NLP, the first library enabling researchers and practitioners to easily use GNNs for various NLP tasks.
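The core operation shared by the GNN models named above is neighborhood aggregation; a minimal, dependency-free sketch of one mean-aggregation message-passing step (our own illustration on a hypothetical three-node graph, not Graph4NLP code) is:

```python
def gnn_layer(features, neighbors):
    """One mean-aggregation message-passing step: each node's new vector
    is the average of its own features and its neighbors' features.
    `features` is a list of vectors; `neighbors[i]` lists node i's neighbors."""
    out = []
    for i, f in enumerate(features):
        msgs = [features[j] for j in neighbors[i]] + [f]
        out.append([sum(v[d] for v in msgs) / len(msgs) for d in range(len(f))])
    return out
```

Real GNN layers add learned weight matrices and nonlinearities, but the structural idea, node representations updated from graph neighbors, is the same.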

Times:

  • 10:00-11:30 (July 11; Live Q&A + Practicum)
  • 21:00-22:30 (July 11; Live Q&A + Practicum)

Presenters: Huazheng Wang, Yiling Jia and Hongning Wang
Abstract: Information retrieval (IR) in nature is a process of sequential decision making. The system repeatedly interacts with the users to refine its understanding of the users' information needs, improve its estimation of result relevance, and thus increase the utility of its returned results (e.g., the result rankings). Distinct from traditional IR solutions that rigidly execute an offline trained policy, interactive information retrieval emphasizes online policy learning. This, however, is fundamentally difficult for at least three reasons. First, the system only collects user feedback on the presented results, aka, the bandit feedback. Second, users' feedback is known to be noisy and biased. Third, as a result, the system always faces the conflicting goals of improving its policy by presenting currently underestimated results to users versus satisfying the users by ranking the currently estimated best results on top.

In this tutorial, we will first motivate the need for online policy learning in interactive IR by highlighting its importance in several real-world IR problems where online sequential decision making is necessary, such as web search and recommendation. We will carefully address the new challenges that arise in such a solution paradigm, including sample complexity, costly and even outdated feedback, and ethical considerations in online learning (such as fairness and privacy) in interactive IR. We will prepare the technical discussions by first introducing several classical interactive learning strategies from the machine learning literature, and then fully dive into recent research developments addressing the aforementioned fundamental challenges in interactive IR. Note that the tutorial “Interactive Information Retrieval: Models, Algorithms, and Evaluation” will provide a broad overview of the general conceptual framework and formal models in interactive IR, while this tutorial covers online policy learning solutions for interactive IR with bandit feedback.
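The explore/exploit conflict described above is the defining feature of bandit feedback; a classical strategy from the machine learning literature is UCB1, sketched below on a toy deterministic-reward bandit (an illustration under our own assumptions, not code from the tutorial).

```python
import math

def ucb1(reward_of, n_arms, horizon):
    """UCB1: pull the arm maximizing its empirical mean plus an
    exploration bonus that shrinks as the arm is sampled more often.
    `reward_of(arm)` returns the bandit feedback for one presentation."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_of(arm)
    return counts
```

The optimistic bonus makes the system occasionally present currently underestimated results, exactly the trade-off between policy improvement and user satisfaction described in the abstract.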

Times:

  • 10:00-11:30 (July 11; Live Q&A)
  • 21:00-22:30 (July 11; Live Q&A)

Presenter: Chengxiang Zhai

Abstract:
Information Retrieval (IR) is in general an interactive process in which a user interacts with a retrieval system in potentially many different ways in order to finish an information seeking task. It is thus important to study Interactive Information Retrieval (IIR), where we model and optimize an entire interactive retrieval process (rather than a single query) with consideration of the many different ways a user can potentially interact with a search engine, including, e.g., conversational search and various forms of user feedback. This tutorial systematically reviews the progress of research in IIR, with an emphasis on the most recent developments in frameworks, models, algorithms, and evaluation strategies for IIR. It starts with a broad overview of research in IIR and then gives an introduction to formal models for IIR using a cooperative game framework, covering decision-theoretic models such as the Interface Card Model and the Probability Ranking Principle for IIR. Next, it reviews some representative techniques and algorithms for IIR, such as various forms of feedback and diversification of search results, followed by a discussion of how an IIR system should be evaluated and multiple recently proposed strategies for evaluating IIR using user simulation. The tutorial ends with a brief discussion of the major open challenges in IIR and some of the most promising future research directions.
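Evaluation via user simulation can be made concrete with a toy simulated user; the sketch below (our own illustration, not a model presented in the tutorial) scans a ranked list top-down, stops at the first relevant document, and yields a reciprocal-rank-style session reward that can be averaged over queries.

```python
def simulate_session(ranking, relevant, patience=5):
    """A toy simulated user scans a ranked list top-down and stops at the
    first relevant document (or gives up after `patience` results)."""
    for pos, doc in enumerate(ranking[:patience], start=1):
        if doc in relevant:
            return 1.0 / pos  # reciprocal-rank-style session reward
    return 0.0

def evaluate(system_rankings, relevant):
    """Average simulated-session reward over a set of queries.
    `system_rankings` maps query -> ranked list; `relevant` maps
    query -> set of relevant document ids."""
    rewards = [simulate_session(r, relevant[q]) for q, r in system_rankings.items()]
    return sum(rewards) / len(rewards)
```

Richer simulators vary the browsing model, patience, and feedback behavior, which is what makes simulation-based IIR evaluation both powerful and delicate.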

Note that this tutorial places more emphasis on broad coverage of general conceptual frameworks and formal models than on specific algorithms. For a more in-depth treatment of cutting-edge machine learning algorithms for IIR, please attend another SIGIR 2021 tutorial on this topic, titled “Interactive Information Retrieval with Bandit Feedback” (presenters: Huazheng Wang, Yiling Jia and Hongning Wang).

Times:

  • 11:30-13:00 (July 11; Live Q&A)
  • 22:30-00:00 (July 11; Live Q&A)

Presenters: Andrew Yates, Rodrigo Nogueira and Jimmy Lin

Abstract:
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond.

We cover a wide range of techniques, grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly. Two themes pervade our treatment: techniques for handling long documents and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). In a hands-on session we demonstrate how open-source toolkits can be used to rank documents with a variety of these approaches.
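Of the two categories above, the learned-dense-representation family reduces at query time to nearest-neighbor search over embedding vectors; a minimal sketch with made-up embeddings (our own illustration, not code from the tutorial's toolkits) is:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def dense_rank(query_vec, doc_vecs):
    """Rank document ids by dot-product similarity between the query
    embedding and each document embedding, the query-time scoring
    step shared by dense retrieval models."""
    return sorted(doc_vecs, key=lambda d: dot(query_vec, doc_vecs[d]), reverse=True)
```

Reranking architectures differ in that a transformer scores each (query, document) pair jointly, which is more effective but far more expensive per query; this is one instance of the effectiveness/efficiency tradeoff the tutorial discusses.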

Times:

  • 10:00-11:30 (July 11; Live Q&A + Practicum)
  • 21:00-22:30 (July 11; Live Q&A + Practicum)

Presenters: Alexander Kuhnle, Miguel Aroca-Ouellette, Anindya Basu, Murat Sensoy, John Reid and Dell Zhang

Abstract:
There is strong interest in leveraging reinforcement learning (RL) for information retrieval (IR) applications including search, recommendation, and advertising. In 2020 alone, the term “reinforcement learning” was mentioned in more than 60 different papers published by ACM SIGIR. It has also been reported that Internet companies like Google and Alibaba have started to gain competitive advantages from their RL-based search and recommendation engines. This full-day tutorial gives IR researchers and practitioners who have little or no experience with RL the opportunity to learn the fundamentals of modern RL in a practical, hands-on setting. Furthermore, some representative applications of RL in IR systems will be introduced and discussed. By attending this tutorial, participants will acquire a good knowledge of modern RL concepts and standard algorithms such as REINFORCE and DQN. This knowledge will help them better understand some of the latest IR publications involving RL, as well as prepare them to tackle their own practical IR problems using RL techniques and tools.
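As a flavor of one of the algorithms named above, the sketch below applies REINFORCE to a toy two-armed bandit with a softmax policy; it is a self-contained illustration under our own assumptions, not material from the tutorial.

```python
import math, random

def reinforce_bandit(reward_of, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a toy two-armed bandit: a softmax policy over two
    preference values, updated with the score-function gradient
    reward * d(log pi(action)) / d(preferences)."""
    random.seed(seed)
    prefs = [0.0, 0.0]
    for _ in range(episodes):
        exps = [math.exp(p) for p in prefs]
        z = sum(exps)
        probs = [e / z for e in exps]
        action = 0 if random.random() < probs[0] else 1
        reward = reward_of(action)
        for i in range(2):
            # gradient of log-softmax w.r.t. each preference
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
    return prefs
```

After training, the preference of the higher-reward arm dominates, i.e., the policy has learned to favor it; DQN instead learns action values with a neural network, but the trial-and-error structure is the same.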

Please refer to the tutorial website (https://rl-starterpack.github.io/) for more information.

Times:

  • 16:00-17:30 (July 11; Live Q&A + Practicum)
  • 03:00-04:30 (July 12; Live Q&A + Practicum)

Presenters: Yunqi Li, Yingqiang Ge and Yongfeng Zhang

Abstract:
Recently, there has been growing attention on fairness considerations in machine learning. As one of the most pervasive applications of machine learning, recommender systems are having an increasingly critical impact on individuals and society, since a growing number of users rely on them for information seeking and decision making. It is therefore crucial to address potential unfairness problems in recommendation, which may hurt users' or providers' satisfaction with recommender systems as well as the interests of the platforms.

The tutorial focuses on the foundations and algorithms for fairness in recommendation. It also gives a brief introduction to fairness in basic machine learning tasks such as classification and ranking. The tutorial will introduce taxonomies of current fairness definitions and evaluation metrics for fairness concerns. We will cover previous work on fairness in recommendation and also put forward future fairness research directions. The tutorial aims to introduce and communicate fairness-in-recommendation methods to the community, as well as to gather researchers and practitioners interested in this research direction for discussion, idea exchange, and research promotion.
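As a concrete example of the kind of evaluation metric mentioned above, demographic parity from the classification literature compares positive-prediction rates across groups; the minimal sketch below is our own illustration, not a metric defined by the tutorial.

```python
def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rates across groups,
    a basic group-fairness metric from the classification literature.
    `predictions` are 0/1 labels; `groups` are group ids, aligned."""
    stats = {}
    for y, g in zip(predictions, groups):
        n, pos = stats.get(g, (0, 0))
        stats[g] = (n + 1, pos + y)
    rates = [pos / n for n, pos in stats.values()]
    return max(rates) - min(rates)
```

A gap of 0 means all groups receive positive outcomes at the same rate; fairness definitions for recommendation generalize this idea to ranked exposure for users and providers.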

Times:

  • 10:00-11:30 (July 11; Live Q&A)
  • 21:00-22:30 (July 11; Live Q&A)

Presenters: Dilek Küçük and Fazli Can

Abstract:
Stance detection (also known as stance classification and stance prediction) is a problem related to social media analysis, natural language processing, and information retrieval, which aims to determine the position of a person, from a piece of text they produce, towards a target (a concept, idea, event, etc.) that is either explicitly specified in the text or only implied. The output of the stance detection procedure is usually drawn from the set {Favor, Against, None}. In this tutorial, we will define the core concepts and research problems related to stance detection, present historical and contemporary approaches, provide pointers to related resources (datasets and tools), and cover outstanding issues and application areas of stance detection. As solutions to stance detection can contribute to significant tasks including trend analysis, opinion surveys, user reviews, personalization, and predictions for referendums and elections, it will remain an important research problem, currently applied mostly to textual content, and particularly to social media. Finally, we believe that image and video content will soon commonly be the subject of stance detection research as well.
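To make the task definition concrete, a toy lexicon-based baseline that maps a text onto the {Favor, Against, None} label set might look like the sketch below; the cue words are our own invented examples, and real systems use learned models that also condition on the target.

```python
# Hypothetical cue lexicons, for illustration only.
FAVOR_CUES = {"support", "agree", "favor", "endorse"}
AGAINST_CUES = {"oppose", "against", "reject", "condemn"}

def detect_stance(text):
    """Toy lexicon baseline mapping a text to the {Favor, Against, None}
    label set described above: count cue-word matches on each side
    and pick the majority, defaulting to None on a tie."""
    tokens = set(text.lower().split())
    favor = len(tokens & FAVOR_CUES)
    against = len(tokens & AGAINST_CUES)
    if favor > against:
        return "Favor"
    if against > favor:
        return "Against"
    return "None"
```

Even this crude baseline illustrates why target-awareness matters: the same cue words can signal opposite stances depending on which target the text is addressing.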

Times:

  • 16:00-17:30 (July 11; Live Q&A)
  • 03:00-04:30 (July 12; Live Q&A)