Full Day Tutorials
Economics provides an intuitive and natural way to formally represent the costs and benefits of interacting with applications, interfaces and devices. By using economic models it is possible to reason about interaction, make predictions about how changes to the system will affect performance and behavior, and measure the performance of people’s interactions with the system.
Development of most leading web services and software products today is guided by data-driven decisions grounded in evaluation, which ensures a steady stream of updates, in terms of both quality and quantity. Large internet companies use online evaluation on a day-to-day basis and at a large scale, and the number of smaller companies using A/B testing in their development cycle is also growing. Web development across the board strongly depends on the quality of experimentation platforms. In this tutorial, we will give an overview of the state-of-the-art methods underlying the everyday evaluation pipelines at some of the leading internet companies.
We invite software engineers, designers, analysts, and service or product managers (beginners, advanced specialists, and researchers alike) to join us at SIGIR 2019, which will take place in Paris from 21 to 25 July, to learn how to make web service development data-driven and how to do it effectively.
This is the third edition of the tutorial, which has already been presented at WWW and KDD, where it was one of the most popular. The program offers a balanced mix of an overview of academic achievements in the field of online evaluation and unique practical industrial experience shared by leading researchers and engineers from Yandex and Facebook. Whether you work at a company, might do so in the future, or plan to drive the practice of online evaluation in academia, we welcome you at our tutorial. Please visit the website with materials from the previous editions of our tutorial (linked above) for more information.
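As a flavor of the statistics involved, the significance of a simple A/B experiment on a binary metric (such as click-through) can be assessed with a two-proportion z-test. The following is a minimal illustrative sketch with made-up numbers, not the evaluation pipeline of any particular company:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B experiment.

    conv_a, conv_b: number of conversions in control / treatment
    n_a, n_b: number of users exposed to each variant
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 5.00% vs 5.75% conversion over 20k users each
z, p = two_proportion_z_test(conv_a=1000, n_a=20000, conv_b=1150, n_b=20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Real experimentation platforms go far beyond this (sequential testing, variance reduction, metric sensitivity), which is precisely what the tutorial covers.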
This tutorial aims to weave together diverse strands of modern learning-to-rank (LtR) research and present them in a unified full-day tutorial. First, we will introduce the fundamentals of LtR and give an overview of its various subfields. Then, we will discuss recent advances in gradient boosting methods such as LambdaMART, focusing on their efficiency/effectiveness trade-offs and optimizations. We will then present TF-Ranking, a new open-source TensorFlow package for neural LtR models, and show how it can be used for modeling sparse textual features. We will conclude the tutorial by covering unbiased LtR, a new research field that aims to learn from biased implicit user feedback.
The tutorial will consist of three two-hour sessions, each focusing on one of the topics described above. It will provide a mix of theoretical and hands-on sessions, and should benefit both academics interested in learning more about the current state of the art in LtR and practitioners who want to use LtR techniques in their applications.
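To give a taste of the pairwise family of LtR methods, the sketch below trains a linear scoring function with a pairwise logistic loss (the loss underlying RankNet-style methods) on synthetic data. This is a toy illustration under assumed synthetic data, not TF-Ranking or LambdaMART:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 documents with 5 features; relevance driven by feature 0
X = rng.normal(size=(100, 5))
relevance = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)
pos, neg = X[relevance == 1], X[relevance == 0]

# Feature differences for every (relevant, non-relevant) pair, one per row
diffs = (pos[:, None, :] - neg[None, :, :]).reshape(-1, X.shape[1])

w = np.zeros(X.shape[1])  # linear scoring model: score(x) = w . x
lr = 0.5
for _ in range(100):
    margins = diffs @ w
    # Gradient of the pairwise logistic loss log(1 + exp(-margin))
    grad = -(diffs / (1 + np.exp(margins))[:, None]).mean(axis=0)
    w -= lr * grad

scores = X @ w
# Fraction of correctly ordered (relevant, non-relevant) pairs
pair_acc = np.mean(scores[relevance == 1][:, None] > scores[relevance == 0][None, :])
print(f"pairwise accuracy: {pair_acc:.2f}")
```

Gradient-boosted and neural LtR models replace the linear scorer with far richer function classes, but the pairwise objective has the same shape.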
The tutorial is based on our long-term research on open-domain conversation and our rich hands-on experience in the development of Microsoft XiaoIce. We will summarize the recent achievements made by both academia and industry on chatbots, and give a thorough and systematic introduction to state-of-the-art methods for open-domain conversation modeling, including both retrieval-based and generation-based methods. In addition, our tutorial will cover new trends in chatbot research, such as the transition from neural architecture design to neural architecture learning, and from single-modal to multi-modal conversation.
Exploration is one of the primordial ways to accrue knowledge about the world and its nature. As we accumulate data, mostly automatically, at unprecedented volume and speed, our datasets have become complex and hard to understand. In this context, exploratory search provides a handy tool for progressively gathering the necessary knowledge: the user starts from a tentative query that provides cues about the next queries to issue. An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL) and convoluted mechanisms, while retaining the flexibility and expressiveness required to express complex information needs. Recently, we have witnessed a rediscovery of so-called example-based methods, in which the user or analyst circumvents query languages by using examples as input. This shift in semantics has led to a number of methods that receive as a query a set of example members of the answer set. The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database.
In this tutorial, we present an excursus over the main example-based methods for exploratory analysis, providing a detailed overview of this new area and surveying the relevant state-of-the-art techniques. We will detail the overall problem formulation and a taxonomy of methods organized by the questions they answer. Moreover, we will present future directions, discussing various machine learning techniques used to infer user preferences in an online fashion.
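As a toy illustration of the example-based paradigm described above (with made-up data and a deliberately naive inference rule, not a specific method from the literature), the sketch below infers an answer set from two example members by keeping every item that shares all attribute values on which the examples agree:

```python
# Minimal "query by example" sketch over a tiny attribute table:
# given a few example members of the desired answer set, return all
# items sharing every attribute value the examples agree on.

table = {
    "Paris":  {"type": "city", "country": "France"},
    "Lyon":   {"type": "city", "country": "France"},
    "Berlin": {"type": "city", "country": "Germany"},
    "Loire":  {"type": "river", "country": "France"},
}

def query_by_example(table, examples):
    # Attribute/value pairs on which all examples agree
    common = set(table[examples[0]].items())
    for e in examples[1:]:
        common &= set(table[e].items())
    # Keep items whose attributes contain all shared pairs
    return sorted(item for item, attrs in table.items()
                  if common <= set(attrs.items()))

print(query_by_example(table, ["Paris", "Lyon"]))
# → ['Lyon', 'Paris'] (the French cities in the table)
```

Methods from the literature replace this exact-match rule with statistical or learned inference over large, noisy databases, which is where the interesting research questions arise.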
Explainable recommendation and search attempt to develop search/recommendation models that are both accurate (i.e., deliver high-quality recommendation or search results) and explainable (i.e., the model itself is interpretable or intuitive explanations of the results can be generated), which helps to improve system transparency, persuasiveness, trustworthiness, and effectiveness. The tutorial focuses on recent research on explainable recommendation and search algorithms, as well as their application in real-world systems such as search engines, e-commerce platforms, and social networks. It aims to introduce and communicate explainable recommendation and search methods to the community, and to gather researchers and practitioners interested in this direction for discussion, exchange of ideas, and research promotion.
Tables are a practical and useful tool in many application scenarios. Tables can be effectively utilized for collecting and organizing information from multiple sources. With the help of additional operations, such as sorting, filtering, and joins, this information can be turned into knowledge and, ultimately, can be used to support decision-making. Thanks to their convenience and utility, a large number of tables are being produced and are made available on the Web. These tables represent a valuable resource and have been a focus of research for over two decades now. In this tutorial, we provide a systematic overview of this body of research.
Tables on the web differ from traditional tables (that is, tables in relational databases and tables created in spreadsheet programs) in a number of ways. First, web tables are embedded in webpages, which provides plenty of contextual information that can be utilized, such as the embedding page’s title and link structure, and the text surrounding the table. Second, web tables are rather heterogeneous regarding their quality, organization, and content. Among the different table types, relational tables are considered to be of the highest utility because of the relational knowledge contained in them. However, unlike in relational databases, these relationships are not made explicit in web tables; uncovering them is one of the main research challenges. The uncovered semantics can be leveraged in various applications, including table search and completion, question answering, and knowledge base augmentation. For each of these tasks, we identify seminal work, describe the key ideas behind the proposed approaches, discuss relevant resources, and point out interdependencies among the different tasks.
Deep learning models have been very successful in many natural language processing tasks. Search engines work with rich natural language data, e.g., queries and documents, which implies great potential for applying deep natural language processing to such data to improve search performance. Furthermore, it opens an unprecedented opportunity to explore more advanced search experiences, such as conversational search and chatbots.
This tutorial offers an overview of deep learning based natural language processing for search systems from an industry perspective. We focus on how deep natural language understanding powers search systems in practice. The tutorial introduces basic concepts, elaborates associated challenges, reviews the state-of-the-art approaches, covers end-to-end tasks in search systems with illustrative examples, and discusses future trends.
The abundance of user generated content on social networks provides the opportunity to build models that are able to accurately and effectively extract, mine and predict users’ interests, with the hope of enabling more effective user engagement, better delivery of appropriate services and higher user satisfaction. While traditional methods for building user profiles relied on AI-based preference elicitation techniques that could be considered intrusive and undesirable by users, more recent advances focus on non-intrusive yet accurate ways of determining users’ interests and preferences.
In this tutorial, we will cover five important aspects related to the effective mining of user interests: (1) the information sources that are used for extracting user interests, (2) the various types of user interest profiles that have been proposed in the literature, (3) the techniques that have been adopted or proposed for mining user interests, (4) the scalability and resource requirements of the state-of-the-art methods and, finally, (5) the evaluation methodologies that are adopted in the literature for validating the appropriateness of the mined user interest profiles. We will also introduce existing challenges, open research questions and exciting opportunities for further work.
Fairness and related concerns have become of increasing importance in a variety of AI and machine learning contexts. They are also highly relevant to information retrieval and related problems such as recommendation; however, translating algorithmic fairness constructs from classification, scoring, and even many ranking settings into information retrieval and recommendation scenarios is not a straightforward task. This tutorial will help to orient IR researchers to algorithmic fairness, understand how concepts do and do not translate from other settings, and provide an introduction to the growing literature on this topic.
Quantification is the task of estimating, given a set s of unlabelled items and a set of classes C, the relative frequency (or “prevalence”) p(c) of each class c in C. Quantification is important in many disciplines (e.g., market research, political science, the social sciences, and epidemiology) that usually deal with aggregate (as opposed to individual) data. In these contexts, classifying individual unlabelled instances is usually not a primary goal, while estimating the prevalence of the classes of interest in the data is.
Quantification may in principle be solved via classification, i.e., by classifying each item in s and counting, for all c in C, how many such items have been labelled with c. However, it has been shown in a multitude of works that this “classify and count” (CC) method yields suboptimal quantification accuracy, one reason being that most classifiers are optimized for classification accuracy rather than for quantification accuracy.
As a result, quantification is no longer considered a mere byproduct of classification and has evolved into a task of its own, devoted to designing methods and algorithms that deliver better prevalence estimates than CC.
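A classic improvement over CC is “adjusted classify and count” (ACC), which corrects the raw count using the classifier’s true and false positive rates estimated on held-out data: since the expected CC estimate is tpr·p + fpr·(1−p), one can solve for p. The sketch below simulates this for a binary problem with synthetic data; it is a simplified illustration of the idea, not a full quantification system:

```python
import random

def cc_and_acc(predictions, tpr, fpr):
    """predictions: list of 0/1 classifier outputs on the unlabelled set.
    tpr, fpr: true/false positive rates estimated on held-out labelled data.
    Returns (CC estimate, ACC estimate) of the positive-class prevalence.
    """
    p_cc = sum(predictions) / len(predictions)
    # ACC inverts the classifier bias: p_cc = tpr*p + fpr*(1 - p)
    p_acc = (p_cc - fpr) / (tpr - fpr)
    return p_cc, min(1.0, max(0.0, p_acc))  # clip to [0, 1]

# Simulate an unlabelled set whose true positive prevalence is 0.30,
# classified by an imperfect model with tpr = 0.80 and fpr = 0.10.
random.seed(42)
true_prev, tpr, fpr = 0.30, 0.80, 0.10
labels = [1 if random.random() < true_prev else 0 for _ in range(100000)]
preds = [1 if random.random() < (tpr if y == 1 else fpr) else 0
         for y in labels]

p_cc, p_acc = cc_and_acc(preds, tpr, fpr)
print(f"true prevalence: {true_prev}, CC: {p_cc:.3f}, ACC: {p_acc:.3f}")
```

CC lands near 0.8·0.30 + 0.1·0.70 = 0.31, overestimating the true 0.30, while ACC recovers it; the methods covered in the course refine this basic correction in many directions.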
The goal of this course is to introduce the main supervised learning techniques that have been proposed for solving quantification, the metrics used to evaluate them, and the most promising directions for further research.