SIGIR 2020 Summer School, Xi’an, China

Time Zone: GMT+8
Saturday, July 25
  • Susan Dumais (8:00pm-9:30pm in Boston)
  • Jimmy Lin (9:30pm-11:00pm in Waterloo)
  • break
  • Luke Zettlemoyer (8:30pm-10:00pm in Seattle)
  • break
  • Mark Sanderson (5:00pm-6:30pm in Melbourne)
  • Maarten de Rijke (10:30am-12:00pm in Amsterdam)
  • break
  • Mounia Lalmas (1:00pm-2:30pm in London)

Personalized Search

8:00am-9:30am, Susan Dumais

Abstract: Traditionally, Web search engines returned the same results to everyone who asked the same question. However, using a single ranking for everyone in every context at every point in time limits how well a search engine can do in providing relevant information. In this talk, I present a framework to quantify the "potential for personalization", which is used to characterize the extent to which different people have different intents for the same query. I describe several examples of how different types of contextual features are represented and used to improve search quality for individuals and groups. Finally, I conclude by highlighting important challenges in developing personalized systems at Web scale, including privacy, transparency, serendipity, and evaluation.

Susan Dumais is a Technical Fellow at Microsoft, Director of the Microsoft Research Labs in New England, New York City and Montréal, and an adjunct professor at the University of Washington. Prior to joining Microsoft, she was a Member of Technical Staff at Bell Labs where she developed Latent Semantic Analysis, an early word embedding technique for search. Her current research focuses on personalization, email search, and large-scale behavioral log analysis.

She has worked with several Microsoft product groups (Bing, Windows Search, SharePoint, and Office Help) on search-related innovations, and holds several patents on novel retrieval algorithms and interfaces. Susan has published widely in the fields of information retrieval, human-computer interaction, and cognitive science. She is an ACM Fellow, was elected to the CHI Academy, the SIGIR Academy, the National Academy of Engineering (NAE) and the American Academy of Arts and Sciences (AAAS), and received the SIGIR Gerard Salton Award for lifetime achievement in information retrieval, the ACM Athena Lecturer Award for fundamental contributions to computer science, the Tony Kent Strix Award for outstanding contributions to information retrieval, the ACM SIGCHI Lifetime Research Award for lifetime achievement in human-computer interaction, and the Lifetime Achievement Award from the Indiana University Department of Psychological and Brain Sciences.

Natural Language Processing and Information Retrieval: Together at Last

9:30am-11:00am, Jimmy Lin

Abstract: It is an intuitive hypothesis that techniques from natural language processing (NLP) should improve information retrieval (IR). Surely, attempts to understand the meaning of texts would help systems better provide users with relevant information? Yet, the history of IR is littered with ideas from NLP that intuitively "should work", but never panned out, at least with the implementations at the time. Two examples from the 1990s include word sense disambiguation (WSD) and linguistic indexing.

In this talk, I will discuss my own personal journey over the past two decades grappling with this hypothesis, culminating in recent advances in transformer architectures as applied to information access problems. BERT, the most famous of these models, is, I believe, the first NLP technique that has unequivocally improved information access. Thus, we are at an exciting and unique moment where NLP and IR have come together at last, producing a fertile research landscape full of interesting possibilities. In the hopes of stirring up some discussion, I'll offer the following slogan: NLP makes IR interesting and IR makes NLP useful.

Professor Jimmy Lin holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. Lin's research aims to build tools that help users make sense of large amounts of data. His work mostly lies at the intersection of information retrieval and natural language processing, with a focus on data-driven approaches and infrastructure issues. Although most of Lin's work deals with text, he has also worked on relational data, semi-structured data, log data, speech, and graphs. From 2010 to 2012, Lin spent an extended sabbatical at Twitter, where he worked on services designed to connect users with relevant content and on analytics infrastructure to support data science. He currently serves as the Chief Scientist of, a Waterloo-based startup that aims to build deep natural language understanding technologies to facilitate seamless dialogues between users and systems.

Recent Advances in Language Model Pre-training

11:30am-1:00pm, Luke Zettlemoyer

Abstract: Language models can be pre-trained at a very large scale by noising and then reconstructing any input text. Existing methods, based on variations of masked language models, have transformed the field and now provide the de facto initialization to be tuned for nearly any NLP task. In this tutorial, I will review recent work in language model pre-training, from ELMo and BERT to more recent models. I will aim for broad coverage but provide more details on the models we have been developing recently in FAIR Seattle. These include, in particular, pre-training methods for sequence-to-sequence models, such as BART, mBART, and MARGE, which provide some of the most generally applicable approaches to date.

Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Scientist at Facebook. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. Honors include multiple paper awards, a PECASE award, and an Allen Distinguished Investigator Award. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.

Offline Evaluation of IR systems

3:00pm-4:30pm, Mark Sanderson

Abstract: Evaluation of information retrieval (IR) systems is a critically important part of reporting in the IR research community. In this tutorial, I will provide an overview of the main offline and online techniques that are used to evaluate search engines. Topics covered in the tutorial will include the classic test-collection-based approach, statistical significance testing, user evaluation, ethics approval, A/B testing, and the recent innovation in counterfactual evaluation, where offline collections can be induced from online click data. At the end of this tutorial, students will understand the principles, methods, and resources available for evaluation. I will also briefly discuss how users can get access to data to enable them to evaluate.

Mark Sanderson is Professor of Information Retrieval at RMIT University, where he is head of the RMIT Information Retrieval (IR) group. Mark received his Ph.D. in Computer Science from the University of Glasgow, United Kingdom, in 1997. He has raised over $10 million in grant income, published hundreds of papers, and has over 9,000 citations to his work. He has 25 current and/or past PhD students. In collaboration with one student, Mark was the first to show the value of snippets, a component of search interfaces which is now a standard feature of all search engines. One of Mark's papers was given an honourable mention at SIGIR's 2017 test of time awards. Mark has been co-editor of Foundations and Trends in Information Retrieval; associate editor of IEEE TKDE, ACM TOIS, ACM TWeb, and IP&M; and served on the editorial boards of IRJ and JASIST. Mark was general chair of ACM SIGIR in 2004. He was a PC chair of ACM SIGIR 2009 & 2012, and ACM CIKM 2017. Prof Sanderson is also a visiting professor at NII in Tokyo.

Information Retrieval as Interaction

4:30pm-6:00pm, Maarten de Rijke

Abstract: Modern Information Retrieval (IR) systems, such as search engines, recommender systems, and conversational agents, are best thought of as interactive systems. And their development is best thought of as a two-stage development process: offline development followed by continued online adaptation and development based on interactions with users. In this lecture, I will sketch a rich landscape of offline and online topics that any student interested in IR system development should be familiar with. I will discuss IR scenarios such as search, recommender systems, conversational interaction, and topics such as query and interaction mining and understanding, offline, counterfactual and online evaluation, and offline, counterfactual and online learning to rank.

Maarten de Rijke is University Professor of Artificial Intelligence and Information Retrieval at the University of Amsterdam. He is also VP Personalization and Relevance and Senior Research Fellow at Ahold Delhaize. His research strives to build intelligent technology to connect people to information. His team pushes the frontiers of search engines, recommender systems and conversational assistants. They also investigate the influence of the technology they develop on society. De Rijke is the director of the Innovation Center for Artificial Intelligence.

Music recommendations (research) at Spotify

8:00pm-9:30pm, Mounia Lalmas

Abstract: The aim of the Personalization mission at Spotify is “to match fans and artists in a personal and relevant way”. In this talk, I will describe some of the (research) work to achieve this, from using machine learning to metric validation and evaluation methodology. I will describe work done in the context of Home and Search.

Mounia Lalmas is a Director of Research at Spotify, and the Head of Tech Research in Personalization. Mounia also holds an honorary professorship at University College London. Before that, she was a Director of Research at Yahoo, where she led a team of researchers working on advertising quality for Gemini, Yahoo's native advertising platform. She also worked with various teams at Yahoo on topics related to user engagement in the context of news, search, and user generated content. Prior to this, she held a Microsoft Research/RAEng Research Chair at the School of Computing Science, University of Glasgow. Before that, she was Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London. Her work focuses on studying user engagement in areas such as native advertising, digital media, social media, search, and now audio. She has given numerous talks and tutorials on these and related topics, including a WWW 2019 tutorial on 'Online User Engagement: Metrics and Optimization', which will also be given at KDD 2020. She is regularly a senior programme committee member at conferences such as WSDM, KDD, WWW and SIGIR. She was co-programme chair for SIGIR 2015, WWW 2018 and WSDM 2020. She is also the co-author of a book written as the outcome of her WWW 2013 tutorial on 'measuring user engagement'.