SIRIP

SIRIP will be held July 14-15, 10:00 AM – 4:00 PM EDT.

Keynotes

Beyond Being Accurate: Neural Modeling and Reinforcement Learning for Large-Scale Real-World Recommendation Problems – Ed Chi and Minmin Chen, Google

Zero to One Billion: The Path for a Rich Product Graph – Luna Dong, Amazon

Invited Talks

You can’t improve what you don’t measure: How to design robust online metrics – Widad Machmouchi, Microsoft

ML/IR in healthcare: Knowledge preserving transfer learning – Anitha Kannan, Curai

Top of Funnel Ad Retrieval – Roelof van Zwol, Pinterest

Wednesday, July 14

10:00 – 11:00: Keynote Talk 1: Ed Chi and Minmin Chen

Talk title: Beyond Being Accurate: Neural Modeling and Reinforcement Learning for Large-Scale Real-World Recommendation Problems

Talk abstract: Fundamental improvements in recommendation and ranking have been much harder to come by, when compared with recent progress on other long-standing AI problems such as visual/audio machine perception and machine translation. Some reasons include: (1) large amounts of data making training difficult, yet having (2) noisy and sparse labels; (3) changing dynamics of context such as user preferences and items; and (4) low-latency requirement for a recommendation response. Beyond that, one recent challenge is devising approaches to (5) learning more inclusive and robust models.

In this talk, we will touch upon many recent advances in neural modeling techniques for recommendations and their impact in Google products covering ~320 improvements over the last 3 years, including:

- policy gradient RL techniques with off-policy correction in recurrent recommendation models;
- multi-task models with gated mixture of experts;
- diversification and slate optimization with determinantal point processes;
- large output item spaces with Neural Deep Retrieval;
- utilizing TPUs for large sparse models;
- adversarial approaches for inclusiveness and robustness for Classifiers and Recommenders.

11:00 – 11:40: Invited Talk 1: Widad Machmouchi (Microsoft)

Talk title: You can’t improve what you don’t measure: How to design robust online metrics

Talk abstract: Online experimentation is becoming more and more popular. Controlled experiments with thousands or even millions of users are applied to establish causal relationships between a new treatment and a change in user behavior. Such A/B experimentation is now used widely in industries related to social media, e-commerce, online publishing, search engines, etc., which try to optimize for engagement, revenue, and user success, among other aspects. Metrics are one of the key factors in evaluating online controlled experiments. They help discern whether the treatment effect on users was desired or not and therefore guide the ship decisions of the teams building the new treatments. For that reason, good A/B metrics are of critical importance in order to make sound data-driven decisions. Yet it is very easy to build A/B metrics that suffer from undetected weaknesses and eventually point in the wrong direction, leading, unknowingly, to incorrect ship decisions. Great care has to be devoted to the proper design of A/B metrics that are expressive, robust, and trustworthy. In this talk, we discuss a few important lessons learnt while designing A/B ship metrics for products like Bing, VSCode, Office and Microsoft Search. Based on many years of experience, we elaborate on a number of important aspects that should be taken into account when designing online A/B metrics at scale.

Speaker bio: Widad Machmouchi is a Principal Data Science Manager in the AI Platform at Microsoft. She leads a team of data scientists in the Experimentation Platform group focusing on experimentation and user measurement. Widad develops tools and techniques that enable teams to make data-driven decisions and grow users, through trustworthy A/B testing, metric development, and user behavior modeling. She applies these techniques in multiple products covering many industries like web search (Bing), collaboration and productivity (Microsoft Office) and AI development (VSCode and Azure Machine Learning). In her free time, Widad is an angel investor via the Seattle Angel Conference, helping early-stage startups develop their business plans and raise funding. Widad holds a PhD in Theoretical Computer Science from the University of Washington, Seattle and is a co-founder of a technology hardware start-up.

11:40 – 12:00: Contributed Talk 1

Viet Ha-Thuc, Matthew Wood, Yunli Liu and Jagadeesan Sundaresan, From Producer Success to Retention: a New Role of Search and Recommendation Systems on Marketplaces

12:00 – 12:20: Contributed Talk 2

Shiri Dori-Hacohen, Keen Sung, Jengyu Chou and Julian Lustig-Gonzalez, Restoring Healthy Online Discourse by Detecting and Reducing Controversy, Misinformation, and Toxicity Online

12:20 – 1:30: Lunch

1:30 – 2:10: Invited Talk 2: Anitha Kannan (Curai)

Talk title: ML/IR in healthcare: Knowledge preserving transfer learning

Talk abstract: Telemedicine is a rapidly growing medium of interaction with the healthcare system. The current pandemic has led to a 100% increase in virtual urgent care visits and a greater than 4000% increase in virtual non-urgent care visits. Despite substantial growth in telemedicine, equitable access to the best healthcare continues to be a dream that is yet to be realized. In this talk, we discuss the vision of Curai for bridging the gap through machine learning as part of the medical team, and the technical challenges in bringing that vision to reality.

Much of the recent advancement in many machine learning applications can be attributed to the paradigm of transfer learning: the idea that we can repurpose large-scale pre-trained deep neural networks for specific tasks. As is, these models lack the fidelity required in the high-stakes application of healthcare. In this talk, we explore an integral component of transfer learning applied in healthcare: the need to encode medical knowledge and preserve medical correctness and completeness. We will present some of our research in (1) medical question answering, especially against the backdrop of COVID, and (2) medical conversation summarization, wherein we harness these models to incorporate medical knowledge by incorporating data from diverse sources such as medical texts and EHRs. These learned models are to be used within a data feedback loop with practitioners.

Speaker bio: Anitha Kannan is a founding member and Head of Machine Learning Research at Curai. At Curai, she leads research on innovative AI-driven solutions for equitable, best-quality healthcare accessible to all. Prior to Curai, she held senior research scientist positions at Facebook AI Research and at Microsoft Research. She holds a Ph.D. in machine learning from the University of Toronto and was a Darwin Fellow at the University of Cambridge, UK. She has published extensively in top-tier conferences and holds numerous patents.

2:30 – 2:50: Contributed Talk 3

Nikhil Rao, Learning with Little Data: Industry Challenges and Innovations

2:50 – 3:10: Break

3:30 – 4:20: Panel Discussion

Panelists: Andrei Broder, Mounia Lalmas-Roelleke, Haixun Wang, Fabrizio Silvestri, Jacopo Tagliabue, Stephen Lynch
Moderator: Ricardo Baeza-Yates

Thursday, July 15

10:00 – 11:00: Keynote Talk 2: Luna Dong (Amazon)

Talk title: Zero to One Billion: The Path for a Rich Product Graph

Talk abstract: Knowledge graphs have been used to support a wide range of applications and enhance search results for multiple major search engines, such as Google and Bing. At Amazon we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, Voice, that we wish to support, all present big challenges in constructing such a graph.

In this talk we describe our efforts for knowledge collection for products of thousands of types. We describe how we nail down the most important first step for delivering the data business: training high-precision models that generate accurate data. We then describe how we scale up the models with learning from limited labels, and how we increase the yields with multi-modal models and web extraction. We share the many learnings and lessons in building this product graph and applying it to support customer-facing applications.

Speaker bio: Xin Luna Dong is a Senior Principal Scientist at Amazon, leading the efforts of constructing the Amazon Product Knowledge Graph. She was one of the major contributors to the Google Knowledge Vault project, and has led the Knowledge-based Trust project, which is called the “Google Truth Machine” by The Washington Post. She co-authored the book “Big Data Integration”, and was awarded ACM Distinguished Member, the VLDB Early Career Research Contribution Award for “advancing the state of the art of knowledge fusion”, and the Best Demo award at SIGMOD 2005. She serves on the VLDB Endowment and the PVLDB advisory committee, and is a PC co-chair for WSDM 2022, VLDB 2021, KDD’2020 ADS Invited Talk Series, and SIGMOD 2018.

11:00 – 11:40: Invited Talk 3: Roelof van Zwol (Pinterest)

Talk title: Top of Funnel Ad Retrieval

Talk abstract: During this presentation I’ll give a short overview of ads marketplace design and principles, followed by a discussion of top of funnel ads retrieval methods and funnel optimization strategies.

Speaker bio: Roelof van Zwol is leading the Ads Quality engineering organization at Pinterest. Ads Quality is responsible for all ML models and algorithms powering the Ads Marketplace at Pinterest. The team works on optimizing long-term value for Pinners, Partners and Pinterest through ads delivery and marketplace optimization as well as through a number of advertiser-centric ML solutions like media planning, targeting, advertiser recommendations and privacy-preserving ML models for measurement. Previously, Roelof was a Director of Product Innovation at Netflix, working on Search, Core recommendations, Content Promotion and Programmatic Marketing. Prior to joining Netflix, Roelof managed the multimedia research team at Yahoo!, first from Barcelona, Spain, and later from Yahoo!’s headquarters in California. He started his career in academia as an assistant professor in the Computer Science department in Utrecht, the Netherlands, after finishing his PhD at the University of Twente in Enschede, the Netherlands.

11:40 – 12:00: Contributed Talk 4

Dmitri Goldenberg, Trending Challenges and Applications in Personalization

12:00 – 12:20: Contributed Talk 5

Xinlin Xia, Shang Wang, Han Zhang, Songlin Wang, Sulong Xu, Yun Xiao, Bo Long and Wen-Yun Yang, SearchGCN: Powering Embedding Retrieval by Graph Convolution Networks for E-Commerce Search

12:20 – 1:30: Lunch

1:30 – 2:00: Contributed Talk 6 and Demo

Feng-Lin Li, Zhongzhou Zhao, Qin Lu, Xuming Lin, Hehong Chen, Bo Chen, Liming Pu, Jiashuo Zhang, Fu Sun, Xikai Liu, Liqun Xie, Qi Huang, Ji Zhang and Haiqing Chen, AliMe Avatar: Multi-modal Content Production and Presentation for Live-streaming E-commerce

Guohai Xu, Yan Shao, Chenliang Li, Feng-Lin Li, Bin Bi, Ji Zhang and Haiqing Chen, AliMe DA: a Data Augmentation Framework for Question Answering in Cold-start Scenarios

2:00 – 2:20: Contributed Talk 7

Davide Liu, Alexandre Boulenger and George Farajalla, Transformer-based Banking Products Recommender System

2:20 – 2:40: Contributed Talk 8

Deepanshi Seth, Rukma Talwadker, Tridib Mukherjee, Usama Chitapure, Nagesh Adiga and Avantika Gupta, AI Based Information Retrieval System for Identifying Harmful Online Gaming Patterns

2:40 – 3:00: Coffee Break

3:00 – 4:00: Industry/Student Zoom Breakout Rooms

An informal interaction session for the SIGIR student community to get exposure to the nuances of industrial research and seek career guidance from industry experts. We have industry researchers from across three time zones with dedicated Zoom rooms, where students can join and have informal chats.

North America (3-4pm Eastern Time)
- Konstantina Christakopoulou (Google) (Zoom room 1)
- Julia Kiseleva (Microsoft) (Zoom room 2)
- Surya Kallumadi (Lowe’s Companies) (Zoom room 3)
- Praveen Chandar (Spotify) (Zoom room 4)
Europe (12-1pm British Summer Time)
- Jiyin He (Signal AI) (Zoom room 1)
- Anne Schuth (DPG Media) (Zoom room 2)
- Ridho Reinanda (Bloomberg) (Zoom room 3)
- Roi Blanco (Amazon) (Zoom room 4)
Asia (9:30-10:30am Indian Standard Time)
- Debdoot Mukherjee (Sharechat) (Zoom room 1)
- Vishwa Vinay (Adobe) (Zoom room 2)