Full-day Tutorials
From Design to Analysis: Conducting Controlled Laboratory Experiments with Users.
Diane Kelly (University of North Carolina at Chapel Hill)
Anita Crescenzi (University of North Carolina at Chapel Hill)
Room E, Sunday, July 17, 9:00-10:30; 11:00-12:30; 14:00-15:30; 16:00-17:30.
This full-day tutorial will (1) provide general instruction on how to design laboratory IIR experiments with human participants, with an emphasis on control; (2) describe different data collection methods and procedures, with an emphasis on self-report measures and scales, and on techniques for establishing the validity and reliability of these measures; (3) introduce the use of statistical power analysis for sample size estimation in experiments with users; and (4) introduce and demonstrate two data analysis techniques, multilevel modeling and structural equation modeling, for estimating the effects of variables present in IIR experiments.
The audience for this tutorial includes students and researchers who are interested in conducting laboratory experiments with human participants. We will assume participants have had little formal training in experimental design for laboratory user studies, but have a basic understanding of terminology such as variables and hypotheses. For the data analysis section, we will assume participants have a basic understanding of descriptive and inferential statistics and hypothesis testing.
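As a rough illustration of the sample size estimation mentioned in item (3), the sketch below uses the normal approximation for a two-sided comparison of two independent groups: n per group = 2 * ((z_(1-alpha/2) + z_power) / d)^2, where d is the expected standardized effect size (Cohen's d). The between-subjects design, the function name `n_per_group`, and the defaults alpha = 0.05 and power = 0.80 are assumptions for illustration, not necessarily the procedure taught in the tutorial; within-subjects and multilevel designs need different calculations.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two groups, two-sided test).

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the expected standardized effect size (Cohen's d).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Conventional small, medium and large effects.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # prints 0.2 393, then 0.5 63, then 0.8 25
```

Because it ignores the t distribution, this approximation slightly underestimates the exact answer: t-based power calculations give 64 rather than 63 per group for d = 0.5.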
Diane Kelly is a Professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Her research and teaching interests are in interactive information search and retrieval, information search behavior, and research methods. Kelly is the recipient of a Francis Carroll McColl Term Professorship at UNC, the 2014 ASIST Research Award and the 2013 British Computer Society’s IRSG Karen Spärck Jones Award. She is the recipient of two teaching awards: the 2009 ASIST/Thomson Reuters Outstanding Information Science Teacher Award and the 2007 SILS Outstanding Teacher of the Year Award. Kelly received a Ph.D. in information science and a Graduate Certificate in cognitive science from Rutgers University and an undergraduate degree in psychology from the University of Alabama.
Anita Crescenzi is a PhD student at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Her research interests are in interactive information retrieval, information seeking and information behavior. She also has a special interest in research methods and statistical analysis methods, including multilevel modeling and structural equation modeling. Crescenzi teaches graduate-level courses in systems analysis and information behavior. She has eight years of experience conducting applied user research, including lab-based usability evaluation, and three years of teaching experience. Crescenzi has a master’s degree in information science from UNC and a bachelor’s degree in education from the University of Illinois.
Succinct Data Structures in Information Retrieval: Theory and Practice.
Simon Gog (Karlsruhe Institute of Technology)
Rossano Venturini (University of Pisa)
Room A, Sunday, July 17, 9:00-10:30; 11:00-12:30; 14:00-15:30; 16:00-17:30.
Tutorial page here.
Simon Gog is a researcher at Karlsruhe Institute of Technology (KIT), Germany. His main research area is the design of practical index data structures, with a focus on compact and succinct solutions. He received his Ph.D. in 2011 from Ulm University. In his thesis, titled “Compressed Suffix Trees: Design, Construction, and Applications”, he improved compressed suffix trees for applications in Bioinformatics. A practical byproduct of his thesis is the Succinct Data Structure Library (SDSL), which was further developed and adapted to Information Retrieval applications at the University of Melbourne. SDSL is available on GitHub and has been used successfully in many projects in Bioinformatics, Natural Language Processing and Information Retrieval. Simon regularly teaches graduate courses at KIT. In the past year he taught the courses Advanced Data Structures and Text Indexing, and he also gave several invited tutorials about SDSL at different venues.
Rossano Venturini is a researcher at the Computer Science Department, University of Pisa. He received his Ph.D. from the Computer Science Department of the University of Pisa in 2010 with a thesis titled “On Searching and Extracting Strings from Compressed Textual Data”. His research interests are mainly focused on the design and analysis of algorithms and data structures, with special attention to problems of indexing and searching large textual collections. He received two Best Paper Awards at ACM SIGIR, in 2014 and 2015. Rossano regularly teaches both undergraduate and graduate courses at the Department of Computer Science at the University of Pisa. In the past year, he taught an undergraduate course on Algorithms and Data Structures, a graduate course on Algorithms for Big Data, two graduate courses introducing Computer Science with Python, and a graduate course on Information Retrieval.
Morning Tutorials
Online Learning to Rank for Information Retrieval.
Artem Grotov (University of Amsterdam)
Maarten de Rijke (University of Amsterdam)
Room Fermi, Sunday, July 17, 9:00-10:30; 11:00-12:30.
Search engines have developed into complex systems that combine hundreds of ranking criteria with the aim of producing the optimal result list in response to users’ queries. Traditionally, learning to rank algorithms are trained in batch mode, on a dataset of query-document pairs with associated, manually created relevance labels. Creating such datasets is expensive and therefore infeasible for smaller search engines, such as those of small web stores. In some settings, such as personalized search, it may be impossible for experts to annotate documents at all. Moreover, the relevance of documents to queries can change over time, as in news search. Online learning to rank addresses all of these issues by incrementally learning from user feedback in real time. Online learning is closely related to active learning, incremental learning, and counterfactual learning. However, online learning is more difficult because one has to balance exploration and exploitation: actions with unknown performance have to be explored to learn better solutions, at the cost of not always exploiting the best solution found so far. There is a growing body of established methods for online learning to rank for information retrieval, and the time is right to organize and present this material to a broad audience of interested information retrieval researchers. Prerequisite knowledge: basic familiarity with machine learning; basic knowledge of statistics and probability theory, including the basics of statistical estimation; and basic programming skills to follow the examples.
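The exploration/exploitation trade-off described above can be illustrated with a toy epsilon-greedy bandit that chooses among a few candidate rankers and learns from simulated clicks. This is a generic bandit strategy, not one of the online learning to rank methods covered in the tutorial (real methods typically compare rankings through interleaving or update ranker weights directly); the click probabilities, `epsilon`, and the function name are hypothetical.

```python
import random


def epsilon_greedy_rankers(click_probs, rounds=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy selection among candidate rankers.

    click_probs are hypothetical per-ranker click probabilities used only to
    simulate user feedback; the learner never sees them directly.
    Returns how often each ranker was shown and its estimated click rate.
    """
    rng = random.Random(seed)
    n = len(click_probs)
    shows = [0] * n
    clicks = [0] * n
    for _ in range(rounds):
        untried = [i for i in range(n) if shows[i] == 0]
        if untried:
            arm = untried[0]                 # show every ranker at least once
        elif rng.random() < epsilon:
            arm = rng.randrange(n)           # explore: ranker of uncertain quality
        else:
            # exploit: ranker with the best estimated click rate so far
            arm = max(range(n), key=lambda i: clicks[i] / shows[i])
        shows[arm] += 1
        if rng.random() < click_probs[arm]:  # simulated user click
            clicks[arm] += 1
    estimates = [clicks[i] / shows[i] if shows[i] else 0.0 for i in range(n)]
    return shows, estimates


shows, estimates = epsilon_greedy_rankers([0.30, 0.60, 0.45])
print(shows, [round(e, 2) for e in estimates])
```

Raising `epsilon` spends more traffic on exploration: the learner finds out about unfamiliar rankers faster, but users are shown worse rankers more often.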
Artem Grotov is a PhD candidate at the Informatics Institute of the University of Amsterdam. He works on online learning to rank as well as other topics that deal with interpreting data obtained from user interactions with interactive systems. His work on click models and on evaluating search engines based on logged user interactions has been published at SIGIR 2015 and CLEF 2015. Grotov has helped teach multiple courses on Information Retrieval and has supervised the theses of Bachelor’s and Master’s students working on algorithmic information retrieval topics.
Maarten de Rijke is a Professor of Computer Science at the Informatics Institute of the University of Amsterdam. Together with a team of PhD students and postdocs, he works on problems in semantic search and in online and offline learning to rank for information retrieval. Some of their recent work on online and offline learning to rank has been (or will be) published at ICML 2014, WSDM 2014, CIKM 2014, WSDM 2015, SIGIR 2015, NIPS 2015, WSDM 2016, ECIR 2016, and WWW 2016. De Rijke has taught extensively at all levels, from public lectures on search engine technology to advanced tutorials aimed at PhD students and researchers. Recent tutorials include those at SIGIR 2015 and WSDM 2016.
Deep Learning for Information Retrieval.
Hang Li (Huawei Technologies)
Zhengdong Lu (Huawei Technologies)
Room Pacinotti, Sunday, July 17, 9:00-10:30; 11:00-12:30.
The tutorial consists of three parts. In the first part, we introduce the fundamental techniques of deep learning for natural language processing and information retrieval, such as word embeddings, recurrent neural networks, and convolutional neural networks. In the second part, we explain how deep learning, particularly representation learning techniques, can be applied to fundamental NLP and IR problems, including classification, structured prediction, matching, and translation. In the third part, we describe in detail how deep learning can be used in specific application tasks: search, question answering (from either documents or knowledge bases), and image retrieval.
The tutorial is set at an intermediate level. Attendees are assumed to have some knowledge of machine learning and information retrieval, but they are not required to know much about deep learning.
Hang Li is director of the Noah’s Ark Lab of Huawei Technologies and an adjunct professor at Peking University and Nanjing University. He is an ACM Distinguished Scientist. His research areas include information retrieval, natural language processing, statistical machine learning, and data mining. Hang graduated from Kyoto University in 1988 and earned his PhD from the University of Tokyo in 1998. He worked at the NEC lab as a researcher from 1991 to 2001, and at Microsoft Research Asia as a senior researcher and research manager from 2001 to 2012. He joined Huawei Technologies in 2012. Hang has published three technical books and more than 100 technical papers at top international conferences, including SIGIR, WWW, WSDM, ACL, EMNLP, ICML, NIPS and SIGKDD, and in top international journals, including CL, NLE, JMLR, TOIS, IRJ, IPM, TKDE, TWEB and TIST.
Zhengdong Lu is a senior researcher at Noah’s Ark Lab, Huawei Technologies. His research interests are neural network-based methods for natural language processing, including dialogue, machine translation, semantic parsing, and reasoning. Previously, he was an associate researcher at Microsoft Research Asia and a postdoctoral researcher at the University of Texas at Austin, after receiving his Ph.D. in computer science from Oregon Health and Science University in 2008. He has published over 30 papers in prestigious journals and conferences, including NIPS, ICML, ACL, KDD, IJCAI and AAAI, among them over 10 recent papers on deep learning methods for NLP and AI.
Constructing and Mining Web-Scale Knowledge Graphs.
Evgeniy Gabrilovich (Google Research)
Nicolas Usunier (Facebook)
Room C, Sunday, July 17, 9:00-10:30; 11:00-12:30.
slides of the tutorial: here
Evgeniy Gabrilovich is a senior staff research scientist at Google, where he works on improving healthcare. Prior to joining Google in 2012, he was a director of research and head of the natural language processing and information retrieval group at Yahoo! Research. Evgeniy is an ACM Distinguished Scientist and a recipient of the 2014 IJCAI-JAIR Best Paper Prize. He is also a recipient of the 2010 Karen Spärck Jones Award for his contributions to natural language processing and information retrieval. He earned his PhD in computer science from the Technion – Israel Institute of Technology.
Nicolas Usunier is a research scientist at Facebook AI Research. Before joining Facebook in 2015, he was an associate professor at Université de Technologie de Compiègne (UTC, France), holding a higher-education chair from the CNRS, the French National Center for Scientific Research. Prior to his position at UTC, he was an associate professor at Université Pierre et Marie Curie in Paris, where he received his PhD in computer science in 2006. His main areas of research are large-scale learning of embeddings and learning to rank, in particular with applications to knowledge bases. He served as an area chair for NIPS 2014 and reviews for major machine learning conferences and journals such as NIPS, ICML, JMLR and the Machine Learning journal.
Temporal Information Retrieval.
Nattiya Kanhabua (Aalborg University)
Avishek Anand (L3S Research Center & University of Hannover)
Room Galilei, Sunday, July 17, 9:00-10:30; 11:00-12:30.
This tutorial will provide a comprehensive overview of temporal IR approaches, covering the processing of dynamic content, temporal information extraction, temporal query analysis, and time-aware retrieval and ranking. We will explain the general aspects of temporal dynamics with a focus on the web domain, from content and structural changes to variations in user behavior and interactions. We will begin with the pre-processing of a temporal document collection, such as dynamic crawling and temporal indexing, in order to support temporal queries. Later, we will discuss particular aspects of temporal information extraction, for instance, identifying and extracting temporal information that can be leveraged in retrieval. In the second session, we will describe research issues centered on determining the temporal intent of queries and on time-aware query enhancement, namely temporal relevance feedback and time-aware query reformulation. Next, we will explain current approaches to time-aware retrieval and ranking, which can be classified by two main notions of relevance with respect to time: recency-based ranking and time-dependent ranking. In addition, we will present applications in related research areas, e.g., exploration, summarization and clustering of search results, as well as future event retrieval and prediction. Finally, we will conclude the tutorial and outline future directions.
This tutorial targets graduate students, junior researchers and practitioners in the field of information retrieval. Prospective participants should have a basic knowledge of search processes, in particular web crawling, document indexing, query analysis, and retrieval and ranking. Background in other research areas, e.g., natural language processing, is helpful but not mandatory.
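To make recency-based ranking concrete, here is one simple scheme, assumed for illustration: interpolate a document's text relevance score with an exponential decay in its age. The weight, the 7-day half-life, the example documents, and the function name `recency_score` are hypothetical choices, not values or methods from the tutorial.

```python
def recency_score(text_score, age_days, weight=0.3, half_life_days=7.0):
    """Blend text relevance with an exponential recency prior (illustrative)."""
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 when new, 0.5 after one half-life
    return (1 - weight) * text_score + weight * decay


# (name, text relevance score, age in days); made-up example documents
docs = [("old-but-relevant", 0.9, 60), ("fresh", 0.7, 1), ("stale", 0.7, 30)]
ranked = sorted(docs, key=lambda doc: recency_score(doc[1], doc[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['fresh', 'old-but-relevant', 'stale']
```

With `weight=0` this reduces to plain text-based ranking; larger weights favour recent documents more strongly, which suits queries with a recency intent such as news search.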
Nattiya Kanhabua is an assistant professor at the Department of Computer Science, Aalborg University, Denmark, with several years of research experience in information retrieval, data mining, machine learning, statistical and predictive analytics, and temporal analysis. Her research interests are information retrieval, Web and social media mining, and web archiving. She has worked on several EU-funded projects, e.g., (1) ForgetIT – Concise Preservation by Combining Managed Forgetting and Contextualized Remembering, (2) ALEXANDRIA, an ERC Advanced Grant Project on Foundations for Temporal Retrieval, Exploration and Analytics in Web Archives, and (3) Medical Ecosystem: Personalized Event-based Surveillance. She has published her research at top-tier conferences such as SIGIR, WSDM, CIKM, JCDL and ECIR.
Avishek Anand is a postdoctoral researcher at the L3S Research Center in Hannover, Germany. His research interests lie at the intersection of retrieval, mining, and data management aspects of temporal Web collections such as Web archives, Wikipedia and news collections. He did his PhD at the Department of Databases and Information Systems, Max Planck Institute for Informatics, Saarbruecken, where he worked on indexing and query processing approaches for supporting temporal text workloads. Currently, he is working on retrieval models for historical intents, tag-based search over archives, and mining methods for enriching Wikipedia using news collections. He has published his research at several top-tier conferences, such as SIGIR, WSDM, CIKM, ICDE and EDBT.
Collaborative Information Seeking: Art and Science of Achieving 1+1>2 in IR.
Chirag Shah (Rutgers University)
Room D, Sunday, July 17, 9:00-10:30; 11:00-12:30.
Tutorial page here.
Chirag Shah is an assistant professor in both the School of Communication & Information (SC&I) and the Department of Computer Science at Rutgers University, USA. Shah received a PhD in Information Science from the University of North Carolina (UNC) at Chapel Hill. He holds an MTech in Computer Science & Engineering from IIT Madras, India, and an MS in Computer Science from UMass Amherst. His research interests include various aspects of interactive information retrieval/seeking, especially in the context of online social networks and collaborations, resulting in two streams of work: Collaborative Information Seeking (CIS) and Social Information Seeking (SIS). He directs the InfoSeeking Lab at Rutgers, and his research is supported by grants from NSF, IMLS, Google, and Yahoo! He has published two books on CIS, one as sole author (2012) and the other as co-editor (2015). He was a guest editor for the IEEE Computer special issue on CIS published in March 2014.
Afternoon Tutorials
Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement.
Thorsten Joachims (Cornell University)
Adith Swaminathan (Cornell University)
Room Pacinotti, Sunday, July 17, 14:00-15:30; 16:00-17:30.
Prerequisites:
– This tutorial is aimed at an audience with intermediate experience in information retrieval.
– Familiarity with standard IR methods, applications and evaluation metrics is assumed; these are only briefly reviewed.
– A basic understanding of probability theory and introductory statistics is sufficient for most of the tutorial.
– Some topics require a basic understanding of machine learning.
– All code samples demonstrating counterfactual analysis for IR will be in Python3.
– Participants who wish to run these demos locally must bring a device capable of running Python3 scripts.
Thorsten Joachims is a Professor in the Department of Computer Science and in the Department of Information Science at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in information retrieval and recommendation. His past research focused on learning to rank, learning with preferences, learning from implicit feedback, text classification, and structured output prediction. He is an ACM Fellow, AAAI Fellow and Humboldt Fellow.
Adith Swaminathan is a PhD candidate in the Department of Computer Science at Cornell University, advised by Prof. Thorsten Joachims. His research interests are at the core of this tutorial, focusing on principles and algorithms for off-policy evaluation and learning for retrieval and recommendation systems. He received a BTech degree in Computer Science and Engineering from IIT Bombay in 2010 and an MSc in Computer Science from Cornell University in 2014.
Simulation of Interaction: A Tutorial on Modelling and Simulating User Interaction and Search Behaviour.
Leif Azzopardi (University of Glasgow)
Room Fermi, Sunday, July 17, 14:00-15:30; 16:00-17:30.
In this tutorial, we aim to provide researchers with an overview of simulation, detailing the various types of simulation and the models of search behavior used to simulate interaction, along with models of querying, stopping, selecting documents and marking documents. Throughout the tutorial, we will describe various studies and how they have used simulation to explore different behaviours and aspects of the search process. The final section of the tutorial will be dedicated to “best practice”: how to build, ground and validate simulations. The tutorial will conclude with a demonstration of an open source simulation framework that can be used to develop various kinds of simulations. An optional follow-on session will be offered to participants who want to learn more about the toolkit and how to build simulations.
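As a minimal sketch of the kind of user model such simulations combine, the function below simulates one user examining a single ranked list top-down. It pairs a click (selection) model, with separate probabilities for relevant and non-relevant documents, with a stopping rule that gives up after a fixed number of consecutive non-relevant documents. Both models, their parameter values, and the function name are assumptions for illustration; querying is not modeled, and this is not the open source framework demonstrated in the tutorial.

```python
import random


def simulate_session(relevance, p_click_rel=0.8, p_click_nonrel=0.1,
                     frustration=3, seed=0):
    """Simulate one user scanning a ranked list from the top.

    relevance: 0/1 judgments for the documents, in rank order.
    Stopping rule (assumed): stop after `frustration` consecutive
    non-relevant documents, or at the end of the list.
    Returns (number of documents examined, list of clicked ranks).
    """
    rng = random.Random(seed)
    examined, clicked, nonrel_run = 0, [], 0
    for rank, rel in enumerate(relevance, start=1):
        examined = rank
        # Selection model: click with a relevance-dependent probability.
        if rng.random() < (p_click_rel if rel else p_click_nonrel):
            clicked.append(rank)
        # Stopping model: track the run of consecutive non-relevant documents.
        nonrel_run = 0 if rel else nonrel_run + 1
        if nonrel_run >= frustration:
            break
    return examined, clicked


qrels = [1, 0, 1, 0, 0, 0, 1, 1]
# Deterministic user: always clicks relevant, never clicks non-relevant.
print(simulate_session(qrels, p_click_rel=1.0, p_click_nonrel=0.0))  # (6, [1, 3])
```

Running many seeded sessions over different rankings, and varying `frustration`, is one way such a model can be used to compare how systems fare under different assumed user behaviours.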
Dr. Leif Azzopardi is a Senior Lecturer in the School of Computing Science at the University of Glasgow, within the Glasgow Information Retrieval Group. His research focuses on building formal models for Information Retrieval, usually drawing on other disciplines for inspiration, such as Quantum Mechanics, Operations Research, Microeconomics, Transportation Planning and Gamification. In 2010, he co-organized an ACM SIGIR Workshop on the Simulation of Interaction; he delivered a keynote on the ‘Assimilation of Users’ at the ACM SIGIR 2013 Workshop on Modeling User Behaviour for Evaluation and a keynote on ‘Usor Economicus’ at CORIA 2015. He has given numerous lectures and invited talks on simulation at various universities and at the Information Foraging Summer School (2010-2012), as well as a series of tutorials on Retrievability (ECIR 2015, ACM SIGIR 2014, ACM ICTIR 2015) and on Formal Models of Information Seeking, Search and Retrieval (ACM SIGIR 2015, ACM CIKM 2015).
Question Answering with Knowledge Base, Web and Beyond.
Scott Wen-tau Yih (Microsoft Research)
Hao Ma (Microsoft Research)
Room C, Sunday, July 17, 14:00-15:30; 16:00-17:30.
Scott Wen-tau Yih is a Senior Researcher at Microsoft Research Redmond. His research interests include natural language processing, machine learning and information retrieval. Yih received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign. His work on joint inference using integer linear programming (ILP) helped the UIUC team win the CoNLL-05 shared task on semantic role labeling, and the approach has since been widely adopted in the NLP community. Since joining MSR in 2005, he has worked on email spam filtering, keyword extraction and search & ad relevance. His recent work focuses on continuous semantic representations, with applications in lexical semantics, knowledge base embedding and question answering. Yih received the best paper award at CoNLL-2011 and an outstanding paper award at ACL-2015, and has served as program co-chair (CEAS-09, CoNLL-14), area chair and co-chair (HLT-NAACL-12, ACL-14, EMNLP-16) and action editor (Transactions of ACL).
Hao Ma is a Researcher at Microsoft Research, Redmond, WA, USA. He obtained his Ph.D. in Computer Science from The Chinese University of Hong Kong. His research interests include Information Retrieval, Natural Language Processing, Machine Learning, Recommender Systems and Social Network Analysis. Most recently, Dr. Ma has been working on entity-related research problems and applications. He designed the core learning algorithms that powered both Bing’s and Microsoft’s entity experiences, including question answering, entity recommendation, attribute ranking, interpretation, exploration, and carousel ranking. He has published more than 40 research papers in prestigious conferences and journals, including WWW, SIGIR, WSDM, AAAI, TOIS, TKDE, TMM and TIST. Some of his research has been widely reported by popular news media such as MIT Technology Review and Search Engine Land. Dr. Ma was also a member of the team that won the Microposts Entity Linking Challenge at WWW 2014.
Instant Search – A hands-on tutorial.
Ganesh Venkataraman (LinkedIn)
Viet Ha-Thuc (LinkedIn)
Dhruv Arya (LinkedIn)
Room Galilei, Sunday, July 17, 14:00-15:30; 16:00-17:30.
Ganesh Venkataraman currently leads several search quality efforts at LinkedIn. His contributions at LinkedIn include leading the end-to-end re-architecture of job search, machine-learned ranking for people search typeahead (the system that allows members to search among 400MM+ users via instant results), and introducing machine-learned ranking for skills search at LinkedIn (e.g., searching for people skilled in ‘information retrieval’). He co-authored a paper on personalized ranking that won the best paper award at the IEEE Big Data Conference 2015. Prior to LinkedIn, he was the founding engineer of a payments startup, where he developed algorithms to detect and prevent e-commerce fraud. He holds a Ph.D. in Electrical & Computer Engineering from Texas A&M, where he received the Dean’s graduate merit scholarship. He has several publications with 200+ citations, including one fundamental contribution to graph theory.
Viet Ha-Thuc leads machine learning efforts for improving search quality at LinkedIn. He has played a key role in designing and implementing machine-learned ranking for personalized search and federation across several verticals at LinkedIn. His work on LinkedIn search has been published at conferences such as CIKM, Big Data and WWW. One of these publications received the Best Application Paper Award at IEEE Big Data 2015. Prior to LinkedIn, he was a scientist in the Content Understanding group at Yahoo! Labs, where he developed a machine learning system for extracting relevant entities and concepts from text documents. The system was deployed to annotate every email and news article in the Yahoo! ecosystem. He received a Ph.D. in Computer Science from the University of Iowa in 2011.
Dhruv Arya currently focuses on improving job search quality at LinkedIn. His goal is to apply machine learning and data mining approaches to build talent-matching algorithms that connect job seekers with the most relevant jobs. He has also made key contributions to query understanding and rewriting, whole-page optimization, and personalized federated search, which was presented at CIKM 2015. He received a Master’s degree in Computer Science from the University of Pennsylvania in 2013.
Scalability and Efficiency Challenges in Large-Scale Web Search Engines.
B. Barla Cambazoglu (Independent Researcher)
Ricardo Baeza-Yates (Universitat Pompeu Fabra & Universidad de Chile, Santiago)
Room D, Sunday, July 17, 14:00-15:30; 16:00-17:30.
slides of the tutorial: here
Berkant Barla Cambazoglu (Independent Researcher) received his BS, MS, and PhD degrees, all in computer engineering, from the Computer Engineering Department of Bilkent University in 1997, 2000, and 2006, respectively. After receiving his PhD, he worked briefly as a postdoctoral researcher at Bilkent University. In 2006, he joined the Biomedical Informatics Department of the Ohio State University as a postdoctoral researcher. In 2008, he joined Yahoo Labs as a postdoctoral researcher, becoming a research scientist there in 2010 and a senior research scientist in 2012. Between 2013 and 2015, he was a senior manager, heading the web retrieval group at Yahoo Labs Barcelona. His main research interests are distributed information retrieval and web search efficiency. He co-organized the LSDS-IR workshop in 2010, 2011, 2014, and 2015. He was the proceedings chair for WSDM’09 and the poster and proceedings chair for ECIR’12. He served as an area chair for SIGIR’13 and SIGIR’14. He regularly serves on the program committees of the SIGIR, WWW, and KDD conferences and currently serves on the editorial board of IP&M. He has published many papers in prestigious journals, including IEEE TPDS, JPDC, JASIST, Inf. Syst., ACM TWEB, and IP&M, as well as papers and tutorials at top-tier conferences such as SIGIR, CIKM, WSDM, WWW, and KDD.
Ricardo Baeza-Yates (Universitat Pompeu Fabra, Barcelona, Spain & Universidad de Chile, Santiago, Chile) has expertise in information retrieval, web search and data mining, data science and algorithms. He was VP of Research at Yahoo Labs, based in Sunnyvale, California, from August 2014 to February 2016. Before that, he founded the Yahoo Labs in Barcelona and Santiago de Chile and led them from 2006 to 2015. Between 2008 and 2012 he also oversaw the Haifa lab, and he started the London lab in 2012. He is a part-time Professor at the Dept. of Information and Communication Technologies (DTIC) of the Universitat Pompeu Fabra (UPF) in Barcelona, Spain, as well as at the Dept. of Computing Science (DCC) of Universidad de Chile in Santiago. During 2005 he was an ICREA research professor at UPF. Until 2004 he was Professor and founding director of the Center for Web Research at Universidad de Chile. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before that, he obtained two master’s degrees (M.Sc. CS & M.Eng. EE) and an electronics engineering degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999, with a second, enlarged edition in 2011 that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures (Addison-Wesley, 1991) and co-editor of Information Retrieval: Algorithms and Data Structures (Prentice-Hall, 1992), among more than 500 other publications. From 2002 to 2004 he served on the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council.
He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he became the first computer scientist elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. He was named an ACM Fellow in 2009 and an IEEE Fellow in 2011.