(Big) Usage Data in Web Search

Ricardo Baeza-Yates, Yahoo! Research, Diagonal 177, p9, Barcelona, Spain. Email: rbaeza@acm.org
Ricardo Baeza-Yates is VP of Yahoo! Labs for Europe, Middle East and Latin America, leading the labs in Barcelona, Spain and Santiago, Chile. Until 2005, he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile, and ICREA Professor at the Dept. of Technology of University Pompeu Fabra in Barcelona, Spain. He is co-author of the bestselling textbook Modern Information Retrieval (Addison-Wesley), first published in 1999 with a second edition in 2011, as well as co-author of the second edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991, and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 200 other publications. He has been PC-Chair of the most important conferences in the field of Web Search and Web Mining. He has given tutorials at most major conferences many times, including SIGIR, WWW and VLDB. He is both an ACM Fellow and an IEEE Fellow.

Yoelle Maarek, Yahoo! Research, MATAM, Haifa 31905, Israel. Email: yoelle@ymail.com
Yoelle Maarek is the Senior Director of Yahoo! Research in Israel, leading research activities for Yahoo! Mail and Yahoo! Answers. Until 2009, Yoelle was the Director of Google Haifa Engineering Center, which she opened in 2006. There she led, among other things, the development of "Suggest", Google's query completion feature deployed on google.com and YouTube worldwide. Prior to this, Yoelle was with IBM Research, first in the US, and then in Israel. At IBM, she held a number of technical and management positions around search products and became a Distinguished Engineer. She received her PhD in Computer Science from Technion, in Haifa, Israel, in 1989, graduated from the "Ecole Nationale des Ponts et Chaussees", and received her DEA degree in Computer Science from Paris VI University, both in Paris, France, in 1985. Yoelle has served as senior PC member at most recent SIGIR, WWW and WSDM conferences, and as PC co-chair of WWW'2009, WSDM'2012 and SIGIR'2012. Yoelle is a member of the Board of Governors of Technion. She was appointed an ACM Distinguished Scientist in 2010.


Web Search, which has its roots in the mature field of information retrieval, has evolved tremendously over the last 20 years. The field underwent its first revolution when it started to deal with huge numbers of Web pages. A second major step came when the structure of the Web graph was taken into consideration and link analysis methods were invented to improve both crawling and ranking. Most recently, search engines have started to monitor and mine the signals provided by users while searching. In this tutorial we focus on this last step: exploiting usage data at a large scale. We will first consider the various forms usage data takes, such as query logs and click data. Then we will review the numerous key Web Search applications that it has made possible. Finally, we will discuss its limitations, and more specifically three factors that often pull in opposite directions when dealing with usage data: the size of the data, personalization needs and privacy concerns. We will conclude by offering some possible ways to circumvent some of these limitations through different types of aggregation.

This half-day tutorial is a sequel to "Web Retrieval: The Role of Users", a tutorial first offered at SIGIR'2010 in Geneva and then at WSDM'2011 and ECIR'2011. While this tutorial does not assume attendance at its predecessor, it should attract the same type of audience and hopefully some returning attendees. The tutorial will be conducted in the form of an advanced graduate class (minus the assignments). Active participation from the audience will be strongly encouraged.