![]() |
||||||||||||||||||||||||||||||||||||||||||
![]() |
||||||||||||||||||||||||||||||||||||||||||
|
The Web graph, meaning the graph induced by Web pages as nodes and their hyperlinks as directed edges, has become a fascinating object of study for many people: physicists, sociologists, mathematicians, computer scientists, and information retrieval specialists. Recent results range from theoretical (e.g.: models for the graph, semi-external algorithms), to experimental (e.g.: new insights regarding the rate of change of pages, new data on the distribution of degrees), to practical (e.g.: improvements in crawling technology). The goal of this talk is to convey an introduction to the state of the art in this area and to sketch the current issues in collecting, representing, analyzing, and modeling this graph. Although graph analytic methods are essential tools in the Web IR arsenal, they are well known to the SIGIR community and will not be discussed here in any detail; instead, we will explore some challenges and opportunities for using IR methods and techniques in the exploration of the Web graph, in particular in dealing with legitimate and "spam" perturbations of the "natural" process of birth and death of nodes and links, and conversely, the challenges and opportunities of using graph methods in support of IR on the Web and in the enterprise. Andrei Broder is an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis in IBM Research. From 1999 until early 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. Previously he has been a senior member of the research staff at Compaq's Systems Research Center in Palo Alto. He was graduated Summa cum Laude from Technion, the Israeli Institute of Technology, and obtained his M.Sc. and Ph.D. in Computer Science at Stanford University under Don Knuth. His main research interests are the design, analysis, and implementation of randomized algorithms and supporting data structures, in particular in the context of web-scale information retrieval and applications. Broder is co-winner of the Best Paper award at WWW6 (for his work on duplicate elimination of web pages) and at WWW9 (for his work on mapping the web). He has published more than seventy papers and was awarded sixteen patents. He serves as chair of the IEEE Technical Committee on Mathematical Foundations of Computing. |