The core task of an Internet Search Engine is to search over 2 billion items in less than 1/10 of a second and return a list of highly relevant results, given an average of only 2.2 query terms. A number of technologies have been brought to bear on this problem, drawing from distributed systems theory, computational linguistics, data mining, and information retrieval. This tutorial will survey these technologies, including large-scale Internet crawling strategies, Web page content analysis, result set ranking algorithms, and evaluation methodologies. The tutorial will assume a familiarity with IR concepts, such as vector-space matching, but will otherwise develop from the ground up all that is required to gain an operational understanding of Internet Search Engine fundamentals. The tutorial will also touch on active areas of research for Third Generation Internet Search Engines.

Jan Pedersen is currently Chief Scientist at AltaVista, a leading Internet search engine company. In that role Dr. Pedersen spearheads AltaVista's middle- and long-term product initiatives, including research and development efforts on user-centric, next-generation search technologies. Prior to joining AltaVista, Pedersen held senior technical management positions at several Silicon Valley start-ups in the Wireless Internet and CRM Analytics arenas. Dr. Pedersen also worked at Infoseek/Go Network, a first-generation Internet search engine, as Director of Search and Spidering, and at Verity, the leading enterprise search software vendor, as Manager of the Advanced Technology Group. Pedersen began his career at Xerox's Palo Alto Research Center (PARC). He holds a Ph.D. in Statistics from Stanford University and a BA in Statistics from Princeton University. He is credited with eleven issued patents and has authored twenty-three refereed publications on information access topics, seven of which appear in the proceedings of the Special Interest Group on Information Retrieval (SIGIR).