This year’s SIGIR conference will again include an Industry Track, which will be held on Wednesday 31st July, during the regular conference program, and in parallel with two technical/scientific tracks. The Industry Track will be held at the Exam Hall (Public Theatre) at Trinity College Dublin.
Participation in the Industry Track is included as part of the main conference registration. A separate Industry Track-only registration is also available at €250 for attendees who only want to attend the Industry Track on Wednesday.
Please see below the full program of the Industry Track, together with speaker information and abstracts.
The conference co-chairs would like to thank Sue Feldman and James G. Shanahan, Industry Track Chairs, for their efforts in putting together a fantastic program!
Wednesday, July 31, 2013 - SIGIR 2013 Industry Track Program
Title: Web of Confusion: Why Everything in Web Search is Broken
Abstract: It’s the classic innovator’s dilemma: companies have built empires on the web of yesterday, matching keywords to ads and providing organic results. The challenge to which search companies must adapt is the makeup of the changing web – aspects relating to social, multimedia, device input, and geospatial – and how both IR models and the financial models that enable them must undergo a revolution. What will it take to reboot consumers’ expectations and demands of search and what are some roads the industry must take to get there?

Speaker Bio
Stefan Weitz - Senior Director, Bing Search, Microsoft Corporation
Stefan Weitz is a Senior Director of Search at Microsoft and is charged with working with people and organizations across the industry to promote and improve Search technologies. While focused on Microsoft's product line, he works across the industry to understand searcher behavior, academic developments, and innovations from all over and, in his role as an evangelist for Search, gathers and distills feedback to drive product improvements.
Prior to Search, Stefan led the strategy to develop the next generation MSN portal platform and developed Microsoft's muni WiFi strategy and implementation, leading the charge to blanket free WiFi access across metropolitan cities. Stefan has been writing code since he was 8 years old and is fluent in both hardware and software architecture, trends, and potentials. A 15-year Microsoft veteran, he has worked in various groups including Windows Server, Windows, Informatics Security, and licensing in roles ranging from development to program management, business development to marketing. Stefan holds a half-dozen patents in various disciplines and is a frequent lecturer to industry and academic groups on the future of information storage, retrieval, and usage. Stefan is a huge gadget 'junkie' and can often be found in electronics shops across the world looking for the elusive perfect piece of tech.
Stefan also serves on advisory boards for many startups ranging from biometrics to advertising to virtualization and is an active Angel investor. In his other spare cycles, he is working with national educational reinvention groups to reboot K-12 education in this country and is actively advising startups that are focusing on boosting student achievement through technology and big data. Finally, Stefan is working on a book with the nation’s youngest VC to promote entrepreneurism to the high-school crowd and is advising on how to make available 40 years of archived data from the 92nd Street Y in NYC.
Title: An Engaging Click
Abstract: A good search engine is one where users come very regularly, type their queries, get their results, and leave quickly. In terms of user engagement metrics from web analytics, this translates to a low dwell time, often a low CTR, but a very high return rate. But user engagement is not just about this. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and of course web analytics using online behavior metrics. These methods represent various trade-offs between the scale of the data analyzed and the depth of understanding. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected on a large scale but are more difficult to analyze. This talk will present various efforts aiming at combining approaches to measure engagement and seeking to provide insights into what makes an engaging experience. The talk will focus on what makes users click or not click, and what this means in terms of user engagement. This is joint work mainly with Mounia Lalmas and Janette Lehmann.

Speaker Bio
Ricardo Baeza-Yates - VP, Yahoo! Research Europe & Latin America
Ricardo Baeza-Yates is VP of Yahoo! Research for Europe and Latin America, leading the labs at Barcelona, Spain and Santiago, Chile, since 2006. Since 2005, he has also been a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. Until 2005 he was Professor and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile. He obtained a Ph.D. from the University of Waterloo, Canada, in 1989.
Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electrical engineering degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 300 other publications.
He has received the Organization of American States award for young researchers in exact sciences (1993) and the CLEI Latin American distinction for contributions to CS in the region (2009). In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences. In 2007 he was awarded the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
Title: Find and Be Found: Information Retrieval at LinkedIn
Abstract: LinkedIn has a unique data collection: the 200M+ members who use LinkedIn are also the most valuable entities in our corpus, which consists of people, companies, jobs, and a rich content ecosystem. Our members use LinkedIn to satisfy a diverse set of navigational and exploratory information needs, which we address by leveraging semi-structured and social content to understand their query intent and deliver a personalized search experience. In this talk, we will discuss some of the unique challenges we face in building the LinkedIn search platform, the solutions we've developed so far, and the open problems we see ahead of us.

Speaker Bio:
Shakti Sinha heads LinkedIn's search relevance team, and has been making key contributions to LinkedIn's search products since 2010. He previously worked at Google as both a research intern and a software engineer. He has an MS in Computer Science from Stanford, as well as a BS degree from College of Engineering, Pune.

Speaker Bio:
Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
About LinkedIn
LinkedIn operates the world’s largest professional network on the Internet with more than 200 million members worldwide, and two new members joining per second. The company is publicly held, and has a diversified business model with revenues coming from talent solutions, marketing solutions and premium subscriptions. Headquartered in Mountain View, LinkedIn is a global company, with members in over 200 countries, and two thirds of them outside the United States. LinkedIn members did over 5.7 billion professionally-oriented searches on the platform in 2012.
Title: Current and Future Needs for Commercial Search Engines
Abstract: There is a clash between openly available Web information and information secured inside Apps. The problem of siloed information, well known from the world of databases, poses similar problems for Enterprise Search engines. In this talk, we will discuss some of the problems we have encountered trying to bridge these information sources, what works, and what remains to be done.

Speaker Bio:
Gregory Grefenstette is Senior Strategist at Exalead, a division of Dassault Systèmes. He received his B.S. from Stanford University in 1978, and a Ph.D. in Computer Science from the University of Pittsburgh in 1993. He has been Principal Scientist at the Xerox Research Centre (1993-2001), with Clairvoyance (2001-2003) and at the French applied research centre, the CEA (2001-2008). His research interests range from most subjects in Natural Language Processing to all aspects of Information Retrieval. He serves on the editorial board of the Journal for Natural Language Engineering, and he edited the first book on Cross Language Information Retrieval (Kluwer 1998). His most recent book, written with Laura Wilber, is Search-based Applications: at the confluence of search and database technologies (Morgan Claypool, 2011).
Title: Information Retrieval for Electronic Discovery in Legal Cases
Abstract: Changes in the US Federal Rules of Civil Procedure in December 2006 led to an explosion in electronic discovery (e-discovery): the finding of electronically stored information to be turned over to parties in legal cases. Traditional manual review approaches (rooms full of low-paid lawyers and paralegals reading documents one at a time) have collapsed under this burden, spawning a multi-billion-dollar e-discovery software and services industry. Information retrieval technology, particularly supervised machine learning for text classification (referred to as "predictive coding" in e-discovery), plays a pivotal role. I will review the major technological and process challenges in e-discovery, the ways in which information retrieval has been brought to bear on these challenges, and results from benchmarking efforts (in particular the NIST TREC Legal Track) in this area. I will also briefly discuss open information research questions whose solutions might have a substantial impact on e-discovery practice.

Speaker Bio:
Dave Lewis Ph.D. (www.DavidDLewis.com) is a consulting computer scientist working in the areas of information retrieval, data mining, natural language processing, and the evaluation of complex information systems. He formerly held research positions at AT&T Labs, Bell Labs, and the University of Chicago. He has published more than 75 scientific papers and 8 patents, and is a Fellow of the American Association for the Advancement of Science. Dr. Lewis has served as a consulting and testifying expert on e-discovery issues in civil litigation, including in the Kleen Products, Actos, da Silva Moore, and FHFA cases.
Title: From Pre-Crime to Disaster Relief: Discovering the Power of Social Media Analytics
Abstract: With the massive amount of data being generated by social media networks like Twitter, organizations are exploring new and impactful ways to use the information. In this session, you'll hear from Rod Smith, IBM Fellow and Vice President of IBM's Emerging Internet Technologies group, and learn how IBM has been working on big data analytic projects that can benefit the public. Rod will discuss various solutions his group has created which analyze social media in various ways, ranging from helping disaster relief workers identify the areas most affected and route emergency supplies to those areas, to helping police departments identify potential crimes before they happen. Rod will also discuss how analytics can be used to find value and insight in big data and share lessons his team has learned while building first-of-a-kind solutions around big data analytics.

Speaker Bio:
Rod Smith is an IBM Fellow and Vice President of the IBM Emerging Internet Technologies organization, where he leads a group of highly technical innovators who are developing solutions to help businesses gain insight from big data. In his many years in the industry, Rod has moved IBM – and the industry – to a rapid adoption of technologies such as Web services, XML, Linux, J2EE, rich Internet applications, and various wireless standards. As an IBM Fellow, Rod is helping lead IBM's strategic planning around big data analytics and the application of IBM Watson-like technologies to business solutions, helping companies make better decisions more quickly for improved business outcomes.
Title: Online Controlled Experiments: Introduction, Insights, Scaling and Humbling Statistics
Abstract: The web provides an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments (e.g., A/B tests and their generalizations). From front-end user-interface changes to backend algorithms, online controlled experiments are now utilized to make data-driven decisions at Amazon, eBay, Facebook, Google, Intuit, LinkedIn, Microsoft, Netflix, Yahoo, Zynga, and at many other companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and to Gosset’s t-test at Guinness in Dublin, Ireland (where SIGIR is held this year), running online controlled experiments at scale (hundreds of concurrent experiments on a given day at Bing) has taught us many lessons. We provide an introduction and share real examples, key insights, cultural challenges, scaling challenges, and humbling statistics.

Speaker Bio
Ronny Kohavi is a partner architect in the Online Services Division at Microsoft. He joined Microsoft in 2005 and founded the Experimentation Platform team in 2006. He was previously the director of data mining and personalization at Amazon.com, and the Vice President of Business Intelligence at Blue Martini Software, which went public in 2000 and was later acquired by Red Prairie. Prior to joining Blue Martini, Kohavi managed the MineSet project, Silicon Graphics' award-winning product for data mining and visualization. He joined Silicon Graphics after getting a Ph.D. in Machine Learning from Stanford University, where he led the MLC++ project, the Machine Learning library in C++ used in MineSet and at Blue Martini Software. Kohavi received his BA from the Technion, Israel. He was the General Chair for KDD 2004, co-chair of KDD 99's industrial track with Jim Gray, and co-chair of the KDD Cup 2000 with Carla Brodley. He was an invited speaker at the National Academy of Engineering in 2000, a keynote speaker at PAKDD 2001, an invited speaker at KDD 2001's industrial track, and a keynote speaker at EC 10 (2010) and at RecSys 2012.
Title: Open Source Search FTW!
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.

Speaker Bio
Grant is the CTO and co-founder of LucidWorks as well as an active member of the Apache Lucene community – a Lucene and Solr committer, and co-founder of the Apache Mahout machine learning project. He is also the lead author of "Taming Text" from Manning Publications. Grant’s experience includes engineering a variety of search, question answering and natural language processing applications for a variety of domains and languages. Grant earned his bachelor's degree from Amherst College in Math and Computer Science and his master's degree in Computer Science from Syracuse University.
Title: Uniting Enterprise Search with Database Technology
Abstract: Five years ago Fast Search and Transfer (FAST), one of the leading enterprise search providers, was acquired by Microsoft. During the years before and continuing after the acquisition, the team has done extensive research and development to combine database technologies with search engines. The work has been done internally at FAST, as part of research programs like Pharos and iAD, and in collaboration with several universities. This effort has been driven by requirements from customers wanting consistency, flexibility, extensibility, extreme scale and a wide span of other novel features without compromising the usual traits of search engines like query speed, scalability, ad-hoc query capability and relevancy ordering.
The talk will cover how the team brought database logs, transactional consistency, column stores and a relational query engine into a search engine. The presentation will also discuss how these mechanisms enabled real-time indexing, a flexible relevancy model, distributed joins and an extensible evaluation engine that powers content processing, query processing and query evaluation. In addition, we will include a section on how it is possible to index unstructured, semi-structured and structured data in the same index without predefining a schema and still provide efficient and flexible query capabilities.

Speaker Bio
Øystein Torbjørnsen - Partner Program Manager, Microsoft Development Center Norway
Øystein Torbjørnsen works as an architect for enterprise search in the Microsoft Office Division and is part of the team developing search for SharePoint and Exchange. He joined Microsoft as part of the acquisition of Fast Search and Transfer in 2008, and his main focus is on core search engine technology, including performance, scalability and dependability. Before joining FAST, he co-founded Clustra Systems and was architect for a high-availability, real-time database management system. Clustra was later acquired by Sun Microsystems, where he held a position as Distinguished Engineer. Torbjørnsen has a Ph.D. in high-availability database management systems from the Norwegian University of Science and Technology in Trondheim, where he currently also holds a position as adjunct professor.
Title: Some of the Problems and Applications of Opinion Analysis
Abstract: Websays strives to provide the best possible analysis of online conversation to marketing and social media analysts. One of the obsessions of Websays is to provide "near-man-made" data quality at marginal costs. I will discuss how we approach this problem using innovative machine learning and UI approaches.

Speaker Bio
Hugo Zaragoza is the founding CEO of Websays, a company dedicated to the analysis of conversations and opinions online. Hugo has been a researcher at the frontier of Natural Language Processing, Machine Learning and Search (or Information Retrieval) for over ten years. At Yahoo! Research (Barcelona) Hugo led the Natural Language Retrieval group from 2006 to 2011. Research dealt mainly with applications of natural language processing to web search applications, in particular relevance ranking and algorithms for search over large and heavily annotated collections (our version of "semantic search"). From 2001 to 2006 Hugo worked at Microsoft Research (Cambridge, UK) with Stephen Robertson, where he explored applications of machine learning and natural language processing for information retrieval, in particular for corporate and web search, but also on document classification, expert finding, relevance feedback and dialogue generation for games.