Information Retrieval for E-Discovery


Bio:

David D. Lewis, Ph.D., is a consultant working in the areas of information retrieval, text mining, machine learning, natural language processing, and the statistical evaluation of complex information systems. He has published more than 75 scientific papers and 8 patents, and is a Fellow of the American Association for the Advancement of Science. Dr. Lewis was one of the co-founders of the TREC Legal Track, the first large-scale open evaluation of information retrieval technology for electronic discovery in legal cases. He has designed algorithms and processes for e-discovery software and service companies, and has served as a consulting expert and expert witness on e-discovery issues in US Federal Court cases.

Summary:

This tutorial will introduce the IR issues posed by the "discovery" process in legal cases, i.e., the requirement that parties turn over documents responsive (relevant) to the issues in dispute. The explosion of electronically stored information has created a multi-billion dollar market in technology and services for discovery of electronically stored information (e-discovery). The tutorial is intended both for practitioners interested in operational e-discovery issues and for researchers looking for new research problems prompted by the needs of this industry.

I will discuss the discovery process, the scale and diversity of materials to be searched, and the economics of identifying and reviewing potentially responsive material. I will then focus on three major IR areas: search, supervised learning (including text classification), and relevance assessment. For each, I will discuss the technologies used in e-discovery, the evaluation methods applicable to measuring effectiveness, and research results that have not yet been adopted in commercial practice. I will also outline research directions in IR for e-discovery, particularly those where progress can be made without access to "realistic" test collections. Connections will be drawn with the use of IR in related tasks, such as enterprise search, criminal investigations, intelligence analysis, historical research, truth and reconciliation commissions, and freedom of information (open records or sunshine law) requests.