SIGIR 2003 Morning Tutorial
Text Mining - State of the Art
July 28, 9:00-12:30 at the Tom Thomson Room, Hilton Hotel
Ronen Feldman

The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, while the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available in a matter of a few keystrokes.

Text mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, information extraction, text categorization, visualization, and knowledge management. Text mining is the process of building up networks of interconnected objects through various relationships in order to discover patterns and trends. The main tasks of text mining are to extract, discover, and link together sparse evidence from a vast number of data sources; to represent and evaluate the significance of the related evidence; and to learn patterns that guide the extraction, discovery, and linkage of entities.

Text mining involves preprocessing the document collection (text categorization, term extraction, and information extraction), integrating it with structured information sources, storing the resulting intermediate representations, analyzing those representations (distribution analysis, clustering, trend analysis, association rules, etc.), and visualizing the results.

In this tutorial we will present the general theory of text mining and demonstrate several systems that apply these principles to enable interactive exploration of combined structured and unstructured collections. We will present a general architecture for text mining systems and outline the algorithms and data structures behind them. The tutorial will cover the state of the art in this rapidly growing area of research, and several real-world applications of text mining will be presented.
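To make the pipeline described above more concrete, here is a minimal sketch of two of the steps it names: reducing each document to a set of concept labels (a crude keyword-matching stand-in for term or information extraction) and then analyzing those labels with distribution analysis and pairwise association rules scored by support and confidence. The `CONCEPTS` vocabulary, the toy `DOCS` collection, the function names, and the thresholds are all hypothetical illustrations chosen for this example; they are not the systems or algorithms demonstrated in the tutorial.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary: concept label -> trigger keywords.
# A real system would use term extraction or information extraction instead.
CONCEPTS = {
    "merger": {"merger", "acquisition", "acquires"},
    "layoffs": {"layoffs", "job cuts"},
    "telecom": {"telecom", "wireless"},
    "banking": {"bank", "banking"},
}

# Hypothetical toy document collection.
DOCS = [
    "Wireless carrier announces merger with rival telecom firm",
    "Telecom giant acquires startup; layoffs expected",
    "Bank reports record profits",
    "Telecom merger approved by regulators",
    "Banking group announces job cuts after acquisition",
]


def extract_concepts(text: str) -> frozenset:
    """Label a document with every concept whose keyword occurs in it (naive substring match)."""
    lowered = text.lower()
    return frozenset(
        concept
        for concept, keywords in CONCEPTS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def concept_distribution(docs, context: str) -> Counter:
    """Count the other concepts that co-occur with `context` across the collection."""
    dist = Counter()
    for labels in map(extract_concepts, docs):
        if context in labels:
            dist.update(labels - {context})
    return dist


def association_rules(docs, min_support=0.4, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules between pairs of concepts."""
    labelled = [extract_concepts(d) for d in docs]
    n = len(labelled)
    single = Counter(c for labels in labelled for c in labels)
    pairs = Counter(
        pair for labels in labelled for pair in combinations(sorted(labels), 2)
    )
    rules = []
    for (a, b), count in pairs.items():
        support = count / n  # fraction of documents containing both concepts
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]  # P(rhs | lhs) over the collection
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    print(concept_distribution(DOCS, "merger"))
    # Counter({'telecom': 3, 'layoffs': 2, 'banking': 1})
    for lhs, rhs, s, c in association_rules(DOCS):
        print(f"{lhs} => {rhs}  support={s:.2f} confidence={c:.2f}")
    # layoffs => merger  support=0.40 confidence=1.00
    # merger => telecom  support=0.60 confidence=0.75
    # telecom => merger  support=0.60 confidence=1.00
```

Both analyses work on the same intermediate representation (one concept set per document), which mirrors the separation between preprocessing and analysis described in the abstract; swapping the keyword matcher for a proper extractor would leave the analysis functions unchanged.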

Ronen Feldman is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel and the director of the Data Mining Laboratory. He received his B.Sc. in mathematics, physics, and computer science from the Hebrew University, his M.Sc. in computer science from Bar-Ilan University, and his Ph.D. in computer science from Cornell University in New York. He was an adjunct professor at NYU's Stern School of Business. He is the founder and president of ClearForest Corporation, a New York-based company specializing in the development of text mining tools and applications.