David D. Lewis
(www.DavidDLewis.com)
Abstract:
Logistic regression is a flexible, effective approach to supervised learning of classifiers. It is most notably and widely used in text classification and text mining, but has been applied to almost every information retrieval and natural language processing problem. I will present logistic regression from several points of view (statistics, machine learning, neural networks, and maximum entropy), so that attendees can draw on good ideas and good software from each field. I will also point out how the basic ideas of logistic regression have been adapted to produce structured approaches such as conditional random fields and probabilistic relational models, giving attendees a leg up on understanding these techniques.
I will discuss a number of practical issues in applying logistic regression to high-dimensional data in general, and to textual data in particular. I will put special emphasis on methods, including Bayesian priors, that use domain knowledge to reduce the amount of training data needed to achieve good effectiveness. I will also discuss the range of software available for applying logistic regression, and the strengths and weaknesses of the software approaches taken in different fields.
Bio:
David D. Lewis (www.DavidDLewis.com) is a freelance computer scientist working in the areas of information retrieval, text mining, machine learning, and natural language processing. He has consulted for startups, corporations of all sizes, investors, universities, nonprofits, government agencies, and law firms. He previously held research positions at AT&T Labs, Bell Labs, and the University of Chicago. Dave has published more than fifty scientific papers and holds six patents on information retrieval and text mining technology. He was elected a Fellow of the American Association for the Advancement of Science in 2006. He is co-designer and project manager for the widely used open source C++ Bayesian logistic regression packages BBR, BMR, and BXR (www.bayesianregression.org), and is currently collaborating on a new Java-based logistic regression package designed for embedding in streaming data systems.