Margin-based learning approaches, such as Support Vector Machines (SVMs) and Boosting, have substantially advanced both the understanding and the design of machine learning methods in recent years. They combine excellent empirical performance with a good theoretical understanding, particularly regarding the ability to learn accurately and efficiently in high-dimensional spaces. While these advances have led to improvements in text classification and related problems, other information retrieval tasks with similar properties have received little or no attention. The goal of this tutorial is threefold. First, it gives a self-contained introduction to the basics of SVMs and kernels, with an emphasis on their application to text classification. Second, it gives an overview of the "tricks-of-the-trade" for making SVMs work well on a given problem. Third, it introduces new Support Vector (SV) methods that have the potential to contribute to information retrieval tasks beyond text classification. In particular, the tutorial will cover SV methods for learning retrieval functions and for novelty detection, as well as kernels that operate directly on strings, circumventing a bag-of-words representation.

Thorsten Joachims is an Assistant Professor in the Department of Computer Science at Cornell University. His research interests lie in machine learning, statistical learning theory, and information access.