---------------------------------------
PRE-CONFERENCE TUTORIALS
---------------------------------------

Sunday, July 9, 1995

SIGIR tutorials provide an opportunity to learn the basics of information retrieval or to learn a new or specialized area from experts in the field. This year six half-day tutorials are available prior to the main program, held in parallel sessions during the morning and afternoon. Separate payment is required for tutorials.

Morning Tutorials: 8:30 a.m. - 12:00 p.m.

INTRODUCTION TO INFORMATION RETRIEVAL
PETER WILLETT and PETER INGWERSEN
University of Sheffield and Royal School of Librarianship, Copenhagen

This tutorial will provide an overview of the two principal current approaches to the searching of text databases. After a brief introduction to the characteristics of information retrieval that differentiate it from other types of database searching, the first part of the tutorial will describe the algorithms and data structures that are needed to maximize the effectiveness and the efficiency of ranked-output approaches to information retrieval. The second part will summarize work on cognitive approaches that focus on the role of the user and of the knowledge resources involved in information retrieval.

Peter Willett holds a Personal Chair in Information Science at the University of Sheffield, where he heads a large research group studying novel techniques for searching biological, chemical and textual databases. He is a Member of the British Computer Society, a Fellow of the Institute of Information Scientists and was the recipient of the 1993 Skolnik Award of the American Chemical Society for his contributions to chemical information science.

Peter Ingwersen is Head of the Department of IR Theory at the Royal School of Librarianship, Denmark. His main research interest is the development of cognitive aspects and understanding of IR.
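The ranked-output approach described in the Introduction to Information Retrieval tutorial rests on an inverted file and a scoring loop. As a minimal sketch, assuming whitespace tokenisation and a plain tf x idf weight (one common choice, not necessarily the weighting the tutorial covers), the two pieces might look like this:

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(docs):
    """Map each term to a postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index

def rank(query, index, n_docs):
    """Score documents by summed tf * idf over the query terms, best first."""
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        postings = index.get(term, [])
        if not postings:
            continue  # term absent from the collection contributes nothing
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "chemical structure searching",
    "d2": "text retrieval and text searching",
    "d3": "cognitive models of the user",
}
index = build_inverted_file(docs)
print(rank("text searching", index, len(docs)))
```

Only documents sharing at least one term with the query are touched, which is the efficiency argument for the inverted file over a sequential scan.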
Ingwersen is a Fellow of the Library Association and a member of the Institute of Information Scientists, from which he received the 1993 Jason Farradane Award. He was the recipient of the 1994 ASIS/NJ Distinguished Lectureship Award for his contributions to the field, among which is his book "Information Retrieval Interaction".

QUERY--DOCUMENT SYMMETRY AND DUALITY
STEPHEN ROBERTSON, City University, London

This tutorial will discuss certain formal aspects of modelling IR systems. The discussion may be of interest to students, and perhaps to some established researchers, involved in developing specific mathematical or logical models of IR as part of their research (for example, those investigating or experimenting on probabilistic, other statistical, linguistic or AI approaches to IR). The central argument is that presented in a paper recently published in the Journal of Documentation (vol. 50, 1994, pp. 233-238), which will be given as a text for the tutorial, together with some additional notes. The format will involve presentation by the instructor with frequent invitations to the audience to contribute to the discussion. The discussion will cover:

* examples of similar situations where symmetry is more clearly present (e.g., in matching people seeking work against vacant posts);
* discussion of the interpretation of symmetry in the case of IR, and associated difficulties;
* examples of dual models in IR (e.g., the traditional Boolean model versus a model in which queries are lists of terms and documents are Boolean expressions);
* discussion of the difficulty of enriching a model by combining it with its dual.

The basic argument is model-independent and can be used in the context of very different models or approaches to IR. The tutorial should therefore be of interest to a variety of people.

Stephen Robertson has a first degree in mathematics and a doctorate in information science, and has been publishing in information retrieval since 1969.
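The dual-model example in the Query--Document Symmetry and Duality description can be made concrete. In the sketch below, which assumes a hypothetical nested-tuple encoding of Boolean expressions (not a notation taken from the tutorial or the Journal of Documentation paper), one evaluator serves both models; only the roles of query and document are swapped:

```python
def evaluate(expr, terms):
    """Evaluate a Boolean expression against a set of terms.

    An expression is either a term (str), a tuple ("AND" | "OR", left, right),
    or a tuple ("NOT", operand).
    """
    if isinstance(expr, str):
        return expr in terms
    op = expr[0]
    if op == "AND":
        return evaluate(expr[1], terms) and evaluate(expr[2], terms)
    if op == "OR":
        return evaluate(expr[1], terms) or evaluate(expr[2], terms)
    if op == "NOT":
        return not evaluate(expr[1], terms)
    raise ValueError(f"unknown operator: {op}")

def boolean_match(query_expr, doc_terms):
    """Traditional Boolean model: the query is an expression, the document a term set."""
    return evaluate(query_expr, doc_terms)

def dual_match(query_terms, doc_expr):
    """Dual model: the query is a term set, the document an expression."""
    return evaluate(doc_expr, query_terms)

# Usage with hypothetical terms.
print(boolean_match(("AND", "retrieval", ("NOT", "database")), {"probabilistic", "retrieval"}))
print(dual_match({"text", "retrieval"}, ("OR", "chemistry", ("AND", "text", "retrieval"))))
```

The symmetry is visible in the code: `boolean_match` and `dual_match` differ only in which argument plays the role of the expression.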
Robertson's main areas of specialization are the evaluation of IR systems and probabilistic models in IR. He is joint director of the Centre for Interactive Systems Research, which is the home of the Okapi experimental system. He is also head of the Department of Information Science at City University, London.

WHAT DIFFERENCES ARE SIGNIFICANT? STATISTICAL ANALYSIS OF IR TESTS
JEAN TAGUE-SUTCLIFFE, JAMES BLUSTEIN and PAUL KANTOR
University of Western Ontario and Rutgers University

The TREC tests and conferences have stimulated interest in the statistical analysis of the results of information retrieval tests. Essentially, statistical analysis answers the question: which differences in retrieval results between different systems or strategies are significantly established in a test situation, in the sense that they are unlikely to have arisen merely as the result of random variation over query sets? A number of statistical tests can appropriately answer this question; which test is appropriate depends on the nature of the data: its scale, variability, and distributional features. In this tutorial we will present a number of these tests, using the TREC-3 data for examples, and provide an opportunity for attendees to gain familiarity with the tests by using a customized software package.

Jean Tague-Sutcliffe is Dean and Professor at the Graduate School of Library and Information Science, University of Western Ontario. She is a well-known writer and speaker on the design and analysis of information retrieval tests. She is currently developing a suite of statistical tests for TREC-like results, which will be made available to IR researchers worldwide by the National Institute of Standards and Technology. She teaches in the areas of information systems, research methods, and statistical techniques. Recently, Academic Press published her book "Measuring Information: an Information Services Perspective".
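One of the simplest tests of this kind is the paired sign test over a query set. The sketch below is illustrative only, not the customized software package mentioned above; it assumes per-query effectiveness scores (for instance, average precision) for two systems on the same queries, discards ties, and computes an exact two-sided binomial p-value:

```python
from math import comb

def sign_test(scores_a, scores_b):
    """Two-sided paired sign test on per-query scores.

    Ties are discarded. Returns (wins_a, wins_b, p_value).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same queries")
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0  # every query tied: no evidence either way
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)
```

For example, if system A outperforms B on 9 of 10 queries with no ties, then k = 1 and the p-value is 2 x (1 + 10)/1024, about 0.021. The sign test ignores the size of each difference; tests that use it (paired t, Wilcoxon) make stronger assumptions about scale and distribution, which is exactly the trade-off the tutorial description raises.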
Paul Kantor is Professor at the School of Communication, Information and Library Studies at Rutgers, the State University of New Jersey (USA). He is an internationally recognized authority on the evaluation of the costs and benefits of library and information services, and the author of more than 80 refereed papers and technical reports. He has participated in the TREC-2, TREC-3 and TREC-4 conferences. He teaches Quantitative Research Methods and Statistics, as well as Information Retrieval Techniques, at Rutgers SCILS. In 1994 he received the SCILS Research Award.

James Blustein is a doctoral student in the Department of Computer Science at the University of Western Ontario. His main research interest is the creation and evaluation of hypertext.

Afternoon Tutorials: 1:30 p.m. - 5:30 p.m.

EVALUATION OF IR SYSTEMS
WILLIAM HERSH and MICHELINE HANCOCK-BEAULIEU
Oregon Health Sciences University and City University, London

The aim of this tutorial is to provide an overview and critical assessment of information retrieval system evaluation. Until now, the Cranfield approach to IR evaluation, with its recall and precision measures, has dominated retrieval testing. Developments in end-user information systems such as CD-ROMs, hypertext public access systems, and the Internet are presenting new evaluation challenges. The tutorial will start with basic research concepts and their application in IR evaluation. Approaches adopted in various classic retrieval experiments will be presented and their limitations discussed. More recent evaluative studies conducted at City University London, Oregon Health Sciences University, and TREC will be used to illustrate efforts towards more user-centered evaluation. The final discussion will sum up the issues and consider future directions in accommodating both system- and user-oriented evaluation in IR.

William Hersh is Assistant Professor of Medicine and Medical Informatics at Oregon Health Sciences University in Portland, Oregon.
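The recall and precision measures at the heart of the Cranfield approach, mentioned in the Evaluation of IR Systems description, are simple set ratios. A minimal sketch for a single query, assuming the retrieved and relevant documents are given as collections of (hypothetical) document identifiers:

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    recall    = |relevant intersect retrieved| / |relevant|
    precision = |relevant intersect retrieved| / |retrieved|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Four documents retrieved, three judged relevant, two in common.
print(recall_precision(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"]))
```

Recall needs the full set of relevant documents for each query, which is one reason these measures are hard to apply to the large, user-driven systems the tutorial mentions.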
Hersh's main research interests are in the areas of automated indexing, evaluation methodologies for end-user searching, and data extraction from the electronic medical record. While his evaluation work initially focused on the medical domain, the problems encountered have led him to confront issues of evaluation more generally.

Micheline Hancock-Beaulieu is Professor of Information Science and co-director of the Centre for Interactive Systems Research at City University in London. The Centre is concerned with the design and evaluation of advanced retrieval systems and has been responsible for the development of Okapi, a system based on a probabilistic term-weighting model and one of the leading participants in TREC. Her research interests are in evaluation methodology, information-seeking behaviour, and human-computer interaction in IR.

DESIGNING INFORMATION FOR THE COMPUTER SCREEN
PAUL KAHN, Dynamic Diagrams

The tutorial will focus on the issues of visual orientation in hypermedia and information retrieval software environments. The presentation method will be a combination of slide lecture and interactive demonstration of materials developed by the instructor and others. The overall goal is to help participants see and articulate the elements of good screen design. The purpose of the tutorial is as much consciousness-raising about the value and vocabulary of design as it is a collection of practical tips. This tutorial is intended for computer professionals responsible for or working with information to be read on the computer screen. No formal background (or talent) in design is required.

Paul Kahn has training in literature and typography and has worked with a variety of electronic publishing systems since 1977. From 1985 through 1994 he worked on the development of hypermedia materials at Brown University's Institute for Research in Information and Scholarship, where he served as project coordinator and director.
Kahn is president of Dynamic Diagrams, an information design studio that specializes in information graphics and electronic publications. Dynamic Diagrams provides design services for a broad range of computer applications, from electronic textbooks and reference materials to specialized graphics and telecommunications applications.

DATA FUSION IN IR
PAUL KANTOR, Rutgers University

Data fusion (DF) comprises methods for improving retrieval (or indexing) performance by combining the outputs of several distinct methods for performing the task at hand. In contrast to combination-of-evidence methods, data fusion is not limited to combining inputs compatible with a specific conceptual framework. Thus data fusion can deal with "black box" components such as proprietary systems. In addition, data fusion develops its own assessments of the power of the component systems and uses these to develop optimal fusion rules. The component systems need not be presumed to produce stochastically independent results. This tutorial will be self-contained, will develop the basic ideas of data fusion in IR, and will survey the growing array of results from diverse applications of DF in IR.

Paul Kantor is Professor at the School of Communication, Information and Library Studies at Rutgers, the State University of New Jersey (USA). He received the 1994 SCILS Research Award. He is an internationally recognized authority on the evaluation of the costs and benefits of library and information services, and the author of more than 80 refereed papers and technical reports. He has participated in the TREC-2, TREC-3 and TREC-4 conferences, and was an invited participant in the 1992 National Engineering Foundation/CLR conference on a National Engineering Information System and in the 1993 NSF workshop on the Digital Libraries initiative.
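To give a flavour of combining "black box" outputs as described in the Data Fusion in IR tutorial, the sketch below implements the widely used CombSUM and CombMNZ rules over min-max normalised scores. This is a simpler stand-in, not the fusion method the tutorial develops: the optional per-system `weights` are a hypothetical placeholder for assessments of each component's power, and these rules do not model dependence between the component systems.

```python
def normalise(run):
    """Min-max normalise one system's scores (doc -> score) to [0, 1]."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def fuse(runs, weights=None, mnz=True):
    """Combine several runs into one ranking, best first.

    With mnz=False this is a weighted CombSUM; with mnz=True the summed score
    is multiplied by the number of runs that retrieved the document (CombMNZ).
    """
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need one weight per run")
    totals, counts = {}, {}
    for run, w in zip(runs, weights):
        for doc, s in normalise(run).items():
            totals[doc] = totals.get(doc, 0.0) + w * s
            counts[doc] = counts.get(doc, 0) + 1
    fused = {doc: totals[doc] * (counts[doc] if mnz else 1) for doc in totals}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Two hypothetical systems whose raw scores are on different scales.
run_a = {"d1": 12.0, "d2": 8.0, "d3": 2.0}
run_b = {"d2": 0.9, "d4": 0.7, "d1": 0.1}
print(fuse([run_a, run_b]))
```

Normalisation matters because the components' raw scores are not comparable; only the position of each document within its own run is used.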
Prior to 1991 Kantor worked extensively on the application of data fusion concepts to distributed detection and decision in connection with the Strategic Defense Initiative and related programs. He teaches Information Retrieval Techniques at Rutgers SCILS, and has recently developed, with H. Hirsh, a graduate Computer Science course on Information Retrieval in the Networked Environment. His degrees are in Mathematics and Physics, from Columbia University and Princeton University.

---------------------------------------
POST-CONFERENCE RESEARCH WORKSHOPS
---------------------------------------

Thursday, July 13, 1995
8:30 a.m. - 3:00 p.m.

The conference will be followed by five parallel one-day workshops. Separate payment is required for a workshop. Registration includes a coffee break and lunch.

VIRI: VISUAL INFORMATION RETRIEVAL INTERFACES
______________________________________________
A Research Workshop

A visual information retrieval interface (VIRI) is defined as one that uses graphic elements in addition to text to aid the solution of a problem related to information storage and retrieval. More than twenty such interfaces already exist, with different retrieval models, graphical metaphors, and user interactions. Furthermore, the interfaces have different strengths, for example in retrieval, browsing, or document classification. The focus of the workshop is to exchange information and to begin developing a method for comparing these interfaces.

Researchers and practitioners who are actively working on VIRI projects are particularly invited to participate. Some effort will be put into developing a classification scheme for VIRIs and identifying major research issues related to visual interfaces. Following this, the discussion will center on identifying test collections and developing experimental tasks and measures that will provide a sound basis for comparing and evaluating the interfaces.

Program Committee:
Robert R. Korfhage, University of Pittsburgh
Xia Lin, University of Kentucky
David S. Dubin, University of Pittsburgh

Workshop attendees should submit a statement about their interests or views related to VIRI, and describe their interfaces briefly during the workshop. Requests for further information and submission of interest statements should be addressed to:

Z39.50 AND THE IR RESEARCH COMMUNITY
_________________________________________
A Research Workshop

The Z39.50 computer-to-computer retrieval protocol is an increasingly mature US national standard (version 3 is in the ballot process as of early 1995); it is widely implemented in the US and is increasingly seeing use internationally, particularly in Europe. Z39.50 is potentially of great importance to the IR research community for several reasons:

* Because Z39.50 provides a means of separating a user interface from a retrieval system, it allows research in clients and user interfaces to proceed independently from research in back-end retrieval engines and, of particular importance, allows new user interfaces to be tested against very large production databases. It also allows new experimental retrieval systems to be offered to large user communities through familiar interfaces.
* Z39.50 can form the linkage between a number of large-scale research projects that involve the IR community, such as the various Digital Library efforts.
* Z39.50 raises, and provides a concrete framework to explore, a number of important research issues in its own right: the design of interoperable clients and servers for information retrieval, the representation and exchange of metadata about information servers, and related matters.
The workshop has several goals:

* To introduce the broad IR community to Z39.50, including its history, its current status, its function, and implementation progress;
* To highlight several IR research projects that are exploiting Z39.50 today;
* To sketch some of the research issues that are raised by Z39.50.

After an introduction outlining the history of Z39.50 and the current status of implementations, a short tutorial will explain the operation of the protocol. The second part of the workshop will include two panels: one on the use of Z39.50 to support IR research, and another on research issues in information retrieval protocols. Attendees will be invited to contribute to the discussion.

Program Committee:
Clifford Lynch, University of California
Ray Larson, University of California at Berkeley

INFORMATION RETRIEVAL AND DATABASES
___________________________________
A Research Workshop

The integration of database management systems and information retrieval systems is of great practical interest. There are, however, hard research problems that remain to be solved. The aim of the workshop is to help the information retrieval community understand the integration problems and to set up a research agenda. The workshop will include short presentations on the following topics:

Architecture: loosely coupled, tightly coupled, total integration; does the DBMS control the IRS or vice versa; support for distributed computing.

Retrieval Model and Query Language: reconciling classical DB retrieval and classical (weighted) IR retrieval; retrieval models taking advantage of the DB schema; treating DB attributes in an IR way, e.g. probabilistically; integrated query languages for IR/DB systems; query processing/optimization.

Concurrency Control and Transaction Management: concurrency control on the IR index; is ACID enough or is ACID too much for IR; new transaction models (nested transactions); long-lived transactions (for indexing).
Performance: new access structures; new buffering schemes (caches); retrieval performance on dynamic data; insertion, deletion, and modification performance; scalability (parallel architectures); identification of bottlenecks.

After the presentations, attendees will participate in round-table discussions about each topic. To preserve a workshop atmosphere, the workshop is restricted to 30 participants.

Program Committee:
David Harper, Robert Gordon University
Peter Schauble, Swiss Federal Institute of Technology

Workshop attendees should submit a short position statement on one of the topics listed above, or a statement of interest. Requests for further information and submission of interest statements should be addressed to:

CURRICULUM DEVELOPMENT IN COMPUTER INFORMATION SCIENCE: A FRAMEWORK FOR DEVELOPING A NEW CURRICULUM IN IR
____________________________________________________________

In this one-day workshop, Doris Lidtke and Michael Mulder will report on their extensive experience in developing new curricula in computer information science, emphasizing the preparation of students to deal with large-scale information systems AND new paradigms of learning/teaching. Topics to be covered by the workshop leaders include: (1) involvement of the stakeholders (employers, faculty, and instructional/curriculum designers); (2) determining content, in both depth and breadth; (3) validation by the stakeholders; (4) packaging: knowledge units vs. courses; (5) special delivery mechanisms; and (6) essential/desired infrastructure to support the new/revised curriculum. These topics will provide a framework for discussion of curriculum development in information retrieval. Individuals and groups representing various points of view (library and information science, computing science, MIS, information systems, business, government and academia) will be invited to prepare submissions and act as group leaders.
An opportunity will be provided for attendees to participate in working groups developing an IR curriculum in their area of interest.

The Workshop Leaders: Doris Lidtke and Michael Mulder have been involved in several national curriculum development groups, including ACM/IEEE-CS Curriculum '91. They currently have a 3-year DUE/NSF grant to develop a Curriculum in Computer Information using new paradigms of learning/teaching.

Who Should Attend: Participation from individuals and groups involved in or planning curriculum development in CIS and IR is particularly invited. The workshop is limited to 30 attendees. If the workshop is oversubscribed, attendees will be selected to ensure that participants come from a variety of environments.

Program Committee:
Edward A. Fox, Virginia Tech
Doris K. Lidtke, Towson State University, Maryland
Michael C. Mulder, University of Southwestern Louisiana
Edie M. Rasmussen, University of Pittsburgh
Kazem Taghva, University of Nevada Las Vegas

IR AND AUTOMATIC CONSTRUCTION OF HYPERMEDIA
________________________________________________
A Research Workshop

The workshop will address IR methods and tools that can be used in the automatic construction of a hypermedia base, producing an informative hypertext collection of documents that can be searched and browsed by content. Passage retrieval is one of the methods that can be used to segment the documents of a collection of flat documents for hypermedia information retrieval design. This method, as well as other methods for automatic authoring of hypermedia bases, will be presented and discussed in the workshop. Both techniques that construct a hypertext from an unlinked set of data and those that can be applied to an existing hypertext/media base, permitting augmentation of its set of links, are relevant to the workshop. The typing of links in the resulting hypertext needs to be addressed, as does the provision of both static and dynamic links.
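As an illustration of how passage-level similarity could drive automatic link construction, here is a minimal sketch. It is not a method proposed by the workshop; the fixed-size word windows used as passages, the raw term-frequency vectors, and the similarity threshold are all hypothetical choices made for the example. It proposes untyped links between sufficiently similar passages of different documents:

```python
import math
from collections import Counter
from itertools import combinations

def passages(doc_id, text, size=50):
    """Split a flat document into consecutive passages of `size` words."""
    words = text.lower().split()
    return [((doc_id, i // size), Counter(words[i:i + size]))
            for i in range(0, len(words), size)]

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_links(docs, size=50, threshold=0.3):
    """Propose untyped links between similar passages of different documents."""
    units = [p for doc_id, text in docs.items() for p in passages(doc_id, text, size)]
    links = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(units, 2):
        if id_a[0] != id_b[0]:  # link only across documents
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                links.append((id_a, id_b, round(score, 3)))
    return links

docs = {
    "a": "passage retrieval for hypertext",
    "b": "hypertext links from passage retrieval",
    "c": "chemical structure databases",
}
print(build_links(docs))
```

The links produced this way are static and untyped; assigning link types or computing links dynamically at browse time, both raised in the workshop description, would need further machinery.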
The workshop will also address the evaluation of the quality of hypertext collections and of their construction. After the presentation of a few position papers, the participants will discuss specific methods or other topics of interest. The workshop will conclude with the approval of a short working paper presenting all the methods that the participants deem useful for the automatic construction of hypermedia.

Program Committee:
Maristella Agosti, Padua University
James Allan, University of Massachusetts at Amherst

Workshop attendees should submit a two-page statement on the specific method or topic they propose for discussion at the workshop. Requests for further information and submission of interest statements should be addressed to: