________________________________________________________

REUSABILITY, INTERCHANGEABILITY, AND COMPATIBILITY:
ANSWERING THE QUESTIONS OF TEXT ENCODING STANDARDS

Lou Burnard, Oxford University
Judith Klavans, Columbia University
C. M. Sperberg-McQueen, University of Illinois at Chicago

A PRE-CONFERENCE COURSE
to be held in association with
SIGIR '95: 18th International Conference on Research and Development in Information Retrieval
Seattle, WA, USA
Saturday, July 8, 1995
8:30 a.m. - 3:30 p.m.
________________________________________________________

SIGIR '95, an international research conference on information retrieval theory, systems, practice and applications, will be held in Seattle, WA, on July 9-13. On the Saturday prior to the conference, a one-day course will be offered covering the theory and practice of markup languages for the representation of textual and other data, such as SGML and the Text Encoding Initiative. The course will be taught by Lou Burnard, Judith Klavans, and C. M. Sperberg-McQueen.

COURSE DESCRIPTION:

The representation of textual data has raised serious problems since the early days of digital technology. Incompatibilities between representations range from simple formatting issues, such as word delimitation, to data encoding schemes, such as 7-bit encoding for English, 8-bit for accented languages, and up to 32-bit for Asian languages. Furthermore, the complications seem to be growing as the amount of digital data increases. Recognizing the predicament these complications cause in the information age, a group of researchers and practitioners, sponsored by the Association for Computational Linguistics, the Association for Computers and the Humanities, and the Association for Literary and Linguistic Computing, joined in 1988 to explore ways to resolve the serious emerging incompatibilities in the representation of text.
The Text Encoding Initiative has addressed these problems by developing detailed SGML Document Type Definitions (DTDs) to achieve comprehensive and generalizable encoding standards for a range of data types, from verse to syntactic analyses, from spoken language to hypertext, from terminological data to multilingual corpora.

This one-day course will consist of three parts. The first will describe the challenges raised by the three ``abilities'' which concern effective text representation: reusability, interchangeability, and compatibility. The second will present the types of data handled so far by the TEI encoding scheme, some of the problems already solved, some ongoing projects, and some unsettled questions. For the third, if hands-on facilities are available, we will provide a session in which participants can experience the strengths of using the TEI for building intelligent text data bases from existing on-line texts. Otherwise, we will demonstrate widely available software and discuss the practical issues involved in building such data bases.

The course will be of interest to:

  - computer scientists who are building large test-beds of textual data,
  - researchers who must analyze and encode representational systems over such data,
  - practitioners who must solve the incompatibility problem by choosing a standard encoding scheme for textual data,
  - SGML hackers who want to know more about TEI DTDs,
  - humanists who want to learn more about the issues in text representation.

Since most of IR currently operates over textual data, the indexing issues in the TEI are of particular and pressing interest to the IR audience.

Further information can be found at:

  http://www.columbia.edu/~klavans/home.html
  http://www-tei.uic.edu/pub/tei/sigir.html

Questions about workshop content should be directed to C. M. Sperberg-McQueen, u35395@uicvm.cc.uic.edu; addresses for queries about registration and accommodation are given below.

MATERIALS AND PRESENTERS

All participants will be provided with a printed introductory summary guide to the TEI scheme and supporting materials on PC disks, including full versions of the TEI DTDs, public domain SGML software and sample TEI texts. The electronic version of the Guidelines will also be provided.

Lou Burnard, of Oxford University Computing Services, is the European editor of the TEI project. He has degrees in English literature from Oxford, and has worked in computing since the 1970s. His areas of expertise are in the applications of computing to linguistic and literary research, particularly with reference to database and text retrieval systems. He has published and lectured widely on these and related topics. His present responsibilities, aside from TEI work, include management of the British National Corpus project at OUCS, and the Oxford Text Archive, of which he is Director.

Judith Klavans is the Director of the Center for Research on Information Access (CRIA) at Columbia University. The goals of the Center, established in January 1995, are to integrate and coordinate the various digital library related activities at Columbia University, to push forward research on technologies related to information access, and to serve as a source of information on the technological aspects of digital library applications to external projects. Dr. Klavans's research career combines aspects of computer science and linguistics, including the automatic acquisition of lexical knowledge, multilingual text analysis, and the development of symbolic techniques for the presentation of information within the context of digital libraries.

C. M. Sperberg-McQueen is a senior research programmer at the academic computer center at the University of Illinois at Chicago; he currently works in the database group, on SGML applications and the university library's information arcade.
Since 1988 he has been editor in chief of the ACH/ACL/ALLC Text Encoding Initiative.

REGISTRATION:

The course costs $50 before May 29 and $65 after May 29; the fee includes a box lunch and course documentation. The attached registration form covers this course only. Attendance at SIGIR '95 is not required for this course. Those wishing to attend SIGIR as well should complete the separate SIGIR registration form. A copy of that form, plus full information on SIGIR '95 (including descriptions of tutorials, workshops, all technical sessions, and accommodation), is available by anonymous ftp from ftp.u.washington.edu (\public\sigir95\program), or via WWW at URL: http://info.sigir.acm.org/sigir/conferences/SIGIR_95_adv.pgm.html. A copy of the program can also be requested by mail by contacting sigir95@u.washington.edu.

The course venue will depend on enrolment, but at present it is expected to be the SIGIR conference hotel, the Seattle Sheraton Hotel & Towers, 1400 Sixth Avenue, Seattle, WA 98101. Details of conference accommodation are available from the ftp and WWW addresses above.

Cut here:
>--------------------------------------------------

SGML/TEI COURSE REGISTRATION FORM
in conjunction with SIGIR '95
Seattle, WA, USA, July 8, 1995

Please use block letters or type, and tick where appropriate

__ Mr. __ Ms. __ Dr. __ Prof. Other: ______

LAST NAME:________________ FIRST NAME:_______________________
BADGE NAME (if different): __________________________________
COMPANY/ORGANIZATION:________________________________________
ADDRESS:_____________________________________________________
CITY:__________________ STATE:______ ZIP CODE: __________
COUNTRY:_______________ PHONE: ( ___ )____________________
FAX: ( ___ ) _______________ EMAIL: ________________________

COURSE REGISTRATION FEE ($50 prior to May 29; $65 after May 29): $ ________________

DO YOU HAVE ANY SPECIAL NEEDS? Please explain:
___________________________________________________________

ARE YOU ALSO ATTENDING SIGIR '95?
____ yes ____ no

METHOD OF PAYMENT (US Currency only):

__ Check payable to ACM/SIGIR95
__ Credit card (Visa, MC, AMEX)

____________________________________
Credit card number, expiration date

______________________________________
Signature, date
(I authorize the fees indicated above to be charged to my account)

Return the Registration Form by May 29 to qualify for early registration. Use fax or email (credit card payment) or mail (check or credit card) to:

SIGIR95
c/o Convention Services Northwest
1809 Seventh Avenue, Suite 1414
Seattle, WA 98101 USA
Fax: +1 206-292-0559
Email: SIGIR95@aol.com

Registration queries to: +1 206-292-9198 (ask for Sarah Amendola)
______________________________________________________________

------- End of Forwarded Message