SECTION I INTRODUCTION Kenneth H. Cook

This introductory section includes (1) Previous Research Efforts, (2) Overview and Objectives of the Current Work, (3) Philosophy and Approach, (4) a major section on Overview of Work Accomplished: February 1971-January 1972, and (5) Summary. The overview of work accomplished is not intended to report results and conclusions; rather, it surveys the work accomplished and gives references to the other sections that deal specifically with particular areas and topics.

1. PREVIOUS RESEARCH EFFORTS

A previous research effort (July 1969-January 1971) by the SUPARS Research Group (Syracuse University Psychological Abstracts Retrieval Service), called SUPARS I, developed, implemented and evaluated a free-text reference retrieval system. The research group at the Syracuse University School of Library Science was contracted by the U.S. Air Force Systems Command, Rome Air Development Center, to investigate a large-scale reference retrieval system in an on-line, interactive mode. One of the objectives of the work was to determine the reactions to such a system in a "real life" field environment, rather than a limited laboratory setting.

IBM's batch-mode Document Processing System (DPS) was extensively modified to serve as the free-text processor in an on-line interactive mode. A teleprocessing program was developed to allow the SUPARS/DPS package to interface with the university's 360/50 computer and 70 IBM 2741 communications terminals. Machine-readable versions of rented Psychological Abstracts (PA) tapes were translated, reformatted and then processed through DPS to form a data base of 35,000 document surrogates. The surrogates included bibliographic citations and abstracts for the January 1969-October 1970 issues of PA.

A free service was thoroughly publicized and offered to the 17,000-member Syracuse University campus community for a three-month period in fall 1970. The system operated four hours a day, Monday through Friday, and provided users direct access to the data base, without any intermediaries, through use of the 2741 terminals. Users were provided with step-by-step examples of interaction in a User's Manual, a telephone aid service with experts available to discuss searching problems, and on-line demonstrations to classroom and other campus groups. The system was evaluated through a multi-method approach, including the following techniques:
(1) A cost-effectiveness study plotting the relationship between recall levels and the computer processing cost of retrieved documents. Cost-effectiveness curves were developed for the various levels of search logic complexity that were available in SUPARS/DPS.
(2) A Semantic Differential attitude scale developed, implemented and evaluated at Syracuse for on-line retrieval systems.
(3) In-depth interviews of the campus community, based on a stratified random selection of SUPARS users and nonusers.
(4) A descriptive analysis of the wide variety of users registered to use the system, including field of study, specific information requirements, and previous use of computerized systems.

The results of the SUPARS I project are available in a 6-part report of the work accomplished during July 1969-January 1971 under the title "Large Scale Information Processing Systems", with each part sub-titled separately.

2. OBJECTIVES OF CURRENT WORK (SUPARS II)

The current effort, SUPARS II, is an extension and continuation of the work explained above and covers the period February 1, 1971, to January 31, 1972. The primary objective of the sponsoring agency was to "perform further experimentation with and evaluation of a large-scale on-line free-text retrieval system and to develop techniques for improving effectiveness." (1) To meet the basic objectives, the requirements of the work to be performed called for:

a. Developing recall-improving algorithms that could be added to the existing on-line program,
b. Reducing the rate of vocabulary growth and increasing the vocabulary capacity of the Document Processing System, and
c. Developing statistical data about system use to document the use, growth, and cost of the free-text retrieval system.

The "Overview of Work Accomplished" section, beginning on page 4, is organized using these three major objectives as a framework for reporting the type of work done during the SUPARS II project. The research work was accomplished by the SUPARS Research Group of the School of Library Science at Syracuse University. After an eight-month developmental period, the experimental free-text system was offered as a free bibliographic reference retrieval service to the students, staff and faculty of the 17,000-member campus. The period of experimental use began in November 1971 and ended in mid-January 1972, during which time the service was available three to four hours a day, five days a week.

During 1971, the SUPARS/DPS system ran under the control of the IBM 360/50 operating system and the 2314 disk pack facility. During January, SUPARS/DPS ran under a new IBM 370/155 operating system which replaced the 360. Users gained access to the system through 100 IBM 2741 terminals, which used a standard APL typeface and keyboard and were located in most major buildings on campus. The majority of the programming involved in the modification of the basic Document Processing System was written in Basic Assembler Language.


3. PHILOSOPHY AND APPROACH

The philosophy of the current research is that the most promising area for improving free-text effectiveness is the human intelligence of the users and the by-products of free-text processing. This approach means giving users on-line access to the varied and complex search inquiries which have been constructed and submitted to the system. The approach also means recycling to the user the extensive vocabulary of terms generated by free-text processing, which would otherwise be stored internally and not be available to the user.

This basic philosophy was developed after consideration and recognition of alternative techniques and approaches to improving retrieval systems. These alternative approaches tend to emphasize intervention of the system designer through the use of statistical techniques for indexing, and lexicographic decisions about the control of the terms allowed into the index. The approach taken by the SUPARS II research group was that improvements to interactive, free-text searching should provide a user with various forms of cues, aid, and help before, during, and after the formulation of a search inquiry. This type of help recognized the advantage of, but did not emphasize, the use of on-line tutorials.

Further, the development of help stemmed from the notion that in an information center or library an individual tries to enlarge his strategy of information seeking through the help of the reference librarian. The reference librarian can generally suggest additional cues, sources, and alternative methods of searching, aids which are rarely available in interactive searching of a retrieval system. The idea behind the new algorithms developed in the SUPARS II study was to provide these cues in order to help prompt the on-line, interactive searcher during his negotiations with the computerized retrieval system.
Because modern, interactive searching generally takes place at terminal locations remote from a reference librarian or any other human specialist in the user's interest area, one source of additional help was considered to be the human intelligence of all other users of the system. Because the intelligence of this user group is reflected in the search inquiries submitted to the system, techniques were developed to record, store, and reformat the data for use by all other users. The section titled "Development of New Algorithms" on page 4 explains how this objective was reached.

Another underlying idea in the research reported here was that the development of search inquiries in free-text searching rests almost entirely on the primary searcher (the individual with a specific information requirement statement) and his ability to specify the inquiry search words and combinations, and the documents considered relevant to his requirement statement. For a free-text retrieval system to remain adaptable and viable to a variety of user groups, the user must be given the maximum opportunity to develop his own unique and personal search inquiries that adequately reflect his requirements. The system built at Syracuse was developed for use by the

primary users (the individuals having their own specific information requirements) and specifically did not provide intermediaries. For future free-text retrieval systems of this type to be attractive to a wide range of users, the formulation of the inquiry will have to be easily accomplished by even the most unsophisticated, first-time user. Whether an intermediary can effectively take the place of the primary searcher is still an empirical question.

The question raised by the current research is how effectively the user can take advantage of new free-text, on-line algorithms to improve his recall level while maintaining some acceptable degree of precision. To investigate this question, three main areas of work were performed: (1) development of new searching algorithms, (2) modification of free-text vocabulary storage, and (3) documentation, through a variety of techniques, of the use, growth, and cost of SUPARS/DPS. These three work areas parallel the basic objectives stated on page 2 and are used as an outline for a description of the work accomplished. This overview provides a broad general review of the work accomplished, while the specific procedures, conclusions, and results are explained in more depth in other referenced sections following Section I.

4. OVERVIEW OF WORK ACCOMPLISHED: FEBRUARY 1971-JANUARY 1972

This overview is organized into three parts, following the three main objectives: (1) development of recall-improving algorithms that could be added to the existing on-line program, (2) reducing the rate of vocabulary growth and increasing the vocabulary capacity of the Document Processing System, and (3) developing statistical data about system use to document the use, growth, and cost of the free-text retrieval system.

a. Development of New Searching Algorithms

The SUPARS/DPS searching capability developed during SUPARS I (July 1969-January 1971) allowed a user to interactively submit free-text words as input and retrieve document surrogates consisting of bibliographic citations and abstracts. In addition to the ability to search and retrieve documents from a document data base, two complementary algorithms were developed during the current SUPARS II in an effort to improve the user's ability to formulate free-text searches and to improve recall at an acceptable level of precision. The first idea was to store the previous search inquiries of all users and make them retrievable on-line. The second idea was to make the free-text vocabulary terms available interactively along with the document postings. These two ideas are discussed below under (1) The Search Data Base, and (2) The Vocabulary Data Base.

One of the difficulties with developing and constructing free-text

searches is the demand placed on the user to know and use a variety of synonyms related to his specific interest area in an inquiry. Ideally, a user would be able to talk with someone familiar with his interest area and be provided with a variety of words and other cues. The user could then develop or formulate a search inquiry to the system that would have the specificity or exhaustiveness needed to maximize retrieval. Although some specialists are able to come up with most potential search words, most users do not have an expert immediately available for consultation. To be responsive to the needs of an on-line user with no intermediary, a free-text system has to augment the intellect of each user and provide synonym control and other cues to help him develop effective searches. The concept of computer augmentation of the human intellect is not new, but has not yet been fully applied to the improvement of on-line reference retrieval systems.

The new algorithms developed during the current research effort are attempts to extend the search formulation capabilities of the free-text searcher. The two new algorithms discussed below are separately searchable data bases of (1) the previous search inquiries of all users (the search data base) and (2) the entire range of free-text vocabulary terms (the vocabulary data base). These two new data bases were provided to the user in addition to the primary data base of bibliographic citations and abstracts (the document data base).

(1) The Search Data Base

The search data base was developed by taking as input the previous search inquiries that users had submitted to the system when searching for documents. After this data was reformatted, the searches were processed through the SUPARS/DPS programs and an inverted file of search words and documents was formed. However, the "documents" were not bibliographic citations and abstracts, but the "text" of the previously used searches.
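The idea of treating stored searches as "documents" in an inverted file can be sketched roughly as follows. This is only an illustration in Python, not the SUPARS/DPS implementation (which was written in Basic Assembler Language); the stored search texts, the `SYNTAX` word list, and the function names are all hypothetical, and only the AND and OR operators are handled.

```python
import re
from collections import defaultdict

# Hypothetical stored searches: accession number -> search text as typed.
STORED_SEARCHES = {
    17: "L1 MOTIVATION OR ESTEEM OR ACHIEVEMENT; LIST WORDS",
    18: "L1 CONCEPT; L2 ESTEEM; L3 L1 AND L2; LIST SEARCHES",
    19: "L1 ANXIETY OR STRESS; LIST WORDS",
}

# Tokens treated as syntax rather than search words (an assumption of this sketch).
SYNTAX = {"AND", "OR", "LIST", "SEARCHES", "WORDS", "END"}
LABEL = re.compile(r"L\d+")


def build_inverted_file(searches):
    """Map each search word to the set of stored searches that contain it."""
    inverted = defaultdict(set)
    for accession, text in searches.items():
        for token in re.findall(r"[A-Z0-9]+", text.upper()):
            if token not in SYNTAX and not LABEL.fullmatch(token):
                inverted[token].add(accession)
    return dict(inverted)


def run_inquiry(lines, inverted):
    """Evaluate labelled lines in order, left to right; only AND and OR are handled."""
    results = {}

    def postings(token):
        # A token is either an earlier label or a plain search word.
        return results[token] if token in results else inverted.get(token, set())

    for label, expression in lines:
        tokens = expression.upper().split()
        current = set(postings(tokens[0]))
        for operator, operand in zip(tokens[1::2], tokens[2::2]):
            if operator == "AND":
                current &= postings(operand)
            else:
                current |= postings(operand)
        results[label] = current
    return results[lines[-1][0]]


INVERTED = build_inverted_file(STORED_SEARCHES)
INQUIRY = [("L1", "CONCEPT OR MOTIVATION"), ("L2", "ESTEEM"), ("L3", "L1 AND L2")]
print(sorted(run_inquiry(INQUIRY, INVERTED)))  # accession numbers of matching searches
```

The inquiry used here mirrors the three labelled lines of the report's example (CONCEPT OR MOTIVATION, ESTEEM, L1 AND L2); the set returned holds the accession numbers of the stored searches that satisfy the final line.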
A detailed description of the development of the search data base is contained in Section II. Users were allowed to submit search inquiries to the search data base in the same form as to the document data base. In fact, all search inquiries use basically the same three-part structure: (1) specification of the data base to be searched, using one or two keyboard characters, (2) search words and Boolean operators, and (3) specification of the type of output desired. An example of how the search would be entered is given on the following page in Figure 1, where the indented lines are the computer's and the lines at the far left are the user's. Each user-typed line is transmitted to the computer by depressing the carriage return key on the terminal keyboard.

The search inquiry shown on page 6 in Figure 1 uses the symbol ΔS to specify the search data base as the one to be queried. Each labeled line of search words can later be used as a single term in a subsequent line. For example, under L3, the user is asking for searches that contain L1 (either CONCEPT or MOTIVATION) and also L2 (ESTEEM). This final line, L3, specifies the words that the user wants to appear in a stored search before it can be

ΔS                            Specifies data base of searches
     SEARCH NO. 000125        Labels and search number automatically generated
     L1
CONCEPT OR MOTIVATION
     L2
ESTEEM
     L3
L1 AND L2;                    Use of labels as search words in final search requirement;
                              semi-colon stops generation of labels
LIST SEARCHES                 Output form desired requests listing of entire search
END                           END statement is signal for processing to begin

Figure No. 1 Search Inquiry Using Search Data Base

retrieved. The statement LIST SEARCHES is the form of output the user wants listed for retrieved items. The END statement, followed by depressing the carriage return key, is the signal for the system to begin processing the search inquiry.

Two types of output can be requested when using the search data base: (1) LIST SEARCHES, which requests a printing of the entire search as originally submitted to the system, along with data on the number of documents which were retrieved and printed for the search, and (2) LIST WORDS, which requests a printing of all unique words found in the retrieved searches, along with the number of searches in the data base containing these unique words. An example of LIST SEARCHES output is given in Figure 2; a LIST WORDS output is shown in Figure 3.

As Figure 2 shows at the top, the search inquiry is ended with the END statement, and the system reply states that SEARCHING is taking place, followed by the maximum number of items (in this case searches) possible, which is 12. The command )GO automatically prints 10 searches at a time like the one shown, and the system then reports that 2 more of the 12 total searches retrieved may be available.
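The two output forms and the batch-at-a-time printing triggered by )GO could be modelled along these lines. This is a minimal Python sketch under stated assumptions: the data, the function names `list_words` and `go`, and the exact wording of the status messages for arbitrary counts are illustrative, with the messages patterned on those quoted in the report.

```python
from collections import Counter

PAGE_SEARCHES = 10   # searches printed per )GO, as described above
PAGE_WORDS = 100     # words printed per )GO


def list_words(retrieved, search_words):
    """Unique words in the retrieved searches, each paired with the number of
    searches in the whole data base that contain it."""
    totals = Counter()
    for words in search_words.values():
        totals.update(set(words))
    unique = sorted({w for accession in retrieved for w in search_words[accession]})
    return [(word, totals[word]) for word in unique]


def go(items, page_size):
    """Yield (batch, status message) once per )GO command."""
    for start in range(0, len(items), page_size):
        batch = items[start:start + page_size]
        remaining = len(items) - start - len(batch)
        if remaining:
            yield batch, f"MAY BE {remaining} MORE ITEMS"
        else:
            yield batch, "NO MORE ITEMS AVAILABLE"


# Hypothetical data: accession number -> words used in that stored search.
SEARCH_WORDS = {1: ["MOTIVATION", "ESTEEM"], 2: ["ESTEEM", "CONCEPT"], 3: ["ANXIETY"]}
print(list_words({1, 2}, SEARCH_WORDS))
for batch, message in go(list(range(1, 13)), PAGE_SEARCHES):
    print(len(batch), message)
```

With 12 retrieved items and a page size of 10, the first )GO is followed by the "MAY BE 2 MORE ITEMS" message and the second by "NO MORE ITEMS AVAILABLE", matching the sequence described in the text.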

END
     SEARCHING
     MAXIMUM ITEMS POSSIBLE: 0000012
)GO
     00017                                          Example of one retrieved
     L1 MOTIVATION OR ESTEEM OR ACHIEVEMENT;        search from the search
     LIST WORDS                                     data base
     520 10 007234

     [9 more searches printed out here]

     MAY BE 2 MORE ITEMS

Figure No. 2 Output Example of "LIST SEARCHES" in Search Data Base

The message NO MORE ITEMS AVAILABLE would have been printed if only 10 items had been retrieved and printed. To print the remaining two searches, the user types )GO. In the boxed example of a retrieved search, the first number (00017) is the accession number assigned to this search by SUPARS/DPS input processing. The search lines and words as originally submitted by a user are printed next, along with the form of output requested, which in this case was LIST WORDS. The three numbers below the LIST statement represent (1) the number of documents originally retrieved with this search (520), (2) the number of documents actually printed (10), and (3) the sequential number originally assigned to this search as identification, i.e., this was the 7234th search processed (SEARCH NO. 007234).

As Figure 3 shows, the other type of output for the search data base is the LIST WORDS option. LIST WORDS prints just the unique words found in all searches which were retrieved from the search data base. The user is given the message "MAXIMUM ITEMS POSSIBLE: 0000011", which indicates that there are 11 other unique words found in searches that contain L1 (either CONCEPT or MOTIVATION) and also L2 (ESTEEM), which was the search inquiry of Figure 1. Up to 100 words at a time will be printed when the user has selected the LIST WORDS option. After the user elects to print the 11 words retrieved by typing the command )GO, the system responds with the message "NO. IN ( ) IS NO. OF SEARCHES", reminding the user that the postings following each word

END
     SEARCHING
     MAXIMUM ITEMS POSSIBLE: 000011
)GO
     NO. IN ( ) IS NO. OF SEARCHES
     CONCEPTS (003139)    MOTIVE (000090)             ACHIEVE (000032)   [illegible] (000014)
     CONCEPTUAL (000120)  CONCEPTUALIZATION (000102)  MOTIVES (000040)   MOTIVATES (000034)
     THOUGHT (000030)     [illegible] (000021)        MOTIVATED (000002)
     NO MORE ITEMS AVAILABLE

Figure No. 3 Output Example of "LIST WORDS" in Search Data Base

ΔV
     SEARCH NO. 000367
     L1
ESTEEM;
LIST WORD(S)
END

Figure No. 4 Vocabulary Data Base Input: Determining Status of Single Word

in the LIST WORDS output represent the number of searches containing these words. When all eleven words have been printed, the message "NO MORE ITEMS AVAILABLE" indicates that all items retrieved have been printed. Had the MAXIMUM ITEMS POSSIBLE total been more than 100, the user would simply have to repeat the )GO command to have the next 100 (or fewer) words printed.

(2) The Vocabulary Data Base

A second on-line option available to the user was the vocabulary data base. The vocabulary data base was different from the search data base because no reprocessing of data had to be made through the SUPARS/DPS loading process. The vocabulary, containing all unique free-text terms and the number of documents containing each word, was simply the portion of the inverted file developed from the original processing of documents. Because the vocabulary was not reprocessed through DPS, as the stored searches were, and could not be accessed through use of the basic DPS search module programs, new search module programs had to be constructed. In the version implemented during SUPARS II, users were limited to entering as input only one word (or character string) per search because of programming limitations. Later versions of the vocabulary search module could extend this capability to more than one word.

To allow the user to "browse" through the alphabetical listing of words in the vocabulary data base, three search options were considered and two were implemented. The first search option allowed a user to determine if his potential search word was in the vocabulary of free-text words. The second option allowed the user to enter any character string prefix of from 1 to 25 characters and have all words containing that prefix retrieved. The third option would have allowed a user to enter a word and specify the "n" words before and/or after it from the alphabetical list to be printed as output. Because of programming difficulties (explained in Section II), this option was not made available.

As Figure 4 indicates, the form for entering a search of the vocabulary data base is basically the same as for the search data base shown in Figure 1. In the Figure 4 example, by typing the characters ΔV, the user specifies the vocabulary data base as the one he wants to search through.
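The first and third browsing options can be illustrated with a sorted word list and a binary search. The sketch below is in Python and is not drawn from the SUPARS/DPS code; the vocabulary slice and all postings counts except the ESTEEM figure (123, taken from Figure 5) are hypothetical, and `neighbours` shows one possible reading of the third option, which the report says was never made available.

```python
from bisect import bisect_left

# Hypothetical slice of the vocabulary: word -> number of documents containing it.
# Only the ESTEEM count comes from the report (Figure 5); the rest are invented.
POSTINGS = {"ESTEEM": 123, "ESTIMATE": 41, "ESTRANGED": 7, "ETHICS": 58, "ETHNIC": 96}
WORDS = sorted(POSTINGS)


def word_status(word):
    """Option 1: return (word, postings) if the word is in the vocabulary, else None."""
    word = word.upper()
    return (word, POSTINGS[word]) if word in POSTINGS else None


def neighbours(word, before=0, after=0):
    """Option 3 (not implemented in SUPARS II): the n words before and/or after
    a word in the alphabetical list. If the word itself is absent, the window
    is centred on the position where it would be inserted."""
    i = bisect_left(WORDS, word.upper())
    start = max(0, i - before)
    stop = min(len(WORDS), i + after + 1)
    return [(w, POSTINGS[w]) for w in WORDS[start:stop]]


print(word_status("esteem"))
print(neighbours("ESTIMATE", before=1, after=1))
```

Keeping the vocabulary in alphabetical order is what makes both options cheap: an exact lookup and a window of neighbouring words each need only one position in the sorted list.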
To determine, for example, if the word "Esteem" is contained in the free-text vocabulary of acceptable search words, the word is entered under label number 1, L1. The output form is specified in the search by typing LIST WORD or LIST WORDS. The output that would be received by a user if the word "Esteem" were in the vocabulary data base is shown in Figure 5. The reply that the "MAXIMUM ITEMS POSSIBLE is 00001" indicates that the single input word has been retrieved. The user can either elect to type )GO and have the word and its document postings number printed, as shown, or begin a new search of any of the data bases by typing the appropriate delta symbol, e.g. Δ for the document data base or ΔV for the vocabulary data base. The message "NO. IN ( ) IS NO. OF DOCUMENTS" in Figure 5 reminds the user that the number in parentheses after the retrieved word represents the
END
     SEARCHING
     MAXIMUM ITEMS POSSIBLE: 000001
)GO
     NO. IN ( ) IS NO. OF DOCUMENTS
     ESTEEM (000123)
     NO MORE ITEMS AVAILABLE

Figure No. 5 Vocabulary Data Base Output: Determining Status of Single Word

number of documents in which the word is found. This message was used to avoid confusion with the LIST WORDS option in the search data base, which posts the number of searches in which each word was found.

The second search option available to the user searching the vocabulary data base is the truncation expander. The terms "truncation" and "expander" mean that a user can type in any prefix truncation of a word followed by a special symbol [Motivate (?)] and the system will "expand" that root to retrieve from the vocabulary any word with the prefix "M-O-T-I-V-A-T-E". The output for this option prints up to 100 words and document postings with a single )GO command, and would be basically the same as the example in Figure 5, with up to 25 rows of four words a row printed.

(3) Integration of All Three Data Bases in Searching

By integrating the use of all three data bases, a user could continually revise his searching strategy. A preliminary statement of his information requirements could lead to the selection of certain "key" words which would be candidates for the actual search inquiry. Before a search inquiry was submitted to the document data base for the retrieval of documents, a user could revise his overall strategy through the use of the vocabulary and/or the search data base. For example, a user could query the vocabulary data base to retrieve all free-text terms with a specified prefix. Terms having no relation to the area of interest could be eliminated (e.g. "RATCLIFF" and "RATIFY" would contain the prefix "RAT" but be of little help in a study of small rodents) and those with some connection could be retained for further use. Also, the number of document postings for each useful word could be used to further sort out terms that retrieved, in the user's estimation, too many or too few documents.
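The truncation expander and the subsequent weeding of terms by postings can be sketched as below. This Python fragment is illustrative only: the vocabulary, the postings counts, and the thresholds a user might apply are hypothetical, and the use of a binary search over a sorted list is an assumption about one reasonable way to find all words sharing a prefix, not a description of the SUPARS/DPS search module.

```python
from bisect import bisect_left

# Hypothetical vocabulary: word -> number of documents containing it.
VOCABULARY = {
    "RABBIT": 75, "RAT": 310, "RATCLIFF": 2, "RATIFY": 1,
    "RATS": 540, "RATTUS": 12, "RAVEN": 8,
}
SORTED_WORDS = sorted(VOCABULARY)


def expand(prefix, sorted_words, postings):
    """Return (word, postings) for every vocabulary word beginning with prefix."""
    prefix = prefix.upper()
    if not 1 <= len(prefix) <= 25:
        raise ValueError("prefix must be 1 to 25 characters")
    matches = []
    # In alphabetical order, all words sharing a prefix are contiguous.
    for word in sorted_words[bisect_left(sorted_words, prefix):]:
        if not word.startswith(prefix):
            break
        matches.append((word, postings[word]))
    return matches


def rows_of_four(matches):
    """Arrange matches four to a row, the layout described for the printed output."""
    return [matches[i:i + 4] for i in range(0, len(matches), 4)]


def within_postings(matches, fewest, most):
    """Keep words whose postings fall in a range the user considers useful."""
    return [(w, n) for w, n in matches if fewest <= n <= most]


MATCHES = expand("rat", SORTED_WORDS, VOCABULARY)
for row in rows_of_four(MATCHES):
    print(row)
print(within_postings(MATCHES, fewest=5, most=1000))
```

A postings filter of this kind only removes words by frequency; deciding that a word such as RATCLIFF is unrelated to the topic remains a judgment for the user, as the text describes.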


A user who felt that he had a sufficient number of terms to proceed could submit an inquiry directly to the document data base and begin the retrieval of documents. A user who needed more terms related to his informational requirements could begin querying the search data base directly to find additional terms. Input could consist of words and word combinations a user extracted from his information requirement statement, or any other source. Output would be all the previously submitted search inquiries containing those terms specified in the input format. The user could print out the entire search inquiry as output or merely a listing of all unique terms found in the retrieved inquiries.

New terms found through queries of the vocabulary and search data bases could then be used to develop and submit new inquiries to the basic data base of documents. In turn, new terms found in some of the relevant documents retrieved could be used to modify the inquiries submitted to the document data base or could be used to find additional terms from the search or vocabulary data base. This iterative process of searching through first one data base and then another, in any order and at any individual pacing rate, could help the user to develop search inquiries that were more effective for his specific requirements.

The limitations of these two new searching algorithms are fairly evident. First, the usefulness of a search data base depends on its variety and depth. Before it can become maximally useful to most users, the data base has to "build up" a stockpile of search inquiries in a wide range of sub-areas. Depending on the number of potential users and the subject matter, this building-up process could take a substantial time. The most popular and frequently submitted inquiries could represent a wide range of users or any small specialized sub-group.
(4) Alternatives Considered but Not Implemented

One alternative which was not a direct part of the objectives, and was not implemented because of time and programming limitations, was the establishment of a user queue that would control and report to users their order of access to the data bases. The queuing system was originally devised to overcome the previous user dissatisfaction with not being able to enter an inquiry because the maximum amount of computer core necessary was already in use. The queuing system was designed to establish two separate queues: (a) the primary queue, which would consist of those users who could have search inquiries submitted and processed with the available computer core, and (b) the secondary queue, which would allow additional users to enter their search inquiry and have it held in temporary storage, but not have it processed. Each user in the secondary queue would be told how many people were ahead of him in the queue. When a space in the primary queue was vacated, each member of the secondary queue would move up one space and be given a printed message that "There are x users ahead of you." When "x" was zero, the user would be first in line in the secondary queue.


The queuing system had the advantages of (a) giving the user a chance to enter a search inquiry without being "bounced" from the system, (b) transmitting feedback to the user from the system about his search inquiry status, and (c) allowing more users than just those having their search processed to begin interaction with SUPARS/DPS. However, the complexity of the programming task (maintaining control over the status and locations of 15 to 20 different users, recognizing and compensating for inquiries that were interrupted, and accounting for user inquiries that were abruptly terminated) led to the discontinuation of the queuing system development.
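The two-queue arrangement described above can be pictured with a short sketch. Since the queuing system was never completed, the Python class below is purely hypothetical: the class and method names, the `capacity` parameter standing in for available computer core, and the promotion rule are assumptions, and none of the interrupted-inquiry handling that caused the real difficulties is modelled.

```python
from collections import deque


class InquiryQueues:
    """Hypothetical model of the primary and secondary queues."""

    def __init__(self, capacity):
        self.capacity = capacity   # inquiries that fit in core at once (assumed)
        self.primary = []          # inquiries being processed
        self.secondary = deque()   # inquiries held in temporary storage

    def submit(self, user):
        """Accept an inquiry, processing it at once if a primary slot is free."""
        if len(self.primary) < self.capacity:
            self.primary.append(user)
            return "PROCESSING"
        self.secondary.append(user)
        return self._position_message(user)

    def finish(self, user):
        """Vacate a primary slot, promote the first waiting user, and return
        the updated message for everyone still in the secondary queue."""
        self.primary.remove(user)
        if self.secondary:
            self.primary.append(self.secondary.popleft())
        return {waiting: self._position_message(waiting) for waiting in self.secondary}

    def _position_message(self, user):
        ahead = list(self.secondary).index(user)
        return f"There are {ahead} users ahead of you."


queues = InquiryQueues(capacity=2)
for name in ("A", "B", "C", "D"):
    print(name, queues.submit(name))
print(queues.finish("A"))
```

Even this simplified version has to track every user's position on each change of state, which gives some sense of why maintaining it for 15 to 20 terminals with interrupted or abandoned inquiries was judged too complex for the time available.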

5. MODIFICATION OF THE FREE-TEXT VOCABULARY FILE

A second objective of the sponsoring agency dealt with a way to handle the large free-text vocabulary that was developed and stored for use in the inverted file of documents. For SUPARS II, the SUPARS/DPS vocabulary file contained 106,702 unique words for the 46,828 documents stored in the document data base. Because free-text processing generates very large vocabularies such as these, some research choice has to be made between (a) developing techniques to control the vocabulary and reduce its growth while still providing representation of the data base, or (b) providing for the expansion and continued development of the vocabulary. The research direction taken during this effort was aimed at developing techniques for storing and making available all free-text words derived from document processing, with the exclusion of common words such as "and", "or", "but", etc.

The choice was made on the basis of three considerations: (a) the maintenance of the research posture of a free-text investigation called for the retention of a non-restricted choice of words and terms in the vocabulary, since restrictions on the vocabulary would involve, in some measure, man-made decisions that would result in one form or another of a controlled, rather than a free, vocabulary; (b) restrictions on search word choice eliminate the exhaustivity and specificity essential for effective free-text search inquiries, especially when the primary user has the responsibility for formulating his own search; and (c) a relatively straightforward, though major and time-consuming, programming technique was available for effecting the expansion of the vocabulary. In addition, empirical evidence from the SUPARS I research effort indicated that after 35,874 documents were processed, each new document contributed only one new unique word to the free-text vocabulary. At the beginning of processing, an average of 8 new words were added for each new document.
A complete description of the data base growth is given in Section II.

To allow for an expansion of the vocabulary, a modification was made to the coding structure of the original DPS program. The coding in the original program is a 16-bit structure, i.e. 2^16. This meant that the vocabulary for any one file could be no larger than 2^16, or 65,536, words. Under this arrangement, the only way to have a larger vocabulary would be to define and construct a


completely new file. This second file would have to be searched separately and would be one which "started from the beginning", storing words already found in the first file. Among other limitations, this multi-file structure meant repetitive and inefficient searching and retrieval of documents. By reprogramming the original DPS coding with a 32-bit structure, i.e. 2^32, a greatly increased capacity could be developed. This 32-bit coding provides for over 4 billion unique words to be stored. The programming work involved changing each segment of the old search module programs from 16-bit to 32-bit coding, and then debugging and testing the new programs during test loadings of documents. See Section II for documentation of this programming work. This new coding structure allowed a single SUPARS/DPS file to grow to a total of 106,702 unique words, after the free-text processing of 46,828 documents.
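The capacity argument reduces to simple arithmetic, worked through below in Python. The function names are illustrative; the vocabulary and document counts are the figures reported for SUPARS II. The sketch assumes every code value is usable, so the practical limit of a given file format could be slightly lower if some codes are reserved.

```python
VOCABULARY_SIZE = 106_702   # unique words reported for SUPARS II
DOCUMENTS = 46_828          # documents in the document data base


def capacity(bits):
    """Number of distinct codes a field of the given width can represent."""
    return 2 ** bits


def files_needed(words, bits):
    """Minimum number of separate files if each is limited to capacity(bits) words.

    This is a lower bound: a second file that "starts from the beginning"
    would repeat words already stored in the first.
    """
    return -(-words // capacity(bits))   # ceiling division


print(capacity(16))                        # limit under 16-bit coding
print(capacity(32))                        # limit under 32-bit coding
print(files_needed(VOCABULARY_SIZE, 16))   # files required with 16-bit coding
print(files_needed(VOCABULARY_SIZE, 32))   # files required with 32-bit coding
print(round(VOCABULARY_SIZE / DOCUMENTS, 2))  # average unique words per document
```

The reported vocabulary exceeds the 16-bit limit by a wide margin, so it could not have been held in a single file without the recoding; under 32-bit coding it occupies a tiny fraction of the available range.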

6. DOCUMENTING COST, USE, AND GROWTH OF SUPARS/DPS II

The final major objective of the sponsoring agency was the reporting of the cost, use, and growth of SUPARS/DPS II. These three areas were documented in the following ways and are explained in detail below:

a. Cost: a cost-performance study of the operating efficiency and effectiveness of the SUPARS/DPS system,
b. Use: a structured phone interview with a representative sample of users and nonusers; an attitude measure of users from a semantic differential scale; and an analysis of user-initiated calls for help and information to a special telephone line, and
c. Growth: analysis of the growth of the three major data bases; analysis of user demographic data and rate of user registration with the system; and analysis of the unobtrusively recorded on-line user interaction during the 2-month period of system operation.

a. Documenting Cost of SUPARS/DPS II

A cost-performance study was conducted during January 1972 in order to take advantage of results from a newly installed IBM 370/155, which was used to run the SUPARS/DPS II system from January on. The procedures used and the evaluation of the data are given in detail in Section IV. The objective of this study was to develop cost-performance curves for the three data bases that could be employed by the user during the on-line searching process. These three data bases included: (1) documents, (2) free-text vocabulary, and (3) previously submitted search inquiries.
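A cost-performance curve of this kind pairs a cost figure for each search with the recall it achieved, where recall is the ratio of relevant documents retrieved to all relevant documents in the data base. The Python sketch below shows how such points might be computed; the document identifiers, CPU times, and function names are hypothetical and do not reproduce the procedures of Section IV.

```python
def recall_ratio(retrieved, relevant):
    """Relevant documents retrieved divided by all relevant documents in the data base."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when no document is relevant")
    return len(relevant & set(retrieved)) / len(relevant)


def cost_performance_points(runs, relevant):
    """Return (documents retrieved, CPU seconds, recall) per run, cheapest first."""
    points = [(len(set(docs)), cpu, recall_ratio(docs, relevant)) for docs, cpu in runs]
    return sorted(points)


# Hypothetical searches of increasing breadth against 4 known relevant documents.
RELEVANT = {1, 2, 3, 4}
RUNS = [
    ([1, 9], 0.8),
    ([1, 2, 3, 9, 10, 11], 1.9),
    ([1, 2, 3, 4, 9, 10, 11, 12, 13, 14], 3.5),
]
for point in cost_performance_points(RUNS, RELEVANT):
    print(point)
```

Plotting recall against either cost column for each of the three data bases would give one curve per data base, which is the form of comparison the study set out to make.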

The primary measure of cost was the number of documents retrieved; a secondary cost measure was the central processing unit time of computer processing of a SUPARS/DPS search inquiry as measured by the IBM 370/155 operating system. The measure of system performance was the recall ratio, i.e. the ratio of the relevant documents retrieved to the total number of relevant documents in the data base. By determining separate cost-performance relationship curves for each of the three data bases, empirical data were obtained to judge the relative efficiency and effectiveness of these three algorithms in an interactive, free-text system.

b. Documenting Use of SUPARS/DPS II

(1) Structured Phone Interview

One technique used to assess the use and nonuse of SUPARS/DPS was a structured interview administered by telephone during December, 1971. The purpose of the interviews, based on random samples of users and nonusers, was to obtain reliable estimates of the reactions to SUPARS of user and nonuser groups. The pre-tested interview guide was based on extensive personal interview schedules conducted during Fall, 1970. The survey was aimed at obtaining data on the following six questions:

a. Can we identify the variable(s) which differentiate the population of registrants into users and nonusers?
b. What mechanisms are effective in the education of SUPARS users?
c. How often and to what extent is the SUPARS system used?
d. What are the major problems encountered by users?
e. How do users judge the quality of their search output?
f. How do users and nonusers characterize their reactions toward the system?

The techniques used for selecting the sample, the procedures used in interviewing, a summary and discussion of the responses, and conclusions of the telephone interview are given in Section III of this report.

(2) Attitude Measures

A second technique used to obtain data on the usage of the SUPARS system was the application of the Semantic Differential Attitude Scale. The Semantic Differential was used because it could identify independent dimensions of users' attitudes toward the system. The sample of 102 people given the attitude scale represented users of the system from such academic fields as psychology, education, journalism, social work, and library science. The study conducted during 1971 was a replication of the 1970 study (reported in the 1971 SUPARS Final Report, Section IV-B-5) and indicated similarity of response that supported the reliability of the testing instrument for assessing user reaction to an interactive bibliographic retrieval system. A complete description of the study which was conducted is given in Section III.

(3) User Initiated Calls

A third means of examining use of the SUPARS system was through an analysis of data from user initiated telephone calls to a special HELP! line manned by experts in the use of the system. The basis of making the line available was to allow users to know that there was "someone out there who is interested and will help." This type of personalized help is a major factor in overcoming the stereotype image of the cold, impersonal computer over which the user has no control. Especially with users new to the computerized retrieval system, the chance to talk to a human and be given an explanation of why the system might not be working, or what type of remedial action could be taken when a user error was made, was extremely important in maintaining a satisfied and reasonably interested group of users. Various categories of inquiries made by users are grouped, totaled, and examined under such areas as (a) requests for general information, (b) user problems in interacting with the system, (c) computer related problems, and (d) problems related to too many users in the queue. The analysis of the various inquiries made through the HELP! line is presented in detail in Section III.

c. Documenting Growth of SUPARS/DPS II

(1) Growth of the Three Major Data Bases

The documentation of the growth of the SUPARS/DPS system began with the three major data bases that were accessible on-line, and interactively, by users. A complete description of the growth of these data bases is given in Section II of this report. Included are summaries of the number of documents processed by SUPARS/DPS for various periods of time, and the growth rate of the free-text vocabulary. Also documented are the development of a data base for previously submitted search inquiries, and another data base of the free-text vocabulary terms.

(2) Profiles of User Registrants

A second technique used to document the growth of the SUPARS/DPS system was through a tabulation and thorough analysis of the demographic data supplied by each individual who registered to use SUPARS. This analysis includes the cumulative growth of the registrant population, and a descriptive examination of the registrant's status, department, time spent in teaching, research, etc., use of Psychological Abstracts and other computer related activities, and previous experience with computers in general. This profile of registrants is given in more detail in Section III.

(3) STATPAC

Another means used to document the growth of the SUPARS system was by means of data obtained from the STATPAC program, which unobtrusively recorded and collected the entire user interaction and other system parameters. The type of data recorded for later batch-mode retrieval included: (a) user identification number, (b) date of interaction, (c) terminal used, (d) clock time used during searching, (e) the actual line-by-line interaction, including the user's search inquiry and printed output in a condensed version. Other data included the computer CPU time to process an inquiry, measures of computer channel use and input-output sequences, and a cost figure for the processing and output. The STATPAC programs were written as a general retrieval system which included a variety of identifiable fields (such as date, time, terminal, output, cost, etc.) and Boolean and other logical operators that could be used to combine fields in various combinations for retrieval.

Two standard summaries were printed at two-week intervals to monitor the various types of use of the system. Other summaries could be tailored individually according to the field or field combinations desired for a more specific analysis of the data that were stored.
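The idea of retrieving logged interactions by combining fields with Boolean operators can be sketched as follows. This is a hypothetical illustration, not the STATPAC implementation: the record layout, the field names, the helper functions (`field_equals`, `field_at_least`, `all_of`, `any_of`, `negate`, `retrieve`), and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, List


@dataclass(frozen=True)
class InteractionRecord:
    """One logged search session; every field name here is hypothetical."""
    user_id: int
    day: date
    terminal: str
    clock_minutes: float
    cpu_seconds: float
    cost: float


Predicate = Callable[[InteractionRecord], bool]


def field_equals(name: str, value: Any) -> Predicate:
    """Match records whose named field equals the given value."""
    return lambda rec: getattr(rec, name) == value


def field_at_least(name: str, threshold: float) -> Predicate:
    """Match records whose named field is at or above the threshold."""
    return lambda rec: getattr(rec, name) >= threshold


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over field conditions."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over field conditions."""
    return lambda rec: any(p(rec) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Boolean NOT of a field condition."""
    return lambda rec: not pred(rec)


def retrieve(records: List[InteractionRecord], query: Predicate) -> List[InteractionRecord]:
    """Return the records satisfying the combined query."""
    return [rec for rec in records if query(rec)]


# Hypothetical log entries.
log = [
    InteractionRecord(101, date(1971, 11, 1), "T07", 12.5, 3.2, 1.40),
    InteractionRecord(102, date(1971, 11, 1), "T03", 4.0, 0.9, 0.35),
    InteractionRecord(101, date(1971, 11, 15), "T03", 20.0, 5.1, 2.10),
]

# Sessions on terminal T03 that lasted under ten minutes.
short_t03 = retrieve(
    log,
    all_of(field_equals("terminal", "T03"),
           negate(field_at_least("clock_minutes", 10.0))),
)
print([rec.user_id for rec in short_t03])
```

Expressing each condition as a small function keeps the combination step independent of which fields a particular summary needs.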

The analysis of the STATPAC program, and examples of data that reflect the growth of the SUPARS interactive retrieval system, are shown in Section III.

7. SUMMARY

This introductory section has presented the major objectives of a study of free-text retrieval evaluation conducted by the SUPARS Research Group at the School of Library Science, Syracuse University. The major portion of the section was devoted to a description of the work accomplished during the period February 1, 1971 - January 31, 1972 in relation to the three major objectives:

a. Developing recall-improving algorithms that could be added to the existing on-line program.
b. Reducing the rate of vocabulary growth and increasing the vocabulary capacity of the Document Processing System.
c. Developing statistical data about system use to document the use, growth, and cost of the free-text retrieval system.
