DOCUMENT CONTROL DATA - R & D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author): Syracuse University, School of Library Science, Syracuse, New York 13210
2a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
2b. GROUP: N/A
3. REPORT TITLE: FREE TEXT RETRIEVAL EVALUATION
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Final Report, 1 February 1971 - 31 January 1972
5. AUTHOR(S) (First name, middle initial, last name): Pauline Atherton, Kenneth H. Cook, Jeffrey Katzer
6. REPORT DATE: July 1972
7a. TOTAL NO. OF PAGES: 196
7b. NO. OF REFS: 6
8a. CONTRACT OR GRANT NO.: F30602-71-C-0185; Job Order No. 45940000
9a. ORIGINATOR'S REPORT NUMBER(S): None
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report): RADC-TR-72-159
10. DISTRIBUTION STATEMENT: Approved for public release; distribution unlimited.
11. SUPPLEMENTARY NOTES: None
12. SPONSORING MILITARY ACTIVITY: Rome Air Development Center (IRDT), Griffiss Air Force Base, New York 13440
13. ABSTRACT

The basic problems this research effort investigated were (1) the development, implementation, and evaluation of algorithms to improve recall levels in interactive, free-text retrieval using a modified version of IBM's Document Processing System (DPS), (2) the development of techniques for increasing the vocabulary capacity of DPS, and (3) unobtrusive, statistical data gathering of system use, growth, and cost through a previously developed computer program.

A free-text document data base (DDB) of 46,828 bibliographic citations and abstracts from Psychological Abstracts was developed. Also, two interactively accessible data bases were developed and implemented to provide free-text vocabulary control and recall improvement directly to the user. No intermediaries were used in the retrieval process. These two algorithms were (1) a Vocabulary Data Base (VDB) containing the 106,702 unique free-text terms from the inverted file of processed documents in the DDB, and (2) a Search Data Base (SDB) containing previously submitted user search inquiries to the DDB.

A two-month period of experimental use of the entire system with all three data bases by students, staff, and faculty of Syracuse University in fall 1971 provided the required "real-life" field environment. A total of 2399 search inquiries were submitted via the 2741 terminals. The system operated under both the 360/50 and the 370/155 operating systems.

The capacity of the DPS vocabulary was increased by the development and successful implementation of computer programs that revised the DPS coding structure of the vocabulary file. The newly developed structure, changing the 16-bit coding to 32-bit coding, resulted in increasing the vocabulary capacity from the former single-file limit of 65,534 to over 4 billion terms.

An extensive user-oriented public relations/publicity, instruction, and education package was developed and implemented. This user emphasis resulted in a significantly greater number of registrants and actual users in the system, which did not use search intermediaries, than during a similar fall 1970 period.

Results of a controlled cost-performance (recall) study indicated that use of the VDB or the SDB yielded better cost-performance levels, especially at higher recall percentages, than use of the DDB alone. These results are the initial findings and would require additional testing to substantiate their validity.

Other evaluative techniques included a Semantic Differential attitude scale for interactive retrieval systems, a structured telephone interview of users, and a special number users could call for help in developing search inquiries. The STATPAC program for unobtrusively monitoring and retrieving data on system use, growth, and cost was modified and successfully used to provide evaluative data.

General conclusions are that interactive, free-text cost-performance (recall) levels can be improved by direct user control of algorithms providing vocabulary control. Real-life application of these algorithms suggests that the recall improvement available through controlled vocabulary or indexing systems might be obtained as readily by the free-text searcher who is provided with techniques such as those implemented in this research effort.

KEY WORDS: Abstracts; Information Retrieval

DD FORM 1473

NOTICE

THIS DOCUMENT HAS BEEN REPRODUCED FROM THE BEST COPY FURNISHED US BY THE SPONSORING AGENCY. ALTHOUGH IT IS RECOGNIZED THAT CERTAIN PORTIONS ARE ILLEGIBLE, IT IS BEING RELEASED IN THE INTEREST OF MAKING AVAILABLE AS MUCH INFORMATION AS POSSIBLE.

FREE TEXT RETRIEVAL EVALUATION

Pauline Atherton
Kenneth H. Cook
Jeffrey Katzer

Syracuse University

Approved for public release; distribution unlimited.

FOREWORD

This report was done by the Syracuse University Psychological Abstracts Retrieval Service (SUPARS) Research Group at the Syracuse University School of Library Science, under contract F30602-71-C-0185, Job Order Number 45940000, for Rome Air Development Center, Griffiss Air Force Base, New York. Mr. Nicholas M. DiFondi (IRDT) was the RADC Project Engineer.
This report represents a continuation of work conducted under contracts F30602-69-C-0013 and F30602-70-C-0190 during the period 1 July 1969 to 31 January 1971. The current research contract covers work accomplished from 1 February 1971 to 31 January 1972, and deals with the development, implementation, and evaluation of new algorithms to improve recall in an interactive, on-line, free-text retrieval system.

Individual authors of different sections include: Section I, Kenneth H. Cook; Section II, Lynn Trump and Mr. Cook; Section III, Sandra Browning, June Brewer, Jeffrey Katzer, Patricia Moell, and Peggy Mucia; and Section IV, Mr. Katzer.

Acknowledgement is given to the following individuals: Dean Roger C. Greer and Dr. Allan F. Hershfield of the School of Library Science, who provided continued support for the SUPARS project; Mr. William J. Jones, Director, Syracuse University Computing Center, for his cooperation and assistance in the implementation of the on-line system; and Mrs. Brenda Lefebvre, whose overall editing and typing made this report possible.

This report has been reviewed by the Information Office (OI) and is releasable to the National Technical Information Service (NTIS).

Approved: NICHOLAS M. DIFONDI, Technical Evaluator

Approved: FRANZ H. DETTMER, Colonel, USAF, Chief, Intel & Recon Division

FOR THE COMMANDER: FRED I. DIAMOND, Acting Chief, Plans Office

ABSTRACT

The basic problems the current research effort (February 1, 1971 - January 31, 1972) investigated were (1) the development, implementation, and evaluation of algorithms to improve recall levels in interactive, free-text retrieval using a modified version of IBM's Document Processing System (DPS), (2) the development of techniques for increasing the vocabulary capacity of DPS, and (3) unobtrusive statistical data gathering of system use, growth, and cost through a previously developed computer program.
A free-text document data base (DDB) of 46,828 bibliographic citations and abstracts from Psychological Abstracts was developed. Also, two interactively accessible data bases were developed and implemented to provide free-text vocabulary control and recall improvement directly to the user. No intermediaries were used in the retrieval process. These two algorithms were (1) a Vocabulary Data Base (VDB) containing the 106,702 unique free-text terms from the inverted file of processed documents in the DDB, and (2) a Search Data Base (SDB) containing previously submitted user search inquiries to the DDB.

A two-month period of experimental use of the entire system with all three data bases by students, staff, and faculty of Syracuse University in fall 1971 provided the required "real-life" field environment. A total of 2399 search inquiries were submitted via the 2741 terminals. The system operated under both the 360/50 and the 370/155 operating systems.

The capacity of the DPS vocabulary was increased by the development and successful implementation of computer programs that revised the DPS coding structure of the vocabulary file. The newly developed structure, changing the 16-bit coding to 32-bit coding, resulted in increasing the vocabulary capacity from the former single-file limit of 65,534 to over 4 billion terms.

An extensive user-oriented public relations/publicity, instruction and education package was developed and implemented. This user emphasis resulted in a significantly greater number of registrants and actual users in the system, which did not use search intermediaries, than during a similar fall 1970 period.

Results of a controlled cost-performance (recall) study indicated that use of the VDB or the SDB yielded better cost-performance levels, especially at higher recall percentages, than use of the DDB alone. These results are the initial findings, and would require additional testing to substantiate their validity.
Other evaluative techniques included a Semantic Differential attitude scale for interactive retrieval systems, a structured telephone interview of users, and a special number users could call for help in developing search inquiries. The STATPAC program for unobtrusively monitoring and retrieving data on system use, growth, and cost was modified and successfully used to provide evaluative data.

General conclusions are that interactive, free-text cost-performance (recall) levels can be improved by direct user control of algorithms providing vocabulary control. Real-life application of these algorithms suggests that the recall improvement available through controlled vocabulary or indexing systems might be obtained as readily by the free-text searcher who is provided with techniques such as those implemented in this research effort.

EVALUATION

The objective of this study was to develop, implement and evaluate methods for increasing vocabulary file space and improving the retrieval effectiveness of a free-text indexed on-line document retrieval system at Syracuse University. The system first operated on an IBM 360/50 computer and recently on a 370/155 computer. The document data base consisted of 46,828 bibliographic citations and/or abstracts from Psychological Abstracts. Vocabulary storage space was increased by developing computer programs to convert the half-word (16-bit) coding scheme as defined by the IBM/DPS program to full-word (32-bit) coding. Methods of improving retrieval effectiveness include a vocabulary data base and a search data base as on-line searching aids.
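The half-word to full-word conversion described above can be checked with simple arithmetic. The following is a minimal sketch, not the DPS implementation; the function and constant names are hypothetical. It computes how many distinct codes fit in 16 and 32 bits and compares the 16-bit figure with the 65,534 single-file limit quoted in the abstract. This part of the report does not say why the quoted limit is two less than the 16-bit total, and the sketch makes no assumption about that.

```python
# Illustrative arithmetic only; names are hypothetical, not taken from DPS.
HALF_WORD_BITS = 16
FULL_WORD_BITS = 32

# Single-file vocabulary limit quoted in the report's abstract.
REPORTED_HALF_WORD_LIMIT = 65_534


def distinct_codes(bits: int) -> int:
    """Number of distinct values that can be encoded in `bits` binary digits."""
    return 2 ** bits


half_word_codes = distinct_codes(HALF_WORD_BITS)
full_word_codes = distinct_codes(FULL_WORD_BITS)

print(f"{half_word_codes:,}")                      # 65,536
print(f"{full_word_codes:,}")                      # 4,294,967,296
print(half_word_codes - REPORTED_HALF_WORD_LIMIT)  # 2
```

The 32-bit total is consistent with the report's statement that the new coding allows "over 4 billion terms" in a single vocabulary file.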
Results are reported in terms of nine levels of Recall (the portion of relevant documents retrieved), total retrieval (the number of documents retrieved to achieve a specific Recall level), and Cost-performance (the cost incurred to achieve a specific Recall level).

There are several significant conclusions derived from the results of this effort:

1. The conversion from 16-bit to 32-bit coding has increased the limit on a single vocabulary file size from approximately 65,000 words to over 4 billion words. Without this capability, upon reaching the 65,000-word limit a new vocabulary would have to be defined and then created by the free-text indexing program, resulting in inefficient use of core due to redundant information between files and longer search cycles.

2. The use of the vocabulary data base as a searching aid results in better cost-performance than using the other data bases. Since the vocabulary data base is a portion of the inverted file developed from the original processing of documents, it is relatively inexpensive to add to the system and improves cost-performance by decreasing the need to search the more expensive document data base.

3. Statistically significant differences in total retrieval at all levels except the 10% level of Recall reflected variations in efficient use of the system by search experts. Each expert was equally knowledgeable in the subject area, equally trained in the use of free-text retrieval, and used the same information requirement statements to formulate his searches. However, each chose a different search strategy in the hopes of minimizing total retrieval.
This finding indicates that it is difficult for system experts to find or establish efficient search methods. Since most users may be knowledgeable in their fields but not necessarily expert in the use of on-line retrieval, novice users are likely to find it difficult to formulate efficient search strategies.

4. Total document retrieval is very high at all levels of Recall. Although use of the vocabulary data base or the search data base does reduce total retrieval considerably from that achieved by using the document data base, too much time and effort would be required of the user to scan for relevant documents.

As a result of this study, future research can be directed toward establishing better search methods to reduce total retrieval, continuing work on the Recall-improving algorithms to ensure that total retrieval improvement does not negate their effect on system effectiveness, and identifying differences in the experts' methods of searching to determine the impact on the general population of users and make adjustments accordingly.

NICHOLAS M. DIFONDI

TABLE OF CONTENTS

SECTION I. INTRODUCTION
  1. Previous Research Efforts
  2. Objectives of Current Work (SUPARS II)
  3. Philosophy and Approach
  4. Overview of Work Accomplished: February 1971 - January 1972
     a. Development of New Searching Algorithms
        (1) The Search Data Base
        (2) The Vocabulary Data Base
        (3) Integration of All Three Data Bases in Searching
        (4) Alternatives Considered but not Implemented
  5. Modification of the Free-text Vocabulary File
  6. Documenting Cost, Use, and Growth of SUPARS/DPS II
     a. Documenting Cost of SUPARS/DPS II
     b. Documenting Use of SUPARS/DPS II
        (1) Structured Phone Interview
        (2) Attitude Measures
        (3) User Initiated Calls
     c. Documenting Growth of SUPARS/DPS
        (1) Growth of the Three Major Data Bases
        (2) Profiles of User Registrants
        (3) STATPAC
  7. Summary

SECTION II. DATA BASE
  1. Document Data Base
  2. Vocabulary Data Base
     a. Special Programs Developed for a Vocabulary Data Base
     b. Expanding the Capacity of the Vocabulary Data Base
  3. The Search Data Base
     a. Definition of Data Base Description
     b. Modifying Search Module and Reformat Programs
     c. Specification of Output Forms
  4. Summary
  5. Implications and Projections
  APPENDICES
     Appendix I    System Overview
     Appendix II   Translate Psychological Abstracts
     Appendix III  Search Reformat

SECTION III. THE USER
  1. Profile of Registrants
     a. Cumulative Growth of the Registrant Population
     b. Demographic Data
        (1) University Status
        (2) Departmental Status
        (3) Time Spent on Teaching, Research and Other Activities
        (4) Previous SUPARS Registration and Use
     c. Use of Psychological Abstracts by Registrants
        (1) Recent Use of Psychological Abstracts
        (2) Future Use of Psychological Abstracts
        (3) Recent and Future Use of Psychological Abstracts for Preparation of a Term Paper, Thesis, or Speech
        (4) Specific Need for Psychological Abstracts
     d. Computer Experience of Registrants
        (1) Previous Experience with Computer Terminals
        (2) Previous Experience with Computer-Based Retrieval Systems
     e. Summary
  2. Publicity
  3. 4220 LOGS
     a. Description of Log Summary B
     b. Summary
  4. STATPAC
     a. Summary 1
     b. Summary 2
     c. Summary
  5. Telephone Interviews of SUPARS Registrants
     a. The Sample
     b. Completed Versus Non-completed Interviews
     c. Users: Description
     d. Users: General Usage Patterns
     e. Users: Reactions
     f. Nonusers: Description
     g. Summary
  6. Semantic Differential
     a. Description of Procedure
     b. Data Organization
     c. Results
     d. Comparison with the Results of Last Year's Study
     e. Conclusions
     f. Discussion
  7. Implications and Projections
     a. User Orientation
     b. Publicity and Instruction Functions
     c. User Control of Interaction
     d. Obtaining User Response
     e. Conclusion
  APPENDICES
     Appendix IV   Program Description of SUPARS STATPAC
     Appendix V    SUPARS Telephone Survey Question and Registration Form
     Appendix VI   Introduction to Semantic Differential Questionnaire

SECTION IV. COST-PERFORMANCE ANALYSIS
  1. Definitions
  2. Method
     a. Experimental Design
     b. IRSs
     c. Identification of Relevant Documents
     d. Search Experts
     e. Procedure
     f. Analysis
  3. Results
     a. Use of Optional Data Bases
     b. Computer Costs
     c. Cost-Performance
     d. Comparing the Cost-Performance of Two Years of SUPARS Operation
  4. Discussion
     a. The VDB
     b. The SDB
     c. Differences Among IRSs and Experts
  5. Summary of Findings
  APPENDICES
     Appendix VII  Instructions to Subjects
     Appendix VIII Analysis of Variance Summary

SECTION V. CONCLUSIONS AND RECOMMENDATIONS
  1. Major Conclusions
     a. Recall-Improving Algorithms
     b. Improving SUPARS/DPS Vocabulary Capacity
     c. Computer Programs Recording System Use, Cost, and Growth
     d. Publicity, Instruction and Education
     e. Systematic Evaluation of User Attitudes and Reactions
     f. Growth of the Document Data Base
     g. Cost Comparisons Between SUPARS I and SUPARS II
  2. Recommendations Made
     a. Continue Development of User Control of Vocabulary and Synonyms
     b. Maintain a Non-Reentrant Search Module
     c. Improve User Access to Hardware
     d. Investigate Searching Styles and Techniques of Free-Text Users
     e. Develop Interactive Algorithms Based on User Styles and Techniques

REFERENCES

LIST OF FIGURES AND TABLES

FIGURES
  Search Inquiry Using Search Data Base
  Output Example of "LIST SEARCHES" in Search Data Base
  Output Example of "LIST WORDS" in Search Data Base
  Vocabulary Data Base Input: Determining Status of Single Word
  Vocabulary Data Base Output: Determining Status of Single Word
  Growth Rate of SUPARS/DPS II Document Data Base
  Cumulative Growth of the SUPARS/DPS Vocabulary Data Base
  Modification of Characters in Search Data Base
  Cumulative Growth of SUPARS II Registrant Population
  Computer Cost per Search to Achieve Nine Levels of Recall: A Comparison Among Three Treatment Conditions
  Two Methods of Computing Dollar Cost of Searching on DDB Using LIST BRIEF Output Format
  Number of Documents Retrieved (Total Retrieval) to Achieve Nine Levels of Recall: A Comparison Among Three Treatment Conditions
  Number of Documents Retrieved (Total Retrieval) to Achieve Nine Levels of Recall: A Comparison Between SUPARS I and SUPARS II Searches to the DDB
  Number of Documents Retrieved (Total Retrieval) to Achieve Nine Levels of Recall: A Comparison Among the SUPARS II Conditions and the Most Efficient Use of SUPARS I

TABLES
  I        Loading History of SUPARS/DPS II Document Data Base
  II       Track Allocation and Usage of Document Data Base
  III      Track Allocation and Usage of Search Data Base
  IV       Growth of Search Data Base
  V        Number of SUPARS Registrants by University Status
  VI       Number of SUPARS Registrants by Major Departmental Categories
  VII      Number of SUPARS Registrants by Departmental Categories
  VIII     Percentage of Time Engaged in Teaching and/or Learning
  IX       Percentage of Time Engaged in Research
  X        Percentage of Time Engaged in Other Activities
  XI       Registration for SUPARS Last Year
  XII      Usage of SUPARS by Last Year's SUPARS Registrants
  XIII     Average Usage of Psychological Abstracts in the 2-3 Month Period Preceding Registration
  XIV      Anticipated Usage of Psychological Abstracts in the 2-3 Month Period Following Registration
  XV       Past Use of Psychological Abstracts for Paper, Thesis, and Speech Preparation
  XVI      Anticipated Future Use of Psychological Abstracts for Paper, Thesis, and Speech Preparation
  XVII     Anticipated Types of Use of Psychological Abstracts
  XVIII    Previous Experience with Computer Terminals
  XIX      Previous Experience with Computer-Based Retrieval Systems
  XX       Log Summary A
  XXI      Log Summary B
  XXII     STATPAC Summary I
  XXIII    Mean CPU Time and Cost Over the Three Data Bases
  XXIV     Frequency of Completed Interviews
  XXV      Academic Status of Users
  XXVI     Frequency of Attempted Use of SUPARS
  XXVII    Average Time Spent Using SUPARS in One Session at the Terminal
  XXVIII   Frequency and Percentage of Estimated Number of Searches Made
  XXIX     Problems Experienced by Users Which Hampered Attempts to Sign On
  XXX      Problems Experienced by SUPARS Users After Signing On
  XXXI     Frequency and Percentage of Responses: "Have you located more relevant information with SUPARS?"
  XXXII    Academic Status of Nonusers
  XXXIII   Attitudes Toward and Contact with SUPARS
  XXXIV    Concepts Used in the SUPARS Semantic Differential Packet
  XXXV     Example of a Semantic Differential Used in the Present Study
  XXXVI    Classification of Respondents by Completion of Semantic Differential
  XXXVII   Classification of Respondents by Number of Searches Made
  XXXVIII  Number of Days Students Used SUPARS
  XXXIX    Concept Means
  XL       Standard Deviations for Concept Means
  XLI      Experimental Design
  XLII     Characteristics of Subjects and IRSs
  XLIII    Sources of Variation and Degrees of Freedom in Cross-over Design
  XLIV     Use of Three Data Bases to Achieve the 90% Recall Level
  XLV      Estimates of the Computer Cost of Searching the Three Data Bases
  XLVI     Estimates of the Mean Cost of Searching Under the 'D' Experimental Condition at Nine Levels of Performance (Recall)
  XLVII    Estimates of the Mean Cost of Searching Under the 'S' Experimental Condition at Nine Levels of Performance (Recall)
  XLVIII   Estimates of the Mean Cost of Searching Under the 'V' Experimental Condition at Nine Levels of Performance (Recall)
  XLIX     Differences Between SUPARS I and SUPARS II
  L        Estimates of the Computer Cost of Searching to DDB Counting All Entered Searches (N=246)

LIST OF FIGURES FOR APPENDICES

APPENDIX I
  1  System Overview

APPENDIX II
  1  Input Record Description
  2  Translate Psychological Abstracts Logic Diagram
  3  Translate Psychological Abstracts Flow Chart
  4  Reformatted Data
  5  Reformat Psychological Abstracts Logic Diagram
  6  Reformat Psychological Abstracts Flow Chart

APPENDIX III
  1  Data Base Description
  2  Input Record Description
  3  Output Record Description
  4  Reformat Searches Module 1 - KEBF1DS
  5  Reformat Searches Module 2 - POEMSRCH

APPENDIX IV
  1  Data Flow of SUPARS Log
  2  Program Flow for Producing STAT Output
  3  Program for Conversion of Log to STAT Usable Form
  4  MACRO - General Outline of STAT Programs
  5  PROCESS - Main Working Section of STAT Programs

APPENDIX V
  1  SUPARS Registration Form

APPENDIX VIII
  1  Differences Among Three Treatment Conditions at the 10% Recall Level
  2  Differences Among Three Treatment Conditions at the 20% Recall Level
  3  Differences Among Three Treatment Conditions at the 30% Recall Level
  4  Differences Among Three Treatment Conditions at the 40% Recall Level
  5  Differences Among Three Treatment Conditions at the 50% Recall Level
  6  Differences Among Three Treatment Conditions at the 60% Recall Level
  7  Differences Among Three Treatment Conditions at the 70% Recall Level
  8  Differences Among Three Treatment Conditions at the 80% Recall Level
  9  Differences Among Three Treatment Conditions at the 90% Recall Level

USAGE OF TERMS

Because the definitions of terms used in the information
technology field are neither completely standardized nor consistent, the terms in this report are explained below. An effort has been made, where possible, to follow the most consistently used and reasonable meaning to convey a concept. Where specialized or more specific usages of these terms are employed in specific sections of this report, an explanation will be given by the section author.

(a) Delta: The character (upshift "H") on the 2741 keyboard that is used in SUPARS/DPS user interaction to access the document data base and initiate a search inquiry.

(b) Delta S: The character "delta" and the letter "S" (ΔS), which are used in SUPARS/DPS user interaction to access the search data base and initiate a search inquiry.

(c) Delta V: The character "delta" and the letter "V" (ΔV), which are used in SUPARS/DPS user interaction to access the vocabulary data base and initiate a search inquiry.

(d) Dictionary: The internally stored list of unique free-text terms processed by DPS and the document frequency count for each word. The DPS dictionary forms one part of the inverted file.

(e) Document: In this study, the term "document" stands for the bibliographic citation and abstract that are used as a surrogate of the original journal, article, proceeding, book, etc.

(f) Document Data Base (DDB): Consists of SUPARS/DPS processed documents of Psychological Abstracts which are interactively accessible on-line by users. The DDB is one of three data bases available to the SUPARS/DPS user (others are the vocabulary data base and the search data base).

(g) Document Processing System (DPS): The IBM free-text, batch mode programs that convert machine readable textual data into searchable and retrievable data sets organized in inverted file structure.

(h) Free-Text: Specifically refers in this study to the Document Processing System.
The general reference is to an organized system allowing the indexing and retrieval of documents or their surrogates by any of the terms used in a defined text, rather than by terms derived from a controlled set of terms.

(i) Information Requirement Statement (IRS): The verbal or written statement of an individual's interest area as generally related to documents or their surrogates. The IRS is the publicly verifiable indication of the internally held construct, "information ..."

(j) Label (labelled line): The portion of a search inquiry, such as IA, 12, etc., that identifies and stands for the search words and operators used to act on those words; a label can itself be used in an inquiry as a search word.

(k) Operator: The user language accepts as input one or more keywords which represent the IRS of the searcher. Keywords may be combined with Boolean operators (AND, OR, NOT) or grammatical operators (those which specify the desired proximity of keywords within a sentence or those which specify the root of a word).

(l) Search: The search inquiry, the user/computer interaction, and the printed output, if any. The beginning of a new search inquiry marks the end of a search.

(m) Search Data Base (SDB): Consists of the previously stored and processed search inquiries made to SUPARS/DPS. The SDB is one of three data bases available to the user and was newly developed during the current research. (Others are the document data base and the vocabulary data base.)

(n) Search Inquiry: The user arrangement of words, word combinations and logical operators in a form acceptable as input for machine processing. A SUPARS/DPS search inquiry would consist of the free-text terms combined with Boolean and other logical operators, the request for output, and an "end" statement. Examples of search inquiries are given in Section II.

(o) Search Word: Free-text word(s) or term(s) used as part of a search inquiry. A KEYWORD is a synonym for a search word.
(p) STATPAC: The Statistical Package used in conjunction with SUPARS/DPS to unobtrusively collect, store and retrieve the elements of user interaction and other system parameters such as time, terminal number, cost, etc. STATPAC includes a highly flexible retrieval system in itself which allows the operator to specify and retrieve various combinations of data reflecting user interaction or system performance. In addition to standard summaries printed periodically, the operator could request, for example, a listing of the computer time used for all searches of the document data base by one-time users after a certain calendar date.

(q) SUPARS/DPS: Syracuse University Psychological Abstracts Retrieval Service/Document Processing System. The modified DPS program developed at Syracuse University which allows on-line, interactive searching of free-text data. SUPARS/DPS I refers to the research work conducted from July 1969 to January 1971. SUPARS/DPS II refers to the work conducted from February 1971 to January 1972.

(r) Vocabulary: The on-line, interactively accessible dictionary that is stored by DPS. The term "vocabulary" rather than "dictionary" is used to connote the words and terms accessible to the user that can be used as free-text index terms.

(s) Vocabulary Data Base (VDB): Consists of the on-line, interactively accessible DPS dictionary of free-text terms. The VDB is one of three data bases available to the user and was newly developed during the current research. (Others are the document data base and the search data base.)
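The definitions of Dictionary, Operator, and Search Word above can be illustrated with a small example. The following is a minimal sketch of an inverted file with a dictionary of document frequency counts and the three Boolean operators, written in Python for illustration only. It is not the DPS implementation: the sample documents, the function names, and the whitespace tokenization are all assumptions, and the grammatical (proximity and root) operators are not modeled.

```python
from collections import defaultdict

# Hypothetical sample "documents" (citation and abstract surrogates);
# these are not taken from the SUPARS document data base.
DOCUMENTS = {
    1: "memory and learning in children",
    2: "visual perception and memory",
    3: "learning theory in animals",
}


def build_inverted_file(documents):
    """Build a toy inverted file.

    Returns (dictionary, postings):
      dictionary maps each unique free-text term to its document frequency;
      postings maps each term to the set of document ids containing it.
    """
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():  # assumed tokenization
            postings[term].add(doc_id)
    dictionary = {term: len(ids) for term, ids in postings.items()}
    return dictionary, dict(postings)


def search_and(postings, a, b):
    """Documents containing both search words."""
    return postings.get(a, set()) & postings.get(b, set())


def search_or(postings, a, b):
    """Documents containing either search word."""
    return postings.get(a, set()) | postings.get(b, set())


def search_not(postings, candidates, a):
    """Candidate documents that do not contain the search word."""
    return set(candidates) - postings.get(a, set())


dictionary, postings = build_inverted_file(DOCUMENTS)

print(dictionary["memory"])                                 # 2
print(sorted(search_and(postings, "memory", "learning")))   # [1]
print(sorted(search_or(postings, "memory", "learning")))    # [1, 2, 3]
print(sorted(search_not(postings, postings["memory"], "children")))  # [2]
```

In this sketch, looking up a word in `dictionary` plays the role the report assigns to the vocabulary data base: the searcher can see how many documents a term would retrieve before running a search against the documents themselves.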