SECTION IV

COST-PERFORMANCE ANALYSIS

Jeffrey Katzer*

This section of the report is an extension of the performance study conducted with last year's version of SUPARS (Section V, 1971 Final Report). Our major goal was to obtain estimates of system performance, specifically recall, paired with some measure of the cost required to achieve particular levels of performance. A cost-performance relationship for various subcomponents of the system can then be determined and compared, thereby providing valuable input to design considerations for future versions of SUPARS. Last year, for example, we found that searches using simple Boolean operators performed better and at less cost than searches using syntactical or positional operators. This finding affected our plans for user education, budget estimates and the design of this study.

1. DEFINITIONS

SUPARS/DPS - The current version of software support for SUPARS is a highly modified and extended version of IBM's Document Processing System (DPS). (6) The input-output interactive user language was developed by the SUPARS programming staff for interface with the batch-mode DPS search modules. Since system performance is a function of input and output, it is misleading to separate the SUPARS components of the system from those of DPS. Thus, SUPARS/DPS correctly designates the total operating retrieval system.

Document - As used here, a document is an item in Psychological Abstracts. Usually, the total representation of one document consists of a bibliographic citation and an abstract. Whenever Psychological Abstracts does not publish an abstract for an item, the SUPARS files represent the document as a citation only.

Document Data Base (DDB) - The DDB is that data base searched by SUPARS/DPS consisting of Psychological Abstracts documents (as defined above).
That portion of the DDB used in this study consisted of most documents published between February, 1970 and June, 1971.¹ (The size and growth of the total DDB available to users for searches is described in Section II of this report.) As used hereafter, DDB refers to that subset of the total DDB used in this study.

* The author wishes to acknowledge the contribution of several staff members to this report, especially Miss Sandra Browning and Dr. Kenneth Cook for completing the onerous duties of search experts, and Miss Margaret Mucia for the lengthy task of data tabulation.

¹ Documents published in March and May, 1970, were discovered to be missing from the original machine-readable tapes. While we ultimately obtained these documents, they were received too late to be included in this study.

Vocabulary Data Base (VDB) - The VDB is a pseudo-data base searched by a simplified version of the SUPARS/DPS user language. The components of the VDB are the words in the system's vocabulary, i.e. all non-common words found in the DDB, derived from the free-text processing of documents by DPS. The VDB is called a pseudo-data base because we did not have to create a new data base as defined by DPS. The components of the VDB are stored and used during construction and searching of the DDB. Our problem was to access this vocabulary as a separately retrievable data base: a complex, but not too difficult, programming task.

Search Data Base (SDB) - The SDB is that data base searched by SUPARS/DPS consisting of previously made SUPARS search inquiries. Throughout last year's and this year's operating periods we attempted to store all interactions with the SUPARS system. Because of complex systems problems only a subset of the searches made in 1970 could be identified and stored. These, plus the great majority of the 1971 searches to the DDB, make up the SDB. At the time this study was begun, the SDB consisted of 4,235 searches.
Information Requirement Statement (IRS) - In this study an IRS is a written statement of a person's information needs. We assume that the written statement actually represents that need.

Operators - The user language accepts as input one or more keywords (search words) which represent the IRS of the searcher. The keywords may be combined with Boolean operators (and, or, not), or with grammatical operators (e.g. one which specifies the desired proximity of keywords, or the truncation operator; see below).

Truncation Operator - The truncation operator is part of the user language. It is requested by typing the character (?) at the end of a whole or partial word. Internally the truncation operator retrieves the union of documents containing any word in the vocabulary beginning with the letters to the left of the (?).

Search Inquiry - A SUPARS/DPS search inquiry is one complete input into the system. It consists of some system commands, an output specification statement, plus one or more user-chosen keywords combined with the operators desired in order to represent concepts described by the IRS. Not every correctly input search inquiry will retrieve output. For one information requirement, a user may typically enter several search inquiries.

Logically Different Search Inquiry - One of the arbitrary parameters of the SUPARS system is the maximum number of retrieved documents which can be typed out in response to any one search inquiry. Using the fuller output format (LIST RECORD) each search will print a maximum of 10 documents. The citation output format (LIST BRIEF) will print up to 100 document identifications per search. If a searcher wishes to print more than this maximum, the search inquiry must be repeated (although he need not type the keywords and operators again).
These repeated search inquiries are counted together as one logical search inquiry. In order to have two logically different inquiries, the user would have to change one or more keywords or operators.

Estimated Recall - Recall, as a measure of performance, is defined as the retrieved proportion of relevant documents in the data base. For large data bases it is difficult to know how many documents in the total data base are relevant (which is the denominator of the recall ratio). To estimate a recall ratio, we have assumed that each of the ten documents identified as relevant by the writer of an IRS represents 10% of the total number of relevant documents in the entire data base.

Total Retrieval - This is simply the number of documents retrieved by one search inquiry. It can be used as the denominator of the precision ratio. By using total retrieval instead of precision, we were able to specify how many documents a user will have to scan through in order to find the relevant ones. In this sense, total retrieval is a measure of effort or cost of labor of the system. In this study, total retrieval is one of our major measures of system cost. It should be noted, however, that the effect of this cost (effort) on users of a particular system probably varies. It is not considered here.

Costs - Many factors contribute to the cost of a retrieval system. King (4) has identified the major ones. Many of these factors are installation dependent (e.g. salaries, whether or not an intermediary is used, whether retrievals are mailed to the requester, etc.). For this reason we will limit our analysis to two measures of cost: (1) total retrieval, which is reasonably independent of the particular installation, and (2) computer costs in terms of CPU time and the number of I/O channel executions. Though these are dependent upon particular hardware configurations, comparisons among alternative computing systems can be made.
CPU time and I/O executions are easily obtained and we chose them because they constitute a common standard for comparing different computer systems.

2. METHOD

The major objective of the study outlined below is to obtain an estimate of the cost-performance relationship for the current version of SUPARS/DPS.

At the end of our analysis of the 1970 SUPARS system, we concluded that performance could be improved by some form of vocabulary control. The VDB and the SDB were constructed as an initial attempt to "control" the vocabulary, but retain the free-text component of the system.

The VDB allows a searcher to determine if his keywords are in the vocabulary. Another option available with the VDB retrieves all vocabulary items with the same initial letters. We had hoped that this option could be used to reduce the total retrieval of a search inquiry. If, for example, a searcher plans to use the truncation operator, prior use of the VDB might identify unrelated words which the truncated keyword would encompass. By eliminating the non-related words from his inquiry, the total retrieval will be reduced. This method of using the VDB could also improve the proportion of relevant documents retrieved by identifying synonym-like² keywords in the vocabulary.

The SDB was designed primarily to aid the searcher to obtain synonyms.² By typing as input into the SDB a keyword which is not general enough (i.e. does not retrieve enough documents), the searcher can retrieve all keywords and all stored searches which contain his keyword. Optionally, the searcher can have the entire retrieved inquiry printed, including the operators which link the keywords. Another possible input into the SDB is the identification of a known relevant document; the output would consist of all searches which retrieved that document.
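The way a VDB prefix listing can narrow a truncated search may be easier to see in a small sketch. The toy inverted index below is an assumption made purely for illustration: the class name, method names and sample documents are hypothetical and do not describe how DPS stores or searches its files.

```python
from bisect import bisect_left


class ToyIndex:
    """Hypothetical miniature inverted index; not the DPS implementation."""

    def __init__(self, documents):
        # documents: dict mapping a document id to its free text
        self.postings = {}
        for doc_id, text in documents.items():
            for word in text.lower().split():
                self.postings.setdefault(word, set()).add(doc_id)
        # Sorted vocabulary, playing the role of the VDB in this sketch
        self.vocabulary = sorted(self.postings)

    def words_with_prefix(self, prefix):
        """VDB-style lookup: every vocabulary word starting with `prefix`."""
        prefix = prefix.lower()
        start = bisect_left(self.vocabulary, prefix)
        matches = []
        for word in self.vocabulary[start:]:
            if not word.startswith(prefix):
                break
            matches.append(word)
        return matches

    def truncated_search(self, prefix, exclude=()):
        """Truncation: union of documents over all words with the prefix,
        optionally skipping words judged unrelated after a VDB lookup."""
        docs = set()
        for word in self.words_with_prefix(prefix):
            if word not in exclude:
                docs |= self.postings[word]
        return docs


index = ToyIndex({
    1: "child development",
    2: "children learning",
    3: "chilled water",
})
candidates = index.words_with_prefix("chil")
narrowed = index.truncated_search("chil", exclude={"chilled"})
```

In this sketch, listing the words behind the prefix first lets the searcher drop the unrelated word before searching, so the truncated search retrieves fewer documents in total.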
All of these options were designed to make the vocabulary and logic of any searcher anonymously available to any other searcher, hopefully increasing a user's ability to search the DDB more effectively.

² The term "synonym" is used here as a general name for all words which may refer to the same class of objects. True synonyms fit in this class, as do those items listed under a "See also" heading in an index. For example, synonyms of "human" which may help a user are male, female, boy, girl, child, adolescent, student, subject, teenager, etc., plus all of their plurals.

In order to test the effect of the SDB and the VDB on performance we conducted a laboratory experiment very similar to the one conducted last year. In the experiment, searching stimuli were obtained by hiring subjects to write a statement of their information needs (IRS). Upon completing his IRS each subject manually searched the published version of Psychological Abstracts to find documents relevant to his information needs. The ten most relevant of these documents were used in the study. Expert searchers read each IRS and used SUPARS/DPS to retrieve 9 of the 10 documents previously identified as relevant. Each retrieved relevant document was taken to be an estimate of another 10% of all relevant documents in the DDB. Experts were restricted in their search inquiries, however, to one or more of the three data bases (DDB, VDB, SDB). Through proper experimental design we had hoped to obtain clear measures of the relative effectiveness of the two new data bases.

a. Experimental Design

Our original plans called for four treatment conditions (controlling the data base permitted to be used by a search expert), four search experts and sixteen IRSs. These variables were arranged in a 4 x 4 Latin Square design replicated 4 times. For various reasons,³ we had to delay the start of the experiment and consequently one of the experts had to resign from the study because of prior commitments.
The study was redesigned with three treatment conditions, three experts, and 15 IRSs arranged in a 3 x 3 Latin Square replicated 5 times. This design is presented below in Table XLI.

TABLE XLI. EXPERIMENTAL DESIGN

[Table body illegible in the source. For each of the 15 IRSs (numbered 8, 7, 10, 15, 12, 16, 9, 2, 4, 3, 13, 14, 11, 5, 1) it gave the treatment condition, D, V or S, assigned to each of Search Experts A, B and C.]

In Table XLI three treatment conditions identify the data bases each expert was permitted to use for a particular IRS: 'D' means the expert was restricted to the DDB, 'V' means the expert could use the VDB as well as the DDB, and 'S' means that the SDB and the DDB could be used.⁴

b. IRSs

Subjects were needed to produce the required IRSs for the experts to search. They were contacted through advertisements asking for people with information needs in the social or behavioral sciences. A brief description of the fifteen subjects whose IRSs were used, and of the IRSs themselves, is presented in Table XLII.

Each subject was asked to write a detailed description of one of his current information needs. He was not told to construct his IRS in terms of the content of Psychological Abstracts; in fact, the data base was not mentioned at this stage of the experiment. The instructions given to the subjects are presented in the Appendix. In general, each subject was asked to state his information needs in such a way that we could go out and get the information for him.

³ Notably, the delays caused by the computing center's conversion from the IBM 360/50 to an IBM 370/155 before this experiment was planned to be carried out.

⁴ In the original 4 x 4 design, the fourth treatment would have allowed a search expert to search all 3 data bases. While it is unfortunate that this condition is missing from the study, we can still estimate the effects of the fourth condition, if it turns out to be additive.
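The replicated Latin Square layout described for Table XLI can be illustrated with a short sketch. It assumes a cyclic 3 x 3 square reused unchanged for each of the 5 replications; the function names are hypothetical, no randomization is modeled, and the assignment actually used in the study is the one given in Table XLI, not the one produced here.

```python
from collections import Counter


def cyclic_latin_square(symbols):
    """Build an n x n Latin square by cyclically shifting the symbols."""
    n = len(symbols)
    return [[symbols[(row + col) % n] for col in range(n)] for row in range(n)]


def replicated_design(treatments, experts, irs_ids):
    """Assign one treatment per (IRS, expert) pair.

    Each block of len(treatments) consecutive IRSs forms one replication
    of the same square (an illustrative assumption).
    """
    n = len(treatments)
    if len(experts) != n or len(irs_ids) % n != 0:
        raise ValueError("need n experts and a multiple of n IRSs")
    square = cyclic_latin_square(treatments)
    design = {}
    for position, irs in enumerate(irs_ids):
        row = square[position % n]
        design[irs] = dict(zip(experts, row))
    return design


# IRS numbers in the order they appear in Table XLI
irs_order = [8, 7, 10, 15, 12, 16, 9, 2, 4, 3, 13, 14, 11, 5, 1]
design = replicated_design(["D", "V", "S"], ["A", "B", "C"], irs_order)

# Count how often each expert meets each treatment condition
per_expert = {
    expert: Counter(design[irs][expert] for irs in irs_order)
    for expert in ["A", "B", "C"]
}
```

In any such layout every IRS is searched once under each condition, and every expert searches five IRSs under each condition, which is the balance the cross-over analysis relies on.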
TABLE XLII
CHARACTERISTICS OF SUBJECTS AND IRSs

IRS   Topic
1     Vocational choice process of the disadvantaged
2     Behavioral theory, counseling, and education
3     Activity level of animals in response to stimulation
4     Change in suburbs and voting patterns
5     Efficiency and discriminability of tests
7     Incompatibility in marriage
8     Comprehensive services for unmarried pregnant teenagers
9     Development of human relationships in T group
10    Education of exceptional children and instructional technology
11    Instructional t.v. in education
12    Effects of intercultural contact and bilingualism
13    Errorless learning in adult problem solving
14    Attitude of people confronted with technology
15    Police intervention in social/sexual deviancy
16    Review of federally supported youth work programs

[The Sex, Academic Status and Dept. columns cannot be aligned with individual IRSs in the source. The Number of Relevant Documents Found column reads, in source order: 18, 43, 73, 16, 17, 33, 43, 42, 18, 65, 13, 19, 17, 54, 42.]

c. Identification of Relevant Documents

Upon completing the writing of his IRS, each subject was given the February, 1970 through June, 1971 issues of Psychological Abstracts. These issues were chosen for two reasons: (a) they constituted a data base of approximately the same size as that used in last year's study, and (b) they were the most recently received issues of Psychological Abstracts (the IRSs were collected in early Fall of 1971) which would also be included in the DDB.

Each subject was asked to find the entries in Psychological Abstracts which he judged to be relevant to his IRS. A subject could complete this task in any manner he chose.
We suggested that he at least look at each month of the journal, but there were no directions specifying how to find the relevant documents or how many to choose.⁵

The final task for each subject was to rate each of the documents chosen in terms of its relevance to his IRS. The 10 documents rated most relevant were used in this study.⁶ In last year's study we used the 5 documents rated most relevant and the 5 rated least relevant by the subjects. Upon analysis, we found a better cost-performance relationship for the most relevant documents. Because of the plausibility of this finding we did not think it necessary to replicate that portion of the study this year.

⁵ None of the fifteen subjects completed this task without identifying at least 10 documents he considered relevant. If this had not happened, he would have been asked to continue the task.

⁶ It was some time after the subjects completed their job that we discovered the two gaps in the DDB (see footnote 1). At that time it was necessary to check each of the ten documents paired with each IRS. If a chosen document fell in one of the gaps of the DDB, the next most highly rated unused document not in the gap was substituted.

d. Search Experts

Three SUPARS staff members served as search experts. All had considerable experience with SUPARS: they worked on last year's project, helped in the design of this year's system, made numerous searches with the system and prepared materials which taught others how to use the system.

Each expert was given the 15 IRSs and the identification (Volume and Abstract No.) of the 10 documents judged relevant by the subjects. The experts were instructed to use SUPARS as effectively as possible to retrieve any 9 of the 10 documents. Their job was to use the system to maximize the retrieval of relevant documents with minimal total retrieval. To achieve this goal they were permitted to use any of the SUPARS/DPS operators available.
The only restriction the experts were to observe was the treatment condition: the data base combinations permitted for each IRS. (Experts were permitted, but not required, to use the SDB or the VDB when they were searching under the 'S' or 'V' conditions; see Table XLI.)*

* Experts were given the option of using the SDB under the 'S' condition or the VDB under the 'V' condition.

e. Procedure

Each expert was given a packet of 15 IRSs, each paired with a list of 10 documents judged to be relevant to that IRS. The order of IRSs in each packet was randomized. Unfortunately, however, it became essential for the experts to begin their work before the 1971 searches were loaded into the SDB. The loading of searches took much longer than expected; consequently the three experts searched their 5 IRSs under the 'S' treatment condition last. That is, the effect of the 'S' treatment condition is confounded with a practice effect. This will be discussed at more length later.

All of the experts' terminal interactions with SUPARS were saved. For each IRS, the first 9 relevant documents retrieved were marked, and the cumulative number of non-relevant documents retrieved before each relevant document was counted. In this way, 9 measures were obtained for each of the 45 expert-IRS combinations. The mean number of non-relevant documents retrieved before the first relevant document was retrieved (across all forty-five expert-IRS combinations) provides the estimate of total fall-out (or cost of screening) at the 10% recall level.

We also counted the number of search inquiries made, because the computer cost of running SUPARS depends mainly upon the number of inquiries, not the number of documents retrieved.

Our second measure of cost (CPU time and I/O executions) is obtained from data collected by STATPAC. For each search inquiry an expert enters for an IRS, STATPAC stores the CPU time and I/O executions (among other things). Summing these through each relevant document retrieved produces the second measure of cost.

f. Analysis

The Latin Square design was analyzed as a cross-over design. The partitioning of sources of variation is shown in Table XLIII. An analysis of variance procedure, followed by hypothesis testing or interval estimation, will determine whether or not there is a significant difference among the 3 treatment conditions. There probably will be real differences among experts and IRSs.

TABLE XLIII
SOURCES OF VARIATION AND DEGREES OF FREEDOM IN CROSS-OVER DESIGN

Sources of Variation      df
Among Treatments           2
Among Experts              2
Among IRSs                14
Residual¹                 26
Total                     44

¹ The degrees of freedom for residual is not as large as desired. Our original design called for four 4 x 4 Latin Squares. The degrees of freedom in that design would have been more satisfactory.

3. RESULTS

The findings of this study are presented in 4 sections. First, a brief look at the treatment variable to see if search experts actually used the two optional data bases (VDB, SDB) as searching aids. Second, another look at the dependent variable, computer cost. Third, we will compare the cost-performance relationship of the different treatments. Finally, an attempt will be made to compare last year's and this year's systems in terms of cost-performance.

a. Use of Optional Data Bases

As noted earlier, each search expert was assigned five IRSs he could search with the VDB as an aid. Five other IRSs could be searched with the SDB, and the remaining 5 IRSs could not be searched with any aid; only the DDB was permitted to be used. Since any difference between the use of different data bases is probably cumulative, we only need to examine the total number of search inquiries made to achieve the 90% recall level.

The search frequencies and average use of the three data bases are presented in Table XLIV. The DDB was necessarily used in all forty-five IRSs because it contained the documents to be retrieved.
Of the 824 logically different inquiries made (to achieve a 90% recall level for 3 experts each searching 15 IRSs), 691 (84%) were to the DDB. The remaining search inquiries were made to the VDB (11%) and the SDB (5%).

[Table XLIV, giving the search frequencies and average use of the three data bases, is illegible in the source.]

Fifteen IRSs (5 for each expert) were searched under the VDB condition. Two experts each chose not to employ the VDB for one of their IRSs. Averaged over the 15 IRSs in which the VDB was permitted to be searched, experts chose to use it over 27% of the time.

The SDB had less usage. The search experts found at least one IRS in which they did not employ the SDB. Moreover, average use of the SDB dropped to approximately 20% of all inquiries made by the experts under this treatment condition. In each of the 5 times the SDB was not used, the experts achieved the 90% recall level with fewer total searches than the other experts. Across these 5 IRSs, the median difference between the fewest number of searches and the next fewest was 7; i.e. experts who chose not to use the SDB achieved a 90% recall level with a median of seven fewer searches than the next best performing expert.⁷

Seen in this light, it appears that experts who are using a working strategy for retrieving relevant documents may not have the need to use the SDB for help. Perhaps they would turn to the SDB when and if their list of searching strategies became depleted. However, when asked, the search experts did not recall consciously working in the manner described.

There is an obvious need for further study of the use of the SDB; more will be said about it later in this report. At this time, it is sufficient to point out that the SDB was infrequently used.
The dilemma affecting an assessment of the performance of the SDB is apparent: (1) if the performance of the SDB does not differ from that of the DDB, it may be due to its infrequent application rather than any intrinsic uselessness on its part; or (2) if the performance of the SDB does differ from that of the DDB, it may be due to the fact that all 3 experts searched their SDB IRSs last, so an increase in performance because of practice, or a decrease in performance because of fatigue or boredom, is possible.

⁷ A similar pattern is apparent in the two IRSs which did not employ the VDB. One expert, not using the VDB, achieved the 90% recall level with fewer logically different searches than either of the other experts. The other expert achieved a 90% recall level with the second fewest number of searches, but the total number of searches made (7) was small in absolute terms. The rationale postulated in the text supports the non-use of the VDB as well as the non-use of the SDB.

b. Computer Costs

There are 4 different, but related, measures of computer cost collected by STATPAC.

(1) The CPU time is the amount of processing required to execute one search. This includes input and output processing plus accessing the inverted file and using the search operators. CPU time will be presented in seconds.

(2) The EXCP count is a measure of I/O activity plus any internal channel executions.⁸ It will be given as a frequency count.

(3) The number of completed searches is another indicator of computer costs. In general, only logically different, completed search inquiries were counted.

(4) Finally, we have combined the CPU time and the EXCP count for a search into a dollar figure. How to combine these into one index of cost is somewhat arbitrary. We have chosen to use the rates currently being charged for the university's IBM 370/155: $360 per hour of CPU time plus $1.50 for every 1,000 EXCPs.⁹

Our plan was to retrieve from STATPAC these four cost measures for all of the completed inquiries made by the search experts. Unfortunately, several disc tracks were irretrievably lost, making it impossible to get accurate cost figures for searches in 23 IRSs. Eleven IRSs were missing cost figures for all searches, 12 IRSs had some cost figures, and the remaining 22 IRSs had cost figures for all searches. Two of the group of 12 IRSs with partial cost information had less than 20% of their searches missing. The missing costs in these two IRSs were estimated using the mean cost per search (see Table XLV). This raised the number of IRSs with complete cost data to 24. These are divided into seven complete IRSs searched under the 'D' condition, 12 searched under the 'S' condition, and 5 searched under the 'V' condition.¹⁰ While each expert contributed at least one IRS [...]

⁸ The operating system's teleprocessing package executes one EXCP for every carriage return on input and output, i.e. one EXCP for every line (regardless of size) of input and output. In addition, there are two EXCPs for every keyword in a search inquiry (one to the dictionary and one to the vocabulary). There is also approximately one EXCP for every use of a label as a keyword in a search inquiry. It is impossible to determine the exact number of EXCPs without using STATPAC or some computer monitoring system. Truncation operators, the length of internally generated temporary computation strings, and the size of buffers have an effect on the number of EXCPs.

⁹ CPU time and EXCP count are only two of four components which determine the total computer charge for a search at Syracuse University. There is also a charge for (1) core residency (i.e. number of bytes of core required per hour); and (2) the length of time a terminal is connected to the computer.
Neither of these is included as a cost figure in this report, as they contributed less to our understanding of the operating characteristics of SUPARS/DPS.

¹⁰ These 3 experimental conditions refer to the design outlined in Table XLI. Each condition was a restriction upon which data base an expert could [...]

[Tables XLV through XLVIII are illegible in the source. As cited in the text, Table XLV gives the mean cost per search to each data base, and Tables XLVI through XLVIII give total retrieval (first column) and computer cost measures at the nine estimated recall levels, 10% to 90%, for the treatment conditions.]

[...] data bases. For the SDB the average cost per search is $0.13 (Table XLV), but according to Table XLVII it equals (7.31 ÷ 13.58) = $0.57. The explanation for this difference lies in what is being counted in each table. Table XLV data is based upon searches to the SDB only, while in Table XLVII the data comes from searches completed under the 'S' condition. Under the 'S' condition, an expert was permitted to use the SDB as an aid in retrieving a document. To actually retrieve documents, however, the expert was required to use the DDB. The data in Table XLVII is taken from 12 IRSs under the 'S' condition. Four of these did not include any searches to the SDB. Of the 163 searches only 37 were to the SDB, while 126 were to the more expensive DDB.

A similar situation occurs with the VDB. In Table XLV the average cost of a search to the VDB is $0.08. But from Table XLVIII it is $0.72 (15.04 ÷ 20.80).
Again this difference occurs because Table XLVIII includes 73 searches to the more costly DDB as well as 31 searches to the VDB.

One tentative finding of this analysis of the average cost per search is that use of the VDB decreases the need to use the more expensive DDB and is, therefore, a valuable searching aid. This is evident when we look at the last column of Table XLVI and Table XLVIII, and when we compare the average cost. Since under both conditions searches to the DDB had to be made, the only way the average cost per search under the 'V' condition ($0.72) can be less than that under the 'D' condition ($1.16) is if the use of the VDB reduces the need to search the DDB. At present this is a tentative finding because it is based upon incomplete, unbalanced data. However, as we shall see, other evidence of a more reliable nature supports this finding. A similar argument can be made for the SDB, but because the 'S' condition was confounded with a learning or practice effect, it is impossible to tell how valid it would be.

Another way of saying this is that the average number of logically different searches at all recall levels is about the same for the three data bases. The most they differ is approximately seven searches (at the 90% level).

While the average number of searches seems comparable, the other three estimates of computer costs differ widely across the 3 experimental conditions. In general, the 'D' condition is the most expensive, the 'S' the least expensive, and the 'V' extends the full range between them. This general finding is true for all three measures and across the nine levels of recall. Figure 10 presents graphically the cost-performance differences between the 3 conditions in computer dollar costs. Upon inspection, we determined that the radical decrease in performance of the 'V' condition between the 30% and the 50% recall levels was due solely to one IRS.
This may be artificial; if the remaining 10 IRSs in the 'V' condition were available, the performance decrease could conceivably be averaged out. A similar pattern emerges if we plot either of the other two computer cost measures.

It should be noted that all of the computer cost measures of the 3 conditions (with the possible exception of 'V') appeared to be log-normally distributed. Since the computer cost estimates are derived from a portion of the total experimental design, it was impractical to test this finding statistically. For the same reason, we did not attempt to determine the "significance" of the differences between the data bases on any of the computer cost measures.

[Figure No. 10. Computer Cost per Search To Achieve Nine Levels of Recall: A Comparison Among Three Treatment Conditions. Logarithmic cost axis (1.00 to 90.00) plotted against estimated recall ratio (10% to 90%), with one curve each for the 'D' condition (based upon 7 IRSs), the 'V' condition (based upon 5 IRSs) and the 'S' condition (based upon 12 IRSs).]

[Figure No. 11. Number of Documents Retrieved (Total Retrieval) to Achieve Nine Levels of Recall: A Comparison Among Three Treatment Conditions. Logarithmic axis (60 to 10,000 documents) plotted against estimated recall ratio (10% to 90%); based upon 15 IRSs in the complete study.]

[Figure No. 12. Two Methods of Computing Dollar Cost of Searching on DDB* Using LIST BRIEF Output Format. Dollar cost plotted against number of documents retrieved (0 to 300), with one curve labeled "As one logical plus two redundant searches" and one labeled "As one logical search".]

* This splits the mean cost per search of $1.16 (Table XLVI) into $1.12 for input processing and searching, and $0.04 for printing 100 LIST BRIEF citations (25 lines of output).
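As a rough illustration of the dollar figures used in this section, the sketch below applies the stated rates ($360 per hour of CPU time and $1.50 per 1,000 EXCPs) and the split given in the note to Figure 12 ($1.12 per search for input processing and searching, $0.04 per 100 LIST BRIEF citations printed). The function names, and the assumption that printing cost grows in proportion to the number of citations, are illustrative only and are not part of SUPARS/DPS or STATPAC.

```python
import math

CPU_DOLLARS_PER_SECOND = 360.0 / 3600.0  # $360 per hour of CPU time
DOLLARS_PER_EXCP = 1.50 / 1000.0         # $1.50 for every 1,000 EXCPs

SEARCH_COST = 1.12                # input processing and searching, per search
PRINT_COST_PER_DOC = 0.04 / 100   # $0.04 per 100 LIST BRIEF citations
LIST_BRIEF_LIMIT = 100            # citations printed per search inquiry


def dollar_cost(cpu_seconds, excp_count):
    """Combine CPU time and EXCP count into one dollar figure."""
    return cpu_seconds * CPU_DOLLARS_PER_SECOND + excp_count * DOLLARS_PER_EXCP


def cost_with_repeated_searches(n_docs):
    """Current system: the search is re-run for every 100 citations printed."""
    searches = max(1, math.ceil(n_docs / LIST_BRIEF_LIMIT))
    return searches * SEARCH_COST + n_docs * PRINT_COST_PER_DOC


def cost_as_one_logical_search(n_docs):
    """Minimal-cost case: one search, however many citations are printed."""
    return SEARCH_COST + n_docs * PRINT_COST_PER_DOC


repeated_300 = cost_with_repeated_searches(300)  # one logical plus two redundant
logical_300 = cost_as_one_logical_search(300)    # one logical search
```

Under these assumptions, printing 300 citations costs about $3.48 when counted as one logical plus two redundant searches and about $1.24 when counted as one logical search; at 100 citations both methods give the $1.16 mean cost per search.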
This is an appropriate place to explain the rationale underlying our reliance, this year, on logically different searches. As noted in the definitions, logically different searches are somewhat independent of the arbitrary system parameter which specifies the maximum amount of printed output. In this sense, logically different searches represent a good estimate of the minimum computer costs of operating SUPARS/DPS.¹¹ Figure 12 presents a plot of costs as a function of number of documents retrieved by a search. For each logically different search there is the initial cost of input and processing, plus a steady slight increase as a function of the number of documents printed. Under the current system's parameter with LIST BRIEF output, the user is required to pay for the input and search processing after every 100 documents are printed. Costs under the current system can be described as a step-function where the size of the step depends on CPU processing, and the number of steps for a given amount of output depends upon an arbitrary parameter of the system. By counting logically different searches, we attempted to simulate the minimal cost conditions in which the output parameter is set to the maximum.

Up to this point in the discussion of cost-performance characteristics of the 3 experimental conditions, we have been solely concerned with various estimates of computer costs as our dependent variable. The findings have been useful to get some idea of the cost of making different types of SUPARS/DPS searches. Comparisons among the conditions and data bases were considered tentative because of the incomplete nature of the data.

The major dependent variable, number of documents retrieved, has not as yet been discussed. This variable is based upon the complete experiment, and except for the interpretation of the performance of the SDB, is more reliably interpretable. Total retrieval for the 3 data bases is presented in the first column of Tables XLVI-XLVIII.
It is the same measure used to evaluate the performance of last year's version of SUPARS. Figure 11 graphs these data on a log-normal grid. A statistical analysis supports an inspection of the data; there is a difference between the treatment means at the higher recall levels (70%, 80%, 90%). The analysis of variance summary tables are presented in the Appendix.

Figure 11 supports the notion that use of the VDB as a searching aid materially helps improve the cost-performance of the system. This is particularly true at the higher levels of recall. Figure 11 also suggests the value of the SDB, as it has the best performance curve. However, because of the lack of control over the 'S' condition, a plausible alternative explanation is that search experts learn how to perform better over time. It is unfortunate that this study is incapable of adequately determining the value of the data base of searches.

¹¹ This is a slight underestimation as we need to add $0.0015 for every line of output above that permitted by the system's operator.

d. Comparing the Cost-Performance of Two Years of SUPARS Operation

Two comparisons are of interest here. First, a comparison of cost-performance in terms of computer cost. This is useful because the two years of SUPARS operation included a major change in computer systems. Second, a comparison of cost-performance in terms of total retrieval. This will give us some idea of the stability of the DDB, IRSs, and experts' ability to search efficiently.

As would be expected, many changes have occurred between Fall, 1970 and February, 1972, which will affect this comparison. The major changes are outlined in Table XLIX. As we can see from the Table, it will be impossible to make straightforward comparisons.
Three factors affect the comparison of computer cost; the third one also affects the comparison of total retrieval: (1) the charges for computer service (from 360/50 to 370/155) increased 80% for CPU use and 36% for EXCP activity, (2) the performance capabilities of the two computer systems differ,¹² and (3) as noted in the last three columns of Table XLIX, there are differences in terms of what searches were counted and how EXCPs were treated.

In addition, there are several other differences between the two operations which may have an effect on any comparison. The size of the DDB grew from 35,874 in 1970 to 46,828 in 1972. Moreover, the free-text words were coded differently: 16 bits per word in SUPARS I, and twice that in SUPARS II. This coding difference permitted an increase in the number of unique words in the inverted file from 64,534 to 106,702. The extent to which these changes affect any comparison between the two systems is not known.

To obtain a rough estimate of the average cost per search to the DDB from this year's data, which would be comparable to that of SUPARS I, we recomputed the 3 computer cost estimates of searching the DDB reported in Table XLV by counting all 246 searches in the 7 IRSs. Of the 246 searches 111 are logically different (reported in Table XLV) and the remainder are the "redundant" ones.¹³ Table L gives these cost figures.

¹² Depending upon the nature of the data and the type of operation being performed, the 370/155 is 3-4 times faster than the 360/50 in terms of computation. The 3330 disc devices have a transfer rate 2.6 times faster than the 2314s. Seek time across a cylinder is 50% faster and time within a track is 33% faster in the newer discs. We cannot determine what this increase in computer performance means to SUPARS specifically. As a data retrieval system, the improvement in disc performance probably affects SUPARS more strongly than raw computation power.
¹³ Since these make up 7 complete IRSs the relative proportion of logically different searches to "redundant" searches should be reasonably representative of the use of the DDB.

Figure No. 14. Number of Documents Retrieved (Total Retrieved) to Achieve Nine Levels of Recall: A Comparison Among the SUPARS II Conditions and the Most Efficient Use of SUPARS I. (Logarithmic axis: number of documents retrieved. Curves: the SUPARS II conditions and the simple logic level of SUPARS I.)

... searcher. Apparently, use of the VDB decreases the need to search the DDB. This is supported by the data presented in the last columns of Tables XLVI and XLVIII.

The search experts used the VDB primarily as an adjunct to the truncation operator. With a free-text system, a truncation operator is a great help in pulling together many words with the same root, but with different suffixes. To search a system such as SUPARS without a truncation operator might be more efficient (See Figure 13), but it would be more arduous for the searcher. This is suggested by the fact that approximately 90% of the search inquiries to the DDB included the truncation operator, even though the experts were admonished to minimize total retrieval.

Use of the truncation operator occasionally decreases the precision of a search because it may retrieve documents containing irrelevant keywords with the same root as the desired keyword. By entering the root into a search inquiry of the VDB, one can identify those irrelevant keywords and eliminate them from the inquiry to the DDB. This pairing of the truncation operator with the VDB allows an expert to keep the searching easy, but maintain a high precision ratio.
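The pairing of the truncation operator with the VDB described above can be illustrated with a small Python sketch. Everything in it is a hypothetical stand-in: the vocabulary is a toy list invented for the example, and the function names are not SUPARS/DPS commands. It only shows the idea of looking up a root in the vocabulary, spotting unwanted words that share the root, and sending the remaining words to the document search in place of the bare truncated root.

```python
# Toy stand-in for the VDB: a handful of invented vocabulary words.
VOCABULARY = {
    "depress", "depressed", "depression", "depressive",
    "depressor", "depressant", "anxiety",
}


def vdb_lookup(root, vocabulary=VOCABULARY):
    """Return every vocabulary word that a truncated `root` would match."""
    return sorted(word for word in vocabulary if word.startswith(root))


def refine_truncation(root, unwanted, vocabulary=VOCABULARY):
    """Expand `root` into explicit keywords, dropping those judged irrelevant."""
    rejected = set(unwanted)
    return [word for word in vdb_lookup(root, vocabulary) if word not in rejected]


matches = vdb_lookup("depress")
keywords = refine_truncation("depress", ["depressant", "depressor"])
print(matches)
print(keywords)
```

In this toy case the searcher would see six words sharing the root, reject two of them, and search the document file with the remaining four explicit keywords.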
An added benefit to the pairing of these two searching aids is that the cost of searching the VDB is quite low (See Table XLV); searchers can be encouraged to use the VDB liberally with marginal increases in cost.

Finally, it should be noted that the cost-performance of SUPARS II under the 'V' condition is better than that of the most efficient version of SUPARS I.¹⁹ Thus, the relationship between the truncation operator and the VDB is not simply a trade-off in which what one loses in performance, the other gains. Rather, the VDB appears to be a worthwhile investment regardless of the use of the truncation operator.

b. The SDB

The SDB was developed as a means of synonym control working within the constraints of a free-text system.²⁰ The SDB is one way to develop a list of synonym-like words when such a list is not available from either the publisher of the data base or professional indexers.

As shown in Figures 10 and 12, the 'S' experimental condition proved to be most efficient. However, search experts are unanimous in their agreement that use of the SDB was not noticeably helpful. This, plus the fact that the 5 IRSs under the 'S' condition were searched last by the experts, leads to the conclusion that the cost-performance curves depict a learning effect.

¹⁸ For example, see J. H. Williams, Jr. "Functions of a Man-Machine Interactive Information Retrieval System." Journal of the American Society for Information Science, Volume 22. 1971. Pages 311-317.

¹⁹ Though we have not tested this statistically. As shown in Figure 14, the curve for the 'V' condition is lower than the curve representing the most efficient version of SUPARS I, and this is true for all nine levels of recall.

²⁰ Another use of the SDB is to provide a data base for those who wish to study how searchers interact with an on-line retrieval system, but this is not of interest here.
While not an implausible finding, it is useful to know that experts can improve their searching ability noticeably over time. In terms of evaluating the SDB as a searching aid, the evidence suggests that in its present form it was not useful.

The usefulness of the SDB may increase with its size. The current data base consists of 4,235 searches, which is not a large number considering the numerous topics covered in the 15 IRSs searched by the experts (See Table XLII). Another factor which lessened the use of the SDB was the knowledge of the search experts. As frequent users of SUPARS, with formal training in psychology, the 3 search experts many times found it easier to generate synonyms from memory than from a small SDB. Perhaps less frequent users of the system or less knowledgeable users in the subject area would have a greater need for the SDB.

Since the SDB is a true data base, a system designer has to allocate a large amount of storage for it. And, as its size increases he would expect the cost of a search inquiry to the SDB to rise to that of an inquiry to the DDB. Thus, a SDB is a relatively large investment for a retrieval system which might only pay off after several months or years when the total number of searches to the DDB is large enough to constitute a viable SDB. If a SDB is too costly, a system designer should consider alternative means to make synonym-like words available to the searcher of a free-text system.²¹

c. Differences Among IRSs and Experts

The 15 IRSs differ according to the ease with which experts can achieve a 90% recall level. The magnitude of the difference is staggering. Three experts achieved a 90% recall level for IRS 7 with an average total retrieval of 242 documents. The corresponding figure for IRS 2 was 11,321, or 47 times as many documents. The difference has little to do with SUPARS per se and probably cannot be reduced (toward the lower number) by the system designer.
Three interrelated factors contributed to the difference: (1) the generality and/or specificity of the IRS as written by the subject, (2) the stringency with which the relevance criterion was applied when the subject identified the relevant documents, and (3) the breadth of topic areas published in Psychological Abstracts, each topic differing in the specificity of its technical language. It is likely that the cost-performance of SUPARS/DPS would improve if its DDB covered fewer topics, and each topic had fewer terms referring to each important concept.

²¹ DPS has one such alternative. The system designer can load a list of synonyms or equivalent terms into a DPS file. The searcher can use this file, on option, to augment the keywords in his search inquiry. Since SUPARS does not have such a list available, we could not make use of this option.

As one would expect, the 3 experts differ in their ability to use SUPARS efficiently. The size of this difference might be unexpected, however. At the 90% recall level the total retrieval, averaged over 15 IRSs, ranged from 707 to 7,840.²² Search experts, equally trained and equally knowledgeable, have characteristically different ways of attacking an IRS. At this time we do not know what the differences are among the experts which account for the differences in total retrieval. Once these differences are identified, one can choose search experts or intermediaries more selectively. Or, if the system is available to a general population, the system designers can use this information when they develop the user interactive language and when they prepare training materials for the potential user.
If differences between trained users of a retrieval system are of this magnitude generally, then intensive study into this matter is needed, because better training of users or a more adaptive interactive language might contribute more to improving the cost-performance characteristics of a system than costly software developments.

²² The differences among experts are statistically significant at all recall levels except the lowest (10%).

5. SUMMARY OF FINDINGS

Several of the major findings of this study of the cost-performance characteristics of SUPARS II are listed below. The listing is only meant as a partial summary and as such does not include explicit limits, exceptions, or explanations of the findings.

a. Search experts did not find the SDB to be a useful aid in retrieving documents. The 'S' experimental condition did have the best cost-performance characteristics, but this was attributed to learning effects, rather than to any intrinsic value of the SDB.

b. The VDB is a good investment in a free-text system such as SUPARS. It is relatively inexpensive to add to the system, and search inquiries into the VDB are the lowest costing of all 3 data bases and should remain so as the data bases grow. More importantly, use of the VDB decreases the need to search the more expensive DDB. Therefore, inquiries carried out under the 'V' condition had a better cost-performance relationship than searches under the 'D' condition.

c. Searches carried out under the 'D' condition had the poorest cost-performance curve.

d. The cost-performance curves of searches to the DDB in SUPARS I and SUPARS II are remarkably similar insofar as total retrieval is concerned. DDB search inquiries in SUPARS II are about 14% less expensive in terms of dollar cost than similar inquiries carried out in SUPARS I (using the same charge for computer services). The decrease in cost is attributed to the improvement of hardware performance in the new IBM 370/155.

e. Search experts found the truncation operator too valuable to discard, even though the evidence from SUPARS I shows it decreases cost-performance. The use of the VDB paired with the truncation operator seems to be a useful match: it meets the searching style of the experts and improves the cost-performance beyond that of the best sub-system tested in SUPARS I.

f. Some means of synonym control is needed. The SDB is one means of achieving it when other alternatives are not available. However, the value of the SDB may not become evident until the number of stored searches is quite large. For many systems this requires a large initial investment with no consistent pay-off for several months.

g. Some study into the differences between searching styles may conceivably contribute greatly to the cost-performance of a system.

APPENDIX VII

INSTRUCTIONS TO SUBJECTS

Choose a specific or general topic you need information for right now. If you are doing a paper or planning a talk you probably have a topic in mind. If you don't have any current topic you are working on, consider one you are familiar with.

In order to acquire this information for your topic we want you to write down your information requirements as if you were talking to a colleague who understood the field as well as you do.

(1) Start off by making a broad general statement. Give an overview of your topic.

(2) When that is completed give as much specific information on your topic as you would give to a colleague.

(3) After you have written the summary, make certain that you have given as much specific information as possible. Write down such things as the major author and/or people in your topic area, recent publications, related concepts to the topic, or any other clues that relate to your topic.

In writing this summary give us enough of your thoughts so that we can theoretically go out and get this information for you.
APPENDIX VIII

ANALYSIS OF VARIANCE SUMMARY

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions             504170    2        252085
Experts                                523088    2        261544
IRSs                                  2136598   14        152614
Residual                              3162212   26        121623
Total                                 6326068   44

Standard Error of the Mean = 90.05
Standard Error of the Difference Between Means = 127.34

Figure No. 1. Differences Among Three Treatment Conditions at the 10% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions            1406988    2        703494
Experts                               2223255    2       1111628
IRSs                                  5493483   14        392392
Residual                              5826493   26        224096
Total                                14950219   44

Standard Error of the Mean = 122.23
Standard Error of the Difference Between Means = 172.86

Figure No. 2. Differences Among Three Treatment Conditions at the 20% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions            1171550    2        585775
Experts                               3558931    2       1779465
IRSs                                 10116553   14        722611
Residual                              6884394   26        264784
Total                                21731428   44

Standard Error of the Mean = 132.86
Standard Error of the Difference Between Means = 187.89

Figure No. 3. Differences Among Three Treatment Conditions at the 30% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions             670002    2        335001
Experts                              10826397    2       5413199
IRSs                                 23116099   14       1651150
Residual                             19091616   26        734293
Total                                53704115   44

Standard Error of the Mean = 221.25
Standard Error of the Difference Between Means = 312.90

Figure No. 4. Differences Among Three Treatment Conditions at the 40% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions            4538354    2       2269177
Experts                              23623521    2      11811761
IRSs                                 43113614   14       3079544
Residual                             32147044   26       1236425
Total                               103422533   44

Standard Error of the Mean = 287.10
Standard Error of the Difference Between Means = 406.02

Figure No. 5. Differences Among Three Treatment Conditions at the 50% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions            8504225    2       4252113
Experts                              33061006    2      16530503
IRSs                                 71507851   14       5107704
Residual                             47738801   26       1836108
Total                               160811883   44

Standard Error of the Mean = 349.87
Standard Error of the Difference Between Means = 494.79

Figure No. 6. Differences Among Three Treatment Conditions at the 60% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions           42632120    2      21316060
Experts                              77944731    2      38972365
IRSs                                138288573   14       9877755
Residual                             83901321   26       3226974
Total                               342766745   44

Standard Error of the Mean = 463.82
Standard Error of the Difference Between Means = 655.94

Figure No. 7. Differences Among Three Treatment Conditions at the 70% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions          124757078    2      62378539
Experts                             194712111    2      97356056
IRSs                                327989517   14      23427823
Residual                            289814769   26      11146722
Total                               937273475   44

Standard Error of the Mean = 862.04
Standard Error of the Difference Between Means = 1219.11

Figure No. 8. Differences Among Three Treatment Conditions at the 80% Recall Level

Source of Variation            Sum of Squares   df   Mean Square
Among Treatment Conditions          195418483    2      97709242
Experts                             382735553    2     191367777
IRSs                                485520111   14      34680008
Residual                            320914847   26      12342879
Total                              1384588994   44

Standard Error of the Mean = 907.12
Standard Error of the Difference Between Means = 1282.85

Figure No. 9. Differences Among Three Treatment Conditions at the 90% Recall Level
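As a reading aid, the summary tables above can be turned into F ratios and standard errors with a short Python sketch. This is a check under stated assumptions, not the authors' own computation: the mean squares are copied from the tables, the F ratio is taken as the treatment mean square over the residual mean square, each treatment mean is assumed to rest on 15 observations (45 observations divided evenly among 3 conditions), and the 0.05 critical value of F with 2 and 26 degrees of freedom is taken as approximately 3.37 from standard tables.

```python
import math

# (recall level in %, treatment mean square, residual mean square),
# copied from the Appendix VIII summary tables.
ANOVA = [
    (10, 252085, 121623),
    (20, 703494, 224096),
    (30, 585775, 264784),
    (40, 335001, 734293),
    (50, 2269177, 1236425),
    (60, 4252113, 1836108),
    (70, 21316060, 3226974),
    (80, 62378539, 11146722),
    (90, 97709242, 12342879),
]

N_PER_MEAN = 15   # assumption: 45 observations spread evenly over 3 conditions
F_CRIT_05 = 3.37  # approximate tabled value of F(2, 26) at the 0.05 level


def f_ratio(ms_treatment, ms_residual):
    """F ratio for the treatment effect tested against the residual."""
    return ms_treatment / ms_residual


def standard_errors(ms_residual, n=N_PER_MEAN):
    """Standard error of a mean and of a difference between two means."""
    sem = math.sqrt(ms_residual / n)
    return sem, sem * math.sqrt(2)


for level, ms_t, ms_r in ANOVA:
    f = f_ratio(ms_t, ms_r)
    sem, sed = standard_errors(ms_r)
    flag = "*" if f > F_CRIT_05 else " "
    print(f"{level:3d}%  F = {f:5.2f}{flag}  SEM = {sem:8.2f}  SED = {sed:8.2f}")
```

Under these assumptions only the 70%, 80% and 90% levels exceed the critical value, which agrees with the statement in the text that the treatment means differ at the higher recall levels.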