SECTION II
DATA BASE
Kenneth H. Cook and Lynn Trump

This section describes the development, organization, and growth of the three separate data bases used by SUPARS/DPS II. To give a broad overview of how data flows through the system, Appendix A describes the use of data as it is channeled through the various programs.

The three data bases that were developed were:

(1) Document Data Base: bibliographic citations and abstracts (documents) from Psychological Abstracts (January 1969 - June 1971);
(2) Vocabulary Data Base: all unique terms identified through the free-text processing of documents and stored in the DPS inverted file; and
(3) Search Data Base: search inquiries originally entered by users to search the document data base, including the search words and Boolean operators.

A data base of documents was developed during the previous year's experimental work in 1969 - 71; the vocabulary and search data bases were developed as new searching algorithms for the 1970 - 71 research.

1. DOCUMENT DATA BASE

The document data base was developed from machine-readable tapes of Psychological Abstracts rented from the American Psychological Association. The general rules for the exclusion of common words, special character handling, and sentence and paragraph endings were the same as reported in the July 1971 report. (2)

Because of formatting changes and character set changes made to the 1971 tapes by the American Psychological Association, the SUPARS/DPS programs had to be modified in order to process the data. First, program TRANSLATE had to be modified to deal with the standard IBM scientific character train using upper-case characters, rather than the previously used special multiple-alphabet set of characters.
Second, program REFORMAT had to be modified to accommodate (a) changes made to the document "fields" by APA and (b) the new use of fixed-length fields as pointers to the actual variable-length fields of data, rather than the previous use of all variable-length fields. A description of the general logic and flowcharts of the TRANSLATE and REFORMAT programs are given in Appendix B. A chart of the fourteen fields (author, title, source, abstract, etc.) used to organize the data for each document is also given in Appendix B. Figure 1 of Appendix B lists the fields as found on the original APA tapes before translation; Figure 4 of Appendix B gives the translated fields that have been reformatted and are ready to be processed (loaded) by the SUPARS/DPS programs.

After passing through a VALIDATE program, which checked each document against the maximum allowable length and deleted those which were too long, documents were ready for processing, or loading. Documents from January 1969 through November 1970 were processed through the two-phase DPS program that developed an inverted file of free-text words and documents.

The loading history of the SUPARS/DPS II document data base is shown in Table I. A total of 46,828 documents were loaded into the data base. The number of unique words and terms derived from free-text indexing of these 46,828 documents amounted to 106,702. Documents were loaded in six separate batches, as shown in the six rows of Table I, and covered the months of January 1969 - June 1971. A graphic representation of the growth of the data base is shown in Figure 6, "Growth Rate of SUPARS/DPS Data Base." Each of the six numbered points on the chart represents a separate batch of documents that was loaded, and corresponds to one of the six batches shown in Table I.

Another indication of the size of the document data base is the number of tracks used on the main storage devices, the 2314 and the 3330 disk packs. The track usage is shown in Table II.
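The pointer scheme mentioned above, in which fixed-length fields locate the variable-length data, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the four-character entry width, the placement of the directory at the start of the record, the rule that a field runs up to the start of the next one, and the names `ENTRY_WIDTH`, `read_directory`, and `split_fields`. The actual tape layout is the one given in Appendix B, and the original programs were written in BAL, not Python.

```python
ENTRY_WIDTH = 4  # assumed width of one directory entry, in characters


def read_directory(record, n_entries):
    """Return the displacement held in each fixed-length directory entry."""
    offsets = []
    for i in range(n_entries):
        entry = record[i * ENTRY_WIDTH:(i + 1) * ENTRY_WIDTH]
        # int() tolerates the blank fill on the left; a non-numeric
        # entry raises ValueError.
        offsets.append(int(entry))
    return offsets


def split_fields(record, n_entries):
    """Slice out each variable-length field using the directory.

    Assumption: a field runs up to the start of the next field, and the
    last field runs to the end of the record.
    """
    offsets = read_directory(record, n_entries)
    ends = offsets[1:] + [len(record)]
    return [record[start:end] for start, end in zip(offsets, ends)]


# Toy record: two directory entries, then two variable-length fields.
record = "   8" + "  13" + "SMITH" + "TITLE X"
fields = split_fields(record, 2)
```

With this toy record, `fields` holds the two variable-length strings. The point of the design is that a program can reach any field directly from the fixed-position directory instead of scanning for terminators.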
Three separate files are maintained in SUPARS/DPS: (a) Dictionary, containing each unique word and the document postings; (b) Vocabulary, containing unique words and all document numbers which contain the word; and (c) Master, containing a coded representation of the entire document. For each file, comparisons are given of the number of tracks allocated and the number of tracks used on the 2314 disk pack (used with the 360 operating system) and on the 3330 disk pack (used with the 370 operating system).

Twice during the growth of the data base, the vocabulary file had to be reduced in size, or restructured, in order to fit into the available storage on the 2314 disk packs. A restructure program allows more compact and efficient storage of the long strings of document numbers listed after each unique word in the vocabulary. The first restructure was carried out while the 360/50 operating system and the 2314 disk packs were in use, on the data base of January 1969 - April 1970. As a result of this restructure, the amount of storage space needed on the disk packs for the vocabulary was reduced by 45%. The shorter document number strings contributed both to increased search efficiency and to a greater amount of available work space.

A second restructure was necessary when the 370/155 operating system was installed in December 1971 and the 360/50 system was removed. Because the relative track addresses on the new 3330 disk packs differ from those on the old 2314 packs, the data for the vocabulary, dictionary, and master files had to be restructured. The reduction in size of the three files from the 2314 to the 3330, in terms of the number of tracks used, is given in Table II. For example, in the dictionary, for the documents of January 1969 - April 1970, the number of tracks used on the 2314 was 286, while the number of tracks used on the 3330 disk pack was only 156 after the restructure. By the time the total restructure was completed, only 1 1/2 3330 disk packs, as compared with four 2314 disk packs, were used to store the document data base, the search monitor and DPS search modules, and the search data base (explained below in subsection 3, "The Search Data Base").

[Table I. Loading history of the SUPARS/DPS II document data base, in six batches: (1) Jan-June 1969, (2) July-Sept 1969, (3) Oct-Dec 1969, (4) Jan-Apr 1970, (5) May-Sept 1970, (6) Oct 1970-June 1971. Tabulated figures not recoverable.]

[Figure 6. Growth Rate of SUPARS/DPS II Document Data Base. Horizontal axis: documents (in thousands); plotted points: the six batches of Table I. Notes on the chart: scales are not in proportion; see Table I for exact figures.]

[Table II. Tracks allocated and used for the Dictionary, Vocabulary, and Master files on the 2314 and 3330 disk packs. Tabulated figures not recoverable.]

2. VOCABULARY DATA BASE

A second major data base that was made available on-line to users of SUPARS/DPS was the entire free-text vocabulary. The vocabulary contains all the unique words taken from the free-text processing of documents and stored in the inverted file. The vocabulary, called the "dictionary" in the original DPS programs, is actually a by-product of the free-text processing.
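How a vocabulary falls out of free-text processing as a by-product can be shown with a small sketch of an inverted file. This is an illustration under stated assumptions, not the DPS loading logic: the tokenizing rule, the tiny `STOP_WORDS` set, the function name `build_inverted_file`, and the two sample documents are all hypothetical.

```python
import re
from collections import defaultdict

# Illustrative common-word exclusion list (not the SUPARS/DPS list).
STOP_WORDS = {"the", "of", "and", "a", "in", "to"}


def build_inverted_file(documents):
    """Map each unique free-text word to the document numbers containing it."""
    postings = defaultdict(list)
    for doc_no in sorted(documents):
        words = set(re.findall(r"[a-z]+", documents[doc_no].lower()))
        for word in sorted(words - STOP_WORDS):
            postings[word].append(doc_no)
    return dict(postings)


# Toy documents, invented for the example.
docs = {1: "Memory and learning in the rat", 2: "Learning to read"}
inverted = build_inverted_file(docs)

# The vocabulary is simply the set of keys of the inverted file.
vocabulary = sorted(inverted)
```

Once the postings exist, listing the vocabulary costs nothing extra, which is why making it searchable on-line was mainly a matter of writing new access programs rather than building new data.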
The total size of the vocabulary reached 106,702 words by the time documents from January 1969 - June 1971 were processed. Figure 7, "Cumulative Growth Rate of the SUPARS/DPS Vocabulary Data Base," shows the cumulative growth of the vocabulary data base for each batch of documents processed.

In order to make use of the words contained in the vocabulary and make them accessible for users to query in an on-line mode, special search programs had to be developed. The term "delta V" is used to describe these types of searches, because the characters "delta" and "V" are typed by the on-line user to access the vocabulary data base.

a. Special Programs Developed for a Vocabulary Data Base

Two new search modules had to be developed to handle the vocabulary data base. The programs were necessary in order to access the vocabulary as a data base rather than have it operate in its normal capacity as a special handling procedure for the DPS Dictionary file.

The vocabulary control program is derived, with a great deal of modification, from the existing programs of the standard DPS search routine, and is called by the search control program for all Delta V searches. The vocabulary control program reads user input line by line, storing the first user-entered search word. If the user has not set a limit on the number of words to be printed as output, the program sets it to 100; that is, a maximum of 100 items (words) will be output. The program then calls the interface program, a modification of the DPS program which actually locates the words in the file of all free-text words.

Two output options are currently available. The first simply calls for a frequency count of documents in which the user-entered word appears and returns the record address of the word to the control program.
In the case of a search where the user enters a word prefix, or root, and wants all words containing the root to be printed out, a maximum of 100 words at a time are located that contain the root. The string of data record addresses of these words is then returned to the control program. In both cases, when control is returned to the vocabulary program, it reformats the output for the user and collects statistics that are stored for later retrieval.

[Figure 7. Cumulative Growth Rate of the SUPARS/DPS Vocabulary Data Base. Chart not reproduced.]

A third option, which would allow the user to request the printing of words preceding and following a specified word in the vocabulary, was never implemented. Programming was begun, but time restrictions did not allow the work to be completed. The major difficulty in implementation lies in the complicated structure of the DPS dictionary file. The embedded master and block index records greatly complicate the problem of reading and collecting words preceding a specified word, although the task of reading words following the word is not as difficult.

In addition to the development of new search modules to access the vocabulary data base on-line, another major programming effort was the expansion of the total word capacity of the vocabulary.

b. Expanding the Capacity of the Vocabulary Data Base

One limiting factor in the original DPS program was that the size of the vocabulary (dictionary) file was limited to 65,534 free-text words because of the half-word, or 16-bit, coding scheme (2^16 = 65,536 possible codes). When this limit was reached, a new file had to be defined. However, the new file would collect all new words, including those in the first file.
Each file of documents would have to be searched separately with its own vocabulary. If a larger capacity could be developed for the vocabulary, only one file would be necessary, which would make for less cumbersome and more efficient searching of the data base. A second advantage would be the opportunity for the user to access a complete file of all free-text words available to him in developing search inquiries.

To increase the size of the vocabulary and maintain it in one file, the half-word, or 16-bit, coding of a word identifier was changed to full-word coding of 32 bits. This change increased the capacity of the vocabulary file from approximately 65,000 words to over 4 billion words, as 2^32 equals 4,294,967,296.

In order to process 32-bit word identifiers, the format of all data records containing the word identifier field (WID), and all programs creating and referencing those records, had to be altered. The WID fields in the Dictionary, Master, and Master Identification Update records, and those in the dictionary record area in the search DSECT, were increased from two bytes (half-word) to four bytes (full-word). Consequently, the relative addresses of all subsequent fields in these records were displaced. The loading programs (PRELOAD, SORT and LOAD), two search modules (KEYWORD and POSITIONAL NOTATION PROCESSOR), and all three versions of the dictionary interface program (DICTIO) had to be changed to handle the new record formats. Programming included converting the pertinent half-word instructions to full-word instructions, changing the displacement values for references to all fields following WID in the data records, and altering record-length calculation routines to take into account the new field length.

In addition to the changes necessitated by the change to full-word word identifiers, substantial changes were made to all search modules. First, the unnecessary coding was deleted to save space.
All instructions and variables referring to the synonym, equivalent, and text files were removed, as were unused routines handling weighted keywords and unlabelled search statements. Second, in order to save space in the search monitor and to facilitate handling the many different output requirements, the output formatting formerly handled by the search monitor was incorporated into the search programs, and new formatting routines were written where necessary. Finally, all data entered by the user or written to him by the search monitor program were written into separate intermediate disk files by the search and monitor programs, and file access routines were added to all affected programs.

3. THE SEARCH DATA BASE

The third major data base developed for SUPARS/DPS II was the search data base, which contained the collection of search inquiries that had been submitted to the system and stored during October-December 1970 and November-December 1971. The development of this data base simply meant processing each search through the SUPARS/DPS loading programs in the same manner as the documents had been. This loading process created a data base containing its own dictionary, vocabulary, and master file.

The searches were loaded in two separate batches: those of the 1970 period of system operation and those from the 1971 period. A summary of the searches loaded and the number of free-text words generated from those searches is shown in Table III. Table IV gives an indication of the size of the three files contained in the complete search data base, by the number of tracks used on the two different disk storage packs. As in the document data base, the Dictionary file contains each unique word and its postings, the Vocabulary file contains the unique words followed by the string of document numbers in which each word appears, and the Master file contains a coded representation of each entire document as entered into the system.
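The batch-by-batch accounting summarized for the search data base (searches loaded, running total, new words added, running total of words) can be sketched as follows. This is a hypothetical illustration: the function name `growth_by_batch`, the whitespace tokenizing, and the sample searches are assumptions, and Boolean operators are counted as ordinary words here purely to keep the example short.

```python
def growth_by_batch(batches):
    """Summarize each batch of searches as
    (batch number, searches loaded, total searches, new words, total words)."""
    seen = set()          # every unique word met so far
    total_searches = 0
    rows = []
    for number, batch in enumerate(batches, start=1):
        words_before = len(seen)
        for search in batch:
            seen.update(search.lower().split())
        total_searches += len(batch)
        rows.append((number, len(batch), total_searches,
                     len(seen) - words_before, len(seen)))
    return rows


# Toy batches, invented for the example (not SUPARS data).
batches = [
    ["anxiety AND children", "memory OR recall"],
    ["anxiety AND adolescents"],
]
rows = growth_by_batch(batches)
```

A word counts as "new" only the first time it is met, so later batches typically add fewer new words per search than the first batch does.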
TABLE III
GROWTH OF SEARCH DATA BASE

Batch    Dates          Searches    Total Searches    New Words    Total New
                        Loaded      Loaded            Added        Words Added
#1       Oct-Dec '70    2,409       2,409             12,016       12,016
#2       Nov-Dec '71    1,826       4,235              5,477       17,493
TOTALS                  4,235                         17,493

[Table IV. Size of the search data base files, in tracks used on the 2314 and 3330 disk packs. Tabulated figures not recoverable.]

Displacement    Data Item
(illegible)     Record length (binary control field)
(illegible)     Generation code
(illegible)-9   Year
10-11           Volume number
12-13           Issue number
14-18           Abstract number
19-20           Type of publication
21-26           Journal title code
27-30           Language
31-34           Availability

Directory Fields
35-38           Subject index codes
39-42           Subject index phrase
43-46           Author subdirectory
47-50           Designator other than author
51-54           Affiliation of first author
55-58           Publication title
59-62           Source document title
63-66           Source document description
67-70           Abstract
71-74           Abstractor's name

Variable Number of Fixed Length Fields
75-76           Number of classification codes (this is the last field which begins at a fixed displacement)
                Classification codes (each is a 6-digit code)
                Number of subject index codes
                Subject index codes (each is a 5-digit code)

Variable Length Fields (all are left justified)
                Subject index phrase
                Author subdirectory (2 characters: right-justified count of number of authors; then 2 characters each: number of characters in each author's name)
                Author(s)
                Designator
                Affiliation
                Publication title
                Source document title
                Source document description
                Abstract
                Abstractor's name

Notes printed beside the fixed fields: numeric code for 1971 documents only, blanks for 1970 documents; blank or FRGN.

All directory fields are right-justified numerics, blank-filled on the left. Each directory field contains the displacement of the first byte of the corresponding variable-length data field relative to the first data byte (generation code) of the record.

Figure 1. Input Record Description

[Figure 2. Translate Psychological Abstracts Logic Diagram. Flowchart not reproduced; legible steps: get record, test each byte, convert valid characters, print error for invalid characters, finish.]

[Figure 3. Translate Psychological Abstracts Flow Chart. Flowchart not reproduced; legible steps: read parameter card (volume and abstract range), translate each byte of input, add 1 to the error counter and make a replacement on error, print error message, finish.]

Output tape: see Input Record Description.

Printed output: the following messages may be printed:
(a) TAPE HAS NO COPYRIGHT MESSAGE
(b) ABSTRACT NUMBER nnnnnn HAS xxxxxx ERROR(S)
where nnnnnn and xxxxxx are the abstract number and error count as computed by the processing program.

Program Name: REFORMAT

ABSTRACT

REFORMAT is a BAL program written to run on the IBM S/360 Model 50. Its purpose is to reformat the data contained in each Psychological Abstracts record into a format that is compatible with the SUPARS/DPS input record description. It accomplishes this by rearranging the fields, inserting termination characters, truncating fields which exceed the maximum acceptable length, and designating sentence terminators.

Computer Definition
1. IBM S/360 Model 50
2. Two 2400 tape drive facilities and 9-track tapes
3. Model 1403 Printer
4. Core requirements:
   a. Assembler 140K
   b. Linkage Editor 128K
   c. Program execution 20K

System Description
1. Syracuse University Operating System (SUOS)
2. Assembler Level F translator program
3. Linkage Editor Level F program

Program Description

This program takes as input the translated Psychological Abstracts tape. It processes one input record at a time and produces either 2 or 3 output records for each.
Each document is assigned a DPS accession number. Then the fields are broken down and reconstructed into a format suitable for DPS processing. The first record contains all bibliographic fields with their termination identifiers and, if there is room, the reformatted abstract. The abstract is rewritten so that the character handling statements will process the punctuation properly. If the abstract will not fit in the first record, it is output as the second record for that document number. If the abstract is too long for a record, it is truncated to the maximum length allowable for an output record (1646 characters). The last record for each document is the text portion, paragraph B, sentences 1 through 4.

Input

See the input record description for the TRANSLATE program (Figure 1).

Output

1. Printed output

The following messages may be printed when this program is run:
   COPYRIGHT STATEMENT MISSING
   LENGTH ERROR FOR ABST NUMBER nnnnn - if a field or the entire record exceeds the limits set by DPS
   ERROR IN DIRECTORY FIELD OR AUTHOR SUB - if a non-numeric field is found
   END OF PROCESSING - for successful termination of the job
   LAST DOCUMENT NUMBER ASSIGNED WAS xxxxxx

2. Reformatted data

For each input record, 2 or 3 output records are produced. If there are 2 output records, the first contains the bibliographic data and Text Paragraph A; otherwise the bibliographic data and the Text Paragraph A data are separate records. Text Paragraph B is always the last output record. Each output record is preceded by a 4-byte control field containing the record length. (See Figure 4.)

Field Length    Data Item

Bibliographic Data
6               DPS Accession Number
4*              Year
2*              Volume Number
5*              Abstract Number
Variable**      Author
Variable**      Editor
Variable**      Affiliation
Variable**      Article Title
Variable**      Source document title
Variable**      Source document description
Variable**      Language
Variable**      Type
1               Random Number
Variable**      Abstract

Text Paragraph A
4               Paragraph indicator
Variable        Abstract

Text Paragraph B
6               Document Number
Variable***     Article Title
Variable***     Source Doc. Title
Variable***     Author
Variable***     Affiliation

*Field is terminated with a terminator character to meet DPS requirements. This character is not included in the listed field length.
**Field length range is 1 to 255 characters. No field entry in the input document is indicated by one blank as the bibliographic entry; input fields exceeding the maximum are truncated to 254 characters and an asterisk is added as the final character to signal truncation.
***Entry is followed by a sentence indicator.

Figure 4. Reformatted Data

[Figure 5. Reformat Psychological Abstracts Logic Diagram. Flowchart not reproduced; legible steps: read tape generated by TRANSLATE, compute current document number, move fixed format fields to output buffer, set up length and address for variable length fields using the directory, move variable length fields to output buffer, write bibliographic record, initialize new output record, move text format of abstract to output, create text record with remaining fields, print last document number assigned, finish.]

[Figure 6. Reformat Psychological Abstracts Flowchart. Flowchart not reproduced; legible steps: take value of last document number assigned from the PARM field, read input, compute current document number, move each fixed field or null indicator to output, insert terminator after each field entry, fill field lengths and displacements in variable field table, move names and terminator to output field, move language and type fields to output, calculate random number and move it to output, move first 254 characters of abstract to output, move abstract in text format to output, add 2 blanks at end of sentence, move title, source title, author, affiliation and sentence indicators to output.]

APPENDIX III
SEARCH REFORMAT

Program Name: SBCFRMAT

ABSTRACT

SBCFRMAT is composed of two BAL programs written to run on both the IBM S/360 Model 50 and the IBM S/370 Model 155. The modules reformat data collected by the SUPARS monitor into the format defined by the SEARCHES data base description, a format acceptable as input to the SUPARS/DPS loading and search programs.

Computer Definition
1. IBM S/360 Model 50 or S/370 Model 155
2. Two 2400 tape drive facilities and 9-track tapes
3. Model 1403 Printer
4. Core requirements:
   a. Assembler 140K
   b. Linkage Editor 128K
   c. Program execution

System Description
1. Syracuse University Operating System (SUOS)
2. Assembler Level F translator program
3. Linkage Editor Level F program

Program Description

This program is two BAL modules combined into one load module by the linkage editor. BIBFLDS is the main program.
It reads as input the statistics records collected for it by the STATPAC programs, processing one input record at a time and producing 2 output records for each. Each document is assigned a DPS accession number. Then the bibliographic fields are extracted from the raw data and moved to the output record in DPS format. Control is passed to the second module, SBCFRMAT, which reformats the remaining bibliographic fields and Paragraph A of the text portion of the record. Control is returned to the main program and Output Record 1 is written. The remaining text data is formatted and written as Output Record 2. Processing continues until all documents have been reformatted. Before terminating, a message is written giving the last document number assigned.

Figure 1, used in conjunction with the DPS Program Description and Operations Manual (H20-0477-1), pages 27-47, gives the complete data base description for the search data base.

   CONCORD SEARCHES Y
   FILE SEARCHES V 1650 9999
   FID DOCNO  EBCD 6
   FLD SSN      9 TER x 1
   FLD LOGN     6 TER x 1
   FLD TERN     3 TER x 1
   FLD DATE     8 TER x 1
   FLD ECPU     6 TER x 1
   FLD ECLOCK   6 TER x 1
   FLD LCOST    7 TER x 1
   FLD MAXDOF   6 TER x 1
   FLD SRCHA  255 TER x 1
   FLD SRCTB  255 TER x 1
   FLD LOGIC   24 TER x 1
   FLD WRD    255 TER x 1
   FLD WRDA   255 TER x 1
   FLD NDOCPR   5 TER x 1
   FLD TXT     30 TXT 1
   SPCL 075; SENT 080 094 107; TRNS NONE
   DMDBD      DD [operands illegible]
   DMTEXTS    DD DUMMY
   DMDICTN    DD [operands illegible]
   DMVOCAB    DD [operands illegible]
   DMMASTR    DD [operands illegible]
   DMWRKT1    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=SRCWR1,VOL=SER=LB0005
   DMWRKT2    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=SRCWR2,VOL=SER=LB0005
   DMWRKT3    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=SRCWR3,VOL=SER=LB0005
   DMSORTWK01 DD UNIT=2314,DISP=OLD,DSNAME=SYS1.UT1
   DMSORTWK02 DD UNIT=2314,DISP=OLD,DSNAME=SYS1.UT2
   DMSORTWK03 DD UNIT=2314,DISP=OLD,DSNAME=SYS1.UT3
   DMSORTWK04 DD UNIT=2314,DISP=OLD,DSNAME=SYS1.UT4
   DMSORTLIB  DD DSNAME=SYS1.SORTLIB,DISP=[illegible],DCB=(BLKSIZE=3265,RECFM=U)
   DMSYSOUT   DD SYSOUT=A
   DMWORK1    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=[illegible],VOL=SER=LB0005
   DMWORK2    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=SRCWS2,VOL=SER=LB0005
   DMWORK3    DD UNIT=2314,SPACE=(TRK,(50,10)),DSNAME=SRCWS3,VOL=SER=LB0005
   END

(In the FLD statements, x stands for the terminator character, which is illegible in this copy.)

Figure 1. Data Base Description

Displacement    Data Item                    Type
0-3             Record length                Binary
4-8             Social Security Number       Packed Decimal
9-11            Log number                   Packed Decimal
12-15           Date (YYMMDD)                Packed Decimal
16-19           Elapsed CPU (1/300 sec.)     Binary
20-23           Elapsed CLOCK (1/300 sec.)   Binary
24-27           Maximum Documents            Packed Decimal
28-29           Dropped List Length          Binary
30              Terminal Number              Binary
31              A Type                       Binary
32-35           Cost                         Binary
36-37           User I/P Length              Binary
38              Error Flag                   Binary
39-41           No. Docs Printed             Packed
42              List type                    Binary
43-n            User Input                   Character
Var             User Output                  Character
Var             List                         Character

Figure 2. Input Record Description

Output: Two variable-length records for each log number.

Record 1. Bibliographic data and first paragraph of text data.

Field Name   Displacement   Length     Data item and comments
             0              4          Record length
DOCNO        4              6          DPS Accession Number
SSN          10             9          Social security number (in alphabetic code)
LOGN         20             6          Log number
TERN         27             3          Terminal number (port number for hardwires, code for dial-ups)
DATE         31             8          Date (YY-MM-DD)
ECPU         40             6          Elapsed CPU time (mmm:ss)
ECLOCK       47             6          Elapsed clock time (mmm:ss)
LCOST        54             7          Search cost ($ddd.cc)
SRCHA        62             variable   Labels, keywords, and operators
SRCHB        variable       variable   LIST statement
LOGIC        variable       24         Frequency of occurrence of each operator, in same order as first 7 entries of LOGIC list for Record 2
WRD          variable       variable   Length count and keywords from search
WRDA         variable       variable   Overflow field for WRD; used only if keywords exceed 254 characters
NDOCPR       variable       5          Number of documents printed
             variable       4          Paragraph A indicator
             variable       variable   Search, from L1 through Ln in text format, or NONESRCH
             variable       variable   LIST statement

Record 2. Second paragraph of text data.

Displacement   Length     Data item and comments
0              4          Record length
4              6          DPS Accession Number
10             4          Paragraph B indicator
14             9          Social security number
23             variable   L+LOGN+L or LNONE
variable       variable   T+TERN+T or TNONE
variable       variable   D+MMDDYY+D or DNONE
variable       variable   LIST type - LSTBRIEF, LSTKEYWORD, LSTOTHER, or NOLST
variable       variable   LOGIC - from 1 to 7 entries depending on search
variable       variable   Completeness of search
variable       variable   User output in format VnnAnnnnn or V00ANONE

Figure 3. Output Record Description

[Figure 4. Reformat Searches Module 1 - BIBFLDS. Flowchart not reproduced; legible steps: read tape generated by STATPAC program, convert social security number to alphabetic code in output, move log number and cost to output, move date and maximum documents found to output, compute length of user output, set search completeness indicator, exclude delta V, delta S, and expert searches, compute DPS document number and move it to output, exit to module 2 (FORMSRCH). Continuation: print error message, move document number and paragraph indicator to output, move social security number to output, move text format of LOGN, TERN, and DATE to output, move all volume-abstract pairs to output.]