on library routines in the statistical - …nordbotten.com/articles/library 1962.pdfon library...

10
ON LIBRARY ROUTINES IN THE STATISTICAL DATA PRODUCTION 1 By SVEIN NORDBOTTEN and THOR AASTORP, Oslo

Upload: phamdieu

Post on 07-Apr-2018

225 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

ON LIBRARY ROUTINES IN THE STATISTICALDATA PRODUCTION1

By SVEIN NORDBOTTEN and THOR AASTORP, Oslo

Page 2: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

Reprint from Statistical Review nr 3, 1962

I. Routine libraries

Early in the history of program con-trolled computer application, users learn-ed that programming efforts might besaved if programs and subroutines werewritten to satisfy certain criteria as togeneral use, and the concept of r o u -t i n e l i b r a r y was conceived.

There are two rather different typesof libraries2, the s p e c i a l l i b r a r yand the g e n e r a ] l i b r a r y . The spe-cial library contains all tailor-made rou-tines for recurring problems of exactlythe same specification. In the statisti-cal data production tailor-made pro-grams have been extensively used owingto the peculiar applications occurring ina statistical bureau. This tendency makescooperation between bureaus difficultas to special program exchange.

The cooperation has to be concen-trated to the general libraries, which,however, might be very useful for thelater elaboration of programs for thespecial libraries \vithin each statisticalbureau.

The definition of a general libraryroutine is that it should at least be appli-cable to the same problem with formatsor dimensions varying within given lim-its. The routines within the general li-brary can be grouped into three classesaccording to how they work:

1 This paper is based on notes prepared fora meeting in Stockholm 12th—13th October1961 of representatives from statistical bureaususing IBM 1401/7070 machines. — 2 Accordingto the special terminology used in connection•with data processing.

Class I: Routines for production runsof special applications theformat of which are control-led by parameters.

Class II: Routines associated with aset of subroutines each iden-tified by an assigned param- _-eter value, and working on avarying length sequence ofparameters by which the re-lated subroutines either arecompiled before or currentlycalled during the productionprocessing.

Class III: Routines comprising more orless complete alternative pro-duction routines for a specialapplication and which bymeans of a parameter set de-fining format, etc. select andgenerate the optimum pro-duction routine for the prob-lem with the specified for-mat.

There exist programs which reallymay be classified in two classes. Theprinciple difference between the classesis, however, important and may be mademore clear by means of examples. Aprogram for computing the square rootof all integers up to a specified integer,is a typical program within the firstclass. The programs in this class can bemade very efficient by machine orientedcoding. They are, however, e i t h e rby leaving few format dimension openfor parametric specification, very much

-

-

.

Page 3: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

STATISTISK TIDSKRIFT 1962: 3 167

related to the routines in the speciallibrary and not within the scope of thispaper, or by leaving many formatdimension open for specification, areslow or inefficient for recurring runs.

The remaining classes represent twodifferent approaches to solve the diffi-culties in making previous programmingefforts useful.

The second class contains compilerand interpretive programs the aim ofwhich are to reduce the programming bydoing once and for all as much as pos-sible of the detail coding. The class IIprograms are utilized in producing bothgeneral class I programs and specialprograms. The included subroutines areoften problem oriented, and the specifi-cation parameter values take form of aproblem oriented language which set upfor solution of a problem is called as o u r c e p r o g r a m . IBM-compilersfor 7070 are the Autocoder, Fortran andCobol processors. The main character-istics of this type of compilers are theirwide range of possible applications andthe value of their associated languagesas means in the efforts for standardiza-tion in problem formulation and docu-mentation of solutions. Their generalfeatures may, however, cause a loss inoperating efficiency of the resulting pro-duction programs and if these are tobe used frequently the loss in efficiencymay outweigh the gain in programming.The compiler programs are thus fittedfor jobs which should be runned a fewtimes or in a situation when machinecapacity is no limiting factor.

Belonging to the second class are alsothe interpreters, where the translationof source program is performed step bystep during the production run. Exam-ples of interpretive programs are theprograms for Simulation of the IBM 650

on the IBM 7070 and Simulation of theIBM 7070 on the IBM 704 and IBM 7090.Because translation has to be done everytime an interpretive system is applied,the operating speed is usually low andinterpreters are used mainly for a joboccurring only once, in program testingand in the period transferring jobs fromone computer to another.

There is another type of interpreterswhich is very important. That is themonitor programs the aim of which isto integrate several programmed pro-cesses in one continuous process avoid-ing manual managing of the system asfar as possible.

The third class of programs containsthe generators. The idea of a generatoris that several applications are very fre-quently performed and deserve specialattention. The general point of view rep-resented by class II programs is sacrifiedin the generators. The aim of the gener-ator program is both to reduce pro-gramming effort to a minimum and toachieve maximum operating efficiencyfor all formats of a certain problem. Asthe research in systems of administrativedata processing applications has advanc-ed, it has been found that these appli-cations basically consist of standardprocesses even though the sequencingmay vary. This research has increasedthe interest in generators and for theIBM 1401/7070 systems generators forSorting, Merging and Report programsare now offered the user.

2. General programs for the sta-tistical data production

Preparation of a large number of sim-ilar, but special programs and the in-tegration of the processes of the differ-ent stages to an integrated, automatic

Page 4: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

168 1962: 3 STATISTISK TIDSKRIFT

process, are common objectives for theefforts in the data processing field ofthe statistical bureaus. Recent reportsfrom the US Bureau of the Census indi-cate that even working in the large scaleof this agency, the processes are yetdivided in separate runs requiring man-ual interference and implying risks ofintroducing human failures.

Two approaches have been tried toobtain the mentioned objectives. Messrs.F. Yates and H. R. Simpson have workedout an interpreter for a small machineable to take care of the different stagesof statistical data production in onesingle run. Besides being restricted bythe size of their machine, an interpretivescheme must be slow and of course re-quire a new source program for eachapplication. To our opinion this solutionmay be well fitted for application tosmall survey processing only to be doneonce.

The second approach is representedby Messrs. A. S. Douglas and A. J. Mit-chell by a compiler with a source lan-guage well oriented towards the statis-tical terminology. This compiler seemsto be very general and should in prin-ciple be able to solve many of the prob-lems of statistical bureaus in one processincluding editing, selection, tabulationand analytical computation. The pro-duced production program will pro-bably be much faster than a correspond-ing interpretive, but in both cases acomplete new source program must bewritten for each application. Thescheme should therefore be adequatefor frequently recurring applications onsmall data masses.

In general, the work of the statisticalbureaus is comprising a large number ofextensive jobs which often are recurring.Our proposal is a third alternative ap-

proach based on a brick-system of gen-eral routines integrated and controlledby a monitor program.

3. Monitor controlled statisticalprocessing

The statistical data processing can beconsidered as comprising the followingfunctions which may be performed indifferent sequences:

1. Administration2. Data conversion and transcription3. Sorting4. Editing5. File work6. Tabulation7. Analytical computationsWe propose to elaborate general pro-

grams which together with already ex-isting will cover these functions bymeans of compilers and/or generatorsproducing special optimum productionprograms which are integrated to asingle process by means of a monitorprogram. Some of these programs willbe useful also for other kinds of usersand it is reasonable to leave the con-struction of these to the manufacturers,while others are particularly useful forthe statistical bureaus and must prob-ably be worked out by ourselves. Therest of this section will be used fordescribing the ideas of the system morein detail.

3.1 The monitor programThe monitor program takes care of

all coordination simulating the work ofmachine operation scheduling staff andrepresents the function of administra-tion. The operation of the monitor iscontrolled by a set of control cardswhich the monitor is interpreting be-

Page 5: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

STATISTISK TIDSKRIFT 1962: 3 169

tween the operation of two programs.Controlled by these specifications, themonitor has to be able to leave the con-trol for further operations to any speci-fied program on a library tape, it mustbe able, if necessary, to modify by spec-ification all input and output alloca-tions made in the individual programsin such a way that tapehandling maybe reduced to a minimum, it must beable to do the housekeeping operationsspecified between the operation of twolibrary programs such as printingmessages, storage clearing, failure in-dicating, etc.

A very important part of the monitorsystem is the library tape. The librarytape contains the necessary general pro-grams, class I, class II or/and class IIIprograms, covering all the necessaryprocesses and conveniently labelled forreference in monitor control cards.

When the called program is a com-piler, the monitor should be able to storethe resulting production program on thelibrary tape as an extra provisional pro-gram with specified label. A generator,on the other hand, will usually store thegenerated program directly in the in-ternal store and leave the control to itimmediately.

Another component of the system isthe monitor control specifications sup-plying monitor and general programswith the necessary information. Thesespecifications may be read by a card-reader.

A typical statistical process sequenceconsists of data input, sorting, editingand tabulation. Assume that the librarytape besides the monitor program, con-tains a general input-program, a sortgenerator, a compiler for working out aproduction program according to a spe-cial editing, and a generator for tabulat-

ing programs, \vhich are labelled fromLi to L.4, respectively.

The work will proceed in this se-quence specified by the monitor controlcards:

1. Control is transferred to the editingcompiler which itself reads in sourceprogram. The monitor is copying theresulting production program on thelibrary tape with the specified labelLs.

2. Control is transferred to the generalinput-program which after readingthe necessary parameters performsthe initial card-to-tape process on thedata to be processed.

3. Control is transferred to the sort gen-erator which reads its parameters andleaves the generated program in thestore, and then transfers control toit. The parameters read by the gen-erator direct the sorting program tofetch the input automatically fromthe output station of the previousprocess.

4. Control is transferred to the compilerproduced editing program labelled Lswhich works on the sorted output ofthe process no. 3.

5. Control is transferred again to thesort generator and the edited data arere-sorted.

6. Control is transferred to the generatorof tabulation programs which gener-ates a production program and leavescontrol to it to perform the finalprocessing.

7. Control may again be transferred tosort and tabulation generators if nec-essary for further tabulation.

Monitor programs for the IBM 7090systems already exist and IBM is alsoworking on monitors for IBM 7070. Theuse of monitor programs is said to in-

2—ezoseo

Page 6: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

170 1962: 3 STATISTISK TIDSKRIFT

crease the capacity of an 7090 systemwith 50—100 per cent, saving operatingstaff and increasing the quality of thefinal results by elimination of handlingfailures.

3.2 Input-output program

In addition to the usual editing fea-tures of card-to-tape, tape-to-card andtape-to-printer programs, a general pro-gram for statistical processing shouldalso include the possibility of labellingeach item within a record. Usually theitem represents the observation of acharacteristic which is identified by theposition of the item within the record.When the record is merged in a filewith other types of records from thesame statistical unit but collected inother relations, the resulting record maybe of varying length from one unit toanother requiring labels for each item.It may be efficient to label the itemsalready during conversion.

As the statistical masses are extensiveand the conversion formats varying, itis thought that it will be worth whileworking out a generator for input-outputprograms.

3.3 Sorting programs

Sorting is a general function in dataprocessing. The reason for sorting in thestatistical data production is due to thenecessity to merge one set of data withothers and because it is a more efficientalternative than complete internal se-lection connected to the extensivetabulations.

Existing generators for sorting suchas IBM 7070 Sort 90 seems to be power-ful, and our attention should be concen-trated at other functions.

3.4 Editing programsThe editing function is supposed to be

particular for the statistical data produc-tion. An editing program must take careof both control of collected data andcorrection of wrong items. The controlas well as the correction process are byprinciple more or less similar from onestatistics to another and may perhaps bestandardized.

There will, however, be marked dif-ferences between the editing in a popu-lation census and in a foreign trade sta-tistics. In some applications toleranceintervals must be determined by statis-tical methods and correction performedby estimation procedures, in other thecontrol criteria are prefixed and cor-rections based on Monte Carlo genera-tions from certain empirical distribu-tion.

The editing processes are thereforeboth similar in most applications andat the same time different as to the mostconvenient solution. Perhaps a compil-er with a source language directly ori-ented towards editing problems will bethe most suitable solution.

3.5 File -work programsAs indicated above, the collected data

represented as records must often bemerged in larger files and later records -or parts of records with specified labelshave to be retrieved. This file work isalso a very usual part of most businessapplications, and generators for fileprograms as for instance IBM 7070 Merge91 are worked out.

Again the special requirements of sta-tistical data production must be stressed.During the merging process it shouldfor instance be possible to give the itemsof the new data records specified labelsand during the retrieval process it must

Page 7: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

STATISTISK TIDSKRIFT 1962: 3 171

be possible to retrieve either records oronly items or both according to speci-fications referring to a complete orincomplete set of labels. The needs in-dicate that special attention should begiven to the file work requirements ofthe statistical data production whenproducing general programs for thisfunction.

3.6 Tabulation programsAggregating records in tabular forms

or reports seems also to be a functioncommon for very many applicationswithin both business and statistical dataproduction and several generators havebeen designed, e. g. the IBM 7070 ReportGenerator Program.

Usually it is desirable that a tabula-tion program is able to perform bothindividual and collective arithmetic op-erations in connection with the tabula-tion process. In our monitor scheme, itmay be most efficient to do this kindof operations separately by the analyti-cal computation programs, and con-centrating efforts in the generator fortabulation on the pure tabulation func-tions, i. e. internal selection and accu-mulation. This will greatly simplify thedesign of a generator.

On the other hand, the generator ought_ to be designed to do selection of items

based on item labels, the pair of labelsand items arbitrary scattered in the re-cords. So far, this seems to be a specialrequirement to tabulation programs fromthe statistical data producer.

3.7 Programs for analytical computationsThe last brick in the proposed monitor

system will be a compiler producingprograms for arithmetic and analyticalcomputations. The language of this com-piler must be oriented towards the sta-

tistical terminology with powerful in-struction which besides the elementaryarithmetic operations should include in-structions for direct computation of va-riance and standard deviation of itemswith specified labels, correlation andregression coefficient for specified la-bels, etc. In contrary to most other com-pilers, this one must as before men-tioned, possess the ability to produceprograms which can identify the itemsboth by their relative position withinthe record and by their labels in thecase they are packed and their relativeposition is not fixed.

The monitor controlled statistical pro-cessing outlined in this section is by nomeans planned in detail and much workhas yet to be done as to the require-ments for each program. Some prelimi-nary studies and work have been donein the Central Bureau of Statistics ofNorway on small generators for an IBM1401 without any tape-equipment, andour experience so far seems to indicatethat the monitor controlled system maybe a useful approach for making statis-tical data production both integratedand efficient.

4. General programs and programgenerators

The ordering of IBM 1401 for the Cen-tral Bureau of Statistics of Norway wasmade assuming that this system shouldreplace some of the conventional punch-card machines. We were fully awarethat we had to rely upon an extensiveuse of general programs, because we didnot have the programming capacity tomake special programs for all problems.We therefore investigated the standardprograms available for 1401, but foundthat they did not quite satisfy our re-

Page 8: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

172 1962: 3 STATISTISK TIDSKRIFT

quirements. On that account we decidedto develope general programs and pro-gram generators which were moresuited for our purposes.

We started developing the programsby stating the basic requirements, whichwere:

i) The programs should be flexible,i. e. they should cover many differ-ent problems

ii) The programs generated should beeconomic and efficient regardingmachine-time and storage positions

iii) The general programs should beeasy to use.

On further evaluation of these require-ments we strongly felt that the last twowere the most important by far. Ac-cordingly we wTere willing to sacrificein flexibility in order to gain efficiencyand ease in use, the more so as theflexibility was very restricted becauseof very limited storage capacity in our1401.

Four months after the installation ofthe 1401 system, we were inclined tosay that, owing to the general programsand program generators, this first periodhad been much easier and more success-ful than anticipated.

The rest of this section gives a shortdescription of a general program andthree frequently-used program genera-tors which are developed by the Cen-tral Bureau of Statistics. Although theseprograms are written for a 1401 systemusing card input, card and printed out-put and comparatively few storage posi-tions (4000), it is hoped that they may beof interest to other 1401 users as well.

4.1 List Program GeneratorA problem often encountered, is to list

the informations contained in the cards.

This problem of preparing a listed re-port of the information in the cards isa conversion problem, i. e. card-to-printer, and requires a programmer'seffort if not having the assistance of aprogram generator.

The List Program Generator produceslist programs with a minimum of timeand effort. The user writes specifica-tions for the problem on a form designedfor this purpose. The specifications arelater punched on control cards whichare used as parameter input to the ListProgram Generator.

The list programs which are gener-ated, wTill produce listed reports with aspeed of 533 lines per minute. The cardsto be listed can consist of up to 4 dif-ferent card types. One or twro columnsin the cards may be used in identifyingthe card types. The user must give thefollowing specifications in the controlcards: The number of card types, whichcolumns that are used for identifica-tions and what punched codes that iden-tifies the different card types. Anypunching that is valid in 1401, may beused as identifier.

The list programs can list up to 30fields from the data cards. These fieldsare specified in the control cards by thecolumn number of the first and lastcolumn. Furthermore the user must in-form the generator where each field isgoing to be printed, i. e. the rightmostprint position, and if the field shall beprinted with or without left-hand zero-elimination. Any editing of the printedresults beyond that can not take place.

Title and heading can be printed onthe two first lines of the report if theinformation regarding this are given inthe control cards. More than one cardcan be printed on the same line providedthese cards belong to different card

Page 9: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

STATISTISK TIDSKRIFT 1962: 3 173

types, i. e. only one card of each typecan be printed on one line.

This program generator does not in-clude any provision for tabulating, ex-cept optional counting of cards andprinting out of these totals on up to threedifferent control levels.

4.2 Report Program GeneratorThe Report Program Generator has

been developed to produce programsthat write and/or punch reports ofvarying format from cards. The needfor writing a tailor-made program foreach report is reduced to the require-ment of supplying only a set of speci-fications. The user writes down the in-formation relevant to the report on aspecial designed form. The specifi-cations have to be written down ac-cording to some rules and conventions,these are, however, so simple that nodetailed knowledge of the 1401 or pro-gramming are required from the users.The specifications are punched on con-trol cards that constitute the input tothe generator, which in turn generatesthe report program.

The report programs allow a maxi-mum number of 30 accumulators oneach control level. Fields from the cardsmay either be transferred to, added orsubtracted into the accumulators whichare numbered from A 1 to A 30. Theaccumulators may also be used for crossadding or — subtracting of numberswithin cards, in these cases one has touse as many accumulators as factors inthe cross summation. An accumulatormay also be used as a counter by addinga \ into it. There is no practical limitof the size of each accumulator, but thenumber of positions in all the accumu-lators on one level must not exceed 119.

Totals may be printed and/or punched

out on four levels. One may freelychoose which totals to print and/orpunch and the order of these totals. Ifone summary card is not sufficient forall totals on one level, two cards can bepunched.

The report programs provide for pro-cessing of max. 5 different types ofcards in the same run. In order to iden-tify the different card types, the follow-ing information should be given on thecontrol cards:

i) location of the identification fields,max. 3 fields with up to 5 columnseach. The identification fields mustbe located in the same place in alltypes of cards within one run,

ii) identification codes for each typeof card, one for every identificationfield that is used,

iii) one identifier for every identifica-tion code.

The contents of the identificationfields are compared writh the identifiersto see if it is equal, unequal, greater orless. The identification codes specifywhat the results of the comparisonsshould be for each card type.

Furthermore, the generator includesprovision for printing of title and head-ing, sorting of data — and summarycards, carriage control and programmedhalts.

Although this generator can not pro-duce list programs, such programs may,however, be simulated by the setting ofone of the sense switches. The effect ofthis switch setting is that every cardwill be »listed», i. e. minor totals will beprinted for every card. In the same wayone may choose whether the cardsshould be sequence controlled or not bythe setting of another sense switch.

There may occur programmed halts

Page 10: ON LIBRARY ROUTINES IN THE STATISTICAL - …nordbotten.com/articles/Library 1962.pdfon library routines in the statistical data production1 by svein nordbotten and thor aastorp, oslo

174 1962: 3 STATISTISK TIDSKRIFT

both in the processing of the controlcards and the data cards. A programmedhalt on the control cards signals an errorin these cards or out of sequence, whilea programmed halt on the data cardsindicates either undefined card type orout of sequence.

4.3 Reproducing Program GeneratorBy means of programs produced by

this generator, all reproducing, X-gang-punching and reproducing with gang-punching can be carried out. This pro-gram generator may in fact accomplishall what normally is done on a conven-tiona,! reproducer.

All informations concerning the re-producing and/or gangpunching arepunched in control cards.

4.4 Condensing ProgramWhen a program is tested and found

to be correct, this Condensing Programcan be used in order to pack the pro-gram so that the deck of program cardswill consist of less cards.

The Condensing Program works, incontrast to IBM's Condensing Routine,after the Post Mortem principle. Thismeans that the program to be condensedis first read into 1401 and stored, thenthe Condensing Program is run in andtakes care of punching out the conden-sed program cards.

When using this program one mustfollow these two rules:

i) the programs to be condensed cannot have instructions stored in Read,Punch or Print Area and

ii) the first instruction to be executedmust have the same address in allprograms, namely 190.

The condensed program will get aprogram number and a continuous cardnumber punched in the cards.

5. Summary

In each computing center there usu-ally exist two types of routine librariesof which the general library is of com-mon interest for other data processingcenters. The routines in this library are1) general production routines, 2) pro-gram producing routines of compilerand interpreter type, and 3) programgenerating routines.

In order to reduce programming ef-forts and machine running time in sta-tistical data production two approacheshave been tried, i. e. construction of astatistical oriented interpreter and astatistical oriented compiler. A third ap-proach is proposed in the paper. Thisapproach is based on the monitor prin-ciple controlling and integrating thework of suited routines designed asprogram bricks for the different opera-tions of the statistical data production.

The last part of the paper reports onseveral such general program bricks de-signed for an IBM 1401 to satisfy theparticular requirements of the statis-tical data production.

1

_

References

DOUGLAS, A. S. & MITCHELL, A. J., Autostat: alanguage for statistical data processing. TheComputer journal 3 (1960): 2, s. 61—66.

YATES, F. & SIMPSON, H. R., A general programfor the analysis of surveys. The Computerjournal 3 (1960): 3, s. 136—140.

KUNGL. BOKTR. STHLH 1962