
Establishing Production and Development Environments for Base SAS Software Development

Craig Ray, Westat, Rockville, MD

Abstract:
A good development process is one of the surest ways to assure a successful development project, and a good development environment is one of the most important pieces of a good development process. This is certainly true, for example, when developing systems in C++ or in Visual Basic. It is just as important when developing SAS production systems. Without a well-designed development environment, large projects can be very difficult to complete successfully, and once in production the system can be hard to maintain. This paper describes the features of well-designed development and production environments for base SAS software projects and presents the framework for a generic development environment in SAS which would be appropriate regardless of the application.

1.0 Introduction:
While good practices are always desirable, they are not quite as vital for many routine SAS tasks, such as ad hoc or exploratory data analysis. Rather, the importance of this topic emerges in production systems, ones which run repetitively and must be supported over time with upgrades.

Characteristics of a good environment for development and production are:
1. the process of testing, tracking changes, and migrating code from development to production is well understood and followed by all project members,
2. code can be developed and modified without adversely affecting existing production runs during the development process,
3. code can be migrated to the production environment without having to be altered in any way or retested.

This paper describes a generic method for establishing development and production SAS environments. The foundation of this process is the SAS Macro language and the corresponding Autocall facility.

2.0 SAS Macro and the Autocall Facility:
When implementing any SAS production system, the primary building block is provided by the SAS Macro facility. While the SAS Macro language is a powerful run-time code generator, its primary purpose here is to allow the construction of ‘subroutines.’ In general, when developing production systems, all code should be packaged as macros. As will be demonstrated, this will facilitate 1) the smooth migration of code from the development to the production environment, 2) modifying and testing code that is already in production without adversely affecting production, and 3) code reuse.

The Autocall facility is invoked with the MAUTOSOURCE option along with the macro search path specified in the SASAUTOS option. In order for the Autocall facility to work, the file in which the macro is stored must have the same name (with .SAS as the extension) as the SAS macro it contains. When using the Autocall facility, the physical location of a macro does not have to be explicitly specified. Rather, a macro is simply invoked within SAS code. If the macro has not already been compiled within this SAS session, then SAS searches for a file of the same name as the macro in the search path, searching in the order specified in the SASAUTOS option. When a file of the same name is found, the macro is ‘included,’ compiled, and executed.
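As a minimal sketch of the mechanism just described, assume a hypothetical macro named hello stored in a file named hello.sas in the production code directory. Neither the macro nor its parameter comes from the system described in this paper; they are illustrative only.

```sas
/* ---- Contents of \bigproj\prod\code\hello.sas ------------------- */
/* Hypothetical macro: the file name (hello.sas) matches the macro   */
/* name (hello), which is what lets the Autocall facility find it.   */
%macro hello(name = world);
  %put NOTE: Hello, &name..;
%mend hello;

/* ---- Contents of a separate calling program --------------------- */
/* Turn on the Autocall facility and name the directory to search.   */
options mautosource sasautos=('\bigproj\prod\code');

/* No %INCLUDE is needed: on first use SAS searches the SASAUTOS     */
/* path for hello.sas, compiles the macro, and then executes it.     */
%hello(name = SAS)
```

Note that setting SASAUTOS= to a list of directories replaces the default search path. Sites that also rely on the SAS-supplied autocall macros typically append the SASAUTOS fileref to the list so that those macros remain available.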

3.0 The Importance of Design:
It has been said that ‘you will do system and program design.’ All systems are designed, from the smallest to the largest, whether or not there is a formal design phase. If there is not a formal design phase, then the design is done either in the minds of the coders or in random discussions during the programming phase. As has been demonstrated repeatedly, however, when there is a formal design stage, the resulting system is ultimately less expensive to build, of higher quality, and delivered sooner. Therefore, one can and should resist the urge to declare that ‘we don’t have time to do a design.’ To that, one should respond that ‘we don’t have time to not do design.’

Applications Development


It is beyond the scope of this paper to cover the vast topic of software design. However, in discussing software development it is worth stressing the importance of producing a coherent software design prior to undertaking the programming of a system. While there are numerous techniques for software design, a sample design technique is presented here to serve as an example for this paper. Illustrations 1 and 2 show the design of a fairly straightforward SAS system. Illustration 1 represents a sample preliminary design, which depicts the flow of physical files through various subsystems of the overall program. Illustration 2 represents a further refinement, where the program is depicted as a modular hierarchy. Each module then becomes a macro.

4.0 Life-Cycle Environments:
The following section outlines the structure of the production and development environments. It is important to design the production environment first, then construct a development environment to simulate production.

4.1 Production Environment:
The production environment consists of separate directories for:

SAS database - this directory contains the SAS database, including code tables in the form of SAS datasets for dynamic tables,

Sample Directory: \bigproj\prod\data

code - this directory consists entirely of Autocall macros,

Sample Directory: \bigproj\prod\code

drivers - the actual main programs which are executed; only the drivers have any knowledge of the physical environment (in this case, the production environment), and only in this directory are .LOG and .LST files maintained.

Sample Directory: \bigproj\prod\runcode

Utility macros - company- or division-wide generic macro library. Typical utility macros might include:
• %TESTPRNT - conditionally generate PROC PRINT code,
• %NOBS - return the number of observations in a SAS dataset,
• %DYNFMT - dynamically generate PROC FORMAT code from the contents of a SAS dataset.

Sample Directory: \company\prod\maclib
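The paper does not list the source of these utility macros. As an illustration only, one plausible way to write a macro like %NOBS is sketched below; the parameter name data= and the use of the OPEN, ATTRN, and CLOSE functions through %SYSFUNC are assumptions, not the author's implementation.

```sas
/* Hypothetical sketch of a %NOBS-style utility macro.               */
/* Stored as nobs.sas in the utility macro library so that the       */
/* Autocall facility can find it.                                    */
%macro nobs(data =);
  %local dsid n rc;
  %let n = .;                                 /* returned if open fails */
  %let dsid = %sysfunc(open(&data));          /* 0 means open failed    */
  %if &dsid > 0 %then %do;
    %let n  = %sysfunc(attrn(&dsid, nlobs));  /* logical observations   */
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %put WARNING: could not open &data..;
  &n                                          /* text the macro returns */
%mend nobs;

/* Usage: the macro call resolves to the observation count. */
%put NOTE: sashelp.class has %nobs(data = sashelp.class) observations.;
```

Because the macro generates only macro-language statements and a single text value, it can be called in the middle of other statements. ATTRN may return -1 for views and some engines, where the count is not known in advance.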

4.2 Development Environment:
The development environment is structurally an exact copy of the production environment:

SAS database - the SAS datasets have exactly the same structure as the production database, but generally will have a small sample of the data; the code tables are generally exact copies of the production code tables, periodically copied directly from production to development,

Sample Directory: \bigproj\test\data

code - this directory consists entirely of Autocall macros, but generally only the macros which are currently under development or maintenance; this maintains high integrity in the development environment and allows managers to more easily track the code that is being worked on.

Sample Directory: \bigproj\test\code

drivers - the actual main programs which are executed; only the drivers have any knowledge of the physical environment (in this case, the development environment), and only in this directory are .LOG and .LST files generated and maintained.

Sample Directory: \bigproj\test\runcode

Utility macros - the development environment generally refers to the production utility macro library only. Altering these macros is out of scope for any project-specific development.

Sample Directory: \company\prod\maclib
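The development database described above could be refreshed with a short maintenance program along the following lines. This is a sketch under stated assumptions: the dataset names STATECD, PRODCD, and MASTER, the 1% sampling rate, and the seed are all hypothetical and do not come from the paper.

```sas
/* Assumed librefs; production is opened read-only as a safeguard.  */
libname prod '\bigproj\prod\data' access=readonly;
libname test '\bigproj\test\data';

/* Copy the code tables unchanged from production to development.   */
/* STATECD and PRODCD are hypothetical code-table names.            */
proc copy in=prod out=test memtype=data;
  select statecd prodcd;
run;

/* Keep roughly 1% of a hypothetical production dataset MASTER.     */
/* All variables are retained, so the structure stays identical.    */
data test.master;
  set prod.master;
  if ranuni(12345) < 0.01;
run;
```

A maintenance program of this kind necessarily names both environments, so it would be kept apart from the project macros and drivers.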


5.0 Driver Programs:
Driver programs should be the only code which has knowledge of the physical environment. This allows code (i.e., macros) to be migrated to different environments with only the driver program needing to change.

In general, driver programs:

• Contain all LIBNAME, FILENAME, and OPTIONS statements. With few exceptions, these statements are not allowed in other code

• Define all global macro variables

• Contain virtually no other code

5.1 Example Production Driver Program:

OPTIONS ls=160 ps=60
        mautosource mprint
        sasautos=('\bigproj\prod\code',
                  '\company\prod\maclib');

/* Global macro variables defined in all drivers */
%let environ = prod;
%let debug = no;

/* The Database */
LIBNAME data "\bigproj\&environ\data";

/* Code Tables */
LIBNAME tabldata "\bigproj\&environ\data";

/* ASCII Input Files */
FILENAME rawdata "\bigproj\&environ\data\trans.txt";

/* tabldata is passed because %main forwards it to %dynfmt */
%main(infile = rawdata, inlib = data, tabldata = tabldata)
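The driver sets the global macro variable debug, but the paper does not show how the macros consume it. One possible use, sketched here as an assumption, is a %TESTPRNT-style utility that generates PROC PRINT code only when debug is yes. The parameter names data= and obs= are hypothetical.

```sas
/* Hypothetical sketch of a %TESTPRNT-style utility macro.          */
/* Assumes the driver has already defined the global macro          */
/* variable DEBUG (for example, %let debug = yes;).                 */
%macro testprnt(data =, obs = 10);
  %if %upcase(&debug) = YES %then %do;
    title "DEBUG: first &obs observations of &data";
    proc print data = &data (obs = &obs);
    run;
    title;                       /* clear the debugging title */
  %end;
%mend testprnt;

/* Usage inside a project macro; the dataset name is illustrative.  */
/* Under the production driver (debug = no) this generates nothing. */
%testprnt(data = work.trans)
```

With this arrangement the same macro source runs unchanged in both environments, and only the driver decides whether the diagnostic listing is produced.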

5.2 Example Development Driver Program:
By design, the development driver differs from the production driver in that 1) it refers to test datasets, and 2) the Autocall search path includes the test code library before the production code library. The latter fact is paramount. Code exists in the development code library only while it is being tested. During test runs, the code which is being modified is invoked from the development environment while code which is not being modified is invoked from the production environment.

OPTIONS ls=160 ps=60
        mautosource mprint
        sasautos=('\bigproj\test\code',
                  '\bigproj\prod\code',
                  '\company\prod\maclib');

/* Global macro variables defined in all drivers */
%let environ = test;
%let debug = yes;

/* The Database */
LIBNAME data "\bigproj\&environ\data";

/* Code Tables */
LIBNAME tabldata "\bigproj\&environ\data";

/* ASCII Input File */
FILENAME rawdata "\bigproj\&environ\data\trans.txt";

/* tabldata is passed because %main forwards it to %dynfmt */
%main(infile = rawdata, inlib = data, tabldata = tabldata)
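Because the search order decides which copy of a macro runs, it can be useful to confirm in the log where each macro was found. The sketch below assumes a SAS release that supports the MAUTOLOCDISPLAY system option; that option is not mentioned in the paper and is shown only as one possible aid.

```sas
/* Same search path as the development driver, plus MAUTOLOCDISPLAY, */
/* which writes the source file of each autocall macro to the log    */
/* when the macro is invoked.                                        */
options mautosource mprint mautolocdisplay
        sasautos=('\bigproj\test\code',
                  '\bigproj\prod\code',
                  '\company\prod\maclib');
```

Since a macro that has already been compiled in the current session is not searched for again, a changed macro is most reliably picked up by running the driver in a fresh SAS session, for example in batch.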

6.0 Sample Main Macro:
Systems of any substantial size should include a main macro which directs the overall flow of the logic. The flow of the main macro should correspond with the high-level design so that one could gain a basic understanding of the system by analyzing this code.

It is very common to combine the main macro with the driver. This temptation should be avoided. It is important to separate the physical details of the driver from the logical details of the main macro. Except for code which is being actively altered, the only differences between the production and development environments are the drivers themselves, which should ‘live’ in their respective environments and rarely change. With the driver separated from the main macro, changes to the main macro can be migrated to production without having to change any code. If the two are combined, then when migrating changes to the main macro to production, the driver portion of the code must be altered. The code below depicts the main macro for the system illustrated in Illustrations 1 and 2.

%MACRO main(infile = , inlib = , tabldata = );
  /* Utility macro from company lib */
  %dynfmt(tabldata = &tabldata)

  /* Project macro */
  %readraw(infile = &infile)

  /* Project macro */
  %process()

  /* Project macro */
  %update(inlib = &inlib)

  /* Project macro */
  %reports()
%MEND main;

7.0 Code Migration Process:
If programs were written once and run in the production environment forever, without modification, there would be no purpose for this paper. This, however, is very unrealistic. The cost of the initial development is a very small fraction of the overall life-cycle cost for any particular module. Programs must be continually modified to adapt to shifting business requirements. Code modifications can take days, weeks, even months, during which time the production system must remain operational. During this process, the development work must be kept completely separate from the production environment until fully tested. This accentuates the importance of establishing separate but structurally equivalent production and development environments. The environments must be separate so code can be changed and tested without affecting production runs; the environments must be structurally equivalent so that tests in the development environment simulate the production environment as nearly as possible.

The development and production environments are depicted in Illustration 3. This also illustrates the process whereby code is ‘checked out’ of production, modified and tested, and migrated back into the production environment.

Generally, to alter a macro:
1. COPY from production library to development code library
2. Make any changes
3. Test with the EXISTING driver in development (generally, no changes to the driver need to be made)

When testing is complete, to migrate the macro to production:
1. MOVE the macro from development to production
2. Run the production system; generally, NO changes need to be made to either the production driver or the macro.
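The COPY and MOVE steps above are ordinary file operations and can be performed with any operating system tool. For sites that prefer to script them in SAS, the following is a minimal sketch assuming SAS 9.4 or later, where the FCOPY and FDELETE functions are available. The macro name %migrate and its parameters are hypothetical and are not part of the system described in this paper.

```sas
/* Hypothetical helper: copy or move one autocall macro file.        */
/* Assumes SAS 9.4 or later (FCOPY) and the sample directories.      */
%macro migrate(member =, srcdir =, dstdir =, action = COPY);
  /* RECFM=N treats each file as a byte stream, copied as is */
  filename _src "&srcdir\&member..sas" recfm=n;
  filename _dst "&dstdir\&member..sas" recfm=n;

  data _null_;
    length msg $ 256;
    rc = fcopy('_src', '_dst');
    if rc ne 0 then do;
      msg = sysmsg();
      put 'ERROR: copy failed. ' msg;
    end;
    /* A MOVE deletes the source only after a successful copy */
    else if upcase("&action") = 'MOVE' then rc = fdelete('_src');
  run;

  filename _src clear;
  filename _dst clear;
%mend migrate;

/* Check out: COPY update.sas from production to development */
%migrate(member = update, srcdir = \bigproj\prod\code,
         dstdir = \bigproj\test\code)

/* Migrate: MOVE update.sas from development back to production */
%migrate(member = update, srcdir = \bigproj\test\code,
         dstdir = \bigproj\prod\code, action = MOVE)
```

A helper like this names physical directories, so under the conventions of this paper it would be run from a standalone maintenance program rather than stored with the project macros.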

8.0 Miscellaneous Topics:
The following are advanced topics which may be useful when developing large production systems, potentially with multiple developers.

8.1 Adding an Integration Test Environment:
It is common for more than one developer to be working simultaneously on a given release of a system. Each developer may individually test his or her own changes, but it is essential to integration test all changes as a complete unit. While an integration test could be performed in the development environment, the code would have to be ‘frozen,’ precluding other development activities during the integration test. To remedy this, a separate integration test environment may be added. This environment will look almost exactly like the development and production environments already described. The fundamental difference will be the SASAUTOS paths for macro searching:

Development Environment:
sasautos=('\bigproj\test\code',
          '\bigproj\int\code',
          '\bigproj\prod\code',
          '\company\prod\maclib');

Integration Environment:
sasautos=('\bigproj\int\code',
          '\bigproj\prod\code',
          '\company\prod\maclib');
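The paper gives only the search paths for the integration environment. By analogy with the production and development drivers, an integration driver might look like the sketch below. The value int for environ, the \bigproj\int\data directory, and the debug setting are assumptions.

```sas
OPTIONS ls=160 ps=60
        mautosource mprint
        sasautos=('\bigproj\int\code',
                  '\bigproj\prod\code',
                  '\company\prod\maclib');

/* Global macro variables defined in all drivers */
%let environ = int;   /* assumed value, by analogy with prod and test */
%let debug = no;      /* assumed setting */

/* Assumed: an int\data directory parallel to prod\data and test\data */
LIBNAME data "\bigproj\&environ\data";
LIBNAME tabldata "\bigproj\&environ\data";
FILENAME rawdata "\bigproj\&environ\data\trans.txt";

%main(infile = rawdata, inlib = data, tabldata = tabldata)
```

Under this arrangement a macro would be moved from the test code library to the integration code library once its developer has finished unit testing, and from there to production after the integration test.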


8.2 Version Control and Configuration Management:
The need for version control and configuration management is well understood by all who have worked on large systems that may involve multiple developers and multiple installations of a system. SAS, however, does not explicitly support configuration management tools. This does not preclude the use of a third-party tool. The production macro code library can be placed under version control. By doing so, the macros can be write-protected so that code can be migrated to production only by adhering to prescribed procedures. All previous versions of all macros are thereby recoverable. In addition, a configuration management system can record a ‘snapshot’ of a system at any point in time. The snapshot records the current version of every macro, which allows a release of the system to be recreated as it existed. This can be extremely useful for technical support of a system.

9.0 Conclusion:
Developing high quality production systems requires careful planning and preparation. This paper described the establishment of proper development and production environments as well as other processes and techniques to support rapid development of professional systems.

Acknowledgements:
The author wishes to thank Jim Ingraham of Westat for his contributions to the paper and Mike Rhoads of Westat for his careful review of the paper.

The author may be contacted at:
Westat
1650 Research Blvd.
Rockville, MD 20850

(301) [email protected]

[Illustration 1, Preliminary Design. Data-flow diagram. Process steps: Input Raw Data (%READRAW), Create dynamic formats (%DYNFMT), Compute New Values (%PROCESS), Update Master DB (%UPDATE), Generate Reports (%REPORTS). Files: Trans Mstr (ASCII), Trans Detail (ASCII), Master DB (SAS), SAS Formats, Transaction Reports.]


[Illustration 2, Module Hierarchy. The Driver calls %main; %main calls %dynfmt, %readraw, %process, %update, and %reports; the lowest level shows %makemft, %readmstr, and %readdet.]

[Illustration 3, Code Migration Process. Two side-by-side environments, TEST and PRODUCTION.
TEST: driver BIG_DRVR.SAS (LIBNAMES, FILENAMES, MAUTOSOURCE, SASAUTOS = (TEST,PROD)) invokes %MACRO MAIN, which calls %MAC1 ... %MACn. Directories under BIGPROJ/TEST: /CODE: macros only; /RUNCODE: drivers only; /DATA: test data bases; /TEMP: test junk, etc. The code library holds only macros actively being tested; i.e., test code overrides prod code.
PRODUCTION: driver BIG_DRVR.DRV.SAS (LIBNAMES, FILENAMES, MAUTOSOURCE, SASAUTOS = (PROD)) invokes %MACRO MAIN, which calls %MAC1 ... %MACn. Directories under BIGPROJ/PROD: /CODE: macros only; /RUNCODE: drivers only; /DATA: prod data bases; /TEMP: test junk, etc.
Arrows: Copy (production to test) and Move (test to production). Fully tested macros are moved to prod.
* MACRO NAME AND FILE NAME MUST BE THE SAME.]
