documenting your data using the contents© … your data using the contents© procedure curtis a....

8
1 Documenting Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple, whether we have one SAS data set, or an entire data warehouse. We can use the CONTENTS procedure to generate a detailed report of our SAS data libraries, SAS data sets, and SAS data views. The CONTENTS procedure will document our data set columns attributes, data set and column descriptions, number of rows, indexes, creation date, modification date, and more! We can use the data set LABEL option and the variable LABEL statement to better document our data. The CONTENTS procedure will use these labels to improve the quality of our metadata. Documenting the data about our data, called “metadata”, is critical to providing better assurance that our data is used properly. INTRODUCTION There is an old saying in the information technology world that documentation happens last. However, we need to be diligent and build documentation into our data warehouses. Documenting our SAS data libraries, SAS data sets, and SAS data views is so very important. Without some type of documentation, our users will not know which SAS data libraries are which, or what is in them; nor will they know one SAS data set from another. They wouldn’t even know a SAS data set from a SAS data view. Without documentation, we programmers, developers, and data warehouse architects will have a difficult time, too. SAS makes documentation very simple. We can use the data set LABEL option to attach a descriptive label to SAS data files. We can also use the variable LABEL statement to attach a descriptive label to each variable in our data files. That’s right, there is a means to attach a descriptive label to both a data file and the variables within a data file. If we use them both on all the SAS data sets and SAS data views in our data warehouse and on each variable within each SAS data set and SAS data view, we will be able to produce better documentation. Do not be fooled into thinking that some variable names and SAS data sets names are self-explanatory. Surely, everyone will know what is in the variable named ‘amount', right? Well, maybe it represents a year-to-date amount, or a summarized amount, or an overtime amount. Because we can place more information in the label than we can in the data set or variable name, labels become very useful. Once we have adequately labeled our SAS data files and the variables within them, then what? How do we get meaningful documentation out of SAS? Well, SAS makes that simple. There is an entire procedure for just that purpose. PROC CONTENTS’ only function is to generate output documentation about our SAS data libraries, data sets, and data views. The procedure will create printed output documentation and can also send the output documentation to another SAS data set. Used with the Output Delivery System (ODS), we can create very spiffy looking output documentation. LABELING MADE EASY Adding descriptive labels is as easy as coding a SAS program. SAS provides the tools in the form of the data set LABEL= option and the variable LABEL statement. We will look at each separately and we see the impact they have on the quality of the data warehouse documentation we can produce. DATA SET LABEL= OPTION We can add a label to a SAS data set or data view when creating the data set or data view using the DATA step. We can also create or modify an existing label using the DATASETS © procedure. When we specified a label for an output data set or data view, the label becomes a permanent part of that file and can be printed using PROC CONTENTS. A label assigned to a file remains associated with that file when we update a data set in place, such as when we use the APPEND procedure or the MODIFY statement. However, a label will not be associated with a data set or data view that we create from a file that contains a label. For example, if we create a label on data set A and then create data set B using a DATA step with data set A in the SET statement, the label will not be added to data set B. This is not a problem, of course, because we can simply add a label to data set B as we create it. So, how do we create a label on a SAS data set or data view? Quite simply, using the data set LABEL= option. The syntax is, as we would guess: LABEL=’ ’ We can use the LABEL= option on both input and output data sets. When we use the LABEL= option on input data sets it assigns a label to the file for the duration of that DATA step or procedure. Creating a label on the input data set doesn’t do much good for documenting our data warehouse, because it will not be permanent.

Upload: trinhthu

Post on 10-Apr-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

1

Documenting Your Data Using the CONTENTS© ProcedureCurtis A. Smith, DCAA, La Mirada, CA

ABSTRACTSAS© makes documenting our data very simple, whether we have one SAS data set, or an entire data warehouse. Wecan use the CONTENTS procedure to generate a detailed report of our SAS data libraries, SAS data sets, and SAS dataviews. The CONTENTS procedure will document our data set columns attributes, data set and column descriptions,number of rows, indexes, creation date, modification date, and more! We can use the data set LABEL option and thevariable LABEL statement to better document our data. The CONTENTS procedure will use these labels to improve thequality of our metadata. Documenting the data about our data, called “metadata”, is critical to providing better assurancethat our data is used properly.

INTRODUCTIONThere is an old saying in the information technology world that documentation happens last. However, we need to bediligent and build documentation into our data warehouses. Documenting our SAS data libraries, SAS data sets, andSAS data views is so very important. Without some type of documentation, our users will not know which SAS datalibraries are which, or what is in them; nor will they know one SAS data set from another. They wouldn’t even know aSAS data set from a SAS data view. Without documentation, we programmers, developers, and data warehousearchitects will have a difficult time, too. SAS makes documentation very simple. We can use the data set LABEL optionto attach a descriptive label to SAS data files. We can also use the variable LABEL statement to attach a descriptivelabel to each variable in our data files. That’s right, there is a means to attach a descriptive label to both a data file andthe variables within a data file. If we use them both on all the SAS data sets and SAS data views in our data warehouseand on each variable within each SAS data set and SAS data view, we will be able to produce better documentation.

Do not be fooled into thinking that some variable names and SAS data sets names are self-explanatory. Surely, everyonewill know what is in the variable named ‘amount', right? Well, maybe it represents a year-to-date amount, or asummarized amount, or an overtime amount. Because we can place more information in the label than we can in thedata set or variable name, labels become very useful.

Once we have adequately labeled our SAS data files and the variables within them, then what? How do we getmeaningful documentation out of SAS? Well, SAS makes that simple. There is an entire procedure for just that purpose.PROC CONTENTS’ only function is to generate output documentation about our SAS data libraries, data sets, and dataviews. The procedure will create printed output documentation and can also send the output documentation to anotherSAS data set. Used with the Output Delivery System (ODS), we can create very spiffy looking output documentation.

LABELING MADE EASYAdding descriptive labels is as easy as coding a SAS program. SAS provides the tools in the form of the data setLABEL= option and the variable LABEL statement. We will look at each separately and we see the impact they have onthe quality of the data warehouse documentation we can produce.

DATA SET LABEL= OPTION

We can add a label to a SAS data set or data view when creating the data set or data view using the DATA step. Wecan also create or modify an existing label using the DATASETS © procedure. When we specified a label for an outputdata set or data view, the label becomes a permanent part of that file and can be printed using PROC CONTENTS. Alabel assigned to a file remains associated with that file when we update a data set in place, such as when we use theAPPEND procedure or the MODIFY statement. However, a label will not be associated with a data set or data view thatwe create from a file that contains a label. For example, if we create a label on data set A and then create data set Busing a DATA step with data set A in the SET statement, the label will not be added to data set B. This is not a problem,of course, because we can simply add a label to data set B as we create it.

So, how do we create a label on a SAS data set or data view? Quite simply, using the data set LABEL= option. Thesyntax is, as we would guess:

LABEL=’ ’

We can use the LABEL= option on both input and output data sets. When we use the LABEL= option on input data setsit assigns a label to the file for the duration of that DATA step or procedure. Creating a label on the input data set doesn’tdo much good for documenting our data warehouse, because it will not be permanent.

Page 2: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

2

We can use the data set LABEL= option with the DATA step and with PROC DATASETS. Below is an example usingthe DATA step.

Data mylib.myfile(label=’General Ledger YTD 2004');set mylib.yourfile;

run;

Here is an example using the PROC DATASETS.

Proc datasets library=mylib nolist;modify myfile(label=’General Ledger YTD 2004');

run;quit;

VARIABLE LABEL STATEMENT

We can add a label to a SAS data file variable when creating a data set or data view using the DATA step. We can alsocreate or modify an existing label using PROC DATASETS. When we specified a label for a variable in an output dataset, the label becomes a permanent part of that file and can be printed using PROC CONTENTS. A label assigned toa variable remains associated with that variable when we update a data set in place, such as when we use the APPENDprocedure or the MODIFY statement. It will also be associated with the variables of a new data file we create from thefile containing the labels. For example, if we create labels on the variables in data set A and then create data set B usinga DATA step with data set A in the SET statement, the variable labels will be added to data set B.

So, how do we create a label on variables within SAS data sets and data views? Quite simply, using the LABELstatement. The syntax is, as we would guess:

LABEL variable=<'label'>

[Tips: If a single quotation mark appears in the label, write it as two single quotation marks in the LABEL statement. Also,specifying variable= or variable=' ' removes the current label.]

We can assign a label to any number of variables in a single LABEL statement. Using a LABEL statement in a DATAstep or PROC DATASETS permanently associates labels with variables by affecting the descriptor information of theSAS data set or data view that contains the variables.

We can use the data set LABEL statement in a DATA step and with PROC DATASETS. Below is an example using theDATA step.

Data mylib.myfile;set mylib.yourfile;label bu=’Business Unit’

pool=’Overhead Pool’jv=’Journal Voucher’cdate=’Business Cycle Date’tdate=’Transaction Date’account=’Ledger Account’hours=’YTD Hours’amount=’YTD Amount’;

run;

Here is an example using PROC DATASETS.

Proc datasets library=mylib nolist;modify myfile;label bu=’Business Unit’

pool=’Overhead Pool’jv=’Journal Voucher’cdate=’Business Cycle Date’tdate=’Transaction Date’account=’Ledger Account’hours=’YTD Hours’amount=’YTD Amount’;

run;quit;

Page 3: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

3

DOCUMENTING MADE EASYSo, what good are these labels anyway? Well, we can use PROC CONTENTS to generate detailed output of our SASdata libraries, SAS data sets, and SAS data views. We can use the procedure options to get more useful informationout of our files. In addition to listing our SAS data files and their variables, PROC CONTENTS will list the number ofvariables and the number of observations in each SAS data file; it will identify any sort sequences; it will identify whetheror not the SAS data file is compressed; it will list any indexes; and it will list a wealth of host information about thephysical file. We can then manually supplement the PROC CONTENTS output with anything else our users anddevelopers need to know about the data warehouse. Once done, we can distribute the output to our users.

The PROC CONTENTS syntax is as follows:

proc contents data=libref.filename <options>

Fortunately, the PROC CONTENTS offers us many options to customize and maximize the documentation we can getabout our SAS files. Below are the available options.

CENTILES Print centiles information for indexed variablesDATA= Specify the input data setDETAILS|NODETAILS Include information in the output about the number of observations, number of variables, and

data set labelsDIRECTORY Print a list of the SAS files in the SAS data libraryFMTLEN Print the length of a variable's informat or formatMEMTYPE= Restrict processing to one or more types of SAS fileNODS Suppress the printing of individual filesNOPRINT Suppress the printing of the outputOUT= Specify the output data setOUT2= Specify an output data set that contains information about constraintsSHORT Print abbreviated outputVARNUM Print a list of the variables by their logical position in the data set

GETTING THE BASICS

Well, that’s a lot of great looking options. Let’s take a look at some PROC CONTENTS output. Below are severalversions of the PROC CONTENTS output without the use of labels and with the use of labels, and using the variousoptions. Notice where the data set label and the variable label are reflected in the output and how these make the PROCCONTENTS output more useful. Obviously, the variable labels are more meaningful to our users than are the variablenames.

Figure 1

In the example in figure 1 we have the PROC CONTENTS file header section. We notice that the area for the file labelis null. All we really know about the contents of this file is that it is a data set rather than a view (the Member Type) andwe know the data set name. Our documentation is not so impressive. However, after adding a data set label, we can getbetter documentation from PROC CONTENTS, as we see in figure 2.

Page 4: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

4

Figure 2

After adding the data set label, we see the label we chose displayed in the data set label area of the PROC CONTENTSfile header section. Now we know what is in this file. We have greatly improved the quality of our data warehousedocumentation.

In the example in figure 3 we have the PROC CONTENTS file variable section. We notice that the area for the variablelabels is missing from the output. All we really know about the contents of the variables is the type, length, informats,and formats. Our documentation is not so impressive.

Figure 3

However, after adding variable labels, we can get better documentation from PROC CONTENTS, as we see in figure4 on the next page. Now we know what is in the variables in this file. We have greatly improved the quality of our datawarehouse documentation.

Page 5: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

5

Figure 4

CUSTOMIZING THE OUTPUT

As we can see, labeling our data set and our data set variables makes the output from the PROC CONTENTS muchmore useful. But, there is more. As we saw above in the PROC CONTENTS syntax, there are several options that wecan use to better document our data warehouse. Exciting, isn’t it? We will look at examples of some of these optionsbelow. A few I will leave you to experiment with.

DETAILS

To get more information about our data libraries and data sets, we use the DETAILS option. The syntax would besomething like the following:

proc contents data=mylib.myfile details;

The additional information from PROC CONTENTS will depend on the operating system. Under the Windows operatingsystem, the option does not appear to provide any additional information. But, there is something we haven’t seen in ourprior figures - information about our indices and sorts. Refer back to figure 2 and notice that the file header section showsthat we have an index and a sort. In figures 5 and 6 below we see the details PROC CONTENTS provides about oursorted and indexed variables.

Figure 5 Figure 6

Page 6: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

6

DIRECTORY

Until now, we have asked PROC CONTENTS for information about only one SAS data set. SAS provides the ability todisplay the contents information about all the files in our data library using _ALL_ in place of the data file name. Forexample,

proc contents data=mylib._all_;

When we do so, PROC CONTENTS also provides useful information about our SAS data library. If we want this usefuldata library information while wanting data file information about only one data file (that is, when we don’t use the _all_feature), we can use the DIRECTORY option. When we use this option, PROC CONTENTS will provide the sameinformation we saw before, but will also include a directory section. The syntax looks like the following:

proc contents data=mylib.myfile directory;

The output for our data library will look like the following, in two pieces:

Figure 7

In this output we can see each of the files in our data library. SAS provides the name of the files, the member type, thenumber of observations in the file, the number of variables, the file size, the date the file was last modified, and that allimportant data file label we created with the LABEL=option. Note in the example in figure 7 the entries where wecarelessly did not create a data file label.

VARNUM

Looking back to figure 4, notice that the variables are listed alphabetically. We can see that two ways. First, the title ofthe output tells us so. Second, the variable number in the first column is not in sequential order. The alphabetic listingis the default behavior. But, what if we want the output listing In variable order? PROC CONTENTS permits us to do sousing the VARNUM option. The syntax will look something like the following:

proc contents data=mylib.myfile varnum;

The output will look like what we see in figure 8 on the next page.

Page 7: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

7

Figure 8

MEMTYPE=

Sometimes we will want to document all of the files in our data warehouse of a certain type. For example, we might wantdirectory and data set information about only the files that are views. PROC CONTENTS allows us to do so using theMEMTYPE= option. In the example in figure 7 we can see three member types, DATA, INDEX, and VIEW. We simplyset the option to a specific member type (in our example, a “VIEW”). The syntax will look something like the following:

proc contents data=mylib._all_ memtype=view;

The output will look like the following:

Figure 9

Refer back to figure 7 where we saw all of the files in our data library listed. In that example, we saw only one with themember type of “VIEW”. By specifying the MEMTYPE= option, we can obtain a listing of only the member type that wespecify.

REMAINING OPTIONS

There are a few other options that we might find useful for some needs. For example, if we want to write our PROCCONTENTS output to a SAS data set. Try the remaining options and let me know which you like best.

CONCLUSIONDocumenting our data warehouse is very important. Users and developers alike appreciate good documentation, as itmakes their job easier. Planning and designing our data warehouse will help us store more useful information into ourdata warehouse. The CONTENTS procedure is a great tool for documenting our data libraries and data files. By the way,we can also use the CONTENTS statement with PROC DATASETS to accomplish the same documentation about ourdata warehouse.

Page 8: Documenting Your Data Using the CONTENTS© … Your Data Using the CONTENTS© Procedure Curtis A. Smith, DCAA, La Mirada, CA ABSTRACT SAS© makes documenting our data very simple,

8

REFERENCESBase SAS 9 Procedures Guide

TRADEMARK CITATIONSAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATIONYour comments and questions are valued and encouraged. Contact the author at:

Curtis A. SmithP.O. Box 20044Fountain Valley, CA 92728-0044email: [email protected]