Chapter 5 Creating SAS Data Sets from Raw Files and Excel Worksheets

Upload: arlene-hodges

Post on 02-Jan-2016


Overview

[Diagram labels: Data Entry; Raw Data in External Files; Excel and Other Types of Data; DATA Step; SAS Data Set]

Examine the Raw Data File and File Layout

Partial listing of the Sales.dat raw data file:

>----+----10---+----20---+----30-
SMITH    JAN 140097.98 440356.70
DAVIS    JAN 385873.00 178234.00
JOHNSON  JAN  98654.32 339485.00
SMITH    FEB 225983.09  12250.00
DAVIS    FEB  88456.23  55564.00

Each field represents a variable. To read the data set, define a SAS variable name for each field:

Field Name    Start Column   End Column   Data Type   SAS Variable Name
Last Name     1              7            Character   L_name
Month         9              11           Character   month
Residential   13             21           Numeric     residential
Commercial    23             31           Numeric     commercial

Steps to Create a SAS Data Set from a Raw Data File

This is accomplished in the DATA step, which requires program statements for the following tasks:

1. Provide a physical location where the new SAS data set will be stored.

2. Identify the location and name of the external file

3. Define a name for the new SAS data set

4. Provide a reference to identify the external file

5. Define and describe the variables and data values to be read

6. Conduct any additional data manipulations for processing the data

A Summary of SAS Statements for accomplishing the required tasks

Task                                   SAS statement
Reference SAS data library             LIBNAME statement
Reference external file                FILENAME statement
Name SAS data set                      DATA dataset-name;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement

The DATA Step to read external data:

LIBNAME libref '__________________';
FILENAME fileref '__________________';

DATA _______________ ;
   INFILE fileref;
   INPUT _________ ;
   . . .
RUN;

Or:

DATA _______________ ;
   INFILE '__________________';
   INPUT _________ ;
   . . .
RUN;

NOTE: If you copy a program from PowerPoint or Word into SAS, you MUST retype the quotation marks in SAS. Those applications use typographic (curly) quotation marks, which SAS does not treat as quotation marks.

Example

Our objective is to read the raw data file salesdata.dat, which is stored on the C: drive in the folder:

C:\math707\RawData\RawData_dat

and to create a SAS data set, Sales_sasdata, stored in the Sales folder on the C: drive:

C:\math707\Sales

The Sales folder must be created before you write your program.

Two SAS statements are required in your program before reading the file:

• A statement to reference the folder where the SAS data set will be stored.

• A statement to reference the external raw data file.

Reference SAS Data Library

LIBNAME saleslib 'C:\math707\Sales';

This statement defines a SAS data library saleslib referring to the folder Sales, which will be used to store the new SAS data set to be created.

Reference the External Raw Data File

FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';

NOTE: This defines sal_dat as the reference name (fileref) for the external raw data file, which is located on the hard drive at the path given above.

NOTE: The rules for an external file reference name are the same as for a library reference name:

• 1-8 characters, starting with a letter or underscore, and containing only letters, numbers, or underscores.

More on the FILENAME Statement

• It is a global statement.

• It can reference ONE external raw data file or a folder of external data files.

NOTE: LIBNAME references a folder of SAS data sets, not ONE SAS data set.

Syntax to reference ONE external data file:

FILENAME fileref 'path-to-the_external_datafile_Name';

NOTE: The fileref will be used later in the INFILE statement to tell SAS exactly which external raw data file to read.

Ex: FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';

FILENAME for Referencing a GROUP of External Data Files

Syntax to reference a GROUP of external raw data files:

FILENAME fileref 'path-to-the_external_datafile_Folder';

Ex:

FILENAME EXT_DAT 'C:\math707\RawData\RawData_dat';

An example: Read salesdata.dat

Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';
Name SAS data set                      DATA dataset-name;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement

Define the SAS data set name in the DATA step to read the external data set

DATA sas_data_setName;

A DATA step reads data files into SAS for further processing and creates a new SAS data set. Once the external raw data is read and processed, the result needs a SAS data set name. This name is defined in the DATA statement.

For the example of reading salesdata.dat, we can call the new SAS data set Sales_sasdata.

Ex: DATA SALESLIB.Sales_Sasdata;

This creates a SAS data set sales_sasdata, which is stored in the SAS library SALESLIB.

An example: Read salesdata.dat

Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';
Name SAS data set                      DATA saleslib.Sales_sasdata;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement

Identify the External Data Set to be INPUT into SAS

To read the external raw data set, SAS needs two statements:

• The INFILE statement tells SAS where to find the external raw data set.

• The INPUT statement tells SAS how to read the variables in each record correctly.

INFILE Statement

General syntax:

INFILE file-specification <options>;

The file-specification depends on how the FILENAME statement defines the fileref.

• If the fileref references exactly ONE external raw data set, then the file-specification is the fileref.

Ex:

FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';

INFILE sal_dat;

INFILE Statement (continued)

• If the fileref references a folder of external raw data sets (an aggregate group of raw data sets), then the file-specification must point to the exact data set:

INFILE fileref(data-set-name.file_extension);

Example of the INFILE statement when the fileref references an aggregate group of raw data sets

Ex:

FILENAME EXT_DAT 'C:\math707\RawData\RawData_dat';

INFILE ext_dat(salesdata.dat);

The fileref is EXT_DAT, which references the entire folder of external raw data sets. The raw data set in the folder to be read is salesdata.dat.
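As a rough sketch, the aggregate fileref can be combined with the chapter's column INPUT statement into a complete DATA step. The data set name work.sales_check is a hypothetical name chosen for illustration; the folder and file names are the ones used in this chapter and are assumed to exist.

```sas
/* Sketch: read one member of a folder (aggregate) fileref */
filename ext_dat 'C:\math707\RawData\RawData_dat';

data work.sales_check;                  /* hypothetical data set name */
   infile ext_dat(salesdata.dat);       /* fileref(member) picks one file in the folder */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
run;
```

The same fileref can then be reused for other files in the folder by changing only the member name in parentheses.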

An example: Read salesdata.dat

Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';
Name SAS data set                      DATA saleslib.Sales_sasdata;
Identify external file                 INFILE sal_dat;
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement

Describe and Read the Raw External Data: Fixed Column INPUT Using the INPUT Statement

Now we have told SAS where to get the raw data and where to store the new SAS data set.

The next step is to describe the variables and read their data values from the raw data set. SAS uses the INPUT statement to accomplish this.

SAS needs to know exactly how the variables are laid out in the data set. Different styles of INPUT statement are needed to handle different layouts.

In this chapter, we focus on variables in STANDARD and FIXED format.

Determine Variable Type: Numeric vs. Character Data Types

Based on an examination of the raw data file or the file layout, every SAS variable is one of two types:

• character

• numeric

Character Data Type

A variable is considered to be character if its data values contain any combination of the following:

• letters (A - Z, a - z)

• numbers (0-9)

• special characters (!, @, #, %, and so on).

NOTE: Character values are case-sensitive. 'Tom' is different from 'tom' or 'TOM'.

NOTE: Character data is displayed left-aligned.

Examples:

Mr. John Doe

126 Apt. A

$34,540

583

Numeric Data Type

A variable is considered to be numeric if it contains

• numbers (0-9).

It may also contain

• a decimal point (.)

• a minus sign (-)

• a letter E to indicate scientific notation.

NOTE: Numeric data is displayed right-aligned.

Examples:

25.6

543

-5.7

4.12E5 [This is 4.12 x 10^5]

Standard vs. Nonstandard Numeric Data

Standard numeric data can contain only

• Numbers

• Decimal points

• Numbers in scientific or E-notation (e.g., 4.2E3)

• Plus or minus signs

Nonstandard numeric data includes

• Values that contain special characters, such as %, $, and comma (,)

• Date and time values

• Data in fraction, integer binary, real binary, or hexadecimal form

Determine whether each of the following numeric values is standard or nonstandard:

Value         Type
345.12        Standard
$345.12       Nonstandard
3,456.12      Nonstandard
20DEC2010     Nonstandard (date)
12/20/2010    Nonstandard (date)

Fixed Format vs. Free Format

Fixed format means each variable occupies the same range of columns in every observation.

Free format means the data values are not in a fixed range of columns.

Ex: Fixed format

12345678901234567890
--------------------
HIGH    340 12.5  F
LOW    5630  7.5  F
MEDIAN  674 26.73 M

Ex: Free format

12345678901234567890
--------------------
HIGH 340 12.5 F
LOW 5630 7.5 F
MEDIAN 674 26.73 M

Column INPUT

SAS can read a variety of complicated standard and nonstandard data values. This chapter focuses on reading raw data sets with FIXED columns and in STANDARD format.

The column INPUT statement describes to SAS the columns in each record of the raw data set.

Each variable specification in the INPUT statement

• provides a name for the variable

• indicates its type (character or numeric)

• indicates its starting column and ending column.

The Column INPUT Statement

General form of the column INPUT statement:

INPUT variable $ start - end . . . ;

variable is a valid SAS variable name.

$ indicates a character variable.

start identifies the starting position.

end identifies the ending position.

The Column INPUT Statement

There are various ways to read data with the INPUT statement. The following uses column input.

For the Salesdata example:

input last_name $ 1-7
      month $ 9-11
      residential 13-21
      commercial 23-31;

An example: Read salesdata.dat

Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';
Name SAS data set                      DATA saleslib.Sales_sasdata;
Identify external file                 INFILE sal_dat;
Describe and read data                 INPUT last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement

The INPUT Statement

This way of describing the input raw data record to SAS is called column input because it defines the starting and ending positions of each field.

This implies that each field in a raw data record is in the same position in every record of the file.
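A self-contained sketch of column input follows. It uses in-stream data (the DATALINES statement, covered later in this chapter) in place of the external file so that it can run without any files on disk. The data set name work.sales_demo is hypothetical, and the five records are those from the partial listing, spaced here on the assumption that each field falls exactly in the columns given in the file layout.

```sas
/* Sketch: column input with in-stream data (assumed column positions) */
data work.sales_demo;                    /* hypothetical data set name */
   input last_name   $ 1-7               /* character, columns 1-7   */
         month       $ 9-11              /* character, columns 9-11  */
         residential 13-21               /* numeric, columns 13-21   */
         commercial  23-31;              /* numeric, columns 23-31   */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
;
run;

proc print data=work.sales_demo;
run;
```

Because the positions are fixed, a shorter value such as 98654.32 can sit anywhere inside its column range; blanks around it are ignored for numeric fields.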

Data Step to read external raw data WITH FILENAME Statement

When reading a raw data set, you must tell SAS where to find the raw data.

The approach discussed so far uses the following statements (WITH a FILENAME statement):

FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat';

DATA saleslib.sales_sasdata;
   INFILE sal_dat;
   INPUT last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
RUN;

NOTE: You can refer to the same raw data set in other DATA steps by using the fileref.

Data Step to read external raw data WITHOUT FILENAME Statement

To tell SAS where the raw data set is located, we can omit the FILENAME statement and give the path directly in the INFILE statement:

DATA saleslib.Sales_sasdata;
   INFILE 'C:\math707\RawData\RawData_dat\salesdata.dat';
   INPUT last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
RUN;

NOTE: No fileref is defined in the above statements to read salesdata.dat.

The DATA Step

General form of the complete DATA step without a FILENAME statement:

DATA SAS_data_set_name;
   INFILE 'path-to-input-raw-data-file';
   INPUT variable $ start - end . . . ;
RUN;

General form of the complete DATA step with a FILENAME statement:

FILENAME fileref 'path-to-input-raw-data-file';

DATA SAS_data_set_name;
   INFILE fileref;
   INPUT variable $ start - end . . . ;
RUN;

The order of the variables in the INPUT statement

When column input is used to read fixed-column data, the variables do not need to be listed in the order in which they appear in the raw data set.

For example:

INPUT residential 13-21 commercial 23-31 Last_name $ 1-7 Month $ 9-11;

reads residential from columns 13-21 and commercial from columns 23-31, then moves the column pointer back to column 1 to read Last_name from columns 1-7 and Month from columns 9-11.

The output SAS data set will have the variables in the order:

Residential, commercial, Last_name, Month
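One way to confirm the resulting variable order is PROC CONTENTS with the VARNUM option, which lists variables by their position in the data set rather than alphabetically. The sketch below uses two in-stream records, spaced to match the file layout, and a hypothetical data set name, work.sales_order.

```sas
/* Sketch: variables are stored in the order they appear in the INPUT statement */
data work.sales_order;                   /* hypothetical data set name */
   input residential 13-21 commercial 23-31
         last_name $ 1-7 month $ 9-11;
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
;
run;

proc contents data=work.sales_order varnum;   /* VARNUM lists variables by position */
run;
```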

Use the OBS= option in the INFILE statement

When the data set is very large, it is not a good idea to run a draft program on the entire data set. First make sure there are no syntax errors, and reduce potential data errors, before processing the entire data set.

There are two ways to do this:

(1) Use the SYSTEM OPTIONS introduced in the previous chapter:

OPTIONS FIRSTOBS=n1 OBS=n2;

(2) Use OBS= as an option in the INFILE statement:

INFILE fileref OBS=n;
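A small sketch of the INFILE options, again with in-stream data so it is self-contained. It also uses FIRSTOBS=, the INFILE counterpart of the system option above. Note that OBS= gives the number of the last record to read, not a count, so FIRSTOBS=2 OBS=4 reads records 2 through 4. The data set name work.sales_part is hypothetical.

```sas
/* Sketch: read only records 2 through 4 of the input */
data work.sales_part;                    /* hypothetical data set name */
   infile datalines firstobs=2 obs=4;    /* OBS= is the last record number, not a count */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
;
run;

proc print data=work.sales_part;         /* should list three observations */
run;
```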

Use _NULL_ as the SAS Data Set Name in the DATA Step

INFILE fileref OBS=n; keeps a draft program from processing the entire data set. Similarly, until the program is correct, there is no need to create any SAS data set (not even a temporary one). This can be done by using:

FILENAME fileref '__________________';

DATA _NULL_;   /* the data set name _NULL_ means 'do not create any SAS data set in this DATA step' */
   INFILE fileref OBS=n;
   INPUT _________ ;
RUN;
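As an illustration, a _NULL_ step can be paired with a PUT statement (described later in this chapter) so that the values SAS reads are written to the log for inspection. This is only a sketch with in-stream data; in practice the INFILE statement would name the fileref of the raw data file.

```sas
/* Sketch: check how records are parsed without creating a data set */
data _null_;
   infile datalines obs=2;                             /* test on the first two records only */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   put last_name= month= residential= commercial=;     /* write name=value pairs to the log */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
;
run;
```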

Assignment Statements in the DATA Step

An assignment statement modifies, transforms, or redefines an existing variable, or creates a new variable.

The general syntax is: Variable = expression;

/* For the Sales data example, the following assignment statement computes the total sales: */

DATA work.sales;
   INFILE '__________________';
   INPUT _________ ;
   Totalsale = residential + commercial;
   /* The following assignment statements compute the average sales per month. */
   /* They assume Month is a numeric variable. Character month values such as */
   /* JAN or FEB cannot be converted to numbers, so the results would be missing. */
   AvgSale1 = (residential+commercial)/Month;
   AvgSale2 = totalsale/Month;
   /* The AvgSale2 statement must appear after the Totalsale statement */
RUN;
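A runnable sketch of assignment statements on the sales records, with in-stream data. The variable res_share and the data set name work.sales_total are hypothetical additions for illustration; res_share shows a second assignment that depends on a variable created by an earlier one.

```sas
/* Sketch: create new variables with assignment statements */
data work.sales_total;                      /* hypothetical data set name */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   totalsale = residential + commercial;    /* new variable from two existing ones */
   res_share = residential / totalsale;     /* hypothetical: uses totalsale, so it must come after it */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
;
run;

proc print data=work.sales_total;
run;
```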

Exercise

The following data, admitfix.dat, is posted on the class website.

Variable   Type   Start   End   Description
ID         num    1       4     patient ID number
Name       char   5       20    patient name
Sex        char   22      22    sex (F or M)
Age        num    28      29    age in years
Date       num    32      36    day of admission
Height     num    42      43    height in inches
Weight     num    49      51    weight in pounds
ActLevel   char   52      55    activity level (LOW, MOD, HIGH)
Fee        num    59      63    clinic admission fee

ID     Name             Sex   Age   Date    Height   Weight   ActLevel   Fee
2458   Murray, W        M     27    18251   72       168      HIGH       85.20
2462   Almers, C        F     34    18253   66       172      HIGH       124.80
2501   Bonaventure, T   F     31    18267   61       123      LOW        149.75

Write a SAS program to perform the following tasks:

• Read the data set admitfix.dat using column input, and create the SAS data set admit_sasdata in the Work library.

• Compute BMI using the formula:

$$\mathrm{BMI} = \frac{\text{Weight (lb)} \times 703}{\left(\text{Height (in)}\right)^{2}}$$

• Use PROC CONTENTS to see the variable attributes.

• Use PROC PRINT to print admit_sasdata, and use the DATE9. format to display the date variable.

Save the program as c5_colInp in your SASEx folder.

Solution

filename adm_fix 'C:\math707\RawData\RawData_dat\admitfix.dat';

data admit_sasdata;
   infile adm_fix;
   input ID 1-4 name $ 5-20 sex $ 22 age 28-29 date 32-36
         height 42-43 weight 49-51 actlevel $ 52-55 fee 59-63;
   bmi = weight*703/(height**2);   /* weight in pounds, height in inches */
run;

proc contents; run;

proc print;
   format date date9.;
run;

Date Constants

As discussed in Chapter 4, SAS treats a date as a numeric value.

SAS defines the date 01/01/1960 as the date value 0. A later date is stored as the number of days after 01/01/1960, and an earlier date as the negative of the number of days before it.

Here are some examples:

Actual Date   SAS Stored Date Value
01/01/1960    0
01/25/1960    24
12/25/1959    -7
01/01/1961    366

SAS also provides various formats to display dates (as discussed in Chapter 4, DATE9. and MMDDYY10. are two common date display formats). In addition, SAS provides the following forms for writing a date constant:

'ddmmmyy'd , 'ddmmmyyyy'd

or the same forms with double quotation marks.
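The stored values in the table can be checked with a short _NULL_ step that writes date constants to the log. This is a minimal sketch; the variable names d0 through d3 are hypothetical.

```sas
/* Sketch: date constants are stored as days relative to 01JAN1960 */
data _null_;
   d0 = '01jan1960'd;
   d1 = '25jan1960'd;
   d2 = '25dec1959'd;
   d3 = '01jan1961'd;
   put d0= d1= d2= d3=;     /* log shows d0=0 d1=24 d2=-7 d3=366 */
   put d3= date9.;          /* the same value displayed with the DATE9. format */
run;
```

The value 366 reflects that 1960 was a leap year.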

Date Constants (continued)

Here are some examples:

Actual Date     SAS date constant
3/25/2007       '25mar2007'd
9/8/2009        '08sep2009'd
Today's date    TODAY()

NOTE: TODAY() is a SAS function that returns today's date. You can obtain today's date with an assignment statement:

Today_date = today();

The result is a numeric date value for today's date, counted from 01/01/1960.

To display the date properly, use a date format such as DATE9. or MMDDYY10. (see Chapter 4).

Time Constants and Datetime Constants in SAS

In addition to date constants, SAS also provides

a TIME constant, for a time of day on any date:

'hour:minutes't to the minute.

'hour:minutes:seconds't to the second.

Example: Duetime = '23:59't;

a DATETIME constant, for a time on a SPECIFIC date:

'ddmmmyyyy:hour:minute:second'dt

Example: DueDate = '09sep2009:23:59:59'dt;

Exercise

Write a program to practice the following:

• Find today's date using the TODAY() function.

• Define July 4th, 2011 as a date constant.

• Define the begin time and end time for Math 707 using time constants. The begin time is 17:00:00 and the end time is 19:45:00.

• Define the first second of the year 2011 using a datetime constant.

• Print the date constants using DATE9., the time constants using TIME10., and the datetime constant using DATETIME25.

NOTE: DATEw. displays a date, TIMEw. displays a time, and DATETIMEw. displays a datetime. w is the width used to display the value; it should be large enough to hold the value.

Solution

data datetime;
   today_D = today();
   d_july4 = '04jul2011'd;
   bg_time_S575 = '17:00:00't;
   en_time_s575 = '19:45:00't;   /* end time as stated in the exercise */
   dt_jan_2011 = '01jan2011:00:00:01'dt;
run;

proc print;
   format today_d d_july4 date9. bg_time_s575 en_time_s575 time10. dt_jan_2011 datetime25.;
run;

Subsetting data cases using the conditional IF statement

In Chapter 4, we used the WHERE statement in a PROC step, such as PROC PRINT, to select cases.

In this chapter, we introduce the conditional IF statement for case selection in the DATA step.

Later chapters discuss the difference between WHERE and IF in more detail.

In the DATA step, we can use the statement:

IF expression;

to select only the cases that satisfy the IF expression.

NOTE: Cases that do not satisfy the IF condition are not kept in the output SAS data set.

Conditional IF to select observations

General syntax:

IF condition;

NOTE: When the condition is true, the observation is selected; otherwise, it is not selected.

For example: IF sex = 'M';

selects only subjects whose sex is 'M'.

NOTE: If the data value is 'm', the observation is not selected, because data values are case-sensitive.

Example: Using IF to select only the months Jan, Feb, and Mar from the sales data

DATA work.sale;
   INFILE 'C:\math707\RawData\RawData_dat\Salesdata.dat';
   INPUT last_name $ 1-7 month $ 9-11
         residential 13-21 commercial 23-31;   /* sales amounts are numeric, so no $ */
   IF month IN ('JAN', 'FEB', 'MAR');
RUN;

PROC PRINT;
RUN;

Can we read data from within the SAS program?

The answer is YES.

Here are the general steps:

DATA …. ;
   INPUT ………….;
   ……….
   /* CARDS; also works in place of DATALINES; */
DATALINES;
Actual data values, entered according to the layout stated in the INPUT statement.
………
;
RUN;

Example: data are within the SAS program

The following are the scores on the quizzes, test 1, test 2, and the final for a class.

Name Q1  Q2  Q3  Q4  Q5  T1  T2  Final
----+----1----+----2----+----3----+----4
CSA  17  18  15  19  18  85  92  145
DB    .  16  14  18  16  72  76  120
QC   20  18  19  16  20  92  95  143
DC   18  15   .  15  20  82  79  125
E    20  18  15  15  18  80  82  135
F    16  16  15  15  16  72  75  116
GC   20  16  17  16  17   .  87  139
HD   18  15  15   .  19  85  79  115
IM   17  18  19  20  20  95  92  145
WB   13  16  14  15  16  72  66  110

Write a SAS program that reads the data with the data included in the SAS program.

/* Program Statements */

DATA scores;
   INPUT Name $ 1-5 Q1 6-7 Q2 10-11 Q3 14-15 Q4 18-19 Q5 22-23
         TEST1 25-27 TEST2 29-31 Final 33-36;
   /* You can also use DATALINES; in place of CARDS; */
   /* The data lines are spaced to match the columns in the INPUT statement */
CARDS;
CSA  17  18  15  19  18  85  92  145
DB    .  16  14  18  16  72  76  120
QC   20  18  19  16  20  92  95  143
DC   18  15   .  15  20  82  79  125
E    20  18  15  15  18  80  82  135
F    16  16  15  15  16  72  75  116
GC   20  16  17  16  17   .  87  139
HD   18  15  15   .  19  85  79  115
IM   17  18  19  20  20  95  92  145
WB   13  16  14  15  16  72  66  110
;
RUN;

PROC PRINT;
RUN;

Creating a Raw Data File

So far, we have seen how to READ an external raw data set (such as a .txt or .dat file) that has fixed columns for each variable.

In a similar way, we can use SAS to create a raw data set and save it to an external location.

To read (INPUT) an external raw data set, we use the INFILE and INPUT statements.

To create a raw data set and write (PUT) it to an external location, we use the FILE and PUT statements.

The FILE statement defines the location where the raw data set will be saved.

The PUT statement defines how the variables will be written.

Syntax for FILE and PUT statements

FILE 'path of the raw data set location in the storage space';

PUT var1 start_col - end_col var2 start_col - end_col ……. ;

When we create a raw data set to be saved as an external file, we do not need to create any SAS data set (not even a temporary one). Therefore, we can use a _NULL_ DATA step for this purpose.

The following is an example of creating a raw data set, admit.dat, from the SAS data set admit in the library mylib, which we created previously.

Read the SAS data set admit from the library mylib and, for AGE > 30, create the raw data set adAgegt30.dat in the raw data folder on the C: drive, writing the variables in the following order and columns:

NAME (1-20), SEX (22), AGE (24-26), Fee (28-35), Weight (37-40), Height (42-45)

LIBNAME mylib 'C:\math707\SASData';

DATA _NULL_;
   SET mylib.admit;
   IF Age > 30;
   FILE 'C:\math707\RawData\RawData_dat\adagegt30.dat';
   PUT name 1-20 sex 22 age 24-26 fee 28-35 weight 37-40 height 42-45;
RUN;

Similar to FILENAME, INFILE, and INPUT, we can use the FILENAME, FILE, and PUT statements to create a new external raw data set:

LIBNAME mylib 'C:\math707\SASData';
FILENAME age 'C:\math707\RawData\RawData_dat\adagegt30.dat';

DATA _NULL_;
   SET mylib.admit;
   IF Age > 30;
   FILE age;
   PUT name 1-20 sex 22 age 24-26 fee 28-35 weight 37-40 height 42-45;
RUN;
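The write-then-read round trip can be tried without the course files. The sketch below assumes the sample data set SASHELP.CLASS that ships with SAS, and uses the TEMP device type so the raw file is a temporary file. The fileref tmp_raw, the column positions, and the data set name work.class_back are all hypothetical choices for illustration.

```sas
/* Sketch: write a fixed-column raw file with FILE/PUT, then read it back */
filename tmp_raw temp;                   /* temporary external file; fileref is hypothetical */

data _null_;
   set sashelp.class;                    /* sample data set shipped with SAS */
   if age > 13;                          /* keep only some observations */
   file tmp_raw;
   put name 1-10 sex 12 age 14-15 height 17-21 weight 23-27;
run;

data work.class_back;                    /* hypothetical data set name */
   infile tmp_raw;
   input name $ 1-10 sex $ 12 age 14-15 height 17-21 weight 23-27;
run;

proc print data=work.class_back;
run;
```

The INPUT statement mirrors the PUT statement column for column, with $ added for the character variables.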

Describing Data Using the PUT Statement

Unlike the INPUT statement, the PUT statement does NOT need $ to distinguish character variables from numeric variables. Once SAS writes the data values to storage, no further processing is needed that requires SAS to know whether a variable is numeric or character.

NOTE: Usually, the FILE statement is given before the PUT statement. If there is no FILE statement before the PUT statement, SAS writes the data values to the SAS log. Using LOG as the fileref in the FILE statement also writes the raw data to the SAS log:

FILE LOG; PUT ………........;

NOTE: Using PRINT as the fileref in the FILE statement:

FILE PRINT; PUT ……….. ;

writes the raw data lines to the SAS Output window.
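A brief sketch of the two reserved filerefs, assuming the sample data set SASHELP.CLASS that ships with SAS. The column positions are arbitrary choices for illustration.

```sas
/* Sketch: PUT to the log and to the output destination */
data _null_;
   set sashelp.class(obs=3);     /* first three observations only */
   file log;                     /* same destination as PUT with no FILE statement */
   put name 1-10 sex 12 age 14-15;
run;

data _null_;
   set sashelp.class(obs=3);
   file print;                   /* procedure output destination */
   put name 1-10 sex 12 age 14-15;
run;
```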

Exercise

Write a SAS program to read the admit data set in your mylib library and create an external raw data set containing only the observations with Actlevel = 'HIGH'.

Write the external data to the folder c:\math707\Rawdata\Rawdata_dat, and name the raw data set adm_high.dat.

Include the following variables (start column - end column):

Name (1-20), Age (22-24), Height (26-28), Weight (30-33), Fee (35-45), and Actlevel (47-50)

Solution

data _null_;
   set mylib.admit;
   file 'c:\math707\Rawdata\Rawdata_dat\adm_high.dat';
   if Actlevel = 'HIGH';
   put name 1-20 Age 22-24 Height 26-28 Weight 30-33 Fee 35-45 Actlevel 47-50;
   *put name age height weight fee actlevel; /* Free format with space as delimiter */
run;

/* DATA _NULL_ creates no data set, so name the data set to print */
proc print data=mylib.admit;
   where Actlevel = 'HIGH';
run;

Other methods of reading external raw data

You have learned how to read raw data using column INPUT.

Column INPUT requires that the data be in standard format and in fixed columns.

Much real-world data is not prepared this way. Other INPUT techniques are discussed in later chapters.

Reading Microsoft Excel Data

SAS can also read data created by other software. The key is how the data set is referenced. For Excel data, you can read the data using

• SAS/ACCESS LIBNAME

• IMPORT Wizard

Recall: a SAS LIBNAME statement creates a SAS data library by defining a libref that references a folder of SAS data sets.

The SAS/ACCESS LIBNAME statement is similar. It defines a libref that references an Excel workbook; the workbook plays the role of the folder, and its worksheets are the data sets.

Steps for Reading Excel Data

The program must provide the following information to SAS:

• A libref pointing to the location of the Excel workbook to be read.

• The name of the new SAS data set, with a libref pointing to the location where it will be stored.

• The name of the Excel worksheet to be read.


Tasks and corresponding SAS statements to accomplish the tasks:

Task                              SAS program statement

Reference the Excel workbook      LIBNAME libref 'location-of-Excel-workbook' <options>;

New SAS data set to be created    DATA sas-data-set-name;

Read in an Excel worksheet        SET libref.Excel-worksheet-name;

Execute the DATA step             RUN;


Define SAS/ACCESS LIBNAME

The SAS/ACCESS LIBNAME statement has the syntax:

LIBNAME libref 'location-of-Excel-workbook' <options>;

Ex:

LIBNAME Exc_lib 'C:\math707\RawData\RawData_XLS\admit.xls';

Exc_lib references the Excel workbook admit.xls.

NOTE:

An Excel workbook may contain more than one worksheet. Each worksheet is read as a separate SAS data set.

SAS 9.2 can read both .XLS (Excel 2003) and .XLSX (Excel 2007) files.


How SAS Defines a Valid Excel Worksheet Name

NOTE: In SAS, every Excel worksheet name ends with a special character ($). Such a name is not a valid SAS data set name. To reference the worksheet, enclose its name in quotation marks and follow it immediately with the letter n (or N), which forms a SAS name literal.

Ex: Suppose Exercise is a worksheet in an .xlsx workbook. A valid worksheet name that SAS recognizes is

'Exercise$'n or 'exercise$'N

Suppose Exercise.xlsx is stored in the C:\Exceldata folder. The following statements read the Exercise worksheet of Exercise.xlsx into SAS:

LIBNAME TEST 'C:\exceldata\Exercise.xlsx';

DATA exer_sasdata;

SET TEST.'exercise$'n;

RUN;


Named Ranges in Excel Worksheet

A named range is a range of cells within a worksheet that you define in Excel and assign a name to.

The valid name for the named range tests_week_1, as shown in the SAS Explorer, is tests_week_1, not tests_week_1$.

NOTE: It is common practice to use PROC CONTENTS or PROC DATASETS to view the data set and variable attributes of SAS data sets created from Excel data.
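A short sketch of that check, assuming the TEST libref and the Exercise.xlsx workbook from the earlier example. Running it requires the SAS/ACCESS Interface to PC Files, as does every Excel LIBNAME example in this chapter.

```sas
/* Sketch: list what SAS sees inside an Excel workbook. */
libname test 'C:\exceldata\Exercise.xlsx';

/* _ALL_ with NODS lists the library members (worksheet names end in $,
   named ranges do not) without printing each member's variables. */
proc contents data=test._all_ nods;
run;

/* Variable names, types, lengths, and formats for one worksheet. */
proc contents data=test.'exercise$'n;
run;

/* Release the workbook when finished. */
libname test clear;
```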


LIBNAME Statement Options for Reading an Excel File

LIBNAME libref 'location-of-Excel-workbook' <options>;

Several options are useful when referencing an Excel workbook:

• DBMAX_TEXT=n: sets the maximum length, n, of long character strings read from the workbook; n can be as large as 32767. The default is n = 1024. (Use PROC CONTENTS to check the resulting lengths.)

• GETNAMES=YES|NO: determines whether SAS uses the first row of an Excel worksheet as column names. Default = YES.

NOTE: Excel worksheets often do not have variable names in the first row. In that case, specify GETNAMES=NO.

• MIXED=YES|NO: determines whether a column that contains both character and numeric values is imported with all of its values converted to character. Default = NO, which reads character columns as character and numeric columns as numeric; a value of the wrong data type is read as missing.

• SCANTEXT=YES|NO: determines whether SAS reads the entire data column and uses the length of the longest string as the SAS column width. Default = YES. If NO, the column width is 255.


• SCANTIME=YES|NO: determines whether SAS scans all row values in a date/time column and assigns a TIME format when only time values exist. Default = NO.

If YES is specified, the format is TIME8.

If NO is specified, the format is DATE9.

• USEDATE=YES|NO: determines whether the DATE9. format is used for date/time values in Excel workbooks. Default = YES.

If NO is specified, the format is DATETIME.

More options are available in the LIBNAME statement.
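The options above can be combined in a single LIBNAME statement. The following is a sketch only: it reuses the admit.xls path from the earlier example, the data set name admit_opts is hypothetical, and the option values are chosen for illustration rather than recommended.

```sas
/* Sketch: combine several LIBNAME options when reading a workbook. */
libname ex_adm 'C:\math707\RawData\RawData_XLS\admit.xls'
        getnames=no      /* first row holds data, not column names        */
        mixed=yes        /* mixed-type columns are read as character      */
        scantext=yes     /* size character columns from the longest value */
        usedate=yes;     /* date/time values get the DATE9. format        */

data work.admit_opts;    /* hypothetical data set name */
   set ex_adm.'admit$'n;
run;

/* Check the resulting types, lengths, and formats. */
proc contents data=work.admit_opts;
run;

libname ex_adm clear;
```

Running PROC CONTENTS after each change of options is a simple way to see what an option actually did to the variable types and formats.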


Creating Excel Work-sheets from SAS Data Sets

Besides reading Excel worksheets, SAS can also create Excel worksheets from SAS data sets.

This is accomplished by referencing the new Excel workbook in a LIBNAME statement and using that libref in the name of the new data set in the DATA statement:

Ex:

LIBNAME Ex_out 'c:\math707\Exercise_out.xlsx';

DATA Ex_out.High_exer;

Set work.exercise;

IF level='HIGH';

Run;

This program creates a new Excel worksheet, High_exer, from the SAS data set Exercise in the WORK library and saves it in the new Excel file Exercise_out.xlsx.


An Example – reading Excel sheet Admit.xls from an external location

/* Read Excel worksheet */

libname ex_adm 'C:\math707\RawData\RawData_XLS\admit.xls' mixed=Yes GETNAMES = NO;

data ex_admit;

set ex_adm.'admit$'n ;

proc print; run;

Partial list of the output:

The SAS System          16:19 Monday, September 20, 2010   1

Obs    F1      F2                 F3    F4    F5    F6    F7     F8      F9

  1    2458    Murray, W          M     27     1    72    168    HIGH     85.20

  2    2462    Almers, C          F     34     3    66    152    HIGH    124.80

  3    2501    Bonaventure, T     F     31    17    61    123    LOW     149.75

  4    2523    Johnson, R         F     43    31    63    137    MOD     149.75


How to Rename the Variable Names from an Excel Sheet?

If GETNAMES=NO, the default variable names are F1, F2, F3, ...

There are two ways to obtain the correct variable names:

1. Add variable names to the Excel sheet before reading it, and use GETNAMES=YES (the default).

2. Rename the variables in the SAS program, using the RENAME= data set option in the DATA statement:

For example:

Libname ex_adm 'C:\math707\RawData\RawData_XLS\admit.xls' getnames=NO MIXED=Yes;

DATA Ex_admit (RENAME = (F1=ID F2=NAME F3=Sex F4=Age F5=Date F6=Height F7=Weight F8=Actlevel F9=Admit_fee));

set ex_adm.'admit$'n ;

PROC PRINT; RUN;


Exercise: Read EXCEL data

Write a SAS program to

• Read the Excel file diabetes.xls, which has 20 cases, located at c:\math707\rawdata\rawdata_xls

• Use PROC CONTENTS to see the data set and variable attributes

• Use PROC PRINT to see the data set. Observe the data set:

(a) The variable names are F1, F2, ..., F8.

(b) The number of observations is not 20, but 19. With the default GETNAMES=YES, the first row of the worksheet is treated as column names rather than as data.


Solution

libname dia_lib 'C:\math707\RawData\RawData_XLS\diabetes.xls';

data diab;

set dia_lib.'diabetes$'n;

proc contents; run;

proc print; run;


Exercise: Revise the program that reads the Diabetes.xls data to perform the following tasks:

• Use GETNAMES=NO so that no variable names are read from the Excel sheet, and MIXED=YES so that columns containing both character and numeric values are read as character.

• Rename variable names: F1=ID, F2=SEX, F3=AGE, F4=HEIGHT, F5=WEIGHT, F6=PULSE, F7=FASTGLUC, F8=POSTGLUC


Answer

libname dia_lib 'C:\math707\RawData\RawData_XLS\diabetes.xls' getnames=NO mixed = YES;

data diab (rename=(F1=ID F2=SEX F3=AGE F4=HEIGHT F5=WEIGHT F6=PULSE F7=FASTGLUC F8=POSTGLUC));

set dia_lib.'diabetes$'n;

proc contents; run;

proc print; run;


Exercise: Create Excel file using SAS

Write a SAS program to read the diabetes data from the mylib library and perform the following tasks:

• Select only individuals with age >= 50

• Create an Excel file, diab_senior.xls, and save it to the c:\math707\rawdata folder.


Solution

libname dia_out 'C:\math707\rawdata\diab_senior1.xlsx';

data dia_out.diab_high;

set mylib.diabetes;

if age >=50;

run;

proc print; run;

NOTE: You cannot view the contents of the Excel file immediately after running the program. To open the Excel file you just created, you must first leave the current program, either by exiting SAS or by running another program.
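One way to release the workbook without exiting SAS is to clear the libref once the DATA step has finished. The sketch below reuses the dia_out libref from the solution above. LIBNAME libref CLEAR is a standard form of the LIBNAME statement, but whether Excel can open the file right away may depend on your SAS version and setup.

```sas
/* Sketch: write the worksheet, then release the workbook. */
libname dia_out 'C:\math707\rawdata\diab_senior1.xlsx';

data dia_out.diab_high;
   set mylib.diabetes;
   if age >= 50;          /* keep only individuals aged 50 or older */
run;

/* Disassociate the libref so SAS no longer holds the workbook open. */
libname dia_out clear;
```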


Use IMPORT Wizard to Read Excel Worksheets or Other Types of Raw Data

In addition to writing SAS program statements, you can also use the SAS pull-down menu:

Go to File, Import Data and follow the instructions to read a variety of data, including:

• dBase file (.dbf)

• Excel 2007 or earlier (.xlsx, .xls, .xlsb, .xlsm)

• Microsoft Access tables (.mdb, .accdb)

• Delimited files ( *.*)

• Comma-separated files (*.csv)

• Text files (.txt, .tab, .asc, .dat)


Use Export Wizard to Write a SAS Data Set as an External Excel Data Set

You can also use the SAS pull-down menu to write SAS data sets to an external file:

Go to File, Export, then follow the instructions to create external data sets from SAS data sets.


Save SAS Code Generated by the Import and Export Pull-down Menus

When you use the SAS pull-down menu, you can also save the SAS program code that runs behind the Import and Export menus.

Once you have saved these programs, you can edit them and reuse them for future needs.
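The saved programs are typically PROC IMPORT and PROC EXPORT steps. The following is a sketch of what such code may look like, not the exact text the wizard produces: the statements generated depend on your SAS version and on the choices made in the dialog. The output file name diab_export.xls and the sheet name diab are hypothetical.

```sas
/* Sketch of the kind of code the Import Wizard saves for an Excel worksheet. */
proc import out=work.diab
            datafile='C:\math707\RawData\RawData_XLS\diabetes.xls'
            dbms=excel replace;
   range='diabetes$';     /* worksheet to read                      */
   getnames=no;           /* first row holds data, not column names */
   mixed=yes;             /* mixed-type columns read as character   */
run;

/* Sketch of the kind of code the Export Wizard saves. */
proc export data=work.diab
            outfile='C:\math707\rawdata\diab_export.xls'   /* hypothetical file name */
            dbms=excel replace;
   sheet='diab';          /* hypothetical worksheet name */
run;
```

Editing a saved step, for example changing DATAFILE= or RANGE=, lets you rerun the same import on a different workbook without going through the menus again.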