preparing your data for analysis using sas landon sego 24 april 2003 department of statistics...

75
Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Upload: jade-george

Post on 04-Jan-2016

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Preparing your data for analysis using SAS

Landon Sego24 April 2003

Department of StatisticsUW-Madison

Page 2: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Assumptions

• That you have used SAS at least a few times.

• It doesn’t matter whether you run SAS in interactive mode (Windows) or in batch mode (Unix/Linux).

Page 3: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Interactive SAS for Windows

Page 4: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Editing SAS code in EMACS for Batch mode execution

Page 5: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Executing SAS in batch mode at the Linux prompt

Page 6: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Where we’re going…

• Rarely does data come to you in a form that is analyzable.

• As a best case, all you need to do is clean your data and check it for consistency.

• As a worst case, extensive manipulation of the data is needed in order to analyze.

• We want to get familiar with some tools in SAS used to check, clean, and manipulate data.

Page 7: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

The dark toolbox

• SAS is like a toolbox as big as a garage—with thousands of tools.

• For many SAS users, this toolbox is dark and there is a monster lurking inside.

• Let’s turn on the light and meet a few of the tools available in SAS.

• No guarantees about the monster…

Page 8: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Published resources

• Lots available

• I’ve learned on my own and with the SAS documentation

• Google searches

Page 9: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

SAS Online Documentation

http://shelf.doit.wisc.edu/SASOnlineDocs/onldoc.htm

or

http://shelf.doit.wisc.edu/SASOnlineDocs/main.htm

Page 10: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

What I use most often in the SAS Online Documentation

• Base SAS Software– SAS Language Reference: Concepts– SAS Language Reference: Dictionary– SAS Macro Language Reference– SAS Procedures Guide

• SAS/STAT– SAS/STAT User’s Guide

Page 11: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

SAS Language Reference: Dictionary

Page 12: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

SAS Procedures Guide

Page 13: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

SAS/STAT User’s Guide

Page 14: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Conventions, terminology, and

options

Page 15: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Conventions

• SAS terminology in red

• SAS code in blue

Page 16: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Basic terminology• Data in SAS exists as a data set, with variables

and observations.• variables are the columns.• observations are the rows.

• Two types of variables: Character and numeric.

• Character variables can range in length from 1 to 32,767 = 215 characters.

• Numeric variables can be virtually any size (within the limitations of the computer)

Page 17: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

My favorite options

On the first line of almost every SAS program I write, I include the following:

options nodate nocenter nonumber ps=3000 ls=200 mprint mlogic symbolgen;

These options control the format of the output and make macro code easier to debug

Page 18: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Importing data into SAS

Page 19: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Importing data into SAS

• Data can exist in many forms (text file, Excel spreadsheet, permanent SAS data set, etc.)

• Excel spreadsheets are probably the most common form.

• Can use DDE (dynamic data exchange) (Windows version of SAS only)

• But for Excel files, I like to use CSV file format (comma separated value). Works on any platform.

Page 20: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Excel CSV text file SAS data set

Column of ‘j’s provide a buffer at the end of each line of text in the CSV file. If you are running SAS on a Linux or UNIX machine, you need to add the j’s (or use the dos2unix command to convert the CSV file to the text formatting used by UNIX).

Page 21: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Save Excel spreadsheet in the CSV format

Page 22: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

How the CSV file looks (when viewed with a text editor)

Location,Type,Length,j

Albuquerque,1,1.414,j

Albuquerque,1,2.000,j

Albuquerque,1,1.414,j

Albuquerque,1,2.236,j

Albuquerque,2,2.000,j

Albuquerque,2,2.236,j

Albuquerque,2,2.236,j

Albuquerque,2,2.236,j

Lexington,1,2.000,j

Page 23: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

SAS code to import the CSV filedata fake; infile ‘c:\mydata\fake.csv’ dsd firstobs=2; input location :$14. type length;

proc print;

run;

Note, if I were to use:

input location $ type length;

it would truncate the “location” variable to 8 characters in length.

Page 24: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results from the proc printObs location type length 1 Albuquerque 1 1.414 2 Albuquerque 1 2.000 3 Albuquerque 1 2.236 4 Albuquerque 2 2.000 5 Albuquerque 2 2.236 6 Albuquerque 2 2.236 7 Lexington 1 2.000 8 Lexington 1 2.000 9 Lexington 1 1.414 10 Lexington 1 2.000 11 Lexington 2 1.732 12 Lexington 2 1.732 13 Lexington 2 2.236 14 Johannesburg 1 2.000 15 Johannesburg 1 2.236 16 Johannesburg 1 2.236 17 Johannesburg 2 2.000 18 Johannesburg 2 1.414 19 Johannesburg 2 2.000

Page 25: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Checking and summarizing data

Page 26: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Checking and summarizing data

proc contents data=fake;

proc freq data=fake;

run;

This is IMPORTANT!

Don’t take for granted that there aren’t mistakes in your data.

Page 27: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Result from proc contents

Data Set Name: WORK.FAKE Observations: 19Member Type: DATA Variables: 3 Engine: V8 Indexes: 0 Created: 11:46 Wednesday, April 16, 2003 Observation Length: 32Last Modified: 11:46 Wednesday, April 16, 2003 Deleted Observations: 0 ..File Name: /tmp/sastmp_sego/SAS_workB6B700000836_

gstat201.stat.wisc.edu/fake.sas7bdat ..-----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos-----------------------------------3 length Num 8 81 location Char 14 162 type Num 8 0

Page 28: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results from proc freqThe FREQ Procedure

Cumulative Cumulativelocation Frequency Percent Frequency Percent-----------------------------------------------------------------Albuquerque 6 31.58 6 31.58 Johannesburg 6 31.58 12 63.16 Lexington 7 36.84 19 100.00

Cumulative Cumulativetype Frequency Percent Frequency Percent--------------------------------------------------------- 1 10 52.63 10 52.63 2 9 47.37 19 100.00

Cumulative Cumulativelength Frequency Percent Frequency Percent----------------------------------------------------------- 1.414 3 15.79 3 15.79 1.732 2 10.53 5 26.32 2 8 42.11 13 68.42 2.236 6 31.58 19 100.00

Page 29: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting subsets of the data

Page 30: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting observations (rows)

• A large contiguous group of observations

• Specific observation numbers

• Using selection criteria

e.g. when the location is “Lexington”

or when the length is between 1 and 2.

Page 31: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting a group of contiguous observations

data smallfake; set fake (firstobs=10 obs=15);proc print;

Obs location type length

1 Lexington 1 2.0002 Lexington 2 1.7323 Lexington 2 1.7324 Lexington 2 2.2365 Johannesburg 1 2.0006 Johannesburg 1 2.236

Selects observations 10 through 15

Data set options

Page 32: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting specific observation numbers

data smallfake; set fake; if _n_ in (7,11,16);proc print;

Obs location type length

1 Lexington 1 2.000

2 Lexington 2 1.732

3 Johannesburg 1 2.236

Selects observation numbers 7, 11, and 16.

Page 33: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selection criteria: where statement

data smallfake; set fake; where location = ‘Lexington’;or where location ne ‘Lexington’;or where location in (‘Lexington’ , ’Albuquerque’);or where (1 le length le 2);or where (length > 2.3);

Page 34: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selection criteria: if statement

data smallfake; set fake; if location in (‘Lexington’ , ’Albuquerque’);or if location = ‘Lexington’ | location = ‘Albuquerque’;or if location = ‘Johannesburg’ then delete;

These three if statements produce identical results.

Page 35: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Some comparison operators• ne or ^= not equals to• eq or = equals to• ge or >= greater than or equal to• gt or > greater than• le or <= less than or equal to• lt or < less than• in if contained in a group• not in if not contained in a group• and or & and logical operator• or or | or logical operator

Page 36: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting and managing variables

Page 37: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting variables using “keep”data smallfake (keep = location length);

set fake;

where type = 1;

(type is available for processing, but not written to smallfake data set)

data smallfake;

set fake (keep = location length);

(type is not available for processing)

Data set options

Page 38: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Selecting variables using “drop”data smallfake (drop = type);

set fake;

where type = 1;

(type is available for processing, but not written to smallfake data set)

data smallfake;

set fake (drop = type);

(type is not available for processing)

Data set options

Page 39: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Renaming variablesdata fake1; set fake (rename=(location=place type=trt)); where trt = 1;

data fake2 (rename=(location=place type=trt)); set fake; where type =1;

data fake3 (drop = location type); set fake; where type = 1; place = location; trt = type;

These three pieces of code achieve the same result. Look closely at the where statements.

location place

type trt

Page 40: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Concatenation

Page 41: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Concatenation (stacking)

• SAS can stack multiple data sets on top of one another.

• Pay attention whether or not the variables and their attributes (length and variable type) match among the different data sets.

• Can use the set statement or proc append to concatenate data sets.

Page 42: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Using the set statement to concatenate data sets

Suppose you wanted to stack the three data sets fake, faux, and fraud on top of one other:

data fantastic; set fake faux fraud;

Page 43: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Using proc append to concatenate data sets

proc append concatenates only two data sets at a time—and typically these data sets must have the same variable names with the same attributes.

proc append base=fake data=faux;

Here the observations in faux are tacked onto the end of the fake data set. The combined data set is called fake.

Page 44: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Splitting data into several data sets

Supposed we want all Albuquerque observations to go into a data set called albuq, the Lexington observations to go into the data set lexing, and observations that have lengths larger than 3.0 into the data set large.

data albuq lexing large; set fake; if location = ‘Albuquerque’ then output albuq; else if location = ‘Lexington’ then output lexing; if length gt 3 then output large;

Page 45: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Merging data

Page 46: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Merging (combining) data• Merging data sets places two or more data sets

side by side into a single data set.• If you simply want place two data sets side by

side (1 to 1 merging):

data faux; set faux (rename=(location=location1 type=type1));

data fantastic; merge fake faux;

proc print data = fantastic;

Page 47: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results of 1 to 1 mergeObs location type length location1 type1 weight

1 Albuquerque 1 1.414 Lexington 2 4.456 2 Albuquerque 1 2.000 Lexington 2 2.546 3 Albuquerque 1 2.236 Lexington 2 3.789 4 Albuquerque 2 2.000 Lexington 1 2.457 5 Albuquerque 2 2.236 Lexington 1 4.456 6 Albuquerque 2 2.236 Lexington 1 4.551 7 Lexington 1 2.000 Lexington 1 3.246 8 Lexington 1 2.000 Johannesburg 1 2.998 9 Lexington 1 1.414 Johannesburg 1 2.721 10 Lexington 1 2.000 Johannesburg 1 2.115 11 Lexington 2 1.732 Johannesburg 2 3.489 12 Lexington 2 1.732 Johannesburg 2 3.447 13 Lexington 2 2.236 Johannesburg 2 4.122 14 Johannesburg 1 2.000 Albuquerque 1 2.118 15 Johannesburg 1 2.236 Albuquerque 1 2.871 16 Johannesburg 1 2.236 Albuquerque 1 2.516 17 Johannesburg 2 2.000 Albuquerque 2 3.445 18 Johannesburg 2 1.414 Albuquerque 2 2.998 19 Johannesburg 2 2.000 Albuquerque 2 3.549

Page 48: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Merging data with a “by” variableAll data sets that will be merged must first be sorted by the linking variables:

proc sort data = fake; by location type;

proc sort data = faux; by location type;

data fantastic; merge fake faux; by location type;

proc print data = fantastic;

Page 49: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results of merging with “by” variables

Obs location type length weight 1 Albuquerque 1 1.414 2.118 2 Albuquerque 1 2.000 2.871 3 Albuquerque 1 2.236 2.516 4 Albuquerque 2 2.000 3.445 5 Albuquerque 2 2.236 2.998 6 Albuquerque 2 2.236 3.549 7 Johannesburg 1 2.000 2.998 8 Johannesburg 1 2.236 2.721 9 Johannesburg 1 2.236 2.115 10 Johannesburg 2 2.000 3.489 11 Johannesburg 2 1.414 3.447 12 Johannesburg 2 2.000 4.122 13 Lexington 1 2.000 2.457 14 Lexington 1 2.000 4.456 15 Lexington 1 1.414 4.551 16 Lexington 1 2.000 3.246 17 Lexington 2 1.732 4.456 18 Lexington 2 1.732 2.546 19 Lexington 2 2.236 3.789

Page 50: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

More about mergingWhen you merge with a by statement, you may only want observations that have by-variable “matches” in both data sets.

fake fraud

Page 51: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Assume both fake and fraud are sorted by location and type.

data fantastic; merge fake (in = tmp1) fraud (in = tmp2); by location type; from_fake = tmp1; from_fraud = tmp2;

proc print;

Using (in =) data set option

Page 52: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

from_ from_Obs location type length thickness fake fraud

1 Albuquerque 1 1.414 1.120 1 1 2 Albuquerque 1 2.000 1.120 1 1 3 Albuquerque 1 2.236 1.120 1 1 4 Albuquerque 2 2.000 . 1 0 5 Albuquerque 2 2.236 . 1 0 6 Albuquerque 2 2.236 . 1 0 7 Albuquerque 3 . 1.400 0 1 8 Albuquerque 3 . 1.540 0 1 9 Albuquerque 3 . 1.968 0 1 10 Johannesburg 1 2.000 1.320 1 1 11 Johannesburg 1 2.236 1.945 1 1 12 Johannesburg 1 2.236 1.945 1 1 13 Johannesburg 2 2.000 1.440 1 1 14 Johannesburg 2 1.414 1.440 1 1 15 Johannesburg 2 2.000 1.440 1 1 16 Johannesburg 3 . 1.521 0 1 17 Johannesburg 3 . 1.500 0 1 18 Lexington 1 2.000 . 1 0 19 Lexington 1 2.000 . 1 0 20 Lexington 1 1.414 . 1 0 21 Lexington 1 2.000 . 1 0 22 Lexington 2 1.732 1.160 1 1 23 Lexington 2 1.732 0.783 1 1 24 Lexington 2 2.236 0.783 1 1 25 Lexington 4 . 1.372 0 1 26 Lexington 4 . 1.980 0 1

Identifying obs from both data sets

Page 53: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Now select observations that are common to both data sets:

data fantastic; merge fake (in = tmp1) fraud (in = tmp2); by location type; if tmp1=1 and tmp2=1;

proc print;

Using (in =) data set option

Page 54: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Obs location type length thickness

1 Albuquerque 1 1.414 1.120 2 Albuquerque 1 2.000 1.120 3 Albuquerque 1 2.236 1.120 4 Johannesburg 1 2.000 1.320 5 Johannesburg 1 2.236 1.945 6 Johannesburg 1 2.236 1.945 7 Johannesburg 2 2.000 1.440 8 Johannesburg 2 1.414 1.440 9 Johannesburg 2 2.000 1.440 10 Lexington 2 1.732 1.160 11 Lexington 2 1.732 0.783 12 Lexington 2 2.236 0.783

After selecting for observations in common:

Page 55: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Merge maniadata fantastic; merge fake fraud; by location type;

data fantastic; merge fake (in=tmp1) fraud; by location type; if tmp1 = 1;

data fantastic; merge fake (in=tmp1) fraud (in=tmp2); by location type; if tmp1 =1 and tmp2 = 1;

Page 56: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Creating new variables

Page 57: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Creating new variables

data fantastic; merge fake faux; by location type;

newcode = substr(location,1,1) || '-' || trim(left(type));

growth_index = length + weight**2;

if (growth_index gt 15) then large = '*'; else large = ' ';

Page 58: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results of new variables growth_Obs location type length weight newcode index large

1 Albuquerque 1 1.414 2.118 A-1 5.8999 2 Albuquerque 1 2.000 2.871 A-1 10.2426 3 Albuquerque 1 2.236 2.516 A-1 8.5663 4 Albuquerque 2 2.000 3.445 A-2 13.8680 5 Albuquerque 2 2.236 2.998 A-2 11.2240 6 Albuquerque 2 2.236 3.549 A-2 14.8314 7 Johannesburg 1 2.000 2.998 J-1 10.9880 8 Johannesburg 1 2.236 2.721 J-1 9.6398 9 Johannesburg 1 2.236 2.115 J-1 6.7092 10 Johannesburg 2 2.000 3.489 J-2 14.1731 11 Johannesburg 2 1.414 3.447 J-2 13.2958 12 Johannesburg 2 2.000 4.122 J-2 18.9909 * 13 Lexington 1 2.000 2.457 L-1 8.0368 14 Lexington 1 2.000 4.456 L-1 21.8559 * 15 Lexington 1 1.414 4.551 L-1 22.1256 * 16 Lexington 1 2.000 3.246 L-1 12.5365 17 Lexington 2 1.732 4.456 L-2 21.5879 * 18 Lexington 2 1.732 2.546 L-2 8.2141

19 Lexington 2 2.236 3.789 L-2 16.5925 *

Page 59: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Common functions used to manipulate text strings

compress

index

left

scan

substr

trim

Refer to SAS Online Docs:

Base SAS Software

SAS Language Reference: Dictionary

Dictionary of Language Elements

Functions and Call Routines

Page 60: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

by-group processing

Page 61: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

by-group processing

• Suppose you wanted a subset of the data that contained the observation with the smallest length from each location.

proc sort data = fake; by location length;

data shortest; set fake; by location length; first = first.location; last = last.location;

Page 62: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

output from by-group processingObs location type length first last

1 Albuquerque 1 1.414 1 0 2 Albuquerque 1 2.000 0 0 3 Albuquerque 2 2.000 0 0 4 Albuquerque 1 2.236 0 0 5 Albuquerque 2 2.236 0 0 6 Albuquerque 2 2.236 0 1 7 Johannesburg 2 1.414 1 0 8 Johannesburg 1 2.000 0 0 9 Johannesburg 2 2.000 0 0 10 Johannesburg 2 2.000 0 0 11 Johannesburg 1 2.236 0 0 12 Johannesburg 1 2.236 0 1 13 Lexington 1 1.414 1 0 14 Lexington 2 1.732 0 0 15 Lexington 2 1.732 0 0 16 Lexington 1 2.000 0 0 17 Lexington 1 2.000 0 0 18 Lexington 1 2.000 0 0 19 Lexington 2 2.236 0 1

Page 63: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

by-group processingproc sort data = fake; by location length;

data shortest; set fake; by location length; if first.location = 1;

Obs location type length

1 Albuquerque 1 1.414 2 Johannesburg 2 1.414 3 Lexington 1 1.414

Page 64: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Basic macros

Page 65: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Basic macros

• SAS macros allow you to easily program repetitive tasks.

• On the surface, creating a SAS macro is very similar to creating a function in R or S-Plus.

• SAS Macro is actually a text generation tool.

Page 66: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

%macro analyze(dataset=,response=);

proc mixed data = &dataset; class location type; model &response = location type; lsmeans location type; ods output lsmeans=model_means;

data model_means; set model_means; variable = "&response";

proc append base=results data=model_means;

%mend analyze;

%analyze(dataset=fake,response=length)%analyze(dataset=faux,response=weight)

proc print data = results;

Name macro, begin macro definition, identify macro variables

Code to be generated by macro

Note the use of proc append

End macro definition

Call macro

Print results

Page 67: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results from macro code

Obs Effect location type Estimate StdErr DF tValue Probt variable

1 location Albuquerque _ 2.0203 0.1243 15 16.25 <.0001 length 2 location Johannesburg _ 1.9810 0.1243 15 15.94 <.0001 length 3 location Lexington _ 1.8728 0.1155 15 16.21 <.0001 length 4 type 1 1.9621 0.09678 15 20.27 <.0001 length 5 type 2 1.9540 0.1015 15 19.25 <.0001 length 6 location Albuquerque _ 2.9162 0.2844 15 10.25 <.0001 weight 7 location Johannesburg _ 3.1487 0.2844 15 11.07 <.0001 weight 8 location Lexington _ 3.6842 0.2643 15 13.94 <.0001 weight 9 type 1 2.9614 0.2214 15 13.37 <.0001 weight 10 type 2 3.5379 0.2322 15 15.23 <.0001 weight

Page 68: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

proc transpose

Page 69: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Rearranging data with proc transpose

Obs year cultivar Effect trt Estimate StdErr DF tValue Probt

1 95 1 Intercept _ 1.5930 0.4477 38 3.56 0.0010 2 95 1 trt 1 0.2987 0.7485 35 0.40 0.6923 3 95 1 calevel _ -0.00155 0.002401 35 -0.65 0.5229 4 95 1 calevel*trt 1 -0.00010 0.003992 35 -0.02 0.9804 5 95 2 Intercept _ 0.5331 0.2782 38 1.92 0.0628 6 95 2 trt 1 0.3637 0.3501 36 1.04 0.3058 7 95 2 calevel _ 0.002348 0.001570 36 1.49 0.1437 8 95 2 calevel*trt 1 -0.00170 0.002173 36 -0.78 0.4388 13 96 1 Intercept _ 1.4129 0.3650 10 3.87 0.0031 14 96 1 trt 1 0.4670 0.4387 160 1.06 0.2887 15 96 1 calevel _ 0.003074 0.001146 160 2.68 0.0081 16 96 1 calevel*trt 1 -0.00127 0.001387 160 -0.92 0.3598

Consider this output from proc mixed:

Page 70: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Results from proc transpose

proc transpose data=solution out=tsolution; by year cultivar; var estimate; id effect;

calevel_

Obs year cultivar _NAME_ Intercept trt calevel trt

1 95 1 Estimate 1.5930 0.2987 -0.00155 -0.00010 2 95 2 Estimate 0.5331 0.3637 0.002348 -0.00170 3 95 3 Estimate 1.3996 0.08453 -0.00121 0.000326 4 96 1 Estimate 1.4129 0.4670 0.003074 -0.00127 5 96 2 Estimate 1.9705 0.1312 0.003656 -0.00073 6 96 3 Estimate 3.2758 -0.1568 -0.00131 0.001427 7 96 4 Estimate 2.6396 0.7034 0.001114 -0.00237 8 97 1 Estimate 1.3943 0.2684 -0.00026 -0.00120 9 97 2 Estimate 1.5388 -0.2840 -0.00183 0.002834

Page 71: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Parting words of advice

Page 72: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Attributes of SAS• SAS is read/write intensive.• Every time you create a data set, the data

set is written to the disk.• Where does it get written? To the SAS

Work Library, which is assigned to a directory somewhere…..use proc contents to find out.

• For CALS HP users and PC users, the SAS Work Library resides on the actual machine.

Page 73: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Attributes of SAS

• Users of the AFS system beware! (Stat department, CS department)

• The SAS Work Library is assigned to your account in AFS—not to the local machine that is running SAS.

Your local computer

running SAS

AFS recording and reading

your SAS data sets

Network

traffic

Page 74: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Assigning the SAS Work Library

To assign the SAS work library to a local directory (when running in batch mode on a Linux or Unix system):

$ sas mysascode.sas -work /scratch

Page 75: Preparing your data for analysis using SAS Landon Sego 24 April 2003 Department of Statistics UW-Madison

Synthesis

• Most of what we’ve covered today involves the data step.

• Many of the techniques shown in this presentation can be applied together in a single data step.

• Now that you know the names of some of the tools, use the Online Documentation!