bmtry 789 lecture 4: construction of flat files from vertical files, merging, subsetting

16
Construction of Flat Files from vertical files, merging, subsetting Readings – Chapter 14 In-Class Practice - 14.1, 14.4, 14.5, 14.6, 14.11

Upload: qamra

Post on 23-Jan-2016

26 views

Category:

Documents


0 download

DESCRIPTION

BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting. Readings – Chapter 14 In-Class Practice - 14.1, 14.4, 14.5, 14.6, 14.11. You have to have Goals …. Vertical & Flat File Definitions. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Readings – Chapter 14

In-Class Practice - 14.1, 14.4, 14.5, 14.6, 14.11

Page 2: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 2

You have to have Goals …

Page 3: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 3

Vertical & Flat File Definitions

A vertical file is a file that has multiple observations per identifier (i.e. patient)

A flat file is a file that has one observation per identifier.

Subj Age Date HR BP 001 26 07/23/98 58 12080 [Vertical]001 27 02/14/99 62 11060…Subj Age Date1 HR1 BP1 Date2 HR2 BP2001 26 07/23/98 58 12080 02/14/99 62 11060 [Flat]002 25 02/28/00 65 11070 09/01/01 65 12070…

Page 4: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 4

Data Manipulation Why do you spend 85% of your

programming time doing data manipulation? You get data one way, but it needs to be

another way in order to perform the analysis.

Will need to do things like, subset, merge, rename variables, create new variables, etc. in order to structure the data into the format that you need it to be.

Page 5: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 5

Conditional Logic [If-Then-Else]

Frequently, you want an assignment statement to apply to some observations but not all - under some conditions, but not others.

1) IF condition THEN action;2) IF condition THEN action;

ELSE IF condition THEN action;ELSE IF condition THEN action;

3) IF condition THEN action;ELSE IF condition THEN action;ELSE action;

4) IF condition THEN DO;Perform a set of actions;

END;

Page 6: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 6

IF-THEN-ELSE Rules A single IF-THEN statement can only have one

action. If you add the keywords DO and END, then you can execute more than one action.

You can also specify multiple conditions with the keywords AND and OR

The IN operator also makes comparisons, IN compares the value of a variable to a list of values.

*Remember SAS considers missing values to be smaller than non-missing values.

Page 7: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 7

Comparison Operators These operators can be coded using Symbols or

Mnemonics.Symbol Mnemonic Meaning= EQ Equals~= NE Not Equal> GT Greater Than< LT Less Than>= GE Greater than or Equal<= LE Less than or Equal& AND All comparisons must be true| OR Only one comparison must be true

Page 8: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 8

Subsetting Often programmers find that they want

to use some of the observations in a data set and exclude the rest. The most common way to do this is with a subsetting IF statement in a DATA step.

Syntax: IF expression; Ex: IF Sex = ‘f’;

Page 9: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 9

Computing a Difference Between first and last observationsProc Sort Data= Annie.PATIENTS;

By id Date;Run;Data First_Last;

Set Annie.PATIENTS;By Id;Format Date mmddyy10.;Retain First_HR First_SBP First_DBP;If First.id and Last.id then delete;***Purpose of code: if it is the first visit assign values to the first visit variables, same for last visit;If First.ID Then Do;

First_HR = HR;First_SBP = SBP;First_DBP = DBP;

End;If Last.ID Then Do;

Last_HR = HR;Last_SBP = SBP;Last_DBP = DBP;Diff_HR = Last_HR - First_HR;Diff_SBP = Last_SBP - First_SBP;Diff_DBP = Last_DBP - First_DBP;

Output;End;

Run;

Page 10: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 10

Subsetting (cont.) If the expression is true, then SAS continues with

the DATA step. If the expression is false, then no further statements are processed for that observation; that observation is not added to the data set being created; and SAS moves to the next observation.

While the subsetting IF statement tells SAS which observations to include, the DELETE statement tells SAS which observations to exclude:IF expression THEN DELETE;IF Sex = ‘f’ THEN DELETE;

Page 11: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 11

Stacking Data Sets Used to combine data sets with all or most of

the same variables but different observations. The order of observations is determined by the

order of the list of old data sets If one of the data sets has a variable not

contained in the other data sets, then the observations from the other data sets will have missing values for that variable.DATA new-data-set;

SET old-data-set1 old-data-set2;RUN;

Page 12: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 12

Interleaving Data Sets If your data sets are already sorted, combine

data sets while preserving that order. PROC SORT first!

If one of the data sets has a variable not contained in the other data sets, then the observations from the other data sets will have missing values for that variable.DATA new-data-set;

SET old-data-set1 old-data-set2;BY variable-list;

RUN;

Page 13: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 13

Combining Data Sets Using One-to-One Match Merge

When you have two data sets with related data and you want to combine them.

If you merge two data sets, and they have variables with the same names – besides the BY variables, then variables from the second data set will overwrite any variables having the same names in the first data set.

All observations from old data sets will be included in the new data set whether they have a match or not.

Page 14: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 14

Match Merge ExampleProc Sort Data = Rat;

BY RatID Date;Run;Proc Sort Data = Rat2;

BY RatID Date;Run;DATA BigRat;

MERGE Rat Rat2;BY RatID Date;

Run;

Page 15: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 15

Tracking and Selecting Obs with the IN=Option

This is most often used with MERGE IN=varName is temporary, existing only

during the current DATA step SAS gives the IN=varName variable a value

of ‘0’ if the data set did not contribute to the current observation and a value of ‘1’ if the data set did contribute

Page 16: BMTRY 789 Lecture 4: Construction of Flat Files from vertical files, merging, subsetting

Summer 2008 BMTRY 789 Introduction to SAS Programming 16

IN=varName ExampleProc Sort Data = Worker;

By Year;Run;Proc Sort Data = Exp;

By Year;Run;Data WorkExp;

Merge Worker (IN=InWork) Exp;By Year;

If InWork;Run;