Get the Scoop on the Loop: How Best to Write a Loop in the DATA Step

Arthur Li, Department of Information Science, City of Hope Comprehensive Cancer Center, Duarte, CA

Uploaded by arthur8898 on 12-May-2015


DESCRIPTION

During execution, the DATA step works like a loop, repeatedly reading data and creating observations one at a time. We call this type of loop the implicit loop. Sometimes we need to execute certain SAS® statements repeatedly. In this situation, we need to construct an explicit loop by using the DO, DO WHILE, or DO UNTIL statement. Explicit loops have a wide range of applications, such as generating random samples and reading multiple external data files. However, in some scenarios, creating an explicit loop can be very tricky, even for seasoned programmers. Constructing a successful loop depends on grasping SAS® programming fundamentals, such as understanding that a SAS data set is created one observation at a time in the program data vector (PDV). In this paper, you will learn how to create loops for various applications and what happens in the PDV when an explicit loop executes.

TRANSCRIPT

Page 1: Get the scoop on the loop   how best to write a loop in the data step

Get the Scoop on the Loop: How Best to Write a Loop in the DATA Step

Arthur Li, Department of Information Science

City of Hope Comprehensive Cancer Center, Duarte, CA

Page 2

INTRODUCTION

Loops execute one statement or a group of statements repeatedly until a predefined condition is reached.

SAS has two kinds of loops: implicit and explicit.

Programmers sometimes cannot clearly distinguish between the two.

Knowing when a situation calls for an explicit loop is one of a programmer's challenges.

Page 3

COMPILATION AND EXECUTION PHASES

A DATA step is processed in two phases:

Compilation phase: each statement is scanned for syntax errors, and the PDV is created from the descriptor portion of the input dataset and the variables created in the DATA step.

Execution phase (only if there is no syntax error): SAS uses the PDV to build the new dataset.

Page 4

IMPLICIT LOOP

During the execution phase, the DATA step works like a loop: an implicit loop.

It repeatedly executes statements, reads data values, and creates observations in the PDV, one at a time.

Each pass through the loop is called an iteration.

Suppose you have the following dataset, PATIENT, that contains patient IDs for a clinical trial:

Patient:
Obs | ID
1 | M2390
2 | F2390
3 | F2340
4 | M1240

You would like to assign each patient to either a drug or a placebo group, with a 50% chance of each.
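The slides do not show how PATIENT is built. Here is a minimal sketch that assumes the four IDs are typed in with DATALINES. The $5 length is an assumption based on the IDs shown.

```sas
/* Build the PATIENT dataset used by the implicit-loop examples */
data patient;
    length id $5;      /* assumed length: every ID shown has 5 characters */
    input id $;        /* one character ID per data line */
    datalines;
M2390
F2390
F2340
M1240
;
run;
```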

Page 5

IMPLICIT LOOP

The RANUNI function

RANUNI(SEED)

RANUNI generates a number from the Uniform(0, 1) distribution, e.g. 0.13567, 0.34567, 0.56789, etc.

SEED is a nonnegative integer. The RANUNI function generates a stream of numbers based on SEED.

When SEED is 0, the stream is initialized from the computer clock, so the generated numbers cannot be reproduced. When SEED is a positive integer, the same stream of numbers can be reproduced.

Page 6

IMPLICIT LOOP

data trial1 (drop=rannum);
    set patient;
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
run;


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)

COMPILATION:

Check for syntax errors

PDV is created

Automatic variables: _N_ counts the iterations of the DATA step. _N_ = 1 while the 1st observation is being processed, _N_ = 2 while the 2nd observation is being processed, and so on.

Page 7

IMPLICIT LOOP


COMPILATION:


Automatic variables: _ERROR_ = 1 signals a data error in the observation currently being processed; otherwise _ERROR_ = 0.

Page 8

IMPLICIT LOOP


Variables that exist in the input dataset (here, ID):

SAS sets each of these variables to missing in the PDV only before the 1st iteration of the execution phase.

These variables retain their values in the PDV until they are replaced by new values.

COMPILATION:

Page 9

IMPLICIT LOOP


Variables created in the DATA step (here, RANNUM and GROUP):

SAS sets each of these variables to missing in the PDV at the beginning of every iteration of the execution phase.

COMPILATION:
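One way to see both rules at work is to print part of the PDV at the top of each iteration. This sketch rebuilds PATIENT from the IDs on the slides so it runs on its own. RESET_DEMO and CREATED are hypothetical names.

```sas
/* Rebuild PATIENT so this sketch runs on its own */
data patient;
    length id $5;
    input id $ @@;      /* @@ reads several IDs from one data line */
    datalines;
M2390 F2390 F2340 M1240
;
run;

data reset_demo;
    putlog 'Top of iteration: ' _n_= id= created=;
    set patient;        /* ID comes from the input dataset, so it is retained */
    created = _n_;      /* CREATED is made by assignment, so it is reset to missing */
run;
```

In the log, ID should be blank only on the first line; after that it shows the previous observation's value. CREATED should be missing on every line. A fifth line with _N_=5 also appears, because the 5th iteration starts before the SET statement finds the end of the file.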

Page 10

IMPLICIT LOOP


COMPILATION:

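D = dropped: not written to the output dataset (_N_ and _ERROR_ are never written; RANNUM is dropped by the DROP= option)

K = kept: written to the output dataset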

Page 11

IMPLICIT LOOP


1st iteration: _N_ is set to 1, _ERROR_ is set to 0, and the rest of the variables are set to missing


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | (blank) | . | (blank)

EXECUTION:

Page 12

IMPLICIT LOOP


1st iteration:

The SET statement copies the 1st observation into the PDV


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | M2390 | . | (blank)

EXECUTION:

Page 13

IMPLICIT LOOP


1st iteration: RANNUM is generated


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | M2390 | 0.36993 | (blank)

EXECUTION:

Page 14

IMPLICIT LOOP


1st iteration: GROUP is set to 'P' since RANNUM is not > 0.5


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | M2390 | 0.36993 | P

EXECUTION:

Page 15

IMPLICIT LOOP


1st iteration:

The implicit OUTPUT statement writes the variables marked with (K) to the final dataset


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | M2390 | 0.36993 | P

Trial1:
Obs | ID | GROUP
1 | M2390 | P

EXECUTION:

Page 16

REVIEW: OUTPUT Statement

Without an explicit OUTPUT statement:

data trial1 (drop=rannum);
    set patient;
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
run;

With an explicit OUTPUT statement:

data trial1 (drop=rannum);
    set patient;
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
    output;
run;

The explicit OUTPUT statement writes the current observation from the PDV to a SAS dataset immediately, not at the end of the DATA step.

Page 17

REVIEW: OUTPUT Statement


The implicit OUTPUT statement: without explicit OUTPUT statements, every DATA step contains an implicit OUTPUT statement at the end of the DATA step. It tells SAS to write the current observation to the dataset at the end of each iteration.

Page 18

REVIEW: OUTPUT Statement


Placing an explicit OUTPUT statement in the DATA step overrides the implicit OUTPUT: SAS adds an observation to a dataset only when an explicit OUTPUT statement is executed.

We can use more than one OUTPUT statement in the DATA step.
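As an illustration of more than one OUTPUT statement, the sketch below sends each patient to one of two datasets. DRUG and PLACEBO are hypothetical dataset names, and PATIENT is rebuilt from the slide's IDs so the sketch runs on its own. Because the step contains explicit OUTPUT statements, no implicit OUTPUT occurs, so each observation is written exactly once.

```sas
/* Rebuild PATIENT so this sketch runs on its own */
data patient;
    length id $5;
    input id $ @@;
    datalines;
M2390 F2390 F2340 M1240
;
run;

/* Two OUTPUT statements, each naming its target dataset */
data drug placebo;
    set patient;
    if ranuni(2) > 0.5 then output drug;   /* written only to DRUG */
    else output placebo;                   /* written only to PLACEBO */
run;
```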

Page 19

IMPLICIT LOOP


2nd iteration:

_N_ is incremented to 2. ID is retained since it comes from the input dataset; GROUP and RANNUM are set to missing.


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
2 | 0 | M2390 | . | (blank)

Trial1:
Obs | ID | GROUP
1 | M2390 | P

EXECUTION:

Page 20

IMPLICIT LOOP


2nd iteration: The SET statement copies the 2nd observation into the PDV


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
2 | 0 | F2390 | . | (blank)

Trial1:
Obs | ID | GROUP
1 | M2390 | P

Skip a few iterations…

EXECUTION:

Page 21

IMPLICIT LOOP


The end of the 4th iteration: The implicit OUTPUT statement writes the variables marked with K to the final dataset, and SAS returns to the beginning of the DATA step.


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
4 | 0 | M1240 | 0.51880 | D

Trial1:
Obs | ID | GROUP
1 | M2390 | P
2 | F2390 | D
3 | F2340 | D
4 | M1240 | D

EXECUTION:

Page 22

IMPLICIT LOOP


5th iteration: _N_ is incremented to 5. ID is retained; GROUP and RANNUM are set to missing.


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
5 | 0 | M1240 | . | (blank)

Trial1:
Obs | ID | GROUP
1 | M2390 | P
2 | F2390 | D
3 | F2340 | D
4 | M1240 | D

EXECUTION:

Page 23

IMPLICIT LOOP


5th iteration: SAS reaches the end-of-file marker, which means that there are no more observations to read. The execution phase is completed, and SAS goes on to the next DATA or PROC step.


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
5 | 0 | M1240 | . | (blank)

Trial1:
Obs | ID | GROUP
1 | M2390 | P
2 | F2390 | D
3 | F2340 | D
4 | M1240 | D

EXECUTION:
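To check the walkthrough above against a real run, you could add PUTLOG statements to TRIAL1. This sketch rebuilds PATIENT from the slide's IDs so it runs on its own. The 'Top:' and 'Bottom:' labels are arbitrary. _ALL_ writes every PDV variable, including _N_ and _ERROR_.

```sas
/* Rebuild PATIENT so this sketch runs on its own */
data patient;
    length id $5;
    input id $ @@;
    datalines;
M2390 F2390 F2340 M1240
;
run;

/* TRIAL1 with tracing added; the logic is unchanged */
data trial1 (drop=rannum);
    putlog 'Top:    ' _all_;     /* before SET: RANNUM and GROUP are missing */
    set patient;
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
    putlog 'Bottom: ' _all_;     /* PDV just before the implicit OUTPUT */
run;
```

You should see five 'Top:' lines but only four 'Bottom:' lines. The fifth iteration starts and prints the PDV with _N_=5. The SET statement then hits the end-of-file marker and the step stops, as described on Page 23.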

Page 24

EXPLICIT LOOP

Suppose you don’t have a dataset containing the patient IDs

You are asked to assign four patients, ‘M2390’, ‘F2390’, ‘F2340’, ‘M1240’, with a 50% chance of receiving either the drug or the placebo

You can create the IDs and assign each ID to a group in the DATA step at the same time. For example:

Page 25

EXPLICIT LOOP

data trial2 (drop=rannum);
    id = 'M2390'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
    id = 'F2390'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
    id = 'F2340'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
    id = 'M1240'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
run;

Assigning IDs in the DATA step

Page 26

EXPLICIT LOOP


4 explicit OUTPUT statements

Page 27

EXPLICIT LOOP


The program contains 4 almost identical blocks.

Instead, put the identical code in a loop that steps through the IDs, reducing the amount of coding.

Page 28

ITERATIVE DO LOOP

General form for an iterative DO loop:

DO INDEX-VARIABLE = VALUE1, VALUE2, …, VALUEN;
    SAS STATEMENTS
END;

INDEX-VARIABLE contains the value for the current iteration.

The loop executes once for each value, from VALUE1 through VALUEN.

The values can be either character or numeric.

Page 29

data trial2 (drop=rannum);
    id = 'M2390'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
    id = 'F2390'; ...
    id = 'F2340'; ...
    id = 'M1240'; rannum = ranuni(2); if rannum > 0.5 then group = 'D'; else group = 'P'; output;
run;


INDEX-VARIABLE: ID

VALUE1 through VALUEN: 'M2390', 'F2390', 'F2340', 'M1240'

SAS STATEMENTS:
    rannum = ranuni(2);
    if rannum > 0.5 then group = 'D';
    else group = 'P';
    output;

ITERATIVE DO LOOP

Page 30


data trial2 (drop=rannum);
    do id = 'M2390', 'F2390', 'F2340', 'M1240';
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;

ITERATIVE DO LOOP

Page 31

DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
    SAS STATEMENTS
END;

Most often, the iterative DO loop steps through a sequence of integers.

The loop executes from the START value to the STOP value.

ITERATIVE DO LOOP

Page 32


The optional BY clause specifies the increment added to INDEX-VARIABLE after each iteration.

The default value for INCREMENT is 1.
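Here is a short sketch of the BY clause, including a negative increment that counts down. BY_DEMO, BY_DOWN, UP and DOWN are hypothetical names.

```sas
/* Count up by 2: UP takes the values 1, 3, 5, 7, 9 */
data by_demo;
    do up = 1 to 9 by 2;
        output;
    end;
run;

/* Count down by 3: DOWN takes the values 10, 7, 4, 1 */
data by_down;
    do down = 10 to 1 by -3;
        output;
    end;
run;
```

With a negative increment, the loop continues while the index is greater than or equal to STOP, so the next value (-2) ends it.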

ITERATIVE DO LOOP

Page 33


START, STOP, and INCREMENT can be numbers, variables, or SAS expressions.

These values are evaluated once, upon entry into the DO loop. Changing them during the processing of the DO loop does not change the number of iterations.
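The sketch below shows that STOP is fixed on entry: N is changed to 10 inside the loop, yet the loop still runs three times. STOP_DEMO, N and I are hypothetical names.

```sas
data stop_demo;
    n = 3;
    do i = 1 to n;      /* STOP is evaluated once, here, as 3 */
        n = 10;         /* changing the variable later does not add iterations */
        output;         /* I is output as 1, 2, 3 */
    end;
run;
```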

ITERATIVE DO LOOP

Page 34


INDEX-VARIABLE itself, by contrast, can be changed within the loop, and such changes do affect which iterations run.
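Changing INDEX-VARIABLE inside the loop does alter the iterations. In this hypothetical INDEX_DEMO step, the body adds 1 to I on top of the automatic increment at END, so the loop skips every other value. Treat it as an illustration only: modifying the index is easy to get wrong and can even produce a loop that never ends.

```sas
data index_demo;
    do i = 1 to 10;
        output;         /* I is output as 1, 3, 5, 7, 9 */
        i = i + 1;      /* extra increment, added to the one END applies */
    end;
run;
```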

ITERATIVE DO LOOP

Page 35

Suppose you are using a sequence of numbers, say 1 to 4, as patient IDs:

data trial3 (drop=rannum);
    do id = 1 to 4;
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;

INDEX-VARIABLE: ID; START: 1; STOP: 4; INCREMENT: 1 (the default)

ITERATIVE DO LOOP

Page 36

ITERATIVE DO LOOP: EXECUTION PHASE


Since we didn’t read an input dataset, there will be only one iteration for the DATA step

_N_ will be 1 for the entire execution phase

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | . | . | (blank)

Page 37

ITERATIVE DO LOOP: EXECUTION PHASE


ID is set to 1

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | . | (blank)

1st Iteration of DO loop:

Page 38

ITERATIVE DO LOOP: EXECUTION PHASE


RANNUM is generated

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | (blank)

1st Iteration of DO loop:

Page 39

ITERATIVE DO LOOP: EXECUTION PHASE


GROUP is set to 'P' since RANNUM is not > 0.5

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st Iteration of DO loop:

Page 40

ITERATIVE DO LOOP: EXECUTION PHASE


The OUTPUT statement instructs SAS to write observations to the output dataset

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P

Page 41

ITERATIVE DO LOOP: EXECUTION PHASE


SAS reaches the end of the DO loop

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P

Page 42

ITERATIVE DO LOOP: EXECUTION PHASE


ID is incremented to 2; since 2 ≤ 4, the 2nd iteration begins

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.36993 | P

2nd Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P

Page 43

ITERATIVE DO LOOP: EXECUTION PHASE


RANNUM is generated

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.94018 | P

2nd Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P

Page 44

ITERATIVE DO LOOP: EXECUTION PHASE


GROUP is set to 'D' since RANNUM > 0.5

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.94018 | D

2nd Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P

Page 45

ITERATIVE DO LOOP: EXECUTION PHASE


The OUTPUT statement instructs SAS to write observations to the output dataset

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.94018 | D

2nd Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D

Page 46

ITERATIVE DO LOOP: EXECUTION PHASE


Let’s skip two iterations

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.94018 | D

Trial3:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D

Page 47

ITERATIVE DO LOOP: EXECUTION PHASE


SAS reaches the end of the DO loop in the 4th iteration

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 4 | 0.51880 | D

4th Iteration of DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D

Page 48

ITERATIVE DO LOOP: EXECUTION PHASE


ID is incremented to 5; since 5 > 4, the loop ends

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 5 | 0.51880 | D

After the 4th iteration of the DO loop:

Trial3:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D

Page 49

ITERATIVE DO LOOP: EXECUTION PHASE


There is no implicit OUTPUT, because the DATA step contains an explicit OUTPUT statement. Since we didn't read an input dataset, the DATA step execution ends.

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 5 | 0.51880 | D

Trial3:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D
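The walkthrough shows that ID is 5, not 4, once the loop has finished. The increment happens before the STOP test fails. A quick way to confirm this is a sketch like the following, where AFTER_LOOP is a hypothetical name.

```sas
data after_loop;
    do id = 1 to 4;
        /* loop body intentionally empty */
    end;
    putlog 'After the loop: ' id=;   /* the log shows id=5 */
    output;                          /* save the PDV value so it can be checked */
run;
```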

Page 50

EXECUTING LOOPS CONDITIONALLY

Using an iterative DO loop requires specifying the number of iterations for the DO loop

Sometimes you will need to execute statements repetitively until a condition is met

In this situation, you need to use either the DO WHILE or DO UNTIL statements

Page 51

DO WHILE

DO WHILE (EXPRESSION);
    SAS STATEMENTS
END;

EXPRESSION is evaluated at the top of the DO loop. The DO loop will not execute at all if EXPRESSION is false the first time it is evaluated.
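Because the test happens at the top, a DO WHILE loop whose condition starts out false runs zero times. Here is a minimal sketch with the hypothetical names WHILE_DEMO and X.

```sas
data while_demo;
    x = 10;
    do while (x < 5);   /* false at the top, so the body never runs */
        x + 1;
        output;
    end;
run;
```

WHILE_DEMO ends up with no observations. The only OUTPUT statement is inside the loop, and its presence suppresses the implicit OUTPUT.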

Page 52

DO WHILE


Iterative DO loop:

data trial3 (drop=rannum);
    do id = 1 to 4;
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;

DO WHILE loop:

data trial4 (drop=rannum);
    do while (id < 4);
        id + 1;
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;

Page 53

DO WHILE


_N_ is set to 1 and _ERROR_ to 0. ID is initialized to 0 because of the sum statement (id + 1), which also retains ID. The rest of the variables are set to missing.

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 0 | . | (blank)

At the beginning of the execution phase:
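Written without the sum statement, TRIAL4 needs an explicit RETAIN with an initial value of 0. This is a sketch, and TRIAL4_EXPLICIT is a hypothetical name. The initialization matters. With a plain `id = id + 1;` and no initial value, ID would start as missing and stay missing after each addition. A missing value compares as smaller than any number, so `id < 4` would stay true and the loop would never end.

```sas
data trial4_explicit (drop=rannum);
    retain id 0;                 /* what "id + 1;" implies: start at 0 and retain */
    do while (id < 4);
        id = sum(id, 1);         /* SUM ignores missing values, as the sum statement does */
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;
```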

Page 54

DO WHILE


Since ID (0) < 4, the loop executes

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 0 | . | (blank)

1st iteration of the DO WHILE loop:

Page 55

DO WHILE


ID is incremented to 1

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | . | (blank)

1st iteration of the DO WHILE loop:

Page 56

DO WHILE


RANNUM is generated

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | (blank)

1st iteration of the DO WHILE loop:

Page 57

DO WHILE


GROUP is set to 'P' since RANNUM is not > 0.5

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st iteration of the DO WHILE loop:

Page 58

DO WHILE


The OUTPUT statement instructs SAS to write observations to the output dataset

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st iteration of the DO WHILE loop:

Trial4:
Obs | ID | GROUP
1 | 1 | P

Page 59

DO WHILE


SAS reaches the end of the DO loop

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

1st iteration of the DO WHILE loop:

Trial4:
Obs | ID | GROUP
1 | 1 | P

Page 60

DO WHILE


Since ID (1) < 4, the loop continues

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 1 | 0.36993 | P

2nd iteration of the DO WHILE loop:

Trial4:
Obs | ID | GROUP
1 | 1 | P

Page 61

DO WHILE


ID is incremented to 2

PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.36993 | P

2nd iteration of the DO WHILE loop:

Trial4:
Obs | ID | GROUP
1 | 1 | P

Page 62

DO WHILE


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 2 | 0.36993 | P

Trial4:
Obs | ID | GROUP
1 | 1 | P

Let’s skip a few iterations

Page 63

DO WHILE


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 4 | 0.51880 | D

At the end of the 4th iteration:

Trial4:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D

Here are the contents of the PDV at the end of the 4th iteration

Page 64

DO WHILE


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 4 | 0.51880 | D

Checking for a 5th iteration:

Trial4:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D

Now ID (4) is not < 4, so the loop stops

Page 65

DO WHILE


PDV:
_N_ (D) | _ERROR_ (D) | ID (K) | RANNUM (D) | GROUP (K)
1 | 0 | 4 | 0.51880 | D

Checking for a 5th iteration:

Trial4:
Obs | ID | GROUP
1 | 1 | P
2 | 2 | D
3 | 3 | D
4 | 4 | D

The execution phase ends

Page 66

DO UNTIL

DO UNTIL (EXPRESSION);
    SAS STATEMENTS
END;

Unlike the DO WHILE loop, the DO UNTIL loop evaluates the condition at the bottom of the loop.

The DO UNTIL loop will not continue for another iteration if EXPRESSION is true at the end of the current iteration.

That means the DO UNTIL loop always executes at least once.
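Here is a sketch of a DO UNTIL loop whose condition is already satisfied before the loop starts. UNTIL_DEMO and X are hypothetical names. The body still runs once, because the condition is only checked at the bottom.

```sas
data until_demo;
    x = 10;
    do until (x >= 5);   /* checked at the bottom, so the body runs once */
        x + 1;
        output;
    end;
run;
```

UNTIL_DEMO has a single observation with X = 11. A DO WHILE (x < 5) loop starting from the same value would not execute at all.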

Page 67

DO UNTIL


The DO UNTIL version of the program (compare TRIAL3, the iterative DO loop, and TRIAL4, the DO WHILE loop, on Page 52):

data trial5 (drop=rannum);
    do until (id >= 4);
        id + 1;
        rannum = ranuni(2);
        if rannum > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;

DO WHILE loop: will not continue if EXPRESSION is false.

DO UNTIL loop: will not continue for another iteration if EXPRESSION is true.

Page 68

NESTED LOOPS

Suppose that you would like to assign each of 12 patients to either a drug or a placebo group

These 12 subjects are from 3 cancer centers (“COH”, “UCLA”, and “USC”) with 4 subjects per center

data trial6;
    length center $4;
    do center = "COH", "UCLA", "USC";       /* outer loop */
        do id = 1 to 4;                     /* inner loop */
            if ranuni(2) > 0.5 then group = 'D';
            else group = 'P';
            output;
        end;
    end;
run;

Page 69

NESTED LOOPS


Obs | center | id | group
1 | COH | 1 | P
2 | COH | 2 | D
3 | COH | 3 | D
4 | COH | 4 | D
5 | UCLA | 1 | D
6 | UCLA | 2 | D
7 | UCLA | 3 | P
8 | UCLA | 4 | P
9 | USC | 1 | P
10 | USC | 2 | P
11 | USC | 3 | D
12 | USC | 4 | P
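The LENGTH statement in TRIAL6 is worth keeping. Without it, SAS typically sets the length of CENTER from the first value in the list, "COH" (3 characters), so "UCLA" would likely be stored as "UCL". Here is a sketch comparing the two versions. CENTERS_SHORT and CENTERS_FULL are hypothetical names.

```sas
/* Without LENGTH: CENTER takes the length of the first value, "COH" */
data centers_short;
    do center = "COH", "UCLA", "USC";
        output;
    end;
run;

/* With LENGTH declared first, every center name fits */
data centers_full;
    length center $4;
    do center = "COH", "UCLA", "USC";
        output;
    end;
run;
```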

Page 70

COMBINING IMPLICIT AND EXPLICIT LOOPS

In the previous program, all the observations were created in a single iteration of the DATA step, since we didn't read any input data.

Suppose the values for CENTER are stored in a SAS dataset, CANCER_CENTER.

For each center, you need to assign 4 patients to either a drug or a placebo group.

Cancer_center:
Obs | CENTER
1 | COH
2 | UCLA
3 | USC

data trial7;
    set cancer_center;       /* DATA step: implicit loop */
    do id = 1 to 4;          /* explicit loop */
        if ranuni(2) > 0.5 then group = 'D';
        else group = 'P';
        output;
    end;
run;
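To make the two loops visible, the sketch below builds CANCER_CENTER from the three center names shown and records _N_ next to each generated ID. CANCER_CENTER comes from the slide. TRIAL7_CHECK and ITERATION are hypothetical names, and the $4 length is an assumption so that 'UCLA' fits.

```sas
data cancer_center;
    length center $4;          /* assumed length: long enough for 'UCLA' */
    input center $;
    datalines;
COH
UCLA
USC
;
run;

data trial7_check;
    set cancer_center;         /* implicit loop: one DATA step iteration per center */
    do id = 1 to 4;            /* explicit loop: four patients within each iteration */
        iteration = _n_;       /* constant for all four patients of a center */
        output;
    end;
run;
```

Expect 12 observations. ITERATION is 1 for the COH rows, 2 for UCLA and 3 for USC, while ID cycles through 1 to 4 within each center.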

Page 71

UTILIZING LOOPS TO CREATE SAMPLES

DIRECT ACCESS MODE

Sbp:
Obs | ID | SBP
1 | 01 | 145
2 | 02 | 119
3 | 03 | 126
4 | 04 | 106
5 | 05 | 151
6 | 06 | 112
7 | 07 | 127
8 | 08 | 119
9 | 09 | 113

When reading a SAS dataset, by default, SAS reads the dataset sequentially: it reads one observation for each iteration of the DATA step. This process stops once SAS reaches the end-of-file marker.

Page 72

DIRECT ACCESS MODE


SAS can also access an observation directly via direct-access mode


Page 73

DIRECT ACCESS MODE


There are three important components to using direct-access mode.

Step 1: Tell SAS which observation you would like to select by using the POINT= option in the SET statement:

SET SAS-DATA-SET POINT = VARIABLE;

VARIABLE is a temporary variable: it is not written to the output dataset, and it is set to 0 in the PDV at the very beginning of the DATA step.


VARIABLE must be assigned an observation number before the SET statement executes.


For example, to select the 5th observation:

data sample1;
    obs_n = 5;                /* observation number to read */
    set sbp point = obs_n;    /* read observation 5 directly */
run;


Step 2: Use the STOP statement. When using direct-access mode, SAS cannot detect the end-of-file marker.


Without a STOP statement telling SAS explicitly when to stop processing, the DATA step loops infinitely:

STOP;

data sample1;
    obs_n = 5;
    set sbp point = obs_n;
    stop;                     /* end the DATA step; POINT= never reaches end-of-file */
run;


Step 3: Use the OUTPUT statement.

Recall: if there is no explicit OUTPUT statement, SAS writes the observation to the output dataset at the end of the DATA step (implicit output).


In the program above, however, the STOP statement ends DATA step processing BEFORE the end of the DATA step is reached, so the implicit OUTPUT is never executed and Sample1 contains no observations.


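Add the OUTPUT statement before the STOP statement:

data sample1;
    obs_n = 5;                /* Step 1: choose the observation number */
    set sbp point = obs_n;    /* read observation 5 directly */
    output;                   /* Step 3: write it before STOP ends the step */
    stop;                     /* Step 2: avoid an infinite loop */
run;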

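The same three components carry over when you need several specific observations. As a sketch, assuming the Sbp dataset exists: the DO statement accepts a list of values (as in the COH/UCLA/USC loop earlier), so observations 2, 5, and 8 can be read directly in one DATA step. The dataset name Sample1b and the chosen observation numbers are hypothetical.

```sas
/* Sketch: read observations 2, 5, and 8 of Sbp by direct access */
data sample1b;                       /* hypothetical dataset name */
    do obs_n = 2, 5, 8;              /* Step 1: observation numbers to read */
        set sbp point = obs_n;       /* read observation OBS_N directly */
        output;                      /* Step 3: write each one explicitly */
    end;
    stop;                            /* Step 2: POINT= never reaches end-of-file */
run;
```

Sample1b should contain the IDs 02, 05, and 08, in that order; OBS_N, like any POINT= variable, is not written to the dataset.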

CREATING A SYSTEMATIC SAMPLE


Select every 3rd observation

Obs  ID  SBP
1    01  145
2    04  106
3    07  127

A systematic sample is created by selecting every kth observation from the original dataset; here k = 3.


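Rather than reading Sbp sequentially and discarding the unwanted observations, you can use direct-access mode to read only the observations you need.

You can create a systematic sample by using an iterative DO loop:

DO INDEX-VARIABLE = START TO STOP <BY INCREMENT>;
    SAS STATEMENTS
END;

Here START is 1, STOP is the total number of observations, and INCREMENT is k (every kth observation).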


To find out the total number of observations, use the NOBS= option in the SET statement:

SET SAS-DATA-SET NOBS = VARIABLE;

VARIABLE is a temporary variable that contains the number of observations in SAS-DATA-SET.

It is not written to the output dataset.

It is created automatically during the compilation phase, based on the descriptor portion of SAS-DATA-SET.

It retains its value throughout the execution phase.

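Because NOBS= is filled in during compilation, its value is available even if the SET statement never executes. A commonly used sketch for capturing an observation count, assuming the Sbp dataset exists; the macro variable name N_SBP is hypothetical.

```sas
/* IF 0 is never true, so SET never executes at run time, yet TOTAL is
   already filled in from the descriptor portion of Sbp at compile time */
data _null_;
    if 0 then set sbp nobs = total;
    put "Sbp has " total "observations.";
    call symputx('n_sbp', total);      /* hypothetical macro variable */
    stop;                              /* no SET executes, so end the step explicitly */
run;

%put NOTE: N_SBP = &n_sbp;
```

For the Sbp dataset above, TOTAL is 9.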

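data sample2;
    do choose = 1 to total by 3;              /* CHOOSE = 1, 4, 7 */
        set sbp point = choose nobs = total;  /* read observation CHOOSE directly */
        output;                               /* write each selected observation */
    end;
    stop;                                     /* POINT= never reaches end-of-file */
run;

At the beginning of the execution phase, the PDV contains the following (D marks variables that are not written to Sample2, K marks variables that are kept; _ERROR_ is not shown for simplicity):

PDV: _N_ (D) = 1, CHOOSE (D) = 0, TOTAL (D) = 9, ID (K) = (blank), SBP (K) = .

_N_ is 1, and it stays 1 throughout the execution phase: SAS does not read Sbp sequentially, so the entire DO loop runs within the first DATA step iteration, and the STOP statement ends the step before a second iteration begins.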


At the beginning of the execution phase, CHOOSE is 0 and TOTAL is 9 (based on the descriptor portion of Sbp); the rest of the variables are missing.


1st iteration of the DO loop: CHOOSE = 1.

PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = (blank), SBP = .


1st iteration of the DO loop: SAS reads the 1st observation via direct-access mode.

PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = 01, SBP = 145


1st iteration of the DO loop: the OUTPUT statement instructs SAS to write the contents of the PDV to Sample2.

PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = 01, SBP = 145


Sample2:
Obs  ID  SBP
1    01  145


1st iteration of the DO loop: SAS reaches the end of the 1st iteration.

PDV: _N_ = 1, CHOOSE = 1, TOTAL = 9, ID = 01, SBP = 145


2nd iteration of the DO loop: CHOOSE is incremented to 4. Since 4 ≤ TOTAL (9), the 2nd iteration proceeds.

PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 01, SBP = 145


2nd iteration of the DO loop: SAS reads the 4th observation via direct-access mode.

PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 04, SBP = 106


2nd iteration of the DO loop: the OUTPUT statement instructs SAS to write the contents of the PDV to Sample2.

PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 04, SBP = 106


Sample2:
Obs  ID  SBP
1    01  145
2    04  106


2nd iteration of the DO loop: SAS reaches the end of the 2nd iteration.

PDV: _N_ = 1, CHOOSE = 4, TOTAL = 9, ID = 04, SBP = 106


3rd iteration of the DO loop: CHOOSE is incremented to 7. Since 7 ≤ TOTAL (9), the 3rd iteration proceeds.

PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 04, SBP = 106


3rd iteration of the DO loop: SAS reads the 7th observation via direct-access mode.

PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 07, SBP = 127


3rd iteration of the DO loop: the OUTPUT statement instructs SAS to write the contents of the PDV to Sample2.

PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 07, SBP = 127


Sample2:
Obs  ID  SBP
1    01  145
2    04  106
3    07  127


3rd iteration of the DO loop: SAS reaches the end of the 3rd iteration.

PDV: _N_ = 1, CHOOSE = 7, TOTAL = 9, ID = 07, SBP = 127


After the 3rd iteration, CHOOSE is incremented to 10. Since 10 > TOTAL (9), the DO loop ends without a 4th iteration.

PDV: _N_ = 1, CHOOSE = 10, TOTAL = 9, ID = 07, SBP = 127


The STOP statement stops the DATA step processing.

PDV: _N_ = 1, CHOOSE = 10, TOTAL = 9, ID = 07, SBP = 127


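Sample2 hard-codes k = 3. A sketch that takes k from a macro variable, assuming the Sbp dataset exists (the macro variable K and both dataset names are hypothetical), shown next to a sequential version for comparison: the direct-access version reads only observations 1, 1 + k, 1 + 2k, and so on, whereas the sequential version reads all nine observations and keeps every kth one with a subsetting IF.

```sas
%let k = 3;   /* sampling interval; hypothetical macro variable */

/* Direct access: read only observations 1, 1+k, 1+2k, ... */
data sample2_direct;
    do choose = 1 to total by &k;
        set sbp point = choose nobs = total;
        output;
    end;
    stop;
run;

/* Sequential alternative: read every observation, keep every kth */
data sample2_seq;
    set sbp;
    if mod(_n_ - 1, &k) = 0;   /* keeps _N_ = 1, 1+k, 1+2k, ... */
run;
```

With k = 3, both datasets contain the IDs 01, 04, and 07. The direct-access version avoids reading observations that are discarded anyway, which matters more as the dataset grows.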

CREATING A RANDOM SAMPLE WITH REPLACEMENT

A random sample is a sample created from the original dataset on a random basis.

In a random sample with replacement, each observation is put back into the original dataset after it is chosen, so any observation can be chosen more than once.


In the systematic sample, CHOOSE is incremented by k. To create a random sample, you instead need to generate a random integer between 1 and the total number of observations and use it as the value of CHOOSE.


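data sample3 (drop = i);
    do i = 1 to 3;                             /* draw 3 observations */
        choose = ceil(ranuni(5) * total);      /* random integer in [1, TOTAL] */
        set sbp point = choose nobs = total;   /* read observation CHOOSE directly */
        output;
    end;
    stop;
run;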

How do you generate a random integer between 1 and the total number of observations?

RANUNI(SEED): a randomly generated real number in (0, 1)

N: the total number of observations

RANUNI(SEED)*N: a real number in (0, N)

CEIL(RANUNI(SEED)*N): an integer in [1, N]

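A short derivation of why CEIL(RANUNI(SEED)*N) gives every observation number the same probability, assuming RANUNI behaves as an ideal uniform generator on the open interval (0, 1):

```latex
Let $U \sim \mathrm{Uniform}(0,1)$ and $j \in \{1, \dots, N\}$.
Since $\lceil UN \rceil = j$ exactly when $j - 1 < UN \le j$,
\[
  P\bigl(\lceil UN \rceil = j\bigr)
  = P\!\left(\frac{j-1}{N} < U \le \frac{j}{N}\right)
  = \frac{j}{N} - \frac{j-1}{N}
  = \frac{1}{N}.
\]
Because $0 < U < 1$, we have $0 < UN < N$, so $\lceil UN \rceil$ always lies in $\{1, \dots, N\}$.
```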

CREATING A RANDOM SAMPLE WITHOUT REPLACEMENT

SELF STUDY!

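The slides leave this case as self-study. One common approach, sketched here under stated assumptions, is selection sampling (sometimes called the k/n method): step through the observation numbers once and select observation CHOOSE with probability (observations still needed) / (observations still available). When every remaining observation is needed, that probability is 1, so the sketch always returns exactly the requested number of distinct observations. It assumes the Sbp dataset exists; the sample size of 3, the seed 7, and the dataset name Sample4 are hypothetical.

```sas
%let n_sample = 3;   /* hypothetical sample size; must not exceed TOTAL */

data sample4 (drop = needed);
    needed = &n_sample;                           /* observations still to select */
    do choose = 1 to total while (needed > 0);
        /* observations still available = total - choose + 1 */
        if ranuni(7) < needed / (total - choose + 1) then do;
            set sbp point = choose nobs = total;  /* read the selected observation */
            output;
            needed = needed - 1;
        end;
    end;
    stop;                                         /* POINT= never reaches end-of-file */
run;
```

Because CHOOSE only increases, no observation can be chosen twice, and the selected observations appear in their original order.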

UTILIZING LOOPS TO READ A LIST OF EXTERNAL FILES

THE INFILE STATEMENT WITH THE END= OPTION

To read an external file, you can use the INFILE statement

For example, suppose text1.txt is located in “C:\”:

text1.txt:
01 145
02 119

data example13;
    infile "C:\text1.txt";    /* identify the external file */
    input id $ sbp;           /* read one record per DATA step iteration */
run;

Because text1.txt contains 2 records, SAS uses 2 DATA step iterations to read the data.

Like a SAS dataset, an external file contains an end-of-file marker.

When SAS reaches the end-of-file marker, it stops reading.

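If you want to run Example13 without preparing text1.txt by hand, a sketch that writes the two records with FILE and PUT statements. The path C:\text1.txt is the one used in the slides and may need to be changed for your system.

```sas
/* Write the two raw records of text1.txt, one PUT statement per line */
data _null_;
    file "C:\text1.txt";
    put "01 145";
    put "02 119";
run;
```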

When reading a SAS dataset …

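Input dataset:
Obs  ID
1    M2390
2    F2390
3    F2340
4    M1240

Output dataset:
Obs  ID     GROUP
1    M2390  P
2    F2390  D
3    F2340  D
4    M1240  D

PDV during the 4th DATA step iteration (D = not written to the output dataset, K = kept):
_N_ (D) = 4, _ERROR_ (D) = 0, ID (K) = M1240, RANNUM (D) = 0.51880, GROUP (K) = D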


When reading a raw dataset …

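Raw data file (text1.txt):
01 145
02 119

Input buffer, used to hold the current raw record (shown after the 2nd record is read):
Column:  1  2  3  4  5  6  …
Value:   0  2     1  1  9  …

PDV: _N_ (D) = 2, _ERROR_ (D) = 0, ID (K) = 02, SBP (K) = 119

Output dataset:
Obs  ID  SBP
1    01  145
2    02  119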


You can also use an explicit loop to read the external file. To construct an explicit loop, you need to specify either the number of iterations (for an iterative DO loop) or a condition (for a DO WHILE or DO UNTIL loop).

One way to specify a condition is to tell SAS to keep reading records until it has read the last one.

To identify the last record, use the END= option in the INFILE statement:

INFILE FILE-SPECIFICATION END = VARIABLE;

VARIABLE is set to 1 when SAS reads the last record of the external file; otherwise, it is set to 0.


The following program uses a DO UNTIL loop to read the external file:

data example14;
    infile "C:\text1.txt" end = last;   /* LAST = 1 once the last record is read */
    do until (last = 1);                /* explicit loop inside one DATA step iteration */
        input id $ sbp;
        output;                         /* write each record as it is read */
    end;
run;

There is only one DATA step iteration. Within this iteration, the DO UNTIL loop iterates twice to read the two records in text1.txt.

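To confirm that Example14 needs only a single DATA step iteration, a sketch that adds a PUT statement inside the DO UNTIL loop (the dataset name Example14_log is hypothetical). It first rewrites C:\text1.txt so that it can run on its own. Both log lines should show _N_=1, with LAST equal to 0 after the first record and 1 after the second.

```sas
/* (Re)create the two-record input file used in the slides */
data _null_;
    file "C:\text1.txt";
    put "01 145";
    put "02 119";
run;

data example14_log;
    infile "C:\text1.txt" end = last;   /* LAST = 1 once the last record is read */
    do until (last = 1);
        input id $ sbp;
        put _n_= last= id= sbp=;        /* trace each pass through the loop */
        output;
    end;
run;
```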

THE INFILE STATEMENT WITH THE FILEVAR = OPTION

Generally, you specify the name and location of the external file directly in the INFILE statement:

infile "C:\text1.txt";

Alternatively, you can use the FILEVAR= option in the INFILE statement to read an external file whose name is stored in a variable:

INFILE FILE-SPECIFICATION FILEVAR = VARIABLE;

FILE-SPECIFICATION is a placeholder, not an actual filename.

VARIABLE contains the name of the external file and must be created before the INFILE statement.


For example,

data example15;
    filename = "C:\text1.txt";           /* name of the file to read */
    infile dummy filevar = filename;     /* DUMMY is only a placeholder */
    input id $ sbp;
run;

167  data example14;
168  filename = "C:\text1.txt";
169  infile dummy filevar = filename;
170  input id $ sbp;
171  run;

NOTE: The infile DUMMY is:
      File Name=C:\text1.txt,
      RECFM=V,LRECL=256
NOTE: 2 records were read from the infile DUMMY.
      The minimum record length was 6.
      The maximum record length was 6.
NOTE: The data set WORK.EXAMPLE13 has 2 observations and 2 variables.


READING MULTIPLE EXTERNAL FILES

Suppose that you have three external files with identical formats:

text1.txt:
01 145
02 119

text2.txt:
03 126
04 106

text3.txt:
05 140
06 118

You would like to read all three files and concatenate them into a single dataset:

Obs  ID  SBP
1    01  145
2    02  119
3    03  126
4    04  106
5    05  140
6    06  118

You can read them all in one single DATA step by using the FILEVAR= option in the INFILE statement.


Whenever the value of the FILEVAR= variable changes, the INFILE statement closes the current input file and opens the new file named by that variable.


The three statements in Example15 (the assignment to FILENAME, INFILE, and INPUT) need to be placed inside a loop.


The names of the external files suggest creating an iterative DO loop that iterates from 1 to 3:

do i = 1 to 3;
end;


Modify the FILENAME assignment statement by using the concatenation operator (||), so that the filename depends on the value of I.


do i = 1 to 3;
    filename = "C:\text" || put(i, 1.) || ".txt";
end;

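A quick way to check the filenames the loop will generate, as a sketch: the PUT statement writes each value of FILENAME to the log (filename=C:\text1.txt, then text2 and text3). PUT(i, 1.) converts I to a one-character string, which assumes single-digit file numbers. Concatenating I directly would trigger SAS's automatic numeric-to-character conversion with the BEST12. format, which right-aligns the number and leaves leading blanks in the middle of the filename.

```sas
data _null_;
    do i = 1 to 3;
        /* PUT(i, 1.) turns the number I into the string "1", "2", or "3" */
        filename = "C:\text" || put(i, 1.) || ".txt";
        put filename=;
    end;
run;
```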

Add the OUTPUT statement within the loop

output;


Because the FILEVAR= option controls when the current input file is closed and a new one is opened, SAS cannot detect the end-of-file marker. Place a STOP statement outside the loop:

stop;


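data example15 (drop = i);
    do i = 1 to 3;
        filename = "C:\text" || put(i, 1.) || ".txt";   /* C:\text1.txt, ... */
        infile dummy filevar = filename;                /* open the file named by FILENAME */
        input id $ sbp;                                 /* reads ONE record per DO-loop pass */
        output;
    end;
    stop;   /* needed because FILEVAR= prevents end-of-file detection */
run;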

At the beginning of the DATA step, _N_ is 1 and the other variables are missing (D = not written to Example15, K = kept):

PDV: _N_ (D) = 1, I (D) = ., FILENAME (D) = (blank), ID (K) = (blank), SBP (K) = .


1st iteration of the DO loop: I = 1.

PDV: _N_ = 1, I = 1, FILENAME = (blank), ID = (blank), SBP = .


1st iteration of the DO loop: FILENAME is set to C:\text1.txt.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, ID = (blank), SBP = .


1st iteration of the DO loop: the 1st data line of text1.txt is loaded into the input buffer.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, ID = (blank), SBP = .

Input buffer:
Column:  1  2  3  4  5  6  …
Value:   0  1     1  4  5  …


1st iteration of the DO loop: the INPUT statement reads the data values from the input buffer into the PDV.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, ID = 01, SBP = 145


1st iteration of the DO loop: the OUTPUT statement tells SAS to write the observation from the PDV to the output dataset.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, ID = 01, SBP = 145

Example15:
Obs  ID  SBP
1    01  145


1st iteration of the DO loop: SAS reaches the end of the DO loop.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, ID = 01, SBP = 145


2nd iteration of the DO loop: I is incremented to 2.

PDV: _N_ = 1, I = 2, FILENAME = C:\text1.txt, ID = 01, SBP = 145


2nd iteration of the DO loop: FILENAME is set to C:\text2.txt.

PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, ID = 01, SBP = 145


2nd iteration of the DO loop: because FILENAME now names a different file, SAS closes text1.txt, opens text2.txt, and loads the 1st data line of text2.txt into the input buffer.

PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, ID = 01, SBP = 145

Input buffer:
Column:  1  2  3  4  5  6  …
Value:   0  3     1  2  6  …

Example15 so far:
Obs  ID  SBP
1    01  145

??? The 2nd record of text1.txt (02 119) has been skipped.


Why? Each iteration of the DO loop executes the INPUT statement only once, so SAS reads just one record from each file. When the next iteration begins, FILENAME names a new file, so SAS closes the current file and starts reading the new one; any remaining records in the current file are skipped.


To read every record in each file, replace the INFILE, INPUT, and OUTPUT statements with a DO UNTIL loop that uses the END= option:

do until (last);
    infile dummy filevar = filename end = last;
    input id $ sbp;
    output;
end;

At the beginning of the DATA step, _N_ is 1, LAST is 0, and the other variables are missing:

PDV: _N_ = 1, I = ., FILENAME = (blank), LAST = 0, ID = (blank), SBP = .

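Putting the pieces together: a sketch of the complete program, with the DO UNTIL loop in place of the single INFILE, INPUT, and OUTPUT statements. It first writes the three sample files (using the C:\ paths from the slides, which you may need to change) and then reads all six records into Example15.

```sas
/* Create the three input files (contents as shown earlier) */
data _null_;
    file "C:\text1.txt";
    put "01 145";
    put "02 119";
    file "C:\text2.txt";
    put "03 126";
    put "04 106";
    file "C:\text3.txt";
    put "05 140";
    put "06 118";
run;

data example15 (drop = i);
    do i = 1 to 3;                                   /* outer loop: one pass per file */
        filename = "C:\text" || put(i, 1.) || ".txt";
        do until (last);                             /* inner loop: every record in the file */
            infile dummy filevar = filename end = last;
            input id $ sbp;
            output;
        end;
    end;
    stop;   /* FILEVAR= switching means SAS never detects end-of-file itself */
run;
```

The DO UNTIL form matters here: when the outer loop moves to the next file, LAST is still 1 from the previous file, but DO UNTIL checks its condition only at the end of each pass, so the INFILE statement runs, switches to the new file, and updates LAST before the condition is tested. FILENAME and LAST are not written to Example15.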

1st iteration of the DO loop (outer loop): I = 1.

PDV: _N_ = 1, I = 1, FILENAME = (blank), LAST = 0, ID = (blank), SBP = .


1st iteration of the DO loop (outer loop): FILENAME is set to C:\text1.txt.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = (blank), SBP = .


1st iteration of the outer loop, 1st iteration of the DO UNTIL loop (inner loop): the DO UNTIL loop evaluates its condition at the end of the loop, so the loop body executes at least once.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = (blank), SBP = .


1st iteration of the outer loop, 1st iteration of the inner loop: the 1st data line of text1.txt is loaded into the input buffer.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = (blank), SBP = .

Input buffer:
Column:  1  2  3  4  5  6  …
Value:   0  1     1  4  5  …


1st iteration of the outer loop, 1st iteration of the inner loop: the INPUT statement reads the data values from the input buffer into the PDV.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = 01, SBP = 145


1st iteration of the outer loop, 1st iteration of the inner loop: the OUTPUT statement writes the observation from the PDV to the output dataset.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = 01, SBP = 145

Example15:
Obs  ID  SBP
1    01  145


1st iteration of the outer loop, 1st iteration of the inner loop: SAS reaches the end of the inner loop. Since LAST ≠ 1, the inner loop continues.

PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = 01, SBP = 145


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 0, ID = 01, SBP = 145


1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):

Input buffer: 01 145

The DO UNTIL loop evaluates its condition at the end of the loop, not at the beginning.

Output data set: obs 1 (ID = 01, SBP = 145)


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 1, ID = 01, SBP = 145


1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):

Input buffer: 02 119

The INFILE statement reads the 2nd data line from text1.txt into the input buffer. Because this is the last record of text1.txt, LAST is set to 1.

Output data set: obs 1 (ID = 01, SBP = 145)


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 1, ID = 02, SBP = 119


1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):

Input buffer: 02 119

The INPUT statement reads the data values from the input buffer into the PDV.

Output data set: obs 1 (ID = 01, SBP = 145)


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 1, ID = 02, SBP = 119


1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):

Input buffer: 02 119

The OUTPUT statement writes the contents of the PDV to the output data set.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 1, ID = 02, SBP = 119


1st iteration of the DO loop (outer loop), 2nd iteration of the DO UNTIL loop (inner loop):

Input buffer: 02 119

SAS reaches the end of the inner loop. Since LAST = 1, the inner loop ends.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 1, FILENAME = C:\text1.txt, LAST = 1, ID = 02, SBP = 119


1st iteration of the DO loop (outer loop):

SAS reaches the end of the outer loop.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 2, FILENAME = C:\text1.txt, LAST = 1, ID = 02, SBP = 119


2nd iteration of the DO loop (outer loop):

I is incremented to 2. Since I ≤ 3, the outer loop continues with its 2nd iteration.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 1, ID = 02, SBP = 119


2nd iteration of the DO loop (outer loop):

The assignment statement sets FILENAME to C:\text2.txt.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 1, ID = 02, SBP = 119


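2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):

The DO UNTIL loop evaluates its condition at the end of the loop, so the inner loop body executes even though LAST is still 1 from text1.txt.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)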
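Why a DO UNTIL loop here rather than DO WHILE? Judging from the PDV values in this trace, LAST is still 1 when the inner loop starts for text2.txt. A DO WHILE (NOT LAST) loop tests its condition at the top, so it would skip its body and never read text2.txt or text3.txt. The minimal sketch below isolates the difference; the data set LOOPCHECK and the variables FLAG, WHILE_COUNT, and UNTIL_COUNT are hypothetical, with FLAG standing in for LAST.

```sas
/* Hypothetical demo: where DO WHILE and DO UNTIL test their conditions */
data loopcheck;
   flag = 1;            /* plays the role of LAST = 1 left over from text1.txt */
   while_count = 0;
   until_count = 0;

   /* DO WHILE tests at the top: FLAG = 0 is false, so the body never runs */
   do while (flag = 0);
      while_count = while_count + 1;
   end;

   /* DO UNTIL tests at the bottom: the body runs once, then FLAG = 1 ends the loop */
   do until (flag = 1);
      until_count = until_count + 1;
   end;
run;
```

With FLAG already 1, the DO WHILE body runs zero times and the DO UNTIL body runs once, so LOOPCHECK should hold a single observation with WHILE_COUNT = 0 and UNTIL_COUNT = 1.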


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 0, ID = 02, SBP = 119


2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):

Input buffer: 03 126

The INFILE statement reads the 1st data line from text2.txt into the input buffer. This is not the last record of text2.txt, so LAST is set to 0.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 0, ID = 03, SBP = 126


2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):

Input buffer: 03 126

The INPUT statement reads the data values from the input buffer into the PDV.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119)


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 0, ID = 03, SBP = 126


2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):

Input buffer: 03 126

The OUTPUT statement writes the contents of the PDV to the output data set.

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119), obs 3 (ID = 03, SBP = 126)


PDV: _N_ = 1, I = 2, FILENAME = C:\text2.txt, LAST = 0, ID = 03, SBP = 126


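2nd iteration of the DO loop (outer loop), 1st iteration of the DO UNTIL loop (inner loop):

Input buffer: 03 126

Output data set: obs 1 (ID = 01, SBP = 145), obs 2 (ID = 02, SBP = 119), obs 3 (ID = 03, SBP = 126)

The remaining steps are skipped here. They follow the same pattern for the 2nd record of text2.txt and both records of text3.txt, so the output data set ends with six observations.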
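Assembled from the pieces shown on the slides, the complete program is sketched below. The first DATA _NULL_ step, which writes the three sample files, is an addition for reproducing the trace; the C:\ paths follow the slides and assume a Windows system with a writable C:\ drive. The STOP statement is needed because no INPUT statement ever tries to read past the end of a file, which is what normally ends a DATA step that reads raw data; without STOP, the implicit loop would most likely start another pass through the files.

```sas
/* Assumption: C:\ is writable; change the paths for your system */
/* Setup (not on the slides): write the three sample input files */
data _null_;
   file "C:\text1.txt";
   put "01 145" / "02 119";
   file "C:\text2.txt";
   put "03 126" / "04 106";
   file "C:\text3.txt";
   put "05 140" / "06 118";
run;

/* Read all three files with nested loops */
data example15 (drop = i);
   do i = 1 to 3;                           /* outer loop: one pass per file */
      filename = "C:\text" || put(i, 1.) || ".txt";
      do until (last);                      /* inner loop: one pass per record */
         /* FILEVAR= switches to the file named in FILENAME;
            END= sets LAST to 1 when the last record is read */
         infile dummy filevar = filename end = last;
         input id $ sbp;
         output;                            /* write the PDV to EXAMPLE15 */
      end;
   end;
   stop;                                    /* end the DATA step after the outer loop */
run;

proc print data = example15;
run;
```

If the files are read as traced, EXAMPLE15 should list IDs 01 through 06 in file order.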


ARRAY

Loop structures have a wide range of applications in ARRAY processing.

Because ARRAY processing is a large, separate topic, it is not covered in this talk.
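Arrays are outside the scope of this talk, but for orientation, the most common pairing is an iterative DO loop that visits each element of an array. In this minimal sketch, the data sets SCORES and TOTALS and the variables S1-S3 and TOTAL are hypothetical.

```sas
/* Hypothetical data: three scores per subject */
data scores;
   input s1-s3;
   datalines;
10 20 30
5 15 25
;
run;

data totals (drop = i);
   set scores;
   array s{3} s1-s3;          /* S{1}, S{2}, S{3} refer to S1, S2, S3 */
   total = 0;                 /* reset for each observation */
   do i = 1 to dim(s);        /* DIM returns the number of array elements */
      total = total + s{i};
   end;
run;
```

For the two sample observations, TOTAL should be 60 and 45.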


CONCLUSION

Loops allow us to write simpler and more efficient code

To use loop structures correctly, we need to understand how the DATA step is processed

When debugging, we often find that most errors stem from programming fundamentals, namely understanding how the PDV works
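A practical way to check the PDV while debugging is to write it to the log with PUTLOG _ALL_. In this minimal sketch, the data set DEMO and the variables I and X are hypothetical. The final PUTLOG shows that once the DO loop finishes, I holds 4: the index is incremented past the stop value before the test fails, the same increment-then-test step seen when I became 2 in the trace.

```sas
data demo;
   do i = 1 to 3;
      x = i * 10;
      putlog 'Inside loop: ' _all_;   /* writes every PDV variable, including _N_ and _ERROR_ */
   end;
   putlog 'After loop: ' i= x=;       /* I is 4 here: the index moved past the stop value */
run;
```

The log should end with a line reporting i=4 and x=30, and DEMO holds a single observation with those values.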


CONTACT INFORMATION

Arthur Li

City of Hope

Division of Information Science

1500 East Duarte Road

Duarte, CA 91010-3000

Phone: (626) 256-4673 ext. 65121

E-mail: [email protected]