the data step


Upload: rmayreddy

Post on 06-Mar-2016


7/21/2019 The Data Step

http://slidepdf.com/reader/full/the-data-step 1/4

The Data Step

Recall from Part 1 of this tutorial that a SAS program consists of two main blocks of code: the data step and the procedure (proc) step.

A data step has the following format:

  DATA Dataset-Name (OPTIONS);
    .
    .
    .
  RUN;

In the SAS program file above, DATA is the keyword that starts the data step, meaning that it tells SAS to create a dataset. Dataset-Name is the name of the dataset that you want to create or manipulate. If you want to add any of the dataset options (see below), they would go in the parentheses after you name the dataset. In between the first and last lines are the statements that create and manipulate the dataset. Note that the data step ends with a RUN statement and a semicolon.
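A minimal sketch of a complete data step may make the format concrete. The dataset name sample is the one used later in this tutorial; the variable id and the data values are made up for illustration, and the INPUT and DATALINES statements are just one way to supply rows.

```sas
/* Create a temporary dataset named sample with three numeric variables. */
/* The variable id and all values are illustrative assumptions.          */
DATA sample;
  INPUT id height weight;
  DATALINES;
1 65 150
2 70 180
3 62 125
;
RUN;
```

Everything between the DATA statement and RUN; belongs to this one data step.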

The SET Statement

When you need to copy or modify an existing dataset, use the SET statement in the data step. In general the code will follow this form:

  DATA New-Dataset-Name (OPTIONS);
    SET Dataset-Name (OPTIONS);
    .
    .
    .
  RUN;

The statements above tell SAS to create a new dataset (New-Dataset-Name) that is an exact copy of an existing SAS dataset (Dataset-Name). This allows you to create new variables or modify old ones without permanently changing the original data. (It is strongly recommended that you do not alter your original data files.)

A data step containing only the SET statement will create an exact copy of the dataset. For example, the program

  DATA new_sample;
    SET sample;
  RUN;

creates a new temporary dataset called new_sample that is a clone of the already existing dataset called sample. You might use code like this when you want to copy a dataset from the temporary library to a permanent library or vice versa.

You do not necessarily have to use a different dataset name in the DATA statement than the dataset name you use in the SET statement. If you use the same name, then SAS will overwrite the current dataset with the new dataset you are creating.
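Copying from the temporary library to a permanent one could look like the sketch below. The library name mylib and the folder path are hypothetical and do not come from the tutorial; the folder must already exist for the LIBNAME statement to succeed.

```sas
/* Hypothetical library name and folder; replace with a folder that exists. */
LIBNAME mylib 'C:\my_sas_data';

/* A small temporary dataset (WORK.sample), built here only so the sketch is self-contained. */
DATA sample;
  INPUT id height weight;
  DATALINES;
1 65 150
2 70 180
;
RUN;

/* Copy the temporary dataset into the permanent library. */
DATA mylib.sample;
  SET sample;
RUN;
```

A one-level name such as sample refers to the temporary WORK library, while the two-level name mylib.sample refers to the dataset stored in the permanent library.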

DROP and KEEP

The KEEP option specifies that SAS should only process the listed variables of the named dataset. Conversely, the DROP option specifies that SAS should exclude the listed variables when processing the named dataset. These two options can accomplish the same thing, but in a given situation one will likely be easier than the other. If you only want to remove a couple of variables from a dataset, then using a DROP option would be easier than specifying all the variables to remain in a KEEP option. Vice versa, if you only want to retain a couple of variables in the dataset, then using a KEEP option would be easier than specifying all the variables to remove in a DROP option.
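To illustrate that the two options can reach the same result, here is a small sketch. The variable id and the data values are assumptions made for the example; height, weight, and siblings are the variable names this tutorial uses for the sample dataset.

```sas
/* Illustrative data; id and the values are assumptions. */
DATA sample;
  INPUT id height weight siblings;
  DATALINES;
1 65 150 2
2 70 180 0
;
RUN;

/* Leave out height and weight by naming them in DROP. */
DATA via_drop (DROP = height weight);
  SET sample;
RUN;

/* Same variables in the result, written as a KEEP list instead. */
DATA via_keep (KEEP = id siblings);
  SET sample;
RUN;
```

Both via_drop and via_keep contain only id and siblings; which form is shorter depends on how many variables the dataset has.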

Example. In a previous section of the tutorial, we demonstrated how to calculate variables. Say, for instance, we want to calculate new variables based on existing variables but we don't want the existing variables to remain in the new dataset. The following example creates two new variables based on the existing variables height and weight but removes them from the new dataset sample_new_vars.

  DATA sample_new_vars (DROP = height weight);
    SET sample;
    bmi = (weight / (height*height)) * 703;
    height2 = height * 0.0254;
    IF siblings >= 1 THEN siblings2 = 1;
    IF siblings = 0 THEN siblings2 = 0;
  RUN;

RENAME

This option tells SAS to change the name of a variable (or variables). It accomplishes the same thing as creating a new variable in your dataset and setting it equal to an old variable, but it is much more efficient. The format is as follows:

  RENAME = (oldvariable = newvariable)

You can specify as many variables as you want, as long as each pair of old and new variable names is separated by a space.

Example. Change the name of the variable Gender to Sex and the variable DOB to Date_of_Birth in the sample dataset.

  DATA sample2 (RENAME=(Gender=Sex DOB=Date_of_Birth));
    SET sample;
  RUN;
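For comparison, the sketch below contrasts the manual approach described above (create a new variable, set it equal to the old one, drop the old one) with the RENAME option. The variable id, the data values, and the output dataset names manual and renamed are assumptions made for illustration.

```sas
/* Illustrative data; id and the values are assumptions. */
DATA sample;
  INPUT id Gender $;
  DATALINES;
1 F
2 M
;
RUN;

/* Manual approach: copy Gender into a new variable, then drop the old one. */
DATA manual (DROP = Gender);
  SET sample;
  Sex = Gender;
RUN;

/* RENAME option: only the name changes, and no assignment statement is needed. */
DATA renamed (RENAME = (Gender = Sex));
  SET sample;
RUN;
```

Both output datasets end up with the variables id and Sex; the RENAME version avoids copying the values into a second variable.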

FIRSTOBS and OBS

If you only want SAS to process part of your dataset, starting at a certain record number, then use the FIRSTOBS dataset option. The format is as follows:

  FIRSTOBS = n

(where n is an integer that corresponds to the observation number (row) where SAS should start)

The FIRSTOBS dataset option is often used in conjunction with the OBS dataset option. The OBS dataset option tells SAS to stop processing at the record number specified in the option statement. The format is as follows:

  OBS = n

(where n is an integer that corresponds to the observation number (row) where SAS should stop processing)

Using the FIRSTOBS and OBS dataset options together can be a useful way to minimize unnecessary processing.

Example. The FIRSTOBS and OBS options together are frequently used in a PROC PRINT to view a subset of observations in the dataset. The following code will print observations 20 through 30 from the sample dataset in the output window.

  PROC PRINT DATA=sample (FIRSTOBS=20 OBS=30);
  RUN;

As you saw in the previous example, dataset options are not necessarily limited to the data step. You can specify options anytime you refer to a dataset.
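The same pair of options can also be attached to the SET statement in a data step. The sketch below uses a made-up dataset called numbers, with one variable n running from 1 to 100, so that the selected rows are easy to check; neither name comes from the tutorial.

```sas
/* Build a 100-row dataset; n equals the row number. */
DATA numbers;
  DO n = 1 TO 100;
    OUTPUT;
  END;
RUN;

/* Read rows 20 through 30 only. OBS is the last row to read, not a row count. */
DATA subset;
  SET numbers (FIRSTOBS=20 OBS=30);
RUN;

PROC PRINT DATA=subset;
RUN;
```

Because OBS names the last observation rather than a count, subset contains 30 - 20 + 1 = 11 observations, with n running from 20 to 30.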

More Data Step Options

SAS provides many useful data step options. Data step options provide SAS with additional instructions on how to read or write the dataset you name. Data step options are generally attached to an output dataset (one that SAS is going to create), but sometimes they can be attached to an input dataset (one that SAS is going to read, like when a SET statement is used). All dataset options are explained in the SAS Help and Documentation window.
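The difference between attaching an option to the output dataset and to the input dataset can be sketched as follows. The variable id, the data values, and the dataset names bmi_out and ids_only are assumptions made for illustration; the bmi formula is the one used earlier in this tutorial.

```sas
/* Illustrative data; id and the values are assumptions. */
DATA sample;
  INPUT id height weight;
  DATALINES;
1 65 150
2 70 180
;
RUN;

/* Option on the output dataset: height and weight are still read,   */
/* so bmi can be computed; they are left out only when bmi_out is    */
/* written.                                                          */
DATA bmi_out (DROP = height weight);
  SET sample;
  bmi = (weight / (height*height)) * 703;
RUN;

/* Option on the input dataset: only id is read from sample at all. */
DATA ids_only;
  SET sample (KEEP = id);
RUN;
```

If the DROP option in the first step were moved onto the SET statement, height and weight would never be read and bmi would come out missing, which is why the placement of an option matters.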