7/21/2019 The Data Step
http://slidepdf.com/reader/full/the-data-step 1/4
The Data Step
Recall from Part 1 of this tutorial that a SAS program consists of two main blocks of code: the data step and the procedure (proc) step.
The data step has the following general form:
DATA Dataset-Name (OPTIONS);
   .
   .
   .
RUN;
In the SAS program file above, DATA is the keyword that starts the data step, meaning that it tells SAS to create a dataset. Dataset-Name is the name of the dataset that you want to create or manipulate. If you want to add any of the dataset options (see below), they go in the parentheses after the dataset name. In between the first and last lines are the statements that create and manipulate the dataset. Note that the data step ends with a RUN statement and a semicolon.
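To make the format concrete, here is a minimal sketch of a complete data step. It assumes a small, made-up dataset: the name sample and the variables ID, Gender, Height, Weight, and Siblings are hypothetical stand-ins for the tutorial's own sample data, and the INPUT and DATALINES statements used to type the values in directly are not otherwise covered in this section.

```sas
/* A minimal, self-contained DATA step. The dataset name (sample),    */
/* the variables, and the values are hypothetical, for illustration.  */
DATA sample;
   /* INPUT names the variables; $ marks Gender as a character variable */
   INPUT ID Gender $ Height Weight Siblings;
   DATALINES;
1 F 64 130 2
2 M 70 175 0
3 F 62 120 1
;
RUN;
```

Because sample is a one-level name, SAS stores it in the temporary WORK library by default.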
The SET Statement
When you need to copy or modify an existing dataset, use the SET statement in the data step. In general, the code will follow this form:
DATA New-Dataset-Name (OPTIONS);
   SET Dataset-Name (OPTIONS);
   .
   .
   .
RUN;
The statements above tell SAS to create a new dataset (New-Dataset-Name) that starts as a copy of an existing SAS dataset (Dataset-Name); any statements between SET and RUN then change the new copy. This allows you to create new variables or modify old ones without permanently changing the original data. (It is strongly recommended that you do not alter your original data files.)
A data step containing only the SET statement will create an exact copy of the dataset. For example, the program
DATA new_sample;
   SET sample;
RUN;
creates a new temporary dataset called new_sample that is a clone of the already existing dataset called sample. You might use code like this when you want to copy a dataset from the temporary library to a permanent library or vice versa.
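A sketch of the temporary-to-permanent copy mentioned above follows. It assumes a hypothetical libref, mylib, and uses SASHELP.CLASS (a sample dataset supplied with SAS) in place of the tutorial's sample dataset; class_copy, class_perm, and class_back are hypothetical names. So that the sketch runs anywhere, mylib is pointed at the WORK folder's physical path; in practice you would give LIBNAME the path of a folder you want to keep.

```sas
/* Hypothetical libref "mylib". In real use, point it at a folder you   */
/* want to keep; here it borrows the WORK folder's path so it runs as-is. */
LIBNAME mylib "%SYSFUNC(PATHNAME(WORK))";

/* Start with a temporary (WORK) dataset: a one-level name */
DATA class_copy;
   SET sashelp.class;
RUN;

/* Temporary -> permanent: a two-level name (libref.dataset) */
DATA mylib.class_perm;
   SET class_copy;
RUN;

/* Permanent -> temporary: the same pattern in reverse */
DATA class_back;
   SET mylib.class_perm;
RUN;
```

The only thing that changes between the directions is which side of the step carries the two-level name.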
You do not necessarily have to use a different dataset name in the DATA statement than the one you use in the SET statement. If you use the same name, SAS will overwrite the current dataset with the new dataset you are creating.
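A minimal sketch of reusing a name, assuming SASHELP.CLASS as input. SASHELP is read-only, so the sketch first makes a WORK copy named class (a hypothetical name); the new variable height_m is also made up for illustration.

```sas
/* Make a WORK copy first; SASHELP datasets cannot be overwritten */
DATA class;
   SET sashelp.class;
RUN;

/* Same name in DATA and SET: SAS replaces WORK.class with this version */
DATA class;
   SET class;
   height_m = height * 0.0254;   /* new variable: height in meters */
RUN;
```

After the second step runs, the earlier version of class is gone, which is one more reason to work on a copy rather than on your original data.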
DROP and KEEP
The KEEP option specifies that SAS should only process the listed variables of the named dataset. Conversely, the DROP option specifies that SAS should exclude the listed variables when processing the named dataset. These two options can accomplish the same thing, but in a given situation one will likely be easier than the other. If you only want to remove a couple of variables from a dataset, then using a DROP option would be easier than specifying all the variables to remain in a KEEP option. Vice versa, if you only want to retain a couple of variables in the dataset, then using a KEEP option would be easier than specifying all the variables to remove in a DROP option.
Example. In a previous section of the tutorial, we demonstrated how to calculate variables. Say, for instance, that we want to calculate new variables based on existing variables, but we don't want the existing variables to remain in the new dataset. The following example creates two new variables (bmi and height2) from the existing variables height and weight, plus an indicator (siblings2) from siblings, and then drops height and weight from the new dataset sample_new_vars.
DATA sample_new_vars (DROP = height weight);
   SET sample;
   bmi = (weight / (height * height)) * 703;   /* BMI from pounds and inches */
   height2 = height * 0.0254;                  /* inches to meters */
   IF siblings >= 1 THEN siblings2 = 1;        /* 1 = has at least one sibling */
   IF siblings = 0 THEN siblings2 = 0;
RUN;
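To see that KEEP and DROP can accomplish the same thing, here is a sketch that uses SASHELP.CLASS in place of the tutorial's sample dataset (its variables are Name, Sex, Age, Height, and Weight); the output names via_keep and via_drop are hypothetical.

```sas
/* SASHELP.CLASS has five variables: Name Sex Age Height Weight.    */
/* Both steps below write the same three: Name, Height, and Weight. */
DATA via_keep (KEEP = name height weight);   /* list what stays */
   SET sashelp.class;
RUN;

DATA via_drop (DROP = sex age);               /* list what goes  */
   SET sashelp.class;
RUN;
```

With only five variables the two lists are similar in length; on a wide dataset, the shorter list is usually the one to write.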
RENAME
This option tells SAS to change the name of a variable (or variables). It accomplishes the same thing as creating a new variable, setting it equal to an old variable, and dropping the old one, but it is much more efficient. The format is as follows:
RENAME = (oldvariable = newvariable)
You can specify as many variables as you want, as long as each oldvariable = newvariable pair is separated from the next by a space.
Example. Change the name of the variable Gender to Sex and the variable DOB to Date_of_Birth in the sample dataset.
DATA sample2 (RENAME = (Gender = Sex DOB = Date_of_Birth));
   SET sample;
RUN;
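Where you attach RENAME matters for the statements inside the step. A sketch, assuming SASHELP.CLASS as input and the hypothetical names out_side, in_side, gender, and female: when RENAME is an option on the output dataset in the DATA statement, the step's statements still use the old name; when it is an option on the input dataset in the SET statement, the variable is renamed as it is read, so the statements must use the new name.

```sas
/* RENAME on the output dataset: the old name (sex) is used inside */
/* the step; the new name (gender) appears only in the result.      */
DATA out_side (RENAME = (sex = gender));
   SET sashelp.class;
   IF sex = 'F' THEN female = 1;
   ELSE female = 0;
RUN;

/* RENAME on the input dataset: the variable is renamed as it is */
/* read, so the statements refer to the new name (gender).       */
DATA in_side;
   SET sashelp.class (RENAME = (sex = gender));
   IF gender = 'F' THEN female = 1;
   ELSE female = 0;
RUN;
```

Both steps produce the same variables; only the name used inside the step differs.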
FIRSTOBS and OBS
If you only want SAS to process part of your dataset, starting at a certain record number, then use the FIRSTOBS dataset option. The format is as follows:
FIRSTOBS = n
(where n is an integer that corresponds to the observation number (row) where SAS should start)
The FIRSTOBS dataset option is often used in conjunction with the OBS dataset option. The OBS dataset option tells SAS to stop processing at the record number specified in the option. The format is as follows:
OBS = n
(where n is an integer that corresponds to the observation number (row) where SAS should stop processing; this is the number of the last observation processed, not a count of observations)
Using the FIRSTOBS and OBS dataset options together can be a useful way to minimize unnecessary processing.
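A sketch of the two options in a data step rather than a PROC, assuming SASHELP.CLASS as input and a hypothetical output name rows_5_to_10. Because OBS gives the number of the last observation to read, the number of observations read is OBS − FIRSTOBS + 1, here 10 − 5 + 1 = 6.

```sas
/* Read only observations 5 through 10 of SASHELP.CLASS.        */
/* OBS = 10 is the last row read, so 10 - 5 + 1 = 6 rows result. */
DATA rows_5_to_10;
   SET sashelp.class (FIRSTOBS = 5 OBS = 10);
RUN;
```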
Example. The FIRSTOBS and OBS options together are frequently used in a PROC PRINT to view a subset of observations in the dataset. The following code will print observations 20 through 30 from the sample dataset in the output window.
PROC PRINT DATA=sample (FIRSTOBS=20 OBS=30);
RUN;
As you saw in the previous example, dataset options are not limited to the data step. You can specify options any time you refer to a dataset.
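For instance, here is a sketch assuming SASHELP.CLASS as input and a hypothetical output dataset by_age: KEEP trims the dataset that PROC SORT reads, and OBS limits what PROC PRINT displays.

```sas
/* KEEP on the dataset PROC SORT reads: only Name and Age are kept */
PROC SORT DATA = sashelp.class (KEEP = name age) OUT = by_age;
   BY age;
RUN;

/* OBS on the dataset PROC PRINT reads: show the first five rows */
PROC PRINT DATA = by_age (OBS = 5);
RUN;
```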
More Dataset Options
SAS provides many useful dataset options. Dataset options give SAS additional instructions on how to read or write the dataset you name. Some apply only to an output dataset (one that SAS is going to create), some only to an input dataset (one that SAS is going to read, as when a SET statement is used; FIRSTOBS and OBS are examples), and some, such as KEEP, DROP, and RENAME, work on either. All dataset options are explained in the SAS Help and Documentation window as shown to the right.
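A short sketch with one option on each side of a data step, assuming SASHELP.CLASS as input and a hypothetical output name girls_bmi. It also uses WHERE, another dataset option not described above, which filters observations as they are read.

```sas
/* Input dataset option (on SET): WHERE keeps only rows with Sex = 'F' */
/* Output dataset option (on DATA): KEEP writes only Name and bmi      */
DATA girls_bmi (KEEP = name bmi);
   SET sashelp.class (WHERE = (sex = 'F'));
   bmi = (weight / (height * height)) * 703;
RUN;
```

SASHELP.CLASS stores height in inches and weight in pounds, so the 703 factor from the BMI example earlier applies.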