Chapter 1: Overview of SAS System
Basic Concepts of SAS System
The SAS Programming Process
• Define the Business Need
• Create a SAS Program
• Enter the SAS Program Code
• Process the SAS Program Code
• Review the Results
• Debug or Modify
What Is SAS?
SAS is a collection of components that enable you to manage, manipulate, and examine your data.
• Base SAS
• Reporting and Graphics
• Analytical
• Visualization and Discovery
• Data Access and Management
• Business Solutions
• User Interfaces
• Application Development
• Web Enablement
Basic Functionality
With SAS you can access, manage, analyze, and present data.
Types of Files Used with SAS
• SAS program file (for example, Survey.sas): created by users to solve problems. It can be written in the SAS Program Editor, or written in any text editor and then copied and pasted into the SAS Program Editor to be executed.
• SAS data set (for example, survey.sas7bdat): can be opened only by the SAS system.
• Raw data file (for example, Survey.dat): a raw data set in .dat format. For a SAS program to read the text file, the program needs a statement that points to the physical path where the data is stored. This is accomplished by the INFILE statement.
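As a minimal sketch of how the INFILE statement links a DATA step to a raw data file, the program below first writes a small temporary text file and then reads it back. The fileref rawfile, the data set work.survey, the variable names, and the record layout are all hypothetical; the actual layout of Survey.dat is not given here.

```sas
/* Hypothetical setup: write two records to a temporary text file. */
filename rawfile temp;

data _null_;
   file rawfile;
   put '1001 yes';
   put '1002 no';
run;

/* INFILE points the DATA step at the raw file; INPUT describes its layout. */
data work.survey;
   infile rawfile truncover;      /* TRUNCOVER keeps short records on one line */
   input id 1-4 answer $ 6-8;     /* hypothetical column layout */
run;

proc print data=work.survey;
run;
```

To read a file that already exists on disk, the fileref in the INFILE statement would be replaced by the quoted physical path of that file.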
SAS Program Files
• contain SAS program code
• do not contain data values
• can be saved and reused.
How the SAS program works
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).
A SAS program is a sequence of steps that the user submits for execution.
Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report
Components of a SAS Program
A SAS program is a sequence of steps.
There are only two kinds of steps:
• DATA steps
• PROC steps
DATA Step(s)
Typically, DATA steps read data (such as raw data files, SAS data sets, or Excel worksheets) and create SAS data sets.
Data File -> DATA Step -> SAS Data Set (descriptor portion and data portion)
File types for input: .dat, .txt, .sas7bdat, .xls, etc.
File type produced by a DATA step: .sas7bdat
In addition, DATA steps can modify existing variables or create new variables as necessary.
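As a minimal sketch of a DATA step that creates a new variable, the program below reads sashelp.class (a sample data set that ships with SAS, with Height in inches and Weight in pounds) and computes a body mass index for each observation. The output name work.class_bmi and the variable bmi are hypothetical names chosen for illustration.

```sas
/* Read an existing SAS data set and add one computed variable. */
data work.class_bmi;
   set sashelp.class;                     /* input: an existing SAS data set */
   bmi = 703 * weight / (height ** 2);    /* new variable, one value per observation */
run;
```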
PROC Step(s)
PROC steps typically read SAS data sets to create reports and to analyze data.
SAS Data Set -> PROC Step -> Report
There are many different types of PROC steps.
For example: PROC MEANS, PROC PRINT, PROC FREQ, and so on.
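The three procedures named above can be sketched as follows. This assumes the sample data set sashelp.class, which ships with SAS; any other SAS data set could be used in its place.

```sas
/* PROC PRINT: list the first five observations. */
proc print data=sashelp.class (obs=5);
run;

/* PROC MEANS: summary statistics for two numeric variables. */
proc means data=sashelp.class;
   var height weight;
run;

/* PROC FREQ: frequency counts for a character variable. */
proc freq data=sashelp.class;
   tables sex;
run;
```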
Components of a Step
A SAS program is a sequence of steps:
• DATA steps
• PROC steps.
A step is a sequence of one or more statements.
A statement usually starts with a keyword and always ends in a semicolon (;).
KEYWORD . . . ;
For example:
input name $ 1-8 age 11-12;
This INPUT statement can read the following data records:
----+----1----+----2----+
Peterson  21
Morgan    17
Because NAME is a character variable, a $ appears between the variable name and column numbers.
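To see the INPUT statement in context, here is a minimal sketch of a complete program that reads the two records shown above from in-stream data. The data set name work.people is a hypothetical name chosen for illustration.

```sas
/* Column input: NAME is in columns 1-8, AGE is in columns 11-12. */
data work.people;
   input name $ 1-8 age 11-12;
   datalines;
Peterson  21
Morgan    17
;
run;

proc print data=work.people;
run;
```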
Components of a DATA Step
A DATA step starts with a DATA statement and ends with a RUN statement.
data _______________ ;     /* start of the DATA step */
   _______________ ;
   . . .
   _______________ ;
run;                       /* end of the DATA step */
Components of a PROC Step
A PROC step starts with a PROC statement and ends with a RUN statement.
proc _______________ ;     /* start of the PROC step */
   _______________ ;
   . . .
   _______________ ;
run;                       /* end of the PROC step */
Raw Data Files
• are not specific to any software and contain records and fields
• can be created by a variety of software products
• can be read by a variety of software products
• have no special attributes such as field headings, page breaks, or titles
• are not reports.
SAS Data Sets
• are files specific to SAS that contain variables and observations
• can be created only by SAS
• can be read only by SAS
• consist of a descriptor portion and a data portion.
Database Terminology    Data Processing Terminology    SAS Terminology
Table                   File                           SAS Data Set
Row                     Record                         Observation
Column                  Field                          Variable
How is a SAS data set created?
Data entry, external files, and files from other software go through a conversion process to become a SAS data set, which consists of a descriptor portion and a data portion. The conversion is accomplished in the DATA step.
SAS Data Sets
• The descriptor portion contains attribute information about the data in a SAS data set.
General information: SAS data set name, date/time created, number of variables, number of observations.
For each variable: name, type, length, position, label.
• The data portion contains the data values in the form of a rectangular table made up of observations and variables.
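The two portions can be inspected separately. As a minimal sketch, assuming the sample data set sashelp.class that ships with SAS, PROC CONTENTS reports the descriptor portion and PROC PRINT lists the data portion:

```sas
/* Descriptor portion: data set name, creation date, variable attributes. */
proc contents data=sashelp.class;
run;

/* Data portion: the observations and variables themselves. */
proc print data=sashelp.class (obs=5);
run;
```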
Rules for a Valid SAS Data Set Name and a Valid Variable Name
• Can be 1 to 32 characters long
• Must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
• Can continue with any combination of numbers, letters, or underscores
Example:
Policy, pOLIcY, total_bud2010_, and _N_ are valid.
Total-budget (contains a hyphen), 2010_budget (begins with a number), and #num_stud (begins with #) are NOT valid.
Missing Data in a SAS Data Set
• For a numeric variable, a missing value is represented by a period (.).
• For a character variable, a missing value is represented by a blank space.
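A minimal sketch of both kinds of missing value, using in-stream data. The data set name work.missing_demo is hypothetical. The second record has no age and the third has no name, so the printed output should show a period for the missing age and a blank for the missing name.

```sas
/* Column input: blank columns are read as missing values. */
data work.missing_demo;
   input name $ 1-8 age 11-12;
   datalines;
Peterson  21
Morgan
          17
;
run;

proc print data=work.missing_demo;
run;
```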
Variable Attributes
• The length of a variable is the number of bytes used to store it.
• A character variable can be up to 32767 bytes long.
• Numeric variables have a default length of 8: they are stored as floating-point numbers in 8 bytes of storage unless a different length is specified.
• A variable's format controls how its values are displayed when they are written out.
• A variable's informat tells SAS how to read input data into a standard SAS value.
• A variable's label describes the variable in a more descriptive way. It can be up to 256 characters long.
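These attributes can be set with the LENGTH, INFORMAT, FORMAT, and LABEL statements. The sketch below is illustrative only: the data set name work.attrib_demo, the variable admit_date, and the dates in the data lines are hypothetical values made up for the example.

```sas
data work.attrib_demo;
   length name $ 20;                         /* character length in bytes */
   informat admit_date mmddyy10.;            /* how to read the raw date text */
   format admit_date date9. fee dollar8.2;   /* how to display the values */
   label admit_date = 'Date of admission'
         fee        = 'Clinic admission fee';
   input name $ admit_date fee;
   datalines;
Murray 01/15/2010 85.20
Almers 03/02/2010 124.80
;
run;

/* PROC CONTENTS lists the length, format, informat, and label of each variable. */
proc contents data=work.attrib_demo;
run;
```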
SAS Libraries
Every SAS file is stored in a SAS library.
A SAS data set is one type of SAS file.
In some operating environments, a library is a physical collection of files.
In others, such as Windows and Unix environments, a library is a logical name for a group of files that are stored in one physical location (directory) on a storage device.
A library can be temporary or permanent.
A SAS library must be defined before a SAS program can reach the directory to read or write a SAS data set.
A SAS program only needs to know the library reference name (libref).
A library name -> path to the physical location on the hard drive
Reference a SAS File in a SAS Library
A reference to a SAS file has two levels:
LIBREF.Filename
LIBREF is the SAS library name that is connected to a physical directory in a storage location on your computer.
Filename is a file stored in the directory that the libref refers to.
Two Types of SAS Libraries
Temporary SAS data set:
The libref is always WORK, which is already available in the Libraries folder in the Explorer window of the SAS working environment.
Example: WORK.admit is a temporary SAS data set.
NOTE: You can omit 'WORK' and refer to the data set simply as admit if it is stored in the temporary WORK library.
Permanent SAS data set:
The libref is defined by the user.
For example: Mylib.admit refers to the SAS data set admit, which is stored in the library named Mylib.
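The difference between a temporary and a permanent data set can be sketched as follows. The directory 'C:\SASData' is a hypothetical path and must already exist on disk; the three variables and two records are a simplified illustration, not the full Admit data set.

```sas
/* Hypothetical directory: replace with a folder that exists on your machine. */
libname mylib 'C:\SASData';

/* Temporary: a one-level name is stored in WORK, so this creates WORK.admit. */
data admit;
   input id name $ age;
   datalines;
2458 Murray 27
2462 Almers 34
;
run;

/* Permanent: a two-level name stores the data set in the Mylib directory. */
data mylib.admit;
   set work.admit;
run;
```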
Rules for a Valid SAS Library Name (Libref)
A libref:
• is limited to 8 characters
• must start with a letter or underscore
• can contain only letters, numbers, or underscores.
Example:
s575, _s575, s575_, and _s575_ are valid librefs.
S-575 (contains a hyphen) and sta575_online (longer than 8 characters) are not valid.
An Example of Reading a SAS Data Set
The Admit data set contains admission information for patients in a wellness clinic.
Variable   Type   Length   Description
ID         num    8        patient ID number
Name       char   20       patient name
Sex        char   1        sex (F or M)
Age        num    8        age in years
Date       num    8        day of admission
Height     num    8        height in inches
Weight     num    8        weight in pounds
ActLevel   char   4        activity level (LOW, MOD, HIGH)
Fee        num    8        clinic admission fee
Some observations of the data set
ID     Name             Sex   Age   Date   Height   Weight   ActLevel   Fee
2458   Murray, W        M     27    1      72       168      HIGH       85.20
2462   Almers, C        F     34    3      66       172      HIGH       124.80
2501   Bonaventure, T   F     31    17     61       123      LOW        149.75
2523   Johnson, R       F     43    31     63       137      MOD        149.75
2539   LaMance, K       M     51    4      71       158      LOW        124.80
The following SAS program performs these tasks:
• creates a SAS library, Mylib
• reads the SAS data set Admit from the library clinic
• selects the patients with HIGH activity level
• stores the selected patients in the Mylib library under the SAS data set name Admit_high
• prints the observations in the new data set
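/* Assign the libref Mylib to a physical directory. */
libname Mylib 'C:\Math707\SASData';

/* Keep only patients with a HIGH activity level.
   Assumes the libref clinic has already been assigned. */
DATA Mylib.admit_high;
   set clinic.admit;
   if ActLevel='HIGH';
run;

/* Print the observations in the new data set. */
PROC print data=Mylib.admit_high;
run;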