Introduction to SAS

Uploaded by: august-samson-harper

Posted on 12-Jan-2016


Page 1: Introduction to SAS. What is SAS? SAS originally stood for “Statistical Analysis System”. SAS is a computer software system that provides all the tools

Introduction to SAS

Page 2

What is SAS?

• SAS originally stood for “Statistical Analysis System”.
• SAS is a computer software system that provides all the tools needed for data analysis:
– reading data: flexible input techniques
– transformations: a programming language with statistical and mathematical functions
– manipulation: sorting, subsetting, concatenating, and merging
– maintenance: storing, documenting, updating, and editing
– report writing: printing information using pre-written procedures or customized programs
– graphics: charts, plots, maps, and slides
– data reduction and summarization: descriptive statistics
– statistical analysis: from simple crosstabs to complex multivariate techniques

Page 3

What is SAS?

• SAS consists of a data-handling language and a library of procedures that work together as a system. A supervisor program controls the execution of your SAS job.

• The SAS System consists of numerous SAS products. This course focuses on Base SAS software.

Page 4

Components of Base SAS Software

• Base SAS software contains:
– a data management facility
– a programming language
– data analysis and reporting facilities.

• Learning to use these features of Base SAS software prepares you to learn other SAS products, because they all follow the same basic rules.

Page 5

Data Management Facility

• SAS organizes data into a rectangular form called a SAS data set. The example below shows this rectangular form; it describes participants in a 16-week weight program at a health and fitness club.

• In a SAS data set, each column is a variable and each row is an observation. A variable contains the same type of data value (numeric or character) for all observations.

Page 6

How to Build a SAS Data Set

• Using the SAS programming language:
(1) The DATA statement tells SAS to begin building a SAS data set named WEIGHT_CLUB.
(2) The INPUT statement identifies the fields to be read from the input data and names the SAS variables to be created from them.
(3) This is an assignment statement that calculates the weight each person lost and assigns it to a new variable, Loss.
(4) The DATALINES statement indicates that data lines follow.
(5) These data lines contain the raw data. This way of reading raw data is useful when you have only a small amount of data.
(6) The semicolon signals the end of the raw data.
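The program that the numbered callouts refer to is not reproduced in this transcript. A minimal sketch of a DATA step with the same six parts might look like the following. Only the data set name WEIGHT_CLUB and the variable Loss come from the slides; the other variable names (IdNumber, Name, Team, StartWeight, EndWeight) and all data values are hypothetical.

```sas
/* Hypothetical reconstruction: variable names other than Loss,
   and all data values, are made up for illustration */
data weight_club;                                      /* (1) */
   input IdNumber Name $ Team $ StartWeight EndWeight; /* (2) */
   Loss = StartWeight - EndWeight;                     /* (3) */
   /* (4) DATALINES: the data lines (5) follow, and the
      lone semicolon (6) ends them */
   datalines;
1023 Ann red 189 165
1049 Bob yellow 145 124
1219 Cara red 210 192
1246 Dev yellow 194 177
;
run;
```

With list input like this, values are separated by blanks and each character variable gets a default length of 8, which is enough for these short names and team colors.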

Page 7

Programming Language

Rules for SAS Statements
• Most SAS statements begin with an identifying keyword.
• All SAS statements end with a semicolon.
• You can enter SAS statements in lowercase, uppercase, or a mixture.
• SAS statements are free format:
– They can begin anywhere on a line and end anywhere on a line.
– One statement can continue over several lines, as long as you do not split a word over two lines.
– Several statements can be on one line.
– Words in SAS statements are separated by blanks (as many as you want) or by special characters, e.g. “=”.

• Recommended style (conventions, not rules):
– Start each statement on a new line.
– Start DATA and PROC statements in column 1.
– Indent the other statements within the DATA or PROC step to indicate the logical structure of the step.
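The following sketch illustrates these rules with SASHELP.CLASS, a sample data set that ships with SAS; the output data set names TEENS and TEENS2 are arbitrary. Both steps are legal and create the same data set, but only the first follows the recommended style.

```sas
/* Recommended style: one statement per line, DATA statement
   in column 1, statements inside the step indented */
data teens;
   set sashelp.class;
   where Age >= 13;
run;

/* The same step in legal but hard-to-read form: mixed case,
   several statements on one line, and one statement (WHERE)
   continued over two lines */
DATA Teens2; Set SASHELP.Class;
wHeRe
      age   >=   13; RUN;
```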

Page 8

Programming Language

Rules for Most SAS Names
• SAS names are used for data sets, variables, and other items.
– A SAS name can contain from 1 to 32 characters.
– The first character must be a letter or an underscore ( _ ).
– Subsequent characters can be letters, numbers, or underscores.
– Blanks cannot appear in a SAS name.

• A Special Rule for Variable Names
– For variable names only, SAS remembers the combination of uppercase and lowercase letters used when the variable was created. Internally, the case does not matter (‘dog’, ‘DOG’, and ‘Dog’ represent the same variable), but for printing purposes SAS uses the original case of each letter.
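A short sketch of these naming rules follows; the data set name NAME_DEMO and all variable names are hypothetical. The invalid names are kept inside a comment because they would cause errors if used in code.

```sas
/* Valid names: 1-32 characters, starting with a letter or underscore,
   followed by letters, digits, or underscores (all names hypothetical) */
data name_demo;
   _Id = 1;
   Start_Weight = 150;
   /* Case does not matter when a variable is referenced:
      START_WEIGHT below is the same variable as Start_Weight */
   Half = START_WEIGHT / 2;
run;

/* Column headers print in the case used when the variables were
   created (Start_Weight, Half), not the case used in this VAR list */
proc print data=name_demo;
   var start_weight half;
run;

/* Invalid names, shown only inside this comment:
   2nd_weight    starts with a digit
   start weight  contains a blank
   start-weight  contains a hyphen */
```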

Page 9

Data Analysis & Reporting Utilities

• Base SAS includes a library of built-in programs known as SAS procedures. SAS procedures analyze data from SAS data sets and produce preprogrammed reports.

• The SAS program below uses the PRINT procedure to produce a report that displays the values of the variables in the WEIGHT_CLUB data set.
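The program listing for this slide is not included in the transcript. A minimal PROC PRINT sketch follows; because the exact variables of WEIGHT_CLUB are not shown, it uses SASHELP.CLASS (a sample data set shipped with SAS) as a stand-in. To print the weight club data instead, you would change DATA= to WEIGHT_CLUB and list its variables in the VAR statement.

```sas
/* Sketch: list the observations of a data set with PROC PRINT.
   SASHELP.CLASS stands in for WEIGHT_CLUB (an assumption). */
proc print data=sashelp.class;
   var Name Age Height Weight;          /* columns to print, in this order */
   title 'Listing of SASHELP.CLASS';    /* title printed above the report  */
run;
```

If the VAR statement is omitted, PROC PRINT prints every variable in the data set.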

Page 10

Data Analysis and Reporting Utilities

• The following output shows the results:

Page 11

Data Analysis and Reporting Utilities

• To produce a table showing the mean starting weight, ending weight, and weight loss for each team, you can use the TABULATE procedure.
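A sketch of such a TABULATE step follows, again using SASHELP.CLASS as a stand-in for WEIGHT_CLUB: Sex plays the role of the team and Height and Weight play the role of the weight variables. For WEIGHT_CLUB you would name the team variable in CLASS and the starting weight, ending weight, and Loss variables in VAR and TABLE.

```sas
/* Sketch: a two-dimensional table of means with PROC TABULATE.
   Sex stands in for the team variable; Height and Weight stand in
   for the weight variables (assumptions; WEIGHT_CLUB is not shown). */
proc tabulate data=sashelp.class;
   class Sex;                          /* categories that form the rows */
   var Height Weight;                  /* numeric analysis variables    */
   table Sex, mean*(Height Weight);    /* rows: Sex; columns: the means */
   title 'Mean Height and Weight by Sex';
run;
```

In the TABLE statement the comma separates the row dimension from the column dimension, and the asterisk crosses the MEAN statistic with each analysis variable.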

Page 12

The Structure of a SAS Program

• A portion of a SAS program that begins with a PROC (procedure) statement and ends with a RUN statement (or with the next PROC or DATA statement) is called a PROC step. Both PROC steps shown earlier (PRINT and TABULATE) include the following elements:
– A PROC statement, which includes the word PROC, the name of the procedure you want to use, and usually the DATA= option naming the SAS data set that contains the values to be analyzed (if DATA= is omitted, SAS uses the most recently created data set). A DATA step likewise begins with a DATA statement that names the data set being created.

– Additional statements that give SAS more information about what you want to do, for example, the CLASS, VAR, TABLE, and TITLE statements.

– A RUN statement, which indicates that the preceding group of statements is ready to be executed.

Page 13

SAS Processing

• All SAS jobs are a sequence of SAS steps. There are only two kinds of SAS steps:
– DATA steps are usually used to create SAS data sets, but can also be used to produce reports.
– PROC steps analyze or process SAS data sets (generate reports and graphs, edit data, sort data) and, in some cases, create SAS data sets.

Page 14

The DATA Step

• In DATA steps, a powerful programming language gives programmers great flexibility in designing applications.

• DATA step capabilities include:
– Sophisticated record I/O
– Conditional logic
– Iterative DO loops
– Array processing
– Structured programming logic
– A wide range of functions
– Producing customized reports
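The sketch below combines three of these capabilities (array processing, an iterative DO loop, and conditional logic) in one DATA step. It reads SASHELP.CLASS, a sample data set shipped with SAS in which Height is recorded in inches and Weight in pounds; the output data set and new variable names (CLASS_METRIC, HeightCm, WeightKg, AgeGroup) are hypothetical.

```sas
data class_metric;
   set sashelp.class;
   length AgeGroup $ 5;
   /* Array processing: refer to related variables through one name */
   array imperial{2} Height Weight;             /* inches, pounds           */
   array factor{2} _temporary_ (2.54 0.4536);   /* cm per inch; approx. kg per pound */
   array metric{2} HeightCm WeightKg;           /* new numeric variables    */
   /* Iterative DO loop over the array elements */
   do i = 1 to 2;
      metric{i} = round(imperial{i} * factor{i}, 0.1);
   end;
   /* Conditional logic */
   if Age < 13 then AgeGroup = 'Child';
   else AgeGroup = 'Teen';
   drop i;   /* the loop counter is not needed in the output */
run;
```

A _TEMPORARY_ array holds constants for the duration of the step without adding variables to the output data set.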

Page 15

The PROC Step

• In PROC steps, a large library of prewritten procedures enables end users to produce reports easily.

• You can use PROC steps in Base SAS software for:
– List and tabular reports
– Graphics
– Statistical analysis
– Data management
– Ad hoc queries
– Accessing other software files
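For instance, a data management procedure (SORT) and a statistical procedure (MEANS) can be chained as in the sketch below. SASHELP.CLASS is a sample data set shipped with SAS, and CLASS_SORTED is a hypothetical output name; OUT= writes the sorted copy to the WORK library instead of modifying the sample data.

```sas
/* Data management: sort a copy of the data by Sex */
proc sort data=sashelp.class out=class_sorted;
   by Sex;
run;

/* Statistical analysis: summary statistics for each BY group */
proc means data=class_sorted n mean min max;
   by Sex;
   var Height Weight;
run;
```

BY-group processing requires the data to be sorted by the BY variable, which is why the SORT step runs first.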

Page 16

The SAS Data Set

• Data must be in the form of a SAS data set to be processed by most SAS procedures and by some DATA step statements (such as SET and MERGE).

• SAS data sets consist of a descriptor portion that contains information about the data and a data portion that contains the data values.

• The data values in the data portion are arranged in a rectangular table.

Page 17

SAS Variable

Rules for variable names:
• Can be 1-32 characters in length.
• Start with a letter (A-Z) or the underscore character ( _ ).
• Continue with letters, numbers, and underscores.
• Recommendation: Choose names that describe the fields.

There are two kinds of variables.
• Character variables: Values are stored using ASCII representation (on Windows and UNIX) and can be from 1 to 32,767 characters in length.
• Numeric variables: Values are stored using floating-point representation and can be 3-8 bytes long on Windows and UNIX (typically 8, the default).
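A LENGTH statement sets these lengths explicitly. In this sketch the data set LENGTHS_DEMO and its variables are hypothetical; PROC CONTENTS is used to display the stored lengths.

```sas
data lengths_demo;
   length Comment $ 200   /* character variable: 200 bytes (1-32,767 allowed) */
          Count 4;        /* numeric variable: 4 bytes instead of the default 8 */
   Comment = 'Stored in a 200-byte variable';
   Count = 12;            /* an integer, so the shorter length is safe here */
   Ratio = 1/3;           /* no LENGTH given: numeric defaults to 8 bytes */
run;

proc contents data=lengths_demo;   /* the Len column reports each length */
run;
```

Shorter numeric lengths can lose precision for non-integer values such as Ratio, which is why 8 bytes is the typical choice.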

Page 18

SAS Variable

• Any number of variables can be stored in a SAS data set in SAS 9 (limited only by the computer’s capacity).

• The rows in a data set are called observations (or records). There is no limit to the number of observations.

Page 19

Missing Values

• Most collections of data contain missing values. Because a SAS data set is rectangular, every observation must have a value for every variable, so a missing value still has to be represented explicitly.

• In SAS data sets, missing values are represented by:
– a period ( . ) for a numeric variable
– a blank (" ") for a character variable

• Missing numeric values are not zero. In an arithmetic expression, a missing operand makes the result missing; statistical functions (such as SUM and MEAN) and procedures exclude missing values from their computations.
– Each SAS PROC checks variables for missing values and takes appropriate action.
– See the individual PROC descriptions in the User's Guide for details.
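The following sketch shows how missing values behave in a DATA step. The data set MISSING_DEMO and its values are hypothetical. Note the difference between the + operator, which propagates missing values, and the SUM function, which ignores them.

```sas
data missing_demo;
   input Id Score1 Score2 Grade $;
   Total    = Score1 + Score2;       /* missing if any operand is missing */
   TotalSum = sum(Score1, Score2);   /* SUM ignores missing arguments     */
   /* A missing numeric value is smaller than any number, so a test
      such as Score2 < 50 would also be true for missing values;
      test for missing values explicitly instead */
   if missing(Score2) then Flag = 'M';
   if missing(Grade) then Grade = 'N/A';
   datalines;
1 80 90 A
2 75 .  B
3 .  .  .
;
run;
```

With list input, a single period in the data lines is read as a missing value for both numeric and character variables. When it computes Total for observations 2 and 3, SAS also writes a NOTE to the log saying that missing values were generated as a result of performing an operation on missing values.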

Page 20

Documenting SAS Data Sets

• In addition to the data values, a SAS data set contains descriptors about the data set as a whole and descriptors with the names and attributes of the variables:
– The name of the data set and its member type
– The date and time the data set was created
– The number of observations
– The number of variables
– The engine type
– The attribute information: each variable’s name, type, length, position, format for printing, informat for input, and label.

Page 21

Documenting SAS Data Sets

• Below is a partial listing of the descriptor portion of a SAS data set.

PROC CONTENTS DATA=sc.class;
RUN;

Page 22

SAS Data Libraries

• A SAS data library is a collection of SAS files recognized as a unit by the SAS System. In directory-based operating systems, such as Windows or UNIX, a SAS data library is a collection of SAS files of the same engine type stored in a specific directory.

• Every SAS file has a two-level name. The first level determines whether the file is temporary or permanent.

• The general form of a SAS filename is libref.SAS-filename, where:
– libref is a name specified in a LIBNAME statement that is associated with a directory

Page 23

SAS Data Libraries

• SAS-filename refers to a specific SAS file in the library

• If you do not specify a libref (first-level name):
– The default libref is WORK.
– The data set is temporary.

• The LIBNAME statement is used to associate a libref with a directory containing SAS data files. Once defined, a libref can be used repeatedly throughout a program.

• You can think of librefs as temporary nicknames that you use to identify SAS data libraries during a SAS session.

Page 24

LIBNAME statement

• LIBNAME libref <engine-name> 'SAS-data-library' <options>;
– libref: any valid SAS name (but only up to 8 characters long)
– SAS-data-library: a directory
– engine-name: an optional parameter specifying one of the library engines supported by a given operating system, for example:
• V8 accesses Version 8 or 9 SAS data sets
• V6 accesses Version 6.10, 6.11, and 6.12 SAS data sets
• XPORT accesses transport format files

LIBNAME classlib 'C:\SOCI6200\SASDATA';
PROC PRINT DATA=classlib.class;
RUN;

Page 25

Temporary/Permanent SAS Libraries

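• You can store SAS data sets in a temporary SAS data library by omitting the libref or by using the libref WORK (a libref that SAS always assigns for you). For example, both DATA one; and DATA work.one; create the temporary data set WORK.ONE, which is deleted at the end of the SAS session.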

• You can permanently store SAS data sets by using a libref other than WORK. The directory where you want to store your data sets must exist. For example:

LIBNAME soci 'Y:\SOCI6200';
DATA soci.one;
   INFILE xyz;
   INPUT a b c;
RUN;

Page 26

SAS Files

• The individual files in the library are considered members of the library. Member types include DATA, VIEW, CATALOG, ACCESS, and PROGRAM.

• SAS data sets can have one of two member types, DATA or VIEW, depending on the kind of information they contain.

Page 27

Comments in SAS Code

• There are two ways to insert comments in SAS: * message ; and /* message */

• Comments can be used anywhere in a SAS program for documentation purposes.

• SAS ignores comments during processing.
– * message ; must be written as a separate statement and cannot contain internal semicolons.

– /* message */ can be written within statements or anywhere a blank can appear. These comments can contain semicolons.
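Both comment styles appear in the small sketch below; COMMENT_DEMO is an arbitrary, hypothetical data set name.

```sas
* Statement-style comment: it ends at the first semicolon;
/* Block comment: it may contain semicolons; and it may
   span several lines */
data comment_demo;
   x = 1;                        /* a block comment after a statement */
   y = /* between two tokens */ 2;
run;
```

Because a statement-style comment ends at the first semicolon, the block style is the safer choice for commenting out code that itself contains semicolons.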

Page 28

Outcomes of executing a SAS program

• A SAS program consists of a series of DATA steps and PROC steps. When you execute a SAS program, the output generated by SAS is divided into two parts: SAS Log and SAS Output.

• SAS Log
– Contains information about the processing of the SAS program.
– Prints the statements you entered.
– Prints error and warning messages.
– Prints NOTEs relating to each step:
• For each DATA step, documents the creation of the data set.
• For each PROC step, indicates the page numbers of the output and how much time the procedure spent operating.

• SAS Output contains the results of the PROC steps.

Page 29

Starting and Running SAS Programs

• There are three modes of execution (environments) you can use to run SAS programs:
– interactive windowing environment
– interactive line mode
– noninteractive or batch mode

• We will only discuss the interactive windowing environment and the noninteractive (batch) mode in this course. These are the two most common modes of execution.

Page 30

SAS for Windows

• Editor window: lets you write SAS programs and submit them for execution.

• Results window: shows a table of contents for the SAS Output window.

• Explorer window: shows SAS files and libraries in a Windows Explorer-like display.

• Output window: displays the results of SAS executions.

• Log window: displays the notes from the SAS session and reports any errors and warnings after you submit your SAS programs.