Post on 31-Dec-2015


Quantify the Example Data

First, code and quantify the data (assign column locations & variable names)

Use the sample data to create a data set from the first 10 counties

Include: ID, County, Number of reporting Units (v1), Number of employees (v2), Payroll (v3)

Save to your flash drive as ‘countydata’

The SAS® System

Statistical Analysis Programming

Introduction to SAS®

Arguably the most popular computer software for conducting statistical data analysis

Does both data management & statistical analysis

Useful for managing even the most complex data sets

Operates on its own language

Introduction to SAS®

Open the SAS® Window

You essentially have 4 windows within SAS:

The Explorer Sidebar Window
The Log Window
The Editor Window
The Output Window

You can resize and reconfigure these windows, and minimize & maximize them as you would in any Windows-based program

The Editor Window is for constructing & running programs

“Programming” in SAS involves writing out step-by-step instructions, in the correct order, in a format the SAS System can understand

The program you write must be syntactically exact; when it is not, SAS reports error messages in the Log Window

SAS® Programming

Three major components to most SAS programs: Input, Manipulation, Output

SAS® Programming

Input

Most of the time data are placed into a data file and inputted into the program

The program tells the system which variables are located in which columns

SAS® Programming: Input

[Screenshot callout: input data & column locations]

SAS® Programming

Manipulation

Data are then manipulated to accomplish the tasks for which the program was written: transforming or combining variables or conducting statistical or other analyses

SAS® Programming: Manipulation

[Screenshot callout: manipulate the data]

SAS® Programming

Output

Program Output

The results of the program are then written to the Output Window
You must save these results

Log

SAS® Programming: Output

SAS® Programming: Log

SAS® Programming

Basic Input Statement = “DATA Step”

Begins with an “options” statement that formats what the output page will look like
Names the temporary data set: “data1,” “data2,” etc., or a text name (up to 32 characters in current SAS versions, with no spaces)
Tells SAS where to find your actual data set: the file location
Gives the “input”: the column locations for your variables

SAS® Programming: Input

[Screenshot callouts: Options; Temporary Data Set; Data Location; Input Column Locations]

SAS® Programming

Basic Input Statement

After your input statement, you add statements to transform or manipulate the data
Add statements to perform analysis procedures
Ends with a RUN statement

SAS® Programming: Input

[Screenshot callouts: Data Manipulations & Transformations; Analysis Procedure]

SAS® Programming: Syntax

SAS Statements

Commands or instructions that can be interpreted by the SAS system
These commands appear as blue text in the Enhanced Editor window
Examples: DATA, PROC, PUT, INPUT, RUN, etc.

SAS® Programming: Syntax

Every SAS statement must end in a semicolon;
This is how the system knows the statement is complete
One of the most common errors is omitting semicolons

Comments begin with an asterisk * and, like any other statement, end with a semicolon

SAS® Programming: Syntax

In the Enhanced Editor:
Plain text is black
Numerical values are teal
SAS Statements are blue
Errors are red

Basic arithmetic operators can be used (+, -, *, /)

SAS® Programming: Logical Operators

Symbol      Abbreviation   Operation
=           eq             equal to
^= or ~=    ne             not equal to
>           gt             greater than
<           lt             less than
>= or =>    ge             greater than or equal to
<= or =<    le             less than or equal to
&           and            and
|           or             or

Building a SAS® Program

1. Open the SAS Program and Click inside the Editor Window

2. Add your “options” statement: options nocenter nonumber nodate linesize=88 pagesize=72;

3. Add the “data” statement, then the name of your first temporary data file (data1)

4. Add the “infile” statement, then the file location where your data is stored

5. Add the “input” statement, then each variable name followed by its numeric location

A dollar sign $ after a variable name signifies that the variable is character (text) data

We recommend entering data in lines of no more than 80 columns; when a case continues onto a second line, #2 in the input statement tells SAS to move to line 2 of that case
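Steps 2 through 5 might come together as in the sketch below. The file path and the column ranges are assumptions made for illustration (the slides do not give them); substitute the location of your own countydata file and the columns you assigned when coding the data.

```sas
* Page layout for the output, as in step 2;
options nocenter nonumber nodate linesize=88 pagesize=72;

* Step 3, name the temporary data set;
data data1;
  * Step 4, hypothetical path, point this at your own countydata file;
  infile 'E:\countydata.txt';
  * Step 5, hypothetical column locations, adjust to match your file;
  * The dollar sign marks county as character data;
  input id 1-2 county $ 4-15 v1 17-20 v2 22-27 v3 29-36;
run;
```

After running it, check the Log Window: it reports how many records were read and how many observations and variables data1 contains.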

6. Add statements for data management or statistical analysis.

SAS Statements vary based on the task to be accomplished

Data management: create new variables, change values, etc.

Statistical procedures: frequencies, correlations, crosstabulations, regression, etc.

Building a SAS® Program

Hands-On Exercise 1: Build a Basic SAS Program

Using SAS, write a basic program for the county data set you created

For your analysis, run a “print” command: Proc Print; var county v1 v2 v3;

Exercise 1

SAS® Procedures

PROC Commands

SAS procedures that perform different operations use “PROC” commands
There are a lot of different PROC commands; we’ll touch on a few of the most used
Some for data management
Some for statistical analysis

SAS® Procedures

PROC PRINT

Prints the data you have in your temporary SAS data set
Will print the variables you designate (either those from your initial INPUT statement, or variables you create)
Helps you better understand your data set; helps you spot errors

SAS® Procedures

Proc Print; var v1 v2 v3;

This statement tells SAS to print the data / information for v1, v2, and v3

If you run “PROC PRINT” without any variables designated, it will print ALL of your variables

SAS® Procedures

PROC PRINT

You should run a PROC PRINT when you transform variables or create new variables to ensure that the transformations were done correctly

Example
Create a new variable by adding two others: newvar = v1+v2;
Proc print; var v1 v2 newvar;
Check the output to ensure that the operation is correct
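A minimal sketch of that check, assuming the temporary data set data1 (with v1 and v2) already exists from the earlier DATA step. The data set name data2 is a choice made here for illustration. Note that the assignment has to sit inside a DATA step; the PROC PRINT then runs on the result.

```sas
* Assumes data1 was created by an earlier DATA step;
data data2;
  set data1;
  * New variable is the sum of two existing ones;
  newvar = v1 + v2;
run;

* Print the inputs next to the result so the sum can be checked by eye;
proc print data=data2;
  var v1 v2 newvar;
run;
```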

Variable Manipulations

SAS will permit you to perform many different types of variable manipulations

Add Variables: newvar1 = v1+v2+v3;

Subtract Variables: newvar2 = v3 - v2;

Multiply Variables: newvar3 = v2 * v3;

Divide Variables: newvar4 = v2/v1;

More complex transformations can be done following the basic rules for arithmetic operations: newvar5 = (v1+v2/v3)*4;

You can also use your new variables in other transformations: newvar6 = newvar4*newvar5;

Create categorical variables
You can reformat your data into new variables
If you have a survey question with responses showing ‘year of birth’ you can convert it to ‘age’
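One way the year-of-birth conversion could look is sketched below. The data set survey and the variable birthyear are hypothetical names, not part of the county example, and the result is only approximate because it ignores whether the birthday has passed this year.

```sas
* survey and birthyear are hypothetical names used for illustration;
data survey2;
  set survey;
  * today returns the current date and year extracts its four-digit year;
  age = year(today()) - birthyear;
run;

proc print data=survey2;
  var birthyear age;
run;
```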

Variable Manipulations

For example, if you have a series of data for a variable:
Variable name: “vexample”
Values: 1 2 3 4 5 6 7 8 9 10

We want to create a categorical variable with the following categories and corresponding values:
Low = 1
Medium = 2
High = 3

Give your new variable a name like “newvexample” or “vexamplecat”
Your new categorical variable would be created with this if/then syntax:
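The slide image with the syntax did not survive, so the following is a sketch rather than the original code. The cut points (1 to 3 low, 4 to 7 medium, 8 to 10 high) are an assumption, since the slides do not state where the categories divide.

```sas
data example;
  * The double trailing at sign reads several values from one data line;
  input vexample @@;
  * Assumed cut points, 1 to 3 low, 4 to 7 medium, 8 to 10 high;
  if vexample le 3 then vexamplecat = 1;
  else if vexample le 7 then vexamplecat = 2;
  else vexamplecat = 3;
  datalines;
1 2 3 4 5 6 7 8 9 10
;
run;

proc print data=example;
run;
```

Using else if means each case is tested only until a condition matches, so every value lands in exactly one category.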

Variable Manipulations

If your data is not as simple as 1 2 3 4 5 6 and so on, you can use the “PROC SORT” command to help you sort your data set

Variable Manipulations

Run a PROC SORT for v2, and then run a PROC PRINT to show the variable rearranged in ascending order
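A sketch of that sort-then-print sequence, assuming data1 already exists. PROC SORT orders by the BY variable in ascending order unless told otherwise.

```sas
* Assumes data1 was created by an earlier DATA step;
proc sort data=data1;
  by v2;
run;

* Print after sorting to see v2 in ascending order;
proc print data=data1;
  var county v2;
run;
```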

Variable Manipulations

Now, create a new variable “newv2” with the following categories:

Low = 1 (values less than 100)
Medium = 2 (values 100 to 500)
High = 3 (values more than 500)

Run a PROC PRINT and PROC FREQ to check your transformations
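One possible way to work this exercise is sketched below, assuming data1 exists; the output data set name data2 is a choice made here. The guard for missing values is an addition not mentioned in the slides: SAS treats a missing numeric value as smaller than any number, so without the guard a missing v2 would be coded as low.

```sas
data data2;
  set data1;
  * Keep missing values missing instead of coding them as low;
  if missing(v2) then newv2 = .;
  else if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
run;

* Compare each original value with its category;
proc print data=data2;
  var county v2 newv2;
run;

* Count how many counties fall in each category;
proc freq data=data2;
  tables newv2;
run;
```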

Variable Manipulations

IF/THEN Statements

In the previous exercise, you saw how if / then statements can be used to create new variables

If / then statements are very powerful and can be used in a number of ways to help you manage your data

IF/THEN Statements

Segmenting Data Sets – the IF statement

Simple IF Statements

The SAS “IF” command can be used to segment or partition your data set
For example, suppose you only want to examine certain cases in your data set: only females, only people over age 55, only Florida counties with populations greater than 500,000, etc.

You can segment in this way, using the IF statement:
If we only want to examine the number of reporting units in our sample for counties with a “low” number of employees:
“If newv2 is low” looks like this in SAS language: IF newv2=1;

IF/THEN Statements

Combining IF statements to segment data sets with the DATA command

It is very useful to combine the IF command to segment data with the DATA command we learned earlier

Recall that your initial data step started with the command:
data data1;
This created the initial temporary SAS data set

The temporary data set “data1” contained all of the cases that you entered into your data set

If you now want to examine only a subset of those cases, you can do that in a second data set:
data data2; set data1;
This creates a second temporary data set called “data2” (remember SAS allows a large number of data sets; a data set name cannot contain a space)

IF/THEN Statements

We can now use an IF statement to segment the data in our set “data2”

Let’s create a second data set that includes only counties with a “medium” number of employees

Run a PROC PRINT to check the output
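A sketch of that subsetting step, assuming newv2 was created in data1 with 2 meaning “medium”. An IF statement with no THEN keeps only the cases for which the condition is true.

```sas
* Assumes data1 already contains newv2, coded 1, 2 or 3;
data data2;
  set data1;
  * Subsetting IF, only cases with a medium number of employees are kept;
  if newv2 = 2;
run;

proc print data=data2;
run;
```

Because the subset is written to a second data set, data1 still holds all of the original cases.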

IF/THEN Statements

The PROC PRINT shows us that the temporary data set we’re now dealing with has only the 5 counties with a “medium” number of employees

IF/THEN Statements

Hands-On Exercise

Use the commands we’ve just learned to:

1. Create a new variable for high, medium, and low payroll amounts (newv3)

2. Use the DATA and IF statements to create a new data set that contains only those counties with the highest payroll for gasoline service stations, then run a PROC PRINT to check your results

IF/THEN Statements

The IF and THEN commands are most often used together with the operators we talked about before

More Complex IF statements

Multiple conditions can be connected using “and” or “or” to make more complex statements:

if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1;

SAS evaluates “and” before “or”, so use parentheses whenever the intended grouping could be in doubt

Using IF and THEN statements:
The general form of this command (for creating new variables, separating data sets, etc.) is:
IF variable meets a condition (comparison abbreviation: eq, ne, gt, lt, ge, le) THEN new variable = value (numeric)
IF v2 eq 5 then newv2 = 1;
Again, you can combine conditions for more complex statements
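The sketch below illustrates how grouping changes a combined condition. It assumes data1 exists with v1, v2 and v3; the variable names flag_a and flag_b are hypothetical and chosen only for this illustration.

```sas
data data2;
  set data1;
  * Without parentheses AND is evaluated before OR, which is the grouping written out here;
  if v1 eq 2 or (v2 gt 5 and v3 ne 2) then flag_a = 1;
  * Parentheses force the OR to be evaluated first instead;
  if (v1 eq 2 or v2 gt 5) and v3 ne 2 then flag_b = 1;
run;

* Cases where the two flags differ show the effect of the grouping;
proc print data=data2;
  var v1 v2 v3 flag_a flag_b;
run;
```

Cases that meet neither condition are left with a missing value for the flag, because no ELSE branch assigns one.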

IF/THEN Statements

Add Variables & Cases

Two other important data management functions that SAS can perform are adding additional cases or observations and adding new variables

Add Variables & Cases

Adding Cases

The term for adding cases or observations is “concatenation”
This allows you to add new cases to the bottom of your existing data set
You simply create a second data set and add it to your initial data set
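Concatenation could be written as in the sketch below. It assumes data1 holds the initial cases and data2 holds the additional cases, both read in with the same variable names; data3 is the combined set.

```sas
* Assumes data1 and data2 exist and share the same variable names;
data data3;
  * Listing two data sets on SET stacks the second below the first;
  set data1 data2;
run;

proc print data=data3;
run;
```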

Add Variables & Cases

[Diagram labels: Initial Data Set; Additional Cases; Merged Set]

Add Variables & Cases

Hands-On Exercise

You have already created one data set of 10 counties

1. Create a new data set containing information for the next 4 counties (Collier, Columbia, De Soto, and Dixie)

2. Add these cases to your existing data set

3. Do a PROC PRINT for data3 to verify

Exercise

Add Variables & Cases

Adding Variables

Adding variables to your existing data is simple as well
Again, you will need to create a second data set that will essentially add a column or columns to your initial data set
The second data set will contain the new variable you are adding and one variable that matches exactly a variable in your initial data set, usually the sequential ID number (similar to Access)

Add Variables & Cases

To make sure that the data sets are properly combined, you must SORT the initial and second data set by the matching variable

The syntax looks like this:
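The slide image with the syntax is missing, so this is a sketch under assumptions: data1 is the initial data set, data2 is a second data set holding the new variable, and both contain the matching variable id.

```sas
* Sort both data sets by the matching variable first;
proc sort data=data1;
  by id;
run;

proc sort data=data2;
  by id;
run;

* MERGE with BY joins the rows that share the same id;
data data3;
  merge data1 data2;
  by id;
run;

proc print data=data3;
run;
```

If the BY statement is left out, SAS pairs rows by position rather than by id, which can silently mismatch cases.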

Add Variables & Cases

[Diagram labels: Initial Data Set; Added Variables; Merge]

SAS® Statistical Procedures

Descriptive Procedure for Continuous Data

PROC UNIVARIATE;
Proc Univariate will provide basic descriptive information for continuous variables
The syntax looks like this:
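In place of the missing slide image, a minimal sketch that assumes data1 with the continuous variables v1, v2 and v3:

```sas
* Assumes data1 was created by an earlier DATA step;
proc univariate data=data1;
  var v1 v2 v3;
run;
```

For each variable listed, the output includes measures such as the mean, standard deviation, median and quantiles.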

SAS® Statistical Procedures

Descriptive Procedure for Categorical Data

PROC FREQ;
Proc Freq will provide basic descriptive information for categorical or ordinal variables
The syntax looks like this:
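A minimal sketch, assuming a data set data2 that contains the categorical variable newv2 created earlier:

```sas
* Assumes data2 contains the categorical variable newv2;
proc freq data=data2;
  tables newv2;
run;
```

The resulting table lists the frequency, percent, cumulative frequency and cumulative percent for each category.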

SAS® Statistical Procedures

SAS® Statistical Procedures

Analytical Procedures for Continuous Data

PROC CORR;
Proc Corr provides an analysis of the association between two continuous variables
Computes a correlation coefficient that demonstrates the level of association, as well as a p-value showing the significance of that association
The syntax looks like this:
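A minimal sketch, assuming data1 and using v2 (employees) and v3 (payroll) as the pair of continuous variables; the choice of variables is illustrative.

```sas
* Assumes data1 was created by an earlier DATA step;
proc corr data=data1;
  var v2 v3;
run;
```

By default the output reports Pearson correlation coefficients with a p-value beneath each one.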

SAS® Statistical Procedures

[Output callouts: Correlation coefficient; p-value]

SAS® Statistical Procedures

Analytical Procedures for Categorical Data

PROC FREQ;
Proc Freq can also be used to calculate the level of association between two categorical or nominal variables
A chi-square (χ²) test can be added to assess the significance level of that association
The syntax looks like this (slide callouts: DV, IV):
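A sketch of a crosstabulation with a chi-square test. It assumes data2 contains the categorical variables newv3 and newv2, and it places the dependent variable first only because the slide callouts appear in that order; that ordering is an assumption.

```sas
* Assumes data2 contains the categorical variables newv3 and newv2;
proc freq data=data2;
  * The first variable forms the rows and the second forms the columns;
  tables newv3*newv2 / chisq;
run;
```

With a sample as small as the county example, SAS is likely to warn that expected cell counts are too low for the chi-square test to be reliable.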

SAS® Statistical Procedures

[Output callouts: Crosstab table; Chi-square analysis]

SAS® Statistical Procedures

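PROC FREQ can also be used with the DEVIATION option on the TABLES statement, which shows how far each cell frequency in a crosstab deviates from its expected frequency (for a standard deviation, use a procedure such as PROC UNIVARIATE)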

Many SAS procedures like this have additional analyses that can be added in this way

SAS® Statistical Procedures

Multivariate Analysis:

PROC REG; computes the association between a continuous dependent variable and numerous independent variables

PROC LOGISTIC; computes the association between a categorical dependent variable and numerous independent variables

SAS® Statistical Procedures

Regression analysis: PROC REG;

Uses the “model” command
Construct your model with your dependent variable first, then your independent variables
The syntax looks like this:
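A minimal sketch, assuming data1 and treating payroll (v3) as the dependent variable with v1 and v2 as independent variables; the choice of model is illustrative only.

```sas
* Assumes data1 was created by an earlier DATA step;
proc reg data=data1;
  * Dependent variable on the left of the equals sign, independent variables on the right;
  model v3 = v1 v2;
run;
quit;
```

PROC REG stays active after RUN so that further statements can be submitted; QUIT ends the procedure.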

SAS® Statistical Procedures

These are only a few examples of the analyses you can do with SAS

SAS can also do:
Time series analysis
Factor analysis
ANOVA
T-tests
…and more!
