Quantify the Example Data First, code and quantify the data (assign column locations & variable names) Use the sample data to create a data set from the first 10 counties Include: ID, County, Number of reporting Units (v1), Number of employees (v2), Payroll (v3) Save to your flash drive as ‘countydata’

Upload: blaze-job-briggs

Post on 31-Dec-2015

Quantify the Example Data

First, code and quantify the data (assign column locations & variable names)

Use the sample data to create a data set from the first 10 counties

Include: ID, County, Number of reporting Units (v1), Number of employees (v2), Payroll (v3)

Save to your flash drive as ‘countydata’

The SAS® System

Statistical Analysis Programming

Introduction to SAS®

Arguably the most popular computer software for conducting statistical data analysis

Does both data management & statistical analysis

Useful for managing even the most complex data sets

Uses its own programming language

Introduction to SAS®

Open the SAS® Window

Introduction to SAS®

You essentially have 4 windows within SAS:
The Explorer Sidebar Window
The Log Window
The Editor Window
The Output Window

You can resize and reconfigure these windows, and minimize & maximize as you would in any windows-based program

Introduction to SAS®

The Editor Window is for constructing & running programs

“Programming” in SAS involves writing out step-by-step instructions, in the correct order, in a format the SAS System can understand

The program you write must be syntactically exact

When it is not, SAS reports error messages in the Log Window

SAS® Programming

Three major components to most SAS programs:
Input
Manipulation
Output

SAS® Programming

Input

Most of the time, data are placed in a data file and read into the program

The program tells the system which variables are located in which columns

SAS® Programming: Input

Input data & column locations

SAS® Programming

Manipulation

Data are then manipulated to accomplish the tasks for which the program was written: transforming or combining variables or conducting statistical or other analyses

SAS® Programming: Manipulation

Manipulate the data

SAS® Programming

Output

Program Output

The results of the program are then written to the Output Window

You must save these results

Log

SAS® Programming: Output

SAS® Programming: Log

SAS® Programming

Basic Input Statement = “DATA Step”

Begins with an “options” statement that formats what the output page will look like

Names the temporary data set: “data1,” “data2,” etc. or a text name (8 characters max in older SAS releases; current releases allow up to 32)

Tells SAS where to find your actual data set: the file location

Gives the “Input”, that is, the column locations for your variables

SAS® Programming: Input

Options

Temporary Data Set

Data Location

Input Column Locations

SAS® Programming

Basic Input Statement

After your input statement, you add statements to transform or manipulate the data

Add statements to perform analysis procedures

Ends with a RUN statement

SAS® Programming: Input

Data Manipulations & Transformations

Analysis Procedure

SAS® Programming: Syntax

SAS Statements

Commands or instructions that can be interpreted by the SAS system

These commands appear as blue text in the Enhanced Editor window

DATA, PROC, PUT, INPUT, RUN, etc.

SAS® Programming: Syntax

Every SAS statement must end in a semicolon;

This is how the system knows the statement is complete

One of the most common errors is omitting semicolons

Comments begin with an asterisk * and, like every other statement, end with a semicolon

SAS® Programming: Syntax

In the Enhanced Editor:
Plain text is black
Numerical values are teal
SAS Statements are blue
Errors are red

Basic arithmetic functions can be used (+, -, *, /)

SAS® Programming: Logical Operators

Symbol      Abbreviation   Operation
=           eq             equal to
^= or ~=    ne             not equal to
>           gt             greater than
<           lt             less than
>= or =>    ge             greater than or equal to
<= or =<    le             less than or equal to
&           and            and
|           or             or

Building a SAS® Program

1. Open the SAS Program and Click inside the Editor Window

2. Add your “options” statements:

options nocenter nonumber nodate linesize=88 pagesize=72;

3. Add the “data” statement, then the name of your first temporary data file (data1)

Building a SAS® Program

4. Add the “infile” statement, then the file location where your data is stored

5. Add the “input” statement, then each variable name followed by its column location

A dollar sign $ after a variable name signifies that the variable is character (text) data

We recommend keeping each data line to 80 columns; in the input statement, #2 tells SAS to move to the second line of the record
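Steps 2 through 5 might come together as in the sketch below. The file path and the column locations are assumptions made for illustration only; substitute the location of your own countydata file and the columns you assigned when you coded the data.

```sas
* Step 2 - page layout options for the output;
options nocenter nonumber nodate linesize=88 pagesize=72;

* Step 3 - name the temporary data set;
data data1;
  * Step 4 - hypothetical file location, replace with your own;
  infile 'E:\countydata.txt';
  * Step 5 - variable names with assumed column locations;
  * The dollar sign marks county as character data;
  input id 1-2 county $ 4-18 v1 20-24 v2 26-31 v3 33-40;
run;
```

After running it, check the Log Window: it reports how many records were read and flags any line that did not match the column layout.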

Building a SAS® Program

6. Add statements for data management or statistical analysis.

SAS Statements vary based on the task to be accomplished

Data management: create new variables, change values, etc.

Statistical procedures: frequencies, correlations, crosstabulations, regression, etc.

Building a SAS® Program

Hands-On Exercise 1: Build a Basic SAS Program

Using SAS, write a basic program for the county data set you created

For your analysis, run a “print” command: Proc Print; var county v1 v2 v3;

Exercise 1

SAS® Procedures

PROC Commands

SAS procedures that perform different operations use “PROC” commands

There are a lot of different PROC commands; we’ll touch on a few of the most used

Some for data management

Some for statistical analysis

SAS® Procedures

PROC PRINT

Prints the data you have in your temporary SAS data set

Will print the variables you designate (either those from your initial INPUT statement, or variables you create)

Helps you better understand your data set; helps you spot errors

SAS® Procedures

Proc Print; var v1 v2 v3;

This statement tells SAS to print the data / information for v1, v2, and v3

If you run “PROC PRINT” without any variables designated, it will print ALL of your variables
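As a minimal sketch, assuming the temporary data set data1 from the DATA step already exists, the two forms of PROC PRINT look like this. The data= option and the closing run; statements are additions here; the slides use the shorter form, which prints the most recently created data set.

```sas
* Print only the designated variables;
proc print data=data1;
  var v1 v2 v3;
run;

* No VAR statement, so every variable in data1 is printed;
proc print data=data1;
run;
```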

SAS® Procedures

PROC PRINT

You should run a proc print when you transform variables or create new variables to ensure that the transformations were done correctly

Example

Create a new variable by adding two others:

newvar = v1+v2;

Proc print; var v1 v2 newvar;

Check the output to ensure that the operation is correct

Variable Manipulations

SAS will permit you to perform many different types of variable manipulations

Add Variables

newvar1 = v1+v2+v3;

Subtract Variables

newvar2 = v3 - v2;

Variable Manipulations

Multiply Variables

newvar3 = v2 * v3;

Divide Variables

newvar4 = v2/v1;

More complex transformations can be done following basic rules for arithmetic operations

newvar5 = (v1+v2/v3)*4;
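Assignment statements like these belong inside a DATA step. The sketch below assumes data1 already exists with numeric v1, v2, and v3, and writes the results to a second data set so the original is left untouched; the name data2 is chosen here for illustration.

```sas
data data2;
  set data1;
  newvar1 = v1 + v2 + v3;
  newvar2 = v3 - v2;
  newvar3 = v2 * v3;
  newvar4 = v2 / v1;
  * Division is done before addition, so v2 over v3 is added to v1 and the sum is multiplied by 4;
  newvar5 = (v1 + v2/v3) * 4;
run;

* Check the new variables against the originals;
proc print data=data2;
  var v1 v2 v3 newvar1 newvar2 newvar3 newvar4 newvar5;
run;
```

If a divisor is zero, SAS sets the result to missing and writes a note to the Log Window rather than stopping the program.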

Variable Manipulations

You can also use your new variables in other transformations

newvar6 = newvar4*newvar5;

Create categorical variables

You can reformat your data into new variables

If you have a survey question with responses showing ‘year of birth’ you can convert it to ‘age’

Variable Manipulations

For example, if you have a series of data for a variable: Variable name: “vexample”

Values: 1 2 3 4 5 6 7 8 9 10

We want to create a categorical variable with the categories and corresponding values of: Low = 1 Medium = 2 High = 3

Variable Manipulations

Give your new variable a name like “newvexample” or “vexamplecat”

Your new categorical variable would be created with this if/then syntax:
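The original slide showed the syntax as an image that did not survive. A sketch of what such if/then recoding could look like follows; the cutoffs (1 to 3 low, 4 to 7 medium, 8 to 10 high) and the data set name example are assumptions, since the slides do not state them.

```sas
data example;
  input vexample;
  * Assumed cutoffs - 1 to 3 low, 4 to 7 medium, 8 to 10 high;
  if vexample le 3 then vexamplecat = 1;
  else if vexample le 7 then vexamplecat = 2;
  else vexamplecat = 3;
  datalines;
1
2
3
4
5
6
7
8
9
10
;
run;

* Compare the original values with the new categories;
proc print data=example;
  var vexample vexamplecat;
run;
```

Using else if means each case is tested only until one condition matches, so the categories cannot overlap.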

Variable Manipulations

If your data is not as simple as 1 2 3 4 5 6 and so on, you can use the “PROC SORT” command to help you sort your data set

Variable Manipulations

Run a PROC SORT for v2, and then run a PROC PRINT to show the variable rearranged in ascending order
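A minimal sketch of that sort-then-print sequence, assuming data1 already exists; the data= options are added here for explicitness.

```sas
* Sort the cases by v2 in ascending order, which is the default;
proc sort data=data1;
  by v2;
run;

* Print the sorted data to see where natural cut points fall;
proc print data=data1;
  var county v2;
run;
```

Adding the keyword descending before v2 in the BY statement would reverse the order.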

Variable Manipulations

Now, create a new variable “newv2” with the following categories:

Low = 1 (values less than 100)
Medium = 2 (values 100 to 500)
High = 3 (values more than 500)

Run a PROC PRINT and PROC FREQ to check your transformations
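One way this recoding could be written is sketched below, assuming data1 exists with a numeric v2. The explicit check for missing values is an addition not mentioned in the slides; without it, SAS would treat a missing v2 as smaller than 100 and place it in the low group.

```sas
data data1;
  set data1;
  * Missing v2 stays missing instead of falling into the low group;
  if v2 = . then newv2 = .;
  else if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
run;

* Case-by-case check of the recoding;
proc print data=data1;
  var county v2 newv2;
run;

* Count of counties in each category;
proc freq data=data1;
  tables newv2;
run;
```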

IF/THEN Statements

In the previous exercise, you saw how if / then statements can be used to create new variables

If / then statements are very powerful and can be used in a number of ways to help you manage your data

IF/THEN Statements

Segmenting Data Sets – the IF statement

Simple IF Statements

The SAS “IF” command can be used to segment or partition your data set

For example, suppose you only want to examine certain cases in your data set: only females, only people over age 55, only Florida counties with populations greater than 500,000, etc.

IF/THEN Statements

You can segment in this way, using the IF statement:

If we only want to examine the number of reporting units in our sample for counties with a “low” number of employees:

“If newv2 is low” looks like this in SAS language:

IF newv2=1;

IF/THEN Statements

Combining IF statements to segment data sets with the DATA command

It is very useful to combine the IF command to segment data with the DATA command we learned earlier

Recall that your initial data step started with the command:

data data1;

This created the initial temporary SAS data set

IF/THEN Statements

The temporary data set “data1” contained all of the cases that you entered into your data set

If you now want to examine only a subset of those cases, you can do that in a second data set:

data data2; set data1;

This creates a second temporary data set called “data2” (remember SAS allows a large number of data sets)

IF/THEN Statements

We can now use an IF statement to segment the data in our set “data2”

Let’s create a second data set that includes only counties with a “medium” number of employees

Run a PROC PRINT to check the output
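A sketch of this step, assuming data1 already contains the categorical variable newv2 coded 1, 2, 3 as defined earlier:

```sas
data data2;
  set data1;
  * Subsetting IF - only cases where the condition is true are kept;
  if newv2 = 2;
run;

* Only the medium-employment counties should appear;
proc print data=data2;
  var county v1 v2 newv2;
run;
```

Because the subset goes into data2, the full set of cases in data1 stays available for other analyses.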

IF/THEN Statements

The PROC PRINT shows us that the temporary data set we’re now dealing with has only the 5 counties with a “medium” number of employees

IF/THEN Statements

Hands-On Exercise

Use the commands we’ve just learned to:

1. Create a new variable for high, medium, and low payroll amounts (newv3)

2. Use the DATA and IF statements to create a new data set that contains only those counties with the highest payroll for gasoline service stations, then run a PROC PRINT to check your results

IF/THEN Statements

The IF and THEN commands are most often used together with the operators we talked about before

SAS® Programming: Logical Operators

Symbol      Abbreviation   Operation
=           eq             equal to
^= or ~=    ne             not equal to
>           gt             greater than
<           lt             less than
>= or =>    ge             greater than or equal to
<= or =<    le             less than or equal to
&           and            and
|           or             or

IF/THEN Statements

More Complex IF statements

Multiple IF statements can be connected using “and” or “or” statements to make more complex statements:

if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1;
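When “and” and “or” appear in the same condition, SAS evaluates “and” first, so parentheses are worth adding to make the intended grouping explicit. The sketch below assumes data1 exists with numeric v1, v2, and v3; the names data2, newvar_b, and newvar_c are hypothetical.

```sas
data data2;
  set data1;
  * AND is evaluated before OR, so the next two statements are equivalent;
  if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1;
  if v1 eq 2 or (v2 gt 5 and v3 ne 2) then newvar_b = 1;
  * Parentheses around the OR change which cases qualify;
  if (v1 eq 2 or v2 gt 5) and v3 ne 2 then newvar_c = 1;
run;
```

Cases that do not meet a condition are left with a missing value for that new variable, because no ELSE statement assigns one.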

IF/THEN Statements

Using IF and THEN statements:

The general form of this command (for creating new variables, separating data sets, etc.) is:

IF variable condition exists (character indicator abbreviation: eq, ne, gt, lt, ge, le) THEN new variable condition (numeric symbol)

IF v2 eq 5 then newv2 = 1;

Again, you can combine conditions for more complex statements

Add Variables & Cases

Two other important data management functions that SAS can perform are adding additional cases or observations and adding new variables

Add Variables & Cases

Adding Cases

The term for adding cases or observations is “concatenation”

This allows you to add new cases to the bottom of your existing data set

You simply create a second data set and add it to your initial data set
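In SAS, concatenation is done by listing both data sets on one SET statement. The sketch below assumes data1 holds the initial cases and data2 holds the additional cases, read in with the same variable names and types; data3 is the combined result.

```sas
data data3;
  * Cases from data1 come first, followed by the cases from data2;
  set data1 data2;
run;

* Verify that the new cases appear at the bottom;
proc print data=data3;
run;
```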

Add Variables & Cases

Initial Data Set

Additional Cases

Merged Set

Add Variables & Cases

Hands-On Exercise

You have already created one data set of 10 counties

1. Create a new data set containing information for the next 4 counties (Collier, Columbia, De Soto, and Dixie)

2. Add these cases to your existing data set

3. Do a PROC PRINT for data3 to verify

Exercise

Add Variables & Cases

Adding Variables

Adding variables to your existing data is simple as well

Again, you will need to create a second data set that will essentially add a column or columns to your initial data set

The second data set will contain the new variable you are adding and one variable that matches exactly a variable in your initial data set – usually the sequential ID number (similar to Access)

Add Variables & Cases

To make sure that the data sets are properly combined, you must SORT the initial and second data set by the matching variable

The syntax looks like this:
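The slide image with the syntax is not in the transcript. A sketch of a sort-and-merge, assuming data1 is the initial data set, data2 contains id plus the new variable or variables, and id is the matching variable in both:

```sas
* Both data sets must be sorted by the matching variable first;
proc sort data=data1;
  by id;
run;

proc sort data=data2;
  by id;
run;

* Match cases on id and place the variables side by side;
data data3;
  merge data1 data2;
  by id;
run;

proc print data=data3;
run;
```

The BY statement is what makes this a match-merge; without it SAS pairs the cases by position, which silently misaligns the data if either set is missing a case.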

Add Variables & Cases

Initial Data Set

Added Variables

Merge

SAS® Statistical Procedures

Descriptive Procedure for Continuous Data

PROC UNIVARIATE;

Proc Univariate will provide basic descriptive information for continuous variables

The syntax looks like this:
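A minimal sketch, assuming data1 exists and using the three continuous variables from the county data; the slide's own example is not in the transcript.

```sas
* Descriptive statistics for each listed variable;
proc univariate data=data1;
  var v1 v2 v3;
run;
```

The output includes the mean, standard deviation, quantiles, and extreme observations for each variable.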

SAS® Statistical Procedures

Descriptive Procedure for Categorical Data

PROC FREQ;

Proc Freq will provide basic descriptive information for categorical or ordinal variables

The syntax looks like this:
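A minimal sketch, assuming data1 contains the categorical variable newv2 created earlier; the slide's own example is not in the transcript.

```sas
* One-way frequency table with counts and percentages;
proc freq data=data1;
  tables newv2;
run;
```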

SAS® Statistical Procedures

Analytical Procedures for Continuous Data

PROC CORR;

Proc Corr provides an analysis of the association between two continuous variables

Computes a correlation coefficient that demonstrates the level of association, as well as a p-value showing the significance of that association

The syntax looks like this:
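A minimal sketch, assuming data1 exists; the choice of v2 and v3 as the pair to correlate is for illustration only.

```sas
* Pearson correlation between number of employees and payroll;
proc corr data=data1;
  var v2 v3;
run;
```

The output is a correlation matrix in which each cell shows the coefficient with its p-value beneath it.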

SAS® Statistical Procedures

Correlation coefficient

p-value

SAS® Statistical Procedures

Analytical Procedures for Categorical Data

PROC FREQ;

Proc Freq can also be used to calculate the level of association between two categorical or nominal variables

A chi-square (χ²) test can be added to assess the significance level of that association

The syntax looks like this:

DV (dependent variable)

IV (independent variable)
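A sketch of a crosstabulation with a chi-square test, assuming data1 contains the categorical variables newv2 and newv3 from the earlier exercises. Which variable plays the dependent role, and the row-by-column order used here, are assumptions; the slide's own example is not in the transcript.

```sas
* The first variable forms the rows and the second forms the columns;
proc freq data=data1;
  tables newv3*newv2 / chisq;
run;
```

With only a handful of counties, many cells will have small expected counts, and SAS prints a warning that the chi-square test may not be valid.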

SAS® Statistical Procedures

Crosstab Table

Chi-square analysis

SAS® Statistical Procedures

PROC FREQ can also be used in conjunction with the DEVIATION option, which shows how far each cell frequency departs from its expected frequency (it does not compute a standard deviation)

Many SAS procedures like this have additional analyses that can be added in this way

SAS® Statistical Procedures

Multivariate Analysis:

PROC REG; computes the association between a continuous dependent variable and one or more independent variables

PROC LOGISTIC; computes the association between a categorical dependent variable and one or more independent variables

SAS® Statistical Procedures

Regression analysis: PROC REG;

Uses the “model” command

Construct your model with your dependent variable first, then your independent variables

The syntax looks like this:
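A minimal sketch, assuming data1 exists. Treating payroll (v3) as the dependent variable with v1 and v2 as independent variables is an illustrative choice, not the model from the original slide.

```sas
* Dependent variable on the left of the equals sign, independent variables on the right;
proc reg data=data1;
  model v3 = v1 v2;
run;
quit;
```

PROC REG stays active after run; so that further statements can be submitted, which is why the sketch ends with quit;.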

SAS® Statistical Procedures

These are only a few examples of the analyses you can do with SAS

SAS can also do:
Time series analysis
Factor analysis
ANOVA
T-tests
…and more!