How to start using SAS Tina Tian

Upload: alexia-perry

Post on 05-Jan-2016


Page 1: How to start using SAS Tina Tian. The topics An overview of the SAS system Reading raw data/ create SAS data set Combining SAS data sets & Match merging

How to start using SAS

Tina Tian


The topics

An overview of the SAS system

Reading raw data / creating SAS data sets

Combining SAS data sets & match-merging SAS data sets

Formatting data

Introducing some simple statistical analysis procedures


Basic Screen Navigation

Main windows:
Editor: contains the SAS program to be submitted.
Log: contains information about the processing of the SAS program, including any warning and error messages.
Output: contains reports generated by SAS procedures and DATA steps.

Side windows:
Explorer: navigates to other objects, such as libraries.
Results: navigates your Output window.


SAS programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets (that is, to generate reports and graphs, sort data, and analyze data).
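
As a minimal sketch of the two kinds of steps, the program below contains one DATA step followed by one PROC step. The data set name scores and its values are hypothetical, chosen only for illustration.

```sas
/* DATA step: creates a temporary data set named scores (hypothetical name and values). */
DATA scores;
    INPUT Name $ Score;
    DATALINES;
Ann 90
Bob 85
;
RUN;

/* PROC step: processes the data set created above, here by printing it. */
PROC PRINT DATA=scores;
RUN;
```

Submitting the program runs the steps in order, and the Log window reports how many observations and variables each step read or created.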


SAS Data Libraries

A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

A SAS data set is one type of SAS file stored in a data library.

The Work library is a temporary library: when SAS is closed, all the data sets in the Work library are deleted. To create a permanent SAS data set, store it in a library of your own.


SAS Data Libraries

Identify or create SAS data libraries by assigning each one a library reference name (libref) with the LIBNAME statement:

LIBNAME libref 'file-folder-location';

E.g.: LIBNAME readData 'C:\temp\sas class\readData';

Rules for naming a libref:
- The name must be 8 characters or fewer.
- The name must begin with a letter or underscore.
- The remaining characters must be letters, numbers, or underscores.
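
The difference between a temporary and a permanent data set can be sketched as follows. The libref mylib and the data set names are hypothetical, and the folder is assumed to exist already; replace the path with one on your own machine.

```sas
/* Assumption: this folder already exists; replace the path with your own. */
LIBNAME mylib 'C:\temp\sas class\readData';

/* One-level name: stored in the temporary Work library (work.dog_tmp). */
DATA dog_tmp;
    INPUT ID Age;
    DATALINES;
1 10
2 13
;
RUN;

/* Two-level name (libref.data-set): stored permanently in the mylib folder. */
DATA mylib.dog_perm;
    SET dog_tmp;
RUN;
```

After SAS is closed, dog_tmp is gone, while dog_perm remains in the folder and can be reached again by reissuing the same LIBNAME statement.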


Reading internal raw data in SAS system

To put small amounts of raw data directly in the SAS program and create a SAS data set, you must:

Start a DATA step and name the SAS data set being created with the DATA statement.

Describe how to read the data fields from the raw data with the INPUT statement.

Use the DATALINES statement to indicate internal data.

The RUN statement marks the end of a step.


Reading internal raw data in SAS system

Example:
DATA dog1;
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 2300
2 13 f 1500
3 12 f 1700
4 9 m 100
5 13 m 1000
;
RUN;


Reading external raw data files into SAS system

In order to create a SAS data set from a raw data file, you must

Start a DATA step and name the SAS data set being created (DATA statement)

Identify the location of the raw data file to read (INFILE statement)

Describe how to read the data fields from the raw data file (INPUT statement)

The RUN statement marks the end of a step.


Reading external raw data file into SAS system

LIBNAME readData "C:\temp\sas class";

DATA readData.dog1;
    INFILE "C:\temp\sas class\dog.txt";
    INPUT ID Age Gender $ Income;
RUN;

The LIBNAME statement assigns the libref readData to a data library.
The DATA statement creates a permanent SAS data set named dog1.
The INFILE statement points to a raw data file.
The INPUT statement:
- names the SAS variables
- identifies the variables as character or numeric ($ indicates character data)
- specifies the locations of the fields in the raw data
- can be specified as column, formatted, list, or named input

The RUN statement marks the end of a step.
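
Two of the input styles named above can be compared in a short sketch. It uses in-stream data instead of an external file so that it runs on its own; the data set names dog_list and dog_col are hypothetical.

```sas
/* List input: fields are separated by blanks and read in order. */
DATA dog_list;
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 2300
2 13 f 1500
;
RUN;

/* Column input: each field is read from fixed column positions. */
DATA dog_col;
    INPUT ID 1 Age 3-4 Gender $ 6 Income 8-11;
    DATALINES;
1 10 m 2300
2 13 f 1500
;
RUN;
```

Both steps should produce the same two observations here. Column input is mainly useful when fields are not separated by blanks or contain embedded blanks.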


Reading Delimited or PC Database Files with the IMPORT Procedure

If your data file has the proper extension, use the simplest form of the IMPORT procedure:

PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
RUN;

Type of File                            Extension           DBMS Identifier
Comma-delimited                         .csv                CSV
Tab-delimited                           .txt                TAB
Excel                                   .xls                EXCEL
Lotus files                             .wk1, .wk3, .wk4    WK1, WK3, WK4
Delimiters other than commas or tabs                        DLM

Example:
PROC IMPORT DATAFILE='c:\temp\sale.xls' OUT=readData.import1 DBMS=EXCEL;
RUN;


Reading Delimited or PC Database Files with the IMPORT Procedure

If your file does not have the proper extension, or if it uses a delimiter other than commas or tabs, you must use the DBMS= option and the DELIMITER= statement:

PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
    DELIMITER = 'delimiter-character';
RUN;

Example:
PROC IMPORT DATAFILE='c:\temp\sale.txt' OUT=readData.import2 DBMS=DLM;
    DELIMITER = '&';
RUN;


Reading Files with the IMPORT Procedure

If your file does not have the proper extension, or if it uses a delimiter other than commas or tabs, you must use the DBMS= option and the DELIMITER= statement:

PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
    DELIMITER = 'delimiter-character';
RUN;

Example:

PROC IMPORT DATAFILE = 'C:\sas class\readData\import2.txt' OUT = readData.sasfile DBMS = DLM;
    DELIMITER = '&';
RUN;
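
To try the DELIMITER= form without an existing file, the sketch below first writes a small '&'-delimited file to a temporary fileref and then imports it. The fileref demo, the output name import_demo, and the data values are hypothetical; REPLACE and GETNAMES= are additional PROC IMPORT features not shown on the slide.

```sas
/* Write a small '&'-delimited file to a temporary fileref (hypothetical data). */
FILENAME demo TEMP;

DATA _NULL_;
    FILE demo;
    PUT 'ID&Age&Gender&Income';
    PUT '1&10&m&2300';
    PUT '2&13&f&1500';
RUN;

/* DBMS=DLM plus the DELIMITER statement tell PROC IMPORT how to split fields. */
PROC IMPORT DATAFILE=demo OUT=work.import_demo DBMS=DLM REPLACE;
    DELIMITER='&';
    GETNAMES=YES;  /* the first row holds the variable names */
RUN;

PROC PRINT DATA=work.import_demo;
RUN;
```

The delimiter is written in single quotes so that SAS does not try to interpret the ampersand as a macro variable reference.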


Format in SAS data set

Standard formats (selected):
Character: $w.
Date, time, and datetime: DATEw., MMDDYYw., TIMEw.d, ...
Numeric: COMMAw.d, DOLLARw.d, ...

Use the FORMAT statement:
PROC PRINT DATA=sales;
    VAR Name DateReturned CandyType Profit;
    FORMAT DateReturned DATE9. Profit DOLLAR6.2;
RUN;
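
A self-contained sketch of the FORMAT statement follows. The sales data here is invented for illustration and only loosely mirrors the variables on the slide; the point is that a format changes how a stored value is displayed, not the value itself.

```sas
/* Hypothetical data; DateReturned is read as a SAS date value. */
DATA sales;
    INPUT Name $ DateReturned :MMDDYY10. Profit;
    DATALINES;
Ann 03/15/2015 1250.5
Bob 04/02/2015 725.25
;
RUN;

/* Dates display like 15MAR2015 and profits like $1,250.50. */
PROC PRINT DATA=sales;
    FORMAT DateReturned DATE9. Profit DOLLAR10.2;
RUN;

/* Same stored values, different display formats. */
PROC PRINT DATA=sales;
    FORMAT DateReturned MMDDYY10. Profit COMMA10.2;
RUN;
```

A FORMAT statement inside a PROC step applies only to that step, so the two reports can differ while the data set stays unchanged.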


Format in SAS data set

Create your own custom formats in two steps:
- Create the format using PROC FORMAT and the VALUE statement.
- Assign the format to the variable using the FORMAT statement.

General form of a simple PROC FORMAT step:
PROC FORMAT;
    VALUE name range-1='formatted-text-1'
               range-2='formatted-text-2' ...;
RUN;

The name in the VALUE statement is the name of the format you are creating. It cannot be longer than eight characters (a limit from older SAS releases; SAS 9 and later accept longer names) and must not start or end with a number. If the format is for character data, the name must start with a $.


Format in SAS data set

Example:

/* Step 1: Create the formats for certain variables */
PROC FORMAT;
    VALUE $genFmt 'm' = 'Male'
                  'f' = 'Female';
    VALUE polFmt 1 = 'likes'
                 2 = 'dont care'
                 3 = 'dislikes'
                 9 = 'no answer';
RUN;

/* Step 2: Assign the formats to the variables */
DATA Mydata.dog123 (REPLACE=YES);
    SET Mydata.dog123;
    FORMAT Gender $genFmt. Policy polFmt.;
RUN;
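
The two steps can be tried end to end with the sketch below, which uses the Work library and a hypothetical data set dog_demo so that no permanent library is needed.

```sas
/* Step 1: define a character format (the name starts with $). */
PROC FORMAT;
    VALUE $genFmt 'm' = 'Male'
                  'f' = 'Female';
RUN;

/* Step 2: attach the format to the variable when the data set is created. */
DATA dog_demo;
    INPUT ID Gender $;
    FORMAT Gender $genFmt.;
    DATALINES;
1 m
2 f
3 m
;
RUN;

/* Gender is displayed as Male/Female; the stored values are still m and f. */
PROC PRINT DATA=dog_demo;
RUN;

/* A FORMAT statement with no format name removes the format for this step only. */
PROC PRINT DATA=dog_demo;
    FORMAT Gender;
RUN;
```

Note that a character variable needs the $ prefix when the format is referenced ($genFmt.), not only when it is defined.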


Format in SAS data set

Permanently store formats in a SAS catalog by:
- Creating a format catalog with the LIB= (LIBRARY=) option in the PROC FORMAT statement
- Setting the format search option (FMTSEARCH=)

Example:
LIBNAME Mydata 'C:\sas class\Format';
OPTIONS FMTSEARCH=(Mydata.dogfmt);
PROC FORMAT LIB=Mydata.dogfmt;
    VALUE $genFmt 'm' = 'Male'
                  'f' = 'Female';
RUN;

To read the stored formats in a later session:
OPTIONS NOFMTERR;
OPTIONS FMTSEARCH=(Mydata.dogfmt);


Combining SAS Data Sets: Concatenating and Interleaving

Use the SET statement in a DATA step to concatenate SAS data sets.

Use the SET and BY statements in a DATA step to interleave SAS data sets.


Combining SAS Data Sets: Concatenating and Interleaving

General form of a DATA step concatenation:
DATA new-SAS-data-set;
    SET SAS-data-set1 SAS-data-set2 ...;
RUN;

Example:
DATA mydata.dog12;
    SET dog1 mydata.dog2;
RUN;


Combining SAS Data Sets: Concatenating and Interleaving

General form of a DATA step interleave:
DATA new-data-set;
    SET SAS-data-set1 SAS-data-set2 ...;
    BY BY-variable;
RUN;

Sort all SAS data sets by the BY variable first, using PROC SORT.

Example:
PROC SORT DATA=dog1 OUT=dog1_sorted;
    BY ID;
RUN;
PROC SORT DATA=mydata.dog2 OUT=dog2_sorted;
    BY ID;
RUN;
DATA mydata.dog12;
    SET dog1_sorted dog2_sorted;
    BY ID;
RUN;
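
The difference between concatenating and interleaving is easiest to see on small inputs. In this sketch the data sets and values are hypothetical, and both inputs are entered already sorted by ID so that no PROC SORT step is needed.

```sas
DATA dog1;
    INPUT ID Age;
    DATALINES;
1 10
3 12
5 13
;
RUN;

DATA dog2;
    INPUT ID Age;
    DATALINES;
2 13
4 9
;
RUN;

/* Concatenate: all of dog1, then all of dog2 (ID order 1 3 5 2 4). */
DATA dog_concat;
    SET dog1 dog2;
RUN;

/* Interleave: BY ID keeps the combined result in ID order (1 2 3 4 5). */
DATA dog_interleave;
    SET dog1 dog2;
    BY ID;
RUN;
```

Both results contain the same five observations; only their order differs.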


Match-Merging SAS Data Sets

One-to-one match merge

One-to-many match merge

Many-to-many match merge

The SAS statements for all three types of match merge take the same form:

DATA new-data-set;
    MERGE SAS-data-set-1 SAS-data-set-2 SAS-data-set-3 ...;
    BY by-variable(s); /* indicates the variable(s) that control which observations to match */
RUN;


Merging SAS Data Sets: A More Complex Example

/* To match-merge the data sets by the common variable EmpID, the data sets must be ordered by EmpID */
PROC SORT DATA=combData.groupsched;
    BY EmpID;
RUN;

Example: Merge two data sets to acquire the names of the group members who are scheduled to fly next week.

combData.employee        combData.groupsched
EmpID    LastName        EmpID    FlightNum
E00632   Strauss         E04064   5105
E01483   Lee             E00632   5250
E01996   Nick            E01996   5501
E04064   Waschk


Merging SAS Data Sets: A More Complex Example

/* simply merge two data sets */
DATA combData.nextweek;
    MERGE combData.employee combData.groupsched;
    BY EmpID;
RUN;

EmpID    LastName   FlightNum
E00632   Strauss    5250
E01483   Lee
E01996   Nick       5501
E04064   Waschk     5105


Merging SAS Data Sets: A More Complex Example

Eliminating Nonmatches

Use the IN= data set option to determine which data set(s) contributed to the current observation.

General form of the IN= data set option:
SAS-data-set (IN=variable)

variable is a temporary numeric variable that has two possible values:
0 indicates that the data set did not contribute to the current observation.
1 indicates that the data set did contribute to the current observation.


Merging SAS Data Sets: A More Complex Example

/* Exclude from the data set employees who are not scheduled to fly next week. */

LIBNAME combData "K:\sas class\merge";

DATA combData.nextweek;
    MERGE combData.employee combData.groupsched (IN=InSched);
    BY EmpID;
    IF InSched=1; /* keep the observation only when this condition is true */
RUN;

EmpID    LastName   FlightNum
E00632   Strauss    5250
E01996   Nick       5501
E04064   Waschk     5105


Merging SAS Data Sets: A More Complex Example

/* Find employees who are not in the flight scheduled group. */

LIBNAME combData "K:\sas class\merge";

DATA combData.nextweek;
    MERGE combData.employee (IN=InEmp) combData.groupsched (IN=InSched);
    BY EmpID;
    IF InEmp=1;   /* true: the employee data set contributed */
    IF InSched=0; /* false: the schedule data set did not contribute */
RUN;

EmpID    LastName   FlightNum
E01483   Lee
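
The matched and unmatched cases can also be split in a single pass. The sketch below recreates the two tables from the example in the Work library and writes two output data sets; the names scheduled and not_scheduled are hypothetical.

```sas
DATA employee;
    INPUT EmpID $ LastName $;
    DATALINES;
E00632 Strauss
E01483 Lee
E01996 Nick
E04064 Waschk
;
RUN;

DATA groupsched;
    INPUT EmpID $ FlightNum;
    DATALINES;
E04064 5105
E00632 5250
E01996 5501
;
RUN;

/* employee is already in EmpID order; groupsched is not, so sort it first. */
PROC SORT DATA=groupsched;
    BY EmpID;
RUN;

/* Route each merged observation by the IN= flags. */
DATA scheduled not_scheduled;
    MERGE employee (IN=InEmp) groupsched (IN=InSched);
    BY EmpID;
    IF InEmp AND InSched THEN OUTPUT scheduled;
    ELSE IF InEmp THEN OUTPUT not_scheduled;
RUN;
```

Because the IN= variables are temporary, they are available for the IF logic but are not written to either output data set.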


Different Types of Merges in SAS

One-to-Many Merging

DATA work.three;
    MERGE work.one work.two;
    BY X;
RUN;

Work.one    Work.two    Work.three
X  Y        X  Z        X  Y  Z
1  A        1  A1       1  A  A1
2  B        1  A2       1  A  A2
3  C        2  B1       2  B  B1
            3  C1       3  C  C1
            3  C2       3  C  C2


Different Types of Merges in SAS

Many-to-Many Merging

DATA work.three;
    MERGE work.one work.two;
    BY X;
RUN;

Work.one    Work.two    Work.three
X  Y        X  Z        X  Y   Z
1  A1       1  AA1      1  A1  AA1
1  A2       1  AA2      1  A2  AA2
2  B1       1  AA3      1  A2  AA3
2  B2       2  BB1      2  B1  BB1
            2  BB2      2  B2  BB2
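
The many-to-many result can be reproduced with the runnable sketch below, which rebuilds the two inputs from the tables in the Work library. Within each BY group the DATA step pairs observations by position and carries the last value forward when one data set runs out, so the result is not every combination of Y and Z. SAS also writes a note to the log when more than one data set has repeated BY values.

```sas
DATA one;
    INPUT X Y $;
    DATALINES;
1 A1
1 A2
2 B1
2 B2
;
RUN;

DATA two;
    INPUT X Z $;
    DATALINES;
1 AA1
1 AA2
1 AA3
2 BB1
2 BB2
;
RUN;

/* Both inputs are already sorted by X, so they can be merged directly. */
DATA three;
    MERGE one two;
    BY X;
RUN;

PROC PRINT DATA=three NOOBS;
RUN;
```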


Some simple analysis procedures

The PRINT Procedure

The CONTENTS Procedure

The FREQ Procedure

The SORT Procedure

The MEANS Procedure

The CORR Procedure

The TTEST Procedure

The ANOVA Procedure


The PRINT Procedure

The PRINT procedure prints the observations in a SAS data set.

General form of a simple PROC PRINT step:

PROC PRINT DATA = SAS-data-set;
    VAR variable(s) <option>;
    SUM variable(s) <option>;
RUN;

The VAR statement specifies which variables to print and in what order.
The SUM statement totals the values of the listed numeric variables.
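
A short sketch of VAR and SUM together, using a few rows modeled on the earlier dog1 example; NOOBS is an extra PROC PRINT option that suppresses the observation-number column.

```sas
DATA dog1;
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 2300
2 13 f 1500
3 12 f 1700
;
RUN;

PROC PRINT DATA=dog1 NOOBS;
    VAR ID Gender Income;  /* print only these variables, in this order */
    SUM Income;            /* add a total line for Income */
RUN;
```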


The CONTENTS Procedure

The CONTENTS procedure shows the contents of a SAS data set and prints the directory of the SAS data library.

General form of a simple PROC CONTENTS step:

PROC CONTENTS DATA = SAS-data-set;

RUN;


The SORT Procedure

The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables.

General form of a simple PROC SORT step:

PROC SORT DATA = SAS-data-set;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;

RUN;


The MEANS Procedure

The MEANS procedure provides descriptive statistics for variables across all observations

General form of a simple PROC MEANS step:

PROC MEANS DATA = SAS-data-set;
    CLASS variable(s) </ option(s)>;
    VAR variable(s);
RUN;
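
As an illustration, the sketch below summarizes Age and Income separately for each value of Gender, reusing the small dog1 data from the earlier example. The statistic keywords on the PROC MEANS statement (N, MEAN, STD, MIN, MAX) select which statistics are reported.

```sas
DATA dog1;
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 2300
2 13 f 1500
3 12 f 1700
4 9 m 100
5 13 m 1000
;
RUN;

PROC MEANS DATA=dog1 N MEAN STD MIN MAX;
    CLASS Gender;     /* one set of statistics per Gender value */
    VAR Age Income;   /* analysis variables */
RUN;
```

Unlike a BY statement, CLASS does not require the data set to be sorted first.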


The FREQ Procedure

The FREQ procedure produces one-way to n-way frequency and crosstabulation (contingency) tables

General form of a simple PROC FREQ step:

PROC FREQ DATA = SAS-data-set;
    TABLES requests < / options >;
RUN;

The TABLES statement requests one-way to n-way frequency and crosstabulation tables and statistics for those tables.
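
A minimal sketch of one-way and two-way requests follows. The survey data set and its values are hypothetical; Policy uses the same codes as the polFmt example earlier.

```sas
DATA survey;
    INPUT Gender $ Policy;
    DATALINES;
m 1
f 2
f 1
m 3
m 1
f 9
;
RUN;

PROC FREQ DATA=survey;
    TABLES Gender;                                 /* one-way frequency table */
    TABLES Gender*Policy / NOPERCENT NOROW NOCOL;  /* two-way table, counts only */
RUN;
```

In a request such as Gender*Policy, the first variable forms the rows and the second forms the columns of the crosstabulation.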


The TTEST Procedure

The TTEST procedure performs t tests for one sample, two samples, and paired observations.

General form of a simple PROC TTEST step:

One-sample t test:
PROC TTEST DATA = SAS-data-set H0=m;
    VAR variable(s);
RUN;

Two-sample t test:
PROC TTEST DATA = SAS-data-set;
    CLASS variable;
    VAR variable(s);
RUN;

• Use the H0= option to set the null-hypothesis value in the one-sample t test.
• Use the CLASS statement for the two-group comparison t test.
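
Both forms can be tried on the small dog1 data from the earlier example. The null value 1000 is an arbitrary choice for illustration, and with only five observations the results are not meaningful beyond showing the syntax.

```sas
DATA dog1;
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 2300
2 13 f 1500
3 12 f 1700
4 9 m 100
5 13 m 1000
;
RUN;

/* One-sample test: is the mean Income different from 1000? */
PROC TTEST DATA=dog1 H0=1000;
    VAR Income;
RUN;

/* Two-sample test: does mean Income differ between the two Gender groups? */
PROC TTEST DATA=dog1;
    CLASS Gender;   /* must have exactly two levels */
    VAR Income;
RUN;
```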


The ANOVA Procedure

The ANOVA procedure performs one-way analysis of variance (ANOVA) for balanced data

General form of a simple PROC ANOVA step:

PROC ANOVA DATA = SAS-data-set;
    CLASS variable(s) </ options>;
    MODEL dependents = effects </ options>;
RUN;
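
A one-way layout with three balanced groups can be sketched as follows. The growth data set, its variables, and its values are hypothetical; the MEANS statement with the TUKEY option is an optional addition that requests pairwise comparisons of the group means.

```sas
DATA growth;
    INPUT Diet $ Gain;
    DATALINES;
A 10
A 12
A 11
B 14
B 15
B 13
C 9
C 8
C 10
;
RUN;

PROC ANOVA DATA=growth;
    CLASS Diet;           /* grouping variable */
    MODEL Gain = Diet;    /* one-way model: Gain explained by Diet */
    MEANS Diet / TUKEY;   /* pairwise comparisons of the Diet means */
RUN;
QUIT;
```

PROC ANOVA is an interactive procedure, so QUIT ends it explicitly after the RUN statement.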


Some simple analysis procedures

The UNIVARIATE Procedure

The REG Procedure

The LOGISTIC Procedure


The UNIVARIATE Procedure

The UNIVARIATE procedure provides descriptive statistics, histograms, quantile-quantile plots (Q-Q plots), and probability plots.

General form of a simple PROC UNIVARIATE step:

PROC UNIVARIATE DATA = SAS-data-set;

VAR variables;

HISTOGRAM;

QQPLOT;

RUN;


The REG procedure

The REG procedure is one of many regression procedures in the SAS System.

The REG procedure allows several MODEL statements and gives additional regression diagnostics, especially for detection of collinearity. It also creates plots of model summary statistics and regression diagnostics.

PROC REG <options>;

MODEL dependents=independents </options>;

PLOT <yvariable*xvariable>;

RUN;


An example

PROC REG DATA=water;
    MODEL Water = Temperature Days Persons / VIF;
    MODEL Water = Temperature Production Days / VIF;
RUN;

PROC REG DATA=water;
    MODEL Water = Temperature Production Days;
    PLOT STUDENT.*PREDICTED.; /* studentized residuals against predicted values */
    PLOT STUDENT.*NPP.;       /* normal probability (P-P) plot of the studentized residuals */
    PLOT R.*NQQ.;             /* normal Q-Q plot of the residuals */
RUN;


The LOGISTIC procedure

For binary or ordinal responses with continuous independent variables:

PROC LOGISTIC < options >;
    MODEL dependents=independents < / options >;
RUN;

For binary or ordinal responses with categorical independent variables:

PROC LOGISTIC < options >;
    CLASS categorical-variables < / options >;
    MODEL dependents=independents < / options >;
RUN;


Example

PROC LOGISTIC data=Mydata2.pain;

CLASS Treatment Sex;

MODEL Pain= Treatment Sex Treatment*Sex Age Duration;

RUN;