ISQS 6347, Data & Text Mining 1 ISQS 6339, Data Management & Business Intelligence Data Preparation for Analytics Using SAS Zhangxi Lin Texas Tech University

Upload: donna-may

Post on 18-Jan-2016


ISQS 6347, Data & Text Mining

ISQS 6339, Data Management & Business Intelligence

Data Preparation for Analytics Using SAS

Zhangxi Lin

Texas Tech University

Outline

- An overview of data preparation for analytics
- SAS Programming Essentials
   - Running SAS programs
   - Mastering fundamental concepts
   - SAS program debugging
- Make use of SAS Enterprise Guide for programming


Structure and Components of Business Intelligence

Overview: From Data Warehousing to Data Analysis

- Previous major topics in data warehousing (using SQL Server 2008)
   - Dimensional model design
   - ETL
   - Cube design and OLAP
- Data analysis topics (using SAS)
   - Data preparation
      - Analytic business questions
      - Data format and data conversion
      - Data cleansing
   - Data exploration
   - Data analysis
   - Data visualization


US Car Theft

The number of U.S. motor vehicle thefts decreased by 1.9 percent from 2003 to 2004, the first decrease since 1999. In 2004, the value of stolen motor vehicles was $7.6 billion, down from $8.6 billion in 2003. The average value of a motor vehicle reported stolen in 2004 was $6,143, compared with $6,797 in 2003.


2004 Theft Statistics

Every 26 seconds, a motor vehicle is stolen in the United States. The odds of a vehicle being stolen were 1 in 190 in 2003. The odds are highest in urban areas.

U.S. motor vehicle thefts fell 1.9 percent in 2004 from 2003, according to the FBI's Uniform Crime Reports. In 2004, 1,237,114 motor vehicles were reported stolen.

The West was the only region with an increase in motor vehicle thefts from 2003 to 2004, up 3.2 percent. Thefts fell 9.7 percent in the Northeast, 4.4 percent in the Midwest and 2.9 percent in the South.

Nationwide, the 2004 motor vehicle theft rate per 100,000 people was 421.3, down 2.9 percent from 433.7 in 2003.

Only 13.0 percent of thefts were cleared by arrests in 2004.

Carjackings occur most frequently in urban areas. They account for only 3.0 percent of all motor vehicle thefts.

The average comprehensive insurance premium in the U.S. rose 11.2 percent from 1999 to 2003.
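The figures quoted above can be cross-checked with a few lines of SAS. The sketch below is not part of the original deck: it recomputes the average value per stolen vehicle from the rounded $7.6 billion total and the 1,237,114 reported thefts, and the relative change in the theft rate per 100,000 people. All variable names are illustrative.

```sas
/* Illustrative check of the 2004 theft figures quoted above */
data _null_;
   TotalValue2004 = 7.6e9;      /* rounded total value of stolen vehicles */
   Thefts2004     = 1237114;    /* vehicles reported stolen in 2004       */
   AvgValue       = TotalValue2004 / Thefts2004;

   Rate2003  = 433.7;           /* thefts per 100,000 people, 2003 */
   Rate2004  = 421.3;           /* thefts per 100,000 people, 2004 */
   PctChange = (Rate2004 - Rate2003) / Rate2003;

   /* Results are written to the SAS log */
   put 'Average value per theft: ' AvgValue dollar10.;
   put 'Change in theft rate:    ' PctChange percentn8.1;
run;
```

Because the $7.6 billion total is itself rounded, the computed average should land close to, but not necessarily exactly on, the $6,143 quoted.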

Business Question

If used Honda Accords rank first in auto thefts, should the insurance premium for a Honda Accord be higher than for other makes of car? Should insurance for a used Honda cost more than for a brand-new Honda?

Why?

Analytic Business Questions

- How do factors such as time, branch, promotion, and price influence the sale of a soft drink?
- Which customers have a high cancellation risk in the next month?
- How can customers be segmented based on their purchase behavior?
- Statistics showed that an online recommendation system may increase sales by 20%, and the accuracy rate of the system is 40%. A newer algorithm can increase the accuracy rate to 50%. Should the sales increase then be expected to rise to 20% * 125% = 25%?
- Airlines are considering over-booking seats because a certain percentage of customers cancel their flights at the last minute. If the average cancellation rate is 10%, should the over-booking rate be 10% as well? If a cancellation is charged 5% of the fare, how large should the penalty be when an over-booked flight sells out?
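One way to start reasoning about the over-booking question is to treat each ticket holder as showing up independently with probability 90%. The sketch below rests on that assumption and on made-up numbers (a 100-seat flight, 110 tickets sold); none of this comes from the slides. It uses the SAS CDF function with the binomial distribution to estimate how often more passengers show up than there are seats.

```sas
/* Hypothetical numbers: 100 seats, 10% cancellation rate, 10% over-booking */
data _null_;
   Seats      = 100;
   CancelRate = 0.10;
   Tickets    = 110;                               /* seats plus 10% */

   /* Expected number of passengers who actually show up */
   ExpectedShows = Tickets * (1 - CancelRate);

   /* P(show-ups > seats), assuming independent cancellations */
   ProbOversold = 1 - cdf('BINOMIAL', Seats, 1 - CancelRate, Tickets);

   /* Results are written to the SAS log */
   put ExpectedShows= ProbOversold= percent8.1;
run;
```

The expected number of show-ups alone does not answer the question; the probability of an oversold flight, weighed against the penalty per bumped passenger, is what drives the choice of over-booking rate.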

Analysis Process

- Select an analysis method
- Identify the data source
- Prepare the data (collecting, cleansing, reorganizing, extracting, transforming, loading)
- Execute the analysis
- Interpret the analysis
- Automate data preparation and execution of the analysis if the business question has to be answered more than once
   - ETL
   - Stored procedures

The above steps can also be iterated; they are not necessarily performed in sequential order.

We focus on the data preparation step.

Characteristics of Analytic Business Questions

- Analysis complexity: real analysis or reporting
- Analysis paradigm: statistics or data mining
- Data preparation paradigm: as much data as possible or business knowledge first
- Analysis method: supervised or unsupervised analysis
- Scoring needed: yes/no
- Periodicity of analysis: one-shot or re-run
- Historic data needed: yes/no
- Data structure: one row or multiple rows per subject
- Complexity of the analysis team

Components of the SAS System

[Diagram: Base SAS and the following components]

- Reporting and Graphics
- Data Access and Management
- User Interface
- Analytical
- Application Development
- Visualization and Discovery
- Business Solutions
- Web Enablement


SAS Programming Essentials

Find more information from http://support.sas.com

Data-driven Tasks

The functionality of the SAS System is built around four data-driven tasks common to virtually any application:

- Data access
- Data management
- Data analysis
- Data presentation

Turning Data into Information

The process of delivering meaningful information:

- 80% data-related
   - Access
   - Scrub
   - Transform
   - Manage
   - Store and retrieve
- 20% analysis

Turning Data into Information

[Diagram: Data -> DATA Step -> SAS Data Sets -> PROC Steps -> Information]

Design of the SAS System

MultiVendor Architecture

[Diagram: platforms: PC, Workstation, Servers/Midrange, Mainframe, Super Computer]

- 90% independent
- 10% dependent

Design of the SAS System

MultiEngine Architecture

[Diagram: DATA accessed through engines for Teradata, SYBASE, Microsoft Excel, ORACLE, dBase, SAP, DB2]

SAS Programming: Level I

- Fundamentals (ch1-3)
- Producing list reports (ch4)
- Enhancing output (ch5)
- Creating data sets (ch6)
- Data step programming (ch7)
   - Reading data
   - Creating variables
   - Conditional processing
   - Keeping and dropping variables
   - Reading Excel files
- Combining SAS data sets (ch8)
- Producing summary reports (ch9)
- SAS graphing (ch10)

Course Scenario

In this course, you work with business data from International Airlines (IA). The various kinds of data that IA maintains are listed below:

- flight data
- passenger data
- cargo data
- employee data
- revenue data

Course Scenario

The following are some tasks that you will perform:

- importing data
- creating a list of employees
- producing a frequency table of job codes
- summarizing data
- creating a report of salary information
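As a rough preview of what these tasks look like in code, the sketch below runs three of them on a tiny made-up employee table. The data set name, variable names, and values are all hypothetical; the actual IA course data sets are not reproduced here.

```sas
/* Hypothetical employee records; not the actual IA course data */
data work.employees;
   input LastName $ JobCode $ Salary;
   datalines;
TORRES PILOT 50000
LANGKAMM MECH 80000
SMITH MECH 40000
WAGSCHAL PILOT 77500
;
run;

/* Creating a list of employees */
proc print data=work.employees;
run;

/* Producing a frequency table of job codes */
proc freq data=work.employees;
   tables JobCode;
run;

/* Creating a report of salary information by job code */
proc means data=work.employees mean min max;
   class JobCode;
   var Salary;
run;
```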

SAS Programs

A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS data sets.

PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).

[Diagram: Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report]

SAS Programs

DATA step:

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

PROC steps:

proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

Step Boundaries

SAS steps begin with either of the following:

- DATA statement
- PROC statement

SAS detects the end of a step when it encounters one of the following:

- a RUN statement (for most steps)
- a QUIT statement (for some procedures)
- the beginning of another step (DATA statement or PROC statement)
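PROC SQL is one example of a procedure that ends with QUIT rather than RUN. The short sketch below is an illustration only, assuming the sample table sashelp.class that normally ships with SAS is available; it is not part of the course material.

```sas
/* PROC SQL keeps running until QUIT (or the next step) is encountered */
proc sql;
   select Name, Age
      from sashelp.class
      where Age > 14;
quit;
```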

Step Boundaries

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

/* No RUN statement here: this step ends where the next PROC statement begins */
proc print data=work.staff;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

Running a SAS Program

You can invoke SAS in the following ways:

- interactive windowing mode (SAS windowing environment)
- interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF, or SAS/EIS software)
- batch mode
- noninteractive mode

Preparation of SAS Programming

- Data sets: \SAS-Programming
- Create a user-defined library reference

Statement

LIBNAME libref 'SAS-data-library' <options>;

Example

LIBNAME ia 'c:\workshop\winsas\prog1';

- Two-level SAS file names: libref.filename
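Once a libref is assigned, a quick way to confirm that it points at the intended folder is to list the library's members. The sketch below reuses the path from the example above, which may need adjusting for your own machine; it assumes nothing about which data sets the folder contains.

```sas
/* Assign the libref (path taken from the slide; adjust to your own folder) */
libname ia 'c:\workshop\winsas\prog1';

/* List the members of the library; NODS suppresses the per-member details */
proc contents data=ia._all_ nods;
run;
```

Any data set listed can then be referenced by its two-level name, with `ia` as the first level.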

SAS Programming Essentials

Demo: c02s2d1
Exercise: c02ex1

Browsing the Descriptor Portion

General form of the CONTENTS procedure:

PROC CONTENTS DATA=SAS-data-set;
RUN;

Example:

proc contents data=work.staff;
run;

c02s3d1

SAS Data Sets: Data Portion

The data portion of a SAS data set is a rectangular table of character and/or numeric data values.

LastName    FirstName   JobTitle   Salary      <- variable names
TORRES      JAN         Pilot      50000       <- variable values
LANGKAMM    SARAH       Mechanic   80000
SMITH       MICHAEL     Mechanic   40000
WAGSCHAL    NADJA       Pilot      77500
TOERMOEN    JOCHEN      Pilot      65000

LastName, FirstName, and JobTitle hold character values; Salary holds numeric values.

Variable names are part of the descriptor portion, not the data portion.

SAS Variable Values

There are two types of variables:

- character: can contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.
- numeric: stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits.
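The two variable types and their storage lengths can be seen in the descriptor portion of any data set. In the minimal sketch below, the data set work.types and its values are made up for illustration.

```sas
/* Illustrative data set with one character and one numeric variable */
data work.types;
   length LastName $ 20;     /* character variable, 20 bytes           */
   LastName = 'TORRES';
   Salary   = 50000;         /* numeric variable, 8 bytes by default   */
run;

/* The variable list in the output shows Type (Char/Num) and Len */
proc contents data=work.types;
run;
```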

SAS Data Set and Variable Names

SAS names have these characteristics:

- can be up to 32 characters long.
- can be uppercase, lowercase, or mixed-case.
- are not case sensitive.
- must start with a letter or underscore. Subsequent characters can be letters, underscores, or numerals.

Valid SAS Names

Select the valid default SAS names.

- data5mon (valid)
- 5monthsdata (invalid: starts with a numeral)
- data#5 (invalid: contains a special character)
- five months data (invalid: contains blanks)
- fivemonthsdata (valid)
- FiveMonthsData (valid)
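The candidate names from this quiz can also be tested programmatically. The sketch below is an addition, not part of the original slides: it uses the NVALID function with the 'v7' argument, which returns 1 when a string satisfies the default naming rules and 0 otherwise. The data set and variable names are illustrative.

```sas
/* Check each candidate name against the default (V7) naming rules */
data work.namecheck;
   length Candidate $ 32;
   input Candidate $char32.;          /* read the whole line, blanks included */
   IsValid = nvalid(Candidate, 'v7'); /* 1 = valid name, 0 = not valid        */
   datalines;
data5mon
5monthsdata
data#5
five months data
fivemonthsdata
FiveMonthsData
;
run;

proc print data=work.namecheck;
run;
```

By the rules listed earlier, only the names that start with a letter or underscore and contain nothing but letters, numerals, and underscores should be flagged as valid.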

Missing Data Values

LastName    FirstName   JobTitle   Salary
TORRES      JAN         Pilot      50000
LANGKAMM    SARAH       Mechanic   80000
SMITH       MICHAEL     Mechanic   .
WAGSCHAL    NADJA       Pilot      77500
TOERMOEN    JOCHEN                 65000

A value must exist for every variable for each observation.

Missing values are valid values.

A numeric missing value is displayed as a period.

A character missing value is displayed as a blank.
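The table above can be reproduced with in-stream data to see how SAS stores and reports missing values. This is a sketch only: the data set name work.staff_missing and the column positions are chosen for this example and differ from the raw file layout used elsewhere in the course.

```sas
/* Column input; the column positions here are an assumption for this example */
data work.staff_missing;
   input LastName $ 1-10 FirstName $ 11-18 JobTitle $ 19-27 Salary 28-32;
   datalines;
TORRES    JAN     Pilot    50000
LANGKAMM  SARAH   Mechanic 80000
SMITH     MICHAEL Mechanic     .
WAGSCHAL  NADJA   Pilot    77500
TOERMOEN  JOCHEN           65000
;
run;

/* The missing salary prints as a period, the missing job title as a blank */
proc print data=work.staff_missing;
run;

/* N counts nonmissing salaries, NMISS counts missing ones */
proc means data=work.staff_missing n nmiss;
   var Salary;
run;
```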

Browsing the Data Portion

The PRINT procedure displays the data portion of a SAS data set.

By default, PROC PRINT displays the following:

- all observations
- all variables
- an Obs column on the left side
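Each of these defaults can be overridden. The sketch below contrasts the default listing with a restricted one; it assumes the sample table sashelp.class that normally ships with SAS and is meant as an illustration rather than course code.

```sas
/* Default listing: all observations, all variables, Obs column */
proc print data=sashelp.class;
run;

/* Overriding the defaults: first 5 observations, two variables, no Obs column */
proc print data=sashelp.class(obs=5) noobs;
   var Name Age;
run;
```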

Browsing the Data Portion

General form of the PRINT procedure:

PROC PRINT DATA=SAS-data-set;
RUN;

Example:

proc print data=work.staff;
run;

c02s3d1

SAS Data Set Terminology

SAS documentation and text in the SAS windowing environment use the following terms interchangeably:

SAS Data Set    =   SAS Table
Variable        =   Column
Observation     =   Row

SAS Syntax Rules

SAS statements have these characteristics:

- usually begin with an identifying keyword
- always end with a semicolon

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

SAS Syntax Rules

SAS statements are free-format.

- One or more blanks or special characters can be used to separate words.
- They can begin and end in any column.
- A single statement can span multiple lines.
- Several statements can be on the same line.

Unconventional Spacing

data work.staff;
 infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
   proc means data=work.staff;
 class JobTitle; var Salary;run;

SAS Syntax Rules

Good spacing makes the program easier to read.

Conventional Spacing

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

SAS Comments

- Type /* to begin a comment.
- Type your comment text.
- Type */ to end the comment.

/* Create work.staff data set */
data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

/* Produce listing report of work.staff */
proc print data=work.staff;
run;

c02s3d2
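SAS also accepts a second comment style that the slide does not mention: a statement that begins with an asterisk and ends at the next semicolon. The sketch below shows both styles side by side; it assumes the sample table sashelp.class is available and is an illustration only.

```sas
* This is a statement-style comment. It ends at the first semicolon;
proc print data=sashelp.class;   /* block comments can share a line with code */
run;
```

The /* ... */ style can contain semicolons, which makes it the safer choice for commenting out whole statements.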

Syntax Errors

Syntax errors include the following:

- misspelled keywords
- missing or invalid punctuation
- invalid options

daat work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff
run;

proc means data=work.staff average max;
   class JobTitle;
   var Salary;
run;

Debugging a SAS Program

This demonstration illustrates how to submit a SAS program that contains errors, diagnose the errors, correct the errors, and save the corrected program.

c02s4d1.sas    userid.prog1.sascode(c02s4d1)
c02s4d2.sas    userid.prog1.sascode(c02s4d2)

Recall a Submitted Program

Program statements accumulate in a recall buffer each time you issue a SUBMIT command.

Submit Number 1:

daat work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff
run;
proc means data=work.staff average max;
   class JobTitle;
   var Salary;
run;

Submit Number 2:

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff mean max;
   class JobTitle;
   var Salary;
run;

Issue the RECALL command once to recall the most recently submitted program: the Submit Number 2 statements are recalled.

Issue the RECALL command again to recall the Submit Number 1 statements.

Exercise 8: Basic SAS Programming

- Define libraries IA and Out.
- Go through all SAS programs in Chapters 2-5.
- Write a SAS program to read a dataset created by yourself, or simply use Person0.txt in \\TechShare\coba\d\ISQS3358\OtherDatasets\ .
   - The dataset is output to your library Out.
   - Try to apply whatever SAS features in Chapter 5 of Prog-I to generate a nice-looking report.
- Go through all exercises for Ch 2, 3, 4, 5, 6 (answer keys are available, so no need to submit the results).

Hands-on exercise

Write a SAS program to calculate the number of days that have passed in 2012 up to 3/3/2012. The input is in the format date9.:

01JAN2012 03MAR2012

Answer: 62 days
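One possible solution to the hands-on exercise is sketched below; it is not the official answer key. SAS stores dates as day counts, so subtracting one date from another gives the number of days between them. The data set and variable names are illustrative.

```sas
/* Read both dates with the DATE9. informat and subtract them */
data work.days;
   input StartDate :date9. EndDate :date9.;
   DaysPassed = EndDate - StartDate;     /* SAS dates are day counts */
   format StartDate EndDate date9.;
   datalines;
01JAN2012 03MAR2012
;
run;

proc print data=work.days;
run;
```

For this input, DaysPassed is 62: 31 days in January, 29 in February (2012 is a leap year), and 2 in March.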

Making Use of SAS Enterprise Guide Code

- Import a text file
   - Example: Orders.txt
- Import an Excel file
   - Example: SupplyInfo.xls

Learn from Examples

- SAS Help Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS
- "Base Usage Guide Examples"
   - Chapter 3, 4

Import an Excel Sheet

proc import out=work.commrex
      datafile="C:\Lin\Shared\ISQS6339\Commrex_3358.xls"
      dbms=excel replace;
   sheet="Company";
   getnames=yes;
   mixed=no;
   scantext=yes;
   usedate=yes;
   scantime=yes;
run;

proc print data=work.commrex;
run;

Excel SAS/ACCESS LIBNAME Engine

libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';

proc print data=xlsdata.New1;
run;


Exercise 9: SAS Data Step Programming http://zlin.ba.ttu.edu/6339/ExerciseInstructions9.htm