Using SQL in SAS


SQL

Structured Query Language (SQL) is a standardized language for access and manipulation of various data structures.

SQL is implemented in SAS via PROC SQL

General Syntax

PROC SQL ;

ALTER TABLE ;

CONNECT ;

CREATE INDEX ;

CREATE TABLE ;

CREATE VIEW ;

DELETE ;

DESCRIBE ;

DISCONNECT ;

DROP ;

EXECUTE ;

INSERT ;

RESET ;

SELECT ;

UPDATE ;

VALIDATE ;
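As a rough illustration of how several of these statements can sit inside one PROC SQL invocation, here is a minimal sketch. It assumes the sashelp.class sample table that ships with SAS (it is not part of the original slides) and uses only DESCRIBE, VALIDATE, and SELECT.

```sas
/* Assumption: sashelp.class is available (standard SAS sample data). */
proc sql;
  /* DESCRIBE TABLE writes the table definition to the log */
  describe table sashelp.class;

  /* VALIDATE checks the syntax of a query without running it */
  validate
    select name, age
      from sashelp.class
      where age > 12;

  /* SELECT runs the query and prints the result */
  select name, age
    from sashelp.class
    where age > 12;
quit;
```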

SAS vs. SQL

In data processing, some standard terms differ between SAS and SQL:

General/Raw    SAS            SQL
File           Data Set       Table
Record         Observation    Row
Field          Variable       Column

Unique “Features” of PROC SQL

PROC SQL does not require a run statement (it uses a quit statement instead). This is because each SQL statement is executed as soon as it is submitted; the procedure then stays active until quit ends it.

How does this differ from typical SAS procedures?

Typical Execution of SAS Procedures

Suppose I have submitted a PROC MEANS step with a var statement, but without a run statement.

I get a message that PROC MEANS is running. Has it summarized the variable as specified?

Well, no. Suppose I amend it with a class statement and submit it.

PROC MEANS is still running, but it’s not recomputing the analysis for the given classes.

For typical procedures, statements are checked for proper syntax and compiled one at a time as they are submitted, but the step does not execute until it reaches a step boundary.

If you now submit a run statement, PROC MEANS will execute based on all statements specified.

For SQL, statements compile and execute immediately.
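The contrast described above can be sketched as follows. This is an illustration only: the sashelp.class sample table and the variables height and sex are assumptions, not the data used on the original slides.

```sas
/* Assumption: sashelp.class stands in for the data set on the slides. */

/* Submitted on their own in an interactive session, the next two   */
/* lines leave PROC MEANS "running": nothing has been computed yet. */
proc means data=sashelp.class;
  var height;

/* A class statement submitted afterwards is added to the same step. */
  class sex;

/* Only the run statement makes the step execute, using every */
/* statement collected so far.                                */
run;

/* In PROC SQL each statement executes as soon as it is submitted. */
proc sql;
  select mean(height) as mean_height
    from sashelp.class;          /* results appear here */

  select sex, mean(height) as mean_height
    from sashelp.class
    group by sex;                /* and again here */
quit;                            /* ends the procedure */
```

Submitted as one block, the program simply runs from top to bottom; the waiting behavior of PROC MEANS only shows up when the pieces are submitted separately.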

Other things about SQL…

SQL statements are made up of clauses

References to variables/columns or lists of data sets are separated by commas.

The select statement is the most important example of this structure…

Example

proc sql;                                  /* begin SQL processing */
  select region, pol_type, jobtotal,       /* select starts with a reference to variables/columns */
         0.02*jobtotal as incidental       /* (including, possibly, new ones)... */
    from mysas.projects                    /* ...and potentially contains several clauses; */
    where region in ('Beaumont','Boston')  /* this select statement has 3 clauses: */
    order by region, pol_type              /* from, where, and order by */
  ;
quit;                                      /* end SQL processing */

The Select Statement

Refers to a comma separated list of columns (variables) to be used/generated.

Allows for construction of new columns and for aliasing (assigning a variable name to a column with AS).

Has several clauses available…

Some Clauses

From: indicates which table/data set to read from.

Where: allows for conditional sub-setting of the data—similar to the where statement that can be used in any SAS procedure.

Order By: Sorts results by the column(s) specified (with the options ASC or DESC).
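A minimal sketch of these three clauses working together, assuming the sashelp.class sample table (not the data from the slides). The alias height_cm is a hypothetical name chosen for illustration.

```sas
/* Assumption: sashelp.class is available; height is recorded in inches. */
proc sql;
  select name, sex, age, height,
         height*2.54 as height_cm    /* new column with an alias */
    from sashelp.class               /* table to read from */
    where sex = 'F' and age >= 13    /* conditional subsetting */
    order by height desc, name      /* tallest first, ties by name (ASC is the default) */
  ;
quit;
```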

Summarizing/Grouping Data

SQL accommodates a variety of functions for summarizing data.

Some functions:

AVG (or MEAN)

COUNT (or FREQ or N)

CSS

CV

MAX

MIN

NMISS

PRT

RANGE

STD

STDERR

SUM

T

USS

VAR
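A few of the functions listed above can be tried in a single query. The sketch below assumes the sashelp.class sample table; because only summary functions appear in the select list, the result is a single row.

```sas
/* Assumption: sashelp.class is available (standard SAS sample data). */
proc sql;
  select count(*)      as n,
         nmiss(weight) as n_missing,
         mean(weight)  as mean_wt,
         std(weight)   as sd_wt,
         min(weight)   as min_wt,
         max(weight)   as max_wt,
         range(weight) as range_wt
    from sashelp.class
  ;
quit;
```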

Using Summary Functions

Summary functions are applied to columns in the select statement.

Suppose I try:

proc sql;
  select region, pol_type,
         mean(jobtotal) as jobmean,
         mean(0.02*jobtotal) as incidental
    from mysas.projects
    where region in ('Beaumont','Boston')
  ;
quit;

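I get the means of the requested columns, but the summary results are “merged” back on to the original data set: every selected row is returned, with the same overall means repeated on each one. The log notes this (NOTE: The query requires remerging summary statistics back with the original data.).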

Try this version:

proc sql;
  select region, pol_type, jobtotal,
         mean(jobtotal) as jobmean,
         mean(0.02*jobtotal) as incidental
    from mysas.projects
    where region in ('Beaumont','Boston')
  ;
quit;
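Remerging is not only a side effect; it can be put to use. As a sketch, assuming the sashelp.class sample table rather than the slides' data, the query below places the overall mean next to each row and computes each row's deviation from it. The column names mean_height and deviation are hypothetical.

```sas
/* Assumption: sashelp.class is available (standard SAS sample data). */
proc sql;
  select name, height,
         mean(height)          as mean_height,  /* overall mean, repeated on every row */
         height - mean(height) as deviation     /* row value minus the overall mean */
    from sashelp.class
  ;
quit;
```

The log should again carry the remerging note, since detail columns and summary functions are mixed without a group by clause.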

Producing Short Summaries

You can produce summaries similar to what you get using PROC MEANS with a CLASS statement using the GROUP BY clause.

proc sql;
  select region, pol_type,
         mean(jobtotal) as jobmean,
         mean(0.02*jobtotal) as incidental
    from mysas.projects
    where region in ('Beaumont','Boston')
    group by region, pol_type
  ;
quit;

Summaries are now computed for each group, giving one row per combination of region and pol_type. The behavior is similar to the group option in PROC REPORT.
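To see the correspondence with PROC MEANS directly, the two steps below can be run side by side. This is a sketch that assumes the sashelp.cars sample table, with origin and type standing in for region and pol_type.

```sas
/* Assumption: sashelp.cars is available (standard SAS sample data). */
proc sql;
  select origin, type,
         count(*)  as n,
         mean(msrp) as mean_msrp format=dollar12.2
    from sashelp.cars
    where origin in ('Asia','Europe')
    group by origin, type          /* one output row per origin/type pair */
  ;
quit;

/* The PROC MEANS counterpart: a class statement defines the groups. */
proc means data=sashelp.cars n mean;
  where origin in ('Asia','Europe');
  class origin type;
  var msrp;
run;
```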

Creating Tables for Output

To create an actual table (data set) from our query, we use the CREATE TABLE statement.

SELECT now becomes a clause within that statement…

Example:

proc sql;
  create table incidentals as              /* from here to the semicolon is one statement */
    select region, pol_type, jobtotal,
           0.02*jobtotal as incidental
      from mysas.projects
      where region in ('Beaumont','Boston')
      order by region, pol_type
  ;
quit;

Run PROC CONTENTS on the resulting data set, noting any special characteristics.

Aliases become variable names in output tables. If no alias is assigned, SAS creates a variable name.
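The naming behavior can be checked with a small experiment. The sketch below assumes the sashelp.class sample table; work.class_metric and height_cm are hypothetical names. One computed column is given an alias and the other is deliberately left without one, so that PROC CONTENTS shows what SAS calls it (typically a generated name along the lines of _TEMA001, though the exact form is best confirmed in your own output).

```sas
/* Assumption: sashelp.class is available (standard SAS sample data). */
proc sql;
  create table work.class_metric as
    select name, sex,
           height*2.54 as height_cm,   /* aliased: becomes variable height_cm */
           weight*0.4536               /* no alias: SAS generates a name */
      from sashelp.class
      order by sex, name
  ;
quit;

/* List the variables (and other attributes) of the new table. */
proc contents data=work.class_metric;
run;
```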