chapter 12 data transformations. section 12.1 reviewing sas functions


Upload: erick-parker

Post on 25-Dec-2015


Chapter 12

Data Transformations

Section 12.1

Reviewing SAS Functions

Objectives

Review the syntax of SAS functions.

SAS Functions

SAS provides a large library of functions for manipulating data during DATA step execution.

A SAS function is often categorized by the type of data manipulation performed:
- truncation
- character
- date and time
- mathematical
- trigonometric
- special
- sample statistics
- financial
- random number
- state and ZIP code

Syntax for SAS Functions

A SAS function is a routine that performs a computation or system manipulation and returns a value. Functions use arguments supplied by the user or by the operating environment.

General form of a SAS function:

    function-name(argument-1,argument-2,…,argument-n)

Using SAS Functions

You can use functions in executable DATA step statements anywhere that an expression can appear.

    data contrib;
       set prog2.donate;
       Total=sum(Qtr1,Qtr2,Qtr3,Qtr4);
       if Total ge 50;
    run;

    proc print data=contrib noobs;
    run;

Using SAS Functions

Partial PROC PRINT Output

    ID      Qtr1  Qtr2  Qtr3  Qtr4  Total

    E00224    12    33    22     .     67
    E00367    35    48    40    30    153
    E00441     .    63    89    90    242
    E00587    16    19    30    29     94
    E00621    10    12    15    25     62

What if you want to sum Qtr1 through Qtr400, instead of Qtr1 through Qtr4?

SAS Variable Lists

A SAS variable list is a shortcut method of referring to a list of variable names. SAS enables you to use the following variable lists:
- numbered range lists
- name range lists
- name prefix lists
- special SAS name lists

These methods can be used in many places where variable names are expected.

SAS Variable Lists: Numbered Range List

Syntax: x1-xn

A numbered range list specifies all variables from x1 to xn inclusive (including the variables named).

You can begin with any number and end with any number. You must follow the rules for user-supplied variable names, and the numbers must be consecutive.

    proc print data=prog2.donate;
       var id Qtr2-Qtr4;
    run;


SAS Variable Lists: Numbered Range List

What would be the result of this program if Qtr3 were not in the data set?

    proc print data=prog2.donate;
       var id Qtr2-Qtr4;
    run;

Because the variable Qtr3 is not in the data set, you get an error in the log. The error message indicates that the variable does not exist.

SAS Variable Lists: Name Range List

Syntax: StartVarName--StopVarName

A name range list specifies all variables ordered as they are in the program data vector, from StartVarName to StopVarName, inclusive.

Note that the syntax uses two hyphens.

    proc print data=fakedata;
       var id Name--Salary;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists: Name Range List

What is the result of the following program?

    proc print data=fakedata;
       var id Name--Salary;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

The output contains ID followed by Name, State, and Salary.

SAS Variable Lists: Name Range List

Syntax:

    StartVarName-NUMERIC-StopVarName
    StartVarName-CHARACTER-StopVarName

You can also use the keyword NUMERIC or CHARACTER between the hyphens to select only the variables of that data type within the range, inclusive.

    proc print data=fakedata;
       var id Name-character-JobCode;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists: Name Range List

What is the result of the following program?

    proc print data=fakedata;
       var id Name-character-Jobcode;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

The output contains ID followed by Name, State, and Jobcode. Salary is not displayed because it is a numeric variable.

SAS Variable Lists: Name Prefix List

Syntax: PartVarName:

Providing part of the variable name followed by a colon tells SAS that you want all the variables that start with that string.

The case of the string does not matter.

    data fakedata2;
       set fakedata;
       keep id S:;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists: Name Prefix List

What is the result of the following program?

    data fakedata2;
       set fakedata;
       keep id S:;
    run;

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

The new data set contains ID followed by State and Salary.

SAS Variable Lists: Special SAS Name Lists

Syntax:

    _ALL_
    _NUMERIC_
    _CHARACTER_

A special SAS name list specifies either all variables, all numeric variables, or all character variables that are defined in the current DATA step.

The case of the keyword does not matter.

    ID $ 4 | Year 8 | Name $ 24 | State $ 2 | Salary 8 | Jobcode $ 3

SAS Variable Lists

When you use a SAS variable list in a SAS function, use the keyword OF in front of the first variable name in the list.

    data contrib;
       set prog2.donate;
       Total=sum(of Qtr1-Qtr4);
       if Total ge 50;
    run;

If you omit the OF keyword, the function subtracts Qtr4 from Qtr1.
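The effect of the OF keyword can be seen in a small sketch. The data set donate_demo and the variables WithOf and WithoutOf are hypothetical names used only for illustration; the single row mirrors the E00224 values shown in the earlier partial output.

```sas
/* Hypothetical one-row stand-in for prog2.donate */
data donate_demo;
   input ID $ Qtr1-Qtr4;
   datalines;
E00224 12 33 22 .
;
run;

data of_demo;
   set donate_demo;
   /* Variable list: adds Qtr1 through Qtr4 and ignores the missing Qtr4 */
   WithOf    = sum(of Qtr1-Qtr4);
   /* Single argument: the expression Qtr1 minus Qtr4, which is missing here */
   WithoutOf = sum(Qtr1-Qtr4);
run;

proc print data=of_demo noobs;
run;
```

With these values, WithOf should be 67, while WithoutOf is missing because subtracting a missing Qtr4 yields a missing result.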

Section 12.2

Manipulating Character Values

Objectives

Use SAS functions and operators to extract, edit, and search character values.

A Mailing Label Application

The freqflyers data set contains information about frequent flyers.

Use this data set to create another data set suitable for mailing labels.

A Mailing Label Application

ID is a character variable. Its last digit represents the gender (1 denotes female, 2 denotes male) of the frequent flyer.

prog2.freqflyers

    ID      Name        Address1        Address2

    F31351  Farr,Sue    15 Harvey Rd.   Macon,Bibb,GA,31298
    F161    Cox,Kay B.  163 McNeil Pl.  Kern,Pond,CA,93280
    F212    Mason,Ron   442 Glen Ave.   Miami,Dade,FL,33054
    F25122  Ruth,G. H.  2491 Brady St.  Munger,Bay,MI,48747

A Mailing Label Application

labels

    FullName        Address1        Address2

    Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
    Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
    Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
    Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747

The first task is to create a title of Mr. or Ms. based on the last digit of ID.

The SUBSTR Function (Right Side)

The SUBSTR function is used to extract or replace characters.

This form of the SUBSTR function (right side of the assignment statement) extracts characters.

    NewVar=SUBSTR(string,start<,length>);

- string can be a character constant, variable, or expression.
- start specifies the starting position.
- length specifies the number of characters to extract. If omitted, the substring consists of the remainder of the expression.

If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the first argument to SUBSTR.

The SUBSTR Function (Right Side)

Extract two characters from Location, starting at position 11.

    State=substr(Location,11,2);

    Location  $ 18  Columbus, OH 43227
    State     $ 18  OH
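As a minimal sketch of right-side SUBSTR, the step below extracts both the state and the ZIP code from the same Location value. The data set name substr_demo and the variable Zip are assumptions added for illustration, and the explicit LENGTH statement is one way to avoid the default length of 18 described above.

```sas
data substr_demo;
   length Location $ 18 State $ 2 Zip $ 5;
   Location = 'Columbus, OH 43227';
   /* Two characters starting at position 11 */
   State = substr(Location, 11, 2);
   /* Length omitted: everything from position 14 to the end of the value */
   Zip = substr(Location, 14);
   put State= Zip=;
run;
```

The log should show State=OH and Zip=43227.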

A Mailing Label Application

    proc print data=prog2.freqflyers noobs;
       var ID;
    run;

PROC PRINT Output

    ID

    F31351
    F161
    F212
    F25122

In what position does the last digit of ID occur? In some values, the last digit is in column 6, and in others it is in column 4.

The RIGHT Function

The RIGHT function returns its argument right-aligned.

Trailing blanks are moved to the start of the value.

    NewVar=RIGHT(argument);

    NewID=right(ID);

    ID     $ 6  "F161  "
    NewID  $ 6  "  F161"

An argument can be a character constant, variable, or expression.

If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.

The LEFT Function

The LEFT function returns its argument left-aligned.

Leading blanks are moved to the end of the value.

    NewVar=LEFT(argument);

    NewID=left(ID);

    ID     $ 6  "  F161"
    NewID  $ 6  "F161  "

An argument can be a character constant, variable, or expression.

If the length of the created variable is not previously defined with a LENGTH statement, it is the same as the length of the argument.

A Mailing Label Application

    data labels;
       set prog2.freqflyers;
       if substr(right(ID),6)='1' then Title='Ms.';
       else if substr(right(ID),6)='2' then Title='Mr.';
    run;

    proc print data=labels noobs;
       var ID Title;
    run;

The result of the RIGHT function acts as the first argument to the SUBSTR function.
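The nesting of RIGHT inside SUBSTR can be tried on the four ID values alone. This is only a sketch: lastchar_demo and LastDigit are hypothetical names, and the ID values are typed in as datalines rather than read from prog2.freqflyers.

```sas
data lastchar_demo;
   length ID $ 6 LastDigit $ 1 Title $ 3;
   input ID $;
   /* RIGHT pushes the value to the end of its 6 bytes,
      so the last digit always lands in position 6 */
   LastDigit = substr(right(ID), 6, 1);
   if LastDigit = '1' then Title = 'Ms.';
   else if LastDigit = '2' then Title = 'Mr.';
   datalines;
F31351
F161
F212
F25122
;
run;

proc print data=lastchar_demo noobs;
run;
```

This approach depends on ID being stored with a length of 6; with a different length, the position passed to SUBSTR would need to change accordingly.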

A Mailing Label Application

PROC PRINT Output

    ID      Title

    F31351  Ms.
    F161    Ms.
    F212    Mr.
    F25122  Mr.

A Mailing Label Application

The next task is to separate the names of the frequent flyers into two parts.

    Name        FMName  LName

    Farr,Sue    Sue     Farr
    Cox,Kay B.  Kay B.  Cox

The SCAN Function

The SCAN function returns the nth word of a character value.

It is used to extract words from a character value when the relative order of words is known, but their starting positions are not.

    NewVar=SCAN(string,n<,delimiters>);

The SCAN Function

When the SCAN function is used,
- the length of the created variable is 200 bytes if it is not previously defined with a LENGTH statement
- delimiters before the first word have no effect
- any character or set of characters can serve as delimiters
- two or more contiguous delimiters are treated as a single delimiter
- a missing value is returned if there are fewer than n words in string
- if n is negative, the SCAN function selects the word in the character string starting from the end of string.

The SCAN Function

Extract the second word of Phrase.

    Second=scan(Phrase,2,' ');

    Phrase  $ 21   software and services   (words 1, 2, and 3)
    Second  $ 200  and

The SCAN Function

Extract the second word of Phrase.

    Second=scan(Phrase,2,':');

    Phrase  $ 21   software and:services   (word 1 is "software and", word 2 is "services")
    Second  $ 200  services


The SCAN Function

    data scan;
       Text='(Thursday July 4, 1776)';
       Var1=scan(Text,1);
       Var2=scan(Text,4);
       Var3=scan(Text,5);
       Var4=scan(Text,2,',');
       Var5=scan(Text,2,',)');
    run;

    Var1  $ 200  Thursday
    Var2  $ 200  1776
    Var3  $ 200  (missing)
    Var4  $ 200  " 1776)"  (begins with a space)
    Var5  $ 200  " 1776"   (begins with a space)
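A short sketch of SCAN applied to one of the Name values from the mailing label data, including a negative n. The data set scan_demo and the variable LastWord are hypothetical; lengths are set explicitly so the result does not depend on the default length.

```sas
data scan_demo;
   length Name $ 24 LName FMName LastWord $ 10;
   Name = 'Cox,Kay B.';
   /* Comma as the only delimiter: two words */
   LName  = scan(Name, 1, ',');
   FMName = scan(Name, 2, ',');
   /* Blank and comma as delimiters; -1 counts from the end of the value */
   LastWord = scan(Name, -1, ' ,');
   put LName= FMName= LastWord=;
run;
```

The log should show LName=Cox, FMName=Kay B., and LastWord=B. because the trailing blanks in Name act as delimiters and are skipped.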

A Mailing Label Application

    data labels;
       length FMName LName $ 10;
       set prog2.freqflyers;
       if substr(right(ID),6)='1' then Title='Ms.';
       else if substr(right(ID),6)='2' then Title='Mr.';
       FMName=scan(Name,2,',');
       LName=scan(Name,1,',');
    run;

A Mailing Label Application

    proc print data=labels noobs;
       var ID Name Title FMName LName;
    run;

PROC PRINT Output

    ID      Name        Title  FMName  LName

    F31351  Farr,Sue    Ms.    Sue     Farr
    F161    Cox,Kay B.  Ms.    Kay B.  Cox
    F212    Mason,Ron   Mr.    Ron     Mason
    F25122  Ruth,G. H.  Mr.    G. H.   Ruth

The next task is to join the values of Title, FMName, and LName into another variable.

Concatenation Operator

The concatenation operator joins character strings.

    NewVar=string1 !! string2;

Depending on the characters available on your keyboard, the symbol to concatenate character values can be two exclamation points (!!), two vertical bars (||), or two broken vertical bars (¦¦).

Concatenation Operator

Combine FMName and LName to create FullName.

    FullName=FMName !! LName;

    FMName    $ 10  Sue
    LName     $ 10  Farr
    FullName  $ 20  Sue       Farr

The TRIM Function

The TRIM function removes trailing blanks from its argument.

    NewVar=TRIM(argument1) !! argument2;

If the argument is blank, the TRIM function returns one blank.

The TRIMN function is similar but returns a null string (zero blanks) if the argument is blank.


The TRIM Function

    data trim;
       length FMName LName $ 10;
       FMName='Sue';
       LName='Farr';
       FullName1=trim(FMName);
       FullName2=trim(FMName) !! LName;
       FullName3=trim(FMName) !! ' ' !! LName;
    run;

    FullName1  $ 10  Sue
    FullName2  $ 20  SueFarr
    FullName3  $ 21  Sue Farr

The TRIM Function

The TRIM function does not remove leading blanks from a character argument. Use a combination of the TRIM and LEFT functions to remove leading and trailing blanks from a character argument.

If FMName contained leading blanks, the following assignment statement would correctly concatenate FMName and LName into FullName.

    FullName=trim(left(FMName)) !! ' ' !! LName;
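To see what TRIM and LEFT change, the sketch below compares a plain concatenation with the TRIM(LEFT()) form. The names trim_demo, Padded, Joined, PaddedLen, and JoinedLen are hypothetical, and the two leading blanks in FMName are added purely for illustration.

```sas
data trim_demo;
   length FMName LName $ 10 Padded Joined $ 21;
   FMName = '  Sue';   /* two leading blanks, assumed for the example */
   LName  = 'Farr';
   /* Plain concatenation keeps the leading and trailing blanks of FMName */
   Padded = FMName !! ' ' !! LName;
   /* LEFT moves leading blanks to the end, then TRIM removes them */
   Joined = trim(left(FMName)) !! ' ' !! LName;
   /* LENGTH returns the position of the last nonblank character */
   PaddedLen = length(Padded);
   JoinedLen = length(Joined);
   put PaddedLen= JoinedLen= Joined=;
run;
```

Here PaddedLen should be 15 and JoinedLen should be 8, since Joined holds Sue Farr with a single blank between the names.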

A Mailing Label Application

    data labels(keep=FullName Address1 Address2);
       length FMName LName $ 10;
       set prog2.freqflyers;
       if substr(right(ID),6)='1' then Title='Ms.';
       else if substr(right(ID),6)='2' then Title='Mr.';
       FMName=scan(Name,2,',');
       LName=scan(Name,1,',');
       FullName=Title !! ' ' !! trim(FMName) !! ' ' !! LName;
       Address2=scan(Address2,1,',') !! ', ' !!
                scan(Address2,3,',') !! ' ' !!
                scan(Address2,4,',');
    run;

A Mailing Label Application

    proc print data=labels noobs;
       var FullName Address1 Address2;
    run;

PROC PRINT Output

    FullName        Address1        Address2

    Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
    Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
    Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
    Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747

The CATX Function

The CATX function concatenates character strings, removes leading and trailing blanks, and inserts separators.

    CATX(separator, string-1, …, string-n)
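A minimal sketch of CATX on values like those in the mailing label data. The data set catx_demo is a hypothetical name, and the leading blanks in FMName are assumed only to show that CATX strips them.

```sas
data catx_demo;
   length Title $ 3 FMName LName $ 10 FullName $ 25;
   Title  = 'Ms.';
   FMName = '  Kay B.';   /* leading blanks, assumed for the example */
   LName  = 'Cox';
   /* One blank is inserted between the stripped arguments;
      the blank inside "Kay B." is kept */
   FullName = catx(' ', Title, FMName, LName);
   put FullName=;
run;
```

The log should show FullName=Ms. Kay B. Cox, with no need for separate TRIM or LEFT calls.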

A Mailing Label Application

    data labels(keep=FullName Address1 Address2);
       length FMName LName $ 10;
       set prog2.freqflyers;
       if substr(right(ID),6)='1' then Title='Ms.';
       else if substr(right(ID),6)='2' then Title='Mr.';
       FMName=scan(Name,2,',');
       LName=scan(Name,1,',');
       FullName=catx(' ',Title,FMName,LName);
       Address2=catx(' ',
                     scan(Address2,1,',') || ',',
                     scan(Address2,3,','),
                     scan(Address2,4,','));
    run;

A Mailing Label Application

    proc print data=labels noobs;
       var FullName Address1 Address2;
    run;

PROC PRINT Output

    FullName        Address1        Address2

    Ms. Sue Farr    15 Harvey Rd.   Macon, GA 31298
    Ms. Kay B. Cox  163 McNeil Pl.  Kern, CA 93280
    Mr. Ron Mason   442 Glen Ave.   Miami, FL 33054
    Mr. G. H. Ruth  2491 Brady St.  Munger, MI 48747

Exercise

This exercise reinforces the concepts discussed previously.

Exercises

The MIT Admissions Office received a list of students with perfect SAT scores. The file must be in a format that Admissions can use. Use the People data set to create a temporary SAS data set named Separate that contains the variables First, MI, and Last to perform the following tasks:

1. Create a First and MI variable that contains each person's first name and middle initial. Do not include in the Separate data set.

2. Use the Separate data set to create a temporary data set called flname that contains the variables NewName and CityState. NewName should be a concatenation of each person's first name and last name with one space between them.

3. Create a list report to view the results.

Exercises – Solution 1

    libname prog2 'your-directory';

    data separate(drop=FMnames);
       length FMnames First $ 30
              MI $ 2
              Last $ 30;
       set prog2.people;
       FMnames = left(scan(Name,2,','));
       First = scan(FMnames,1,' ');
       MI = left(scan(FMnames,2,' '));
       Last = scan(Name,1,',');
    run;

    proc print data=separate;
       var Name CityState First MI Last;
    run;

Exercises – Solution 2

    libname prog2 'your-directory';

    data flname(keep=NewName CityState);
       length FMname First MI Last $ 30;
       set prog2.people;
       Last = scan(Name,1,',');
       FMname = left(scan(Name,2,','));
       First = scan(FMname,1,' ');
       MI = scan(FMname,2,' ');
       NewName = trim(First) !! ' ' !! Last;
    run;

    proc print data=flname;
       var NewName CityState;
    run;

Exercises - Output

The SAS System

    Obs  NewName                CityState

      1  LINDSAY DEAN           WILMINGTON, NC
      2  HELEN-ASHE FLORENTINO  WASHINGTON, DC
      3  JAN VAN ALLSBURG       SHORT HILLS, NJ
      4  STANLEY LAFF           SPRINGFIELD, IL
      5  GEORGE RIZEN           CHICAGO, IL
      6  MARC MITCHELL          CHICAGO, IL
      7  DOROTHY MILLS          JOE, MT
      8  JONATHAN WEBB          MORRISVILLE, NC
      9  MAYNARD KEENAN         SEDONA, AZ
     10  PHYLLIS LACK           WALTHAM, MA
     11  KERRY THOMPSON         WINTER PARK, FL
     12  DOROTHY COX            TIMONIUM, MD
     13  DONALD SEPTOFF         BOSTON, MA
     14  JANICE PHOENIX         SOMERVILLE, NJ
     15  MURRAY HUNEYCUTT       DIME BOX, TX
     16  SHERRY ERICKSON        EL PASO, TX
     17  CLIVE SCHNEIDER        CAPE MAY, NJ
     18  KIMBERLY PUTNAM        DUNWOODY, GA
     19  JENNIFER PITTMAN       BENNINGTON, VT
     20  STACY ROLEN            CODY, WY

Exercises

The MIT Admissions Office likes to review applications based on merit and remove as much identifiable material as possible. They choose to label each application folder with an applicant's initials only.

Using the Separate data set that you recently created, create a temporary data set called init that contains only the variables Name and Initials.

The value of Initials should be a concatenation of the first characters from each person's first name, middle initial, and last name with no delimiters separating the characters.

Exercises

    data init(drop=FName MName LName FMNames);
       length Initials $ 3 LName FMNames FName MName $ 30;
       set prog2.People;
       LName=scan(Name,1,',');
       FMNames=scan(Name,2,',');
       FName=scan(FMNames,1,' ');
       MName=scan(FMNames,2,' ');
       /* Put together just the first letters. */
       Initials=substr(FName,1,1) !! substr(MName,1,1) !! substr(LName,1,1);
    run;

    proc print data=init;
       var Name CityState Initials;
    run;

Exercises

The SAS System

    Obs  Name                       CityState        Initials

      1  DEAN, LINDSAY A.           WILMINGTON, NC   LAD
      2  FLORENTINO, HELEN-ASHE H.  WASHINGTON, DC   HHF
      3  VAN ALLSBURG, JAN F.       SHORT HILLS, NJ  JFV
      4  LAFF, STANLEY X.           SPRINGFIELD, IL  SXL
      5  RIZEN, GEORGE Q.           CHICAGO, IL      GQR
      6  MITCHELL, MARC J.          CHICAGO, IL      MJM
      7  MILLS, DOROTHY E.          JOE, MT          DEM
      8  WEBB, JONATHAN W.          MORRISVILLE, NC  JWW
      9  KEENAN, MAYNARD J.         SEDONA, AZ       MJK
     10  LACK, PHYLLIS M.           WALTHAM, MA      PML
     11  THOMPSON, KERRY L.         WINTER PARK, FL  KLT
     12  COX, DOROTHY E.            TIMONIUM, MD     DEC
     13  SEPTOFF, DONALD E.         BOSTON, MA       DES
     14  PHOENIX, JANICE A.         SOMERVILLE, NJ   JAP
     15  HUNEYCUTT, MURRAY Y.       DIME BOX, TX     MYH
     16  ERICKSON, SHERRY A.        EL PASO, TX      SAE
     17  SCHNEIDER, CLIVE J.        CAPE MAY, NJ     CJS
     18  PUTNAM, KIMBERLY M.        DUNWOODY, GA     KMP
     19  PITTMAN, JENNIFER R.       BENNINGTON, VT   JRP
     20  ROLEN, STACY D.            CODY, WY         SDR

A Search Application

The ffhistory data set contains information about the history of each frequent flyer.

This history information consists of
- each membership level that the flyer attained (Bronze, Silver, or Gold)
- the year that the flyer attained each level.

Create a report that shows all frequent flyers who attained Silver membership status and the year each of them became Silver members.

A Search Application

ffhistory

    ID      Status                             SeatPref

    F31351  Silver 1998,Gold 2000              AISLE
    F161    Bronze 1999                        WINDOW
    F212    Bronze 1992,silver 1995            WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

To determine who attained Silver membership status, search the Status variable for the value Silver.

The FIND Function

The FIND function searches for a specific substring of characters within a character string that you specify, and returns its location.

    Position = FIND(target,value<,modifiers,startpos>);

The FIND function returns
- the starting position of the first occurrence of value within target, if value is found
- 0, if value is not found.

The FIND Function

A modifier can be the value of I or T.
- I indicates that the search is case insensitive.
- T indicates that the search ignores trailing blanks.

The modifiers can be combined. If they are omitted, the search is case sensitive and trailing blanks are taken into consideration.

The startpos is an integer that specifies the position at which the search should start and the direction of the search.
- Positive values = forward (right)
- Negative values = backward (left)

If omitted, the search starts at position 1 and moves right.

The FIND Function

Determine whether Text contains the string BULL'S-EYE.

    Text="This target contains a BULL'S-EYE.";
    Pos=find(Text,"BULL'S-EYE");

    Text  $ 34  This target contains a BULL'S-EYE.
    Pos   N 8   24


The FIND Function

    data index;
       Text='DELIMIT IT WITH BLANKS.';
       Pos1=find(Text,'IT');
       Pos2=find(Text,' IT ');
       Pos3=find(Text,'it');
       Pos4=find(Text,'it','I');
    run;

    Pos1  N 8  6
    Pos2  N 8  8
    Pos3  N 8  0
    Pos4  N 8  6
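The startpos argument is described above but not exercised in the example, so here is a small sketch using the same Text value. The data set find_demo and the variables FirstPos, SecondPos, and AnyCasePos are hypothetical names.

```sas
data find_demo;
   Text = 'DELIMIT IT WITH BLANKS.';
   /* Default: search forward from position 1 */
   FirstPos = find(Text, 'IT');
   /* Start at position 7, which skips the match in DELIMIT */
   SecondPos = find(Text, 'IT', 7);
   /* Modifier and startpos together: case-insensitive search from position 7 */
   AnyCasePos = find(Text, 'it', 'i', 7);
   put FirstPos= SecondPos= AnyCasePos=;
run;
```

FirstPos should be 6, and both SecondPos and AnyCasePos should be 9, the position of the standalone word IT.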

A Search Application

prog2.ffhistory

    ID      Status                             SeatPref

    F31351  Silver 1998,Gold 2000              AISLE
    F161    Bronze 1999                        WINDOW
    F212    Bronze 1992,silver 1995            WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

    data silver;
       set prog2.ffhistory;
       if find(Status,'silver','I') > 0;
    run;

A Search Application

    proc print data=silver noobs;
    run;

PROC PRINT Output

    ID      Status                             SeatPref

    F31351  Silver 1998,Gold 2000              AISLE
    F212    Bronze 1992,silver 1995            WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

The INDEX Function

The INDEX function searches a character argument for the location of a specified character value and returns its location.

    Position=INDEX(target,value);

The INDEX function returns
- the starting position of the first occurrence of value within target, if value is found
- 0, if value is not found.

The INDEX Function

- target specifies the character expression to search.
- value specifies the string of characters to search for in the character expression.

The search for value is literal. Capitalization and blanks are considered.

The INDEX function differs from the FIND function in that it
- does not have a modifier
- does not have startpos functionality.

The INDEX Function

Determine whether Text contains the string BULL'S-EYE.

    Text="This target contains a BULL'S-EYE.";
    Pos=index(Text,"BULL'S-EYE");

    Text  $ 34  This target contains a BULL'S-EYE.
    Pos   N 8   24


The INDEX Function

    data index;
       Text='DELIMIT IT WITH BLANKS.';
       Pos1=index(Text,'IT');
       Pos2=index(Text,' IT ');
       Pos3=index(Text,'it');
    run;

    Pos1  N 8  6
    Pos2  N 8  8
    Pos3  N 8  0


The INDEX Function

    data index2;
       length String $ 5;
       String='IT';
       Text='DELIMIT IT WITH BLANKS.';
       Pos4=index(Text,String);
       Pos5=index(Text,trim(String));
       Pos6=index(Text,' ' !! trim(String) !! ' ');
    run;

    String  $ 5  IT
    Pos4    N 8  0
    Pos5    N 8  6
    Pos6    N 8  8
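Pos4 is 0 because String is stored as IT followed by three trailing blanks, and INDEX searches for all five characters. As a sketch of an alternative to wrapping the value in TRIM, the FIND function's T modifier described earlier can be used. The names find_trim_demo, PosIndex, and PosFindT are hypothetical.

```sas
data find_trim_demo;
   length String $ 5;
   String = 'IT';
   Text   = 'DELIMIT IT WITH BLANKS.';
   /* INDEX looks for 'IT' plus its three trailing blanks, which do not occur in Text */
   PosIndex = index(Text, String);
   /* The T modifier makes FIND ignore the trailing blanks */
   PosFindT = find(Text, String, 'T');
   put PosIndex= PosFindT=;
run;
```

PosIndex should be 0 and PosFindT should be 6, matching Pos4 and Pos5 above.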

A Search Application

prog2.ffhistory

    ID      Status                             SeatPref

    F31351  Silver 1998,Gold 2000              AISLE
    F161    Bronze 1999                        WINDOW
    F212    Bronze 1992,silver 1995            WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

    data silver;
       set prog2.ffhistory;
       if index(Status,'Silver') > 0;
    run;


A Search Application

    proc print data=silver noobs;
    run;

PROC PRINT Output

    ID      Status                             SeatPref

    F31351  Silver 1998,Gold 2000              AISLE
    F25122  Bronze 1994,Gold 1996,Silver 1998  AISLE

Why was F212 not selected?

For F212, silver was stored in lowercase. You searched for Silver.

The UPCASE Function

The UPCASE function
- converts all letters in its argument to uppercase
- has no effect on digits and special characters.

    NewVal=UPCASE(argument);

A Search Application

    data silver(drop=Location);
       length Year $ 4;
       set prog2.ffhistory;
       Location=index(upcase(Status),'SILVER');
       if Location > 0;
       Year=substr(Status,Location+7,4);
    run;

    proc print data=silver noobs;
       var ID Status Year SeatPref;
    run;

A Search Application

PROC PRINT Output

    ID      Status                             Year  SeatPref

    F31351  Silver 1998,Gold 2000              1998  AISLE
    F212    Bronze 1992,silver 1995            1995  WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  1998  AISLE

Did the values of Status permanently change?

    Year=substr(Status,Location+7,4);

Location is the position where Silver begins in Status, and Location+7 is the position where the year begins.
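The Location+7 arithmetic can be traced on a single value. This is only a sketch: year_demo is a hypothetical data set, and the Status value is copied from the F212 row.

```sas
data year_demo;
   length Status $ 40 Year $ 4;
   Status = 'Bronze 1992,silver 1995';
   /* UPCASE is applied only inside the search; Status itself is not reassigned */
   Location = index(upcase(Status), 'SILVER');
   /* 'silver' is 6 characters plus 1 blank, so the year starts 7 positions later */
   Year = substr(Status, Location+7, 4);
   put Location= Year= Status=;
run;
```

Location should be 13 and Year should be 1995, while Status still contains the lowercase silver.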

The PROPCASE Function

The PROPCASE function converts all words in an argument to proper case, in which the first letter is uppercase and the remaining letters are lowercase.

    NewVal=PROPCASE(argument <,delimiter(s)>);

Default delimiters for the PROPCASE function are the blank, forward slash, hyphen, open parenthesis, period, and tab characters.

A Search Application

    data silver(drop=Location);
       length Year $ 4;
       set prog2.ffhistory;
       Status=propcase(Status,' ,');
       Location=find(Status,'Silver');
       if Location > 0;
       SeatPref=propcase(SeatPref);
       Year=substr(Status,Location+7,4);
    run;

    proc print data=silver noobs;
       var ID Status Year SeatPref;
    run;

A Search Application

PROC PRINT Output

    ID      Status                             Year  SeatPref

    F31351  Silver 1998,Gold 2000              1998  Aisle
    F212    Bronze 1992,Silver 1995            1995  Window
    F25122  Bronze 1994,Gold 1996,Silver 1998  1998  Aisle

The TRANWRD Function

The TRANWRD function replaces or removes all occurrences of a given word (or a pattern of characters) within a character string.

    NewVal=TRANWRD(source,target,replacement);

- source specifies the source string that you want translated.
- target specifies the string searched for in source.
- replacement specifies the string that replaces the target.

The TRANWRD function does not remove trailing blanks from target or replacement.

The TRANWRD Function

If the length of the new variable is not previously defined with a LENGTH statement, the default length is 200 bytes.

Using the TRANWRD function to replace an existing string with a longer string might cause truncation of the resulting value if a LENGTH statement is not used.

The TRANWRD Function

Replace the first word of Dessert.

    Dessert=tranwrd(Dessert,'Pumpkin','Apple');

    Before:  Dessert  $ 20  Pumpkin pie
    After:   Dessert  $ 20  Apple pie

A Search Application

    data silver(drop=Location);
       length Year $ 4;
       set prog2.ffhistory;
       Status=tranwrd(Status,'silver','Silver');
       Location=index(Status,'Silver');
       if Location > 0;
       Year=substr(Status,Location+7,4);
    run;

    proc print data=silver noobs;
       var ID Status Year SeatPref;
    run;

A Search Application

PROC PRINT Output

    ID      Status                             Year  SeatPref

    F31351  Silver 1998,Gold 2000              1998  AISLE
    F212    Bronze 1992,Silver 1995            1995  WINDOW
    F25122  Bronze 1994,Gold 1996,Silver 1998  1998  AISLE

The LOWCASE Function

The LOWCASE function
- converts all letters in its argument to lowercase
- has no effect on digits and special characters.

    NewVal=LOWCASE(argument);

The SUBSTR Function (Left Side)

The SUBSTR function is used to extract or replace characters.

This form of the SUBSTR function (left side of the assignment statement) replaces characters in a character variable.

    SUBSTR(string,start<,length>)=value;

- string specifies a character variable.
- start specifies a numeric expression that is the beginning character position.
- length specifies a numeric expression that is the length of the substring to be replaced.

The length value cannot be larger than the remaining length of string after start (including trailing blanks).

If you omit length, SAS uses all the characters on the right side of the assignment statement to replace the values of string up to the limit.

The SUBSTR Function (Left Side)

Replace two characters from Location starting at position 11.

    substr(Location,11,2)='OH';

    Before:  Location  $ 18  Columbus, GA 43227
    After:   Location  $ 18  Columbus, OH 43227
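A minimal runnable version of this replacement, with substr_left_demo as a hypothetical data set name:

```sas
data substr_left_demo;
   length Location $ 18;
   Location = 'Columbus, GA 43227';
   /* Left-side SUBSTR overwrites positions 11 and 12 in place */
   substr(Location, 11, 2) = 'OH';
   put Location=;
run;
```

The log should show Location=Columbus, OH 43227. The rest of the value, including the ZIP code, is untouched.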


The LOWCASE Function

    data silver;
       set silver;
       substr(SeatPref,2)=lowcase(substr(SeatPref,2));
    run;

    SeatPref (before)            $ 6  AISLE
    substr(SeatPref,2)                ISLE
    lowcase(substr(SeatPref,2))       isle
    SeatPref (after)             $ 6  Aisle

A Search Application

    proc print data=silver noobs;
       var ID Year SeatPref;
    run;

PROC PRINT Output

    ID      Year  SeatPref

    F31351  1998  Aisle
    F212    1995  Window
    F25122  1998  Aisle

Exercise

This exercise reinforces the concepts discussed previously.

Exercises

The Pizza Company tracks feedback from customers. The person who recorded the data misspelled the word received in the data set Complaint.

Write a program to correct this data mistake. Create a new data set called Complaints2 that contains the corrected data. Create a list report to view the results.

Exercises – Solution

    libname apex 'SAS-directory';

    data complaints2;
       set apex.complaint;
       Complaint = tranwrd(Complaint, 'recieve', 'receive');
    run;

    proc print data=complaints2;
    run;


Exercises – Output

Exercises

The SAS Snack Company wants to create categories for snacks.

Using the Snacks data set, create a temporary SAS data set named Snack_new.

1. Create a variable called Category. The value will depend on the product and should be one of the following: Chips, Pretzels, Pretzel Sticks, Popcorn, Pork Rinds, Crackers, or Puffs. (Note: Pretzel Sticks should go to the Pretzel Category, as well as other snacks with Pretzel in the name.)

2. Create a variable called TotalSales that is the product of the quantity sold and the price. Format the value with dollar signs and two decimal places.

3. Create a frequency report of Category to make sure that all the rows are mapped correctly.

4. Create a list report to view the results.

Exercises - Solution

data snack_new;
   set snacks;
   length Category $ 12;
   if index(lowcase(Product),'chip') > 0 then Category="Chips";
   else if index(lowcase(Product),'pretzel') > 0 then Category="Pretzels";
   else if index(lowcase(Product),'stick') > 0 then Category="Sticks";
   else if index(lowcase(Product),'popcorn') > 0 then Category="Popcorn";
   else if index(lowcase(Product),'pork rinds') > 0 then Category="Pork Rinds";
   else if index(lowcase(Product),'cracker') > 0 or
           index(lowcase(Product),'saltine') > 0 then Category="Crackers";
   else if index(lowcase(Product),'puffs') > 0 then Category="Puffs";
   TotalSales=QtySold*Price;
   format TotalSales dollar20.2;
run;

proc freq data=snack_new;
   tables category;
run;

Exercises - Output

The FREQ Procedure

                                     Cumulative    Cumulative
Category      Frequency    Percent    Frequency       Percent
--------------------------------------------------------------
Chips             11242      31.43        11242         31.43
Crackers           5110      14.29        16352         45.71
Popcorn            4088      11.43        20440         57.14
Pork Rinds         3066       8.57        23506         65.71
Pretzels           3066       8.57        26572         74.29
Puffs              3066       8.57        29638         82.86
Sticks             6132      17.14        35770        100.00

Exercises

As part of the financial aid process, MIT helps students identify scholarship opportunities from external sources.

Use the People data set to create the temporary Prairie data set.

Use the appropriate function to search through values of CityState to identify only those applicants from Illinois.

Exercises

data prairie;
   set prog2.people;
   if index(CityState,' IL') > 0;
run;

proc print data=Prairie;
run;

The SAS System

Obs  Name               CityState

  1  LAFF, STANLEY X.   SPRINGFIELD, IL
  2  RIZEN, GEORGE Q.   CHICAGO, IL
  3  MITCHELL, MARC J.  CHICAGO, IL

Exercises

In order to mail the scholarship information to the proper applicants, MIT needs the address information in the proper format for a letter.

1. Use the variable Name from Prairie to create a data set called mixedprairie that contains the values of Name.

2. Convert Name from uppercase to mixed case.

Exercises

data mixedprairie;
   set prairie;
   Name=propcase(Name);
run;

proc print data=mixedprairie;
run;
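As a quick check of what PROPCASE does to one of the Prairie names, here is a small sketch that does not need the prairie data set. The hard-coded value and the $20 length are assumptions for illustration.

```sas
data _null_;
   length Name $ 20;
   Name='LAFF, STANLEY X.';
   /* Capitalize the first letter of each word and lowercase the rest. */
   Name=propcase(Name);
   put Name=;   /* Name=Laff, Stanley X. */
run;
```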

Section 12.3

Manipulating Numeric Values

Objectives

Use SAS functions to truncate numeric values.
Use SAS functions to compute sample statistics of numeric values.

Truncation Functions

Selected functions that truncate numeric values include the following:

ROUND function
CEIL function
FLOOR function
INT function

The ROUND Function

The ROUND function returns a value rounded to the nearest round-off unit.

NewVar=ROUND(argument<,round-off-unit>);

If round-off-unit is not provided, argument is rounded to the nearest integer.

The ROUND Function

data truncate;
   NewVar1=round(12.12);
   NewVar2=round(42.65,.1);
   NewVar3=round(6.478,.01);
   NewVar4=round(96.47,10);
run;

NewVar1  NewVar2  NewVar3  NewVar4

12       42.7     6.48     100

The CEIL Function

The CEIL function returns the smallest integer greater than or equal to the argument.

NewVar=CEIL(argument);

x=ceil(4.4);

On a number line from 3 to 5, the argument 4.4 lies between 4 and 5, so x=5.

The FLOOR Function

The FLOOR function returns the greatest integer less than or equal to the argument.

NewVar=FLOOR(argument);

y=floor(3.6);

On a number line from 3 to 5, the argument 3.6 lies between 3 and 4, so y=3.

The INT Function

The INT function returns the integer portion of the argument.

NewVar=INT(argument);

z=int(3.9);

On a number line from 3 to 5, the argument 3.9 lies between 3 and 4, and its integer portion is 3, so z=3.

Truncation Functions

data truncate;
   Var1=6.478;
   NewVar1=ceil(Var1);
   NewVar2=floor(Var1);
   NewVar3=int(Var1);
run;

Var1   NewVar1  NewVar2  NewVar3

6.478  7        6        6

Truncation Functions

Use the same functions with a negative value for the variable Var1.

data truncate;
   Var1=-6.478;
   NewVar1=ceil(Var1);
   NewVar2=floor(Var1);
   NewVar3=int(Var1);
run;

Var1    NewVar1  NewVar2  NewVar3

-6.478  -6       -7       -6

For values greater than or equal to 0, the FLOOR and INT functions return the same value. For values less than or equal to 0, the CEIL and INT functions return the same value.

Functions That Compute Statistics

Selected functions that compute sample statistics based on a group of values include the following:

SUM function (total of values)
MEAN function (average of values)
MIN function (lowest value)
MAX function (highest value)

These functions

accept multiple arguments in any order
use the same algorithm as SAS statistical procedures
ignore missing values.

Functions That Compute Statistics

The SUM function adds values together and ignores missing values.

NewVar=SUM(argument-1,argument-2,…,argument-n);

The MIN function returns the smallest nonmissing value.

MIN(argument-1,argument-2,…,argument-n);

The MAX function returns the largest nonmissing value.

MAX(argument-1,argument-2,…,argument-n);
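The following sketch illustrates how these functions treat a missing argument. The variable names and test scores are made up for illustration and do not come from any course data set.

```sas
data _null_;
   Test1=80;
   Test2=.;      /* missing value */
   Test3=95;
   Lowest=min(Test1,Test2,Test3);    /* the missing value is ignored */
   Highest=max(Test1,Test2,Test3);
   Total=sum(Test1,Test2,Test3);
   put Lowest= Highest= Total=;      /* Lowest=80 Highest=95 Total=175 */
run;
```

The missing Test2 does not make any of the three results missing.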

The SUM Function

data summary;
   Var1=12;
   Var2=.;
   Var3=6;
   NewVar=sum(Var1,Var2,Var3);
run;

Var1  Var2  Var3  NewVar

12    .     6     18

What would be the value of NewVar if an arithmetic operator were used instead of the SUM function?

Missing
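The difference between the SUM function and the + operator can be checked side by side. This is a small sketch; the names WithSum and WithPlus are introduced here only for illustration.

```sas
data _null_;
   Var1=12;
   Var2=.;
   Var3=6;
   WithSum=sum(Var1,Var2,Var3);   /* ignores the missing value */
   WithPlus=Var1+Var2+Var3;       /* a missing operand makes the result missing */
   put WithSum= WithPlus=;        /* WithSum=18 WithPlus=. */
run;
```

When the + operator meets a missing operand, SAS also writes a note to the log that missing values were generated.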

The MEAN Function

The MEAN function calculates the arithmetic mean (average) of values and ignores missing values.

NewVar=MEAN(argument-1,argument-2,…,argument-n);

The MEAN Function

data summary;
   Var1=12;
   Var2=.;
   Var3=6;
   NewVar=mean(Var1,Var2,Var3);
run;

Var1  Var2  Var3  NewVar

12    .     6     9


This exercise reinforces the concepts discussed previously.

Exercise

Exercises

Final grades are coming.

Use the data set Grade to create a data set named Final. The Final data set should contain a variable named Overall that is the semester average grade.

Calculate Overall by averaging all the tests plus the final. The final is weighted twice as much as any of the other tests. (Count the final twice when calculating Overall.)

Round Overall to the nearest integer.

Exercises

data final;
   set prog.grade;
   Overall=round(mean(Test1,Test2,Test3,Final,Final));
run;

proc print data=final;
run;

Alternate Solution

data final;
   set prog.grade;
   Overall=round(mean(of Test1-Test3,Final,Final));
run;

proc print data=final;
run;

Exercises

The SAS System

SSN          Course     Test1  Test2  Test3  Final  Overall

012-40-4928  BUS450        80     70     80     80       78
012-83-3816  BUS450        90     90     60     80       80
341-44-0781  MATH400       78     87     90     91       87
423-01-7721  BUS450        80     70     75     95       83
448-23-8111  MATH400       88     91    100     95       94
723-14-8422  HIST100       88     90     91     95       92
819-32-1294  HIST100       67     80     60     70       69
831-34-2411  MATH400       72     76     82     79       78
837-33-8374  HIST100       90     99     87     96       94
877-22-7731  MATH400       87     85     80     78       82
880-90-0783  HIST400       50     70     78     80       72
920-22-0209  MATH400       79     87     81     82       82
973-34-2119  BUS450        80     75     88     90       85
877-22-7731  SCI400        80     70     80     80       78
012-40-4928  FRENCH100     80     70     80     80       78
819-32-1294  FRENCH100     67     80     60     70       69
819-32-1294  BUS450        67     80     60     70       69
723-14-8422  SCI400        79     87     81     82       82
837-33-8374  SCI400        79     87     81     82       82


Exercises

Final grades are coming.

Modify the DATA step in the previous exercise so that the value of Overall is the average of the two highest test scores and the final. (The lowest test should not be used to calculate Overall.)

As before, the final exam should be counted twice.

Round Overall to the nearest integer.

Exercises

data final(drop=OverallTotal);
   set prog2.grade;
   OverallTotal=sum(Test1,Test2,Test3,Final,Final)-min(Test1,Test2,Test3);
   Overall=round(OverallTotal/4);
run;

proc print data=final;
run;

Alternate Solution

data final(drop=OverallTotal);
   set prog2.grade;
   OverallTotal=sum(of Test1-Test3,Final,Final)-min(of Test1-Test3);
   Overall=round(OverallTotal/4);
run;

proc print data=final;
run;

Exercises

The SAS System

Obs  SSN          Course     Test1  Test2  Test3  Final  Overall

  1  012-40-4928  BUS450        80     70     80     80       80
  2  012-83-3816  BUS450        90     90     60     80       85
  3  341-44-0781  MATH400       78     87     90     91       90
  4  423-01-7721  BUS450        80     70     75     95       86
  5  448-23-8111  MATH400       88     91    100     95       95
  6  723-14-8422  HIST100       88     90     91     95       93
  7  819-32-1294  HIST100       67     80     60     70       72
  8  831-34-2411  MATH400       72     76     82     79       79
  9  837-33-8374  HIST100       90     99     87     96       95
 10  877-22-7731  MATH400       87     85     80     78       82
 11  880-90-0783  HIST400       50     70     78     80       77
 12  920-22-0209  MATH400       79     87     81     82       83
 13  973-34-2119  BUS450        80     75     88     90       87
 14  877-22-7731  SCI400        80     70     80     80       80
 15  012-40-4928  FRENCH100     80     70     80     80       80
 16  819-32-1294  FRENCH100     67     80     60     70       72
 17  819-32-1294  BUS450        67     80     60     70       72
 18  723-14-8422  SCI400        79     87     81     82       83
 19  837-33-8374  SCI400        79     87     81     82       83

Section 12.4

Manipulating Numeric Values Based on Dates

Objectives

Review SAS functions used to create SAS date values.
Review SAS functions used to extract information from SAS date values.
Use SAS functions to determine the intervals between two SAS date values.

Creating SAS Date Values

You can use the MDY or TODAY functions to create SAS date values.

The MDY function creates a SAS date value from month, day, and year values.

NewDate=MDY(month,day,year);

The TODAY function returns the current date as a SAS date value.

NewDate=TODAY();
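A short sketch of both functions follows. The variable names Birthday and Current and the sample date are assumptions for illustration. Each PUT item writes a value to the log, first unformatted and then with the DATE9. format.

```sas
data _null_;
   Birthday=mdy(11,8,1972);   /* November 8, 1972 */
   Current=today();           /* date on which the step runs */
   /* A SAS date value is a count of days since January 1, 1960. */
   put Birthday= Birthday=date9.;   /* Birthday=4695 Birthday=08NOV1972 */
   put Current= Current=date9.;     /* depends on the run date */
run;
```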

Extracting Information

You can use the MONTH, DAY, and YEAR functions to extract information from SAS date values.

The MONTH function creates a numeric value (1-12) that represents the month of a SAS date value.

NewMonth=MONTH(SAS-date-value);

The DAY function creates a numeric value (1-31) that represents the day of a SAS date value.

NewDay=DAY(SAS-date-value);

The YEAR function creates a four-digit numeric value that represents the year.

NewYear=YEAR(SAS-date-value);
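The three extraction functions can be tried on a date constant. In this sketch the variable SaleDate is a made-up name used only for illustration.

```sas
data _null_;
   SaleDate='03may2008'd;        /* SAS date constant */
   NewMonth=month(SaleDate);
   NewDay=day(SaleDate);
   NewYear=year(SaleDate);
   put NewMonth= NewDay= NewYear=;   /* NewMonth=5 NewDay=3 NewYear=2008 */
run;
```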

Calculating an Interval of Years

The YRDIF function returns the number of years between two SAS date values.

NewVal=YRDIF(sdate,edate,basis);

Calculating an Interval of Years

The basis argument can take the following values:

'ACT/ACT' uses the actual number of days between the dates. The number of years is the number of days that fall in 365-day years divided by 365, plus the number of days that fall in 366-day years divided by 366. 'ACTUAL' is an alias.

'30/360' treats each month as having 30 days and each year as having 360 days, regardless of the actual number. '360' is an alias.

'ACT/360' uses the actual number of days between the dates, divided by 360, regardless of the actual number of days in a year.

'ACT/365' uses the actual number of days between the dates, divided by 365, regardless of the actual number of days in a year.

The YRDIF Function

The variable DOB represents a person's date of birth.

Assume today's date is May 3, 2008, and DOB is November 8, 1972. What is this person's age?

MyVal=yrdif(DOB,'3may2008'd,'act/act');

MyVal

35.4836

The DATDIF function can be used to return the number of days between two SAS date values. Only two basis values are valid for the DATDIF function: 'ACT/ACT' and '30/360'.

The YRDIF Function

How can you alter this program to

compute each employee's age based on today's date
truncate all of the decimal places without rounding?

MyVal=int(yrdif(DOB,today(),'act/act'));

This code was run on July 13, 2008. Your values will differ.

MyVal

35
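The YRDIF, INT, and DATDIF calculations can be reproduced with fixed dates so that the result does not depend on the run date. This is a sketch only: the variable names are made up, and a date constant replaces TODAY() on purpose.

```sas
data _null_;
   DOB='08nov1972'd;
   AsOf='03may2008'd;                 /* fixed stand-in for today() */
   Years=yrdif(DOB,AsOf,'act/act');   /* fractional years */
   Age=int(Years);                    /* drop the decimals without rounding */
   Days=datdif(DOB,AsOf,'act/act');   /* actual days between the dates */
   put Years=8.4 Age= Days=;          /* Years=35.4836 Age=35 Days=12960 */
run;
```

Replacing AsOf with today() gives the age as of the day the step runs.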


This exercise reinforces the concepts discussed previously.

Exercise

Exercises

In order to vote in most states, voters must be 18 years of age by the date of the election.

Use the Register data set to create two new data sets called Voters and NonVoters.

Use the existing BirthMonth, Day, and BirthYear to create a new variable called Birthday that stores the SAS date value for each voter's birth date.

Create a second new variable called Age that stores the number of years between each voter's birthday and today.

The values of Birthday should be displayed with the DATE9. format. The value of Age should be truncated to remove all decimals without rounding.

Produce two listing reports with appropriate titles.

Exercises

data voters nonvoters;
   keep StudentId Name Birthday Age;
   set prog.register;
   Birthday=mdy(BirthMonth,Day,BirthYear);
   /* The FLOOR function could be used in the
      following assignment statement. */
   Age=int(yrdif(Birthday,today(),'act/act'));
   format Birthday date9.;
   if Age >= 18 then output voters;
   else output nonvoters;
run;

title 'Students who are over 18 and can vote';
proc print data=voters;
run;

title 'Students who are not over 18 and cannot vote';
proc print data=nonvoters;
run;

Exercises

Partial Listing of Voters

Students who are over 18 and can vote

Obs  StudentId  Name              Birthday   Age

  1  1155       Angel Reisman     23JUN1987   19
  2  1266       Melanie Michaels  17MAR1988   18
  3  2055       Faith Sadowski    06FEB1988   18
  4  2561       Dorothy Gilbert   16APR1988   18
  5  2584       Patrice Ray       18AUG1988   18
  6  2587       Jeremiah Ashford  26SEP1987   19
  7  2600       Alisha Gurman     21DEC1987   18
  8  2606       Gustavo Spencer   16SEP1988   18
  9  2681       Ryan Lin          03MAR1988   18
 10  3213       Thomas Gladwell   09JUN1988   18
 11  3250       Misty Orlowski    22OCT1987   19
 12  3456       Ruby Abdul        09JUN1988   18

Your output will contain a different number of rows.

Exercises

Partial Listing of NonVoters

Students who are not over 18 and cannot vote

Obs  StudentId  Name             Birthday   Age

  1  1005       Chaz Richardson  21JUN1989   17
  2  1154       Barbara Muir     04APR1990   16
  3  1245       Leticia Ritter   27MAR1990   16
  4  1257       Richard Calle    01SEP1989   17
  5  1258       Ronnie Trimpin   03OCT1989   17
  6  2001       Troy Pruska      05AUG1989   17
  7  2006       Annie Ritter     25FEB1990   16
  8  2046       Derrick Ikiz     05MAR1989   17
  9  2334       Jesse Liu        03AUG1991   15
 10  2335       Taylor Lowet     07AUG1989   17
 . . .

Your output will contain a different number of rows.

Section 12.5

Converting Variable Type

Objectives

Describe the automatic conversion of character data into numeric data.
Explicitly convert character data into numeric data.
Describe the automatic conversion of numeric data into character data.
Explicitly convert numeric data into character data.

Data Conversion

In many applications, you might need to convert one data type to another.

You might need to read digits in character form into a numeric value.
You might need to write a numeric value to a character string.

Data Conversion

You can convert data types by using one of the following methods:

implicitly, by letting SAS do it for you
explicitly, with these functions:
   INPUT  character-to-numeric conversion
   PUT    numeric-to-character conversion

Automatic Character-to-Numeric Conversion

The prog2.salary1 data set contains a character variable GrossPay. Compute a 10-percent bonus for each employee.

What will happen when the character values of GrossPay are used in an arithmetic expression?

Automatic Character-to-Numeric Conversion

prog2.salary1

ID ($11)     GrossPay ($5)

201-92-2498  52000
482-87-7945  32000
330-40-7172  49000

data bonuses;
   set prog2.salary1;
   Bonus=.10*GrossPay;
run;

Automatic Character-to-Numeric Conversion

Partial Log

2    data bonuses;
3       set prog2.salary1;
4       Bonus=.10*GrossPay;
5    run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      4:14
NOTE: The data set WORK.BONUSES has 3 observations and 3 variables.

Automatic Character-to-Numeric Conversion

proc print data=bonuses noobs;
run;

PROC PRINT Output

ID           GrossPay  Bonus

201-92-2498  52000      5200
482-87-7945  32000      3200
330-40-7172  49000      4900

Automatic Character-to-Numeric Conversion

SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as the following:

assignment to a numeric variable
an arithmetic operation
logical comparison with a numeric value
a function that takes numeric arguments

Automatic Character-to-Numeric Conversion

The automatic conversion

uses the w. informat
produces a numeric missing value from a character value that does not conform to standard numeric notation (digits with optional decimal point and/or leading sign and/or E-notation).

Automatic Character-to-Numeric Conversion

Character value  Numeric value after automatic conversion

52000            52000
1.243E1          12.43
-8.96            -8.96
1,742.64         .
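The four conversions on this slide can be reproduced in one step. In this sketch the names CharValue and NumValue are illustrative, and multiplying by 1 is just one convenient way to put the character value into a numeric context.

```sas
data _null_;
   length CharValue $ 8;
   do CharValue='52000','1.243E1','-8.96','1,742.64';
      NumValue=CharValue*1;   /* arithmetic forces automatic conversion */
      put CharValue= NumValue=;
   end;
run;
```

The first three values convert to 52000, 12.43, and -8.96. The value with the comma yields a missing NumValue, and SAS writes an invalid numeric data note to the log.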

The INPUT Function

The INPUT function is used primarily for converting character values to numeric values.

NumVar=INPUT(source,informat);

The INPUT function returns the value produced when source is read with informat.

The INPUT Function

data conversion;
   CVar1='32000';
   CVar2='32,000';
   CVar3='03may2008';
   CVar4='050308';
   NVar1=input(CVar1,5.);
   NVar2=input(CVar2,comma6.);
   NVar3=input(CVar3,date9.);
   NVar4=input(CVar4,mmddyy6.);
run;

The INPUT Function

proc contents data=conversion;
run;

Partial PROC CONTENTS Output

----Alphabetic List of Variables and Attributes----

#  Variable  Type  Len
----------------------
1  CVar1     Char    5
2  CVar2     Char    6
3  CVar3     Char    9
4  CVar4     Char    6
5  NVar1     Num     8
6  NVar2     Num     8
7  NVar3     Num     8
8  NVar4     Num     8

The INPUT Function

proc print data=conversion noobs;
run;

PROC PRINT Output

CVar1  CVar2   CVar3      CVar4   NVar1  NVar2  NVar3  NVar4

32000  32,000  03may2008  050308  32000  32000  17655  17655

Explicit Character-to-Numeric Conversion

The values of the variable GrossPay in the SAS data set prog2.salary2 contain commas. Attempt to use automatic conversion to compute a 10-percent bonus.

prog2.salary2

ID ($11)     GrossPay ($6)

201-92-2498  52,000
482-87-7945  32,000
330-40-7172  49,000

Explicit Character-to-Numeric Conversion

data bonuses;
   set prog2.salary2;
   Bonus=.10*GrossPay;
run;

proc print data=bonuses;
run;

PROC PRINT Output

ID           GrossPay  Bonus

201-92-2498  52,000        .
482-87-7945  32,000        .
330-40-7172  49,000        .

Explicit Character-to-Numeric Conversion

data bonuses;
   set prog2.salary2;
   Bonus=.10*input(GrossPay,comma6.);
run;

proc print data=bonuses;
run;

PROC PRINT Output

ID           GrossPay  Bonus

201-92-2498  52,000     5200
482-87-7945  32,000     3200
330-40-7172  49,000     4900

c05s5d2.sas

Data Conversion

proc contents data=bonuses;
run;

Partial PROC CONTENTS Output

----Alphabetic List of Variables and Attributes----

#  Variable  Type  Len
----------------------
3  Bonus     Num     8
2  GrossPay  Char    6
1  ID        Char   11

How can you convert GrossPay to a numeric variable with the same name?

Data Conversion

You cannot convert data by assigning the converted variable value to a variable with the same name.

GrossPay=input(GrossPay,comma6.);

This assignment statement does not change GrossPay from a character variable to a numeric variable.

On the left side of the assignment statement, you want GrossPay to be numeric. However, on the right side of the assignment statement, GrossPay is character.

A variable is character or numeric. After the variable type is established, it cannot be changed.

Data Conversion

First, use the RENAME= data set option to rename the variable that you want to convert.

SAS-data-set(RENAME=(old-name=new-name))

To rename more than one variable from the same data set, separate the old-name=new-name pairs with spaces.

data bonuses;
   set prog2.salary2(rename=(GrossPay=CharGross));
   additional SAS statements
run;

Data Conversion

Second, use the INPUT function in an assignment statement to create a new variable whose name is the original name of the variable you renamed previously.

data bonuses;
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   additional SAS statements
run;

Data Conversion

Third, use a DROP= data set option in the DATA statement to exclude the original variable from the output SAS data set.

data bonuses(drop=CharGross);
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   Bonus=.10*GrossPay;
run;

Data Conversion: Compilation

data bonuses(drop=CharGross);
   set prog2.salary2(rename=(GrossPay=CharGross));
   GrossPay=input(CharGross,comma6.);
   Bonus=.10*GrossPay;
run;

PDV at the end of compilation, with variables added in the order in which the compiler encounters them (D marks the variable flagged to be dropped):

ID     CharGross  GrossPay  Bonus
$ 11   $ 6        N 8       N 8
       D
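The rename, convert, and drop sequence can be tried end to end without the prog2 library. In this sketch the WORK data set salary2 is a hypothetical stand-in for prog2.salary2, built from the three rows shown on the earlier slide.

```sas
/* Hypothetical stand-in for prog2.salary2 (values copied from the slide). */
data salary2;
   input ID $ 1-11 GrossPay $ 13-18;
   datalines;
201-92-2498 52,000
482-87-7945 32,000
330-40-7172 49,000
;
run;

data bonuses(drop=CharGross);                  /* step 3: drop the character copy */
   set salary2(rename=(GrossPay=CharGross));   /* step 1: rename on input */
   GrossPay=input(CharGross,comma6.);          /* step 2: create numeric GrossPay */
   Bonus=.10*GrossPay;
run;

proc contents data=bonuses;   /* GrossPay should now be listed as Num */
run;
```

The RENAME= option is applied as the data set is read, so the name GrossPay is free to be created as a numeric variable in the same step.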

Converting Character Dates to SAS Dates

prog2.born

Name ($12)    Date ($7)

Ruth, G. H.   13apr72
Delgado, Ed   25aug68
Overby, Phil  08jun71

data birth(drop=Date);
   set prog2.born;
   Birthday=input(Date,date7.);
   Age=int(yrdif(Birthday,'3may2008'd,'ACT/ACT'));
run;

How can you alter this program to compute each person's age based on today's date?

Use the TODAY() function.

data birth(drop=Date);
   set prog2.born;
   Birthday=input(Date,date7.);
   Age=int(yrdif(Birthday,today(),'ACT/ACT'));
run;

Converting Character Dates to SAS Dates

proc print data=birth noobs;
run;

PROC PRINT Output

Name          Birthday  Age

Ruth, G. H.       4486   36
Delgado, Ed       3159   39
Overby, Phil      4176   36
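Birthday is printed as a raw SAS date value because no format is associated with it. The sketch below shows the same value with and without the DATE9. format. The hard-coded Date value is taken from the first row of prog2.born, and the result assumes that the YEARCUTOFF= system option places the two-digit year 72 in 1972, as the SAS default settings do.

```sas
data _null_;
   length Date $ 7;
   Date='13apr72';
   Birthday=input(Date,date7.);
   put Birthday= Birthday=date9.;   /* Birthday=4486 Birthday=13APR1972 */
run;
```

In the birth DATA step, a statement such as format Birthday date9.; would make PROC PRINT display the dates in the same readable form.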

Automatic Numeric-to-Character Conversion

The prog2.phones data set contains a numeric variable Code (area code) and a character variable Telephone (telephone number). Create a character variable that contains the area code in parentheses followed by the telephone number.

Automatic Numeric-to-Character Conversion

prog2.phones

Code (N 8)  Telephone ($8)

303         393-0956
919         770-8292
301         449-5239

data phonenumbers;
   set prog2.phones;
   Phone='(' !! Code !! ') ' !! Telephone;
run;

Automatic Numeric-to-Character Conversion

Partial Log

13   data phonenumbers;
14      set prog2.phones;
15      Phone='(' !! Code !! ') ' !! Telephone;
16   run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      15:17
NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.

Automatic Numeric-to-Character Conversion

proc print data=phonenumbers noobs;
run;

PROC PRINT Output

Code  Telephone  Phone

 303  393-0956   (         303) 393-0956
 919  770-8292   (         919) 770-8292
 301  449-5239   (         301) 449-5239

Automatic Numeric-to-Character Conversion

SAS automatically converts a numeric value to a character value when the numeric value is used in a character context, such as

assignment to a character variable
a concatenation operation
a function that accepts character arguments.

Automatic Numeric-to-Character Conversion

The automatic conversion

uses the BEST12. format
right-aligns the resulting character value.

Numeric value (8 bytes): 303

Character value (12 bytes, 9 leading blanks): "         303"

Automatic Numeric-to-Character Conversion

data phonenumbers;
   set prog2.phones;
   Phone='(' !! Code !! ') ' !! Telephone;
run;

Phone ($23), with 9 leading blanks before the area code:

(         303) 393-0956

The PUT Function

The PUT function writes values with a specific format.

CharVar=PUT(source,format);

The PUT function returns the value produced when source is written with format.

The PUT Function

data conversion;
   NVar1=614;
   NVar2=55000;
   NVar3=366;
   CVar1=put(NVar1,3.);
   CVar2=put(NVar2,dollar7.);
   CVar3=put(NVar3,date9.);
run;

The PUT Function

proc contents data=conversion varnum;
run;

The VARNUM option in the PROC CONTENTS statement prints a list of the variables by their logical position in the data set.

Partial PROC CONTENTS Output

-----Variables Ordered by Position-----

#  Variable  Type  Len
----------------------
1  NVar1     Num     8
2  NVar2     Num     8
3  NVar3     Num     8
4  CVar1     Char    3
5  CVar2     Char    7
6  CVar3     Char    9

The PUT Function

proc print data=conversion noobs;
run;

PROC PRINT Output

NVar1  NVar2  NVar3  CVar1  CVar2    CVar3

  614  55000    366  614    $55,000  01JAN1961

Explicit Numeric-to-Character Conversion

data phonenumbers;
   set prog2.phones;
   Phone='(' !! put(Code,3.) !! ') ' !! Telephone;
run;

Partial Log

20   data phonenumbers;
21      set prog2.phones;
22      Phone='(' !! put(Code,3.) !! ') ' !! Telephone;
23   run;

NOTE: The data set WORK.PHONENUMBERS has 3 observations and 3 variables.

Explicit Numeric-to-Character Conversion

proc print data=phonenumbers noobs;
run;

PROC PRINT Output

Code  Telephone  Phone

 303  393-0956   (303) 393-0956
 919  770-8292   (919) 770-8292
 301  449-5239   (301) 449-5239


This exercise reinforces the concepts discussed previously.

Exercise

Exercises

Use the data set prog.students to create a new data set named students. Create a new character variable Telephone that has the pattern XXX-XXXX, where XXXXXXX is the value of Number.

Recall the previous program and alter it to create a new numeric variable Birthday from the DOB variable. Birthday should be displayed with the MMDDYY10. format for SAS date values.

When you are confident that both variables were converted correctly, use a DROP= or KEEP= data set option to ensure that the only variables in the students data set are SSN, Telephone, and Birthday.

Exercises

libname prog 'SAS-directory';

data students(drop=Number DOB);
   set prog.students;
   /* The PUT function is used to convert NUMBER from numeric
      to character. The resulting character value is manipulated
      with the SUBSTR function to extract the first 3 characters
      and then the last 4 characters. */
   Telephone=substr(put(Number,7.),1,3) || '-' || substr(put(Number,7.),4);
   /* The INPUT function is used to convert DOB from character
      to numeric. The character values are in the form ddMMMyyyy,
      so the DATE9. informat is used in the conversion. */
   Birthday=input(DOB,date9.);
   format Birthday mmddyy10.;
run;

proc print data=students;
run;

Exercises

The SAS System

SSN          Telephone  Birthday

012-40-4928  546-7887   12/05/1968
012-83-3816  688-8321   05/03/1965
341-44-0781  941-8123   11/23/1972
423-01-7721  783-9191   06/28/1967
448-23-8111  942-8122   11/30/1960
723-14-8422  828-0911   02/12/1964
819-32-1294  387-8181   09/01/1968
831-34-2411  967-7810   12/24/1972
837-33-8374  992-7615   10/06/1971
877-22-7731  233-7449   07/08/1969

Exercises

Use the data set agents to create a data set called work.agents2. Create a variable called TrueLocation that puts the City and State together, separated by a comma. (Desired output is shown on the next slide.)

Things to think about:

1. Do you need to do this for all observations?

2. If your output looks unexpected, try using the $QUOTE. format. (You do not need to specify a width.)

Exercises

Work.Agents2 Output

TrueLocation              CityCountry               State

Auckland, New Zealand     Auckland, New Zealand
Amsterdam, Netherlands    Amsterdam, Netherlands
Anchorage, Alaska         Anchorage, USA            Alaska
Canberra, Australia       Canberra, Australia       Australian Capital
Athens (Athinai), Greece  Athens (Athinai), Greece
Birmingham, Alabama       Birmingham, USA           Alabama
Bangkok, Thailand         Bangkok, Thailand
Nashville, Tennessee      Nashville, USA            Tennessee
Boston, Massachusetts     Boston, USA               Massachusetts
Kansas City, Missouri     Kansas City, USA          Missouri

Exercises

libname prog2 'SAS-directory';

data work.agents2(drop=Country);
   set prog2.agents;
   length Country $ 20 TrueLocation $ 40;
   /* SCAN returns the country with a leading space
      (as in ' USA'); LEFT moves that space to the end. */
   Country=left(scan(CityCountry,2,','));
   if Country='USA' then
      TrueLocation=scan(CityCountry,1,',') !! ', ' !! State;
   else /* not USA */
      TrueLocation=CityCountry;
run;

proc print data=work.agents2 noobs;
   var TrueLocation CityCountry State;
run;