A SAS User's Guide to Storage Management
Allan Page
Senior Marketing Analyst
Canadian Tire Financial Services
Files are stored on all system types
PC (All flavours of Windows)
File or Application Servers (Novell, NT, Y2KServer)
Mid-Range Systems (Unix, etc.)
Mainframes (MVS)
All systems have two things in common
When the drives are full, they’re full!
You can’t add data to a full drive!
Disk drives come in different sizes
PC LAN MID-RANGE MAINFRAME
Tip # 1
WILL I EVER, EVER NEED THIS FILE AGAIN?
YES? NO?
If the answer is NO: DELETE the file.
Tip # 2
What if you said yes?
Will I need this file right away?
Will I need this file in the near future?
Will I need this file in the far or unknown future?
YES? NO?
I need the file right away ...
Don’t touch it
I need it in the near future ...
Compress the file until it is needed.
MVS uses the HSM system
UNIX uses gzip
Windows uses WinZip
Windows XP with NTFS has built-in zip and compression utilities
I need it in the distant future ...
Consider Alternate Storage Media
MVS HSM will migrate to tape
UNIX systems may have access to tape storage.
For Windows, consider storing on CD
SAS Specific Storage Efficiencies
Don’t keep duplicate files or subsets
Don’t keep unnecessary rows of data
Don’t keep unnecessary columns of data
How to create a view or use Where.
There are two types of views
1. Data step views
2. SQL views
What a view is - and is not
A view IS a MAP to read other data in a specified form.
A view IS NOT a data store.
Creating and using a Data Step View
    Data sasuser.withoutact / view=sasuser.withoutact;
      infile 'X:\Pamela\PR03150\nucomm\pmd.nuc.ctac.enroll.20030505.txt'
             firstobs=2 delimiter=',' MISSOVER DSD lrecl=32767;
      Input VAR1 $ VAR2 $ VAR3 $ VAR4 $ VAR5 $ VAR6 $ VAR7 $;
      Length V1 $14 V2 $38 V3 $42 V4 $21 V5 $4 V6 $8 V7 $12;
      Array grp_a {*} $ var1-var7;   /* {*} sizes the array from the variable list */
      /* more SAS statements */
    Run;

    PROC PRINT data=sasuser.withoutact;
    run;
Creating and Using a SQL View
    PROC SQL;
      CREATE VIEW sasuser.fitview as
        SELECT *
        FROM sasuser.fitness
        WHERE age > 50;
    QUIT;

    PROC FREQ data=sasuser.fitview;
      Tables age;
    RUN;
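The deck mentions WHERE as an alternative to a view but does not show it. A minimal sketch, assuming the same sasuser.fitness data set used above: the KEEP= and WHERE= data set options filter columns and rows as the data is read, so no subset copy is written to disk.

```sas
/* Read only the AGE column and only rows with age > 50,  */
/* directly from the original data set (no subset saved). */
proc freq data=sasuser.fitness (keep=age where=(age > 50));
  tables age;
run;
```

This should give the same table as the SQL view, without storing a view definition.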
The LENGTH Statement
Numeric variables have a default length of 8 bytes.
Character variables default to the length of first use.
Use the LENGTH statement to override the default values.
What length should I use for numeric values?
Largest integer that can be stored exactly, by length in bytes

    Length   Windows and Unix            MVS
    2        -                           256
    3        8,192                       65,536
    4        2,097,152                   16,777,216
    5        536,870,912                 4,294,967,296
    6        137,438,953,472             1,099,511,627,776
    7        35,184,372,088,832          281,474,946,710,656
    8        9,007,199,254,740,992       72,057,594,037,927,936
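To see why these limits matter, here is a small sketch for Windows or Unix, where a length of 3 holds integers exactly only up to 8,192. The data set name work.len_demo and the variable names are hypothetical, chosen for illustration only.

```sas
/* Hypothetical demo: store values around the 3-byte limit */
data work.len_demo;
  length stored 3;              /* 3 bytes on disk */
  do original = 8190 to 8195;   /* ORIGINAL keeps the default 8 bytes */
    stored = original;          /* still full precision inside the step */
    output;                     /* shortened when written to the data set */
  end;
run;

/* Report any rows where the shortened copy no longer matches */
data _null_;
  set work.len_demo;
  if stored ne original then put 'Precision lost: ' original= stored=;
run;
```

Under MVS the limit for length 3 is 65,536, so this particular range would not lose precision there.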
When to use the LENGTH statement.
It is best to use the LENGTH statement before any reference to the variable is made either by reading data or assigning values.
Why is position important?
When SAS compiles a DATA Step, the attributes of the DATA Set are determined. All statements for an attribute, EXCEPT for the length of the variable, are applied to the variable in order.
The length attribute for numeric variables is not applied to the variable while it is being manipulated in the step. If the length of a numeric variable is shortened the truncation does not occur until the observation is written out to the output data set.
The length attribute for character variables is determined by its first occurrence.
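A short sketch of the character case, using a hypothetical variable NAME: the first assignment fixes the length, so a later, longer value is truncated unless a LENGTH statement comes first.

```sas
/* Without LENGTH: the first use ('Al') sets NAME to 2 bytes */
data _null_;
  name = 'Al';
  name = 'Allan';     /* truncated to fit the 2-byte variable */
  put name=;          /* writes name=Al to the log */
run;

/* With LENGTH placed before the first use */
data _null_;
  length name $ 5;
  name = 'Al';
  name = 'Allan';
  put name=;          /* writes name=Allan to the log */
run;
```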
Let’s look at some data.
The SAS System 08:07 Wednesday, May 7, 2003 2
    Obs  age  weight  runtime  rstpulse  runpulse  maxpulse  oxygen  group
      1   57   73.37    12.63        58       174       176  39.407      2
      2   54   79.38    11.17        62       156       165  46.080      2
      3   52   76.32     9.63        48       164       166  45.441      2
The actual contents of this file
    #  Variable  Type  Len  Pos  Label
    1  age       Num     8    0  Age in years
    8  group     Num     8   56  Experimental group
    6  maxpulse  Num     8   40  Maximum heart rate
    7  oxygen    Num     8   48  Oxygen consumption
    4  rstpulse  Num     8   24  Heart rate while resting
    5  runpulse  Num     8   32  Heart rate while running
    3  runtime   Num     8   16  Min. to run 1.5 miles
    2  weight    Num     8    8  Weight in kg
The wrong way to use the LENGTH statement
    data fitness1;
      set sasuser.fitness;
      length age rstpulse runpulse maxpulse group 3;
    run;
The correct way to use the LENGTH statement
    data fitness2;
      length age rstpulse runpulse maxpulse group 3;
      set sasuser.fitness;
    run;
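One way to check what each step actually produced is to run PROC CONTENTS on both outputs and compare the Len column. This sketch assumes the fitness1 and fitness2 data sets from the two examples above exist in the WORK library.

```sas
/* Compare the stored lengths of the variables in each output */
proc contents data=work.fitness1;
run;

proc contents data=work.fitness2;
run;
```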
Using LENGTH in a SQL Query
    proc sql;
      connect to oracle (user=&user1 pass=&pwd1 &pth1);
      create table u.skudata as
        select acct_id length = 6,
               datepart(post_dt) as post_dt length = 4,
               det_item_qty as quantity length = 4
        from connection to oracle
          ( SELECT acct_id,
                   post_dt,
                   det_item_qty
            FROM sku_data
            WHERE acct_id_suf = 0
              and substr(dept_id,4,8) = '00111200' )
        order by acct_id;   /* ORDER BY belongs to the CREATE TABLE query */
      disconnect from oracle;
    quit;
Setting length in the ATTRIB statement
    Data fitness;
      ATTRIB age length=3 informat=3. format=3. label='Age in Years';
      Set sasuser.fitness;
    Run;
Data Compression
Can be set in an options statement or as a data step option.
OPTIONS compress = yes;
Data perm.comp (compress = yes);
Data Compression
Compression shrinks the data set by replacing runs of repeated consecutive characters with two- or three-byte representations.
Data Compression - Advantages
Reduced storage requirements for the data set
Fewer input and output operations necessary to read from or write to the data set during processing.
Data Compression - Disadvantages
The data set may not compress at all (it may actually become larger). SAS writes a note to the log reporting the amount of compression.
More CPU resources are required.
Compression - A good example
    6    libname col 'x:\colleen\2002\pr02810';
    NOTE: Libref COL was successfully assigned as follows:
          Engine:        V8
          Physical Name: x:\colleen\2002\pr02810
    7
    8    data telephone (compress=yes);
    9      set col.telephone;
    10   run;

    NOTE: There were 1344653 observations read from the dataset COL.TELEPHONE.
    NOTE: The data set WORK.TELEPHONE has 1344653 observations and 21 variables.
    NOTE: Compressing data set WORK.TELEPHONE decreased size by 29.34 percent.
          Compressed is 19794 pages; un-compressed would require 28014 pages.
    NOTE: DATA statement used:
          real time           7:07.65
          cpu time            20.96 seconds
Compression - A bad example
    11   data fitness (compress=yes);
    12     set sasuser.fitness;
    13   run;

    NOTE: There were 31 observations read from the dataset SASUSER.FITNESS.
    NOTE: The data set WORK.FITNESS has 31 observations and 8 variables.
    NOTE: Compressing data set WORK.FITNESS increased size by 100.00 percent.
          Compressed is 2 pages; un-compressed would require 1 pages.
    NOTE: DATA statement used:
          real time           0.27 seconds
          cpu time            0.02 seconds
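When compression is switched on for the whole session, a small data set like this one can opt out individually. A minimal sketch, assuming the global option has been set as shown earlier: the COMPRESS= data set option takes precedence over the system option for that one data set.

```sas
options compress=yes;              /* session-wide default */

/* Small data set: override the default and store it uncompressed */
data work.fitness (compress=no);
  set sasuser.fitness;
run;
```

The log note about compression is a useful guide here: if it reports an increase in size, the data set is a candidate for compress=no.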
Copyright © 2003, SAS Institute Inc. All rights reserved.