Copyright © 2005, SAS Institute Inc. All rights reserved.
Getting the Best Performance from V9 Threaded PROC SORT
Scott Mebust
System Developer
Base Information Technology
The (Unofficial) SAS Skydiving Team
Keys to Sorting Performance
Know the conditions
Observe actual performance
Understand theoretical performance
Make adjustments
Know the Conditions
System
SAS
Sort job
Observe Actual Performance
Monitor System Activity
Examine the SAS Log
Measure System Capabilities
Identify and Observe Sorting Phases
Sort Phase
Merge Phase
I/O Bound, External, Single-Threaded
Measure Storage Device Sequential Transfer Rates From Within SAS
Create a large dataset (e.g. 4 × RAM)
Read dataset, dumping to _NULL_
Ensure real time ≫ CPU time
Compute transfer rates (R)
$$R_{read} = \frac{F}{t_{read}} \qquad R_{write} = \frac{F}{t_{write}}$$
Where
F: size of the dataset (bytes)
t: real time (seconds)
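The rate calculation on this slide, R = F / t for reads and for writes, can be sketched in a few lines of Python. This is only an illustration: the function name `transfer_rate` and the sample file size and timings are hypothetical, not measurements from the presentation. In practice the real times would be taken from the SAS log.

```python
def transfer_rate(file_size_bytes: float, real_time_seconds: float) -> float:
    """Return the sequential transfer rate R = F / t in bytes per second."""
    if real_time_seconds <= 0:
        raise ValueError("real time must be positive")
    return file_size_bytes / real_time_seconds


# Hypothetical measurements (assumptions for illustration only).
F = 8 * 1024**3        # dataset size in bytes (8 GiB)
t_read = 100.0         # real time to read the dataset, in seconds
t_write = 160.0        # real time to write the dataset, in seconds

r_read = transfer_rate(F, t_read)
r_write = transfer_rate(F, t_write)
print(f"R_read  = {r_read / 1e6:.1f} MB/s")
print(f"R_write = {r_write / 1e6:.1f} MB/s")
```

The rates are only meaningful when real time greatly exceeds CPU time, since otherwise the timing reflects computation rather than the storage device.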
Measure In-Core Sorting Costs
[Figure: CPU Time Per Observation. x-axis: Number of Observations (10,000 to 100,000,000, log scale); y-axis: Normalized CPU Time (seconds), 0 to 5.00E-07; series: Actual, Predicted; annotation: Small job overhead]
Understand Theoretical Performance
Classify the job
Estimate SORT running time
Consider estimation hazards
Classify the Job Performance Limitation
Compute Bound
I/O Bound
Mixed
Classify the Job Size
Internal: $\frac{F + O}{M} \le 1$
External: $\frac{F + O}{M} > 1$
Single-Pass: $M \ge \sqrt{(F + O)\,B}$
Multi-Pass: $M < \sqrt{(F + O)\,B}$
Where
F: size of input dataset
O: size of internal sorting overhead
M: size of RAM
B: utility file page (block) size
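A minimal sketch of the size classification follows. It assumes the thresholds are (F + O) / M ≤ 1 for an internal sort and M ≥ √((F + O) · B) for a single-pass external sort, which is the condition under which the number of sorted runs, (F + O) / M, does not exceed the number of merge buffers, M / B. The function name and sample sizes are hypothetical.

```python
def classify_job_size(F: int, O: int, M: int, B: int) -> str:
    """Classify a sort by size.

    F: input dataset size, O: internal sorting overhead,
    M: memory available to the sort, B: utility file page size (all in bytes).
    """
    if F + O <= M:
        return "internal"
    # M >= sqrt((F + O) * B), written without a square root to stay in integers.
    if M * M >= (F + O) * B:
        return "external, single-pass"
    return "external, multi-pass"


# Hypothetical sizes (assumptions for illustration only).
GiB, MiB, KiB = 1024**3, 1024**2, 1024
print(classify_job_size(F=1 * GiB, O=256 * MiB, M=2 * GiB, B=64 * KiB))
print(classify_job_size(F=100 * GiB, O=10 * GiB, M=1 * GiB, B=64 * KiB))
print(classify_job_size(F=100 * GiB, O=10 * GiB, M=64 * MiB, B=64 * KiB))
```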
Estimate the Running Time
Internal Sort, I/O Bound
$$t = t_{read} + t_{write}$$
$$t_{read} = \frac{F}{R_{read}} \qquad t_{write} = \frac{F}{R_{write}}$$
[Diagram: Input → (Sequential Read) → RAM → (Sequential Write) → Output]
Where
t: real time (sec)
F: dataset size (bytes)
R: transfer rate (bytes/sec)
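As a worked example of the internal, I/O-bound estimate, the sketch below adds the sequential read time F / R_read to the sequential write time F / R_write. The function name and the sample dataset size and transfer rates are hypothetical.

```python
def internal_io_bound_time(F: float, r_read: float, r_write: float) -> float:
    """Estimate real time (seconds) for an internal, I/O-bound sort.

    F: dataset size in bytes; r_read, r_write: transfer rates in bytes/sec.
    """
    t_read = F / r_read      # one sequential pass over the input
    t_write = F / r_write    # one sequential pass writing the output
    return t_read + t_write


# Hypothetical values (assumptions for illustration only).
F = 2 * 1024**3              # 2 GiB dataset
estimate = internal_io_bound_time(F, r_read=80e6, r_write=50e6)
print(f"estimated running time: {estimate:.1f} s")
```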
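Estimate the Running Time
Single-Pass External, I/O Bound
$$t = t_1 + t_2 + t_3 + t_4$$
$$t_1 = \frac{F}{R_{read}} \qquad t_2 = \frac{U}{R_{write}} \qquad t_3 = \;? \qquad t_4 = \frac{F}{R_{write}}$$
[Diagram: Input → (Sequential Read) → RAM → (Sequential Write) → Temp → (Random Read) → RAM → (Sequential Write) → Output]
Where
U: utility file size (bytes)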
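The four-term estimate for a single-pass external, I/O-bound sort can be sketched as below: t1 = F / R_read for reading the input, t2 = U / R_write for writing the utility file, t3 for reading the utility file back, and t4 = F / R_write for writing the output. Because t3 depends on how random the utility file reads are, this sketch takes it as a caller-supplied parameter. The function name and sample values are hypothetical.

```python
def single_pass_external_io_time(
    F: float, U: float, r_read: float, r_write: float, t3: float
) -> float:
    """Estimate real time (seconds) for a single-pass external, I/O-bound sort.

    F: input dataset size (bytes), U: utility file size (bytes),
    r_read, r_write: sequential transfer rates (bytes/sec),
    t3: utility file read time (seconds), supplied by the caller.
    """
    t1 = F / r_read     # sequential read of the input
    t2 = U / r_write    # sequential write of the utility file
    t4 = F / r_write    # sequential write of the output
    return t1 + t2 + t3 + t4


# Hypothetical values (assumptions for illustration only).
F = U = 10 * 1024**3
total = single_pass_external_io_time(F, U, r_read=80e6, r_write=50e6, t3=300.0)
print(f"estimated running time: {total:.1f} s")
```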
Utility File Read Time
File Size
Single-threaded: $U = F + o$
Multi-threaded: $U = F$
where
F: size of input dataset
o: # of observations × sort key length
Number of Pages
$$n_{pages} = \frac{U}{B}$$
where
B: utility file page (block) size
Best Case (Sequential) Read Time
$$t_{read} = \frac{U}{R_{read}}$$
Worst Case (Random) Read Time
$$t_{read} = n_{pages}\left(s + r + \frac{B}{R_{read}}\right)$$
where
s: average positional latency
r: average rotational latency
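The best-case and worst-case utility file read times bracket the unknown read term. A minimal sketch, assuming the best case is a fully sequential read, U / R_read, and the worst case pays one positional latency s and one rotational latency r for every page, n_pages · (s + r + B / R_read) with n_pages = U / B. The function name and the sample device figures are hypothetical.

```python
def utility_read_time_bounds(
    U: float, B: float, r_read: float, s: float, r: float
) -> tuple[float, float]:
    """Return (best, worst) utility file read times in seconds.

    U: utility file size (bytes), B: page size (bytes),
    r_read: sequential read rate (bytes/sec),
    s: average positional latency (sec), r: average rotational latency (sec).
    """
    n_pages = U / B
    best = U / r_read                        # purely sequential read
    worst = n_pages * (s + r + B / r_read)   # every page read is random
    return best, worst


# Hypothetical device characteristics (assumptions for illustration only).
best, worst = utility_read_time_bounds(
    U=10 * 1024**3, B=64 * 1024, r_read=80e6, s=0.008, r=0.004
)
print(f"best case:  {best:.1f} s")
print(f"worst case: {worst:.1f} s")
```

With small pages the latency terms dominate the worst case, which is why the utility file page size appears among the conditions that can be altered.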
Multi-Pass External Sorting
Number of Sorted Runs
$$n_{runs} = \frac{F + O}{M}$$
Number of Utility File Passes
$$n_{passes} = \frac{\ln(n_{runs})}{\ln(n_{buffers})}$$
where
$$n_{buffers} = \frac{M}{B}$$
is the Maximum External Merge Order
and
F: size of input dataset
O: size of internal sorting overhead
M: SORTSIZE
B: utility file page (block) size
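The multi-pass quantities can be computed directly: n_runs = (F + O) / M, n_buffers = M / B, and n_passes = ln(n_runs) / ln(n_buffers). The sketch below returns the ratio as given; rounding it up to a whole number of passes is an assumption added here, not something stated on the slide. The function name and sample sizes are hypothetical.

```python
import math


def merge_plan(F: float, O: float, M: float, B: float) -> tuple[float, float, float]:
    """Return (n_runs, n_buffers, n_passes) for an external sort.

    F: input dataset size, O: internal sorting overhead,
    M: SORTSIZE, B: utility file page size (all in bytes).
    """
    n_runs = (F + O) / M
    n_buffers = M / B                  # maximum external merge order
    if n_runs <= 0 or n_buffers <= 1:
        raise ValueError("need at least one run and a merge order above 1")
    n_passes = math.log(n_runs) / math.log(n_buffers)
    return n_runs, n_buffers, n_passes


# Hypothetical sizes (assumptions for illustration only).
GiB, MiB, KiB = 1024**3, 1024**2, 1024
n_runs, n_buffers, n_passes = merge_plan(F=100 * GiB, O=10 * GiB, M=16 * MiB, B=256 * KiB)
print(f"runs: {n_runs:.0f}, merge order: {n_buffers:.0f}")
# Rounding up to whole passes is an assumption of this sketch.
print(f"passes: {n_passes:.2f} (rounded up: {math.ceil(n_passes)})")
```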
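Estimate the Running Time
Single-Pass External, Compute Bound
$$t = t_{sort} + t_{merge} + t_{write}$$
$$t_{sort} = \;? \qquad t_{merge} = \;? \qquad t_{write} = \frac{F}{R_{write}}$$
[Diagram: Input → RAM → (Sequential Write) → Temp → (Random Read) → RAM → Output]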
Single-Pass External, Compute Bound
Utility File Creation Time
$$t_{sort} = n_{runs}\, t_{run}$$
where
$$n_{runs} = \frac{F + O}{M} \qquad n_{obs/run} = \frac{n_{obs}}{n_{runs}}$$
n_obs is the total number of observations in the dataset
t_run is the time required to perform an in-memory sort of the number of observations in a single run
Utility File Merge Time, Compute Bound
$$t_{merge} \approx t_{sort}$$
Utility File Merge Time, I/O Bound
As previously described for I/O bound Utility File Read Time
Best Case:
$$t_{merge} = \frac{U}{R_{read}}$$
Worst Case:
$$t_{merge} = n_{pages}\left(s + r + \frac{B}{R_{read}}\right)$$
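The utility file creation time can be sketched as t_sort = n_runs · t_run, where n_runs = (F + O) / M and t_run is the time for an in-memory sort of n_obs / n_runs observations. The presentation does not give a formula for t_run, so this sketch takes it from a caller-supplied function; the linear cost model in the example is purely a placeholder assumption, and in practice t_run would come from measured in-core sorting costs. All names and values are hypothetical.

```python
from typing import Callable


def utility_file_creation_time(
    F: float, O: float, M: float, n_obs: float,
    time_for_run: Callable[[float], float],
) -> float:
    """Estimate t_sort = n_runs * t_run for a compute-bound external sort.

    F: input dataset size, O: internal sorting overhead, M: memory (bytes),
    n_obs: total number of observations,
    time_for_run: maps observations-per-run to in-memory sort time (seconds).
    """
    n_runs = (F + O) / M
    obs_per_run = n_obs / n_runs
    t_run = time_for_run(obs_per_run)
    return n_runs * t_run


def placeholder_cost(n: float) -> float:
    """Placeholder cost model (an assumption): a fixed CPU time per observation."""
    return n * 3.0e-7


# Hypothetical job (assumptions for illustration only).
GiB = 1024**3
t_sort = utility_file_creation_time(
    F=100 * GiB, O=10 * GiB, M=1 * GiB, n_obs=1.0e9, time_for_run=placeholder_cost
)
print(f"estimated utility file creation time: {t_sort:.1f} s")
```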
Consider Estimation Hazards
File cache effects
Pseudo-internal sorting (thrashing)
Pseudo-external sorting (file cache)
Limitations within each sorting phase
Make Adjustments
Determine if there is a problem
Identify the problem
Alter the conditions
Re-evaluate
Identify the Problem
Processing speed
Memory
External Storage
Alter the Conditions
Memory settings
Library to storage device mappings
Utility file location
Utility file page size