Post on 28-Mar-2015
Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Cheryl Doninger, R&D Director, SAS
Nancy Rausch, Senior Software Mgr, SAS
Data Integration in a Grid-Enabled Environment
30 Years Ago - the Mainframe
Exploiting multiple processors in a machine
Grid goes beyond a single machine
SAS Grid Manager
SAS Grid Manager Key Capabilities
SAS Grid Manager
Distributed Enterprise Scheduling
• Distribute jobs within workflows to a shared pool of resources.
Workload Balancing
• Distribute workloads to a shared pool of resources.
Parallelized Workload Balancing
• Distribute parallelized SAS workloads to a shared pool of resources.
Optimize the Efficiency and Utilization of Computing Resources
What Products Can Leverage SAS Grid Manager?
SAS Grid Manager
Distributed Enterprise Scheduling
• SAS Data Integration Studio
• SAS Web Report Studio
• SAS Marketing Automation
• SAS Marketing Optimization
• Any SAS program*
*(with modification)
Workload Balancing
• SAS Data Integration Studio
• SAS Enterprise Guide*
• SAS Workspace Server
• Any SAS program*
• SAS Stored Processes**
*(with wrapper) **(with limitations)
Parallelized Workload Balancing
• SAS Data Integration Studio
• SAS Enterprise Miner
• SAS Risk Dimensions*
• Any SAS program*
• SAS Stored Processes**
*(with modification) **(with limitations)
SAS Code Importer
Once Imported ...
http://support.sas.com/documentation/onlinedoc/gridmgr/index.html
SAS Data Integration Studio – Distributed Enterprise Scheduling
SAS Data Integration Studio – Multi-User Workload Balancing …
PUBLIC SECTOR
MANUFACTURING
FINANCIAL
LIFE SCIENCES
Data Integration Studio on a Grid: Loops and Iterations
Example: A simple job
Specific physical tables referenced
Specific transform logic
Repetition can be helpful
Processing data in multiple pieces
Same process over several data sets
Examples:
• Same process every hour
• Same process for multiple stores
• Same process for every state
How to do repetitive things?
Here is one way
Copy, Paste
Edit in new job
Problem: Multiple maintenance
Doing this more automatically
Use Looping
• Loop
• Loop End
How to do iteration
Loop input:
• list of items to repeat over
Loop body:
• one or more jobs and transforms to run repeatedly
Loop output: status table (optional)
• Can be input into next loop
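The loop structure above (input list, loop body, optional status table) can be sketched in Python. This is only an illustration of the idea, not Data Integration Studio's generated code; `run_loop`, `load_state`, and the state codes are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


def run_loop(items: List[str], body: Callable[[str], None]) -> List[Dict[str, str]]:
    """Run `body` once per input item and return a status table (one row per item)."""
    status_table: List[Dict[str, str]] = []
    for item in items:
        body(item)  # the loop body: one or more jobs or transforms
        status_table.append({"item": item, "status": "done"})
    return status_table


processed: List[str] = []


def load_state(state_parm: str) -> None:
    """Stand-in for the job being repeated; it only records which table it would load."""
    processed.append(f"RETAIL{state_parm}")


# Loop input: the list of items to repeat over.
status = run_loop(["NC", "TX", "CA"], load_state)

# The status table can feed a following loop.
finished = [row["item"] for row in status if row["status"] == "done"]
```

Here the loop runs sequentially; the status table it returns is the piece that a following loop could take as its own input.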
How to loop
Sequential
• One SAS session runs them all
Parallel
• Connect licensed
− Parallel on same machine (SMP)
• Grid Manager licensed
− Parallel on a grid
How to loop in parallel – Lots of options
1 per CPU
• Don’t overload machine
Specified number
• Help prevent overload
• Can double up per CPU
Run all
• Let ‘er rip!
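As a rough illustration of the three choices above, the Python sketch below picks a concurrency limit and runs a loop body under it. Threads stand in for SAS sessions here, and the names `max_concurrent` and `run_parallel` and the mode strings are assumptions made for this example, not Loop transform settings.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def max_concurrent(mode: str, n_items: int, specified: Optional[int] = None) -> int:
    """Return how many iterations may run at once under the chosen option."""
    if n_items < 1:
        raise ValueError("need at least one item")
    if mode == "per_cpu":          # one process per CPU: avoids overloading the machine
        limit = os.cpu_count() or 1
    elif mode == "specified":      # a fixed number: may be set above the CPU count
        if specified is None or specified < 1:
            raise ValueError("specified must be a positive integer")
        limit = specified
    elif mode == "all":            # run every iteration at once
        limit = n_items
    else:
        raise ValueError(f"unknown mode: {mode}")
    return min(limit, n_items)     # never start more workers than there are items


def run_parallel(items: List[str], body: Callable[[str], str],
                 mode: str, specified: Optional[int] = None) -> List[str]:
    """Run `body` over `items` with at most `max_concurrent(...)` workers."""
    workers = max_concurrent(mode, len(items), specified)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, items))  # results come back in input order


results = run_parallel(["atlanta", "chicago", "miami"], str.upper, "specified", 2)
```

With "specified" set to 2, at most two of the three items are processed at the same time; "all" would start all three together.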
Controlling iteration with parameters
By default, tables are specific physical locations
Many things in SAS can accept macro variables
Parameters are macro-enabled ETL objects
Data Integration Studio provides user interface
Input values can be mapped to parameters
Creating Parameters
Parameter name
Macro variable
• &StateParm
Default value
• Used in many ETL/S activities
− Running a test job
− Viewing data
Parameters on objects
Property tab: add parameters for that object
Jobs can import them
• From referenced tables
• From included nested objects
Loop transform will use them
Can supply a default for testing
Some Good Examples of Parameters
In a table name
• RETAIL&StateParm
In a filepath
In library path
ODS Titles
Mapping
SQL Query
…Anywhere you want…
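To make the substitution concrete, here is a small Python emulation of how a reference such as `&StateParm` resolves inside a name. It is a simplified sketch, not the SAS macro processor: it handles only single-ampersand references, treats names as case-insensitive, lets a trailing period end the reference, and leaves unknown references untouched. The function name `resolve` is hypothetical.

```python
import re
from typing import Mapping

# "&name" optionally followed by a period that ends the reference.
_MACRO_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve(template: str, params: Mapping[str, str]) -> str:
    """Replace each &name reference in `template` with its value from `params`."""
    # Macro variable names are case-insensitive, so compare in upper case.
    lookup = {name.upper(): value for name, value in params.items()}

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1).upper()
        # Unknown references are left exactly as written.
        return lookup.get(name, match.group(0))

    return _MACRO_REF.sub(replace, template)


table_name = resolve("RETAIL&StateParm", {"StateParm": "NC"})   # "RETAILNC"
output_name = resolve("Out&week", {"week": "Week1"})            # "OutWeek1"
with_suffix = resolve("&week._final", {"week": "Week2"})        # "Week2_final"
```

The third call shows why the period matters: without it, `&week_final` would be read as a reference to a different variable named `week_final`.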
Real Example: Start with 1 Retail Store
1,000,000 orders
1 year =
80 MB data
Scale up the data
10,000 stores
52 billion orders
5 years =
4.2 terabytes of data
Run Jobs in Parallel by Looping
Loop transform
1 Atlanta Store
2 Chicago Store
3 Miami Store
4 …
Substitute Variables
1 Week1
2 Week2
3 Week3
4 …
Add parameter to Table:
Name = &week
1 OutWeek1
2 OutWeek2
3 OutWeek3
4 …
Add parameter to Table:
Name = Out&week
[Diagram: Input, Existing Job (Process, Load), Output]
The Results were very good …
3.22 terabytes per hour
50 GB / minute
~1 GB / sec
Grid Partitions
[Diagram labels: Data Integration Studio, Enterprise Miner, Connect Client; SAS Servers 1 … n, each with Base, Connect, SAS Grid Mgr; partitions: DI grid, EM grid]
Useful Loop Transform Options
Grid Partitions
Restart sessions
Log directory
Error handling
• Abort all remaining
• Abort only current
• Continue on error
• …others
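The three error-handling choices can be pictured with a short Python sketch. The slides do not spell out the exact semantics, so this uses one assumed reading: "abort all remaining" stops the whole loop, "abort only current" skips the rest of the failing iteration and moves on to the next, and "continue on error" records the error and keeps going within the same iteration. All names here are hypothetical.

```python
from enum import Enum
from typing import Callable, List, Tuple

Step = Callable[[], None]


class OnError(Enum):
    ABORT_ALL = "abort all remaining"
    ABORT_CURRENT = "abort only current"
    CONTINUE = "continue on error"


def run_iterations(iterations: List[List[Step]], policy: OnError) -> List[Tuple[int, int, str]]:
    """Run every step of every iteration; return (iteration, step, outcome) per attempt."""
    log: List[Tuple[int, int, str]] = []
    for i, steps in enumerate(iterations):
        for j, step in enumerate(steps):
            try:
                step()
                log.append((i, j, "ok"))
            except Exception:
                log.append((i, j, "error"))
                if policy is OnError.ABORT_ALL:
                    return log      # stop the whole loop
                if policy is OnError.ABORT_CURRENT:
                    break           # skip the rest of this iteration only
                # OnError.CONTINUE: carry on with the next step
    return log


def ok() -> None:
    pass


def fail() -> None:
    raise RuntimeError("step failed")


work = [[ok, fail, ok], [ok, ok]]   # iteration 0 fails at its second step
abort_all = run_iterations(work, OnError.ABORT_ALL)
abort_current = run_iterations(work, OnError.ABORT_CURRENT)
keep_going = run_iterations(work, OnError.CONTINUE)
```

Under this reading, the three policies attempt two, four, and five steps respectively for the same input.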
Another Case Study: Census data
Running sequentially
Data from 50 states
Running on one computer at a time
About 580 minutes
(just under 10 hours)
Running in parallel
Running on six computers
About 108 minutes
(under 2 hours)
Adding more computers
Across nine computers
[Figure: Census Data - Parallel Run 13 - Execution Profile (ETL running 1 job at a time on 9 blades, 1 slot/blade). One bar per state job; x-axis: minutes, 0 to 80, showing start time and duration of each job.]
~ 77 minutes
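The timings quoted for the census runs (about 580 minutes sequentially, 108 minutes on six computers, 77 minutes on nine) can be turned into speedup and efficiency figures. The Python below is a back-of-the-envelope check using only those rounded numbers; the function names are made up for the example.

```python
def speedup(sequential_min: float, parallel_min: float) -> float:
    """How many times faster the parallel run finished."""
    return sequential_min / parallel_min


def efficiency(sequential_min: float, parallel_min: float, machines: int) -> float:
    """Speedup divided by machine count; 1.0 would be perfect scaling."""
    return speedup(sequential_min, parallel_min) / machines


SEQUENTIAL_MIN = 580                 # one computer at a time
PARALLEL_RUNS = {6: 108, 9: 77}      # machines -> elapsed minutes

for machines, minutes in PARALLEL_RUNS.items():
    s = speedup(SEQUENTIAL_MIN, minutes)
    e = efficiency(SEQUENTIAL_MIN, minutes, machines)
    print(f"{machines} machines: {s:.2f}x speedup, efficiency {e:.3f}")
```

On these figures the six-machine run is roughly 5.4 times faster and the nine-machine run roughly 7.5 times faster, so per-machine efficiency drops somewhat as machines are added.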
Did we keep the computers busy?
In this case, we really did
Running 6 jobs at a time on 6 processors
Additional Case Studies
Unstructured Data
…See the paper for more examples
Using Grid harnesses the power of your enterprise
Questions? Want to know more?
• Achieving High Availability in a SAS® Grid Environment, Paper 001-2009
• What's New in SAS® Data Integration Studio 4.2, Paper 093-2009
• For Base SAS® Users: Welcome to SAS® Data Integration!, Paper 092-2009
• Cross Validation and Learning Curve Model Comparison with JMP® Genomics and Grid Computing, Paper 286-2009
• ISO’s Evolution to BI on the Grid: A Customer Perspective, Mon, 5:30 PM, Maryland 3; Paper 269-2009
• Going from Good to Great: The Value of an Analytic Grid Platform at ISO, Tues 11:00 PM, National Harbor 12; Presentation only
• The University of Phoenix Wins Big with SAS® Grid, Tues 11:00 PM, National Harbor 5; Presentation only
Supply a Default Value for Testing
Default value: Week1
Existing Job
[Diagram: Input, Existing Job (Extract, Load), Output]
Reuse existing job and run in parallel
1 Portugal
2 France
3 Spain
4 …
[Diagram: Existing Job, repeated once per list item, …]
Iteration (Looping) in Parallel
1 Portugal
2 France
3 Spain
4 …
Add parameter to Table:
Name = &Country
1 OutPortugal
2 OutFrance
3 OutSpain
4 …
Add parameter to Table:
Name = Out&Country
[Diagram: Input, Existing Job (Extract, Load), Output]
What products can leverage SAS Grid Manager?
SAS Grid Manager
Optimize the Efficiency and Utilization of Computing Resources
Distributed Enterprise Scheduling
• SAS Data Integration Studio
• SAS Web Report Studio
• SAS Marketing Automation
• SAS Marketing Optimization
• Any SAS program*
*(with modification)
Multi-User Workload Balancing
• SAS Data Integration Studio
• Any SAS program*
• SAS Enterprise Guide*
• SAS Workspace Server
• SAS Stored Processes**
*(with modification) **(with limitations)
Parallel Workload Balancing
• SAS Data Integration Studio
• SAS Enterprise Miner
• SAS Risk Dimensions*
• Any SAS program*
• SAS Stored Processes**
*(with modification) **(with limitations)