SAS®, Sun, Oracle: On Mashups, Enterprise 2.0 and Ideation
Charlie Garry, Director, Product Manager, Oracle Corporation
Paul Kent, Vice President, Platform R&D, SAS
Maureen Chew, Principal Software Engineer, Oracle Corporation
Agenda
• Getting to Enterprise 2.0 – The Optimized Information Infrastructure
• Speaking / Tweeting towards All Things In-Database
• SQL optimization (the obvious, not-so-obvious, unobvious)
• In-database – current landscape
• In-database – A Look Ahead
• Apple iPad Drawing
Analytics and Enterprise 2.0
• Market estimated to be $105B with a CAGR of 7%
• Layer Enterprise 2.0 on top
• Even MORE information
• Is the information timely?
• Is the information clean?
• Is the information accurate?
Non-Conventional Thinking
“Almost two out of three data centers (63.0%)
worldwide report a ‘dramatic’ increase in their
storage requirements over the past five years”
(2009/2010 AFCOM Data Center Trends Survey Results & Analysis)
The Result
• MORE SERVERS – Poor Utilization
• MORE STORAGE – Poor Utilization
• MORE COMPLEXITY – Inability to Change or Integrate Quickly/Safely
Simplify/Optimize Your Information Infrastructure
• Reduce Analytic Latency
• Reduce OPEX and CAPEX
• Faster Time-To-Value
• Offload Infrastructure Design and Maintenance
© 2010 Oracle Corporation
SAS and Oracle
On Collaboration ...
“Our customers are very interested in getting maximum value out of our business analytic solutions and that means putting less effort into provisioning and managing infrastructure. Our ability to partner with and leverage the fusion of Sun into Oracle to simplify infrastructure will be a benefit to our mutual customers.”
Keith Collins, SAS Senior Vice President and Chief Technology Officer
Sun & Oracle – A Better Platform for SAS
• SAS uses many Oracle and Sun technologies
• Solaris is a leading UNIX deployment platform for SAS
• Sun HW / Storage
• WebLogic
• Java
• LDAP
• ACCESS / Oracle / Exadata
• MySQL
Oracle & SAS Collaboration
• Partnership & Collaboration
• High end performance testing
» SAS Enterprise BI, Sun Enterprise M9000
» JMP Genomics
» SAS Grid
» Sun Blade 6000
» Sun ZFS Storage 7420
• Broad Engineering collaboration
• http://oracle.com/sas
SQL Optimization – the “Obvious”
• UNION, MINUS, INTERSECT: sort to eliminate duplicate rows; UNION ALL: no sort, includes duplicates
• IN vs EXISTS
  • Queries using IN or NOT IN can often be rewritten with EXISTS / NOT EXISTS (or vice versa) - bit.ly/gZvzeM
• Wildcard search against an index
  • An index (e.g., on COL) is usable only when the search pattern is anchored at the beginning of the column value
    » “COL like 'abc123%'” uses index, “COL like '%abc123%'” does not
• A function applied to a column prevents use of a plain index on that column; create a function-based (“functional”) index
  • UPPER(COL)='ABC123' → create index idx on tablename(UPPER(COL));
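The first two points above can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so it illustrates only the result semantics (not Oracle's sort behavior or optimizer), and the table and column names (orders_2009, orders_2010, cust) are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a real DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2009 (cust TEXT);
    CREATE TABLE orders_2010 (cust TEXT);
    INSERT INTO orders_2009 VALUES ('acme'), ('globex'), ('acme');
    INSERT INTO orders_2010 VALUES ('acme'), ('initech');
""")

def count(sql):
    """Return the number of rows produced by the given query."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# UNION removes duplicate rows; UNION ALL keeps every row from both inputs.
union_rows = count(
    "SELECT cust FROM orders_2009 UNION SELECT cust FROM orders_2010")
union_all_rows = count(
    "SELECT cust FROM orders_2009 UNION ALL SELECT cust FROM orders_2010")

# The same question asked with IN and with a correlated EXISTS.
in_rows = conn.execute("""
    SELECT DISTINCT cust FROM orders_2009
    WHERE cust IN (SELECT cust FROM orders_2010)
    ORDER BY cust
""").fetchall()
exists_rows = conn.execute("""
    SELECT DISTINCT cust FROM orders_2009 a
    WHERE EXISTS (SELECT 1 FROM orders_2010 b WHERE b.cust = a.cust)
    ORDER BY cust
""").fetchall()

print(union_rows, union_all_rows, in_rows == exists_rows)
```

If duplicates are acceptable or cannot occur, UNION ALL avoids the de-duplication work. Note that NOT IN and NOT EXISTS can return different rows when the subquery yields NULLs, so that rewrite deserves a check before it is applied.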
SQL Optimization – the “less Obvious”
• Collect good statistics using DBMS_STATS
  • Poor query performance can result from stale statistics or data skew
• Partition large tables
  • e.g., partition data by week: a query for a single week reads about 1/52 of a table holding one year of data
• CTAS instead of UPDATE/DELETE (DML)
  • When deleting a large number of rows, it is often better to “CREATE TABLE xyz AS SELECT … FROM abc”, selecting only the rows to keep
  • INSERT with the APPEND hint bypasses the buffer cache and is typically faster than conventional inserts
• Use parallelism, e.g., query, DML, data load, replication, ... (bit.ly/eLFRQy)
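The CTAS point can be sketched as follows. This is a minimal illustration using Python's sqlite3 module rather than Oracle; the table names abc and xyz come from the slide, while the columns (id, status) and the 90% "stale" ratio are assumptions made up for the example. It only shows that both approaches leave the same rows; the performance argument applies to a real DBMS at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER, status TEXT)")
# Hypothetical data: 1000 rows, of which every tenth row is 'live'.
conn.executemany(
    "INSERT INTO abc VALUES (?, ?)",
    [(i, "stale" if i % 10 else "live") for i in range(1000)],
)

# CTAS: build a new table from the rows to keep instead of deleting the rest.
conn.execute("CREATE TABLE xyz AS SELECT * FROM abc WHERE status = 'live'")

# Conventional alternative, shown for comparison: delete in place.
conn.execute("DELETE FROM abc WHERE status <> 'live'")

kept_ctas = conn.execute("SELECT id FROM xyz ORDER BY id").fetchall()
kept_delete = conn.execute("SELECT id FROM abc ORDER BY id").fetchall()

print(len(kept_ctas), kept_ctas == kept_delete)
```

In a real deployment the new table would then typically replace the old one (drop and rename), and indexes, grants, and constraints would need to be recreated.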
SQL Optimization – the “unObvious”
• Maria Colgan – Top Tips for Getting Optimal SQL Execution All the Time
  • Cardinality
    » How to combat common causes for incorrect cardinality
  • Access path
    » What causes the wrong access path
  • Join type
    » Common causes for why the wrong join type was selected
  • Join order
    » Common causes for why the wrong join order was selected
• References
  • Presentation above: bit.ly/enQxBK
  • Optimizer blog: blogs.oracle.com/optimizer
Tactics for Pushing SQL whitepaper
Tactics ... – TOC excerpts
In-Database Processing – Performance
• libname x oracle insertbuff=1000 ...
• SAS/ACCESS DBSLICE= option: threaded read in a DATA step against a 116 GB file
set GE_Data.OBSERVATION_F ( DBSLICE=
("MOD(OBSERVATION_KEY,2)=0" "MOD(OBSERVATION_KEY,2)=1" ));
Configuration                        DATA Step Time          Total Run Time
DATA Step Only                       5 hrs, 4 min, 21 sec    8 hrs, 26 min, 38 sec
Input from Exadata                   4 hrs, 36 min, 22 sec   7 hrs, 19 min, 26 sec
Input from Exadata + threaded read   1 hr, 47 min, 53 sec    4 hrs, 51 min, 23 sec
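The DBSLICE= option above works by giving each read thread its own WHERE clause. For the result to be correct, the clauses must be mutually exclusive and together cover every row. The sketch below illustrates that idea in plain Python; it is not SAS/ACCESS code, the helper names (slice_predicates, read_slice) are hypothetical, and the in-memory key list stands in for the real table.

```python
from concurrent.futures import ThreadPoolExecutor

def slice_predicates(key, n):
    """Build n mutually exclusive, exhaustive WHERE clauses over an integer key."""
    return [f"MOD({key},{n})={i}" for i in range(n)]

def read_slice(rows, n, i):
    """Stand-in for one threaded read: return the keys that fall in slice i."""
    return [k for k in rows if k % n == i]

keys = list(range(1, 21))  # hypothetical positive OBSERVATION_KEY values
n = 2
predicates = slice_predicates("OBSERVATION_KEY", n)

# One worker per slice, mirroring one database connection per WHERE clause.
with ThreadPoolExecutor(max_workers=n) as pool:
    slices = list(pool.map(lambda i: read_slice(keys, n, i), range(n)))

# Every key must appear in exactly one slice.
merged = sorted(k for s in slices for k in s)

print(predicates)
print(merged == keys)
```

A key with a roughly uniform distribution keeps the slices balanced; a skewed key would leave one thread doing most of the work.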
In-Database Processing – Current Landscape
• Pass through
  • Implicit SQL – SAS code converted to SQL pass-through (use Tactics for Pushing SQL to the Relational Database)
    » 9.2M2 - significant improvements (inline views, SQL views, tables, expressions using CALCULATED keyword, SELECT, WHERE, HAVING, ON, GROUP BY, ORDER BY clauses)
  • Explicit SQL
• In-database BASE PROCS for Oracle
  • Available as of 9.2M3 (http://bit.ly/gMCvbo)
  • FREQ, RANK, REPORT, SORT, SUMMARY/MEANS, TABULATE
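The benefit of running a procedure such as FREQ in-database is that only the aggregated result leaves the DBMS. The sketch below contrasts the two approaches with Python's sqlite3 module as a stand-in; it is not the SQL that SAS actually generates, and the visits table and region column are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (region TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("east",)] * 3 + [("west",)] * 2 + [("north",)],
)

# Client-side: every detail row is fetched, then counted outside the database.
client_counts = Counter(
    row[0] for row in conn.execute("SELECT region FROM visits"))

# In-database: the DBMS aggregates and returns one row per distinct value.
db_rows = conn.execute(
    "SELECT region, COUNT(*) FROM visits GROUP BY region").fetchall()
db_counts = dict(db_rows)

print(dict(client_counts) == db_counts, len(db_rows))
```

Here six detail rows shrink to three result rows; on a large table the reduction in data movement is the main source of the speedup.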
The Most Important Acceleration Strategies
Co-location (of data and analytics)
Co-location (of data and analytics)
Co-location (of data and analytics)
Avoid the disk, use memory
Parallelize
But, co-location
has many technological solutions
has to be done right
has to adjust to the complexity of the analytic task
Acceleration Strategies With DBMS
Customers want to improve response times for SAS workloads that access data inside a DBMS
What are the options?
Re-state the work as SQL, let DBMS parallelize (SQL-PassThru)
Extend SQL with UDFs; go beyond the simple (obvious) transforms (Inside-DB)
Put SAS CPUs closer to DBMS CPUs (Alongside-DB)
Inside-DB
Teradata, Greenplum…… but what about Oracle
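"Extend SQL with UDFs" means registering custom logic with the database so that it can be called from ordinary SQL and evaluated next to the data. The sketch below shows the general pattern with Python's sqlite3 module and its create_function call. It is only an analogy: SQLite runs in-process, the scoring function logistic_score and the obs table are hypothetical, and none of this reflects how SAS deploys functions into Oracle, Teradata, or Greenplum.

```python
import math
import sqlite3

def logistic_score(x):
    """Hypothetical scoring function standing in for analytic logic."""
    return 1.0 / (1.0 + math.exp(-x))

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("logistic_score", 1, logistic_score)

conn.execute("CREATE TABLE obs (id INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(1, 0.0), (2, 2.0), (3, -2.0)],
)

# The filter is evaluated by the SQL engine; only matching ids are returned.
flagged = conn.execute(
    "SELECT id FROM obs WHERE logistic_score(x) > 0.5 ORDER BY id"
).fetchall()

print(flagged)
```

Because the function is applied inside the query, rows that fail the filter never have to be shipped to the client, which is the point of the Inside-DB option.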
Progress Report
It can be done!
We have the basic wiring assembled
We need to test the performance
Co-location. Not so much
Stay Tuned
Interested? [email protected]
Thank You
Charlie Garry, [email protected]
Paul Kent, [email protected]
Maureen Chew, [email protected]
Appendix
Sun Blade 6000; Sun Blade X6270 M2 server modules
Sun Blade 6000 Ethernet Switched Network Express Module (10GbE, 24p)
Sun ZFS Storage 7420
SAS® Grid Computing, Sun Blade 6000, Sun ZFS Storage 7420
Extreme Compute and I/O Performance
Sun ZFS Storage 7420 Extreme Performance
• SAS Grid workload
• Shared Read/Write filesystem via NFS
• 10 x Sun Blade X6270 M2
• 2.01 GB/sec via 10GbE
• even read/write load
ZFS 7420 I/O analytics
• Grid I/O by client