Oracle In-Memory - Game Changer in Data Warehousing and Business Intelligence Dr.-Ing. Holger Friedrich


Oracle OpenWorld 2014, CON5193: Oracle In-Memory - The Game Changer in Data Warehousing and Business Intelligence


Oracle In-Memory - Game Changer in Data Warehousing and Business Intelligence

Dr.-Ing. Holger Friedrich


03/2012© 2014 sumIT AG

Agenda

• Introduction
• Columnar Stores
• Oracle In-Memory
• Analytics
• Loading
• Conclusions


sumIT AG

• Consulting and implementation services in Switzerland
• Experts for
  – Data Warehousing and
  – Business Intelligence solutions
• Focussed on Oracle technology
• ‘BI Foundation specialized’ partner
• ‘Data Warehousing specialized’ partner
• Exalytics competence center with own server
• Our motto: Get Value From Data
• Visit our web site: www.sumit.ch (in German)


Holger Friedrich

• Computer Science diploma of Karlsruhe Institute of Technology (KIT)
• Ph.D. in Robotics and Machine Learning
• More than 16 years of experience with Oracle technology
• Expert for
  – Data Integration
  – Data Warehousing
  – Data Mining
  – Business Intelligence
• Technical Director of sumIT AG
• First Oracle ACE for DWH/BI in Switzerland


Advantages

• Best for queries that
  - scan large quantities of data
  - on a rather small set of columns
  - compute aggregates on the results
• High compression benefits on most columns (except ones containing distinct values)

Well suited for OLAP/BI


Drawbacks (Up To Now)

• Some operations very costly
  - DML
  - Queries retrieving entire rows
• Complex DBMS infrastructure has to be built once more
  - storage (management)
  - security
  - clustering
  - disaster recovery
  - …

Less suited for OLTP


Competition

• Niche vendors
  - Exasol
  - HP Vertica
  - Infobright
  - ParAccel
• The usual suspects
  - Microsoft (Columnstore Indexes)
  - IBM
  - Teradata
  - and of course SAP/HANA


Columnar Stores/DBs - Oracle’s Flavour

• transparent column store managed next to the row store
• not either/or
• persistent storage row-based as before
• column store DML-synched in real-time
• the entire Oracle DB-ecosphere remains unchanged
  - security
  - backup
  - disaster recovery
  - RAC
  - …
• NO application changes required!


Technology Gems

1. In-memory storage index
2. Filtering on binary compressed data
3. Columnar storage of selected columns
4. Transparent querying across storage hierarchy
5. Real-time background update of the columnar store
6. Parallel query execution on the columnar store
7. SIMD vector processing
8. In-memory fault tolerance on RAC
9. On-demand building of multi-dimensional aggregation data structure (almost an on-the-fly MOLAP cube)


In-Memory Storage Index

• Column data is stored separately in compression units (IMCUs)
• In-Memory Storage Indexes store Min/Max values for each column for each IMCU
• IMCUs with Min/Max outside a query predicate can be safely ignored during processing
• v$mystat shows information about the number of IMCUs assessed vs. IMCUs pruned

[Figure: SALES table in column format in memory, split into four IMCUs with Min/Max of 1/3, 4/7, 8/12 and 13/15]

Example: Find sales from stores with a store_id of 8 or higher
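The pruning idea on this slide can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names and the individual store_id values are made up for illustration, and only the four Min/Max ranges and the predicate (store_id of 8 or higher) come from the slide. It does not describe Oracle's internal data structures.

```python
# Illustrative model only: names and structure are assumptions, not Oracle internals.

def build_storage_index(imcus):
    """Record (min, max) of the column values held in each IMCU."""
    return [(min(values), max(values)) for values in imcus]


def scan_greater_equal(imcus, storage_index, threshold):
    """Return matching values plus the number of IMCUs scanned and pruned."""
    matches, scanned, pruned = [], 0, 0
    for values, (_low, high) in zip(imcus, storage_index):
        if high < threshold:
            pruned += 1      # no value in this IMCU can satisfy the predicate
            continue
        scanned += 1
        matches.extend(v for v in values if v >= threshold)
    return matches, scanned, pruned


# store_id values grouped into four IMCUs with the Min/Max ranges from the slide
imcus = [[1, 2, 3], [4, 6, 7], [8, 10, 12], [13, 14, 15]]
index = build_storage_index(imcus)
matches, scanned, pruned = scan_greater_equal(imcus, index, 8)
print(matches, scanned, pruned)
```

In this toy run the first two IMCUs are skipped without looking at a single value, which is the effect the scanned vs. pruned counters are meant to expose.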


SIMD Vector Processing

• Single Instruction processing Multiple Data values
• Evaluation of a set of column values in a single CPU instruction cycle
• Potential to speed up processing to billions of rows per second

Example: Find all sales with PROMO_ID 9999

[Figure: multiple PROMO_ID values are loaded from memory into a CPU vector register; a vector compare checks all values against 9999 in 1 cycle]
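Plain Python has no access to CPU vector registers, so the following sketch only emulates the principle with a different, named technique: "SIMD within a register" (SWAR), where several small values are packed into one integer and compared with a handful of bitwise operations. The lane count, the 16-bit lane width and all function names are assumptions chosen for illustration; real SIMD execution happens in hardware.

```python
LANES = 4                          # values packed per "register" (assumed)
LANE_BITS = 16                     # each PROMO_ID is assumed to fit in 16 bits
LOW15 = int("7FFF" * LANES, 16)    # 0x7FFF in every lane
HIGH = int("8000" * LANES, 16)     # top bit of every lane


def pack(values):
    """Pack up to LANES 16-bit values into one integer 'register'."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFFFF) << (i * LANE_BITS)
    return word


def broadcast(value):
    """Copy one value into every lane."""
    return pack([value] * LANES)


def equal_lanes(word, needle_word):
    """Compare all lanes in one pass; return one boolean per lane."""
    diff = word ^ needle_word                            # lane is 0 where values match
    nonzero = (((diff & LOW15) + LOW15) | diff) & HIGH   # top bit set where lane != 0
    zero = ~nonzero & HIGH                               # top bit set where lane == 0
    return [bool((zero >> (i * LANE_BITS + 15)) & 1) for i in range(LANES)]


def find_matches(column, needle):
    """Return positions equal to needle, testing LANES values per step."""
    needle_word = broadcast(needle)
    hits = []
    for start in range(0, len(column), LANES):
        chunk = column[start:start + LANES]
        flags = equal_lanes(pack(chunk), needle_word)
        hits.extend(start + i for i in range(len(chunk)) if flags[i])
    return hits


promo_ids = [9999, 1234, 9999, 9999, 42, 9999]
print(find_matches(promo_ids, 9999))
```

The comparison in `equal_lanes` touches all four packed values with the same few operations, which mirrors the "load multiple values, compare all at once" picture on the slide.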


In-Memory Aggregation

• New optimizer transformation: Vector Group By
• Resembles the well-known star transformation
• Two-phase, six-step process
• Phase 1 - preparation
  1. Scan dimensions
  2. Build key vectors
  3. Prepare accumulator
  4. Build temporary tables for selected dimension attributes
• Phase 2 - computation
  5. Scan facts w.r.t. key vectors
  6. Join filtered facts with temporary tables
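The six steps can be walked through on a toy star schema. The sketch below is a simplified reading of the slide, not Oracle's implementation: the tables, column names, filter and data are all hypothetical, and a key vector is modelled as a plain dictionary from join key to a dense group number.

```python
# Hypothetical toy star schema; all names and values are assumptions.
stores = {1: "North", 2: "South", 3: "North", 4: "West"}   # store_id -> region
products = {10: "Food", 11: "Toys", 12: "Food"}            # prod_id -> category
sales = [(1, 10, 5.0), (2, 11, 7.0), (3, 10, 2.0),
         (4, 12, 9.0), (3, 12, 1.0), (1, 11, 4.0)]         # (store_id, prod_id, amount)


def build_key_vector(dimension, keep):
    """Steps 1, 2 and 4: scan a dimension, map each surviving join key to a
    dense group number, and keep the group's attribute in a small temp table."""
    key_vector, temp_table = {}, []
    for key, attribute in dimension.items():
        if not keep(attribute):
            continue                  # filtered-out keys get no entry
        if attribute not in temp_table:
            temp_table.append(attribute)
        key_vector[key] = temp_table.index(attribute)
    return key_vector, temp_table


def vector_group_by(facts, store_kv, store_tmp, prod_kv, prod_tmp):
    # Step 3: accumulator with one cell per (store group, product group)
    acc = [[None] * len(prod_tmp) for _ in store_tmp]
    # Step 5: one scan of the facts, aggregating directly into the accumulator
    for store_id, prod_id, amount in facts:
        s, p = store_kv.get(store_id), prod_kv.get(prod_id)
        if s is not None and p is not None:
            acc[s][p] = (acc[s][p] or 0.0) + amount
    # Step 6: join the filled accumulator cells back to the temp tables
    return {(store_tmp[s], prod_tmp[p]): acc[s][p]
            for s in range(len(store_tmp))
            for p in range(len(prod_tmp))
            if acc[s][p] is not None}


store_kv, store_tmp = build_key_vector(stores, lambda region: region != "West")
prod_kv, prod_tmp = build_key_vector(products, lambda category: True)
result = vector_group_by(sales, store_kv, store_tmp, prod_kv, prod_tmp)
print(result)
```

The point of the design is that the fact table is scanned once and each row needs only two lookups and one addition, while the dimension attributes are attached at the very end to a handful of accumulator cells.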


In-Memory Aggregation - XPLAN


In-Memory on RAC Including Fault Tolerance

• Distribution of large objects’ in-memory compression units (IMCUs)
  – automatically (default)
  – BY ROWID RANGE
  – BY {SUB}PARTITION
• Fault tolerance (engineered systems only)
  – DUPLICATE clause to keep redundant IMCU copies on nodes
  – DUPLICATE ALL = each IMCU copied to every node


Assessment

• The In-Memory option can improve query performance dramatically
• Data scanning benefits in particular
• Joins and Vector Group By aggregations are accelerated as well
• However, it is advanced technology, not magic
• Sorting, classic aggregation etc. still take time

[Figure: timelines over t comparing Row Store and In-Memory for two queries; segments labelled "Scan Data", "Aggregate" and "Join / Sort / Group / …"]
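The "not magic" point is essentially Amdahl's law: only the part of a query that In-Memory accelerates gets faster. A small worked example follows; the timings and the speed-up factor of 100 for the scan are hypothetical numbers chosen for illustration, not measurements from the talk.

```python
def overall_speedup(scan_time, other_time, scan_factor):
    """Overall speed-up when only the scan part becomes scan_factor times faster."""
    before = scan_time + other_time
    after = scan_time / scan_factor + other_time
    return before / after


# Hypothetical timings in seconds, chosen only for illustration
scan_heavy = overall_speedup(scan_time=9.0, other_time=1.0, scan_factor=100)
sort_heavy = overall_speedup(scan_time=1.0, other_time=9.0, scan_factor=100)
print(round(scan_heavy, 2), round(sort_heavy, 2))
```

With the same scan acceleration, the scan-dominated query gains close to an order of magnitude, while the query dominated by sorting and grouping barely changes.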


Unprecedented Performance for…

• Reporting queries
  - Simple
  - SQL*Analytics
• (Tool based) OLAP
• Dimensional queries


Simple Reporting Queries

• Query characteristics
  - few joins
  - simple one-step aggregations (if at all)
  - lots of filtering
  - sometimes many rows to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins use Bloom filters applied on the columnar store
  - scanning and joining effort far outweighs other processing effort
  - but a large number of rows may need time to transfer to the client
  - SIMD computation can be used on a large scale
• In-Memory impact
  – high performance gains


SQL*Analytics Reporting Queries

• Query characteristics
  - some joins
  - complex analytic functions
  - lots of filtering
  - often many attributes to be displayed
• Processing
  - scanning in the columnar store uses IMCU storage indexes
  - joins use Bloom filters applied on the columnar store
  - share of processing effort other than scanning and joining rises
  - SIMD computation can be used
• In-Memory impact
  – performance gain, but smaller than for simple reporting queries


(Tool Based) OLAP

• Query characteristics
  – horrendously complex queries
  – chaining of WITH clauses
  – complex analytic functions and aggregations
• Processing
  - short scanning time
  - hard for optimizer to find efficient plan
  - materialization of temporary results 'breaks' pure columnar processing
  - intermediate computation effort exceeds columnar in-memory share of effort
• In-Memory impact
  – gain of using in-memory option depends on query complexity
  – the need for pre-computing (some) aggregates remains


Dimensional Queries

• Characteristics
  - few simple joins (star shape)
  - filtering on dimensions
  - most aggregations along dimension attributes
  - massive amount of facts
  - sometimes massive dimensions
• Technology & consequences
  - short scanning time
  - application of optimizer's new vector-group-by transformation
• In-Memory impact
  – high performance gain


Different Reporting Queries


Acceleration Of Reporting Queries

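--------------------------------------------------------------------------------------------------------------
| report type             | no of rows       | result set | row store (SGA) | columnar store (SGA) | times X |
--------------------------------------------------------------------------------------------------------------
| simple                  | 400K             | 35         | 10ms            | 2ms                  | 5       |
| join (bloom)            | 14M & 55K        | 2M         | 25s             | 25s                  | 1       |
| join, top10 (analytics) | 14M & 55K        | 10         | 2s              | 1s                   | 2       |
| dimensional (vector by) | 14M & 1.8K & 72  | 88         | 8s              | 0.8s                 | 10      |
--------------------------------------------------------------------------------------------------------------

• Demo comparing SGA row based vs. in-memory columnar store
• Small Virtual Machine
• No SIMD support in demo environment
• Serial execution

Higher gains on enterprise infrastructure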


Data Quality & Consistency Assessment

• Typical tests
  - column value checks
  - intra row checks
  - inter row checks
  - inter table checks
• Challenge
  - often complex conditions
  - functions have to be applied
  - costly, also in columnar store
  - e.g. not REGEXP_LIKE (ssnumber, '\d{3}\.\d{4}\.\d{4}\.\d{2}')
• Observation
  - gain depends significantly on test complexity
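To see why such a check stays expensive, it helps to look at what it does per row. The following Python sketch mirrors the `not REGEXP_LIKE` example with the same pattern; the staged rows and the helper name are hypothetical. Because `REGEXP_LIKE` matches anywhere in the string, the sketch uses an unanchored search.

```python
import re

# Same pattern as in the slide's REGEXP_LIKE example
SSN_PATTERN = re.compile(r"\d{3}\.\d{4}\.\d{4}\.\d{2}")


def invalid_rows(rows):
    """Return rows whose ssnumber does not contain the expected pattern."""
    return [row for row in rows if not SSN_PATTERN.search(row["ssnumber"])]


# Hypothetical staged rows; the numbers are made up
staged = [
    {"id": 1, "ssnumber": "756.1234.5678.97"},
    {"id": 2, "ssnumber": "75612345678 97"},
    {"id": 3, "ssnumber": "756.1234.5678.9"},
]
print([row["id"] for row in invalid_rows(staged)])
```

Every value has to be run through the pattern matcher, so unlike a simple range predicate there is no Min/Max shortcut that lets whole blocks of values be skipped.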


Meta Data Transformation During ETL

• Typical scenario
  - transformation of source dependent (domain) data into DWH standard representation
  - usually using mapping tables
  - e.g. sourcesys='SAP' and gender = '0' => return 'male'
  - typical case of joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only columns to be transformed into column store
  - check population time vs. speed gain

[Figure: staged source and mapping table; column labels: src system, gender, DWH gender; annotations: "src system is JD Edwards", "gender entries for all rows"]
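The mapping-table lookup described on this slide is a plain join without aggregation. A minimal Python sketch follows, assuming a mapping keyed by source system and source value. Only the ('SAP', '0') to 'male' entry comes from the slide; the other entries, the 'JDE' code, the column names and the default value are hypothetical.

```python
# (source system, source value) -> DWH standard value.
# Only the ('SAP', '0') entry comes from the slide; the others are hypothetical.
GENDER_MAPPING = {
    ("SAP", "0"): "male",
    ("SAP", "1"): "female",
    ("JDE", "M"): "male",
    ("JDE", "F"): "female",
}


def to_dwh_gender(staged_rows, mapping, default="unknown"):
    """Join staged rows to the mapping table; no aggregation is involved."""
    return [
        {**row, "dwh_gender": mapping.get((row["sourcesys"], row["gender"]), default)}
        for row in staged_rows
    ]


staged = [
    {"id": 1, "sourcesys": "SAP", "gender": "0"},
    {"id": 2, "sourcesys": "JDE", "gender": "F"},
    {"id": 3, "sourcesys": "JDE", "gender": "?"},
]
transformed = to_dwh_gender(staged, GENDER_MAPPING)
print([row["dwh_gender"] for row in transformed])
```

Only the two join columns of the staged rows are read here, which matches the strategy of populating just the columns to be transformed.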


Key Transformations

• Typical scenario
  - transformation of source dependent natural/business keys into DWH owned surrogate representation
  - reverse lookups for data mart loading
  - multiple (outer) joins against target tables
  - typical case of (outer) joins without aggregation
• Challenge
  - staging tables initially not in column store
• Strategy
  - populate only rows to be transformed into column store
  - check population time vs. speed gain
  - works also with lookup tables in columnar and staging table in row format


Example Key Lookup Query

select s.invoicenumber, s.year, s.audit_id, s.cutoffdt,
       r.id invoice_id, m.id member_id
from (select * from st_db_rechnung_in_t where rownum < 100000) s,
     db_rechnung_ht r,
     pv_mitglied_ht m
where s.invoicenumber = m.invoicenumber (+)
  and s.invoicenumber = r.invoicenumber (+)
  and s.year = r.year (+)
  and s.incoiceitem = r.incoiceitem (+)
  and s.srcmodifieddt > SYSDATE-720

1. scan staging table
2. take last 2 years
3. outer join to lookup tables
4. return DWH-IDs plus some other stuff


Chaining Of Bloom Filters

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     | 99999 | 9081K |       |  2848   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                |                     | 99999 | 9081K | 8008K |  2848   (1)| 00:00:01 |
|   2 |   JOIN FILTER CREATE            | :BF0000             | 99999 | 6835K |       |  1090   (1)| 00:00:01 |
|*  3 |    HASH JOIN OUTER              |                     | 99999 | 6835K | 6840K |  1090   (1)| 00:00:01 |
|   4 |     JOIN FILTER CREATE          | :BF0001             | 99999 | 5664K |       |   278   (1)| 00:00:01 |
|*  5 |      VIEW                       |                     | 99999 | 5664K |       |   278   (1)| 00:00:01 |
|*  6 |       COUNT STOPKEY             |                     |       |       |       |            |          |
|   7 |        TABLE ACCESS FULL        | ST_DB_RECHNUNG_IN_T | 99999 | 3613K |       |   278   (1)| 00:00:01 |
|   8 |     JOIN FILTER USE             | :BF0001             |  395K | 4637K |       |    29   (7)| 00:00:01 |
|*  9 |      TABLE ACCESS INMEMORY FULL | PV_MITGLIED_HT      |  395K | 4637K |       |    29   (7)| 00:00:01 |
|  10 |   JOIN FILTER USE               | :BF0000             |  781K |   17M |       |    73  (13)| 00:00:01 |
|* 11 |    TABLE ACCESS INMEMORY FULL   | DB_RECHNUNG_HT      |  781K |   17M |       |    73  (13)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

1. scan staging table
2. create Bloom filters on lookup tables
3. apply Bloom filters
4. hash-join bloom filter false positives
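The interplay of Bloom filter and hash join in this plan can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Oracle's implementation: the `BloomFilter` class, its size and hash count, the helper name and the sample rows are all hypothetical, and only one lookup table is joined instead of two.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; size and hash count are arbitrary illustrative choices."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def outer_join_with_bloom(staging, lookup):
    """Left outer join staging -> lookup on invoicenumber with a pre-filtered lookup scan."""
    # Scan the staging rows and build the Bloom filter from their join keys
    bloom = BloomFilter()
    for row in staging:
        bloom.add(row["invoicenumber"])
    # Apply the filter while scanning the lookup table
    candidates = [row for row in lookup if bloom.might_contain(row["invoicenumber"])]
    # Exact hash join: false-positive candidates are never matched by a staging key
    by_key = {row["invoicenumber"]: row["id"] for row in candidates}
    joined = [(row["invoicenumber"], by_key.get(row["invoicenumber"])) for row in staging]
    return joined, len(candidates)


staging = [{"invoicenumber": n} for n in (101, 102, 103)]
lookup = [{"invoicenumber": n, "id": n * 10} for n in range(102, 1102)]
joined, candidate_count = outer_join_with_bloom(staging, lookup)
print(joined, candidate_count, len(lookup))
```

The filter may let a few non-matching lookup rows through, but it never drops a matching one, so the exact join that follows still returns the correct result while handling far fewer rows than the full lookup table.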


Conclusions

• Oracle In-Memory is a game changer in the DWH/BI market
  – in contrast to niche players, it is absolutely enterprise ready
  – in contrast to the other big players, its use requires no modifications
• Therefore, In-Memory provides a big leap in performance with
  - low risks
  - low project, infrastructure, maintenance and development cost
• However, In-Memory is no silver bullet
• Speed-up varies greatly with query complexity
• Good design of ETL processes & analyses remains important
• Powerful infrastructure is still required (think about using Oracle Engineered Systems)