oracle index study for event tag db m. boschini matteo.boschini@mib.infn.it s. della torre...

Post on 30-Dec-2015

215 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Oracle Index study for Event TAG DB

M. Boschini matteo.boschini@mib.infn.itS. Della Torre stefano.dellatorre@mib.infn.it

Outline TAG DB issue:

space usage issue INDEX accounting for ~ 50% of used space...

query performance Tests with random data Tests with AMS01 data Tests with AMS02 Cosmic data

M. Boschini Oracle Index Study for STATUS DB – TIM October 2008 - CERN

EVENT TAG DB Goal: test feasibility of storing in an Oracle DB TAG

info for AMS02 events TAG is a 64bit number AMS02 MC has, as for now, no meaningful TAG info

(0 or error bit) We start from previous studies:

http://ams.cern.ch/AMS/Reports/AMSnote-2000_09_05b.ps.gz

http://ams.cern.ch/AMS/Reports/Computing/Apr2004/M.Boschini_LV3_DB.pdf

EVENT TAG DB

Test INDEX type space usage Test INDEX usage Time insert & selections What can we use ?

test environment SLC 4.6 (64bit) 2 dual core Intel(R) Xeon(R) CPU @ 2.00GHz Oracle 11g 64 bit

we opted for Oracle11g because of product lifetime All data is fed to DB using C-OCI programs Oracle setup:

user AMSDES with dedicated Bigfile tablespace overkill wrt Oracle's suggestion (>= 1 TB) ...?

default TEMP TBLSpace (3 GB)

test environment Dummy Table:

RUN (NUMBER) EVENT (NUMBER) TAG (BINARY_DOUBLE)

TAG field will be used for indexing ...assuming queries will be mainly on TAG

10^8 records, equivalent to ~ 10 days of DAQ Test B-Tree vs BitMap Index

as hinted by our previous studies...

Indexes (theory) Btree

default index type in Oracle very space consuming. Not advised for dataware housing

dataware house: write few, read many, HUGE data sample BitMap

advised for dataware housing Bitmap indexes are stored as an array of zero-and-one

values, with one entry for each row. Should be used only on low cardinality fields

Indexes (theory) BitMap

From Oracle Advanced Programming White Paper, Sep. 2008

“...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”

Tests Items to be tested are:

SPACE OCCUPANCY = f(index type) QUERY PERFORMANCE = f(index type)

Index types: B-Tree and Bitmap We started with a random generated sample We then used an AMS01 sample

40 ntuples no request on charge, pmass, pmom, lat (nothing!)

AMS02 Cosmic data

Disk Space TABLE SIZE = 5 GB

this means that for AMS02 we'll need 500 GB B-Tree INDEX SIZE = 4.8 GB

this means that for AMS02 we'd need 480 GB BitMap INDEX SIZE = 70 MB

this means that for AMS02 we'd need 7 GB thus, at least for space reasons, we should use

BitMap INDEX, a factor 30 smaller.

when to create an INDEX After TABLE has been populated ! B-Tree INDEX creation time:

4 µsec/record mainly SORT and disk space allocation

BitMap INDEX creation time: 0.5 µsec/record

Theory: is an INDEX useful ? Not always... ORACLE has a built-in decision taking algorithm

(EXECUTION PLAN) which decides how to actually implement a query.

Execution plan tries to optimize disk accesses and CPU usage.

So, e.g., if reading an index which returns “many” records takes more than X, than the index is not used.

These are based, since Oracle 10.0.2i on Cost Based Optimizer and Dynamic Sampling (by default ON)

EXECUTION PLANPLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3591811347

------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |

| 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:05 |

EXECUTION PLANPLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 3591811347

------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 |

| 1 | TABLE ACCESS BY INDEX ROWID| N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:04 |

|* 2 | INDEX RANGE SCAN | TAG_RAND_IDX | 36793 | | 106 (0)| 00:00:02 |

IDX usage Test To test if Index is actually used, we wrote a simple

PLSQL+Perl program that Dynamically populates statistics table

HOW-MANY <--> TAG_STA For each TAG_STA value, generates and analyzes

EXECUTION PLAN for SELECT run, event from TABLE wher tag=TAG_STA;

We thus run CARDINALITY queries to DB and analyze the Optimizer decisions...

Sample1 CARDINALITY:5relative population > 5.7%

Sample2 CARDINALITY:2517relative population: 6x10E-4

Performance Sample1 and Sample2 both do NOT use INDEX in the

select query. This is because Oracle's built-in optimization algorithm

discovers that too many records will be returned, and thus decides to ignore the INDEX

Thus: LOW/MEDIUM cardinality with HIGH relative

population makes INDEX uneffective

Sample3 CARDINALITY:1024relative population 6x10E-5

Performance Sample3 uses INDEX ! This is because Oracle's builtin optimization algorithm

discovers that TABLE ACCESS BY INDEX ROWID is efficient.

Thus LOW/MEDIUM cardinality with “correct for

Oracle” relative population makes INDEX effective !

Real Data: AMS01 We then used an AMS01 sample

40 ntuples (no request on charge, pmass, pmom, ecc) “scrambled” this sample to get 10^8 records

Average relative cardinality = 2x10E-6 BitMap Index is USED ! Selection time

no IDX: 22 sec IDX: 0.05 sec

INDEX really effective

Real Data: AMS02 !!! We also used AMS02 Cosmic Rays data

we used only 2 2x10E8 events no request on charge, pmass, pmom, ...(nothing!)

Table size: 5.5 GB 2 BitMap IDX size: 57 MB 2

just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big !!!

Btree IDX size: 4.2 GB 2

Real Data: AMS02 !!! Mean Relative cardinality: 1.8x10E-4 Max Relative cardinality: 3x10E-3 BitMap Index can be USED !

tested EXECUTION PLAN for all TAG values.

Real Data: AMS02 !!! Selection time

no IDX: Average Selection time: 24 sec 0.01 sec/record once retrieved

IDX: Average Selection time: 0.05 sec 0.005 sec/record once retrieved

Again, B-Map index is very effective.

naïve usage... The most naïve way to design the DB is 1 table for

each month of DAQ. This leads to ~36 tables. Most naïve way to query them all is using a UNION

statement which still uses index with nearly no overhead for the SORT

UNIQUE/UNION-ALL part...

PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------------------------------Plan hash value: 2624995586

----------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 44 | 1408 | 290K (50)| 00:58:03 || 1 | SORT UNIQUE | | 44 | 1408 | 290K (50)| 00:58:03 || 2 | UNION-ALL | | | | | || 3 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV | 42 | 1344 | 145K (1)| 00:29:10 || 4 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 5 | BITMAP INDEX SINGLE VALUE | COMIX_TDV | | | | || 6 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV1 | 2 | 64 | 144K (1)| 00:28:53 || 7 | BITMAP CONVERSION TO ROWIDS| | | | | ||* 8 | BITMAP INDEX SINGLE VALUE | COMIX_TDV1 | | | | |----------------------------------------------------------------------------------------------

44 rows returned out of 377,516,896 rows in total

Rough estimate for SELECT on 3 years: 25 minutes...

Conclusions In general, Bit-Map index is very efficient in space usage

30 times smaller than normal Btree index Index is used in queries if relative cardinality is low If so, selection time is very low. According to AMS01 and AMS02 Cosmic Rays

analyzed, BitMap index can be used We could thus use it for AMS02 expected space usage: 500 GB (table)+ 6 GB (idx)

EventStatusTable02 files will use ~ 150 GB...

AMS01 sample CARDINALITY:89860

average relative population 5.7%

top related