www.semantec.de

Oracle 8i/9i features which support Data Warehousing

Author: Krasen Paskalev

Certified Oracle DBA

Semantec GmbH.

D-71083 Herrenberg


Agenda

• ETL Features

• Data Warehouse Management

• Data Warehouse Querying

• Parallel Operations

Agenda

• ETL (Extraction, Transformation, Transportation and Loading)
  – Transportable Tablespaces
  – External Tables
  – Table Functions
  – MERGE Statement

• Data Warehouse Management

• Data Warehouse Querying

• Parallel Operations

Transportable tablespaces

• The fastest method for moving data between databases

• The tablespaces, with all their data, are plugged into the data warehouse database

[Diagram: a tablespace is copied via ftp from the Production database to the Data Warehouse database]

External Tables

• Can be directly queried and joined in SQL, PL/SQL and Java

• Avoid data staging

• One step loading and transformation

• Save DB space

[Diagram: external files (ASCII file, Excel sheet) exposed as read-only virtual tables]
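A minimal sketch of one-step loading and transformation through an external table, assuming Oracle 9i syntax and a comma-separated text file. The directory path, file name, and the names ext_dir, sales_ext and new_sales are hypothetical.

```sql
-- Directory object pointing at the folder that holds the flat file (hypothetical path).
CREATE DIRECTORY ext_dir AS '/data/staging';

-- Read-only virtual table over the file; no rows are stored in the database.
CREATE TABLE sales_ext (
  id      NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('new_sales.csv')
)
REJECT LIMIT UNLIMITED;

-- Load and transform in one step, without a staging table.
INSERT INTO new_sales (id, amount)
SELECT id, amount
FROM   sales_ext
WHERE  amount > 0;
```

Because the external table behaves like any other row source in a query, it can also be joined to regular tables before the insert.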

Table Functions

• Can take a set of rows as input

• Can return a set of rows as output

• Can be used in the FROM clause

• Can be parallelized

• Can be pipelined

• User defined in PL/SQL, Java or C

[Diagram: Sales → Table Function → result set]

Region   %
West     30
Central  50
East     20

Table Functions

• Pipelining Data Transformation

[Diagram: Source → Table Function (Step 1) → Table Function (Step 2) → Target, with a Log table]
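The properties listed for table functions can be sketched as a pipelined PL/SQL function that takes a cursor as input and is queried in the FROM clause. This is an illustration under assumptions, not the author's code: the types sale_row_t and sale_tab_t, the function clean_sales, and the tables staged_sales and new_sales are hypothetical names.

```sql
-- Row and collection types describing the function's output.
CREATE TYPE sale_row_t AS OBJECT (id NUMBER, amount NUMBER);
/
CREATE TYPE sale_tab_t AS TABLE OF sale_row_t;
/

-- Takes a set of rows (a cursor) as input and pipes out the cleaned rows one by one.
CREATE OR REPLACE FUNCTION clean_sales (p_rows IN SYS_REFCURSOR)
  RETURN sale_tab_t PIPELINED
IS
  v_id     NUMBER;
  v_amount NUMBER;
BEGIN
  LOOP
    FETCH p_rows INTO v_id, v_amount;
    EXIT WHEN p_rows%NOTFOUND;
    -- Transformation step: keep only rows with a positive amount.
    IF v_amount > 0 THEN
      PIPE ROW (sale_row_t(v_id, v_amount));
    END IF;
  END LOOP;
  CLOSE p_rows;
  RETURN;
END;
/

-- The function is used like a table; rows flow to the INSERT as they are produced.
INSERT INTO new_sales (id, amount)
SELECT id, amount
FROM   TABLE(clean_sales(CURSOR(SELECT id, amount FROM staged_sales)));
```

A second table function could wrap the first one in the same way, which gives the Step 1 / Step 2 chain from the diagram without materializing intermediate results.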

MERGE statement

new_sales (source)

id  amount
4   3000
8   1000
9   2000

sales (before MERGE)

id  amount
4   2000
7   3000
8   5000

MERGE INTO sales s
USING new_sales n
ON (s.id = n.id)
WHEN MATCHED THEN
  UPDATE SET s.amount = s.amount + n.amount
WHEN NOT MATCHED THEN
  INSERT (s.id, s.amount)
  VALUES (n.id, n.amount);

sales (after MERGE)

id  amount  action
4   5000    UPDATE
7   3000    (unchanged)
8   6000    UPDATE
9   2000    INSERT

MERGE Advantages

• Single simple SQL statement

• Can be parallelized

• Can use Bulk DML

• Fewer scans of the base table

More ETL Features

• Direct-path Interface
  – SQL*Loader
  – CREATE TABLE ... AS SELECT
  – INSERT
  – Oracle Call Interface

• Multi-table INSERTs
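A short sketch of the two INSERT variants named above, assuming Oracle 9i. The tables large_sales, small_sales, sales_history and new_sales, and the 5000 threshold, are hypothetical.

```sql
-- Multi-table INSERT: one scan of the source routes each row to one of two targets.
INSERT FIRST
  WHEN amount >= 5000 THEN
    INTO large_sales (id, amount) VALUES (id, amount)
  ELSE
    INTO small_sales (id, amount) VALUES (id, amount)
SELECT id, amount
FROM   new_sales;

-- Direct-path INSERT: the APPEND hint writes new blocks above the high-water mark.
INSERT /*+ APPEND */ INTO sales_history (id, amount)
SELECT id, amount
FROM   new_sales;

-- Rows loaded by direct path are not visible to the session until the commit.
COMMIT;
```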

Agenda

• ETL Features

• Data Warehouse Management
  – Partitioning
  – Materialized Views
  – DBMS_STATS

• Data Warehouse Querying

• Parallel Operations

Partitioning

Table Sales

Partition   Tablespace
Jan 2002    Tablespace 0102
Feb 2002    Tablespace 0202
...
Dec 2002    Tablespace 1202

Advantages of Partitioning

• Partition independence
  – LOAD, MOVE, PURGE and DROP partitions
  – MERGE, SPLIT, EXCHANGE partitions
  – BACKUP, RESTORE, SET READ ONLY

• Partition elimination
  – SELECT or JOIN only the partitions needed

• Parallel Operations
  – SELECT, UPDATE, DELETE, MERGE

Partitioning Methods

• Hash Partitioning
  – Even row distribution by hash function

• Range Partitioning
  – <01.01.2002 | <01.02.2002 | ... | <01.01.2003

• List Partitioning
  – Stuttgart, Munich | Mannheim, Frankfurt | ...
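The range and list methods can be sketched as follows, assuming Oracle 9i. The table names sales_by_month and sales_by_city, their columns, the partition names, and the tablespaces ts0102 and ts0202 are hypothetical; only two monthly partitions are shown.

```sql
-- Range partitioning: one partition (and tablespace) per month.
CREATE TABLE sales_by_month (
  id         NUMBER,
  sale_date  DATE,
  amount     NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p0102 VALUES LESS THAN (TO_DATE('01.02.2002', 'DD.MM.YYYY')) TABLESPACE ts0102,
  PARTITION p0202 VALUES LESS THAN (TO_DATE('01.03.2002', 'DD.MM.YYYY')) TABLESPACE ts0202
);

-- List partitioning: each partition holds an explicit set of values.
CREATE TABLE sales_by_city (
  id      NUMBER,
  city    VARCHAR2(30),
  amount  NUMBER
)
PARTITION BY LIST (city) (
  PARTITION p_south VALUES ('Stuttgart', 'Munich'),
  PARTITION p_west  VALUES ('Mannheim', 'Frankfurt')
);

-- Partition elimination: the date predicate lets the optimizer read only p0202.
SELECT SUM(amount)
FROM   sales_by_month
WHERE  sale_date >= TO_DATE('01.02.2002', 'DD.MM.YYYY')
AND    sale_date <  TO_DATE('01.03.2002', 'DD.MM.YYYY');

-- Partition independence: purge one month without touching the others.
ALTER TABLE sales_by_month DROP PARTITION p0102;
```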

Table Compression

• Stores tables or partitions in compressed format

• Reduces disk space requirements

• Reduces memory requirements

• Speeds up query execution

• Speeds up backup and recovery

• Very efficient for highly redundant data, such as the FACT table

• 2 to 4 times compression is usual

Materialized Views

sales (region, month, invc_sum, ...)

SELECT region, month,
       SUM(invc_sum) revenue
FROM sales
GROUP BY region, month

revenue_sum (region, month, revenue)

Advantages of Materialized Views

• Improved query/reporting performance for:
  – Summaries
  – Aggregates
  – Joins

• Fast Refresh
  – Data change tracking
  – Partition change tracking

• No application change needed: their usage is automatic
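A minimal sketch of how the revenue_sum summary could be declared so that the optimizer uses it automatically, assuming Oracle 8i/9i and the sales columns shown on the previous slide. A complete refresh is used here for simplicity; fast refresh needs additional setup (such as materialized view logs) that is not shown.

```sql
-- Summary table maintained by the database; ENABLE QUERY REWRITE makes it
-- eligible to answer matching queries against sales.
CREATE MATERIALIZED VIEW revenue_sum
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT region, month, SUM(invc_sum) AS revenue
FROM   sales
GROUP  BY region, month;

-- Query rewrite must be switched on for the session (or instance).
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- The application keeps querying sales; the optimizer may rewrite this
-- to read revenue_sum instead.
SELECT region, SUM(invc_sum)
FROM   sales
GROUP  BY region;

-- Refresh after a load ('C' requests a complete refresh).
BEGIN
  DBMS_MVIEW.REFRESH('REVENUE_SUM', 'C');
END;
/
```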

DBMS_STATS

• New package for gathering table and index statistics

• Gathers statistics in parallel

• Can export and import statistics

[Diagram: statistics are exported from the Production Data Warehouse and imported into the Development Data Warehouse]
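The three bullets can be illustrated with the following calls, given as a sketch. The schema name DWH, the table SALES, the statistics table STATS_TRANSFER and the degree of 4 are assumptions chosen for the example.

```sql
-- Gather table statistics with 4 parallel processes; cascade => TRUE includes the indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'DWH',
    tabname => 'SALES',
    degree  => 4,
    cascade => TRUE);
END;
/

-- On production: create a transfer table and export the statistics into it.
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'DWH', stattab => 'STATS_TRANSFER');
  DBMS_STATS.EXPORT_TABLE_STATS(ownname => 'DWH', tabname => 'SALES', stattab => 'STATS_TRANSFER');
END;
/

-- On development, after STATS_TRANSFER has been copied over (for example with exp/imp):
BEGIN
  DBMS_STATS.IMPORT_TABLE_STATS(ownname => 'DWH', tabname => 'SALES', stattab => 'STATS_TRANSFER');
END;
/
```

With production statistics imported, the development optimizer can choose the same plans as production even though the development tables hold far less data.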


More Data Warehouse Management Features

• Index-organized tables

• Online index rebuild

• Online table rebuild

Agenda

• ETL Features

• Data Warehouse Management

• Data Warehouse Querying
  – Bitmap Indexing
  – Star Query Transformation
  – Aggregation: ROLLUP, CUBE, Grouping Sets
  – Analytic functions

• Parallel Operations

Bitmap Indexes

Region   east  central  west  NULL
rowid    1     0        0     0
rowid    0     0        1     0
...      0     0        0     1
rowid    0     1        0     0

Bitmaps are combined with bitwise operations (each bitmap read top to bottom):

1000 OR 0100 AND NOT(0010) = 1100

Advantages of Bitmap Indexes

• Reduced response time for ad hoc queries

• Uses much less space than a B-tree index

• Dramatic performance gains for a large class of queries:
  – Multiple AND, OR and NOT conditions
  – IS NULL conditions
  – COUNT
  – NOT IN - Bitmap MINUS
  – BETWEEN - Bitmap UNION
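A brief sketch of the kind of query that benefits, assuming a non-partitioned sales table with low-cardinality columns region and quarter. The column and index names are hypothetical.

```sql
-- One bitmap per distinct value of each indexed column.
CREATE BITMAP INDEX sales_region_bix  ON sales (region);
CREATE BITMAP INDEX sales_quarter_bix ON sales (quarter);

-- The AND / OR / NOT conditions can be resolved by combining bitmaps,
-- and COUNT can be answered from the resulting bitmap without visiting table rows.
SELECT COUNT(*)
FROM   sales
WHERE  (region = 'east' OR region = 'central')
AND    NOT (quarter = 'Q1');

-- NULLs are indexed in a bitmap index, so this predicate can use it as well.
SELECT COUNT(*)
FROM   sales
WHERE  region IS NULL;
```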

Star Query Transformation

• The query is re-written for efficient execution

Fact table: sales (cust_id, prod_id, amount, q_id)

Dimension tables: customers (cust_id, name), products (prod_id, name), quarters (q_id, name)

• Steps:
  1. Filter all dimensions
  2. Combine the bitmap indexes of the fact table's foreign keys
  3. Retrieve the fact rows and the other dimension data
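The prerequisites and a typical star query can be sketched as follows, using the sales, customers, products and quarters tables from the slide. The index names and the filter values ('ACME', 'Q1', 'Q2') are hypothetical.

```sql
-- Bitmap indexes on the fact table's foreign key columns (used in step 2).
CREATE BITMAP INDEX sales_cust_bix ON sales (cust_id);
CREATE BITMAP INDEX sales_prod_bix ON sales (prod_id);
CREATE BITMAP INDEX sales_q_bix    ON sales (q_id);

-- Allow the optimizer to consider the transformation.
ALTER SESSION SET star_transformation_enabled = TRUE;

-- Filters sit on the dimensions; the fact table is reached through the
-- combined bitmaps of the matching foreign key values.
SELECT c.name AS customer, p.name AS product, SUM(s.amount) AS total
FROM   sales s, customers c, products p, quarters q
WHERE  s.cust_id = c.cust_id
AND    s.prod_id = p.prod_id
AND    s.q_id    = q.q_id
AND    c.name    = 'ACME'
AND    q.name    IN ('Q1', 'Q2')
GROUP  BY c.name, p.name;
```

Whether the transformation is actually applied remains a cost-based decision of the optimizer.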

Aggregation Operators

• Oracle extends the GROUP BY clause by:
  – ROLLUP
  – CUBE
  – Grouping Sets

SELECT SUM(amount)
FROM sales
GROUP BY country, quarter

        UK     US     Total
Q1      1000   3000    4000
Q2      1500   5000    6500
Total   2500   8000   10500

ROLLUP and CUBE

ROLLUP: subtotals at increasing levels of aggregation, from right to left (n+1 groupings)

ROLLUP(country, department, quarter)
  (country, department, quarter)
  (country, department)
  (country)
  () - Grand Total

CUBE: subtotals on all combinations (2^n groupings)

CUBE(country, department, quarter)
  (country, department, quarter)
  (country, department)
  (country, quarter)
  (department, quarter)
  (country)
  (department)
  (quarter)
  () - Grand Total
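The grouping lists above translate into queries like the following sketch, which assumes a sales table with country, quarter and amount columns and uses two grouping columns to keep the result small.

```sql
-- ROLLUP: detail rows, one subtotal per country, and the grand total.
-- With 2 countries and 2 quarters this returns 4 + 2 + 1 = 7 rows.
SELECT country, quarter, SUM(amount) AS total
FROM   sales
GROUP  BY ROLLUP (country, quarter);

-- CUBE: additionally one subtotal per quarter, 9 rows for the same data.
-- GROUPING() returns 1 when the column is rolled up in that row.
SELECT country, quarter, SUM(amount) AS total,
       GROUPING(country) AS country_rolled_up,
       GROUPING(quarter) AS quarter_rolled_up
FROM   sales
GROUP  BY CUBE (country, quarter);

-- GROUPING SETS (Oracle 9i): only the subtotals that are asked for.
SELECT country, quarter, SUM(amount) AS total
FROM   sales
GROUP  BY GROUPING SETS ((country), (quarter));
```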

Aggregation Operators Advantages

• Applicable to many aggregation functions:
  – SUM, AVG, COUNT
  – MIN, MAX
  – STDDEV, VARIANCE

• Flexible aggregation groups and levels

• Runs in parallel

Analytic functions

• Significantly improved performance for complex reports such as:
  – Ranking: find the top 10 sales in each region
  – Moving aggregates: what is the 90-day moving sales average?
  – Period-over-period comparison: what are the revenues from January 2002 compared to January 2001?
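The ranking and period-over-period questions can be sketched with analytic functions as below. The sales columns (region, id, amount) and the table monthly_revenue (month, revenue) are hypothetical; the second query assumes exactly one row per calendar month with no gaps.

```sql
-- Ranking: the top 10 sales in each region.
SELECT region, id, amount
FROM  (SELECT region, id, amount,
              RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
       FROM   sales)
WHERE  rnk <= 10;

-- Period-over-period: each month next to the same month one year earlier.
-- LAG(revenue, 12) looks 12 rows back in month order.
SELECT month,
       revenue,
       LAG(revenue, 12) OVER (ORDER BY month) AS revenue_prev_year
FROM   monthly_revenue;
```

Note that RANK can return more than ten rows per region when sales tie for a position; ROW_NUMBER would cut off at exactly ten.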

Example – Moving Window

SELECT c.cust_id, t.month,
       SUM(amount_sold) SALES,
       AVG(SUM(amount_sold))
         OVER (ORDER BY c.cust_id, t.month ROWS 2 PRECEDING) MOV_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id AND
      s.cust_id = c.cust_id AND
      t.year = 1999 AND
      c.cust_id IN (6380)
GROUP BY c.cust_id, t.month
ORDER BY c.cust_id, t.month;

CUST_ID MONTH   SALES   MOV_3_MONTH
------- ------- ------- -----------
   6380 1999-01  19,642      19,642
   6380 1999-02  19,324      19,483
   6380 1999-03  21,655      20,207
   6380 1999-04  27,091      22,690
   6380 1999-05  16,367      21,704
   6380 1999-06  24,755      22,738

More Data Warehouse Querying Features

• Function-based Indexes

• Optimizer Plan Stability

• Statistics for Long Running Operations

• Resumable Statements

• Full Outer Join

• WITH Operator

• Oracle Text: see “Advanced Searching with Oracle Text”, 14.11.2002, 2nd conference day, 11:50-12:30, Konferenzraum EG


Parallel Operations

• Dramatically reduce execution time of data intensive operations

• Loading
  – Direct Path Load

• DDL Statements
  – CREATE TABLE ... AS SELECT, CREATE INDEX
  – REBUILD INDEX, REBUILD INDEX PARTITION
  – MOVE, SPLIT, COALESCE PARTITION

• DML Statements
  – INSERT ... SELECT
  – UPDATE, DELETE and MERGE

Parallel Operations (continued)

• Access methods
  – Table and index range and full scans

• Join methods
  – Nested loops, Sort merge, Hash, Star transformation

• SQL operations
  – GROUP BY, ROLLUP, CUBE
  – DISTINCT, UNION, UNION ALL
  – Aggregate functions
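A few of the operations listed can be requested in parallel as in this sketch. The degree of 4, the index name, and the tables sales_history and new_sales are assumptions; the degree that pays off depends on the CPU and I/O capacity available.

```sql
-- Parallel DML has to be enabled per session before the statement runs.
ALTER SESSION ENABLE PARALLEL DML;

-- Parallel direct-path INSERT ... SELECT.
INSERT /*+ APPEND PARALLEL(sales_history, 4) */ INTO sales_history (id, amount)
SELECT /*+ PARALLEL(new_sales, 4) */ id, amount
FROM   new_sales;

COMMIT;

-- Parallel DDL: build an index with 4 parallel processes.
CREATE INDEX sales_amount_idx ON sales (amount) PARALLEL 4;

-- Parallel query: full scan and GROUP BY are split across the processes.
-- The hint refers to the table alias used in the FROM clause.
SELECT /*+ PARALLEL(s, 4) */ country, SUM(amount)
FROM   sales s
GROUP  BY country;
```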

Parallel System Requirements

• Symmetric Multiprocessor Systems, Clusters or Massively Parallel Systems

• Sufficient I/O Bandwidth

• Sufficient (Underutilized) CPUs

• Sufficient Memory

Summary

• Effective handling of multi-terabyte Data Warehouses

• Rich feature set for all Data Warehouse operations

• Flexible aggregation and analytical features for high performance queries

• Effective parallelism

Want to know more?

Company: Semantec GmbH.

Name: Krasen Paskalev, Armin Singer, Peter Kopecki

Address: Benzstr. 32, D-71083 Herrenberg, Germany

Telephone: +49(7032)9130-0

Telephone: +49(7032)9130-12

Fax: +49(7032)9130-22

E-Mail: krasen.paskalev@semantec.bg, singer@semantec.de

Internet: www.semantec.de

Meet us here -> booth 2C on the ground floor
