database warehousing

Part 9 Business Intelligence

Chapter 31 Data Warehousing Concepts 1149
Chapter 32 Data Warehousing Design 1181
Chapter 33 OLAP 1204
Chapter 34 Data Mining 1232

Upload: harish-kuppusamy

Post on 24-Dec-2015



Chapter 31 Data Warehousing Concepts

Chapter Objectives

In this chapter you will learn:

• How data warehousing evolved.
• The main concepts and benefits associated with data warehousing.
• How Online Transaction Processing (OLTP) systems differ from data warehousing.
• The problems associated with data warehousing.
• The architecture and main components of a data warehouse.
• The important data flows or processes of a data warehouse.
• The main tools and technologies associated with data warehousing.
• The issues associated with the integration of a data warehouse and the importance of managing metadata.
• The concept of a data mart and the main reasons for implementing a data mart.
• The main issues associated with the development and management of data marts.
• How Oracle supports data warehousing.

We have already noted in earlier chapters that database management systems are pervasive throughout industry, with relational database management systems being the dominant system. These systems have been designed to handle high transaction throughput, with transactions typically making small changes to the organization’s operational data, that is, data that the organization requires to handle its day-to-day operations. These types of system are called Online Transaction Processing (OLTP) systems. The size of OLTP databases can range from small databases of a few megabytes (MB), to medium-sized databases with several gigabytes (GB), to large databases requiring terabytes (TB) or even petabytes (PB) of storage.

Corporate decision-makers require access to all the organization’s data, wherever it is located. Providing comprehensive analysis of the organization, its business, its requirements, and any trends requires access not only to the current values in the database but also to historical data. To facilitate this type of analysis, the data warehouse has been created to hold data drawn from several data sources, maintained by different operating units, together with historical and summary transformations. The data warehouse based on extended database technology provides the management of the datastore. However, decision-makers also require powerful analysis tools. Two main types of analysis tools have emerged over the last few years: Online Analytical Processing (OLAP) and data mining tools.

As data warehousing is such a complex subject, we have devoted four chapters to different aspects of data warehousing. In this chapter, we describe the basic concepts associated with data warehousing. In Chapter 32 we describe how to design and build a data warehouse and in Chapters 33 and 34 we discuss the important end-user access tools for a data warehouse.

Structure of this Chapter

In Section 31.1 we outline what data warehousing is and how it evolved, and also describe the potential benefits and problems associated with this approach. In Section 31.2 we describe the architecture and main components of a data warehouse. In Sections 31.3 and 31.4 we identify and discuss the important data flows or processes of a data warehouse, and the associated tools and technologies of a data warehouse, respectively. In Section 31.5 we introduce data marts and the issues associated with the development and management of data marts. Finally, in Section 31.6 we present an overview of how Oracle supports a data warehouse environment. The examples in this chapter are taken from the DreamHome case study described in Section 10.4 and Appendix A.

31.1 Introduction to Data Warehousing

In this section we discuss the origin and evolution of the concept of data warehousing. We then discuss the main benefits associated with data warehousing. We next identify the main characteristics of data warehousing systems in comparison with Online Transaction Processing (OLTP) systems. We conclude this section by examining the problems of developing and managing a data warehouse.

31.1.1 The Evolution of Data Warehousing

Since the 1970s, organizations have mostly focused their investment in new computer systems that automate business processes. In this way, organizations gained competitive advantage through systems that offered more efficient and cost-effective services to the customer. Throughout this period, organizations accumulated growing amounts of data stored in their operational databases. However, in recent times, where such systems are commonplace, organizations are focusing on ways to use operational data to support decision-making, as a means of regaining competitive advantage.

Operational systems were never designed to support such business activities and so using these systems for decision-making may never be an easy solution. The legacy is that a typical organization may have numerous operational systems with overlapping and sometimes contradictory definitions, such as data types. The challenge for an organization is to turn its archives of data into a source of knowledge, so that a single integrated/consolidated view of the organization’s data is presented to the user. The concept of a data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

31.1.2 Data Warehousing Concepts

The original concept of a data warehouse was devised by IBM as the ‘information warehouse’ and presented as a solution for accessing data held in non-relational systems. The information warehouse was proposed to allow organizations to use their data archives to help them gain a business advantage. However, due to the sheer complexity and performance problems associated with the implementation of such solutions, the early attempts at creating an information warehouse were mostly rejected. Since then, the concept of data warehousing has been raised several times but it is only in recent years that data warehousing has come to be seen as a valuable and viable solution. The latest and most successful advocate for data warehousing is Bill Inmon, who has earned the title of ‘father of data warehousing’ due to his active promotion of the concept.

Data warehousing: A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.

In this definition by Inmon (1993), the data is:

• Subject-oriented as the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales) rather than the major application areas (such as customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
• Integrated because of the coming together of source data from different enterprise-wide applications systems. The source data is often inconsistent using, for example, different formats. The integrated data source must be made consistent to present a unified view of the data to the users.
• Time-variant because data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
• Non-volatile as the data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.

There are numerous definitions of data warehousing, with the earlier definitions focusing on the characteristics of the data held in the warehouse. Alternative definitions widen the scope of the definition of data warehousing to include the processing associated with accessing the data from the original sources to the delivery of the data to the decision-makers (Anahory and Murray, 1997).
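Three of the properties in Inmon's definition (integrated, time-variant, non-volatile) can be made concrete with a small sketch. The Python below is only an illustration under assumed names: `Warehouse`, `load_snapshot`, `normalize_price`, and the sample records are hypothetical and do not come from the text. Source values in inconsistent formats are normalized on load (integrated), every load is stamped with a snapshot date (time-variant), and each load appends to the store rather than overwriting it (non-volatile).

```python
from dataclasses import dataclass, field
from datetime import date


def normalize_price(raw):
    """Integrate inconsistent source formats: '£120,000' or 120000 -> 120000.0."""
    if isinstance(raw, str):
        raw = raw.replace("£", "").replace(",", "").strip()
    return float(raw)


@dataclass
class Warehouse:
    # Each row is (snapshot_date, property_no, price); rows are never updated.
    rows: list = field(default_factory=list)

    def load_snapshot(self, snapshot_date, source_records):
        """Append a dated snapshot; earlier snapshots are left untouched."""
        for rec in source_records:
            self.rows.append(
                (snapshot_date, rec["property_no"], normalize_price(rec["price"]))
            )

    def history(self, property_no):
        """Return the (date, price) series for one property, oldest first."""
        return sorted((d, p) for d, n, p in self.rows if n == property_no)


# Hypothetical sample data: two source systems report the same property
# in different formats, one month apart.
wh = Warehouse()
wh.load_snapshot(date(2004, 1, 31), [{"property_no": "PG4", "price": "£120,000"}])
wh.load_snapshot(date(2004, 2, 29), [{"property_no": "PG4", "price": 125000}])
print(wh.history("PG4"))
```

Because the second load supplements the first instead of replacing it, the warehouse holds a series of snapshots from which a trend can be read, which an operational system that only keeps the current price could not provide.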

Whatever the definition, the ultimate goal of data warehousing is to integrate enterprise-wide corporate data into a single repository from which users can easily run queries, produce reports, and perform analysis. In summary, a data warehouse is data management and data analysis technology.

In recent years a new term associated with data warehousing has been used, namely ‘Data Webhouse’.

Data Webhouse: A distributed data warehouse that is implemented over the Web with no central data repository.

The Web is an immense source of behavioral data as individuals interact through their Web browsers with remote Web sites. The data generated by this behavior is called clickstream. Using a data warehouse on the Web to harness clickstream data has led to the development of Data Webhouses. Further discussion of the development of this new variation of data warehousing is outside the scope of this book; however, the interested reader is referred to Kimball et al. (2000).

31.1.3 Benefits of Data Warehousing

The successful implementation of a data warehouse can bring major benefits to an organization including:

• Potential high returns on investment. An organization must commit a huge amount of resources to ensure the successful implementation of a data warehouse and the cost can vary enormously from £50,000 to over £10 million due to the variety of technical solutions available. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%, with over 90% of the companies surveyed achieving over 40% ROI, half the companies achieving over 160% ROI, and a quarter with more than 600% ROI (IDC, 1996).
• Competitive advantage. The huge returns on investment for those companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology. The competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
• Increased productivity of corporate decision-makers. Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming data into meaningful information, a data warehouse allows corporate decision-makers to perform more substantive, accurate, and consistent analysis.

31.1.4 Comparison of OLTP Systems and Data Warehousing

A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind. For example, OLTP systems are designed to maximize the transaction processing capacity, while data warehouses are designed to support ad hoc query processing. Table 31.1 provides a comparison of the major characteristics of OLTP systems and data warehousing systems (Singh, 1997).

Table 31.1 Comparison of OLTP systems and data warehousing systems.

OLTP systems                                        Data warehousing systems
Holds current data                                  Holds historical data
Stores detailed data                                Stores detailed, lightly, and highly summarized data
Data is dynamic                                     Data is largely static
Repetitive processing                               Ad hoc, unstructured, and heuristic processing
High level of transaction throughput                Medium to low level of transaction throughput
Predictable pattern of usage                        Unpredictable pattern of usage
Transaction-driven                                  Analysis driven
Application-oriented                                Subject-oriented
Supports day-to-day decisions                       Supports strategic decisions
Serves large number of clerical/operational users   Serves relatively low number of managerial users

An organization will normally have a number of different OLTP systems for business processes such as inventory control, customer invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and subject to change. The OLTP systems are optimized for a high number of transactions that are predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements of the transactions associated with the business applications and supports the day-to-day decisions of a large number of concurrent operational users.

In contrast, an organization will normally have a single data warehouse, which holds data that is historical, detailed, and summarized to various levels and rarely subject to change (other than being supplemented with new data). The data warehouse is designed to support relatively low numbers of transactions that are unpredictable in nature and require answers to queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the requirements of potential queries and supports the long-term strategic decisions of a relatively low number of managerial users.

Although OLTP systems and data warehouses have different characteristics and are built with different purposes in mind, these systems are closely related in that the OLTP systems provide the source data for the warehouse. A major problem of this relationship is that the data held by the OLTP systems can be inconsistent, fragmented, and subject to change, containing duplicate or missing entries. As such, the operational data must be ‘cleaned up’ before it can be used in the data warehouse. We discuss the tasks associated with this process in Section 31.3.1.

OLTP systems are not built to quickly answer ad hoc queries. They also tend not to store historical data, which is necessary to analyze trends. Basically, OLTP offers large amounts of raw data, which is not easily analyzed. The data warehouse allows more complex queries to be answered besides just simple aggregations such as, ‘What is the average selling price for properties in the major cities of Great Britain?’. The types of queries that a data warehouse is expected to answer range from the relatively simple to the highly complex and are dependent on the types of end-user access tools used (see Section 31.2.10). Examples of the range of queries that the DreamHome data warehouse may be capable of supporting include:

• What was the total revenue for Scotland in the third quarter of 2004?
• What was the total revenue for property sales for each type of property in Great Britain in 2003?
• What are the three most popular areas in each city for the renting of property in 2004 and how does this compare with the results for the previous two years?
• What is the monthly revenue for property sales at each branch office, compared with rolling 12-monthly prior figures?
• What would be the effect on property sales in the different regions of Britain if legal costs went up by 3.5% and Government taxes went down by 1.5% for properties over £100,000?
• Which type of property sells for prices above the average selling price for properties in the main cities of Great Britain and how does this correlate to demographic data?
• What is the relationship between the total annual revenue generated by each branch office and the total number of sales staff assigned to each branch office?
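To give a feel for the simpler end of this range, the second query above (total revenue for property sales by property type in one year) might be expressed roughly as follows. This is a minimal sketch using Python's standard `sqlite3` module. The table `property_sale`, its columns, and the sample rows are assumptions made for illustration and are not the DreamHome warehouse schema; sale price is also treated as revenue purely to keep the example short.

```python
import sqlite3


def revenue_by_property_type(conn, year):
    """Total sale revenue per property type for one calendar year."""
    sql = """
        SELECT property_type, SUM(price) AS total_revenue
        FROM property_sale
        WHERE strftime('%Y', sale_date) = ?
        GROUP BY property_type
        ORDER BY property_type
    """
    return conn.execute(sql, (str(year),)).fetchall()


# Hypothetical, in-memory sample data (ISO dates stored as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property_sale (property_type TEXT, sale_date TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO property_sale VALUES (?, ?, ?)",
    [
        ("Flat", "2003-03-14", 90000.0),
        ("Flat", "2003-09-02", 110000.0),
        ("House", "2003-06-20", 250000.0),
        ("House", "2004-01-05", 300000.0),  # outside 2003, so excluded
    ],
)
print(revenue_by_property_type(conn, 2003))
```

The later queries in the list (rolling comparisons, what-if changes to costs and taxes, correlation with demographic data) go well beyond a single grouped aggregate, which is why they depend on the end-user access tools rather than on plain SQL alone.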

31.1.5 Problems of Data Warehousing

The problems associated with developing and managing a data warehouse are listed in Table 31.2 (Greenfield, 1996).

Table 31.2 Problems of data warehousing.

Underestimation of resources for data loading
Hidden problems with source systems
Required data not captured
Increased end-user demands
Data homogenization
High demand for resources
Data ownership
High maintenance
Long-duration projects
Complexity of integration

Underestimation of resources for data loading

Many developers underestimate the time required to extract, clean, and load the data into the warehouse. This process may account for a significant proportion of the total development time, although better data cleansing and management tools should ultimately reduce the time and effort spent.

Hidden problems with source systems

Hidden problems associated with the source systems feeding the data warehouse may be identified, possibly after years of being undetected. The developer must decide whether to fix the problem in the data warehouse and/or fix the source systems. For example, when entering the details of a new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when available and applicable.
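One way such a hidden problem might surface is through a simple profile of how often each field is left empty in the extracted source data. The sketch below assumes records arrive as Python dictionaries; the function name `null_rates`, the field names, and the sample rows are all hypothetical.

```python
def null_rates(records, fields):
    """Return the fraction of records in which each field is missing or empty."""
    total = len(records)
    rates = {}
    for f in fields:
        missing = sum(1 for r in records if r.get(f) in (None, ""))
        rates[f] = missing / total if total else 0.0
    return rates


# Hypothetical extract from a property source system.
properties = [
    {"property_no": "P001", "rooms": 6, "postcode": "AB7 5SU"},
    {"property_no": "P002", "rooms": None, "postcode": "NW2"},
    {"property_no": "P003", "rooms": 3, "postcode": ""},
    {"property_no": "P004", "rooms": None, "postcode": "G32 4QX"},
]
print(null_rates(properties, ["rooms", "postcode"]))
```

A high rate for a field that staff could have filled in points to a problem in the source system's data entry rules, and the developer still has to decide whether to correct it there, in the warehouse, or both.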

Required data not captured

Warehouse projects often highlight a requirement for data not being captured by the existing source systems. The organization must decide whether to modify the OLTP systems or create a system dedicated to capturing the missing data. For example, when considering the DreamHome case study, we may wish to analyze the characteristics of certain events such as the registering of new clients and properties at each branch office. However, this is currently not possible as we do not capture the data that the analysis requires such as the date registered in either case.

Increased end-user demands

After end-users receive query and reporting tools, requests for support from IS staff may increase rather than decrease. This is caused by the users’ increasing awareness of the capabilities and value of the data warehouse. This problem can be partially alleviated by investing in easier-to-use, more powerful tools, or in providing better training for the users. A further reason for increasing demands on IS staff is that once a data warehouse is online, it is often the case that the number of users and queries increases together with requests for answers to more and more complex queries.

Data homogenization

Large-scale data warehousing can become an exercise in data homogenization that lessens the value of the data. For example, in producing a consolidated and integrated view of the organization’s data, the warehouse designer may be tempted to emphasize similarities rather than differences in the data used by different application areas such as property sales and property renting.

High demand for resources

The data warehouse can use large amounts of disk space. Many relational databases used for decision-support are designed around star, snowflake, and starflake schemas (see Chapter 32). These approaches result in the creation of very large fact tables. If there are many dimensions to the factual data, the combination of aggregate tables and indexes to the fact tables can use up more space than the raw data.
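A rough way to see why aggregates can outgrow the raw data is to count them: with d dimensions there are 2^d different subsets of dimensions by which a fact table could be grouped (including the grand total with no grouping at all), and each subset is a candidate aggregate table. The sketch below simply enumerates those subsets. The dimension names are hypothetical, and a real warehouse would materialize only some of the candidates.

```python
from itertools import combinations


def groupings(dimensions):
    """Enumerate every subset of dimensions a fact table could be aggregated by."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result


dims = ["branch", "property_type", "city", "month"]  # hypothetical dimensions
all_groupings = groupings(dims)
print(len(all_groupings))  # 2 ** 4 = 16 candidate aggregates
```

Each additional dimension doubles the number of candidates, so choosing which aggregates and indexes to keep is a space trade-off as much as a performance one.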

Data ownership

Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data that was originally viewed and used only by a particular department or business area, such as sales or marketing, may now be made accessible to others in the organization.

High maintenance

Data warehouses are high maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse. To remain a valuable resource, the data warehouse must remain consistent with the organization that it supports.

Long-duration projects

A data warehouse represents a single data resource for the organization. However, the building of a warehouse can take up to three years, which is why some organizations are building data marts (see Section 31.5). Data marts support only the requirements of a particular department or functional area and can therefore be built more rapidly.

Complexity of integration

The most important area for the management of a data warehouse is the integration capabilities. This means an organization must spend a significant amount of time determining how well the various different data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse, which must integrate well in order that the warehouse works to the organization’s benefit.

31.2 Data Warehouse Architecture

In this section we present an overview of the architecture and major components of a data warehouse (Anahory and Murray, 1997). The processes, tools, and technologies associated with data warehousing are described in more detail in the following sections of this chapter. The typical architecture of a data warehouse is shown in Figure 31.1.

31.2.1 Operational Data

The source of data for the data warehouse is supplied from:

• Mainframe operational data held in first generation hierarchical and network databases. It is estimated that the majority of corporate operational data is held in these systems.
• Departmental data held in proprietary file systems such as VSAM, RMS, and relational DBMSs such as Informix and Oracle.
• Private data held on workstations and private servers.
• External systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.

31.2.2 Operational Data Store

An Operational Data Store (ODS) is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact act simply as a staging area for data to be moved into the warehouse.

The ODS is often created when legacy operational systems are found to be incapable of achieving reporting requirements. The ODS provides users with the ease of use of a relational database while remaining distant from the decision support functions of the data warehouse.

Figure 31.1 Typical architecture of a data warehouse.

Building an ODS can be a helpful step towards building a data warehouse because an ODS can supply data that has been already extracted from the source systems and cleaned. This means that the remaining work of integrating and restructuring the data for the data warehouse is simplified (see Section 32.3).

31.2.3 Load Manager

The load manager (also called the frontend component) performs all the operations associated with the extraction and loading of data into the warehouse. The data may be extracted directly from the data sources or more commonly from the operational data store. The operations performed by the load manager may include simple transformations of the data to prepare the data for entry into the warehouse. The size and complexity of this component will vary between data warehouses and may be constructed using a combination of vendor data loading tools and custom-built programs.

31.2.4 Warehouse Manager

The warehouse manager performs all the operations associated with the management of the data in the warehouse. This component is constructed using vendor data management tools and custom-built programs. The operations performed by the warehouse manager include:

• analysis of data to ensure consistency;
• transformation and merging of source data from temporary storage into data warehouse tables;
• creation of indexes and views on base tables;
• generation of denormalizations (if necessary);
• generation of aggregations (if necessary);
• backing-up and archiving data.

In some cases, the warehouse manager also generates query profiles to determine which indexes and aggregations are appropriate. A query profile can be generated for each user, group of users, or the data warehouse and is based on information that describes the characteristics of the queries such as frequency, target table(s), and size of result sets.

31.2.5 Query Manager

The query manager (also called the backend component) performs all the operations associated with the management of user queries. This component is typically constructed using vendor end-user data access tools, data warehouse monitoring tools, database facilities, and custom-built programs. The complexity of the query manager is determined by the facilities provided by the end-user access tools and the database. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries. In some cases, the query manager also generates query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.

31.2.6 Detailed Data

This area of the warehouse stores all the detailed data in the database schema. In most cases, the detailed data is not stored online but is made available by aggregating the data to the next level of detail. However, on a regular basis, detailed data is added to the warehouse to supplement the aggregated data.

31.2.7 Lightly and Highly Summarized Data

This area of the warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager. This area of the warehouse is transient as it will be subject to change on an ongoing basis in order to respond to changing query profiles.

The purpose of summary information is to speed up the performance of queries. Although there are increased operational costs associated with initially summarizing the data, this is offset by removing the requirement to continually perform summary operations (such as sorting or grouping) in answering user queries. The summary data is updated continuously as new data is loaded into the warehouse.
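The trade-off can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names (sale_detail, sale_summary) and the sample rows are assumptions made for the example. For simplicity the summary is rebuilt in full on every load, whereas a production warehouse would more likely maintain it incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_detail (branch TEXT, sale_date TEXT, amount REAL)")
conn.execute("CREATE TABLE sale_summary (branch TEXT PRIMARY KEY, total REAL, n INTEGER)")


def load_and_refresh(rows):
    """Load new detailed rows, then rebuild the summary in the same transaction."""
    with conn:
        conn.executemany("INSERT INTO sale_detail VALUES (?, ?, ?)", rows)
        conn.execute("DELETE FROM sale_summary")
        conn.execute(
            "INSERT INTO sale_summary "
            "SELECT branch, SUM(amount), COUNT(*) FROM sale_detail GROUP BY branch"
        )


load_and_refresh([("B003", "2004-01-05", 100.0), ("B003", "2004-01-06", 50.0)])
load_and_refresh([("B005", "2004-01-07", 70.0)])

# User queries read the small summary table instead of regrouping the detail.
summary = dict(conn.execute("SELECT branch, total FROM sale_summary").fetchall())
```

The grouping cost is paid once per load rather than once per user query, which is the offset described above.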

31.2.8 Archive/Backup Data

This area of the warehouse stores the detailed and summarized data for the purposes of archiving and backup. Even though summary data is generated from detailed data, it may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data. The data is transferred to storage archives such as magnetic tape or optical disk.

31.2.9 Metadata

This area of the warehouse stores all the metadata (data about data) definitions used by all the processes in the warehouse. Metadata is used for a variety of purposes including:

- the extraction and loading processes – metadata is used to map data sources to a common view of the data within the warehouse;
- the warehouse management process – metadata is used to automate the production of summary tables;
- as part of the query management process – metadata is used to direct a query to the most appropriate data source.


The structure of metadata differs between each process, because the purpose is different. This means that multiple copies of metadata describing the same data item are held within the data warehouse. In addition, most vendor tools for copy management and end-user data access use their own versions of metadata. Specifically, copy management tools use metadata to understand the mapping rules to apply in order to convert the source data into a common form. End-user access tools use metadata to understand how to build a query.
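To make the idea of mapping rules concrete, the following sketch shows one way such metadata might drive the conversion of source records into a common form. The source names, field names, and converters are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping metadata: source field -> (warehouse field, converter).
MAPPING_RULES = {
    "branch_a": {
        "cust_nm": ("customer_name", str.strip),
        "amt_pence": ("amount_gbp", lambda v: int(v) / 100),
    },
    "branch_b": {
        "CustomerName": ("customer_name", str.title),
        "AmountGBP": ("amount_gbp", float),
    },
}


def to_common_form(source, record):
    """Apply the mapping rules registered for a source to one of its records."""
    rules = MAPPING_RULES[source]
    return {target: convert(record[field]) for field, (target, convert) in rules.items()}


row_a = to_common_form("branch_a", {"cust_nm": " Ann Beech ", "amt_pence": "1250"})
row_b = to_common_form("branch_b", {"CustomerName": "ann beech", "AmountGBP": "12.5"})
```

Because the rules are held as data rather than written into the conversion code, supporting a new source means adding an entry to the metadata, not writing a new program.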

The management of metadata within the data warehouse is a very complex task that should not be underestimated. The issues associated with the management of metadata in a data warehouse are discussed in Section 31.4.3.

31.2.10 End-User Access Tools

The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the warehouse using end-user access tools. The data warehouse must efficiently support ad hoc and routine analysis. High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users.

Although the definitions of end-user access tools can overlap, for the purpose of this discussion, we categorize these tools into five main groups (Berson and Smith, 1997):

- reporting and query tools;
- application development tools;
- Executive Information System (EIS) tools;
- Online Analytical Processing (OLAP) tools;
- data mining tools.

Reporting and query tools

Reporting tools include production reporting tools and report writers. Production reporting tools are used to generate regular operational reports or support high-volume batch jobs, such as customer orders/invoices and staff pay cheques. Report writers, on the other hand, are inexpensive desktop tools designed for end-users.

Query tools for relational data warehouses are designed to accept SQL or generate SQL statements to query data stored in the warehouse. These tools shield end-users from the complexities of SQL and database structures by including a meta-layer between users and the database. The meta-layer is the software that provides subject-oriented views of a database and supports ‘point-and-click’ creation of SQL. An example of a query tool is Query-By-Example (QBE). The QBE facility of Microsoft Office Access DBMS was demonstrated in Chapter 7. Query tools are popular with users of business applications such as demographic analysis and customer mailing lists. However, as questions become increasingly complex, these tools may rapidly become inefficient.
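The role of the meta-layer can be sketched as a small translation step from business terms to SQL. The dictionary below and its column names (branchNo, salePrice, city) are assumptions for the purpose of illustration. Real query tools are considerably more elaborate.

```python
# Hypothetical meta-layer: business terms -> physical columns of one table.
META_LAYER = {
    "table": "PropertySale",
    "fields": {"Branch": "branchNo", "Sale price": "salePrice", "City": "city"},
}


def build_query(selected, filters=None):
    """Generate a parameterized SELECT from the user's point-and-click choices."""
    columns = ", ".join(META_LAYER["fields"][name] for name in selected)
    sql = f"SELECT {columns} FROM {META_LAYER['table']}"
    params = []
    if filters:
        clauses = []
        for name, value in filters.items():
            clauses.append(f"{META_LAYER['fields'][name]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query(["Branch", "Sale price"], {"City": "Glasgow"})
```

The user selects only business terms. The column names and SQL syntax stay hidden behind the meta-layer, and filter values are passed as parameters rather than spliced into the statement.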

Application development tools

The requirements of the end-users may be such that the built-in capabilities of reporting and query tools are inadequate either because the required analysis cannot be performed or because the user interaction requires an unreasonably high level of expertise by the user. In this situation, user access may require the development of in-house applications using graphical data access tools designed primarily for client–server environments. Some of these application development tools integrate with popular OLAP tools, and can access all major database systems, including Oracle, Sybase, and Informix.

Executive information system (EIS) tools

Executive information systems, more recently referred to as ‘everybody’s information systems’, were originally developed to support high-level strategic decision-making. However, the focus of these systems widened to include support for all levels of management. EIS tools were originally associated with mainframes enabling users to build customized, graphical decision-support applications to provide an overview of the organization’s data and access to external data sources.

Currently, the demarcation between EIS tools and other decision-support tools is even more vague as EIS developers add additional query facilities and provide custom-built applications for business areas such as sales, marketing, and finance.

Online Analytical Processing (OLAP) tools

Online Analytical Processing (OLAP) tools are based on the concept of multi-dimensional databases and allow a sophisticated user to analyze the data using complex, multi-dimensional views. Typical business applications for these tools include assessing the effectiveness of a marketing campaign, product sales forecasting, and capacity planning. These tools assume that the data is organized in a multi-dimensional model supported by a special multi-dimensional database (MDDB) or by a relational database designed to enable multi-dimensional queries. We discuss OLAP tools in more detail in Chapter 33.

Data mining tools

Data mining is the process of discovering meaningful new correlations, patterns, and trends by mining large amounts of data using statistical, mathematical, and artificial intelligence (AI) techniques. Data mining has the potential to supersede the capabilities of OLAP tools, as the major attraction of data mining is its ability to build predictive rather than retrospective models. We discuss data mining in more detail in Chapter 34.

31.3 Data Warehouse Data Flows

In this section we examine the activities associated with the processing (or flow) of data within a data warehouse. Data warehousing focuses on the management of five primary data flows, namely the inflow, upflow, downflow, outflow, and metaflow (Hackathorn, 1995). The data flows within a data warehouse are shown in Figure 31.2. The processes associated with each data flow include:

- Inflow – Extraction, cleansing, and loading of the source data.
- Upflow – Adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow – Archiving and backing-up the data in the warehouse.
- Outflow – Making the data available to end-users.
- Metaflow – Managing the metadata.

Figure 31.2 Information flows of a data warehouse.

31.3.1 Inflow

Inflow: The processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.

The inflow is concerned with taking data from the source systems to load into the data warehouse. Alternatively, the data may be first loaded into the operational data store (ODS) (see Section 31.2.2) before being transferred to the data warehouse. As the source data is generated predominately by OLTP systems, the data must be reconstructed for the purposes of the data warehouse. The reconstruction of data involves:

- cleansing dirty data;
- restructuring data to suit the new requirements of the data warehouse including, for example, adding and/or removing fields, and denormalizing data;
- ensuring that the source data is consistent with itself and with the data already in the warehouse.

To effectively manage the inflow, mechanisms must be identified to determine when to start extracting the data, to carry out the necessary transformations, and to undertake consistency checks. When extracting data from the source systems, it is important to ensure that the data is in a consistent state to generate a single, consistent view of the corporate data. The complexity of the extraction process is determined by the extent to which the source systems are ‘in tune’ with one another.

Once the data is extracted, the data is usually loaded into a temporary store for the purposes of cleansing and consistency checking. As this process is complex, it is important for it to be fully automated and to have the ability to report when problems and failures occur. Commercial tools are available to support the management of the inflow. However, unless the process is relatively straightforward, the tools may require customization.
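The inflow steps described in this section can be sketched as follows. The record layout, the cleansing rules, and the set of known branches are assumptions made for the example. The point is only that extracted records are cleansed and checked in a temporary store, and that every failure is reported rather than silently dropped.

```python
def cleanse(record):
    """Normalize one extracted record; raise ValueError if it cannot be repaired."""
    branch = record.get("branch", "").strip().upper()
    if not branch:
        raise ValueError("missing branch")
    amount = float(record["amount"])          # raises ValueError on dirty values
    if amount < 0:
        raise ValueError("negative amount")
    return {"branch": branch, "amount": amount}


def run_inflow(extracted, known_branches):
    """Cleanse and check each record; return (staged rows, reported problems)."""
    staged, problems = [], []
    for position, record in enumerate(extracted):
        try:
            clean = cleanse(record)
            # Consistency check against data already held in the warehouse.
            if clean["branch"] not in known_branches:
                raise ValueError("unknown branch " + clean["branch"])
            staged.append(clean)
        except (KeyError, ValueError) as exc:
            problems.append((position, str(exc)))
    return staged, problems


extracted = [
    {"branch": " b003 ", "amount": "100.50"},
    {"branch": "B009", "amount": "20"},
    {"branch": "B005", "amount": "n/a"},
]
staged, problems = run_inflow(extracted, known_branches={"B003", "B005"})
```

Only the first record reaches the staged list. The other two are reported together with their position in the extract, so that an operator or a scheduling tool can act on them.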

31.3.2 Upflow

Upflow: The processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.

The activities associated with the upflow include:

- Summarizing the data by selecting, projecting, joining, and grouping relational data into views that are more convenient and useful to the end-users. Summarizing extends beyond simple relational operations to involve sophisticated statistical analysis including identifying trends, clustering, and sampling the data.
- Packaging the data by converting the detailed or summarized data into more useful formats, such as spreadsheets, text documents, charts, other graphical presentations, private databases, and animation.
- Distributing the data to appropriate groups to increase its availability and accessibility.
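The first two upflow activities can be sketched with Python's built-in sqlite3 and csv modules. The Branch and PropertySale tables, their columns, and the sample rows are assumed for illustration. Summarizing is shown as a view built by joining, projecting, and grouping, and packaging as conversion of the result to CSV, a format that spreadsheets can open.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE PropertySale (branchNo TEXT, salePrice REAL);
    INSERT INTO Branch VALUES ('B003', 'Glasgow'), ('B005', 'London');
    INSERT INTO PropertySale VALUES ('B003', 100.0), ('B003', 50.0), ('B005', 70.0);
    -- Summarizing: join, project, and group into a more convenient view.
    CREATE VIEW SalesByCity AS
        SELECT b.city AS city, SUM(s.salePrice) AS totalSales
        FROM PropertySale s JOIN Branch b ON b.branchNo = s.branchNo
        GROUP BY b.city;
""")

rows = conn.execute("SELECT city, totalSales FROM SalesByCity ORDER BY city").fetchall()

# Packaging: convert the summarized rows into a spreadsheet-friendly format.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["city", "totalSales"])
writer.writerows(rows)
packaged = buffer.getvalue()
```

Distribution is not shown. In practice the packaged output would be delivered to the appropriate user groups by whatever mechanism the organization uses.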

While adding value to the data, consideration must also be given to support the performance requirements of the data warehouse and to minimize the ongoing operational costs. These requirements essentially pull the design in opposing directions, forcing restructuring to improve query performance or to lower operational costs. In other words, the data warehouse administrator must identify the most appropriate database design to meet all requirements, which often necessitates a degree of compromise.


31.3.3 Downflow

Downflow: The processes associated with archiving and backing-up of data in the warehouse.

Archiving old data plays an important role in maintaining the effectiveness and performance of the warehouse by transferring the older data of limited value to a storage archive such as magnetic tape or optical disk. However, if the correct partitioning scheme is selected for the database, the amount of data online should not affect performance.

Partitioning is a useful design option for very large databases that enables the fragmentation of a table storing enormous numbers of records into several smaller tables. The rule for partitioning a given table can be based on characteristics of the data such as timespan or area of the country. For example, the PropertySale table of DreamHome could be partitioned according to the countries of the UK.
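A partitioning rule of this kind can be sketched as a function that assigns each row to a partition. The sample PropertySale rows and the country and year fields are assumptions made for the example. A real DBMS would apply such a rule at the storage level rather than in application code.

```python
from collections import defaultdict


def partition_rows(rows, rule):
    """Split rows into smaller per-partition lists using a partitioning rule."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[rule(row)].append(row)
    return dict(partitions)


# Hypothetical PropertySale rows; the 'country' and 'year' fields are assumed.
property_sales = [
    {"propertyNo": "PG4", "country": "Scotland", "year": 2003},
    {"propertyNo": "PL94", "country": "England", "year": 2004},
    {"propertyNo": "PG16", "country": "Scotland", "year": 2004},
]

by_country = partition_rows(property_sales, rule=lambda row: row["country"])
by_year = partition_rows(property_sales, rule=lambda row: row["year"])
```

The same table is fragmented in two different ways simply by changing the rule: by area of the country in the first call and by timespan in the second.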

The downflow of data includes the processes to ensure that the current state of the data warehouse can be rebuilt following data loss, or software/hardware failures. Archived data should be stored in a way that allows the re-establishment of the data in the warehouse, when required.

31.3.4 Outflow

Outflow: The processes associated with making the data available to the end-users.

The outflow is where the real value of warehousing is realized by the organization. This may require re-engineering the business processes to achieve competitive advantage (Hackathorn, 1995). The two key activities involved in the outflow include:

- Accessing, which is concerned with satisfying the end-users’ requests for the data they need. The main issue is to create an environment so that users can effectively use the query tools to access the most appropriate data source. The frequency of user accesses can vary from ad hoc, to routine, to real-time. It is important to ensure that the system’s resources are used in the most effective way in scheduling the execution of user queries.
- Delivering, which is concerned with proactively delivering information to the end-users’ workstations and is referred to as a type of ‘publish-and-subscribe’ process. The warehouse publishes various ‘business objects’ that are revised periodically by monitoring usage patterns. Users subscribe to the set of business objects that best meets their needs.

An important issue in managing the outflow is the active marketing of the data warehouse to users, which will contribute to its overall impact on an organization’s operations. There are additional operational activities in managing the outflow including directing queries to the appropriate target table(s) and capturing information on the query profiles associated with user groups to determine which aggregations to generate.

Data warehouses that contain summary data potentially provide a number of distinct data sources to respond to a specific query including the detailed data itself and any number of aggregations that satisfy the query’s data needs. However, the performance of the query will vary considerably depending on the characteristics of the target data, the most obvious being the volume of data to be read. As part of managing the outflow, the system must determine the most efficient way to answer a query.
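One simple way to picture this decision is to treat the detailed table and each aggregation as candidate sources and to pick the smallest one that can satisfy the query. The catalogue below, with its table names, column sets, and row counts, is hypothetical. The sketch also simplifies the test to column coverage, whereas a real system would additionally check that the grain of the aggregation suits the query.

```python
# Hypothetical catalogue of data sources able to answer queries.
SOURCES = [
    {"name": "sale_detail", "columns": {"branch", "city", "day", "amount"}, "rows": 50_000_000},
    {"name": "sales_by_branch_day", "columns": {"branch", "day", "amount"}, "rows": 400_000},
    {"name": "sales_by_city", "columns": {"city", "amount"}, "rows": 120},
]


def choose_source(needed_columns):
    """Return the smallest source that contains every column the query needs."""
    candidates = [s for s in SOURCES if needed_columns <= s["columns"]]
    if not candidates:
        raise LookupError("no source satisfies the query")
    return min(candidates, key=lambda s: s["rows"])["name"]
```

For example, choose_source({"city", "amount"}) selects the highly summarized table, while a query that needs both branch and city falls back to the detailed data, which is the only source holding both columns.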

31.3.5 Metaflow

Metaflow: The processes associated with the management of the metadata.

The previous flows describe the management of the data warehouse with regard to how the data moves in and out of the warehouse. Metaflow is the process that moves metadata (data about the other flows). Metadata is a description of the data contents of the data warehouse, what is in it, where it came from originally, and what has been done to it by way of cleansing, integrating, and summarizing. We discuss issues associated with the management of metadata in a data warehouse in Section 31.4.3.

To respond to changing business needs, legacy systems are constantly changing. Therefore, managing the warehouse involves responding to these continuous changes, so that the warehouse reflects the changes to the source legacy systems and the changing business environment. The metaflow (metadata) must be continuously updated with these changes.

31.4 Data Warehousing Tools and Technologies

In this section we examine the tools and technologies associated with building and managing a data warehouse and, in particular, we focus on the issues associated with the integration of these tools. For more information on data warehousing tools and technologies, the interested reader is referred to Berson and Smith (1997).

31.4.1 Extraction, Cleansing, and Transformation Tools

Selecting the correct extraction, cleansing, and transformation tools is a critical step in the construction of a data warehouse. There are an increasing number of vendors that are focused on fulfilling the requirements of data warehouse implementations as opposed to simply moving data between hardware platforms. The tasks of capturing data from a source system, cleansing and transforming it, and then loading the results into a target system can be carried out either by separate products, or by a single integrated solution. Integrated solutions fall into one of the following categories:

- code generators;
- database data replication tools;
- dynamic transformation engines.

Code generators

Code generators create customized 3GL/4GL transformation programs based on source and target data definitions. The main issue with this approach is the management of the large number of programs required to support a complex corporate data warehouse. Vendors recognize this issue and some are developing management components employing techniques such as workflow methods and automated scheduling systems.

Database data replication tools

Database data replication tools employ database triggers or a recovery log to capture changes to a single data source on one system and apply the changes to a copy of the source data located on a different system (see Chapter 24). Most replication products do not support the capture of changes to non-relational files and databases, and often do not provide facilities for significant data transformation and enhancement. These tools can be used to rebuild a database following failure or to create a database for a data mart (see Section 31.5), provided that the number of data sources is small and the level of data transformation is relatively simple.
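The trigger-based variant can be sketched with two in-memory SQLite databases standing in for the two systems. The Staff table, the change_log table, and the trigger names are assumptions made for the example. Deletes and the log-based variant are omitted to keep the sketch short.

```python
import sqlite3

source = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (source, replica):
    db.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary REAL)")

# Triggers on the source record every change in a change-log table.
source.executescript("""
    CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                             op TEXT, staffNo TEXT, salary REAL);
    CREATE TRIGGER staff_ins AFTER INSERT ON Staff BEGIN
        INSERT INTO change_log (op, staffNo, salary) VALUES ('I', NEW.staffNo, NEW.salary);
    END;
    CREATE TRIGGER staff_upd AFTER UPDATE ON Staff BEGIN
        INSERT INTO change_log (op, staffNo, salary) VALUES ('U', NEW.staffNo, NEW.salary);
    END;
""")


def apply_changes():
    """Replay captured changes, in order, against the copy on the other system."""
    changes = source.execute(
        "SELECT op, staffNo, salary FROM change_log ORDER BY seq"
    ).fetchall()
    with replica:
        for op, staff_no, salary in changes:
            if op == "I":
                replica.execute("INSERT INTO Staff VALUES (?, ?)", (staff_no, salary))
            else:
                replica.execute(
                    "UPDATE Staff SET salary = ? WHERE staffNo = ?", (salary, staff_no)
                )
    with source:
        source.execute("DELETE FROM change_log")   # changes have been applied


with source:
    source.execute("INSERT INTO Staff VALUES ('SG37', 12000)")
    source.execute("UPDATE Staff SET salary = 13000 WHERE staffNo = 'SG37'")
apply_changes()
replicated = replica.execute("SELECT staffNo, salary FROM Staff").fetchall()
```

Note that the changes are copied unchanged. This matches the observation above that replication tools suit cases where little or no transformation of the data is needed.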

Dynamic transformation engines

Rule-driven dynamic transformation engines capture data from a source system at user-defined intervals, transform the data, and then send and load the results into a target environment. To date, most products support only relational data sources, but products are now emerging that handle non-relational source files and databases.

31.4.2 Data Warehouse DBMS

There are few integration issues associated with the data warehouse database. Due to the maturity of such products, most relational databases will integrate predictably with other types of software. However, there are issues associated with the potential size of the data warehouse database. Parallelism in the database becomes an important issue, as well as the usual issues such as performance, scalability, availability, and manageability, which must all be taken into consideration when choosing a DBMS. We first identify the requirements for a data warehouse DBMS and then discuss briefly how the requirements of data warehousing are supported by parallel technologies.

Requirements for data warehouse DBMS

The specialized requirements for a relational DBMS suitable for data warehousing are published in a White Paper (Red Brick Systems, 1996) and are listed in Table 31.3.

Load performance

Data warehouses require incremental loading of new data on a periodic basis within narrow time windows. Performance of the load process should be measured in hundreds of millions of rows or gigabytes of data per hour and there should be no maximum limit that constrains the business.

Load processing

Many steps must be taken to load new or updated data into the data warehouse including data conversions, filtering, reformatting, integrity checks, physical storage, indexing, and metadata update. Although each step may in practice be atomic, the load process should appear to execute as a single, seamless unit of work.
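The idea of a load that appears as a single unit of work can be sketched with a database transaction. The example below uses Python's built-in sqlite3 module. The sale and load_metadata tables and the sample batches are assumptions, and only a few of the steps listed above (conversion, filtering, integrity checks, storage, and metadata update) are represented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (branch TEXT NOT NULL, amount REAL NOT NULL CHECK (amount >= 0))"
)
conn.execute("CREATE TABLE load_metadata (batch TEXT, rows_loaded INTEGER)")


def load_batch(batch_name, raw_rows):
    """Convert, filter, check, store, and record metadata as one unit of work."""
    with conn:  # commits on success, rolls back every step on any exception
        converted = [(branch.strip().upper(), float(amount)) for branch, amount in raw_rows]
        filtered = [row for row in converted if row[0]]          # drop rows with no branch
        conn.executemany("INSERT INTO sale VALUES (?, ?)", filtered)
        conn.execute("INSERT INTO load_metadata VALUES (?, ?)", (batch_name, len(filtered)))


load_batch("2004-01-05", [(" b003", "100"), ("B005", "70.5")])
try:
    load_batch("2004-01-06", [("B003", "10"), ("B007", "-5")])   # fails the CHECK
except sqlite3.IntegrityError:
    pass

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
batches = [row[0] for row in conn.execute("SELECT batch FROM load_metadata")]
```

The second batch violates the integrity check part-way through, so none of its rows and no metadata entry for it remain. To other users of the warehouse the load either happened completely or not at all.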

Data quality management

The shift to fact-based management demands the highest data quality. The warehouse must ensure local consistency, global consistency, and referential integrity despite ‘dirty’ sources and massive database sizes. While loading and preparation are necessary steps, they are not sufficient. The ability to answer end-users’ queries is the measure of success for a data warehouse application. As more questions are answered, analysts tend to ask more creative and complex questions.

Query performance

Fact-based management and ad hoc analysis must not be slowed or inhibited by the performance of the data warehouse RDBMS. Large, complex queries for key business operations must complete in reasonable time periods.

Terabyte scalability

Data warehouse sizes are growing at enormous rates with sizes ranging from a few to hundreds of gigabytes to terabyte-sized (10^12 bytes) and petabyte-sized (10^15 bytes). The RDBMS must not have any architectural limitations to the size of the database and should support modular and parallel management. In the event of failure, the RDBMS should support continued availability, and provide mechanisms for recovery. The RDBMS must support mass storage devices such as optical disk and hierarchical storage management devices. Lastly, query performance should not be dependent on the size of the database, but rather on the complexity of the query.

Table 31.3 The requirements for a data warehouse RDBMS.

Load performance
Load processing
Data quality management
Query performance
Terabyte scalability
Mass user scalability
Networked data warehouse
Warehouse administration
Integrated dimensional analysis
Advanced query functionality

Mass user scalability

Current thinking is that access to a data warehouse is limited to relatively low numbers of managerial users. This is unlikely to remain true as the value of data warehouses is realized. It is predicted that the data warehouse RDBMS should be capable of supporting hundreds, or even thousands, of concurrent users while maintaining acceptable query performance.

Networked data warehouse

Data warehouse systems should be capable of cooperating in a larger network of data warehouses. The data warehouse must include tools that coordinate the movement of subsets of data between warehouses. Users should be able to look at, and work with, multiple data warehouses from a single client workstation.

Warehouse administration

The very-large scale and time-cyclic nature of the data warehouse demands administrative ease and flexibility. The RDBMS must provide controls for implementing resource limits, chargeback accounting to allocate costs back to users, and query prioritization to address the needs of different user classes and activities. The RDBMS must also provide for workload tracking and tuning so that system resources may be optimized for maximum performance and throughput. The most visible and measurable value of implementing a data warehouse is evidenced in the uninhibited, creative access to data it provides for end-users.

Integrated dimensional analysis

The power of multi-dimensional views is widely accepted, and dimensional support must be inherent in the warehouse RDBMS to provide the highest performance for relational OLAP tools (see Chapter 33). The RDBMS must support fast, easy creation of pre-computed summaries common in large data warehouses, and provide maintenance tools to automate the creation of these pre-computed aggregates. Dynamic calculation of aggregates should be consistent with the interactive performance needs of the end-user.

Advanced query functionality

End-users require advanced analytical calculations, sequential and comparative analysis, and consistent access to detailed and summarized data. Using SQL in a client–server ‘point-and-click’ tool environment may sometimes be impractical or even impossible due to the complexity of the users’ queries. The RDBMS must provide a complete and advanced set of analytical operations.


Parallel DBMSs

Data warehousing requires the processing of enormous amounts of data and parallel database technology offers a solution to providing the necessary growth in performance. The success of parallel DBMSs depends on the efficient operation of many resources including processors, memory, disks, and network connections. As data warehousing grows in popularity, many vendors are building large decision-support DBMSs using parallel technologies. The aim is to solve decision-support problems using multiple nodes working on the same problem. The major characteristics of parallel DBMSs are scalability, operability, and availability.

The

parallel

DB

MS

perfo

rms

man

y

datab

ase operatio

ns

simultan

eously

, sp

litting

indiv

idual task

s into

smaller p

arts so th

at tasks can

be sp

read acro

ss multip

le pro

cessors.

Parallel D

BM

Ss m

ust b

e capab

le of ru

nnin

g p

arallel queries. In

oth

er word

s, they

must

be ab

le to d

ecom

pose larg

e com

plex

queries in

to su

bqueries, ru

n th

e separate su

bqueries

simultan

eously

, and reassem

ble th

e results at th

e end. T

he cap

ability

of su

ch D

BM

Ss m

ust

also in

clude p

arallel data lo

adin

g, tab

le scannin

g, an

d d

ata archiv

ing an

d b

ackup. T

here

are two m

ain p

arallel hard

ware arch

itectures co

mm

only

used

as datab

ase server p

latform

s

for d

ata wareh

ousin

g:

nS

ym

metric M

ulti-P

rocessin

g (S

MP

) – a set o

f tightly

coupled

pro

cessors th

at share

mem

ory

and d

isk sto

rage;

nM

assively

Parallel P

rocessin

g (M

PP

) – a set o

f loosely

coupled

pro

cessors, each

of

which

has its o

wn m

emory

and d

isk sto

rage.

The S

MP

and M

PP

parallel arch

itectures w

ere describ

ed in

detail in

Sectio

n 2

2.1

.1.
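The decompose, run, and reassemble pattern of a parallel query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is already split into partitions, the names `partial_sum` and `parallel_total` and the sample rows are hypothetical, and a thread pool stands in for the multiple processors of a real parallel DBMS.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Subquery: aggregate the amounts in one partition."""
    return sum(amount for _, amount in rows)


def parallel_total(partitions, workers=4):
    """Run one subquery per partition concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials)  # reassemble the partial results at the end


# Hypothetical sales rows (region, amount), already split into partitions.
partitions = [
    [("north", 100), ("north", 250)],
    [("south", 75)],
    [("west", 300), ("west", 25)],
]
total = parallel_total(partitions)
```

The sketch shows only the shape of the computation; in CPython, threads do not give true CPU parallelism for this kind of work, and a real parallel DBMS also has to balance the partitions and manage the data movement between nodes.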

31.4.3 Data Warehouse Metadata

There are many issues associated with data warehouse integration; however, in this section we focus on the integration of metadata, that is ‘data about data’ (Darling, 1996). The management of the metadata in the warehouse is an extremely complex and difficult task. Metadata is used for a variety of purposes and the management of metadata is a critical issue in achieving a fully integrated data warehouse.

The major purpose of metadata is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse. However, the problem is that metadata has several functions within the warehouse that relate to the processes associated with data transformation and loading, data warehouse management, and query generation (see Section 31.2.9).

The metadata associated with data transformation and loading must describe the source data and any changes that were made to the data. For example, for each source field there should be a unique identifier, original field name, source data type, and original location including the system and object name, along with the destination data type and destination table name. If the field is subject to any transformations, ranging from a simple field type change to a complex set of procedures and functions, this should also be recorded.

The metadata associated with data management describes the data as it is stored in the warehouse. Every object in the database needs to be described including the data in each

Page 24: Database Warehousing


table, index, and view, and any associated constraints. This information is held in the DBMS system catalog; however, there are additional requirements for the purposes of the warehouse. For example, metadata should also describe any fields associated with aggregations, including a description of the aggregation that was performed. In addition, table partitions should be described including information on the partition key, and the data range associated with that partition.
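As an illustration of the transformation and loading metadata and the partition metadata described above, the following Python sketch records one source field and one table partition. The class names, attribute names, and sample values are all hypothetical; they simply mirror the items listed in the text and are not taken from any particular metadata standard or product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldLineage:
    """Transformation/loading metadata for one source field."""
    field_id: str            # unique identifier
    source_name: str         # original field name
    source_type: str         # source data type
    source_system: str       # original location: system name
    source_object: str       # original location: object name
    dest_type: str           # destination data type
    dest_table: str          # destination table name
    transformations: List[str] = field(default_factory=list)

    def pathway(self) -> str:
        """Describe the path from the source field to the warehouse table."""
        steps = " -> ".join(self.transformations) or "no transformation"
        origin = f"{self.source_system}.{self.source_object}.{self.source_name}"
        return f"{origin} [{steps}] {self.dest_table}"


@dataclass
class PartitionInfo:
    """Data management metadata for one table partition."""
    table: str
    partition_key: str
    low: str                 # start of the data range held in the partition
    high: str                # end of the data range held in the partition


lineage = FieldLineage(
    "F001", "cust_nm", "CHAR(30)", "orders_sys", "CUSTOMER",
    "VARCHAR(50)", "dim_customer", ["trim", "cast to VARCHAR"],
)
partition = PartitionInfo("fact_sales", "sale_date", "2004-01-01", "2004-03-31")
```

Calling `lineage.pathway()` gives an administrator a one-line history of the field, which is the ‘pathway back to where the data began’ role that the text assigns to metadata.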

The metadata described above is also required by the query manager to generate appropriate queries. In turn, the query manager generates additional metadata about the queries that are run, which can be used to generate a history on all the queries and a query profile for each user, group of users, or the data warehouse. There is also metadata associated with the users of queries that includes, for example, information describing what the term ‘price’ or ‘customer’ means in a particular database and whether the meaning has changed over time.

Synchronizing metadata

The major integration issue is how to synchronize the various types of metadata used throughout the data warehouse. The various tools of a data warehouse generate and use their own metadata, and to achieve integration, we require that these tools are capable of sharing their metadata. The challenge is to synchronize metadata between different products from different vendors using different metadata stores. For example, it is necessary to identify the correct item of metadata at the right level of detail from one product and map it to the appropriate item of metadata at the right level of detail in another product, then sort out any coding differences between them. This has to be repeated for all other metadata that the two products have in common. Further, any changes to the metadata (or even meta-metadata) in one product need to be conveyed to the other product. The task of synchronizing two products is highly complex, and therefore repeating this process for six or more products that make up the data warehouse can be resource intensive. However, integration of the metadata must be achieved.

In the beginning there were two major standards for metadata and modeling in the areas of data warehousing and component-based development proposed by the Meta Data Coalition (MDC) and the Object Management Group (OMG). However, these two industry organizations jointly announced that the MDC would merge into the OMG. As a result, the MDC discontinued independent operations and work continued in the OMG to integrate the two standards.

The merger of MDC into the OMG marked an agreement of the major data warehousing and metadata vendors to converge on one standard, incorporating the best of the MDC’s Open Information Model (OIM) with the best of the OMG’s Common Warehouse Metamodel (CWM). This work is now complete and the resulting specification issued by the OMG as the next version of the CWM is discussed in Section 27.1.3. A single standard allows users to exchange metadata between different products from different vendors freely.

The OMG’s CWM builds on various standards, including OMG’s UML (Unified Modeling Language), XMI (XML Metadata Interchange), and MOF (Meta Object Facility), and on the MDC’s OIM. The CWM was developed by a number of companies, including IBM, Oracle, Unisys, Hyperion, Genesis, NCR, UBS, and Dimension EDI.

Page 25: Database Warehousing


31.4.4 Administration and Management Tools

A data warehouse requires tools to support the administration and management of such a complex environment. These tools are relatively scarce, especially those that are well integrated with the various types of metadata and the day-to-day operations of the data warehouse. The data warehouse administration and management tools must be capable of supporting the following tasks:

- monitoring data loading from multiple sources;
- data quality and integrity checks;
- managing and updating metadata;
- monitoring database performance to ensure efficient query response times and resource utilization;
- auditing data warehouse usage to provide user chargeback information;
- replicating, subsetting, and distributing data;
- maintaining efficient data storage management;
- purging data;
- archiving and backing-up data;
- implementing recovery following failure;
- security management.

31.5 Data Marts

Accompanying the rapid emergence of data warehouses is the related concept of data marts. In this section we describe what data marts are, the reasons for building data marts, and the issues associated with the development and use of data marts.

Data mart: A subset of a data warehouse that supports the requirements of a particular department or business function.

A data mart holds a subset of the data in a data warehouse normally in the form of summary data relating to a particular department or business function. The data mart can be standalone or linked centrally to the corporate data warehouse. As a data warehouse grows larger, the ability to serve the various needs of the organization may be compromised. The popularity of data marts stems from the fact that corporate-wide data warehouses are proving difficult to build and use. The typical architecture for a data warehouse and associated data mart is shown in Figure 31.3. The characteristics that differentiate data marts and data warehouses include:

- a data mart focuses on only the requirements of users associated with one department or business function;
- data marts do not normally contain detailed operational data, unlike data warehouses;

Page 26: Database Warehousing


Figure 31.3 Typical data warehouse and data mart architecture.

Page 27: Database Warehousing


- as data marts contain less data compared with data warehouses, data marts are more easily understood and navigated.

There are several approaches to building data marts. One approach is to build several data marts with a view to the eventual integration into a warehouse; another approach is to build the infrastructure for a corporate data warehouse while at the same time building one or more data marts to satisfy immediate business needs.

Data mart architectures can be built as two-tier or three-tier database applications. The data warehouse is the optional first tier (if the data warehouse provides the data for the data mart), the data mart is the second tier, and the end-user workstation is the third tier, as shown in Figure 31.3. Data is distributed among the tiers.

31.5.1 Reasons for Creating a Data Mart

There are many reasons for creating a data mart, which include:

- To give users access to the data they need to analyze most often.
- To provide data in a form that matches the collective view of the data by a group of users in a department or business function.
- To improve end-user response time due to the reduction in the volume of data to be accessed.
- To provide appropriately structured data as dictated by the requirements of end-user access tools such as Online Analytical Processing (OLAP) and data mining tools, which may require their own internal database structures. In practice, these tools often create their own data mart designed to support their specific functionality.
- Data marts normally use less data so tasks such as data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse.
- The cost of implementing data marts is normally less than that required to establish a data warehouse.
- The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

31.5.2 Data Marts Issues

The issues associated with the development and management of data marts are listed in Table 31.4 (Brooks, 1997).

Data mart functionality

The capabilities of data marts have increased with the growth in their popularity. Rather than being simply small, easy-to-access databases, some data marts must now be scalable to hundreds of gigabytes (Gb), and provide sophisticated analysis using Online Analytical

Page 28: Database Warehousing


Processing (OLAP) and/or data mining tools. Further, hundreds of users must be capable of remotely accessing the data mart. The complexity and size of some data marts are matching the characteristics of small-scale corporate data warehouses.

Data mart size

Users expect faster response times from data marts than from data warehouses; however, performance deteriorates as data marts grow in size. Several vendors of data marts are investigating ways to reduce the size of data marts to gain improvements in performance. For example, dynamic dimensions allow aggregations to be calculated on demand rather than pre-calculated and stored in the multi-dimensional database (MDDB) cube (see Chapter 33).

Data mart load performance

A data mart has to balance two critical components: end-user response time and data loading performance. A data mart designed for fast user response will have a large number of summary tables and aggregate values. Unfortunately, the creation of such tables and values greatly increases the time of the load procedure. Vendors are investigating improvements in the load procedure by providing indexes that automatically and continually adapt to the data being processed or by supporting incremental database updating so that only cells affected by the change are updated and not the entire MDDB structure.
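The idea of incremental updating can be illustrated with a small Python sketch in which a dictionary of summary cells stands in for the MDDB structure. The function names, the (month, product) cell key, and the sample figures are hypothetical and are not drawn from any vendor's product.

```python
from collections import defaultdict


def build_summary(facts):
    """Full load: compute every summary cell from all fact rows."""
    cells = defaultdict(float)
    for month, product, amount in facts:
        cells[(month, product)] += amount
    return cells


def apply_increment(cells, new_facts):
    """Incremental load: update only the cells touched by the new rows."""
    touched = set()
    for month, product, amount in new_facts:
        cells[(month, product)] += amount
        touched.add((month, product))
    return touched


facts = [
    ("2004-01", "flat", 100.0),
    ("2004-01", "house", 200.0),
    ("2004-02", "flat", 50.0),
]
new_facts = [("2004-02", "flat", 25.0)]

cells = build_summary(facts)
touched = apply_increment(cells, new_facts)
```

Here a single cell is recomputed instead of the whole structure, yet the result matches what a full rebuild over the old and new rows would produce. That saving in load time is the trade-off described in the text.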

Users’ access to data in multiple data marts

One approach is to replicate data between different data marts or, alternatively, build virtual data marts. Virtual data marts are views of several physical data marts or the corporate data warehouse tailored to meet the requirements of specific groups of users. Commercial products that manage virtual data marts are available.

Table 31.4 The issues associated with data marts.

Data mart functionality
Data mart size
Data mart load performance
Users’ access to data in multiple data marts
Data mart Internet/intranet access
Data mart administration
Data mart installation

Data mart Internet/Intranet access

Internet/Intranet technology offers users low-cost access to data marts and the data warehouse using Web browsers such as Netscape Navigator and Microsoft Internet

Page 29: Database Warehousing


Explorer. Data mart Internet/Intranet products normally sit between a Web server and the data analysis product. Vendors are developing products with increasingly advanced Web capabilities. These products include Java and ActiveX capabilities. We discussed Web and DBMS integration in detail in Chapter 29.

Data mart administration

As the number of data marts in an organization increases, so does the need to centrally manage and coordinate data mart activities. Once data is copied to data marts, data can become inconsistent as users alter their own data marts to allow them to analyze data in different ways. Organizations cannot easily perform administration of multiple data marts, giving rise to issues such as data mart versioning, data and metadata consistency and integrity, enterprise-wide security, and performance tuning. Data mart administrative tools are commercially available.

Data mart installation

Data marts are becoming increasingly complex to build. Vendors are offering products referred to as ‘data marts in a box’ that provide a low-cost source of data mart tools.

31.6 Data Warehousing Using Oracle

In Chapter 8 we provided a general overview of the major features of the Oracle DBMS. In this section we describe the features of Oracle9i Enterprise Edition that are specifically designed to improve performance and manageability for the data warehouse (Oracle Corporation, 2004f).

31.6.1 Oracle9i

Oracle9i Enterprise Edition is one of the leading relational DBMSs for data warehousing. Oracle has achieved this success by focusing on basic, core requirements for data warehousing: performance, scalability, and manageability. Data warehouses store larger volumes of data, support more users, and require faster performance, so that these core requirements remain key factors in the successful implementation of data warehouses. However, Oracle goes beyond these core requirements and is the first true ‘data warehouse platform’. Data warehouse applications require specialized processing techniques to allow support for complex, ad hoc queries running against large amounts of data. To address these special requirements, Oracle offers a variety of query processing techniques, sophisticated query optimization to choose the most efficient data access path, and a scalable architecture that takes full advantage of all parallel hardware configurations. Successful data warehouse applications rely on superior performance when accessing the enormous amounts of stored data. Oracle provides a rich variety of integrated indexing schemes, join methods, and summary management features, to deliver answers quickly to data

Page 30: Database Warehousing


warehouse users. Oracle also addresses applications that have mixed workloads and where administrators want to control which users, or groups of users, have priority when executing transactions or queries. In this section we provide an overview of the main features of Oracle, which are particularly aimed at supporting data warehousing applications. These features include:

- summary management;
- analytical functions;
- bitmapped indexes;
- advanced join methods;
- sophisticated SQL optimizer;
- resource management.

Summary management

In a data warehouse application, users often issue queries that summarize detail data by common dimensions, such as month, product, or region. Oracle provides a mechanism for storing multiple dimensions and summary calculations on a table. Thus, when a query requests a summary of detail records, the query is transparently re-written to access the stored aggregates rather than summing the detail records every time the query is issued. This results in dramatic improvements in query performance. These summaries are automatically maintained from data in the base tables. Oracle also provides summary advisory functions that assist database administrators in choosing which summary tables are the most effective, depending on actual workload and schema statistics. Oracle Enterprise Manager supports the creation and management of materialized views and related dimensions and hierarchies via a graphical interface, greatly simplifying the management of materialized views.
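The effect of answering a summary query from stored aggregates can be imitated in plain Python. This is a toy model only: the class `SalesStore`, its methods, and the sample rows are hypothetical, it does not use or reproduce Oracle's materialized view mechanism, and it does not keep the summary in step with later changes to the detail rows as Oracle does.

```python
from collections import defaultdict


class SalesStore:
    """Detail rows plus an optional stored summary by month."""

    def __init__(self, detail):
        self.detail = list(detail)   # rows of (month, region, amount)
        self.summary = None          # stored aggregates, once materialized
        self.detail_scans = 0        # how often the detail rows were summed

    def materialize_by_month(self):
        """Pre-compute and store the total amount for each month."""
        totals = defaultdict(float)
        for month, _, amount in self.detail:
            totals[month] += amount
        self.summary = dict(totals)

    def total_for_month(self, month):
        """Answer from the stored summary if there is one, else scan detail."""
        if self.summary is not None:
            return self.summary.get(month, 0.0)
        self.detail_scans += 1
        return sum(a for m, _, a in self.detail if m == month)


store = SalesStore([("Jan", "N", 10.0), ("Jan", "S", 5.0), ("Feb", "N", 7.0)])
before = store.total_for_month("Jan")   # sums the detail rows
store.materialize_by_month()
after = store.total_for_month("Jan")    # same call, served from the summary
```

The caller issues the same request in both cases; only the access path changes, which is the sense in which the rewrite is transparent to the user.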

Analytical functions

Oracle9i includes a range of SQL functions for business intelligence and data warehousing applications. These functions are collectively called ‘analytical functions’, and they provide improved performance and simplified coding for many business analysis queries. Some examples of the new capabilities are:

- ranking (for example, who are the top ten sales reps in each region of Great Britain?);
- moving aggregates (for example, what is the three-month moving average of property sales?);
- other functions including cumulative aggregates, lag/lead expressions, period-over-period comparisons, and ratio-to-report.

Oracle also includes the CUBE and ROLLUP operators for OLAP analysis, via SQL. These analytical and OLAP functions significantly extend the capabilities of Oracle for analytical applications (see Chapter 33).
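To make the ranking and moving-aggregate examples concrete, the following Python sketch computes both by hand. It illustrates the calculations themselves and is not Oracle SQL; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def top_n_per_region(sales, n):
    """Ranking: the n highest-selling reps within each region."""
    by_region = defaultdict(list)
    for region, rep, amount in sales:
        by_region[region].append((rep, amount))
    return {
        region: sorted(reps, key=lambda r: r[1], reverse=True)[:n]
        for region, reps in by_region.items()
    }


def moving_average(values, window=3):
    """Moving aggregate: average over the current and preceding periods."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # shorter window at the start
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


sales = [
    ("Scotland", "Ann", 500),
    ("Scotland", "Bob", 700),
    ("Wales", "Cy", 300),
    ("Scotland", "Di", 600),
]
top_two = top_n_per_region(sales, 2)
monthly_average = moving_average([10, 20, 30, 40])
```

Writing these calculations in application code requires grouping, sorting, and window bookkeeping for each query, which is the coding effort that built-in analytical functions are intended to remove.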

Page 31: Database Warehousing


Bitmapped indexes

Bitmapped indexes deliver performance benefits to data warehouse applications. They coexist with, and complement, other available indexing schemes, including standard B-tree indexes, clustered tables, and hash clusters. While a B-tree index may be the most efficient way to retrieve data using a unique identifier, bitmapped indexes are most efficient when retrieving data based on much wider criteria, such as ‘How many flats were sold last month?’ In data warehousing applications, end-users often query data based on these wider criteria. Oracle enables efficient storage of bitmap indexes through the use of advanced data compression technology.
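A bitmap index can be sketched in Python by keeping, for each distinct column value, an integer whose bit i is set when row i holds that value. The sketch below is a simplified illustration with hypothetical names and data; it leaves out the compression that the text mentions and is not a description of Oracle's internal format.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of a column to a bitmap of row positions."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def count_matching(*bitmaps):
    """AND the bitmaps together and count the rows that satisfy all of them."""
    combined = bitmaps[0]
    for bitmap in bitmaps[1:]:
        combined &= bitmap
    return bin(combined).count("1")


# Hypothetical property sales.
rows = [
    {"type": "flat", "month": "May"},
    {"type": "house", "month": "May"},
    {"type": "flat", "month": "Apr"},
    {"type": "flat", "month": "May"},
]
type_index = build_bitmap_index(rows, "type")
month_index = build_bitmap_index(rows, "month")
flats_in_may = count_matching(type_index["flat"], month_index["May"])
```

A question such as ‘How many flats were sold last month?’ then reduces to one bitwise AND and a bit count, without visiting the rows themselves.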

Advanced join methods

Oracle offers partition-wise joins, which dramatically increase the performance of joins involving tables that have been partitioned on the join keys. Joining records in matching partitions increases performance, by avoiding partitions that could not possibly have matching key records. Less memory is also used since less in-memory sorting is required.

Hash joins deliver higher performance over other join methods in many complex queries, especially for those queries where existing indexes cannot be leveraged in join processing, a common occurrence in ad hoc query environments. This join eliminates the need to perform sorts, by using an in-memory hash table constructed at runtime. The hash join is also ideally suited for scalable parallel execution.
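The hash join described above follows a general build-and-probe pattern, which the following Python sketch illustrates. It shows the textbook algorithm rather than Oracle's implementation, and the function name, column names, and sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dict rows using an in-memory hash table."""
    # Build phase: hash the (ideally smaller) input on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each row of the other input; no sorting needed.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


branches = [
    {"branch_no": "B003", "city": "Glasgow"},
    {"branch_no": "B005", "city": "London"},
]
sales = [
    {"sale_id": 1, "branch_no": "B005"},
    {"sale_id": 2, "branch_no": "B003"},
    {"sale_id": 3, "branch_no": "B009"},   # no matching branch
]
result = hash_join(branches, sales, "branch_no", "branch_no")
```

Neither input is sorted and no index is consulted, which is why the method suits ad hoc queries. Each probe is independent of the others, so the probe phase can also be divided among several workers.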

Sophisticated SQL optimizer

Oracle provides numerous powerful query processing techniques that are completely transparent to the end-user. The Oracle cost-based optimizer dynamically determines the most efficient access paths and joins for every query. It incorporates transformation technology that automatically re-writes queries generated by end-user tools, for efficient query execution.

To choose the most efficient query execution strategy, the Oracle cost-based optimizer takes into account statistics, such as the size of each table and the selectivity of each query condition. Histograms provide the cost-based optimizer with more detailed statistics based on a skewed, non-uniform data distribution. The cost-based optimizer optimizes execution of queries involved in a star schema, which is common in data warehouse applications (see Section 32.2). By using a sophisticated star-query optimization algorithm and bitmapped indexes, Oracle can dramatically reduce the query executions done in a traditional join fashion. Oracle query processing not only includes a comprehensive set of specialized techniques in all areas (optimization, access and join methods, and query execution); these techniques are also seamlessly integrated and work together to deliver the full power of the query processing engine.

Resource management

Managing CPU and disk resources in a multi-user data warehouse or OLTP application is challenging. As more users require access, contention for resources becomes greater.

Page 32: Database Warehousing


Oracle has resource management functionality that provides control of system resources assigned to users. Important online users, such as order entry clerks, can be given a high priority, while other users – those running batch reports – receive lower priorities. Users are assigned to resource classes, such as ‘order entry’ or ‘batch,’ and each resource class is then assigned an appropriate percentage of machine resources. In this way, high-priority users are given more system resources than lower-priority users.
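The arithmetic behind resource classes can be shown with a short Python sketch. The percentages, user names, and the function `allocate_cpu` are hypothetical. The sketch also assumes that each class's share is divided equally among its members, which is a simplification made for illustration and not a description of how Oracle schedules work.

```python
def allocate_cpu(class_percent, user_class):
    """Split each class's percentage equally among the users in that class."""
    members = {}
    for cls in user_class.values():
        members[cls] = members.get(cls, 0) + 1
    return {
        user: class_percent[cls] / members[cls]
        for user, cls in user_class.items()
    }


class_percent = {"order entry": 70, "batch": 30}
user_class = {"ann": "order entry", "bob": "order entry", "cron": "batch"}
shares = allocate_cpu(class_percent, user_class)
```

Because the limit is set per class, adding more batch users divides the 30 per cent among more people but does not reduce the 70 per cent reserved for the order entry class.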

Additional data warehouse features

Oracle also includes many features that improve the management and performance of data warehouse applications. Index rebuilds can be done online without interrupting inserts, updates, or deletes that may be occurring on the base table. Function-based indexes can be used to index expressions, such as arithmetic expressions, or functions that modify column values. The sample scan functionality allows queries to run and only access a specified percentage of the rows or blocks of a table. This is useful for getting meaningful aggregate amounts, such as an average, without accessing every row of a table.
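The idea behind a sample scan, estimating an aggregate from a percentage of the rows, can be sketched in Python with the standard `random` module. The function `sampled_average` and its parameters are hypothetical, it expects a non-empty list, and it samples individual rows rather than blocks.

```python
import random


def sampled_average(values, percent, seed=None):
    """Estimate the average from roughly `percent` per cent of the rows."""
    rng = random.Random(seed)                 # seed allows repeatable runs
    k = max(1, round(len(values) * percent / 100))
    sample = rng.sample(values, k)            # k distinct rows, no repeats
    return sum(sample) / len(sample)


# Hypothetical column of 1000 values; read only about 10 per cent of them.
values = list(range(1, 1001))
estimate = sampled_average(values, 10, seed=1)
exact = sum(values) / len(values)
```

The estimate will usually be close to the exact average but is not guaranteed to equal it; the benefit is that only a fraction of the table has to be read.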

Chapter Summary

- Data warehousing is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process. A data warehouse is data management and data analysis technology.
- Data Webhouse is a distributed data warehouse that is implemented over the Web with no central data repository.
- The potential benefits of data warehousing are high returns on investment, substantial competitive advantage, and increased productivity of corporate decision-makers.
- A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind. For example, OLTP systems are designed to maximize the transaction processing capacity, while data warehouses are designed to support ad hoc query processing.
- The major components of a data warehouse include the operational data sources, operational data store, load manager, warehouse manager, query manager, detailed, lightly and highly summarized data, archive/backup data, metadata, and end-user access tools.
- The operational data source for the data warehouse is supplied from mainframe operational data held in first generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers and external systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.
- The operational data store (ODS) is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.

Page 33: Database Warehousing


- The load manager (also called the frontend component) performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare the data for entry into the warehouse.
- The warehouse manager performs all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing-up data.
- The query manager (also called the backend component) performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
- End-user access tools can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, Online Analytical Processing (OLAP) tools, and data mining tools.
- Data warehousing focuses on the management of five primary data flows, namely the inflow, upflow, downflow, outflow, and metaflow.
- Inflow is the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
- Upflow is the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow is the processes associated with archiving and backing-up of data in the warehouse.
- Outflow is the processes associated with making the data available to the end-users.
- Metaflow is the processes associated with the management of the metadata (data about data).
- The requirements for a data warehouse RDBMS include load performance, load processing, data quality management, query performance, terabyte scalability, mass user scalability, networked data warehouse, warehouse administration, integrated dimensional analysis, and advanced query functionality.
- Data mart is a subset of a data warehouse that supports the requirements of a particular department or

busin

ess functio

n. T

he issu

es associated

with

data m

arts inclu

de fu

nctio

nality

, size, load

perfo

rman

ce, users’

access to d

ata in m

ultip

le data m

arts, Intern

et/intran

et access, adm

inistratio

n, an

d in

stallation.

Page 34: Database Warehousing


Review Questions

31.1 Discuss what is meant by the following terms when describing the characteristics of the data in a data warehouse:
     (a) subject-oriented;
     (b) integrated;
     (c) time-variant;
     (d) non-volatile.
31.2 Discuss how Online Transaction Processing (OLTP) systems differ from data warehousing systems.
31.3 Discuss the main benefits and problems associated with data warehousing.
31.4 Present a diagrammatic representation of the typical architecture and main components of a data warehouse.
31.5 Describe the characteristics and main functions of the following components of a data warehouse:
     (a) load manager;
     (b) warehouse manager;
     (c) query manager;
     (d) metadata;
     (e) end-user access tools.
31.6 Discuss the activities associated with each of the five primary data flows or processes within a data warehouse:
     (a) inflow;
     (b) upflow;
     (c) downflow;
     (d) outflow;
     (e) metaflow.
31.7 What are the three main approaches taken by vendors to provide data extraction, cleansing, and transformation tools?
31.8 Describe the specialized requirements of a relational database management system (RDBMS) suitable for use in a data warehouse environment.
31.9 Discuss how parallel technologies can support the requirements of a data warehouse.
31.10 Discuss the importance of managing metadata and how this relates to the integration of the data warehouse.
31.11 Discuss the main tasks associated with the administration and management of a data warehouse.
31.12 Discuss how data marts differ from data warehouses and identify the main reasons for implementing a data mart.
31.13 Identify the main issues associated with the development and management of data marts.
31.14 Describe the features of Oracle that support the core requirements of data warehousing.

Exercise

31.15 You are asked by the Managing Director of DreamHome to investigate and report on the applicability of data warehousing for the organization. The report should compare data warehouse technology with OLTP systems and should identify the advantages and disadvantages, and any problem areas associated with implementing a data warehouse. The report should reach a fully justified set of conclusions on the applicability of a data warehouse for DreamHome.

Page 35: Database Warehousing

Chapter 32 Data Warehousing Design

Chapter Objectives

In this chapter you will learn:

■ The issues associated with designing a data warehouse database.
■ A technique for designing a data warehouse database called dimensionality modeling.
■ How a dimensional model (DM) differs from an Entity–Relationship (ER) model.
■ A step-by-step methodology for designing a data warehouse database.
■ Criteria for assessing the degree of dimensionality provided by a data warehouse.
■ How Oracle Warehouse Builder can be used to build a data warehouse.

In Chapter 31 we described the basic concepts of data warehousing. In this chapter we focus on the issues associated with data warehouse database design. Since the 1980s, data warehouses have evolved their own design techniques, distinct from transaction-processing systems. Dimensional design techniques have emerged as the dominant approach for most data warehouse databases.

Page 36: Database Warehousing


Structure of this Chapter

In Section 32.1 we highlight the major issues associated with data warehouse design. In Section 32.2 we describe the basic concepts associated with dimensionality modeling and then compare this technique with traditional Entity–Relationship modeling. In Section 32.3 we describe and demonstrate a step-by-step methodology for designing a data warehouse database using worked examples taken from an extended version of the DreamHome case study described in Section 10.4 and Appendix A. In Section 32.4 we describe criteria for assessing the dimensionality of a data warehouse. Finally, in Section 32.5 we describe how to design a data warehouse using an Oracle product called Oracle Warehouse Builder.

32.1 Designing a Data Warehouse Database

Designing a data warehouse database is highly complex. To begin a data warehouse project, we need answers for questions such as: which user requirements are most important and which data should be considered first? Also, should the project be scaled down into something more manageable yet at the same time provide an infrastructure capable of ultimately delivering a full-scale enterprise-wide data warehouse? Questions such as these highlight some of the major issues in building data warehouses. For many enterprises the solution is data marts, which we described in Section 31.5. Data marts allow designers to build something that is far simpler and achievable for a specific group of users. Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time. However, despite the interim solution of building data marts, the goal remains the same: the ultimate creation of a data warehouse that supports the requirements of the enterprise.

The requirements collection and analysis stage (see Section 9.5) of a data warehouse project involves interviewing appropriate members of staff such as marketing users, finance users, sales users, operational users, and management to enable the identification of a prioritized set of requirements for the enterprise that the data warehouse must meet. At the same time, interviews are conducted with members of staff responsible for Online Transaction Processing (OLTP) systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.

The interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse. With these two views defined we are ready to begin the process of designing the data warehouse database.

The database component of a data warehouse is described using a technique called dimensionality modeling. In the following sections, we first describe the concepts associated with a dimensional model and contrast this model with the traditional Entity–Relationship (ER) model (see Chapters 11 and 12). We then present a step-by-step methodology for creating a dimensional model using worked examples from an extended version of the DreamHome case study.

Page 37: Database Warehousing


32.2 Dimensionality Modeling

Dimensionality modeling: A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.

Dimensionality modeling uses the concepts of Entity–Relationship (ER) modeling with some important restrictions. Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. In other words, the primary key of the fact table is made up of two or more foreign keys. This characteristic 'star-like' structure is called a star schema or star join. An example star schema for the property sales of DreamHome is shown in Figure 32.1. Note that foreign keys (labeled {FK}) are included in a dimensional model.

Another important feature of a DM is that all natural keys are replaced with surrogate keys. This means that every join between fact and dimension tables is based on surrogate keys, not natural keys. Each surrogate key should have a generalized structure based on simple integers. The use of surrogate keys allows the data in the warehouse to have some independence from the data used and produced by the OLTP systems. For example, each branch has a natural key, namely branchNo, and also a surrogate key, namely branchID.

Star schema: A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).

The star schema exploits the characteristics of factual data such that facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed. As the bulk of data in a data warehouse is represented as facts, the fact tables can be extremely large relative to the dimension tables. As such, it is important to treat fact data as read-only reference data that will not change over time. The most useful fact tables contain one or more numerical measures, or 'facts', that occur for each record. In Figure 32.1, the facts are offerPrice, sellingPrice, saleCommission, and saleRevenue. The most useful facts in a fact table are numeric and additive because data warehouse applications almost never access a single record; rather, they access hundreds, thousands, or even millions of records at a time and the most useful thing to do with so many records is to aggregate them.

Dimension tables, by contrast, generally contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries. For example, the star schema shown in Figure 32.1 can support queries that require access to sales of properties in Glasgow using the city attribute of the PropertyForSale table, and on sales of properties that are flats using the type attribute in the PropertyForSale table. In fact, the usefulness of a data warehouse is in relation to the appropriateness of the data held in the dimension tables.
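The star schema idea can be sketched in code. The following is a minimal illustration using Python's built-in sqlite3 module, assuming a cut-down version of the property sales schema with only two dimensions. The names PropertySale, PropertyForSale, Branch, branchID, branchNo, city, type, sellingPrice, and saleRevenue come from the text; the propertyID and propertyNo column names and all sample rows are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (
    branchID INTEGER PRIMARY KEY,   -- surrogate key (simple integer)
    branchNo TEXT,                  -- natural key from the OLTP system
    city     TEXT
);
CREATE TABLE PropertyForSale (
    propertyID INTEGER PRIMARY KEY, -- surrogate key (name assumed)
    propertyNo TEXT,                -- natural key (name assumed)
    type       TEXT,
    city       TEXT
);
-- Fact table: composite primary key made up of foreign keys.
CREATE TABLE PropertySale (
    propertyID   INTEGER REFERENCES PropertyForSale(propertyID),
    branchID     INTEGER REFERENCES Branch(branchID),
    sellingPrice REAL,
    saleRevenue  REAL,
    PRIMARY KEY (propertyID, branchID)
);
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)",
                 [(1, "B003", "Glasgow"), (2, "B005", "London")])
conn.executemany("INSERT INTO PropertyForSale VALUES (?, ?, ?, ?)",
                 [(1, "PG4", "Flat", "Glasgow"),
                  (2, "PG16", "House", "Glasgow"),
                  (3, "PL94", "Flat", "London")])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?, ?)",
                 [(1, 1, 100000.0, 1000.0),
                  (2, 1, 250000.0, 2500.0),
                  (3, 2, 400000.0, 4000.0)])

def sales_summary(city, property_type=None):
    """Aggregate the additive facts, constrained by dimension attributes."""
    sql = """SELECT COUNT(*), SUM(f.sellingPrice), SUM(f.saleRevenue)
             FROM PropertySale f
             JOIN PropertyForSale p ON p.propertyID = f.propertyID
             WHERE p.city = ?"""
    params = [city]
    if property_type is not None:
        sql += " AND p.type = ?"
        params.append(property_type)
    return conn.execute(sql, params).fetchone()

glasgow = sales_summary("Glasgow")                 # all Glasgow sales
glasgow_flats = sales_summary("Glasgow", "Flat")   # Glasgow flats only
print(glasgow, glasgow_flats)
```

In this sketch the dimension attributes (city, type) appear only in the WHERE clause, while the numeric facts are only ever summed, which mirrors the division of roles between dimension and fact tables described in the text.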

Page 38: Database Warehousing


Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table. For example, in Figure 32.1 note that several dimension tables (namely PropertyForSale, Branch, ClientBuyer, Staff, and Owner) contain location data (city, region, and country), which is repeated in each. Denormalization is appropriate when there are a number of entities related to the dimension table that are often accessed, avoiding the overhead of having to join additional tables to access those attributes. Denormalization is not appropriate where the additional data is not accessed very often, because the overhead of scanning the expanded dimension table may not be offset by any gain in the query performance.

Snowflake schema: A variant of the star schema where dimension tables do not contain denormalized data.

Figure 32.1 Star schema for property sales of DreamHome.

Page 39: Database Warehousing


There is a variation to the star schema called the snowflake schema, which allows dimensions to have dimensions. For example, we could normalize the location data (city, region, and country attributes) in the Branch dimension table of Figure 32.1 to create two new dimension tables called City and Region. A normalized version of the Branch dimension table of the property sales schema is shown in Figure 32.2. In a snowflake schema the location data in the PropertyForSale, ClientBuyer, Staff, and Owner dimension tables would also be removed and the new City and Region dimension tables would be shared with these tables.
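As a rough sketch of this normalization, the following Python example (using the standard sqlite3 module) moves the location attributes out of Branch into separate City and Region tables. The key names cityID and regionID, the sample rows, and the decision to hold country on the Region table are assumptions for illustration only; Figure 32.2 may arrange these attributes differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region (
    regionID INTEGER PRIMARY KEY,   -- key name assumed
    region   TEXT,
    country  TEXT
);
CREATE TABLE City (
    cityID   INTEGER PRIMARY KEY,   -- key name assumed
    city     TEXT,
    regionID INTEGER REFERENCES Region(regionID)
);
-- Snowflaked Branch: the location columns are replaced by one foreign key.
CREATE TABLE Branch (
    branchID INTEGER PRIMARY KEY,
    branchNo TEXT,
    cityID   INTEGER REFERENCES City(cityID)
);
""")
# Hypothetical sample rows: two branches share the single Glasgow row.
conn.executemany("INSERT INTO Region VALUES (?, ?, ?)",
                 [(1, "Scotland", "UK"), (2, "South East", "UK")])
conn.executemany("INSERT INTO City VALUES (?, ?, ?)",
                 [(1, "Glasgow", 1), (2, "London", 2)])
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)",
                 [(1, "B003", 1), (2, "B007", 1), (3, "B005", 2)])

def branch_location(branch_no):
    """Recover (city, region, country) for a branch via two extra joins."""
    return conn.execute(
        """SELECT c.city, r.region, r.country
           FROM Branch b
           JOIN City c   ON c.cityID = b.cityID
           JOIN Region r ON r.regionID = c.regionID
           WHERE b.branchNo = ?""",
        (branch_no,),
    ).fetchone()

city_rows = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print(branch_location("B007"), city_rows)
```

The trade-off is visible in the query: each city is stored once rather than repeated per branch, but reading a branch's location now costs two additional joins.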

Starflake schema: A hybrid structure that contains a mixture of star and snowflake schemas.

The most appropriate database schemas use a mixture of denormalized star and normalized snowflake schemas. This combination of star and snowflake schemas is called a starflake schema. Some dimensions may be present in both forms to cater for different query requirements. Whether the schema is star, snowflake, or starflake, the predictable and standard form of the underlying dimensional model offers important advantages within a data warehouse environment including:

■ Efficiency: The consistency of the underlying database structure allows more efficient access to the data by various tools including report writers and query tools.

■ Ability to handle changing requirements: The star schema can adapt to changes in the user's requirements, as all dimensions are equivalent in terms of providing access to the fact table. This means that the design is better able to support ad hoc user queries.

Figure 32.2 Part of star schema for property sales of DreamHome with a normalized version of the Branch dimension table.

Page 40: Database Warehousing


■ Extensibility: The dimensional model is extensible; for example, typical changes that a DM must support include: (a) adding new facts as long as they are consistent with the fundamental granularity of the existing fact table; (b) adding new dimensions, as long as there is a single value of that dimension defined for each existing fact record; (c) adding new dimensional attributes; and (d) breaking existing dimension records down to a lower level of granularity from a certain point in time forward.

■ Ability to model common business situations: There are a growing number of standard approaches for handling common modeling situations in the business world. Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces; for example, slowly changing dimensions where a 'constant' dimension such as Branch or Staff actually evolves slowly and asynchronously. We discuss slowly changing dimensions in more detail in Section 32.3, Step 8.

■ Predictable query processing: Data warehouse applications that drill down will simply be adding more dimension attributes from within a single star schema. Applications that drill across will be linking separate fact tables together through the shared (conformed) dimensions. Even though the overall suite of star schemas in the enterprise dimensional model is complex, the query processing is very predictable because at the lowest level, each fact table should be queried independently.

32.2.1 Comparison of DM and ER models

In this section we compare and contrast the dimensional model (DM) with the Entity–Relationship (ER) model. As described in the previous section, DMs are normally used to design the database component of a data warehouse whereas ER models have traditionally been used to describe the database for Online Transaction Processing (OLTP) systems.

ER modeling is a technique for identifying relationships among entities. A major goal of ER modeling is to remove redundancy in the data. This is immensely beneficial to transaction processing because transactions are made very simple and deterministic. For example, a transaction that updates a client's address normally accesses a single record in the Client table. This access is extremely fast as it uses an index on the primary key clientNo. However, in making transaction processing efficient such databases cannot efficiently and easily support ad hoc end-user queries. Traditional business applications such as customer ordering, stock control, and customer invoicing require many tables with numerous joins between them. An ER model for an enterprise can have hundreds of logical entities, which can map to hundreds of physical tables. Traditional ER modeling does not support the main attraction of data warehousing, namely intuitive and high-performance retrieval of data.

The key to understanding the relationship between dimensional models and Entity–Relationship models is that a single ER model normally decomposes into multiple DMs. The multiple DMs are then associated through 'shared' dimension tables. We describe the relationship between ER models and DMs in more detail in the following section, in which we present a database design methodology for data warehouses.

Page 41: Database Warehousing


32.3 Database Design Methodology for Data Warehouses

In this section we describe a step-by-step methodology for designing the database of a data warehouse. This methodology was proposed by Kimball and is called the 'Nine-Step Methodology' (Kimball, 1996). The steps of this methodology are shown in Table 32.1.

Table 32.1 Nine-Step Methodology by Kimball (1996).

Step  Activity
1     Choosing the process
2     Choosing the grain
3     Identifying and conforming the dimensions
4     Choosing the facts
5     Storing pre-calculations in the fact table
6     Rounding out the dimension tables
7     Choosing the duration of the database
8     Tracking slowly changing dimensions
9     Deciding the query priorities and the query modes

There are many approaches that offer alternative routes to the creation of a data warehouse. One of the more successful approaches is to decompose the design of the data warehouse into more manageable parts, namely data marts (see Section 31.5). At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse. Thus, a data warehouse is the union of a set of separate data marts implemented over a period of time, possibly by different design teams, and possibly on different hardware and software platforms.

The Nine-Step Methodology specifies the steps required for the design of a data mart. However, the methodology also ties together separate data marts so that over time they merge together into a coherent overall data warehouse. We now describe the steps shown in Table 32.1 in some detail using worked examples taken from an extended version of the DreamHome case study.

Step 1: Choosing the process

The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions. The best choice for the first data mart tends to be the one that is related to sales. This data source is likely to be accessible and of high quality. In selecting the first data mart for DreamHome, we first identify that the discrete business processes of DreamHome include:

Page 42: Database Warehousing


■ property sales;
■ property rentals (leasing);
■ property viewing;
■ property advertising;
■ property maintenance.

Figure 32.3 ER diagram of an extended version of DreamHome.

The data requirements associated with these processes are shown in the ER diagram of Figure 32.3. Note that this ER diagram forms part of the design documentation, which describes the Online Transaction Processing (OLTP) systems required to support the business processes of DreamHome. The ER diagram of Figure 32.3 has been simplified by labeling only the main entities and relationships and is created by following Steps 1 and 2 of the database design methodology described earlier in Chapters 15 and 16. The shaded entities represent the core facts for each business process of DreamHome. The business process selected to be the first data mart is property sales. The part of the original ER

Page 43: Database Warehousing


diagram that represents the data requirements of the property sales business process is shown in Figure 32.4.

Step 2: Choosing the grain

Choosing the grain means deciding exactly what a fact table record represents. For example, the PropertySale entity shown with shading in Figure 32.4 represents the facts about each property sale and becomes the fact table of the property sales star schema shown previously in Figure 32.1. Therefore, the grain of the PropertySale fact table is individual property sales.

Only when the grain for the fact table is chosen can we identify the dimensions of the fact table. For example, the Branch, Staff, Owner, ClientBuyer, PropertyForSale, and Promotion entities in Figure 32.4 will be used to reference the data about property sales and will become the dimension tables of the property sales star schema shown previously in Figure 32.1. We also include Time as a core dimension, which is always present in star schemas.

The grain decision for the fact table also determines the grain of each of the dimension tables. For example, if the grain for the PropertySale fact table is an individual property sale, then the grain of the ClientBuyer dimension is the details of the client who bought a particular property.

Step 3: Identifying and conforming the dimensions

Dimensions set the context for asking questions about the facts in the fact table. A well-built set of dimensions makes the data mart understandable and easy to use. We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain. For example, each client of the ClientBuyer dimension table is described by the clientID, clientNo, clientName, clientType, city, region, and country attributes, as shown previously in Figure 32.1. A poorly presented or incomplete set of dimensions will reduce the usefulness of a data mart to an enterprise.

Figure 32.4 Part of ER diagram in Figure 32.3 that represents the data requirements of the property sales business process of DreamHome.

Page 44: Database Warehousing


If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. Only in this way can two data marts share one or more dimensions in the same application. When a dimension is used in more than one data mart, the dimension is referred to as being conformed. Examples of dimensions that must conform between property sales and property advertising are the Time, PropertyForSale, Branch, and Promotion dimensions. If these dimensions are not synchronized or if they are allowed to drift out of synchronization between data marts, the overall data warehouse will fail, because the two data marts will not be able to be used together. For example, in Figure 32.5 we show the star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed dimensions with light shading.

Figure 32.5 Star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed (shared) dimension tables.
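To illustrate why a conformed dimension lets two data marts be used together, here is a minimal Python sketch. It assumes two hypothetical fact tables, property sales and property advertising, that both carry the same branchID surrogate key into a single shared Branch dimension. The advertCost measure and all sample rows are invented for the example and are not taken from Figure 32.5.

```python
from collections import defaultdict

# Conformed Branch dimension shared by both data marts (sample rows assumed).
branch = {
    1: {"branchNo": "B003", "city": "Glasgow"},
    2: {"branchNo": "B005", "city": "London"},
}

# Fact rows from two separate star schemas, both keyed on branchID.
property_sale = [
    {"branchID": 1, "saleRevenue": 1000.0},
    {"branchID": 1, "saleRevenue": 2500.0},
    {"branchID": 2, "saleRevenue": 4000.0},
]
advert = [  # hypothetical advertising fact table
    {"branchID": 1, "advertCost": 300.0},
    {"branchID": 2, "advertCost": 150.0},
    {"branchID": 2, "advertCost": 50.0},
]

def total_by_city(facts, measure):
    """Query one fact table on its own, grouped by a Branch attribute."""
    totals = defaultdict(float)
    for row in facts:
        totals[branch[row["branchID"]]["city"]] += row[measure]
    return dict(totals)

def drill_across():
    """Combine the two independent results on the conformed attribute."""
    revenue = total_by_city(property_sale, "saleRevenue")
    cost = total_by_city(advert, "advertCost")
    cities = sorted(set(revenue) | set(cost))
    return {c: (revenue.get(c, 0.0), cost.get(c, 0.0)) for c in cities}

report = drill_across()
print(report)
```

Each fact table is aggregated separately and the results are merged only on the shared city attribute. If the two data marts held different versions of the Branch dimension, the same branchID could map to different cities and the merged report would be meaningless.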

Page 45: Database Warehousing


Ste

p 4

:C

ho

osin

g th

e fa

cts

The grain of the fact table determines which facts can be used in the data mart. All the facts must be expressed at the level implied by the grain. In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale. Also, the facts should be numeric and additive. In Figure 32.6 we use the star schema of the property rental process of DreamHome to illustrate a badly structured fact table. This fact table is unusable with non-numeric facts (promotionName and staffName), a non-additive fact (monthlyRent), and a fact (lastYearRevenue) at a different granularity from the other facts in the table. Figure 32.7 shows how the Lease fact table shown in Figure 32.6 could be corrected so that the fact table is appropriately structured. Additional facts can be added to a fact table at any time provided they are consistent with the grain of the table.
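The three tests applied to the Lease fact table (numeric, additive, same grain) can be sketched as a small check. This is only an illustration: the CandidateFact class, the problems function, and the grain labels "lease" and "year" are assumptions introduced here, while the attribute names come from the discussion of Figures 32.6 and 32.7.

```python
from dataclasses import dataclass


@dataclass
class CandidateFact:
    name: str
    numeric: bool   # is the value a number?
    additive: bool  # does summing it across fact rows make sense?
    grain: str      # level at which the value is recorded (assumed labels)


def problems(fact, table_grain):
    """Return the reasons a candidate fact does not fit the fact table."""
    issues = []
    if not fact.numeric:
        issues.append("non-numeric")
    elif not fact.additive:
        issues.append("non-additive")
    if fact.grain != table_grain:
        issues.append("different granularity")
    return issues


LEASE_GRAIN = "lease"  # assumed label: one fact row per property rental
candidates = [
    CandidateFact("promotionName", False, False, "lease"),
    CandidateFact("staffName", False, False, "lease"),
    CandidateFact("monthlyRent", True, False, "lease"),
    CandidateFact("lastYearRevenue", True, True, "year"),
    CandidateFact("rentDuration", True, True, "lease"),
    CandidateFact("totalRent", True, True, "lease"),
]

report = {c.name: problems(c, LEASE_GRAIN) for c in candidates}
usable = [name for name, issues in report.items() if not issues]
print(usable)
```

Only the candidates with an empty list of issues would remain in the fact table; descriptive values such as names would typically move to a dimension table instead.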

Figure 32.6 Star schema for property rentals of DreamHome. This is an example of a badly structured fact table with non-numeric facts, a non-additive fact, and a numeric fact with an inconsistent granularity with the other facts in the table.


Step 5: Storing pre-calculations in the fact table

Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations. A common example of the need to store pre-calculations occurs when the facts comprise a profit and loss statement. This situation will often arise when the fact table is based on invoices or sales. Figure 32.7 shows the fact table with the rentDuration, totalRent, clientAllowance, staffCommission, and totalRevenue attributes. These types of facts are useful because they are additive quantities, from which we can derive valuable information such as the average clientAllowance based on aggregating some number of fact table records. To calculate the totalRevenue generated per property rental we subtract the clientAllowance and the staffCommission from totalRent. Although the totalRevenue can always be derived from these attributes, we still need to store the totalRevenue. This is particularly true for a value that is fundamental to an enterprise, such as totalRevenue, or if there is any chance of a user calculating the totalRevenue incorrectly. The cost of a user incorrectly representing the totalRevenue is offset against the minor cost of a little redundant data storage.
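A minimal sketch of storing the pre-calculation at load time follows, using Python's built-in sqlite3 module as a stand-in database. The leaseID column, the load_lease helper, and the sample amounts are assumptions made for illustration; the five fact attributes and the subtraction rule are those described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Lease (
           leaseID         INTEGER PRIMARY KEY,  -- assumed identifier
           rentDuration    INTEGER,
           totalRent       REAL,
           clientAllowance REAL,
           staffCommission REAL,
           totalRevenue    REAL                  -- stored pre-calculation
       )"""
)


def load_lease(lease_id, rent_duration, total_rent, client_allowance, staff_commission):
    """Compute totalRevenue once, at load time, and store it with the fact row."""
    total_revenue = total_rent - client_allowance - staff_commission
    conn.execute(
        "INSERT INTO Lease VALUES (?, ?, ?, ?, ?, ?)",
        (lease_id, rent_duration, total_rent, client_allowance,
         staff_commission, total_revenue),
    )


# Invented sample rentals.
load_lease(1, 12, 6000.0, 300.0, 450.0)
load_lease(2, 6, 3600.0, 0.0, 270.0)

# Users aggregate the stored value instead of re-deriving it in every query.
total_revenue, avg_allowance = conn.execute(
    "SELECT SUM(totalRevenue), AVG(clientAllowance) FROM Lease"
).fetchone()
print(total_revenue, avg_allowance)
```

Because the derivation lives in one place, every query that sums totalRevenue sees the same definition, which is the trade-off against redundant storage that the step describes.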

Figure 32.7 Star schema for the property rentals of DreamHome. This is the schema shown in Figure 32.6 with the problems corrected.


Step 6: Rounding out the dimension tables

In this step, we return to the dimension tables and add as many text descriptions to the dimensions as possible. The text descriptions should be as intuitive and understandable to the users as possible. The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

Step 7: Choosing the duration of the database

The duration measures how far back in time the fact table goes. In many enterprises, there is a requirement to look at the same time period a year or two earlier. For other enterprises, such as insurance companies, there may be a legal requirement to retain data extending back five or more years. Very large fact tables raise at least two very significant data warehouse design issues. First, it is often increasingly difficult to source increasingly old data. The older the data, the more likely there will be problems in reading and interpreting the old files or the old tapes. Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘slowly changing dimension’ problem, which is described in more detail in the following step.

Step 8: Tracking slowly changing dimensions

The slowly changing dimension problem means, for example, that the proper description of the old client and the old branch must be used with the old transaction history. Often, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time. There are three basic types of slowly changing dimensions: Type 1, where a changed dimension attribute is overwritten; Type 2, where a changed dimension attribute causes a new dimension record to be created; and Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
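The three types can be contrasted on a single client record. The sketch below is illustrative only: the clientKey, clientNo, city, and current field names, the previous_ prefix, and the sample values are assumptions, with clientKey playing the role of the generalized key mentioned above.

```python
import copy


def apply_type1(rows, client_no, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row["clientNo"] == client_no:
            row[attr] = new_value


def apply_type2(rows, client_no, attr, new_value):
    """Type 2: create a new dimension record under a new generalized key."""
    current = next(r for r in rows if r["clientNo"] == client_no and r["current"])
    new_row = dict(current)
    new_row[attr] = new_value
    new_row["clientKey"] = max(r["clientKey"] for r in rows) + 1
    current["current"] = False  # old snapshot stays available for old facts
    rows.append(new_row)


def apply_type3(rows, client_no, attr, new_value):
    """Type 3: keep the old value in an alternate attribute of the same record."""
    for row in rows:
        if row["clientNo"] == client_no:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


base = [{"clientKey": 1, "clientNo": "CR76", "city": "Glasgow", "current": True}]

t1 = copy.deepcopy(base)
apply_type1(t1, "CR76", "city", "London")

t2 = copy.deepcopy(base)
apply_type2(t2, "CR76", "city", "London")

t3 = copy.deepcopy(base)
apply_type3(t3, "CR76", "city", "London")

print(t1, t2, t3, sep="\n")
```

With Type 2, old fact rows keep pointing at clientKey 1 and so continue to see the old description, while new fact rows use the new key.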

Step 9: Deciding the query priorities and the query modes

In this step we consider physical design issues. The most critical physical design issues affecting the end-user’s perception of the data mart are the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations. Beyond these issues there are a host of additional physical design issues affecting administration, backup, indexing performance, and security. For further information on the issues affecting the physical design for data warehouses the interested reader is referred to Anahory and Murray (1997).

At the end of this methodology, we have a design for a data mart that supports the requirements of a particular business process and also allows the easy integration with other related data marts to ultimately form the enterprise-wide data warehouse. Table 32.2 lists the fact and dimension tables associated with the star schema for each business process of DreamHome (identified in Step 1 of the methodology).


We integrate the star schemas for the business processes of DreamHome using the conformed dimensions. For example, all the fact tables share the Time and Branch dimensions as shown in Table 32.2. A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation. The fact constellation for the DreamHome data warehouse is shown in Figure 32.8. The model has been simplified by displaying only the names of the fact and dimension tables. Note that the fact tables are shown with dark shading and all the dimension tables being conformed are shown with light shading.
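The sharing of dimensions across fact tables can be checked mechanically from the contents of Table 32.2. The sketch below encodes that table as Python sets; the variable names star_schemas, shared_by_all, and shared are assumptions chosen for this illustration.

```python
# Fact table -> dimension tables, as listed in Table 32.2.
star_schemas = {
    "PropertySale": {"Time", "Branch", "Staff", "PropertyForSale",
                     "Owner", "ClientBuyer", "Promotion"},
    "Lease": {"Time", "Branch", "Staff", "PropertyForRent",
              "Owner", "ClientRenter", "Promotion"},
    "PropertyViewing": {"Time", "Branch", "PropertyForSale", "PropertyForRent",
                        "ClientBuyer", "ClientRenter"},
    "Advert": {"Time", "Branch", "PropertyForSale", "PropertyForRent",
               "Promotion", "Newspaper"},
    "PropertyMaintenance": {"Time", "Branch", "Staff", "PropertyForRent"},
}

# Dimensions that every fact table references.
shared_by_all = set.intersection(*star_schemas.values())

# Dimensions referenced by more than one fact table.
usage = {}
for fact, dims in star_schemas.items():
    for dim in dims:
        usage.setdefault(dim, set()).add(fact)
shared = sorted(dim for dim, facts in usage.items() if len(facts) > 1)

print(sorted(shared_by_all))
print(shared)
```

Any dimension that appears under more than one fact table has to be kept identical across the data marts that use it, which is what makes the constellation usable as a single warehouse.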

Figure 32.8 Dimensional model (fact constellation) for the DreamHome data warehouse.


32.4 Criteria for Assessing the Dimensionality of a Data Warehouse

Since the 1980s, data warehouses have evolved their own design techniques, distinct from OLTP systems. Dimensional design techniques have emerged as the main approach for most of the data warehouses. In this section we describe the criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing (Kimball, 2000a,b).

When assessing a particular data warehouse remember that few vendors attempt to provide a completely integrated solution. However, as a data warehouse is a complete system, the criteria should only be used to assess complete end-to-end systems and not a collection of disjointed packages that may never integrate well together.

There are twenty criteria divided into three broad groups: architecture, administration, and expression as shown in Table 32.3. The purpose of establishing these criteria is to establish an objective standard for assessing how well a system supports the dimensional view of data warehousing, and to set the threshold high so that vendors have a target for improving their systems. The intended way to use this list is to rate a system on each criterion with a simple 0 or 1. A system qualifies for a 1 only if it meets the full definition of support for that criterion. For example, a system that offers aggregate navigation (the fourth criterion) that is available only to a single front-end tool gets a zero because the aggregate navigation is not open. In other words, there can be no partial credit for a criterion.
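The all-or-nothing rating can be sketched as a short scoring routine over the twenty criteria of Table 32.3. The rate function and the example system are hypothetical; in particular, representing partial support as a descriptive string is an assumption made only to show that anything short of full support scores 0.

```python
CRITERIA = {
    "Architecture": [
        "Explicit declaration", "Conformed dimensions and facts",
        "Dimensional integrity", "Open aggregate navigation",
        "Dimensional symmetry", "Dimensional scalability", "Sparsity tolerance",
    ],
    "Administration": [
        "Graceful modification", "Dimensional replication",
        "Changed dimension notification", "Surrogate key administration",
        "International consistency",
    ],
    "Expression": [
        "Multiple-dimension hierarchies", "Ragged-dimension hierarchies",
        "Multiple valued dimensions", "Slowly changing dimensions",
        "Roles of a dimension", "Hot-swappable dimensions",
        "On-the-fly fact range dimensions", "On-the-fly behavior dimensions",
    ],
}


def rate(support):
    """Score 1 per criterion only when support is exactly True (full support)."""
    scores = {
        group: sum(1 for name in names if support.get(name) is True)
        for group, names in CRITERIA.items()
    }
    scores["Total"] = sum(scores.values())
    return scores


# Hypothetical system: two criteria fully met, one only partially met.
example_system = {
    "Explicit declaration": True,
    "Slowly changing dimensions": True,
    "Open aggregate navigation": "single front-end tool only",  # partial -> 0
}
scores = rate(example_system)
print(scores)
```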

Architectural criteria are fundamental characteristics of the way the entire system is organized. These criteria usually extend from the backend, through the DBMS, to the frontend and the user’s desktop.

Administration criteria are more tactical than architectural criteria, but are considered to be essential to the ‘smooth running’ of a dimensionally oriented data warehouse. These criteria generally affect IT personnel who are building and maintaining the data warehouse.

Table 32.2 Fact and dimension tables for each business process of DreamHome.

Business process        Fact table            Dimension tables
Property sales          PropertySale          Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property rentals        Lease                 Time, Branch, Staff, PropertyForRent, Owner, ClientRenter, Promotion
Property viewing        PropertyViewing       Time, Branch, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property advertising    Advert                Time, Branch, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property maintenance    PropertyMaintenance   Time, Branch, Staff, PropertyForRent


Expression criteria are mostly analytic capabilities that are needed in real-life situations. The end-user community experiences all expression criteria directly. The expression criteria for dimensional systems are not the only features users look for in a data warehouse, but they are all capabilities needed to exploit the power of a dimensional system.

A system that supports most or all of these dimensional criteria would be adaptable, easier to administer, and able to address many real-world applications. The major point of dimensional systems is that they are business-issue and end-user driven. For further details of the criteria in Table 32.3, the interested reader is referred to Kimball (2000a,b).

Table 32.3 Criteria for assessing the dimensionality provided by a data warehouse (Kimball, 2000a,b).

Group             Criteria
Architecture      Explicit declaration
                  Conformed dimensions and facts
                  Dimensional integrity
                  Open aggregate navigation
                  Dimensional symmetry
                  Dimensional scalability
                  Sparsity tolerance
Administration    Graceful modification
                  Dimensional replication
                  Changed dimension notification
                  Surrogate key administration
                  International consistency
Expression        Multiple-dimension hierarchies
                  Ragged-dimension hierarchies
                  Multiple valued dimensions
                  Slowly changing dimensions
                  Roles of a dimension
                  Hot-swappable dimensions
                  On-the-fly fact range dimensions
                  On-the-fly behavior dimensions

32.5 Data Warehousing Design Using Oracle

We introduced the Oracle DBMS in Section 8.2. In this section, we describe Oracle Warehouse Builder (OWB) as a key component of the Oracle Warehouse solution, enabling the design and deployment of data warehouses, data marts, and e-Business intelligence applications. OWB is a design tool and an extraction, transformation, and loading (ETL) tool. An important aspect of OWB from the customers’ perspective is that it allows the integration of the traditional data warehousing environments with the new e-Business environments (Oracle Corporation, 2000). This section first provides an overview of the components of OWB and the underlying technologies and then describes how the user would apply OWB to typical data warehousing tasks.

32.5.1 Oracle Warehouse Builder Components

OWB provides the following primary functional components:

• A repository consisting of a set of tables in an Oracle database that is accessed via a Java-based access layer. The repository is based on the Common Warehouse Model (CWM) standard, which allows the OWB metadata to be accessible to other products that support this standard (see Section 31.4.3).
• A graphical user interface (GUI) that enables access to the repository. The GUI features graphical editors and an extensive use of wizards. The GUI is written in Java, making the frontend portable.
• A code generator, also written in Java, generates the code that enables the deployment of data warehouses. The different code types generated by OWB are discussed later in this section.
• Integrators, which are components that are dedicated to extracting data from a particular type of source. In addition to native support for Oracle, other relational, non-relational, and flat-file data sources, OWB integrators allow access to information in enterprise resource planning (ERP) applications such as Oracle and SAP R/3. The SAP integrator provides access to SAP transparent tables using PL/SQL code generated by OWB.
• An open interface that allows developers to extend the extraction capabilities of OWB, while leveraging the benefits of the OWB framework. This open interface is made available to developers as part of the OWB Software Development Kit (SDK).
• Runtime, which is a set of tables, sequences, packages, and triggers that are installed in the target schema. These database objects are the foundation for the auditing and error detection/correction capabilities of OWB. For example, loads can be restarted based on information stored in the runtime tables. OWB includes a runtime audit viewer for browsing the runtime tables and runtime reports.

The architecture of the Oracle Warehouse Builder is shown in Figure 32.9. Oracle Warehouse Builder is a key component of the larger Oracle data warehouse. The other products that the OWB must work with within the data warehouse include:

• Oracle: the engine of OWB (as there is no external server);
• Oracle Enterprise Manager: for scheduling;
• Oracle Workflow: for dependency management;
• Oracle Pure•Extract: for MVS mainframe access;
• Oracle Pure•Integrate: for customer data quality;
• Oracle Gateways: for relational and mainframe data access.


32.5.2 Using Oracle Warehouse Builder

In this section we describe how OWB assists the user in some typical data warehousing tasks like defining source data structures, designing the target warehouse, mapping sources to targets, generating code, instantiating the warehouse, extracting the data, and maintaining the warehouse.

Defining sources

Once the requirements have been determined and all the data sources have been identified, a tool such as OWB can be used for constructing the data warehouse. OWB can handle a diverse set of data sources by means of integrators. OWB also has the concept of a module, which is a logical grouping of related objects. There are two types of modules: data source and warehouse. For example, a data source module might contain all the definitions of the tables in an OLTP database that is a source for the data warehouse. And a module of type warehouse might contain definitions of the facts, dimensions, and staging tables that make up the data warehouse. It is important to note that modules merely contain definitions, that is metadata, about either sources or warehouses, and not objects that can be populated or queried. A user identifies the integrators that are appropriate for the data sources, and each integrator accesses a source and imports the metadata that describes it.

Oracle sources

To connect to an Oracle database, the user chooses the integrator for Oracle databases. Next, the user supplies some more detailed connection information, for example user name, password, and SQL*Net connection string. This information is used to define a database link in the database that hosts the OWB repository. OWB uses this database link to query the system catalog of the source database and extract metadata that describes the tables and views of interest to the user. The user experiences this as a process of visually inspecting the source and selecting objects of interest.

Figure 32.9 Oracle Warehouse Builder architecture.


Non-Oracle sources

Non-Oracle databases are accessed in exactly the same way as Oracle databases. What makes this possible is the Transparent Gateway technology of Oracle. In essence, a Transparent Gateway allows a non-Oracle database to be treated in exactly the same way as if it were an Oracle database. On the SQL level, once the database link pointing to the non-Oracle database has been defined, the non-Oracle database can be queried via SELECT just like any Oracle database. In OWB, all the user has to do is identify the type of database, so that OWB can select the appropriate Transparent Gateway for the database link definition. In the case of MVS mainframe sources, OWB and Oracle Pure•Extract provide data extraction from sources such as IMS, DB2, and VSAM. The plan is that Oracle Pure•Extract will ultimately be integrated with the OWB technology.

Flat files

OWB supports two kinds of flat files: character-delimited and fixed-length files. If the data source is a flat file, the user selects the integrator for flat files and specifies the path and file name. The process of creating the metadata that describes a file is different from the process used for a table in a database. With a table, the owning database itself stores extensive information about the table such as the table name, the column names, and data types. This information can be easily queried from the catalog. With a file, on the other hand, the user assists in the process of creating the metadata with some intelligent guesses supplied by OWB. In OWB, this process is called sampling.
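The idea behind sampling, proposing metadata for a file by inspecting its contents, can be illustrated with a short sketch for a character-delimited file. This is not OWB's algorithm: the sample and guess_type functions, the candidate delimiters, the assumption that the first line holds column names, and the sample data are all hypothetical.

```python
import csv
import io


def guess_type(values):
    """Propose the narrowest type that accepts every sampled value."""
    for caster, type_name in ((int, "INTEGER"), (float, "REAL")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "TEXT"


def sample(text, candidates=",;|\t"):
    """Guess the delimiter and column types; assumes line 1 holds column names."""
    first_line = text.splitlines()[0]
    delimiter = max(candidates, key=first_line.count)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = [
        (name, guess_type([row[i] for row in data]))
        for i, name in enumerate(header)
    ]
    return delimiter, columns


# Invented sample file contents.
flat_file = "branchNo|city|staffCount|avgRent\nB003|Glasgow|12|450.5\nB005|London|25|720\n"
delimiter, columns = sample(flat_file)
print(delimiter, columns)
```

The result is only a proposal; as the text notes, the user remains involved and would confirm or correct the guessed column names and types.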

Web data

With the proliferation of the Internet, the new challenge for data warehousing is to capture data from Web sites. There are different types of data in e-Business environments: transactional Web data stored in the underlying databases; clickstream data stored in Web server log files; registration data in databases or log files; and consolidated clickstream data in the log files of Web analysis tools. OWB can address all these sources with its built-in features for accessing databases and flat files.

Data quality

A solution to the challenge of data quality is OWB with Oracle Pure•Integrate. Oracle Pure•Integrate is customer data integration software that automates the creation of consolidated profiles of customers and related business data to support e-Business and customer relationship management applications. Pure•Integrate complements OWB by providing advanced data transformation and cleansing features designed specifically to meet the requirements of database applications. These include:

• integrated name and address processing to standardize, correct, and enhance representations of customer names and locations;
• advanced probabilistic matching to identify unique consumers, businesses, households, super-households, or other entities for which no common identifiers exist;
• powerful rule-based merging to resolve conflicting data and create the ‘best possible’ integrated result from the matched data.


Designing the target warehouse

Once the source systems have been identified and defined, the next task is to design the target warehouse based on user requirements. One of the most popular designs in data warehousing is the star schema and its variations, as discussed in Section 32.2. Also, many business intelligence tools such as Oracle Discoverer are optimized for this kind of design. OWB supports all variations of star schema designs. It features wizards and graphical editors for fact and dimension tables. For example, in the Dimension Editor the user graphically defines the attributes, levels, and hierarchies of a dimension.

Mapping sources to targets

When both the sources and the target have been well defined, the next step is to map the two together. Remember that there are two types of modules: source modules and warehouse modules. Modules can be reused many times in different mappings. Warehouse modules can themselves be used as source modules. For example, in an architecture where we have an OLTP database that feeds a central data warehouse, which in turn feeds a data mart, the data warehouse is a target (from the perspective of the OLTP database) and a source (from the perspective of the data mart).

The mappings of OWB are defined on two levels. A high-level mapping indicates source and target modules. One level down is the detail mapping, which allows a user to map source columns to target columns and to define transformations. OWB features a built-in transformation library from which the user can pick predefined transformations. Users can also define their own transformations in PL/SQL and Java.

Generating code

The Code Generator is the OWB component that reads the target definitions and source-to-target mappings and generates code to implement the warehouse. The type of generated code varies depending on the type of object that the user wants to implement.

Logical versus physical design

Before generating code, the user has primarily been working on the logical level, that is, on the level of object definitions. On this level, the user is concerned with capturing all the details and relationships (the semantics) of an object, but is not yet concerned with defining any implementation characteristics. For example, consider a table to be implemented in an Oracle database. On the logical level, the user may be concerned with the table name, the number of columns, the column names and data types, and any relationships that the table has to other tables. On the physical level, however, the question becomes: how can this table be optimally implemented in an Oracle database? The user must now be concerned with things like tablespaces, indexes, and storage parameters (see Section 8.2.2). OWB allows the user to view and manipulate an object on both the logical and physical level. The logical definition and physical implementation details are automatically synchronized.


Configuration

In OWB, the process of assigning physical characteristics to an object is called configuration. The specific characteristics that can be defined depend on the object that is being configured. These characteristics include, for example, storage parameters, indexes, tablespaces, and partitions.

Validation

It is good practice to check the object definitions for completeness and consistency prior to code generation. OWB offers a validate feature to automate this process. Errors detectable by the validation process include, for example, data type mismatches between sources and targets, and foreign key errors.
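The two kinds of error named above can be illustrated with simple checks over definitions held as plain Python dictionaries. This is a sketch of the idea rather than the OWB validate feature; the function names, column names, types, and sample rows are all assumptions.

```python
def type_mismatches(mapping, source_types, target_types):
    """Return (source, target) column pairs whose declared types differ."""
    return [
        (src, tgt)
        for src, tgt in mapping.items()
        if source_types[src] != target_types[tgt]
    ]


def foreign_key_errors(fact_rows, fk_column, dimension_keys):
    """Return fact rows whose foreign key has no matching dimension record."""
    return [row for row in fact_rows if row[fk_column] not in dimension_keys]


# Hypothetical source and target definitions and a column-level mapping.
source_types = {"leaseNo": "INTEGER", "totalRent": "REAL", "branchNo": "TEXT"}
target_types = {"leaseNo": "INTEGER", "totalRent": "REAL", "branchID": "INTEGER"}
mapping = {"leaseNo": "leaseNo", "totalRent": "totalRent", "branchNo": "branchID"}

# Hypothetical fact rows checked against the keys of a Branch dimension.
fact_rows = [{"leaseNo": 1, "branchID": 10}, {"leaseNo": 2, "branchID": 99}]
branch_keys = {10, 20}

mismatches = type_mismatches(mapping, source_types, target_types)
orphans = foreign_key_errors(fact_rows, "branchID", branch_keys)
print(mismatches, orphans)
```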

Generation

The following are some of the main types of code that OWB produces:

• SQL Data Definition Language (DDL) commands. A warehouse module with its definitions of fact and dimension tables is implemented as a relational schema in an Oracle database. OWB generates SQL DDL scripts that create this schema. The scripts can either be executed from within OWB or saved to the file system for later, manual execution.
• PL/SQL programs. A source-to-target mapping results in a PL/SQL program if the source is a database, whether Oracle or non-Oracle. The PL/SQL program accesses the source database via a database link, performs the transformations as defined in the mapping, and loads the data into the target table.
• SQL*Loader control files. If the source in a mapping is a flat file, OWB generates a control file for use with SQL*Loader.
• Tcl scripts. OWB also generates Tcl scripts. These can be used to schedule PL/SQL and SQL*Loader mappings as jobs in Oracle Enterprise Manager, for example, to refresh the warehouse at regular intervals.
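To make the first item concrete, the sketch below turns a small logical definition into a DDL script and runs it. It is an illustration of metadata-driven generation only, not the output format of OWB: the warehouse_module dictionary layout, the generate_ddl function, and the table and column names are assumptions, and SQLite (via Python's sqlite3 module) stands in for the Oracle target.

```python
import sqlite3

# Hypothetical logical definitions: one dimension table and one fact table.
warehouse_module = {
    "Branch": {
        "columns": {"branchID": "INTEGER", "city": "TEXT"},
        "primary_key": ["branchID"],
        "foreign_keys": {},
    },
    "Lease": {
        "columns": {"leaseNo": "INTEGER", "branchID": "INTEGER", "totalRent": "REAL"},
        "primary_key": ["leaseNo", "branchID"],
        "foreign_keys": {"branchID": "Branch"},
    },
}


def generate_ddl(name, table):
    """Build a CREATE TABLE statement from one logical table definition."""
    parts = [f"{col} {col_type}" for col, col_type in table["columns"].items()]
    parts.append("PRIMARY KEY (" + ", ".join(table["primary_key"]) + ")")
    for col, ref in table["foreign_keys"].items():
        ref_key = warehouse_module[ref]["primary_key"][0]
        parts.append(f"FOREIGN KEY ({col}) REFERENCES {ref} ({ref_key})")
    return f"CREATE TABLE {name} (\n  " + ",\n  ".join(parts) + "\n);"


script = "\n".join(generate_ddl(n, t) for n, t in warehouse_module.items())
print(script)

# The script could be saved for later execution; here it is run immediately.
conn = sqlite3.connect(":memory:")
conn.executescript(script)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```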

Instantiating the warehouse and extracting data

Before the data can be moved from the source to the target database, the developer has to instantiate the warehouse, in other words execute the generated DDL scripts to create the target schema. OWB refers to this step as deployment. Once the target schema is in place, the PL/SQL programs can move data from the source into the target. Note that the basic data movement mechanism is INSERT . . . SELECT . . . with the use of a database link. If an error should occur, a routine from one of the OWB runtime packages logs the error in an audit table.
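The movement pattern, INSERT . . . SELECT from a linked source with errors recorded in an audit table, can be imitated on a small scale. In the sketch below SQLite's ATTACH DATABASE stands in for an Oracle database link, and the load_audit table and run_load function are hypothetical stand-ins for the OWB runtime objects, whose real structure is not described here.

```python
import sqlite3

target = sqlite3.connect(":memory:")
# A second in-memory database attached as "src" plays the role of the linked source.
target.execute("ATTACH DATABASE ':memory:' AS src")
target.execute("CREATE TABLE src.Lease (leaseNo INTEGER, totalRent REAL)")
target.executemany("INSERT INTO src.Lease VALUES (?, ?)", [(1, 6000.0), (2, 3600.0)])
target.execute("CREATE TABLE LeaseFact (leaseNo INTEGER PRIMARY KEY, totalRent REAL)")
target.execute("CREATE TABLE load_audit (step TEXT, message TEXT)")
target.commit()

LOAD_SQL = (
    "INSERT INTO LeaseFact (leaseNo, totalRent) "
    "SELECT leaseNo, totalRent FROM src.Lease"
)


def run_load(step, sql):
    """Run one load statement; on failure, roll back and log to the audit table."""
    try:
        with target:  # commits on success, rolls back on error
            target.execute(sql)
    except sqlite3.Error as exc:
        target.execute("INSERT INTO load_audit VALUES (?, ?)", (step, str(exc)))
        target.commit()


run_load("initial", LOAD_SQL)  # succeeds and loads two rows
run_load("rerun", LOAD_SQL)    # violates the primary key, so it is logged instead

loaded = target.execute("SELECT COUNT(*) FROM LeaseFact").fetchone()[0]
audit_steps = [row[0] for row in target.execute("SELECT step FROM load_audit")]
print(loaded, audit_steps)
```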

Maintaining the warehouse

Once the data warehouse has been instantiated and the initial load has been completed, it has to be maintained. For example, the fact table has to be refreshed at regular intervals, so that queries return up-to-date results. Dimension tables have to be extended and updated, albeit much less frequently than fact tables. An example of a slowly changing dimension is the Customer table, in which a customer’s address, marital status, or name may all change over time. In addition to INSERT, OWB also supports other ways of manipulating the warehouse:

• UPDATE
• DELETE
• INSERT/UPDATE (insert a row; if it already exists, update it)
• UPDATE/INSERT (update a row; if it does not exist, insert it)

These features give the OWB user a variety of tools to undertake ongoing maintenance tasks. OWB interfaces with Oracle Enterprise Manager for repetitive maintenance tasks; for example, a fact table refresh that is scheduled to occur at a regular interval. For complex dependencies OWB integrates with Oracle Workflow.
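The difference between the INSERT/UPDATE and UPDATE/INSERT modes is only the order in which the two statements are attempted. The sketch below shows both orders against a small Customer table using Python's sqlite3 module; the function names, the customerNo and address columns, and the sample values are assumptions, and the code is not what OWB itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customerNo TEXT PRIMARY KEY, address TEXT)")


def insert_update(customer_no, address):
    """INSERT/UPDATE: try to insert; if the row already exists, update it."""
    try:
        conn.execute("INSERT INTO Customer VALUES (?, ?)", (customer_no, address))
        return "inserted"
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE Customer SET address = ? WHERE customerNo = ?",
            (address, customer_no),
        )
        return "updated"


def update_insert(customer_no, address):
    """UPDATE/INSERT: try to update; if no row matched, insert it."""
    cursor = conn.execute(
        "UPDATE Customer SET address = ? WHERE customerNo = ?",
        (address, customer_no),
    )
    if cursor.rowcount == 0:
        conn.execute("INSERT INTO Customer VALUES (?, ?)", (customer_no, address))
        return "inserted"
    return "updated"


actions = [
    insert_update("C1", "12 High St"),
    insert_update("C1", "3 Park Rd"),
    update_insert("C2", "8 Mill Ln"),
    update_insert("C2", "1 Quay St"),
]
rows = conn.execute(
    "SELECT customerNo, address FROM Customer ORDER BY customerNo"
).fetchall()
print(actions, rows)
```

Both orders leave the table in the same final state; a reasonable rule of thumb is to try first whichever statement is expected to succeed more often, so that most rows need only one statement.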

Metadata integration

OWB is based on the Common Warehouse Model (CWM) standard (see Section 31.4.3). It can seamlessly exchange metadata with Oracle Express and Oracle Discoverer as well as other business intelligence tools that comply with the standard.

Chapter Summary

• Dimensionality modeling is a design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.

• Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. In other words, the primary key of the fact table is made up of two or more foreign keys. This characteristic ‘star-like’ structure is called a star schema or star join.

• Star schema is a logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).

• The star schema exploits the characteristics of factual data such that facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed. As the bulk of data in the data warehouse is represented within facts, the fact tables can be extremely large relative to the dimension tables.

• The most useful facts in a fact table are numerical and additive because data warehouse applications almost never access a single record; rather, they access hundreds, thousands, or even millions of records at a time and the most useful thing to do with so many records is to aggregate them.

• Dimension tables most often contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries.

• Snowflake schema is a variant of the star schema where dimension tables do not contain denormalized data.

• Starflake schema is a hybrid structure that contains a mixture of star and snowflake schemas.
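The star schema points above can be made concrete with a very small sketch. It uses Python's built-in sqlite3 module, and every table, column, and value in it (time_dim, branch_dim, sale_fact, the cities and figures) is invented for illustration rather than taken from the chapter. The fact table's primary key is composed of the two foreign keys, the fact (revenue) is numerical and additive, and the query constrains and groups by a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Dimension tables: simple primary keys, descriptive attributes.
CREATE TABLE time_dim   (time_id   INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE branch_dim (branch_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: composite primary key made up of foreign keys.
CREATE TABLE sale_fact (
    time_id   INTEGER NOT NULL REFERENCES time_dim(time_id),
    branch_id INTEGER NOT NULL REFERENCES branch_dim(branch_id),
    revenue   REAL    NOT NULL,
    PRIMARY KEY (time_id, branch_id)
);

INSERT INTO time_dim   VALUES (1, 2004, 2), (2, 2004, 3);
INSERT INTO branch_dim VALUES (10, 'Glasgow'), (11, 'London');
INSERT INTO sale_fact  VALUES (1, 10, 100.0), (2, 10, 150.0),
                              (1, 11, 300.0), (2, 11, 400.0);
""")

# Star join: aggregate the additive fact, grouped by a dimension attribute.
revenue_by_city = conn.execute("""
    SELECT b.city, SUM(f.revenue)
    FROM sale_fact f
    JOIN time_dim   t ON t.time_id   = f.time_id
    JOIN branch_dim b ON b.branch_id = f.branch_id
    WHERE t.year = 2004
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(revenue_by_city)  # [('Glasgow', 250.0), ('London', 700.0)]
```

In a snowflake variant, an attribute such as city would be moved out of branch_dim into its own normalized table, at the cost of one more join in the query.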

Page 57: Database Warehousing


Review Questions

31.1 Identify the major issues associated with designing a data warehouse database.

31.2 Describe how a dimensional model (DM) differs from an Entity–Relationship (ER) model.

31.3 Present a diagrammatic representation of a typical star schema.

31.4 Describe how the fact and dimensional tables of a star schema differ.

31.5 Describe how star, snowflake, and starflake schemas differ.

31.6 The star, snowflake, and starflake schemas offer important advantages in a data warehouse environment. Describe these advantages.

31.7 Describe the main activities associated with each step of the Nine-Step Methodology for data warehouse database design.

31.8 Describe the purpose of assessing the dimensionality of a data warehouse.

31.9 Briefly outline the criteria groups used to assess the dimensionality of a data warehouse.

31.10 Describe how the Oracle Warehouse Builder supports the design of a data warehouse.

• The key to understanding the relationship between dimensional models and ER models is that a single ER model normally decomposes into multiple DMs. The multiple DMs are then associated through conformed (shared) dimension tables.

• There are many approaches that offer alternative routes to the creation of a data warehouse. One of the more successful approaches is to decompose the design of the data warehouse into more manageable parts, namely data marts. At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse.

• The Nine-Step Methodology specifies the steps required for the design of a data mart / warehouse. The steps include: Step 1 Choosing the process, Step 2 Choosing the grain, Step 3 Identifying and conforming the dimensions, Step 4 Choosing the facts, Step 5 Storing pre-calculations in the fact table, Step 6 Rounding out the dimensions, Step 7 Choosing the duration of the database, Step 8 Tracking slowly changing dimensions, and Step 9 Deciding the query priorities and query modes.

• There are criteria to measure the extent to which a system supports the dimensional view of data warehousing. The criteria are divided into three broad groups: architecture, administration, and expression.

• Oracle Warehouse Builder (OWB) is a key component of the Oracle Warehouse solution, enabling the design and deployment of data warehouses, data marts, and e-Business intelligence applications. OWB is both a design tool and an extraction, transformation, and loading (ETL) tool.

Exercises

31.11 Use the Nine-Step Methodology for data warehouse database design to produce dimensional models for the case studies described in Appendix B.

31.12 Use the Nine-Step Methodology for data warehouse database design to produce a dimensional model for all or part of your organization.

Page 58: Database Warehousing

Chapter 33
OLAP

Chapter Objectives

In this chapter you will learn:

• The purpose of Online Analytical Processing (OLAP).
• The relationship between OLAP and data warehousing.
• The key features of OLAP applications.
• The potential benefits associated with successful OLAP applications.
• How to represent multi-dimensional data.
• The rules for OLAP tools.
• The main categories of OLAP tools.
• OLAP extensions to the SQL standard.
• How Oracle supports OLAP.

In Chapter 31 we discussed the increasing popularity of data warehousing as a means of gaining competitive advantage. We learnt that data warehouses bring together large volumes of data for the purposes of data analysis. Until recently, access tools for large database systems have provided only limited and relatively simplistic data analysis. However, accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities. There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining. These tools differ in what they offer the user and because of this they are complementary technologies.

A data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining are collectively referred to as Business Intelligence (BI) technologies. In this chapter we describe OLAP and in the following chapter we describe data mining.

Page 59: Database Warehousing


Structure of this Chapter

In Section 33.1 we introduce Online Analytical Processing (OLAP) and discuss the relationship between OLAP and data warehousing. In Section 33.2 we describe OLAP applications and identify the key features and potential benefits associated with OLAP applications. In Section 33.3 we discuss how multi-dimensional data can be represented and describe the main concepts associated with multi-dimensional analysis. In Section 33.4 we describe the rules for OLAP tools and highlight the characteristics and issues associated with OLAP tools. In Section 33.5 we discuss how the SQL standard has been extended to include OLAP functions. Finally, in Section 33.6, we describe how Oracle supports OLAP. The examples in this chapter are taken from the DreamHome case study described in Section 10.4 and Appendix A.

33.1 Online Analytical Processing

Over the past few decades, we have witnessed the increasing popularity and prevalence of relational DBMSs such that we now find a significant proportion of corporate data is housed in such systems. Relational databases have been used primarily to support traditional Online Transaction Processing (OLTP) systems. To provide appropriate support for OLTP systems, relational DBMSs have been developed to enable the highly efficient execution of a large number of relatively simple transactions.

In the past few years, relational DBMS vendors have targeted the data warehousing market and have promoted their systems as tools for building data warehouses. As discussed in Chapter 31, a data warehouse stores operational data and is expected to support a wide range of queries from the relatively simple to the highly complex. However, the ability to answer particular queries is dependent on the types of end-user access tools available for use on the data warehouse. General-purpose tools such as reporting and query tools can easily support ‘who?’ and ‘what?’ questions about past events. A typical query submitted directly to a data warehouse is: ‘What was the total revenue for Scotland in the third quarter of 2004?’. In this section we focus on a tool that can support more advanced queries, namely Online Analytical Processing (OLAP).
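A ‘what?’ question of the kind quoted above maps directly onto a single filtered aggregate. The sketch below shows one way to express it using Python's built-in sqlite3 module. The sales table, its columns, the total_revenue helper, and all figures are invented for illustration; they are not part of the DreamHome case study.

```python
import sqlite3

def total_revenue(conn, region, year, quarter):
    """Return the summed revenue for one region and quarter (0 if no rows match)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM sales "
        "WHERE region = ? AND year = ? AND quarter = ?",
        (region, year, quarter),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, year INTEGER, quarter INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Scotland", 2004, 3, 120000.0),
        ("Scotland", 2004, 3, 80000.0),
        ("Scotland", 2004, 2, 95000.0),
        ("England", 2004, 3, 300000.0),
    ],
)

scotland_q3 = total_revenue(conn, "Scotland", 2004, 3)
print(scotland_q3)  # 200000.0
```

A query like this reports what happened; it does not by itself explain why revenue changed or what would happen under different assumptions, which is the kind of question the rest of this section associates with OLAP.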

Online Analytical Processing (OLAP): The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.

OLAP is a term that describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis (Codd et al., 1995). OLAP enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data. OLAP allows the user to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.

While OLAP systems can easily answer ‘who?’ and ‘what?’ questions, it is their ability to answer ‘what if?’ and ‘why?’ type questions that distinguishes them from general-purpose
