database warehousing

Part 9 Business Intelligence

Chapter 31 Data Warehousing Concepts
Chapter 32 Data Warehousing Design
Chapter 33 OLAP
Chapter 34 Data Mining

Chapter 31 Data Warehousing Concepts

Chapter Objectives

In this chapter you will learn:
- How data warehousing evolved.
- The main concepts and benefits associated with data warehousing.
- How Online Transaction Processing (OLTP) systems differ from data warehousing.
- The problems associated with data warehousing.
- The architecture and main components of a data warehouse.
- The important data flows or processes of a data warehouse.
- The main tools and technologies associated with data warehousing.
- The issues associated with the integration of a data warehouse and the importance of managing metadata.
- The concept of a data mart and the main reasons for implementing a data mart.
- The main issues associated with the development and management of data marts.
- How Oracle supports data warehousing.

We have already noted in earlier chapters that database management systems are pervasive throughout industry, with relational database management systems being the dominant system. These systems have been designed to handle high transaction throughput, with transactions typically making small changes to the organization’s operational data, that is, data that the organization requires to handle its day-to-day operations. These types of system are called Online Transaction Processing (OLTP) systems. The size of OLTP databases can range from small databases of a few megabytes (MB), to medium-sized databases with several gigabytes (GB), to large databases requiring terabytes (TB) or even petabytes (PB) of storage.

Corporate decision-makers require access to all the organization’s data, wherever it is located. To provide comprehensive analysis of the organization, its business, its requirements, and any trends, requires access to not only the current values in the database but also to historical data. To facilitate this type of analysis, the data warehouse has been created to hold data drawn from several data sources, maintained by different operating units, together with historical and summary transformations. The data warehouse based on extended database technology provides the management of the datastore. However, decision-makers also require powerful analysis tools. Two main types of analysis tools have emerged over the last few years: Online Analytical Processing (OLAP) and data mining tools.

As data warehousing is such a complex subject, we have devoted four chapters to different aspects of data warehousing. In this chapter, we describe the basic concepts associated with data warehousing. In Chapter 32 we describe how to design and build a data warehouse and in Chapters 33 and 34 we discuss the important end-user access tools for a data warehouse.

Structure of this Chapter

In Section 31.1 we outline what data warehousing is and how it evolved, and also describe the potential benefits and problems associated with this approach. In Section 31.2 we describe the architecture and main components of a data warehouse. In Sections 31.3 and 31.4 we identify and discuss the important data flows or processes of a data warehouse, and the associated tools and technologies of a data warehouse, respectively. In Section 31.5 we introduce data marts and the issues associated with the development and management of data marts. Finally, in Section 31.6 we present an overview of how Oracle supports a data warehouse environment. The examples in this chapter are taken from the DreamHome case study described in Section 10.4 and Appendix A.

31.1 Introduction to Data Warehousing

In this section we discuss the origin and evolution of the concept of data warehousing. We then discuss the main benefits associated with data warehousing. We next identify the main characteristics of data warehousing systems in comparison with Online Transaction Processing (OLTP) systems. We conclude this section by examining the problems of developing and managing a data warehouse.

31.1.1 The Evolution of Data Warehousing

Since the 1970s, organizations have mostly focused their investment in new computer systems that automate business processes. In this way, organizations gained competitive advantage through systems that offered more efficient and cost-effective services to the customer. Throughout this period, organizations accumulated growing amounts of data stored in their operational databases. However, in recent times, where such systems are commonplace, organizations are focusing on ways to use operational data to support decision-making, as a means of regaining competitive advantage.

Operational systems were never designed to support such business activities and so using these systems for decision-making may never be an easy solution. The legacy is that a typical organization may have numerous operational systems with overlapping and sometimes contradictory definitions, such as data types. The challenge for an organization is to turn its archives of data into a source of knowledge, so that a single integrated/consolidated view of the organization’s data is presented to the user. The concept of a data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

31.1.2 Data Warehousing Concepts

The original concept of a data warehouse was devised by IBM as the ‘information warehouse’ and presented as a solution for accessing data held in non-relational systems. The information warehouse was proposed to allow organizations to use their data archives to help them gain a business advantage. However, due to the sheer complexity and performance problems associated with the implementation of such solutions, the early attempts at creating an information warehouse were mostly rejected. Since then, the concept of data warehousing has been raised several times, but it is only in recent years that data warehousing has come to be seen as a valuable and viable solution. The latest and most successful advocate for data warehousing is Bill Inmon, who has earned the title of ‘father of data warehousing’ due to his active promotion of the concept.

Data warehousing: A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.

In this definition by Inmon (1993), the data is:
- Subject-oriented as the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales) rather than the major application areas (such as customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
- Integrated because of the coming together of source data from different enterprise-wide applications systems. The source data is often inconsistent using, for example, different formats. The integrated data source must be made consistent to present a unified view of the data to the users.
- Time-variant because data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
- Non-volatile as the data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement. The database continually absorbs this new data, incrementally integrating it with the previous data.
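
The time-variant and non-volatile properties can be illustrated with a small sketch. It is not taken from the text: the class names, the fields (property_no, asking_price, valid_on), and the in-memory list are assumptions chosen purely for illustration. Each refresh appends time-stamped snapshots and never overwrites earlier ones, so the store can answer "what was the value as of a given date?".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One time-stamped row; frozen so a loaded row cannot be modified."""
    property_no: str      # hypothetical business key
    asking_price: int
    valid_on: date        # explicit association of time with the data


class SnapshotStore:
    """Append-only collection: each refresh supplements the earlier data."""

    def __init__(self):
        self._rows = []

    def refresh(self, rows, valid_on):
        # New data is added alongside the old; nothing is overwritten.
        for property_no, asking_price in rows:
            self._rows.append(Snapshot(property_no, asking_price, valid_on))

    def as_of(self, property_no, when):
        # Most recent snapshot taken on or before `when`, or None if there is none.
        candidates = [r for r in self._rows
                      if r.property_no == property_no and r.valid_on <= when]
        return max(candidates, key=lambda r: r.valid_on, default=None)

    def history(self, property_no):
        # The full series of snapshots for one property, oldest first.
        return sorted((r for r in self._rows if r.property_no == property_no),
                      key=lambda r: r.valid_on)
```

A real warehouse would hold such snapshots in database tables rather than in memory; the point of the sketch is only that refreshes add rows and that every row carries its own time.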

There are numerous definitions of data warehousing, with the earlier definitions focusing on the characteristics of the data held in the warehouse. Alternative definitions widen the scope of the definition of data warehousing to include the processing associated with accessing the data from the original sources to the delivery of the data to the decision-makers (Anahory and Murray, 1997).

Whatever the definition, the ultimate goal of data warehousing is to integrate enterprise-wide corporate data into a single repository from which users can easily run queries, produce reports, and perform analysis. In summary, a data warehouse is data management and data analysis technology.

In recent years a new term associated with data warehousing has been used, namely ‘Data Webhouse’.

Data Webhouse: A distributed data warehouse that is implemented over the Web with no central data repository.

The Web is an immense source of behavioral data as individuals interact through their Web browsers with remote Web sites. The data generated by this behavior is called clickstream. Using a data warehouse on the Web to harness clickstream data has led to the development of Data Webhouses. Further discussion of the development of this new variation of data warehousing is outside the scope of this book; however, the interested reader is referred to Kimball et al. (2000).

31.1.3 Benefits of Data Warehousing

The successful implementation of a data warehouse can bring major benefits to an organization including:
- Potential high returns on investment. An organization must commit a huge amount of resources to ensure the successful implementation of a data warehouse and the cost can vary enormously from £50,000 to over £10 million due to the variety of technical solutions available. However, a study by the International Data Corporation (IDC) in 1996 reported that average three-year returns on investment (ROI) in data warehousing reached 401%, with over 90% of the companies surveyed achieving over 40% ROI, half the companies achieving over 160% ROI, and a quarter with more than 600% ROI (IDC, 1996).
- Competitive advantage. The huge returns on investment for those companies that have successfully implemented a data warehouse are evidence of the enormous competitive advantage that accompanies this technology. The competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
- Increased productivity of corporate decision-makers. Data warehousing improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a form that provides one consistent view of the organization. By transforming data into meaningful information, a data warehouse allows corporate decision-makers to perform more substantive, accurate, and consistent analysis.

31.1.4 Comparison of OLTP Systems and Data Warehousing

A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind. For example, OLTP systems are designed to maximize the transaction processing capacity, while data warehouses are designed to support ad hoc query processing. Table 31.1 provides a comparison of the major characteristics of OLTP systems and data warehousing systems (Singh, 1997).

An organization will normally have a number of different OLTP systems for business processes such as inventory control, customer invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and subject to change. The OLTP systems are optimized for a high number of transactions that are predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements of the transactions associated with the business applications and supports the day-to-day decisions of a large number of concurrent operational users.

In contrast, an organization will normally have a single data warehouse, which holds data that is historical, detailed, and summarized to various levels and rarely subject to change (other than being supplemented with new data). The data warehouse is designed to support relatively low numbers of transactions that are unpredictable in nature and require answers to queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the requirements of potential queries and supports the long-term strategic decisions of a relatively low number of managerial users.

Although OLTP systems and data warehouses have different characteristics and are built with different purposes in mind, these systems are closely related in that the OLTP systems provide the source data for the warehouse. A major problem of this relationship is that the data held by the OLTP systems can be inconsistent, fragmented, and subject to change, containing duplicate or missing entries. As such, the operational data must be ‘cleaned up’ before it can be used in the data warehouse. We discuss the tasks associated with this process in Section 31.3.1.

Table 31.1 Comparison of OLTP systems and data warehousing systems.

OLTP systems                                        Data warehousing systems
Holds current data                                  Holds historical data
Stores detailed data                                Stores detailed, lightly, and highly summarized data
Data is dynamic                                     Data is largely static
Repetitive processing                               Ad hoc, unstructured, and heuristic processing
High level of transaction throughput                Medium to low level of transaction throughput
Predictable pattern of usage                        Unpredictable pattern of usage
Transaction-driven                                  Analysis driven
Application-oriented                                Subject-oriented
Supports day-to-day decisions                       Supports strategic decisions
Serves large number of clerical/operational users   Serves relatively low number of managerial users

OLTP systems are not built to quickly answer ad hoc queries. They also tend not to store historical data, which is necessary to analyze trends. Basically, OLTP offers large amounts of raw data, which is not easily analyzed. The data warehouse allows more complex queries to be answered besides just simple aggregations such as, ‘What is the average selling price for properties in the major cities of Great Britain?’. The types of queries that a data warehouse is expected to answer range from the relatively simple to the highly complex and are dependent on the types of end-user access tools used (see Section 31.2.10). Examples of the range of queries that the DreamHome data warehouse may be capable of supporting include:
- What was the total revenue for Scotland in the third quarter of 2004?
- What was the total revenue for property sales for each type of property in Great Britain in 2003?
- What are the three most popular areas in each city for the renting of property in 2004 and how does this compare with the results for the previous two years?
- What is the monthly revenue for property sales at each branch office, compared with rolling 12-monthly prior figures?
- What would be the effect on property sales in the different regions of Britain if legal costs went up by 3.5% and Government taxes went down by 1.5% for properties over £100,000?
- Which type of property sells for prices above the average selling price for properties in the main cities of Great Britain and how does this correlate to demographic data?
- What is the relationship between the total annual revenue generated by each branch office and the total number of sales staff assigned to each branch office?
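
To make the first of these queries concrete, the following sketch answers "total revenue for Scotland in the third quarter of 2004" against a toy table. The table name property_sale, its columns, and the sample rows are invented for illustration and are not the DreamHome schema; Python's standard sqlite3 module stands in for the warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property_sale (      -- hypothetical fact table
        region     TEXT,
        sale_date  TEXT,               -- ISO 8601 date, e.g. '2004-08-15'
        revenue    INTEGER
    );
    INSERT INTO property_sale VALUES
        ('Scotland', '2004-07-12', 120000),
        ('Scotland', '2004-09-30',  95000),
        ('Scotland', '2004-10-01',  80000),
        ('England',  '2004-08-20', 150000);
""")


def total_revenue(conn, region, start, end):
    """Sum revenue for one region over the half-open date range [start, end)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM property_sale "
        "WHERE region = ? AND sale_date >= ? AND sale_date < ?",
        (region, start, end),
    ).fetchone()
    return row[0]


# Third quarter of 2004: 1 July up to, but not including, 1 October.
q3_scotland = total_revenue(conn, "Scotland", "2004-07-01", "2004-10-01")
```

The later queries in the list (rolling comparisons, what-if analysis, correlation with demographic data) go beyond a single aggregate of this kind, which is why the answer depends on the end-user access tools available.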

31.1.5 Problems of Data Warehousing

The problems associated with developing and managing a data warehouse are listed in Table 31.2 (Greenfield, 1996).

Table 31.2 Problems of data warehousing.

Underestimation of resources for data loading
Hidden problems with source systems
Required data not captured
Increased end-user demands
Data homogenization
High demand for resources
Data ownership
High maintenance
Long-duration projects
Complexity of integration

Underestimation of resources for data loading
Many developers underestimate the time required to extract, clean, and load the data into the warehouse. This process may account for a significant proportion of the total development time, although better data cleansing and management tools should ultimately reduce the time and effort spent.

Hidden problems with source systems
Hidden problems associated with the source systems feeding the data warehouse may be identified, possibly after years of being undetected. The developer must decide whether to fix the problem in the data warehouse and/or fix the source systems. For example, when entering the details of a new property, certain fields may allow nulls, which may result in staff entering incomplete property data, even when available and applicable.
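
One simple way to surface this kind of hidden problem is to profile how often each field is missing before the data is loaded. The helper below is a sketch only; the record layout (dictionaries with rooms and rent keys) is an assumption made for illustration.

```python
def null_rates(records, fields):
    """Return, for each field, the fraction of records in which it is missing.

    A field counts as missing when the key is absent or its value is None.
    """
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / total
            for f in fields}


# Hypothetical extract from a source system that allows nulls.
properties = [
    {"rooms": 3, "rent": 350},
    {"rooms": None, "rent": 400},
    {"rent": 375},
    {"rooms": 4, "rent": None},
]
rates = null_rates(properties, ["rooms", "rent"])
```

A high rate for a field that staff could have filled in suggests the fix belongs in the source system rather than in the warehouse.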

Required data not captured
Warehouse projects often highlight a requirement for data not being captured by the existing source systems. The organization must decide whether to modify the OLTP systems or create a system dedicated to capturing the missing data. For example, when considering the DreamHome case study, we may wish to analyze the characteristics of certain events such as the registering of new clients and properties at each branch office. However, this is currently not possible as we do not capture the data that the analysis requires such as the date registered in either case.

Increased end-user demands
After end-users receive query and reporting tools, requests for support from IS staff may increase rather than decrease. This is caused by an increasing awareness among users of the capabilities and value of the data warehouse. This problem can be partially alleviated by investing in easier-to-use, more powerful tools, or in providing better training for the users. A further reason for increasing demands on IS staff is that once a data warehouse is online, it is often the case that the number of users and queries increases together with requests for answers to more and more complex queries.

Data homogenization
Large-scale data warehousing can become an exercise in data homogenization that lessens the value of the data. For example, in producing a consolidated and integrated view of the organization’s data, the warehouse designer may be tempted to emphasize similarities rather than differences in the data used by different application areas such as property sales and property renting.

High demand for resources
The data warehouse can use large amounts of disk space. Many relational databases used for decision-support are designed around star, snowflake, and starflake schemas (see Chapter 32). These approaches result in the creation of very large fact tables. If there are many dimensions to the factual data, the combination of aggregate tables and indexes to the fact tables can use up more space than the raw data.

Data ownership
Data warehousing may change the attitude of end-users to the ownership of data. Sensitive data that was originally viewed and used only by a particular department or business area, such as sales or marketing, may now be made accessible to others in the organization.

High maintenance
Data warehouses are high maintenance systems. Any reorganization of the business processes and the source systems may affect the data warehouse. To remain a valuable resource, the data warehouse must remain consistent with the organization that it supports.

Long-duration projects
A data warehouse represents a single data resource for the organization. However, the building of a warehouse can take up to three years, which is why some organizations are building data marts (see Section 31.5). Data marts support only the requirements of a particular department or functional area and can therefore be built more rapidly.

Complexity of integration
The most important area for the management of a data warehouse is the integration capabilities. This means an organization must spend a significant amount of time determining how well the various different data warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task, as there are a number of tools for every operation of the data warehouse, which must integrate well in order that the warehouse works to the organization’s benefit.

31.2 Data Warehouse Architecture

In this section we present an overview of the architecture and major components of a data warehouse (Anahory and Murray, 1997). The processes, tools, and technologies associated with data warehousing are described in more detail in the following sections of this chapter. The typical architecture of a data warehouse is shown in Figure 31.1.

31.2.1 Operational Data

The source of data for the data warehouse is supplied from:
- Mainframe operational data held in first generation hierarchical and network databases. It is estimated that the majority of corporate operational data is held in these systems.
- Departmental data held in proprietary file systems such as VSAM, RMS, and relational DBMSs such as Informix and Oracle.
- Private data held on workstations and private servers.
- External systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.

31.2.2 Operational Data Store

An Operational Data Store (ODS) is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact act simply as a staging area for data to be moved into the warehouse.

The ODS is often created when legacy operational systems are found to be incapable of achieving reporting requirements. The ODS provides users with the ease of use of a relational database while remaining distant from the decision support functions of the data warehouse.

Figure 31.1 Typical architecture of a data warehouse.

Building an ODS can be a helpful step towards building a data warehouse because an ODS can supply data that has been already extracted from the source systems and cleaned. This means that the remaining work of integrating and restructuring the data for the data warehouse is simplified (see Section 32.3).

31.2.3 Load Manager

The load manager (also called the frontend component) performs all the operations associated with the extraction and loading of data into the warehouse. The data may be extracted directly from the data sources or more commonly from the operational data store. The operations performed by the load manager may include simple transformations of the data to prepare the data for entry into the warehouse. The size and complexity of this component will vary between data warehouses and may be constructed using a combination of vendor data loading tools and custom-built programs.
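
As an illustration of what a small custom-built load program might look like, the sketch below applies simple transformations (trimming and upper-casing a key, unifying date formats, converting prices to integers) and appends the results to a staging area. The source names, field names, and date formats are assumptions for the example; they do not come from the text or from any vendor tool.

```python
from datetime import datetime

# Hypothetical date formats used by two source systems.
DATE_FORMATS = {"branch_system": "%d/%m/%Y", "web_system": "%Y-%m-%d"}


def transform(record, source):
    """Apply simple transformations so rows from any source share one form."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[source])
    return {
        "property_no": record["property_no"].strip().upper(),
        "sale_date": parsed.date().isoformat(),
        "price": int(record["price"]),
        "source": source,
    }


def load(staging, extracted, source):
    """Append transformed rows to the staging area; return the rejected rows."""
    rejected = []
    for record in extracted:
        try:
            staging.append(transform(record, source))
        except (KeyError, ValueError):
            # A missing field or an unparseable value: set aside for review.
            rejected.append(record)
    return rejected
```

Returning the rejected rows, rather than silently dropping them, keeps the hidden source-system problems discussed in Section 31.1.5 visible to the developer.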

31.2.4 Warehouse Manager

The warehouse manager performs all the operations associated with the management of the data in the warehouse. This component is constructed using vendor data management tools and custom-built programs. The operations performed by the warehouse manager include:
- analysis of data to ensure consistency;
- transformation and merging of source data from temporary storage into data warehouse tables;
- creation of indexes and views on base tables;
- generation of denormalizations (if necessary);
- generation of aggregations (if necessary);
- backing-up and archiving data.

In some cases, the warehouse manager also generates query profiles to determine which indexes and aggregations are appropriate. A query profile can be generated for each user, group of users, or the data warehouse and is based on information that describes the characteristics of the queries such as frequency, target table(s), and size of result sets.
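
A query profile of this kind can be pictured as a small running summary. The class below is a sketch under our own assumptions (the name QueryProfile and its fields are illustrative); it tracks the three characteristics mentioned above: frequency, target tables, and size of result sets.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class QueryProfile:
    """Running summary of the queries issued by one user, group, or warehouse."""
    executions: int = 0                                  # frequency
    table_hits: Counter = field(default_factory=Counter)  # target table(s)
    total_rows: int = 0                                  # for result-set size

    def record(self, tables, result_rows):
        """Register one executed query."""
        self.executions += 1
        self.table_hits.update(tables)
        self.total_rows += result_rows

    def mean_result_size(self):
        return self.total_rows / self.executions if self.executions else 0.0

    def hottest_tables(self, n=1):
        # The most frequently queried tables: candidates for indexes or aggregations.
        return [name for name, _ in self.table_hits.most_common(n)]
```

For example, after recording a query on property_sale and branch returning 10 rows and another on property_sale returning 30 rows, hottest_tables() reports property_sale and mean_result_size() reports 20.0.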

31.2.5 Query Manager

The query manager (also called the backend component) performs all the operations associated with the management of user queries. This component is typically constructed using vendor end-user data access tools, data warehouse monitoring tools, database facilities, and custom-built programs. The complexity of the query manager is determined by the facilities provided by the end-user access tools and the database. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries. In some cases, the query manager also generates query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.
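
Directing a query to the appropriate table can be sketched as choosing the smallest table that still carries every column the query needs. The catalogue below, including its table names, column sets, and row counts, is hypothetical, and the rule is one plausible heuristic rather than the method of any particular product.

```python
# Hypothetical catalogue: table name -> (columns available, approximate row count).
CATALOGUE = {
    "sale_detail":          ({"branch", "city", "sale_date", "month", "year"}, 5_000_000),
    "sale_monthly_summary": ({"branch", "city", "month", "year"}, 60_000),
    "sale_yearly_summary":  ({"city", "year"}, 500),
}


def route(required_columns, catalogue=CATALOGUE):
    """Pick the smallest table that carries every column the query needs."""
    needed = set(required_columns)
    candidates = [(rows, name)
                  for name, (columns, rows) in catalogue.items()
                  if needed <= columns]
    if not candidates:
        raise LookupError(f"no table provides {sorted(needed)}")
    # Tuples compare by row count first, so min() yields the smallest table.
    return min(candidates)[1]
```

With this catalogue, a query grouping by city and year is sent to the yearly summary, while one that needs individual sale dates falls back to the detailed table.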

31.2.6 Detailed Data

This area of the warehouse stores all the detailed data in the database schema. In most cases, the detailed data is not stored online but is made available by aggregating the data to the next level of detail. However, on a regular basis, detailed data is added to the warehouse to supplement the aggregated data.

31.2.7 Lightly and Highly Summarized Data

This area of the warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager. This area of the warehouse is transient as it will be subject to change on an ongoing basis in order to respond to changing query profiles.

The purpose of summary information is to speed up the performance of queries. Although there are increased operational costs associated with initially summarizing the data, this is offset by removing the requirement to continually perform summary operations (such as sorting or grouping) in answering user queries. The summary data is updated continuously as new data is loaded into the warehouse.
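
The trade-off described here, paying the grouping cost once at load time instead of on every query, can be sketched as follows. The tables sale_detail and sale_monthly_summary and their columns are invented for illustration, and Python's sqlite3 module stands in for the warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale_detail (branch TEXT, month TEXT, revenue INTEGER);
    CREATE TABLE sale_monthly_summary (
        branch TEXT, month TEXT, total_revenue INTEGER, sale_count INTEGER,
        PRIMARY KEY (branch, month)
    );
""")


def load_and_summarize(conn, rows):
    """Load detailed rows, then rebuild only the summary rows they affect."""
    conn.executemany("INSERT INTO sale_detail VALUES (?, ?, ?)", rows)
    for branch, month in {(b, m) for b, m, _ in rows}:
        # The primary key makes OR REPLACE overwrite the stale summary row.
        conn.execute(
            "INSERT OR REPLACE INTO sale_monthly_summary "
            "SELECT branch, month, SUM(revenue), COUNT(*) FROM sale_detail "
            "WHERE branch = ? AND month = ? GROUP BY branch, month",
            (branch, month),
        )
    conn.commit()
```

User queries for monthly revenue can then read one precomputed row per branch and month instead of grouping the detailed rows each time.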

31.2.8 Archive/Backup Data

This area of the warehouse stores the detailed and summarized data for the purposes of archiving and backup. Even though summary data is generated from detailed data, it may be necessary to back up online summary data if this data is kept beyond the retention period for detailed data. The data is transferred to storage archives such as magnetic tape or optical disk.

31.2.9 Metadata

This area of the warehouse stores all the metadata (data about data) definitions used by all the processes in the warehouse. Metadata is used for a variety of purposes including:
- the extraction and loading processes: metadata is used to map data sources to a common view of the data within the warehouse;
- the warehouse management process: metadata is used to automate the production of summary tables;
- as part of the query management process: metadata is used to direct a query to the most appropriate data source.

The structure of metadata differs between each process, because the purpose is different. This means that multiple copies of metadata describing the same data item are held within the data warehouse. In addition, most vendor tools for copy management and end-user data access use their own versions of metadata. Specifically, copy management tools use metadata to understand the mapping rules to apply in order to convert the source data into a common form. End-user access tools use metadata to understand how to build a query.

The management of metadata within the data warehouse is a very complex task that should not be underestimated. The issues associated with the management of metadata in a data warehouse are discussed in Section 31.4.3.
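
The use of metadata as mapping rules during extraction and loading can be sketched as a table that says, for each source, how a source field is renamed and converted. The source systems, field names, and converters below are assumptions made for this illustration only.

```python
# Hypothetical mapping metadata:
# source system -> {source field: (warehouse field, converter)}.
MAPPINGS = {
    "sales_system": {
        "propNo": ("property_no", str),
        "amt":    ("price", int),
    },
    "rental_system": {
        "property_id":  ("property_no", str),
        "monthly_rent": ("price", int),
    },
}


def to_common_form(source, record, mappings=MAPPINGS):
    """Rename and convert fields according to the mapping rules for `source`."""
    rules = mappings[source]
    return {target: convert(record[name])
            for name, (target, convert) in rules.items()}
```

Because the rules are data rather than code, supporting a new source system means adding an entry to the mapping metadata instead of writing another conversion routine.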

31.2.10 End-User Access Tools

The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the warehouse using end-user access tools. The data warehouse must efficiently support ad hoc and routine analysis. High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users.

Although the definitions of end-user access tools can overlap, for the purpose of this discussion, we categorize these tools into five main groups (Berson and Smith, 1997):
- reporting and query tools;
- application development tools;
- Executive Information System (EIS) tools;
- Online Analytical Processing (OLAP) tools;
- data mining tools.
Reporting and query tools

Reporting tools include production reporting tools and report writers. Production reporting tools are used to generate regular operational reports or support high-volume batch jobs, such as customer orders/invoices and staff pay cheques. Report writers, on the other hand, are inexpensive desktop tools designed for end-users.

Query tools for relational data warehouses are designed to accept SQL or generate SQL statements to query data stored in the warehouse. These tools shield end-users from the complexities of SQL and database structures by including a meta-layer between users and the database. The meta-layer is the software that provides subject-oriented views of a database and supports ‘point-and-click’ creation of SQL. An example of a query tool is Query-By-Example (QBE). The QBE facility of Microsoft Office Access DBMS was demonstrated in Chapter 7. Query tools are popular with users of business applications such as demographic analysis and customer mailing lists. However, as questions become increasingly complex, these tools may rapidly become inefficient.

Application development tools

The requirements of the end-users may be such that the built-in capabilities of reporting and query tools are inadequate either because the required analysis cannot be performed or because the user interaction requires an unreasonably high level of expertise by the user. In this situation, user access may require the development of in-house applications using graphical data access tools designed primarily for client–server environments. Some of these application development tools integrate with popular OLAP tools, and can access all major database systems, including Oracle, Sybase, and Informix.

Executive information system (EIS) tools

Executive information systems, more recently referred to as ‘everybody’s information systems’, were originally developed to support high-level strategic decision-making. However, the focus of these systems widened to include support for all levels of management. EIS tools were originally associated with mainframes enabling users to build customized, graphical decision-support applications to provide an overview of the organization’s data and access to external data sources.

Currently, the demarcation between EIS tools and other decision-support tools is even more vague as EIS developers add additional query facilities and provide custom-built applications for business areas such as sales, marketing, and finance.

Online Analytical Processing (OLAP) tools

Online Analytical Processing (OLAP) tools are based on the concept of multi-dimensional databases and allow a sophisticated user to analyze the data using complex, multi-dimensional views. Typical business applications for these tools include assessing the effectiveness of a marketing campaign, product sales forecasting, and capacity planning. These tools assume that the data is organized in a multi-dimensional model supported by a special multi-dimensional database (MDDB) or by a relational database designed to enable multi-dimensional queries. We discuss OLAP tools in more detail in Chapter 33.

Data mining tools

Data mining is the process of discovering meaningful new correlations, patterns, and trends by mining large amounts of data using statistical, mathematical, and artificial intelligence (AI) techniques. Data mining has the potential to supersede the capabilities of OLAP tools, as the major attraction of data mining is its ability to build predictive rather than retrospective models. We discuss data mining in more detail in Chapter 34.

31.3 Data Warehouse Data Flows

In this section we examine the activities associated with the processing (or flow) of data within a data warehouse. Data warehousing focuses on the management of five primary data flows, namely the inflow, upflow, downflow, outflow, and metaflow (Hackathorn, 1995). The data flows within a data warehouse are shown in Figure 31.2. The processes associated with each data flow include:

• Inflow: Extraction, cleansing, and loading of the source data.
• Upflow: Adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
• Downflow: Archiving and backing-up the data in the warehouse.
• Outflow: Making the data available to end-users.
• Metaflow: Managing the metadata.

31.3.1 Inflow

Inflow: The processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.

Figure 31.2 Information flows of a data warehouse.

The inflow is concerned with taking data from the source systems to load into the data warehouse. Alternatively, the data may be first loaded into the operational data store (ODS) (see Section 31.2.2) before being transferred to the data warehouse. As the source data is generated predominately by OLTP systems, the data must be reconstructed for the purposes of the data warehouse. The reconstruction of data involves:

• cleansing dirty data;
• restructuring data to suit the new requirements of the data warehouse including, for example, adding and/or removing fields, and denormalizing data;
• ensuring that the source data is consistent with itself and with the data already in the warehouse.

To effectively manage the inflow, mechanisms must be identified to determine when to start extracting the data to carry out the necessary transformations and to undertake consistency checks. When extracting data from the source systems, it is important to ensure that the data is in a consistent state to generate a single, consistent view of the corporate data. The complexity of the extraction process is determined by the extent to which the source systems are ‘in tune’ with one another.

Once the data is extracted, the data is usually loaded into a temporary store for the purposes of cleansing and consistency checking. As this process is complex, it is important for it to be fully automated and to have the ability to report when problems and failures occur. Commercial tools are available to support the management of the inflow. However, unless the process is relatively straightforward, the tools may require customization.

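The inflow steps described above (extract into a temporary store, cleanse, check consistency, load, and report failures) can be illustrated with a minimal Python sketch. The row layout (an `id` and a `city` field), the cleansing rule, and all function names are assumptions made purely for illustration; they are not part of any commercial inflow tool.

```python
from dataclasses import dataclass, field


@dataclass
class InflowReport:
    """Summary of one inflow run, so that failures are reported rather than lost."""
    loaded: int = 0
    rejected: list = field(default_factory=list)  # (raw_row, reason) pairs


def cleanse(row):
    """Trim whitespace and normalize the (hypothetical) 'city' field."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
    if cleaned.get("city"):
        cleaned["city"] = cleaned["city"].title()
    return cleaned


def is_consistent(row, known_ids):
    """Check a row against itself and against the data already in the warehouse."""
    if row.get("id") is None:
        return False, "missing id"
    if row["id"] in known_ids:
        return False, "duplicate id"
    return True, ""


def run_inflow(source_rows, warehouse):
    """Extract -> stage -> cleanse -> check -> load, recording every rejection."""
    report = InflowReport()
    staging = list(source_rows)  # the temporary store
    known_ids = {r["id"] for r in warehouse}
    for raw in staging:
        row = cleanse(raw)
        ok, reason = is_consistent(row, known_ids)
        if not ok:
            report.rejected.append((raw, reason))
            continue
        warehouse.append(row)
        known_ids.add(row["id"])
        report.loaded += 1
    return report
```

A real inflow would extract from OLTP databases rather than in-memory lists, but the shape is the same: nothing reaches the warehouse without passing the cleansing and consistency steps, and every rejected row is kept with its reason.
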
31.3.2 Upflow

Upflow: The processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.

The activities associated with the upflow include:

• Summarizing the data by selecting, projecting, joining, and grouping relational data into views that are more convenient and useful to the end-users. Summarizing extends beyond simple relational operations to involve sophisticated statistical analysis including identifying trends, clustering, and sampling the data.
• Packaging the data by converting the detailed or summarized data into more useful formats, such as spreadsheets, text documents, charts, other graphical presentations, private databases, and animation.
• Distributing the data to appropriate groups to increase its availability and accessibility.

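As a small illustration of the summarizing activity, the sketch below uses Python's built-in sqlite3 module to select, project, join, and group detail rows into a summary view. The table and column names (`branch`, `property_sale`, `sale_year`, and so on) and the sample rows are invented for this example and are not taken from the DreamHome schema.

```python
import sqlite3


def summarized_sales():
    """Build a tiny in-memory warehouse and return rows from a summary view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT);
        CREATE TABLE property_sale (
            sale_id INTEGER PRIMARY KEY, branch_no TEXT,
            sale_year INTEGER, price REAL
        );
        INSERT INTO branch VALUES ('B003', 'Glasgow'), ('B005', 'London');
        INSERT INTO property_sale VALUES
            (1, 'B003', 2004, 100000), (2, 'B003', 2004, 150000),
            (3, 'B005', 2004, 300000), (4, 'B005', 2005, 250000);

        -- Select, project, join, and group the detail rows into a view.
        CREATE VIEW sales_by_city_year AS
            SELECT b.city, s.sale_year,
                   COUNT(*) AS sales, SUM(s.price) AS total_price
            FROM property_sale AS s
            JOIN branch AS b ON b.branch_no = s.branch_no
            WHERE s.price > 0
            GROUP BY b.city, s.sale_year;
        """
    )
    rows = conn.execute(
        "SELECT city, sale_year, sales, total_price "
        "FROM sales_by_city_year ORDER BY city, sale_year"
    ).fetchall()
    conn.close()
    return rows
```

End-users would query the view `sales_by_city_year` rather than the detail table; packaging and distribution would then turn those rows into spreadsheets, charts, or similar formats.
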
While adding value to the data, consideration must also be given to support the performance requirements of the data warehouse and to minimize the ongoing operational costs. These requirements essentially pull the design in opposing directions, forcing restructuring to improve query performance or to lower operational costs. In other words, the data warehouse administrator must identify the most appropriate database design to meet all requirements, which often necessitates a degree of compromise.

31.3.3 Downflow

Downflow: The processes associated with archiving and backing-up of data in the warehouse.

Archiving old data plays an important role in maintaining the effectiveness and performance of the warehouse by transferring the older data of limited value to a storage archive such as magnetic tape or optical disk. However, if the correct partitioning scheme is selected for the database, the amount of data online should not affect performance.

Partitioning is a useful design option for very large databases that enables the fragmentation of a table storing enormous numbers of records into several smaller tables. The rule for partitioning a given table can be based on characteristics of the data such as timespan or area of the country. For example, the PropertySale table of DreamHome could be partitioned according to the countries of the UK.

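The partitioning rule in the PropertySale example can be sketched as follows. This is only an illustration in Python: the `country` field on each row and the partition naming scheme are assumptions, and a real DBMS would implement partitioning inside its storage engine rather than in application code.

```python
from collections import defaultdict

UK_COUNTRIES = ("England", "Scotland", "Wales", "Northern Ireland")


def partition_name(country):
    """Name of the (hypothetical) smaller table that holds one country's sales."""
    return f"PropertySale_{country.replace(' ', '')}"


def partition_by_country(rows):
    """Fragment one large list of sale rows into one list per UK country."""
    partitions = defaultdict(list)
    for row in rows:
        country = row["country"]  # assumed partition key
        if country not in UK_COUNTRIES:
            raise ValueError(f"no partition for country {country!r}")
        partitions[partition_name(country)].append(row)
    return dict(partitions)


def partitions_for_query(country):
    """A query restricted to one country needs to read only one partition."""
    return [partition_name(country)]
```

Because a query restricted to one country touches only that country's partition, the total amount of data held online matters less to performance than the size of the partitions actually read.
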
The downflow of data includes the processes to ensure that the current state of the data warehouse can be rebuilt following data loss, or software/hardware failures. Archived data should be stored in a way that allows the re-establishment of the data in the warehouse, when required.

31.3.4 Outflow

Outflow: The processes associated with making the data available to the end-users.

The outflow is where the real value of warehousing is realized by the organization. This may require re-engineering the business processes to achieve competitive advantage (Hackathorn, 1995). The two key activities involved in the outflow include:

• Accessing, which is concerned with satisfying the end-users’ requests for the data they need. The main issue is to create an environment so that users can effectively use the query tools to access the most appropriate data source. The frequency of user accesses can vary from ad hoc, to routine, to real-time. It is important to ensure that the system’s resources are used in the most effective way in scheduling the execution of user queries.
• Delivering, which is concerned with proactively delivering information to the end-users’ workstations and is referred to as a type of ‘publish-and-subscribe’ process. The warehouse publishes various ‘business objects’ that are revised periodically by monitoring usage patterns. Users subscribe to the set of business objects that best meets their needs.

An important issue in managing the outflow is the active marketing of the data warehouse to users, which will contribute to its overall impact on an organization’s operations. There are additional operational activities in managing the outflow including directing queries to the appropriate target table(s) and capturing information on the query profiles associated with user groups to determine which aggregations to generate.

Data warehouses that contain summary data potentially provide a number of distinct data sources to respond to a specific query including the detailed data itself and any number of aggregations that satisfy the query’s data needs. However, the performance of the query will vary considerably depending on the characteristics of the target data, the most obvious being the volume of data to be read. As part of managing the outflow, the system must determine the most efficient way to answer a query.

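One simple way to picture this choice is sketched below: among the data sources that can satisfy a query, pick the one with the least data to read. This is an illustrative Python sketch under strong assumptions (each source is described only by the dimensions it retains and a row count); it is not how any particular query manager works, and the names `DataSource` and `choose_source` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    dimensions: frozenset  # columns the source can still group or filter by
    row_count: int         # rough volume of data to be read


def choose_source(sources, needed_dimensions):
    """Return the smallest source whose dimensions cover the query's needs."""
    needed = frozenset(needed_dimensions)
    candidates = [s for s in sources if needed <= s.dimensions]
    if not candidates:
        raise LookupError("no data source satisfies the query")
    return min(candidates, key=lambda s: s.row_count)
```

For example, a query that needs only the year can be answered from a yearly aggregate, whereas a query by branch must fall back to the detailed data because no aggregate has kept the branch dimension.
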
31.3.5 Metaflow

Metaflow: The processes associated with the management of the metadata.

The previous flows describe the management of the data warehouse with regard to how the data moves in and out of the warehouse. Metaflow is the process that moves metadata (data about the other flows). Metadata is a description of the data contents of the data warehouse, what is in it, where it came from originally, and what has been done to it by way of cleansing, integrating, and summarizing. We discuss issues associated with the management of metadata in a data warehouse in Section 31.4.3.

To respond to changing business needs, legacy systems are constantly changing. Therefore, the warehouse involves responding to these continuous changes, which must reflect the changes to the source legacy systems and the changing business environment. The metaflow (metadata) must be continuously updated with these changes.

31.4 Data Warehousing Tools and Technologies

In this section we examine the tools and technologies associated with building and managing a data warehouse and, in particular, we focus on the issues associated with the integration of these tools. For more information on data warehousing tools and technologies, the interested reader is referred to Berson and Smith (1997).

31.4.1 Extraction, Cleansing, and Transformation Tools

Selecting the correct extraction, cleansing, and transformation tools is a critical step in the construction of a data warehouse. There are an increasing number of vendors that are focused on fulfilling the requirements of data warehouse implementations as opposed to simply moving data between hardware platforms. The tasks of capturing data from a source system, cleansing and transforming it, and then loading the results into a target system can be carried out either by separate products, or by a single integrated solution. Integrated solutions fall into one of the following categories:

• code generators;
• database data replication tools;
• dynamic transformation engines.

Code generators

Code generators create customized 3GL/4GL transformation programs based on source and target data definitions. The main issue with this approach is the management of the large number of programs required to support a complex corporate data warehouse. Vendors recognize this issue and some are developing management components employing techniques such as workflow methods and automated scheduling systems.

Database data replication tools

Database data replication tools employ database triggers or a recovery log to capture changes to a single data source on one system and apply the changes to a copy of the source data located on a different system (see Chapter 24). Most replication products do not support the capture of changes to non-relational files and databases, and often do not provide facilities for significant data transformation and enhancement. These tools can be used to rebuild a database following failure or to create a database for a data mart (see Section 31.5), provided that the number of data sources is small and the level of data transformation is relatively simple.

Dynamic transformation engines

Rule-driven dynamic transformation engines capture data from a source system at user-defined intervals, transform the data, and then send and load the results into a target environment. To date, most products support only relational data sources, but products are now emerging that handle non-relational source files and databases.

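The capture, transform, and load cycle of a rule-driven engine can be pictured with a minimal Python sketch. Everything here is hypothetical: the rule format (a target field paired with a function of the source row), the field names, and the helper names are chosen for illustration and do not correspond to any vendor's product.

```python
def make_engine(rules):
    """Build a transform function from (target_field, rule) pairs.

    Each rule is a function that takes a source row and returns one target value.
    """
    def transform(source_row):
        return {target: rule(source_row) for target, rule in rules}
    return transform


# Illustrative rules: merge two name fields, convert pence to pounds,
# and default a missing country code.
RULES = [
    ("customer_name", lambda r: f"{r['fname']} {r['lname']}".strip()),
    ("price_gbp", lambda r: round(float(r["price_pence"]) / 100, 2)),
    ("country", lambda r: r.get("country", "UK").upper()),
]


def run_cycle(capture, transform, target):
    """One capture-transform-load cycle; a scheduler would call this at each interval."""
    batch = capture()
    target.extend(transform(row) for row in batch)
    return len(batch)
```

Keeping the rules as data rather than as generated programs is the design point: changing a mapping means editing one rule, not regenerating and redeploying a transformation program.
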
31.4.2 Data Warehouse DBMS

There are few integration issues associated with the data warehouse database. Due to the maturity of such products, most relational databases will integrate predictably with other types of software. However, there are issues associated with the potential size of the data warehouse database. Parallelism in the database becomes an important issue, as well as the usual issues such as performance, scalability, availability, and manageability, which must all be taken into consideration when choosing a DBMS. We first identify the requirements for a data warehouse DBMS and then discuss briefly how the requirements of data warehousing are supported by parallel technologies.

Requirements for data warehouse DBMS

The specialized requirements for a relational DBMS suitable for data warehousing are published in a White Paper (Red Brick Systems, 1996) and are listed in Table 31.3.

Load performance

Data warehouses require incremental loading of new data on a periodic basis within narrow time windows. Performance of the load process should be measured in hundreds of millions of rows or gigabytes of data per hour and there should be no maximum limit that constrains the business.

Load processing

Many steps must be taken to load new or updated data into the data warehouse including data conversions, filtering, reformatting, integrity checks, physical storage, indexing, and metadata update. Although each step may in practice be atomic, the load process should appear to execute as a single, seamless unit of work.

Data quality management

The shift to fact-based management demands the highest data quality. The warehouse must ensure local consistency, global consistency, and referential integrity despite ‘dirty’ sources and massive database sizes. While loading and preparation are necessary steps, they are not sufficient. The ability to answer end-users’ queries is the measure of success for a data warehouse application. As more questions are answered, analysts tend to ask more creative and complex questions.

Query performance

Fact-based management and ad hoc analysis must not be slowed or inhibited by the performance of the data warehouse RDBMS. Large, complex queries for key business operations must complete in reasonable time periods.

Terabyte scalability

Data warehouse sizes are growing at enormous rates with sizes ranging from a few to hundreds of gigabytes to terabyte-sized (10^12 bytes) and petabyte-sized (10^15 bytes). The RDBMS must not have any architectural limitations to the size of the database and should support modular and parallel management. In the event of failure, the RDBMS should support continued availability, and provide mechanisms for recovery. The RDBMS must support mass storage devices such as optical disk and hierarchical storage management devices. Lastly, query performance should not be dependent on the size of the database, but rather on the complexity of the query.

Table 31.3 The requirements for a data warehouse RDBMS.

    Load performance
    Load processing
    Data quality management
    Query performance
    Terabyte scalability
    Mass user scalability
    Networked data warehouse
    Warehouse administration
    Integrated dimensional analysis
    Advanced query functionality

Mass user scalability

Current thinking is that access to a data warehouse is limited to relatively low numbers of managerial users. This is unlikely to remain true as the value of data warehouses is realized. It is predicted that the data warehouse RDBMS should be capable of supporting hundreds, or even thousands, of concurrent users while maintaining acceptable query performance.

Networked data warehouse

Data warehouse systems should be capable of cooperating in a larger network of data warehouses. The data warehouse must include tools that coordinate the movement of subsets of data between warehouses. Users should be able to look at, and work with, multiple data warehouses from a single client workstation.

Warehouse administration

The very-large scale and time-cyclic nature of the data warehouse demands administrative ease and flexibility. The RDBMS must provide controls for implementing resource limits, chargeback accounting to allocate costs back to users, and query prioritization to address the needs of different user classes and activities. The RDBMS must also provide for workload tracking and tuning so that system resources may be optimized for maximum performance and throughput. The most visible and measurable value of implementing a data warehouse is evidenced in the uninhibited, creative access to data it provides for end-users.

Integrated dimensional analysis

The power of multi-dimensional views is widely accepted, and dimensional support must be inherent in the warehouse RDBMS to provide the highest performance for relational OLAP tools (see Chapter 33). The RDBMS must support fast, easy creation of pre-computed summaries common in large data warehouses, and provide maintenance tools to automate the creation of these pre-computed aggregates. Dynamic calculation of aggregates should be consistent with the interactive performance needs of the end-user.

Advanced query functionality

End-users require advanced analytical calculations, sequential and comparative analysis, and consistent access to detailed and summarized data. Using SQL in a client–server ‘point-and-click’ tool environment may sometimes be impractical or even impossible due to the complexity of the users’ queries. The RDBMS must provide a complete and advanced set of analytical operations.

Parallel DBMSs

Data warehousing requires the processing of enormous amounts of data and parallel database technology offers a solution to providing the necessary growth in performance. The success of parallel DBMSs depends on the efficient operation of many resources including processors, memory, disks, and network connections. As data warehousing grows in popularity, many vendors are building large decision-support DBMSs using parallel technologies. The aim is to solve decision-support problems using multiple nodes working on the same problem. The major characteristics of parallel DBMSs are scalability, operability, and availability.

The parallel DBMS performs many database operations simultaneously, splitting individual tasks into smaller parts so that tasks can be spread across multiple processors. Parallel DBMSs must be capable of running parallel queries. In other words, they must be able to decompose large complex queries into subqueries, run the separate subqueries simultaneously, and reassemble the results at the end. The capability of such DBMSs must also include parallel data loading, table scanning, and data archiving and backup. There are two main parallel hardware architectures commonly used as database server platforms for data warehousing:

• Symmetric Multi-Processing (SMP) – a set of tightly coupled processors that share memory and disk storage;
• Massively Parallel Processing (MPP) – a set of loosely coupled processors, each of which has its own memory and disk storage.

The SMP and MPP parallel architectures were described in detail in Section 22.1.1.

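The idea of decomposing a query into subqueries, running them simultaneously, and reassembling the results can be sketched in a few lines of Python. This is an illustration of the decomposition pattern only, assuming the data is already split into partitions and that the query is a simple total of a hypothetical `price` field; threads in CPython do not give true CPU parallelism, and a parallel DBMS would spread the work across processors or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Subquery: scan one partition and return a partial aggregate."""
    return sum(row["price"] for row in partition)


def parallel_total(partitions, workers=4):
    """Decompose SUM(price) into one subquery per partition, then reassemble."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials)  # reassembly step
```

The same shape applies to the other operations the text lists, such as parallel loading and table scanning: each worker handles one fragment, and a final step combines the partial results.
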
31.4.3 Data Warehouse Metadata

There are many issues associated with data warehouse integration; however, in this section we focus on the integration of metadata, that is ‘data about data’ (Darling, 1996). The management of the metadata in the warehouse is an extremely complex and difficult task. Metadata is used for a variety of purposes and the management of metadata is a critical issue in achieving a fully integrated data warehouse.

The major purpose of metadata is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse. However, the problem is that metadata has several functions within the warehouse that relate to the processes associated with data transformation and loading, data warehouse management, and query generation (see Section 31.2.9).

The metadata associated with data transformation and loading must describe the source data and any changes that were made to the data. For example, for each source field there should be a unique identifier, original field name, source data type, and original location including the system and object name, along with the destination data type and destination table name. If the field is subject to any transformations, ranging from a simple field type change to a complex set of procedures and functions, this should also be recorded.

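The per-field metadata listed above maps naturally onto a small record type. The Python sketch below is one possible shape, assuming illustrative attribute names and sample values; it is not a standard metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldLineage:
    """Transformation-and-loading metadata for one source field."""
    field_id: str        # unique identifier
    source_field: str    # original field name
    source_type: str     # source data type
    source_system: str   # original location: system name
    source_object: str   # original location: object (table or file) name
    target_type: str     # destination data type
    target_table: str    # destination table name
    transformations: list = field(default_factory=list)

    def record(self, description):
        """Record one transformation applied to the field."""
        self.transformations.append(description)

    def pathway(self):
        """Describe the pathway from the source field to the warehouse."""
        steps = " -> ".join(self.transformations) or "no transformation"
        return (
            f"{self.source_system}.{self.source_object}.{self.source_field} "
            f"({self.source_type}) -> {self.target_table} "
            f"({self.target_type}): {steps}"
        )
```

Holding the transformation history with the field description is what lets an administrator trace any warehouse item back to where it began.
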
The metadata associated with data management describes the data as it is stored in the warehouse. Every object in the database needs to be described including the data in each table, index, and view, and any associated constraints. This information is held in the DBMS system catalog; however, there are additional requirements for the purposes of the warehouse. For example, metadata should also describe any fields associated with aggregations, including a description of the aggregation that was performed. In addition, table partitions should be described including information on the partition key, and the data range associated with that partition.

The metadata described above is also required by the query manager to generate appropriate queries. In turn, the query manager generates additional metadata about the queries that are run, which can be used to generate a history on all the queries and a query profile for each user, group of users, or the data warehouse. There is also metadata associated with the users of queries that includes, for example, information describing what the term ‘price’ or ‘customer’ means in a particular database and whether the meaning has changed over time.

Synchronizing metadata

The major integration issue is how to synchronize the various types of metadata used throughout the data warehouse. The various tools of a data warehouse generate and use their own metadata, and to achieve integration, we require that these tools are capable of sharing their metadata. The challenge is to synchronize metadata between different products from different vendors using different metadata stores. For example, it is necessary to identify the correct item of metadata at the right level of detail from one product and map it to the appropriate item of metadata at the right level of detail in another product, then sort out any coding differences between them. This has to be repeated for all other metadata that the two products have in common. Further, any changes to the metadata (or even meta-metadata) in one product need to be conveyed to the other product. The task of synchronizing two products is highly complex, and therefore repeating this process for six or more products that make up the data warehouse can be resource intensive. However, integration of the metadata must be achieved.

In the beginning there were two major standards for metadata and modeling in the areas of data warehousing and component-based development proposed by the Meta Data Coalition (MDC) and the Object Management Group (OMG). However, these two industry organizations jointly announced that the MDC would merge into the OMG. As a result, the MDC discontinued independent operations and work continued in the OMG to integrate the two standards.

The merger of MDC into the OMG marked an agreement of the major data warehousing and metadata vendors to converge on one standard, incorporating the best of the MDC’s Open Information Model (OIM) with the best of the OMG’s Common Warehouse Metamodel (CWM). This work is now complete and the resulting specification issued by the OMG as the next version of the CWM is discussed in Section 27.1.3. A single standard allows users to exchange metadata between different products from different vendors freely.

The OMG’s CWM builds on various standards, including OMG’s UML (Unified Modeling Language), XMI (XML Metadata Interchange), and MOF (Meta Object Facility), and on the MDC’s OIM. The CWM was developed by a number of companies, including IBM, Oracle, Unisys, Hyperion, Genesis, NCR, UBS, and Dimension EDI.

31.4.4 Administration and Management Tools

A data warehouse requires tools to support the administration and management of such a complex environment. These tools are relatively scarce, especially those that are well integrated with the various types of metadata and the day-to-day operations of the data warehouse. The data warehouse administration and management tools must be capable of supporting the following tasks:

• monitoring data loading from multiple sources;
• data quality and integrity checks;
• managing and updating metadata;
• monitoring database performance to ensure efficient query response times and resource utilization;
• auditing data warehouse usage to provide user chargeback information;
• replicating, subsetting, and distributing data;
• maintaining efficient data storage management;
• purging data;
• archiving and backing-up data;
• implementing recovery following failure;
• security management.

31.5 Data Marts

Accompanying the rapid emergence of data warehouses is the related concept of data marts. In this section we describe what data marts are, the reasons for building data marts, and the issues associated with the development and use of data marts.

Data mart: A subset of a data warehouse that supports the requirements of a particular department or business function.

A data mart holds a subset of the data in a data warehouse normally in the form of summary data relating to a particular department or business function. The data mart can be standalone or linked centrally to the corporate data warehouse. As a data warehouse grows larger, the ability to serve the various needs of the organization may be compromised. The popularity of data marts stems from the fact that corporate-wide data warehouses are proving difficult to build and use. The typical architecture for a data warehouse and associated data mart is shown in Figure 31.3. The characteristics that differentiate data marts and data warehouses include:

• a data mart focuses on only the requirements of users associated with one department or business function;
• data marts do not normally contain detailed operational data, unlike data warehouses;
• as data marts contain less data compared with data warehouses, data marts are more easily understood and navigated.

Figure 31.3 Typical data warehouse and data mart architecture.

There are several approaches to building data marts. One approach is to build several data marts with a view to the eventual integration into a warehouse; another approach is to build the infrastructure for a corporate data warehouse while at the same time building one or more data marts to satisfy immediate business needs.

Data mart architectures can be built as two-tier or three-tier database applications. The data warehouse is the optional first tier (if the data warehouse provides the data for the data mart), the data mart is the second tier, and the end-user workstation is the third tier, as shown in Figure 31.3. Data is distributed among the tiers.

31.5.1 Reasons for Creating a Data Mart

There are many reasons for creating a data mart, which include:
• To give users access to the data they need to analyze most often.
• To provide data in a form that matches the collective view of the data by a group of users in a department or business function.
• To improve end-user response time due to the reduction in the volume of data to be accessed.
• To provide appropriately structured data as dictated by the requirements of end-user access tools such as Online Analytical Processing (OLAP) and data mining tools, which may require their own internal database structures. In practice, these tools often create their own data mart designed to support their specific functionality.
• Data marts normally use less data so tasks such as data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse.
• The cost of implementing data marts is normally less than that required to establish a data warehouse.
• The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

31.5.2 Data Marts Issues

The issues associated with the development and management of data marts are listed in Table 31.4 (Brooks, 1997).

Data mart functionality

The capabilities of data marts have increased with the growth in their popularity. Rather than being simply small, easy-to-access databases, some data marts must now be scalable to hundreds of gigabytes (Gb), and provide sophisticated analysis using Online Analytical Processing (OLAP) and/or data mining tools. Further, hundreds of users must be capable of remotely accessing the data mart. The complexity and size of some data marts are matching the characteristics of small-scale corporate data warehouses.

Data mart size

Users expect faster response times from data marts than from data warehouses; however, performance deteriorates as data marts grow in size. Several vendors of data marts are investigating ways to reduce the size of data marts to gain improvements in performance. For example, dynamic dimensions allow aggregations to be calculated on demand rather than pre-calculated and stored in the multi-dimensional database (MDDB) cube (see Chapter 33).

Data mart load performance

A data mart has to balance two critical components: end-user response time and data loading performance. A data mart designed for fast user response will have a large number of summary tables and aggregate values. Unfortunately, the creation of such tables and values greatly increases the time of the load procedure. Vendors are investigating improvements in the load procedure by providing indexes that automatically and continually adapt to the data being processed or by supporting incremental database updating so that only cells affected by the change are updated and not the entire MDDB structure.

Users' access to data in multiple data marts

One approach is to replicate data between different data marts or, alternatively, build virtual data marts. Virtual data marts are views of several physical data marts or the corporate data warehouse tailored to meet the requirements of specific groups of users. Commercial products that manage virtual data marts are available.
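
The idea of a virtual data mart as a view can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than any commercial product, and the table name warehouse_sales, the view name sales_mart, and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical warehouse table and mart names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (
        region TEXT, department TEXT, amount REAL
    );
    INSERT INTO warehouse_sales VALUES
        ('Scotland', 'Sales', 100.0),
        ('Scotland', 'Marketing', 40.0),
        ('England', 'Sales', 250.0);

    -- A "virtual data mart": no data is copied, only a tailored view is defined.
    CREATE VIEW sales_mart AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_sales
        WHERE department = 'Sales'
        GROUP BY region;
""")

# Users of the Sales department query the view as if it were their own mart.
mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

Because the view stores no data of its own, it cannot drift out of step with the warehouse, although every query against it pays the cost of reading the underlying table.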

Data mart Internet/Intranet access

Internet/Intranet technology offers users low-cost access to data marts and the data warehouse using Web browsers such as Netscape Navigator and Microsoft Internet Explorer. Data mart Internet/Intranet products normally sit between a Web server and the data analysis product. Vendors are developing products with increasingly advanced Web capabilities. These products include Java and ActiveX capabilities. We discussed Web and DBMS integration in detail in Chapter 29.

Table 31.4 The issues associated with data marts.
Data mart functionality
Data mart size
Data mart load performance
Users' access to data in multiple data marts
Data mart Internet/intranet access
Data mart administration
Data mart installation

Data mart administration

As the number of data marts in an organization increases, so does the need to centrally manage and coordinate data mart activities. Once data is copied to data marts, data can become inconsistent as users alter their own data marts to allow them to analyze data in different ways. Organizations cannot easily perform administration of multiple data marts, giving rise to issues such as data mart versioning, data and metadata consistency and integrity, enterprise-wide security, and performance tuning. Data mart administrative tools are commercially available.

Data mart installation

Data marts are becoming increasingly complex to build. Vendors are offering products referred to as 'data marts in a box' that provide a low-cost source of data mart tools.

31.6 Data Warehousing Using Oracle

In Chapter 8 we provided a general overview of the major features of the Oracle DBMS. In this section we describe the features of Oracle9i Enterprise Edition that are specifically designed to improve performance and manageability for the data warehouse (Oracle Corporation, 2004f).

31.6.1 Oracle9i

Oracle9i Enterprise Edition is one of the leading relational DBMS for data warehousing. Oracle has achieved this success by focusing on basic, core requirements for data warehousing: performance, scalability, and manageability. Data warehouses store larger volumes of data, support more users, and require faster performance, so that these core requirements remain key factors in the successful implementation of data warehouses. However, Oracle goes beyond these core requirements and is the first true 'data warehouse platform'. Data warehouse applications require specialized processing techniques to allow support for complex, ad hoc queries running against large amounts of data. To address these special requirements, Oracle offers a variety of query processing techniques, sophisticated query optimization to choose the most efficient data access path, and a scalable architecture that takes full advantage of all parallel hardware configurations. Successful data warehouse applications rely on superior performance when accessing the enormous amounts of stored data. Oracle provides a rich variety of integrated indexing schemes, join methods, and summary management features, to deliver answers quickly to data warehouse users. Oracle also addresses applications that have mixed workloads and where administrators want to control which users, or groups of users, have priority when executing transactions or queries. In this section we provide an overview of the main features of Oracle, which are particularly aimed at supporting data warehousing applications. These features include:
• summary management;
• analytical functions;
• bitmapped indexes;
• advanced join methods;
• sophisticated SQL optimizer;
• resource management.

Summary management

In a data warehouse application, users often issue queries that summarize detail data by common dimensions, such as month, product, or region. Oracle provides a mechanism for storing multiple dimensions and summary calculations on a table. Thus, when a query requests a summary of detail records, the query is transparently re-written to access the stored aggregates rather than summing the detail records every time the query is issued. This results in dramatic improvements in query performance. These summaries are automatically maintained from data in the base tables. Oracle also provides summary advisory functions that assist database administrators in choosing which summary tables are the most effective, depending on actual workload and schema statistics. Oracle Enterprise Manager supports the creation and management of materialized views and related dimensions and hierarchies via a graphical interface, greatly simplifying the management of materialized views.
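
The principle behind answering a summary query from stored aggregates can be sketched in a few lines of Python. This is not Oracle's mechanism, only a minimal illustration of the idea; the detail records, the function names build_summary and total_for_month, and the choice of month as the summarizing dimension are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical detail records: (month, region, amount).
detail = [
    ("2004-01", "North", 120.0),
    ("2004-01", "South", 80.0),
    ("2004-02", "North", 200.0),
]

def build_summary(rows):
    """Precompute the total amount per month, like a stored aggregate."""
    totals = defaultdict(float)
    for month, _region, amount in rows:
        totals[month] += amount
    return dict(totals)

summary_by_month = build_summary(detail)

def total_for_month(month, summary=None):
    """Answer from the stored aggregate when one exists; otherwise scan the detail rows."""
    if summary is not None and month in summary:
        return summary[month]  # the "rewritten" path: no detail scan needed
    return sum(amount for m, _region, amount in detail if m == month)

print(total_for_month("2004-01", summary_by_month))
```

Both paths return the same answer; the difference is that the first reads one precomputed value while the second touches every detail row. The sketch does not show how the summary is kept up to date when the detail data changes.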

Analytical functions

Oracle9i includes a range of SQL functions for business intelligence and data warehousing applications. These functions are collectively called 'analytical functions', and they provide improved performance and simplified coding for many business analysis queries. Some examples of the new capabilities are:
• ranking (for example, who are the top ten sales reps in each region of Great Britain?);
• moving aggregates (for example, what is the three-month moving average of property sales?);
• other functions including cumulative aggregates, lag/lead expressions, period-over-period comparisons, and ratio-to-report.

Oracle also includes the CUBE and ROLLUP operators for OLAP analysis, via SQL. These analytical and OLAP functions significantly extend the capabilities of Oracle for analytical applications (see Chapter 33).
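
To make the ranking and moving-aggregate examples concrete, the following sketch computes both in plain Python rather than SQL. The sample figures, the representative names, and the function names moving_average and top_n_per_region are hypothetical; the code shows only what such functions calculate, not how Oracle evaluates them.

```python
# Hypothetical monthly property sales counts, oldest month first.
monthly_sales = [10, 12, 8, 15, 20]

def moving_average(values, window=3):
    """Average each value with up to window-1 preceding values."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result

# Hypothetical (region, rep, sales) rows.
rep_sales = [
    ("Scotland", "Ann", 300), ("Scotland", "Bob", 500),
    ("Wales", "Cai", 200), ("Wales", "Dee", 150), ("Scotland", "Eve", 400),
]

def top_n_per_region(rows, n):
    """Rank reps within each region by sales, highest first, and keep the top n."""
    by_region = {}
    for region, rep, sales in rows:
        by_region.setdefault(region, []).append((rep, sales))
    return {
        region: sorted(reps, key=lambda r: r[1], reverse=True)[:n]
        for region, reps in by_region.items()
    }

averages = moving_average(monthly_sales)
top_two = top_n_per_region(rep_sales, 2)
print(averages)
print(top_two)
```

Note that the first two months have fewer than three values available, so the sketch averages over the shorter window rather than leaving those positions empty.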

Bitmapped indexes

Bitmapped indexes deliver performance benefits to data warehouse applications. They coexist with, and complement, other available indexing schemes, including standard B-tree indexes, clustered tables, and hash clusters. While a B-tree index may be the most efficient way to retrieve data using a unique identifier, bitmapped indexes are most efficient when retrieving data based on much wider criteria, such as 'How many flats were sold last month?' In data warehousing applications, end-users often query data based on these wider criteria. Oracle enables efficient storage of bitmap indexes through the use of advanced data compression technology.
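
A simplified picture of how a bitmap index answers such a question is sketched below. Each distinct column value is mapped to a bit string with one bit per row, held here as a Python integer; the sample rows and the function name build_bitmap_index are hypothetical, and the sketch ignores the compression that a real implementation would apply.

```python
# Hypothetical property rows: (type, month_sold).
rows = [
    ("Flat", "May"), ("House", "May"), ("Flat", "April"),
    ("Flat", "May"), ("House", "April"),
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << i)
    return index

type_index = build_bitmap_index(rows, 0)
month_index = build_bitmap_index(rows, 1)

# "How many flats were sold in May?" becomes a bitwise AND plus a bit count.
matches = type_index["Flat"] & month_index["May"]
flats_sold_in_may = bin(matches).count("1")
print(flats_sold_in_may)
```

The query never inspects the rows themselves: combining two criteria is a single AND over the bitmaps, which is why this kind of index suits columns with few distinct values and queries with several wide conditions.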

Advanced join methods

Oracle offers partition-wise joins, which dramatically increase the performance of joins involving tables that have been partitioned on the join keys. Joining records in matching partitions increases performance, by avoiding partitions that could not possibly have matching key records. Less memory is also used since less in-memory sorting is required.

Hash joins deliver higher performance over other join methods in many complex queries, especially for those queries where existing indexes cannot be leveraged in join processing, a common occurrence in ad hoc query environments. This join eliminates the need to perform sorts, by using an in-memory hash table constructed at runtime. The hash join is also ideally suited for scalable parallel execution.
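
The hash join described above can be sketched as a build phase followed by a probe phase. The following is a minimal single-process illustration in Python, assuming an equality join on one key; the function name hash_join and the sample branch and sale rows are hypothetical.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of dicts on equality of the given keys without sorting."""
    # Build phase: hash the (usually smaller) input on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined

branches = [{"branchID": 1, "city": "Glasgow"}, {"branchID": 2, "city": "London"}]
sales = [
    {"saleID": 10, "branchID": 2, "sellingPrice": 250000},
    {"saleID": 11, "branchID": 1, "sellingPrice": 180000},
    {"saleID": 12, "branchID": 3, "sellingPrice": 90000},  # no matching branch
]
result = hash_join(branches, sales, "branchID", "branchID")
print(result)
```

Neither input is sorted and no index is consulted, which matches the ad hoc situation described in the text. The sale with no matching branch is simply dropped, as in an inner join.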

Sophisticated SQL optimizer

Oracle provides numerous powerful query processing techniques that are completely transparent to the end-user. The Oracle cost-based optimizer dynamically determines the most efficient access paths and joins for every query. It incorporates transformation technology that automatically re-writes queries generated by end-user tools, for efficient query execution.

To choose the most efficient query execution strategy, the Oracle cost-based optimizer takes into account statistics, such as the size of each table and the selectivity of each query condition. Histograms provide the cost-based optimizer with more detailed statistics based on a skewed, non-uniform data distribution. The cost-based optimizer optimizes execution of queries involved in a star schema, which is common in data warehouse applications (see Section 32.2). By using a sophisticated star-query optimization algorithm and bitmapped indexes, Oracle can dramatically reduce the query executions done in a traditional join fashion. Oracle query processing not only includes a comprehensive set of specialized techniques in all areas (optimization, access and join methods, and query execution), they are also all seamlessly integrated, and work together to deliver the full power of the query processing engine.

Resource management

Managing CPU and disk resources in a multi-user data warehouse or OLTP application is challenging. As more users require access, contention for resources becomes greater. Oracle has resource management functionality that provides control of system resources assigned to users. Important online users, such as order entry clerks, can be given a high priority, while other users (those running batch reports) receive lower priorities. Users are assigned to resource classes, such as 'order entry' or 'batch,' and each resource class is then assigned an appropriate percentage of machine resources. In this way, high-priority users are given more system resources than lower-priority users.

Additional data warehouse features

Oracle also includes many features that improve the management and performance of data warehouse applications. Index rebuilds can be done online without interrupting inserts, updates, or deletes that may be occurring on the base table. Function-based indexes can be used to index expressions, such as arithmetic expressions, or functions that modify column values. The sample scan functionality allows queries to run and only access a specified percentage of the rows or blocks of a table. This is useful for getting meaningful aggregate amounts, such as an average, without accessing every row of a table.
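
The reasoning behind sample scans, that an aggregate such as an average can be estimated from a fraction of the rows, can be illustrated with a short Python sketch. It draws a fixed-size random sample in memory, which is an assumption made for simplicity and not a description of how the database selects rows or blocks; the function name sample_average and the data are hypothetical.

```python
import random

def sample_average(values, percent, seed=0):
    """Estimate the mean of values from roughly `percent` percent of them."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    k = max(1, len(values) * percent // 100)
    sample = rng.sample(values, k)           # sampling without replacement
    return sum(sample) / k, k

# Hypothetical column of 1000 values with a true mean of 500.5.
values = list(range(1, 1001))
estimate, rows_read = sample_average(values, 10)
true_mean = sum(values) / len(values)
print(estimate, rows_read, true_mean)
```

Only a tenth of the values are read to produce the estimate. How close the estimate is to the true mean depends on the sample size and on how skewed the data is, so a sample scan trades some accuracy for speed.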

Chapter Summary

• Data warehousing is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. A data warehouse is data management and data analysis technology.
• A data Webhouse is a distributed data warehouse that is implemented over the Web with no central data repository.
• The potential benefits of data warehousing are high returns on investment, substantial competitive advantage, and increased productivity of corporate decision-makers.
• A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind. For example, OLTP systems are designed to maximize the transaction processing capacity, while data warehouses are designed to support ad hoc query processing.
• The major components of a data warehouse include the operational data sources, operational data store, load manager, warehouse manager, query manager, detailed, lightly and highly summarized data, archive/backup data, metadata, and end-user access tools.
• The operational data source for the data warehouse is supplied from mainframe operational data held in first generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers and external systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers.
• The operational data store (ODS) is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
• The load manager (also called the frontend component) performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare the data for entry into the warehouse.
• The warehouse manager performs all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing-up data.
• The query manager (also called the backend component) performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
• End-user access tools can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, Online Analytical Processing (OLAP) tools, and data mining tools.
• Data warehousing focuses on the management of five primary data flows, namely the inflow, upflow, downflow, outflow, and metaflow.
• Inflow is the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
• Upflow is the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
• Downflow is the processes associated with archiving and backing-up of data in the warehouse.
• Outflow is the processes associated with making the data available to the end-users.
• Metaflow is the processes associated with the management of the metadata (data about data).
• The requirements for a data warehouse RDBMS include load performance, load processing, data quality management, query performance, terabyte scalability, mass user scalability, networked data warehouse, warehouse administration, integrated dimensional analysis, and advanced query functionality.
• A data mart is a subset of a data warehouse that supports the requirements of a particular department or business function. The issues associated with data marts include functionality, size, load performance, users' access to data in multiple data marts, Internet/intranet access, administration, and installation.

Exercise

31.15 You are asked by the Managing Director of DreamHome to investigate and report on the applicability of data warehousing for the organization. The report should compare data warehouse technology with OLTP systems and should identify the advantages and disadvantages, and any problem areas associated with implementing a data warehouse. The report should reach a fully justified set of conclusions on the applicability of a data warehouse for DreamHome.

Review Questions

31.1 Discuss what is meant by the following terms when describing the characteristics of the data in a data warehouse:
(a) subject-oriented;
(b) integrated;
(c) time-variant;
(d) non-volatile.
31.2 Discuss how Online Transaction Processing (OLTP) systems differ from data warehousing systems.
31.3 Discuss the main benefits and problems associated with data warehousing.
31.4 Present a diagrammatic representation of the typical architecture and main components of a data warehouse.
31.5 Describe the characteristics and main functions of the following components of a data warehouse:
(a) load manager;
(b) warehouse manager;
(c) query manager;
(d) metadata;
(e) end-user access tools.
31.6 Discuss the activities associated with each of the five primary data flows or processes within a data warehouse:
(a) inflow;
(b) upflow;
(c) downflow;
(d) outflow;
(e) metaflow.
31.7 What are the three main approaches taken by vendors to provide data extraction, cleansing, and transformation tools?
31.8 Describe the specialized requirements of a relational database management system (RDBMS) suitable for use in a data warehouse environment.
31.9 Discuss how parallel technologies can support the requirements of a data warehouse.
31.10 Discuss the importance of managing metadata and how this relates to the integration of the data warehouse.
31.11 Discuss the main tasks associated with the administration and management of a data warehouse.
31.12 Discuss how data marts differ from data warehouses and identify the main reasons for implementing a data mart.
31.13 Identify the main issues associated with the development and management of data marts.
31.14 Describe the features of Oracle that support the core requirements of data warehousing.

Chapter 32
Data Warehousing Design

Chapter Objectives

In this chapter you will learn:
• The issues associated with designing a data warehouse database.
• A technique for designing a data warehouse database called dimensionality modeling.
• How a dimensional model (DM) differs from an Entity–Relationship (ER) model.
• A step-by-step methodology for designing a data warehouse database.
• Criteria for assessing the degree of dimensionality provided by a data warehouse.
• How Oracle Warehouse Builder can be used to build a data warehouse.

In Chapter 31 we described the basic concepts of data warehousing. In this chapter we focus on the issues associated with data warehouse database design. Since the 1980s, data warehouses have evolved their own design techniques, distinct from transaction-processing systems. Dimensional design techniques have emerged as the dominant approach for most data warehouse databases.

Structure of this Chapter

In Section 32.1 we highlight the major issues associated with data warehouse design. In Section 32.2 we describe the basic concepts associated with dimensionality modeling and then compare this technique with traditional Entity–Relationship modeling. In Section 32.3 we describe and demonstrate a step-by-step methodology for designing a data warehouse database using worked examples taken from an extended version of the DreamHome case study described in Section 10.4 and Appendix A. In Section 32.4 we describe criteria for assessing the dimensionality of a data warehouse. Finally, in Section 32.5 we describe how to design a data warehouse using an Oracle product called Oracle Warehouse Builder.

32.1 Designing a Data Warehouse Database

Designing a data warehouse database is highly complex. To begin a data warehouse project, we need answers for questions such as: which user requirements are most important and which data should be considered first? Also, should the project be scaled down into something more manageable yet at the same time provide an infrastructure capable of ultimately delivering a full-scale enterprise-wide data warehouse? Questions such as these highlight some of the major issues in building data warehouses. For many enterprises the solution is data marts, which we described in Section 31.5. Data marts allow designers to build something that is far simpler and achievable for a specific group of users. Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time. However, despite the interim solution of building data marts, the goal remains the same: the ultimate creation of a data warehouse that supports the requirements of the enterprise.

The requirements collection and analysis stage (see Section 9.5) of a data warehouse project involves interviewing appropriate members of staff such as marketing users, finance users, sales users, operational users, and management to enable the identification of a prioritized set of requirements for the enterprise that the data warehouse must meet. At the same time, interviews are conducted with members of staff responsible for Online Transaction Processing (OLTP) systems to identify which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.

The interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse. With these two views defined we are ready to begin the process of designing the data warehouse database.

The database component of a data warehouse is described using a technique called dimensionality modeling. In the following sections, we first describe the concepts associated with a dimensional model and contrast this model with the traditional Entity–Relationship (ER) model (see Chapters 11 and 12). We then present a step-by-step methodology for creating a dimensional model using worked examples from an extended version of the DreamHome case study.

32.2 Dimensionality Modeling

Dimensionality modeling: A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.

Dimensionality modeling uses the concepts of Entity–Relationship (ER) modeling with some important restrictions. Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. In other words, the primary key of the fact table is made up of two or more foreign keys. This characteristic 'star-like' structure is called a star schema or star join. An example star schema for the property sales of DreamHome is shown in Figure 32.1. Note that foreign keys (labeled {FK}) are included in a dimensional model.

Another important feature of a DM is that all natural keys are replaced with surrogate keys. This means that every join between fact and dimension tables is based on surrogate keys, not natural keys. Each surrogate key should have a generalized structure based on simple integers. The use of surrogate keys allows the data in the warehouse to have some independence from the data used and produced by the OLTP systems. For example, each branch has a natural key, namely branchNo, and also a surrogate key, namely branchID.

Star schema: A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).

The star schema exploits the characteristics of factual data such that facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed. As the bulk of data in a data warehouse is represented as facts, the fact tables can be extremely large relative to the dimension tables. As such, it is important to treat fact data as read-only reference data that will not change over time. The most useful fact tables contain one or more numerical measures, or 'facts', that occur for each record. In Figure 32.1, the facts are offerPrice, sellingPrice, saleCommission, and saleRevenue. The most useful facts in a fact table are numeric and additive because data warehouse applications almost never access a single record; rather, they access hundreds, thousands, or even millions of records at a time and the most useful thing to do with so many records is to aggregate them.

Dimension tables, by contrast, generally contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries. For example, the star schema shown in Figure 32.1 can support queries that require access to sales of properties in Glasgow using the city attribute of the PropertyForSale table, and on sales of properties that are flats using the type attribute in the PropertyForSale table. In fact, the usefulness of a data warehouse is in relation to the appropriateness of the data held in the dimension tables.
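
A cut-down star schema along these lines can be tried out with Python's built-in sqlite3 module. The sketch keeps only two of the dimension tables and one fact, and it assumes a fact table named PropertySale; that name, the sample rows, and the branch numbers are hypothetical and are not taken from Figure 32.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: simple integer surrogate keys plus descriptive attributes.
    CREATE TABLE Branch (branchID INTEGER PRIMARY KEY, branchNo TEXT, city TEXT);
    CREATE TABLE PropertyForSale (propertyID INTEGER PRIMARY KEY, type TEXT, city TEXT);

    -- Fact table (assumed name): composite primary key made up of foreign keys.
    CREATE TABLE PropertySale (
        branchID INTEGER REFERENCES Branch(branchID),
        propertyID INTEGER REFERENCES PropertyForSale(propertyID),
        sellingPrice REAL,
        PRIMARY KEY (branchID, propertyID)
    );

    INSERT INTO Branch VALUES (1, 'B003', 'Glasgow'), (2, 'B005', 'London');
    INSERT INTO PropertyForSale VALUES
        (1, 'Flat', 'Glasgow'), (2, 'House', 'Glasgow'), (3, 'Flat', 'London');
    INSERT INTO PropertySale VALUES (1, 1, 90000), (1, 2, 150000), (2, 3, 300000);
""")

# Dimension attributes act as the constraints; the additive fact is aggregated.
glasgow_flat_total = conn.execute("""
    SELECT SUM(f.sellingPrice)
    FROM PropertySale f
    JOIN PropertyForSale p ON p.propertyID = f.propertyID
    WHERE p.city = 'Glasgow' AND p.type = 'Flat'
""").fetchone()[0]
print(glasgow_flat_total)
```

The query joins on the surrogate key propertyID and filters on the city and type attributes of the dimension table, which is the pattern described in the text: descriptive attributes select the records, and the numeric fact is summed over them.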

Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table. For example, in Figure 32.1 note that several dimension tables (namely PropertyForSale, Branch, ClientBuyer, Staff, and Owner) contain location data (city, region, and country), which is repeated in each. Denormalization is appropriate when there are a number of entities related to the dimension table that are often accessed, avoiding the overhead of having to join additional tables to access those attributes. Denormalization is not appropriate where the additional data is not accessed very often, because the overhead of scanning the expanded dimension table may not be offset by any gain in the query performance.
Snowflake schema: A variant of the star schema where dimension tables do not contain denormalized data.
Figure 32.1 Star schema for property sales of DreamHome.
There is a variation to the star schema called the snowflake schema, which allows dimensions to have dimensions. For example, we could normalize the location data (city, region, and country attributes) in the Branch dimension table of Figure 32.1 to create two new dimension tables called City and Region. A normalized version of the Branch dimension table of the property sales schema is shown in Figure 32.2. In a snowflake schema the location data in the PropertyForSale, ClientBuyer, Staff, and Owner dimension tables would also be removed and the new City and Region dimension tables would be shared with these tables.
Starflake schema: A hybrid structure that contains a mixture of star and snowflake schemas.
The most appropriate database schemas use a mixture of denormalized star and normalized snowflake schemas. This combination of star and snowflake schemas is called a starflake schema. Some dimensions may be present in both forms to cater for different query requirements. Whether the schema is star, snowflake, or starflake, the predictable and standard form of the underlying dimensional model offers important advantages within a data warehouse environment including:
- Efficiency. The consistency of the underlying database structure allows more efficient access to the data by various tools including report writers and query tools.
- Ability to handle changing requirements. The star schema can adapt to changes in the user's requirements, as all dimensions are equivalent in terms of providing access to the fact table. This means that the design is better able to support ad hoc user queries.
Figure 32.2 Part of star schema for property sales of DreamHome with a normalized version of the Branch dimension table.
- Extensibility. The dimensional model is extensible; for example, typical changes that a DM must support include: (a) adding new facts as long as they are consistent with the fundamental granularity of the existing fact table; (b) adding new dimensions, as long as there is a single value of that dimension defined for each existing fact record; (c) adding new dimensional attributes; and (d) breaking existing dimension records down to a lower level of granularity from a certain point in time forward.
- Ability to model common business situations. There are a growing number of standard approaches for handling common modeling situations in the business world. Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces; for example, slowly changing dimensions where a ‘constant’ dimension such as Branch or Staff actually evolves slowly and asynchronously. We discuss slowly changing dimensions in more detail in Section 32.3, Step 8.
- Predictable query processing. Data warehouse applications that drill down will simply be adding more dimension attributes from within a single star schema. Applications that drill across will be linking separate fact tables together through the shared (conformed) dimensions. Even though the overall suite of star schemas in the enterprise dimensional model is complex, the query processing is very predictable because at the lowest level, each fact table should be queried independently.
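The drill-across behaviour in the last item of the list above can be sketched as follows. Each fact table is aggregated on its own by the same conformed dimension key, and only the two small result sets are merged afterwards. The row contents, the branchNo key, the advertCost measure, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Two fact tables that share the conformed Branch dimension (rows are illustrative).
property_sale_facts = [
    {"branchNo": "B003", "saleRevenue": 900.0},
    {"branchNo": "B003", "saleRevenue": 1500.0},
    {"branchNo": "B005", "saleRevenue": 800.0},
]
advert_facts = [
    {"branchNo": "B003", "advertCost": 120.0},
    {"branchNo": "B005", "advertCost": 60.0},
    {"branchNo": "B005", "advertCost": 40.0},
]

def aggregate(facts, key, measure):
    """Query one fact table independently: sum a measure per dimension key."""
    totals = defaultdict(float)
    for row in facts:
        totals[row[key]] += row[measure]
    return dict(totals)

def drill_across(left, right):
    """Merge two independently computed result sets on the shared dimension key."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k, 0.0), right.get(k, 0.0)) for k in keys}

revenue_by_branch = aggregate(property_sale_facts, "branchNo", "saleRevenue")
cost_by_branch = aggregate(advert_facts, "branchNo", "advertCost")
report = drill_across(revenue_by_branch, cost_by_branch)
print(report)
```

Joining the two fact tables directly on branchNo would instead pair every sale row with every advert row for the branch and inflate both sums, which is one reason for querying each fact table separately.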
32.2.1 Comparison of DM and ER models

In this section we compare and contrast the dimensional model (DM) with the Entity–Relationship (ER) model. As described in the previous section, DMs are normally used to design the database component of a data warehouse whereas ER models have traditionally been used to describe the database for Online Transaction Processing (OLTP) systems.
ER modeling is a technique for identifying relationships among entities. A major goal of ER modeling is to remove redundancy in the data. This is immensely beneficial to transaction processing because transactions are made very simple and deterministic. For example, a transaction that updates a client's address normally accesses a single record in the Client table. This access is extremely fast as it uses an index on the primary key clientNo. However, in making transaction processing efficient such databases cannot efficiently and easily support ad hoc end-user queries. Traditional business applications such as customer ordering, stock control, and customer invoicing require many tables with numerous joins between them. An ER model for an enterprise can have hundreds of logical entities, which can map to hundreds of physical tables. Traditional ER modeling does not support the main attraction of data warehousing, namely intuitive and high-performance retrieval of data.
The key to understanding the relationship between dimensional models and Entity–Relationship models is that a single ER model normally decomposes into multiple DMs. The multiple DMs are then associated through ‘shared’ dimension tables. We describe the relationship between ER models and DMs in more detail in the following section, in which we present a database design methodology for data warehouses.
32.3 Database Design Methodology for Data Warehouses

In this section we describe a step-by-step methodology for designing the database of a data warehouse. This methodology was proposed by Kimball and is called the ‘Nine-Step Methodology’ (Kimball, 1996). The steps of this methodology are shown in Table 32.1.
There are many approaches that offer alternative routes to the creation of a data warehouse. One of the more successful approaches is to decompose the design of the data warehouse into more manageable parts, namely data marts (see Section 31.5). At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse. Thus, a data warehouse is the union of a set of separate data marts implemented over a period of time, possibly by different design teams, and possibly on different hardware and software platforms.
The Nine-Step Methodology specifies the steps required for the design of a data mart. However, the methodology also ties together separate data marts so that over time they merge together into a coherent overall data warehouse. We now describe the steps shown in Table 32.1 in some detail using worked examples taken from an extended version of the DreamHome case study.
Step 1: Choosing the process

The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions. The best choice for the first data mart tends to be the one that is related to sales. This data source is likely to be accessible and of high quality. In selecting the first data mart for DreamHome, we first identify that the discrete business processes of DreamHome include:
- property sales;
- property rentals (leasing);
- property viewing;
- property advertising;
- property maintenance.

Table 32.1 Nine-Step Methodology by Kimball (1996).

Step  Activity
1     Choosing the process
2     Choosing the grain
3     Identifying and conforming the dimensions
4     Choosing the facts
5     Storing pre-calculations in the fact table
6     Rounding out the dimension tables
7     Choosing the duration of the database
8     Tracking slowly changing dimensions
9     Deciding the query priorities and the query modes
The data requirements associated with these processes are shown in the ER diagram of Figure 32.3. Note that this ER diagram forms part of the design documentation, which describes the Online Transaction Processing (OLTP) systems required to support the business processes of DreamHome. The ER diagram of Figure 32.3 has been simplified by labeling only the main entities and relationships and is created by following Steps 1 and 2 of the database design methodology described earlier in Chapters 15 and 16. The shaded entities represent the core facts for each business process of DreamHome. The business process selected to be the first data mart is property sales. The part of the original ER diagram that represents the data requirements of the property sales business process is shown in Figure 32.4.

Figure 32.3 ER diagram of an extended version of DreamHome.
Step 2: Choosing the grain

Choosing the grain means deciding exactly what a fact table record represents. For example, the PropertySale entity shown with shading in Figure 32.4 represents the facts about each property sale and becomes the fact table of the property sales star schema shown previously in Figure 32.1. Therefore, the grain of the PropertySale fact table is individual property sales.
Only when the grain for the fact table is chosen can we identify the dimensions of the fact table. For example, the Branch, Staff, Owner, ClientBuyer, PropertyForSale, and Promotion entities in Figure 32.4 will be used to reference the data about property sales and will become the dimension tables of the property sales star schema shown previously in Figure 32.1. We also include Time as a core dimension, which is always present in star schemas.
The grain decision for the fact table also determines the grain of each of the dimension tables. For example, if the grain for the PropertySale fact table is an individual property sale, then the grain of the ClientBuyer dimension is the details of the client who bought a particular property.
Step 3: Identifying and conforming the dimensions

Dimensions set the context for asking questions about the facts in the fact table. A well-built set of dimensions makes the data mart understandable and easy to use. We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain. For example, each client of the ClientBuyer dimension table is described by the clientID, clientNo, clientName, clientType, city, region, and country attributes, as shown previously in Figure 32.1. A poorly presented or incomplete set of dimensions will reduce the usefulness of a data mart to an enterprise.
Figure 32.4 Part of ER diagram in Figure 32.3 that represents the data requirements of the property sales business process of DreamHome.
If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. Only in this way can two data marts share one or more dimensions in the same application. When a dimension is used in more than one data mart, the dimension is referred to as being conformed. Examples of dimensions that must conform between property sales and property advertising are the Time, PropertyForSale, Branch, and Promotion dimensions. If these dimensions are not synchronized or if they are allowed to drift out of synchronization between data marts, the overall data warehouse will fail, because the two data marts will not be able to be used together. For example, in Figure 32.5 we show the star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed dimensions with light shading.
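The conformance rule stated above (the two copies are identical, or one is a subset of the other) can be checked mechanically. The sketch below is one possible reading of that rule, not a procedure taken from the methodology: a dimension is modelled as a mapping from a dimension key to its attribute values, and the function name and sample rows are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """Return True if the dimensions are identical or one is a subset of the other.

    Each dimension is a dict mapping a dimension key to a tuple of attribute
    values. Rows that share a key must carry exactly the same attribute values.
    """
    small, large = (dim_a, dim_b) if len(dim_a) <= len(dim_b) else (dim_b, dim_a)
    return all(key in large and large[key] == row for key, row in small.items())

# The Branch dimension as held by different data marts (illustrative rows).
sales_branch = {1: ("B003", "Glasgow"), 2: ("B005", "London"), 3: ("B007", "Aberdeen")}
advert_branch = {1: ("B003", "Glasgow"), 2: ("B005", "London")}   # subset: conformed
drifted_branch = {1: ("B003", "Glasgow"), 2: ("B005", "Bristol")}  # attribute has drifted

sales_vs_advert = is_conformed(sales_branch, advert_branch)
sales_vs_drifted = is_conformed(sales_branch, drifted_branch)
print(sales_vs_advert, sales_vs_drifted)
```

A check of this kind could be run whenever a shared dimension is reloaded, so that drift between data marts is detected before the marts are queried together.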
Figure 32.5 Star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed (shared) dimension tables.
Step 4: Choosing the facts

The grain of the fact table determines which facts can be used in the data mart. All the facts must be expressed at the level implied by the grain. In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale. Also, the facts should be numeric and additive. In Figure 32.6 we use the star schema of the property rental process of DreamHome to illustrate a badly structured fact table. This fact table is unusable with non-numeric facts (promotionName and staffName), a non-additive fact (monthlyRent), and a fact (lastYearRevenue) at a different granularity from the other facts in the table. Figure 32.7 shows how the Lease fact table shown in Figure 32.6 could be corrected so that the fact table is appropriately structured. Additional facts can be added to a fact table at any time provided they are consistent with the grain of the table.
Figure 32.6 Star schema for property rentals of DreamHome. This is an example of a badly structured fact table with non-numeric facts, a non-additive fact, and a numeric fact with an inconsistent granularity with the other facts in the table.
Step 5: Storing pre-calculations in the fact table

Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations. A common example of the need to store pre-calculations occurs when the facts comprise a profit and loss statement. This situation will often arise when the fact table is based on invoices or sales. Figure 32.7 shows the fact table with the rentDuration, totalRent, clientAllowance, staffCommission, and totalRevenue attributes. These types of facts are useful because they are additive quantities, from which we can derive valuable information such as the average clientAllowance based on aggregating some number of fact table records. To calculate the totalRevenue generated per property rental we subtract the clientAllowance and the staffCommission from totalRent. Although the totalRevenue can always be derived from these attributes, we still need to store the totalRevenue. This is particularly true for a value that is fundamental to an enterprise, such as totalRevenue, or if there is any chance of a user calculating the totalRevenue incorrectly. The cost of a user incorrectly representing the totalRevenue is offset against the minor cost of a little redundant data storage.
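A minimal sketch of this pre-calculation follows, assuming the fact record is built once at load time. The attribute names follow Figure 32.7 as listed above; the LeaseFact class, the make_lease_fact helper, the unit of rentDuration (taken here to be months), and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LeaseFact:
    rentDuration: int        # assumed to be in months
    totalRent: float
    clientAllowance: float
    staffCommission: float
    totalRevenue: float      # stored, although derivable from the three fields above

def make_lease_fact(rent_duration, total_rent, client_allowance, staff_commission):
    """Compute totalRevenue once, at load time, so that users never re-derive it."""
    total_revenue = total_rent - client_allowance - staff_commission
    return LeaseFact(rent_duration, total_rent, client_allowance,
                     staff_commission, total_revenue)

facts = [
    make_lease_fact(12, 6000.0, 200.0, 300.0),
    make_lease_fact(6, 2400.0, 0.0, 120.0),
]

# Because the facts are additive, aggregates are simple sums over the records.
total_revenue = sum(f.totalRevenue for f in facts)
average_allowance = sum(f.clientAllowance for f in facts) / len(facts)
print(total_revenue, average_allowance)
```

Storing the value means every report reads the same figure, at the cost of one extra column per fact record.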
Figure 32.7 Star schema for the property rentals of DreamHome. This is the schema shown in Figure 32.6 with the problems corrected.
Step 6: Rounding out the dimension tables

In this step, we return to the dimension tables and add as many text descriptions to the dimensions as possible. The text descriptions should be as intuitive and understandable to the users as possible. The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
Step 7: Choosing the duration of the database

The duration measures how far back in time the fact table goes. In many enterprises, there is a requirement to look at the same time period a year or two earlier. For other enterprises, such as insurance companies, there may be a legal requirement to retain data extending back five or more years. Very large fact tables raise at least two very significant data warehouse design issues. First, it is often increasingly difficult to source increasingly old data. The older the data, the more likely there will be problems in reading and interpreting the old files or the old tapes. Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘slowly changing dimension’ problem, which is described in more detail in the following step.
Step 8: Tracking slowly changing dimensions

The slowly changing dimension problem means, for example, that the proper description of the old client and the old branch must be used with the old transaction history. Often, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time.
There are three basic types of slowly changing dimensions: Type 1, where a changed dimension attribute is overwritten; Type 2, where a changed dimension attribute causes a new dimension record to be created; and Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
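The three types can be contrasted on a single change, here a client moving from Glasgow to London. This is a simplified sketch under stated assumptions: the dimension is a list of dictionaries, clientID plays the role of the generalized key, clientNo is the operational key, and the function names and the previous_ prefix are hypothetical. A real Type 2 implementation would typically also record effective dates.

```python
import copy

def scd_type1(dimension, client_no, attribute, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dimension:
        if row["clientNo"] == client_no:
            row[attribute] = new_value

def scd_type2(dimension, client_no, attribute, new_value):
    """Type 2: append a new dimension record under a new generalized key."""
    versions = [r for r in dimension if r["clientNo"] == client_no]
    latest = max(versions, key=lambda r: r["clientID"])
    new_row = dict(latest)
    new_row["clientID"] = max(r["clientID"] for r in dimension) + 1
    new_row[attribute] = new_value
    dimension.append(new_row)

def scd_type3(dimension, client_no, attribute, new_value):
    """Type 3: keep the old value in an alternate attribute of the same record."""
    for row in dimension:
        if row["clientNo"] == client_no:
            row["previous_" + attribute] = row[attribute]
            row[attribute] = new_value

base = [{"clientID": 1, "clientNo": "CR76", "city": "Glasgow"}]

type1 = copy.deepcopy(base)
scd_type1(type1, "CR76", "city", "London")
type2 = copy.deepcopy(base)
scd_type2(type2, "CR76", "city", "London")
type3 = copy.deepcopy(base)
scd_type3(type3, "CR76", "city", "London")

for name, dim in (("Type 1", type1), ("Type 2", type2), ("Type 3", type3)):
    print(name, dim)
```

Only Type 2 lets old fact records keep pointing at the old description, because the earlier dimension record survives under its original generalized key.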
Step 9: Deciding the query priorities and the query modes

In this step we consider physical design issues. The most critical physical design issues affecting the end-user's perception of the data mart are the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations. Beyond these issues there are a host of additional physical design issues affecting administration, backup, indexing performance, and security. For further information on the issues affecting the physical design for data warehouses the interested reader is referred to Anahory and Murray (1997).
At the end of this methodology, we have a design for a data mart that supports the requirements of a particular business process and also allows the easy integration with other related data marts to ultimately form the enterprise-wide data warehouse. Table 32.2 lists the fact and dimension tables associated with the star schema for each business process of DreamHome (identified in Step 1 of the methodology).
We integrate the star schemas for the business processes of DreamHome using the conformed dimensions. For example, all the fact tables share the Time and Branch dimensions as shown in Table 32.2. A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation. The fact constellation for the DreamHome data warehouse is shown in Figure 32.8. The model has been simplified by displaying only the names of the fact and dimension tables. Note that the fact tables are shown with dark shading and all the dimension tables being conformed are shown with light shading.
Figure 32.8 Dimensional model (fact constellation) for the DreamHome data warehouse.
32.4 Criteria for Assessing the Dimensionality of a Data Warehouse

Since the 1980s, data warehouses have evolved their own design techniques, distinct from OLTP systems. Dimensional design techniques have emerged as the main approach for most of the data warehouses. In this section we describe the criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing (Kimball, 2000a,b).
When assessing a particular data warehouse remember that few vendors attempt to provide a completely integrated solution. However, as a data warehouse is a complete system, the criteria should only be used to assess complete end-to-end systems and not a collection of disjointed packages that may never integrate well together.
There are twenty criteria divided into three broad groups: architecture, administration, and expression as shown in Table 32.3. The purpose of establishing these criteria is to establish an objective standard for assessing how well a system supports the dimensional view of data warehousing, and to set the threshold high so that vendors have a target for improving their systems. The intended way to use this list is to rate a system on each criterion with a simple 0 or 1. A system qualifies for a 1 only if it meets the full definition of support for that criterion. For example, a system that offers aggregate navigation (the fourth criterion) that is available only to a single front-end tool gets a zero because the aggregate navigation is not open. In other words, there can be no partial credit for a criterion.
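The all-or-nothing rating can be expressed in a few lines. In this sketch the criterion names come from Table 32.3, but the ratings, the use of the string "partial" to record incomplete support, and the function name are assumptions for illustration; only four of the twenty criteria are shown.

```python
def score_system(ratings):
    """Award 1 per criterion only for full support (True); anything else scores 0."""
    return sum(1 for support in ratings.values() if support is True)

# Hypothetical assessment of one system against a few of the criteria.
ratings = {
    "Explicit declaration": True,
    "Conformed dimensions and facts": True,
    "Open aggregate navigation": "partial",  # available to one front-end tool only
    "Sparsity tolerance": False,
}

score = score_system(ratings)
print(f"{score} out of {len(ratings)} criteria met")
```

Treating partial support as zero keeps the score comparable between systems, since no weighting of half-met criteria has to be agreed.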
Architectural criteria are fundamental characteristics to the way the entire system is organized. These criteria usually extend from the backend, through the DBMS, to the frontend and the user's desktop.
Administration criteria are more tactical than architectural criteria, but are considered to be essential to the ‘smooth running’ of a dimensionally oriented data warehouse. These criteria generally affect IT personnel who are building and maintaining the data warehouse.
Table 32.2 Fact and dimension tables for each business process of DreamHome.

Business process       Fact table            Dimension tables
Property sales         PropertySale          Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property rentals       Lease                 Time, Branch, Staff, PropertyForRent, Owner, ClientRenter, Promotion
Property viewing       PropertyViewing       Time, Branch, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property advertising   Advert                Time, Branch, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property maintenance   PropertyMaintenance   Time, Branch, Staff, PropertyForRent
Expression criteria are mostly analytic capabilities that are needed in real-life situations. The end-user community experiences all expression criteria directly. The expression criteria for dimensional systems are not the only features users look for in a data warehouse, but they are all capabilities that need to exploit the power of a dimensional system.
A system that supports most or all of these dimensional criteria would be adaptable, easier to administer, and able to address many real-world applications. The major point of dimensional systems is that they are business-issue and end-user driven. For further details of the criteria in Table 32.3, the interested reader is referred to Kimball (2000a,b).
Table 32.3 Criteria for assessing the dimensionality provided by a data warehouse (Kimball, 2000a,b).

Group            Criteria
Architecture     Explicit declaration
                 Conformed dimensions and facts
                 Dimensional integrity
                 Open aggregate navigation
                 Dimensional symmetry
                 Dimensional scalability
                 Sparsity tolerance
Administration   Graceful modification
                 Dimensional replication
                 Changed dimension notification
                 Surrogate key administration
                 International consistency
Expression       Multiple-dimension hierarchies
                 Ragged-dimension hierarchies
                 Multiple valued dimensions
                 Slowly changing dimensions
                 Roles of a dimension
                 Hot-swappable dimensions
                 On-the-fly fact range dimensions
                 On-the-fly behavior dimensions

32.5 Data Warehousing Design Using Oracle

We introduced the Oracle DBMS in Section 8.2. In this section, we describe Oracle Warehouse Builder (OWB) as a key component of the Oracle Warehouse solution, enabling the design and deployment of data warehouses, data marts, and e-Business intelligence applications. OWB is a design tool and an extraction, transformation, and loading (ETL) tool. An important aspect of OWB from the customers' perspective is that it allows the integration of the traditional data warehousing environments with the new e-Business environments (Oracle Corporation, 2000). This section first provides an overview of the components of OWB and the underlying technologies and then describes how the user would apply OWB to typical data warehousing tasks.
32.5.1 Oracle Warehouse Builder Components

OWB provides the following primary functional components:

- A repository consisting of a set of tables in an Oracle database that is accessed via a Java-based access layer. The repository is based on the Common Warehouse Model (CWM) standard, which allows the OWB metadata to be accessible to other products that support this standard (see Section 31.4.3).
- A graphical user interface (GUI) that enables access to the repository. The GUI features graphical editors and an extensive use of wizards. The GUI is written in Java, making the frontend portable.
- A code generator, also written in Java, generates the code that enables the deployment of data warehouses. The different code types generated by OWB are discussed later in this section.
- Integrators, which are components that are dedicated to extracting data from a particular type of source. In addition to native support for Oracle, other relational, non-relational, and flat-file data sources, OWB integrators allow access to information in enterprise resource planning (ERP) applications such as Oracle and SAP R/3. The SAP integrator provides access to SAP transparent tables using PL/SQL code generated by OWB.
- An open interface that allows developers to extend the extraction capabilities of OWB, while leveraging the benefits of the OWB framework. This open interface is made available to developers as part of the OWB Software Development Kit (SDK).
- Runtime, which is a set of tables, sequences, packages, and triggers that are installed in the target schema. These database objects are the foundation for the auditing and error detection/correction capabilities of OWB. For example, loads can be restarted based on information stored in the runtime tables. OWB includes a runtime audit viewer for browsing the runtime tables and runtime reports.
The architecture of the Oracle Warehouse Builder is shown in Figure 32.9. Oracle Warehouse Builder is a key component of the larger Oracle data warehouse. The other products that the OWB must work with within the data warehouse include:

- Oracle – the engine of OWB (as there is no external server);
- Oracle Enterprise Manager – for scheduling;
- Oracle Workflow – for dependency management;
- Oracle Pure•Extract – for MVS mainframe access;
- Oracle Pure•Integrate – for customer data quality;
- Oracle Gateways – for relational and mainframe data access.
32.5.2 Using Oracle Warehouse Builder

In this section we describe how OWB assists the user in some typical data warehousing tasks like defining source data structures, designing the target warehouse, mapping sources to targets, generating code, instantiating the warehouse, extracting the data, and maintaining the warehouse.
Defining sources

Once the requirements have been determined and all the data sources have been identified, a tool such as OWB can be used for constructing the data warehouse. OWB can handle a diverse set of data sources by means of integrators. OWB also has the concept of a module, which is a logical grouping of related objects. There are two types of modules: data source and warehouse. For example, a data source module might contain all the definitions of the tables in an OLTP database that is a source for the data warehouse. And a module of type warehouse might contain definitions of the facts, dimensions, and staging tables that make up the data warehouse. It is important to note that modules merely contain definitions, that is metadata, about either sources or warehouses, and not objects that can be populated or queried. A user identifies the integrators that are appropriate for the data sources, and each integrator accesses a source and imports the metadata that describes it.
Oracle sources

To connect to an Oracle database, the user chooses the integrator for Oracle databases. Next, the user supplies some more detailed connection information, for example user name, password, and SQL*Net connection string. This information is used to define a database link in the database that hosts the OWB repository. OWB uses this database link to query the system catalog of the source database and extract metadata that describes the tables and views of interest to the user. The user experiences this as a process of visually inspecting the source and selecting objects of interest.
Figure 32.9 Oracle Warehouse Builder architecture.
Non-Oracle sources

Non-Oracle databases are accessed in exactly the same way as Oracle databases. What makes this possible is the Transparent Gateway technology of Oracle. In essence, a Transparent Gateway allows a non-Oracle database to be treated in exactly the same way as if it were an Oracle database. On the SQL level, once the database link pointing to the non-Oracle database has been defined, the non-Oracle database can be queried via SELECT just like any Oracle database. In OWB, all the user has to do is identify the type of database, so that OWB can select the appropriate Transparent Gateway for the database link definition. In the case of MVS mainframe sources, OWB and Oracle Pure•Extract provide data extraction from sources such as IMS, DB2, and VSAM. The plan is that Oracle Pure•Extract will ultimately be integrated with the OWB technology.
Flat files

OWB supports two kinds of flat files: character-delimited and fixed-length files. If the data source is a flat file, the user selects the integrator for flat files and specifies the path and file name. The process of creating the metadata that describes a file is different from the process used for a table in a database. With a table, the owning database itself stores extensive information about the table such as the table name, the column names, and data types. This information can be easily queried from the catalog. With a file, on the other hand, the user assists in the process of creating the metadata with some intelligent guesses supplied by OWB. In OWB, this process is called sampling.
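To make the idea of sampling concrete, the following sketch reads the first rows of a character-delimited file and proposes a data type for each column. It is an assumption-laden illustration, not OWB's algorithm: the functions guess_type and sample_delimited, the three candidate types, and the sample data are all hypothetical.

```python
import csv
import io

def guess_type(values):
    """Return the narrowest type name that fits every sampled value."""
    for type_name, caster in (("INTEGER", int), ("REAL", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue  # at least one value did not fit; try the next type
    return "TEXT"

def sample_delimited(text, delimiter=",", sample_rows=100):
    """Propose (column name, type) pairs from the first rows of a file."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows))  # transpose rows into columns
    return [(name, guess_type(col)) for name, col in zip(header, columns)]

# Made-up file content; a real file would be opened from the path supplied.
sample = "branch_no,staff_count,avg_salary\nB003,12,18500.50\nB005,7,21000\n"
proposed = sample_delimited(sample)
print(proposed)
```

The proposals are only guesses from a sample, which is why the user is expected to review and correct them before the metadata is stored.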
Web data

With the proliferation of the Internet, the new challenge for data warehousing is to capture data from Web sites. There are different types of data in e-Business environments: transactional Web data stored in the underlying databases; clickstream data stored in Web server log files; registration data in databases or log files; and consolidated clickstream data in the log files of Web analysis tools. OWB can address all these sources with its built-in features for accessing databases and flat files.
Data quality

A solution to the challenge of data quality is OWB with Oracle Pure•Integrate. Oracle Pure•Integrate is customer data integration software that automates the creation of consolidated profiles of customers and related business data to support e-Business and customer relationship management applications. Pure•Integrate complements OWB by providing advanced data transformation and cleansing features designed specifically to meet the requirements of database applications. These include:
■ integrated name and address processing to standardize, correct, and enhance representations of customer names and locations;
■ advanced probabilistic matching to identify unique consumers, businesses, households, super-households, or other entities for which no common identifiers exist;
■ powerful rule-based merging to resolve conflicting data and create the ‘best possible’ integrated result from the matched data.
Designing the target warehouse

Once the source systems have been identified and defined, the next task is to design the target warehouse based on user requirements. One of the most popular designs in data warehousing is the star schema and its variations, as discussed in Section 32.2. Also, many business intelligence tools such as Oracle Discoverer are optimized for this kind of design. OWB supports all variations of star schema designs. It features wizards and graphical editors for fact and dimension tables. For example, in the Dimension Editor the user graphically defines the attributes, levels, and hierarchies of a dimension.
Mapping sources to targets

When both the sources and the target have been well defined, the next step is to map the two together. Remember that there are two types of modules: source modules and warehouse modules. Modules can be reused many times in different mappings. Warehouse modules can themselves be used as source modules. For example, in an architecture where we have an OLTP database that feeds a central data warehouse, which in turn feeds a data mart, the data warehouse is a target (from the perspective of the OLTP database) and a source (from the perspective of the data mart).
The mappings of OWB are defined on two levels. A high-level mapping indicates the source and target modules. One level down is the detail mapping, which allows a user to map source columns to target columns and to define transformations. OWB features a built-in transformation library from which the user can pick predefined transformations. Users can also define their own transformations in PL/SQL and Java.
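A detail mapping of the kind just described can be pictured as a table that pairs each target column with a source column and a transformation. The sketch below is a simplified model for illustration only; the column names, the apply_mapping helper, and the use of Python functions in place of PL/SQL or Java transformations are assumptions.

```python
# Detail mapping: target column -> (source column, transformation).
# All column names are hypothetical.
detail_mapping = {
    "customer_name": ("cust_nm", str.strip),   # trim surrounding blanks
    "country_code": ("country", str.upper),    # normalize case
    "amount": ("amt", float),                  # text to number
}

def apply_mapping(source_row, mapping):
    """Build one target row by transforming the mapped source columns."""
    return {target: transform(source_row[source])
            for target, (source, transform) in mapping.items()}

row = apply_mapping(
    {"cust_nm": "  Ann Beech ", "country": "gb", "amt": "450"},
    detail_mapping)
print(row)
```

Keeping the mapping as data rather than hand-written code is what allows a generator to read it later and emit the load program.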
Generating code

The Code Generator is the OWB component that reads the target definitions and source-to-target mappings and generates code to implement the warehouse. The type of generated code varies depending on the type of object that the user wants to implement.
Logical versus physical design

Before generating code, the user has primarily been working on the logical level, that is, on the level of object definitions. On this level, the user is concerned with capturing all the details and relationships (the semantics) of an object, but is not yet concerned with defining any implementation characteristics. For example, consider a table to be implemented in an Oracle database. On the logical level, the user may be concerned with the table name, the number of columns, the column names and data types, and any relationships that the table has to other tables. On the physical level, however, the question becomes: how can this table be optimally implemented in an Oracle database? The user must now be concerned with things like tablespaces, indexes, and storage parameters (see Section 8.2.2). OWB allows the user to view and manipulate an object on both the logical and physical level. The logical definition and physical implementation details are automatically synchronized.
Configuration

In OWB, the process of assigning physical characteristics to an object is called configuration. The specific characteristics that can be defined depend on the object that is being configured. These characteristics include, for example, storage parameters, indexes, tablespaces, and partitions.
Validation

It is good practice to check the object definitions for completeness and consistency prior to code generation. OWB offers a validate feature to automate this process. Errors detectable by the validation process include, for example, data type mismatches between sources and targets, and foreign key errors.
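The two kinds of error mentioned above can be checked mechanically once definitions are held as metadata. The following is a minimal sketch under assumed data structures (plain dictionaries for column types, mappings, and foreign keys); the validate function and every name passed to it are hypothetical and do not reflect OWB's internal checks.

```python
def validate(source_cols, target_cols, mapping, foreign_keys, tables):
    """Return a list of error messages; an empty list means no errors found."""
    errors = []
    # Data type mismatches between mapped source and target columns.
    for src, tgt in mapping.items():
        if source_cols[src] != target_cols[tgt]:
            errors.append(
                f"type mismatch: {src} ({source_cols[src]}) -> "
                f"{tgt} ({target_cols[tgt]})")
    # Foreign keys that reference a table missing from the design.
    for column, referenced in foreign_keys.items():
        if referenced not in tables:
            errors.append(
                f"foreign key error: {column} references "
                f"missing table {referenced}")
    return errors

source_cols = {"amt": "TEXT", "order_id": "INTEGER"}
target_cols = {"revenue": "REAL", "order_id": "INTEGER", "branch_id": "INTEGER"}
mapping = {"amt": "revenue", "order_id": "order_id"}
foreign_keys = {"branch_id": "branch_dim"}

errors = validate(source_cols, target_cols, mapping, foreign_keys,
                  tables={"sales_fact", "time_dim"})
for message in errors:
    print(message)
```

Running such checks before generation is cheaper than discovering the same problems when a generated script fails in the target database.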
Generation

The following are some of the main types of code that OWB produces:
■ SQL Data Definition Language (DDL) commands: A warehouse module with its definitions of fact and dimension tables is implemented as a relational schema in an Oracle database. OWB generates SQL DDL scripts that create this schema. The scripts can either be executed from within OWB or saved to the file system for later, manual execution.
■ PL/SQL programs: A source-to-target mapping results in a PL/SQL program if the source is a database, whether Oracle or non-Oracle. The PL/SQL program accesses the source database via a database link, performs the transformations as defined in the mapping, and loads the data into the target table.
■ SQL*Loader control files: If the source in a mapping is a flat file, OWB generates a control file for use with SQL*Loader.
■ Tcl scripts: OWB also generates Tcl scripts. These can be used to schedule PL/SQL and SQL*Loader mappings as jobs in Oracle Enterprise Manager, for example to refresh the warehouse at regular intervals.
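As a rough illustration of the first kind of generated code in the list above, the sketch below turns a logical table definition into a CREATE TABLE script and executes it. It is not OWB's generator: the function generate_ddl and the table sales_fact are hypothetical, and SQLite is used in place of Oracle, so physical clauses such as tablespaces are left out.

```python
import sqlite3

def generate_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement from a logical definition."""
    col_defs = [f"{name} {col_type}" for name, col_type in columns]
    col_defs.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(col_defs) + "\n)"

ddl = generate_ddl(
    "sales_fact",
    [("time_id", "INTEGER"), ("branch_id", "INTEGER"), ("revenue", "REAL")],
    primary_key=["time_id", "branch_id"])
print(ddl)

# The script can be run immediately or saved to a file for later execution.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```

Separating generation from execution mirrors the choice described above between running a script at once and saving it for manual execution.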
Instantiating the warehouse and extracting data

Before the data can be moved from the source to the target database, the developer has to instantiate the warehouse, in other words execute the generated DDL scripts to create the target schema. OWB refers to this step as deployment. Once the target schema is in place, the PL/SQL programs can move data from the source into the target. Note that the basic data movement mechanism is INSERT ... SELECT ... with the use of a database link. If an error should occur, a routine from one of the OWB runtime packages logs the error in an audit table.
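The load mechanism described above, INSERT ... SELECT with errors recorded in an audit table, can be imitated on a small scale. In this sketch an attached SQLite database stands in for the Oracle database link, and the tables orders, sales_fact, and load_audit as well as the function load_sales are hypothetical; it shows the pattern, not the generated PL/SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # plays the target warehouse
conn.execute("ATTACH DATABASE ':memory:' AS src")  # stand-in for a database link
conn.execute("CREATE TABLE src.orders (order_id INTEGER, amount TEXT)")
conn.executemany("INSERT INTO src.orders VALUES (?, ?)",
                 [(1, "100.5"), (2, "250")])
conn.execute("CREATE TABLE sales_fact "
             "(order_id INTEGER PRIMARY KEY, revenue REAL NOT NULL)")
conn.execute("CREATE TABLE load_audit (step TEXT, message TEXT)")
conn.commit()

def load_sales(conn):
    """Move rows from source to target; log any failure in the audit table."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales_fact (order_id, revenue) "
                "SELECT order_id, CAST(amount AS REAL) FROM src.orders")
    except sqlite3.Error as exc:
        with conn:
            conn.execute("INSERT INTO load_audit VALUES (?, ?)",
                         ("load_sales", str(exc)))

load_sales(conn)  # first run loads both rows
load_sales(conn)  # second run violates the primary key and is logged
print(conn.execute("SELECT * FROM load_audit").fetchall())
```

Because the failed statement is rolled back before the audit row is written, the target table is left unchanged by the second run.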
Maintaining the warehouse

Once the data warehouse has been instantiated and the initial load has been completed, it has to be maintained. For example, the fact table has to be refreshed at regular intervals, so that queries return up-to-date results. Dimension tables have to be extended and updated, albeit much less frequently than fact tables. An example of a slowly changing dimension is the Customer table, in which a customer’s address, marital status, or name may all change over time. In addition to INSERT, OWB also supports other ways of manipulating the warehouse:
■ UPDATE
■ DELETE
■ INSERT/UPDATE (insert a row; if it already exists, update it)
■ UPDATE/INSERT (update a row; if it does not exist, insert it)
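The last operation in the list, UPDATE/INSERT, can be sketched as follows. This is an illustration using SQLite rather than Oracle; the customer_dim table and the update_insert function are hypothetical. Note that this simple form overwrites the old address instead of keeping a history of changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim "
             "(customer_id INTEGER PRIMARY KEY, address TEXT)")

def update_insert(conn, customer_id, address):
    """Update the row if it exists; otherwise insert it."""
    cur = conn.execute(
        "UPDATE customer_dim SET address = ? WHERE customer_id = ?",
        (address, customer_id))
    if cur.rowcount == 0:  # no existing row was updated
        conn.execute(
            "INSERT INTO customer_dim (customer_id, address) VALUES (?, ?)",
            (customer_id, address))

update_insert(conn, 1, "2 Manor Rd")   # row does not exist yet: inserted
update_insert(conn, 1, "63 Well St")   # row exists: address is updated
print(conn.execute("SELECT * FROM customer_dim").fetchall())
```

INSERT/UPDATE reverses the order of the two attempts; which variant is cheaper depends on whether most incoming rows are new or already present.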
These features give the OWB user a variety of tools to undertake ongoing maintenance tasks. OWB interfaces with Oracle Enterprise Manager for repetitive maintenance tasks; for example, a fact table refresh that is scheduled to occur at a regular interval. For complex dependencies OWB integrates with Oracle Workflow.
Metadata integration

OWB is based on the Common Warehouse Model (CWM) standard (see Section 31.4.3). It can seamlessly exchange metadata with Oracle Express and Oracle Discoverer as well as other business intelligence tools that comply with the standard.
Chapter Summary

■ Dimensionality modeling is a design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.
■ Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. In other words, the primary key of the fact table is made up of two or more foreign keys. This characteristic ‘star-like’ structure is called a star schema or star join.
■ Star schema is a logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).
■ The star schema exploits the characteristics of factual data such that facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed. As the bulk of data in the data warehouse is represented within facts, the fact tables can be extremely large relative to the dimension tables.
■ The most useful facts in a fact table are numerical and additive because data warehouse applications almost never access a single record; rather, they access hundreds, thousands, or even millions of records at a time and the most useful thing to do with so many records is to aggregate them.
■ Dimension tables most often contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries.
■ Snowflake schema is a variant of the star schema where dimension tables do not contain denormalized data.
■ Starflake schema is a hybrid structure that contains a mixture of star and snowflake schemas.
■ The key to understanding the relationship between dimensional models and ER models is that a single ER model normally decomposes into multiple DMs. The multiple DMs are then associated through conformed (shared) dimension tables.
■ There are many approaches that offer alternative routes to the creation of a data warehouse. One of the more successful approaches is to decompose the design of the data warehouse into more manageable parts, namely data marts. At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse.
■ The Nine-Step Methodology specifies the steps required for the design of a data mart / warehouse. The steps include: Step 1 Choosing the process, Step 2 Choosing the grain, Step 3 Identifying and conforming the dimensions, Step 4 Choosing the facts, Step 5 Storing pre-calculations in the fact table, Step 6 Rounding out the dimensions, Step 7 Choosing the duration of the database, Step 8 Tracking slowly changing dimensions, and Step 9 Deciding the query priorities and query modes.
■ There are criteria to measure the extent to which a system supports the dimensional view of data warehousing. The criteria are divided into three broad groups: architecture, administration, and expression.
■ Oracle Warehouse Builder (OWB) is a key component of the Oracle Warehouse solution, enabling the design and deployment of data warehouses, data marts, and e-Business intelligence applications. OWB is both a design tool and an extraction, transformation, and loading (ETL) tool.

Review Questions

31.1 Identify the major issues associated with designing a data warehouse database.
31.2 Describe how a dimensional model (DM) differs from an Entity–Relationship (ER) model.
31.3 Present a diagrammatic representation of a typical star schema.
31.4 Describe how the fact and dimensional tables of a star schema differ.
31.5 Describe how star, snowflake, and starflake schemas differ.
31.6 The star, snowflake, and starflake schemas offer important advantages in a data warehouse environment. Describe these advantages.
31.7 Describe the main activities associated with each step of the Nine-Step Methodology for data warehouse database design.
31.8 Describe the purpose of assessing the dimensionality of a data warehouse.
31.9 Briefly outline the criteria groups used to assess the dimensionality of a data warehouse.
31.10 Describe how the Oracle Warehouse Builder supports the design of a data warehouse.

Exercises

31.11 Use the Nine-Step Methodology for data warehouse database design to produce dimensional models for the case studies described in Appendix B.
31.12 Use the Nine-Step Methodology for data warehouse database design to produce a dimensional model for all or part of your organization.
Chapter 33
OLAP
Chapter Objectives

In this chapter you will learn:
■ The purpose of Online Analytical Processing (OLAP).
■ The relationship between OLAP and data warehousing.
■ The key features of OLAP applications.
■ The potential benefits associated with successful OLAP applications.
■ How to represent multi-dimensional data.
■ The rules for OLAP tools.
■ The main categories of OLAP tools.
■ OLAP extensions to the SQL standard.
■ How Oracle supports OLAP.
In Chapter 31 we discussed the increasing popularity of data warehousing as a means of gaining competitive advantage. We learnt that data warehouses bring together large volumes of data for the purposes of data analysis. Until recently, access tools for large database systems have provided only limited and relatively simplistic data analysis. However, accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities. There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining. These tools differ in what they offer the user and because of this they are complementary technologies.
A data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining are collectively referred to as Business Intelligence (BI) technologies. In this chapter we describe OLAP and in the following chapter we describe data mining.
Structure of this Chapter

In Section 33.1 we introduce Online Analytical Processing (OLAP) and discuss the relationship between OLAP and data warehousing. In Section 33.2 we describe OLAP applications and identify the key features and potential benefits associated with OLAP applications. In Section 33.3 we discuss how multi-dimensional data can be represented and describe the main concepts associated with multi-dimensional analysis. In Section 33.4 we describe the rules for OLAP tools and highlight the characteristics and issues associated with OLAP tools. In Section 33.5 we discuss how the SQL standard has been extended to include OLAP functions. Finally, in Section 33.6, we describe how Oracle supports OLAP. The examples in this chapter are taken from the DreamHome case study described in Section 10.4 and Appendix A.
33.1 Online Analytical Processing
Over the past few decades, we have witnessed the increasing popularity and prevalence of relational DBMSs such that we now find a significant proportion of corporate data is housed in such systems. Relational databases have been used primarily to support traditional Online Transaction Processing (OLTP) systems. To provide appropriate support for OLTP systems, relational DBMSs have been developed to enable the highly efficient execution of a large number of relatively simple transactions.
In the past few years, relational DBMS vendors have targeted the data warehousing market and have promoted their systems as tools for building data warehouses. As discussed in Chapter 31, a data warehouse stores operational data and is expected to support a wide range of queries from the relatively simple to the highly complex. However, the ability to answer particular queries is dependent on the types of end-user access tools available for use on the data warehouse. General-purpose tools such as reporting and query tools can easily support ‘who?’ and ‘what?’ questions about past events. A typical query submitted directly to a data warehouse is: ‘What was the total revenue for Scotland in the third quarter of 2004?’. In this section we focus on a tool that can support more advanced queries, namely Online Analytical Processing (OLAP).
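The ‘what?’ query quoted above maps directly onto a join between a fact table and its dimension tables. The following sketch shows one possible form under stated assumptions: the star schema (time_dim, region_dim, sales_fact), its columns, and all the figures are invented for illustration, and SQLite is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim   (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE sales_fact (time_id   INTEGER REFERENCES time_dim,
                         region_id INTEGER REFERENCES region_dim,
                         revenue   REAL,
                         PRIMARY KEY (time_id, region_id));
-- Made-up rows, purely for demonstration.
INSERT INTO time_dim   VALUES (1, 2004, 2), (2, 2004, 3);
INSERT INTO region_dim VALUES (1, 'Scotland'), (2, 'England');
INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 1, 250.0), (2, 2, 400.0);
""")

# Dimension attributes act as the constraints; the additive fact is summed.
(total,) = conn.execute("""
    SELECT SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t   ON t.time_id = f.time_id
    JOIN region_dim r ON r.region_id = f.region_id
    WHERE r.country = 'Scotland' AND t.year = 2004 AND t.quarter = 3
""").fetchone()
print(total)
```

A query of this kind answers a fixed question about past events; it does not by itself support the exploratory analysis that motivates OLAP.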
Online Analytical Processing (OLAP): The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
OLAP is a term that describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for the purposes of advanced analysis (Codd et al., 1995). OLAP enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data. OLAP allows the user to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
While OLAP systems can easily answer ‘who?’ and ‘what?’ questions, it is their ability to answer ‘what if?’ and ‘why?’ type questions that distinguishes them from general-purpose