kbase paper
ETL Services:
ETL - extract, transform and load - is the set of processes by which data is extracted from numerous databases, applications and systems, transformed as appropriate, and loaded into target systems - including, but not limited to, data warehouses, data marts, analytical applications, etc.
The first part of the extract, transform and load (ETL) process is understanding the data sources. The transformations are organization-specific, and integration is sometimes included in the ETL process because it requires in-depth knowledge of the organization and its business.
More than half of all development work for data warehousing projects is typically dedicated to the design and implementation of ETL processes. Poorly designed ETL processes are costly to maintain, change and update, so it is critical to make the right choices about the technology and tools used for developing and maintaining the ETL processes.
KnowledgeBase consultants have vast experience developing ETL that complies with the Star Schema/Dimensional/Snowflake approach or the Normalized approach, depending on the data warehouse design.
KnowledgeBase consultants have expertise across Extract, Transform & Load (ETL) tools that are effectively used to extract and unite data from disparate sources and deliver meaningful and actionable business intelligence across the organization.
ETL Process Management
KnowledgeBase handles the entire data movement lifecycle – from source systems to the staging area to the data warehouse to the final data mart.
Defining data sources and the relevant mappings
1. Maintenance and monitoring of transformation scripts
2. Creation of new transformation scripts for new business requirements
3. Preparation of batch scripts
Update mapping specifications with metadata
Update monitoring and responsibility processes on source data changes
Ensure accurate and efficient ETL process in production
1. Rectification of a broken ETL process
2. Updating of the ETL process
3. Streamline/eliminate disjointed extraction and transformation programs
4. Validation routines to check for data consistency between source and destination
Performance tuning, error handling and interdependent scheduling
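The validation routines mentioned above can take many forms. As a minimal sketch, assuming both source and destination are reachable as SQL databases, the following compares a row count and a content hash for one table. The `customers` table, its columns and the helper names are hypothetical and only serve the illustration.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, digest) for a table, reading rows in key order."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


def tables_match(source_conn, target_conn, table, key_column):
    """True when both tables hold the same rows once ordered by the key."""
    source_print = table_fingerprint(source_conn, table, key_column)
    target_print = table_fingerprint(target_conn, table, key_column)
    return source_print == target_print


def make_db(rows):
    """Build an in-memory database with a small hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "Ann"), (2, "Bob")])
good_target = make_db([(2, "Bob"), (1, "Ann")])  # same rows, different insert order
bad_target = make_db([(1, "Ann"), (2, "Rob")])   # one value differs
```

Hashing whole rows is simple but reads every row; on large tables a per-partition count or checksum is usually a more practical first check.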
Some of the numerous technological approaches and solutions available on the market include:
1. Traditional engine-based ETL products
2. RDBMS proprietary solutions
3. Third-generation ELT solutions, based on a code-generation approach that uses the power of the RDBMS engines to perform the data transformations
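To make the code-generation idea concrete, here is a small sketch, not any vendor's generator: a mapping from target columns to SQL expressions is turned into an `INSERT ... SELECT` statement, and the database engine itself performs the transformation. The table names, column names and the `generate_elt_sql` helper are assumptions made for the example, and SQLite stands in for the RDBMS.

```python
import sqlite3


def generate_elt_sql(target, source, column_map):
    """Build an INSERT ... SELECT statement from a target-column -> SQL-expression map."""
    target_cols = ", ".join(column_map)
    expressions = ", ".join(column_map.values())
    return f"INSERT INTO {target} ({target_cols}) SELECT {expressions} FROM {source}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, country TEXT, amount_cents INTEGER)")
conn.execute("CREATE TABLE dw_orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, " us ", 1250), (2, "De", 900)],
)

sql = generate_elt_sql(
    "dw_orders",
    "stg_orders",
    {
        "order_id": "order_id",
        "country": "UPPER(TRIM(country))",
        "amount": "amount_cents / 100.0",
    },
)
conn.execute(sql)  # the transformation runs inside the database engine
loaded = conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
```

Because the rows never leave the database, this style avoids a separate transformation server, at the cost of tying the transformations to the SQL dialect of the target engine.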
Some of the key technologies used by KnowledgeBase for ETL are:
1. Microsoft SQL Server 2005 Integration Services
2. Pentaho ETL
3. SAP ETL
4. DataStage
5. Oracle OWB
ETL and Data Warehousing from information management:
ETL Concepts
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into the target database.
The first step in the ETL process is mapping the data between the source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in the staging area. The third step is transforming the cleansed source data and then loading it into the target system.
Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.
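The three steps described above (mapping, cleansing in a staging area, then transforming and loading) can be sketched in a few lines. This is only an illustration under simple assumptions: rows are plain dictionaries, the "warehouse" is an in-memory dictionary, and every field name below is hypothetical.

```python
# Step 1: map source fields to target columns (names are hypothetical).
FIELD_MAP = {"cust_nm": "customer_name", "cntry": "country", "amt": "amount"}


def extract_to_staging(source_rows):
    """Rename mapped source fields to target column names; unmapped fields are dropped."""
    return [{FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP} for row in source_rows]


def cleanse(staging_rows):
    """Step 2: trim text and drop rows whose amount is missing or not numeric."""
    clean = []
    for row in staging_rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        clean.append({
            "customer_name": str(row.get("customer_name", "")).strip(),
            "country": str(row.get("country", "")).strip().upper(),
            "amount": amount,
        })
    return clean


def transform_and_load(clean_rows, target):
    """Step 3: aggregate amounts per country and write the totals into the target dict."""
    for row in clean_rows:
        target[row["country"]] = target.get(row["country"], 0.0) + row["amount"]
    return target


source = [
    {"cust_nm": " Ann ", "cntry": "us", "amt": "10.5"},
    {"cust_nm": "Bob", "cntry": "US ", "amt": "4.5"},
    {"cust_nm": "Cy", "cntry": "de", "amt": "n/a"},  # rejected during cleansing
]
warehouse = transform_and_load(cleanse(extract_to_staging(source)), {})
```

Keeping the three steps as separate functions mirrors the staging-area idea: each intermediate result can be inspected or re-run without touching the source system again.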
Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping: The definition of the relationship and data flow between source and target objects.
Metadata: Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
Staging Area: A place where data is processed before entering the warehouse.
Cleansing: The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation: The process of moving copied or transformed data from a source to a data warehouse.
Target System: A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
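The Metadata entry above mentions that stored schema metadata can be used to generate the scripts that build and populate a warehouse. A minimal sketch of that idea follows; the metadata layout, the `dim_customer` table and both helper functions are hypothetical, and a real repository would hold far richer descriptions.

```python
import sqlite3

# Hypothetical metadata describing one warehouse table.
TABLE_METADATA = {
    "name": "dim_customer",
    "columns": [
        {"name": "customer_key", "type": "INTEGER", "primary_key": True},
        {"name": "customer_name", "type": "TEXT", "primary_key": False},
        {"name": "country_code", "type": "TEXT", "primary_key": False},
    ],
}


def build_create_table(meta):
    """Generate a CREATE TABLE script from the table metadata."""
    parts = []
    for col in meta["columns"]:
        suffix = " PRIMARY KEY" if col["primary_key"] else ""
        parts.append(col["name"] + " " + col["type"] + suffix)
    return "CREATE TABLE " + meta["name"] + " (" + ", ".join(parts) + ")"


def build_insert(meta):
    """Generate a parameterized INSERT script for the same table."""
    names = [col["name"] for col in meta["columns"]]
    placeholders = ", ".join("?" for _ in names)
    return "INSERT INTO " + meta["name"] + " (" + ", ".join(names) + ") VALUES (" + placeholders + ")"


conn = sqlite3.connect(":memory:")
conn.execute(build_create_table(TABLE_METADATA))              # build
conn.execute(build_insert(TABLE_METADATA), (1, "Ann", "US"))  # populate
row = conn.execute("SELECT * FROM dim_customer").fetchone()
```

Since both scripts come from one description, a change to the metadata (for example, a new column) flows into the build and load statements together.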
Informatica
Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software.
The important Informatica components are:
• PowerExchange
• PowerCenter
• PowerCenter Connect
• PowerChannel
• Metadata Exchange
• PowerAnalyzer
• SuperGlue
In Informatica, all the metadata about source systems, target systems and transformations is stored in the Informatica repository. Informatica's PowerCenter Client and Repository Server access this repository to store and retrieve metadata.
Source and Target:
Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source systems such as Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems such as Oracle, SQL Server, Teradata, etc. as its data warehouse target. Many organisations prefer Informatica for this ETL process because Informatica is well suited to designing and building data warehouses. It can connect to several sources and targets to extract metadata from them, then transform and load the data into the target systems.
Guidelines to work with Informatica PowerCenter
• Repository: This is where all the metadata is stored in the Informatica suite. The PowerCenter Client and the Repository Server access this repository to retrieve, store and manage metadata.
• PowerCenter Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.
• Repository Server: The Repository Server takes care of all the connections between the repository and the PowerCenter Client.
• PowerCenter Server: The PowerCenter Server extracts data from the source and then loads it into the targets.
• Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mappings between sources and targets; a mapping is a pictorial representation of the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems, or the metadata can be created in the Designer itself.
• Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. A person's address may not be the same in all source systems because of typos, and the postal code or city name may not match the address. These errors can be corrected by the data cleansing process, and the standardized data can then be loaded into the target systems (data warehouse).
• Transformation: Transformations help to transform the source data according to the requirements of the target system. Sorting, filtering, aggregation and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.
• Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.
• Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each PowerCenter Server.
• PowerCenter Connect: This component helps to extract data and metadata from ERP systems and other third-party applications such as IBM's MQSeries, PeopleSoft, SAP, Siebel, etc.
• PowerCenter Exchange: This component helps to extract data and metadata from ERP systems and other third-party applications such as IBM's MQSeries, PeopleSoft, SAP, Siebel, etc.
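The load-ordering problem described under Workflow Manager (a fact table loaded before the lookup tables it references) amounts to ordering tables by their foreign key dependencies. The sketch below is not Informatica's API; it only illustrates the idea with Python's standard `graphlib` module (Python 3.9 or later), and the table names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical load dependencies: each table lists the tables it references.
DEPENDENCIES = {
    "fact_sales": {"dim_customer", "dim_product"},
    "dim_customer": {"dim_country"},
    "dim_product": set(),
    "dim_country": set(),
}


def load_order(dependencies):
    """Return a load order in which every table comes after the tables it references."""
    return list(TopologicalSorter(dependencies).static_order())


order = load_order(DEPENDENCIES)
```

Loading tables in the returned order means every lookup row exists before a fact row points at it, which is the guarantee a sequential workflow is meant to give.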
PowerExchange: Informatica PowerExchange, as a standalone service or along with PowerCenter, helps organizations leverage data by avoiding manual coding of data extraction programs. PowerExchange supports batch, real-time and changed data capture options on mainframe (DB2, VSAM, IMS, etc.) and midrange (AS400 DB2, etc.) systems, and for relational databases (Oracle, SQL Server, DB2, etc.) and flat files on Unix, Linux and Windows systems.
PowerChannel: This helps to transfer large amounts of encrypted and compressed data over LAN and WAN, through firewalls, transfer files over FTP, etc.
Metadata Exchange: When used with PowerCenter, Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, Sybase PowerDesigner, etc. for developing data models. The functional and technical teams will have spent much time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into PowerCenter to identify source and target mappings, which saves time and effort. There is no need for the Informatica developer to create these data structures again.
PowerAnalyzer: PowerAnalyzer provides organizations with reporting facilities. It makes accessing, analyzing and sharing enterprise data simple and makes that data easily available to decision makers. PowerAnalyzer enables organizations to gain insight into business processes and develop business intelligence.
With PowerAnalyzer, an organization can extract, filter, format and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.
SuperGlue: SuperGlue is used for loading metadata into a centralized place from several sources. Reports can be run against SuperGlue to analyze the metadata.
PowerMart: PowerMart is a departmental version of Informatica for building, deploying and managing data warehouses and data marts. PowerCenter is used for the corporate enterprise data warehouse, and PowerMart is used for departmental data warehouses such as data marts. PowerCenter supports global repositories and networked repositories, and it can be connected to several sources. PowerMart supports a single repository and can be connected to fewer sources than PowerCenter. PowerMart can grow into an enterprise implementation, and its codeless environment supports developer productivity.
Note: This is not a complete tutorial on Informatica. To know more about Informatica, visit its official website, www.informatica.com.
Lbi software:
What is ETL?
ETL, or Extract, Transform and Load, eases the combination of heterogeneous sources into a unified central repository. Usually this repository is a data warehouse or mart which will support enterprise business intelligence.
Extract – read data from multiple source systems into a single format. This process extracts the data from each native system and saves it to one target location. That source data may be any number of database formats, flat files, or document repositories. Usually, the goal is to extract the entire unmodified source system data, though certain checks and filters may be performed here to ensure the data meets an expected layout or to selectively remove data (e.g. potentially confidential information).
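As a rough illustration of the Extract step, the sketch below reads two differently formatted sources into one common shape (a list of dictionaries) and filters out a confidential field on the way. The sample data, the `tax_id` field and the helper names are all hypothetical.

```python
import csv
import io
import json


def extract_csv(text):
    """Read CSV text into a list of dicts, one per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


def drop_fields(rows, confidential):
    """Remove fields that should not leave the source system."""
    return [{k: v for k, v in row.items() if k not in confidential} for row in rows]


# Two hypothetical sources in different formats.
branch_a = "id,name,tax_id\n1,Ann,A-001\n"
branch_b = '[{"id": "2", "name": "Bob", "tax_id": "B-002"}]'

staged = drop_fields(extract_csv(branch_a) + extract_json(branch_b), {"tax_id"})
```

Apart from the dropped field, the rows are left unmodified here, in line with the goal of extracting the source data as-is and deferring real changes to the Transform step.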
Transform – in this step, the data from the various systems is made consistent and linked. Some of the key operations here are:
• Standardization – data is mapped to a consistent set of lookup values (e.g. US, USA, United States and blank/null – all mapped to the standard ISO country code)
• Cleansing – perform validity checks and either remove or modify problem data
• Surrogate keys – new key values applied to similar data from different source systems prevent key collisions in the future and provide a cross reference across these systems
• Transposing – organizes data to optimize reporting. Many source systems are optimized for transactional performance but the warehouse will be primarily used for reporting. Often this involves denormalizing and re-organizing into a dimensional model.
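Two of the operations above, standardization and surrogate keys, are easy to show in miniature. This is a sketch under stated assumptions: the lookup table is a tiny hypothetical sample, and blank/null maps to `US` only because the example in the text does so.

```python
# Hypothetical lookup from raw country values to ISO 3166-1 alpha-2 codes.
# Blank/null maps to "US" here only to mirror the example in the text.
COUNTRY_LOOKUP = {"us": "US", "usa": "US", "united states": "US", "": "US"}


def standardize_country(value):
    """Map a raw country value to the standard code; unknown values return None."""
    key = (value or "").strip().lower()
    return COUNTRY_LOOKUP.get(key)


class SurrogateKeys:
    """Assign one warehouse key per (source_system, natural_key) pair."""

    def __init__(self):
        self._keys = {}

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key for the pair, or allocate the next one."""
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = len(self._keys) + 1
        return self._keys[pair]

    def cross_reference(self):
        """Return {surrogate_key: (source_system, natural_key)}."""
        return {key: pair for pair, key in self._keys.items()}


keys = SurrogateKeys()
crm_key = keys.key_for("crm", 100)
erp_key = keys.key_for("erp", 100)    # same natural key, different system
crm_again = keys.key_for("crm", 100)  # repeated lookups are stable
```

The two systems both use natural key 100, yet each receives its own warehouse key, which is the collision the surrogate keys are meant to prevent; `cross_reference()` gives the mapping back to the source systems.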
Load – the transformed data is now written out to a warehouse/mart. The load process will usually preserve prior data. In some instances existing warehouse data is never removed, just marked as inactive. This provides full auditing and supports historical reporting.
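One way to preserve prior data during a load, sketched here with SQLite, is to mark the previous version of a row inactive instead of deleting or overwriting it. The `dw_customer` table, its columns and the `load_customer` helper are assumptions for the example, not a prescribed design.

```python
import sqlite3


def load_customer(conn, customer_id, name, load_date):
    """Insert a new active version and mark any previous version inactive."""
    current = conn.execute(
        "SELECT name FROM dw_customer WHERE customer_id = ? AND is_active = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == name:
        return  # nothing changed, keep the existing active row
    conn.execute(
        "UPDATE dw_customer SET is_active = 0 WHERE customer_id = ? AND is_active = 1",
        (customer_id,),
    )
    conn.execute(
        "INSERT INTO dw_customer (customer_id, name, loaded_on, is_active) VALUES (?, ?, ?, 1)",
        (customer_id, name, load_date),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_customer (customer_id INTEGER, name TEXT, loaded_on TEXT, is_active INTEGER)"
)
load_customer(conn, 1, "Ann Smith", "2024-01-01")
load_customer(conn, 1, "Ann Jones", "2024-02-01")  # supersedes the first version
load_customer(conn, 1, "Ann Jones", "2024-03-01")  # unchanged, no new row

history = conn.execute(
    "SELECT name, is_active FROM dw_customer ORDER BY loaded_on"
).fetchall()
```

Reports on current data filter on `is_active = 1`, while audits and historical reports can read the full table.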
ETL Tools
There are a number of commercial and open source ETL tools available to assist in any ETL process. Some of the prominent ones are:
• Business Objects Data Integrator
• Informatica PowerCenter
• IBM InfoSphere DataStage
• Oracle Warehouse Builder / Data Integrator
• Microsoft SQL Server Integration Services
• Pentaho Data Integration (Open Source)
• Jasper ETL (Open Source)
These tools provide a number of functions to facilitate the ETL workflow. The variety of source data types is handled automatically. A transformation engine makes it easy to create reusable scripts to handle the data mapping. Scheduling and error handling are also built in.
It is particularly advantageous to use an ETL tool in the following situations:
• When there are many source systems to be integrated
• When source systems are in different formats
• When this process needs to be run repeatedly (e.g. daily, hourly, real time)
• To take advantage of pre-built warehouses/marts. Many of these exist for popular platforms such as PeopleSoft, SAP, JD Edwards.
There are also times where the overhead and cost of setting up an ETL tool might not make sense. In these situations some combination of stored procedures, custom coding and off the shelf packages may make more sense. Scenarios of this type include:
• One time conversion of data
• A limited number of source systems that share key identifiers