kbase paper


Upload: aruk1990

Post on 11-Jul-2015


KBASE PAPER

ETL Services:

ETL - extract, transform and load - is the set of processes by which data is extracted from numerous databases, applications and systems, transformed as appropriate, and loaded into target systems - including, but not limited to, data warehouses, data marts, analytical applications, etc.

The first part of the extract, transform and load (ETL) process is understanding the data sources. The transformations are organization-specific, and integration is sometimes included in the ETL process because it requires in-depth knowledge of the organization and its business.

More than half of all development work for data warehousing projects is typically dedicated to the design and implementation of ETL processes. Poorly designed ETL processes are costly to maintain, change and update, so it is critical to make the right choices about the technology and tools that will be used for developing and maintaining the ETL processes.

KnowledgeBase consultants have vast experience developing ETL that complies with the Star Schema/Dimensional/Snowflake approach or the Normalized approach, depending on the data warehouse design. KnowledgeBase consultants have expertise across Extract, Transform & Load (ETL) tools that are effectively used to extract and unite data from disparate sources and deliver meaningful and actionable business intelligence across the organization.

ETL Process Management

KnowledgeBase handles the entire data movement lifecycle – from source systems to the staging area to the data warehouse to the final data mart.

Defining data sources and the relevant mappings
1. Maintenance and monitoring of transformation scripts
2. Creation of new transformation scripts for new business requirements
3. Preparation of batch scripts

Update mapping specifications with metadata

Update monitoring and responsibility processes on source data changes

Ensure accurate and efficient ETL process in production
1. Rectification of a broken ETL process
2. Updating of the ETL process
3. Streamlining or eliminating disjointed extraction and transformation programs
4. Validation routines to check for data consistency between source and destination

Performance tuning, error handling and interdependent scheduling
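As an illustration of the validation routines mentioned in the list above, the following is a minimal sketch, not KnowledgeBase's actual method. It assumes both the source and the destination are reachable as SQLite connections and compares row counts and a checksum; the table name `customers`, the key column `id` and the helper names are all hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, checksum) for a table, reading rows in key order."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


def validate_load(source_conn, target_conn, table, key_column):
    """Compare the source and destination copies of one table."""
    src_count, src_sum = table_fingerprint(source_conn, table, key_column)
    tgt_count, tgt_sum = table_fingerprint(target_conn, table, key_column)
    return {
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


def make_demo_db(rows):
    """Build an in-memory database standing in for a real system (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


source = make_demo_db([(1, "Ann"), (2, "Bob")])
good_target = make_demo_db([(1, "Ann"), (2, "Bob")])
bad_target = make_demo_db([(1, "Ann"), (2, "Rob")])

good_result = validate_load(source, good_target, "customers", "id")
bad_result = validate_load(source, bad_target, "customers", "id")
```

A row-count check alone would miss the changed name in `bad_target`, which is why the sketch also hashes the row contents. For large tables a per-partition checksum would likely be more practical than reading every row into memory.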

Some of the numerous technological approaches and solutions available on the market include:
1. Traditional engine-based ETL products
2. RDBMS proprietary solutions
3. Third-generation ELT solutions, based on a code-generation approach that uses the power of the RDBMS engines to perform the data transformations
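The third approach in the list above (ELT with code generation) can be sketched roughly as follows. This is an illustrative example only: it uses Python's built-in `sqlite3` module as a stand-in for the RDBMS engine, and the table names `stg_sales` and `fact_sales` and the function `build_transform_sql` are assumptions, not part of any product named in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in a staging table without transformation.
conn.execute("CREATE TABLE stg_sales (region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?)",
    [(" north ", "10.50"), ("NORTH", "4.50"), ("south", "7.00")],
)
conn.execute("CREATE TABLE fact_sales (region TEXT, total_amount REAL)")


def build_transform_sql(source_table, target_table):
    """Generate the SQL that the database engine itself will execute."""
    return (
        f"INSERT INTO {target_table} (region, total_amount) "
        f"SELECT UPPER(TRIM(region)), SUM(CAST(amount AS REAL)) "
        f"FROM {source_table} GROUP BY UPPER(TRIM(region))"
    )


# Transform: the cleanup and aggregation run inside the database,
# not in a separate transformation engine.
conn.execute(build_transform_sql("stg_sales", "fact_sales"))
result = conn.execute(
    "SELECT region, total_amount FROM fact_sales ORDER BY region"
).fetchall()
```

The point of the design is that only a SQL string is produced outside the database; trimming, casting and grouping all happen in the engine, next to the data.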

Some of the key technologies used by KnowledgeBase for ETL are:
1. Microsoft SQL Server 2005 Integration Services
2. Pentaho ETL
3. SAP ETL
4. DataStage
5. Oracle OWB

ETL and Data Warehousing from information management:

ETL Concepts

Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database.

The first step in the ETL process is mapping the data between the source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in the staging area. The third step is transforming the cleansed source data and then loading it into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.
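The three steps described above (mapping, cleansing in a staging area, then transforming and loading) can be sketched as below. This is a minimal illustration under stated assumptions: the column names, the sample rows and the use of a plain list as the "warehouse" are all hypothetical.

```python
# Step 1: mapping from source column names to target column names.
COLUMN_MAPPING = {"cust_nm": "customer_name", "amt": "amount", "ctry": "country"}

source_rows = [
    {"cust_nm": " Alice ", "amt": "120.5", "ctry": "us"},
    {"cust_nm": "", "amt": "n/a", "ctry": "de"},  # fails cleansing
    {"cust_nm": "Bob", "amt": "80", "ctry": "DE"},
]


def cleanse(rows):
    """Step 2: in staging, drop rows with a missing name or non-numeric amount."""
    staged = []
    for row in rows:
        name = row["cust_nm"].strip()
        try:
            float(row["amt"])
        except ValueError:
            continue
        if name:
            staged.append({**row, "cust_nm": name})
    return staged


def transform(rows):
    """Step 3a: rename columns per the mapping and convert types."""
    out = []
    for row in rows:
        target = {COLUMN_MAPPING[col]: value for col, value in row.items()}
        target["amount"] = float(target["amount"])
        target["country"] = target["country"].upper()
        out.append(target)
    return out


def load(rows, target_table):
    """Step 3b: append rows to the target (a list stands in for a table)."""
    target_table.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(cleanse(source_rows)), warehouse)
```

Keeping the mapping as data rather than code mirrors the way ETL tools store mappings as metadata, which is discussed later in this paper.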

Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping: The definition of the relationship and data flow between source and target objects.

Metadata: Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area: A place where data is processed before entering the warehouse.

Cleansing: The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation: The process of moving copied or transformed data from a source to a data warehouse.

Target System: A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
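The Metadata entry above says that schema metadata is used to generate the scripts that build and populate the warehouse. A rough sketch of that idea follows; the metadata layout, the table `dim_customer` and both generator functions are hypothetical, and real repositories store far richer metadata than this.

```python
import sqlite3

# Hypothetical metadata describing one warehouse table.
TABLE_METADATA = {
    "name": "dim_customer",
    "columns": [
        {"name": "customer_key", "type": "INTEGER", "primary_key": True},
        {"name": "customer_name", "type": "TEXT", "primary_key": False},
        {"name": "country_code", "type": "TEXT", "primary_key": False},
    ],
}


def generate_create_script(meta):
    """Generate the script that builds the table."""
    parts = []
    for col in meta["columns"]:
        suffix = " PRIMARY KEY" if col["primary_key"] else ""
        parts.append(f'{col["name"]} {col["type"]}{suffix}')
    return f'CREATE TABLE {meta["name"]} ({", ".join(parts)})'


def generate_insert_script(meta):
    """Generate the parameterized script that populates the table."""
    names = [col["name"] for col in meta["columns"]]
    placeholders = ", ".join("?" for _ in names)
    return f'INSERT INTO {meta["name"]} ({", ".join(names)}) VALUES ({placeholders})'


create_sql = generate_create_script(TABLE_METADATA)
insert_sql = generate_insert_script(TABLE_METADATA)

# Run the generated scripts against an in-memory database to show they work.
conn = sqlite3.connect(":memory:")
conn.execute(create_sql)
conn.execute(insert_sql, (1, "Alice", "US"))
row_count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```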

Informatica

Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software. The important Informatica components are:

•Power Exchange
•Power Center
•Power Center Connect
•Power Channel
•Metadata Exchange
•Power Analyzer
•Super Glue

In Informatica, all the metadata information about source systems, target systems and transformations is stored in the Informatica repository. Informatica's Power Center Client and Repository Server access this repository to store and retrieve metadata.


Source and Target: Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source systems such as Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its management decisions, it may choose one or more systems such as Oracle, SQL Server, Teradata, etc. as its data warehouse target. Many organisations prefer Informatica for that ETL process because Informatica is powerful in designing and building data warehouses. It can connect to several sources and targets to extract metadata from them, and then transform and load the data into the target systems.

Guidelines to work with Informatica Power Center

Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store and manage metadata.

Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, running workflows, etc.

Repository Server: The Repository Server takes care of all the connections between the repository and the Power Center Client.

Power Center Server: The Power Center Server extracts data from the source and then loads it into the targets.

Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems; alternatively, the metadata can be created in the Designer itself.

Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming and standardizing address data. A person's address may not be the same in all source systems because of typos, and the postal code or city name may not match the address. These errors can be corrected by the data cleansing process, and the standardized data can then be loaded into the target systems (data warehouse).
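The kind of address cleansing described above can be illustrated with a small, tool-independent sketch. It does not use or represent PowerCenter's own cleansing technology; the reference table `POSTAL_CODE_TO_CITY`, the abbreviation list and the five-digit postal code rule are all assumptions made for the example.

```python
import re

# Hypothetical reference data: postal code -> canonical city name.
POSTAL_CODE_TO_CITY = {"10001": "New York", "94105": "San Francisco"}

# Hypothetical abbreviation table for street types.
STREET_ABBREVIATIONS = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}


def standardize_address(record):
    """Return a cleaned copy of the record plus a list of issues found."""
    issues = []

    # Standardize: expand street abbreviations and collapse extra spaces.
    words = record["street"].split()
    street = " ".join(STREET_ABBREVIATIONS.get(w.lower(), w) for w in words)

    # Validate: assume a five-digit postal code format.
    postal_code = record["postal_code"].strip()
    if not re.fullmatch(r"\d{5}", postal_code):
        issues.append("invalid postal code")

    # Correct: make the city name agree with the postal code when known.
    city = record["city"].strip().title()
    expected_city = POSTAL_CODE_TO_CITY.get(postal_code)
    if expected_city and expected_city != city:
        issues.append(f"city corrected from {city!r} to {expected_city!r}")
        city = expected_city

    return {"street": street, "city": city, "postal_code": postal_code}, issues


cleaned, issues = standardize_address(
    {"street": "12 Main st.", "city": "new yrok", "postal_code": " 10001 "}
)
```

Returning the list of issues alongside the cleaned record lets the ETL process log what was changed instead of silently rewriting source data.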

Transformation: Transformations help to transform the source data according to the requirements of the target system. Sorting, filtering, aggregation and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.
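The four transformation types named above can be shown on a tiny data set. This is a plain-Python sketch rather than Informatica transformation objects, and the order and customer records are invented for illustration.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50.0, "status": "shipped"},
    {"order_id": 2, "customer_id": 11, "amount": 20.0, "status": "cancelled"},
    {"order_id": 3, "customer_id": 10, "amount": 30.0, "status": "shipped"},
    {"order_id": 4, "customer_id": 11, "amount": 5.0, "status": "shipped"},
]
customers = {10: "Alice", 11: "Bob"}  # lookup data keyed by customer_id

# Filtering: keep only shipped orders.
shipped = [o for o in orders if o["status"] == "shipped"]

# Joining: attach the customer name from the lookup data.
joined = [{**o, "customer_name": customers[o["customer_id"]]} for o in shipped]

# Aggregation: total amount per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["customer_name"]] += row["amount"]

# Sorting: largest total first.
report = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```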

Workflow Manager: A workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, the target system will raise an error because the fact table violates the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.
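The foreign key problem described above, and the way a dependency-aware load order avoids it, can be reproduced in a few lines. This sketch uses SQLite and the standard-library `graphlib` module (Python 3.9 or later) purely for illustration; it is not how Workflow Manager is implemented, and the table names and the `DEPENDENCIES` structure are assumptions.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, "
    "customer_key INTEGER REFERENCES dim_customer(customer_key), amount REAL)"
)

# Hypothetical load tasks: table -> (insert statement, rows).
LOADS = {
    "dim_customer": ("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Alice")]),
    "fact_sales": ("INSERT INTO fact_sales VALUES (?, ?, ?)", [(100, 1, 25.0)]),
}
# Each table maps to the set of tables that must be loaded before it.
DEPENDENCIES = {"dim_customer": set(), "fact_sales": {"dim_customer"}}

# Loading the fact table first violates the foreign key.
try:
    conn.executemany(*LOADS["fact_sales"])
    wrong_order_failed = False
except sqlite3.IntegrityError:
    wrong_order_failed = True
conn.rollback()

# A workflow derives a safe order from the dependencies instead.
load_order = list(TopologicalSorter(DEPENDENCIES).static_order())
for table in load_order:
    conn.executemany(*LOADS[table])
conn.commit()

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Declaring dependencies and deriving the order from them scales better than hand-maintaining a fixed sequence once there are many interdependent tables.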

Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.

Power Center Connect: This component helps to extract data and metadata from ERP and other enterprise systems such as IBM's MQSeries, PeopleSoft, SAP and Siebel, and from other third-party applications.

Power Center Exchange: This component helps to extract data and metadata from ERP and other enterprise systems such as IBM's MQSeries, PeopleSoft, SAP and Siebel, and from other third-party applications.

Power Exchange: Informatica Power Exchange, as a standalone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time and changed data capture options on mainframe (DB2, VSAM, IMS, etc.), midrange (AS400 DB2, etc.), relational databases (Oracle, SQL Server, DB2, etc.) and flat files on Unix, Linux and Windows systems.

Power Channel: This helps to transfer large amounts of encrypted and compressed data over LAN and WAN, through firewalls, to transfer files over FTP, etc.

Meta Data Exchange: Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment when used with Power Center. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, Sybase PowerDesigner, etc. for developing data models. The functional and technical teams will have spent much time and effort in creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which saves time and effort. There is no need for an Informatica developer to create these data structures once again.

Power Analyzer: Power Analyzer provides organizations with reporting facilities. It makes accessing, analyzing and sharing enterprise data simple and makes that data easily available to decision makers. Power Analyzer enables an organization to gain insight into business processes and develop business intelligence. With Power Analyzer, an organization can extract, filter, format and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. Power Analyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.

Super Glue: Super Glue is used for loading metadata from several sources into a centralized place. Reports can be run against Super Glue to analyze metadata.

Power Mart: Power Mart is a departmental version of Informatica for building, deploying and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such as data marts. Power Center supports global repositories and networked repositories, and it can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow extensibly to an enterprise implementation, and its codeless environment aids developer productivity.

Note: This is not a complete tutorial on Informatica. We will add more tips and guidelines on Informatica in the near future. Please visit us soon to check back. To know more about Informatica, visit its official website, www.informatica.com.

Lbi software:

What is ETL?

ETL, or Extract, Transform and Load, eases the combination of heterogeneous sources into a unified central repository. Usually this repository is a data warehouse or mart which will support enterprise business intelligence.

Extract – read data from multiple source systems into a single format. This process extracts the data from each native system and saves it to one target location. That source data may be any number of database formats, flat files, or document repositories. Usually, the goal is to extract the entire unmodified source system data, though certain checks and filters may be performed here to ensure the data meets an expected layout or to selectively remove data (e.g. potentially confidential information).
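The layout check and selective removal mentioned above might look like the following in a hand-written extract step. This is a sketch under stated assumptions: the source is a CSV flat file, and the expected column list and the choice of `ssn` as the confidential column are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "name", "ssn", "country"]  # hypothetical expected layout
CONFIDENTIAL_COLUMNS = {"ssn"}  # hypothetical columns to drop at extract time


def extract(csv_text):
    """Read a flat file, verify its layout, and strip confidential columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected layout: {reader.fieldnames}")
    return [
        {key: value for key, value in row.items() if key not in CONFIDENTIAL_COLUMNS}
        for row in reader
    ]


SOURCE_FILE = "id,name,ssn,country\n1,Alice,123-45-6789,US\n2,Bob,987-65-4321,DE\n"
extracted = extract(SOURCE_FILE)
```

Apart from dropping the confidential column, the values are left untouched (still strings), in keeping with the goal of extracting unmodified source data and deferring conversions to the transform step.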

Transform – in this step, the data from the various systems is made consistent and linked. Some of the key operations here are:

•Standardization – data is mapped to a consistent set of lookup values (e.g. US, USA, United States and blank/null – all mapped to the standard ISO country code)
•Cleansing – perform validity checks and either remove or modify problem data
•Surrogate keys – new key values applied to similar data from different source systems prevent key collisions in the future and provide a cross reference across these systems
•Transposing – organizes data to optimize reporting. Many source systems are optimized for transactional performance but the warehouse will be primarily used for reporting. Often this involves denormalizing and re-organizing into a dimensional model.
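Two of the operations listed above, standardization and surrogate keys, can be sketched together. The lookup table, the default used for blank values and the `SurrogateKeyGenerator` class are illustrative assumptions; treating a blank country as "US" is an organization-specific rule taken from the example in the list, not a general recommendation.

```python
import itertools

# Hypothetical lookup of source spellings to ISO 3166-1 alpha-2 codes.
COUNTRY_LOOKUP = {"us": "US", "usa": "US", "united states": "US"}


def standardize_country(value, default="US"):
    """Map a free-form country value to its standard code."""
    key = (value or "").strip().lower()
    if not key:
        return default  # blank/null falls back to the assumed default
    return COUNTRY_LOOKUP[key]  # raises KeyError for unmapped values


class SurrogateKeyGenerator:
    """Assign warehouse keys and keep a cross reference to source keys."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self.cross_reference = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        ref = (source_system, natural_key)
        if ref not in self.cross_reference:
            self.cross_reference[ref] = next(self._next_key)
        return self.cross_reference[ref]


keys = SurrogateKeyGenerator()
rows = [
    ("crm", 7, "USA"),
    ("billing", 7, "United States"),  # same natural key, different system
    ("crm", 7, None),  # same record seen again, country missing
]
warehouse_rows = [
    {
        "customer_key": keys.key_for(system, natural_key),
        "country": standardize_country(country),
    }
    for system, natural_key, country in rows
]
```

Natural key 7 exists in both source systems, so without surrogate keys the two records would collide in the warehouse; the cross reference keeps them distinct while still allowing a lookup back to the originating system.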

Load – the transformed data is now written out to a warehouse/mart. The load process will usually preserve prior data. In some instances existing warehouse data is never removed, just marked as inactive. This provides full auditing and supports historical reporting.
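A load that marks old rows inactive instead of deleting them could be sketched as follows. The table layout, the `is_active` flag and the function `load_customer` are assumptions chosen for the example; SQLite stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "row_id INTEGER PRIMARY KEY AUTOINCREMENT, customer_id INTEGER, "
    "city TEXT, is_active INTEGER, loaded_on TEXT)"
)


def load_customer(conn, customer_id, city, load_date):
    """Insert a new version of a customer row, keeping older versions."""
    current = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? AND is_active = 1",
        (customer_id,),
    ).fetchone()
    if current and current[0] == city:
        return  # nothing changed, so keep the current row
    # Mark the previous version inactive rather than deleting it.
    conn.execute(
        "UPDATE dim_customer SET is_active = 0 "
        "WHERE customer_id = ? AND is_active = 1",
        (customer_id,),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, is_active, loaded_on) "
        "VALUES (?, ?, 1, ?)",
        (customer_id, city, load_date),
    )


load_customer(conn, 1, "Boston", "2015-01-01")
load_customer(conn, 1, "Chicago", "2015-06-01")
load_customer(conn, 1, "Chicago", "2015-07-01")  # unchanged, no new row
history = conn.execute(
    "SELECT city, is_active FROM dim_customer ORDER BY row_id"
).fetchall()
```

Because the Boston row survives with `is_active = 0`, a report can still answer where the customer was located before the change, which is the historical reporting benefit described above.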

ETL Tools

There are a number of commercial and open source ETL tools available to assist in any ETL process. Some of the prominent ones are:

•Business Objects Data Integrator
•Informatica PowerCenter
•IBM InfoSphere DataStage
•Oracle Warehouse Builder / Data Integrator
•Microsoft SQL Server Integration Services
•Pentaho Data Integration (Open Source)
•Jasper ETL (Open Source)

These tools provide a number of functions to facilitate the ETL workflow. The variety of source data types is handled automatically. A transformation engine makes it easy to create reusable scripts to handle the data mapping. Scheduling and error handling are also built in.

It is particularly advantageous to use an ETL tool in the following situations:

•When there are many source systems to be integrated
•When source systems are in different formats
•When this process needs to be run repeatedly (e.g. daily, hourly, real time)
•To take advantage of pre-built warehouses/marts. Many of these exist for popular platforms such as PeopleSoft, SAP, JD Edwards.

There are also times where the overhead and cost of setting up an ETL tool might not make sense. In these situations some combination of stored procedures, custom coding and off the shelf packages may make more sense. Scenarios of this type include:

•One time conversion of data
•A limited number of source systems that share key identifiers