8/10/2019 4 Ways to Optimize PowerCenter Mappings.odt
1/5
4 Ways to Optimize PowerCenter Mappings
Plenty of sources will tell you that this or that method is the best way to optimize an ETL.
Seldom do they point out its downsides, and even more rarely do they mention caveats.
Not today! Let's take a look at four classic ways to improve performance in Informatica
PowerCenter and, more importantly, what needs to be kept in mind when you do.
Offload the Effort
The textbook answer to improving the efficiency of an ETL:
don't let it do the work. By using stored procedures, tailored SQL or pushdown optimization,
you can have the source or target (or both!) databases do the work for you.
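To make the idea concrete outside PowerCenter, here is a minimal Python sketch using the standard sqlite3 module. The orders table, its columns and both function names are hypothetical, chosen only for illustration. It contrasts pulling every row into the client to aggregate there with sending a tailored query so that the database does the aggregation.

```python
import sqlite3

# Hypothetical staging table; every name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0), (2, 1.0)],
)


def totals_in_etl(conn):
    """Pull every row across the connection and aggregate on the ETL side."""
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_offloaded(conn):
    """Let the database aggregate; only one row per customer is transferred."""
    query = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    return dict(conn.execute(query))


etl_side = totals_in_etl(conn)
db_side = totals_offloaded(conn)
```

Both functions return the same totals. The difference is where the work happens and how many rows cross the connection, which is exactly the trade-off discussed in the rest of this section.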
The good
DBMS are good at manipulating their own data. Row for row, you are likely to get better
throughput by running the same query in any DBMS than in any ETL.
Informatica was designed to connect disparate databases, which DBMS tend to flounder at,
but the home advantage will always be with the DBMS, so it is logical to let each component do
what it is best at.
The bad
There is a certain amount of robbing Peter to pay Paul in this approach. Yes, your
mapping will be much faster, mostly because all the work is being done elsewhere. The effort
is still being made, just out of view (don't let it be out of mind).
This requires careful handling, since you can easily make the whole system worse by using this
approach: the transactional database should not be used to perform batch processing, and the
DW is likely needed for its own processes, too. This technique is best used when the database
doing the work is exclusive to the ETL (staging areas and the like).
The ugly
A big advantage of ETLs over SQL is that they present the transformations in a step-by-step
visual way. If you place the complexity in tailored queries or stored procedures, you will lose
this advantage, and make maintenance and future changes much harder; a front-loaded ETL
with all its logic and complexity in the form of a SQL statement will defeat the advantages of
having Informatica in the first place!
You can have your visual cake and eat it too if you can use Pushdown Optimization, but bear
in mind that there is only so much logic that can be "PushDowned", and that it will not save you
from a DBA screaming for your head when your batch ETL sends a three-hour query to his
server.
Sort Out the Problem
It is easy to dismiss the "Sorted Input" checkbox as a
small thing that won't change much. I know I did, and I have seen many newcomers to
PowerCenter make the same mistake.
On the contrary, this checkbox has the potential to solve most of your problems in one wide
sweep. If all your transformations require the same set of keys (or subsets thereof), you
can probably sort once, and then watch as your memory requirements drop and your
throughput spikes.
The good
It really does work. So much so that it is sometimes worth designing around a sorted input,
ensuring that each transformation receives the data ordered in the precise way it needs it.
Even if the overall mapping looks "out of order" (say, by aggregating before joining), if the
order is required by the join (because the aggregation uses a set of key fields, which the join
will subsequently scramble), the ETL will run faster as long as you can keep using the "Sorted
Input" option.
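Why sorted input lowers memory requirements can be shown with a small Python sketch. This is a generic illustration of sorted versus unsorted aggregation, not PowerCenter's actual implementation, and the function names and sample rows are made up. With unsorted rows, every group has to stay in memory until the last row arrives; with sorted rows, a group is complete as soon as the key changes.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Unsorted input: every group is held in memory until the last row."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def aggregate_sorted(rows):
    """Sorted input: each group is emitted as soon as its key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


rows = [("b", 2), ("a", 1), ("c", 5), ("a", 3), ("b", 4)]
sorted_once = sorted(rows, key=itemgetter(0))  # the single up-front sort

unsorted_result = aggregate_unsorted(rows)
sorted_result = list(aggregate_sorted(sorted_once))
```

The results are identical, but the sorted version only ever holds one group at a time, and the same sorted stream can feed any later step that uses the same key.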
The bad
Sorting is expensive. This technique will only work if the same sorting can be used by at least
two transformations, and it is not worth designing around it unless it will be used by three or
four. Do not fall into the trap of ordering before each transformation just so you can check the
box: the transformation will indeed be blazingly fast, but only because the time spent sorting
was moved to the Sorter transformation before it.
The ugly
If you design with sorting as the main motivator, be sure to document it, especially if the
sorting is provided by the source. Nothing baffles more than wondering why rows are sorted in
some arcane manner, only to find that changing it causes the mapping to fail.
Remember that if you are depending on strict ordering, future changes will be more difficult,
since even small changes in ordering can render transformations unusable, and you will only
discover this at execution time. In particular, beware of joins that might alter the sorting
order just by introducing a new field with unexpected values.
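One way to picture the risk is a guard that fails loudly the moment the expected order is violated. The sketch below is plain Python rather than a PowerCenter feature, and check_sorted is a hypothetical helper written for this example. It passes rows through unchanged and raises as soon as a key arrives out of order.

```python
def check_sorted(rows, key):
    """Yield rows unchanged, raising as soon as the key order is violated."""
    previous = None
    for position, row in enumerate(rows):
        current = key(row)
        if position > 0 and current < previous:
            raise ValueError(
                f"row {position} breaks sort order: {current!r} after {previous!r}"
            )
        previous = current
        yield row


rows_ok = [("a", 1), ("a", 3), ("b", 2)]
rows_broken = [("a", 1), ("b", 2), ("a", 3)]  # e.g. after an upstream change

passed = list(check_sorted(rows_ok, key=lambda r: r[0]))

try:
    list(check_sorted(rows_broken, key=lambda r: r[0]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A check like this in a test harness, fed with data that does not look like the development sample, can surface ordering assumptions before they show up in a production run.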
Messing with Memory
"Adjusting the cache" is the ultimate catch-all solution to
almost any computer problem, doubly so for ETLs. No matter how clunky an ETL is, if you can
run the whole thing against cache, it will fly. Give enough memory to a problem and it will
disappear, the saying goes.
The good
PowerCenter now includes a handy cache calculator and memory management.
The best approach is experimental: run a realistic load, then double the cache and try again.
Repeat until the gains become insignificant compared to the increased cache, or until you run
out of available memory (and always remember that yours might not be the only process running!).
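The doubling experiment can be written down as a short loop. In this Python sketch, run_load is a hypothetical callable standing in for "run a realistic load with this cache size and return the elapsed seconds", and the 10% threshold for an "insignificant" gain is an assumption, not a recommendation. simulated_load uses made-up numbers purely so the example runs on its own.

```python
def tune_cache(run_load, start_mb, max_mb, min_gain=0.10):
    """Double the cache until the relative speed-up falls below min_gain.

    run_load(cache_mb) is a hypothetical callable that runs a realistic
    load with the given cache size and returns the elapsed seconds.
    """
    cache_mb = start_mb
    elapsed = run_load(cache_mb)
    while cache_mb * 2 <= max_mb:  # never exceed the available memory
        candidate_elapsed = run_load(cache_mb * 2)
        gain = (elapsed - candidate_elapsed) / elapsed
        if gain < min_gain:
            break  # doubling no longer pays for itself
        cache_mb, elapsed = cache_mb * 2, candidate_elapsed
    return cache_mb, elapsed


def simulated_load(cache_mb, data_mb=1000):
    """Stand-in for a real run: slow while data spills past the cache."""
    spilled_mb = max(0, data_mb - cache_mb)
    return 60 + 0.5 * spilled_mb  # seconds; made-up numbers


best_cache_mb, best_elapsed = tune_cache(simulated_load, start_mb=128, max_mb=4096)
```

In the simulation the loop settles on the first cache size that holds the whole data set, since doubling beyond that point buys nothing. With a real workload, max_mb should leave room for the other processes on the machine.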
The Smallest Master
This approach only applies if you have a join between a
large and cumbersome table and a small and agile one but, perhaps unsurprisingly, that is a
common occurrence in ETLs.
Ensure the small one is selected as Master: the Master is the one loaded in cache, while the
Detail is processed as rows arrive. If the Master doesn't need paging, processing both sides of
the join will be much faster.
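The Master/Detail split maps onto a classic hash join, which the following Python sketch illustrates. It is a generic approximation, not PowerCenter's Joiner internals; the joiner function and the sample data are hypothetical, and the join key is assumed to be the first field of each row. The master side is cached in full, while detail rows stream past it one at a time.

```python
def joiner(master_rows, detail_rows):
    """Cache the master side in full, then stream the detail side past it.

    Assumes the join key is the first field of every row (inner join).
    """
    cache = {}
    for row in master_rows:
        cache.setdefault(row[0], []).append(row)
    for detail in detail_rows:  # never held in memory as a whole
        for master in cache.get(detail[0], []):
            yield master + detail[1:]


# Small table: master. It is the only side that has to fit in the cache.
countries = [("AR", "Argentina"), ("PT", "Portugal")]
# Large table: detail. A generator, so rows are consumed as they arrive.
orders = ((code, n) for n, code in enumerate(["AR", "PT", "AR", "XX"]))

joined = list(joiner(countries, orders))
```

Memory use is driven by the master side alone, so swapping the two arguments would produce the same matches while caching the large table instead of the small one.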
The good
Such a small change, and what a difference it makes. It cuts down your memory use,
throughput jumps, and it works with sorted inputs but does not depend on them. All for a single
selection that doesn't even require adding other transformations, just some meta-knowledge
of the data.
The bad
This approach only works on a narrow band of scenarios (a common one, I'll grant you, but
narrow nonetheless). If your roadblock is an aggregation, or both sides of the join are too big
to cache, this will not help you.
The ugly
I mostly approve of the terminology change from 'Left' and 'Right' to 'Master' and 'Detail' in
joins, given the vertical stacking inside the Joiner transformation, but I do wish they had chosen
a different pair of words.
It is seemingly inevitable that a developer new to PowerCenter will choose the most important
table as Master, given the terminology, when in fact the Master should always be the smallest
of the pair (i.e., the one that requires the least amount of cache).
All of the Above, and a Salad on the Side
You won't find silver bullets anywhere. The best solution is one
that addresses the problem as it is, not as it could be.
Let the staging database take care of sorting for you, but pile the cache on a Sorter
transformation when the source is a database too crucial to bog down.
Consider carefully any design built around sorting, and be wary of any sorting that only works "as
long as the data looks like it does in development". Keep in mind what will be cached (beware
of large lookups) and what will be processed against cached rows: make the former small and
the latter big.
And of course, remember the basics of SQL: filter first. Sort before joining. Aggregate on keys
and indexes. And ignore all the previous advice when the ETL requires it.