4 ways to optimize powercenter mappings.odt

Upload: raffaella-dangelo

Post on 02-Jun-2018


  • 8/10/2019 4 Ways to Optimize PowerCenter Mappings.odt


    4 Ways to Optimize PowerCenter Mappings

    Plenty of sources will tell you that this or that method is the best way to optimize an ETL. Seldom do they point out its downsides, and more rarely still do they mention caveats.

    Not today! Let's take a look at four classic ways to improve performance in Informatica PowerCenter and, more importantly, at what needs to be kept in mind when you use them.

    Offload the Effort

    The textbook answer to improving the efficiency of an ETL: don't let it do the work. By using stored procedures, tailored SQL or pushdown optimization, you can have the source or target (or both!) databases do the work for you.
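    To make the trade-off concrete outside of PowerCenter, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the source. The `sales` table, its rows and the helper names are hypothetical. Both versions return the same totals, but the offloaded one hands only the aggregated rows back to the "ETL" side instead of every source row.

```python
import sqlite3
from collections import defaultdict


def build_source():
    # Hypothetical source table, held in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)],
    )
    return conn


def aggregate_in_etl(conn):
    # ETL-side: every source row is fetched, then summed in the client.
    totals = defaultdict(int)
    fetched = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
        fetched += 1
    return dict(totals), fetched


def aggregate_in_db(conn):
    # Offloaded: the database does the GROUP BY and returns only the totals.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


conn = build_source()
etl_totals, etl_rows = aggregate_in_etl(conn)
db_totals, db_rows = aggregate_in_db(conn)
print(etl_totals == db_totals, etl_rows, db_rows)  # True 5 3
```

    Note that the grouping work has not disappeared; SQLite simply performed it instead of the client.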

    The good

    DBMSs are good at manipulating their own data. Row for row, you are likely to get better throughput by running the same query inside any DBMS than in any ETL.

    Informatica was designed to connect disparate databases, something DBMSs tend to flounder at, but the home advantage will always be on the DBMS side, so it is logical to let each component do what it is best at.

    The bad

    There is a certain amount of robbing Peter to pay Paul in this approach. Yes, your mapping will be much faster, mostly because all the work is being done elsewhere. The work is still being done, just out of view (don't let it be out of mind).

    This requires careful handling, since you can easily make the whole system worse with this approach: the transactional database should not be used to perform batch processing, and the DW is likely needed for its own processes, too. This technique is best used when the database doing the work is exclusive to the ETL (staging areas and the like).

    The ugly

    A big advantage of ETLs over SQL is that they present the transformations in a step-by-step visual way. If you place the complexity in tailored queries or stored procedures, you will lose this advantage and make maintenance and future changes much harder: a front-loaded ETL with all its logic and complexity in the form of an SQL statement defeats the advantages of having Informatica in the first place!

    You can have your visual cake and eat it too if you can use Pushdown Optimization, but bear in mind that there is only so much logic that can be "pushed down", and that it will not save you from a DBA screaming for your head when your batch ETL sends a three-hour query to his server.

    Sort Out the Problem

    It is easy to dismiss the "Sorted Input" checkbox as a small thing that won't change much. I know I did, and I have seen many newcomers to PowerCenter make the same mistake.

    On the contrary, this checkbox has the potential to solve most of your problems in one wide sweep. If all your transformations require the same set of keys (or subsets thereof), you can probably sort once, and then watch as your memory requirements drop and your throughput spikes.
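    The memory effect is easy to see in a minimal Python sketch. It is an analogy, not PowerCenter code, and the rows are made up: an aggregation over unsorted rows must keep every group open until the input ends, while one over rows already sorted by the key can emit each group as soon as the key changes and hold a single running total.

```python
def aggregate_unsorted(rows):
    # No ordering guarantee: every group stays open until the input ends,
    # so the state held grows with the number of distinct keys.
    totals = {}
    peak_open_groups = 0
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
        peak_open_groups = max(peak_open_groups, len(totals))
    return sorted(totals.items()), peak_open_groups


def aggregate_sorted(rows):
    # Input sorted on the key (assumed never None): a group is complete as
    # soon as the key changes, so it is emitted at once and only one running
    # total is held at any time.
    result = []
    current_key, running_total = None, 0
    for key, amount in rows:
        if current_key is not None and key != current_key:
            result.append((current_key, running_total))
            running_total = 0
        current_key = key
        running_total += amount
    if current_key is not None:
        result.append((current_key, running_total))
    return result


rows = [("b", 2), ("a", 1), ("c", 4), ("a", 3), ("b", 5)]
unsorted_result, peak = aggregate_unsorted(rows)
sorted_result = aggregate_sorted(sorted(rows))
print(unsorted_result == sorted_result, peak)  # True 3
```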

    The good

    It really does work. So much so that it is sometimes worth designing around a sorted input, ensuring that each transformation receives the data ordered in the precise way it needs it.

    Even if the overall mapping looks "out of order" (say, by aggregating before joining), the ETL will run faster as long as you can keep using the "Sorted Input" option. Aggregating first pays off when the aggregation needs the data ordered on a set of key fields that the join would subsequently scramble.
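    A minimal Python sketch of sorting once and reusing the order, with invented order and customer data and the simplest case where the aggregation and the join share one key: the detail rows are sorted a single time, the streaming aggregation keeps them in key order, and a merge join consumes that order without sorting again. `sorted_aggregate` and `merge_join` are illustrative helpers, not PowerCenter APIs.

```python
from itertools import groupby
from operator import itemgetter


def sorted_aggregate(rows):
    # Streaming sum per key. Assumes rows arrive sorted by key and
    # emits the totals in that same key order.
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


def merge_join(left, right):
    # Merge join over two inputs sorted on the same key, each key appearing
    # at most once per side (true after the aggregation above and for a
    # master table keyed by id). No sorting happens here.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key == right_key:
            out.append((left_key, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1
        else:
            j += 1
    return out


orders = [(3, 20), (1, 5), (2, 7), (1, 10), (3, 1)]  # (customer_id, amount)
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]      # already sorted by id

orders_sorted = sorted(orders)             # the only sort in the pipeline
totals = sorted_aggregate(orders_sorted)   # still in customer_id order
joined = merge_join(totals, customers)     # reuses that order
print(joined)  # [(1, 15, 'Ann'), (2, 7, 'Bob'), (3, 21, 'Cy')]
```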

    The bad

    Sorting is expensive. This technique will only work if the same sort order can be used by at least two transformations, and it is not worth designing around it unless it will be used by three or four. Do not fall into the trap of sorting before each transformation just so you can check the box: the transformation will indeed be blazingly fast, but only because the time spent sorting was moved to the Sorter transformation before it.


    The ugly

    If you design with sorting as the main motivator, be sure to document it, especially if the sorting is provided by the source. Nothing is more baffling than wondering why rows are sorted in some arcane manner, only to find that changing it causes the mapping to fail.

    Remember that if you are depending on strict ordering, future changes will be more difficult, since even small changes in ordering can render transformations unusable, and you will only discover this at execution time. In particular, beware of joins that might alter the sort order just by introducing a new field with unexpected values.
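    One way to make an order dependency fail loudly rather than silently is an explicit check on the key sequence. The Python sketch below is a hypothetical guard (`require_sorted` and `SortOrderError` are names made up for illustration, not PowerCenter features), and the rows are invented. Like the behaviour described above, it only detects the problem when the out-of-order row actually arrives.

```python
class SortOrderError(Exception):
    """Raised when a row arrives out of the declared sort order."""


def require_sorted(rows, key):
    # Pass rows through unchanged, failing at the first row whose key is
    # smaller than the previous one. Nothing is checked up front: the error
    # only surfaces when the offending row is actually read.
    previous = None
    for position, row in enumerate(rows):
        current = key(row)
        if position > 0 and current < previous:
            raise SortOrderError(f"row {position}: key {current!r} after {previous!r}")
        previous = current
        yield row


dev_rows = [("A", 1), ("B", 2), ("C", 3)]   # looks sorted in development
prod_rows = [("A", 1), ("C", 3), ("B", 2)]  # new data breaks the order

checked = list(require_sorted(dev_rows, key=lambda r: r[0]))

error_message = None
try:
    list(require_sorted(prod_rows, key=lambda r: r[0]))
except SortOrderError as err:
    error_message = str(err)
print(error_message)  # row 2: key 'B' after 'C'
```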

    Messing with Memory

    "Adjusting the cache" is the ultimate catch-all solution to almost any computer problem, doubly so for ETLs. No matter how clunky an ETL is, if you can run the whole thing against cache, it will fly. Give enough memory to a problem and it will disappear, as the saying goes.

    The good

    Recent PowerCenter versions include a handy cache calculator and memory management.

    The best approach is experimental: run a realistic load, then double the cache and try again. Repeat until the gains become insignificant compared to the increased cache, or until you run out of available memory (and always remember that yours might not be the only process running!).
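    The doubling loop can be written down as a minimal Python sketch. `run_load` is a hypothetical callback standing in for "run a realistic load with this much cache and time it" (in practice, a session run); `fake_load` is an invented timing model used only to exercise the loop, and the 5% threshold is an arbitrary assumption rather than a recommendation.

```python
def tune_cache(run_load, start_mb, max_mb, min_gain=0.05):
    """Double the cache until the speed-up is insignificant or memory runs out.

    run_load(cache_mb) is a hypothetical callback: run a realistic load with
    that cache size and return the elapsed time in seconds.
    """
    best_mb, best_time = start_mb, run_load(start_mb)
    history = [(start_mb, best_time)]
    cache_mb = start_mb * 2
    while cache_mb <= max_mb:  # stop when the memory budget is exhausted
        elapsed = run_load(cache_mb)
        history.append((cache_mb, elapsed))
        gain = (best_time - elapsed) / best_time  # relative improvement
        if gain < min_gain:
            break  # doubling the cache no longer pays for itself
        best_mb, best_time = cache_mb, elapsed
        cache_mb *= 2
    return best_mb, history


def fake_load(cache_mb):
    # Invented timing model: faster with more cache, until 800 MB holds
    # the whole working set and extra memory stops helping.
    return 60 + 6400 / min(cache_mb, 800)


best, history = tune_cache(fake_load, start_mb=100, max_mb=4096)
print(best, [mb for mb, _ in history])  # 800 [100, 200, 400, 800, 1600]
```

    Setting `max_mb` to the memory you can actually spare (for example 500) stops the search at 400 MB instead, which is the "you run out of available memory" exit.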

    The Smallest Master

    This approach only applies if you have a join between a large, cumbersome table and a small, agile one but, perhaps unsurprisingly, that is a common occurrence in ETLs.

    Ensure the small one is selected as Master: the Master is the one loaded into cache, while the Detail is processed as rows arrive. If the Master fits in memory without paging to disk, processing both sides of the join will be much faster.
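    To see why the choice matters, here is a minimal in-memory hash join in Python, used as an analogy for the Joiner behaviour described above (it is not PowerCenter's actual implementation, and the country and order data are invented): the Master side is fully cached and the Detail side is streamed past it. Swapping the roles produces the same number of matches but caches 10,000 rows instead of 3.

```python
from collections import defaultdict
from operator import itemgetter


def hash_join(master, detail, master_key, detail_key):
    # Build: the whole Master side is loaded into an in-memory cache.
    cache = defaultdict(list)
    for row in master:
        cache[master_key(row)].append(row)
    cached_rows = sum(len(rows) for rows in cache.values())
    # Probe: Detail rows are matched one at a time as they arrive,
    # so the Detail side is never held in memory as a whole.
    output = []
    for row in detail:
        for match in cache.get(detail_key(row), ()):
            output.append(match + row)
    return output, cached_rows


countries = [("IT", "Italy"), ("FR", "France"), ("DE", "Germany")]
orders = [(n, ("IT", "FR", "DE")[n % 3]) for n in range(10_000)]

small_master, cached_small = hash_join(countries, orders, itemgetter(0), itemgetter(1))
big_master, cached_big = hash_join(orders, countries, itemgetter(1), itemgetter(0))
print(len(small_master), len(big_master), cached_small, cached_big)  # 10000 10000 3 10000
```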

    The good

    Such a small change, and what a difference it makes. It cuts down your memory use, throughput jumps, and it works with sorted inputs but doesn't depend on them. All for a single selection that doesn't even require adding other transformations, just some knowledge of the data.

    The bad

    This approach only works in a narrow band of scenarios (a common one, I'll grant you, but narrow nonetheless). If your roadblock is an aggregation, or both sides of the join are too big to cache, this will not help you.

    The ugly

    I mostly approve of the terminology change from "Left" and "Right" to "Master" and "Detail" in joins, given the vertical stacking inside the Joiner transformation, but I do wish they had chosen a different pair of words.

    It is seemingly inevitable that a developer new to PowerCenter will choose the most important table as Master, given the terminology, when in fact the Master should always be the smaller of the pair (i.e., the one that requires the least cache).
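    The rule boils down to comparing estimated cache sizes, not importance. A minimal Python sketch, assuming a crude estimate of row count times cached row width; the `JoinInput` type, `pick_master`, the table names and the figures are all hypothetical, and real cache sizing (the job of PowerCenter's cache calculator) involves more than this.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    name: str
    row_count: int
    row_bytes: int  # rough width of the columns that would be cached

    def cache_bytes(self) -> int:
        # Crude estimate: ignores index overhead and per-row bookkeeping.
        return self.row_count * self.row_bytes


def pick_master(a: JoinInput, b: JoinInput) -> tuple:
    """Return (master, detail): the side needing less cache becomes Master."""
    return (a, b) if a.cache_bytes() <= b.cache_bytes() else (b, a)


# The "important" fact table is passed first, as a newcomer might choose it.
transactions = JoinInput("transactions", row_count=20_000_000, row_bytes=120)
customers = JoinInput("customers", row_count=50_000, row_bytes=400)

master, detail = pick_master(transactions, customers)
print(master.name, detail.name)  # customers transactions
```

    Including the row width matters: a table with fewer but much wider rows can still be the worse candidate for Master.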


    All of the Above, and a Salad on the Side

    You won't find silver bullets anywhere. The best solution is one that addresses the problem as it is, not as it could be.

    Let the staging database take care of sorting for you, but pile the cache onto a Sorter transformation when the source is a database too crucial to bog down.

    Consider carefully any design built around sorting, and be wary of any sorting that only works "as long as the data looks like it does in development". Keep in mind what will be cached (beware of large lookups) and what will be processed against cached rows: make the former small, and the latter big.

    And of course, remember the basics of SQL: filter first. Sort before joining. Aggregate on keys and indexes. And ignore all the previous advice when the ETL requires it.
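    "Filter first" in a minimal Python sketch, again with invented product and sales data and a toy hash join rather than PowerCenter itself: filtering each side before the join returns exactly the same rows as filtering afterwards, but the join caches a tenth of the lookup rows and probes half as many stream rows.

```python
def hash_join(lookup_rows, stream_rows):
    # lookup_rows: (product_id, status), cached in full.
    # stream_rows: (sale_id, product_id, year), probed against the cache.
    cache = {pid: status for pid, status in lookup_rows}
    joined = [
        (sid, pid, year, cache[pid])
        for sid, pid, year in stream_rows
        if pid in cache
    ]
    return joined, len(cache), len(stream_rows)


products = [(pid, "active" if pid % 10 == 0 else "retired") for pid in range(1_000)]
sales = [(sid, sid % 1_000, 2013 + (sid // 1_000) % 2) for sid in range(10_000)]

# Join everything, then filter the joined rows.
joined_all, cached_late, probed_late = hash_join(products, sales)
filtered_late = [r for r in joined_all if r[2] == 2014 and r[3] == "active"]

# Filter each side first, then join what is left.
filtered_early, cached_early, probed_early = hash_join(
    [p for p in products if p[1] == "active"],
    [s for s in sales if s[2] == 2014],
)
print(filtered_late == filtered_early, cached_late, cached_early, probed_late, probed_early)
# True 1000 100 10000 5000
```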