SAP Data Services 4.x Cookbook


SAP Data Services 4.x Cookbook

Table of Contents

SAP Data Services 4.x Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
    Why subscribe?
    Free access for Packt account holders
    Instant updates on new Packt books
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Sections
    Getting ready
    How to do it…
    How it works…
    There's more…
    See also
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Downloading the color images of this book
    Errata
    Piracy
  Questions
1. Introduction to ETL Development
  Introduction
  Preparing a database environment (Getting ready · How to do it… · How it works…)
  Creating a source system database (How to do it… · How it works… · There's more…)
  Defining and creating staging area structures (How to do it… · Flat files · RDBMS tables · How it works…)
  Creating a target data warehouse (Getting ready · How to do it… · How it works… · There's more…)
2. Configuring the Data Services Environment
  Introduction
  Creating IPS and Data Services repositories (Getting ready · How to do it… · How it works… · See also)
  Installing and configuring Information Platform Services (Getting ready · How to do it… · How it works…)
  Installing and configuring Data Services (Getting ready · How to do it… · How it works…)
  Configuring user access (Getting ready · How to do it… · How it works…)
  Starting and stopping services (How to do it… · How it works… · See also)
  Administering tasks (How to do it… · How it works… · See also)
  Understanding the Designer tool (Getting ready · How to do it… · How it works…)
  Executing ETL code in Data Services (Validating ETL code · Template tables · Query transform basics · The Hello World example)
3. Data Services Basics – Data Types, Scripting Language, and Functions
  Introduction
  Creating variables and parameters (Getting ready · How to do it… · How it works… · There's more…)
  Creating a script (How to do it… · How it works…)
  Using string functions (How to do it… · Using string functions in the script · How it works… · There's more…)
  Using date functions (How to do it… · Generating current date and time · Extracting parts from dates · How it works… · There's more…)
  Using conversion functions (How to do it… · How it works… · There's more…)
  Using database functions (How to do it… · key_generation() · total_rows() · sql() · How it works…)
  Using aggregate functions (How to do it… · How it works…)
  Using math functions (How to do it… · How it works… · There's more…)
  Using miscellaneous functions (How to do it… · How it works…)
  Creating custom functions (How to do it… · How it works… · There's more…)
4. Dataflow – Extract, Transform, and Load
  Introduction
  Creating a source data object (How to do it… · How it works… · There's more…)
  Creating a target data object (Getting ready · How to do it… · How it works… · There's more…)
  Loading data into a flat file (How to do it… · How it works… · There's more…)
  Loading data from a flat file (How to do it… · How it works… · There's more…)
  Loading data from table to table – lookups and joins (How to do it… · How it works…)
  Using the Map_Operation transform (How to do it… · How it works…)
  Using the Table_Comparison transform (Getting ready · How to do it… · How it works…)
  Exploring the Auto correct load option (Getting ready · How to do it… · How it works…)
  Splitting the flow of data with the Case transform (Getting ready · How to do it… · How it works…)
  Monitoring and analyzing dataflow execution (Getting ready · How to do it… · How it works… · There's more…)
5. Workflow – Controlling Execution Order
  Introduction
  Creating a workflow object (How to do it… · How it works…)
  Nesting workflows to control the execution order (Getting ready · How to do it… · How it works…)
  Using conditional and while loop objects to control the execution order (Getting ready · How to do it… · How it works… · There is more…)
  Using the bypassing feature (Getting ready · How to do it… · How it works… · There is more…)
  Controlling failures – try-catch objects (How to do it… · How it works…)
  Use case example – populating dimension tables (Getting ready · How to do it… · How it works… · Mapping · Dependencies · Development · Execution order · Testing ETL · Preparing test data to populate DimSalesTerritory · Preparing test data to populate DimGeography)
  Using a continuous workflow (How to do it… · How it works… · There is more…)
  Peeking inside the repository – parent-child relationships between Data Services objects (Getting ready · How to do it… · How it works… · Get a list of object types and their codes in the Data Services repository · Display information about the DF_Transform_DimGeography dataflow · Display information about the SalesTerritory table object · See the contents of the script object)
6. Job – Building the ETL Architecture
  Introduction
  Projects and jobs – organizing ETL (Getting ready · How to do it… · How it works… · Hierarchical object view · History execution log files · Executing/scheduling jobs from the Management Console)
  Using object replication (How to do it… · How it works…)
  Migrating ETL code through the central repository (Getting ready · How to do it… · How it works… · Adding objects to and from the Central Object Library · Comparing objects between the Local and Central repositories · There is more…)
  Migrating ETL code with export/import (Getting ready · How to do it… · Import/Export using ATL files · Direct export to another local repository · How it works…)
  Debugging job execution (Getting ready · How to do it… · How it works…)
  Monitoring job execution (Getting ready · How to do it… · How it works…)
  Building an external ETL audit and audit reporting (Getting ready · How to do it… · How it works…)
  Using built-in Data Services ETL audit and reporting functionality (Getting ready · How to do it… · How it works…)
  Auto Documentation in Data Services (How to do it… · How it works…)
7. Validating and Cleansing Data
  Introduction
  Creating validation functions (Getting ready · How to do it… · How it works…)
  Using validation functions with the Validation transform (Getting ready · How to do it… · How it works…)
  Reporting data validation results (Getting ready · How to do it… · How it works…)
  Using regular expression support to validate data (Getting ready · How to do it… · How it works…)
  Enabling dataflow audit (Getting ready · How to do it… · How it works… · There's more…)
  Data Quality transforms – cleansing your data (Getting ready · How to do it… · How it works… · There's more…)
8. Optimizing ETL Performance
  Introduction
  Optimizing dataflow execution – push-down techniques (Getting ready · How to do it… · How it works…)
  Optimizing dataflow execution – the SQL transform (How to do it… · How it works…)
  Optimizing dataflow execution – the Data_Transfer transform (Getting ready · How to do it… · How it works… · Why we used a second Data_Transfer transform object · When to use the Data_Transfer transform · There's more…)
  Optimizing dataflow readers – lookup methods (Getting ready · How to do it… · Lookup with the Query transform join · Lookup with the lookup_ext() function · Lookup with the sql() function · How it works… · Query transform joins · lookup_ext() · sql() · Performance review)
  Optimizing dataflow loaders – bulk-loading methods (How to do it… · How it works… · When to enable bulk loading?)
  Optimizing dataflow execution – performance options (Getting ready · How to do it… · Dataflow performance options · Source table performance options · Query transform performance options · lookup_ext() performance options · Target table performance options)
9. Advanced Design Techniques
  Introduction
  Change Data Capture techniques (Getting ready · No history SCD (Type 1) · Limited history SCD (Type 3) · Unlimited history SCD (Type 2) · How to do it… · How it works… · Source-based ETL CDC · Target-based ETL CDC · Native CDC)
  Automatic job recovery in Data Services (Getting ready · How to do it… · How it works… · There's more…)
  Simplifying ETL execution with system configurations (Getting ready · How to do it… · How it works…)
  Transforming data with the Pivot transform (Getting ready · How to do it… · How it works…)
10. Developing Real-time Jobs
  Introduction
  Working with nested structures (Getting ready · How to do it… · How it works… · There is more…)
  The XML_Map transform (Getting ready · How to do it… · How it works…)
  The Hierarchy_Flattening transform (Getting ready · How to do it… · Horizontal hierarchy flattening · Vertical hierarchy flattening · How it works… · Querying result tables)
  Configuring Access Server (Getting ready · How to do it… · How it works…)
  Creating real-time jobs (Getting ready · Installing SoapUI · How to do it… · How it works…)
11. Working with SAP Applications
  Introduction
  Loading data into SAP ERP (Getting ready · How to do it… · How it works… · IDoc · Monitoring IDoc load on the SAP side · Post-load validation of loaded data · There is more…)
12. Introduction to Information Steward
  Introduction
  Exploring Data Insight capabilities (Getting ready · How to do it… · Creating a connection object · Profiling the data · Viewing profiling results · Creating a validation rule · Creating a scorecard · How it works… · Profiling · Rules · Scorecards · There is more…)
  Performing Metadata Management tasks (Getting ready · How to do it… · How it works…)
  Working with the Metapedia functionality (How to do it… · How it works…)
  Creating a custom cleansing package with Cleansing Package Builder (Getting ready · How to do it… · How it works… · There is more…)
Index

SAP Data Services 4.x Cookbook

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2015

Production reference: 1261115

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78217-656-5

www.packtpub.com

Credits

Author

Ivan Shomnikov

Reviewers

Andrés Aguado Aranda

Dick Groenhof

Bernard Timbal Duclaux de Martin

Sridhar Sunkaraneni

Meenakshi Verma

Commissioning Editor

Vinay Argekar

Acquisition Editors

Shaon Basu

Kevin Colaco

Content Development Editor

Merint Mathew

Technical Editor

Humera Shaikh

Copy Editors

Brandt D'mello

Shruti Iyer

Karuna Narayanan

Sameen Siddiqui

Project Coordinator

Francina Pinto

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Nilesh Mohite

Cover Work

Nilesh Mohite

About the Author

Ivan Shomnikov is an SAP analytics consultant specializing in the area of Extract, Transform, and Load (ETL). He has in-depth knowledge of the data warehouse lifecycle processes (DWH design and ETL development) and extensive hands-on experience with both the SAP Enterprise Information Management (Data Services) technology stack and the SAP BusinessObjects reporting products stack (Web Intelligence, Designer, Dashboards).

Ivan has been involved in the implementation of complex BI solutions on the SAP BusinessObjects Enterprise platform in major New Zealand companies across different industries. He also has a strong background as an Oracle database administrator and developer.

This is my first experience of writing a book, and I would like to thank my partner and my son for their patience and support.

About the Reviewers

Andrés Aguado Aranda is a 26-year-old computer engineer from Spain. His experience has given him a solid technical background in databases, data warehousing, and business intelligence.

Andrés has worked in data-related positions since 2012, in different business sectors such as banking, public administration, and energy.

This book is my first stint as a reviewer, and it has been really interesting and valuable to me, both personally and professionally.

I would like to thank my family and friends for always being willing to help me when I needed it. I would also like to thank my former coworker and current friend, Antonio Martín-Cobos, a BI reporting analyst who really helped me get this opportunity.

Dick Groenhof started his professional career in 1990 after finishing his studies in business information science at Vrije Universiteit Amsterdam. Having worked as a software developer and service management consultant for the first part of his career, he has been active as a consultant in the business intelligence arena since 2005.

Dick has been a lead consultant on numerous SAP BI projects, designing and implementing successful solutions for his customers, who regard him as a trusted advisor. His core competences include both frontend (such as Web Intelligence, Crystal Reports, and SAP Design Studio) and backend tools (such as SAP Data Services and Information Steward). Dick is an early adopter of the SAP HANA platform, creating innovative solutions using HANA Information Views, the Predictive Analysis Library, and SQLScript.

He is a Certified Application Associate in SAP HANA and SAP BusinessObjects Web Intelligence 4.1. Currently, Dick works as a senior HANA and big data consultant for a highly respected and innovative SAP partner in the Netherlands.

He is a strong believer in sharing his knowledge of SAP HANA and SAP Data Services by writing blogs (at http://www.dickgroenhof.com and http://www.thenextview.nl/blog) and speaking at seminars.

Dick is happily married to Emma and is a very proud father of his son, Christiaan, and daughter, Myrthe.

Bernard Timbal Duclaux de Martin is a business intelligence architect and technical expert with more than 15 years of experience. He has been involved in several large business intelligence system deployments and administration projects in banking and insurance companies. In addition, Bernard has skills in modeling, data extraction, transformation, loading, and reporting design. He has authored four books, including two regarding SAP BusinessObjects Enterprise administration.

Meenakshi Verma has been a part of the IT industry since 1998. She is an experienced business systems specialist holding the CBAP and TOGAF certifications. Meenakshi is well-versed with a variety of tools and techniques used for business analysis, such as SAP BI, SAP BusinessObjects, Java/J2EE technologies, and others. She is currently based in Toronto, Canada, and works with a leading utility company.

Meenakshi has helped technically review many books published by Packt Publishing across various enterprise solutions. Her earlier works include JasperReports for Java Developers, Java EE 5 Development using GlassFish Application Server, Practical Data Analysis and Reporting with BIRT, EJB 3 Developer Guide, Learning Dojo, and IBM WebSphere Application Server 8.0 Administration Guide.

I'd like to thank my father, Mr. Bhopal Singh, and mother, Mrs. Raj Bala, for laying a strong foundation in me and giving me their unconditional love and support. I also owe thanks and gratitude to my husband, Atul Verma, for his encouragement and support throughout the reviewing of this book and many others; my ten-year-old son, Prieyaansh Verma, for giving me the warmth of his love despite my hectic schedules; and my brother, Sachin Singh, for always being there for me.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Instant updates on new Packt books

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.

Preface

SAP Data Services delivers an enterprise-class solution to build data integration processes and perform data quality and data profiling tasks, allowing you to govern your data in a highly efficient way.

Some of the tasks that Data Services helps accomplish include: migration of data between databases or applications, extraction of data from various source systems into flat files, data cleansing, data transformation using either common database-like functions or complex custom-built functions created with an internal scripting language, and, of course, loading data into your data warehouse or external systems. SAP Data Services has an intuitive, user-friendly graphical interface that gives you access to all of its powerful Extract, Transform, and Load (ETL) capabilities from a single Designer tool. However, getting started with SAP Data Services can be difficult, especially for people who have little or no experience in ETL development. The goal of this book is to guide you through easy-to-understand examples of building your own ETL architecture. The book can also be used as a reference to perform specific tasks, as it provides real-world examples of using the tool to solve data integration problems.

What this book covers

Chapter 1, Introduction to ETL Development, explains what Extract, Transform, and Load (ETL) processes are, and what role Data Services plays in ETL development. It includes the steps to configure the database environment used in the recipes of the book.

Chapter 2, Configuring the Data Services Environment, explains how to install and configure all Data Services components and applications. It introduces the Data Services development GUI, the Designer tool, with the simple example of "Hello World" ETL code.

Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, introduces the reader to the Data Services internal scripting language. It explains the various categories of functions that are available in Data Services, and gives the reader an example of how the scripting language can be used to create custom functions.

Chapter 4, Dataflow – Extract, Transform, and Load, introduces the most important processing unit in Data Services, the dataflow object, and the most useful types of transformations that can be performed inside a dataflow. It gives the reader examples of extracting data from source systems and loading data into target data structures.

Chapter 5, Workflow – Controlling Execution Order, introduces another Data Services object, the workflow, which is used to group other workflows, dataflows, and script objects into execution units. It explains the conditional and loop structures available in Data Services.

Chapter 6, Job – Building the ETL Architecture, brings the reader to the job object level and reviews the steps used in the development process to make a successful and robust ETL solution. It covers the monitoring and debugging functionality available in Data Services, and the embedded audit features.

Chapter 7, Validating and Cleansing Data, introduces the concepts of validation methods, which can be applied to the data passing through the ETL processes in order to cleanse and conform it according to the defined Data Quality standards.

Chapter 8, Optimizing ETL Performance, is the first of the advanced chapters, which explain complex ETL development techniques. This particular chapter helps the user understand how existing processes can be optimized further in Data Services, so that they run quickly and efficiently, consuming as few computing resources as possible with the least amount of execution time.

Chapter 9, Advanced Design Techniques, guides the reader through advanced data transformation techniques. It introduces the concepts of the Change Data Capture methods that are available in Data Services, pivoting transformations, and automatic recovery concepts.

Chapter 10, Developing Real-time Jobs, introduces the concept of nested structures and the transforms that work with them. It covers the main aspects of how they can be created and used in Data Services real-time jobs. It also introduces a new Data Services component, Access Server.

Chapter 11, Working with SAP Applications, is dedicated to the topic of reading and loading data from SAP systems, using the example of the SAP ERP system. It presents a real-life use case of loading data into an SAP ERP system module.

Chapter 12, Introduction to Information Steward, covers another SAP product, Information Steward, which accompanies Data Services and provides a comprehensive view of the organization's data, helping validate and cleanse it by applying Data Quality methods.

What you need for this book

To use the examples given in this book, you will need to download and make sure that you are licensed to use the following software products:

SQL Server Express 2012
SAP Data Services 4.2 SP4 or higher
SAP Information Steward 4.2 SP4 or higher
SAP ERP (ECC)
SoapUI 5.2.0

Who this book is for

The book will be useful to application developers and database administrators who want to get familiar with ETL development using SAP Data Services. It can also be useful to ETL developers or consultants who want to improve and extend their knowledge of this tool, and to data and business analysts who want to take a peek at the backend of BI development. The only requirement of this book is that you are familiar with the SQL language and general database concepts. Knowledge of any kind of programming language will be a benefit as well.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:

select *
from dbo.al_langtext txt
JOIN dbo.al_parent_child pc
on txt.parent_objid = pc.descen_obj_key
where
pc.descen_obj = 'WF_continuous';

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

AlGUIComment("ActaName_1" = 'RSavedAfterCheckOut', "ActaName_2" = 'RDate_created',
"ActaName_3" = 'RDate_modified', "ActaValue_1" = 'YES',
"ActaValue_2" = 'Sat Jul 04 16:52:33 2015', "ActaValue_3" = 'Sun Jul 05 11:18:02 2015',
"x" = '-1', "y" = '-1')
CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507' ()
DECLARE
$l_Directory VARCHAR(255);
$l_File VARCHAR(255);
BEGIN
AlGUIComment("UI_DATA_XML" = '<UIDATA><MAINICON><LOCATION><X>0</X>
<Y>0</Y></LOCATION><SIZE><CX>216</CX><CY>-179</CY></SIZE></MAINICON>
<DESCRIPTION><LOCATION><X>0</X><Y>-190</Y></LOCATION><SIZE><CX>200</CX>
<CY>200</CY></SIZE><VISIBLE>0</VISIBLE></DESCRIPTION></UIDATA>',
"ui_display_name" = 'script', "ui_script_text" = '$l_Directory = 'C:\\AW\\Files\\';
$l_File = 'flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');', "x" = '116', "y" = '-175')
BEGIN_SCRIPT
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = ($g_count + 1);
print(('Execution #' || $g_count));
print((('Starting ' || workflow_name()) || '…'));
sleep(10000);
print((('Finishing ' || workflow_name()) || '…'));
END
END
SET("loop_exit" = 'fn_check_flag($l_Directory, $l_File)', "loop_exit_option" = 'yes',
"restart_condition" = 'no', "restart_count" = '10', "restart_count_option" = 'yes',
"workflow_type" = 'Continuous')

Any command-line input or output is written as follows:

setup.exe SERVERINSTALL=Yes

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Open the workflow properties again to edit the continuous options using the Continuous Options tab."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/6565EN_Graphics.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Introduction to ETL Development

In this chapter, we will cover:

Preparing a database environment
Creating a source system database
Defining and creating staging area structures
Creating a target data warehouse

Introduction

Simply put, Extract-Transform-Load (ETL) is the engine of any data warehouse. The nature of the ETL system is straightforward:

Extract data from operational databases/systems
Transform data according to the requirements of your data warehouse so that the different pieces of data can be used together
Apply data quality transformation methods in order to cleanse data and ensure that it is reliable before it gets loaded into a data warehouse
Load conformed data into a data warehouse so that end users can access it via reporting tools, using client applications directly, or with the help of SQL-based query tools

While your data warehouse delivery structures or data marts represent the frontend, or, in other words, what users see when they access the data, the ETL system itself is the backbone backend solution that does all the work of moving data and getting it ready in time for users to use. Building the ETL system can be a really challenging task, and though it is not part of the data warehouse data structures, it is definitely the key factor in defining the success of the data warehouse solution as a whole. In the end, who wants to use a data warehouse where the data is unreliable, corrupted, or sometimes even missing? This is exactly what ETL is responsible for getting right.
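The extract, transform (cleanse), and load steps above can be illustrated with a short, self-contained Python sketch. This is not Data Services code, just an analogy of what any ETL engine does: the in-memory SQLite databases, the table names, and the cleansing rules are all invented for the example.

```python
import sqlite3

# Two in-memory databases stand in for the operational system and the warehouse.
source = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")

# Operational data: raw, inconsistently formatted customer names.
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "  alice  "), (2, "BOB"), (3, None)])

# Extract: pull the raw rows out of the source system.
rows = source.execute("SELECT id, name FROM customers").fetchall()

# Transform: cleanse the data (trim whitespace, normalize case, replace
# missing values) so that it is reliable before it reaches the warehouse.
cleansed = [(cid, (name or "UNKNOWN").strip().title()) for cid, name in rows]

# Load: deliver the conformed rows into the warehouse dimension table.
dwh.execute("CREATE TABLE dim_customer (id INTEGER, name TEXT)")
dwh.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleansed)

print(dwh.execute("SELECT name FROM dim_customer ORDER BY id").fetchall())
# → [('Alice',), ('Bob',), ('Unknown',)]
```

In Data Services, the same three steps are drawn graphically as a dataflow rather than coded by hand, but the underlying movement of data is the same.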

The data structure types most often used in ETL development to move data between sources and targets are flat files, XML datasets, and DBMS tables, both in normalized schemas and dimensional data models. When choosing an ETL solution, you might face two simple choices: building a handcoded ETL solution or using a commercial one.

The following are some advantages of a handcoded ETL solution:

A programming language allows you to build your own sophisticated transformations
You are more flexible in building the ETL architecture as you are not limited by the vendor's ETL abilities
Sometimes, it can be a cheap way of building a few simplistic ETL processes, whereas buying an ETL solution from a vendor can be overkill
You do not have to spend time learning the commercial ETL solution's architecture and functionality

Here are some advantages of a commercial ETL solution:

It is more often a simpler, faster, and cheaper development option, as a variety of existing tools allow you to build a very sophisticated ETL architecture quickly
You do not have to be a professional programmer to use the tool
It automatically manages ETL metadata by collecting, storing, and presenting it to the ETL developer, which is another important aspect of any ETL solution
It has a huge range of additional ready-to-use functionality, from built-in schedulers to various connectors to existing systems, built-in data lineage, impact analysis reports, and many others

In the majority of DWH projects, the commercial ETL solution from a specific vendor, in spite of the higher immediate cost, eventually saves you a significant amount of money on the development and maintenance of ETL code.

SAP Data Services is an ETL solution provided by SAP and is part of the Enterprise Information Management product stack, which also includes SAP Information Steward; we will review the latter in one of the last chapters of this book.

Preparing a database environment

This recipe will lead you through the steps of preparing the working environment: setting up a database environment to be utilized by ETL processes as the source, staging, and target systems for the migrated and transformed data.

Getting ready

To start the ETL development, we need to think about three things: the system that we will source the data from, our staging area (for initial extracts and as preliminary storage for data during subsequent transformation steps), and finally, the data warehouse itself, to which the data will eventually be delivered.

How to do it…

Throughout the book, we will use a 64-bit environment, so ensure that you download and install the 64-bit versions of the software components. Perform the following steps:

1. Let's start by preparing our source system. For quick deployment, we will choose the Microsoft SQL Server 2012 Express database, which is available for download at http://www.microsoft.com/en-nz/download/details.aspx?id=29062.

2. Click on the Download button and select the SQLEXPRWT_x64_ENU.exe file in the list of files that are available for download. This package contains everything required for the installation and configuration of the database server: the SQL Server Express database engine and the SQL Server Management Studio tool.

3. After the download is complete, run the executable file and follow the instructions on the screen. The installation of SQL Server 2012 Express is extremely straightforward, and all options can be set to their default values. There is no need to create any default databases during or after the installation, as we will do it a bit later.

How it works…

After you have completed the installation, you should be able to run the SQL Server Management Studio application and connect to your database engine using the settings provided during the installation process.

If you have done everything correctly, you should see the "green" state of your Database Engine connection in the Object Explorer window of SQL Server Management Studio, as shown in the following screenshot:

We need an "empty" installation of MS SQL Server 2012 Express because we will create all the databases we need manually in the next steps of this chapter. This database engine installation will host all our source, stage, and target relational data structures. This option allows us to easily build a test environment that is perfect for learning purposes, in order to become familiar with ETL development using SAP Data Services.

In a real-life scenario, your source databases, staging area database, and DWH database/appliance will most likely reside on separate server hosts, and they may sometimes be from different vendors. So, the role of SAP Data Services is to link them together in order to migrate data from one system to another.

Creating a source system database

In this section, we will create our source database, which will play the role of an operational database that we will pull data from with the help of Data Services, in order to transform the data and deliver it to a data warehouse.

How to do it…

Luckily for us, there are plenty of different flavors of ready-to-use databases on the Web nowadays. Let's pick one of the most popular ones: AdventureWorks OLTP for SQL Server 2012, which is available for download on the CodePlex website. Perform the following steps:

1. Use the following link to see the list of the files available for download: https://msftdbprodsamples.codeplex.com/releases/view/55330

2. Click on the AdventureWorks2012 Data File link, which should download the AdventureWorks2012_Data.mdf data file.

3. When the download is complete, copy the file into the C:\AdventureWorks\ directory (create it before copying if necessary).

The next step is to map this database file to our database engine, which will create our source database. To do this, perform the following steps:

1. Start SQL Server Management Studio.

2. Click on the New Query button, which will open a new session connection to the master database.

3. In the SQL Query window, type the following command and press F5 to execute it:

CREATE DATABASE AdventureWorks_OLTP ON
(FILENAME = 'C:\AdventureWorks\AdventureWorks2012_Data.mdf')
FOR ATTACH_REBUILD_LOG;

4. After a successful command execution and upon refreshing the database list (using F5), you should be able to see the AdventureWorks_OLTP database in the list of the available databases in the Object Explorer window of SQL Server Management Studio.

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

How it works…

In a typical scenario, every SQL Server database consists of two files: a database data file and a transaction log file. The data file contains the actual data structures and data, while the transaction log file keeps the transactional changes applied to the data.

As we only downloaded the data file, we had to execute the CREATE DATABASE command with the special ATTACH_REBUILD_LOG clause, which automatically creates the missing transaction log file so that the database can be successfully deployed and opened.
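You can confirm that the log file was rebuilt by querying the standard sys.master_files catalog view. This is a sketch; it assumes the database was attached under the name AdventureWorks_OLTP, as in the command above:

```sql
-- List the physical files behind the newly attached database
SELECT name, type_desc, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('AdventureWorks_OLTP');
-- There should be one ROWS entry (the downloaded .mdf)
-- and one LOG entry created by the ATTACH_REBUILD_LOG clause
```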

Now, our source database is ready to be used by Data Services in order to access, browse, and extract data from it.

There's more…

There are different ways to deploy test databases. This mainly depends on which RDBMS you use. Sometimes, you may find a package of SQL scripts that contains the commands required to create all the database structures and the commands used to insert data into those structures. This option may be useful if you have problems with attaching the downloaded mdf data file to your database engine or, for example, if you find SQL scripts created for the SQL Server RDBMS but have to apply them to an Oracle DB. With slight modifications to the commands, you can run them in order to create an Oracle database.
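As an illustration of what such a script package typically contains, the pattern is plain DDL followed by INSERT statements. The table and rows here are invented for the example, not taken from AdventureWorks:

```sql
-- Structure first...
CREATE TABLE customer (
    customer_id INTEGER NOT NULL PRIMARY KEY,
    first_name  VARCHAR(50),
    last_name   VARCHAR(50)
);

-- ...then the data
INSERT INTO customer (customer_id, first_name, last_name) VALUES (1, 'John', 'Doe');
INSERT INTO customer (customer_id, first_name, last_name) VALUES (2, 'Jane', 'Smith');
```

Porting a script like this from SQL Server to Oracle mostly comes down to adjusting data types (for example, VARCHAR to VARCHAR2) and removing vendor-specific options.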

Explaining RDBMS technologies lies beyond the scope of this book. So, if you are looking for more information regarding how a specific RDBMS works, refer to its official documentation.

What has to be said here is that, from the perspective of using Data Services, it does not matter which source or target systems you use. Data Services not only supports the majority of them, but it also creates its own representation of the source and target objects; this way, they all look the same to Data Services users and abide by the same rules within the Data Services environment. So, you really do not have to be a DBA or a database developer to easily connect to any RDBMS from Data Services. All that is required is knowledge of the SQL language, to understand the principles of the methods that Data Services uses when extracting and loading data or creating database objects for you.

Defining and creating staging area structures

In this recipe, we will talk about the ETL data structures that will be used in this book. Staging structures are important storage areas where extracted data is kept before it gets transformed or stored between the transformation steps. The staging area in general can be used to create backup copies of data or to run analytical queries on the data in order to validate the transformations made or the extract processes. Staging data structures can be quite different, as you will see. Which one to use depends on the tasks you are trying to accomplish, your project requirements, and the architecture of the environment used.

How to do it…

The most popular data structures that can be used in the staging area are flat files and RDBMS tables.

Flat files

One of the perks of using Data Services over a handcoded ETL solution is that Data Services allows you to easily read information from and write it to a flat file.

Create the C:\AW\ folder, which will be used throughout this book to store flat files.

Note

Inserting data into a flat file is faster than inserting data into an RDBMS table. So, during ETL development, flat files are often used to reach two goals simultaneously: creating a backup copy of the data snapshot and providing you with a storage location for your preliminary data before you apply the next set of transformation rules.

Another common use of flat files is the ability to exchange data between systems that cannot communicate with each other in any other way.

Lastly, it is very cost-effective to store flat files (OS disk storage space is cheaper than DB storage space).

The main disadvantage of the flat file storage method is that the modification of data in a flat file can sometimes be a real pain, not to mention that it is much slower than modifying data in a relational DB table.

RDBMS tables

These ETL data structures will be used more often than others to stage the data that is going through the ETL transformation process.

Let's create two separate databases for relational tables, which will play the role of the ETL staging area in our future examples:

1. Open SQL Server Management Studio.
2. Right-click on the Databases icon and select the New Database… option.
3. On the next screen, input ODS as the database name, and specify 100 MB as the initial size value of the database file and 10 MB as that of the transaction log file:

4. Repeat the last two steps to create another database called STAGE.
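If you prefer scripting these steps, an equivalent T-SQL sketch is shown below. The file paths are assumptions; running a bare CREATE DATABASE without the ON/LOG ON clauses would let SQL Server Express pick its own defaults instead:

```sql
-- ODS: landing zone database, 100 MB data file and 10 MB log file
CREATE DATABASE ODS
ON (NAME = ODS_data, FILENAME = 'C:\AdventureWorks\ODS.mdf', SIZE = 100MB)
LOG ON (NAME = ODS_log, FILENAME = 'C:\AdventureWorks\ODS_log.ldf', SIZE = 10MB);

-- STAGE: holds intermediate transformation results, same sizing
CREATE DATABASE STAGE
ON (NAME = STAGE_data, FILENAME = 'C:\AdventureWorks\STAGE.mdf', SIZE = 100MB)
LOG ON (NAME = STAGE_log, FILENAME = 'C:\AdventureWorks\STAGE_log.ldf', SIZE = 10MB);
```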

How it works…

Let's recap. The ETL staging area is a location to store the preliminary results of our ETL transformations and also a landing zone for the extracts from the source system.

Yes, Data Services allows you to extract data and perform all transformations in memory before loading to the target system. However, as you will see in later chapters, an ETL process that does everything in one "go" can be complex and difficult to maintain. Plus, if something goes wrong along the way, all the changes that the process has already performed will be lost, and you may have to start the extraction/transformation process again. This obviously creates an extra workload on a source system because you have to query it again in order to get the data. Finally, big does not mean effective. We will show you how splitting your ETL process into smaller pieces helps you to create a well-performing sequence of dataflows.

The ODS database will be used as a landing zone for the data coming from source systems. The structure of the tables here will be identical to the structure of the source system tables.

The STAGE database will hold the relational tables used to store data between the data transformation steps.

We will also store some data extracted from a source database in flat file format, to demonstrate the ability of Data Services to work with flat files and to show the convenience of this data storage method in an ETL system.

Creating a target data warehouse

Finally, this is the time to create our target data warehouse system. The data warehouse structures and tables will be used by end users, with the help of various reporting tools, to make sense of the data and analyze it. As a result, it should help business users to make strategic decisions, which will hopefully lead to business growth.

We should not forget that the main purpose of a data warehouse, and hence that of our ETL system, is to serve business needs.

Getting ready

The data warehouse created in this recipe will be used as a target database populated by the ETL processes developed in SAP Data Services. This is where the data modified and cleansed by the ETL processes will be inserted in the end. Plus, this is the database that will mainly be accessed by business users and reporting tools.

How to do it…

Perform the following steps:

1. AdventureWorks comes to the rescue again. Use another link to download the AdventureWorks data warehouse data file, which will be mapped in the same manner to our SQL Server Express database engine in order to create a local data warehouse for our own learning purposes. Go to the following URL and click on the AdventureWorks DW for SQL Server 2012 link:

https://msftdbprodsamples.codeplex.com/releases/view/105902

2. After you have successfully downloaded the AdventureWorksDW2012.zip file, unpack its contents into the same directory as the previous file: C:\AdventureWorks\

3. There should be two files in the archive:

AdventureWorksDW2012_Data.mdf: the database data file
AdventureWorksDW2012_Log.ldf: the database transaction log file

4. Open SQL Server Management Studio and click on the New Query… button in the uppermost toolbar.

5. Enter and execute the following command in the SQL Query window:

CREATE DATABASE AdventureWorks_DWH ON
(FILENAME = 'C:\AdventureWorks\AdventureWorksDW2012_Data.mdf'),
(FILENAME = 'C:\AdventureWorks\AdventureWorksDW2012_Log.ldf')
FOR ATTACH;

6. After a successful command execution, right-click on the Databases icon and choose the Refresh option in the opened menu list. This should refresh the contents of your Object Explorer, and you should see the following list of databases:

ODS

STAGE

AdventureWorks_OLTP

AdventureWorks_DWH

How it works…

Get yourself familiar with the tables of the created data warehouse. Throughout the whole book, you will be using them in order to insert, update, and delete data using Data Services.

There are also some diagrams available that can help you see the visual structure of the data warehouse. To get access to them, open SQL Server Management Studio, expand the Databases list in the Object Explorer window, then expand the AdventureWorks_DWH database object list, and finally open the Diagrams tree. Double-clicking on any diagram in the list opens a new window within Management Studio with a graphical presentation of the tables, key columns, and links between the tables, which shows you the relationships between them.

There's more…

In the next recipe, we will have an overview of the knowledge resources that exist on the Web. We highly recommend that you get familiar with them in order to improve your data warehousing skills, learn about the data warehouse lifecycle, and understand what makes a successful data warehouse project. In the meantime, feel free to open New Query in SQL Server Management Studio and start running SELECT commands to explore the contents of the tables in your AdventureWorks_DWH database.
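For instance, the following two queries run against tables that ship with the AdventureWorks DW sample (DimCustomer and FactInternetSales are part of the standard sample schema; adjust the names if your version of the sample differs):

```sql
-- Peek at a few rows of the customer dimension
SELECT TOP 10 CustomerKey, FirstName, LastName
FROM dbo.DimCustomer;

-- Count the rows in the main sales fact table
SELECT COUNT(*) AS SalesRowCount
FROM dbo.FactInternetSales;
```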

Note

The most important asset of any DWH architect or ETL developer is not the knowledge of a programming language or the available tools but the ability to understand the data that is, or will be, populating the data warehouse and the business needs and requirements for this data.

Chapter 2. Configuring the Data Services Environment

In this chapter, we will install and configure all the components required for SAP Data Services, covering the following topics:

Creating IPS and Data Services repositories
Installing and configuring Information Platform Services
Installing and configuring Data Services
Configuring user access
Starting and stopping services
Administering tasks
Understanding the Designer tool

Introduction

The same thing that makes SAP Data Services a great ETL development environment also makes it a nontrivial one to install and configure. Here, though, you have to remember that Data Services is an enterprise-class ETL solution that is able to solve the most complex ETL tasks.

See the following image for a very high-level view of the Data Services architecture. Data Services has two basic groups of components: client tools and server-based components.

Client tools include the following (there are more, but we mention the ones most often used):

The Designer tool: This is the main client-based GUI application for ETL development
Repository Manager: This is a client-based GUI application for Data Services to create, configure, and upgrade Data Services repositories

The main server-based components include the following ones:

IPS services: These are used for user authentication, system configuration storage, and internal metadata management
Job Server: This is a core engine service that executes ETL code

Access Server: This is a real-time request-reply message broker, which implements real-time services in the Data Services environment
Web application server: This provides access to some Data Services administration and reporting tasks via the DS Management Console and Central Management Console web-based applications

In the course of the next few recipes, we will install, configure, and access all the components required to perform the majority of ETL development tasks. You will learn about their purposes and pick up some useful tips that will help you work effectively in the Data Services environment throughout the book and in your future work.

Data Services installation supports all major OS and database environments. For learning purposes, we have chosen the Windows OS, as it involves the least configuration on the user's part. Both the client tools and the server components will be installed on the same Windows host.

Creating IPS and Data Services repositories

The IPS repository is storage for environment and user configuration information and for metadata collected by the various services of IPS and Data Services. It has another name: the CMS database. This name should be quite familiar to those who have used SAP Business Intelligence software. Basically, IPS is a light version of the SAP BI product package. You will always use only one IPS repository per Data Services installation and will most likely deal with it only once: when configuring the environment at the very beginning. Most of the time, Data Services will be communicating with the IPS services and the CMS database in the background, without you even noticing.

The Data Services repository is a different story. It is much closer to an ETL developer, as it is a database that stores your developed code. In a multiuser development environment, every ETL developer usually has their own repository. Repositories can be of two types: central and local. They serve different purposes in the ETL lifecycle, and I will explain this in more detail in the upcoming chapters. Meanwhile, let's create our first local Data Services repository.

Getting ready…

Both repositories will be stored in the same SQL Server Express RDBMS ((local)\SQLEXPRESS) that we used to create our source OLTP database, ETL staging databases, and target data warehouse. So, at this point, you only need access to SQL Server Management Studio, and your SQL Server Express services need to be running.

How to do it…

This will consist of two major tasks:

1. Creating the databases:

1. Log in to SQL Server Management Studio and create two databases: IPS_CMS and DS_LOCAL_REPO.
2. Right now, your database list should look like this:

2. Configuring the ODBC layer: The installation requires that you create an ODBC data source for the IPS_CMS database.

1. Go to Control Panel | Administrative Tools | ODBC Data Sources (64-bit).
2. Open the System DSN tab and click on the Add… button.
3. Choose the name of the data source, SQL_IPS, the description, SQL Server Express, and the SQL Server you want to connect to through this ODBC data source: (local)\SQLEXPRESS. Then, click on Next.
4. Choose SQL Server authentication and select the Connect to SQL Server to obtain the default settings checkbox. Enter the login ID (the sa user) and password. Click on Next.
5. Select the checkbox and change the default database to IPS_CMS. Click on Next.
6. Skip the next screen by clicking on Next.
7. The final screen of the ODBC configuration should look like the following screenshot. Then, clicking on the Test Data Source… button should give you the message TESTS COMPLETED SUCCESSFULLY!

How it works…

These two empty databases will be used by the Data Services tools during installation and post-installation configuration tasks. All the structures inside them will be created and populated automatically.

Usually, they are not built for users to access directly, but in the upcoming chapters, I will show you a few tricks on how to extract valuable information from them in order to troubleshoot potential problems, do a little bit of ETL metadata reporting, or run an extended search for ETL objects, which is not possible in the GUI of the Designer tool.

The ODBC layer configured for the IPS_CMS database allows the IPS installation to access it. When we have installed both IPS and Data Services, you will be able to connect to the databases directly from the Data Services applications, as Data Services has native drivers for various types of databases and also allows you to connect through ODBC layers if you want.

See also

See the upcoming chapters for the techniques mentioned in the preceding paragraph.

Installing and configuring Information Platform Services

The Information Platform Services (IPS) product package was added as a component to the Data Services bundle starting from the Data Services 4.x version. The reason for this was to make the Data Services architecture flexible and robust and to introduce some extra functionality, that is, a user management layer for the existing SAP Data Services solution. As we mentioned before, IPS is a light version of the SAP BI core services and has a lot of similar functionality.

In this recipe, we will perform the installation and basic configuration of IPS, which is a mandatory component for the future Data Services installation.

Tip

As an option, you could always use an existing full enterprise SAP BI solution if you have it installed in your environment. However, this is generally considered bad practice: it is like storing all your eggs in one basket. Whenever you need to plan downtime for your BI system, you should keep in mind that it will affect your ETL environment as well, and you will not be able to run any Data Services jobs during this period. That is why IPS is installed to be used only by Data Services, as a safer and more convenient option in terms of support and maintenance.

Getting ready…

Download the Information Platform Services installation package from the SAP support portal and unzip it to the location of your choice. The main requirement for installing IPS, as well as Data Services in the next recipe, is that your OS should have a 64-bit architecture.

How to do it…

1. Create an EIM folder in your C drive to store your installation in one place.
2. Launch the IPS installer by executing InstallIPS.exe.
3. Make sure that all your critical prerequisites have the Succeeded status on the Check Prerequisites screen. Continue to the next screen.
4. Choose C:\EIM\ as the installation destination folder. Continue to the next screen.
5. Choose the Full installation type. Continue to the next screen.
6. On Select Default or Existing Database, choose Configure an existing database and continue to the next screen.
7. Select Microsoft SQL Server using ODBC as the existing CMS database type.
8. Select No auditing database on the next screen and continue.
9. Choose Install the default Tomcat Java Web Application Server and automatically deploy web applications. Continue to the next screen.
10. For version management, choose Do not configure a version control system at this time.
11. On the next screen, specify the SIA name in the Node name field as IPS and the SIA port as 6410.
12. Do not change the default CMS port, 6400.
13. On the CMS account configuration screen, input passwords for the administrator user account and the CMS cluster key (they can be the same if you want). Continue further.
14. Use the settings from the following screenshot to configure the CMS Repository Database:
15. Leave the default values for the Tomcat ports on the next screen and click on Next. Remember the Connection Port setting (the default is 8080), as you will require it to connect to the IPS and Data Services web applications.
16. Do not configure connectivity to SMD Agent.
17. Do not configure connectivity to Introscope Enterprise Manager.
18. Finally, the installation will begin. It should take approximately 5–15 minutes, depending on your hardware.

How it works…

Now, by installing IPS, we have prepared the base layers on top of which we will install the Data Services package itself.

To check that your IPS installation was successful, start the Central Management Console web application using the http://localhost:8080/BOE/CMC URL, and use the administrator account that you set up during the IPS installation to log in. In the System field, use localhost:6400 (your hostname and the CMS port number specified during the IPS installation).

Check out the Core Services tree in the Servers section of the CMC. All the services listed should have the Running and Enabled statuses.

Installing and configuring Data Services

The installation of Data Services in a Windows environment is a smooth and quick process. Of course, you have various installation options, but here, we will choose the easiest path: the full installation of all components on the same host, with the IPS services installed and the local repository already created and configured.

Getting ready…

Completing the previous recipe should prepare your environment for installing Data Services. Download the Data Services installation package from the SAP support portal and unzip it to a local folder.

How to do it…

1. Start the Data Services installation from the Windows command line (cmd) by executing this command:

setup.exe SERVERINSTALL=Yes

2. Make sure that all your critical prerequisites have the Succeeded status on the Check Prerequisites screen.
3. Choose the destination folder as C:\EIM\ if required.
4. On the CMS connection information step, specify the connection details of your previously installed CMS (part of IPS) installation. The system is localhost:6400, and the user is Administrator. Click on Next.
5. In the CMS Service Stop/Start pop-up window, agree to restart the SIA servers.
6. Choose Install with default configuration on the Installation Type selection screen.
7. Make sure that you select all features by selecting all the checkboxes on the next feature selection screen and click on Next.
8. Specify Microsoft_SQL_Server as the database type for a local repository.
9. Use the following details as a reference for configuring your local repository database connection on the next screen:

Option                      Value

Registration name for CMS   DS4_REPO
Database Type               Microsoft_SQL_Server
Database server name        (local)\SQLEXPRESS
Database port               50664
Database name               DS_LOCAL_REPO
User Name                   sa
Password                    <sa user password>

10. For the login information, choose the account recommended by the installation.
11. The installation should be completed in 5–10 minutes, depending on your environment.

How it works…

After finishing this recipe, you will have all the Data Services server and client components installed on the same Windows host. Also, your Data Services installation is integrated with the IPS services.

To check that the installation and integration were successful, log in to the CMC and see that in the main menu there is a new section called Data Services (see the Organize column). Go to this section and see whether your DS4_REPO exists in the list of local repositories.

Configuring user access

In this recipe, I will show you how to configure your access as a fresh ETL developer in a Data Services environment. We will create a user account, assign all the required functional privileges, and assign owner privileges for our local Data Services repository. In a multiuser development environment, you would need to perform this step for every newly created user.

Getting ready…

Choose the username and password for your ETL developer user account. We will log in to the CMC application to create the user account and grant it the required set of privileges.

How to do it…

1. Launch the Central Management Console web application.
2. Go to Users and Groups.
3. Click on the Create a user button (see the following screenshot):
4. In the opened window, choose a username (we picked etl) and password. Also, select the Password never expires option and unselect User must change password at next logon. Choose Concurrent User as the connection type.
5. Now, we should add our newly created account to two pre-existing user groups. Right-click on the user and choose the Member Of option in the right-click menu.
6. Click on the Join Group button in the newly opened window and add two groups from the group list to the right window panel: Data Services Administrator Users and Data Services Designer Users. Click on OK.
7. From the left-side instrument panel, click on the CMC Home button to return to the main CMC screen.
8. Now, we have to grant our user extra privileges on the local repository. For this, open the Data Services section, right-click on DS4_REPO, and choose User Security from the context menu.
9. Click on the Add principals button, move the etl user to the right panel, and click on the Add and Assign Security button at the bottom of the screen.
10. On the next screen, assign the Full Control (Owner) access level on the Access Levels tab and go to the Advanced tab.
11. Click on the Add/Remove Rights link and set the following two options that appear to Granted for the Data Services Repository application (see the following screenshot):
12. Click on OK in the Assign Security window to confirm your configuration.
13. As a test, log out of the CMC and log in using the newly created user account.

How it works…

In a complex enterprise environment, you can create multiple groups for different categories of users. You have full flexibility to provide users with various kinds of permissions, depending on their needs.

Some users might require administration privileges to start/stop services and to manage repositories, without the need to develop ETL code and access Designer.

The ETL developer role might require only permissions for the Designer tool to develop ETL code.

In our case, we have created a single user account that has both administration and developer privileges.

Starting and stopping services

In this recipe, I will explain how you can restart the services of all the main components in your Data Services environment.

How to do it…

This relates to three different groups of services:

Web application server:

The Tomcat application server configured in our environment can be managed from two places:

Computer Management | Services and Applications | Services, where it exists as a standard Windows service, BOEXI40Tomcat

The Central Configuration Management tool, installed as a part of the IPS product package.

Using this tool, you can:

1. Start/stop services.
2. Back up and restore the system configuration.
3. Specify the Windows user who starts and stops the underlying services.

Data Services Job Server:

To manage the Data Services Job Server in the Windows environment, SAP created a separate GUI application called Data Services Server Manager. Using this tool, you can perform the following tasks:

1. Restart the Job Server.
2. Create and configure Job Servers.
3. Create and configure Access Servers.
4. Perform SSL configuration.
5. Set up a pageable cache directory.
6. Perform SMTP configuration for the smtp_to() Data Services function.

Information Platform Services:

To manipulate these services, you have two options:

Central Management Console (to stop/start services and configure service parameters)

Central Configuration Management (to stop/start services)

In most cases, you will be using the CMC option, as it is a quick and convenient way to access all the services included in the IPS package. It also allows you to see much more service-related information.

The second option is useful if you have the application server stopped for some reason (the CMC, as a web-based application, will not be working, of course) and you still need to access IPS services to perform basic administration tasks, such as restarting them.

How it works…

Sometimes, things turn sour, and restarting services is the quickest and easiest option to return them to a normal state. In this recipe, I mentioned all the main server components and the points of access used to perform such a task.

The last thing you should keep in mind regarding this is the recommended startup/shutdown sequence of those components:

1. The first thing that should start after Windows starts is your database server, as it hosts the CMS database required for the IPS services and the Data Services local repository.
2. Second, you should start the IPS services (the main one is the CMS service) as an underlying level for Data Services.
3. Then, it is the turn of the Data Services Job Server.
4. Finally, start Tomcat (the web application server), which provides users with access to the web-based applications.

See also

I definitely recommend that you get familiar with the SAP Data Services Administrator Guide to understand the details regarding IPS and Data Services component management and configuration.

The knowledge sources and documentation links from Chapter 1, Introduction to ETL Development.

Administering tasks

The previous recipe is part of the basic administration tasks too, of course. I separated it from the current one as I wanted to put the accent on the Data Services architecture details by explaining the main Data Services components in relation to the methods and tools you can use to manipulate them.

How to do it…

Here, we will look at some of the most important administrative tasks.

1. Using Repository Manager:

As you can probably remember, there are two types of repositories in Data Services: the local repository and the central repository. They serve different purposes but can be created in quite a similar way: with the help of the Data Services Repository Manager tool.

This is a GUI-based tool available on your Windows machine and installed with the other client tools.

As we already have one repository, created and configured automatically during the Data Services installation, let's check its version using the Repository Manager tool.

Launch Repository Manager and enter the following values for the corresponding options:

Field                  Value

Repository type        Local
Database Type          Microsoft SQL Server
Database server name   (local)\SQLEXPRESS
Database name          DS_LOCAL_REPO
User Name              sa
Password               *******

After entering these details, you have several options:

Create: This option creates repository objects in the defined database. As we already have a repository in DS_LOCAL_REPO, the application will ask us whether we want to reset the existing repository. Sometimes, this can be useful, but keep in mind that it will cleanse the repository of all objects, and if you are not careful, all the ETL code that resides in the repository can be lost.

Upgrade: This option upgrades the repository to the version of the Repository Manager tool. It is useful during software upgrades. After installing a new version of IPS and Data Services, you have to upgrade your repository contents as well. This is when you launch the Repository Manager tool (which has already been updated) and upgrade your repository to the current version.

Get version: This is the safest option of them all. It just returns the string containing the repository version number. In our case, it returned: BODI-320030: The local repository version: <14.2.4.0>.

2. Using Server Manager and the CMC to register a new repository:

After you create a new repository with Repository Manager, you have to register it in IPS and link it to the existing Job Server.

To register a new repository in IPS, use the following steps:

1. Launch the Central Management Console.
2. Open the Data Services section from the CMC home page.
3. Go to Manage | Configure Repository.
4. Enter the database details of your newly created repository and click on Save.
5. To assign users the required set of privileges, use User Security when right-clicking on the repository in the list. For details, see the Configuring user access recipe.

To link a new repository to the Job Server, perform these steps:

1. Launch the Data Services Server Manager tool.
2. Choose the Job Server tab.
3. Press the Configuration Editor… button.
4. Select the Job Server and press the Edit… button.
5. In the Associated Repositories panel, press the Add… button and fill in the database-related information of the new repository in the corresponding fields on the right-hand side.
6. Use the Close and Restart button in the Data Services Server Manager tool to apply the changes made to a Job Server.

3. Using License Manager:

1. License Manager exists only in command-line mode.
2. Use the following syntax to run License Manager:

LicenseManager [-v | -a <keycode> | -r <keycode> [-l <location>]]

3. Use the -v option to view existing license keys, -a to add a new license key, and -r to remove an existing license key from the -l location specified.

This tool is available at C:\EIM\DataServices\bin\.

How it works…

Creating and configuring a new local repository is usually required when you set up an environment for a new ETL developer or want to use an extra repository to migrate your ETL code for testing purposes or to test a repository upgrade.

After creating a new local repository, you should always link it to an existing Job Server. This link ensures that the Job Server is aware of the repository and can execute jobs from it.

Finally, License Manager can be used to see the license keys used in your installation and to add new ones if required.

See also

You can practice your Data Services admin skills by creating a new database and a new local Data Services repository. Do not forget that you do not just have to create it, but also register it with the IPS services and the Data Services Job Server so that you can successfully run jobs from it.

Some other administrative tasks can be found in the following places:

The Starting and stopping services recipe from this chapter
The Configuring the ODBC layer point from the How to do it… section of the Creating IPS and Data Services repositories recipe of this chapter

Understanding the Designer tool

Now that we have reviewed all the important server and client components of our new Data Services installation, it is time to get familiar with the most used and most important tool in the Data Services product package. It will be our main focus in the following chapters, and of course, I am talking about our development GUI: the Designer tool.

Every object you create in Designer is stored in a local object library, which is a logical storage unit that is part of the physical local repository database. In this recipe, we will log in to a local repository via Designer, set up a couple of options, and write our first "Hello World" program.

Getting ready…

Your Data Services ETL development environment is fully deployed and configured, so go ahead and start the Designer application.

How to do it…

First, let's change some default options to make our development life a little bit easier and to see how the options windows in Data Services look:

1. When you launch your Designer application, you see quite a sophisticated login screen. Enter the etl username we created in one of the previous recipes and its password to see the list of repositories available in the system.
2. At this point, you should see only one local repository, DS4_REPO, which was created by default during the Data Services installation. Double-click on it.
3. You should see your Designer application started.
4. Go to Tools | Options.
5. In the opened window, expand the Designer tree and choose General.
6. Set the Number of characters in workspace icon name option to 50 and select the Automatically calculate column mappings checkbox.
7. Click on OK to close the options window.

Before we create our first "Hello World" program, let's quickly take a look at Designer's user interface.

In this recipe, you will be required to work with only two areas: the Local Object Library and the main development area. The biggest window, on the right-hand side with the Start Page tab, will open by default.

The Local Object Library contains tabs with lists of objects you can create or use during your ETL development. These objects include Projects, Jobs, Work Flows, Data Flows, Transforms, Datastores, Formats, and Custom Functions:

All the tabs are empty, as you have not created any objects of any kind yet, except for the Transforms tab. This tab contains a predefined set of transforms available for you to use for ETL development. Data Services does not allow you to create your own transforms (there is an exception that we will discuss in the upcoming chapters). So, everything you see on this tab is basically everything that is available for you to manipulate your data with.

Now, let's create our first "Hello World" program. As ETL development in Data Services is not quite the usual experience of developing with a programming language, we should agree on what our first program should do. In almost any programming-language-related book, this kind of program just outputs a "Hello World" string onto your screen. In our case, we will generate a "Hello World" string and output it into a table that will be automatically created by Data Services in our target database.

In the Designer application, go to the Local Object Library window, choose the Jobs tab, right-click on the Batch Jobs tree, and select New from the list of options that appears.

1. Choose the name for the new job, Job_HelloWorld, and enter it. After the job is created, double-click on it.
2. You will enter the job design window (see Job_HelloWorld – Job at the bottom of the application), and now, you can add objects to your job and set up its variables and parameters.
3. In the design window of the Job_HelloWorld – Job tab, create a dataflow. To do this, from the right tool panel, choose the Data Flow object and left-click on the main design window to create it. Name it DF_HelloWorld.
4. Double-click on the newly created dataflow (or just click once on its title) to open the Data Flow design window. It appears as another tab in the main design window area.

5. Now, when we are designing the processing unit, or dataflow, we can choose transforms from the Transforms tab of the Local Object Library window to perform manipulations of the data. Click on the Transforms tab.
6. Here, select the Platform transforms tree and drag and drop the Row_Generation transform from it to the Data Flow design window.

Note

As we are generating a new "Hello World!" string, we should use the Row_Generation transform. It is a very useful way of generating rows in Data Services. All the other transforms perform operations on rows extracted from source objects (tables or files) that are passing from source to target within a dataflow. In this example, we do not have a source table. Hence, we have to generate a record.

7. By default, the Row_Generation transform generates only one row with the ID as 0. Now, we have to create our string and present it as a field in a future target table. For this, we need to use the Query transform. Select it from the right tool panel or drag and drop it from the Transforms | Platform section. The icon of the Query transform looks like this:

8. In the Data Flow design window, link Row_Generation to Query, as shown here, and double-click on the Query transform to open the Query Editor tab:

Note
In the next chapter, we will explain the details of the Query transform. In the meantime, let's just say that this is one of the most used transforms in Data Services. It allows you to join flows of your data and modify the dataset by adding/removing columns in the row, changing data types, and performing grouping operations. On the left-hand side of the Query Editor, you will see an incoming set of columns, and on the right-hand side, you will see the output. This is where you will define all your transformation functions for specific fields or assign hard-coded values. We are not interested in the incoming ID generated by the Row_Generation transform. For us, it served the purpose of creating a row that will hold our "Hello World!" value and will be inserted in a table.

9. In the right panel of the Query Editor, right-click on Query and choose New Output Column…:

10. Select the following settings in the opened Column Properties window to define the properties of our newly created column and click on OK:

11. Now, when our generated row has one column, we have to populate it with a value. For this, we have to use the Mapping tab in the Query Editor. Select our output field TEXT and enter the 'Hello World!' value in the Mapping tab window. Do not forget single quotes, which mean a string in DS. Then, close the Query Editor either with the tab cross in the top-right corner (do not confuse it with the Designer application cross that is located dangerously close to it) or just use the Back button (Alt + Left), a green arrow icon in the top instrument panel.

At this point, we have a source in our dataflow. We also have a transformation object (the Query transform), which defines our text column and assigns a value to it. What is missing is a target object we will insert our row to.

As we will use a table as a target object, we have to create a reference to a database within Data Services. We will use this reference to create a target table. Those database references are called datastores and are used as a presentation of the database layer. In the next step, we will create a reference to our STAGE database created in the previous chapter.

12. Go to the Datastores tab of the Local Object Library. Then, right-click on the empty window and select New to open the Create New Datastore window.

13. Choose the following settings for the newly created datastore object:

14. Repeat steps 12 and 13 to create the rest of the datastore objects connected to the databases we created in the previous recipes. Use the same database server name and user credentials and change only the Datastore Name and Database name fields when creating new datastores. See the following table for reference:

Datastore Name    Database name
DS_ODS            ODS
DWH               AdventureWorks_DWH
OLTP              AdventureWorks_OLTP

Now, you should have four datastores created, referencing all databases created in the SQL Server: DS_STAGE, DS_ODS, DWH, and OLTP.

15. Now, we can use the DS_STAGE datastore to create our target table. Go back to the DF_HelloWorld in the Data Flow tab of the design window and select Template Table on the right tool panel. Put it on the right-hand side of the Query transform and choose HELLO_WORLD as the table name in the DS_STAGE datastore.

16. Our final dataflow should look like this now:

17. Go back to the Job_HelloWorld – Job tab and click on the Validate All button in the top instrument panel. You should get the following message in the output window of Designer on the left-hand side of your screen: Validate: No Errors Found (BODI-1270017).

18. Now, we are ready to execute our first job. For this, use the Execute… (F8) button from the top instrument panel. Agree to save the current objects and click on OK on the following screen.

19. See that the log screen that shows you the execution steps contains no execution errors. Then, go to your SQL Server Management Studio, open the STAGE database, and check the contents of the appeared HELLO_WORLD table. It has just one column, TEXT, with only one value, "Hello World!".

How it works…
"Hello World!" is a small example that introduces a lot of general and even sophisticated concepts. In the following sections, we will quickly review the most important ones. They will help you get familiar with the development environment in Data Services Designer. Keep in mind that we will return to all these subjects again throughout the book, discussing them in more detail.

Executing ETL code in Data Services
To execute any ETL code developed in the Data Services Designer tool, you have to create a job object. In Data Services, the only executable object is a job. Everything else goes inside the job.

ETL code is organized as a hierarchy of objects inside the job object. To modify any object by placing another object in it, you have to open the edited object in the main workspace design area and then drag and drop the required objects inside it, placing them in the workspace area. In our recipe, we created a job object and placed the dataflow object in it. We then opened the dataflow object in the workspace area and placed transform objects inside it. As you can see in the following screenshot, workspace areas opened previously are accessible through the tabs at the bottom of the workspace area:

The Project Area panel can display the hierarchy of objects in the form of a tree. To see it, you have to assign your newly created job to a specific project and open the project in the Project Area by double-clicking on the project object in the Local Object Library.

Executable ETL code contains one job object and can contain script, dataflow, and workflow objects combined in various ways inside the job.

As you saw from the recipe steps, you can create a new job by going to Local Object Library | Jobs.

Although you can combine all types of objects by placing them in the job directly, some objects, for example, transform objects, can be placed only into dataflow objects, as a dataflow is the only type of object that can process and actually migrate data (on a row-by-row basis). Hence, all transformations should happen only inside the dataflow. In the same way, you can only place datastore objects, such as tables and views, directly in dataflows as source and target objects for data to be moved from source to target and transformed along the way. When a dataflow object is executed within the job, it reads data row by row from the source and moves each row from left to right to the next transform object inside the dataflow until it reaches the end and is sent to the target object, which usually is a database table.

Throughout this book, you will learn the purpose of each object type and how and when it can be used.

For now, remember that all objects inside the job are executed in sequential order from left to right if they are connected, and simultaneously if they are not. Another important rule is that the parent object starts executing first and then all objects inside it. The parent object completes its execution only after all child objects have completed successfully.

Validating ETL code
To avoid job execution failures due to incorrect ETL syntax, you can validate the job and all its objects with the Validate Current or Validate All button on the top instrument panel inside the Designer tool:

Validate Current validates only the current object opened in the workspace design area and script objects in it, and does not validate the underlying child objects such as dataflows and workflows. In the preceding example, the object opened in the workspace is a job object that has one child dataflow object called DF_HelloWorld inside it. Only the job object will be validated and not DF_HelloWorld.

Validate All validates the current and all underlying objects. So, both the object currently opened in the workspace and all objects you see in the workspace are validated. The same applies to the objects nested inside them, down to the very end of the object hierarchy.

So, to validate the whole job and its objects, you have to go to the job level by opening the job object in the workspace area and clicking on the Validate All button on the top instrument panel.

Validation results are displayed in the Output panel. Warning messages do not affect the execution of the job and often indicate possible ETL design problems or show data type conversions performed by Data Services automatically. Error messages in the Output | Errors tab mean syntax or critical design errors made in ETL. Whenever you try to run the job after seeing "red" error validation messages, the job will fail with exactly the same errors that you saw at the beginning of execution, as every job is implicitly validated when executed.

Always validate your job manually before executing it to avoid job failures due to incorrect syntax or incorrect ETL design.

Template tables
This is a convenient way to specify a target table that does not yet exist in the database and send data to it. When a dataflow object where the template target table object is placed is executed, it runs two DDL commands, DROP TABLE <template table name> and CREATE TABLE <template table name>, using the output schema (set of columns) of the last object inside the dataflow before the target template table. Only after that, the dataflow processes all the data from the source, passing rows from left to right through all transformations, and finally inserts data into the freshly created target table.

Note
Note that tables are not created on the database level from template tables until the ETL code (dataflow object) is executed within Data Services. Simply placing the template table object inside a dataflow and creating it in a datastore structure is not enough for the actual physical table to be created in the database. You have to run your code.
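The drop-and-recreate behavior described above can be sketched outside of Data Services. The following is a minimal illustration using Python's sqlite3 (not the actual DS engine); the table and column names come from the recipe, while the function name is hypothetical:

```python
import sqlite3

def run_template_table_load(conn, table, columns, rows):
    """Sketch of what DS does for a template table target: drop and
    recreate the table from the current output schema, then insert
    the rows produced by the dataflow."""
    cur = conn.cursor()
    cur.execute(f'DROP TABLE IF EXISTS "{table}"')                 # DROP TABLE <template table name>
    col_defs = ", ".join(f'"{name}" {dtype}' for name, dtype in columns)
    cur.execute(f'CREATE TABLE "{table}" ({col_defs})')            # CREATE TABLE <template table name>
    marks = ", ".join("?" for _ in columns)
    cur.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
run_template_table_load(conn, "HELLO_WORLD",
                        [("TEXT", "varchar(100)")],
                        [("Hello World!",)])
print(conn.execute('SELECT * FROM "HELLO_WORLD"').fetchall())
# [('Hello World!',)]
```

Because the table is dropped every run, any change to the output schema of the last transform is picked up automatically on the next execution, which is exactly why template tables are so convenient during development.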

They are displayed under different categories in the datastore. They appear separately from normal table objects:

Template tables are extremely useful during ETL development and testing. They allow you to not think about going to the database level and changing the structure of the tables by altering, deleting, or creating them manually whenever the ETL code that inserts the data in the table changes. Every time the dataflow runs, it will delete and recreate the database table defined through the template table object, with the currently required table structure defined by your current ETL code.

Template table objects are easily converted to normal table objects using the Import command on them. This command is available from the object's context menu in the dataflow workspace or in the Datastores tab in the Local Object Library.

Query transform basics
The Query transform is one of the most important and most often used transform objects in Data Services. Its main purpose is to read data from the left object(s) (input schema(s)) and send data to the output schema (the object to the right of the Query transform). You can join multiple datasets with the help of the Query transform using the syntax rules of the SQL language.

Additionally, you can specify the mapping rules for the output schema columns inside the Query transform by applying various functions to the mapped fields. You can also specify hard-coded values or even create additional output schema columns, like we did in our Hello World example.

The example in the next screenshot is not from our Hello World example. However, it demonstrates how the row extracted previously from the source object (input schema) can be augmented with extra columns or can get its columns renamed or its values transformed by functions applied to the columns:

See how columns from two different tables are combined in a single dataset in the output schema, with columns renamed according to new standards and new columns created with NULL values in them.

The Hello World example
You have just created the simplest dataflow processing unit and executed it within your first job.

The dataflow object in our example has the Row_Generation transform, which generates rows with only one field. We generated one row with the help of this transform and added an extra field to the row with the help of the Query transform. We then inserted our final row into the HELLO_WORLD table created automatically by Data Services in the STAGE database.

You also have configured a couple of Designer properties and created a datastore object that represents the Data Services view of the underlying database level. Not all database objects (tables and views) are visible within your datastore by default. You have to import only those you are going to work with. In our Hello World example, we did not import the table in the datastore, as we used the template table. To import a table that exists in the database into your datastore so that it can be used in ETL development, you can perform the following steps:

1. Go to Local Object Library | Datastores.

2. Expand the datastore object you want to import the table in.

3. Double-click on the Tables section to open the list of database tables available for import:

4. Right-click on the specific table in the External Metadata list and choose Import from the table context menu.

5. The table object will now appear in the Tables section of the chosen datastore. As it has not yet been placed in any dataflow object, the Usage column shows a 0 value:

Creating different datastores for the same database could also be a flexible and convenient way of categorizing your source and target systems.

There is also a concept of configurations, where you can create multiple configurations of the same datastore with different parameters and switch between them. This is very useful when you are working in a complex development environment with development, test, and production databases. However, this is a topic for future discussion in the upcoming chapters.

Chapter 3. Data Services Basics – Data Types, Scripting Language, and Functions
In this chapter, I will introduce you to the scripting language in Data Services. We will cover the following topics:

Creating variables and parameters
Creating a script
Using string functions
Using date functions
Using conversion functions
Using database functions
Using aggregate functions
Using math functions
Using miscellaneous functions
Creating custom functions

Introduction
It is easy to underestimate the importance of the scripting language in Data Services, but you should not fall for this pitfall. In simple words, the scripting language is the glue that allows you to build smart and reliable ETL and unite all processing units of work (which are dataflow objects) together.

The scripting language in Data Services is mainly used to create custom functions and script objects. Script objects rarely perform data movement and data transformation. They are used to assist the dataflow objects (the main data migration and transformation processes). They are usually placed before and after them to assist with execution logic and calculate the execution parameter values for the processes that extract, transform, and load the data.

The scripting language in Data Services is armed with powerful functions that allow you to query databases, execute database stored procedures, and perform sophisticated calculations and data validations. It even supports regular expression matching techniques, and, of course, it allows you to build your own custom functions. These functions can be used not just in the scripts but also in the mapping of Query transforms inside dataflows.

Without further delay, let's get to learning the scripting language.

Creating variables and parameters
In this recipe, we will extend the functionality of our Hello World dataflow (see the Understanding the Designer tool recipe from Chapter 2, Configuring the Data Services Environment). Along with the first row saying "Hello World!", we will generate a second row, providing you with the name of the Data Services job that generated the greetings.

This example will not just allow us to get familiar with how variables and parameters are created but also introduce us to one of the Data Services functions.

Getting ready
Launch your Designer tool and open the Job_HelloWorld job created in the previous chapter.

How to do it…
We will parameterize our dataflow so that it can receive the external value of the job name where it is being executed, and create the second row accordingly.

We will also require an extra object in our job, in the form of a script that will be executed before the dataflow and that will initialize our variables before passing their values to the dataflow parameters.

1. Using the script button from the right instrument panel, create a script object. Name it scr_init, and place it to the left of your dataflow. Do not forget to link them, as shown in the following screenshot:

2. To create dataflow parameters, click on the dataflow object to open it in the main workspace window.

3. Open the Variables and Parameters panel. All panels in Designer can be enabled/displayed with the help of the buttons located in the top instrument panel, as in the following screenshot:

4. If they are not displayed on your screen, click on the Variables button on the top instrument panel. Then, right-click on Parameters and choose Insert from the context menu. Specify the following values for the new input parameter:

Note
Note that the $ sign is very important when you reference a variable or parameter, as it defines the parameter in Data Services and is required so that the compiler can parse it correctly. Otherwise, it will be interpreted by Data Services as a text string. Data Services automatically puts the dollar sign in when you create a new variable or parameter from the panel menus. However, you should not forget to use it when you are referencing the parameter or variable in your script or in the Calls section of the dataflow.

5. Now, let's create a job variable that we will use to pass the value defined in the script to the dataflow parameter. For this, use the Back (Alt + Left) button to go to the job level (so that its content is displayed in the main design window). Then, right-click on Variables in the Variables and Parameters panel and choose Insert from the context menu to insert a new variable. Name it $l_JobName and assign the varchar(100) data type to it, which is the same as the dataflow parameter created earlier.

6. To pass variable values from the job to the input parameter of the dataflow, go to the Calls tab of the Variables and Parameters panel on the job design level. Here, you should see the input dataflow $p_JobName parameter with an empty value.

7. Double-click on the $p_JobName parameter and reference the $l_JobName variable in the Value field of the Parameter Value window. Click on OK:

8. Assign a value to a job variable in the previously created script object. To do this, open the script in the main design window and insert the following code in it:

$l_JobName = 'Job_HelloWorld';

9. Finally, let's modify the dataflow to generate a new column in the target table. For this, open the dataflow in the main design window.

10. Open the Query transform and right-click on the TEXT column to go to New Output Column… | Insert Below.

11. In the opened Column Properties window, specify JOB_NAME as the name of the new column and assign it the same data type, varchar(100).

12. In the Mapping tab of the Query transform for the JOB_NAME column, specify the 'Created by ' || $p_JobName string.

13. Go back to the job context and create a new global variable, $g_JobName, by right-clicking on the Global Variables section and selecting Insert from the context menu.

14. Your final Query output should look like this:

15. Now, go back to the job level and execute it. You will be asked to save your work and choose the execution parameters. At this point, we are not interested in modifying them, so just continue with the default ones.

16. After executing the job in Designer, go to Management Studio and query the HELLO_WORLD table to see that a new column has appeared with the 'Created by Job_HelloWorld' value.

How it works…
All main objects in Data Services (dataflow, workflow, and job) can have local variables or parameters defined. The difference between an object variable and an object parameter is very subtle. Parameters are created and used to accept the values from other objects (input parameters) or pass them outside of the object (output parameters). Otherwise, parameters can behave in the same way as local variables: you can use them in the local functions or use them to store and pass the values to other variables or parameters. Dataflow objects can only have parameters defined but not local variables. See the following screenshot of the earlier example:

Workflow and job objects, on the other hand, can only have local variables defined but not parameters. Local variables are used to store the values locally within the object to perform various operations on them. As you have seen, they can be passed to the objects that are "calling" for them (go to Variables and Parameters | Calls).

There is another type of variable called a global variable. These variables are defined at the job level and shared among all objects that were placed in the job structure.

What you have done in this chapter is a common practice in Data Services ETL development: passing variable values from the parent object (the job in our example) to the child object (dataflow) parameters.

To keep things simple, you can specify hard-coded values for the input dataflow parameters, but this is usually considered bad practice.

What we could also do in our example is pass global variable values to dataflow parameters. Global variables are created at the very top job level and are shared by all nested objects, not just the immediate job child objects. That is why they are called global. They can be created only in the job context, as shown here:

Also, note that in Data Services, you cannot reference parent object variables directly in child objects. You always have to create input child object parameters and map them on the parent level (using the Calls tab of the Variables and Parameters panel) to local parent variables. Only after doing this, you can go in your child object and map its parameters to the local child object's variables.

Now, you can see that parameters are not the same thing as variables, and they carry an extra function of bridging variable scope between parent and child. In fact, you do not have to map them to a local variable inside a child object if you are not going to modify them. You can use parameters directly in your calculations/column mapping.

The last thing to say here is that dataflows do not have local variables at all. They can only accept values from their parents and use them in function calls/column mapping. That is because you do not write scripts inside a dataflow object. Scripts are only created at the job or workflow level or inside the custom functions that have their own variable scope.
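The parent-to-child bridging described above can be sketched in plain Python, purely for illustration (the function names are hypothetical stand-ins for DS objects): the job's local variable is mapped to the dataflow's input parameter at the call site, and the child never touches the parent's variables directly.

```python
def df_hello_world(p_job_name):
    """Stand-in for a dataflow: it sees only its own input
    parameter, never the job's local variables directly."""
    return "Created by " + p_job_name

def job_hello_world():
    # Job level: a local variable is initialized in a script object...
    l_job_name = "Job_HelloWorld"
    # ...and mapped to the dataflow's input parameter,
    # the way the Calls tab maps $l_JobName to $p_JobName.
    return df_hello_world(p_job_name=l_job_name)

print(job_hello_world())  # Created by Job_HelloWorld
```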

Data types available in Data Services are similar to common programming language data types. For a more detailed description, reference the official Data Services documentation.

Note
The blob and long data types can only be used by structures created inside a dataflow or, in other words, columns. You cannot create script variables and dataflow/workflow parameters of the blob or long data types.

There's more…
Try to modify your Job_HelloWorld job to pass global variable values to dataflow parameters directly. To do this, use the previously created global variable $g_JobName, specify a hard-coded value for it (or assign it a value inside a script, as we did with the local variable) and map it to the input dataflow parameter on the Calls tab of the Variables and Parameters panel in the job context. Do not forget to run the job and see the result.

Creating a script
Yes, technically we created our first script in the previous recipe, but let's be honest: this is not the most advanced script in the world, and it does not provide us with much knowledge regarding the scripting language capabilities in Data Services. Finally, although simplicity is usually a virtue, it would be nice to create a script that has more than one row in it.

In the following recipe, we will create a script that does some data manipulation and a little bit of text processing before passing a value to a dataflow input parameter.

How to do it…
Clear the contents of your scr_init script object and add the following lines. Note that every command or function call should end with a semicolon:

# Script which determines name of the job and
# prepares it for dataflow input parameter

print('INFO: scr_init script has started…');

while ($l_JobName IS NULL)
begin
    if ($g_JobName IS NOT NULL)
    begin
        print('INFO: assigning $g_JobName value '
            || 'of {$g_JobName} to a $l_JobName variable…');
        $l_JobName = $g_JobName;
    end
    else
    begin
        print('INFO: global variable $g_JobName is empty, '
            || 'calculating value for $l_JobName '
            || 'using Data Services function…');
        $l_JobName = job_name();
        print('INFO: new value assigned to a local '
            || 'variable: $l_JobName = {$l_JobName}!');
    end
end

print('INFO: scr_init script has successfully completed!');

Try to run the job now and confirm that the row inserted into the target HELLO_WORLD table has a proper job name in the second column.

How it works…
We introduced a couple of new elements of the scripting language syntax. The # sign defines a comment section in Data Services scripts.

Note that we also referenced variable values in the text string using curly brackets: {$l_JobName}. If you skip them, the Data Services compiler will not recognize variables marked with the $ sign and will use the variable name and dollar sign as part of the string.

Tip
You can also use square brackets [] instead of curly brackets to reference variable/parameter values within a text string. The difference between them is that if you use curly brackets, the compiler will put the variable value in the quoted string 'value' instead of using it as it is in the text string.

The scripting language in Data Services is easy to learn as it does not have much variety in terms of conditional constructs. It has a simple syntax, and all its power comes from functions.

In this particular example, you can see one while loop and one conditional construct. The while loop is the only type of loop supported in the Data Services scripting language, and if…else is the only conditional supported as well. This is really all you need in most cases.

The while (<condition>) loop expression should include a block of code starting with begin and ending with end. The condition check happens at the beginning of each iteration (even the very first one), so keep it in mind as even your very first loop iteration can be skipped. In our example, the loop runs while the $l_JobName local variable is empty.

The syntax of the if conditional element is the same: each conditional block should be wrapped in begin/end. It supports else if, and you can include multiple conditional statements separated by AND or OR. We can use the conditional to check whether the global variable from which we will be sourcing the value for the local variable is empty or not. If it is not empty, we assign it to a local variable, and if it is empty, we generate a job name using the job_name() function, which returns the name of the job it is executed in.

The print() function is the main logging function in the Data Services scripting language. It allows you to print out messages in the trace log file. Look at the following screenshot. It shows an excerpt from the trace log file displayed in one of the tabs in the main design window after you execute the job.

Note
When you execute the job, Data Services generates three log files: trace log, monitor log, and error log. We will explain these logs in detail in the upcoming recipes and chapters. For now, use the trace log button to see the result of your job execution.

Messages generated by the print() function are marked in the trace log as PRINTFN (see the following screenshot). You can also add your own formatting in the print() function to make the messages more distinguishable from the rest of the log messages (see the INFO word added in the example here):

Using string functions
Here, we will explore a few useful string functions by updating our Hello World code to include some extra functionality. There is only one data type in Data Services used to store character strings, and that is varchar. It keeps things pretty simple for string-related and conversion operations.

How to do it…
Here, you will see two examples: applying string function transformations within a dataflow and using string functions in the script object.

Follow these steps to use string functions in Data Services using the example of the replace_substr() function, which substitutes part of a string with another substring:

1. Open the DF_HelloWorld dataflow in the workspace window and add a new Query transform named Who_says_What. Put it after the Query transform and before the target template table.

2. Open the Who_says_What Query transform and add a new WHO_SAYS_WHAT output column of the varchar(100) type.

3. Add the following code into the Mapping tab of the new column:

replace_substr($p_JobName, '_', '') || ' says ' || word(Query.TEXT, 1)

4. Your new Query transform should look like the one in the following screenshot. Note that you should use single quotes to define the string text in mapping or script:

5. The final version of the dataflow should look like this:

Save your work and execute the job. Go to Management Studio to see the contents of the dbo.HELLO_WORLD table. The table now has a new column with the "JobHelloWorld says Hello" string.

Using string functions in the script
We are not quite happy with the Who_says_What string. Obviously, only HelloWorld should be put in double quotes (they do not affect the behavior of string text in Data Services). Also, we will use the init_cap() function to make sure that only the first letter of our job name is capitalized.

Change the mapping of WHO_SAYS_WHAT to the following code:

'Job "' || init_cap(ltrim(lower($p_JobName), 'job_')) || '"' || ' says ' || word(Query.TEXT, 1)
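As a sanity check on what this chain of calls produces, here is the same logic written with Python stand-ins (these are not DS functions): str.lower() mimics lower(), str.lstrip() strips a set of characters from the left like ltrim(), str.capitalize() mimics init_cap(), and str.split() stands in for word().

```python
p_job_name = "Job_HelloWorld"
text = "Hello World!"

# lower() -> 'job_helloworld'; lstrip('job_') strips the leading
# j/o/b/_ characters -> 'helloworld'; capitalize() -> 'Helloworld'
name = p_job_name.lower().lstrip("job_").capitalize()

# word(Query.TEXT, 1) -> first space-separated word, 'Hello'
result = 'Job "' + name + '" says ' + text.split()[0]
print(result)  # Job "Helloworld" says Hello
```

Note that, like ltrim(), str.lstrip() treats its argument as a set of characters to strip, not as a prefix string, which is why lowercasing first matters here.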

According to this logic, we are expecting the job name to start with the Job_ prefix. In this case, we have to add extra logic to the script running before the dataflow to make sure that we have this prefix in our job name. The following code will add it if the job name is not valid according to our naming standards. Add the following code before the last print() function call:

# Check that job is named according to the naming standards
if (match_regex($l_JobName, '^(job_).*$', 'CASE_INSENSITIVE') = 1)
begin
    print('INFO: the job name is correct!');
end
else
begin
    print('WARNING: job has not been named according '
        || 'to the standards. '
        || 'Changing the name of {$l_JobName}…');
    $l_JobName = 'Job_' || $l_JobName;
    print('INFO: new job name is ' || $l_JobName);
end

As the final step, save the job and execute it. Now, the string in your third column should be Job "Helloworld" says Hello. From now on, even if you rename your job and remove the Job_ prefix, your script should see this and add the prefix to your job name.

How it works…
As you can see in the preceding example, we used common string manipulation functions similar to the other programming languages.

In the first part of the recipe, we transformed the mapping of the WHO_SAYS_WHAT column to strip out the Job_ prefix from the parameter value. This allows us to correctly wrap the rest of the job name into double quotes for better presentation.

The init_cap() function capitalizes the first character of the input string.

The lower() function transforms the input string to lowercase.

The ltrim() function trims the specified characters on the left-hand side of the input string. Usually, it is used to quickly remove leading blank characters in strings. The rtrim() function does the same thing but for trailing characters.

The word() function is extremely useful in parsing the input string to extract "words" or parts of a string separated by space characters. There is an extended version of it, the word_ext() function. It accepts a specified separator as the third parameter. As the second parameter in both these versions, you specify the number of the word to be extracted from the string.
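The behavior of word() and word_ext() can be illustrated with small Python equivalents. These are hypothetical stand-ins written for this book's examples, not the DS implementations:

```python
def word(text, n):
    """Return the n-th (1-based) space-separated word of text,
    like the DS word() function; empty string if out of range."""
    parts = text.split()
    return parts[n - 1] if 0 < n <= len(parts) else ""

def word_ext(text, n, sep):
    """Like word(), but with an explicit separator,
    like the DS word_ext() function."""
    parts = text.split(sep)
    return parts[n - 1] if 0 < n <= len(parts) else ""

print(word("Hello World!", 1))             # Hello
print(word_ext("Job_HelloWorld", 2, "_"))  # HelloWorld
```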

You probably have already guessed that || is used as a string concatenation operator.

The second part of the changes implemented in this recipe, in the script object, contained the very interesting and powerful match_regex() function. It is one of the few functions that represents regular expression support within Data Services. If you are not familiar with the regular expression concept, you can find many sources on the Internet explaining it in detail. Regular expressions are supported in almost all major programming languages and allow you to specify matching patterns in a very short form. This makes them very effective to parse a string and find a matching substring or pattern in it.

In the Data Services match_regex() function, if you specify a regular expression pattern string as the second input parameter, it will return 1 if it finds a match of the pattern in the input string. It will return 0 if it does not find a match. It is a very effective way to validate the format of a text string or look for specific characters or patterns in the string.

Here, we checked whether our job has the prefix Job_ in its name. If not, we should add it to the beginning of the job name before passing the value to the dataflow.
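The same check can be reproduced with Python's re module. The wrapper below is a sketch that mimics the match_regex() calling convention used in the script (1/0 return value, a 'CASE_INSENSITIVE' flag name); it is not the DS function itself:

```python
import re

def match_regex(text, pattern, flags_name=""):
    """Sketch of DS match_regex(): 1 if the pattern matches
    the input string, 0 otherwise."""
    flags = re.IGNORECASE if flags_name == "CASE_INSENSITIVE" else 0
    return 1 if re.match(pattern, text, flags) else 0

job = "job_HelloWorld"
if match_regex(job, r"^(job_).*$", "CASE_INSENSITIVE") == 1:
    print("INFO: the job name is correct!")
else:
    # Mirror the script: prepend the prefix if it is missing
    job = "Job_" + job
```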

There's more…
Feel free to explore the existing string functions available in Data Services. There are some extended versions of the functions we already used in the preceding recipe. You can take a look at them. For example, the ltrim_blanks() function allows you to quickly remove blank characters without specifying extra parameters, and it has an extended version, ltrim_blanks_ext(). The substr() function returns part of a string from another string. The replace_substr() function is used to substitute part of a string with another string.

We will definitely use some of them in our future recipes throughout the book.

Using date functions
Correctly dealing with dates and time is critically important in data warehouses. In the end, you should understand that this is one of the most important attributes in a majority of fact tables in your DWH, which defines the "position" of your data records. Lots of reports filter data by date-time fields before performing data aggregation. This is probably why Data Services has a decent amount of date functions, allowing a variety of operations on date-time variables and table columns.

Data Services supports the following date data types: date, datetime, time, and timestamp. They define what part of time units are stored in the field:

date: This stores the calendar date
datetime: This stores the calendar date and the time of the day
time: This stores only the time of the day without the calendar date
timestamp: This stores the time of the day in subseconds

How to do it…
Generating current date and time
Here is a script that can be included in your current script object in the Hello World job to display the generated date values in the job trace log.

To test this script, create a new job called Job_Date_Functions and a new script within it called SCR_Date_Functions. Also, create four local variables in the job: $l_date of the date data type, $l_datetime of the datetime data type, $l_time of the time data type, and $l_timestamp of the timestamp data type.

Print out date function examples to the trace log:

$l_date = sysdate();
print('$l_date = [$l_date]');

$l_datetime = sysdate();
print('$l_datetime = [$l_datetime]');

$l_time = systime();
print('$l_time = [$l_time]');

$l_timestamp = systime();
print('$l_timestamp = [$l_timestamp]');

$l_timestamp = sysdate();
print('$l_timestamp = [$l_timestamp]');

The trace log file displays the following information:

$l_date = 2015.05.05
$l_datetime = 2015.05.05 18:47:27
$l_time = 18:47:27
$l_timestamp = 1900.01.01 18:47:27.030000000
$l_timestamp = 2015.05.05 18:15:21.472000000

As you can see, different data types are able to store different amounts of data. Also, you see that the systime() function does not generate date-related data (days, months, and years), and the 1900.01.01 that you see in the first timestamp variable output is a dummy default date value. The second output shows that we used the sysdate() function to get this information.
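The precision kept by each DS date data type can be illustrated with Python's datetime module (a stand-in for illustration, not DS itself), taking a single moment and showing it at each level of detail:

```python
from datetime import datetime

# One generated moment, shown at the precision each DS date
# data type would keep (the values mirror the trace log above):
now = datetime(2015, 5, 5, 18, 47, 27, 30000)

print(now.date())                         # date:      2015-05-05
print(now.replace(microsecond=0))         # datetime:  2015-05-05 18:47:27
print(now.time().replace(microsecond=0))  # time:      18:47:27
print(now)                                # timestamp: 2015-05-05 18:47:27.030000
```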

Extracting parts from dates
Here are some useful operations you can perform to extract parts from date type values. Note that all of them return integer values. You can append these commands to the script object already created in order to test how they work:

$l_datetime = sysdate();
print('$l_datetime = [$l_datetime]');

# Extract Year from date field
print('Year = ' || date_part($l_datetime, 'YY'));

# Extract Day from date field
print('Day = ' || date_part($l_datetime, 'DD'));

# Extract Month from date field
print('Month = ' || date_part($l_datetime, 'MM'));

# Display day in month for the input date
print('Day in Month = ' || day_in_month($l_datetime));

# Display day in week for the input date
print('Day in Week = ' || day_in_week($l_datetime));

# Display day in year for the input date
print('Day in Year = ' || day_in_year($l_datetime));

# Display number of week in year
print('Week in Year = ' || week_in_year($l_datetime));

# Display number of week in month
print('Week in Month = ' || week_in_month($l_datetime));

# Display last day of the current month in the provided input date
print('Last date of the date month = ' || last_date($l_datetime));

The output in a trace log should be similar to this:

$l_datetime = 2015.05.05 15:55:09
Year = 2015
Day = 5
Month = 5
Day in Month = 5
Day in Week = 2
Day in Year = 125
Week in Year = 18
Week in Month = 1
Last date of the date month = 2015.05.31 15:55:09

How it works…

Some functions take an extra formatting parameter; date_part(), for example, does. You can also use 'HH', 'MI', and 'SS' with it to extract hours, minutes, and seconds respectively.

There are also shorter versions of the date_part() function that allow you to extract the year, month, or quarter without specifying any extra formatting parameters. For this, you can use the year(), month(), and quarter() functions.

An interesting function is the isweekend() function. It returns 1 if the specified date value falls on a weekend, and 0 if it does not.
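For comparison, the same date-part extractions can be sketched in Python (not Data Services code); all results are integers, as in the recipe. Note that Python's ISO week numbering and Monday-based weekday differ slightly from the Data Services functions.

```python
from datetime import datetime

d = datetime(2015, 5, 5, 15, 55, 9)

print(d.year)                  # date_part(..., 'YY') -> 2015
print(d.day)                   # date_part(..., 'DD') -> 5
print(d.month)                 # date_part(..., 'MM') -> 5
print(d.timetuple().tm_yday)   # day_in_year(...)     -> 125
print(d.isocalendar()[1])      # week_in_year(...); ISO weeks give 19 here
print(1 if d.weekday() >= 5 else 0)  # isweekend() analog -> 0 (Tuesday)
```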

There's more…

You can access the full list of functions available in Data Services from different places in Designer. One option is to open the script object. There is a Functions… button at the top of the main design window. Click it to open the Select Function window. All functions are categorized and have a short description explaining how they work and what they require as input parameters. Look at this screenshot:

The same button is also available on the Mapping tab of the Query transform inside a dataflow, so you can access it if you are trying to create a transformation rule for one of the columns.

This list is also available in the Smart Editor, but we will discuss it in detail in one of the next recipes. Of course, you can always reference the Data Services documentation to see all the functions available in Data Services, along with some examples of their usage.

Using conversion functions

Conversion functions allow you to change the data type of a variable or column in the Query transform from one type to another. This is very handy, for example, when you receive date values as string characters and want to convert them to the internal date data types to apply date functions or perform arithmetic operations on them.

How to do it…

One of the most used functions to convert from one data type to another is the cast() function. Look at the examples here. As usual, create a new job with an empty script object and type this code in it. Create a $l_varchar job local variable of the varchar(10) data type:

$l_varchar = '20150507';

# Casting varchar to integer
print(cast($l_varchar, 'integer'));

# Casting varchar to decimal
print(cast($l_varchar, 'decimal(10,0)'));

# Casting integer value to varchar
print(cast(987654321, 'varchar(10)'));

# Casting varchar to a double
print(cast($l_varchar, 'double'));

The output is shown here:

Remember that the print() function automatically converts the input to varchar in order to display it in a trace file. Note how casting to a double data type changed the appearance of the number.

Casting helps make sure that you are sending values of the correct data type to a column or function that expects data of a particular type in order to work correctly. Automatic conversions performed by Data Services, when a value of one data type is assigned to a variable or column of a different data type, can produce unexpected results and lead to errors.
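The same conversions can be sketched in Python (not Data Services code), which makes the effect of each cast easy to verify:

```python
from decimal import Decimal

l_varchar = '20150507'
print(int(l_varchar))        # cast($l_varchar, 'integer')       -> 20150507
print(Decimal(l_varchar))    # cast($l_varchar, 'decimal(10,0)') -> 20150507
print(str(987654321))        # cast(987654321, 'varchar(10)')    -> '987654321'
print(float(l_varchar))      # cast($l_varchar, 'double')        -> 20150507.0
```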

However, the most useful conversion functions are the ones used to convert a string to a date and vice versa. Add the following lines to your script and run the job:

$l_varchar = '20150507';

# Casting varchar to a date
print(to_date($l_varchar, 'YYYYMMDD'));

# Converting/changing the format of the input date
# from 'YYYYMMDD' to 'DD.MM.YYYY'
print(
  to_char(to_date($l_varchar, 'YYYYMMDD'), 'DD.MM.YYYY')
);

When converting a text string to a date, you have to specify the format of the string so that the Data Services compiler can interpret and convert the values correctly. The full table of possible formats available in these two functions can be found in the Data Services Reference Guide, available for download at http://help.sap.com. Refer to it for more details. Here are some more examples of to_char() conversions of a date variable:

$l_date = sysdate();
print(to_char($l_date, 'DDMONYYYY'));
print(to_char($l_date, 'MONTH-DD-YYYY'));

The trace log should be similar to the following one:

07MAY2015
MAY-07-2015
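The to_date()/to_char() round trip maps closely onto Python's strptime()/strftime() (not Data Services code), which can be handy for checking format strings:

```python
from datetime import datetime

l_varchar = '20150507'
d = datetime.strptime(l_varchar, '%Y%m%d')  # to_date($l_varchar, 'YYYYMMDD')
reformatted = d.strftime('%d.%m.%Y')        # to_char(..., 'DD.MM.YYYY')
print(reformatted)                          # 07.05.2015
# to_char(..., 'DDMONYYYY') analog; %b month names are locale-dependent
print(d.strftime('%d%b%Y').upper())
```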

Let's get familiar with another interesting data type: interval. It helps you perform arithmetic operations on dates. The script here performs arithmetic operations on a date stored in the $l_date variable by first adding 5 days to it, then calculating the first date of the next month, and finally subtracting 1 second from the date-time value stored in the $l_datetime variable.

See the example here:

$l_date = to_date('01/05/2015', 'DD/MM/YYYY');
print('Date = ' || $l_date);

# Add 5 days to the $l_date value
print('{$l_date} + 5 days = ' || $l_date + num_to_interval(5, 'D'));

# Calculate first day of next month
print('First day of next month = ' || last_date($l_date) + num_to_interval(1, 'D'));

# Subtract 1 second from the datetime
$l_datetime = to_date('01/05/2015 00:00:00', 'DD/MM/YYYY HH24:MI:SS');
print('{$l_datetime} minus 1 second = ' || $l_datetime - num_to_interval(1, 'S'));
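The interval arithmetic above corresponds to timedelta arithmetic in Python (not Data Services code); the last_date() step is emulated with calendar.monthrange():

```python
import calendar
from datetime import datetime, timedelta

l_date = datetime.strptime('01/05/2015', '%d/%m/%Y')
plus_five = l_date + timedelta(days=5)              # num_to_interval(5, 'D')
print(plus_five)                                    # 2015-05-06 00:00:00

# last_date() analog: last day of the month, then + 1 day = first of next month
last_day = l_date.replace(day=calendar.monthrange(l_date.year, l_date.month)[1])
print(last_day + timedelta(days=1))                 # 2015-06-01 00:00:00

print(l_date - timedelta(seconds=1))                # 2015-04-30 23:59:59
```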

How it works…

You probably have not noticed it, but you have already seen the results of implicit data type conversions made automatically by Data Services in the previous recipes. For example, the date extraction functions returned integer values that were converted automatically to varchar so that they could be concatenated with the string part and displayed using the print() function, which, by the way, accepts only varchar as an input parameter.

Data Services performs data type conversions automatically whenever you assign a value of one data type to a variable or column of a different data type. The only potential pitfall here is that if you rely on automatic conversion, you are leaving some guesswork to Data Services and can get unexpected results in the end. So, understanding how and when conversion happens automatically, in order to implement manual checks instead, can be critical. Many bugs in ETL code are related to incorrect data type conversion, so you should be extra careful.

There's more…

Try experimenting with automatic conversion, for example, by adding integer numbers to date variables, such as sysdate() + 10, to see how Data Services behaves and which default parameters it uses for formatting the automatically converted value.

Using database functions

There is no great variety of functions in this area. Data Services provides functions that let you communicate with database objects and control the flow of data within a dataflow.

How to do it…

You will learn a little more about these functions here.

key_generation()

First, let's look at the key_generation() function. This is a function that can be called only from a dataflow (when used in column mapping), so we are not interested in it at this point, as we cannot use it in Data Services scripts.

This function is actually similar to the Key_Generation transform object that can be used as part of a dataflow as well; it looks up the highest key value in a table column and generates the next one. This is often used to populate the key column of a new record with a unique value before inserting the record into a target table. We will take a closer look at the Key_Generation transform in the upcoming chapters.

total_rows()

This function is used to calculate the total number of rows in a database table. Running this function in a script is the easiest and quickest way to check whether a table is empty before running a dataflow that populates it. Then, according to the results, you can make further decisions, that is, truncate the table directly from a script before running the next dataflow. Alternatively, you can use conditionals to skip the next portion of ETL code entirely.

See the example of how this function is used. As usual, create a new job with a script object inside it. Type the following code and run the job:

print(
  total_rows('DWH.DBO.DIMACCOUNT')
);

Do not forget to import the table into your DWH datastore, as you can reference only tables that have been imported into your Data Services repository. Look at this screenshot:

sql()

The sql() function is a universal function that allows you to perform SQL calls to any database for which you have created a datastore object. You can run DDL and DML statements, SELECT queries, and even call stored procedures and database functions.

Note

You should use the sql() function very carefully in your scripts, and we do not recommend that you use it at all in column mappings inside a dataflow. This function should only be used to return one record with as few fields as possible. So, always test the statement you place inside the sql() function directly in the database first to make sure it behaves as expected.

For example, to calculate the total number of rows in the DimAccount table with the sql() function, you can use the following code:

print('Total number of rows in DBO.DIMACCOUNT table is: ' ||
  sql('DWH', 'SELECT COUNT(*) FROM DBO.DIMACCOUNT')
);

How it works…

The sql() function is very convenient for executing stored procedures, truncating and creating database objects, and doing lookups for aggregated values when the query returns only one row or even one value. If you try to return a dataset of multiple rows, you will get only the value of the first field from the first row. It is still possible to query multiple fields, but it will require that you modify the query itself and add extra code to parse the returned string (see the example here):

# returning multiple fields from a database table
$l_row = sql('DWH', 'SELECT CONVERT(VARCHAR(10), ACCOUNTKEY)' ||
  ' + \',\' + CONVERT(VARCHAR(50), ACCOUNTDESCRIPTION)' ||
  ' FROM DBO.DIMACCOUNT');

$l_AccountKey = word_ext($l_row, 1, ',');
$l_AccountDescription = word_ext($l_row, 2, ',');

print('AccountKey = {$l_AccountKey}');
print('AccountDescription = {$l_AccountDescription}');

As you can see, this is a lot of code for such a simple procedure. If you want to extract and parse multiple rows in a Data Services script, you will have to create a row-counting mechanism and loop through the rows by executing the query multiple times within a loop. However, you can try to do this yourself as an exercise to practice a little bit of the Data Services scripting language.
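The same "concatenate the fields in SQL, then split the returned string" trick can be sketched with Python and SQLite (not Data Services code); the sample row is made up for illustration, and split() stands in for word_ext():

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE DIMACCOUNT (ACCOUNTKEY INTEGER, ACCOUNTDESCRIPTION TEXT)')
conn.execute("INSERT INTO DIMACCOUNT VALUES (1, 'Assets')")

# Concatenate both fields into one comma-delimited string in SQL
l_row = conn.execute(
    "SELECT CAST(ACCOUNTKEY AS TEXT) || ',' || ACCOUNTDESCRIPTION FROM DIMACCOUNT"
).fetchone()[0]

# word_ext() analog: split the returned string back into fields
account_key, account_description = l_row.split(',', 1)
print(account_key)          # 1
print(account_description)  # Assets
```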

Note

Note that you do not have to import the table you want to reference in the sql() function into a datastore.

Using aggregate functions

Aggregate functions are used in dataflow Query transforms to perform aggregation on a grouped dataset.

You should be familiar with these functions, as they are the same ones used in the SQL language: avg(), min(), max(), count(), count_distinct(), and sum().

How to do it…

To demonstrate the use of aggregate functions, we will perform a simple analysis of one of our tables. Import the DimGeography table into the DWH datastore and create a new job with a single dataflow inside it using these steps:

1. Your dataflow should include the DimGeography source table and a DimGeography target template table in the STAGE database to send the output to:

2. Open the Query transform and create the following output structure:

The COUNTRYREGIONCODE column contains country code values and will be the column on which we perform the grouping of the dataset. It is mapped from the input dataset to the output. Also, drag and drop it to the GROUP BY tab of the Query transform from the input dataset to specify it as a grouping column. The other columns are created with New Output Column… (choose this option from the context menu of the COUNTRYREGIONCODE column) and contain the following mappings (see the table here):

Output column name        | Mapping expression
COUNT_DISTINCT_PROVINCE   | count_distinct(DIMGEOGRAPHY.STATEPROVINCENAME)
COUNT_PROVINCE            | count(DIMGEOGRAPHY.STATEPROVINCENAME)
MIN_KEY                   | min(DIMGEOGRAPHY.GEOGRAPHYKEY)
MAX_KEY                   | max(DIMGEOGRAPHY.GEOGRAPHYKEY)

3. Save the changes and run the job. Now, go to Management Studio and query the contents of the newly created DimGeography table in the STAGE database. You should get the results shown in this screenshot:

How it works…

What we have just built in the Query transform of the dataflow can be done with the following SQL statement:

select
  CountryRegionCode,
  COUNT(DISTINCT StateProvinceName),
  COUNT(StateProvinceName),
  MIN(GeographyKey),
  MAX(GeographyKey)
from
  dbo.DimGeography
group by
  CountryRegionCode;

First, the count_distinct() function calculates the number of distinct provinces within each country, count() calculates the total number of rows for each country, and min() and max() show the lowest and highest GeographyKey values within each country group, respectively.
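The equivalent SQL can be run against SQLite to see the grouping in action (not Data Services code); the sample rows are made up so the query is runnable, and an ORDER BY is added to make the output deterministic:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE DimGeography '
             '(GeographyKey INTEGER, CountryRegionCode TEXT, StateProvinceName TEXT)')
conn.executemany('INSERT INTO DimGeography VALUES (?, ?, ?)',
                 [(1, 'US', 'Washington'), (2, 'US', 'Oregon'),
                  (3, 'US', 'Washington'), (4, 'AU', 'Victoria')])

rows = conn.execute(
    'SELECT CountryRegionCode, COUNT(DISTINCT StateProvinceName), '
    'COUNT(StateProvinceName), MIN(GeographyKey), MAX(GeographyKey) '
    'FROM DimGeography GROUP BY CountryRegionCode '
    'ORDER BY CountryRegionCode').fetchall()
for row in rows:
    print(row)
# ('AU', 1, 1, 4, 4)
# ('US', 2, 3, 1, 3)
```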

Note

You cannot use these functions directly in the scripting language, only in the Query transform. If you need to extract aggregated values from database tables within a Data Services script, you can use sql() containing a SELECT statement with the aggregate database functions.

Using math functions

Data Services has a standard set of functions available to perform mathematical operations. In this recipe, we will use the most popular of them to show you what operations can be performed on numeric data types.

How to do it…

1. Create a new job and name it Job_Math_Functions.
2. Inside this job, create a single dataflow called DF_Math_Functions.
3. Import the FactResellerSales table in your DWH datastore and add it to the dataflow as a source object.
4. Add the first Query transform after the source table and link them together. Then, open it and drag two columns to the output schema: PRODUCTKEY and SALESAMOUNT. Specify the FACTRESELLERSALES.PRODUCTKEY = 354 filtering condition in the WHERE tab:
5. Add the second Query transform and rename it Group. Here, we will perform a grouping operation on the product key we selected in the previous transform. To do this, add the PRODUCTKEY column in the GROUP BY tab and apply the sum() aggregate function on SALESAMOUNT in the Mapping tab:
6. Finally, add the last Query transform, called Math, and link it to the previous one. Inside it, drag all columns from the source to the target schema and add the new ones using New Output Column…. Specify mapping expressions, as in the following screenshot:
7. As the last step, add a new template table located in the STAGE database and owned by the dbo user. This template table is called FACTRESELLERSALES. Your dataflow should look like this now:
8. Save and run the job. Then, to check the result dataset, either query the new table from SQL Server Management Studio, or open your dataflow in Data Services and click on the magnifying glass icon of your FACTRESELLERSALES (DS_STAGE.DBO) target table object to browse the data directly from Data Services.

How it works…

The result you see here explains very well the effect of the math functions applied to your SALESAMOUNT column value:

The ceil() function returns the smallest integer value (automatically converted to the input column data type; that is why you see trailing zeroes) equal to or greater than the specified input number.

The floor() function returns the largest integer value equal to or less than the input number.

The rand_ext() function returns a random real number from 0 to 1. In Data Services, you do not have much control over the behavior of the functions that generate random numbers, so you have to apply extra mathematical operations to define the range and type of the generated random numbers. In the example earlier, we generated random integer numbers from 0 to 10 inclusive.

The trunc() and round() functions perform rounding operations similar to ceil() and floor(), but trunc() just truncates the number to the length specified in the second parameter and shows you the result as is. On the other hand, the round() function rounds the number according to the precision specified.
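The behavior of these functions can be checked with Python's math module (not Data Services code), including the "scale a 0..1 random float" trick described for rand_ext():

```python
import math
import random

amount = 1234.5678
print(math.ceil(amount))    # ceil()  -> 1235
print(math.floor(amount))   # floor() -> 1234
print(round(amount, 2))     # round(..., 2) -> 1234.57 (rounds to precision)
print(math.trunc(amount * 100) / 100)  # trunc(..., 2)-style truncation -> 1234.56

# rand_ext() analog: scale a 0..1 random float to integers 0..10 inclusive
r = math.floor(random.random() * 11)
print(r)
```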

There's more…

As an exercise, try the other Data Services mathematical functions. Modify the created dataflow to include examples of their usage. To see the full list of mathematical functions available, use the Functions… button in the script object or column mapping field and choose the Math Functions category in the Select Function window:

Using miscellaneous functions

The miscellaneous group actually includes almost all types of functions that cannot easily be categorized. Among the miscellaneous functions, there are functions that allow you to extract useful information from the Data Services repository (for example, the name of the job, workflow, or dataflow the code is executed from), functions that allow you to perform advanced string searches, functions similar to other standard SQL functions, and many others. Throughout the book, we will use Data Services miscellaneous functions very often. So, in this recipe, we will take a look at some of those that are usually used in scripts.

How to do it…

At this point, you should be pretty comfortable creating new jobs, script objects, and dataflow objects. So, I will not explain the steps in detail every time we need to create a new test job object. If you forgot how to do it, refer to the previous recipes in the book.

1. Create a new job and add a script object in it.
2. Open the script and populate it with the following code. This code shows you an example of how to use three miscellaneous functions: ifthenelse(), decode(), and nvl():

# Conditional functions
$l_string = 'Length of that string is 38 characters';
$l_result = ifthenelse(length($l_string) = 8, print('TRUE'), print('FALSE'));

$l_string = 'Length of that string is 38 characters';
$l_result = decode(
  length($l_string) = 10, print('TRUE'),
  length($l_string) = 12, print('TRUE'),
  length($l_string) = 38, print('TRUE'),
  print('FALSE')
);

$l_string = NULL;
$l_string = nvl($l_string, 'Empty string');
print($l_string);

3. For this script to work, you should also make sure that you have local variables created at the job level: $l_string and $l_result of the varchar(255) data type.

How it works…

Most of the miscellaneous functions are functions that require advanced knowledge of Data Services. In this book, you will see a lot of examples of how they can be used in complex dataflows and Data Services scripts.

In this recipe, we see three conditional functions: ifthenelse(), decode(), and nvl(). They allow you to evaluate the result of an expression and execute other expressions depending on the result of the initial evaluation.

After executing the earlier script, you can see the following trace log records:

8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. FALSE
8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. TRUE
8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. Empty string

The ifthenelse() function accepts a comparison expression as its first parameter, which returns either TRUE or FALSE. If TRUE, then the second parameter of ifthenelse() is executed (if it is an expression) or just returned as the result of the function. The third parameter is executed (or returned) if the comparison expression returns FALSE.

The decode() function does the same thing as the ifthenelse() function, except that it allows you to evaluate multiple expressions. Its parameters go in pairs, as you can see in the example. The first parameter in a pair is a comparison expression, and the second parameter is what is returned by the function if the comparison expression is TRUE. If it returns FALSE, then decode() moves to the next pair, and then the next one, until it reaches the last pair. If none of the expressions returned TRUE, then the last parameter of decode() is returned as a default value.

Note

Bear in mind that the decode() function returns on the first TRUE condition without evaluating the rest of the conditions. So, be careful with the order of conditional expressions in the decode() function.

Finally, the last function in the example is the common SQL function nvl(). It returns the value specified in the second parameter if the first parameter is NULL. This function is very useful in dataflows. Usually, it is used as a mapping expression in the Query transform to prevent NULL values from coming through for a specific column. All NULL values will be converted to the value you define in the nvl() function.
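The three conditional functions map naturally onto Python expressions (not Data Services code): a conditional expression for ifthenelse(), a first-match lookup for decode(), and a None check for nvl():

```python
l_string = 'Length of that string is 38 characters'

# ifthenelse(condition, then_value, else_value)
result_if = 'TRUE' if len(l_string) == 8 else 'FALSE'
print(result_if)                                   # FALSE

# decode(cond1, val1, cond2, val2, ..., default): first TRUE condition wins
pairs = [(len(l_string) == 10, 'TRUE'),
         (len(l_string) == 12, 'TRUE'),
         (len(l_string) == 38, 'TRUE')]
result_decode = next((value for cond, value in pairs if cond), 'FALSE')
print(result_decode)                               # TRUE

# nvl(value, substitute): replace NULL (None) with a default
l_string = None
result_nvl = l_string if l_string is not None else 'Empty string'
print(result_nvl)                                  # Empty string
```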

Creating custom functions

In this recipe, we will get familiar with the Smart Editor tool available in Designer, which helps you write your scripts or functions in a convenient way.

We will create a new function that can be executed either within a script or within a dataflow. This function accepts two parameters: a date value and a number of days. It then adds the number of days to the input date and returns the resulting date.

How to do it…

1. Open Designer and go to Tools | Custom Functions… from the top-level menu:
2. In the opened window, right-click in the area with the list of functions and choose New….
3. Choose the name of the new fn_add_days function and populate the description section, as shown in this screenshot:
4. Then, click on Next to open a Smart Editor window and input the following code:

try
begin
  $l_Date = to_date($p_InputDate, 'DD/MM/YYYY');
  $l_Days = num_to_interval($p_InputDays, 'D');
end
catch(all)
begin
  print('fn_add_days() FAILED: check input parameters');
  raise_exception('fn_add_days() FAILED: check input parameters: ' ||
    'Date format DD/MM/YYYY and number of days should be an integer value');
end

$l_Result = $l_Date + $l_Days;
Return $l_Result;

5. For it to work, you have to create a set of required input/output parameters and local variables for this custom function. Your function in the Smart Editor should look like the one shown in this screenshot:
6. Create the following input parameters: $p_InputDate of the varchar data type and $p_InputDays of the integer data type. Use the left panel, Variables, inside the Custom Function window.
7. The local variables will be used only within the function and will not be accessible from outside of it. Create $l_Date of the date data type, $l_Days of the interval data type, and $l_Result of the date data type.
8. Now, it is time to click on OK to create our first custom function and use it in a job. For this, you can create a simple job with one script object inside it using the following code:

print(to_char(fn_add_days('10/10/2015', 12), 'DD-MM-YYYY'));

How it works…

We made the input parameters of the varchar and integer data types for the convenience of calling the function. The function itself performs the conversion to the correct date and interval data types before returning the result of the date sum operation.

Even if we had not used the num_to_interval() function to convert integer values to intervals, Data Services would still perform the correct sum operation. This is because it does an automatic conversion of the numeric data type into intervals of days when it is used in an arithmetic operation with dates. That is why print(sysdate() + 1) will return tomorrow's date.

In the code mentioned earlier, you can also see the error-handling mechanism that can be used in Data Services scripts: the try-catch block. If anything executed between try and catch fails, it will not fail the parent object's execution. This is very useful if you do not want to fail your job because of a non-critical piece of code failing somewhere inside it. In case of a failed execution, control is passed to the second begin-end block of the try-catch. Here, you can write extra log messages to the trace log file and still fail the job execution with the raise_exception() function if you want to. We will discuss this in more detail in Chapter 5, Workflow – Controlling Execution Order, and Chapter 9, Advanced Design Techniques.
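The structure of fn_add_days(), including its try/catch-style input validation, can be mirrored in a short Python sketch (not Data Services code):

```python
from datetime import datetime, timedelta

def fn_add_days(input_date, input_days):
    # Validate and convert inputs, like the try ... catch(all) block above
    try:
        date_value = datetime.strptime(input_date, '%d/%m/%Y')
        days = timedelta(days=input_days)
    except (ValueError, TypeError):
        # raise_exception() analog: report the failure to the caller
        raise ValueError('fn_add_days() FAILED: check input parameters: '
                         'date format DD/MM/YYYY and number of days '
                         'should be an integer value')
    return date_value + days

print(fn_add_days('10/10/2015', 12).strftime('%d-%m-%Y'))  # 22-10-2015
```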

There's more…

The scripting language in Data Services is a very important tool, extensively used in both simple and complex jobs. In this chapter, we established a good base for building Data Services scripting language skills. You will find a lot more examples throughout this book.

Chapter 4. Dataflow – Extract, Transform, and Load

In this chapter, we will take a look at examples of the most important processing unit in Data Services, the dataflow object, and the most useful types of transformations you can use inside it. We will cover:

Creating a source data object
Creating a target data object
Loading data into a flat file
Loading data from a flat file
Loading data from table to table – lookups and joins
Using the Map_Operation transform
Using the Table_Comparison transform
Exploring the Auto correct load option
Splitting the flow of data with the Case transform
Monitoring and analyzing dataflow execution

Introduction

In this chapter, we move to the most important component of ETL design in Data Services: the dataflow object. The dataflow object is the container that holds all the transformations that can be performed on data.

The structure of the dataflow object is simple: one or many source objects are placed on the left-hand side (which we extract the data from), then the source objects are linked to a series of transform objects (which perform manipulation on the extracted data), and finally, the transform objects are linked to one or many target table objects (telling Data Services where the transformed data should be inserted). During the transformation of the dataset inside the dataflow, you can split the dataset into multiple dataset flows or, conversely, merge multiple separately transformed data flows together.

Manipulations performed on data inside dataflows are done on a row-by-row basis. The rows extracted from the source go from left to right through all the objects placed inside the dataflow.

We will review all the major aspects of dataflow design in Data Services, from creating source and target objects to the usage of the complex transformations available as part of the Data Services functionality.

Creating a source data object

In a couple of previous recipes, you have already become familiar with data sources, importing tables, and using imported tables inside dataflows as source and target objects. In this recipe, we will create the rest of the datastore objects, linking all our existing databases to a Data Services repository, and will spend more time explaining this process.

How to do it…

In the Understanding the Designer tool recipe in Chapter 2, Configuring the Data Services Environment, we already created our first datastore object, STAGE, for the "Hello World" example.

So, why do you need a datastore object, and what is it exactly? Datastore objects are containers representing the connections to specific databases and storing imported database structures that can be used in your Data Services ETL code. In reality, datastore objects do not store the database objects themselves but rather the metadata for the objects belonging to the application system or database that the datastore object connects to. These objects most commonly include tables, views, database functions, and stored procedures.

If you have not followed the steps in the "Hello World" example presented in Chapter 2, Configuring the Data Services Environment, you can find here the steps to create all the datastore objects that will be used in the book, explained in better detail. With these steps, we will create datastore objects referencing all the databases we have created previously in SQL Server in the first two chapters:

1. Open the Datastores tab in the Local Object Library.
2. Right-click on any empty space in the window and choose New from the context menu:

3. First, specify Datastore Type for the datastore object. The datastore type defines the connectivity type and datastore configuration options that will be used by Data Services to communicate with the referenced source/target system objects lying behind this datastore connection. In this book, we will mainly be working with datastores of the Database type. See that as soon as the datastore type Database is selected, a second Database Type option appears with a list of available databases:
4. The Create New Datastore window, with all the options that expand after you choose Datastore Type and Database Type, looks like this screenshot:
5. Leave all the advanced options at their default values and configure only the mandatory options in the top window panel: the database connectivity details and user credentials, which will be used by Data Services to access the database and read/insert the data.

6. Using the previous steps, create another datastore named ODS.
7. Altogether, you should have the following list of datastore objects created for all our local test databases. If you do not have all of them, please create the missing ones using the same steps just mentioned:

DS_ODS: This is the datastore linking to the ODS database
DS_STAGE: This is the datastore linking to the STAGE database
DWH: This is the datastore linking to the AdventureWorks_DWH database
OLTP: This is the datastore linking to the AdventureWorks_OLTP database

8. To create a reference to a database table in the OLTP datastore, expand the OLTP datastore in the Local Object Library tab and double-click on the Tables list.
9. The Database Explorer window opens in the workspace Designer section, showing you all the table and view objects in the OLTP database.
10. Find the HumanResources.Employee table in the External Metadata list, right-click on it, and choose the Import option from the context menu:
11. You can see how the table status has changed in Database Explorer to Yes under Imported and No under Changed.
12. Also, you can see the table reference appear in the OLTP datastore table list. As it is not used anywhere in ETL code, the Usage column in the Local Object Library shows 0 for that table.
13. Now, close the Database Explorer window and double-click on the imported table name in the Local Object Library window. The Table Metadata window opens, showing your table attributes and even allowing you to view the contents of the table:

Note

This Table Metadata window is extremely useful for performing source system analysis when you have to learn the source data to understand it before starting to develop your ETL code and applying transformation rules to it.

14. The View Data tab has three subtabs within it: the Data, Profile, and Column profile tabs. Choose the Column profile tab and select the GENDER column in the drop-down list.
15. Click on the Update button to see the column profile data:

The column profiling data shows that there are 206 male employees (71.03%) against 84 (28.97%) female ones.

How it works…

The most important thing you should understand about datastore objects is that when you import a database object into a datastore, all you do is create a reference to the database object. You are not creating a physical copy of the table in your Data Services datastore when you import a table. Hence, when you use View data in the Table Metadata window for that table, Data Services executes a SELECT query in the background to extract this data for you.

Looking at the browsing external metadata screen again, you can see that there are two other options available in the table context menu: Open and Reconcile:

The Open option allows you to open an external table metadata window, which can display table definition information, partitions, indices, table attributes, and other useful information.

The Reconcile option simply updates the two columns, Imported and Changed, in the External Metadata list. It is useful when you want to check whether the table object has been imported into a datastore already and whether it has changed in the database since the last time it was imported into a datastore.

Note

It is the ETL developer's responsibility to reimport the table objects in the datastore if their definition or structure has been changed at the database level. Data Services does not perform this operation automatically. The most common problem with table object synchronization is when a column populated by ETL gets removed from the table in the database. To reflect this change, the developer has to reimport the table object in the datastore to update the table object structure in Data Services and then update the ETL code to make sure that the non-existing column is not referenced as a target column anymore.

Views as source objects behave exactly like tables. They can be imported in the datastore in the same Tables section along with other table objects. The only difference is that you cannot specify an imported view as a target object in your dataflow.

You may also wonder why, if the datastore object represents the connection to a specific database, you do not see all the database objects straight away after creating it. The answer is simple: you import only those database objects you will be using in your ETL code. If the database has a few hundred tables, it would be extremely time- and resource-intensive for Data Services to automatically synchronize all datastore object references with the actual database objects each time you open the Designer application. It is also easier for the developer to be able to see only the tables used in ETL development. Plus, with the datastore configurations feature, you can use the same datastore object to connect to different physical databases that might have different versions of tables with the same names, so the synchronization of objects imported in the datastore is solely your responsibility and has to be done manually. We will discuss configurations in future chapters.

The profiling functionality of Data Services that we used in this recipe allows you to look into the data without the need to go to SQL Server Management Studio and manually query the tables. It is easy and convenient to use during ETL development.

There's more…

It is quite difficult to cover all the information about all datastore settings in one chapter, as Data Services is able to connect to so many different databases and application systems. As the datastore options are database-specific, the number of options and their behavior vary depending on which database or system you are trying to connect to.

Creating a target data object

A target data object is the object to which we send the data within a dataflow. There are a few different types of target data objects, but the two main ones are tables and flat files. In this recipe, we will take a look at a target table object.

Note

Views imported into a datastore cannot be target objects within a dataflow. They can only be a source of data.

Getting ready

To prepare for this recipe, we need to create a table in our STAGE database. To do that, please connect to SQL Server Management Studio and create the Person table in the STAGE database using the following command:

CREATE TABLE dbo.Person
(
  FirstName varchar(50),
  LastName varchar(50),
  Age integer
);

This table will be used as a target table, which we will load data into by using Data Services. We will use the data stored in the Person table from the OLTP database as the source data to be loaded.

How to do it…

1. Open the Data Services Designer application.
2. In the DS_STAGE datastore, right-click on Tables and choose the option Import By Name… (another quick method to import a table definition into Data Services without opening Database Explorer). Of course, in order to do that, you should know the exact table name and the schema it was created in.
3. In the opened window, enter the required details, as in the following screenshot:
4. Click on the Import button to finish.
5. Also, in the OLTP datastore, import a new table, Person, from the Person schema inside the AdventureWorks_OLTP database. We will use this table as a source of data.

Note

In our example of using SQL Server as the underlying database, the owner is synonymous with the database schema. When importing a table by name into a datastore or creating template tables, the Owner field defines the schema the table will be imported from, or created in, within the database. So, keep in mind that you have to use an existing schema created previously.

6. Create a new job with a new dataflow object, open the dataflow, and drag the Person table from the OLTP datastore into this dataflow as a source. Then, drag the Person table from the DS_STAGE datastore as a target.
7. Create a new Query transform between them and link it to both the source and target tables:
8. As soon as you open the Query transform, you will see that both input and output structures were created for you. All column names and data types were imported from the source and target objects you linked the Query transform to, and all you have to do is map the column values from the source to the columns in the target you want to pass them to.
9. Map the source FIRSTNAME column to the target FIRSTNAME column and perform the same mapping for LASTNAME. As there is no AGE column in the source, put NULL as the value for the mapping expression for the AGE target column in the Query transform. This can be done by dragging and dropping from the input to the output schema or by typing the mapping manually:

10. Each target object within a dataflow has a set of options that is available in the Target Table Editor window. To open it, double-click on a target table object in the dataflow workspace:
11. For now, let's just select the Delete data from table before loading checkbox. This option makes sure that each time the dataflow runs, all target table records are deleted by Data Services before populating the target table with data from a source object.
12. Validate the dataflow by clicking on the Validate Current button when the dataflow is open in the main workspace to make sure that you have not made any design errors.
13. Now execute the job and click on the View Data button in the bottom-right corner of the target table icon within the dataflow to see the data loaded into the target table.

How it works…
You can see that the target table object has a lot of options. Data Services can perform different types of loading of the same dataset, and all those types are configured in the target table object tabs. Some of them are used if the inserted dataset is voluminous, while some of them allow you to insert data without duplicating it. We will discuss all of this in detail in later chapters.

When Data Services selects data from source tables, all it does is execute the SELECT statement in the background. But when Data Services inserts the data, there are risks such as incompatible data types/values, duplicate data (which violates referential integrity in the target table), slow performance, and so on. Do not forget that you insert data after transforming it, so it is your responsibility to understand the target database object requirements and the specifics of the data you are inserting.

That is why the loading mechanism in Data Services has many more settings to configure and is much more flexible than the mechanism of getting source data inside a dataflow.
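Conceptually, the Delete data from table before loading option boils down to a delete followed by a fresh insert against the target table. The following minimal Python sketch, using SQLite and an invented person table (Data Services generates the equivalent SQL for your actual target database), illustrates that pattern:

```python
import sqlite3

# In-memory database with a hypothetical target table resembling DS_STAGE's Person.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (firstname TEXT, lastname TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES ('Old', 'Record', 99)")  # leftover from a previous run

source_rows = [("John", "Smith", None), ("Jane", "Doe", None)]  # AGE mapped to NULL

# "Delete data from table before loading": wipe the target, then insert the new dataset.
conn.execute("DELETE FROM person")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", source_rows)

print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 2: only the fresh load remains
```

Without the delete step, every run would append the same source rows again, which is exactly the duplication this option protects you from.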

There's more…
As you might remember from the "Hello World" example in Chapter 2, Configuring the Data Services Environment, there is a great and simple way to create target table objects in a dataflow without the necessity of creating a physical table in the database first and importing it into the DS datastore. We used this type of target table before: I am talking about template tables, the objects that we used in the previous recipes when we wanted Data Services to create a physical target table for us from the mappings we defined in a Query transform inside our ETL code in a dataflow.

Note
Note that the template target table has an extra target table option: Drop and re-create table. By default, it is ticked, and the table gets physically dropped and recreated each time the dataflow runs. Data Services generates the table definition from the output schema of the last transform object in the dataflow linked to the target table object.

As you can see in the following figure, you can specify multiple target tables. They get populated with the same dataset coming from the source table, and as they get populated from the same output schema of the Query transform, they have the same table definition format:

To create a template table, you use the right-hand side tool menu in the Designer and the template table icon shown in the following screenshot:

Click on the template table button in the tool menu and then on the empty space in the dataflow workspace to place it as a target table object.

Specify the template table name, the Data Services datastore where it should be created, and the database owner (schema) name where the table gets created physically when the dataflow is executed:

Loading data into a flat file
This recipe will teach you how to export information from a table into a flat file using Data Services.

Flat files are a popular choice when you need to store data externally for backup purposes, or in order to transfer it and feed it into another system, or even send it to another company.

The simplest file format usually describes a list of columns in a specific order and a delimiter used to separate field values. In most cases, that is all you need. As you will see a bit later, Data Services has many extra configuration options available in the File Format object, allowing you to load the contents of flat files into a database or export the data from a database table to a delimited text file.

How to do it…
1. Create a new dataflow and use the EMPLOYEE table from the OLTP datastore imported earlier as a source object.
2. Link the source table with a Query transform and drag-and-drop all source columns to the output schema for mapping configuration.
3. In the Query transform, right-click on the parent Query item, which includes all output mapping columns, and choose the Create File Format… option at the bottom of the opened context menu:
4. The main File Format Editor window opens:
5. Refer to the following table for more details about the File Format options and their corresponding values:

File Format options (each option's description, followed by the value to use):

Type: Specifies the type of the file format: Delimited, Fixed Width, Unstructured Text, and so on. In this recipe, we are creating a plain text file with row fields separated by a comma. Value: choose the Delimited option.

Name: The name of the File Format. Note that this is not the name of the file that will be created but the general name of the File Format. Value: type F_EMPLOYEE.

Location: The physical location of the file referenced using this file format. In our case, the Job Server and Local locations are the same, as Data Services is installed on the same machine where we executed our Designer application. Value: choose Job Server.

Root directory: The directory path to the file. Make sure that this directory exists. Value: type C:\AW\Files.

File name(s): The name of the file that we read data from or write into. Value: type HR_Employee.csv.

Delimiters | Column: You can either choose from the existing options: Tab, Semicolon, Comma, Space, or just type in your own custom delimiter as one character or a sequence of characters. Value: choose Comma.

Delimiters | Text: You can specify whether you want character values to be wrapped in quotes/double quotes or not. Value: choose ".

Skip row header: When you read from the file, use this option to skip the row header so it is not confused with the first data record. Value: we do not have to change this option, as it would not have any effect; we are going to write to a flat file, not read from it.

Write row header: The same option as the previous one, but for cases when you write into a file. If set to Yes, the row header will be created as the first line in the file. If No, the first line in the file will be a data record. Value: choose Yes to create a row header when writing to a file.

6. Click on the Save & Close button to close the File Format Editor and save the new File Format.
7. Now you can open the Local Object Library | Formats tab and see your newly created file format, F_EMPLOYEE.
8. Open the dataflow workspace, drag-and-drop this file format from the Local Object Library tab into the dataflow, and choose the Make Target… option.
9. Link your Query transform to the target file object and validate your dataflow to make sure that there are no errors.
10. Run the job. You will see that the file HR_Employee.csv appears in C:\AW\Files and gets populated with 292 records (1 header record + 291 data records).

How it works…
File format configuration provides you with a flexible solution for reading data from and loading data into flat files. You can even set up automatic date recognition and configure an error handling mechanism to reject rows that do not fit into the defined file format structure.

Note that editing the file format from the Local Object Library and editing it directly from the dataflow where it was placed to read or write flat files is not the same. If you edit it inside the dataflow, you will notice that some fields in the File Format Editor are grayed out. Opening the same file format for editing from the Local Object Library makes those fields available for editing. This happens because, when imported into a dataflow, the File Format object becomes an instance of the parent File Format object stored in the Local Object Library, and changes applied to an instance inside a dataflow are not propagated to other instances of this File Format object imported into other dataflows. Alternatively, when you modify the File Format definition in the Local Object Library, the changes made are propagated to all instances of this File Format object imported into different dataflows across your ETL code.

Note
Some file format configuration parameters can be changed only on the parent file format object in the Local Object Library.

You should also keep in mind that export to a flat file in Data Services is quite a forgiving process. For example, if your file format has a varchar(2) character field and you are trying to export a line of 50 characters to a file in this field, Data Services will allow you to do that. In fact, Data Services does not care much about the columns specified in the file format at all if you use your file format to export data to a flat file. The data definition will be sourced from the output schema of the preceding transformation object linked to the target file object.

Importing from a flat file, on the other hand, is a very strict process. Data Services will reject a record immediately if it does not fit the file format definition.
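This strict import behavior can be pictured with a small Python sketch. The parser below is a simplified stand-in for the Data Services file format engine, using an invented four-column schema: it skips the row header, checks each record's column count against the format definition, and routes non-conforming rows to a reject list instead of loading them:

```python
import csv
import io

SCHEMA = ["NAME", "DOB", "HEIGHT", "HOBBY"]  # hypothetical file format definition

data = """NAME|DOB|HEIGHT|HOBBY
JANE|12.05.1985|176|HIKING
STEVE|01.09.1976|152|SLEEPING|10
DAVE|27.12.1983|AB5
"""

loaded, rejected = [], []
reader = csv.reader(io.StringIO(data), delimiter="|")
next(reader)  # Skip row header = Yes
for row in reader:
    # Reject any record whose column count does not match the defined file format.
    if len(row) != len(SCHEMA):
        rejected.append(row)  # in Data Services, these go to the reject file
    else:
        loaded.append(dict(zip(SCHEMA, row)))

print(len(loaded), len(rejected))  # 1 2: JANE loads; STEVE and DAVE are rejected
```

A real file format also validates data types and date formats per column; this sketch shows only the column-count check that rejects STEVE's and DAVE's rows in the recipe.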

There's more…
There are more ways to create a File Format object than shown in this recipe. Some are listed here:

Creating in Local Object Library: Open the Formats tab in the Local Object Library window, right-click on Flat Files, and choose New from the context menu. You can use the Location, Root directory, and File name(s) options to automatically import the format from an external file. Otherwise, you will have to define all columns and their data types manually, one by one.

Replicating a file format from an existing File Format object in Local Object Library: On the Formats tab, choose the object you want to replicate, right-click on it, and choose the Replicate… option in the context menu.

Loading data from a flat file
You can use the same File Format object created in the previous recipe to load data from a flat file. In the following section, we will take a closer look at the file format options relevant to loading data from files.

How to do it…
1. Create a new job and a new dataflow object in it.
2. Create a new text file, Friends_30052015.txt, with the following lines inside it:

NAME|DOB|HEIGHT|HOBBY
JANE|12.05.1985|176|HIKING
JOHN|07-08-1982|182|FOOTBALL
STEVE|01.09.1976|152|SLEEPING|10
DAVE|27.12.1983|AB5

3. Go to Local Object Library and create a new file format by right-clicking on Flat Files and choosing New.
4. Populate the File Format options as shown in the following screenshot:

Delimiters | Column is set to | in this case, as our file uses the pipe as a delimiter.

NULL indicator was set to NULL, which means that only NULL values in the incoming file are interpreted as NULL when read by Data Services. The other "empty" values will be interpreted as empty strings.

Date format is set to dd.mm.yyyy, as we specified in the file format that we are loading the DOB (Date of Birth) column as the date data type. Imagine that you have configured the Query transform mapping for that column using the to_date(<date>, 'dd.mm.yyyy') function.

Skip row header is set to Yes in order to specify that the file has a header row which has to be skipped.

Then we set all options related to error capturing to Yes to catch all possible errors. Write errors to a file allows you to record the rejected records in a separate file for further analysis. We will be writing them to the Friends_rejected.txt file.

5. Import this file format object as a data source in your newly created dataflow.
6. Map all source columns to a Query transform and create the target template table FRIENDS in the DS_STAGE datastore.
7. Save and run the job.

As a result, you can see that two records were rejected: one because of an extra column in the row, and the other because the row had one column less than the defined file format.

The contents of your target table should look like this:

You can see that some lines are missing here due to the errors in the input file.

What is interesting is that Data Services was smart enough to correctly recognize and convert the date of birth for JOHN. Remember, it was 07-08-1982 in the file, and the date format we specified was dd.mm.yyyy.

How it works…
As you can see, most of the file format options we have used are useful for validating the contents of the source data file in order to reject records with data of an incorrect data type or format.

The main question you have to ask yourself is whether you want all these records to be rejected. The alternative might be to build a dataflow that loads all records as the varchar data type and tries to cleanse and convert incorrect values to an acceptable format, or puts a default value in place of a wrong one to mark the field. Sometimes you do not want to lose the whole record if just one value is incorrect.

Now, let's fix the "number of columns" problem in the source file to see how Data Services deals with conversion problems. Do you remember we put a character symbol in one of the integer data type fields?

Change the records for Steve and Dave to the following lines and rerun the job:

STEVE|1976.01.01|152|SLEEPING
DAVE|27.12.1983|AB5|DREAMING

Both records are rejected, with the following error messages appearing in the error log file when you execute the job:

You can see those messages in the job error log and in the Friends_rejected.txt file along with the rejected records themselves. The name and location of the reject file are defined by two file format options: Error file root directory and Error file name. They become available when you open the file format object instance for editing from within a dataflow:

As we just stated, in order to load those records, you should put in some extra development effort and create logic in your dataflow to deal with all possible scenarios in order to cleanse and correctly convert the data. Of course, you should also amend the file format, changing all data types to varchar, in order to pass those records through for further cleansing.

Note
You can use masks in the File name(s) option when configuring the file object in your dataflow. For example, specifying invoice_*.csv as a file name will allow you to load both the invoice_number_1.csv and invoice_number_2.csv files in a single execution of the dataflow. They will be loaded one after another.

There's more…
Try to experiment further with the contents of the Friends_30052015.txt file by adding extra rows with different data types to see whether they will be rejected or loaded, and which error messages you will get from Data Services.

Loading data from table to table – lookups and joins
When you specify a relational source table in the dataflow, Data Services executes simple SQL SELECT statements in the background to fetch the data. If you want to, you can see the list of statements executed for each source table. In this recipe, we explore what happens under the hood when you add multiple source tables and how Data Services optimizes the extraction of the data from these source tables and even joins them together, executing complex SQL queries instead of multiple SELECT * FROM <table> statements.

How to do it…
In this recipe, we will extract a person's name, address, and phone number from the source OLTP database and populate a new stage table, PERSON_DETAILS, with this dataset.

1. Create a new job and a new dataflow. Specify your own names for the created objects.
2. To extract the required data, you will need to import the tables PERSON, ADDRESS, and BUSINESSENTITYADDRESS (which is a table linking the first two) into your source OLTP datastore. All these tables are located in the Person schema of the AdventureWorks_OLTP database.
3. Place the imported tables as source objects in your dataflow, as shown in the following figure, and link them with the Query transform. Insert the target template table PERSON_DETAILS to be created in the DS_STAGE datastore:
4. To set the required join conditions, you should use the Join pairs section located on the FROM tab of the Query transform. In this example, these join conditions should be generated automatically as soon as you open the Query transform. If they weren't, you can click on the icon with two intersecting green circles with the hint Click to propose join to generate them, or click on the Join Condition field and type the required join conditions manually for each table pair. Please use the following screenshot as a reference to create two Inner join pairs (PERSON-BUSINESSENTITYADDRESS and BUSINESSENTITYADDRESS-ADDRESS):
5. At this point, you are able to see which SQL statement DS uses to extract the required information by choosing Validation | Display Optimized SQL from the main menu. It opens the following window, showing you the number of datastores queried in the window on the left and the full SELECT statement executed in each of them on the right:
6. We forgot to add country information for each person. It looks like the Address table has only street information and city, but no country or state data. Import another two tables into the OLTP datastore: STATEPROVINCE and COUNTRYREGION.
7. Add them as source tables in the dataflow, but do not join them to the already existing ones in the same Query transform. Create another Query transform and call it Get_Country. Use it to join the Query dataset with the two new source tables, as shown in the following figure:
8. Add two new column mappings in the Get_Country Query transform: BUSINESSENTITYID, mapped from the field with the same name from the Query input schema, and the COUNTRY column, mapped from the NAME column of the COUNTRYREGION table input schema.
9. If you check Validation | Display Optimized SQL again, you will see that the SQL statement has changed, now including the two new tables:
10. We still have missing phone information for our PERSON_DETAILS table. Add a third Query transform on the right and call it Lookup_Phone. To look up the phone information, we will use the lookup_ext() function executed from a function call within a Query transform. The function lookup_ext() is most commonly used in column mappings to perform the lookup operation for values from other tables.
11. Open the Lookup_Phone Query transform and map all source columns to the target ones except for the BUSINESSENTITYID and ADDRESSLINE2 columns (we are not going to propagate those).
12. Right-click on the last mapped column in the target schema (it should be COUNTRY) and select the option New Function Call… from the context menu:
13. Choose Insert Below…, and in the opened Select Function window, choose Lookup Functions | lookup_ext.
14. The opened Lookup_ext | Select Parameters window allows you to set the lookup parameters for the table you want to extract information from. Remember that this is basically a form of a join, so you have to specify the join conditions of the input dataset to the lookup table. In our case, the lookup table is PERSONPHONE. If you did not import it earlier into your OLTP datastore, please do that now. Use the lookup parameter details shown in the following screenshot:
15. After you click on Finish, your target schema in the Lookup_Phone transform should look like this:
16. It so happens that the PHONENUMBER field we have extracted from the lookup table is a key column in that table. Data Services automatically defines key columns from source tables in the Query transform as primary keys as well. To change this and make sure that our final dataset does not include duplicates, we are going to create a last Query transform and name it Distinct. Link it on the right to the Lookup_Phone transform and open it, choosing the following options:

To change the PHONENUMBER column from being a primary key, double-click on the column in the target schema and uncheck the Primary key option. To get rid of the duplicate fields, open the SELECT tab and check Distinct rows.

17. Save and run the job and view the data using the dataflow target table option:

As the final step, import the template table PERSON_DETAILS so it is converted into a normal table object inside the DS_STAGE datastore. To do that, right-click on the table either in Local Object Library or inside the dataflow workspace, as shown in the following screenshot, and choose the Import Table option from the object's context menu:

How it works…
You have seen an example of how multiple tables can be joined in Data Services. The Query transform represents the traditional SQL SELECT statement, with the ability to group the incoming dataset, use various join conditions (INNER, LEFT, or OUTER), use the DISTINCT operator, sort data (on the ORDER BY tab), and apply filtering conditions on the WHERE tab.

The Data Services optimizer tries to build as few SQL statements as possible in order to extract the source data, by joining tables in a complex SELECT statement. In a later chapter, we will see which factors prevent the propagation of dataflow logic to the database level.

We have also tried to use a function call in the mappings in order to join a table to extract additional data. It would be perfectly valid to import the PERSONPHONE table as a source table and join it with the rest of the tables with the help of the Query transform, but using the lookup_ext() function gives you a great advantage: it always returns only one record from the lookup table for each record we look up values for, whereas joining with a Query transform does not prevent you from getting duplicated or multiple records, in the same way as if you had joined two tables in a standard SQL query. Of course, if you want your Query transform to behave exactly like a SELECT statement joining tables in the database, producing multiple output records for each lookup record, the lookup_ext() function should not be used.

If you are writing a complex SQL SELECT statement, you are probably aware that joining multiple tables can lead to duplicate records in the result dataset. This does not necessarily mean that the joins are incorrectly specified. Sometimes it is the required behavior, or it can be a database design problem, or simply the presence of "dirty" data in one of the source tables.

The function lookup_ext() makes sure that if it finds multiple records in the lookup table for your source record, it picks only one value, according to the method specified in the Return policy field of the Lookup_ext parameters window:

Note
The main disadvantages of using the lookup_ext() function are the low transparency of the ETL code, as it is hidden inside the Query transform, and the fact that lookup_ext() functions prevent the propagation of execution logic to the database level. Data Services always extracts the full table specified as the lookup table in the lookup_ext() function parameters.
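The single-row guarantee of lookup_ext() can be sketched in Python: when several lookup rows match a source record, a return policy such as MAX picks exactly one value deterministically. The table contents and the MAX policy below are illustrative:

```python
# Hypothetical PERSONPHONE lookup table: several phone rows exist for entity 1.
PERSONPHONE = [
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "555-0100"},
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "555-0199"},
    {"BUSINESSENTITYID": 2, "PHONENUMBER": "555-0123"},
]

def lookup(table, key_col, key_value, return_col, policy=max):
    """Return a single value per input row, like lookup_ext() with a MAX return policy."""
    candidates = [r[return_col] for r in table if r[key_col] == key_value]
    return policy(candidates) if candidates else None

# One output value per source record, even though entity 1 has two phone rows;
# a plain SQL join on BUSINESSENTITYID would have produced two output rows instead.
print(lookup(PERSONPHONE, "BUSINESSENTITYID", 1, "PHONENUMBER"))  # 555-0199
print(lookup(PERSONPHONE, "BUSINESSENTITYID", 3, "PHONENUMBER"))  # None (no match)
```

This is why lookup_ext() cannot duplicate your source rows the way a Query transform join can.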

Depending on which version of the product is used and on the database environment configuration, Data Services can automatically generate all join conditions when you join tables in the Query transform and specify join pairs. This is because, when you import the source tables into a datastore, Data Services imports not just table definitions but also information about primary keys, indexes, and other table metadata. So, if Data Services sees that you are joining two tables with identically named fields which are marked as primary or foreign keys at the database level, it automatically assumes that those tables can be joined using those key fields.

Keep in mind that business rules or ETL logic may dictate join conditions different from what Data Services automatically produces, and in that case you have to modify those values in the Query transform logic, or even write your own join conditions by entering them manually.

Using the Map_Operation transform
Here we explore a very interesting transform available to you in Data Services. In fact, it does not perform any transformation of data per se. What it does is change the type of the SQL Data Manipulation Language (DML) operation that should be applied to the row when it reaches the target object. As you probably know already, the DML operations in the SQL language are the operations which modify the data, in other words, the INSERT, UPDATE, and DELETE statements.

First we will see the effect Map_Operation has when used in a dataflow, and then we will explain in detail how it works. In a few words, the Map_Operation transform allows you to choose what Data Services will do with the migrated row when passing it from Map_Operation to the next transform. Map_Operation assigns one of four statuses to each record passing through: normal, insert, update, or delete. By default, the majority of transforms in Data Services produce records with a normal status. This means that the record will be inserted when it reaches the target table object in a dataflow. With Map_Operation, you can control this behavior.

How to do it…
In this exercise, we are going to slightly change the contents of our PERSON_DETAILS table. We will change the country values for records belonging to Samantha Smith from United States to USA, and remove the records for the same person with United Kingdom as the country. That means we will specify the same table both as a source and as a target:

1. Create a new job and a new dataflow object, and place the PERSON_DETAILS table from the DS_STAGE datastore as a source table.
2. Join the source table to a new Query transform named Get_Samantha_Smith. Map all columns from source to target and specify the filtering conditions, as shown in the following screenshot. Also, double-click on each of the three columns FIRSTNAME, LASTNAME, and ADDRESSLINE1 to define them as primary key columns:
3. Split the dataflow in two by creating two new Query transforms: US and UK. Link them to two Map_Operation transforms imported from Local Object Library | Transforms | Platform | Map_Operation, named update and delete respectively. Then merge the dataflows together with the Merge transform, which can be found in the same Platform category, and finally link it to the same table, PERSON_DETAILS, specified as a target table object. The Merge transform does not perform any transformations and does not have any configuration options, as it simply merges two datasets together (like the UNION operation in SQL). Of course, the input schema formats should be identical for the Merge transform to work. See what the dataflow should look like in the following figure:
4. In the US transform, map all key columns and the COUNTRY column to the target and change the mapping for COUNTRY to a hardcoded value, 'USA'. Most importantly, specify Get_Samantha_Smith.COUNTRY = 'United States' in the WHERE tab to select only United States records:
5. In the UK transform, map only the key columns and the COUNTRY column to the target as well, and put Get_Samantha_Smith.COUNTRY = 'United Kingdom' in the WHERE tab:
6. Now we have to tell Data Services that we want to update one set of records and delete the other. Double-click on your update Map_Operation transform and set up the following options:

By doing this, we change the row type for normal rows (the Query transform produces rows of the normal type) to update. This means that Data Services will execute an UPDATE statement for those rows on the target table.

7. Repeat the same for the delete Map_Operation transform, but now change normal to delete and discard the rest of the row types:
8. For Data Services to correctly perform the update and delete operations, we have to define the correct target table key columns. Double-click on the target table object PERSON_DETAILS in the dataflow and change Use input keys to Yes in the Options tab. That tells Data Services to consider primary key information from the source dataset rather than using the target table primary keys:
9. Before executing the job, let's check what our data looks like in the PERSON_DETAILS table for Samantha Smith. Click on the View data button on the target table and apply filters by clicking on the Filters button. Specify filters on the FIRSTNAME and LASTNAME columns and check the records:
10. Set the filters:
11. This is what the data in the table looks like before job execution:
12. Run the job and view the data using the same filters to see the result:

How it works…
This is the kind of task that would be much easier to accomplish with the following two SQL statements:

update dbo.person_details set country = 'USA'
where firstname = 'Samantha' and lastname = 'Smith' and country = 'United States';

delete from dbo.person_details
where firstname = 'Samantha' and lastname = 'Smith' and country = 'United Kingdom';

But for us, this example perfectly illustrates what can be done with the use of the Map_Operation transform in Data Services.

Each row passed from the source to a target table in a dataflow, through various transformation objects, can be assigned one of four types: normal, insert, update, and delete.

Some transformations can change the type of the row, while others just behave differently depending on which type the incoming row has. For the target table object, the type of the row defines which DML instruction it has to execute on the target table using the source row data. This is listed as follows:

insert: If the row comes with the normal or insert type, Data Services executes an INSERT statement in order to insert the source row into the target table. It will check the key columns defined on the target table in order to check for duplicates and prevent them from being inserted.
update: If a row is marked as an update, Data Services determines the key columns it will use to find the corresponding record in the target table and updates all non-key column values of the target table record with the values from the source record.
delete: Data Services determines the key columns to link source rows marked with the delete type with the corresponding target row(s), and then deletes the rows found in the target table.
normal: This is treated as an insert when the row reaches a final target table object. It is the default type of row produced by the Query transform and the majority of other transforms in Data Services.
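The mapping from row type to DML statement can be summarized in a few lines of Python. The table and column names here are invented for illustration; Data Services builds the real statements from the target table definition and its key columns:

```python
def row_to_dml(row_type, table, key_cols, row):
    """Translate a row type into the DML statement the target table object would execute."""
    where = " AND ".join(f"{c} = '{row[c]}'" for c in key_cols)
    non_key = {c: v for c, v in row.items() if c not in key_cols}
    if row_type in ("normal", "insert"):  # normal is treated as an insert at the target
        cols = ", ".join(row)
        vals = ", ".join(f"'{v}'" for v in row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if row_type == "update":  # non-key columns are updated, key columns locate the row
        sets = ", ".join(f"{c} = '{v}'" for c, v in non_key.items())
        return f"UPDATE {table} SET {sets} WHERE {where}"
    if row_type == "delete":  # only key columns matter for a delete
        return f"DELETE FROM {table} WHERE {where}"
    raise ValueError(f"unknown row type: {row_type}")

row = {"FIRSTNAME": "Samantha", "LASTNAME": "Smith", "COUNTRY": "USA"}
print(row_to_dml("update", "person_details", ["FIRSTNAME", "LASTNAME"], row))
# UPDATE person_details SET COUNTRY = 'USA' WHERE FIRSTNAME = 'Samantha' AND LASTNAME = 'Smith'
```

Note how the update and delete branches depend entirely on the key columns, which is why defining the right primary keys (or using Use input keys) matters so much in this recipe.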

What the Map_Operation transform allows you to do is change the type of the incoming row. This allows you to implement sophisticated logic in your dataflows, making your data transformation extremely flexible.

Note
Defining primary keys in Data Services objects, such as Query transforms and the table and view objects imported into datastores, does not create the same primary key constraints for the corresponding tables at the database level. If you have them defined at the database level, they will be imported along with the table definition and will appear in Data Services automatically. Otherwise, you define primary key columns manually to help Data Services efficiently and correctly process the data. Many Data Services transforms and target objects rely on this information to correctly process the passing records.

Setting Output row type to Discard in Map_Operation for a specific input row type will completely block the rows of the chosen type, not letting them pass through the Map_Operation transform. This is a great way to make sure that your dataflow does not perform any unexpected inserts when it should, for example, always only update the target table.

Note how our target table in this recipe does not have the primary key constraints specified at the database level. It so happens that we analyzed the data in the PERSON_DETAILS table and know that the FIRSTNAME, LASTNAME, and ADDRESSLINE columns define the uniqueness of a record. That is why we manually specify them as primary keys in Data Services transforms and use the Update control option Use input keys on the target table object, so it knows where to get information regarding key columns to correctly execute the INSERT, UPDATE, and DELETE statements. In the case of UPDATE, all non-key columns will be updated with the values from the source row. That is why we propagated only the COUNTRY column, as we wanted to update only this field. In the case of DELETE, the set of non-key columns does not matter much, as only the source key columns will be considered in order to find the target row to delete.

The other option would be to modify the table object PERSON_DETAILS in the datastore and specify primary keys there (see the following screenshot). In that case, we would not have to define keys in the transforms or use the target table loading option, as Data Services would pick up this information from the target table object. To do that, expand the datastore object and double-click on the table to open the table editor, then double-click on the column and check Primary key in the newly opened window:

Using the Table_Comparison transform
The Table_Comparison transform compares a dataset generated inside a dataflow to a target table dataset and changes the statuses of the dataset rows to different types according to the conditions specified in the Table_Comparison transform.

Data Services uses primary key values for the row comparison and marks the passing rows accordingly as: an insert row, which does not exist in the target table yet; an update row, a row for which primary key values exist in the target table but whose non-primary-key fields (or comparison fields) have different values; and finally, a delete row, when the target dataset has rows with primary key values that do not exist in the source dataset generated inside the dataflow. In some ways, Table_Comparison does exactly the same thing as Map_Operation: it changes the row type of passing rows from normal to insert, update, or delete. The difference is that it does it in a smart way: after comparing the dataset to the target table.
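That comparison logic can be expressed compactly in Python: diff a source dataset against the comparison table by primary key, and bucket each row as an insert, update, or delete. The datasets below are invented, loosely modeled on this recipe's DimProduct example:

```python
def table_comparison(source, target, key):
    """Assign insert/update/delete row types by comparing source rows to a target table by primary key."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    inserts = [src[k] for k in src if k not in tgt]                    # new key, not in target
    updates = [src[k] for k in src if k in tgt and src[k] != tgt[k]]   # same key, changed fields
    deletes = [tgt[k] for k in tgt if k not in src]                    # only with "Detect deleted row(s)" on
    return inserts, updates, deletes

source = [{"PRODUCTKEY": 210, "DESC": "Enhanced Chromoly steel."}]
target = [{"PRODUCTKEY": 210, "DESC": "Chromoly steel."},
          {"PRODUCTKEY": 211, "DESC": "Aluminium."}]

ins, upd, dele = table_comparison(source, target, "PRODUCTKEY")
print(len(ins), len(upd), len(dele))  # 0 1 1
```

The single update row here corresponds to the one changed product description that this recipe propagates into the data warehouse.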

Getting ready
In order to prepare the source data in the OLTP system for this recipe, please execute the following UPDATE statement in the AdventureWorks_OLTP database. It only updates one row in the table:

update Production.ProductDescription set Description = 'Enhanced Chromoly steel.'
where Description = 'Chromoly steel.';

We performed this modification of the source data so we can use this change to demonstrate the capabilities of the Table_Comparison transform.

How to do it…
Our goal in this recipe is simple. You remember that our DWH database sources data from the OLTP database. One of the tables in the target DWH database we are interested in right now is the DimProduct table, which is a dimension table that holds the information about all company products. In this recipe, we are going to build a job which, when executed, will check the product descriptions within the source OLTP tables and, if necessary, will apply any changes to the product descriptions in our data warehouse table DimProduct.

This is a small example of propagating data changes happening in the source systems to the data warehouse tables.

As an example, imagine that we need to change the name of one of the materials used to produce one of our products. Instead of the English description "Chromoly steel", we have to use "Enhanced Chromoly steel" now. People working with the OLTP database via application systems have already made the required change, and now it is our responsibility to develop the ETL code that propagates this change from the source to the target data warehouse tables.

1. Create a new job with one dataflow, sourcing data from the following OLTP tables (Production schema):
Product: This is a table containing products with some information (price, color, and so on)
ProductDescription: This is a table containing product descriptions
ProductModelProductDescriptionCulture: This is a linking table, which holds the key references of both the Product and ProductDescription tables
2. If you do not have these tables imported already into your datastore, please do that in order to be able to reference them within your dataflow object.
3. Add the DimProduct table from DWH as a source table. Yes, do not be surprised: we are going to use the same table as a source and as a target within the same dataflow. The Table_Comparison transform will compare two datasets: the source dataset, which is based on the DimProduct table modified with the help of the source OLTP tables, and the target dataset of the DimProduct table itself.
4. Create a new Join Query transform and modify its properties to join all four tables, as shown in the following screenshot:

You can see that we use the Product and ProductModelProductDescriptionCulture tables just to link the ProductDescription table to our target DimProduct table in order to get a dataset of DimProduct primary key values and the corresponding English description values for specific products.

5. Next to your Join Query transform, place the Table_Comparison transform, which can be found in Local Object Library | Transforms | Data Integrator | Table_Comparison.
6. Open the Table_Comparison editor in the workspace and specify the following parameters:
7. Then, place a Map_Operation transform called MO_Update and discard all rows of the normal, insert, and delete types, letting through only rows with the update status:
8. Finally, link MO_Update to the target DimProduct table and check whether your dataflow looks like the following figure:

Now, save the job and execute it. Then, run the following command in SQL Server Management Studio to check the resulting data in the DimProduct table:

select EnglishDescription from dbo.DimProduct where EnglishDescription like '%Chromoly steel%';

You should get the following resulting value: Enhanced Chromoly steel

How it works…
To see what exactly is happening with the dataset before and after the Table_Comparison transform, replicate your dataflow and change the copy in the following manner:

Here we dump the result of the Join Query transform into a temporary table to see which dataset we compare to the DimProduct table inside the Table_Comparison transform.

Extra Map_Operation transforms allow us to capture rows of different types coming out of Table_Comparison. Using Map_Operation, we convert all of them to the normal type in order to insert them into temporary tables and see which rows were assigned which row types by the Table_Comparison transform:

Note
Adding multiple target template tables after your transformations is a very popular method of debugging in ETL development. It allows you to see exactly how your dataset looks after each transformation.

Let’sseewhatisgoingoninourETLbyanalyzingthedatainsertedintothetemporarytargettables.

ThePRODUCT_TEST_COMPAREtablecontainstherowsstartingfromProductKey=210.ThisissimplybecauseProductKeys<210intheDimProducttabledoesnothaveEnglishdescriptionsinthesourcesystem.

ThePRODUCT_DESC_INSERTtableisempty.Table_ComparisonusestheprimarykeyspecifiedintheInputprimarykeycolumnssectiontoidentifynewrowsintheinputdatasetthatdonotexistinthespecifiedcomparisontable,DWH.DBO.DIMPRODUCT.AsweusedtheDimProducttableasasourceofthePRODUCTKEYvalues,therecouldn’tbeanynewvaluesofcourse.Sonorowswereassignedtheinserttype.

PRODUCT_DESC_UPDATEcontainsexactlyonerowwithanewENGLISHDESCRIPTIONvalue:

Asyoucansee,therestoftherowfieldsDataServiceshassourcedfromthecomparison

table.AllofthemexceptforthecolumnspecifiedintheComparecolumnssectionoftheTable_Comparisontransform.

ThePRODUCT_DESC_DELETEtable,ontheotherhand,hasalotofrecords.Thosearethetargetrecords(fromcomparisontableDimProduct)forwhichprimarykeyvaluesdonotexistinthedatasetcomingtoaTable_ComparisontransformfromaJoinQuerytransform.Asyoumayremember,thosearerecordsthatdonothaveEnglishdescriptionrecordsinthesourcetables.ThisisanoptionalfeatureofTable_Comparison.DataServiceswilluseprimarykeyvaluesofthoserecordstoexecutetheDELETEstatementonthetargettable.YoucaneasilypreventdeleterowsfrombeinggeneratedbycheckingtheDetectdeletedrow(s)fromcomparisontableoptionintheTable_Comparisontransform.

NoteTheFiltersectionofTable_Comparisonallowsyoutoapplyadditionalfiltersonthecomparisontableinordertorestrictthenumberofrowsyouarecomparing.Thisisveryusefulifyourcomparisontableislarge.ThisallowsoptimizingtheresourcesconsumedbyDataServicesinordertoextractandstorethecomparisondatasetandalsospeedsupthecomparisonprocessitself.
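The row-type assignment described above can be sketched in plain Python. This is a minimal simulation of the logic, not Data Services code; the sample rows and column names are illustrative only.

```python
# Minimal sketch of the row-type logic the Table_Comparison transform applies:
# rows are matched on the primary key and compared on the listed compare columns.
# Sample data is hypothetical.

def table_comparison(input_rows, comparison_rows, key, compare_cols):
    """Assign insert/update/delete row types the way Table_Comparison does."""
    target = {r[key]: r for r in comparison_rows}
    result = []
    for row in input_rows:
        existing = target.get(row[key])
        if existing is None:
            result.append(("insert", row))            # key not found in comparison table
        elif any(row[c] != existing[c] for c in compare_cols):
            result.append(("update", row))            # key matches, a compare column differs
    # Optional behavior: "Detect deleted row(s) from comparison table"
    input_keys = {r[key] for r in input_rows}
    for k, row in target.items():
        if k not in input_keys:
            result.append(("delete", row))            # target row missing from input dataset
    return result

rows = table_comparison(
    input_rows=[{"ProductKey": 210, "EnglishDescription": "Enhanced Chromoly steel"}],
    comparison_rows=[
        {"ProductKey": 210, "EnglishDescription": "Chromoly steel"},
        {"ProductKey": 209, "EnglishDescription": "Aluminum alloy"},
    ],
    key="ProductKey",
    compare_cols=["EnglishDescription"],
)
```

With this input, ProductKey 210 gets the update type (the description differs) and ProductKey 209 gets the delete type (it exists only in the comparison table), mirroring the PRODUCT_DESC_UPDATE and PRODUCT_DESC_DELETE results in the recipe.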

Exploring the Auto correct load option
The Auto correct load option is a convenient means Data Services provides for preventing the insertion of duplicates into your target table. It is a method of inserting data into a target table object inside the dataflow, and it can easily be configured by setting the target table option to Yes, with no more configuration required. This recipe describes details regarding the usage of this load method.

Getting ready
For this recipe, we will create a new table in the STAGE database and populate it with a list of currencies from the DimCurrency dimension table in the AdventureWorks_DWH data warehouse.

Execute the following statements in SQL Server Management Studio:

SELECT CurrencyAlternateKey, CurrencyName
INTO STAGE.dbo.NewCurrency
FROM AdventureWorks_DWH.dbo.DimCurrency;

ALTER TABLE STAGE.dbo.NewCurrency
ADD PRIMARY KEY (CurrencyAlternateKey);

We will use the Auto correct load option to make sure that our dataflow does not insert rows already existing in the target table.

How to do it…
First, we are going to design the dataflow that will populate the target table NewCurrency.

In the dataflow, we will use the Row_Generation transform to generate three new rows, one for each currency, and try to insert them into the previously created currency stage table NewCurrency. The NewCurrency table already has some data prepopulated from the DimCurrency table. That is required if we want to test the Auto correct load option.

The first generated row will be for the EUR currency (the CURRENCYALTERNATEKEY column), which already exists in the target table but with a different currency name: CURRENCYNAME = 'NEW EURO'.

The second generated row will be a new currency which does not exist in the table yet: 'CRO' with CURRENCYNAME = 'CROWN'.

The third generated row will be 'NZD' with CURRENCYNAME = 'New Zealand Dollar', matching both values in the fields CURRENCYALTERNATEKEY and CURRENCYNAME of the existing record in the NewCurrency table.

1. Create a new job and a new dataflow, picking your own names for the created objects.

2. Open the dataflow in the workspace window to edit it and add three new Row_Generation transforms, which we will use as a source of data with default parameters. By default, this transform object generates one row with a single ID column populated with integer values starting with 0. Name the three newly added Row_Generation transforms Generate_EURO, Generate_NZD, and Generate_CROWN:

3. Link each Row_Generation transform to a respective Query transform to create an output schema matching the target table schema with two columns: CURRENCYALTERNATEKEY and CURRENCYNAME. See the example for EURO shown in the following screenshot:

The other two are CRO (CROWN) and NZD (New Zealand Dollar).

4. Finally, merge these three rows into one dataset with the help of the Merge transform (Local Object Library | Transforms | Platform | Merge).

5. Map the Merge transform output to Query transform columns with the same names and link Query to the target table NewCurrency previously imported into the DS_STAGE datastore.

6. Check the target data in the NewCurrency table before running this code. Apply filters in a View Data window of the target table, as shown in the following screenshot, to see the existing rows we are interested in:

You can see that we have two records in the target table for EUR and NZD.

7. Save and run the job. You should get the following error message:

Recall how we applied the primary key constraint on the NewCurrency table. The Data Services job fails in an attempt to insert rows with primary key values that already exist in the target table.

8. Now, to enable the Auto correct load option, open the target table editor in the workspace. On the Options tab, change Auto correct load to Yes:

9. Now save the job and run it again. It runs without errors, and if you browse the data in the target table using the same filters as before, you will see that the new CRO currency appears in the list and the EUR currency has a new currency name:

How it works…
Preventing duplicate data from being inserted is often one of the responsibilities of the ETL solution. In this example, we created a constraint object on our target table, delegating control to the database level. But this is not a common practice in modern data warehouses.

If not for that constraint, we would have successfully inserted duplicate rows on the first attempt and our job would not fail. The beauty of the Auto correct load option is its simplicity. All it takes is to set up a single option on a target object. When this option is enabled, Data Services checks each row before inserting it into a target table.

If the target table has a row identical to the incoming dataflow row, then the row is simply discarded. If the target table has a row with the same primary key values but different values in one or more columns, Data Services executes the UPDATE statement, updating all non-primary key columns. And finally, if the target table does not have a row with the same primary key values, Data Services executes the INSERT statement, inserting the row into the target table.

You can build a dataflow with the same logic, preventing duplicates from being inserted, by using the Table_Comparison transform. Auto correct load performs the comparison between the dataflow dataset and the target table dataset just as well as Table_Comparison does. Both methods produce INSERT/UPDATE row types. The only difference is that Auto correct load cannot perform the deletion of target table records. Thus, the main purpose of the Auto correct load option is to provide you with a simple and efficient method of protecting your target data from incoming duplicate records.
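The "discard, update, or insert" decision described above is essentially an upsert. The following SQLite sketch illustrates an equivalent outcome on the recipe's NewCurrency data; note that Data Services generates its own database-specific SQL, so the `ON CONFLICT` upsert here is only an illustrative stand-in, not what the engine actually emits.

```python
# Illustrative SQLite upsert reproducing the Auto correct load outcome on the
# three generated currency rows from the recipe.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE NewCurrency (
    CurrencyAlternateKey TEXT PRIMARY KEY,
    CurrencyName TEXT)""")
# Rows prepopulated from DimCurrency
conn.execute("INSERT INTO NewCurrency VALUES ('EUR', 'EURO'), ('NZD', 'New Zealand Dollar')")

incoming = [("EUR", "NEW EURO"),            # key exists, name differs -> UPDATE
            ("CRO", "CROWN"),               # new key                  -> INSERT
            ("NZD", "New Zealand Dollar")]  # identical row            -> effectively unchanged

conn.executemany(
    """INSERT INTO NewCurrency (CurrencyAlternateKey, CurrencyName)
       VALUES (?, ?)
       ON CONFLICT (CurrencyAlternateKey)
       DO UPDATE SET CurrencyName = excluded.CurrencyName""",
    incoming)

result = dict(conn.execute(
    "SELECT CurrencyAlternateKey, CurrencyName FROM NewCurrency"))
```

After the load, the table holds the updated EUR name, the new CRO row, and the untouched NZD row, matching what the recipe observes in View Data.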

We also used the Merge transform in this recipe. The Merge transform does the same thing as the SQL UNION ALL operator and has the same requirements: the datasets should have the same format in order to be successfully merged:

Merge is often used in combination with Table_Comparison. First, you split your rows, assigning them different row types with Table_Comparison. Then, you deal with the different types of rows, applying different transformations depending on whether the row is going to be inserted or updated in the target table. Finally, you join both split datasets back into one with the help of Merge transforms, as you cannot link multiple transforms to a single target object.

Splitting the flow of data with the Case transform
The Case transform allows you to put branch logic in a single location inside a dataflow in order to split the dataset and send parts of it to different locations. These might be target dataflow objects, such as tables and files, or just other transforms. The use of the Case transform simplifies ETL development and increases the readability of your code.

Getting ready
In this recipe, we will build the dataflow that reads the contents of the dimension table DimEmployee and updates it according to the following business requirements:

All male employees in the production department get 5 extra vacation hours
All female employees in the production department get 10 extra sick hours
All employees in the quality assurance department get their base rate multiplied by 1.5

So, before you begin developing your ETL, make sure you import the DimEmployee table in the DWH datastore. We are going to use it as both a source and a target object in our dataflow.

How to do it…
1. First of all, let's calculate the average values per department and gender we are interested in. Execute the following queries in SQL Server Management Studio:

-- Average vacation hours for all males in Production department
select avg(VacationHours) as AvgVacHrs from dbo.DimEmployee
where DepartmentName = 'Production' and Gender = 'M' and Status = 'Current';

-- Average sick hours for all females in Production department
select avg(SickLeaveHours) as AvgSickHrs from dbo.DimEmployee
where DepartmentName = 'Production' and Gender = 'F' and Status = 'Current';

-- Average base rate for all employees in Quality Assurance department
select avg(BaseRate) as AvgBaseRate from dbo.DimEmployee
where DepartmentName = 'Quality Assurance' and Status = 'Current';

2. Please note the resultant values to compare them with the results when we run our dataflow after having updated those fields:

3. Create a new job and a new dataflow object, and open the dataflow in the workspace window for editing.

4. Put the DimEmployee table object as a source inside your new dataflow and link it to the Case transform, which can be found at Local Object Library | Transforms | Platform | Case.

5. Open the Case Editor in the workspace by double-clicking on the Case transform. Here you can choose one of the three options and specify conditions as label-expression pairs (by modifying the Label and Expression settings), according to which the row will be sent to one output or another:

6. Label values are used to label the different outputs. You will use these labels to output information to different transform objects when you are linking the Case output to the next objects in a dataflow.

7. Check only the Row can be TRUE for one case only option and add the following condition expressions by clicking on the Add button:

Label                      Expression

Female_in_Production       DIMEMPLOYEE.DEPARTMENTNAME = 'Production' AND
                           DIMEMPLOYEE.STATUS = 'Current' AND
                           DIMEMPLOYEE.GENDER = 'F'

Male_in_Production         DIMEMPLOYEE.DEPARTMENTNAME = 'Production' AND
                           DIMEMPLOYEE.STATUS = 'Current' AND
                           DIMEMPLOYEE.GENDER = 'M'

All_in_Quality_Assurance   DIMEMPLOYEE.DEPARTMENTNAME = 'Quality Assurance' AND
                           DIMEMPLOYEE.STATUS = 'Current'

8. Your Case Editor should look like the following screenshot:

9. Now we have to link our Case transform output to three different Query transform objects. Each time you link the objects, you will be asked to choose the Case output from those we created before.

10. For Query transform names, let's choose meaningful values that represent the type of transformations we are going to perform inside them.

The Increase_Sick_Hours Query transform is linked to the Female_in_Production Case output
The Increase_Vacation_Hours Query transform is linked to the Male_in_Production Case output
The Increase_BaseRate Query transform is linked to the All_in_Quality_Assurance Case output

11. Lastly, merge all Query outputs with the Merge transform object, link it to the Map_Operation transform object, and finally to the DimEmployee table object brought from the DWH datastore as a target table.

12. Please use the following screenshot as a reference for how your dataflow should look:

13. Now we have to configure the output mappings in our Query transforms. As we are interested in updating only three target columns — VacationHours, SickLeaveHours, and BaseRate — we map them from the source Case transform. The Case transform inherits all column mappings automatically from the source object. We also map the primary key column EmployeeKey so Data Services will know which rows to update in the target.

14. Then, in each Query transform, modify the mapping expression of the corresponding column according to the business logic. Use the following table for the list of columns and their new mapping expressions. Remember that each of our Query transforms modifies only one corresponding column; the other column mappings should remain intact. We are simply going to propagate them from the source object:

Query transform           Modified column   Mapping expression

Increase_Sick_Hours       SICKLEAVEHOURS    Case_Female_in_Production.SICKLEAVEHOURS + 10

Increase_Vacation_Hours   VACATIONHOURS     Case_Male_in_Production.VACATIONHOURS + 5

Increase_BaseRate         BASERATE          Case_All_in_Quality_Assurance.BASERATE * 1.5

15. See the example of the Increase_Vacation_Hours mapping configuration:

16. The last object we need to configure is the Map_Operation transform object named Update. You should already know by now that the Query transform generates the normal type of rows, which are inserted into a target object when they reach the end of the dataflow.

17. In our example, as we want to perform an update of non-key columns defined in our source dataset using matching primary key values in the target table, we need to modify the row type from normal to update:

18. To be absolutely clear about the purpose of this Map_Operation object, we change the other row types to discard, though we would never get the insert, update, or delete rows in this dataflow without modifying it.

19. Save and run the job, and run the queries to see the new average results for the columns updated in the table:

The difference between "before" and "after" values proves that Data Services correctly updates the required rows in the DimEmployee table.

How it works…
The developed dataflow is a good example of a dataflow performing an update of the target table.

We have split the rows according to the conditions specified, performed the required transformation of the data according to the logic provided in the conditions, and then merged all split datasets back together and modified all row types to update. We did this so that Data Services would execute UPDATE statements for the whole dataset, updating the corresponding rows that have the same primary key values.

As we used the target table as a source object as well, we can be sure that we will not have any extra rows in our update dataset that do not exist in the target.

Note that the dataset generated in your dataflow does not have to match the target table structure exactly. When you perform the update of the target table, make sure you have the primary key defined correctly, and keep in mind that all columns defined as non-primary columns in the source schema structure will be updated in the target table.

Note
Data Services uses the primary key columns defined in the target table to find the matching rows. If you want to use a different set of columns to find the corresponding record to update in the target, set them up as primary key columns in the output schema of the Query transform inside a dataflow, and set Use input keys to Yes in the Update control section of the target table object.

There is another, less elegant way of doing the same thing that the Case transform does. It involves using the WHERE tab of the Query transforms to filter the data required for transformation:

That does look like a simpler solution, but there are two main disadvantages:

You lose readability of your code: With the Case transform, you can see the labels of the outputs, which can explain the conditions used to split the data.
You lose in performance: Instead of splitting the dataset, you actually send it three times to different Query transforms, each of which performs the filtering. Technically, you are tripling the dataset, making your dataflow consume much more memory.
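The performance point can be seen in a toy Python illustration (an assumed example, not Data Services code): a Case-style split routes each row exactly once, whereas three independent WHERE filters would each scan the full dataset.

```python
# Single-pass, Case-style routing: each row is examined once and lands in
# exactly one labeled output, mirroring "Row can be TRUE for one case only".
# The employee records are hypothetical sample data.

employees = [
    {"name": "Ann", "dept": "Production",        "gender": "F"},
    {"name": "Bob", "dept": "Production",        "gender": "M"},
    {"name": "Cid", "dept": "Quality Assurance", "gender": "M"},
]

def case_split(rows):
    """Route each row to exactly one labeled output in a single pass."""
    outputs = {"Female_in_Production": [], "Male_in_Production": [],
               "All_in_Quality_Assurance": []}
    for row in rows:                       # one scan of the dataset
        if row["dept"] == "Production" and row["gender"] == "F":
            outputs["Female_in_Production"].append(row)
        elif row["dept"] == "Production" and row["gender"] == "M":
            outputs["Male_in_Production"].append(row)
        elif row["dept"] == "Quality Assurance":
            outputs["All_in_Quality_Assurance"].append(row)
    return outputs

out = case_split(employees)
```

The WHERE-tab alternative corresponds to running three separate filter passes over `employees`, which is what triples the data movement.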

Monitoring and analyzing dataflow execution
When you execute the job, Data Services populates relevant execution information into three log files: the trace, monitor, and error logs. In later chapters, we will take a closer look at the configuration parameters available at the job level in order to gather more detailed information regarding job execution. Meanwhile, in this recipe, we will spend some time analyzing the monitor log file, which logs processing information from inside the dataflow components.

Getting ready
For simplicity, we will use the second dataflow from the recipe Using the Table_Comparison transform, created for a detailed explanation of the flow of the data before and after it passes the Table_Comparison transform object:

Open the Table_Comparison transform editor in the workspace and change the comparison method to Cached comparison table:

We change this option to slightly alter the behavior of the Data Services optimizer. Now, instead of comparing data row by row, executing the SELECT statement against the comparison table in the database for each input row, Data Services will read the whole comparison table and cache it on the Data Services server side. Only after this will it perform the comparison of input dataset records with the table records cached on the Data Services server side. This slightly speeds up the comparison process and changes how the information about dataflow execution is logged in the monitor log.

How to do it…
1. Save the dataflow and execute the job with the default parameters as usual.

2. In the main workspace, open the Job Log tab to show the trace log section, which contains information about job execution. To see the monitor log, click on the second button at the top of the workspace area. For convenience, you may select the records from the log you are interested in and copy and paste them into an Excel spreadsheet using the right-click context menu:

3. This monitor log section displays information about the number of records processed by each dataflow component and how long it takes to process them. The reader components shown in the following screenshot are responsible for extracting information from the source database tables. You can see that the DimProduct table is extracted by a separate process (probably because it is located in a different database), whereas the other three tables are joined and extracted with a single SELECT statement by a single component with quite a sophisticated name, as you can see:

4. The component Join_PRODUCT_TEST_COMPARE passes the dataset from the Join Query transform to the first target table, PRODUCT_TEST_COMPARE. You can see that it has processed 396 rows in 0.136 seconds:

5. Finally, information about the dataflow components responsible for processing data in Map_Operation transforms shows that there were 210 rows processed by the MO_Delete transform and passed to the target PRODUCT_DESC_DELETE template table. Only one row was processed by MO_Update and passed to a corresponding target table, and no rows were processed by MO_Insert, as there weren't any rows with the insert row type generated by this dataflow:

6. The last column shows the total time passed in the executed dataflow object while the component was processing records.

How it works…
Data Services puts processing information from all dataflow objects in a single place. If you have a job with 100 dataflows and some of them run in parallel, you can imagine that records in the monitor log could be mixed. That is why copying the log data to a spreadsheet for further search and filtering with the functionality of Excel is quite useful.

Dataflow execution is a very complex process, and the components you see in the monitor log are not always in a one-to-one relationship with the objects placed inside a dataflow. There are various internal service components performing joins, splits, and the merging of data that will be displayed in the monitor log. Sometimes Data Services creates a few processing components for a single transform object.

If you know what you are looking for, reading the monitor log is much easier. Here is a summary of what the columns mean:

The first column in the monitor log is the name of the component, containing the name of the dataflow and the names of the components inside the dataflow.
The second column is the status of the processing component. READY means that the component has not started processing data; in other words, no records have reached it yet. PROCEED means that the component is processing rows at the moment, and STOP means that all rows have passed the component and it has finished processing them by passing them further down the dataflow execution sequence.
The third column shows you the number of rows processed by a component. This value is in flux while the component has the PROCEED status and attains a final value when the component's status changes to STOP.
The fourth column shows you the execution time of the component.
The fifth column shows you the total execution time of the dataflow while the component was processing the rows. As soon as the component's status changes to STOP, both execution time values freeze and stop changing.
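Instead of Excel, the same filtering can be done with a few lines of code. A minimal sketch, assuming the five columns described above were exported as tab-separated text; the component names and timings below are hypothetical sample values, not real log output.

```python
# Parse monitor-log-style rows (component path, state, row count, component
# time, dataflow time) and find the slowest component. Sample data is made up.

log_lines = [
    "DF_Compare/DIMPRODUCT_11\tSTOP\t606\t0.021\t1.204",
    "DF_Compare/Join_PRODUCT_TEST_COMPARE\tSTOP\t396\t0.136\t1.204",
    "DF_Compare/MO_Update_PRODUCT_DESC_UPDATE\tSTOP\t1\t0.001\t1.204",
]

def parse_monitor_log(lines):
    """Turn tab-separated log lines into dictionaries, one per component."""
    rows = []
    for line in lines:
        path, state, count, comp_time, df_time = line.split("\t")
        rows.append({"path": path, "state": state, "rows": int(count),
                     "component_time": float(comp_time),
                     "dataflow_time": float(df_time)})
    return rows

# Example: identify the component that spent the most time processing rows
slowest = max(parse_monitor_log(log_lines), key=lambda r: r["component_time"])
```

Sorting or filtering parsed rows like this is handy once a job contains dozens of dataflows and the log grows past what is comfortable to scan by eye.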

To illustrate this even further, let's count the rows in the source tables to compare with what we have seen in the monitor log.

First, see the results of counting the number of records in the tables DIMPRODUCT and PRODUCTMODELPRODUCTDESCRIPTIONCULTURE with the help of the View Data function available on the Profile tab for table objects inside a dataflow. Click on the Records button to calculate the number of records in the table:

Now see the result of counting the number of records in the tables PRODUCT and PRODUCTDESCRIPTION with the same View Data | Profile feature:

By using the transform name Join, you can see the components related to the execution of the first Query transform.

You see the DIMPRODUCT_11 component (606 rows) as not being part of the Join transform components because it was executed separately. Data Services could not include it in a single SELECT statement (remember that this table is in the DWH database) with the three other tables that had join conditions specified inside the Join transform. Data Services could recognize those as belonging to the same database and pushed down the single SELECT statement to the database level, extracting 294 rows.

Some components, that is, the Map_Operation-related ones, are easily recognizable by name, which includes the name of the current transformation and the next target table object name: Join_PRODUCT_TEST_COMPARE, MO_Update_PRODUCT_DESC_UPDATE, and so on.

The Table_Comparison execution is the most complex one, as you can see from the monitor log. All compared datasets are first cached by separate components and then compared to each other by the other ones. You can identify components belonging to a Table_Comparison transform by the keywords TCRdr and Table_Comparison.

There's more…
Reading the monitor log, which is the main source of the dataflow execution information, can require a lot of experience. In the following chapters, we will spend a lot of time peeking into the monitor log for different kinds of information about the dataflow execution. Often, it is very useful for identifying potential performance bottlenecks inside the dataflow.

Chapter 5. Workflow – Controlling Execution Order
This chapter will explain in detail another type of Data Services object: the workflow. Workflow objects allow you to group other workflows, dataflows, and script objects into execution units. In this chapter, we will cover the following topics:

Creating a workflow object
Nesting workflows to control the execution order
Using conditional and while loop objects to control the execution order
Using the bypassing feature
Controlling failures – try-catch objects
Use case example – populating dimension tables
Using a continuous workflow
Peeking inside the repository – parent-child relationships between Data Services objects

Introduction
In this chapter, we will move to the next object in the Data Services hierarchy of objects used in ETL design: the workflow object. Workflows do not perform any movement of data themselves; their main purpose is to group dataflows, scripts, and other workflows together.

In other words, workflows are container objects grouping pieces of ETL code. They help define the dependencies between various pieces of ETL code in order to provide a robust and flexible ETL architecture.

I will also show you how you can query the Data Services repository using database tools in order to query the hierarchy of objects directly, and will show you how this hierarchy is stored in repository database tables. This may be very useful if you want to understand a bit more about how the software functions "under the hood".

Additionally, we will build real-life use case ETL code by populating dimension tables in a data warehouse. This use case example will include the functionality already reviewed in the previous chapters and will show you how you can augment existing ETL processes and migrate data (dataflows) with the help of workflow objects.

Creating a workflow object
A workflow object is a reusable object in Data Services. Once created, the same object can be used in different places in your ETL code. For example, you can place the same workflow in different jobs or nest it in other workflow objects by placing it in the workflow workspace.

Note
Note that a workflow object cannot be nested inside a dataflow object. Workflows are used to group dataflow objects and other workflows so that you can control their execution order.

Every workflow object has its own local variable scope and can have a set of input/output parameters so that it can "communicate" with the parent object (in which it is nested) by accepting input parameter values or sending values back through output parameters. A script object placed inside the workflow becomes part of the workflow and shares its variable scope. That is why all workflow local variables can be used within the scripts placed directly into the workflow or passed to the child objects by going to Variables and Parameters | Calls.

Later in this chapter, we will explore how this object hierarchy is stored within the Data Services repository.

How to do it…
There are a few ways to create a workflow object. Follow these steps:

1. To create a workflow object in the workspace of another parent object, you can use the tool palette on the right-hand side of the Designer interface. Follow these steps:
1. Create a new job and open it in the workspace for editing.
2. Left-click on the Work Flow icon in the workspace tool palette (see the following screenshot), drag it to the job workspace, and left-click on the empty space in the workspace to place the new workflow object:

3. Name the object WF_example and press Enter to create it. Note that the object immediately appears in the Local Object Library workflow list. The parent object of the WF_example workflow is the job itself.

2. Create another workflow object inside WF_example. Now, we will use a different method to create workflows directly from the Local Object Library rather than using the workspace tool palette. Then, perform these steps:
1. Open WF_example in the main workspace window.
2. Go to the Local Object Library window and select the Work Flows tab.
3. Right-click on the empty area of this tab in the Local Object Library and choose New from the context menu.
4. Fill in the workflow name, WF_example_child, and drag and drop the created object from the Local Object Library to the workspace area of WF_example.

How it works…
A workflow object organizes and groups pieces of ETL processes (dataflows and sometimes scripts). It does not perform any data processing itself. When it is executed, it simply starts executing all its child objects, sequentially (or in parallel), in the order defined by the user.

You can think of a workflow as a container that holds the executable elements. Just as a project object functions like a root folder, a workflow serves the same "folder" functionality with a few extra features, which you will get familiar with in the next few recipes.

Like the folder structure on your disk, you can create sophisticated nested tree structures with the help of workflow objects by putting them into each other.

One thing to remember is that each workflow has its own scope of variables, or context. To pass variables from a parent workflow to a child object, select the Calls tab on the Variables and Parameters panel. It shows the list of input parameters from the child objects for the object currently open in the main workspace area.

To open the Variables and Parameters window, you can click on the Variables button in the tool menu at the top of your Designer screen.

Here, you see the context of the currently open object, that is, the list of defined local variables, input parameters, and available global variables inherited from the job context:

The Calls section allows you to pass your previously created local variable $WF_example_local_var of the WF_example workflow to the WF_example_child child workflow object's $WF_example_child_var1 input parameter, as shown here:

Of course, you have to open the child object context first and create an input parameter so that its call is visible in the context of the parent.

Scripts are not reusable objects and do not have a local variable scope or parameters of their own. They belong to the workflow or job object they have been placed into. In other words, they can see and operate only on the local variables and parameters defined at the parent object level.

Of course, you can copy and paste the contents of a single script object to another script object in a different workflow. However, it will be a new instance of the script object that will be running in the new context of the different parent workflow. Hence, the variables and parameters used could be completely different.

Nesting workflows to control the execution order
In this recipe, we will see how workflow objects are executed in a nested structure.

Getting ready
We will not create dataflow objects in this recipe, so to prepare an environment, just create an empty job object.

How to do it…
We will create a nested structure of a few workflow objects, each of which, when executed, will run a script. It will display the current workflow name and the full path to the root job context. Follow these steps:

1. In the job workspace, create a new workflow object, WF_root, and open it.
2. In the Variables and Parameters window, when in the WF_root context, create one local variable, $l_wf_name, and one input parameter, $p_wf_parent_name, both of the varchar(255) data type.

3. Also, inside WF_root, add a new script object named Script with the following code:

$l_wf_name = workflow_name();
print('INFO: running {$l_wf_name} (parent={$p_wf_parent_name})');
$l_wf_name = $p_wf_parent_name || '>' || $l_wf_name;

4. In the same WF_root workflow workspace, add two other workflow objects, WF_level_1 and WF_level_1_2, and link all of them together.

5. Repeat steps 2 and 3 for both new workflows, WF_level_1 and WF_level_1_2.
6. Open WF_level_1, create a new workflow, WF_parallel, and link it to the script object.
7. Inside the WF_parallel workflow, create two other workflow objects, WF_level_3_1 and WF_level_3_2. Then, create only one input parameter, $p_wf_parent_name, without creating a local variable.

8. Repeat steps 2 and 3 for both the WF_level_3_1 and WF_level_3_2 workflows.
9. Now, we have to specify mappings for the input parameters of the created workflows. To do this, double-click on the parameter name $p_wf_parent_name by going to Variables and Parameters | Calls and input the name of the $l_wf_name local variable.

10. There are two exceptions to the input parameter mapping settings. In the context of the job, for the input parameter of the WF_root workflow, you have to specify the job_name() function as a value. Perform these steps:
1. Open the job in the main workspace (so that the WF_root workflow is visible on the screen).
2. Choose Variables and Parameters | Calls and double-click on the $p_wf_parent_name input parameter name.
3. In the Value field, enter the job_name() function and click on OK.

11. The second exception is the input parameter mappings for the workflows WF_level_3_1 and WF_level_3_2. Perform the following steps:
1. Open the WF_parallel workflow to see both WF_level_3_1 and WF_level_3_2 displayed on the screen.
2. Go to Variables and Parameters | Calls and specify the following value for both input parameter calls:

(($p_wf_parent_name || '>') || workflow_name())

12. Your job should have the following workflow nested structure, as shown in the screenshot here:

The only workflow object that does not have a script object inside it is WF_parallel. This will be explained later in the recipe.

13. Now, open the job in the workspace area and execute it.
14. The trace log shows the order of workflow executions, the currently executed workflow names, and their location in the object hierarchy within the job. See the following screenshot:

Howitworks…Aswehavepassedvaluestotheinputparametersoftheobjectsinthepreviouschapterdedicatedtothecreationofdataflowobjects,youprobablyalreadyknowhowthismechanismworks.Theobjectcallsfortheinputparametervaluerightbeforeitsexecutionintheparentobjectwhereitislocated.

Everyworkflowinourstructure(exceptWF_parallel)hasalocalvariablethatisusedinthescriptobjecttosaveanddisplaythecurrentworkflownameandconcatenateittotheworkflowpathinthehierarchyreceivedfromtheparentobjectinordertopasstheconcatenatedvaluetothechildobjectintheircalls.

Let’sfollowtheexecutionssteps:

Whenajobexecutes,itfirstrunstheobjectthatislocatedinthejobcontext;inourcase,itisWF_root.Aswedonotspecifyanylocalvariableforthejob,wecannotpassitsvaluetotheinputparameteroftheWF_rootobject.So,wesimplypassitajob_name()functionthatreturnsthenameofthejobwhereitisbeingexecuted.Thejob_name()functiongeneratesthevaluethatispassedtotheinputparameterrightbeforetheWF_rootexecution.TheWF_rootexecutionrunsthescriptobjectfromlefttoright.Inthescript,thelocalvariablegetsthevaluefromtheoutputoftheworkflow_name()function,whichreturnsthenameoftheworkflowwhereitisbeingexecuted.Withtheprint()function,wedisplaythelocalvariablevalueandvalueoftheinputparameterreceivedfromtheparentobject(job).Asthenextstep,thevalueofthelocalvariableisbeingconcatenatedwiththevalueoftheinputparametertogetthecurrentlocationpathinthehierarchyforthechildobjectsWF_level_1andWF_level_1_2.AsallobjectsinsideWF_rootarelinkedtogether,theyareexecutedsequentiallyfromlefttoright.Everynextobjectonlyrunsaftersuccessfulcompletionofthepreviousobject.DataServicesrunsWF_level_1andrepeatsthesamesequenceofdisplayingthecurrentworkflownameandcurrentpathwiththeconsequentconcatenationandpassingofthevaluetotheinputparameteroftheWF_parallelworkflow.TheWF_parallelworkflowdemonstrateshowDataServicesexecutestwoworkflowobjectsplacedinthesamelevelthatarenotlinkedtoeachother.Here,wecannotusethescripttopreparetoperformourusualsequenceofscriptlogicsteps.Ifyoutrytoaddascriptobjectnotlinkedtotheparallelworkflows,DataServicesgivesyouanerrormessagefromthejobvalidationprocess:

Ifyoutrytolinkthescriptobjecttooneoftheworkflows,youwillgetthefollowingerrormessage:

NoteNotehowDataServicesdoesnotallowyoutolinkthescriptobjecttobothworkflows.

Ifusedwithinajoboraworkflow,scriptobjectsdisableparallelexecutionlogic,allowingyouonlyasequentialexecutionwithinthecurrentcontext:

Tomakesurethatyourworkflowexecutessimultaneouslyandrunsinparallel,makesurethatyoudonotusethescriptobjectinthesameworkspace.

Thatiswhy,whenwepassthevaluestotheinputparametersoftwoworkflowsexecutedinparallel,WF_level_3_1andWF_level_3_2,wespecifytheconcatenationformularightintheinputparametervaluefield:

It is very important to understand that the two $p_wf_parent_name entries in the preceding screenshot are two different parameters. The one on the left-hand side is the $p_wf_parent_name input parameter belonging to the child object WF_level_3_1, which asks for a value. The one on the right-hand side belongs to the current workflow, WF_parallel, in whose context we are located at the moment, and it holds the value received from its parent object, WF_level_1.

After the completion of WF_level_3_1 and WF_level_3_2, Data Services completes the WF_parallel workflow, then the WF_level_1 workflow, and finally runs the WF_level_1_2 workflow. WF_root is the last workflow object to finish its execution within the job, so the job completes its execution successfully.

See the trace log again to follow the sequence of executed steps, and make sure that you understand why they were executed in this particular order.

Using conditional and while loop objects to control the execution order

Conditional and while loop objects are special control objects that branch the execution logic at the workflow level. In this recipe, we will modify the job from the previous recipe to make the execution of our workflow objects more flexible.

Conditional and loop structures in Data Services are similar to the ones used in other programming languages.

For readers with no programming background, here is a brief explanation of conditional and loop structures.

The IF-THEN-ELSE structure allows you to check the result of the conditional expression presented in the IF block and execute either the THEN block or the ELSE block, depending on whether the result of the conditional expression is TRUE or FALSE. The LOOP structure in a programming language allows you to execute the same code again and again until the specified condition is met. You should be very careful when creating loop structures and correctly specify the condition that exits or ends the loop. If it is specified incorrectly, the code in the loop could run indefinitely, making your program hang.
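For readers who prefer to see these constructs in code, here is a minimal Python sketch of both structures (Python is used purely for illustration; Data Services expresses the same logic graphically with conditional and while loop objects):

```python
# IF-THEN-ELSE: exactly one of the two branches runs,
# depending on the Boolean result of the condition.
def branch(condition):
    if condition:
        return "THEN block executed"
    else:
        return "ELSE block executed"

# LOOP: the body repeats until the exit condition is met.
# Note the counter increment: without it, the condition would
# never become FALSE and the loop would run indefinitely.
def loop(cycles):
    executed = 0
    count = 1
    while count <= cycles:
        executed += 1   # loop body
        count += 1      # moves the condition towards FALSE
    return executed
```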

Getting ready

Open the job from the previous recipe.

How to do it…

We will get rid of our WF_parallel workflow and execute only one of the underlying WF_level_3_1 or WF_level_3_2 workflows, chosen randomly. This is not a common scenario you will see in real life, but it gives a perfect example of how Data Services allows you to control your execution logic. Perform these steps:

1. Open WF_level_1 in the workspace and remove WF_parallel from it.
2. Using the tool palette on the right-hand side, create a conditional object, and link your script object to it. Name the conditional object If_Then_Else:

3. Double-click on the If_Then_Else conditional object or choose Open from the right-click context menu.

4. You can see three sections: If, Then, and Else. In the Then and Else sections, you can put any executable elements (workflows, scripts, or dataflows). The If field should contain an expression returning a Boolean value. If it returns TRUE, all objects in the Then section are executed in sequential or parallel order, depending on their arrangement. If the expression returns FALSE, all elements from the Else section are executed:

5. Put WF_level_3_1 from Local Object Library into the Then section.
6. Put WF_level_3_2 from Local Object Library into the Else section.
7. Map the input parameter calls of each workflow to the local $l_wf_name variable of the parent WF_level_1 workflow object. You can now see that, without the WF_parallel workflow, both WF_level_3_1 and WF_level_3_2 operate within the context of the WF_level_1 workflow (remember that the conditional object does not have its own context and variable scope, and it is transparent in that aspect).

8. Type in the following expression, which randomly generates 0 or 1, in the If section:

cast(round(rand_ext(), 0), 'integer') = 1

We will use this expression to randomly generate either 0 or 1 in order to execute the ETL placed in the THEN or ELSE block every time we run the Data Services job.
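The behaviour of this expression can be sketched in Python (an illustration only, assuming that rand_ext() in Data Services returns a pseudo-random value between 0 and 1, so rounding it yields 0 or 1 with roughly equal probability):

```python
import random

def if_condition():
    # Emulates cast(round(rand_ext(), 0), 'integer') = 1:
    # a random value in [0, 1) rounded to the nearest integer is 0 or 1.
    flag = int(round(random.random()))
    # TRUE selects the Then section (WF_level_3_1),
    # FALSE selects the Else section (WF_level_3_2).
    return flag == 1
```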

9. Save and execute the job. The trace log shows that only one workflow, WF_level_3_2, was executed. To have more visibility of the values generated by the If expression, you can put a script before If_Then_Else and assign the value to a local variable, which can then be used in the If section of the If_Then_Else object to get the Boolean value:

Now, let's make our last workflow object in the job run 10 times in a loop, using these steps:

1. Open WF_root in the main workspace.
2. Delete WF_level_1_2 from the workspace.
3. Add a while loop object from the tool palette, name it While_Loop, and link it to WF_level_1, as shown in the following screenshot. As we know that we are going to run the loop for 10 cycles, we need to create a counter that we will use in the loop condition. For this purpose, create a $l_count local integer variable for the WF_root workflow and assign it a value of 1 in the initial script. Your code in the script object should look like this:

$l_wf_name = workflow_name();
print('INFO: running {$l_wf_name} (parent = {$p_wf_parent_name})');
$l_wf_name = $p_wf_parent_name || '>' || $l_wf_name;
$l_count = 1;

4. Open While_Loop in the workspace and place the WF_level_1_2 workflow there by copying or dragging it from Local Object Library.

5. Place two script objects, script and increase_counter, before and after the workflow, and link all three objects together.

6. The initial script will contain the print() function displaying the current loop cycle, and the final script will increase the counter value by 1. You also have to put the conditional expression that checks the current counter value in the While field of the While_Loop object. The expression is $l_count <= 10:

The conditional expression is checked after each loop cycle. The loop finishes successfully as soon as the conditional expression returns FALSE.

7. Map the $p_wf_parent_name input parameter of WF_level_1_2 to the local variable from the parent's context, $l_wf_name, by going to Variables and Parameters | Calls.

8. Save and execute the job. Check your trace log file to see that WF_level_1_2 was executed 10 times:

How it works…

The if-then-else construction is available in the scripting language as well, but as you already know, the usage of script objects with workflows is quite limited: you can only join these objects sequentially. This is where conditional objects come into action.

The main characteristic of the conditional and while loop objects is that they are not workflows and do not have their own context. They operate within the variable scope of their parent objects and can only be placed within a workflow or job object. That is why you need to create and define all the local variables used in the if-then-else or while conditional expression inside the parent object context.

Note: Script objects have their own if-then-else and while loop constructions, and to branch logic within dataflows, you can use the Case, Validation, or simply Query (with filtering conditions) transforms.

There's more…

Workflow objects themselves have a few options to control how they are executed within the job, which adds some flexibility to the ETL design. They will be explained in the following recipes of this chapter. For now, we will just take a look at one of them.

This is the Execute only once option available in the workflow object properties window.

To open it, just right-click on the workflow either in the workspace or in Local Object Library and choose Properties… from the context menu:

To see the effect this option has on workflow execution, take the job from this recipe and tick this option for the WF_level_1_2 workflow, the one that runs in the loop.

Then, save the job and execute it. The trace log looks like this now:

What happens now is that after the workflow executes successfully for the first time, the while loop tries to run it another 9 times. However, as the workflow has already run within this job execution, Data Services skips it with a successful workflow completion status.

This option is rarely used within a loop because, of course, you do not put anything in a loop that should be executed only once, but it shows how Data Services deals with such workflows.

The most common scenario is when you put a specific workflow in multiple branches of the workflow hierarchy as a dependency for other workflows, and you only need it to be executed once, without caring which branch it executes in first, as long as it completes successfully.

The scope of this option is restricted to the job level. If you place the workflow with this option enabled in multiple jobs and run them in parallel, the workflow will be executed once in each job.
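A minimal Python sketch of this behaviour (an illustrative model, not Data Services internals): each job execution keeps its own registry of completed workflows, and a repeated call to an execute-only-once workflow is skipped with a successful status:

```python
class Job:
    """Hypothetical model of a job run with execute-only-once tracking."""

    def __init__(self):
        # Reset for every new job execution, which is why the option's
        # scope is restricted to a single job.
        self.completed = set()

    def run_workflow(self, name, body, execute_only_once=False):
        if execute_only_once and name in self.completed:
            # Already ran within this job execution: skip,
            # but report a successful completion status.
            return "skipped (already executed)"
        body()
        self.completed.add(name)
        return "executed"
```

Running the same workflow twice in one job skips the second call, while a second job (a fresh registry) executes it again.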

Using the bypassing feature

The bypassing option allows you to configure a workflow or dataflow object to be skipped during the job execution.

Getting ready…

We will use the same job as in the previous recipe.

How to do it…

Let's configure the WF_level_1 workflow object, which belongs to the parent WF_root workflow, to be skipped permanently when the job runs.

The configuration of this feature requires two steps: creating a bypassing substitution parameter, and enabling the bypassing feature for the workflow using the created substitution parameter.

1. To create a bypassing substitution parameter, follow these steps:
   1. Go to Tools | Substitution Parameter Configurations….
   2. In the Substitution Parameter Editor window, you can see the list of default substitution parameters used by Data Services.
   3. Click on the empty field at the bottom of the list to create a new substitution parameter.
   4. You can choose any name you want, but remember that all substitution parameters start with a double dollar sign.
   5. Call your new substitution parameter $$BypassEnabled and choose the default value YES in the Configuration column to the right:
   6. As a final step, click on OK to create the substitution parameter.
2. Now, you can "label" any workflow object with this substitution parameter if you want it to be bypassed during job execution. Follow these steps:
   1. Open the WF_root workflow within your job to see WF_level_1 in the main workspace window.
   2. Right-click on the WF_level_1 workflow and choose Properties… from the context menu to open the workflow properties window.
   3. Click on the Bypass field combobox and choose the newly created substitution parameter from the list, [$$BypassEnabled]. By default, the {No Bypass} value is chosen in this field:
   4. Click on OK. The workflow becomes marked with a crossed red-circle icon. This means that during the job execution, this workflow will be skipped, and the next object in the sequence will be executed straight away:

How it works…

Now, let's see what happens when you run the job:

During job validation, you can see a warning message telling you that a particular workflow will be bypassed:

When the job is executed, it runs the workflow sequence as usual, except when it gets to the bypassed workflow object. The workflow object is skipped, and all dependent objects (the object next in the sequence and the parent workflow where the bypassed object resides) consider its execution to be successful. If you take a look at the trace log of the job execution, you will see something similar to this screenshot:

There's more…

In Data Services, there is more than one way to set up the workflow object as bypassed. If you right-click on the workflow object, you will see that the Bypass option is available in the context menu directly. It opens the Set Bypass window with the same combobox list of substitution parameter values available for this option.

Note: It is not only workflows that you can bypass. Dataflow objects can be bypassed in the same manner.

Controlling failures – try-catch objects

In the Creating custom functions recipe in Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, we created a custom function showing an example of try-catch block exception handling in the scripting language. As in the case of if-then-else and while loops, Data Services has a variation of the try-catch construction at the workflow/dataflow object level as well. You can put a sequence of executable objects (workflows/dataflows) between Try and Catch objects and then catch potential errors in the Catch object, where you can put the scripts, dataflows, or workflows that you want to run to handle the caught errors.

How to do it…

The steps to deploy and enable the exception handling block in your workflow structure are extremely easy and quick to implement.

All you have to do is place an object or sequence of objects from which you want to catch possible exceptions between two special objects, Try and Catch. Then, follow these steps:

1. Open the job from the previous recipe.
2. Open WF_root in the workspace.
3. Choose the Try object from the right-side tool palette and place it at the beginning of the object sequence. Name it Try:

4. Choose the Catch object from the right-side tool palette and place it at the end of the object sequence. Name it Catch:

5. The Try object is not modifiable and does not have any properties except a description. Its only purpose is to mark the beginning of the sequence for which you want to handle an exception.

6. Double-click on the Catch object to open it in the main workspace. Note that all exception types are selected by default. This way, we make sure that we catch any possible failures that can happen during our code execution. Of course, there can be scenarios in which you want the ETL to fail and do not want to run the code in the Catch block for some types of errors. In this case, you can deselect the exceptions to be handled in the Catch block. In our example, we just want our code to continue to run, putting the error message in the trace log.

7. Create a script object with the following line in it:

print('ERROR: exception has been caught and handled successfully');

8. Save and execute the job. The exception you generated in the script is successfully handled by the try-catch construction, and the job completes successfully.

How it works…

If you take a look at the trace log of your job run, you can see that the WF_level_3_1 and WF_level_1 workflows failed:

WF_level_3_1 failed as the exception was raised in the script inside it, and WF_level_1 failed because its execution depends on the child object WF_level_3_1. You should remember that if any child object within a workflow fails (another workflow, a dataflow, or a script), the parent object fails immediately. Then, the parent's parent object fails as well, and so on, until the root level of the job hierarchy is reached and the job itself fails and stops its execution.

By placing the try-catch sequence inside WF_root, we made it possible to catch all exceptions inside it, making sure that our WF_root workflow never fails.

Note: Try-catch objects do not prevent a job from failing in the case of a crash of the job server itself. This is, of course, because the successful execution of the try-catch logic depends on the work of the Data Services job server.

Note that the error log is still generated in spite of the successful job execution. In it, you can see the logging message generated by the logic from the Catch object, and the context in which the initial exception happened:

Try-catch objects can be a vital part of your recovery strategy. If your workflow contains a few steps that you can think of as a transactional unit, you would want to clean up when some of these steps fail before running the sequence again. As explained in the recipe dedicated to the recovery topic, the Data Services automatic recovery strategy simply skips the steps that have already been executed, and sometimes this is simply not enough.

It all depends on how thorough you have to be during your recovery.

Another very important aspect to understand is that try-catch blocks prevent the failure of the workflow in whose context they are placed. This means that the error is hidden inside the try-catch and parent workflow, and all subsequent objects down the execution path will be executed by Data Services.

There are situations when you definitely want to fail the whole job to prevent any further execution if some of the data processing inside it fails. You can still use try-catch blocks to catch the error in order to log it properly or perform some extra steps, but after all this is done, the raise_exception() function is put at the end of the catch block to fail the workflow.
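This catch-log-and-re-raise pattern can be sketched in Python terms (an illustration only; in Data Services you would call raise_exception() in a script at the end of the Catch block):

```python
def run_transactional_unit(steps):
    """Run a sequence of steps; on failure, log and then fail the caller anyway."""
    try:
        for step in steps:
            step()
    except Exception as exc:
        # Catch block: log the error properly, do any cleanup steps here...
        print(f"ERROR: exception has been caught: {exc}")
        # ...then re-raise, the equivalent of raise_exception(),
        # so the parent (and ultimately the job) still fails.
        raise
```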

Use case example – populating dimension tables

In this recipe, we will build an ETL job to populate two dimension tables in the AdventureWorks_DWH database, DimGeography and DimSalesTerritory, with the data from the operational database AdventureWorks_OLTP.

Getting ready

For this recipe, you will have to create a new job. Also, create two new schemas in the STAGE database: Extract and Transform. To do this, open SQL Server Management Studio, expand Databases | STAGE | Security | Schemas, right-click on the Schemas folder, and choose the New Schema… option from the context menu. Specify your administrator user account as the schema owner.

How to do it…

1. In the first step, we will create the extraction processes using these steps:
   1. Open the job context and create the WF_extract workflow.
   2. Open the WF_extract workflow in the workspace and create four workflows, one for every source table we extract from the OLTP database: WF_Extract_SalesTerritory, WF_Extract_Address, WF_Extract_StateProvince, and WF_Extract_CountryRegion. Do not link these workflow objects, so that they run in parallel.
   3. Open WF_Extract_SalesTerritory in the main workspace area and create the DF_Extract_SalesTerritory dataflow.
   4. Open DF_Extract_SalesTerritory in the workspace area.
   5. Add a source table from the OLTP datastore: SalesTerritory.
   6. Place the Query transform after the source table, link them, open the Query transform object in the workspace, and map all source columns to the target schema by selecting them together and dragging them to the empty target schema section.
   7. Exit the Query Editor and add the target template table, SalesTerritory. Choose DS_STAGE as the datastore object and Extract as the owner to create a target stage table in the Extract schema of the STAGE database.
   8. Your dataflow and Query transform mapping should look as shown in the screenshots here:
   9. In the same manner, using steps 3 to 8, create extract dataflow objects for the other OLTP tables: Address (dataflow DF_Extract_Address), StateProvince (dataflow DF_Extract_StateProvince), and CountryRegion (dataflow DF_Extract_CountryRegion). Place each of the created dataflows inside a parent workflow with the same name (substituting the DF_ prefix with WF_), and put all extract workflows to run in parallel inside the WF_extract workflow object. To name the target template tables inside each of the dataflows, choose the same name as the source table object, and select DS_STAGE as the database for the table to be created in and Extract as the owner/schema:

2. Now, let's create the transformation processes using these steps:
   1. Go to the job context level in your Designer and open the WF_transform object.
   2. As we will populate two dimension tables, we will create two transformation workflows running in parallel, one for each of them: WF_Transform_DimSalesTerritory and WF_Transform_DimGeography.
   3. Open WF_Transform_DimSalesTerritory and create a new dataflow in its workspace: DF_Transform_DimSalesTerritory.
   4. Open the dataflow object and design it as shown in the following screenshot:
   5. It is important for the transformation dataflows to create the target template tables in the Transform schema created earlier. The name of the target table template object should be the same as that of the target dimension table in the DWH.
   6. The Join Query transform performs the join of the two source tables and maps the columns from each of them to the Query output schema. As we do not migrate image columns, specify NULL as the mapping for the SalesTerritoryImage output column. Also, specify NULL as the mapping for SalesTerritoryKey, as its value will be generated in one of the load processes:
   7. To create the transformation process for DimGeography, go back to the WF_transform workflow context level and create a new workflow, WF_Transform_DimGeography, with a dataflow DF_Transform_DimGeography inside.
   8. In the dataflow, we will source the data from three OLTP tables, Address, StateProvince, and CountryRegion, to populate the stage transformation table with a table definition that matches the target DWH DimGeography table:
   9. Specify the join conditions for all three source tables in the Join Query transform and map the source columns to the target output schema:
   10. Place another Query transform and name it Mapping. Link the Join Query transform to the Mapping Query transform and map the source columns to the target schema columns, which match the table definition of the DWH DimGeography table. Map one extra column, TERRITORYID, from source to target:
   11. In the Mapping Query transform, place NULL in the mapping sections for the columns that we are not going to populate values for.

3. Now, we need to create the final load processes that will move the data from the stage transformation tables into the target DWH dimension tables. Perform these steps:
   1. Open the WF_load workflow, add two workflow objects, WF_Load_DimSalesTerritory and WF_Load_DimGeography, and link them together to run sequentially.
   2. Open WF_Load_DimSalesTerritory and create a dataflow object, DF_Load_DimSalesTerritory, inside it.
   3. This dataflow will perform a comparison of the source data to the target DimSalesTerritory dimension table data, and will produce a set of updates for the existing records whose values have changed in the source system, or will insert records with key column values that do not exist in the dimension table yet:
   4. In the Query transform, simply map all source columns from the DimSalesTerritory transformation table to the output schema.
   5. Inside the Table_Comparison object, define the target DWH DimSalesTerritory as the comparison table, and specify SalesTerritoryAlternateKey as the key column and three compare columns, SalesTerritoryRegion, SalesTerritoryCountry, and SalesTerritoryGroup, as shown here:
   6. As the final step in the dataflow, before inserting data into the target table object, the Key_Generation transform helps you populate the SalesTerritoryKey column of the target dimension table with sequential surrogate keys. Surrogate keys are the keys usually generated during the population of DWH tables. A surrogate key column identifies the uniqueness of the record. This way, you have a single column with a unique ID that you can use instead of referencing the multiple columns in the table that define the uniqueness of the record:
   7. By default, all dimension tables in the DWH database we are using have identity columns. In SQL Server, the identity column feature allows you to delegate the process of surrogate key creation to the SQL Server database. You simply insert the record without specifying a value for the identity column, and SQL Server populates the field for you with a sequential unique number. In our case, we want to have control over key creation ourselves, to be able to generate the keys in the ETL before inserting the data. To do this, we have to enable IDENTITY_INSERT before inserting the records and disable it after the insert. Otherwise, you will receive an error message from SQL Server informing you that you cannot populate identity columns with values, as this is done automatically by the database engine.

   To enable the insertion of surrogate keys into identity columns from Data Services, open the Target Table Editor of the DimSalesTerritory table and populate the Pre-Load Commands and Post-Load Commands tabs with the following two commands, correspondingly:

   set identity_insert dimsalesterritory on
   set identity_insert dimsalesterritory off

   8. Now, let's create the second load process, populating the DimGeography dimension table. Open the DF_Load_DimGeography dataflow in the workspace area.
   9. The dataflow will have the same structure as the previous one, except that we will look up SalesTerritoryKey in the already populated DimSalesTerritory dimension table:
   10. In the Query transform, map all columns from the stage Transform.DimGeography table and SalesTerritoryKey from the DWH DimSalesTerritory table to the output schema. For the join condition, specify the following one:

   DIMGEOGRAPHY.TERRITORYID = DIMSALESTERRITORY.SALESTERRITORYALTERNATEKEY

   11. The Mapping transform output schema definition matches the target table definition, and here we will finally drop the TERRITORYID column from the mappings, as we do not need it anymore.
   12. Specify the following settings in the Table_Comparison transform:
   13. In the Key_Generation transform, specify DWH.DBO.DIMGEOGRAPHY as the table name and GEOGRAPHYKEY as the generated key column.
   14. Also, do not forget to define the commands in the Pre-Load and Post-Load target table settings to switch IDENTITY_INSERT on and switch it off after the insert is complete. Use the following commands:

   set identity_insert dimgeography on
   set identity_insert dimgeography off

How it works…

Let's review the different aspects of the example we just implemented in the previous steps.

Mapping

Before you start the ETL development in Data Services, you have to define the mapping between source columns of the operational database tables and target columns of the Data Warehouse tables, plus the transformation rules for the migrated data, if required. At this step, you also have to identify the dependencies between source data structures to correctly identify the types of joins required to extract the correct dataset.

Target column               | Source table  | Source column | Transformation rule
SalesTerritoryKey           |               | NULL          | Generated surrogate key in DWH
SalesTerritoryAlternateKey  | SalesTerritory| TerritoryID   | Direct mapping
SalesTerritoryRegion        | SalesTerritory| Name          | Direct mapping
SalesTerritoryCountry       | CountryRegion | Name          | Direct mapping
SalesTerritoryGroup         | SalesTerritory| Group         | Direct mapping
SalesTerritoryImage         |               |               | Not migrating

Table 1: Mappings for the DimSalesTerritory dimension

Here, you can find the mapping table for the DimGeography dimension:

Target column            | Source table      | Source column     | Transformation rule
GeographyKey             |                   | NULL              | Generated surrogate key in DWH
City                     | Address           | City              | Direct mapping
StateProvinceCode        | StateProvince     | StateProvinceCode | Direct mapping
StateProvinceName        | StateProvince     | Name              | Direct mapping
CountryRegionCode        | CountryRegion     | CountryRegionCode | Direct mapping
EnglishCountryRegionName | CountryRegion     | Name              | Direct mapping
SpanishCountryRegionName |                   | NULL              | Not migrated
FrenchCountryRegionName  |                   | NULL              | Not migrated
PostalCode               | Address           | PostalCode        | Direct mapping
SalesTerritoryKey        | DimSalesTerritory | SalesTerritoryKey | Lookup
IpAddressLocator         |                   | NULL              | Not migrated

Table 2: Mappings for the DimGeography dimension

The majority are direct mappings, which means that we do not change the migrated data and move it as is from source to target. The information in these mapping tables is used primarily in the Query transforms inside the dataflows to join the source tables together and map the source columns from the source to the target schema:

Dependencies

The next step is to define the dependencies between the populated target tables, to understand the order in which the ETL processes loading data into them should be executed. The preceding diagram shows that SalesTerritoryKey from the DimSalesTerritory dimension table is used as a reference key in the DimGeography dimension table. This means that the ETL processes populating each of these tables cannot be executed in parallel and should run sequentially: when we populate the DimGeography table, we require the information in DimSalesTerritory to be already updated.

Development

After defining the mappings and transformation rules and making the decision about the execution order of the ETL elements, you can finally open the Designer application and start developing the ETL job.

Note: Naming conventions for the workflow, dataflow, script, and different transformation objects, as well as for the staging table objects, are very important. They allow you to easily read the ETL code and understand what resides in one table or another, and what type of operation is performed by a specific dataflow or transformation object within a dataflow.

Our ETL job contains three main stages, defined by three workflow objects created in the job's workspace. Each of these workflows plays the role of a container for the underlying workflow objects containing dataflows:

The first workflow container, WF_extract, contains the processing units that extract the data from the OLTP system into the DWH staging area. There are different advantages to this approach, rather than extracting and transforming data within the same dataflow. The main one is that by copying the data as is into the staging area, you access the production OLTP system only once, creating a consistent snapshot of the OLTP data at a specific time. You can query the extracted tables in staging as many times as you want, without affecting the live production system's performance. We do not apply any transformations or mapping logic in these extraction processes; we simply copy the contents of the source tables as is.

The second workflow container, WF_transform, selects the data from the stage tables, assembles it, and transforms it to match the target table definition. At this stage, we leave all surrogate key columns empty and NULL-out the columns for which we are not going to migrate values.

Note: In the DF_Transform_DimGeography dataflow, the target template table does not exactly match the DWH DimGeography table definition. We keep one extra column from the source, TERRITORYID, to reference another dimension table, DimSalesTerritory, at the load stage. Without this column, we would not be able to link these two dimension tables together.

The third workflow container, WF_load, loads the transformed datasets into the target DWH dimension tables. Another important operation this step performs is generating surrogate keys for the new records to be inserted into the target dimension table.

Another important decision you have to make when you populate dimension tables using the Table_Comparison transform is which set of keys defines a new record in the target dimension table, and which columns you are checking for updated values.

In this example, we made a decision to select only two comparison columns, PostalCode and SalesTerritoryKey. Whenever there is a new location (City + State + Country), the record is inserted, and if the location exists, Data Services checks whether the source record coming from the OLTP system contains new values in the PostalCode or SalesTerritoryKey column. If yes, then the existing record in the target dimension table is updated.

Note: Note that in the transformation processes we developed, we did not generate DWH surrogate keys for our new records. The main goal of the transformation process is to assemble the dataset so that it matches the target table definition, and to apply all the required transformations if the source data does not comply with the data warehouse requirements.

Execution order

All three steps, or three workflows, WF_extract, WF_transform, and WF_load, run sequentially one after another. The next workflow starts execution only after the successful completion of the previous one.

The child objects of both WF_extract and WF_transform run in parallel, as at those stages we are not trying to link the migrated datasets to each other with reference keys.

At the final load stage, WF_load contains two workflow objects that run sequentially. First, we fully populate and update the DimSalesTerritory dimension, and then, after it is done, we can safely reference it when populating the DimGeography table.

Testing ETL

The best way to test ETL is to make changes to the source system, run the ETL job, and check the contents of the target data warehouse tables.

Preparing test data to populate DimSalesTerritory

Let's make some changes to the source data. We will add a new sales territory in the Sales.SalesTerritory table and a new state in the Person.StateProvince table. Run the following code in SQL Server Management Studio:

-- Insert new records into source OLTP tables to test ETL
-- populating DimSalesTerritory
USE [AdventureWorks_OLTP]
GO
-- Insert new sales territory
INSERT INTO [Sales].[SalesTerritory]
  ([Name], [CountryRegionCode], [Group], [SalesYTD], [SalesLastYear]
  ,[CostYTD], [CostLastYear], [rowguid], [ModifiedDate])
VALUES
  ('Russia', 'RU', 'Russia', 9000000.00, 0.00
  ,0.00, 0.00, NEWID(), GETDATE());
-- Insert new state
INSERT INTO [Person].[StateProvince]
  ([StateProvinceCode], [CountryRegionCode], [IsOnlyStateProvinceFlag]
  ,[Name], [TerritoryID], [rowguid], [ModifiedDate])
VALUES
  ('CR', 'RU', 1, 'Crimea', 12, NEWID(), GETDATE());
GO

Preparing test data to populate DimGeography

To update the source tables, run the following script in SQL Server Management Studio. This should create a new address with a new city that does not yet exist in the DimGeography dimension. You could skip this step as, by default, the OLTP database has multiple address records that do not have corresponding rows in the target DWH dimension, but to make the test more transparent, it is recommended that you create your own new record in the source system:

-- Insert new records into source OLTP tables to test ETL
-- populating DimGeography dimension
USE [AdventureWorks_OLTP]
GO
-- Insert new address
INSERT INTO [Person].[Address]
  ([AddressLine1], [AddressLine2], [City], [StateProvinceID]
  ,[PostalCode], [SpatialLocation], [rowguid], [ModifiedDate])
VALUES
  ('10 Suvorova St.', NULL, 'Sevastopol', 182, '299011', NULL, NEWID(), GETDATE());
GO

Now, execute the job and query both dimension tables. There is one new row inserted in DimSalesTerritory with SalesTerritoryKey = 12, and multiple records were inserted into and updated in the DimGeography table.

Among the new records in DimGeography, you should be able to see the record for the new city of Sevastopol that we inserted manually with the help of the preceding script.

Note: If you run the job again without making changes to the source system's data, it should not create or update any records in the target dimension tables, as all changes have already been propagated from OLTP to DWH by the first job run. The main object in our ETL driving the change tracking is the Table_Comparison transform.
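The insert-or-update decision that Table_Comparison makes can be sketched as follows (a simplified Python model, with hypothetical dictionaries standing in for the comparison table; the real transform works against the database and can also detect deletes):

```python
def table_comparison(source_rows, target, key, compare_columns):
    """Compare source rows to a target table and classify them.

    target maps a key value to its existing row; rows are plain dicts.
    Returns (inserts, updates): new keys become inserts, existing keys
    with a changed compare column become updates, identical rows produce
    no operation, which is why re-running the job changes nothing.
    """
    inserts, updates = [], []
    for row in source_rows:
        existing = target.get(row[key])
        if existing is None:
            inserts.append(row)      # unknown key -> INSERT
        elif any(row[c] != existing[c] for c in compare_columns):
            updates.append(row)      # changed compare column -> UPDATE
    return inserts, updates
```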

Using a continuous workflow

In this recipe, we will take a close look at one of the workflow object features that controls how the workflow runs within a job.

How to do it…

1. Create a job with a single workflow inside named WF_continuous. Create a single global variable, $g_count, of the integer type at the job level context.
2. Open the workflow properties by right-clicking on the workflow object and selecting the Properties… option from the context menu, and change the workflow execution type to Continuous on the General workflow properties tab:
3. Exit the workflow properties by clicking on OK. See how the icon of the workflow object changes when its execution type is changed from Regular to Continuous:

4. Go to Local Object Library | Custom Functions.
5. Right-click on the Custom Functions list and select New from the context menu.
6. Name the custom function fn_check_flag and click on Next to open the custom function editor.
7. Create the following parameters and variables:

Variable/parameter | Description
$p_Directory       | Input parameter of the varchar(255) type to store the directory path value
$p_File            | Input parameter of the varchar(255) type to store the filename value
$l_exist           | Local variable of the integer type to store the result of the file_exists() function

8. Add the following code to the custom function body:

$l_exist = file_exists($p_Directory || $p_File);
if ($l_exist = 1)
begin
  print('Check: file exists');
  Return 0;
end
else
begin
  print('Check: file does not exist');
  Return 1;
end

Your custom function should look like this:

9. Open the workflow properties again to edit the continuous options using the Continuous Options tab.
10. On the Continuous Options tab, tick the checkbox When the result of the function is zero in the Stop section at the bottom, and input the following line into the empty box: fn_check_flag($l_Directory, $l_File).
11. Click on OK to exit the workflow properties and save the changes.
12. Open the workflow in the main workspace and create two local variables in the Variables and Parameters window: $l_Directory of the varchar(255) type and $l_File of the varchar(255) type.
13. Create a single script object within the workflow and add the following code to it:

$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');

14. Save and validate the job to make sure that there are no errors.
15. Run the job, and after a few workflow execution cycles, add the flag.txt file to the C:\AW\Files\ directory to stop the continuous workflow execution sequence and the job itself.

How it works…

The Continuous execution type allows you to run the workflow object an indefinite number of times in a loop. There are many restrictions on using the continuous workflow execution mode. Some of them are as follows:

You cannot nest a continuous workflow in another workflow object
Some dataflow transforms are not available for use when placed under a continuous workflow hierarchy structure
A continuous workflow object can be used only in a batch job

The main purpose of the continuous workflow is not to substitute the while loop, as you might have thought at first glance, but to save memory and processing resources for tasks that have to be executed again and again, indefinitely in non-stop mode or for a very long period of time. Data Services saves resources by initializing and optimizing for execution all the underlying structures, such as dataflows, datastores, and the memory structures required for dataflow processing, only once, when the continuous workflow object is executed for the first time.

The release resources section inside the Continuous options controls how often the resources used by the underlying objects are released and reinitialized.

It is not possible to specify the exact number of cycles for the continuous workflow directly. The only way to add stop logic is to write a custom function that is executed after every cycle; when it returns zero, the continuous workflow execution sequence stops.

In the preceding recipe, we created a custom function that checks for the presence of a file in the specified folder. If the file appears there, the function returns 0. The job will run indefinitely until the file appears in the folder, the job itself is killed manually, or the job server crashes.

To check the existence of the file, the file_exists() function is used. It returns 1 if the file exists and 0 if it does not. The function accepts a single parameter: a full file name that includes the path. Because we want the stop condition to fire (return 0) precisely when the file exists, we had to invert the value returned by file_exists(), and we created a custom function for that.
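The inversion that fn_check_flag performs can be sketched outside Data Services as well. Here is a minimal Python equivalent (os.path.exists stands in for file_exists(); the directory and file name are the recipe's values):

```python
import os

def fn_check_flag(directory: str, file: str) -> int:
    """Return 0 when the stop file exists (stop the continuous
    workflow) and 1 when it does not (keep looping). This is the
    same inversion of the existence check as the custom function."""
    if os.path.exists(directory + file):
        print('Check: file exists')
        return 0
    print('Check: file does not exist')
    return 1

if __name__ == '__main__':
    # With a path that does not exist, the loop keeps running (returns 1).
    print(fn_check_flag('C:\\AW\\Files\\', 'flag.txt'))
```

A return value of 0 is what the stop condition on the Continuous Options tab reacts to.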

We added the sleep() function to imitate the execution of the workflow so that it would be easy to place the file while the execution cycle is still running. The sleep() function accepts an integer parameter in milliseconds, so 10000 is equal to 10 seconds.

The global variable $g_count was added to count the number of cycles executed in the continuous workflow sequence.

Another interesting fact about how the continuous workflow behaves is that it always executes one more cycle after the stop function returns the zero value. Look at the following screenshot:

You can see that in spite of the fact that we placed the flag.txt file during the third execution cycle, and the stop function found it and returned a zero value (see the Check: file exists print message in the trace log), the fourth cycle was still executed.

Let's try another test to confirm this. Place the flag.txt file before the job is executed and then run it. This is what you see in the trace log file:

You can see that after the custom function returned 0 at the end of the first cycle, the continuous workflow was executed a second time.

There is more…

You have to understand that continuous workflow usage is very limited in real life because of the functional restrictions and also because of the nature of the loop in which the workflow is executed. In the majority of cases, the while loop object is a preferable option to run the workflow or the underlying processing sequence of objects.

Peeking inside the repository – parent-child relationships between Data Services objects

With the introduction of workflow objects, which allow the nesting and grouping of objects, you can see that the ETL code executed within a Data Services job is a hierarchical structure of objects that can be quite complex. Just imagine: real-life jobs can have hundreds of workflows in their structure and twice as many dataflows.

In this recipe, we will look under the hood of Data Services to see how it stores object information (our ETL code) in the local Data Services repository. The techniques learned in this recipe can help you browse the hierarchy of objects within your local repository with the help of the database SQL language toolset. This often proves to be a very convenient method to use.

Getting ready

You will not create any jobs or other objects in the Data Services Designer, as we are just going to browse the ETL code and run a few queries in SQL Server Management Studio.

How to do it…

Follow these simple steps to access the contents of the Data Services local repository in this recipe:

1. Start SQL Server Management Studio and connect to the DS_LOCAL_REPO database created in Chapter 2, Configuring the Data Services Environment.

2. Query the dbo.AL_PARENT_CHILD table for references between Data Services objects and additional info.

3. Query the dbo.AL_LANGTEXT table for extra object properties and script object contents.

How it works…

Querying object-related information from the Data Services repository can be useful if you want to build a report on ETL metadata that does not exist out of the box in Data Services. It can also be useful when troubleshooting potential problems with your ETL code. We will take a look at the different scenarios and briefly explain each case.

Get a list of object types and their codes in the Data Services repository

Use the following query:

select
  descen_obj_type, descen_obj_r_type, count(*)
from
  dbo.al_parent_child
group by
  descen_obj_type, descen_obj_r_type;

The main table of the reference is the AL_PARENT_CHILD table. It contains the full hierarchy of the objects, starting from the job object level and finishing with the table object level. The preceding query shows all the possible object types that Data Services registers in the repository.

Display information about the DF_Transform_DimGeography dataflow

Use the following query to get this information:

select *
from
  dbo.al_parent_child
where
  descen_obj = 'DF_Transform_DimGeography';

All columns and their values are explained in this table:

Column name | Value | Description

PARENT_OBJ | WF_Transform_DimGeography | This is the name of the parent object that DF_Transform_DimGeography belongs to. See the following figure.

PARENT_OBJ_TYPE | WorkFlow | This is the type of the parent object.

PARENT_OBJ_R_TYPE | 0 | This is the type code of the parent object.

PARENT_OBJ_DESC | No description available | This is the description of the parent object. This is what you input in the Description field inside the workflow properties window in the Designer. If empty, Data Services uses "No description available" in the repo table.

PARENT_OBJ_KEY | 175 | This is the internal parent object key (ID).

DESCEN_OBJ | DF_Transform_DimGeography | This is the object name we are looking up information for.

DESCEN_OBJ_TYPE | DataFlow | This is the type of the object.

DESCEN_OBJ_R_TYPE | 1 | This is the type code of the object.

DESCEN_OBJ_DESC | No description available | This is the contents of the Description field of the dataflow properties in the Designer. It is empty for this specific dataflow.

DESCEN_OBJ_USAGE | NULL | This indicates whether the object is a source or a target within a dataflow. As the object itself is a dataflow, this field is not populated.

DESCEN_OBJ_KEY | 174 | This is the internal object key (ID).

DESCEN_OBJ_DS | NULL | This indicates which datastore the object belongs to. As the object we are looking up is a dataflow, this field is not populated.

DESCEN_OBJ_OWNER | NULL | This is the database owner of the object. It is not applicable to dataflow objects either.
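Each AL_PARENT_CHILD row such as the one above is a single parent-child edge, so the whole job tree can be rebuilt with a short recursive walk. A minimal Python sketch, assuming the edges were already fetched from the repository as (PARENT_OBJ, DESCEN_OBJ) tuples (the sample rows reuse object names from this chapter):

```python
def hierarchy_lines(edges, root, depth=0):
    """Render the object tree encoded by (parent, child) pairs
    as indented lines, starting from the given root object."""
    lines = ['  ' * depth + root]
    for parent, child in edges:
        if parent == root:
            lines += hierarchy_lines(edges, child, depth + 1)
    return lines

edges = [
    ('Job_DWH_DimGeography', 'WF_Transform_DimGeography'),
    ('WF_Transform_DimGeography', 'DF_Transform_DimGeography'),
    ('DF_Transform_DimGeography', 'SALESTERRITORY'),
]
print('\n'.join(hierarchy_lines(edges, 'Job_DWH_DimGeography')))
```

A real repository has many more edge types, but the same walk applies once you filter the rows you care about.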

Display information about the SalesTerritory table object

Use the following query:

select
  parent_obj, descen_obj_desc, descen_obj_usage,
  descen_obj_key, descen_obj_ds, descen_obj_owner
from
  dbo.al_parent_child
where
  descen_obj = 'SALESTERRITORY';

The result is in the following screenshot:

From the preceding screenshot, you can see that two different objects with the same name, SALESTERRITORY, exist in the Data Services repository, with unique keys 37 and 38.

The one with OBJ_KEY 37 is imported in the OLTP datastore and belongs to the Sales schema. It is used only in DF_Extract_SalesTerritory, as it has only one record with a parent object of that name.

The SALESTERRITORY object with OBJ_KEY 38 is a stage area table imported into the DS_STAGE datastore and belonging to the Extract database schema. It has two different parent objects because, in Designer, it was placed into two different dataflows: as a target table object in DF_Extract_SalesTerritory (you can see this from the DESCEN_OBJ_USAGE column) and as a source table object in DF_Transform_DimSalesTerritory.

See the contents of the script object

One thing you have probably noticed already from the result of the very first query in this recipe is that Data Services does not have a script object type.

As you probably remember, script objects do not have their own context in Data Services and operate in the context of the workflow object they belong to. That is why you have to query the information about workflow properties using another table, AL_LANGTEXT, to find the information about script contents in the Data Services repository.

Use the following query:

select *
from dbo.al_langtext txt
join dbo.al_parent_child pc
  on txt.parent_objid = pc.descen_obj_key
where
  pc.descen_obj = 'WF_continuous';

We are extracting information about the script object created in the WF_continuous workflow.

All workflow properties, together with the contents of all scripts that belong to the workflow, are stored in a plain text format.

In this table, we are only interested in two columns: SEQNUM, which represents the number of the properties text row, and TEXTVALUE, which stores the properties text row itself.
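Because a long property text is split across several rows, reading it back is an ordered concatenation of TEXTVALUE by SEQNUM. A minimal Python sketch, assuming the two columns were fetched as (seqnum, textvalue) tuples (the sample chunks are made up for illustration):

```python
def concat_langtext(rows):
    """Reassemble an AL_LANGTEXT payload: sort the fetched
    (SEQNUM, TEXTVALUE) pairs by SEQNUM and join the chunks."""
    return ''.join(text for _, text in sorted(rows))

# Rows may come back from the database in any order.
rows = [(2, 'PLAN '), (1, 'CREATE '), (3, 'WF_continuous')]
print(concat_langtext(rows))
# → CREATE PLAN WF_continuous
```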

See the concatenated version of the information stored in the TEXTVALUE column of the AL_LANGTEXT repository table here:

AlGUIComment("ActaName_1" = 'RSavedAfterCheckOut', "ActaName_2" = 'RDate_created', "ActaName_3" = 'RDate_modified', "ActaValue_1" = 'YES', "ActaValue_2" = 'Sat Jul 04 16:52:33 2015', "ActaValue_3" = 'Sun Jul 05 11:18:02 2015', "x" = '-1', "y" = '-1')

CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507' ()
DECLARE
  $l_Directory VARCHAR(255);
  $l_File VARCHAR(255);
BEGIN
AlGUIComment("UI_DATA_XML" = '<UIDATA><MAINICON><LOCATION><X>0</X><Y>0</Y></LOCATION><SIZE><CX>216</CX><CY>-179</CY></SIZE></MAINICON><DESCRIPTION><LOCATION><X>0</X><Y>-190</Y></LOCATION><SIZE><CX>200</CX><CY>200</CY></SIZE><VISIBLE>0</VISIBLE></DESCRIPTION></UIDATA>', "ui_display_name" = 'script', "ui_script_text" = '$l_Directory='C:\\AW\\Files\\';
$l_File='flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');', "x" = '116', "y" = '-175')

BEGIN_SCRIPT
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = ($g_count + 1);
print(('Execution #' || $g_count));
print((('Starting ' || workflow_name()) || '…'));
sleep(10000);
print((('Finishing ' || workflow_name()) || '…'));
END
END

SET("loop_exit" = 'fn_check_flag($l_Directory, $l_File)', "loop_exit_option" = 'yes', "restart_condition" = 'no', "restart_count" = '10', "restart_count_option" = 'yes', "workflow_type" = 'Continuous')

The first highlighted section of the preceding code is the declaration section of the local workflow variables created for WF_continuous. The second highlighted section marks the text that belongs to the underlying script object. You can see that the script object is not considered by Data Services to be a separate object entity; it is just a property of the parent workflow object. To compare, take a look at what the script contents look like in Designer:

$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');

You can see that the formatting of the same information stored in the TEXTVALUE field is a bit different. So, be careful when extracting and parsing this data from the local repository.

Finally, the third highlighted section marks the workflow properties configured with the Properties… context menu option in Designer.

Note

There is another version of the AL_LANGTEXT table that contains the same properties information but in the XML format: the AL_LANGXMLTEXT table.

Chapter 6. Job – Building the ETL Architecture

In this chapter, we will cover the following topics:

- Projects and jobs – organizing ETL
- Using object replication
- Migrating ETL code through the central repository
- Migrating ETL code with export/import
- Debugging job execution
- Monitoring job execution
- Building an external ETL audit and audit reporting
- Using built-in Data Services ETL audit and reporting functionality
- Auto Documentation in Data Services

Introduction

In this chapter, we will go up to the job level and review the steps in the development process that make a successful and robust ETL solution. All recipes presented in this chapter fall into one of three categories: ETL development, ETL troubleshooting, and ETL reporting. These categories include design techniques and processes usually implemented and executed sequentially within the ETL lifecycle.

Here, you can see which topics fall under which category.

Developing ETL:

- Projects and jobs – organizing ETL
- Using object replication
- Migrating ETL code through the central repository
- Migrating ETL code with export/import

The developing category discusses issues faced by ETL developers on a daily basis when they work on designing and implementing an ETL solution in Data Services.

Troubleshooting ETL:

- Debugging job execution
- Monitoring job execution

The troubleshooting category explains in detail the troubleshooting techniques that can be used in Data Services Designer to troubleshoot the ETL code.

Reporting on ETL:

- Building an external ETL audit and audit reporting
- Using built-in Data Services ETL audit and reporting functionality
- Auto Documentation in Data Services

The reporting category reviews the methods used to report on ETL metadata and also explains the Auto Documentation feature available in Data Services to quickly generate and export documentation for the developed ETL code.

Projects and jobs – organizing ETL

Projects are a simple and great mechanism to group your ETL jobs together. They are also mandatory components of ETL code organization for various Data Services features, such as Auto Documentation and the batch job configuration available in the Data Services Management Console.

Getting ready

There are no preparation steps. You have everything you need in your local repository, which has already been created. In this recipe, we will use Job_DWH_DimGeography, developed in Chapter 5, Workflow – Controlling Execution Order, to populate the DWH dimension tables DimSalesTerritory and DimGeography.

How to do it…

To create a project object in Data Services, follow these steps:

1. Open the Local Object Library window and choose the Projects tab.

2. Right-click in the empty space of the Projects tab and select New from the context menu. The Project – New window appears on the screen.

3. Input the project name as DWH_Dimensions in the Project Name field.

4. Open the Project Area window using the Project Area button on the toolbar at the top:

5. Go to Project Area | Designer. You will only see the contents of one selected project. To select a project, or make it visible in the Project Area | Designer window, go to Local Object Library | Projects and either double-click on the project you are interested in (in our case, only one project has been created) or choose Open from the context menu of the selected project.

6. To add the job to the project, drag and drop the selected job from Local Object Library | Jobs into the Project Area | Designer tab window, or right-click on the job object in the Local Object Library and choose the Add To Project option from the context menu. Add Job_DWH_DimGeography, created in the previous recipe, to the DWH_Dimensions project:

How it works…

This is all you need to do to create a project and place jobs in it. It is a very simple process that, in fact, brings you a few extra advantages that you can use in ETL development. The process also reveals new functionality not accessible otherwise in Data Services. Let's take a look at some of them.

Hierarchical object view

Available in the Project Area | Designer, this view allows you to quickly access any child object within a job. In the following screenshot, the expanding tree shows workflow, dataflow, and transformation objects; by clicking on any of them, you open them in the main workspace window:

History execution log files

These log files are available only if the job was assigned to a project. The Project Area | Log tab allows you to see and access all available log files (trace, performance, and error logs) kept by Data Services for specific jobs:

Executing/scheduling jobs from the Management Console

Yes, this option is available only for jobs that belong to a project.

Use http://localhost:8080/DataServices to start your Data Services Management Console.

Log in to the Management Console using the etluser account created in the Configuring user access recipe of Chapter 2, Configuring the Data Services Environment. It is the same user you use to connect to Data Services Designer.

Go to Administrator | Batch | DS4_REPO.

If you open the Batch Job Configuration tab, you will see that only Job_DWH_DimGeography is available to be executed/scheduled/exported for execution, as it was the only job in our local repository that we added to a created project:

As you can see, projects are the containers for your jobs, allowing you to organize and display your ETL code and perform additional tasks from the Management Console application. Keep in mind that you cannot add anything other than job objects directly at the project level.

Using object replication

Data Services allows you to instantly create an exact replica of almost any object type you are using in ETL development. This feature is useful for creating new versions of an existing workflow or dataflow for testing, or just for creating backups at the object level.

How to do it…

We will replicate a job object using these steps:

1. Go to Local Object Library | Jobs.

2. Right-click on the Job_DWH_DimGeography job and select Replicate from the context menu:

3. A copy of the job with a new name is created in the Local Object Library:

How it works…

All objects in Data Services can be identified as either reusable or not reusable.

A reusable object can be used in multiple locations; that is, a table object imported in a datastore can be used as a source or target object in different dataflows. Nevertheless, all these dataflows will reference the same object, and if it is changed in one place, it changes everywhere it is used.

Not reusable objects represent instances of a specific object type. For example, if you copy and paste a script object from one workflow to another, these two copies will be two different objects, and by changing one of them, you are not making changes to the other.

Let's take another example: a dataflow object. Dataflows are reusable objects. If you copy and paste the selected dataflow object into another workflow, you create a reference to the same dataflow object.

To make a copy of a reusable object so that the copy does not reference the original object, use the replication feature in Data Services. Note that the replicated object cannot have the same name as the original object it has been replicated from. That is because, for reusable objects such as workflows and dataflows, their names uniquely identify the object.

Note

The rule of thumb for checking whether an object type is reusable or not is to check whether it exists in the Local Object Library panel. All objects that can be found on the Local Object Library panel tabs are reusable objects, except Projects, as a project is not part of executable ETL code. Instead, it is a location folder that is used to organize job objects. Nevertheless, you cannot create two projects with the same name like you can with script objects.

The following table shows which object types can be replicated in Data Services and how the replication process behaves for each of them. All of these are reusable object types.

Job | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original job name>

Workflow | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original workflow name>

Dataflow | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original dataflow name>

File format | A new File Format Editor window is opened. The new name is already defined as Copy_<ID>_<original file format name>, but you can change it by adding a new value into the name field

Custom functions | A new Custom Function window is opened. You have to select a new name for the replicated function
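The Copy_<ID>_<original name> scheme in the table above has to yield a name that is unique in the repository. A rough Python sketch of such a naming rule (this only loosely mimics the behavior; the exact ID assignment inside Designer is not documented here):

```python
def replicate_name(existing_names, original):
    """Pick the first Copy_<ID>_<original> name that does not
    clash with an existing object name."""
    i = 1
    while f'Copy_{i}_{original}' in existing_names:
        i += 1
    return f'Copy_{i}_{original}'

names = {'Job_DWH_DimGeography', 'Copy_1_Job_DWH_DimGeography'}
print(replicate_name(names, 'Job_DWH_DimGeography'))
# → Copy_2_Job_DWH_DimGeography
```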

The replication process is a convenient and easy way to perform object-level backups. All you have to do to create a copy of an object before editing it is click on the Replicate option from the context menu of the object you are replicating.

It is also an easy way to test code changes before you decide to update the production version of the ETL.

For example, if you want to see how your dataflow object behaves after you change the properties of the Table_Comparison transform inside it, you can perform the following sequence of steps:

1. Replicate the dataflow and set it up to run separately within a test job.

2. Run the test job and test the output dataset to make sure that it generates the expected result.

3. Rename the original dataflow by adding the _archive or _old suffix to it.

4. Rename the new replicated version to the original dataflow name.

5. Replace the archived dataflow object everywhere it is used with the new version.

To see all the parent objects a specific object belongs to (in other words, to see all the locations where the specific object was placed), you can use one of the following methods:

1. Choose the DIMGEOGRAPHY object from the DWH datastore in the Local Object Library. Right-click on it and choose the View Where Used option from the context menu. The parent objects that the table object belongs to are displayed in the Information tab of the Output window:

You can also see the number of parent objects (locations) for the object right away in the Local Object Library, in the Usage column available next to the object name. This is useful information that can help you identify unused or "orphaned" objects.

2. Pick the object of interest in the workspace area (for example, a dataflow placed within a workflow workspace or a table object placed in a dataflow), right-click on it, and choose View Where Used from the context menu. The list of parent objects will appear in the Output | Information window:

3. Finally, it is possible to check where the currently opened object is used. When you have the object opened in the workspace area and do not have the ability to right-click on it, instead of going to the Local Object Library lists in order to find the object, try just clicking on the View Where Used button on the top tool menu panel:

Note

Remember that it displays the used-locations list for the object currently displayed on the active tab of the main workspace area.

Migrating ETL code through the central repository

In this recipe, we will take a brief look at the aspects of working in a multiple-user development environment and how Data Services accommodates the need to migrate ETL code between local repositories belonging to different ETL developers.

Getting ready

To use all the functionality available in Data Services for working in a multiuser development environment, we are missing a very important component: a configured central repository. So, to get ready, and before we explore this functionality, we have to create and deploy the central repository in our Data Services environment.

Perform all the following steps to create, configure, and deploy the central repository:

1. Open SQL Server Management Studio and connect to the SQLEXPRESS server engine.

2. Right-click on Databases and choose the New Database… option from the context menu.

3. Name the new database DS_CENTRAL_REPO and keep all its parameters at their default values.

4. Start the SAP Data Services Repository Manager application.

5. Choose the Central repository type and specify the connectivity settings to the new database, DS_CENTRAL_REPO. When you finish, click on the Create button to create the central Data Services repository objects in the selected database:

6. The process of creating a repository can take a few minutes. If it is successful, you should see the following output on the screen:

7. Now, we need to register our newly created central repository within the Data Services and Information Platform Services (IPS) configuration. Start the Central Management Console web application by going to http://localhost:8080/BOE/CMC and log in to the administrator account. It is the same account that was created during the installation of Data Services (see Chapter 2, Configuring the Data Services Environment, for details).

8. Choose the Data Services link on the home screen to open the Data Services repository configuration area.

9. Right-click on the Repositories folder, or in the empty area of the main window, and choose the Configure repository option from the context menu:

10. Name the newly configured repository DS4_CENTRAL and input the connectivity settings. After that, click on Test Connection to see the successful connection message:

11. Close the repository properties window. You should see the new non-secured central repository, DS4_CENTRAL, displayed on the screen along with the local repository DS4_REPO:

12. Right-click on DS4_CENTRAL and choose the User Security option from the context menu.

13. Choose Data Services Administrator Users and click on the Assign Security button.

14. In the Assign Security window, go to the Advanced tab and click on the Add/Remove Rights link.

15. In the Add/Remove Rights window, choose Application | Data Services Repository and select/grant the following options on the right-hand side under the Specific Rights for Data Services Repository section:

16. Click on OK to save the changes and close the User Security window.

17. The final step of the configuration is to specify the central repository in your Designer configuration settings. This can be configured in the Designer Option window, or you can open the Central Repository Connections section by going to Tools | Central Repositories… from the top menu.

18. In the Central Repository Connections section, click on the Add button to open the list of available repositories and select DS4_CENTRAL.

19. The Activate button activates the central repository from the list (if you add multiple ones, only one of them can be active at a time). You can also set the Reactivate automatically flag for the central repository to reactivate automatically when the Designer application restarts:

20. After performing all these steps, you should be able to activate the Central Object Library window (see the top tool panel), which looks almost exactly like the Local Object Library:

The preceding steps showed you how to create, configure, and deploy the central repository in Data Services. Next, we will see how you can actually use the central repository to migrate ETL code between different local repositories.

How to do it…

The central repository, or Central Object Library, is a location shared by different ETL developers to exchange and synchronize ETL code. In this recipe, we will copy an existing job into the Central Object Library and see which operations are available in Data Services on the objects stored there. Follow these steps:

1. Go to Local Object Library | Jobs.

2. Right-click on the Job_DWH_DimGeography job object and go to Add to Central Repository | Object and Dependents from the context menu.

3. Open the Central Object Library and see that the job object and all dependent objects, workflows, and dataflows have appeared in the Central Object Library tab sections. The ETL code for Job_DWH_DimGeography has been successfully migrated to the central repository.

4. Now, go to Local Object Library | Dataflows, find the DF_Load_DimGeography dataflow object, and double-click on it to open it in the workspace area for editing.

5. Rename the first Query transform from Query to Join and save the dataflow.

6. Now that you have changed the ETL code migrated from the local to the central repository, you can compare the two versions of your job and see the differences displayed in the Differences Viewer. Right-click on the job in the Local Object Library and go to Compare | Object and dependents to Central from the context menu:

7. In the Central Object Library, you can do the same thing by clicking on a specific object and choosing the preferred option from the Compare context menu.

8. To get the version of the object from the central repository into a local one, select the DF_Load_DimGeography dataflow object in the Central Object Library, right-click on it, and go to Get Latest Version | Object from the context menu.

9. If you compare the local object version to the one stored in the central repository now, you will see that there is no difference, as the central object version has overwritten the local object version.

How it works…

The purpose of the central repository is to provide a centralized location to store ETL code.

The Central Object Library represents the contents of the central repository in the same way that the Local Object Library represents the contents of the local repository.

The ETL code stored in the central repository cannot be changed directly as in the local repository. So, it provides a level of security, making sure that central repository changes can be tracked and the history of all operations performed on its objects can be displayed.

Adding objects to and from the Central Object Library

If the object does not exist in the central repository, you can add it using the Add to Central Repository option from the object's context menu.

If the object already exists in the central repository, a few extra steps are required to update it with a newer version from the local one. We will take a closer look at this functionality in the upcoming chapters.

Getting the object from the central to the local repository is much simpler. All you need to do is use the Get Latest Version option from the object's context menu in the Central Object Library. It does not matter whether or not the object exists in the local repository: it will be created or overwritten. This means that it will be deleted and copied from the central repository.

Another important aspect of copying an object into, and from, the central repository is the availability of three modes: Object, Object and dependents, and With filtering:

- Object: In this mode, it does not matter which operation you perform, whether it is getting the latest object version from central to local, comparing object versions between central and local, or just placing objects from local to central. The operation is performed on this object only.
- Object and dependents: This operation affects all the child objects belonging to the selected object, their child objects, and so on, down to the lowest level of the hierarchy (which is usually the table/file format level).
- With filtering: This mode is basically the same as Object and dependents, but with the ability to exclude specific objects from the affected objects. When chosen, a new window opens, allowing you to exclude specific objects from the hierarchy tree. Here is the result of choosing Add to Central Repository | With filtering for the Job_DWH_DimGeography object:

Comparing objects between the Local and Central repositories

Designer has a very useful Compare function available for all objects stored in the local or central repositories. When it is selected from the context menu of an object stored in a central repository location, there are two Compare methods available: Object to Local and Object with dependents to Local.

When it is selected from the context menu of an object stored in a local repository location, there are two Compare methods available: Object to Central and Object with dependents to Central.

The result is presented in the Difference Viewer window, which opens in the main workspace area in a separate tab and looks similar to the following screenshot:

This is an example of the Difference Viewer window. Note how we have only renamed the Query transform, yet the Difference Viewer shows the whole structure of the Join Query object as deleted, and on the Central tab, it shows the new Query Query transform structure. The Mapping and Links sections of the updated dataflow are also affected, as you can see in the preceding screenshot.

There is more…

I have not described one of the most important concepts of the central repository: the ability to check out and check in objects and view the history of changes in the multiuser development environment. I have left it for more advanced chapters, and it will be explained further in the book.

Migrating ETL code with export/import

Data Services Designer has various options to import/export ETL code.

In this recipe, we will review all possible import/export scenarios and take a closer look at the file formats used for import/export in Data Services: ATL files (the main export file format for the Data Services code) and XML structures.

Getting ready

To complete this recipe, you will need another local repository created in your environment. Refer to the first two chapters of the book to create another repository named DS4_LOCAL_EXT in the new database, DS_LOCAL_REPO. Do not forget to assign the proper security settings for Data Services Administrator users in CMC after registering the new repository.

How to do it…

Data Services has two main import/export options:

- Using ATL/XML external files
- Direct import into another local repository

Import/Export using ATL files

In the following steps, I will show you an example of how to export ETL code from the Data Services Designer into an ATL file.

1. Export Job_DWH_DimGeography into an ATL file. Right-click on the job object in Local Object Library | Jobs and select Export from the context menu. The Export window opens in the main workspace area. Look at the following screenshot:

2. Using the context menu (right-click on the specific object or objects in the Export window), you can exclude selected objects with the Exclude option, or selected objects with all their dependencies using the Exclude Tree option. Exclude the DF_Extract_SalesTerritory dataflow and all its dependencies from the export, as shown in the following screenshot, using the Exclude Tree option:

3. Objects excluded from the export are marked with red crosses. See both the Objects to export and Datastores to export areas on the Export tab for the objects excluded by the Exclude Tree command executed in the previous step:

4. To execute the export operation, right-click in any area of the Export workspace tab and choose the Export to ATL file… option from the context menu. On the opened Save As screen, choose the name of the ATL file, export.atl, and its location. Then, click on OK and specify the security passphrase for the ATL file.

5. The export could take anything from a few seconds up to a few minutes, depending on the number of objects you are exporting. When it is finished, you will see the following output in the Output | Information window. If you check the chosen location, you should see that the export.atl file was created:

6. Now, log in to the second local repository with Designer. For this, exit the Designer to restart the application. On the logon screen, choose to connect to another local repository:

7. The new local repository is completely empty. We will use the export.atl file created in the previous step to import the job and its dependent objects into this new repository. Select the Import From File… option from the top Tools menu list. Then, select the export.atl file and click on OK, thus agreeing to import all objects from the file into the currently open local repository.

8. As we exported the job object and its dependents, it does not belong to any project in the new repository. Create a new project called TEST and place the job in it to expand its structure:

See that DF_Extract_SalesTerritory and the tables belonging to it are missing from the job structure, although Data Services keeps a reference for WF_Extract_SalesTerritory. If the dataflow is imported in the future, it will automatically be assigned as a child object to the workflow and will fit into the job structure.

Direct export to another local repository

Let's perform a direct export of the missing DF_Extract_SalesTerritory object and its dependents from the DS4_REPO to the DS4_LOCAL_EXT repository:

1. Log in to DS4_REPO, right-click on the DF_Extract_SalesTerritory dataflow object in the Local Object Library, and select Export from the context menu to open the Export tab in the main workspace area. By default, the selected object and all its dependents are added to the Export tab.

2. Right-click on the Export tab and choose the Export to repository… menu item displayed in bold text. Select DS4_LOCAL_EXT as the target repository:

3. In the Export Confirmation window, which opens next, exclude all objects that already exist in the target repository. These are the datastore objects OLTP and DS_STAGE:

4. The output of the direct export command is displayed in the Output | Information window:

(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Exported 1 DataFlows
(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Exported 2 Tables
(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Completed Export. Exported 3 objects.

5. Now, exit the Designer and reopen it by connecting to the DS4_LOCAL_EXT repository. Expand the full project TEST structure to see that all missing dependent objects were imported into the structure of the Job_DWH_DimGeography job:

How it works…

Manipulating objects on the Export tab is a preparation step that allows you to exclude the objects that you do not want to export to the ATL file or directly to another local repository. You might exclude specific objects because you do not want to overwrite the versions of the same objects in the target repository, or because you are simply not interested in migrating them. After preparing the ETL structure for export, you have three options:

- Direct export into another local repository (a comparison window opens, allowing you to exclude objects from being exported and showing which objects already exist in the target repository)
- Export to an ATL file
- Export to an XML file (this is exactly the same as the previous option, except that a different flat file format is used to store the ETL code)

An ATL file is a structured file that contains properties, links, and references for the objects exported.

An ATL file can be opened in any text editor. It can be useful to browse its contents if you want to check which specific objects are included in the export file. For function objects, it is easy to see the text of the exported function if you want to check its version and so on.

For example, if you open the export.atl file generated in this recipe with Notepad and search for DF_Load_DimGeography, you will see that it can be found in two places within the file:

The first section defines the properties of the object, and the second defines its place within an execution structure.
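On a Unix-like system (or in Git Bash on Windows), the same check can be scripted instead of searching manually in Notepad. The object and file names below match this recipe, but the miniature file content is an invented stand-in, not real ATL syntax:

```shell
# Fabricate a tiny stand-in for export.atl just to demonstrate the search;
# a real ATL file is produced by the Designer export step above.
cat > export_demo.atl <<'EOF'
CREATE DATAFLOW DF_Load_DimGeography (properties...)
CREATE PLAN Job_DWH_DimGeography (CALL DF_Load_DimGeography)
EOF

# Count the lines mentioning the exported object; expect one per section.
grep -c "DF_Load_DimGeography" export_demo.atl
```

In a real export file the two hits correspond to the property section and the execution-structure section described above.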

Debugging job execution

Here, I will explain the use of the Data Services Interactive Debugger. In this recipe, I will debug the DF_Transform_DimGeography dataflow.

The debugging process is the process of defining the points in the ETL code (a dataflow in particular) that you want to monitor closely during job execution. By monitoring closely, I mean actually seeing the rows passing through, or even having the control to pause the execution at those points to investigate the current passing record more closely.

Those points in code are called breakpoints, and they are usually placed before and after particular transform objects in order to see the effect made by a particular transformation on the passing row.

Getting ready…

The easiest way to debug a specific dataflow is to copy it into a separate test job. Create a new job called Job_Debug and copy DF_Transform_DimGeography into it from the workflow workspace that it is currently located in, or just drag and drop the dataflow object into the Job_Debug workspace from Local Object Library | Dataflows.

How to do it…

Here are the steps to create a breakpoint and execute the job in the debug mode:

1. First, define the breakpoint inside a dataflow. To do this, double-click on the link connecting the two transform objects, Join and Mapping:

2. Created breakpoints are displayed as red dots on the links between transform objects. You can toggle them on/off using the Show Filters/Breakpoints button from the top instrument panel:

3. Go to the Job_Debug context and choose Debug | Start Debug… from the top menu, or just click on the Start Debug… (Ctrl + F8) button on the top instrument panel:

4. The Debug Properties window opens, allowing you to specify or change the debug properties. Do not change them; the default values are suitable for most debugging cases:

5. In the debugging mode, the job executes in the same manner as in the normal execution mode, except that it is possible to pause it at any moment to browse the data between transforms. In our case, the job pauses automatically as soon as the first passing row meets the specified breakpoint condition. To view the dataset passed between the transforms, click on the magnifying glass icon on the link between the transform objects:

6. When paused or running, the top-level instrument panel activates the debugging buttons, allowing you to stop/continue debugging:

Alternatively, step through the passing rows one by one when viewing the dataset between transforms:

7. Along with the breakpoints, you can define the filter in the same window:

The filter is displayed with a different icon in the dataflow and allows you to filter datasets passing through the dataflow in the debugging mode.

How it works…

Debugging is a two-step process:

1. Define the breakpoints where you want the job execution to pause.
2. Run the job in the debugging mode.

Breakpoints allow you to pause job execution on a specific condition so that you are able to investigate the data flowing through your dataflow process. In the debugging mode, it is possible to see all records passed between transform objects inside a dataflow. You can see how a specific record extracted from the source object is transformed and changed while it makes its way into the target object. It is also easy to detect when a record is filtered out by the WHERE clause condition, as it will not appear after the Query transform that filters it out.

Filters and breakpoints are managed with the Filters/Breakpoints… (Alt + F9) button on the instrument panel.

Filters applied to links between transform objects are considered only when the job is executed in the debugging mode. Filters, as well as breakpoints, are not visible to the Data Services engine when the job is executed in the normal execution mode.

Note

Filters are a great way to decrease the number of records passing through the dataflow when you run a job in the debugging mode. If you are interested in debugging/seeing the transformation behavior for a small, specific set of records that can be defined with filtering conditions, then filters can significantly decrease the debugging execution time.

Monitoring job execution

In this recipe, we will take a closer look at the job execution parameters, tracing options, and job monitoring techniques.

Getting ready

We will use the job we developed in the previous chapters, Job_DWH_DimGeography, to see how the job execution can be traced and monitored.

Let's perform minor changes to prepare the job for the recipe examples using these steps:

1. On the job-level context, create a global variable, $g_RunDate, of the date data type and assign the sysdate() function to it as a value.
2. At the same job level, before the sequence of workflows, place a new script object with the following code and link it to the first workflow. This script will be the first object executed within the job:

print('*************************************************');
print('INFO: Job ' || job_name() || ' started on ' || $g_RunDate);
print('*************************************************');

How to do it…

Click on the Execute… button to execute the job. Before the job runs, the Execution Properties window opens, allowing you to set up execution options, configure the tracing of the job, or change the predefined values of the global variables for that particular job run to different ones.

Let's take a closer look at the tabs available on this window:

Click on the Execution Options tab.

Here are the options available on this tab:

Print all trace messages: This option displays all the possible trace messages from all components participating in the job execution: object parameters and options, internal system queries and internally executed commands, loader parameters, the data itself, and many other different kinds of information. The log generated is so enormous that we do not recommend this option if your job contains more than a few workflow/dataflow objects, or if enough data passes through your dataflows that you do not want to see every row of it in the log.

This option literally shows what is happening in every Data Services internal component participating in the data processing, and all this information is displayed for every row passing through those components.

Monitor sample rate: This option defines how often your logs get updated when the job runs. The default is 5 seconds.

Collect statistics for optimization: This option collects optimization statistics, allowing Data Services to choose optimal cache types for various components when executing dataflows. We will talk about it in more detail in the upcoming chapters.
Collect statistics for monitoring: If set, Data Services will display cache sizes in the trace log when the job runs.
Use collected statistics: This makes Data Services use the statistics collected when the job was previously executed with the Collect statistics for optimization option set.

Click on the second Trace tab.

This tab has a list of various trace options. Setting each of these options adds extra information to the contents of the trace log file when the job runs:

By default, only Trace Session, Trace Work Flow, and Trace Data Flow are enabled. Switch their values to No and enable only Trace Row by changing its value to Yes. After you execute the job, you will see the following trace log:

You can see that you no longer see the information about the statuses of the workflow and dataflow execution that you normally see. The trace log file now displays only the output of the print() functions from user script objects and the rows passing through the dataflows. Be extra careful: this is a lot of data. Avoid using this option unless you are in a design/test environment with just a few rows read from the source table.

Click on the third Global Variable tab.

This tab displays the list of all global variables created within the job, allowing you to modify their values for this specific job execution without changing these values at the job context level:

To change the value, just double-click on the Value field of the specific global variable row and input the new value. Remember that this change applies only to the current job execution. When you run the job next time and open this tab, the global variables will have their default values defined again.

Log into the Data Services Management Console to monitor job execution and go to Administrator | Batch | DS4_REPO.

The Management Console allows web access not just to the same three log files (trace, log, and monitor) but also to another one: Performance Monitor:

The top-level section allows easy access to the previous versions of the log files for a specific job. It does not matter whether the job has been placed in the Project folder or not.

In the preceding screenshot, we displayed all log files for the last 5 days for the Job_DWH_DimGeography job.

Click on the Performance Monitor link of the last job execution to open the Performance Monitor page:

The first page of Performance Monitor displays the list of dataflows from the job structure. When clicking on a specific dataflow, it is possible to drill down to the dataflow components level to see how many records passed through the specific dataflow components and the execution time of each of them.

In fact, the information displayed in Performance Monitor is based on the same data as the information displayed in the Monitor log. It is just presented differently, which sometimes makes it more convenient for analysis.

How it works…

It is simply a matter of personal choice when deciding what to use to monitor job execution: the web application of the Data Services Management Console or the Designer client. Sometimes, due to restricted access to the environment, the web option is preferable. It is also easier to use if you need to find old log files of a specific job for analysis or performance comparison, or simply need to copy and paste a few rows from the trace log file.

Building an external ETL audit and audit reporting

In this recipe, we will implement an external user-built ETL audit mechanism. Our ETL audit will include information about the start and stop times of the workflows running within the job, their statuses, names, and information about which job they belong to.

Getting ready…

We need to create an ETL audit table in our database where we will store the audit results.

Connect to the STAGE database using SQL Server Management Studio and execute the following statement to create the ETL audit table:

create table dbo.etl_audit (
    job_run_id integer,
    workflow_status varchar(50),
    job_name varchar(255),
    start_dt datetime,
    end_dt datetime,
    process_name varchar(255)
);

How to do it…

First, we need to choose objects for auditing. The following steps should be implemented for every workflow or dataflow that you want to collect auditing information about. In this particular example, we will enable ETL auditing for the job object itself.

1. Create extra variables for the job object:

$v_process_name varchar(255)
$v_job_run_id integer

2. Add the following code to the script that starts the job execution:

$v_process_name = job_name();
$v_job_run_id = job_run_id();

# Insert audit record
sql('DS_STAGE',
    'insert into dbo.etl_audit (job_run_id, workflow_status, job_name, start_dt, end_dt, process_name) '
    || 'values (' || $v_job_run_id || ', \'STARTED\', \'' || job_name() || '\', SYSDATETIME(), NULL, \'' || $v_process_name || '\')');

3. Create a new script, ETL_audit_update, at the end of the execution sequence inside the job context and put the following code in it:

# Update ETL audit record
sql('DS_STAGE',
    'update dbo.etl_audit '
    || 'set workflow_status = \'COMPLETED\', end_dt = SYSDATETIME() '
    || 'where job_run_id = ' || $v_job_run_id || ' and process_name = \'' || $v_process_name || '\'');

4. The job content has now been wrapped in the auditing insert/update commands placed in the initial and final scripts:

5. Implement the preceding steps for WF_Extract_SalesTerritory, which can be found in the WF_extract workflow container, to enable the ETL audit for that object as well.

The only change is that in the initial script, the $v_process_name variable value should be assigned from the workflow_name() function instead of the job_name() function, as was done for the job:
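Outside Data Services, the same insert/update audit lifecycle can be sketched in a few lines. The following Python example is only an illustration, assuming SQLite as a stand-in for the SQL Server STAGE database; the table layout matches this recipe, while the helper names and run ID are invented:

```python
import sqlite3

# In-memory stand-in for the STAGE database used in this recipe.
con = sqlite3.connect(":memory:")
con.execute("""create table etl_audit (
    job_run_id integer, workflow_status varchar(50), job_name varchar(255),
    start_dt datetime, end_dt datetime, process_name varchar(255))""")

def audit_start(run_id, job_name, process_name):
    # Equivalent of the initial script: mark the process as STARTED.
    con.execute("insert into etl_audit values (?, 'STARTED', ?, datetime('now'), NULL, ?)",
                (run_id, job_name, process_name))

def audit_end(run_id, process_name):
    # Equivalent of the final script: mark the process as COMPLETED.
    con.execute("update etl_audit set workflow_status = 'COMPLETED', end_dt = datetime('now') "
                "where job_run_id = ? and process_name = ?", (run_id, process_name))

audit_start(1, "Job_DWH_DimGeography", "Job_DWH_DimGeography")
audit_end(1, "Job_DWH_DimGeography")
row = con.execute("select workflow_status from etl_audit").fetchone()
print(row[0])  # COMPLETED
```

Parameterized queries are used here instead of the string concatenation shown in the DS scripts; the logic of the two statements is the same.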

How it works…

Now, if you execute the job and query the contents of the ETL audit table within a few seconds, you should see something like this:

A few seconds later, after the job successfully completes, your ETL audit table will look like this:

A simple analysis of this table can answer the following questions:

- Which objects are running within the currently running job? This is very useful information, especially if your job contains hundreds of workflows, with 20 of them running in parallel. In this case, it is hard to obtain this information from the trace log.
- What was the status of the object when it was executed last time? To be precise, you also have to implement another piece of logic: a third update that changes the status of the workflow to "ERROR" if something unexpected happens and the workflow cannot be considered successfully completed. This third update usually goes into the catch section of the try-catch block.
- What was the execution time for the specific object? The answer speaks for itself.
- What was the execution order of the objects? You can compare the execution times. If you know when the objects started and ended, you can easily derive the execution order. When comparable workflows are not directly linked and run within different branches of logic, it is sometimes useful to know which one started or finished earlier.

The advantage of the external user-built ETL audit is that you can build a flexible solution that gathers any information that you want it to gather.

Note

Note that with the insert/update ETL audit statements, you can define the logical borders of a successful object completion. Theoretically, a workflow object and the job itself can still fail right after it successfully executes the sql() command and updates its status in the ETL audit table as successful. However, this is often a good thing, as it is exactly what you are interested in when you decide whether you should rerun a specific workflow or not: has the workflow completed the work it was supposed to?

Information in ETL audit tables can be utilized not only in the reports showing the execution statistics of your jobs, but also to implement execution logic inside the job.

For example, if you want to run a specific workflow only once a week but it is being executed within a daily job, you could add script objects to your workflow. You could check from the ETL audit tables when the workflow was last run and skip it if it was executed and successfully completed less than a week ago.
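That check boils down to a single query against the audit table. Here is a hedged Python/SQLite sketch of the decision; the table layout comes from this recipe, while the seven-day rule, the sample rows, and the helper name are my own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table etl_audit (
    job_run_id integer, workflow_status varchar(50), job_name varchar(255),
    start_dt datetime, end_dt datetime, process_name varchar(255))""")
# Pretend the weekly workflow completed successfully two days ago.
con.execute("insert into etl_audit values (1, 'COMPLETED', 'Job_Daily', "
            "datetime('now', '-2 days'), datetime('now', '-2 days'), 'WF_Weekly')")

def should_skip(workflow_name):
    # Skip if a COMPLETED run of this workflow exists within the last 7 days.
    row = con.execute(
        "select count(*) from etl_audit "
        "where process_name = ? and workflow_status = 'COMPLETED' "
        "and end_dt >= datetime('now', '-7 days')", (workflow_name,)).fetchone()
    return row[0] > 0

print(should_skip("WF_Weekly"))  # True: completed 2 days ago
print(should_skip("WF_Other"))   # False: no audit record for it
```

In Data Services, the equivalent query would go into a sql() call in a script object, with a conditional deciding whether to execute the workflow.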

Finally, it is even possible not only to audit a Data Services object (dataflow, workflow, job, or script object) but to audit any piece of code: part of a script or a single branch of the logic. You can wrap anything in the insert/update statements sent to an external table to store audit information.

That is the true power of custom ETL auditing. You can collect all the information you want and easily query this information from the ETL itself to make various decisions.

Using built-in Data Services ETL audit and reporting functionality

Data Services provides ETL reporting functionality through the Management Console web application. It is available in the form of the Operational Dashboard application on the main Management Console Home page.

Getting ready

You do not have to configure or prepare the operational dashboards feature. It is available by default, and all you have to do to access it is start the Data Services Management Console.

How to do it…

Let's review which ETL reporting capabilities are available in Data Services. Perform these steps:

1. Start the Data Services Management Console.
2. Choose the Operational Dashboard application from the home page:

3. The main interface of Operational Dashboard includes three sections. It includes the pie chart of the general job status statistics per interval for a selected repository. Green shows the number of successfully completed jobs for a specific period of time, yellow shows jobs successfully completed with warning messages, and red shows failed jobs:

4. The section below shows more detailed job execution statistics in the form of a vertical bar chart for specific days or an interval of days. Try to hover your mouse cursor over the bars to see the actual numbers behind the graph. The vertical line shows the number of jobs executed on specific days with different statuses: successful with no errors (green), successful with warning messages (yellow), and failed (red):

5. At the right-hand side, you can see the list of jobs whose execution statistics are represented by graphs on the left-hand side. By clicking on a specific row, you can drill down to see the list of executions for this specific job. The most useful information here is the execution time displayed in seconds, the run ID of the job, and the status of the job, as you can see in the following screenshot:

How it works…

Operational Dashboard reporting can be used to provide job execution history data, analyze the percentage of failed jobs for a specific time interval, and compare those numbers between different days or time intervals.

That is pretty much it. To do more, you would have to build your own ETL metadata collection and build your own reporting functionality on top of this data.

Auto Documentation in Data Services

This recipe will guide you through the Auto Documentation feature available in Data Services. Like Operational Dashboard, this feature is also part of the functionality available in the Data Services Management Console.

How to do it…

These steps will create a PDF document containing a graphical representation, descriptions, and relationships between all underlying objects of the Job_DWH_DimGeography job object:

1. Log into the Data Services Management Console web application.
2. On the home page, click on the Auto Documentation icon:

3. In the following screen, expand the project tree and left-click on the job object. You can see which object is displayed as current by checking the object name in the top tab name on the right-hand side of the window:

4. Then, click on the small printer icon located at the top of the window:

5. In the pop-up window, just click on the Print button, leaving all options with default values.

6. Data Services, by default, generates a PDF document in the browser's default Downloads folder:

How it works…

As you have probably noticed, the Auto Documentation feature is only available for the jobs included in projects, as it displays the object tree starting from the root Project level. Jobs that were created in the Local Object Library and were not assigned to a specific project will not be visible for auto-documenting.

Auto Documentation export is available in two formats: PDF and Microsoft Word (see the following screenshot):

On the same screen, you can choose the types of information to be included in the documentation file.

Note

Note that dataflow documentation includes the mapping of each and every column from source to target through all dataflow transformations. This is a very detailed level: even though our dataflow inside Job_DWH_DimGeography is not at all complex and the datasets we are migrating are relatively small, we still get a 34-page document. So, you can see that the documentation level is extremely detailed.

Another extremely useful feature of Data Services Auto Documentation is the Table Usage tab:

It allows us to see which source and target table objects are used within the Job_DWH_DimGeography object tree.

Information like this about relationships between objects within ETL is extremely useful because, during development, some objects often change, and you need to evaluate how this impacts the ETL code. If a table column is changed (renamed or its data type changed) at the database level, you have to apply the same changes to your ETL code. Otherwise, it will fail the next time it runs, as Data Services is not aware of the table changes and still operates with the old version of the table.

Table object dependencies can also be visualized with another Data Services feature: Impact and Lineage Analysis. This functionality will be discussed in Chapter 12, Introduction to Information Steward.

Chapter 7. Validating and Cleansing Data

Here are the recipes presented in this chapter:

- Creating validation functions
- Using validation functions with the Validation transform
- Reporting data validation results
- Using regular expression support to validate data
- Enabling dataflow audit
- Data Quality transforms – cleansing your data

Introduction

This chapter introduces the concepts of validation methods that can be applied to the data passing through ETL processes in order to cleanse and conform it according to the defined Data Quality standards. It includes validation methods that consist of defining validation expressions with the help of validation functions and then splitting data into two datasets: valid and invalid data. Invalid data that does not pass the validation function conditions usually gets inserted into a separate target table for further investigation.

Another topic discussed in this chapter is the dataflow audit. This feature of Data Services allows the collection of execution statistics about the data processed by the dataflow and can even control the execution behavior depending on the numbers collected.

Finally, we will discuss the Data Quality transforms: the powerful set of instruments available in Data Services to parse, categorize, and make cleansing suggestions in order to increase the reliability and quality of the transformed data.

Creating validation functions

One of the ways to implement the data validation process in Data Services is to use validation functions along with the Validation transform in your dataflow to split the flow of data into two: records that pass the defined validation rule and those that do not. Those validation rules can be combined into validation function objects for your convenience and traceability.

In this recipe, we will create a standard but quite simple validation function. We will deploy it in our dataflow, which extracts the address data from the source system into a staging area. The validation function will check whether the city in the migrated record has Paris as a value, and if it does, it will send the records to a separate reject table.

Getting ready

First, we need to create another schema in our STAGE database to contain reject tables. Creating the Reject schema to store these tables allows us to keep the original table names; that makes writing queries and reporting against those tables, as well as locating them, much easier.

1. Open SQL Server Management Studio.
2. Go to STAGE | Security | Schemas in the Object Explorer window.
3. Right-click on the list and choose New Schema… in the context menu.
4. Choose Reject for the schema name and dbo as the schema owner.
5. Click on OK to create the schema.

How to do it…

Follow these steps to create a validation function:

1. Log into Data Services Designer and connect to the local repository.
2. Go to Local Object Library | Custom Functions.
3. Right-click on Validation Functions and select New from the context menu.
4. Input the function name fn_Check_Paris, check Validation function, as shown in the following screenshot, and populate the description field.

5. Click on Next and input the following code in the main section of the Smart Editor:

# Validation function to check if the passed value equals
# to 'Paris'.
# Wrap the function in the try-catch block. We do not want
# to fail the dataflow process
# if the function itself fails.
try
begin
    # Assign input parameter values to local variables
    $l_City = $p_City;
    $l_AddressID = $p_AddressID;
    # Default "Success" result status
    $l_Result = 1;
    if ($l_City = 'Paris')
    begin
        # Change to "Failure" result status
        $l_Result = 0;
    end
    # Returning result status
    Return $l_Result;
end
catch (all)
begin
    # Writing information about the failure in the
    # trace log
    print('Validation function fn_Check_Paris() failed with error: ' || error_message() || ' while processing AddressID = {$l_AddressID} with City = {$l_City}');
    # Returning the result status
    Return $l_Result;
end

6. In the same Smart Editor window, create the local variables $l_AddressID int, $l_City varchar(100), and $l_Result int, and the function's input parameters, $p_City varchar(100) and $p_AddressID int.

7. Click on the Validate button to validate the function and click OK to close the Smart Editor and save all changes.

How it works…

The function's body is wrapped in a try-catch block to prevent our main dataflow processes from failing if something goes wrong with the validation function. The validation function is executed for each row passing through, so it would be unwise to let a failure of the function determine the execution behavior of the main process.

Try to imagine a situation where your dataflow processes 2 million records from the source table and 50 of them make the function fail for some reason or other. To process all 2 million records in one go, you would need to wrap the logic of the entire function in try-catch and output extra information into the trace log or into an external table in the catch section to perform further analysis of the data after processing is done.

In our example, we only pass the AddressID field for traceability purposes, so it is easy to find the exact row on which the function failed.

The validation function should return either 1 or 0. The value 1 means that the processed row against which the validation function was executed successfully passed the validation; 0 means failure.
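The same 1/0 contract and fail-safe behavior are easy to express in any language. Here is a minimal Python sketch mirroring the DS validation function above (the function name and log format are illustrative, not part of the product):

```python
def fn_check_paris(address_id, city):
    """Return 1 if the row passes validation, 0 if it fails.

    Mirrors the DS validation function: a city of 'Paris' fails,
    and any unexpected error is logged instead of aborting the load.
    """
    result = 1  # default "Success" status
    try:
        if city == "Paris":
            result = 0  # "Failure" status
    except Exception as err:  # never let the validator kill the main process
        print(f"fn_check_paris failed: {err} (AddressID={address_id}, City={city})")
    return result

print(fn_check_paris(1, "Paris"))   # 0
print(fn_check_paris(2, "London"))  # 1
```

The key design point carried over from the recipe is that the validator itself can never raise; it always returns a status and, at worst, logs the failing row's identifier.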

See in the following screenshot that, in the Local Object Library, validation functions are displayed separately from custom functions:

Using validation functions with the Validation transform

This recipe will demonstrate how validation functions are deployed and configured within a dataflow. As the validation function that we created in the previous recipe validates city values, we will deploy it in the DF_Extract_Address dataflow object to perform the validation of data extracted from the Address table located in the source OLTP database.

Getting ready

Open the job containing the dataflow DF_Extract_Address, already created in the Use case example – populating dimension tables recipe in Chapter 5, Workflow – Controlling Execution Order, and copy it into a new job to be able to execute it as a standalone process.

How to do it…

1. Open DF_Extract_Address in the main workspace for editing.
2. Go to Local Object Library | Transform, find the Validation transform under Platform, and drag it into the DF_Extract_Address dataflow right after the Query transform.

3. Link the output of the Query transform to the Validation transform and double-click on the Validation transform to open it for editing.

4. Open the Validation transform in the workspace and see how Validation splits the flow into three output schemas: Validation_Pass, Validation_Fail, and Validation_RuleViolation:

The Validation_Pass and Validation_Fail output schemas are identical, except that Validation_Fail contains three extra columns: DI_ERRORACTION, DI_ERRORCOLUMNS, and DI_ROWID.

5. Inside the Validation transform, click on the Add button located on the Validation Rules tab to create the first validation rule. Choose the Validation Function option for the created rule and map the columns sent from the previous transform output to the input parameters, also choosing Send To Fail as the value for Action on Fail. Do not forget to specify the validation rule name and description.

6. Click on OK to create the validation rule. It is now displayed in the Validation transform.

7. Now close the Validation transform editor window and add three Query transforms, one for each validation schema output. Name them Validation_Pass, Validation_Fail, and Validation_Rules. Link the Validation transform output to all Query transforms, choosing the correct logic branch each time Data Services asks you to.

8. Map all input schema columns to the output schemas in all created Query transforms without making any changes to the mappings.

9. Create two additional template target tables to output data from the Rules and Fail transforms. Specify the REJECT owner schema for both of them as follows:

The ADDRESS template table for the Fail output
The ADDRESS_RULES template table for the Rules output

10. Your final dataflow version should look like the one in the following screenshot:

11. Save and execute the job.
12. After the execution is finished, open the dataflow again and view the data in the REJECT.ADDRESS and REJECT.ADDRESS_RULES tables:

Note

Note that the rows where the value of CITY equals Paris are not passed to the Transform.ADDRESS stage table anymore.

How it works…

Usually, the Validation transform is deployed right before the target object to perform the validation of data changed by previous transformations.

The Pass output schema of the Validation transform is used to output records that have successfully passed the validation rule defined by either validation function(s) or column condition(s).

Note that you can define as many validation functions or column condition rules as you like, and Data Services is very flexible in allowing you to define different Action on Fail options for different functions. This makes it possible to send some "failed" records to both the Pass and Fail outputs, or others only to the Fail output, depending on the severity of the validation rule.

Let's review another feature of the Validation transform: the ability to modify the values of the passing rows depending on the result of the validation rule. Follow these steps:

1. Open the Validation transform for editing in the main workspace.
2. As we are validating the city name, let's change the behavior of the Validation transform to send the rows which did not pass validation to both Pass and Fail. However, in the rows sent to the Pass output, change the city name value from Paris to New Paris. To do that, in the section located at the bottom of the Validation transform editor, choose the Query.CITY column and specify 'New Paris' in the expression field, as shown here:

3. Save and execute the job.
4. Open the dataflow again and view the data from both the Transform.ADDRESS and Reject.ADDRESS tables. You will see that records with the same ADDRESSID field were inserted in both tables, but in the main staging table, the values for the city name were substituted with New Paris.

See the following table for a description of the extra columns from the Fail and RuleViolation Validation transform output schemas:

DI_ERRORACTION: This shows where the output for the specific rule was sent: B means "both", F means "fail", and P means "pass".

DI_ERRORCOLUMNS: This shows the specific columns that were validated (as part of the input values for the validation function or simply as a source for column validation).

DI_ROWID: This is the unique identifier of the failed row.

DI_RULENAME: This is the name of the rule which generated the failed row.

DI_COLUMNNAME: This is the validated column (part of the validation function input values or the source for column validation in the validation rule). Note that in the ADDRESS_RULES output, one row is generated for each validated column separately. So, if your validation function was using five columns from the source object, all five of them are considered to be validated columns, and in case of failure, five rows will be created in the ADDRESS_RULES table, one for each column, with the same ROWID (see the figure showing the contents of the ADDRESS_RULES table in the first example of job execution in this recipe).

Reporting data validation results

One of the advantages of using the Validation transform is that Data Services provides reporting functionality which is based on the validation statistics and sample data collected during validation processes.

Validation reports can be viewed in the Data Services Management Console. In this recipe, we will learn how to collect data for validation reports and access them in the Data Services Management Console.

Getting ready

Use the same job and dataflow, DF_Extract_Address, updated with the Validation transform as in the previous recipes of the current chapter.

How to do it…

1. Open the dataflow DF_Extract_Address and double-click on the Validation transform object to open it for editing.

Note

To be able to use Data Services validation reports, the validation statistics collection has to be enabled first for a Validation transform object in the ETL code structure that you want to collect the reporting data for.

2. Open the Validation Transform Options tab in the Validation transform editor.
3. Tick both check-boxes, Collect data validation statistics and Collect sample data.

4. Save and run the job to collect the data validation statistics for the dataset processed by DF_Extract_Address. Make sure that you do not have the option Disable data validation statistics collection selected on the job's Execution Properties window:

5. Launch the Data Services Management Console and log into it.
6. On the Home page, click on the Data Validation link to start the Data Validation dashboard web application:

7. Experiment and hover your mouse over the pie chart to see the detailed information about passed and failed records for your validation rule.

8. Click on a specific area in the pie chart to drill down into another bar chart report showing validation rules. As we only have one validation rule defined in our Validation transform and in the whole repository, there is only one bar displayed for the City_not_Paris validation rule.

How it works…

The options Collect data validation statistics and Collect sample data enable Data Services to collect execution statistics for specific Validation transform rules. In our case, we defined one, so there is not much diversity in the dashboard reports that you can see in the Data Services Management Console.

Here is the pie chart you see after implementing steps 7-8 of this recipe:

By clicking on the object in the bar chart, you can drill down to the actual data sample of the failed rows collected by the Validation transform during job execution.

The information presented in these dashboard reports is a very useful graphical representation of the quality of the data which passes through dataflow objects and gets validated. You can easily see what percentage of data does not pass the validation rules, compare validation statistics between different periods of time, and even see the actual rows that did not pass a specific validation rule, all without running SQL queries on your database tables or using any application other than the Data Services Management Console.

Using regular expression support to validate data

In this recipe, we will see how you can use regular expressions to validate your data. We will take a simple example of validating phone numbers extracted from the source OLTP table PERSONPHONE located in the PERSON schema. The validation rule will be to identify all records which have phone numbers different from this pattern: ddd-ddd-dddd (d being a numeral). Let's say that we do not want to reject any data. Our goal is to generate a dashboard report showing the percentage of records in the source table which do not comply with the specified requirement for the phone number pattern.

Getting ready

Make sure that you have the PERSON.PERSONPHONE table imported into the OLTP datastore. We will create a new job and a new dataflow, DF_Extract_PersonPhone, which will migrate PersonPhone records from OLTP to the STAGE database, validating them at the same time.

How to do it…
1. Create a new job with a new dataflow, DF_Extract_PersonPhone, designed as a standard extract dataflow with a deployed Validation transform, as shown in the following figure:

2. You should also create target tables for the Rule Violation and Fail output schemas in the Reject schema of the STAGE database.

3. To configure the validation rule, open the Validation transform for editing in the main workspace. Use Column Validation instead of Validation Function and put the following custom condition into Query.PHONENUMBER:

match_regex(Query.PHONENUMBER, '^\d{3}-\d{3}-\d{4}$', NULL) = 1

The validation rule configuration should look like in the following screenshot:

Note
Note that for Action on Fail, we set up Send To Both, as we do not want our validation process affecting the migrated dataset.

4. Click on OK to create and save the validation rule.
5. Now go to the second tab, Validation Transform Options, and check all three options: Collect data validation statistics, Collect sample data, and Create column DI_ROWID on Validation_Fail.

6. Your Validation transform should look like this now:

7. Save and execute the job to extract the records into the staging table and collect the validation data for the dashboard report.

How it works…
Regular expressions are a powerful way to validate the data passing through. The match_regex() function used in this recipe returns 1 if the value in the input column matches the pattern specified as the second input parameter.

Data Services supports standard POSIX regular expressions. See the match_regex section (section 6.3.96) in Chapter 6, Functions and Procedures, of the Data Services 4.2 Reference Guide for full syntax and regular expression support details.
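To see what the ddd-ddd-dddd pattern accepts and rejects, here is a minimal Python sketch that mimics the match_regex() check from step 3 (the sample phone numbers are invented for illustration):

```python
import re

# The same ddd-ddd-dddd pattern used in the match_regex() validation rule
PHONE_PATTERN = re.compile(r'^\d{3}-\d{3}-\d{4}$')

def phone_matches(value):
    """Return 1 when the value fits the pattern, 0 otherwise,
    mirroring match_regex(...) = 1 in the Validation transform."""
    return 1 if PHONE_PATTERN.match(value or '') else 0

for sample in ['555-123-4567', '1 (11) 500 555-0132', '555-0117']:
    print(sample, '->', phone_matches(sample))
```

Rows returning 0 are exactly the ones the dashboard report counts as failing the validation rule.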

Note that in this recipe, we did not reject the records which failed the validation rule. As our goal was to simply evaluate the number of records which do not comply with the phone number standard, both failed and passed records were forwarded to the target main staging table.

Let's see how the dashboard validation report for our job execution looks:

1. Launch the Data Services Management Console and log into it.
2. Open the Data Validation application on the main Home page.
3. By default, Data Services shows the data validation statistics for all functional areas for the current date (starting from midnight).
4. Hover your mouse pointer and click on the failed red section of the pie chart to see the following details: the percentage and number of rows which did not pass the validation rule.
5. If you did not run any jobs gathering validation statistics today, the pie chart for DF_Extract_PersonPhone created and executed in this recipe shows that 9,188 records (46%) in the PERSONPHONE table have a phone number in a pattern different from ddd-ddd-dddd, and 10,784 records (54%) have phone numbers matching this pattern.

Enabling dataflow audit
Auditing in Data Services allows the collection of additional information about the data migrated from the source to the target by a specific dataflow on which the audit is enabled, and even allows making decisions according to the rules applied on the audit data. In this recipe, we will see how audit can be enabled and utilized during the extraction of data from the source system.

Getting ready
For this recipe, you can use the dataflow DF_Extract_Address from the previous recipes of this chapter.

How to do it…
Perform the following steps to enable auditing for the specific dataflow:

1. Open DF_Extract_Address in the workspace window and select Tools | Audit from the top-level menu.
2. In the newly opened window, select the Label tab, right-click in the empty space, and choose Show All Objects from the context menu.
3. The Label tab displays the list of objects from within a dataflow. Enable auditing on the Query and Pass Query transform objects by right-clicking on them and selecting the Count option from the context menu.
4. Another way to enable auditing on a specific object from within a dataflow is to right-click on it and select the Properties option from the context menu.
5. Then, go to the Audit tab in the newly opened Schema Properties window and select the respective audit function from the combo box menu. In our case, both audit points were enabled for Query transforms, and the only audit option available in this case is Count.
6. Data Services creates two variables which are used to store the audit value. For the Pass Query transform, two variables were created by default: $Count_Pass, to store the number of successfully passed records, and $CountError_Pass, to store the number of incorrect or rejected records.
7. Let's change the default audit variable names for the Query object by opening its properties and selecting the Audit tab on the Schema Properties window.
8. Specify the audit variable names to be $Count_Extract and $CountError_Extract. Then, close the window by clicking on the OK button.
9. Now, close the Audit: DF_Extract_Address window by clicking on the Close button.
10. If you take a look at the dataflow objects in the workspace window, you can see that the created audit points were marked with small green icons. To access the dataflow audit configuration, you can also just click on the Audit button in the tools menu.

How it works…
At this point, you have configured the audit collection for rows passing two Query objects in the DF_Extract_Address dataflow. Auditing, if enabled at the object level, allows only a single audit function: the Count audit function. This audit function simply keeps track of the number of records passing the specific object inside the dataflow.

Auditing can also be enabled at the column level inside an object which resides inside the dataflow, usually on the columns in the Query transforms. In that case, three additional audit functions are available (Sum, Average, and Checksum) if the column is of a numeric data type, and only Checksum is available if the column is of the varchar data type. As you might have guessed, these functions allow you to store either the sum or the average of values in the specific columns for all passing records, or calculate the checksum.
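Conceptually, each audit function reduces a column of passing rows to a single value. A rough Python sketch of what the four functions compute (the CRC32 here is only a stand-in; Data Services uses its own internal checksum algorithm):

```python
import zlib

def audit_values(rows):
    """Reduce a numeric column to the four audit values: Count, Sum,
    Average, and a checksum over the values in row order."""
    count = len(rows)
    total = sum(rows)
    return {
        'Count': count,
        'Sum': total,
        'Average': total / count if count else 0,
        # Stand-in checksum for illustration only
        'Checksum': zlib.crc32(','.join(str(r) for r in rows).encode()),
    }

print(audit_values([10, 20, 30]))
```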

As you can see, the collected audit data can later be accessed from the Operational Dashboard tab in the Data Services Management Console. However, the most useful purpose of the audit feature is the ability to define rules on the collected audit data and perform actions depending on the result of the implemented audit rule.

Here are the steps showing you how to implement a rule on collected audit data:

1. Open DF_Extract_Address in the workspace and click on the Audit button to open the Audit configuration window for this dataflow.
2. Go to the Rule tab.
3. Click on the Add button to add a new audit rule.
4. Choose the Custom option to define a custom audit rule.
5. Input the custom function shown in the following screenshot:

6. Check the option Raise exception in the Action on failure section. The other options are Email to list and Script.

The Email to list option allows you to send notifications about rule violations to specific email recipients. Note that to use this functionality, you have to specify SMTP server details in your Data Services configuration.

The Script option allows you to execute scripts written in the standard Data Services scripting language.

7. The rule that we specified is applied at the very end of the dataflow execution and checks that the percentage of rows which passed the validation rule, out of the total number of rows extracted from the source table, is higher than 80 percent. Remember that our validation rule checks and rejects all Paris records. We know that the number of records with a city value equal to Paris is significantly less than the 20 percent of rows which would have to be rejected during validation to fail the defined audit rule. So, if you run your dataflow now, nothing will happen; the audit rule will not be violated and the job will be successfully completed. To make the audit rule fail, let's change our validation function to reject all records with a city value not equal to Paris, as shown in the following screenshot:

8. As the final step for utilizing the audit functionality, check the Enable auditing option on the job's Execution Properties window. If this is not checked, audit data will not be collected and audit rules will not work.

9. Save and execute the job. Dataflow execution fails and the relevant information is displayed in the error log, as shown here:

Note
Remember that although the dataflow DF_Extract_Address fails, the audit rule check happens after it completes all the previous steps and the data is successfully inserted into all targets.
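The audit rule from step 7 is essentially a ratio check on two audit variables. A minimal Python sketch of that check, assuming $Count_Extract and $Count_Pass hold the extracted and passed row counts (the variable names and the 80 percent threshold are the ones configured in this recipe):

```python
def audit_rule_passes(count_extract, count_pass, threshold=0.8):
    """Mirror the custom audit rule: the share of rows that passed
    validation must exceed the threshold, otherwise the rule fails
    and the chosen action (here, raise exception) fires."""
    if count_extract == 0:
        return False
    return (count_pass / count_extract) > threshold

# Few Paris rows rejected: the rule holds and the job completes
print(audit_rule_passes(10000, 9950))
# Everything except Paris rejected: the rule is violated
print(audit_rule_passes(10000, 50))
```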

There’s more…
Collected audit numbers can be accessed via the Operational Dashboard tab from the Data Services Management Console.

To access them, open the Operational Dashboard tab and select a specific job to open Job Execution Details. By clicking on the job execution instances further, you can open a Job Details view, which will contain information about all dataflows executed within a job. If the dataflow has audit enabled for its columns, Contains Audit Data will show you that.

By clicking on the View Audit Data button, you can open a new window showing the values collected during auditing and the audit rule result for the selected job instance execution.

Data Quality transforms – cleansing your data
Data Quality transforms are available in the Data Quality section of the Local Object Library Transforms tab. These transforms help you to build a cleansing solution for your migrated data.

The subject of implementing Data Quality solutions in ETL processes is so vast that it probably requires a whole chapter, or even a whole book, dedicated to it. That is why we will just scratch the surface in this recipe by showing you how to use the most popular of the Data Quality transforms, Data_Cleanse, to perform the simplest data cleansing task.

Getting ready
To build a data cleansing process, it would be ideal if we had source data which required cleansing. Unfortunately, our OLTP data source, and especially the DWH data source, already contain pretty conformed and clean data. Therefore, we are going to create dirty data by concatenating multiple fields together, to see how Data Services cleansing packages will automatically parse and cleanse the data out of the concatenated text field.

As a preparation step, make sure that you have imported these three tables in your OLTP datastore: PERSON, PERSONPHONE, and EMAILADDRESS (all of them are from the PERSON schema of the SQL Server AdventureWorks_OLTP database).

How to do it…
1. As the first step, create a new job with a new dataflow object in it. Name the dataflow DF_Cleanse_Person_Details.
2. Import the three tables PERSON, PERSONPHONE, and EMAILADDRESS from the OLTP datastore as source tables inside the dataflow.
3. Join these tables using the Query transform with the join conditions, as shown in the following screenshot:

4. In the output schema of the Query transform, create two columns: ROWID of the data type integer, with the following function as a mapping: gen_row_num(), and a DATA column of the data type varchar(255), with the following mapping:

PERSON.FIRSTNAME || ' ' || PERSON.MIDDLENAME || ' ' || PERSON.LASTNAME || ' ' || PERSONPHONE.PHONENUMBER || ' ' || EMAILADDRESS.EMAILADDRESS

5. Now, when we have prepared the source field that we will be cleansing, let's import and configure the Data_Cleanse transforms themselves. Drag and drop the Data_Cleanse transform objects from Local Object Library | Transforms | Data Quality to your dataflow. Please refer to the following steps, as each Data_Cleanse transform object will be imported and configured differently.

6. The first Data_Cleanse object will be parsing our DATA column to extract the email address of the person. When importing the transform object into the dataflow, choose the Base_DataCleanse configuration.

7. Rename the imported Data_Cleanse transform to Email_DataCleanse and join the Query transform output to it.

8. Open the Email_DataCleanse transform editor in the workspace to configure it.
9. On the Input tab, select EMAIL1 in the Transform Input Field Name column and map it to the DATA source field.
10. On the Options tab, choose PERSON_FIRM as a cleansing package name and configure the rest of the options, as shown in the following screenshot:

11. On the Output tab, select the EMAIL field (of the PARSED field class related to the EMAIL1 parent component) to be produced by the Email_DataCleanse transform. That will create the EMAIL1_EMAIL_PARSED column in the output schema of the Email_DataCleanse transform. Propagate the source ROWID column as well, which will be used to join the cleansed datasets together in the later steps.

12. Close the Email_DataCleanse editor and import the second Data_Cleanse transform with the same Base_DataCleanse configuration. Rename the imported transform object to Phone_DataCleanse, join it to the Query transform output, and open it in the main workspace for editing.

13. Select the same transform options on the Options tab as for the Email_DataCleanse transform example we just saw.

14. Choose PHONE1 as the input parsing component (Transform Input Field Name) and map it to the source DATA column from the Query transform output.

15. On the Output tab of the Phone_DataCleanse transform editor, choose the following output fields from the list:

PARENT_COMPONENT FIELD_NAME FIELD_CLASS

NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE PARSED

NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_EXTENSION PARSED

NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_LINE PARSED

NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_PREFIX PARSED

PHONE1 PHONE PARSED

16. Also propagate two source fields, ROWID and DATA, into the output schema of the Phone_DataCleanse transform. Close it to finish editing.

17. When importing the third Data_Cleanse transform, select the predefined EnglishNorthAmerica_DataCleanse configuration and rename the transform to Name_DataCleanse.

18. Open the transform in the workspace for editing. You do not have to configure anything on the Options tab this time. So, select the component NAME_LINE1 on the Input tab and the following fields on the Output tab:

PARENT_COMPONENT FIELD_NAME FIELD_CLASS

PERSON1 FAMILY_NAME1 PARSED

PERSON1 GENDER STANDARDIZED

PERSON1 GIVEN_NAME1 PARSED

PERSON1 GIVEN_NAME2 PARSED

PERSON1 PERSON PARSED

19. Close the Name_DataCleanse transform editor and join all three Data_Cleanse outputs with a single Join Query transform. Use the ROWID column to join the datasets together and remap the default Data_Cleanse output names to more meaningful names, as shown in the following screenshot:

20. Specify Phone_DataCleanse.DATA IS NOT NULL as a join filter in the Join Query transform to exclude the empty records from the migration.

21. Import the target template table CLEANSE_RESULT stored in the STAGE datastore to save the cleansing results in.

22. Finally, your dataflow should look like this:

23. Save and execute the job to see the cleansing results in the CLEANSE_RESULT table.

How it works…
In the first few steps of the preceding sequence, by concatenating multiple fields from the source OLTP database, we prepared our "dirty" data column, DATA, which was used as a source column for all three Data_Cleanse transforms.
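In plain Python terms, the mapping from step 4 simply glues the person fields into one free-text value (the sample row below is invented for illustration):

```python
def build_dirty_data(first, middle, last, phone, email):
    """Mimic the DATA column mapping: fields concatenated with spaces,
    producing one unstructured string for Data_Cleanse to parse back
    into name, phone, and email components."""
    return ' '.join([first, middle, last, phone, email])

row = build_dirty_data('Ken', 'J', 'Sanchez', '697-555-0142',
                       'ken0@adventure-works.com')
print(row)
```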

When importing the Data_Cleanse transform, Data Services offers you the option to choose one of the predefined configurations. The Base_DataCleanse configuration requires you to configure the mandatory options manually, or your imported transform object will not work.

The Data_Cleanse transform is a mere mapping tool to map your input columns to the required parsing rules and desired output. Parsing rules and reference data are defined in the cleansing package, which can be developed and configured with the Information Steward Cleansing Package Builder tool. This tool provides a graphical user interface for this task. In this recipe, we are using the default cleansing package PERSON_FIRM, available in Data Services without the need to have Information Steward installed.

Note
The default PERSON_FIRM cleansing package allows you to parse and standardize dates, emails, firm data, person names, social security numbers, and phone numbers.

The Input tab allows you to choose the type of component you would like to parse from the input dataset. Please note that you cannot specify the same field as a source of data for multiple components. That is why we have to create three distinct Data_Cleanse transform objects to parse the same DATA column for email, person name, and phone data. Each has its own configuration and mappings from input components to a desired set of output fields.

The set of fields available on the Output tab depends on which component you have chosen to be recognized and parsed on the Input tab, but it basically includes all possible information that can be extracted for a selected component. For example, if it is a Person name component, the output data cleanse fields include given name, second given name, last name, gender, and similar others.

Propagation of an artificial ROWID column allows us to join the split datasets back together after they are processed by the Data_Cleanse transforms.

To view the result data, use the View data option on the target table object in the dataflow, or open SQL Server Management Studio and run the following query to see the parsed results:

select DATA, EMAIL, PHONE, GIVEN_NAME, GIVEN_NAME_2ND, FAMILY_NAME,
       GENDER_STANDARDIZED
from dbo.CLEANSE_RESULT

As you can see in the following screenshot, the Data_Cleanse transforms did a pretty good job of parsing the input DATA field:

An interesting result is stored in the GENDER_STANDARDIZED column. Based on the parsing rules and reference data available, Data Services suggests how accurate the determination of gender could be based solely on the available given and last names.

There’s more…
As mentioned before, Data Services has great Data Quality capabilities. This is a huge topic for discussion, and we’ve just scratched the surface by showing you one transform from this toolset. This powerful functionality works best when Data Services is integrated with Information Steward. You can build your own cleansing packages to parse the migrated data more efficiently and accurately. Please refer to Chapter 12, Introduction to Information Steward, for more details.

Chapter 8. Optimizing ETL Performance
If you tried all the previous recipes from the book, you can consider yourself familiar with the basic design techniques available in Data Services and can perform pretty much any ETL development task. Starting from this chapter, we will begin using advanced development techniques available in Data Services. This particular chapter will help you to understand how existing ETL processes can be optimized further to make sure that they run quickly and efficiently, consuming as few computing resources as possible with the least amount of execution time.

Optimizing dataflow execution – push-down techniques
Optimizing dataflow execution – the SQL transform
Optimizing dataflow execution – the Data_Transfer transform
Optimizing dataflow readers – lookup methods
Optimizing dataflow loaders – bulk-loading methods
Optimizing dataflow execution – performance options

Introduction
Data Services is a powerful development tool. It supports a lot of different source and target environments, all of which work differently with regard to loading and extracting data from them. This is why it is required of you, as an ETL developer, to be able to apply different design methods, depending on the requirements of your data migration processes and the environment that you are working with.

In this chapter, we will review the methods and techniques that you can use to develop data migration processes in order to perform transformations and migrate data from the source to the target more effectively. The techniques described in this chapter are often considered best practices, but do keep in mind that their usage has to be justified. They allow you to move and transform your data faster, consuming fewer processing resources on the ETL engine’s server side.

Optimizing dataflow execution – push-down techniques
The Extract, Transform, and Load sequence can be modified to Extract, Load, and Transform by delegating the work of processing and transforming data to the database itself, where the data is being loaded to.

We know that to apply transformation logic to a specific dataset, we have to first extract it from the database, then pass it through transform objects, and finally load it back to the database. Data Services can (and most of the time, should, if possible) delegate some transformation logic to the database itself from which it performs the extract. The simplest example is when you are using multiple source tables in your dataflow joined with a single Query transform. Instead of extracting each table’s contents separately onto an ETL box by sending multiple SELECT * FROM <table> requests, Data Services can send a single generated SELECT statement with the proper SQL join conditions defined in the Query transform’s FROM and WHERE tabs. As you can probably understand, this can be very efficient: instead of pulling millions of records into the ETL box, you might end up getting only a few, depending on the nature of your Query joins. Sometimes this process shortens to zero processing on the Data Services side. Then, Data Services does not even have to extract the data to perform transformations. What happens in this scenario is that Data Services simply sends the SQL statement instructions in the form of INSERT INTO … SELECT or UPDATE … FROM statements to a database, with all the transformations hardcoded in those SQL statements directly.
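The difference is easy to demonstrate outside Data Services. In this minimal Python sketch (using an in-memory SQLite database with invented tables), the join is expressed as one SELECT that the database executes itself, instead of two full-table pulls joined on the client side:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT);
    CREATE TABLE phone  (person_id INTEGER, number TEXT);
    INSERT INTO person VALUES (1, 'Jane'), (2, 'Dave'), (3, 'Ken');
    INSERT INTO phone  VALUES (1, '555-123-4567'), (2, '555-987-6543');
""")

# Push-down style: one statement, the join runs inside the database,
# and only the (small) joined result crosses the wire
rows = conn.execute("""
    SELECT p.name, ph.number
    FROM person p JOIN phone ph ON ph.person_id = p.id
""").fetchall()
print(rows)
```

Only the matching rows come back to the client, not both full tables.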

The scenarios where Data Services delegates parts of, or all of, the processing logic to the underlying database are called push-down operations.

In this recipe, we will take a look at the different kinds of push-down operations, what rules you have to follow to make push-down work for your designed ETL processes, and what prevents push-downs from happening.

Getting ready
As a starting example, let’s use the dataflow developed in the Loading data from table to table – lookups and joins recipe in Chapter 4, Dataflow – Extract, Transform, and Load. Please refer to that recipe to rebuild the dataflow if, for some reason, you do not have it in your local repository anymore.

Push-down operations can be of two different types:

Partial push-downs: A partial push-down is when Optimizer sends the SELECT query joining multiple source tables used in a dataflow, or sends one SELECT statement to extract data from a particular table with mapping instructions and filtering conditions from the Query transform hardcoded in this SELECT statement.
Full push-downs: A full push-down is when all dataflow logic is reformed by Optimizer into a single SQL statement and sent to the database. The most common statements generated in these cases are complex INSERT/UPDATE and MERGE statements, which include all source tables from the dataflow joined together and transformations in the form of database functions applied to the table columns.

How to do it…
1. To be able to see what SQL queries have been pushed down to the database, open the dataflow in the workspace window and select Validation | Display Optimized SQL….

2. The Optimized SQL window shows all queries generated by the Data Services Optimizer and pushed down to the database level. In the following screenshot, you can see the SELECT query and the part of the dataflow logic which this statement represents:

3. Let’s try to push down logic from the rest of the Query transforms. Ideally, we would like to perform a full push-down to the database level.

4. The Lookup_Phone Query transform contains a function call which extracts the PHONENUMBER column from another table. This logic cannot be included as is, because Optimizer cannot translate internal function calls into a SQL construction which could be included in the push-down statement.

5. Let’s temporarily remove this function call by specifying a hardcoded NULL value for the PHONENUMBER column. Just delete the function call and create a new output column instead, of the varchar(25) data type.

6. Validate and save the dataflow and open the Optimized SQL window again to see the result of the changes. Straight away, you can see how logic from both the Lookup_Phone and Distinct Query transforms was included in the SELECT statement: the default NULL value for a new column and the DISTINCT operator at the beginning of the statement:

7. What remains for the full push-down is the loading part, when all transformations and selected datasets are inserted into the target table PERSON_DETAILS. The reason why this does not happen in this particular example is that the source tables and target tables reside in different datastores, which connect to different databases: OLTP (AdventureWorks_OLTP) and STAGE.

8. Substitute the PERSON_DETAILS target table from the DS_STAGE datastore with a new template table, PERSON_DETAILS, created in the DBO schema of OLTP.

9. As a result of the change, you can see that Optimizer now fully transforms the dataflow logic into a pushed-down SQL statement.

How it works…
The Data Services Optimizer tries to perform push-down operations whenever possible. The most common reasons, as we demonstrated during the preceding steps, for push-down operations not working are as follows:

Functions: When functions used in mappings cannot be converted by Optimizer to similar database functions in the generated SQL statements. In our example, the lookup_ext() function prevents push-down from happening. One of the workarounds for this is to substitute the lookup_ext() function with an imported source table object joined to the main dataset with the help of the Query transform (see the following screenshot):

Transform objects: When transform objects used in a dataflow cannot be converted by Optimizer to equivalent SQL statements. Some transforms are simply not supported for push-down.
Automatic data type conversions: These can sometimes prevent push-down from happening.
Different data sources: For push-down operations to work for a list of source or target objects, those objects must reside in the same database or must be imported into the same datastore. If they reside in different databases, dblink connectivity should be configured on the database level between those databases, and it should be enabled as a configuration option in the datastore object properties. All Data Services can do is send a SQL statement to one database source, so it is logical that if you want to join multiple tables from different databases in a single SQL statement, you have to make sure that connectivity is configured between the databases, such that you could run this SQL directly on the database level even before starting to develop the ETL code in Data Services.

What is also important to remember is that the Data Services Optimizer capabilities depend on the type of underlying database that holds your source and target table objects. Of course, it has to be a database that supports the standard SQL language, as Optimizer can send the push-down instructions only in the form of SQL statements.

Sometimes, you actually want to prevent push-downs from happening. This can be the case if:

The database is busy to the extent that it would be quicker to do the processing on the ETL box side. This is a rare scenario, but it still sometimes occurs in real life. If this is the case, you can use one of the methods we just discussed to artificially prevent the push-down from happening.
You want to actually make rows go through the ETL box for auditing purposes, or to apply special Data Services functions which do not exist at the database level. In these cases, the push-down will automatically be disabled and will not be used by Data Services anyway.

Optimizing dataflow execution – the SQL transform
Simply put, the SQL transform allows you to specify SQL statements directly inside the dataflow to extract source data, instead of using imported source table objects. Technically, it has nothing to do with optimizing the performance of ETL, as it is not a generally recommended practice to substitute the source table objects with a SQL transform containing hard-coded SELECT SQL statements.

How to do it…
1. Take the dataflow used in the previous recipe and select Validation | Display Optimized SQL… to see the query pushed down to the database level. We are going to use this query to configure our SQL transform object, which will substitute all source table objects on the left-hand side of the dataflow.

2. On the Optimized SQL window, click on Save As… to save this push-down query to a file.

3. Drag and drop the SQL transform from Local Object Library | Transforms | Platform into your dataflow.

4. Now you can remove all objects on the left-hand side of the dataflow prior to the Lookup_Phone Query transform.

5. Open the SQL transform for editing in a workspace window. Choose OLTP as a datastore and copy and paste the query saved previously from your file into the SQL text field. To complete the SQL transform configuration, create output schema fields of appropriate data types which match the fields returned by the SELECT statement.

6. Exit the SQL transform editor and link it to the next Lookup_Phone Query transform. Open Lookup_Phone and map the source columns to the target.

7. Please note that the dataflow does not perform any native push-down queries anymore, and will give you the following warning message if you try to display the optimized SQL:

8. Validate the job before executing it to make sure there are no errors.

How it works…
As you can see, the structure of the SQL transform is pretty simple. There are not many options available for configuration:

Datastore: This option defines which database connection will be used to pass the SELECT query to.
Database type: This option pretty much duplicates the value defined for the specified datastore object.
Cache: This option defines whether the dataset returned by the query has to be cached on the ETL box.
Array fetch size: This option basically controls the amount of network traffic generated during dataset transfer from the database to the ETL box.
Update schema: This button allows you to quickly build the list of schema output columns from the SQL SELECT statement specified in the SQL text field.

The most common reasons why you would want to use the SQL transform instead of defining source table objects are as follows:

Simplicity: Sometimes, you do not care about anything else except getting things done as fast as possible. You can sometimes get the extract requirements in the form of a SELECT statement, or you may want to use a tested SELECT query in your ETL code straight away.
To utilize database functionality which does not exist in Data Services: This is usually a poor excuse, as experienced ETL developers can do pretty much anything with standard Data Services objects. However, some databases can have internal non-standard SQL functions implemented which can perform complex transformations. For example, in Netezza you can have functions written in C++ which can be utilized in standard SQL statements and, most importantly, will use the massively parallel processing functionality of the Netezza engine. Of course, the Data Services Optimizer is not aware of these functions, and the only way to use them is to run direct SELECT SQL statements against the database. If you want to call a SQL statement like this from Data Services, the most convenient way to do it from within a dataflow is to use the SQL transform object inside the dataflow.
Performance reasons: Once in a while, you can get a set of source tables joined to each other in a dataflow for which Optimizer, for some reason or other, does not perform a push-down operation, while you are very restricted in the ways you can create and utilize database objects in this particular database environment. In such cases, using a hard-coded SELECT SQL statement can help you to maintain an adequate level of ETL performance.

As a general practice, I would recommend that you avoid SQL transforms as much as possible. They can come in handy sometimes, but when using them, you not only lose the advantages of utilizing Data Services, the Information Steward reporting functionality, and the ability to perform auditing operations, you also potentially create big problems for yourself in terms of the ETL development process. Tables used in the SELECT statements cannot be traced with the View where used feature. They can be missing from your datastores, which means you do not have a comprehensive view of your environment and the underlying database objects utilized, because the source database tables are hidden inside the ETL code rather than being on display in the Local Object Library.

This obviously makes ETL code harder to maintain and support. Not to mention that migration to another database becomes a problem, as you would most likely have to rewrite all the queries used in your SQL transforms.

Note
The SQL transform prevents the full push-down from happening, so be careful. Only the SELECT query inside the SQL transform is pushed down to the database level. The rest of the dataflow logic will be executed on the ETL box, even if the full push-down was working before, when you had source table objects instead of the SQL transform.

In other words, the result dataset of the SQL transform is always transferred to the ETL box. That can affect the decisions around ETL design. From the performance perspective, it is preferable to spend more time building a dataflow based on the source table objects, for which Data Services performs the full push-down (producing the INSERT INTO … SELECT statement), rather than quickly building a dataflow which will transfer datasets back and forth to the database, increasing the load time significantly.

Optimizing dataflow execution – the Data_Transfer transform
The transform object Data_Transfer is a pure optimization tool, helping you to push down resource-consuming operations and transformations like JOIN and GROUP BY to the database level.

Getting ready
1. Take the dataflow from the Loading data from a flat file recipe in Chapter 4, Dataflow – Extract, Transform, and Load. This dataflow loads the Friends_*.txt file into a STAGE.FRIENDS table.

2. Modify the Friends_30052015.txt file and remove all lines except the ones about Jane and Dave.

3. In the dataflow, add another source table, OLTP.PERSON, and join it to the source file object in the Query transform by the first-name field. Propagate the PERSONTYPE and LASTNAME columns from the source OLTP.PERSON table into the output Query transform schema, as shown here:

How to do it…
Our goal will be to configure this new dataflow to push down to the database level the insert of the joined dataset of data coming from the file and data coming from the OLTP.PERSON table.

By checking the Optimized SQL window, you will see that the only query sent to a database from this dataflow is the SELECT statement pulling all records from the database table OLTP.PERSON to the ETL box, where Data Services will perform an in-memory join of this data with the data coming from the file. It’s easy to see that this type of processing may be extremely inefficient if the PERSON table has millions of records and the FRIENDS table has only a couple of them. That is why we do not want to pull all records from the PERSON table for the join, and want to push down this join to the database level.
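The idea behind the fix that follows can be sketched in a few lines of Python with SQLite (table and file contents are invented for illustration): once the file rows are materialized as a database table, the whole join-and-insert becomes one statement the database runs itself, no matter how large the person table is:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE person (firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [('Jane', 'Doe'), ('Dave', 'Smith'), ('Ken', 'Sanchez')])

# Step 1: what Data_Transfer does - land the flat-file rows in a table
file_rows = [('Jane',), ('Dave',)]          # pretend Friends_*.txt content
conn.execute("CREATE TABLE friends_file (firstname TEXT)")
conn.executemany("INSERT INTO friends_file VALUES (?)", file_rows)

# Step 2: now the join can be pushed down as a single INSERT ... SELECT
conn.execute("CREATE TABLE friends (firstname TEXT, lastname TEXT)")
conn.execute("""
    INSERT INTO friends
    SELECT f.firstname, p.lastname
    FROM friends_file f JOIN person p ON p.firstname = f.firstname
""")
print(conn.execute("SELECT * FROM friends").fetchall())
```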

Looking at the dataflow, we already know that for the logic to be pushed down, the database should be aware of all the source datasets and should be able to access them by running a single SQL statement. The Data_Transfer transform will help us to make sure that the Friends file is presented to the database as a table. Follow these steps to see how it can be done:

1. Add the Data_Transfer object from Local Object Library | Transforms | Data Integrator into your dataflow, putting it between the source file object and the Query transform.

2. Edit the Data_Transfer object by opening it in a workspace window. Set Transfer type to Table and specify the new transfer table in the Table options section with STAGE.DBO.FRIENDS_FILE.

3. Close the Data_Transfer transform editor and select Validation | Display Optimized SQL… to see the queries pushed down to the database. You can see that there are now two SELECT statements generated, to pull data from the OLTP.PERSON and STAGE.FRIENDS_FILE tables.

The join between these two datasets happens on the ETL box. Then the merged dataset is sent back to the database to be inserted into the DS_STAGE.FRIENDS table.

4. Add another Data_Transfer transform between the source table PERSON and the Query transform. In the Data_Transfer configuration window, set Transfer type to Table and specify DS_STAGE.DBO.DT_PERSON as the data transfer table.

5. Validate and save the dataflow and display the Optimized SQL window.

Now you can see that we successfully implemented a full push-down of the dataflow logic, inserting merged data from two source objects (one of which is a flat file) into a staging table. In the preceding screenshot, the logic in the section marked in red is represented by an INSERT SQL statement pushed down to the database level.

Howitworks…Underthehood,Data_Transfertransformcreatesasubprocessthattransfersthedatatothespecifiedlocation(fileortable).Simplyput,Data_Transferisatargetdataflowobjectinthemiddleofadataflow.Ithasalotofoptionssimilartowhatothertargettableobjectshave;inotherwords,youcansetupabulk-loadingmechanism,runPre-LoadCommandsandPost-LoadCommands,andsoon.

The reason why I called Data_Transfer a pure optimization tool is that you can redesign any dataflow to do the same thing that Data_Transfer does without using it. All you have to do is simply split your dataflow in two (or three, for the dataflow in our example). Instead of forwarding your data into a Data_Transfer transform, you forward it to a normal target object and then, in the next dataflow, you use this object as a source.

Note

What Data_Transfer still does, which cannot be done easily when you are splitting dataflows, is automatically clean up temporary data transfer tables.

It is critical to understand how push-down mechanisms work in Data Services to be able to use the Data_Transfer transform effectively. Putting it to use at the wrong place in a dataflow can decrease performance drastically.

Why we used a second Data_Transfer transform object

Our goal was to modify the dataflow in such a way as to get a full push-down SQL statement to be generated: INSERT INTO STAGE.FRIENDS SELECT <joined PERSON and FRIENDS datasets>.

As we remember from the previous recipe, there could be multiple reasons why a full push-down does not work. One of these reasons, which is causing trouble in our current example, is that the PERSON table resides in a different database, while our data transfer table, FRIENDS_FILE, and target table, FRIENDS, reside in the same STAGE database.

To make the full push-down work, we had to use a second Data_Transfer transform object to transfer data from the OLTP.PERSON table into a temporary table located in the STAGE database.

When to use the Data_Transfer transform

Use it whenever you encounter a situation where a dataflow has to perform a very "heavy" transformation (say the GROUP BY operation, for example) or join two very big datasets, and this operation is happening on the ETL box. In these cases, it is much quicker to transfer the required datasets to the database level so that the resource-intensive operation can be completed there by the database.

There's more…

One of the good examples of a use case for the Data_Transfer transform is when you have to perform the GROUP BY operation in a Query transform right before inserting data into a target table object. By placing Data_Transfer right before the Query transform at the end of the dataflow, you can quickly insert the dataset processed by the dataflow logic before the Query transform with the GROUP BY operation, and then push down the INSERT and GROUP BY operations in a single SQL statement to the database level.

When you perform transformations on datasets which include millions of records, using the Data_Transfer transform can save you minutes, and sometimes hours, depending on your environment and the number of processed records.
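To make the pushed-down INSERT plus GROUP BY concrete, here is a minimal sketch (the table and column names are invented for illustration and are not from the recipe), emulated with SQLite: once the Data_Transfer transform has landed the intermediate dataset in a database table, both the aggregation and the load collapse into a single SQL statement executed entirely on the database side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DT_SALES stands in for the table created by a Data_Transfer transform
# right before the final Query transform with the GROUP BY.
cur.execute("CREATE TABLE DT_SALES (region TEXT, amount REAL)")
cur.executemany("INSERT INTO DT_SALES VALUES (?, ?)",
                [("EU", 10.0), ("EU", 5.0), ("US", 7.5)])
cur.execute("CREATE TABLE SALES_SUMMARY (region TEXT, total REAL)")

# Both the aggregation and the load are pushed down as one statement,
# so no rows travel to the ETL box at all.
cur.execute("""
    INSERT INTO SALES_SUMMARY (region, total)
    SELECT region, SUM(amount) FROM DT_SALES GROUP BY region
""")
conn.commit()
print(sorted(cur.execute("SELECT * FROM SALES_SUMMARY").fetchall()))
# [('EU', 15.0), ('US', 7.5)]
```

The equivalent row-by-row approach would pull every DT_SALES record to the ETL box, aggregate in memory, and send the results back as individual inserts.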

Optimizing dataflow readers – lookup methods

There are different ways in which to perform the lookup of a record from another table in Data Services. The three most popular ones are: a table join with a Query transform, using the lookup_ext() function, and using the sql() function.

In this recipe, we will take a look at all these methods and discuss how they affect the performance of ETL code execution and their impact on a database used to source data from.

Getting ready

We will be using the same dataflow as in the first recipe, the one which populates the PERSON_DETAILS stage table from multiple OLTP tables.

How to do it…

We will perform a lookup for the PHONENUMBER column of a person from the OLTP table PERSONPHONE in three different ways.

Lookup with the Query transform join

1. Import the lookup table into a datastore and add the table object as a source in the dataflow where you need to perform the lookup.
2. Use the Query transform to join your main dataset with the lookup table using the BUSINESSENTITYID reference key column, which resides in both tables.

Lookup with the lookup_ext() function

1. Remove the PERSONPHONE source table from your dataflow and clear out the join conditions in the Lookup_Phone Query transform.
2. As you have seen in the recipes in previous chapters, the lookup_ext() function can be executed as a function call in the Query transform output columns list. The other option is to call the lookup_ext() function in the column mapping section. For example, say that we want to put an extra condition on when we want to perform a lookup for a specific value.

Instead of creating a new function call for looking up the PHONENUMBER column for all migrated records, let's put in the condition that we want to execute the lookup_ext() function only when the row has non-empty ADDRESSLINE1, CITY, and COUNTRY columns; otherwise, we want to use the default value UNKNOWN LOCATION.

3. Insert the following lines in the Mapping section of the PHONENUMBER column inside the Lookup_Phone Query transform:

ifthenelse(
    (Get_Country.ADDRESSLINE1 IS NULL) OR
    (Get_Country.CITY IS NULL) OR
    (Get_Country.COUNTRY IS NULL),
    'UNKNOWN LOCATION',
    lookup_ext()
)

4. Now double-click on the lookup_ext() text to highlight only the lookup_ext function, and right-click on the highlighted area for the context menu.

5. From this context menu, select Modify Function Call to open the lookup_ext parameter configuration window. Configure it to perform a lookup for a PHONENUMBER field value from the PERSONPHONE table.

After closing the function configuration window, you can see the full code generated by Data Services for the lookup_ext() function in the Mapping section.

When selecting the output field, you can see all source fields used in its Mapping section highlighted in the Schema In section on the left-hand side.

Lookup with the sql() function

1. Open the Lookup_Phone Query transform for editing in the workspace and clear out all code from the PHONENUMBER mapping section.
2. Put the following code in the Mapping section:

sql('OLTP', 'select PHONENUMBER from Person.PERSONPHONE where BUSINESSENTITYID = ' || Get_Country.BUSINESSENTITYID);

How it works…

Query transform joins

The advantages of this method are:

Code readability: It is very clear which source tables are used in the transformation when you open the dataflow in a workspace.
Push-down of the lookup to the database level: This can be achieved by including a lookup table in the same SELECT statement. As soon as you have placed the source table object in the dataflow and joined it properly with other data sources using the Query transform, there is a chance that it will be pushed down as a single SQL SELECT statement, allowing the joining of source tables at the database level.
DS metadata report functionality and impact analysis.

The main disadvantage of this method comes naturally from its advantage. If a record from the main dataset references multiple records in the lookup table by the key column used, the output dataset will include multiple records with all these values. That is how standard SQL query joins work, and the Data Services Query transform works in the same way. This could potentially lead to duplicated records inserted into a target table (duplicated by key columns but with different values in the lookup field, for example).

lookup_ext()

The opposite of a Query transform, this function hides the source lookup table object from the developer and from some of the Data Services reporting functionality. As you have seen, it can be executed as a function call or used in the mapping logic for a specific column.

This function's main advantage is that it will always return a single value from the lookup table. You can even specify the return policy, which will be used to determine the single value to return (MAX or MIN), with the ability to order the lookup table dataset by any column.
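To make the return policy concrete, here is a small, hypothetical Python sketch (the function name and sample data are invented for illustration; this is not the Data Services API): when several lookup rows match the key, a MIN or MAX policy applied over an ordering column picks exactly one value to return.

```python
# Hypothetical emulation of a lookup with a return policy: rows matching
# the key are ordered by order_col, and the MIN or MAX row's return
# column is the single value returned.
def lookup_with_policy(rows, key_col, key, return_col, order_col, policy="MAX"):
    matches = [r for r in rows if r[key_col] == key]
    if not matches:
        return None
    pick = max if policy == "MAX" else min
    return pick(matches, key=lambda r: r[order_col])[return_col]

phones = [
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "111-1111", "MODIFIEDDATE": "2012-01-01"},
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "222-2222", "MODIFIEDDATE": "2014-06-15"},
]

# MAX over MODIFIEDDATE returns the most recent phone number.
print(lookup_with_policy(phones, "BUSINESSENTITYID", 1,
                         "PHONENUMBER", "MODIFIEDDATE"))  # 222-2222
```

A MIN policy over the same ordering column would return the oldest matching value instead.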

sql()

In the presented example, this works similarly to the lookup_ext() function, but it is rarely used that way, as lookup_ext() fetches rows from the lookup table more efficiently if all you want to do is extract values from the lookup table by referencing key columns.

At the same time, the sql() function makes possible the implementation of very complex and flexible solutions, as it allows you to pass any SQL statement that can be executed on the database side. This can be the execution of stored procedures, the generation of sequence numbers, running analytical queries, and so on.

As a general rule, though, the usage of the sql() function in the dataflow column mappings is not recommended. The main reason for this is performance, as you will see further on. Data Services has a rich set of instruments to perform the same task but with a proper set of objects and ETL code design.

Performance review

Let's quickly review the dataflow execution times for each of the explained methods:

The first method: The lookup with the Query transform took 6.4 seconds.
The second method: The lookup with the lookup_ext() function took 6.6 seconds.
The third method: This used the sql() function and took 73.3 seconds.

The first two methods look similar in effectiveness, but that is only because the number of rows and the size of the dataset used are very small. The lookup_ext() function allows the usage of different cache methods for the lookup dataset, which makes it possible to tune and configure it depending on the nature of your main data and that of the lookup data. It can also be executed as a separate OS process, increasing the effectiveness of fetching the lookup data from the database.

The third figure, for the sql() function, on the contrary, shows the perfect example of extremely poor performance when the sql() function is used in the column mappings.
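The reason is easy to demonstrate: a sql() call in a column mapping fires one query per processed row, while a join or a cached lookup needs a constant number of queries. A rough sketch (invented table names and row counts, using SQLite only to count the round trips):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONPHONE (id INTEGER, phone TEXT)")
conn.executemany("INSERT INTO PERSONPHONE VALUES (?, ?)",
                 [(i, f"555-{i:04d}") for i in range(1000)])

main_rows = list(range(1000))  # stand-in for the main dataset

# sql()-style lookup: one SELECT (one database round trip) per row.
queries = 0
for key in main_rows:
    conn.execute("SELECT phone FROM PERSONPHONE WHERE id = ?", (key,)).fetchone()
    queries += 1
print(queries)  # 1000 round trips

# Join / PRE_LOAD_CACHE style: a single SELECT, then in-memory matching.
cache = dict(conn.execute("SELECT id, phone FROM PERSONPHONE"))
joined = [cache[key] for key in main_rows]
print(len(joined))  # same 1000 results from 1 round trip
```

With a remote database, each of those per-row round trips also pays network latency, which is why the gap grows far beyond the 10x seen here.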

Optimizing dataflow loaders – bulk-loading methods

By default, all records inside a dataflow coming to a target table object are sent as separate INSERT commands to a target table at the database level. If millions of records pass through the dataflow and the transformation happens on the ETL box without push-downs, the performance of sending millions of INSERT commands over the network back to a database for insertion could be extremely slow. That is why it is possible to configure alternative load methods on the target table object inside a dataflow. These types of loads are called bulk loads. Bulk-load methods are different in nature, but all of them share the same main principle and achieve the same goal: they avoid the execution of millions of INSERT statements, one for each migrated record, providing alternative ways of inserting data.

Bulk-load methods executed by Data Services for inserting data into a target table are completely dependent on the type of target database. For example, for Oracle databases, Data Services can implement bulk loading through files or through the Oracle API.

Bulk-loading mechanisms for inserting data into Netezza or Teradata are completely different. You will notice this straight away if you create different datastores connecting to different types of databases and compare the Bulk Loader Options tab of the target table objects from each of these datastores.

For detailed information about each bulk-load method available for each database, please refer to the official SAP documentation.

How to do it…

To see the difference between loading data in normal mode (row by row) and bulk loading, we have to generate quite a significant number of rows. To do this, take the dataflow from a previous recipe, Optimizing dataflow execution – the SQL transform, and replicate it to create another copy for use in this recipe. Name it DF_Bulk_Load.

Open the dataflow in the workspace window for editing.

1. Add a new Row_Generation transform from Local Object Library | Transforms | Platform as a source object and configure it to generate 50 rows, starting with row number 1.
2. The Row_Generation transform is used to multiply the number of rows currently being transformed by the dataflow logic. Previously, the number of rows returned by the Person_OLTP SQL transform was approximately 19,000. By performing a Cartesian join of these records to 50 artificially generated records, we can get almost 1 million records inserted in the target PERSON_DETAILS table. To implement the Cartesian join, use the Query transform but without specifying any join conditions, leaving the section empty.

3. Your dataflow should look like this:

4. To test the current dataflow execution time, save and run the job which includes this dataflow. Your target table's Bulk Loader Options tab should be disabled, and on the Options tab, the Delete data from table before loading flag should be selected.

5. The execution time of the dataflow is 49 seconds, and as you can see, it took 42 seconds for Data Services to insert 939,900 records into the target table.

6. To enable bulk loading, open the target table configuration in the workspace for editing, go to the Bulk Loader Options tab, and check Bulk load. After that, set Mode to truncate and leave the other options at their default values.

7. Save and execute the job again.
8. The following screenshot shows that the total dataflow execution time was 27 seconds, and it took 20 seconds for Data Services to load the same number of records. That is two times faster than loading records in normal mode into the SQL Server database. Your time could be slightly different depending on the hardware you are using for your Data Services and database environments.

How it works…

The availability of the bulk-load methods is totally dependent on which database you use as a target. Data Services does not perform any magic; it simply utilizes the bulk-loading methods available in the database.

These methods are different for different databases, but the principle of bulk loading is usually as follows: Data Services sends the rows to the database host as quickly as possible, writing them into a local file. Then, Data Services uses the external table mechanism available in the database to present the file as a relational table. Finally, it executes a few UPDATE/INSERT commands to query this external table and insert data into the target table specified as a target object in a Data Services dataflow.

To run one INSERT … SELECT FROM command is much faster than to execute 1 million INSERT commands.

Some databases perform these small insert operations quite effectively, while for others this could be a really big problem. In almost all cases, if we talk about a significant number of records, the bulk-loading method will always be the quicker way to insert data.
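As a loose analogy (not the actual Data Services mechanism, and with invented table names), the difference between row-by-row loading and a batched, bulk-style load can be sketched with SQLite: one prepared, batched call replaces thousands of individual statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON_DETAILS (id INTEGER, name TEXT)")

rows = [(i, f"name{i}") for i in range(10000)]

# Row-by-row mode: one INSERT statement issued per record.
for r in rows[:5]:
    conn.execute("INSERT INTO PERSON_DETAILS VALUES (?, ?)", r)

# Bulk-style mode: the whole batch goes in with a single call, letting
# the database engine process it as one operation.
conn.executemany("INSERT INTO PERSON_DETAILS VALUES (?, ?)", rows[5:])
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM PERSON_DETAILS").fetchone()[0])  # 10000
```

Real bulk loaders go further still, bypassing the SQL layer entirely via files or native APIs, which is where the 2x improvement measured above comes from.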

When to enable bulk loading?

You have probably noticed that as soon as you enable bulk loading in the target table configuration, the Options tab becomes grayed out. Unfortunately, by enabling bulk loading, you lose all the extra functionality available for loading data, such as auto correct load, for example. This happens because of the nature of the bulk-load operation. Data Services simply passes the data to the database for insertion and cannot perform the extra comparison operations which are available for row-by-row inserts.

The other reason for not using bulk loading is that enabled bulk loading prevents full push-downs from occurring. Of course, in most cases push-down is the best possible option in terms of execution performance, so you would never think about enabling bulk loading if you have a full push-down working. For partial push-downs, when you push down only SELECT queries to get data onto the ETL box for transformation, bulk loading is perfectly valid. You still want to send records back to the database for insertion and want to do it as quickly as possible.

Most of the time, bulk loading does a perfect job when you are passing a big number of rows for insertion from the ETL box and do not utilize any extra loading options available in Data Services.

The best advice in terms of deciding whether or not to enable bulk loading on your target table is to experiment and try different ways of inserting data. This is a decision which should take into account all parameters, such as environment configuration, workload on the Data Services ETL box, workload on the database, and of course, the number of rows to be inserted into the target table.

Optimizing dataflow execution – performance options

We will review a few extra options available for different transforms and objects in Data Services which affect performance and, sometimes, the way ETL processes and transforms data.

Getting ready

For this recipe, use the dataflow from the recipe Optimizing dataflow readers – lookup methods in this chapter. Please refer to this recipe if you need to create or rebuild this dataflow.

How to do it…

Data Services performance-related configuration options can be put under the following categories:

Dataflow performance options
Source table performance options
Query transform performance options
Lookup functions performance options
Target table performance options

In the following sections, we will review and explain all of them in detail.

Dataflow performance options

To access dataflow performance options, right-click on a dataflow object and select Properties from the context menu.

The Degree of parallelism option replicates transform processes inside the dataflow according to the number specified. Data Services creates separate sub dataflow processes and executes them in parallel. At the points in the dataflow where the processing cannot be parallelized, data is merged back together from the different sub dataflow processes into the main dataflow process. If the source table used in the dataflow is partitioned and the value in the Degree of parallelism option is higher than 1, Data Services can use multiple reader processes to read the data from the same table. Each reader reads data from the corresponding partitions. Then, data is merged, or continues to be processed in parallel if the next transform object allows parallelization.

For detailed information on how the Degree of parallelism option works, please refer to the official documentation, SAP Data Services: Performance Optimization Guide. You should be very careful with this parameter. The usage and value of Degree of parallelism should depend on the complexity of the dataflow and on the resources available on your Data Services ETL server, such as the number of CPUs and the amount of memory used.

If the Use database links option is configured on both the database and Data Services datastore levels, database links can help to produce push-down operations. Use this option to enable or disable database links usage inside a dataflow.

Cache type defines which type of cache will be used inside a dataflow for caching datasets. A Pageable cache is stored on the ETL server's physical disk, and In-Memory keeps the cached dataset in memory. If the dataflow processes very large datasets, it is recommended that you use a pageable cache so as not to run out of memory.

Source table performance options

Open your dataflow in the workspace and double-click on any source table object to open the table configuration window.

Array fetch size allows you to optimize the number of requests Data Services sends to fetch the source dataset onto the ETL box. The higher the number used, the fewer the requests that Data Services has to send to fetch the data. This setting should depend on the speed of your network. The faster your network is, the higher the number you can specify to move the data in bigger chunks. By decreasing the number of requests, you can potentially also decrease the CPU usage on your ETL box.

Join rank specifies the "weight" of the table used in Query transforms when you join multiple tables. The higher the rank, the earlier the table will be joined to the other tables. If you have ever optimized SQL statements, you know that specifying big tables in the join conditions earlier can potentially decrease the execution time. This is because the number of records after the first join pair can be decreased dramatically through inner joins, for example. This makes the join pairs further on produce smaller datasets and run quicker. The same principle applies here in Data Services, but to specify the order of join pairs, you use the rank option.

Cache can be set up if you want the source table to be cached on the ETL server. The type of cache used is determined by the dataflow Cache type option.

Query transform performance options

Open the Query transform in the workspace window:

Join rank offers the same options as described earlier and allows you to specify the order in which the tables are joined.

Cache is, again, the same as described earlier and defines whether the table will be cached on the ETL server.

lookup_ext() performance options

Right-click on the selected lookup_ext function in the column mapping section, or on the function call in the output schema of the Query transform, and select Modify Function Call in the context menu:

Cache spec defines the type of cache method used for the lookup table. NO_CACHE means that, for every row in the main dataset, a separate SELECT lookup query is generated, extracting the value from the database lookup table. When PRE_LOAD_CACHE is used, the lookup table is first pulled to the ETL box and cached in memory or on the physical disk (depending on the dataflow Cache type option). DEMAND_LOAD_CACHE is a more complex method, best used when you are looking up repetitive values; only then is it most efficient. Data Services caches only values already extracted from the lookup table. If it encounters a new key value that does not exist in the cached table, it makes another request to the lookup table in the database to find it and then caches it too.
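The DEMAND_LOAD_CACHE behavior is essentially memoization. A hypothetical Python sketch (names and sample data are invented; the real caching happens inside the Data Services engine) shows why repetitive keys make it efficient:

```python
db_queries = 0  # counts simulated round trips to the database lookup table

PHONE_TABLE = {1: "111-1111", 2: "222-2222"}  # stands in for the database table

cache = {}

def demand_load_lookup(key):
    """Query the database only on a cache miss, then remember the value."""
    global db_queries
    if key not in cache:
        db_queries += 1                  # simulated SELECT round trip
        cache[key] = PHONE_TABLE.get(key)
    return cache[key]

# Six lookups over repetitive keys cost only two database queries.
results = [demand_load_lookup(k) for k in [1, 1, 2, 1, 2, 2]]
print(db_queries)  # 2
```

With mostly unique keys, the same logic degenerates to one query per row, which is why DEMAND_LOAD_CACHE pays off only on repetitive lookup values.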

Run as a separate process can be encountered in many other transform and object configuration options. It is useful when the transform is performing high-intensity operations consuming a lot of CPU and memory resources. If this option is checked, Data Services creates separate sub dataflow processes that perform this operation. Potentially, this option can help parallelize object execution within a dataflow and speed up processing and transformations significantly. By default, the OS creates a single process for a dataflow, and if not parallelized, all processing is done within this single OS process. Run as a separate process helps to create multiple processes, helping the main dataflow OS process to perform all extracts, joins, and calculations as fast as possible.

Target table performance options

Click on a target table to open its configuration options in the workspace window:

Rows per commit is similar to Array fetch size but defines how many rows are sent to a database within the same network packet. To decrease the number of packets with rows for insert sent to a database, you can increase this number.

Number of loaders helps to parallelize the loading processes. Enable partitions on the table objects on the Datastores tab if the tables are partitioned at the database level. If they are not partitioned, set the same number of loaders as Degree of parallelism.

Chapter 9. Advanced Design Techniques

The topics we will cover in this chapter include:

Change Data Capture techniques
Automatic job recovery in Data Services
Simplifying ETL execution with system configurations
Transforming data with the Pivot transform

Introduction

This chapter will guide you through advanced ETL design methods. Most of them will utilize Data Services features and functionality already explained in the previous chapters. As you have probably noticed, there are many ways to do the same thing in Data Services. The methods and logic you apply to solve a specific problem often depend on environment characteristics and some other conditions, such as development resources and extract requirements applied to the source systems. In contrast, some of the methods and techniques explained further do not depend on all these factors and could be considered ETL development best practices.

In this chapter, we will discuss a very popular method of populating slowly changing dimensions in a data warehouse, which requires the use of a combination of Data Services transforms and dataflow design techniques.

We will also review the automatic recovery methods available in Data Services, which allow you to easily restart previously failed jobs without performing extra recovery steps for various components of ETL code and the underlying target data structures.

Another topic discussed in this chapter is the usage of system configurations in Data Services. This feature allows you to simplify your ETL development and makes it easy to run the same jobs against various source and target environments.

Finally, we will review one of the advanced Data Services transforms that enables you to implement the pivoting transformation method on the passing data, converting rows into columns and vice versa.

Change Data Capture techniques

Change Data Capture (CDC) is the method of developing ETL processes to propagate changes in the source system into your data warehouse's dimension tables.

Getting ready

CDC is directly related to another DWH concept, Slowly Changing Dimensions (SCD): dimension tables whose data changes constantly throughout the life of the data warehouse.

A good example would be the Employee dimension table, which holds data on the employees in your company. As you can imagine, this table is in constant flux: new employees are hired and some employees leave the company, change positions and roles, or even transfer between departments. All these changes have to be propagated to the Employee dimension table in the DWH from the source systems, which always store only the latest state of the Employee data. In the DWH, in most cases, for most of the dimension tables, you want to keep the historical data to be able to derive the state of the Employee data at a specific point of time in the past. That is why SCD tables have extra fields to accommodate historical data and can be populated using various methods, depending on their type.

There are many different types of SCD tables, but we will quickly discuss only the three main ones, as the rest are just combinations of these three. We will refer to SCD type numbers according to Ralph Kimball's methodology in brackets.

As an example, let's take the case of the Employee dimension table when one employee, John, gets transferred from Marketing to Finance.

No history SCD (Type 1)

A no history SCD table is one that does not store historical data at all. Records are inserted (new records) and updated (changes). Take a look at the following example.

The original record for John looks like this:

ID   NAME   DEPARTMENT
1    John   Marketing

Here's what the new record looks like after the changes are applied:

ID   NAME   DEPARTMENT
1    John   Finance

This type of SCD does not keep historical records at all; as you can see, there is no information that John has ever worked in a different department.

Limited history SCD (Type 3)

A limited history table uses extra fields in the same record to keep the current value and a previous value, as shown here:

ID   NAME   DEP_PREV    DEP_CUR   EFFECTIVE_DATE
1    John   Marketing   Finance   27/02/2015

It is "limited" as you have to add extra columns for every new "historical state" of the row. In the preceding example, you can keep track of only the current and previous values of the record.

Unlimited history SCD (Type 2)

Unlimited history is possible if you create multiple records for each entity. Only one record represents the current value. One of the variations of an unlimited history SCD is shown in the following table:

KEY   ID   NAME   DEPARTMENT   START_DT     END_DT       CUR_FLAG
1     1    John   Marketing    1582/01/01   27/02/2015   N
2     1    John   Finance      27/02/2015   9999/12/31   Y

The ID is a natural key in the dimension table. For John, this is 1. This type of SCD requires the creation of a surrogate key to define the uniqueness of the record. The CUR_FLAG field defines the current record. The START_DT and END_DT columns show the period of time when the record was valid/current. Note that these date fields do not represent any business value such as the start employment date or date of birth. They just show the start and end dates of the period when the record was valid (or current) and are only used to accommodate the preservation of historical records. When populating initial records for the first time in an SCD table, you may often want to use dates from the distant past and future, such as 1582/01/01 and 9999/12/31, called "low" and "high" date values. This allows users to run reports which retrieve more accurate historical information.

By using a low date in the START_DT field, we mark the record as an initial historical record in our dimension table. The same goes for using a high date in the END_DT column: such a record always has its CUR_FLAG set to Y and shows the latest (current) record in the history table.

Each time you make a change to the Employee table, in our case to the NAME or DEPARTMENT fields, you have to update the "current" record by changing the END_DT and CUR_FLAG field values with the date of change and N, respectively, and you also have to insert a new record with START_DT set to the date of change and CUR_FLAG set to Y.
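The update/insert pair described above can be sketched in a few lines of Python (a simplified, hypothetical model of the logic the History_Preserving transform performs; the field names follow the table above):

```python
HIGH_DATE = "9999/12/31"

def apply_scd2_change(history, natural_id, new_values, change_date):
    """Close the current record and insert a new current one (SCD Type 2)."""
    for rec in history:
        if rec["ID"] == natural_id and rec["CUR_FLAG"] == "Y":
            rec["END_DT"] = change_date   # close out the current record
            rec["CUR_FLAG"] = "N"
    new_key = max(r["KEY"] for r in history) + 1  # surrogate key
    history.append({"KEY": new_key, "ID": natural_id, **new_values,
                    "START_DT": change_date, "END_DT": HIGH_DATE,
                    "CUR_FLAG": "Y"})

employees = [{"KEY": 1, "ID": 1, "NAME": "John", "DEPARTMENT": "Marketing",
              "START_DT": "1582/01/01", "END_DT": HIGH_DATE, "CUR_FLAG": "Y"}]

apply_scd2_change(employees, 1,
                  {"NAME": "John", "DEPARTMENT": "Finance"}, "27/02/2015")

current = [r for r in employees if r["CUR_FLAG"] == "Y"]
print(current[0]["DEPARTMENT"])  # Finance
```

After the call, the table holds exactly the two rows shown in the Type 2 example: the closed Marketing record and the new current Finance record.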

In this recipe, we will build a dataflow that populates an SCD table of the unlimited history type (as shown in the Type 2 example). Data Services has a special transform object called History_Preserving, which allows the automatic update/insert of the changed and new history records.

How to do it…

To build the CDC process, which will update our target SCD table in the data warehouse from a source OLTP table, we need two dataflows. The first will extract data from the source OLTP system into a staging table located in the STAGE database, and the second will use this STAGE table to compare its data with the target table contents and will produce the history records (in the form of INSERT and UPDATE SQL statements) to propagate the data changes into the target SCD table.

1. Create a new job and a new extract dataflow, DF_OLTP_Extract_STAGE_Employee, that extracts the Employee table from the HumanResources schema into a staging table, STAGE_EMPLOYEE.

For our future Employee SCD table, we will only be extracting the following list of fields from OLTP.EMPLOYEE:

Field               Description
BUSINESSENTITYID    Primary key for Employee records
NATIONALIDNUMBER    Unique national ID
LOGINID             Network login
ORGANIZATIONLEVEL   The depth of the employee in the corporate hierarchy
JOBTITLE            Work title
BIRTHDATE           Date of birth
MARITALSTATUS       M = Married, S = Single
GENDER              M = Male, F = Female
HIREDATE            Employee hired on this date
SALARIEDFLAG        Job classification
VACATIONHOURS       Number of available vacation hours
SICKLEAVEHOURS      Number of available sick leave hours

Map only these fields to the output schema of the Extract Query transform.

2. Create a new dataflow, DF_STAGE_Load_DWH_Employee, and link the first extract dataflow to it in the same job.

3. Create an empty target SCD table, EMPLOYEE, by using the CREATE TABLE statement in SQL Server Management Studio when connected to the AdventureWorks_DWH database:

CREATE TABLE [dbo].[EMPLOYEE] (
    [ID] [decimal](22, 0) NULL,
    [BUSINESSENTITYID] [int] NULL,
    [NATIONALIDNUMBER] [varchar](15) NULL,
    [LOGINID] [varchar](256) NULL,
    [ORGANIZATIONLEVEL] [int] NULL,
    [JOBTITLE] [varchar](50) NULL,
    [BIRTHDATE] [date] NULL,
    [MARITALSTATUS] [varchar](1) NULL,
    [GENDER] [varchar](1) NULL,
    [HIREDATE] [date] NULL,
    [SALARIEDFLAG] [int] NULL,
    [VACATIONHOURS] [int] NULL,
    [SICKLEAVEHOURS] [int] NULL,
    [START_DT] [date] NULL,
    [END_DT] [date] NULL,
    [CUR_FLAG] [varchar](1) NULL
) ON [PRIMARY]

4. Import the EMPLOYEE table created in the previous step into the DWH datastore.
5. Open the DF_STAGE_Load_DWH_Employee dataflow in the workspace window to edit it and add the required transformations, as shown in the following figure.

These steps explain the configuration of each of the DF objects we just used:

1. The Query transform is used to create an extra field, START_DT, of the date data type. It will be used by the History_Preserving transform to produce the start date of the history record in the target SCD table.

2. The Table_Comparison transform is used to compare the dataset from the STAGE_EMPLOYEE table to the target SCD table dataset in order to produce rows of the INSERT type (to create records which do not exist in the target but do exist in the source according to the specified key columns) and rows of the UPDATE type (source records whose ID column exists in the target table, which will be used to provide new values for non-key fields). The input primary key column we specify for Table_Comparison to determine whether the record exists in the comparison table is BUSINESSENTITYID. The rest of the source columns go into the Compare columns section, as we want to use all of them to determine if any value in any of these fields has changed.
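A simplified, hypothetical model of what Table_Comparison produces (the real transform also handles DELETE rows and generated keys; the sample data here is invented):

```python
def table_comparison(source_rows, target_rows, key, compare_cols):
    """Classify source rows as INSERT or UPDATE against the target dataset."""
    target_by_key = {r[key]: r for r in target_rows}
    ops = []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            ops.append(("INSERT", row))          # not in target yet
        elif any(row[c] != existing[c] for c in compare_cols):
            ops.append(("UPDATE", row))          # a compared value changed
    return ops

source = [{"BUSINESSENTITYID": 1, "JOBTITLE": "CEO"},
          {"BUSINESSENTITYID": 999, "JOBTITLE": "Engineer"}]
target = [{"BUSINESSENTITYID": 1, "JOBTITLE": "Chief Executive Officer"}]

ops = table_comparison(source, target, "BUSINESSENTITYID", ["JOBTITLE"])
print([op for op, _ in ops])  # ['UPDATE', 'INSERT']
```

These classified rows are exactly what the downstream History_Preserving transform consumes to generate the history records.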

3. The History_Preserving transform works in tandem with Table_Comparison to produce "history" records, updating the additional START_DT, END_DT, and CUR_FLAG fields along with the rest of the non-key fields, or to create new history records for the INSERT type of rows defined by the previous Table_Comparison.

The Compare columns section should have the same list of comparison columns as in the previous Table_Comparison transform. You can also control which format will be used as a high date (9999.12.31) and which values will be used in the Current flag field.

4. The Key_Generation transform generates surrogate unique keys in the ID field for our history SCD table EMPLOYEE, as BUSINESSENTITYID will no longer represent the uniqueness of the record if multiple history rows are created for the same employee.

5. Save and execute the job initially to populate the target SCD table with the initial dataset. After running the job, if you check the contents of the target table, you will see that it represents the same dataset as in the OLTP.Employee table but with the extra start/end date columns populated.

Note

Note that, as this is the initial dataset, no history records have been created for any employee. Thus, the BUSINESSENTITYID column still has unique values in this dataset.

6. Let's generate some history records in our target SCD table. To do that, we have to make changes to the source OLTP table by executing the following statements in SQL Server Management Studio when connected to the AdventureWorks_OLTP database:

select * from HumanResources.Employee where BusinessEntityID in (1, 999);

insert into HumanResources.Employee
(BusinessEntityID, NationalIDNumber, LoginID, OrganizationNode, JobTitle, BirthDate, MaritalStatus, Gender,
HireDate, SalariedFlag, VacationHours, SickLeaveHours)
values
(999, '999999999', 'domain\johnny', null, 'Engineer', '1982-01-01', 'S', 'M', SYSDATETIME(), 1, 99, 10);

update HumanResources.Employee set JobTitle = 'CEO'
where BusinessEntityID = 1;

7. Now run the job a second time and check the contents of the target SCD table, EMPLOYEE, for the employees with BUSINESSENTITYID set to 1 and 999.

How it works…

Another important thing we have to discuss before we explain in detail how this CDC dataflow works is the difference between the different types of CDC architecture.

There are two basic types of CDC methods, or methods allowing you to populate SCD tables. They are usually called source-based CDC and target-based CDC. You can use either of them, or even both of them simultaneously, to populate any type of SCD table. They differ only in how changes in the source data are determined.

So, imagine that you have, on one hand, the populated Employee DWH dimension table (which has not been updated for a couple of days) and, on the other, the source Employee OLTP table (which might or might not be different from the target DWH table's current snapshot of employee data).

Source-based ETL CDC

This method allows you to determine which employee records have had their values changed since the last time you updated the SCD dimension table in your data warehouse just by looking at the source Employee table. For this to work, the source table should have the MODIFY_DATE and CREATE_DATE fields in it, updated with the current date/time each time a record in the source Employee table gets updated or created (if it is a new employee record).

Another component required for source-based CDC is the date/time when the Employee table was last migrated to populate the DWH table (usually stored in an ETL log table and extracted into a variable, $v_last_update_date).

So, each time you perform an extraction of the source Employee table, you add a filtering condition, such as SELECT * FROM EMPLOYEE WHERE MODIFY_DATE >= $v_last_update_date OR CREATE_DATE >= $v_last_update_date. This allows you to extract significantly fewer records from the source system, increasing the ETL processing speed and decreasing your CPU, memory, and network resource consumption.
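The delta-extract filter can be sketched as follows (a hypothetical in-memory stand-in for the source table with invented sample data; in the real job, the condition is pushed down as the WHERE clause shown above):

```python
from datetime import date

# Pretend source Employee table with audit columns.
employees = [
    {"id": 1, "create_date": date(2015, 1, 5), "modify_date": date(2015, 2, 27)},
    {"id": 2, "create_date": date(2014, 3, 1), "modify_date": date(2014, 3, 1)},
    {"id": 3, "create_date": date(2015, 2, 26), "modify_date": date(2015, 2, 26)},
]

def extract_delta(rows, last_update_date):
    """Keep only rows created or modified since the last successful load."""
    return [r for r in rows
            if r["modify_date"] >= last_update_date
            or r["create_date"] >= last_update_date]

# last_update_date would be read from the ETL log table ($v_last_update_date).
delta = extract_delta(employees, date(2015, 2, 26))
print([r["id"] for r in delta])  # [1, 3]
```

Only the changed and newly created rows flow into the downstream dataflow; the untouched bulk of the table never leaves the source system.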

Then, in the dataflow that populates the target SCD table in the DWH, you determine whether each record is a new or updated record by checking the MODIFY_DATE and CREATE_DATE values. With the Map_Operation transform, change the record operation type to either INSERT or UPDATE to send the records to the History_Preserving transform for history record generation.

Target-based ETL CDC

In target-based CDC, the whole source table is extracted and each extracted record is then compared with each target SCD table record. Data Services has an excellent transformation object, Table_Comparison, which performs this operation, producing INSERT/UPDATE/DELETE records and sending them to the History_Preserving transform for history record generation.

Needless to say, pure target-based CDC is a resource- and time-consuming method, the main advantage of which is the simplicity of implementation. So, why not mix them together to get the speed of source-based CDC, extracting fewer records, and the simplicity of target-based CDC, using only two transforms, Table_Comparison and History_Preserving, to determine the rows for INSERT and UPDATE and to prepare the history rows which will be sent to the target SCD table?

In the steps of this recipe, we implemented a pure target-based CDC method. The following screenshot shows one of the possible ways (in a very simplistic form) to update our target-based CDC to utilize the techniques of the source-based CDC method, so that the dataset determined for extraction contains only the changed data:

The initial script here uses the log table CDC_LOG to extract the date when the data was last successfully extracted and applied to the target SCD table.

The CDC_LOG table has only one field, EXTRACT_DATE, and always contains a single record showing when the CDC process was last executed. We read this value before running our CDC dataflows and update it right after the successful execution of all CDC dataflows.

The final script updates the log table with the current time, so when the job is executed the next time, it will only extract records that have been modified since that date.
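The read-then-advance watermark cycle around the CDC dataflows can be sketched as follows (a simplified simulation; the real initial and final scripts would run SQL against the CDC_LOG table, and "now" would come from sysdate()):

```python
from datetime import datetime

# One-row log table: when did the CDC process last run successfully?
cdc_log = {"EXTRACT_DATE": datetime(2015, 5, 1)}
processed = []

def run_cdc_job(now):
    # Initial script: read the watermark before the CDC dataflows run.
    v_last_update_date = cdc_log["EXTRACT_DATE"]
    # CDC dataflows would extract rows with dates >= the watermark here.
    processed.append(v_last_update_date)
    # Final script: advance the watermark only after everything succeeded.
    cdc_log["EXTRACT_DATE"] = now

run_cdc_job(datetime(2015, 6, 10))
```

Because the watermark is advanced only after a successful run, a failed run leaves EXTRACT_DATE untouched and the next execution safely re-extracts the same window.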

There are many variations of source-based CDC implementation. They all depend on how often data is extracted, whether there is a MODIFIED_DATE column on the source table, how intensively the source table is updated with new values, and so on.

The main idea here is to extract as few records as possible without losing any of the changes made to the source table.

Native CDC

Some databases, such as MS SQL Server and Oracle, have native CDC functionality, which can be enabled for specific tables. When any DML operations are performed on the table contents, the database updates its internal CDC structures, logging when and how the table records were last updated. Data Services can utilize this native CDC functionality provided by the database. This configuration is done at the datastore level, using datastore options when you create the datastore object.

Using this functionality allows you to always select only the changed records from the source database tables.

We will not discuss the details of using native CDC in Data Services, but you can consider this task as good homework practice and try to create your own CDC dataflows. Just do not forget that CDC has to be enabled at the database level first, before you make any configuration changes on the Data Services side and start developing ETL.

Automatic job recovery in Data Services

The recovery process usually kicks in when Data Services jobs fail. A failed job, in most cases, means that some part of it has completed successfully and some part has not. A job that fails right at the very beginning is rarely a problem and is of little concern for recovery, as all you have to do is start it again.

Complications arise when the job fails in the middle of an insert into a target table, for example. Cases like that require you to either delete the already inserted records or even recover a copy of the table from a backup using database recovery methods.

Recovery and error handling are an important part of robust ETL code. In this recipe, we will take a look at the methods used to develop ETL in Data Services and the functionality available in the software to make sure that the process of resuming failed processes goes as smoothly as possible.

The automatic job recovery feature available in Data Services does not fix problems with partially inserted data or missing keys (when an insert into a fact table cannot find the related key values in the referenced dimension tables because they have not been properly populated after the last job failure). Nor does this feature protect you from poor ETL design or development errors when, for example, your ETL migration process does an automatic conversion of data between incompatible data types. In that case, it is your job to develop your ETL in such a way that you can cleanse the data if necessary and do manual conversions, making sure that you can either convert the value in the field between data types or set the row with this value aside to investigate or deal with later.

The automatic job recovery feature simply tracks the execution statuses of all dataflow and workflow objects within a job, and if the job fails, it allows you to restart the job without rerunning the successfully completed processes.

Let's see how it works.

Getting ready

We will use the job from the previous recipe. This job contains two dataflows: an extract of the Employee table from the OLTP source database into the staging area, and the load of the data from the staging table into the target data warehouse history table, Employee.

We have to emulate a failed process. To do that, we will drop the target dimension table populated by the second dataflow.

First of all, generate a CREATE TABLE statement from the table dbo.EMPLOYEE using SQL Server Management Studio. Do this by right-clicking on the table object and selecting Script Table As | CREATE To | New Query Editor Window on the context menu, so that you can later recreate a table with the same definition without any difficulties. Save this code on your physical drive for later use to recreate the table:

CREATE TABLE [dbo].[EMPLOYEE](
    [ID] [decimal](22, 0) NULL,
    [BUSINESSENTITYID] [int] NULL,
    [NATIONALIDNUMBER] [varchar](15) NULL,
    [LOGINID] [varchar](256) NULL,
    [ORGANIZATIONLEVEL] [int] NULL,
    [JOBTITLE] [varchar](50) NULL,
    [BIRTHDATE] [date] NULL,
    [MARITALSTATUS] [varchar](1) NULL,
    [GENDER] [varchar](1) NULL,
    [HIREDATE] [date] NULL,
    [SALARIEDFLAG] [int] NULL,
    [VACATIONHOURS] [int] NULL,
    [SICKLEAVEHOURS] [int] NULL,
    [START_DT] [date] NULL,
    [END_DT] [date] NULL,
    [CUR_FLAG] [varchar](1) NULL
) ON [PRIMARY]

Then, execute the following command to drop the table:

DROP TABLE [dbo].[EMPLOYEE]

How to do it…

1. Open Job_Employee in the workspace window and execute it.
2. On the Execution Properties window, check the Enable recovery option. This option enables execution status logging of the workflow and dataflow objects within the job.

3. The first dataflow executes successfully, but the second one fails straight away with an error message from the Key_Generation transform, which sends the SQL statement SELECT max(ID) FROM dbo.EMPLOYEE in order to get the latest key value from the target table.

4. Now, restore the missing table object by executing the previously saved CREATE TABLE command in SQL Server Management Studio.

5. Execute the job again, but this time select the Recover from last failed execution option in the Execution Properties window.

6. The trace log states that DF_OLTP_Extract_STAGE_Employee was successfully recovered from the previous job execution.

How it works…

The automatic recovery feature works only if you enable the flag in the job execution options window that turns on the object status logging mechanism. If you have not enabled it before your job fails, you cannot use the automatic recovery feature.

A very important thing to do before running the job again in recovery mode is to check why the job failed. If the job failed in the middle of populating one of the tables (dimension or fact), you have to understand the impact of running the same load process again without cleaning up the already inserted records first.

In our recipe, we simulated the failure of the load dataflow, which populates the target dimension table. As it has the Table_Comparison and History_Preserving transforms, it is not a problem to execute it again on the same dataset without any preparatory steps. Records that have already been inserted simply will not be considered by Table_Comparison for either INSERT or UPDATE and will be ignored, so it is safe for us to just restart the job in recovery mode.

Note

Always consider the type of failure, the nature of your data, and how it is populated by your ETL before restarting the job in recovery mode, to prevent inserting duplicates into your target tables or referencing missing key values.

The workflow object can group several child objects placed inside it into a single recovery transactional unit by using the Recover as a unit option. This is useful when several of your dataflow objects work as a single unit to populate a specific target table by preparing data at a specific point in time. In that case, if one of these dataflows fails, you want to execute the whole sequence of dataflows from the beginning. Otherwise, Data Services will execute the job in the default recovery mode, skipping all previously completed dataflows and workflows.

To use this ability, place both dataflow objects into a single workflow. Open the workflow properties and check the Recover as a unit option.

The workflow icon will be marked in the workspace window with a green arrow and a small black cross, so that you can visually differentiate which parts of your code behave as a transactional unit during the recovery process.

Note

Note that script objects are not considered by recovery mode, as they are part of the parent workflow object. You should keep that in mind before rerunning the job in recovery mode.

There's more…

Of course, the best way to make your life easier is to prevent the need for job recovery in the first place. One technique that can be implemented to prevent possible problems with data recovery and job rerun complications is putting extra code in a try-catch block. This code can be a set of scripts that perform a table clean-up with a consequent "clean" failure, so the job can simply be rerun without extra considerations or preparatory steps; it could even be an alternative workflow that processes the data with a different method from the original one that failed.

For example, if you use a dataflow that loads a flat file into a table, you can wrap it in a try-catch block. If it fails, execute another dataflow from the catch block that tries to read the file again, but from a different location or using a different method.
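That try-catch fallback pattern can be sketched as follows (hypothetical file paths; in Data Services the try and catch blocks would wrap the dataflow objects themselves rather than function calls):

```python
import os
import tempfile

def load_file(path):
    # Placeholder for a dataflow that loads a flat file into a table.
    with open(path) as f:
        return f.read().splitlines()

def load_with_fallback(primary, fallback):
    # "Try" block: run the normal dataflow; on failure, run the
    # alternative dataflow from the "catch" block.
    try:
        return load_file(primary)
    except OSError:
        return load_file(fallback)

# Demonstration: the primary location is missing, so the fallback is used.
fallback_path = os.path.join(tempfile.mkdtemp(), "employees.csv")
with open(fallback_path, "w") as f:
    f.write("1,Ann\n2,Bob\n")

rows = load_with_fallback("/path/that/does/not/exist.csv", fallback_path)
```

The job still completes cleanly, which is exactly the point: no recovery run is needed afterwards.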

Simplifying ETL execution with system configurations

Working with multiple source and target environments is very common. The development of ETL processes by accessing data directly from the production system happens very rarely. Most of the time, multiple copies of the source system database are created to provide a working environment for ETL developers.

Basically, the development environment is an exact copy of the production environment, with the only difference being that the development environment holds an old snapshot of the data, or test data in smaller volumes for quick test job execution.

So, what happens after you create a datastore object, import all required tables from the database into it, and finish developing your ETL? You have to switch to the production environment.

Data Services provides a very convenient way of storing multiple datastore configurations in the same datastore object, so you do not need to edit datastore options each time you want to extract from either the production or development database environment. Instead, you can create multiple configurations, each using different credentials and database connection settings, and quickly switch between them when executing a job. This allows you to touch datastore settings only once instead of changing them each time you want to run your job against a different environment.

Getting ready

To implement the steps in this recipe, we will need to create a copy of the AdventureWorks_DWH database. Our sample database copy is named DWH_backup. Use any preferred SQL Server method to copy the contents of AdventureWorks_DWH into DWH_backup. The quickest way is to back up the database using the standard SQL Server methods available in the database object context menu, and then restore this backup copy as a database with the new name.

How to do it…

There is no need to create a separate datastore object for DWH_backup or change the DWH datastore configuration options each time we want to extract from either AdventureWorks_DWH or DWH_backup. Let's just create two configurations for our DWH datastore.

1. Go to Local Object Library | Datastores.
2. Right-click on the DWH datastore and select Edit… from the context menu.
3. On the Edit Datastore DWH window, click on Advanced << to open the advanced configuration part, and then click on the Edit… button next to the Configurations: label.

4. In the top-left corner of the Configurations for Datastore DWH window, you can see four buttons that allow you to create a new configuration, duplicate the currently chosen one, and rename or delete configurations. Use them to rename the currently used configuration to DWH_Production and create a new configuration, DWH_Development.

5. Change the new DWH_Development configuration to be the default configuration by setting Default configuration to Yes. Note that this value automatically changes to No in the other configurations.

6. Change the Database name or SID or Service Name option setting for DWH_Development to DWH_backup to point this configuration to the other database.

There is no need to change the other options, as they will be identical for both configurations.

7. Now let's create system configurations so that we can choose the configuration setup when we run the job, without the need to edit the datastore's Default configuration option. Go to Tools | System Configurations… and create two system configurations: Development and Production.

8. For the DWH record, set Development to DWH_Development and Production to DWH_Production.

9. Click on OK to save the changes.

How it works…

Using configurations enables you to quickly switch between environments without the need to modify connectivity and configuration settings inside a datastore object.

System configurations extend the usability of datastore configurations even further by allowing you to select the combination of environments right at job execution time.

Note

For the system configuration functionality to work, datastore configurations have to be created first.

Do you want to be able to extract from the production OLTP source but insert into the development DWH target within the same job, without changing the ETL code or datastore settings? Just create a new system configuration that includes the required combination of datastore configurations, and execute the job with that system configuration specified.

Now, if you execute the Job_Employee job, just select the desired configuration in the job execution options:

Use the Browse… button to review all the system configurations created, if necessary.

Transforming data with the Pivot transform

The Pivot transform belongs to the Data Integrator group of transform objects, which are usually all about the generation or transformation (changing the structure) of data. Simply put, the Pivot transform allows you to convert columns into rows. The pivoting transformation increases the number of rows in the dataset: for each column converted into a row, an extra row is created for every combination of key (non-pivoted) columns. Converted columns are called pivot columns.

Getting ready

Run the following SQL statements against the AdventureWorks_OLTP database to create a source table and populate it with data:

create table Sales.AccountBalance (
    [AccountID] integer,
    [AccountNumber] integer,
    [Year] integer,
    [Q1] decimal(10,2),
    [Q2] decimal(10,2),
    [Q3] decimal(10,2),
    [Q4] decimal(10,2));

-- Row 1
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (1, 100, 2015, 100.00, 150.00, 120.00, 300.00);

-- Row 2
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (2, 100, 2015, 50.00, 350.00, 620.00, 180.00);

-- Row 3
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (3, 200, 2015, 333.33, 440.00, 12.00, 105.50);

The source table should look like the following figure:

Do not forget to import it into the Data Services OLTP datastore.

How to do it…

1. Create a new dataflow and name it DF_OLTP_Pivot_STAGE_AccountBalance.
2. Open the dataflow in the workspace window to edit it, and place in it the source table ACCOUNTBALANCE from the OLTP datastore created in the Getting ready section of this recipe.

3. Link the source table to the Extract Query transform, and propagate all source columns to the target schema.

4. Place a new Pivot transform object into the dataflow and link the Extract Query to it. The Pivot transform can be found in Local Object Library | Transforms | Data Integrator.

5. Open the Pivot transform in the workspace to edit it, and configure its parameters according to the following screenshot:

6. Close the Pivot transform and link it to another Query transform named Prepare_to_Load.

7. Propagate all source columns to the target schema of the Prepare_to_Load transform, and finally link it to the target ACCOUNTBALANCE template table created in the DS_STAGE datastore and STAGE database.

8. Before executing the job, open the Prepare_to_Load Query transform in the workspace window, double-click on the PIVOT_SEQ column, and check Primary key to specify this additional column as a primary key column for the migrated dataset.

9. Save and run the job.
10. Open the dataflow again in the workspace window and import the target table by right-clicking on it and selecting Import table from the table context menu.

11. Open the target table in the workspace window to edit its properties, and select the Delete data from table before loading flag on the Options tab.

12. Your dataflow and Prepare_to_Load Query transform mapping should now look like the following screenshot:

How it works…

Pivot columns are the columns whose values are merged into one column, the pivoting operation producing an extra row for each pivoted column. Non-pivot columns are the columns not affected by the pivot operation. As you can see, the pivoting operation denormalizes the dataset, generating more rows. This is why ACCOUNTID no longer defines the uniqueness of the record, and why we had to specify the extra key column PIVOT_SEQ.

You might ask: why pivot? Why not just use the data as is and perform the required operations on the data from columns Q1-Q4?

The answer in the given example is very simple. It is much more difficult to perform an aggregation when the amounts are spread across different columns. Instead of summarizing a single column with the sum(AMOUNT) function, we would have to write the expression sum(Q1+Q2+Q3+Q4) every time. Quarters are not the worst case yet. Try to imagine a table with amounts stored in columns defining month periods, or having to filter by those time periods.
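What the Pivot transform does to the AccountBalance rows can be sketched in Python (a simplified model; the QUARTER/AMOUNT output names and PIVOT_SEQ starting at 1 are assumptions for illustration):

```python
def pivot(rows, non_pivot, pivot_cols, header="QUARTER", data="AMOUNT"):
    """Convert pivot columns into rows: one output row per key/column pair."""
    out = []
    for row in rows:
        for seq, col in enumerate(pivot_cols, start=1):
            rec = {k: row[k] for k in non_pivot}  # keep non-pivot columns
            rec["PIVOT_SEQ"] = seq                # extra key column
            rec[header] = col                     # header column: which quarter
            rec[data] = row[col]                  # the pivoted value
            out.append(rec)
    return out

rows = [{"AccountID": 1, "Year": 2015,
         "Q1": 100.0, "Q2": 150.0, "Q3": 120.0, "Q4": 300.0}]
pivoted = pivot(rows, ["AccountID", "Year"], ["Q1", "Q2", "Q3", "Q4"])

# Aggregation now becomes a simple sum over one column:
total = sum(r["AMOUNT"] for r in pivoted)
```

One source row becomes four output rows, and the yearly total is a plain sum over a single AMOUNT column instead of sum(Q1+Q2+Q3+Q4).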

Of course, contrary cases exist as well, when storing data across multiple columns instead of just one is justified. In these cases, if your data structure is not like that, you can use the Reverse_Pivot transform, which does exactly the opposite: converting rows into columns. Look at the example of the Reverse_Pivot configuration given here:

Reverse pivoting, or the transformation of rows into columns, introduces another term: the pivot axis column. This is the column that holds the categories defining the different columns after the reverse pivot operation. It corresponds to the Header column option in the Pivot transform configuration.

Chapter 10. Developing Real-time Jobs

The recipes and topics that will be discussed in this chapter are as follows:

Working with nested structures
The XML_Map transform
The Hierarchy_Flattening transform
Configuring Access Server
Creating real-time jobs

Introduction

In all previous chapters, we have worked with batch-type job objects in Data Services. As we already know, a batch job in Data Services helps to organize ETL processes so that they can be started on demand or scheduled to be executed at a specific time, either once or regularly.

The main difference between a real-time job and a batch job is the way these two job objects are executed by the Data Services engine. The purpose of a real-time job is to process requests and provide responses. So, technically, a real-time job could be running for hours, days, or even weeks without actually processing any data. The Data Services engine executes the ETL code within the real-time job object only when a new request comes from an external service. Data Services uses this request message as the data source, processes the data, and sends the processed data back to the external service in the form of a response message.

A new Data Services component called Access Server now comes into the frame. Access Server plays the role of a messenger servicing real-time jobs. It is Access Server that accepts the messages used as source data for real-time jobs and sends the resulting messages back.

In this chapter, we will also review the concepts of nested structures, and how and when they are commonly used. The main reason for this is that real-time jobs often use XML technology to receive requests and send the responses back, and the XML format is commonly used to exchange nested data structures.

We will also see how to create and configure Access Server to be able to use the real-time job functionality, and, finally, we will create a real-time job itself.

Working with nested structures

Earlier in this book, we worked solely with flat structures: rows extracted from database tables and inserted back into a database table, or exported to a flat text file. In this recipe, we will take a look at how to prepare nested data structures inside a dataflow and then export them into an XML file, as XML is a simple and very convenient way to store nested data and is most commonly used for source and target objects in real-time jobs.

Getting ready

We will not need an XML file prepared for this recipe, as we are going to generate it automatically with the help of Data Services from datasets stored in our relational databases: OLTP and DWH.

We will construct a nested data structure of a job title list, where each record (job title) will reference a list of the employees who have that job title in the OLTP system.

The following is the visual presentation of this nested data structure:

In a flat data structure, these would be two different tables, and we would need reference key columns in both tables linking them together in a parent-child relationship.

A nested data structure allows you to avoid reference keys completely. In other words, we do not really need JobTitleID in order to link these two tables together. A list of employees will literally be stored in the same dataset, in one of the fields of the specific job title record.

We will source the list of job titles from the HumanResources.Employee table of our OLTP database. Person data, such as first name and last name, will be sourced from the Person.Person table, which is linked to the Employee table by the BusinessEntityID column.
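The nesting we are about to build in the dataflow can be sketched as follows: a flat join of the two tables is grouped into one record per job title, with the employee rows stored inside a field of the parent record (sample names are hypothetical; the real data comes from the joined tables):

```python
# Hypothetical flat rows, as produced by joining Employee and Person
# on BUSINESSENTITYID.
flat = [
    {"JOBTITLE": "Buyer", "FIRSTNAME": "Ann", "LASTNAME": "Lee"},
    {"JOBTITLE": "Buyer", "FIRSTNAME": "Bob", "LASTNAME": "Kim"},
    {"JOBTITLE": "Engineer", "FIRSTNAME": "Eva", "LASTNAME": "Orr"},
]

nested = []
for job_id, title in enumerate(sorted({r["JOBTITLE"] for r in flat}), start=1):
    nested.append({
        "JOBTITLE_ID": job_id,
        "JOBTITLE": title,
        # The nested segment: a list of rows instead of a scalar value.
        "EMPLOYEES": [{"FIRSTNAME": r["FIRSTNAME"], "LASTNAME": r["LASTNAME"]}
                      for r in flat if r["JOBTITLE"] == title],
    })
```

Each parent record now carries its child rows directly, which is exactly the shape the XML target file will have.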

How to do it…

1. Create a new dataflow, DF_OLTP_XML, and open it in the workspace window for editing.
2. Import, if necessary, the two tables Person.Person and HumanResources.Employee into the OLTP datastore.
3. Place both tables in the dataflow DF_OLTP_XML as source table objects.
4. Place the Get_Person Query transform inside the workspace of DF_OLTP_XML and link it to the Person table object. Propagate three columns from the Person table to the output schema of the Query: BUSINESSENTITYID, FIRSTNAME, LASTNAME.

5. Create two Query transforms to get the data from the Employee table: Get_JobTitle_Person and Distinct_JobTitle.

Get_JobTitle_Person should select the dataset consisting of two columns: BUSINESSENTITYID and JOBTITLE.

Distinct_JobTitle should only select the JOBTITLE column.

6. In the Distinct_JobTitle Query Editor, tick the Distinct rows checkbox on the SELECT tab and set up ascending sorting on the JOBTITLE column on the ORDER BY tab.

7. Create the Gen_JobTitle_ID Query transform and link Distinct_JobTitle to it. This Query transform will be used to generate new unique identifiers for the distinct job title values.

8. Finally, join all three Query transforms together using another Join Query and propagate four columns to the output schema: JOBTITLE_ID, JOBTITLE, FIRSTNAME, LASTNAME.

9. Now that we have merged our data from multiple tables into one dataset, let's see what is required to convert this flat dataset into a nested one.

10. To do that, we have to split the flat data again, separating job titles from employee data. Both result datasets should have a reference key column, which will be used to define the relationships between the records.

Create two Query transforms, Q_JobTitle and Q_Person, propagating JOBTITLE_ID in both Query objects:

11. The nesting of the data happens in the Query transform object that is used to join the previously split datasets. Create the JobTitle_Tree Query transform and link it to both Q_JobTitle and Q_Person.

12. Open the JobTitle_Tree Query Editor in the workspace window.
13. Drag and drop JOBTITLE_ID and JOBTITLE from the Q_JobTitle input schema to the output schema.
14. Drag and drop the whole Q_Person input schema to the output schema. That will place the Q_Person table schema at the same level as the JOBTITLE_ID and JOBTITLE columns. Q_Person is now a nested segment inside the JobTitle_Tree schema.
15. Now, we can switch between output schemas by double-clicking on either JobTitle_Tree or Q_Person, or by right-clicking on a schema name and selecting Make current… from the context menu. That is necessary if you want to change settings on the Query transform tabs: Mapping, SELECT, FROM, WHERE, and so on. Those tabs are not shared by all nested output schemas, and only the "current" output schema's values are displayed.

16. Make the JobTitle_Tree output schema current and select the FROM tab. Make sure that only the Q_JobTitle checkbox is selected.
17. Now, make the Q_Person output schema current.
18. On the FROM tab, tick only the Q_Person checkbox.
19. On the WHERE tab, put the following filtering condition:

(Q_Person.JOBTITLE_ID = Q_JobTitle.JOBTITLE_ID)

20. Finally, we have to output our nested dataset into a proper target object that supports nested data. SQL Server does not support nested data, and that is why we will use an XML file as the target.

21. Select the Nested Schemas Template object from the right-side tool panel and place it as a target object linked to the last JobTitle_Tree Query transform.

22. Name the target object XML_target and open it in the workspace window for editing. Specify the following options:

23. Your dataflow should now look like the following figure:

How it works…

Data Services allows you to view the target data loaded by the last job run from the XML target object in the same way as for target database table objects, as shown in the following screenshot:

If you open the XML_target.xml file created in the C:\AW\Files\ folder, you will see a common XML structure:

XML is just a convenient example of an object that can store a nested data structure. Data Services has other target objects that can accept nested data, such as BAPI functions and IDoc objects, both used to extract and load data from and into SAP systems. These methods and concepts will be introduced in the next chapter.

Data Services also supports the JSON format as another source or target for nested data structures.

Nested data is often called hierarchical data, as it resembles a tree structure. If you imagine row fields to be leaves, then one of the leaves could be another tree (one row or multiple rows) stored inside a leaf section.

In other words, nesting data simply means mapping a source table as a column in the output object structure inside a dataflow.

In the previous chapters, we worked only with flat table or file data, where datasets consisted of multiple rows and each row consisted of multiple fields, each of which could only hold one value (decimal, character, date, and so on). Nested or hierarchical data allows you to reference another table inside a row field.

Note

Converting a flat dataset to a nested dataset normalizes it, as you do not have to duplicate parent fields for every child set of rows.

You can see how a nested table segment is displayed among the other parent columns. To define whether a nested structure can have multiple records for every parent record, you can right-click on the nested table segment and select the Repeatable menu option. Unselecting this option makes the nested segment a one-record segment and changes the icon of the nested table segment.

There's more…

Data Services has full support for nested data structures. In the steps of this recipe, we used the good old Query transform to generate them. In the next recipe, we will demonstrate how the same task can be implemented with the help of a special Data Services transform: the XML_Map transform.

The XML_Map transform

In the first recipe of this chapter, Working with nested structures, we built the nested structure with the help of the most universal transform in Data Services: the Query transform. The Query transform has the power to define column mappings, filter data, join datasets together, and merge data into nested segments. In fact, many transforms that you have used before, such as History_Preserving, Table_Comparison, Pivot, and others, can be substituted with a set of Query transforms. Of course, those would be complex ETL solutions requiring more development time; they would be harder to maintain and read and, most importantly, less efficient in terms of performance.

In this recipe, we will take a look at another transform, XML_Map, which does exactly the same task as performed in the previous recipe: it builds and transforms nested structures.

We will use the same source tables, PERSON.PERSON and HUMANRESOURCES.EMPLOYEE, to build a dataset of job titles with nested lists of employees.

Getting ready

We already have everything we need for this recipe: the two source tables, PERSON.PERSON and HUMANRESOURCES.EMPLOYEE, imported into our OLTP datastore.

How to do it…

1. Create a new job and a new dataflow, and open it in the workspace.
2. Place the two tables PERSON and EMPLOYEE from the OLTP datastore inside the dataflow as source tables.
3. Drag and drop the XML_Map transform from Local Object Library | Transforms | Platform into the dataflow workspace and link both source tables to it. When placing the transform in the workspace, choose the Normal mode option.
4. Left-click on XML_Map to open it in the workspace for editing.
5. First, build the parent data structure of job titles by mapping the JOBTITLE column from the EMPLOYEE source schema to the output XML_Map schema.
6. On the Iteration Rule tab, double-click on the iteration rule field and select the EMPLOYEE input schema.
7. On the DISTINCT tab, drag and drop the EMPLOYEE.JOBTITLE source column into the Distinct columns field.
8. On the ORDER BY tab, specify Ascending sorting by the EMPLOYEE.JOBTITLE source field, as shown in the following screenshot:

9. Now, add a nested dataset containing personal information. Drag the PERSON input schema to the output and make sure that it is added at the same level as the previously propagated JOBTITLE column.

10. Double-click on the output PERSON schema to make it current, or use Make current from the context menu by right-clicking on the output PERSON schema.

11. On the Iteration Rule tab, select the INNER JOIN iteration rule and add both source input schemas underneath it.

12. On the same Iteration Rule tab, in the On field, specify the join condition: PERSON.BUSINESSENTITYID = EMPLOYEE.BUSINESSENTITYID

13. On the WHERE tab, specify the join condition between the parent and nested datasets in the output schema: EMPLOYEE.JOBTITLE = XML_Map.JOBTITLE

14. Close the XML_Map Editor and link XML_Map to a Query transform object called Gen_JobTitle_ID, in which we will generate an ID column for the parent job title dataset.

Add the JOBTITLE_ID output column, as shown in the preceding screenshot, and put the mapping expression gen_row_num() for it on the Mapping tab.

15. After the Query transform, add the Nested Schemas Template object as a target object. Configure it as an XML type with the filename C:\AW\Files\XML_map.xml.

How it works…

The XML_Map transform properties are very similar to the Query transform properties, with a few exceptions where XML_Map has extra functionality that can be used to build nested data structures.

What makes the XML_Map transform a really powerful tool is the ability to join any source input datasets (it does not matter if they come from flat data sources or nested data structures) and iterate on the combined dataset, producing the required output results.

There are multiple types of join operations available:

* (cross join): This produces a Cartesian product of the joined datasets. In SQL, it is a normal INNER JOIN without a specified ON clause.
|| (parallel join): This is a non-standard SQL operation that basically concatenates the corresponding records from two joined datasets. See the example in the following figure:
INNER JOIN: A standard SQL operation, where you can specify the join condition in the On field.
LEFT OUTER JOIN: A standard SQL operation, where you can specify the join condition in the On field.
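The difference between the cross-join and the parallel-join operations can be sketched as follows (toy datasets; the parallel join pairs records by position, which is what the figure illustrates):

```python
from itertools import product

a = [{"x": 1}, {"x": 2}]
b = [{"y": "a"}, {"y": "b"}]

# * (cross join): Cartesian product, every record of a with every record of b.
cross = [{**ra, **rb} for ra, rb in product(a, b)]

# || (parallel join): concatenate corresponding records positionally.
parallel = [{**ra, **rb} for ra, rb in zip(a, b)]
```

The cross join yields 2 x 2 = 4 records, while the parallel join yields only 2, one per input position.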

In the previous steps of the recipe, we produced one hierarchical dataset with the help of XML_Map, which, in fact, contains two datasets: a parent dataset of distinct job titles sourced from the EMPLOYEE table, and a nested dataset of the personal information of the employees who hold each specific job title.

If we just sourced personal information from only the PERSON table, we would not be able to specify which personal information (FIRSTNAME and LASTNAME) belongs to which job title.

By providing a joined dataset for the personal information to iterate on, we could define the dependency for our nested structure by using the expression EMPLOYEE.JOBTITLE = XML_Map.JOBTITLE on the WHERE tab. This could be roughly translated as: build a dataset from the source tables containing the fields JOBTITLE, FIRSTNAME, and LASTNAME, and nest the records with the FIRSTNAME and LASTNAME fields inside the unique records of the output job title dataset by referencing the corresponding JOBTITLE column.

The final Query transform, which is used to generate an extra output column with a unique ID for the parent job title dataset, is quite simple. We have already produced an alphabetically sorted and unique list of job titles in our parent data structure, and all that is left is to generate sequential numbers for each record, which can easily be done with the help of the gen_row_num() function.

Note

Note how much more concise our ETL code has become with the use of the XML_Map transform, compared to the previous recipe where we built the same hierarchical dataset using only Query transform objects.

The Hierarchy_Flattening transform

Sometimes, hierarchical data is not represented by nested (hierarchical) data structures but is actually stored within a simple flat structure, in normal database tables or flat files. The simplest form of hierarchical relationship in data can be presented as a table that has two fields: parent and child.

Look at the example of a folder hierarchy on disk (as shown in the following figure). The structure on the left is visually simple to read and understand. You can easily see which is the root folder and which are the leaves, and you can easily highlight a specific branch you are interested in.

The table on the right is the simplest way to store hierarchical relationship data in a flat format. This structure is extremely hard to query with standard SQL. Some databases, such as Oracle, have special SQL clauses that can help to query hierarchical data in order to analyze it and present it in an understandable and clear way. However, those hierarchical SQL statements can be quite complex, and the majority of other databases do not support them at all, leaving you with the necessity of writing stored procedures in order to parse this hierarchical data and answer even the simplest question, such as: select all "children" for a specific "parent".

In this recipe, we will review the method available in Data Services to convert data from that simple flat hierarchical representation of parent-child relationships into a more efficient and easy-to-use data structure that can be queried with standard SQL. This can be done with the Hierarchy_Flattening transform.
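The core idea behind the vertical flattening mode can be sketched in Python: from direct parent-child pairs, derive every ancestor-descendant pair together with its depth, so "select all children for a specific parent" becomes a simple filter (a simplified sketch with hypothetical location names; the actual transform output columns and extra flags are covered in the recipe steps):

```python
def flatten_vertical(pairs):
    """Expand direct parent-child links into all ancestor-descendant pairs."""
    children = {}
    for parent, child in pairs:
        children.setdefault(parent, []).append(child)

    result = []
    def walk(ancestor, node, depth):
        result.append((ancestor, node, depth))
        for c in children.get(node, []):
            walk(ancestor, c, depth + 1)

    # Every direct link seeds a walk down its subtree.
    for parent, child in pairs:
        walk(parent, child, 1)
    return result

pairs = [("France", "Loire"), ("Loire", "Tours"), ("Loire", "Nantes")]
flat = flatten_vertical(pairs)

# All descendants of France, at any depth, via a plain filter:
france = [(d, depth) for a, d, depth in flat if a == "France"]
```

Once the table holds every ancestor-descendant pair, the question that needed recursive SQL or a stored procedure is answered by an ordinary WHERE clause.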

Getting ready

As we do not have a multi-level parent-child relationship table, we will create one artificially. Let's build a hierarchy of locations using three source tables from the OLTP database: ADDRESS (to source cities from), STATEPROVINCE (to source states from), and COUNTRYREGION (to source countries from), all of them from the same Person schema of the AdventureWorks_OLTP SQL Server database.

The resulting dataset will only have two columns, PARENT and CHILD, and each row in it will represent one link of the hierarchical dataset.

1. Create a new job and create a new dataflow in it, named DF_Prepare_Hierarchy.
2. Open the dataflow in the workspace window for editing, and place three source tables in it from the OLTP datastore: ADDRESS, STATEPROVINCE, and COUNTRYREGION.
3. Create the Query transform State_City and join ADDRESS and STATEPROVINCE in it using the configuration settings shown in the following screenshot, propagating the STATEPROVINCE.NAME and ADDRESS.CITY source columns as the output PARENT and CHILD columns respectively:

4. Create the Query transform Country_State and join COUNTRYREGION and STATEPROVINCE in it using the configuration settings shown in the following screenshot, propagating the COUNTRYREGION.NAME and STATEPROVINCE.NAME source columns as the output PARENT and CHILD columns respectively:

5. Merge the outputs of both the State_City and Country_State transform objects with the Merge transform.

6. Link the Merge transform output to the Hierarchy Query transform and propagate both PARENT and CHILD columns without making any other configuration changes to the Query transform.

7. Place the target template table at the end of the dataflow object sequence to forward the result data to. Name the target table LOCATIONS_HIERARCHY and create it in the DS_STAGE datastore.

After saving and executing the job, the LOCATIONS_HIERARCHY table will be created and populated with a three-level hierarchy of locations, which includes cities, states, and countries, as shown in the following screenshot:

Now, let's see how this dataset can be flattened with the Hierarchy_Flattening transform.

How to do it…

There are two different modes in which the Hierarchy_Flattening transform parses and restructures the source hierarchical data: horizontal and vertical. They produce different results, and we will build two different dataflows, one for each of them, in order to parse and flatten the source hierarchical data and compare the final result datasets.

Horizontal hierarchy flattening

The following are the steps to perform horizontal hierarchy flattening.

1. Create a new dataflow, DF_Hierarchy_Flattening_Horizontal, and link it to the existing DF_Prepare_Hierarchy in the same job. Open it in the workspace for editing.

2. Put the LOCATIONS_HIERARCHY template table from the DS_STAGE datastore in it as a source table object.

3. Link the source table to the Hierarchy_Flattening transform object, which can be found in the Local Object Library | Transforms | Data Integrator section.

4. Open the Hierarchy_Flattening transform in the workspace window and choose the horizontal method of hierarchy flattening.

5. Specify the source PARENT and CHILD columns in the corresponding transform configuration settings:

6. Close the transform editor and link the Hierarchy_Flattening transform object to the target template table LOCATIONS_TREE_HORIZONTAL created in the DS_STAGE datastore.

Vertical hierarchy flattening

The following are the steps to perform vertical hierarchy flattening.

1. Create a new dataflow, DF_Hierarchy_Flattening_Vertical, and link it to the previously created DF_Hierarchy_Flattening_Horizontal dataflow in the same job. Open it in the workspace for editing.

2. Put the LOCATIONS_HIERARCHY template table from the DS_STAGE datastore in it as a source table object.

3. Link the source table to the Hierarchy_Flattening transform object, which can be found in the Local Object Library | Transforms | Data Integrator section.

4. Open the Hierarchy_Flattening transform in the workspace window and choose the vertical method of hierarchy flattening.

5. Specify the source PARENT and CHILD columns in the corresponding transform configuration settings:

6. Close the transform editor and link the Hierarchy_Flattening transform object to the target template table LOCATIONS_TREE_VERTICAL created in the DS_STAGE datastore.

7. Save and close the dataflow tab in the workspace. Your job should have three dataflows now: the first prepares the hierarchical dataset, the second flattens this dataset horizontally, and the third flattens the dataset vertically. Both result datasets are inserted into two different tables: LOCATIONS_TREE_HORIZONTAL and LOCATIONS_TREE_VERTICAL.

How it works…

The horizontal flattening result table looks like the following:

You can now see why it is called "horizontal". All levels of the hierarchy are spread across different columns horizontally.

CURRENT_LEAF shows the name of the specific node, and LEAF_LEVEL shows in which column it can be found.

The convenience of this method is that you can see the full path to the node in one row across the LEVEL columns, starting from the root node, where LEVEL0 shows the root node.
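
As a rough illustration of what the horizontal result holds, the following Python sketch builds one row per node with the full path from the root (a sketch only; the real transform emits fixed LEVEL0..LEVELn columns up to the configured maximum depth, here represented as a list):

```python
# A tiny sample edge list (invented values in the spirit of the recipe).
EDGES = [("United States", "Washington"), ("Washington", "Seattle")]

def paths_from_root(edges):
    """One output row per node: its path from the root, plus
    CURRENT_LEAF (the node name) and LEAF_LEVEL (its depth)."""
    children, child_set, nodes = {}, set(), set()
    for p, c in edges:
        children.setdefault(p, []).append(c)
        child_set.add(c)
        nodes.update((p, c))
    roots = nodes - child_set          # nodes that are never a child
    rows = []

    def walk(path):
        node = path[-1]
        rows.append({"LEVELS": list(path),        # stands in for LEVEL0..LEVELn
                     "CURRENT_LEAF": node,
                     "LEAF_LEVEL": len(path) - 1})
        for c in children.get(node, []):
            walk(path + [c])

    for root in sorted(roots):
        walk([root])
    return rows

for row in paths_from_root(EDGES):
    print(row)
```

Note that even intermediate nodes get their own row, exactly as CURRENT_LEAF and LEAF_LEVEL describe above.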

Vertical flattening looks a bit different:

ANCESTOR and DESCENDENT are basically the same PARENT and CHILD entities, but the output result set after hierarchy flattening has a lot more records, as extra records were created showing the dependency between two nodes even if they are not related directly.

The DEPTH column shows the distance between two related nodes, where 0 means this is the same node, 1 means that the nodes are related directly, and 2 means that there is another parent node between them.

The ROOT_FLAG column flags the root nodes, and the LEAF_FLAG column flags the end leaf nodes that do not have descendants.
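
A minimal Python sketch of the vertical result (sample data invented): it pairs every node with each of its ancestors, including itself at DEPTH 0, and sets the two flag columns described above:

```python
EDGES = [("United States", "Washington"), ("Washington", "Seattle")]

def vertical_flatten(edges):
    children, child_set, nodes = {}, set(), set()
    for p, c in edges:
        children.setdefault(p, []).append(c)
        child_set.add(c)
        nodes.update((p, c))
    roots = nodes - child_set            # never appear as a child
    leaves = nodes - set(children)       # have no children of their own

    rows = []
    def descend(ancestor, node, depth):
        rows.append({"ANCESTOR": ancestor, "DESCENDENT": node,
                     "DEPTH": depth,
                     "ROOT_FLAG": int(ancestor in roots),
                     "LEAF_FLAG": int(node in leaves)})
        for c in children.get(node, []):
            descend(ancestor, c, depth + 1)

    for node in sorted(nodes):
        descend(node, node, 0)           # DEPTH 0: the node paired with itself
    return rows

rows = vertical_flatten(EDGES)
print(len(rows))   # 6 rows for 3 nodes: 3 self pairs, 2 direct, 1 indirect
```

This is why the vertical table grows much faster than the source edge list: every indirect ancestor-descendant pair becomes its own record.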

As you can see from the steps of this recipe, the configuration of the Hierarchy_Flattening transform is extremely simple. All that is required from you is to specify the parent and child columns that store the relationships between the neighboring nodes of the hierarchy.

Extra parameters specific to each type of hierarchy flattening are explained as follows.

Maximum depth: It exists only for the horizontal method because this method uses new columns for new levels of hierarchy, and Data Services needs you to specify how many extra columns you want to create in your result target table. Imagine the situation when your hierarchical dataset stores an extremely deep hierarchy, 100 levels or more, and you do not know about this after having looked only at the unflattened hierarchy representation with just parent and child fields. In that case, a table with a few hundred columns, one for each hierarchy level, may not be what you are looking for. So, this parameter allows you to control the flattening behavior of the transform.
Use maximum length paths: This parameter is specific to only the vertical method of hierarchy flattening. It affects only the value of the DEPTH field in the result output schema. It works only in situations when there are multiple paths from the descendant to its ancestor and they are of different lengths. Selecting this option will always pick the highest number for the DEPTH field out of these multiple paths.

Querying result tables

Now, let's try to query the result tables so that you can see how easy it is to perform the analysis of the data. You can run the following queries in SQL Server Management Studio when connected to the STAGE database.

Select all root nodes of the hierarchy:

select CURRENT_LEAF from dbo.LOCATIONS_TREE_HORIZONTAL
where LEAF_LEVEL = 0
order by CURRENT_LEAF;

select ANCESTOR from dbo.LOCATIONS_TREE_VERTICAL
where DEPTH = 0 and ROOT_FLAG = 1
order by ANCESTOR;

Both SQL statements produce the same result: a list of 13 root nodes (we know that those are countries).

Check if the "United States" node has a leaf node "Aurora" among its dependents:

select * from dbo.LOCATIONS_TREE_HORIZONTAL
where LEVEL0 = 'United States' and CURRENT_LEAF = 'Aurora';

select * from dbo.LOCATIONS_TREE_VERTICAL
where ANCESTOR = 'United States' and DESCENDENT = 'Aurora';

The result returned by the two queries looks different:

You can see that the horizontal view is more convenient if you want to see the full path to the leaf node from the top root node.

The vertical view is more convenient to use in SQL queries, as you do not have to figure out which column to use when you want to perform a specific operation on a specific level of the hierarchy. Result columns of vertical hierarchy flattening are always the same and static, whereas horizontal hierarchy flattening produces a number of columns that depends on the depth of the flattened hierarchy.

The decision of which type of hierarchy flattening to use should be made after taking into account the type of SQL queries that will be used to query this flattened data.

Note

If you have experimented with the hierarchy flattening result datasets, you would have probably noticed that some queries written against the "horizontal" and "vertical" result tables produce different results and are not exactly what is expected. That happens because our parent and child columns are text fields (names of the countries, regions, and cities), and they do not represent the uniqueness of every node. For example, there is a state "Ontario" that belongs to Canada and a city "Ontario" that belongs to the state of California. Data Services does not know about the fact that these two are different nodes and considers them to be the same node (as the name value matches). You should keep that in mind and use unique identifiers for the nodes in the parent and child fields for hierarchy flattening to produce valid and consistent results.
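
The following Python sketch shows the collision concretely (edge data and the "type:name" key scheme are invented for illustration):

```python
def direct_children(edges, parent):
    return [c for p, c in edges if p == parent]

# Name-keyed edges: the province Ontario (under Canada) and the city
# Ontario (under California) look like one and the same node.
by_name = [("Canada", "Ontario"), ("Ontario", "Toronto"),
           ("California", "Ontario")]
# Asking for California's grandchildren wrongly pulls in Toronto:
grandchildren = [g for c in direct_children(by_name, "California")
                 for g in direct_children(by_name, c)]
print(grandchildren)   # ['Toronto'] -- wrong!

# Unique node keys (a made-up "type:name" tag) keep the two apart:
by_key = [("COUNTRY:Canada", "STATE:Ontario"),
          ("STATE:Ontario", "CITY:Toronto"),
          ("STATE:California", "CITY:Ontario")]
fixed = [g for c in direct_children(by_key, "STATE:California")
         for g in direct_children(by_key, c)]
print(fixed)   # [] -- the city Ontario has no children
```

Any scheme that makes each node's key unique (surrogate IDs, composite keys) would do; the tag prefix above is just the simplest to read.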

Configuring Access Server

Access Server is required for real-time jobs to work. In this recipe, we will go through the steps of creating and configuring the Access Server component that will be required for our next recipe, where we are going to create our first real-time job.

Getting ready

Access Server can be created and configured with the help of two Data Services tools: Data Services Server Manager and Data Services Management Console.

How to do it…

1. Start SAP Data Services Server Manager.
2. Go to the Access Server tab.
3. Click on the Configuration Editor button.
4. On the Access Server Configuration Editor window, click on the Add button.
5. Fill in the Access Server configuration fields, as shown in the following screenshot:

6. Do not forget to enable Access Server by ticking the corresponding option.
7. Click on OK to close and save the changes.
8. Start the SAP Data Services Management Console in your browser and log in.
9. Go to the Administrator | Management section.
10. Click on the Add button to add the previously created Access Server.
11. Specify the host name and Access Server communication port, and click on Apply to add the Access Server.

How it works…

Access Server is a standard Data Services component that serves as a message broker, accepting requests and messages from external systems, forwarding them to Data Services real-time services for processing, and then passing the response back to the external system.

In other words, this is the key component required in order to feed real-time jobs with the source data and get output data from them.

We will create a real-time job in the next recipe and explain the design process of real-time jobs in detail. In the meantime, you should only know that the main source and target objects of real-time jobs are messages (most commonly in an XML structure) and that Access Server is responsible for delivering those messages.

With the preceding steps, the Access Server service was created and enabled in the Data Services environment and is now ready to accept requests from external systems.

Creating real-time jobs

In this recipe, we will create real-time jobs and emulate the requests from an external system using the SoapUI testing tool in order to get the response with processed data back. We will go through all the steps required to configure the components needed for real-time jobs to work.

Getting ready

In this section, we will install the open source SoapUI tool and create a new project that will be used to send and receive SOAP messages (an XML-based format) to and from Data Services.

Installing SoapUI

You can download and install SoapUI from http://www.soapui.org/.

The installation process is very straightforward. All you have to do is just follow the instructions on the screen.

After the installation is complete, start SoapUI. Use the SOAP button in the top toolbar menu to create a new SOAP project. Specify the project name and initial WSDL address, as shown in the following screenshot:

The initial WSDL address can be obtained from Data Services. To get it, log in to the Data Services Management Console, go to the Administrator section, choose Web Services, and click on the View WSDL button at the bottom of the main window.

In the newly opened window, select and copy the top URL address and paste it in the New SOAP Project configuration window, as shown in the following screenshot:

At this point, we have made the initial configuration and can proceed with actually creating real-time jobs on the Data Services side.

How to do it…

Now, we have an "external" system in place, configured to send us request messages. We remember that the Data Services component responsible for accepting these messages and sending them back is Access Server, and it was already configured by us in the previous recipe. Now, we need the last and most important component to be created and configured: the Data Services real-time job, which will process these SOAP messages and return the required result.

The goal of our real-time job will be to provide the full names of the location codes for a specific city and the postal code of the city.

1. Go to the Local Object Library | Jobs section, right-click on Real-Time Jobs, and choose New from the context menu.
2. Any real-time job is created with two default mandatory objects that define the borders of the real-time job processing section: RT_Process_begins and Step_ends.
3. Create two scripts, Init_Script and Final_Script, and place them correspondingly before and after the real-time job processing section.
4. Inside the real-time job processing section, create a dataflow and name it DF_RT_Lookup_Geography, as shown in the following figure:

5. Now, open the dataflow DF_RT_Lookup_Geography for editing in the main workspace window. First, we have to create file formats for our request and response messages.

6. Create a request file in your C:\AW\Files folder named RT_request.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Request">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CITY" type="xsd:string"/>
        <xsd:element name="STATEPROVINCECODE" type="xsd:string"/>
        <xsd:element name="COUNTRYREGIONCODE" type="xsd:string"/>
        <xsd:element name="LANGUAGE" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

7. Create a response file in your C:\AW\Files folder named RT_response.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CITY" type="xsd:string"/>
        <xsd:element name="POSTALCODE" type="xsd:string"/>
        <xsd:element name="STATEPROVINCENAME" type="xsd:string"/>
        <xsd:element name="COUNTRYREGIONNAME" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
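
To get a feel for what a message instance conforming to the request schema looks like, here is a short Python sketch that builds one with the standard library (the field values are invented samples):

```python
import xml.etree.ElementTree as ET

# Build a request instance matching RT_request.xsd (sample values invented).
request = ET.Element("Request")
for tag, value in [("CITY", "Paris"), ("STATEPROVINCECODE", "75"),
                   ("COUNTRYREGIONCODE", "FR"), ("LANGUAGE", "fr")]:
    ET.SubElement(request, tag).text = value

xml_text = ET.tostring(request, encoding="unicode")
print(xml_text)
```

This is essentially the payload SoapUI will later wrap in a SOAP envelope and send to Access Server.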

8. To create a request message file format, open Local Object Library | Formats, right-click on Nested Schemas, and choose New | XML Schema… from the context menu. Specify the following settings in the opened Import XML Schema Format window:

9. To create a response message file format, open Local Object Library | Formats, right-click on Nested Schemas, and choose New | XML Schema… from the context menu. Specify the following settings in the opened Import XML Schema Format window:

10. Import RT_Geography_request as a source object into the dataflow DF_RT_Lookup_Geography and link it to the Request Query transform, propagating all columns to the output schema. Choose the Make Message Source option when importing the object into the dataflow.

11. Import RT_Geography_response as a target object into the dataflow DF_RT_Lookup_Geography. Choose the Make Message Target option when importing the object into the dataflow.

12. Import the DIMGEOGRAPHY table object from the DWH datastore and join it with the Request Query transform using the Lookup_DimGeography Query transform. Configure the mapping settings according to the following screenshots:

13. Go to the FROM tab and configure the join conditions for a LEFT OUTER JOIN between the Request Query transform and the DIMGEOGRAPHY source table:

14. Link the Lookup_DimGeography Query transform to the target RT_Geography_response XML schema object.

15. Your dataflow should look like the following figure:

16. Save and validate the job.

How it works…

The dataflow we created in our real-time job accepts XML messages (requests) as an input and produces XML messages (responses) as an output.

We use the DIMGEOGRAPHY table from our data warehouse to fetch the postal code, full state/province name, and country name in either French, English, or Spanish, depending on which city and language code were received in the request message.

Basically, our real-time job serves as a lookup mechanism against data warehouse data.
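
A toy Python version of that lookup makes the left outer join behavior explicit (the geography rows are invented, and for brevity the key here is just city and language rather than the full join condition in the dataflow):

```python
# Sample stand-in for DIMGEOGRAPHY rows, keyed by (city, language code).
DIMGEOGRAPHY = {
    ("Paris", "fr"): {"POSTALCODE": "75000",
                      "STATEPROVINCENAME": "Seine",
                      "COUNTRYREGIONNAME": "France"},
}

def lookup_geography(request):
    key = (request["CITY"], request["LANGUAGE"])
    hit = DIMGEOGRAPHY.get(key, {})   # left outer join: no match -> NULLs
    return {"CITY": request["CITY"],
            "POSTALCODE": hit.get("POSTALCODE"),
            "STATEPROVINCENAME": hit.get("STATEPROVINCENAME"),
            "COUNTRYREGIONNAME": hit.get("COUNTRYREGIONNAME")}

print(lookup_geography({"CITY": "Paris", "LANGUAGE": "fr"}))
```

Because of the left outer join, a request for an unknown city still gets a response message back, just with empty lookup fields.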

Let's publish our real-time job as a web service and do our first test run to see how the message exchange mechanism works.

1. Open the Data Services Management Console | Administrator | Real-Time | <YourServerName>:4000 | Real-Time Services | Real-Time Service Configuration tab.

2. Click on Add to add the real-time service Lookup_Geography; use the following settings to configure it, and click on Apply when you have finished:

3. Go to the next Real-Time Services Status tab and start the service just created by selecting it and clicking on the Start button:

The icon of the real-time service should become green.

1. Go to the Administrator | Web Services | Web Services Configuration tab.
2. Select Add Real-Time Services in the combo box below and click on the Apply button on the right.
3. Select Lookup_Geography from the list and click on Add:

4. The Lookup_Geography real-time service should appear on the Web Services Status tab, as shown in the following screenshot:

5. We have successfully published our real-time job as a real-time web service. Now, open SoapUI and make sure that you can see the Lookup_Geography web service. To do that, start the SoapUI tool and expand the DSProject | Real-time_Services tab in the project tree panel.

6. Right-click on the Lookup_Geography item and choose New request from the context menu.

7. Expand Lookup_Geography and double-click on the Geography_request item.
8. You will see that a new window opens on the right showing two panels: request and response.
9. Fill in values in all the fields of the request XML structure and click on the green triangle button to submit a request. The response is received and displayed in the right panel. As you can see, it has the data from the DIMGEOGRAPHY table, which resides in the data warehouse:

Note

One of the most popular cases of using real-time jobs is cleansing the data through web service requests. A real-time job receives a specific value and passes it through the Data Quality transforms available in Data Services in order to cleanse it, and then returns the result in the response message.

Chapter 11. Working with SAP Applications

Introduction

This chapter is dedicated to the topic of reading and loading data from SAP systems, with the example of a SAP ERP system. Data Services provides quick and convenient methods of obtaining information from SAP applications. As this is quite a vast topic to discuss, there will be only one recipe, but, nevertheless, it should cover the main aspects of extracting and loading data into SAP ERP.

Loading data into SAP ERP

We will not discuss the topic of configuring the SAP system to communicate with the Data Services environment, as it would require another few chapters on the subject and, most importantly, it is not the purpose of this book. All this information can be found in the detailed SAP documentation available at http://help.sap.com.

We presume that you have exactly the same Data Services and staging environments configured and created as in the previous chapters of this book, and have also installed and configured the SAP ERP system, which can communicate with the Data Services job server.

In this recipe, we will go through the steps of loading information into the SAP ERP system using Data Services. In one of our preparation processes, we will generate the data records for insertion right in the dataflow, whereas usually you would have the data ready to be loaded in the staging area, extracted from another system.

We will also review the main SAP transactions involved in the process of manually creating data objects and monitoring the loading process on the SAP side, and the transaction that might be used for the post-load validation of loaded data.

We will be using the example of creating/loading batch data, which is related to material data in SAP ERP. First, we will create the specific material required for the batch data to be loaded. Then, we will create a test batch manually to see how it is done on the SAP side, and then we will develop ETL code in Data Services, which will prepare the batch record and send it to the SAP side.

Getting ready

The first thing we have to do is to create the material for which we will be creating batches in SAP.

1. Log in to the SAP ERP system and run the transaction MM01 to create new material.
2. Specify Material as RAWMAT01, Material Type, and Industry sector:
3. Select the following views for the new material: Basic data 1, Basic data 2, Classification, and General Plant Data / Storage 1:

4. On the next window, specify Organization Levels:

5. On the next screen, define the Base Unit of Measure and material description:

6. On the Sales: general/plant tab, tick the Batch management checkbox to define the material as batch managed:

7. Finally, on the Classification tab, classify the material as the raw material of class type 023:

Click on the Continue (Enter) button in the top-left corner to save and create the new material.

Now, we can manually create the first batch object for our new material so that we can later compare it to the batch object that will be generated and inserted by the Data Services jobs automatically.

8. Run the transaction MSC1N to create a new batch, and specify the material number and batch name that you would like to create:

9. Click on Continue, and on the next screen, fill in the values for the following fields: the Date of Manufacture of the batch, the Last Goods Receipt date, and Ctry of Origin:

10. Click on Continue to save and create a new batch, 20151009.

The last preparation step we have to complete is the configuration of a partner profile in our SAP ERP so that the system can accept the IDoc messages containing the batch data that will be sent to SAP ERP from Data Services.

11. Run the transaction WE20 to configure the partner profile.
12. On the Partner profiles window, select the Partner Type LS section and select the client you are currently using:

Make sure that your Partn. status is Active on the Classification tab and that you have BATMAS specified in the Inbound parmtrs list. If not, then click on the Create inbound parameter button under the Inbound parmtrs tab and define the BATMAS inbound parameter:

Now, everything is ready on the SAP ERP side, and all we have to do is create the Data Services job that will generate and send the data into the SAP ERP system for insertion.

How to do it…

1. Start Data Services Designer and go to Local Object Library | Datastores.
2. Right-click on the empty space of the Datastores tab and choose New from the context menu in order to create a new datastore object.
3. Create a new datastore, SAP_ERP, by specifying the datastore type SAP Applications and the database server name along with your SAP credentials.
4. Click on the Advanced button and specify the additional settings required for setting up the connection to the SAP ERP system, such as Client number and System number. See the following screenshot for the full list of configuration settings:

Click on OK to create the datastore object.

5. Import the following objects into your datastore by right-clicking on the required section of the object you want to import and choosing the Import By Name… option from the context menu:

The IDoc object BATMAS03 will be used as a target object to transfer batch data to the SAP system.

The MARA and MCH1 tables will be used as source objects to extract data from the SAP system for pre-load and post-load validation purposes.

6. Create a new job containing four linked dataflow objects, as shown in the following screenshot:

7. Open the first DF_SAP_MARA dataflow in the workspace window for editing and specify the MARA table object imported in the SAP_ERP datastore as a source and the new SAP_MARA template table in the STAGE database as a target. Propagate all columns from the source MARA table to SAP_MARA using the Query transform. Run the job once and import the target table object:

8. Open DF_Prepare_Batch_Data in the workspace window for editing.
9. Add the Row_Generation transform as a source. Set it up to generate only one record with the row number starting at 1.
10. Link it to the Create_Batch_Record Query transform, which will be used to define the fields of the created record. Use the following screenshot as a reference for column names and mappings:

11. Add another Query transform named Validate_Material, link Create_Batch_Record to it, and propagate all columns from the input schema to the output schema.

12. Add an extra column as a new function call of the lookup_ext function and configure it as shown in the following screenshot, looking up the MATNR field from the SAP_MARA table by the MATERIAL field value from the input schema:

13. Add the Validation transform, forking the dataset into three categories (Rule, Pass, and Fail) and sending the outputs to three target tables: BATCH, BATCH_REJECT, and BATCH_REJECT_RULE, as shown in the following screenshot:

14. Open the Validation transform in the workspace window for editing and add a new validation rule:

15. The Validation transform editor should look as shown in the following screenshot:

Close the dataflow and save the job.

16. Open the third dataflow, DF_Batch_IDOC_Load, in the workspace window for editing.
17. Build the structure of the dataflow, as shown in the following screenshot. The steps to configure each of the dataflow components will be provided further.

18. The Row_Generation transform should be configured to generate one record. Use the following table to define the output schema mappings in the EDI_DC40 Query transform. The following table has records only for the mandatory columns of the EDI_DC40 IDoc segment. Populate the rest of them with NULL values.

Column name   Data type     Mapping expression
TABNAM        varchar(10)   'EDI_DC40'
MANDT         varchar(3)    '100'
DOCREL        varchar(4)    '740'
DIRECT        varchar(1)    '2'
IDOCTYP       varchar(30)   'BATMAS03'
MESTYP        varchar(30)   'BATMAS'
SNDPOR        varchar(10)   'TRFC'
SNDPRT        varchar(2)    'LS'
SNDPRN        varchar(10)   'SBECLNT100'
CREDAT        date          sysdate()
CRETIM        time          systime()
ARCKEY        varchar(70)   '1'

Note

Please keep in mind that some of the values in the mapping expressions for this specific segment, EDI_DC40, are specific to your own SAP environment. Among them are MANDT and SNDPRN, which should be obtained from your SAP administrator.

To obtain the full list of columns required for the specific segment, refer to the BATMAS03 object structure itself.

19. Open the E1BATMAS Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
MATERIAL      varchar(18)   BATCH.MATERIAL
BATCH         varchar(10)   BATCH.BATCH_NUMBER
ROW_ID        int           BATCH.ROW_ID

20. Open the E1BPBATCHATT Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
LASTGRDATE    date          to_date(BATCH.GOODS_RECEIPT_DATE, 'YYYYMMDD')
COUNTRYORI    varchar(3)    BATCH.COUNTRY_OF_ORIGIN
PROD_DATE     date          to_date(BATCH.DATE_OF_MANUFACTURE, 'YYYYMMDD')
ROW_ID        int           BATCH.ROW_ID

21. Open the E1BPBATCHATTX Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
LASTGRDATE    varchar(1)    'X'
COUNTRYORI    varchar(1)    'X'
PROD_DATE     varchar(1)    'X'
ROW_ID        int           BATCH.ROW_ID

22. Open the E1BPBATCHCTRL Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
DOCLASSIFY    varchar(1)    'X'
ROW_ID        int           BATCH.ROW_ID

23. Open the IDOC_Nested_Schema Query transform in the workspace window for editing.

24. Drag and drop the EDI_DC40 and E1BATMAS segments from the input schema into the output schema of the IDOC_Nested_Schema Query transform.

25. Double-click on the output schema IDOC_Nested_Schema to make its status "current", open the FROM tab, and select only the E1BATMAS input schema. Mark the EDI_DC40 segment in the output nested schema as repeatable (the full table icon). If the segment schema is created as repeatable by default, then do not change it. Mark the E1BATMAS output schema segment as non-repeatable. To do that, make it current by double-clicking on it, and then right-click on it, unselecting the Repeatable option from the context menu. See the difference between the output schema icons for EDI_DC40 and E1BATMAS as repeatable and non-repeatable segments.

26. Double-click on the first EDI_DC40 output segment to make its status "current". Open the FROM tab and select only the EDI_DC40 input schema:

27. Double-click on the second E1BATMAS output segment to make it current. Open the FROM tab and select only the E1BATMAS input schema, in the same way as for the previous EDI_DC40 output schema. Also, delete the ROW_ID column from the output schema and drag and drop the rest of the input schemas, E1BPBATCHATT, E1BPBATCHATTX, and E1BPBATCHCTRL, inside the E1BATMAS output schema, creating a nesting structure:

28. Double-click on the nested E1BPBATCHATT output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHATT input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHATT.ROW_ID = E1BATMAS.ROW_ID).

29. Perform the same step for the next output segment. Double-click on the nested E1BPBATCHATTX output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHATTX input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHATTX.ROW_ID = E1BATMAS.ROW_ID).

30. Perform the same step for the next output segment. Double-click on the nested E1BPBATCHCTRL output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHCTRL input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHCTRL.ROW_ID = E1BATMAS.ROW_ID).

31. The target object BATMAS03 imported into the SAP_ERP datastore should be configured using the values shown in the following screenshot. Open the BATMAS03 target object in the dataflow in the main workspace for editing to configure it.

Close the dataflow object. Save and validate the job to make sure that you have not made any syntax errors in your dataflow design.

32. Open the last dataflow, DF_SAP_MCH1, for editing in the workspace window.
33. Add the MCH1 table from the SAP_ERP datastore as a source object.
34. Propagate all the columns from the MCH1 table to the output schema using the linked Query transform.
35. Add a new template table, SAP_MCH1, from the STAGE datastore as a target table object.
36. Save, validate, and run the job.

How it works…

The preceding steps show the common process of loading data into the SAP system using the IDoc mechanism. The load process usually consists of a few steps:

Extracting master data from the SAP system to make sure that we are referencing the correct objects existing in the target system
Building/preparing the dataset for load
Loading the data into SAP
The post-validation process of extracting data loaded in SAP back into the staging area for validation

Let's review all these processes, built in the form of dataflows, in more detail.

The first dataflow, DF_SAP_MARA, extracts material data from SAP ERP for validation purposes, to make sure that we do not try to create a batch for a material that does not exist in the target SAP system.

The second dataflow, DF_Prepare_Batch_Data, prepares the batch record to be loaded in SAP. As you can see from the output schema mapping of one of the Query transforms, we prepare the batch 2015100901 to be created for material RAWMAT01. As you might remember, we have already manually created batch 20151009. The rest of the mappings show that we have also populated the Ctry of Origin, Last Goods Receipt, and Date of Manufacture fields.

The third dataflow, DF_Batch_IDOC_Load, transforms the prepared batch record into the nested format of an IDoc message and sends this IDoc message to SAP. Further, we will take a look at how you can monitor the process of receiving and loading IDocs on the SAP side.

Finally, the fourth dataflow, DF_SAP_MCH1, extracts the SAP table MCH1, which contains information about batches created in SAP, for post-load validation purposes. That allows us to see which batches were actually loaded in SAP and run SQL queries in our staging area to validate field values.

IDoc

IDoc is a format and transfer mechanism that SAP systems use to exchange data. Data Services utilizes this mechanism in order to send and receive information from SAP systems. IDocs that the SAP system receives are called inbound, and IDocs sent by SAP are called outbound. You saw that transaction WE20 was used to configure inbound IDoc parameters so that SAP could successfully accept the BATMAS IDoc messages sent to it from Data Services.

The BATMAS IDoc used to load batch data has a nested structure, and that is why we had to nest multiple datasets with the help of the Query transform. We used the artificial ID key ROW_ID to link all the nested segments together.

Keep in mind that Data Services does not load data into SAP tables directly itself. All Data Services does is prepare the data in the IDoc format so that it can be received by SAP and loaded into SAP tables using internal mechanisms/programs.
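
The nesting performed in DF_Batch_IDOC_Load can be sketched in a few lines of Python: flat segment rows are tied together by the artificial ROW_ID key into one nested message (the field values below are invented samples):

```python
# Flat rows for each IDoc segment, all carrying the same ROW_ID key.
e1batmas = [{"ROW_ID": 1, "MATERIAL": "RAWMAT01", "BATCH": "2015100901"}]
e1bpbatchatt = [{"ROW_ID": 1, "COUNTRYORI": "DE"}]
e1bpbatchctrl = [{"ROW_ID": 1, "DOCLASSIFY": "X"}]

def nest(parent_rows, child_rows, name):
    """Attach matching child rows (same ROW_ID) under each parent row,
    dropping ROW_ID from the nested copies, as the recipe does."""
    for parent in parent_rows:
        parent[name] = [{k: v for k, v in c.items() if k != "ROW_ID"}
                        for c in child_rows
                        if c["ROW_ID"] == parent["ROW_ID"]]
    return parent_rows

idoc = nest(nest(e1batmas, e1bpbatchatt, "E1BPBATCHATT"),
            e1bpbatchctrl, "E1BPBATCHCTRL")
print(idoc[0]["E1BPBATCHATT"])   # [{'COUNTRYORI': 'DE'}]
```

This mirrors the WHERE conditions on the nested output schemas (child.ROW_ID = E1BATMAS.ROW_ID) in steps 28-30.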

Monitoring IDoc load on the SAP side

Data Services sends IDoc messages to SAP synchronously. An IDoc message is received by SAP and then processed. Only after that does Data Services send the next IDoc message. Sometimes, this process can take quite a long time. All you will see in the trace log on the Data Services side is one record indicating that the dataflow loading data is still running.

To see what is going on on the SAP side (how many IDocs fail and how many of them are processed successfully by SAP), you can use transaction BD87:

By expanding the BATMAS section and double-clicking on the actual IDoc record that you are interested in, you can see the data in the IDoc nested segments:

Other useful information available on this screen includes:

The status of the IDoc (processed successfully or failed)
Error messages (if failed)
Data records stored in the IDoc message (E1BATMAS, E1BPBATCHATT, E1BPBATCHATTX, and E1BPBATCHCTRL segments)

As you can see, the EDI_DC40 segment is not visible, as it is the IDoc header itself. The information we have provided in this segment is available in the Short Technical Information panel and defines the behavior of IDoc processing.

By clicking on the Refresh button on the Status Monitor for ALE Messages screen, you can see in real time how the IDocs received by SAP are processed.

Post-load validation of loaded data

We know that one of the tables in SAP where batch master data is stored is MCH1. Knowing which physical tables are actually populated with data when you enter data manually via transactional screens, or load data coming from external systems via the IDoc mechanism, is useful, as you can always extract the contents of these tables to perform post-validation tasks.

To view our newly created batch, 2015100901, we can use transaction MSC3N (Display Batch):

Or, we can see the contents of the MCH1 table directly using the SE16 transaction (Data Browser):

You can see both batches here: the one created manually and the one loaded with the help of Data Services.

Do you remember that we developed a dataflow to extract the MCH1 table to validate loaded data? Let's check the actual records extracted right after the loading process has completed by browsing the contents of the SAP_MCH1 table in our staging area:

The CHARG column in the MCH1 table stores the batch number values.
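
A post-load validation check can be as simple as a set comparison between the batches we prepared for load and the CHARG values extracted back into SAP_MCH1. A Python sketch with sample data:

```python
# Batches our job prepared for load (sample value from this recipe).
prepared = {"2015100901"}
# CHARG values extracted back into SAP_MCH1: the manually created batch
# plus the one loaded via the IDoc.
extracted_charg = {"20151009", "2015100901"}

missing = prepared - extracted_charg      # prepared but not found in SAP
unexpected = extracted_charg - prepared   # in SAP but not prepared by us
print(sorted(missing), sorted(unexpected))
```

An empty `missing` set confirms that every prepared batch made it into SAP; `unexpected` here simply contains the batch we created manually earlier.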

Tip

As technical names in SAP tables can be quite difficult to understand, you can use transaction SE11 to see the descriptions of the columns for a specific table.

There's more…

We have just scratched the surface of one of the possible methods of reading/loading data from the SAP system.

There are many other methods that can be used to communicate with SAP systems: ABAP dataflows, BAPI calls, direct RFC calls, Open Hub Tables, and many others.

Choosing between these methods usually depends on the type of tasks that have to be implemented, the amount of transferred data, and the type of SAP environment used.

Chapter 12. Introduction to Information Steward

In this chapter, we will see the following recipes:

Exploring Data Insight capabilities
Performing Metadata Management tasks
Working with the Metapedia functionality
Creating a custom cleansing package with Cleansing Package Builder

Introduction

SAP Information Steward is a separate product that is installed alongside SAP Data Services and SAP Business Intelligence and provides additional capabilities for business and IT users in order to analyze data quality and create cleansing packages that can enhance the data cleansing processes run by Data Services.

To cover all functionalities of Information Steward, we would have to write another book. In this chapter, we will explore the main functions of Information Steward that have proved themselves to be the most valuable to users of the product.

AlltheseactivitiesrelatetospecificareaswithintheSAPInformationStewardapplication.

NoteLogintotheSAPInformationStewardapplicationathttp://localhost:8080/BOE/InfoStewardApp.

Onthemainpage,youcanseefivetabsthatrepresentthefourmainareasoftheInformationStewardproductfunctionality,asshowninthefollowingscreenshot:

Exploring Data Insight capabilities

The Data Insight tab is the first tab; it enables you to profile the data available from different sources, build validation rules for the data, and design a scorecard in order to see a visual representation of the quality of your data.

Getting ready

Before we log in to the SAP Information Steward application, we have to create a couple of Information Steward objects in the standard Central Management Console (CMC). The goal of this preparation step is to define the sources of data that Information Steward can connect to in order to perform data quality and analysis tasks. You can define some data sources, such as flat files, directly in the Information Steward application, but others should first be created as connections in the CMC Information Steward area.

1. Log in to CMC at http://localhost:8080/BOE/CMC.
2. Go to the Information Steward section.
3. Click on Connections and click on the Create connection button in the top menu.
4. Fill in all the required fields, as shown in the following screenshot, in order to create a connection object to the AdventureWorks_DWH SQL Server database:

5. Click on the Test Connection button to validate the information entered, and then click on the Save button to save the connection and exit the Create Connection screen.

6. The dwh_profile connection should appear in the list of connections that can be used in Information Steward.

7. Finally, let's create a new Data Insight project called Geography. To do that, go to the Data Insight section and click on the Create a Data Insight project button.

How to do it…

Before you start with the following steps, first log in to SAP Information Steward at http://localhost:8080/BOE/InfoStewardApp.

The common sequence of actions performed on the Data Insight tab in Information Steward includes:

Creating a connection object
Profiling the data
Viewing profiling results
Creating a validation rule
Creating a scorecard

Creating a connection object

The following steps are required to specify the source of data for our Data Insight analysis.

1. Go to Data Insight | Geography Project.
2. Select the Workspace Home tab and click on the Add | Tables… button in the top-left corner.
3. In the opened window, select the dwh_profile connection object, then expand it, select the dbo.DimGeography table, and then click on the Add to Project button, as shown in the following screenshot:

Profiling the data

Profiling, or gathering various kinds of information about the data, provides the input used for data analysis.

1. To profile the data in the added DimGeography table, you can use various profiling options. Let's collect uniqueness profiling data. On the Workspace Home tab, select the DimGeography table in the dwh_profile connection and click on the Profile | Uniqueness button in the Profile Results toolbar menu.

2. In the Define Tasks: Uniqueness window, specify which columns you want to gather uniqueness profile information for. Select City and CountryRegionCode and click on the Save and Run Now button, as shown in the following screenshot:

3. To gather column profiling information, select the DimGeography table and click on the Profile | Columns button in the toolbar menu of the Workspace Home | Profile Results tab. Specify a name for the column profiling task, Geography_column_profiling, and select all profiling options: Simple, Median & Distribution, and Word Distribution. Then, click on the Save and Run Now button to create and execute the column profiling task.

4. Select the Tasks section on the left-side panel to see both the profiling tasks created in the previous steps. You can run them any time from this tab to refresh the profiling data according to the parameters specified.

Viewing profiling results

The following steps show you how to view the previously gathered profiling results.

1. To see the data profile results, go to the Workspace Home | Profile Results tab.
2. Expand the table you are interested in to see its columns and select it.
3. Click on the Refresh | Profile Results button in the toolbar menu.
4. Then, by clicking on the field or specific number you are interested in, you can see the detailed results for this field in the extra windows on the right-hand side of the screen and at the bottom, as shown in the following screenshot:

5. To see the results of the uniqueness profile information collected, select the Advanced view mode under the Profile Results tab.

6. In the opened window, click on the green icon in the Uniqueness column and select the key combination you have gathered information on. In our case, we have gathered uniqueness profiling information for two columns of the DimGeography table, City and CountryRegionCode, as shown in the following screenshot:

By hovering your cursor over the red zone showing the percentage of non-unique records for the selected combination of columns, you can see detailed information such as the percentage of non-unique rows and the number of non-unique rows. In our case, it is 22.08% and 151. By clicking on the red zone, you can display the non-unique rows at the bottom of the screen.

So far, we have gathered two types of profiling information: column profile data and uniqueness profile data for the DimGeography table located in our data warehouse.
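The uniqueness metric that Data Insight reports (the percentage of non-unique rows for a column combination) can be sketched in a few lines. This is an illustrative reimplementation with made-up sample rows, not the product's internal algorithm:

```python
from collections import Counter

def uniqueness_profile(rows, columns):
    """Return (non_unique_count, non_unique_pct) for a column combination,
    mirroring the uniqueness figures Data Insight reports (e.g. 151 rows,
    22.08% in our DimGeography example)."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    # A row is non-unique when its key combination appears more than once.
    non_unique = sum(1 for k in keys if counts[k] > 1)
    pct = round(100.0 * non_unique / len(keys), 2)
    return non_unique, pct

rows = [
    {"City": "Paris", "CountryRegionCode": "FR"},
    {"City": "Paris", "CountryRegionCode": "FR"},
    {"City": "Berlin", "CountryRegionCode": "DE"},
    {"City": "Lyon", "CountryRegionCode": "FR"},
]
print(uniqueness_profile(rows, ["City", "CountryRegionCode"]))  # (2, 50.0)
```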

Creating a validation rule

Now, let's see how you can create a validation rule in Information Steward and display the result of applying it to the dataset in a graphical form by using scorecards.

1. On the Workspace Home | Profile Results tab, you can find a yellow icon in the Advisor column against the dbo.DimGeography table, as shown in the following screenshot:

2. Click on the yellow icon shown in the preceding screenshot to launch Data Validation Advisor:

3. We are not going to accept the validation rule suggested by Data Validation Advisor and will create our own custom validation rule instead.

Our custom rule will check whether a DimGeography table record has translated values in both the FrenchCountryRegionName and SpanishCountryRegionName columns. To create a new rule, open the second vertical tab, Rules, which is next to the Workspace Home tab, and click on the New button in the toolbar menu.

4. Fill in all the configuration fields of the new French_Spanish_CountryRegionName rule, as shown in the following screenshot:

We have created two parameters, $French_translation and $Spanish_translation, of the varchar data type. Each parameter checks the value in one of the two columns, and on the Definition tab, we have specified the condition to be applied to the values.

5. Click on the Submit for Approval button. The rule will be sent to the Tasks tab for approval by the category of users specified in the Approver field of the Rule Editor window.

6. The rule can be approved from the My Worklist section, as shown in the following screenshot:

7. Go to the Workspace Home | Rule Results tab and click on the Bind to Rule button.

8. Bind the rule parameters to the dwh_profile.dbo.DimGeography fields, as shown in the following screenshot, and click on the Save and Close button:

9. Click on Refresh | Rule Results to see the results of applying the rule to the columns of the specified table, as shown in the following screenshot:

The left side of the screen shows the rule scores for the specified fields and the number of records that passed/failed the rule. In our example, 55 rows do not have either a French or Spanish translation in the FrenchCountryRegionName and SpanishCountryRegionName fields.

You can see the actual records in the right-side panel.
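The logic our French_Spanish_CountryRegionName rule applies can be expressed as a small predicate over the two bound parameters. A minimal sketch, with invented sample records; the exact rule syntax in Information Steward's rule editor differs, but the condition is the same:

```python
def french_spanish_rule(french_translation, spanish_translation):
    """Pass only when both translated values are present and non-empty,
    the condition bound to $French_translation and $Spanish_translation."""
    def has_value(v):
        return v is not None and v.strip() != ""
    return has_value(french_translation) and has_value(spanish_translation)

# Invented sample records: (FrenchCountryRegionName, SpanishCountryRegionName)
records = [
    ("France", "Francia"),
    ("", "Alemania"),      # missing French translation -> fails
    (None, None),          # both missing -> fails
]
failed = sum(1 for fr, es in records if not french_spanish_rule(fr, es))
print(failed)  # 2
```

Binding then maps each parameter to an actual table column, which is exactly what step 8 does in the Rule Results tab.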

10. You can also see the rule result on the Rules tab directly. All you need to do is select the rule and click on the Bind button. The rule result appears on the right side of the screen, as shown in the following screenshot:

Creating a scorecard

Scorecards are a convenient way to visualize and present historical information about validation rule results.

1. A scorecard can be created on the Scorecard Setup tab. This is a very straightforward process where you first specify Key Data Domain and Quality Dimension, then the rule you want to include in the scorecard output, and, finally, perform rule binding to link the rule to the actual dataset, as shown in the following screenshot:

2. To view the scorecard results, go to Workspace Home and select the Scorecard view mode instead of Workspace in the combo box located in the top-right corner, as shown in the following screenshot:

How it works…

Now that we have created our connection object, gathered the profiling data, applied the validation rule, and even created a scorecard to see its results, let's look at the various aspects of the steps performed in more detail.

Profiling

As you can see, working in Information Steward is a very intuitive process.

As mentioned earlier, the Data Insight section of Information Steward is all about understanding your data, which is possible with the profiling capabilities of IS. In the majority of cases, profiling your data is the first step before starting any data quality-related work. In the following section, we will review the types of profiling data available in the Profile Results section.

The Value section of the profiling data shows the actual border and median values from the dataset for a specific field. String Length profiling values provide information about the size of the values. The Completeness section helps you to see any gaps in the data. Distribution can be extremely useful for understanding the cardinality of specific fields in your dataset. For example, seeing the number 7 in the Distribution | Value field of the profiling results against the CountryRegionCode field tells us that we have only seven distinct values in that field. Clicking on that number shows us those values and their distribution in the right-hand side panel.
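The Distribution profile boils down to counting distinct values and their frequencies. A tiny sketch with made-up country codes (the real profiler works against the table columns selected in the profiling task):

```python
from collections import Counter

# Illustrative Distribution profiling: cardinality and value frequencies
# for one field. The sample values are invented.
country_codes = ["US", "FR", "DE", "US", "GB", "FR", "US"]
distribution = Counter(country_codes)

print(len(distribution))           # cardinality: number of distinct values
print(distribution.most_common(1)) # the most frequent value and its count
```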

Rules

Rules allow you to analyze the data according to custom conditions. Rules are created with generic rule parameters so that you can apply the same rule to different datasets, if necessary. Linking a rule to a specific dataset is called binding; it is the process of linking rule parameters to actual table fields.

Rules are usually defined by business users to understand how data complies with specific business requirements.

Information Steward offers a Data Validation Advisor feature that proposes preconfigured rules depending on the profiling results of your data.

Scorecards

Scorecards allow you to group your rules and help you to see trends in the data scores calculated by specific rules.

There's more…

There is much more to the Data Insight functionality than presented in this recipe. We have just scratched the surface of the basic functions available in this area of Information Steward.

It is possible to specify file formats directly in Information Steward in order to source data from flat files and Excel spreadsheets.

Another great thing about Information Steward Data Insight is that it allows you to build data views based on multiple sources of information.

The intuitive and well-documented interface allows you to easily experiment and play with your data on your own. This is always a fascinating process that does not require any deep technical knowledge of the underlying product.

Performing Metadata Management tasks

The second tool available in Information Steward after Data Insight is Metadata Management.

The Metadata Management tool is used to collect metadata from various systems in order to get a comprehensive view of it and to analyze the relationships between metadata objects.

In this recipe, we will take a look at an example of using Metadata Management on our Data Services repository, which stores the ETL code developed for the recipes in this book.

Getting ready

As with Data Insight, we first have to establish connectivity to the Data Services repository. This is usually an administration task that can be done in CMC in the Information Steward | Metadata Management section. Click on Create an integrator source and fill in all the required fields, as shown in the following screenshot, to define a connection to the Data Services repository for the Metadata Management tool:

After creating an integrator source object, you have to run it by using the Run Now option in the object's context menu. This operation collects the metadata, that is, information about all the objects in the Data Services repository. Remember that any changes made to the repository after this operation will not be propagated to the collected Metadata Management snapshot, so you need to either run it again manually or schedule it to run regularly according to your requirements.

The following screenshot shows you how to use the Run Now option:

The Last Run column shows you when the integrator source data was last updated.

To see the history of runs, just select History from the integrator source object's context menu, as shown in the following screenshot:

This screen can show you how long it took to collect the metadata from the repository and even provides access to the database log of the metadata collection process, which can be used for troubleshooting any potential problems.

How to do it…

Now that we have defined the connection to our Data Services repository and collected the metadata snapshot using this connection in CMC, we can launch the Information Steward application to use the Metadata Management functionality.

1. Log in to Information Steward and go to the Metadata Management section, as shown in the following screenshot:

2. Click on the Data_Services_Repository source in the Data Integration category and, on the opened screen, look for the DimGeography table using the Search field. The Search Results section at the very bottom shows you all the possible matches, so all you have to do is select the object you need: the table from the STAGE database under the Transform schema:

3. To see the impact the table has on other objects in the ETL repository, click on the View Impact button. You should see something like the following screenshot:

You can see that the DimGeography table is used as a source to populate the other DimGeography tables (from the AdventureWorks_DWH and DWH_backup databases).

4. Click on the LINEAGE section in the same window to see the source objects for the DimGeography table of the STAGE database Transform schema, as shown in the following screenshot:

You can see that the data came from three tables: ADDRESS, COUNTRYREGION, and STATEPROVINCE.

5. By switching to Columns Mapping View, you can see the lineage information at the column level, as shown in the following screenshot:

6. Close this window to go back to the main Metadata Management working area. Now, let's define a relationship between two tables from the Data Services repository that are not directly related to each other in the ETL code: STAGE.Transform.DIMGEOGRAPHY and STAGE.Transform.DIMSALESTERRITORY. To do that, you have to select each table in the Search results section at the bottom and click on the Add to Object Tray button.

7. When both tables have been added to the Object Tray, click on the Object Tray (2) link at the top of the screen (to the right of the Search field).

8. In the opened window, select both objects, as shown in the following screenshot:

9. Click on Establish Relationship and configure the desired relationship between these two objects, as shown in the following screenshot:

10. Now, if you click on the View Related To button, you can see that the relationship information appears on the screen, as shown in the following screenshot:

11. To export the information from this screen into an Excel spreadsheet, click on the Export the tabular view to an Excel file button in the top-right corner.

12. Choose the Open with Microsoft Excel option, as shown in the following screenshot:

13. The generated Excel spreadsheet can be sent to other business users, used in further analysis, or simply kept as a piece of documentation for the ETL metadata.

How it works…

Metadata Management can link information provided by multiple sources in order to perform lineage and impact analysis on objects. In our example, we used only the Data Services repository, but multiple sources, such as Business Intelligence metadata, are often imported along with the source database objects and the Data Services metadata. That allows you to see the full picture of what is happening to a specific dataset: where it is extracted from, which ETL transformations are applied to it, which target table the transformed data is loaded into, and, finally, which BI universes and BI reports use it.

On top of that, you can create custom, user-defined relationships between objects that are not related to each other either directly or indirectly.
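Conceptually, lineage and impact analysis are two traversals of the same directed graph of "source feeds target" edges collected by the integrator. The sketch below is illustrative: the object names mirror this recipe, but the edge list is hand-written, not read from a real repository.

```python
from collections import defaultdict

# "Source feeds target" edges, hand-written to mirror the recipe's objects.
edges = [
    ("OLTP.ADDRESS", "STAGE.Transform.DIMGEOGRAPHY"),
    ("OLTP.COUNTRYREGION", "STAGE.Transform.DIMGEOGRAPHY"),
    ("OLTP.STATEPROVINCE", "STAGE.Transform.DIMGEOGRAPHY"),
    ("STAGE.Transform.DIMGEOGRAPHY", "AdventureWorks_DWH.DimGeography"),
    ("STAGE.Transform.DIMGEOGRAPHY", "DWH_backup.DimGeography"),
]

downstream = defaultdict(set)
upstream = defaultdict(set)
for src, tgt in edges:
    downstream[src].add(tgt)
    upstream[tgt].add(src)

def closure(start, graph):
    """All objects reachable from start: impact when graph=downstream,
    lineage when graph=upstream."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(sorted(closure("STAGE.Transform.DIMGEOGRAPHY", upstream)))   # lineage
print(sorted(closure("STAGE.Transform.DIMGEOGRAPHY", downstream))) # impact
```

View Impact and the LINEAGE section answer exactly these two reachability questions over the collected metadata.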

Working with the Metapedia functionality

Think of Metapedia as a Wikipedia for your data. Metapedia is used to build a hierarchy of business terms and descriptions for your data, group them into categories, and even associate actual technical objects, such as pieces of ETL code and database tables, with these terms.

In this recipe, we will create a small glossary of business terms in Information Steward and learn how it can be distributed outside of the system to be updated by business users and imported back into Information Steward.

How to do it…

1. Log in to Information Steward and go to the Metapedia section.
2. Click on the New Category button to create a new category, Geography, as shown in the following screenshot:

Specify the keywords to be associated with the category for an easy search and click on the Save button to create the category.

3. Choose All Terms and click on the New button to create a new term, Postcode, as shown in the following screenshot:

Click on Save to create it and close the window.

4. Now, select the created term in the list of terms and click on Category Actions | Add to Category.

5. On the opened category list screen, select the Geography category and click on OK, as shown in the following screenshot:

6. Click on the Export Metapedia to MS Excel file button and select the All Terms option.

7. In the prompt window, select Export term description in plain text format.
8. Save the file on the disk. Now, let's perform some modifications to the file as if we were business users who have been asked to create a glossary of terms and categories using this Excel spreadsheet.

9. Add the new terms on the Business Terms tab of the spreadsheet, as shown in the following screenshot:

10. Add the new categories on the Business Categories tab of the spreadsheet, as shown in the following screenshot:

11. Go back to Information Steward | Metapedia and click on Import Metapedia from MS Excel file. Specify the file modified in the previous step, as shown in the following screenshot:

Note that importing information from this spreadsheet will automatically approve all terms and change their statuses from Editing to Approved.

12. To associate a term with actual technical objects, double-click on the specific term and click on the Actions | Associate with objects button on the term editor screen. Select the objects you want to associate with the term one by one by clicking on the Associate with term button. Click on Done after you have finished.

13. We have associated two objects, the table CITY and the parameter $p_City, from our Data Services repository with the term City, as shown in the following screenshot:

How it works…

The main function of Metapedia is to provide a glossary to browse and understand the data, presented and categorized in clear business terms. In other words, the purpose of Metapedia is to provide a clear translation of technical terms into terms that can be understood by the business.

It is a simple but very efficient solution. In this recipe, we demonstrated how a simple glossary can be created in Information Steward Metapedia, exported into a spreadsheet for distribution, and then imported back with updated information.

This is very useful if you need to gather this kind of information from users who do not have the knowledge or access to create terms and categories directly in Information Steward.

Creating a custom cleansing package with Cleansing Package Builder

In Chapter 7, Validating and Cleansing Data (see the recipe Data Quality transforms – cleansing your data), we already used the default cleansing package PERSON_FIRM available in Data Services for data cleansing tasks.

In this recipe, we will create a new cleansing package from scratch with the help of Information Steward and publish it so that it can be used in Data Services transforms.

Our new custom cleansing package will be used to determine the type of street used in the address field of the Address table from the OLTP database.

Getting ready

The Information Steward Cleansing Package Builder tool requires a sample flat file with data that is used to define cleansing rules. The following steps describe how to prepare such a flat file with sample data.

As we are going to use our custom cleansing package to cleanse the OLTP.Address table data, we will generate our sample dataset from the same table.

1. Launch Data Services Designer and log in to the local repository.
2. Go to Local Object Library | Formats | Flat Files.
3. Right-click on the Flat Files section and create a new flat file format, PB_sample, as shown in the following screenshot:

4. Create a new job and a new dataflow. Inside the dataflow, put the OLTP.ADDRESS table as a source table.

5. Link the source table to a Query transform and propagate only the ADDRESSLINE1 column to the output schema.

6. Link the output of the Query transform object to the target file based on the PB_sample file format created earlier.

7. Save and run the job. The PB_sample.txt file should appear in the C:\AW\Files\ folder.

How to do it…

Now that we have created a sample file, we can finally start the Information Steward application and use Cleansing Package Builder to create our new custom cleansing package.

1. Launch the Information Steward application and go to the Cleansing Package Builder area.
2. Click on New Cleansing Package | Custom Cleansing Package and specify the package name and sample data file in the first step of the package builder:

3. Step 2 of the package builder contains information that helps to parse the sample data correctly:

4. At step 3 of the package builder, you should define the number of records taken from the sample file to be used in the package design process. The maximum number of rows is 3,000. Specify the random mechanism of obtaining rows from the sample file and set the number of rows to 3,000.

5. Step 4 defines the parsing strategy:

6. At step 5, you can choose a category name and assign suggested attributes to it if you want to. In our example, none of the suggested attributes matches our STREET_TYPE category, so we do not tick any of them:

7. At step 6, we create attributes for our STREET_CATEGORY category and categorize the values found in the sample file against the attributes. The Standard Forms column defines the standardized form of the parsed value, and the Variations column defines which variations will be standardized to the value specified in the Standard Forms window. See the following example of the configuration for the DRIVE_ATTR attribute:

8. Another example is the STREET_ATTR attribute:

You can see how we have assigned values to the STREET standard form that are visually and syntactically very different, such as Strase and Rue.

9. After step 6, you might think you have created your package and that the job is done. This is almost true. We have just passed through the basic cleansing package builder wizard steps in order to create the canvas for our new package. The real work starts when you double-click on the package in the Cleansing Package Builder area and the package editor opens. It has two main editing modes: Design and Advanced. We are not going to work with the advanced design mode, as it would take another book to cover all the aspects of fine-tuning your cleansing package in this mode.

10. In the meantime, you have probably noticed that our custom package was created with a lock icon:

11. Information Steward needs some time to finish its background processes for the package creation, so you have to wait for a couple of minutes until the icon changes to a different one:

12. Now the package is ready to be published. Select the package on the left and click on the Publish button in the toolbar menu. The clock icon on the package in the right-side panel means that Information Steward is still performing background operations in order to publish the package and make it available for use in Data Services:

13. When the package publication is finished, the icon changes again:

14. You can continue fine-tuning your package by entering the package Design mode. This mode shows you the result of your actions immediately in the table at the bottom:

How it works…

Let's see how the cleansing package we created can actually be used in Data Services to perform data cleansing tasks.

1. Start Data Services Designer.
2. Create a new job and a new dataflow.
3. Import the OLTP.ADDRESS table as a source table object.
4. Link the source table to the Query transform and propagate only the ADDRESSLINE1 column to the output schema, as we are going to perform cleansing only on this column.

5. Link the Query transform object to the Data_Cleanse transform, which can be found in Local Object Library | Transforms | Data Quality | Data_Cleanse.

6. Open the imported Data_Cleanse object for editing in the main workspace window and go to the first tab, Input.

7. Map the input ADDRESSLINE1 field to the MULTILINE1 transform input field name:

8. Go to the second tab, Options, and configure the following options, specifying our newly created Address_Custom as the cleansing package:

9. Finally, open the third tab, Output, and define the following output columns that will be produced by the Data_Cleanse transform:

10. Close the Data_Cleanse transform object and link it to the newly imported template table, ADDRESS_CLEANSE_STREET_TYPE, created in the DS_STAGE datastore.

11. Your dataflow should look like the one in the following figure:

After you have saved and run the job, you can see that the cleansing package has categorized the values and that populated columns have been created for each attribute of STREET_CATEGORY:

How well a cleansing package does its job depends solely on your ability to define rules and configure it to accommodate all the possible scenarios that can be seen in your data.

For example, "Circle" has not been categorized, as we simply did not define any rule regarding the "Circle" value.
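At runtime, the core of what the published package does is map known variations to a standard form per attribute, leaving unknown values uncategorized. A minimal sketch; the variation lists below are invented examples, not the package's actual dictionary:

```python
# Illustrative variation-to-standard-form mapping, in the spirit of the
# Standard Forms / Variations configuration from step 7 of the recipe.
VARIATIONS = {
    "STREET": ["street", "str.", "strase", "rue"],
    "DRIVE": ["drive", "dr.", "drv"],
}
# Invert into a single lookup: variation -> standard form.
lookup = {v: std for std, vs in VARIATIONS.items() for v in vs}

def cleanse_street_type(token):
    """Return (standard_form, matched). Unmatched values, like 'Circle'
    in our example, pass through uncategorized."""
    std = lookup.get(token.strip().lower())
    return (std, True) if std else (token, False)

print(cleanse_street_type("Rue"))     # ('STREET', True)
print(cleanse_street_type("Circle"))  # ('Circle', False)
```

Adding a rule for "Circle" would simply mean adding it (and its variations) to the dictionary, which is what the package editor's Design mode does interactively.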

This is one of the simplest cases of a cleansing task, but it should give you an idea of the Information Steward capabilities in this area.

There's more…

Open a cleansing package by double-clicking on it and go to Advanced mode to see how many options exist for creating and tuning cleansing rules and algorithms. You can define new rules and change the already created ones to make your cleansing process behave differently. The complexity of a cleansing package is restricted only by your imagination and the complexity of the cleansing process requirements it has to accommodate.

Index

A

Access Server
configuring / Configuring Access Server, How to do it…

administrative tasks
Repository Manager, using / How to do it…
Server Manager, using / How to do it…
CMC, used for registering new repository / How to do it…
License Manager, using / How to do it…

aggregate functions
using / Using aggregate functions, How to do it…, How it works…

audit reporting
about / Building an external ETL audit and audit reporting, How to do it…, How it works…

Autocorrect load option
about / Exploring the Autocorrect load option, How to do it…, How it works…

B

blob data type / How it works…

bulk-load
about / Optimizing dataflow loaders – bulk-loading methods

bulk-loading methods
about / Optimizing dataflow loaders – bulk-loading methods, How to do it…, How it works…
enabling / When to enable bulk loading?

bypassing feature
using / Using the bypassing feature, How to do it…, How it works…

C

Case transform
used, for splitting data flow / Splitting the flow of data with the Case transform, How to do it…, How it works…

Central Configuration Management
about / How to do it…

Central Management Console
about / Introduction, How to do it…

Central Management Console (CMC) / Getting ready

Central Object Library
objects, assigning to and from / Adding objects to and from the Central Object Library

central repository
ETL code, migrating through / Migrating ETL code through the central repository, Getting ready
objects, comparing between Local and Central / Comparing objects between the Local and Central repositories

Change Data Capture (CDC)
about / Change Data Capture techniques
No history SCD (Type 1) / No history SCD (Type 1)
limited history SCD (Type 3) / Limited history SCD (Type 3)
unlimited history SCD (Type 2) / Unlimited history SCD (Type 2)
process, building / How to do it…, How it works…
source-based ETL CDC / Source-based ETL CDC
target-based ETL CDC / Target-based ETL CDC
native / Native CDC

Cleansing Package Builder
used, for creating custom cleansing package / Creating a custom cleansing package with Cleansing Package Builder, Getting ready, How to do it…, How it works…

client tools
about / Introduction
Designer tool / Introduction
Repository Manager / Introduction

CodePlex
URL / How to do it…

command line (cmd)
about / How to do it…

conditional and while loop objects
used, for controlling execution order / Using conditional and while loop objects to control the execution order, Getting ready, How to do it…, There is more…

connection object, Data Insight
creating / Creating a connection object

continuous workflow
using / Using a continuous workflow, How to do it…, How it works…, There is more…

conversion functions
using / Using conversion functions, How to do it…, How it works…

custom functions
creating / Creating custom functions, How to do it…, How it works…

D

data
loading, into flat file / Loading data into a flat file, How to do it…, How it works…, There's more…
loading, from flat file / Loading data from a flat file, How to do it…, How it works…, There's more…
loading, from table to table / Loading data from table to table – lookups and joins, How to do it…, How it works…
flow splitting, Case transform used / Splitting the flow of data with the Case transform, How to do it…, How it works…
flow execution, monitoring / Monitoring and analyzing dataflow execution, Getting ready, How to do it…, How it works…
flow execution, analyzing / Monitoring and analyzing dataflow execution, Getting ready, How to do it…, How it works…
cleansing / Data Quality transforms – cleansing your data, How to do it…, How it works…, There's more…
transforming, Pivot transform used / Transforming data with the Pivot transform, Getting ready, How to do it…, How it works…
loading, into SAP ERP / Loading data into SAP ERP, Getting ready, How to do it…, How it works…

database environment
preparing / Preparing a database environment, How it works…

database functions
using / Using database functions
key_generation() function / key_generation()
total_rows() function / total_rows()
sql() function / sql(), How it works…

dataflow audit
enabling / Enabling dataflow audit, How to do it…, How it works…, There's more…

dataflow execution, optimizing
SQL transform / Optimizing dataflow execution – the SQL transform, How to do it…, How it works…
Data_Transfer transform / Optimizing dataflow execution – the Data_Transfer transform, How to do it…, How it works…
Data_Transfer transform, usage / When to use Data_Transfer transform, There's more…
performance options / Optimizing dataflow execution – performance options, How to do it…
dataflow performance options / Dataflow performance options
source table performance options / Source table performance options
query transform performance options / Query transform performance options
lookup_ext() performance options / lookup_ext() performance options
target table performance options / Target table performance options

dataflow flow, optimizing
push-down techniques / Optimizing dataflow execution – push-down techniques, How to do it…, How it works…

dataflow loaders, optimizing
bulk-loading methods / Optimizing dataflow loaders – bulk-loading methods, How to do it…, How it works…
bulk loading, enabling / When to enable bulk loading?

dataflow performance options
about / Dataflow performance options

dataflow readers, optimizing
lookup methods / Optimizing dataflow readers – lookup methods
Query transform join, lookup with / Lookup with the Query transform join
lookup_ext() function, lookup with / Lookup with the lookup_ext() function
sql() function, lookup with / Lookup with the sql() function
Query transform join, advantages / Query transform joins
lookup_ext() function / lookup_ext()
sql() function / sql()
performance review / Performance review

Data Insight
capabilities, exploring / Exploring Data Insight capabilities, Getting ready, How to do it…
connection object, creating / Creating a connection object
data, profiling / Profiling the data
profiling results, viewing / Viewing profiling results
validation rule, creating / Creating a validation rule
scorecard, creating / Creating a scorecard, How it works…
profiling / Profiling
rules / Rules
scorecards / Scorecards

Data Modification Language (DML) operation / Using the Map_Operation transform

Data Quality transforms
about / Data Quality transforms – cleansing your data, How to do it…, How it works…

Data Services
client tools / Introduction
server-based components / Introduction
installing / Installing and configuring Data Services, How to do it…, How it works…
configuring / Installing and configuring Data Services, How to do it…, How it works…
reference guide, URL / How to do it…
auto documentation / Auto Documentation in Data Services, How to do it…, How it works…
automatic job recovery / Automatic job recovery in Data Services, Getting ready, How to do it…, How it works…, There's more…

Data Services objects
and parent-child relationships / Peeking inside the repository – parent-child relationships between Data Services objects, How it works…
object types list, getting in Data Services repository / Get a list of object types and their codes in the Data Services repository
DF_Transform_DimGeography dataflow information, displaying / Display information about the DF_Transform_DimGeography dataflow
SalesTerritory table object information, displaying / Display information about the SalesTerritory table object
script object content, displaying / See the contents of the script object

Data Services repository
creating / Creating IPS and Data Services repositories, How to do it…, How it works…
database, creating / How to do it…
ODBC layer, configuring / How to do it…

data validation
validation functions, creating / Creating validation functions, How to do it…, How it works…
results, reporting / Reporting data validation results, How to do it…, How it works…
regular expression support used / Using regular expression support to validate data, Getting ready, How to do it…, How it works…

Data_Transfer transform
about / Optimizing dataflow execution – the Data_Transfer transform, How to do it…, How it works…
usage / When to use Data_Transfer transform, There's more…

date functions
using / Using date functions
current date and time, generating / Generating current date and time
parts, extracting from dates / Extracting parts from dates, How it works…, There's more…

Designer tool
about / Understanding the Designer tool
setting / How to do it…
default options, setting / How to do it…
ETL code, executing / Executing ETL code in Data Services
ETL code, validating / Validating ETL code
template tables, using / Template tables
query transform / Query transform basics
Hello World example / The Hello World example

Drop and re-create table option / There's more…

DS Management Console
about / Introduction

E

ETLorganizing/Projectsandjobs–organizingETL,Howtodoit…,Howitworks…projects/Projectsandjobs–organizingETL,Howtodoit…,Howitworks…hierarchicalobjectview/Hierarchicalobjectviewhistoryexecutionlogfiles/Historyexecutionlogfilesjobs,schedulingfromManagementconsole/Executing/schedulingjobsfromtheManagementConsolejobs,executingfromManagementconsole/Executing/schedulingjobsfromtheManagementConsole

ETLauditexternalETLaudit,building/BuildinganexternalETLauditandauditreporting,Howtodoit…,Howitworks…built-in,using/Usingbuilt-inDataServicesETLauditandreportingfunctionality,Howtodoit…,Howitworks…

ETLcodemigrating,throughcentralrepository/MigratingETLcodethroughthecentralrepository,Gettingready,Howtodoit…migrating,withexport/import/MigratingETLcodewithexport/import,Howtodoit…

ETLexecutionsimplifying,withsystemconfigurations/SimplifyingETLexecutionwithsystemconfigurations,Gettingready,Howtodoit…,Howitworks…

ETLjobdimensiontables,populating/Usecaseexample–populatingdimensiontables,Howtodoit…building/Usecaseexample–populatingdimensiontables,Howtodoit…mapping,defining/Mappingdependencies,defining/Dependenciesdevelopment/Developmentexecutionorder/Executionordertesting/TestingETLtestdata,preparingtopopulateDimSalesTerritory/PreparingtestdatatopopulateDimSalesTerritorytestdata,preparingtopopulateDimGeography/PreparingtestdatatopopulateDimGeography

execution order
controlling, by nesting workflows / Nesting workflows to control the execution order, How to do it, How it works…
controlling, conditional and while loop objects used / Using conditional and while loop objects to control the execution order, Getting ready, How to do it…, How it works…, There is more…

export/import
ETL code, migrating with / Migrating ETL code with export/import, Getting ready
ATL files used / Import/Export using ATL files
to local repository / Direct export to another local repository, How it works…

Extract-Transform-Load (ETL)
about / Introduction
advantages / Introduction

F

failures
controlling / Controlling failures – try-catch objects, How to do it…, How it works…

flat file
data, loading in / Loading data into a flat file, How to do it…, How it works…, There’s more…
data, loading from / Loading data from a flat file, How to do it…, How it works…

full pushdown / Getting ready

H

Hierarchy_Flattening transform
about / The Hierarchy_Flattening transform, Getting ready
horizontal hierarchy flattening, performing / Horizontal hierarchy flattening
vertical hierarchy flattening / Vertical hierarchy flattening, How it works…
result tables, querying / Querying result tables

horizontal hierarchy flattening
about / Horizontal hierarchy flattening

I

IDoc
about / IDoc
load, monitoring on SAP side / Monitoring IDoc load on the SAP side
loaded data, post-load validation / Post-load validation of loaded data

Information Platform Services (IPS)
configuring / Installing and configuring Information Platform Services, How to do it…, How it works…
installing / Installing and configuring Information Platform Services, How to do it…, How it works… / Getting ready

IPS repository
creating / Creating IPS and Data Services repositories, How to do it…, How it works…

J

job execution
debugging / Debugging job execution, How to do it…, How it works…
monitoring / Monitoring job execution, How to do it…

job recovery, automatic in Data Services / Automatic job recovery in Data Services, How to do it…, How it works…, There’s more…

join operations
* – cross-join operation / How it works…
|| – parallel-join operation / How it works…
INNER JOIN / How it works…
LEFT OUTER JOIN / How it works…

K

key_generation() function / key_generation()

L

long data type / How it works…

lookup methods
with Query transform join / Lookup with the Query transform join
with lookup_ext() function / Lookup with the lookup_ext() function
with sql() function / Lookup with the sql() function

lookup_ext() function
lookup with / Lookup with the lookup_ext() function
advantages / lookup_ext()

lookup_ext() performance options
about / lookup_ext() performance options

M

Map_Operation transform
using / Using the Map_Operation transform, How to do it…, How it works…

math functions
using / Using math functions, How to do it…, There’s more…

Metadata Management tasks
performing / Performing Metadata Management tasks, Getting ready, How to do it…, How it works…

Metapedia
working with / Working with the Metapedia functionality, How to do it…, How it works…

Microsoft SQL Server 2012
URL / How to do it…

miscellaneous functions
using / Using miscellaneous functions, How it works…

N

nested structures
working with / Working with nested structures, How to do it…, How it works…, There is more…

O

object replication
using / Using object replication, How it works…

OLTP datastore / How to do it…

P

parameters
creating / Creating variables and parameters, How to do it…, How it works…

parent-child relationships
between Data Services objects / Peeking inside the repository – parent-child relationships between Data Services objects, Getting ready

partial pushdown / Getting ready

performance options
about / Optimizing dataflow execution – performance options, How to do it…

Pivot transform
used, for transforming data / Transforming data with the Pivot transform, Getting ready, How to do it…, How it works…

profiling data / Profiling

profiling results, Data Insight
viewing / Viewing profiling results

push-down operations
about / Optimizing dataflow execution – push-down techniques, How it works…
partial pushdown / Getting ready
full pushdown / Getting ready

Q

Query transform join
lookup with / Lookup with the Query transform join

query transform joins
advantages / Query transform joins

Query transform performance options
about / Query transform performance options

R

real-time jobs
creating / Creating real-time jobs
SoapUI, installing / Installing SoapUI, How to do it…, How it works…

regular expression support
used, for validating data / Using regular expression support to validate data, Getting ready, How to do it…, How it works…

replication process
about / How it works…

rules / Rules

S

SAP ERP
data, loading into / Loading data into SAP ERP, Getting ready, How to do it…, How it works…
URL / Loading data into SAP ERP

SAP Information Steward
about / Introduction
URL / Introduction

scorecard, Data Insight
creating / Creating a scorecard, How it works…

scorecards / Scorecards

script
creating / Creating a script, How to do it…, How it works…
string functions, using / Using string functions in the script, How it works…

server-based components
IPS Services / Introduction
Job Server / Introduction
access server / Introduction
web application server / Introduction

services
starting / Starting and stopping services, How to do it…, See also
stopping / Starting and stopping services, How to do it…, See also
web application server / How to do it…
Data Services Job Server / How to do it…
Information Platform Services / How to do it…

Slowly Changing Dimensions (SCD)
about / Getting ready

SoapUI
installing / Installing SoapUI, How to do it…, How it works…
URL / Installing SoapUI

source data object
creating / Creating a source data object, How to do it…, How it works…

source system database
creating / Creating a source system database, There’s more…

source table performance options
about / Source table performance options

sql() function / sql(), How it works…
lookup with / Lookup with the sql() function
about / sql()

SQL transform
about / Optimizing dataflow execution – the SQL transform, How to do it…, How it works…

staging area structures
defining / Defining and creating staging area structures
creating / Defining and creating staging area structures
flat files / Flat files
RDBMS tables / RDBMS tables, How it works…

string functions
using / Using string functions, How to do it…
using, in script / Using string functions in the script, How it works…

system configurations
used, for simplifying ETL execution / Simplifying ETL execution with system configurations, How to do it…, How it works…

T

Table_Comparison transform
using / Using the Table_Comparison transform, Getting ready, How to do it…, How it works…

target data object
creating / Creating a target data object, How to do it…, There’s more…

target data warehouse
creating / Creating a target data warehouse, How it works…, There’s more…

target table performance options
about / Target table performance options

tasks
administering / Administering tasks, How to do it…, See also

total_rows() function / total_rows()

try-catch objects
about / Controlling failures – try-catch objects, How to do it…, How it works…

U

user access
configuring / Configuring user access, How to do it…, How it works…

V

validation functions
creating / Creating validation functions, How to do it…, How it works…
using, with Validation transform / Using validation functions with the Validation transform, How to do it…, How it works…

validation rule, Data Insight
creating / Creating a validation rule

Validation transform
validation functions, using with / Using validation functions with the Validation transform, How to do it…, How it works…

variables
creating / Creating variables and parameters, How to do it…, How it works…

vertical hierarchy flattening
about / Vertical hierarchy flattening, How it works…

W

workflow object
creating / Creating a workflow object, How to do it…, How it works…

workflows
nesting, to control execution order / Nesting workflows to control the execution order, How to do it, How it works…

X

XML_Map transform
about / The XML_Map transform, How to do it…, How it works…

