SAP Data Services 4.x Cookbook
Table of Contents

SAP Data Services 4.x Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Instant updates on new Packt books
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction to ETL Development
Introduction
Preparing a database environment
Getting ready
How to do it…
How it works…
Creating a source system database
How to do it…
How it works…
There's more…
Defining and creating staging area structures
How to do it…
Flat files
RDBMS tables
How it works…
Creating a target data warehouse
Getting ready
How to do it…
How it works…
There's more…
2. Configuring the Data Services Environment
Introduction
Creating IPS and Data Services repositories
Getting ready…
How to do it…
How it works…
See also
Installing and configuring Information Platform Services
Getting ready…
How to do it…
How it works…
Installing and configuring Data Services
Getting ready…
How to do it…
How it works…
Configuring user access
Getting ready…
How to do it…
How it works…
Starting and stopping services
How to do it…
How it works…
See also
Administering tasks
How to do it…
How it works…
See also
Understanding the Designer tool
Getting ready…
How to do it…
How it works…
Executing ETL code in Data Services
Validating ETL code
Template tables
Query transform basics
The Hello World example
3. Data Services Basics – Data Types, Scripting Language, and Functions
Introduction
Creating variables and parameters
Getting ready
How to do it…
How it works…
There's more…
Creating a script
How to do it…
How it works…
Using string functions
How to do it…
Using string functions in the script
How it works…
There's more…
Using date functions
How to do it…
Generating current date and time
Extracting parts from dates
How it works…
There's more…
Using conversion functions
How to do it…
How it works…
There's more…
Using database functions
How to do it…
key_generation()
total_rows()
sql()
How it works…
Using aggregate functions
How to do it…
How it works…
Using math functions
How to do it…
How it works…
There's more…
Using miscellaneous functions
How to do it…
How it works…
Creating custom functions
How to do it…
How it works…
There's more…
4. Dataflow – Extract, Transform, and Load
Introduction
Creating a source data object
How to do it…
How it works…
There's more…
Creating a target data object
Getting ready
How to do it…
How it works…
There's more…
Loading data into a flat file
How to do it…
How it works…
There's more…
Loading data from a flat file
How to do it…
How it works…
There's more…
Loading data from table to table – lookups and joins
How to do it…
How it works…
Using the Map_Operation transform
How to do it…
How it works…
Using the Table_Comparison transform
Getting ready
How to do it…
How it works…
Exploring the Auto correct load option
Getting ready
How to do it…
How it works…
Splitting the flow of data with the Case transform
Getting ready
How to do it…
How it works…
Monitoring and analyzing dataflow execution
Getting ready
How to do it…
How it works…
There's more…
5. Workflow – Controlling Execution Order
Introduction
Creating a workflow object
How to do it…
How it works…
Nesting workflows to control the execution order
Getting ready
How to do it
How it works…
Using conditional and while loop objects to control the execution order
Getting ready
How to do it…
How it works…
There is more…
Using the bypassing feature
Getting ready…
How to do it…
How it works…
There is more…
Controlling failures – try-catch objects
How to do it…
How it works…
Use case example – populating dimension tables
Getting ready
How to do it…
How it works…
Mapping
Dependencies
Development
Execution order
Testing ETL
Preparing test data to populate DimSalesTerritory
Preparing test data to populate DimGeography
Using a continuous workflow
How to do it…
How it works…
There is more…
Peeking inside the repository – parent-child relationships between Data Services objects
Getting ready
How to do it…
How it works…
Get a list of object types and their codes in the Data Services repository
Display information about the DF_Transform_DimGeography dataflow
Display information about the SalesTerritory table object
See the contents of the script object
6. Job – Building the ETL Architecture
Introduction
Projects and jobs – organizing ETL
Getting ready
How to do it…
How it works…
Hierarchical object view
History execution log files
Executing/scheduling jobs from the Management Console
Using object replication
How to do it…
How it works…
Migrating ETL code through the central repository
Getting ready
How to do it…
How it works…
Adding objects to and from the Central Object Library
Comparing objects between the Local and Central repositories
There is more…
Migrating ETL code with export/import
Getting ready
How to do it…
Import/Export using ATL files
Direct export to another local repository
How it works…
Debugging job execution
Getting ready…
How to do it…
How it works…
Monitoring job execution
Getting ready
How to do it…
How it works…
Building an external ETL audit and audit reporting
Getting ready…
How to do it…
How it works…
Using built-in Data Services ETL audit and reporting functionality
Getting ready
How to do it…
How it works…
Auto Documentation in Data Services
How to do it…
How it works…
7. Validating and Cleansing Data
Introduction
Creating validation functions
Getting ready
How to do it…
How it works…
Using validation functions with the Validation transform
Getting ready
How to do it…
How it works…
Reporting data validation results
Getting ready
How to do it…
How it works…
Using regular expression support to validate data
Getting ready
How to do it…
How it works…
Enabling dataflow audit
Getting ready
How to do it…
How it works…
There's more…
Data Quality transforms – cleansing your data
Getting ready
How to do it…
How it works…
There's more…
8. Optimizing ETL Performance
Introduction
Optimizing dataflow execution – push-down techniques
Getting ready
How to do it…
How it works…
Optimizing dataflow execution – the SQL transform
How to do it…
How it works…
Optimizing dataflow execution – the Data_Transfer transform
Getting ready
How to do it…
How it works…
Why we used a second Data_Transfer transform object
When to use the Data_Transfer transform
There's more…
Optimizing dataflow readers – lookup methods
Getting ready
How to do it…
Lookup with the Query transform join
Lookup with the lookup_ext() function
Lookup with the sql() function
How it works…
Query transform joins
lookup_ext()
sql()
Performance review
Optimizing dataflow loaders – bulk-loading methods
How to do it…
How it works…
When to enable bulk loading?
Optimizing dataflow execution – performance options
Getting ready
How to do it…
Dataflow performance options
Source table performance options
Query transform performance options
lookup_ext() performance options
Target table performance options
9. Advanced Design Techniques
Introduction
Change Data Capture techniques
Getting ready
No history SCD (Type 1)
Limited history SCD (Type 3)
Unlimited history SCD (Type 2)
How to do it…
How it works…
Source-based ETL CDC
Target-based ETL CDC
Native CDC
Automatic job recovery in Data Services
Getting ready
How to do it…
How it works…
There's more…
Simplifying ETL execution with system configurations
Getting ready
How to do it…
How it works…
Transforming data with the Pivot transform
Getting ready
How to do it…
How it works…
10. Developing Real-time Jobs
Introduction
Working with nested structures
Getting ready
How to do it…
How it works…
There is more…
The XML_Map transform
Getting ready
How to do it…
How it works…
The Hierarchy_Flattening transform
Getting ready
How to do it…
Horizontal hierarchy flattening
Vertical hierarchy flattening
How it works…
Querying result tables
Configuring Access Server
Getting ready
How to do it…
How it works…
Creating real-time jobs
Getting ready
Installing SoapUI
How to do it…
How it works…
11. Working with SAP Applications
Introduction
Loading data into SAP ERP
Getting ready
How to do it…
How it works…
IDoc
Monitoring IDoc load on the SAP side
Post-load validation of loaded data
There is more…
12. Introduction to Information Steward
Introduction
Exploring Data Insight capabilities
Getting ready
How to do it…
Creating a connection object
Profiling the data
Viewing profiling results
Creating a validation rule
Creating a scorecard
How it works…
Profiling
Rules
Scorecards
There is more…
Performing Metadata Management tasks
Getting ready
How to do it…
How it works…
Working with the Metapedia functionality
How to do it…
How it works…
Creating a custom cleansing package with Cleansing Package Builder
Getting ready
How to do it…
How it works…
There is more…
Index
SAP Data Services 4.x Cookbook

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2015

Production reference: 1261115

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78217-656-5
www.packtpub.com
Credits

Author

Ivan Shomnikov

Reviewers

Andrés Aguado Aranda
Dick Groenhof
Bernard Timbal Duclaux de Martin
Sridhar Sunkaraneni
Meenakshi Verma

Commissioning Editor

Vinay Argekar

Acquisition Editors

Shaon Basu
Kevin Colaco

Content Development Editor

Merint Mathew

Technical Editor

Humera Shaikh

Copy Editors

Brandt D'mello
Shruti Iyer
Karuna Narayanan
Sameen Siddiqui

Project Coordinator

Francina Pinto

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Nilesh Mohite

Cover Work

Nilesh Mohite
About the Author

Ivan Shomnikov is an SAP analytics consultant specializing in the area of Extract, Transform, and Load (ETL). He has in-depth knowledge of the data warehouse life cycle processes (DWH design and ETL development) and extensive hands-on experience with both the SAP Enterprise Information Management (Data Services) technology stack and the SAP BusinessObjects reporting products stack (Web Intelligence, Designer, Dashboards).

Ivan has been involved in the implementation of complex BI solutions on the SAP BusinessObjects Enterprise platform in major New Zealand companies across different industries. He also has a strong background as an Oracle database administrator and developer.

This is my first experience of writing a book, and I would like to thank my partner and my son for their patience and support.
About the Reviewers

Andrés Aguado Aranda is a 26-year-old computer engineer from Spain. His experience has given him a really technical background in databases, data warehousing, and business intelligence.

Andrés has worked in different business sectors, such as banking, public administration, and energy, since 2012 in data-related positions.

This book is my first stint as a reviewer, and it has been really interesting and valuable to me, both personally and professionally.

I would like to thank my family and friends for always being willing to help me when I needed it. Also, I would like to thank my former coworker and current friend, Antonio Martín-Cobos, a BI reporting analyst who really helped me get this opportunity.
Dick Groenhof started his professional career in 1990 after finishing his studies in business information science at Vrije Universiteit Amsterdam. Having worked as a software developer and service management consultant for the first part of his career, he has been active as a consultant in the business intelligence arena since 2005.

Dick has been a lead consultant on numerous SAP BI projects, designing and implementing successful solutions for his customers, who regard him as a trusted advisor. His core competences include both frontend (such as Web Intelligence, Crystal Reports, and SAP Design Studio) and backend tools (such as SAP Data Services and Information Steward). Dick is an early adopter of the SAP HANA platform, creating innovative solutions using HANA Information Views, the Predictive Analysis Library, and SQLScript.

He is a Certified Application Associate in SAP HANA and SAP BusinessObjects Web Intelligence 4.1. Currently, Dick works as a senior HANA and big data consultant for a highly respected and innovative SAP partner in the Netherlands.

He is a strong believer in sharing his knowledge with regard to SAP HANA and SAP Data Services by writing blogs (at http://www.dickgroenhof.com and http://www.thenextview.nl/blog) and speaking at seminars.

Dick is happily married to Emma and is a very proud father of his son, Christiaan, and daughter, Myrthe.
Bernard Timbal Duclaux de Martin is a business intelligence architect and technical expert with more than 15 years of experience. He has been involved in several large business intelligence system deployments and administration in banking and insurance companies. In addition, Bernard has skills in modeling, data extraction, transformation, loading, and reporting design. He has authored four books, including two regarding SAP BusinessObjects Enterprise administration.
Meenakshi Verma has been a part of the IT industry since 1998. She is an experienced business systems specialist having the CBAP and TOGAF certifications. Meenakshi is well-versed with a variety of tools and techniques used for business analysis, such as SAP BI, SAP BusinessObjects, Java/J2EE technologies, and others. She is currently based in Toronto, Canada, and works with a leading utility company.

Meenakshi has helped technically review many books published by Packt Publishing across various enterprise solutions. Her earlier works include JasperReports for Java Developers, Java EE 5 Development using GlassFish Application Server, Practical Data Analysis and Reporting with BIRT, EJB 3 Developer Guide, Learning Dojo, and IBM WebSphere Application Server 8.0 Administration Guide.

I'd like to thank my father, Mr. Bhopal Singh, and mother, Mrs. Raj Bala, for laying a strong foundation in me and giving me their unconditional love and support. I also owe thanks and gratitude to my husband, Atul Verma, for his encouragement and support throughout the reviewing of this book and many others; my ten-year-old son, Prieyaansh Verma, for giving me the warmth of his love despite my hectic schedules; and my brother, Sachin Singh, for always being there for me.
www.PacktPub.com
Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Instant updates on new Packt books

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.
Preface

SAP Data Services delivers an enterprise-class solution to build data integration processes as well as perform data quality and data profiling tasks, allowing you to govern your data in a highly efficient way.

Some of the tasks that Data Services helps accomplish include: migration of data between databases or applications, extracting data from various source systems into flat files, data cleansing, data transformation using either common database-like functions or complex custom-built functions created with an internal scripting language, and, of course, loading data into your data warehouse or external systems. SAP Data Services has an intuitive, user-friendly graphical interface, allowing you to access all its powerful Extract, Transform, and Load (ETL) capabilities from the single Designer tool.

However, getting started with SAP Data Services can be difficult, especially for people who have little or no experience in ETL development. The goal of this book is to guide you through easy-to-understand examples of building your own ETL architecture. The book can also be used as a reference to perform specific tasks, as it provides real-world examples of using the tool to solve data integration problems.
What this book covers

Chapter 1, Introduction to ETL Development, explains what Extract, Transform, and Load (ETL) processes are, and what role Data Services plays in ETL development. It includes the steps to configure the database environment used in the recipes of this book.

Chapter 2, Configuring the Data Services Environment, explains how to install and configure all Data Services components and applications. It introduces the Data Services development GUI, the Designer tool, with a simple "Hello World" example of ETL code.

Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, introduces the reader to the Data Services internal scripting language. It explains the various categories of functions that are available in Data Services, and gives the reader an example of how the scripting language can be used to create custom functions.

Chapter 4, Dataflow – Extract, Transform, and Load, introduces the most important processing unit in Data Services, the dataflow object, and the most useful types of transformations that can be performed inside a dataflow. It gives the reader examples of extracting data from source systems and loading data into target data structures.

Chapter 5, Workflow – Controlling Execution Order, introduces another Data Services object, the workflow, which is used to group other workflows, dataflows, and script objects into execution units. It explains the conditional and loop structures available in Data Services.

Chapter 6, Job – Building the ETL Architecture, brings the reader to the job object level and reviews the steps used in the development process to make a successful and robust ETL solution. It covers the monitoring and debugging functionality available in Data Services and the embedded audit features.

Chapter 7, Validating and Cleansing Data, introduces the concepts of validation methods, which can be applied to the data passing through the ETL processes in order to cleanse and conform it according to the defined Data Quality standards.

Chapter 8, Optimizing ETL Performance, is the first of the advanced chapters, which explain complex ETL development techniques. This particular chapter helps the user understand how existing processes can be optimized further in Data Services so that they run quickly and efficiently, consuming as few computer resources as possible with the least amount of execution time.

Chapter 9, Advanced Design Techniques, guides the reader through advanced data transformation techniques. It introduces the concepts of the Change Data Capture methods that are available in Data Services, pivoting transformations, and automatic recovery concepts.

Chapter 10, Developing Real-time Jobs, introduces the concept of nested structures and the transforms that work with nested structures. It covers the main aspects of how they can be created and used in Data Services real-time jobs. It also introduces a new Data Services component, the Access Server.

Chapter 11, Working with SAP Applications, is dedicated to the topic of reading data from and loading data into SAP systems, with the example of the SAP ERP system. It presents a real-life use case of loading data into an SAP ERP system module.

Chapter 12, Introduction to Information Steward, covers another SAP product, Information Steward, which accompanies Data Services and provides a comprehensive view of the organization's data, helping validate and cleanse it by applying Data Quality methods.
What you need for this book

To use the examples given in this book, you will need to download and make sure that you are licensed to use the following software products:

SQL Server Express 2012
SAP Data Services 4.2 SP4 or higher
SAP Information Steward 4.2 SP4 or higher
SAP ERP (ECC)
SoapUI 5.2.0
Who this book is for

The book will be useful to application developers and database administrators who want to get familiar with ETL development using SAP Data Services. It can also be useful to ETL developers or consultants who want to improve and extend their knowledge of this tool, and to data and business analysts who want to take a peek at the backend of BI development. The only requirement of this book is that you are familiar with the SQL language and general database concepts. Knowledge of any kind of programming language will be a benefit as well.
Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.
Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."
A block of code is set as follows:

select *
from dbo.al_langtext txt
join dbo.al_parent_child pc
on txt.parent_objid = pc.descen_obj_key
where
pc.descen_obj = 'WF_continuous';
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

AlGUIComment("ActaName_1" = 'RSavedAfterCheckOut', "ActaName_2" = 'RDate_created', "ActaName_3" = 'RDate_modified', "ActaValue_1" = 'YES', "ActaValue_2" = 'Sat Jul 04 16:52:33 2015', "ActaValue_3" = 'Sun Jul 05 11:18:02 2015', "x" = '-1', "y" = '-1')
CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507' ()
DECLARE
$l_Directory VARCHAR(255);
$l_File VARCHAR(255);
BEGIN
AlGUIComment("UI_DATA_XML" = '<UIDATA><MAINICON><LOCATION><X>0</X>
<Y>0</Y></LOCATION><SIZE><CX>216</CX><CY>-179</CY></SIZE></MAINICON>
<DESCRIPTION><LOCATION><X>0</X><Y>-190</Y></LOCATION><SIZE><CX>200</CX>
<CY>200</CY></SIZE><VISIBLE>0</VISIBLE></DESCRIPTION></UIDATA>',
"ui_display_name" = 'script', "ui_script_text" = '$l_Directory='C:\\AW\\Files\\';
$l_File='flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');', "x" = '116', "y" = '-175')
BEGIN_SCRIPT
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = ($g_count + 1);
print(('Execution #' || $g_count));
print((('Starting ' || workflow_name()) || '…'));
sleep(10000);
print((('Finishing ' || workflow_name()) || '…'));
END
END
SET("loop_exit" = 'fn_check_flag($l_Directory, $l_File)', "loop_exit_option" = 'yes', "restart_condition" = 'no', "restart_count" = '10', "restart_count_option" = 'yes', "workflow_type" = 'Continuous')
Any command-line input or output is written as follows:

setup.exe SERVERINSTALL=Yes

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Open the workflow properties again to edit the continuous options using the Continuous Options tab."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.
Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book, what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from: https://www.packtpub.com/sites/default/files/downloads/6565EN_Graphics.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Introduction to ETL Development

In this chapter, we will cover:

Preparing a database environment
Creating a source system database
Defining and creating staging area structures
Creating a target data warehouse
Introduction

Simply put, Extract-Transform-Load (ETL) is the engine of any data warehouse. The nature of the ETL system is straightforward:

Extract data from operational databases/systems
Transform data according to the requirements of your data warehouse so that the different pieces of data can be used together
Apply data quality transformation methods in order to cleanse data and ensure that it is reliable before it gets loaded into a data warehouse
Load conformed data into a data warehouse so that end users can access it via reporting tools, using client applications directly, or with the help of SQL-based query tools

While your data warehouse delivery structures or data marts represent the frontend, or in other words, what users see when they access the data, the ETL system itself is the backbone backend solution that does all the work of moving data and getting it ready in time for users to use. Building the ETL system can be a really challenging task, and though it is not part of the data warehouse data structures, it is definitely the key factor in defining the success of the data warehouse solution as a whole. In the end, who wants to use a data warehouse where the data is unreliable, corrupted, or sometimes even missing? This is exactly what ETL is responsible for getting right.
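The extract, transform, and load steps above can be sketched as a toy Python pipeline. This is purely illustrative (the row structure, column names, and cleansing rules are invented here); in Data Services, the same steps are modeled graphically as dataflows rather than written by hand:

```python
# A minimal, hypothetical ETL pass over in-memory data.

def extract(source_rows):
    """Extract: read raw rows from an operational system (here, a list)."""
    return list(source_rows)

def transform(rows):
    """Transform and cleanse: trim names, drop rows failing a quality rule."""
    cleansed = []
    for row in rows:
        name = row["customer_name"].strip().title()
        if not name:  # data quality rule: reject empty customer names
            continue
        cleansed.append({"customer_name": name, "city": row["city"].upper()})
    return cleansed

def load(rows, warehouse):
    """Load conformed rows into the target structure (here, a list)."""
    warehouse.extend(rows)
    return warehouse

source = [
    {"customer_name": "  john smith ", "city": "Auckland"},
    {"customer_name": "",              "city": "Wellington"},  # fails cleansing
]
dwh = load(transform(extract(source)), [])
print(dwh)  # [{'customer_name': 'John Smith', 'city': 'AUCKLAND'}]
```

The value of a dedicated ETL tool is that each of these stages, trivial here, grows its own scheduling, auditing, and recovery concerns at real-world scale.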
The data structure types most often used in ETL development to move data between sources and targets are flat files, XML datasets, and DBMS tables, both in normalized schemas and dimensional data models. When choosing an ETL solution, you might face two simple choices: building a hand-coded ETL solution or using a commercial one.
The following are some advantages of a hand-coded ETL solution:

A programming language allows you to build your own sophisticated transformations
You are more flexible in building the ETL architecture as you are not limited by the vendor's ETL abilities
Sometimes, it can be a cheap way of building a few simplistic ETL processes, whereas buying an ETL solution from a vendor can be overkill
You do not have to spend time learning the commercial ETL solution's architecture and functionality

Here are some advantages of a commercial ETL solution:

It is more often a simpler, faster, and cheaper development option, as a variety of existing tools allow you to build a very sophisticated ETL architecture quickly
You do not have to be a professional programmer to use the tool
It automatically manages ETL metadata by collecting, storing, and presenting it to the ETL developer, which is another important aspect of any ETL solution
It has a huge range of additional ready-to-use functionality, from built-in schedulers to various connectors to existing systems, built-in data lineage, impact analysis reports, and many others
In the majority of DWH projects, the commercial ETL solution from a specific vendor, in spite of the higher immediate cost, eventually saves you a significant amount of money on the development and maintenance of ETL code.

SAP Data Services is an ETL solution provided by SAP and is part of the Enterprise Information Management product stack, which also includes SAP Information Steward; we will review this in one of the last chapters of this book.
Preparing a database environment

This recipe will lead you through the steps of preparing the working environment: a database environment to be utilized by ETL processes as the source, staging, and target systems for the migrated and transformed data.
Getting ready

To start the ETL development, we need to think about three things: the system that we will source the data from, our staging area (for initial extracts and as preliminary storage for data during subsequent transformation steps), and finally, the data warehouse itself, to which the data will eventually be delivered.
How to do it…

Throughout the book, we will use a 64-bit environment, so ensure that you download and install the 64-bit versions of the software components. Perform the following steps:

1. Let's start by preparing our source system. For quick deployment, we will choose the Microsoft SQL Server 2012 Express database, which is available for download at http://www.microsoft.com/en-nz/download/details.aspx?id=29062.
2. Click on the Download button and select the SQLEXPRWT_x64_ENU.exe file in the list of files that are available for download. This package contains everything required for the installation and configuration of the database server: the SQL Server Express database engine and the SQL Server Management Studio tool.
3. After the download is complete, run the executable file and follow the instructions on the screen. The installation of SQL Server 2012 Express is extremely straightforward, and all options can be set to their default values. There is no need to create any default databases during or after the installation, as we will do it a bit later.
How it works…

After you have completed the installation, you should be able to run the SQL Server Management Studio application and connect to your database engine using the settings provided during the installation process.

If you have done everything correctly, you should see the "green" state of your Database Engine connection in the Object Explorer window of SQL Server Management Studio, as shown in the following screenshot:

We need an "empty" installation of MS SQL Server 2012 Express because we will create all the databases we need manually in the next steps of this chapter. This database engine installation will host all our source, stage, and target relational data structures. This option allows us to easily build a test environment that is perfect for learning purposes in order to become familiar with ETL development using SAP Data Services.

In a real-life scenario, your source databases, staging area database, and DWH database/appliance will most likely reside on separate server hosts, and they may sometimes be from different vendors. So, the role of SAP Data Services is to link them together in order to migrate data from one system to another.
Creating a source system database

In this section, we will create our source database, which will play the role of an operational database that we will pull data from with the help of Data Services in order to transform the data and deliver it to a data warehouse.

How to do it…

Luckily for us, there are plenty of different flavors of ready-to-use databases on the Web nowadays. Let's pick one of the most popular ones: AdventureWorks OLTP for SQL Server 2012, which is available for download on the CodePlex website. Perform the following steps:

1. Use the following link to see the list of the files available for download: https://msftdbprodsamples.codeplex.com/releases/view/55330
2. Click on the AdventureWorks2012 Data File link, which should download the AdventureWorks2012_Data.mdf data file.
3. When the download is complete, copy the file into the C:\AdventureWorks\ directory (create it before copying if necessary).

The next step is to map this database file to our database engine, which will create our source database. To do this, perform the following steps:
1. Start SQL Server Management Studio.
2. Click on the New Query button, which will open a new session connection to a master database.
3. In the SQL Query window, type the following command and press F5 to execute it:
CREATE DATABASE AdventureWorks_OLTP ON
(FILENAME = 'C:\AdventureWorks\AdventureWorks2012_Data.mdf')
FOR ATTACH_REBUILD_LOG;
4. After a successful command execution and upon refreshing the database list (using F5), you should be able to see the AdventureWorks_OLTP database in the list of the available databases in the Object Explorer window of SQL Server Management Studio.
Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
How it works…

In a typical scenario, every SQL Server database consists of two files: a database file and a transaction log file. The database file contains the actual data structures and data, while the transaction log file keeps the transactional changes applied to the data.
Asweonlydownloadedthedatafile,wehadtoexecutetheCREATEDATABASEcommandwithaspecialATTACH_REBUILD_LOGclause,whichautomaticallycreatesamissingtransactionlogfilesothatthedatabasecouldbesuccessfullydeployedandopened.
Now, our source database is ready to be used by Data Services in order to access, browse, and extract data from it.
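As a quick sanity check, you can verify the attach from the same SQL Query window. This is a minimal sketch, assuming the database name used in this recipe:

```sql
-- Verify that the attached database is online.
-- AdventureWorks_OLTP is the name created earlier in this recipe.
SELECT name, state_desc
FROM sys.databases
WHERE name = 'AdventureWorks_OLTP';
```

If the attach succeeded, the query should return a single row with state_desc showing ONLINE.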
There's more…
There are different ways to deploy test databases. This mainly depends on which RDBMS system you use. Sometimes, you may find a package of SQL scripts that contains the commands required to create all the database structures and the commands used to insert data into these structures. This option may be useful if you have problems with attaching the downloaded mdf data file to your database engine or, for example, if you find the SQL scripts created for SQL Server RDBMS but have to apply them to the Oracle DB. With slight modifications to the commands, you can run them in order to create an Oracle database.
Explaining RDBMS technologies lies beyond the scope of this book. So, if you are looking for more information regarding how a specific RDBMS system works, refer to the official documentation.
What has to be said here is that from the perspective of using Data Services, it does not matter which source or target systems you use. Data Services not only supports the majority of them, but it also creates its own representation of the source and target objects; this way, they all look the same to Data Services users and abide by the same rules within the Data Services environment. So, you really do not have to be a DBA or database developer to easily connect to any RDBMS from Data Services. All that is required is a knowledge of the SQL language to understand the principles of the methods that Data Services uses when extracting and loading data or creating database objects for you.
Defining and creating staging area structures
In this recipe, we will talk about the ETL data structures that will be used in this book. Staging structures are important storage areas where extracted data is kept before it gets transformed or stored between the transformation steps. The staging area in general can be used to create backup copies of data or to run analytical queries on the data in order to validate the transformations made or the extract processes. Staging data structures can be quite different, as you will see. Which one to use depends on the tasks you are trying to accomplish, your project requirements, and the architecture of the environment used.
How to do it…
The most popular data structures that could be used in the staging area are flat files and RDBMS tables.
Flat files
One of the perks of using Data Services over a hand-coded ETL solution is that Data Services allows you to easily read information from and write it to a flat file.
Create the C:\AW\ folder, which will be used throughout this book to store flat files.
Note
Inserting data into a flat file is faster than inserting data into an RDBMS table. So, during ETL development, flat files are often used to reach two goals simultaneously: creating a backup copy of the data snapshot and providing you with the storage location for your preliminary data before you apply the next set of transformation rules.
Another common use of flat files is the ability to exchange data between systems that cannot communicate with each other in any other way.
Lastly, it is very cost-effective to store flat files (OS disk storage space is cheaper than DB storage space).
The main disadvantage of the flat file storage method is that the modification of data in a flat file can sometimes be a real pain, not to mention that it is much slower than modifying data in a relational DB table.
RDBMS tables
These ETL data structures will be used more often than others to stage the data that is going through the ETL transformation process.
Let's create two separate databases for relational tables, which will play the role of the ETL staging area in our future examples:
1. Open SQL Server Management Studio.
2. Right-click on the Databases icon and select the New Database… option.
3. On the next screen, input ODS as the database name, and specify 100 MB as the initial size value of the database file and 10 MB as that of the transaction log file:
4. Repeat the last two steps to create another database called STAGE.
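If you prefer scripts over the GUI, the same two databases can be created with T-SQL. This is a sketch, not the exact output of the wizard: the logical file names and file paths below are assumptions, while the 100 MB/10 MB sizes match the recipe.

```sql
-- T-SQL equivalent of the GUI steps above (file names/paths assumed).
CREATE DATABASE ODS
ON (NAME = ODS_Data, FILENAME = 'C:\AdventureWorks\ODS_Data.mdf', SIZE = 100MB)
LOG ON (NAME = ODS_Log, FILENAME = 'C:\AdventureWorks\ODS_Log.ldf', SIZE = 10MB);

CREATE DATABASE STAGE
ON (NAME = STAGE_Data, FILENAME = 'C:\AdventureWorks\STAGE_Data.mdf', SIZE = 100MB)
LOG ON (NAME = STAGE_Log, FILENAME = 'C:\AdventureWorks\STAGE_Log.ldf', SIZE = 10MB);
```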
How it works…
Let's recap. The ETL staging area is a location to store the preliminary results of our ETL transformations and also a landing zone for the extracts from the source system.
Yes, Data Services allows you to extract data and perform all transformations in memory before loading to the target system. However, as you will see in later chapters, an ETL process that does everything in one "go" can be complex and difficult to maintain. Plus, if something goes wrong along the way, all the changes that the process has already performed will be lost and you may have to start the extraction/transformation process again. This obviously creates an extra workload on a source system because you have to query it again in order to get the data. Finally, big does not mean effective. We will show you how splitting your ETL process into smaller pieces helps you to create a well-performing sequence of dataflows.
The ODS database will be used as a landing zone for the data coming from source systems. The structure of the tables here will be identical to the structure of the source system tables.
The STAGE database will hold the relational tables used to store data between the data transformation steps.
We will also store some data extracted from a source database in a flat file format to demonstrate the ability of Data Services to work with flat files and to show the convenience of this data storage method in an ETL system.
Creating a target data warehouse
Finally, this is the time to create our target data warehouse system. The data warehouse structures and tables will be used by end users with the help of various reporting tools to make sense of the data and analyze it. As a result, it should help business users to make strategic decisions, which will hopefully lead to business growth.
We should not forget that the main purpose of a data warehouse, and hence that of our ETL system, is to serve business needs.
Getting ready
The data warehouse created in this recipe will be used as a target database populated by the ETL processes developed in SAP Data Services. This is where the data modified and cleansed by ETL processes will be inserted in the end. Plus, this is the database that will mainly be accessed by business users and reporting tools.
How to do it…
Perform the following steps:
1. AdventureWorks comes to the rescue again. Use another link to download the AdventureWorks data warehouse data file, which will be mapped in the same manner to our SQL Server Express database engine in order to create a local data warehouse for our own learning purposes. Go to the following URL and click on the AdventureWorks DW for SQL Server 2012 link:
https://msftdbprodsamples.codeplex.com/releases/view/105902
2. After you have successfully downloaded the AdventureWorksDW2012.zip file, unpack its contents into the same directory as the previous file: C:\AdventureWorks\
3. There should be two files in the archive:
AdventureWorksDW2012_Data.mdf — the database data file
AdventureWorksDW2012_Log.ldf — the database transaction log file
4. Open SQL Server Management Studio and click on the New Query… button in the uppermost toolbar.
5. Enter and execute the following command in the SQL Query window:

CREATE DATABASE AdventureWorks_DWH ON
(FILENAME = 'C:\AdventureWorks\AdventureWorksDW2012_Data.mdf'),
(FILENAME = 'C:\AdventureWorks\AdventureWorksDW2012_Log.ldf')
FOR ATTACH;
6. After a successful command execution, right-click on the Databases icon and choose the Refresh option in the opened menu list. This should refresh the contents of your object library, and you should see the following list of databases:
ODS
STAGE
AdventureWorks_OLTP
AdventureWorks_DWH
How it works…
Get yourself familiar with the tables of the created data warehouse. Throughout the whole book, you will be using them in order to insert, update, and delete data using Data Services.
There are also some diagrams available that could help you see the visual data warehouse structure. To get access to them, open SQL Server Management Studio, expand the Databases list in the Object Explorer window, then expand the AdventureWorks_DWH database object list, and finally open the Diagrams tree. Double-clicking on any diagram in the list opens a new window within Management Studio with a graphical presentation of the tables, key columns, and links between the tables, which shows you the relationships between them.
There's more…
In the next recipe, we will have an overview of the knowledge resources that exist on the Web. We highly recommend that you get familiar with them in order to improve your data warehousing skills, learn about the data warehouse lifecycle, and understand what makes a successful data warehouse project. In the meantime, feel free to open New Query in SQL Server Management Studio and start running SELECT commands to explore the contents of the tables in your AdventureWorks_DWH database.
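If you want a starting point for this exploration, here are a couple of sample queries; the table names come from the standard AdventureWorksDW2012 schema, so adjust them if your copy differs:

```sql
-- Peek at a dimension table and count rows in a fact table
-- (standard AdventureWorksDW2012 table names assumed).
USE AdventureWorks_DWH;
SELECT TOP 10 * FROM dbo.DimCustomer;
SELECT COUNT(*) AS fact_rows FROM dbo.FactInternetSales;
```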
Note
The most important asset of any DWH architect or ETL developer is not the knowledge of a programming language or the available tools but the ability to understand the data that is, or will be, populating the data warehouse and the business needs and requirements for this data.
Chapter 2. Configuring the Data Services Environment
In this chapter, we will install and configure all the components required for SAP Data Services, covering the following topics:
Creating IPS and Data Services repositories
Installing and configuring Information Platform Services
Installing and configuring Data Services
Configuring user access
Starting and stopping services
Administering tasks
Understanding the Designer tool
Introduction
The same thing that makes SAP Data Services a great ETL development environment makes it quite a non-trivial one to install and configure. Here though, you have to remember that Data Services is an enterprise-class ETL solution that is able to solve the most complex ETL tasks.
See the following image for a very high-level Data Services architecture view. Data Services has two basic groups of components: client tools and server-based components:
Client tools include the following (there are more, but we mention the ones most often used):
The Designer tool: This is the main client-based GUI application for ETL development
Repository Manager: This is a client-based GUI application for Data Services to create, configure, and upgrade Data Services repositories
The main server-based components include the following ones:
IPS services: These are used for user authentication, system configuration storage, and internal metadata management
Job Server: This is a core engine service that executes ETL code
Access Server: This is a real-time request-reply message broker, which implements real-time services in the Data Services environment
Web application server: This provides access to some Data Services administration and reporting tasks via the DS Management Console and Central Management Console web-based applications
In the course of the next few recipes, we will install, configure, and access all the components required to perform the majority of ETL development tasks. You will learn about their purposes and some useful tips that will help you work effectively in the Data Services environment throughout the book and in your future work.
Data Services installation supports all major OS and database environments. For learning purposes, we have chosen the Windows OS as it involves the least configuration on the user's part. Both the client tools and server components will be installed on the same Windows host.
Creating IPS and Data Services repositories
The IPS repository is a storage for environment and user configuration information and metadata collected by various services of IPS and Data Services. It has another name: the CMS database. This name should be quite familiar to those who have used SAP Business Intelligence software. Basically, IPS is a light version of the SAP BI product package. You will always use only one IPS repository per Data Services installation and will most likely deal with it only once: when configuring the environment at the very beginning. Most of the time, Data Services will be communicating with IPS services and the CMS database in the background, without you even noticing.
The Data Services repository is a different story. It is much closer to an ETL developer as it is a database that stores your developed code. In a multiuser development environment, every ETL developer usually has their own repository. Repositories can be of two types: central and local. They serve different purposes in the ETL lifecycle, and I will explain this in more detail in the upcoming chapters. Meanwhile, let's create our first local Data Services repository.
Getting ready…
Both repositories will be stored in the same SQL Server Express RDBMS ((local)\SQLEXPRESS) that we used to create our source OLTP database, ETL staging databases, and target data warehouse. So, at this point, you only need to have access to SQL Server Management Studio, and your SQL Server Express services need to be started.
How to do it…
This will consist of two major tasks:
1. Creating a database:
   1. Log in to SQL Server Management Studio and create two databases: IPS_CMS and DS_LOCAL_REPO.
   2. Right now, your database list should look like this:
2. Configuring the ODBC layer: Installation requires that you create an ODBC data source for the IPS_CMS database.
   1. Go to Control Panel | Administrative Tools | ODBC Data Sources (64-bit).
   2. Open the System DSN tab and click on the Add… button.
   3. Choose the name of the data source: SQL_IPS, the description SQL Server Express, and the SQL Server you want to connect to through this ODBC data source: (local)\SQLEXPRESS. Then, click on Next.
   4. Choose SQL Server authentication and select the checkbox Connect to SQL Server to obtain the default settings. Enter the login ID (sa) and password. Click on Next.
   5. Select the checkbox and change the default database to IPS_CMS. Click on Next.
   6. Skip the next screen by clicking on Next.
   7. The final screen of the ODBC configuration should look like the following screenshot. Clicking on the Test Data Source… button should then give you the message TESTS COMPLETED SUCCESSFULLY!
How it works…
These two empty databases will be used by the Data Services tools during installation and post-installation configuration tasks. All structures inside them will be created and populated automatically.
Usually, they are not built for users to access them directly, but in the upcoming chapters, I will show you a few tricks on how to extract valuable information from them in order to troubleshoot potential problems, do a little bit of ETL metadata reporting, or use an extended search for ETL objects, which is not possible in the GUI of the Designer tool.
The ODBC layer configured for the IPS_CMS database allows you to access it from the IPS installation. When we install both IPS and Data Services, you will be able to connect to the databases directly from the Data Services applications, as Data Services has native drivers for various types of databases and also allows you to connect through ODBC layers if you want.
See also
A future chapter contains the techniques mentioned in the preceding paragraph.
Installing and configuring Information Platform Services
The Information Platform Services (IPS) product package was added as a component to the Data Services bundle starting from the Data Services 4.x version. The reason for this was to make the Data Services architecture flexible and robust and to introduce some extra functionality, that is, a user management layer to the existing SAP Data Services solution. As we mentioned before, IPS is a light version of the SAP BI core services and has a lot of similar functionality.
In this recipe, we will perform the installation and basic configuration of IPS, which is a mandatory component for future Data Services installations.
Tip
As an option, you could always use an existing full enterprise SAP BI solution if you have it installed in your environment. However, this is generally considered a bad practice. It is like storing all your eggs in one basket: whenever you need to plan downtime for your BI system, you should keep in mind that it will affect your ETL environment as well, and you will not be able to run any Data Services jobs during this period. That is why IPS is installed to be used only by Data Services as a safer and more convenient option in terms of support and maintenance.
Getting ready…
Download the Information Platform Services installation package from the SAP support portal and unzip it to the location of your choice. The main requirement for installing IPS, as well as Data Services in the next recipe, is that your OS should have a 64-bit architecture.
How to do it…
1. Create an EIM folder on your C drive to store your installation in one place.
2. Launch the IPS installer by executing InstallIPS.exe.
3. Make sure that all your critical prerequisites have the Succeeded status on the Check Prerequisites screen. Continue to the next screen.
4. Choose C:\EIM\ as the installation destination folder. Continue to the next screen.
5. Choose the Full installation type. Continue to the next screen.
6. On Select Default or Existing Database, choose Configure an existing database and continue to the next screen.
7. Select Microsoft SQL Server using ODBC as the existing CMS database type.
8. Select No auditing database on the next screen and continue.
9. Choose Install the default Tomcat Java Web Application Server and automatically deploy web applications. Continue to the next screen.
10. For version management, choose Do not configure a version control system at this time.
11. On the next screen, specify the SIA name in the Node name field as IPS and the SIA port as 6410.
12. Do not change the default CMS port, 6400.
13. On the CMS account configuration screen, input passwords for the administrator user account and the CMS cluster key (they can be the same if you want). Continue further.
14. Use the settings from the following screenshot to configure the CMS Repository Database:
15. Leave the default values for the Tomcat ports on the next screen and click on Next. Remember the Connection Port setting (the default is 8080) as you will require it to connect to the IPS and Data Services web applications.
16. Do not configure connectivity to SMD Agent.
17. Do not configure connectivity to Introscope Enterprise Manager.
18. Finally, the installation will begin. It should take approximately 5–15 minutes, depending on your hardware.
How it works…
Now, by installing IPS, we have prepared the base layer, on top of which we will install the Data Services installation package itself.
To check that your IPS installation was successful, start the Central Management Console web application using the http://localhost:8080/BOE/CMC URL and use the administrator account that you set up during IPS installation to log in. In the System field, use localhost:6400 (your hostname and the CMS port number specified during IPS installation).
Check out the Core Services tree in the Servers section of CMC. All services listed should have the Running and Enabled statuses.
Installing and configuring Data Services
The installation of Data Services in a Windows environment is a smooth and quick process. Of course, you have various installation options, but here, we will choose the easiest path: the full installation of all components on the same host, with the IPS services installed and the local repository already created and configured.
Getting ready…
Completion of the previous recipe should prepare your environment to install Data Services. Download the Data Services installation package from the SAP support portal and unzip it to a local folder.
How to do it…
1. Start the Data Services installer from the Windows command line (cmd) by executing this command:

setup.exe SERVERINSTALL=Yes

2. Make sure that all your critical prerequisites have the Succeeded status on the Check Prerequisites screen.
3. Choose the destination folder as C:\EIM\ if required.
4. On the CMS connection information step, specify the connection details to your previously installed CMS (part of IPS) installation. The system is localhost:6400, and the user is Administrator. Click on Next.
5. In the CMS Service Stop/Start pop-up window, agree to restart the SIA servers.
6. Choose Install with default configuration on the Installation Type selection screen.
7. Make sure that you select all features by selecting all the checkboxes on the next feature selection screen and click on Next.
8. Specify Microsoft_SQL_Server as the database type for the local repository.
9. Use the following details as a reference to configure your local repository database connection on the next screen:
Option                      Value
Registration name for CMS   DS4_REPO
Database Type               Microsoft_SQL_Server
Database server name        (local)\SQLEXPRESS
Database port               50664
Database name               DS_LOCAL_REPO
User Name                   sa
Password                    <sa user password>
10. For the login information, choose the account recommended by the installation.
11. The installation should be completed in 5–10 minutes, depending on your environment.
How it works…
After finishing this recipe, you will have all the Data Services server and client components installed on the same Windows host. Also, your Data Services installation is integrated with the IPS services.
To check that the installation and integration were successful, log in to CMC and see that in the main menu, there is a new section called Data Services (see the Organize column). Go to this section and see whether your DS4_REPO exists in the list of local repositories.
Configuring user access
In this recipe, I will show you how to configure your access as a fresh ETL developer in a Data Services environment. We will create a user account, assign all the required functional privileges, and assign owner privileges for our local Data Services repository. In a multiuser development environment, you would need to perform this step for every newly created user.
Getting ready…
Choose the username and password for your ETL developer user account. We will log in to the CMC application to create the user account and grant it the required set of privileges.
How to do it…
1. Launch the Central Management Console web application.
2. Go to Users and Groups.
3. Click on the Create a user button (see the following screenshot):
4. In the opened window, choose a username (we picked etl) and password. Also, select the Password never expires option and unselect User must change password at next logon. Choose Concurrent User as the connection type.
5. Now, we should add our newly created account to two pre-existing user groups. Right-click on the user and choose the Member Of option in the right-click menu.
6. Click on the Join Group button in the newly opened window and add two groups from the group list to the right window panel: Data Services Administrator Users and Data Services Designer Users. Click on OK.
7. From the left-side instrument panel, click on the CMC Home button to return to the main CMC screen.
8. Now, we have to grant our user extra privileges on the local repository. For this, open the Data Services section, right-click on DS4_REPO, and choose User Security from the context menu.
9. Click on the Add principals button, move the etl user to the right panel, and click on the Add and Assign Security button at the bottom of the screen.
10. On the next screen, assign the Full Control (Owner) access level on the Access Levels tab and go to the Advanced tab.
11. Click on the Add/Remove Rights link and set the two options that appear to Granted for the Data Services Repository application (see the following screenshot):
12. Click on OK in the Assign Security window to confirm your configuration.
13. As a test, log out of the CMC and log in using the newly created user account.
How it works…
In a complex enterprise environment, you can create multiple groups for different categories of users. You have full flexibility to provide users with various kinds of permissions, depending on their needs.
Some users might require administration privileges to start/stop services and to manage repositories, without the need to develop ETL and access Designer.
The ETL developer role might require only permissions for the Designer tool to develop ETL code.
In our case, we have created a single user account that has both administration and developer privileges.
Starting and stopping services
In this recipe, I will explain how you can restart the services of all the main components in your Data Services environment.
How to do it…
This relates to three different services:
Web application server:
The Tomcat application server configured in our environment can be managed from two places:
Computer Management | Services and Applications | Services, where it exists as a standard Windows service, BOEXI40Tomcat
The Central Configuration Manager tool installed as a part of the IPS product package:
Using this tool, you can:
1. Start/stop services.
2. Back up and restore the system configuration.
3. Specify the Windows user who starts and stops the underlying services.
Data Services Job Server: To manage the Data Services Job Server in a Windows environment, SAP created a separate GUI application called Data Services Server Manager. Using this tool, you can perform the following tasks:
1. Restart the Job Server.
2. Create and configure Job Servers.
3. Create and configure Access Servers.
4. Perform SSL configuration.
5. Set up a pageable cache directory.
6. Perform SMTP configuration for the smtp_to() Data Services function.
Information Platform Services: To manipulate these services, you have two options:
Central Management Console (to stop/start services and configure service parameters)
Central Configuration Manager (to stop/start services)
In most cases, you will be using the CMC option, as it is a quick and convenient way to access all the services included in the IPS package. It also allows you to see much more service-related information.
The second option is useful if you have the application server stopped for some reason (CMC, as a web-based application, will not be working, of course), and you still need to access the IPS services to perform basic administration tasks such as restarting them.
How it works…
Sometimes, things turn sour, and restarting services is the quickest and easiest option to return them to a normal state. In this recipe, I mentioned all the main server components and the points of access to perform such a task.
The last thing you should keep in mind regarding this is the recommended startup/shutdown sequence of these components:
1. The first thing that should start after Windows starts is your database server, as it hosts the CMS database required for the IPS services and the Data Services local repository.
2. Second, you should start the IPS services (the main one is the CMS service) as an underlying level for Data Services.
3. Then, it is the turn of the Data Services Job Server.
4. Finally, start Tomcat (the web application server), which provides users with access to the web-based applications.
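This startup order can be sketched as a Windows batch file. Treat it as a sketch only: BOEXI40Tomcat is the service name mentioned earlier in this recipe, but the other service names vary by version and install, so check yours in services.msc first.

```batch
@echo off
REM Sketch of the recommended startup order (service names are assumptions
REM except BOEXI40Tomcat; verify them in services.msc before use).
net start "MSSQL$SQLEXPRESS"      & REM 1. Database server (hosts CMS DB and repo)
net start "SIAIPS"                & REM 2. IPS/CMS services via the SIA node
net start "DI_JOBSERVICE"         & REM 3. Data Services Job Server service
net start "BOEXI40Tomcat"         & REM 4. Tomcat web application server
```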
See also
I definitely recommend that you get familiar with the SAP Data Services Administrator's Guide to understand the details regarding IPS and Data Services component management and configuration. See also the knowledge sources and documentation links from Chapter 1, Introduction to ETL Development.
Administering tasks
The previous recipe is part of the basic administration tasks too, of course. I separated it from the current one as I wanted to put an accent on Data Services architecture details by explaining the main Data Services components in relation to the methods and tools you can use to manipulate them.
How to do it…
Here, we will look at some of the most important administrative tasks.
1. Using Repository Manager:
As you can probably remember, there are two types of repositories in Data Services: the local repository and the central repository. They serve different purposes but can be created in quite a similar way: with the help of the Data Services Repository Manager tool.
This is a GUI-based tool available on your Windows machine and installed with the other client tools.
As we already have one repository created and configured automatically during the Data Services installation, let's check its version using the Repository Manager tool.
Launch Repository Manager and enter the following values for the corresponding options:

Field                  Value
Repository type        Local
Database Type          Microsoft SQL Server
Database server name   (local)\SQLEXPRESS
Database name          DS_LOCAL_REPO
User Name              sa
Password               *******

After entering these details, you have several options:
Create: This option creates repository objects in the defined database. As we already have a repository in DS_LOCAL_REPO, the application will ask us whether we want to reset the existing repository. Sometimes, this can be useful, but keep in mind that it will cleanse the repository of all objects, and if you are not careful, all your ETL code that resides in the repository can be lost.
Upgrade: This option upgrades the repository to the version of the Repository Manager tool. It is useful during software upgrades. After installing the new version of IPS and Data Services, you have to upgrade your repository contents as well. This is when you launch the Repository Manager tool (which has already been updated) and upgrade your repository to the current version.
Get version: This is the safest option of them all. It just returns a string containing the repository version number. In our case, it returned: BODI-320030: The local repository version: <14.2.4.0>.
2. Using Server Manager and CMC to register a new repository:
After you create a new repository with Repository Manager, you have to register it in IPS and link it to the existing Job Server.
To register a new repository in IPS, use the following steps:
1. Launch Central Management Console.
2. Open the Data Services section from the CMC home page.
3. Go to Manage | Configure Repository.
4. Enter the database details of your newly created repository and click on Save.
5. To assign users the required set of privileges, use User Security when right-clicking on the repository in the list. For details, see the Configuring user access recipe.
To link a new repository to the Job Server, perform these steps:
1. Launch the Data Services Server Manager tool.
2. Choose the Job Server tab.
3. Press the Configuration Editor… button.
4. Select the Job Server and press the Edit… button.
5. In the Associated Repositories panel, press the Add… button and fill in the database-related information of the new repository in the corresponding fields on the right-hand side.
6. Use the Close and Restart button in the Data Services Server Manager tool to apply the changes made to the Job Server.
3. Using License Manager:
1. License Manager exists only in command-line mode.
2. Use the following syntax to run License Manager:

LicenseManager [-v | -a <keycode> | -r <keycode> [-l <location>]]

3. Use the -v option to view existing license keys, -a to add a new license key, and -r to remove an existing license key from the -l location specified.
This tool is available at C:\EIM\DataServices\bin\.
How it works…
Creating and configuring a new local repository is usually required when you set up an environment for a new ETL developer, or when you want to use an extra repository to migrate your ETL code for testing purposes or to test a repository upgrade.
After creating a new local repository, you should always link it to an existing Job Server. This link ensures that the Job Server is aware of the repository and can execute jobs from it.
Finally, License Manager can be used to see the license keys used in your installation and to add new ones if required.
See also
You can practice your Data Services admin skills by creating a new database and a new local Data Services repository. Do not forget that you do not just have to create it, but also register it with the IPS services and the Data Services Job Server so that you can successfully run jobs from it.
Some other administrative tasks can be found in the following recipes:
The Starting and stopping services recipe from this chapter
The Configuring the ODBC layer step from the How to do it… section of the Creating IPS and Data Services repositories recipe of this chapter
Understanding the Designer tool
Now that we have reviewed all the important server and client components of our new Data Services installation, it is time to get familiar with the most used and most important tool in the Data Services product package. It will be our main focus in the following chapters, and of course, I am talking about our development GUI: the Designer tool.
Every object you create in Designer is stored in a local object library, which is a logical storage unit that is part of the physical local repository database. In this recipe, we will log in to a local repository via Designer, set up a couple of settings, and write our first "Hello World" program.
Getting ready…
Your Data Services ETL development environment is fully deployed and configured, so go ahead and start the Designer application.
How to do it…
First, let's change some default options to make our development life a little bit easier and to see how the options windows in Data Services look:
1. When you launch your Designer application, you see quite a sophisticated login screen. Enter the etl username we created in one of the previous recipes and its password to see the list of repositories available in the system.
2. At this point, you should see only one local repository, DS4_REPO, that was created by default during the Data Services installation. Double-click on it.
3. You should see your Designer application started.
4. Go to Tools | Options.
5. In the opened window, expand the Designer tree and choose General.
6. Set the Number of characters in workspace icon name option to 50 and select the Automatically calculate column mappings checkbox.
7. Click on OK to close the options window.
Before we create our first "Hello World" program, let's quickly take a look at Designer's user interface.
In this recipe, you will be required to work with only two areas: Local Object Library and the main development area. The biggest window, on the right-hand side with the Start Page tab, will open by default.
Local Object Library contains tabs with lists of objects you can create or use during your ETL development. These objects include Projects, Jobs, Work Flows, Data Flows, Transforms, Datastores, Formats, and Custom Functions:
All tabs are empty, as you have not created any objects of any kind yet, except for the Transforms tab. This tab contains a predefined set of transforms available for you to use for ETL development. Data Services does not allow you to create your own transforms (there is an exception that we will discuss in the upcoming chapters). So, everything you see on this tab is basically everything that is available for you to manipulate your data with.
Now, let's create our first "Hello World" program. As ETL development in Data Services is not quite the usual experience of developing with a programming language, we should agree on what our first program should do. In almost any programming-language-related book, this kind of program just outputs a "Hello World" string onto your screen. In our case, we will generate a "Hello World" string and output it into a table that will be automatically created by Data Services in our target database.
In the Designer application, go to the Local Object Library window, choose the Jobs tab, right-click on the Batch Jobs tree, and select New from the list of options that appears.
1. Choose the name for the new job, Job_HelloWorld, and enter it. After the job is created, double-click on it.
2. You will enter the job design window (see Job_HelloWorld – Job at the bottom of the application), and now, you can add objects to your job and set up its variables and parameters.
3. In the design window of the Job_HelloWorld – Job tab, create a dataflow. To do this, choose the Data Flow object from the right tool panel and left-click on the main design window to create it. Name it DF_HelloWorld.
4. Double-click on the newly created dataflow (or just click once on its title) to open the Data Flow design window. It appears as another tab in the main design window area.
5. Now, when we are designing the processing unit, or dataflow, we can choose the transforms from the Transforms tab of the Local Object Library window to perform manipulations with the data. Click on the Transforms tab.
6. Here, select the Platform transforms tree and drag and drop the Row_Generation transform from it to the Data Flow design window.
Note
As we are generating a new "Hello World!" string, we should use the Row_Generation transform. It is a very useful way of generating rows in Data Services. All other transforms perform operations on the rows extracted from source objects (tables or files) that are passing from source to target within a dataflow. In this example, we do not have a source table. Hence, we have to generate a record.
7. By default, the Row_Generation transform generates only one row with the ID 0. Now, we have to create our string and present it as a field in a future target table. For this, we need to use the Query transform. Select it from the right tool panel or drag and drop it from Transforms | Platform. The icon of the Query transform looks like this:
8. In the Data Flow design window, link Row_Generation to Query, as shown here, and double-click on the Query transform to open the Query Editor tab:
Note
In the next chapter, we will explain the details of the Query transform. In the meantime, let's just say that this is one of the most used transforms in Data Services. It allows you to join flows of your data and modify the dataset by adding/removing columns in the row, changing data types, and performing grouping operations. On the left-hand side of the Query Editor, you will see an incoming set of columns, and on the right-hand side, you will see the output. This is where you will define all your transformation functions for specific fields or assign hard-coded values. We are not interested in the incoming ID generated by the Row_Generation transform. For us, it served the purpose of creating a row that will hold our "Hello World!" value and will be inserted into a table.
9. IntherightpanelofQueryEditor,right-clickonQueryandchooseNewOutputColumn…:
10. SelectthefollowingsettingsintheopenedColumnPropertieswindowtodefinethepropertiesofournewlycreatedcolumnandclickonOK:
11. Now,whenourgeneratedrowhasonecolumn,wehavetopopulateitwithvalue.Forthis,wehavetousetheMappingtabinQueryEditor.SelectouroutputfieldTEXTandenterthe“HelloWorld!”valueinthemappingtabwindow.Donotforgetsinglequotes,whichmeanastringinDS.Then,closeQueryEditoreitherwiththetabcrossinthetop-rightcorner(donotconfuseitwiththeDesignerapplicationcrossthatislocateddangerouslyclosetoit)orjustusetheBackbutton(Alt+Left),agreenarrowiconinthetopinstrumentpanel.
At this point, we have a source in our dataflow. We also have a transformation object (the Query transform), which defines our text column and assigns a value to it. What is missing is a target object we will insert our row into.
As we will use a table as a target object, we have to create a reference to a database within Data Services. We will use this reference to create a target table. These database references are called datastores and serve as a presentation of the database layer. In the next step, we will create a reference to our STAGE database created in the previous chapter.
12. Go to the Datastores tab of the Local Object Library. Then, right-click on the empty window and select New to open the Create New Datastore window.
13. Choose the following settings for the newly created datastore object:
14. Repeat steps 12 and 13 to create the rest of the datastore objects connected to the databases we created in the previous recipes. Use the same database server name and user credentials, and change only the Datastore Name and Database name fields when creating new datastores. See the following table for reference:

Datastore Name    Database name
DS_ODS            ODS
DWH               AdventureWorks_DWH
OLTP              AdventureWorks_OLTP

Now, you should have four datastores created, referencing all databases created in the SQL Server: DS_STAGE, DS_ODS, DWH, and OLTP.
15. Now, we can use the DS_STAGE datastore to create our target table. Go back to DF_HelloWorld in the Data Flow tab of the design window and select Template Table on the right tool panel. Put it on the right-hand side of the Query transform and choose HELLO_WORLD as the table name in the DS_STAGE datastore.
16. Our final dataflow should look like this now:
17. Go back to the Job_HelloWorld – Job tab and click on the Validate All button in the top instrument panel. You should get the following message in the output window of Designer on the left-hand side of your screen: Validate: No Errors Found (BODI-1270017).
18. Now, we are ready to execute our first job. For this, use the Execute… (F8) button from the top instrument panel. Agree to save the current objects and click on OK on the following screen.
19. Check that the log screen that shows you the execution steps contains no execution errors. Then, go to your SQL Server Management Studio, open the STAGE database, and check the contents of the newly created HELLO_WORLD table. It has just one column, TEXT, with only one value, "Hello World!".
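The steps above can be sketched in a few lines of Python. This is an illustrative analogue only, not Data Services code: it models the dataflow as a pipeline of functions, and the column name DI_ROW_ID is an assumption made for the sketch.

```python
def row_generation(row_count=1):
    # Like the Row_Generation transform: emits rows with a single generated
    # ID field (the column name DI_ROW_ID is an assumption here).
    return [{"DI_ROW_ID": i} for i in range(row_count)]

def query_transform(rows):
    # Like the Query transform in this recipe: discard the incoming ID and
    # output a single TEXT column with a hard-coded value.
    return [{"TEXT": "Hello World!"} for _ in rows]

# The "target table" receives the final row set.
target_table = query_transform(row_generation())
print(target_table)  # [{'TEXT': 'Hello World!'}]
```

The point of the analogy is the shape of the pipeline: each transform consumes the row set produced by the object to its left and hands a new row set to the object on its right.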
How it works…
"Hello World!" is a small example that introduces a lot of general and even sophisticated concepts. In the following sections, we will quickly review the most important ones. They will help you get familiar with the development environment in Data Services Designer. Keep in mind that we will return to all these subjects again throughout the book, discussing them in more detail.
Executing ETL code in Data Services
To execute any ETL code developed in the Data Services Designer tool, you have to create a job object. In Data Services, the only executable object is the job. Everything else goes inside the job.
ETL code is organized as a hierarchy of objects inside the job object. To modify an object by placing another object in it, you have to open the edited object in the main workspace design area and then drag and drop the required objects inside it, placing them in the workspace area. In our recipe, we created a job object and placed the dataflow object in it. We then opened the dataflow object in the workspace area and placed transform objects inside it. As you can see in the following screenshot, workspace areas opened previously are accessible through the tabs at the bottom of the workspace area:
The Project Area panel can display the hierarchy of objects in the form of a tree. To see it, you have to assign your newly created job to a specific project and open the project in the Project Area by double-clicking on the project object in the Local Object Library.
Executable ETL code contains one job object and can contain script, dataflow, and workflow objects combined in various ways inside the job.
As you saw in the recipe steps, you can create a new job by going to Local Object Library | Jobs.
Although you can combine all types of objects by placing them in the job directly, some objects, for example, transform objects, can be placed only into dataflow objects, as the dataflow is the only type of object that can process and actually migrate data (on a row-by-row basis). Hence, all transformations should happen only inside a dataflow. In the same way, you can only place datastore objects, such as tables and views, directly in dataflows as source and target objects for data to be moved from source to target and transformed along the way. When a dataflow object is executed within the job, it reads data row by row from the source and moves each row from left to right to the next transform object inside the dataflow until it reaches the end and is sent to the target object, which usually is a database table.
Throughout this book, you will learn the purpose of each object type and how and when it can be used.
For now, remember that all objects inside the job are executed in sequential order from left to right if they are connected, and simultaneously if they are not. Another important rule is that the parent object starts executing first, followed by the objects inside it. The parent object completes its execution only after all child objects have completed successfully.
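The execution rule above can be illustrated with a small Python sketch (an analogy, not how the Data Services engine is implemented): unconnected children are started together and may run simultaneously, and the parent only finishes after all of them complete.

```python
import threading

log = []

def child(name):
    # Each unconnected child object just records that it ran.
    log.append(name)

def parent():
    # Unconnected objects: started together, so they may run simultaneously.
    t1 = threading.Thread(target=child, args=("DF_A",))
    t2 = threading.Thread(target=child, args=("DF_B",))
    t1.start(); t2.start()
    t1.join(); t2.join()       # the parent waits for all of its children
    log.append("parent done")  # parent completes only after its children

parent()
print(log)
```

Connected objects, by contrast, would correspond to plain sequential function calls.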
Validating ETL code
To avoid job execution failures due to incorrect ETL syntax, you can validate the job and all its objects with the Validate Current or Validate All button on the top instrument panel inside the Designer tool:
Validate Current validates only the current object opened in the workspace design area and script objects in it; it does not validate the underlying child objects such as dataflows and workflows. In the preceding example, the object opened in the workspace is a job object that has one child dataflow object called DF_HelloWorld inside it. Only the job object will be validated, and not DF_HelloWorld.
Validate All validates the current object and all underlying objects. So, both the object currently opened in the workspace and all objects you see in the workspace are validated. The same applies to the objects nested inside them, down to the very end of the object hierarchy.
So, to validate the whole job and its objects, you have to go to the job level by opening the job object in the workspace area and clicking on the Validate All button on the top instrument panel.
Validation results are displayed in the Output panel. Warning messages do not affect the execution of the job and often indicate possible ETL design problems or show data type conversions performed by Data Services automatically. Error messages in the Output | Errors tab mean syntax or critical design errors made in the ETL. If you try to run the job after seeing "red" error validation messages, the job will fail with exactly the same errors that you saw, right at the beginning of execution, as every job is implicitly validated when executed.
Always validate your job manually before executing it to avoid job failures due to incorrect syntax or incorrect ETL design.
Template tables
This is a convenient way to specify a target table that does not yet exist in the database and send data to it. When a dataflow object in which a template target table object is placed is executed, it runs two DDL commands, DROP TABLE &lt;template table name&gt; and CREATE TABLE &lt;template table name&gt;, using the output schema (set of columns) of the last object inside the dataflow before the target template table. Only after that does the dataflow process all the data from the source, passing rows from left to right through all transformations, and finally insert the data into the freshly created target table.
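The DROP/CREATE/insert sequence can be demonstrated with sqlite3 in Python. This is a sketch of the behavior only, not the actual Data Services engine; the schema and row values are taken from the Hello World recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Output schema of the last transform before the template table.
output_schema = {"TEXT": "VARCHAR(100)"}
rows = [("Hello World!",)]

# The two DDL commands a template target table triggers on each execution:
conn.execute("DROP TABLE IF EXISTS HELLO_WORLD")
cols = ", ".join(f"{name} {dtype}" for name, dtype in output_schema.items())
conn.execute(f"CREATE TABLE HELLO_WORLD ({cols})")

# Only then is the processed data inserted into the fresh table.
conn.executemany("INSERT INTO HELLO_WORLD VALUES (?)", rows)
print(conn.execute("SELECT TEXT FROM HELLO_WORLD").fetchall())
# [('Hello World!',)]
```

Because the table is dropped and recreated on every run, any change to the output schema of the last transform is automatically reflected in the physical table, which is exactly why template tables are so convenient during development.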
Note
Note that tables are not created on the database level from template tables until the ETL code (dataflow object) is executed within Data Services. Simply placing the template table object inside a dataflow and creating it in a datastore structure is not enough for the actual physical table to be created in the database. You have to run your code.

Template tables are displayed under a different category in the datastore. They appear separately from normal table objects:
Template tables are extremely useful during ETL development and testing. They free you from going down to the database level and changing the structure of the tables by altering, deleting, or creating them manually whenever the ETL code that inserts the data into the table changes. Every time the dataflow runs, it deletes and recreates the database table defined through the template table object, with the table structure currently required by your ETL code.
Template table objects are easily converted to normal table objects using the Import command on them. This command is available from the object's context menu in the dataflow workspace or in the Datastores tab of the Local Object Library.
Query transform basics
The Query transform is one of the most important and most often used transform objects in Data Services. Its main purpose is to read data from the left object(s) (input schema(s)) and send data to the output schema (the object to the right of the Query transform). You can join multiple datasets with the help of the Query transform using the syntax rules of the SQL language.
Additionally, you can specify the mapping rules for the output schema columns inside the Query transform by applying various functions to the mapped fields. You can also specify hard-coded values or even create additional output schema columns, as we did in our Hello World example.
The example in the next screenshot is not from our Hello World example. However, it demonstrates how a row extracted from the source object (input schema) can be augmented with extra columns, or can get its columns renamed or its values transformed by functions applied to the columns:
See how columns from two different tables are combined into a single dataset in the output schema, with columns renamed according to new standards and new columns created with NULL values in them.
The Hello World example
You have just created the simplest dataflow processing unit and executed it within your first job.
The dataflow object in our example has the Row_Generation transform, which generates rows with only one field. We generated one row with the help of this transform and added an extra field to the row with the help of the Query transform. We then inserted our final row into the HELLO_WORLD table created automatically by Data Services in the STAGE database.
You have also configured a couple of Designer properties and created a datastore object that represents the Data Services view of the underlying database level. Not all database objects (tables and views) are visible within your datastore by default. You have to import only those you are going to work with. In our Hello World example, we did not import the table into the datastore, as we used a template table. To import a table that exists in the database into your datastore so that it can be used in ETL development, perform the following steps:
1. Go to Local Object Library | Datastores.
2. Expand the datastore object you want to import the table into.
3. Double-click on the Tables section to open the list of database tables available for import:
4. Right-click on the specific table in the External Metadata list and choose Import from the table context menu.
5. The table object will now appear in the Tables section of the chosen datastore. As it has not yet been placed in any dataflow object, the Usage column shows a 0 value:

Creating different datastores for the same database can also be a flexible and convenient way of categorizing your source and target systems.
There is also a concept of configurations, where you can create multiple configurations of the same datastore with different parameters and switch between them. This is very useful when you are working in a complex development environment with development, test, and production databases. However, this is a topic for future discussion in the upcoming chapters.
Chapter 3. Data Services Basics – Data Types, Scripting Language, and Functions
In this chapter, I will introduce you to the scripting language in Data Services. We will cover the following topics:
Creating variables and parameters
Creating a script
Using string functions
Using date functions
Using conversion functions
Using database functions
Using aggregate functions
Using math functions
Using miscellaneous functions
Creating custom functions

Introduction
It is easy to underestimate the importance of the scripting language in Data Services, but you should not fall into this pitfall. In simple words, the scripting language is the glue that allows you to build smart and reliable ETL and unite all processing units of work (which are dataflow objects) together.
The scripting language in Data Services is mainly used to create custom functions and script objects. Script objects rarely perform data movement or data transformation. They are used to assist the dataflow objects (the main data migration and transformation processes). They are usually placed before and after them to assist with execution logic and calculate the execution parameter values for the processes that extract, transform, and load the data.
The scripting language in Data Services is armed with powerful functions that allow you to query databases, execute database stored procedures, and perform sophisticated calculations and data validations. It even supports regular expression matching techniques, and, of course, it allows you to build your own custom functions. These functions can be used not just in scripts but also in the mapping of Query transforms inside dataflows.
Without further delay, let's get to learning the scripting language.
Creating variables and parameters
In this recipe, we will extend the functionality of our Hello World dataflow (see the Understanding the Designer tool recipe from Chapter 2, Configuring the Data Services Environment). Along with the first row saying "Hello World!", we will generate a second row, providing the name of the Data Services job that generated the greetings.
This example will not just allow us to get familiar with how variables and parameters are created but also introduce us to one of the Data Services functions.

Getting ready
Launch your Designer tool and open the Job_HelloWorld job created in the previous chapter.

How to do it…
We will parameterize our dataflow so that it can receive the external value of the job name where it is being executed, and create the second row accordingly.
We will also require an extra object in our job, in the form of a script that will be executed before the dataflow and that will initialize our variables before passing their values to the dataflow parameters.
1. Using the script button from the right instrument panel, create a script object. Name it scr_init, and place it to the left of your dataflow. Do not forget to link them, as shown in the following screenshot:
2. To create dataflow parameters, click on the dataflow object to open it in the main workspace window.
3. Open the Variables and Parameters panel. All panels in Designer can be enabled/displayed with the help of the buttons located in the top instrument panel, as in the following screenshot:
4. If they are not displayed on your screen, click on the Variables button on the top instrument panel. Then, right-click on Parameters and choose Insert from the context menu. Specify the following values for the new input parameter:

Note
Note that the $ sign is very important when you reference a variable or parameter, as it defines the parameter in Data Services and is required so that the compiler can parse it correctly. Otherwise, it will be interpreted by Data Services as a text string. Data Services automatically puts the dollar sign in when you create a new variable or parameter from the panel menus. However, you should not forget to use it when you are referencing the parameter or variable in your script or in the Calls section of the dataflow.

5. Now, let's create a job variable that we will use to pass the value defined in the script to the dataflow parameter. For this, use the Back (Alt + Left) button to go to the job level (so that its content is displayed in the main design window). Then, right-click on Variables in the Variables and Parameters panel and choose Insert from the context menu to insert a new variable. Name it $l_JobName and assign the varchar(100) data type to it, which is the same as the dataflow parameter created earlier.
6. To pass variable values from the job to the input parameter of the dataflow, go to the Calls tab of the Variables and Parameters panel at the job design level. Here, you should see the input dataflow parameter $p_JobName with an empty value.
7. Double-click on the $p_JobName parameter and reference the $l_JobName variable in the Value field of the Parameter Value window. Click on OK:
8. Assign a value to the job variable in the previously created script object. To do this, open the script in the main design window and insert the following code in it:
$l_JobName = 'Job_HelloWorld';
9. Finally, let's modify the dataflow to generate a new column in the target table. For this, open the dataflow in the main design window.
10. Open the Query transform, right-click on the TEXT column, and go to New Output Column… | Insert Below.
11. In the opened Column Properties window, specify JOB_NAME as the name of the new column and assign it the same data type, varchar(100).
12. In the Mapping tab of the Query transform for the JOB_NAME column, specify the 'Created by ' || $p_JobName string.
13. Go back to the job context and create a new global variable, $g_JobName, by right-clicking on the Global Variables section and selecting Insert from the context menu.
14. Your final Query output should look like this:
15. Now, go back to the job level and execute it. You will be asked to save your work and choose the execution parameters. At this point, we are not interested in modifying them, so just continue with the default ones.
16. After executing the job in Designer, go to Management Studio and query the HELLO_WORLD table to see that a new column has appeared with the 'Created by Job_HelloWorld' value.
How it works…
All main objects in Data Services (dataflow, workflow, and job) can have local variables or parameters defined. The difference between an object variable and an object parameter is very subtle. Parameters are created and used to accept values from other objects (input parameters) or pass them outside of the object (output parameters). Otherwise, parameters can behave in the same way as local variables: you can use them in local functions or use them to store and pass values to other variables or parameters. Dataflow objects can only have parameters defined, but not local variables. See the following screenshot of the earlier example:
Workflow and job objects, on the other hand, can only have local variables defined, but not parameters. Local variables are used to store values locally within the object to perform various operations on them. As you have seen, they can be passed to the objects that are "calling" for them (go to Variables and Parameters | Calls).
There is another type of variable called a global variable. These variables are defined at the job level and shared among all objects placed in the job structure.
What you have done in this chapter is a common practice in Data Services ETL development: passing variable values from the parent object (the job in our example) to the child object (dataflow) parameters.
To keep things simple, you can specify hard-coded values for the input dataflow parameters, but this is usually considered bad practice.
What we could also do in our example is pass global variable values to the dataflow parameters. Global variables are created at the very top job level and are shared by all nested objects, not just the immediate job child objects. That is why they are called global. They can be created only in the job context, as shown here:
Also, note that in Data Services, you cannot reference parent object variables directly in child objects. You always have to create input child object parameters and map them at the parent level (using the Calls tab of the Variables and Parameters panel) to local parent variables. Only after doing this can you go into your child object and map its parameters to the local child object's variables.
Now, you can see that parameters are not the same thing as variables; they carry the extra function of bridging variable scope between parent and child. In fact, you do not have to map them to a local variable inside a child object if you are not going to modify them. You can use parameters directly in your calculations/column mappings.
The last thing to say here is that dataflows do not have local variables at all. They can only accept values from their parents and use them in function calls/column mappings. That is because you do not write scripts inside a dataflow object. Scripts are only created at the job or workflow level, or inside custom functions, which have their own variable scope.
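The scoping rule above maps neatly onto ordinary function arguments. Here is a Python analogue (names are illustrative, not Data Services code): the child cannot see the parent's local variable, so the value must travel through an input parameter.

```python
def dataflow(p_JobName):
    # Dataflows have input parameters, not local variables: the value
    # arrives as an argument and is used directly in the column mapping.
    return "Created by " + p_JobName

def job():
    l_JobName = "Job_HelloWorld"   # the parent's local variable
    # Mapping on the Calls tab is like passing the variable as an argument;
    # the child never references l_JobName itself.
    return dataflow(l_JobName)

print(job())  # Created by Job_HelloWorld
```

Just as in Python the child function cannot reach into the caller's locals, in Data Services the dataflow can only use what was mapped to its parameters on the parent's Calls tab.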
Data types available in Data Services are similar to common programming language data types. For a more detailed description, refer to the official Data Services documentation.

Note
The blob and long data types can only be used by structures created inside a dataflow, in other words, columns. You cannot create script variables or dataflow/workflow parameters of the blob or long data types.

There's more…
Try to modify your Job_HelloWorld job to pass global variable values to dataflow parameters directly. To do this, use the previously created global variable $g_JobName, specify a hard-coded value for it (or assign it a value inside a script, as we did with the local variable), and map it to the input dataflow parameter on the Calls tab of the Variables and Parameters panel in the job context. Do not forget to run the job and see the result.
Creating a script
Yes, technically we created our first script in the previous recipe, but let's be honest: this is not the most advanced script in the world, and it does not provide us with much knowledge regarding the scripting language capabilities in Data Services. Finally, although simplicity is usually a virtue, it would be nice to create a script that has more than one row in it.
In the following recipe, we will create a script that does some data manipulation and a little bit of text processing before passing a value to a dataflow input parameter.
How to do it…
Clear the contents of your scr_init script object and add the following lines. Note that every command or function call should end with a semicolon:

# Script which determines the name of the job and
# prepares it for the dataflow input parameter
print('INFO: scr_init script has started...');
while ($l_JobName IS NULL)
begin
  if ($g_JobName IS NOT NULL)
  begin
    print('INFO: assigning $g_JobName value '
       || 'of {$g_JobName} to a $l_JobName variable...');
    $l_JobName = $g_JobName;
  end
  else
  begin
    print('INFO: global variable $g_JobName is empty, '
       || 'calculating value for $l_JobName '
       || 'using a Data Services function...');
    $l_JobName = job_name();
  end
  print('INFO: new value assigned to a local '
     || 'variable: $l_JobName = {$l_JobName}!');
end
print('INFO: scr_init script has successfully completed!');

Try to run the job now and confirm that the row inserted into the target HELLO_WORLD table has the proper job name in the second column.
How it works…
We introduced a couple of new elements of the scripting language syntax. The # sign defines a comment section in Data Services scripts.
Note that we also referenced variable values in text strings using curly brackets: {$l_JobName}. If you skip them, the Data Services compiler will not recognize variables marked with the $ sign and will use the variable name and dollar sign as part of the string.

Tip
You can also use square brackets [] instead of curly brackets to reference variable/parameter values within a text string. The difference between them is that if you use curly brackets, the compiler will put the variable value into the string as a quoted string 'value' instead of substituting it as-is.
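The difference between the two bracket styles can be modeled in a few lines of Python. The substitute() helper below is purely hypothetical (there is no such Data Services API); it only mimics the described behavior: curly brackets substitute the value as a quoted string, square brackets substitute it as-is.

```python
def substitute(template, **variables):
    # Hypothetical helper mimicking DS variable substitution in strings.
    out = template
    for name, value in variables.items():
        out = out.replace("{$" + name + "}", "'" + str(value) + "'")  # quoted
        out = out.replace("[$" + name + "]", str(value))              # as-is
    return out

print(substitute("name = {$l_JobName}", l_JobName="Job_HelloWorld"))
# name = 'Job_HelloWorld'
print(substitute("name = [$l_JobName]", l_JobName="Job_HelloWorld"))
# name = Job_HelloWorld
```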
The scripting language in Data Services is easy to learn, as it does not have much variety in terms of conditional constructs. It has a simple syntax, and all its power comes from functions.
In this particular example, you can see one while loop and one conditional construct. The while loop is the only type of loop supported in the Data Services scripting language, and if…else is the only conditional construct supported. This is really all you need in most cases.
The while (&lt;condition&gt;) loop expression should include a block of code starting with begin and ending with end. The condition check happens at the beginning of each iteration (even the very first one), so keep in mind that even your very first loop iteration can be skipped. In our example, the loop runs while the $l_JobName local variable is empty.
The syntax of the if conditional element is the same: each conditional block should be wrapped in begin/end. It supports else if, and you can include multiple conditional statements separated by AND or OR. We use the conditional to check whether the global variable from which we will be sourcing the value for the local variable is empty or not. If it is not empty, we assign it to the local variable; if it is empty, we generate a job name using the job_name() function, which returns the name of the job it is executed in.
The print() function is the main logging function in the Data Services scripting language. It allows you to print out messages to the trace log file. Look at the following screenshot. It shows an excerpt from the trace log file displayed in one of the tabs in the main design window after you execute the job.

Note
When you execute a job, Data Services generates three log files: the trace log, the monitor log, and the error log. We will explain these logs in detail in the upcoming recipes and chapters. For now, use the trace log button to see the result of your job execution.

Messages generated by the print() function are marked in the trace log as PRINTFN (see the following screenshot). You can also add your own formatting in the print() function to make the messages more distinguishable from the rest of the log messages (see the INFO word added in the example here):
Using string functions
Here, we will explore a few useful string functions by updating our Hello World code to include some extra functionality. There is only one data type in Data Services used to store character strings, and that is varchar. It keeps things pretty simple for string-related and conversion operations.

How to do it…
Here, you will see two examples: applying string function transformations within a dataflow and using string functions in the script object.
Follow these steps to use string functions in Data Services, using the example of the replace_substr() function, which substitutes part of a string with another substring:
1. Open the DF_HelloWorld dataflow in the workspace window and add a new Query transform named Who_says_What. Put it after the Query transform and before the target template table.
2. Open the Who_says_What Query transform and add a new WHO_SAYS_WHAT output column of the varchar(100) type.
3. Add the following code into the Mapping tab of the new column:
replace_substr($p_JobName, '_', '') || ' says ' || word(Query.TEXT, 1)
4. Your new Query transform should look like the one in the following screenshot. Note that you should use single quotes to define string text in a mapping or script:
5. The final version of the dataflow should look like this:

Save your work and execute the job. Go to Management Studio to see the contents of the dbo.HELLO_WORLD table. The table now has a new column with the "JobHelloWorld says Hello" string.
Using string functions in the script
We are not quite happy with the Who_says_What string. Obviously, only HelloWorld should be put in double quotes (they do not affect the behavior of string text in Data Services). Also, we will use the init_cap() function to make sure that only the first letter of our job name is capitalized.
Change the mapping of WHO_SAYS_WHAT to the following code:
'Job "' || init_cap(ltrim(lower($p_JobName), 'job_')) || '"' || ' says ' || word(Query.TEXT, 1)
According to this logic, we are expecting the job name to start with the Job_ prefix. In this case, we have to add extra logic to the script running before the dataflow to make sure that we have this prefix in our job name. The following code will add it if the job name is not valid according to our naming standards. Add the following code before the last print() function call:

# Check that the job is named according to the naming standards
if (match_regex($l_JobName, '^(job_).*$',
    'CASE_INSENSITIVE') = 1)
begin
  print('INFO: the job name is correct!');
end
else
begin
  print('WARNING: job has not been named according '
     || 'to the standards. '
     || 'Changing the name of {$l_JobName}...');
  $l_JobName = 'Job_' || $l_JobName;
  print('INFO: new job name is ' || $l_JobName);
end

As the final step, save the job and execute it. Now, the string in your third column should be Job "Helloworld" says Hello. Even if you rename your job and remove the Job_ prefix, your script will detect this and add the prefix to your job name.
How it works…
As you can see in the preceding example, we used common string manipulation functions similar to those in other programming languages.
In the first part of the recipe, we transformed the mapping of the WHO_SAYS_WHAT column to strip out the Job_ prefix from the parameter value. This allows us to correctly wrap the rest of the job name in double quotes for better presentation.
The init_cap() function capitalizes the first character of the input string.
The lower() function transforms the input string to lowercase.
The ltrim() function trims the specified characters from the left-hand side of the input string. Usually, it is used to quickly remove leading blank characters in strings. The rtrim() function does the same thing but for trailing characters.
The word() function is extremely useful for parsing an input string to extract "words", or parts of a string separated by space characters. There is an extended version, the word_ext() function, which accepts a specified separator as the third parameter. As the second parameter in both versions, you specify the number of the word to be extracted from the string.
You have probably already guessed that || is used as the string concatenation operator.
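Rough Python equivalents of these string functions make the mapping easy to trace by hand. These helpers are illustrative assumptions, not Data Services code; note in particular that both DS ltrim() and Python's lstrip() strip any characters from the given set, not a literal prefix.

```python
def init_cap(s):
    # Capitalize the first character of the input string.
    return s[:1].upper() + s[1:]

def ltrim(s, chars):
    # Strip any characters from the set `chars` off the left-hand side.
    return s.lstrip(chars)

def word(s, n, sep=None):
    # word(): nth whitespace-separated word; with sep set, like word_ext().
    return s.split(sep)[n - 1]

job = "Job_HelloWorld"
result = ('Job "' + init_cap(ltrim(job.lower(), "job_")) + '"'
          + ' says ' + word("Hello World!", 1))
print(result)  # Job "Helloworld" says Hello
```

Walking through it: lower() gives job_helloworld, ltrim with the character set 'job_' strips j, o, b, and _ from the left until it hits h, leaving helloworld, and init_cap() capitalizes only the first letter, which is exactly why the result reads "Helloworld" rather than "HelloWorld".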
The second part of the changes implemented in this recipe, in the script object, contained the very interesting and powerful match_regex() function. It is one of the few functions that provide regular expression support within Data Services. If you are not familiar with the regular expression concept, you can find many sources on the Internet explaining it in detail. Regular expressions are supported in almost all major programming languages and allow you to specify matching patterns in a very short form. This makes them very effective for parsing a string and finding a matching substring or pattern in it.
In the Data Services match_regex() function, if you specify a regular expression pattern string as the second input parameter, it returns 1 if it finds a match for the pattern in the input string, and 0 if it does not. It is a very effective way to validate the format of a text string or look for specific characters or patterns in the string.
Here, we checked whether our job has the prefix Job_ in its name. If not, we add it to the beginning of the job name before passing the value to the dataflow.
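The same check translates directly to Python's re module. The match_regex() wrapper below is an analogue written for illustration, mirroring the 1/0 return convention and the CASE_INSENSITIVE flag of the Data Services function.

```python
import re

def match_regex(value, pattern, flags="CASE_INSENSITIVE"):
    # Illustrative analogue: 1 if the pattern matches the input string, else 0.
    f = re.IGNORECASE if flags == "CASE_INSENSITIVE" else 0
    return 1 if re.match(pattern, value, f) else 0

l_JobName = "hello_world"
if match_regex(l_JobName, r"^(job_).*$") != 1:
    l_JobName = "Job_" + l_JobName   # add the missing prefix
print(l_JobName)  # Job_hello_world
```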
There's more…
Feel free to explore the existing string functions available in Data Services. There are extended versions of the functions we already used in the preceding recipe that you can take a look at. For example, the ltrim_blanks() function allows you to quickly remove blank characters without specifying extra parameters, and it has an extended version, ltrim_blanks_ext(). The substr() function returns part of a string from another string. The replace_substr() function is used to substitute part of a string with another string.
We will definitely use some of them in our future recipes throughout the book.
Using date functions
Correctly dealing with dates and time is critically important in data warehouses. In the end, you should understand that this is one of the most important attributes in the majority of fact tables in your DWH, as it defines the "position" of your data records. Lots of reports filter data by date-time fields before performing data aggregation. This is probably why Data Services has a decent number of date functions, allowing a variety of operations on date-time variables and table columns.
Data Services supports the following date data types: date, datetime, time, and timestamp. They define what part of the time units is stored in the field:
date: This stores the calendar date
datetime: This stores the calendar date and the time of the day
time: This stores only the time of the day, without the calendar date
timestamp: This stores the time of the day with subsecond precision
How to do it…
Generating current date and time
Here is a script that can be included in your current script object in the Hello World job to display the generated date values in the job trace log.
To test this script, create a new job called Job_Date_Functions and a new script within it called SCR_Date_Functions. Also, create four local variables in the job: $l_date of the date data type, $l_datetime of the datetime data type, $l_time of the time data type, and $l_timestamp of the timestamp data type.
Print out date function examples to the trace log:

$l_date = sysdate();
print('$l_date = [$l_date]');
$l_datetime = sysdate();
print('$l_datetime = [$l_datetime]');
$l_time = systime();
print('$l_time = [$l_time]');
$l_timestamp = systime();
print('$l_timestamp = [$l_timestamp]');
$l_timestamp = sysdate();
print('$l_timestamp = [$l_timestamp]');

The trace log file displays the following information:

$l_date = 2015.05.05
$l_datetime = 2015.05.05 18:47:27
$l_time = 18:47:27
$l_timestamp = 1900.01.01 18:47:27.030000000
$l_timestamp = 2015.05.05 18:15:21.472000000

As you can see, different data types are able to store different amounts of data. Also, you can see that the systime() function does not generate date-related data (days, months, and years); the 1900.01.01 that you see in the first timestamp variable output is a dummy default date value. The second output shows that we used the sysdate() function to get this information.
Extracting parts from dates
Here are some useful operations you can perform to extract parts from date type values. Note that all of them return integer values. You can append these commands to the script object already created in order to test how they work:

$l_datetime = sysdate();
print('$l_datetime = [$l_datetime]');
# Extract Year from date field
print('Year = ' || date_part($l_datetime, 'YY'));
# Extract Day from date field
print('Day = ' || date_part($l_datetime, 'DD'));
# Extract Month from date field
print('Month = ' || date_part($l_datetime, 'MM'));
# Display day in month for the input date
print('Day in Month = ' || day_in_month($l_datetime));
# Display day in week for the input date
print('Day in Week = ' || day_in_week($l_datetime));
# Display day in year for the input date
print('Day in Year = ' || day_in_year($l_datetime));
# Display number of week in year
print('Week in Year = ' || week_in_year($l_datetime));
# Display number of week in month
print('Week in Month = ' || week_in_month($l_datetime));
# Display last day of the month for the provided input date
print('Last date of the date month = ' || last_date($l_datetime));

The output in the trace log should be similar to this:

$l_datetime = 2015.05.05 15:55:09
Year = 2015
Day = 5
Month = 5
Day in Month = 5
Day in Week = 2
Day in Year = 125
Week in Year = 18
Week in Month = 1
Last date of the date month = 2015.05.31 15:55:09
How it works…
Some functions use an extra formatting parameter; for example, date_part() does. You can also use 'HH', 'MI', and 'SS' to extract hours, minutes, and seconds respectively.
There are also shorter versions of the date_part() function that allow you to extract the year, month, or quarter without specifying any extra formatting parameters. For this, you can use the year(), month(), and quarter() functions.
An interesting function is the isweekend() function. It returns 1 if the specified date value is a weekend and 0 if it is not.
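For readers who want to verify the trace log values above by hand, here is a Python datetime analogue of these extractions (illustrative only; the weekend rule assumes Saturday/Sunday, which matches the common convention).

```python
import datetime

# The date-time value from the trace log output above.
d = datetime.datetime(2015, 5, 5, 15, 55, 9)
parts = {
    "YY": d.year,                              # date_part(d, 'YY')
    "MM": d.month,                             # date_part(d, 'MM')
    "DD": d.day,                               # date_part(d, 'DD')
    "day_in_year": d.timetuple().tm_yday,      # day_in_year(d)
    "isweekend": 1 if d.weekday() >= 5 else 0  # isweekend(d): Sat/Sun
}
print(parts)
```

Running this confirms, for example, that May 5, 2015 is day 125 of the year and falls on a weekday, matching the trace log.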
There's more…
You can access the full list of functions available in Data Services from different places in Designer. One option is to open a script object. There is a Functions… button at the top of the main design window. Click it to open the Select Function window. All functions are categorized and have a short description explaining how they work and what they require as input parameters. Look at this screenshot:
The same button is also available on the Mapping tab of the Query transform inside a dataflow, so you can access it if you are trying to create a transformation rule for one of the columns.
This list is also available in the Smart Editor, but we will discuss it in detail in one of the next recipes. Of course, you can always reference the Data Services documentation to see all the functions available in Data Services and some examples of their usage.
Using conversion functions

Conversion functions allow you to change the data type of a variable or of a column in the Query transform from one type to another. This is very handy, for example, when you receive date values as string characters and want to convert them to the internal date data type in order to apply date functions or perform arithmetic operations on them.
How to do it…

One of the most used functions to convert from one data type to another is the cast() function. Look at the examples here. As usual, create a new job with an empty script object and type this code in it. Create a $l_varchar job-level local variable of the varchar(10) data type:

$l_varchar = '20150507';
# Casting varchar to integer
print(cast($l_varchar, 'integer'));
# Casting varchar to decimal
print(cast($l_varchar, 'decimal(10,0)'));
# Casting integer value to varchar
print(cast(987654321, 'varchar(10)'));
# Casting varchar to a double
print(cast($l_varchar, 'double'));

The output is shown here:
Remember that the print() function automatically converts the input to varchar in order to display it in a trace file. Note how casting to the double data type changed the appearance of the number.

Casting is helpful to make sure that you are sending values of the correct data type to a column or function that expects data of a particular data type in order to work correctly. Automatic conversions performed by Data Services, when the value of one data type is assigned to a variable or column of a different data type, could produce unexpected results and lead to errors.
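For comparison, here is a minimal Python sketch of the same explicit conversions. This is illustrative only; the DS decimal and varchar-length semantics are simplified.

```python
# Illustrative Python analogues of the cast() examples above.
value = '20150507'

as_int = int(value)       # cast($l_varchar, 'integer')
as_float = float(value)   # cast($l_varchar, 'double')
as_str = str(987654321)   # cast(987654321, 'varchar(10)')

print(as_int, as_float, as_str)
```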
However, the most useful conversion functions are those used to convert a string to a date and vice versa. Add the following lines to your script and run the job:

$l_varchar = '20150507';
# Casting varchar to a date
print(to_date($l_varchar, 'YYYYMMDD'));
# Converting (changing the format of) the input date
# from 'YYYYMMDD' to 'DD.MM.YYYY'
print(
  to_char(to_date($l_varchar, 'YYYYMMDD'), 'DD.MM.YYYY')
);

When converting a text string to a date, you have to specify the format of the string so that the Data Services compiler can interpret and convert the values correctly. The full table of possible formats available in these two functions can be found in the Data Services Reference Guide, available for download at http://help.sap.com. Refer to it for more details. Here are some more examples of to_char() function conversions of a date variable:

$l_date = sysdate();
print(to_char($l_date, 'DDMONYYYY'));
print(to_char($l_date, 'MONTH-DD-YYYY'));

The trace log should be similar to the following one:

07MAY2015
MAY-07-2015
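The to_date()/to_char() round trip can be sketched with Python's strptime/strftime (an illustrative analogue; the format tokens differ, with DS 'YYYYMMDD' corresponding to Python '%Y%m%d'):

```python
from datetime import datetime

raw = '20150507'
d = datetime.strptime(raw, '%Y%m%d')      # to_date($l_varchar, 'YYYYMMDD')
dotted = d.strftime('%d.%m.%Y')           # to_char(.., 'DD.MM.YYYY')
day_mon = d.strftime('%d%b%Y').upper()    # roughly to_char(.., 'DDMONYYYY')
print(dotted, day_mon)                    # 07.05.2015 07MAY2015
```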
Let's get familiar with another interesting data type: interval. It helps you perform arithmetic operations on dates. The script here performs arithmetic operations on a date stored in the $l_date variable by first adding 5 days to it, then calculating the first date of the next month, and finally subtracting 1 second from the date-time value stored in the $l_datetime variable.

See the example here:

$l_date = to_date('01/05/2015', 'DD/MM/YYYY');
print('Date = ' || $l_date);
# Add 5 days to the $l_date value
print('{$l_date} + 5 days = ' || $l_date + num_to_interval(5, 'D'));
# Calculate first day of next month
print('First day of next month = ' || last_date($l_date) + num_to_interval(1, 'D'));
# Subtract 1 second out of the datetime
$l_datetime = to_date('01/05/2015 00:00:00', 'DD/MM/YYYY HH24:MI:SS');
print('{$l_datetime} minus 1 second = ' || $l_datetime - num_to_interval(1, 'S'));
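The same date arithmetic can be sketched in Python, where timedelta plays the role of num_to_interval() (illustrative analogue, not DS code):

```python
import calendar
from datetime import datetime, timedelta

d = datetime.strptime('01/05/2015', '%d/%m/%Y')

plus_five = d + timedelta(days=5)                   # $l_date + 5 days
# First day of next month = last day of this month + 1 day (last_date()).
last_day = d.replace(day=calendar.monthrange(d.year, d.month)[1])
first_of_next = last_day + timedelta(days=1)
minus_one_second = d - timedelta(seconds=1)         # $l_datetime - 1 second

print(plus_five.date(), first_of_next.date(), minus_one_second)
```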
How it works…

You probably have not noticed, but you have already seen the results of implicit data type conversion made automatically by Data Services in the previous recipes. For example, the date extract functions returned integer values that were converted automatically to varchar so that they could be concatenated with the string part and displayed using the print() function, which, by the way, can accept only varchar as an input parameter.

Data Services does data type conversions automatically whenever you assign a value of one data type to a variable or column of a different data type. The only potential pitfall here is that if you rely on automatic conversion, you are leaving some guessing work to Data Services and can get unexpected results in the end. So, understanding how and when conversion happens automatically, in order to implement manual checks instead, can be critical. Many bugs in ETL code are related to incorrect data type conversion, so you should be extra careful.

There's more…

Try to experiment with automatic conversion, for example, by adding integer numbers to date variables: sysdate() + 10. See how Data Services behaves and which default parameters it uses for formatting the automatically converted value.
Using database functions

There is no great variety of functions in this area. These functions allow you to communicate with database objects and control the flow of data within a dataflow.

How to do it…

You will learn a little more about these functions here.
key_generation()

First, let's look at the key_generation() function. This is a function that can be called only from a dataflow (when used in a column mapping), so we are not interested in it at this point, as we cannot use it in Data Services scripts.

This function is actually similar to the Key_Generation transform object that can be used as part of a dataflow as well; it is used to look up the highest key value from a table column and generate the next one. This is often used to populate the key column of a new record with unique values before inserting the record into a target table. We will take a closer look at the Key_Generation transform in the upcoming chapters.
total_rows()

This function is used to calculate the total number of rows in a database table. Running this function is the easiest and quickest way to check in a script whether a table is empty or not before running a dataflow that populates it. Then, according to the results, you can make further decisions; that is, truncate the table directly from a script before running the next dataflow. Alternatively, you can use conditionals to skip the next portion of ETL code entirely.

See the example of how this function is used. As usual, you can create a new job with a script object inside it. Type the following code and run the job:

print(
  total_rows('DWH.DBO.DIMACCOUNT')
);

Do not forget to import the table into your DWH datastore, as you can reference only tables that have been imported into your Data Services repository. Look at this screenshot:
sql()

The sql() function is a universal function that allows you to perform SQL calls to any database for which you have created a datastore object. You can run DDL and DML statements, SELECT queries, and even call stored procedures and database functions.

Note

You should use the sql() function very carefully in your scripts, and we do not recommend that you use it at all in column mappings inside a dataflow. This function should only be used to return one record with as few fields as possible. So, always test the statement you place inside the sql() function directly in the database first to make sure it behaves as expected.

For example, to calculate the total number of rows in the DimAccount table with the sql() function, you can use the following code:

print('Total number of rows in DBO.DIMACCOUNT table is: ' ||
  sql('DWH', 'SELECT COUNT(*) FROM DBO.DIMACCOUNT')
);
How it works…

The sql() function is very convenient for executing stored procedures, truncating and creating database objects, and doing lookups for aggregated values when the query returns only one row or even one value. If you try to return a dataset of multiple rows, you will get only the value of the first field from the first row. It is still possible to query multiple fields, but it will require that you modify the query itself and add extra code to parse the returned string (see the example here):

# returning multiple fields from a database table
$l_row = sql('DWH', 'SELECT CONVERT(VARCHAR(10), ACCOUNTKEY)' ||
  ' + \',\' + CONVERT(VARCHAR(50), ACCOUNTDESCRIPTION)' ||
  ' FROM DBO.DIMACCOUNT');
$l_AccountKey = word_ext($l_row, 1, ',');
$l_AccountDescription = word_ext($l_row, 2, ',');
print('AccountKey = {$l_AccountKey}');
print('AccountDescription = {$l_AccountDescription}');

As you can see, this is a lot of code for such a simple procedure. If you want to extract and parse multiple rows in a Data Services script, you will have to create a row-counting mechanism and loop through the rows by doing multiple query executions within a loop. You can try to do this yourself as an exercise to practice a little bit of the Data Services scripting language.
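The parsing step above can be sketched in Python. Here, word_ext is a hypothetical stand-in for the DS function of the same name, and the row value is made up for illustration:

```python
# A sketch of the word_ext()-style parsing of a single delimited string
# returned by sql().
row = '1,Assets'   # hypothetical 'ACCOUNTKEY,ACCOUNTDESCRIPTION' result

def word_ext(text, n, sep):
    """Return the n-th (1-based) word of text, like DS word_ext()."""
    parts = text.split(sep)
    return parts[n - 1] if 0 < n <= len(parts) else None

account_key = word_ext(row, 1, ',')
account_description = word_ext(row, 2, ',')
print(account_key, account_description)   # 1 Assets
```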
Note

Note that you do not have to import the table you want to reference in the sql() function into a datastore.
Using aggregate functions

Aggregate functions are used in dataflow Query transforms to perform aggregation on a grouped dataset.

You should be familiar with these functions, as they are the same ones used in the SQL language: avg(), min(), max(), count(), count_distinct(), and sum().
How to do it…

To demonstrate the use of aggregate functions, we will perform a simple analysis of one of our tables. Import the DimGeography table into the DWH datastore and create a new job with a single dataflow inside it using these steps:

1. Your dataflow should include the DimGeography source table and the DimGeography target template table in a STAGE database to send the output to:
2. Open the Query transform and create the following output structure:

The COUNTRYREGIONCODE column contains country code values and will be the column on which we perform the grouping of the dataset. It is mapped from the input dataset to the output. Also, drag and drop it to the GROUP BY tab of the Query transform from the input dataset to specify it as a grouping column. The other columns are created as New Output Column… (choose this option from the context menu of the COUNTRYREGIONCODE column) and contain the following mappings (see the table here):

Output column name      | Mapping expression
COUNT_DISTINCT_PROVINCE | count_distinct(DIMGEOGRAPHY.STATEPROVINCENAME)
COUNT_PROVINCE          | count(DIMGEOGRAPHY.STATEPROVINCENAME)
MIN_KEY                 | min(DIMGEOGRAPHY.GEOGRAPHYKEY)
MAX_KEY                 | max(DIMGEOGRAPHY.GEOGRAPHYKEY)

3. Save the changes and run the job. Now, go to Management Studio and query the contents of the newly created DimGeography table in the STAGE database. You should get the results as shown in this screenshot:
How it works…

What we have just built in the dataflow in the Query transform can be done with the following SQL statement:

select
  CountryRegionCode,
  COUNT(DISTINCT StateProvinceName),
  COUNT(StateProvinceName),
  MIN(GeographyKey),
  MAX(GeographyKey)
from
  dbo.DimGeography
group by
  CountryRegionCode;

First, the count_distinct() function calculates the number of distinct provinces within each country, count() calculates the total number of rows for each country, and min() and max() show the lowest and highest GeographyKey values within each country group, respectively.
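The grouping logic can also be mimicked with a plain Python dictionary keyed by the grouping column. This is a conceptual sketch with made-up sample rows, not actual DimGeography data:

```python
# (country, province, geography_key) sample rows, invented for illustration.
rows = [
    ('US', 'Washington', 1),
    ('US', 'Oregon', 2),
    ('US', 'Oregon', 3),
    ('FR', 'Loiret', 4),
]

groups = {}
for country, province, key in rows:
    g = groups.setdefault(country, {'provinces': [], 'keys': []})
    g['provinces'].append(province)
    g['keys'].append(key)

summary = {
    c: (len(set(g['provinces'])),   # count_distinct()
        len(g['provinces']),        # count()
        min(g['keys']),             # min()
        max(g['keys']))             # max()
    for c, g in groups.items()
}
print(summary)
```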
Note

You cannot use these functions directly in the scripting language, but only in the Query transform. If you need to extract aggregated values from database tables within a Data Services script, you can use sql() containing a SELECT statement with aggregate database functions.
Using math functions

Data Services has a standard set of functions available to perform mathematical operations. In this recipe, we will use the most popular of them to show you what operations can be performed on numeric data types.
How to do it…

1. Create a new job and name it Job_Math_Functions.
2. Inside this job, create a single dataflow called DF_Math_Functions.
3. Import the FactResellerSales table in your DWH datastore and add it to the dataflow as a source object.
4. Add the first Query transform after the source table and link them together. Then, open it and drag two columns to the output schema: PRODUCTKEY and SALESAMOUNT. Specify the FACTRESELLERSALES.PRODUCTKEY = 354 filtering condition in the WHERE tab:
5. Add the second Query transform and rename it Group. Here, we will perform a grouping operation on the product key we selected in the previous transform. To do this, add the PRODUCTKEY column in the GROUP BY tab and apply the sum() aggregate function on SALESAMOUNT in the Mapping tab:
6. Finally, add the last Query transform, called Math, and link it to the previous one. Inside it, drag all columns from the source to the target schema and add the new ones using New Output Column…. Specify mapping expressions, as in the following screenshot:
7. As the last step, add a new template table located in the STAGE database and owned by the dbo user. This template table is called FACTRESELLERSALES. Your dataflow should look like this now:
8. Save and run the job. Then, to check the result dataset, either query the new table from SQL Server Management Studio or open your dataflow in Data Services and click on the magnifying glass icon of your FACTRESELLERSALES (DS_STAGE.DBO) target table object to browse the data directly from Data Services.
How it works…

The result you see here explains very well the effect of the math functions applied to your SALESAMOUNT column value:

The ceil() function returns the smallest integer value (automatically converted to the input column data type; that is why you see trailing zeroes) equal to or greater than the specified input number.

The floor() function returns the highest integer value equal to or less than the input number.

The rand_ext() function returns a random real number from 0 to 1. In Data Services, you do not have much control over the behavior of the functions that generate random numbers. So, you have to apply extra mathematical operations to define the range of the generated random numbers and their types. In the example earlier, we generated random integer numbers from 0 to 10 inclusively.

The trunc() and round() functions perform rounding operations similar to ceil() and floor(), but trunc() just truncates the number to the length specified in the second parameter and shows you the result as is. On the other hand, the round() function rounds the number according to the precision specified.
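The behavior described above can be sketched with Python's standard math functions (an illustrative analogue; the input value is made up):

```python
import math
import random

# Python counterparts of the DS math functions discussed above.
x = 3.789

print(math.ceil(x))              # ceil()  -> 4
print(math.floor(x))             # floor() -> 3
print(math.trunc(x * 10) / 10)   # trunc(x, 1): cut off, no rounding -> 3.7
print(round(x, 1))               # round(x, 1): rounds -> 3.8

# rand_ext() analogue: scale a 0..1 random real into integers 0..10.
r = round(random.random() * 10)
print(r)
```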
There's more…

As an exercise, try the other Data Services mathematical functions. Modify the created dataflow to include examples of their usage. To see the full list of mathematical functions available, use the Functions… button in the script object or column mapping field and choose the Math Functions category in the Select Function window:
Using miscellaneous functions

Actually, the miscellaneous group includes almost all types of functions that cannot easily be categorized. Among the miscellaneous functions, there are functions that allow you to extract useful information from the Data Services repository (for example, the name of the job, workflow, or dataflow they are executed from), functions that allow you to perform advanced string searches, functions similar to other standard SQL functions, and many others. Throughout the book, we will very often use Data Services miscellaneous functions. So, in this recipe, we will take a look at some of those that are usually used in scripts and help you query the Data Services repository.
How to do it…

At this point, you should be pretty comfortable creating new jobs, script objects, and dataflow objects. So, I will not explain the steps in detail every time we need to create a new test job object. If you forgot how to do it, refer to the previous recipes in the book.

1. Create a new job and add a script object in it.
2. Open the script and populate it with the following code. This code shows you an example of how to use three miscellaneous functions: ifthenelse(), decode(), and nvl():

# Conditional functions
$l_string = 'Length of that string is 38 characters';
$l_result = ifthenelse(length($l_string) = 8, print('TRUE'), print('FALSE'));
$l_string = 'Length of that string is 38 characters';
$l_result = decode(
  length($l_string) = 10, print('TRUE'),
  length($l_string) = 12, print('TRUE'),
  length($l_string) = 38, print('TRUE'),
  print('FALSE')
);
$l_string = NULL;
$l_string = nvl($l_string, 'Empty string');
print($l_string);

3. For this script to work, you should also make sure that you have local variables created at the job level: $l_string and $l_result of the varchar(255) data type.
How it works…

Most of the miscellaneous functions are functions that require advanced knowledge of Data Services. In this book, you will see a lot of examples of how they can be used in complex dataflows and Data Services scripts.

In this recipe, we can see three conditional functions: ifthenelse(), decode(), and nvl(). They allow you to evaluate the result of an expression and execute other expressions, depending on the result of the initial evaluation.

After executing the earlier script, you can see the following trace log records:

8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. FALSE
8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. TRUE
8172 12468 PRINTFN 18/05/2015 8:12:34 p.m. Empty string

The ifthenelse() function accepts a comparison expression as its first input parameter, which returns either TRUE or FALSE. If TRUE, then the second parameter of ifthenelse() is executed (if it is an expression) or just returned as the result of the function. The third parameter is executed (or returned) if the comparison expression returns FALSE.

The decode() function does the same thing as the ifthenelse() function, except that it allows you to evaluate multiple expressions. Its parameters go in pairs, as you can see in the example. The first parameter in a pair is a comparison expression, and the second parameter is what is returned by the function if the comparison expression is TRUE. If it returns FALSE, then decode() moves to the next pair, and then the next one, until it reaches the last pair. If none of the expressions returned TRUE, then the last parameter of decode() is returned as a default value.

Note

Bear in mind that the decode() function returns on the first TRUE condition without evaluating the rest of the conditions. So, be careful with the order of conditional expressions in the decode() function.
Finally, the last function in the example is the common SQL function nvl(). It returns the value specified in the second parameter if the first parameter is NULL. This function is very useful in dataflows. Usually, it is used as a mapping expression in the Query transform to prevent NULL values from coming through for a specific column. All NULL values will be converted to the value you define in the nvl() function.
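The logic of these three functions can be sketched in Python. The helper names mirror the DS functions but are purely illustrative; note that, unlike DS, Python evaluates all arguments eagerly before the call:

```python
def ifthenelse(cond, if_true, if_false):
    # ifthenelse(): return the second or third argument based on cond.
    return if_true if cond else if_false

def decode(*pairs_and_default):
    # decode(): condition/value pairs followed by a default value.
    *pairs, default = pairs_and_default
    for cond, value in zip(pairs[0::2], pairs[1::2]):
        if cond:            # first TRUE pair wins -- order matters
            return value
    return default

def nvl(value, substitute):
    # nvl(): replace None (NULL) with a substitute value.
    return substitute if value is None else value

s = 'Length of that string is 38 characters'
print(ifthenelse(len(s) == 8, 'TRUE', 'FALSE'))               # FALSE
print(decode(len(s) == 10, 'A', len(s) == 38, 'B', 'DEFAULT'))  # B
print(nvl(None, 'Empty string'))                              # Empty string
```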
Creating custom functions

In this recipe, we will get familiar with the Smart Editor tool available in Designer to help you write your scripts or functions in a convenient way.

We will create a new function that can be executed either within a script or within a dataflow. This function accepts two parameters: a date value and a number of days. It then adds the number of days to the input date and returns the resulting date.
How to do it…

1. Open Designer and go to Tools | Custom Functions… from the top-level menu:
2. In the opened window, right-click in the area with the list of functions and choose New….
3. Choose the name of the new fn_add_days function and populate the description section, as shown in this screenshot:
4. Then, click on Next to open a Smart Editor window and input the following code:

try
begin
  $l_Date = to_date($p_InputDate, 'DD/MM/YYYY');
  $l_Days = num_to_interval($p_InputDays, 'D');
end
catch(all)
begin
  print('fn_add_days() FAILED: check input parameters');
  raise_exception('fn_add_days() FAILED: check input parameters: ' ||
    ' Date format DD/MM/YYYY and number of days should be an integer value');
end
$l_Result = $l_Date + $l_Days;
Return $l_Result;

5. For it to work, you have to create a set of required input/output parameters and local variables for this custom function. Your function in the Smart Editor should look like the one shown in this screenshot:
6. Create the following input parameters: $p_InputDate of the varchar data type and $p_InputDays of the integer data type. Use the left panel Variables inside the Custom Function window.
7. The local variables will be used only within the function and will not be accessible from outside of it. Create $l_Date of the date data type, $l_Days of the interval data type, and $l_Result of the date data type.
8. Now, it is time to click on OK to create our first custom function and use it in the job. For this, you can create a simple job with one script object inside it using the following code:

print(to_char(fn_add_days('10/10/2015', 12), 'DD-MM-YYYY'));
How it works…

We made the input parameters of the varchar and integer data types for the convenience of calling the function. The function itself performs the conversion to the correct date and interval data types before returning the result of the date sum operation.

Even if we had not used the num_to_interval() function to convert integer values to intervals, Data Services would still perform the correct sum operation. This is because it does an automatic conversion of the numeric data type into intervals of days when it is used in an arithmetic operation with dates. That is why print(sysdate() + 1) will return tomorrow's date.

In the code mentioned earlier, you can also see the error-handling mechanism that can be used in Data Services scripts: the try-catch block. If anything executed between try and catch fails, it will never fail the parent object's execution. This is very useful if you do not want to fail your job because of a non-critical piece of code failing somewhere inside it. In case of a failed execution, control is passed to the second begin-end block of the try-catch. Here, you can write extra log messages to the trace log file and still fail the job execution with the raise_exception() function if you want to. We will discuss this in more detail in Chapter 5, Workflow – Controlling Execution Order, and Chapter 9, Advanced Design Techniques.
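An illustrative Python analogue of fn_add_days(), including the try/catch-style guard on the input format, might look like this (names and behavior are a sketch, not the DS implementation):

```python
from datetime import datetime, timedelta

def fn_add_days(input_date, input_days):
    """Add input_days to a 'DD/MM/YYYY' date string, like fn_add_days()."""
    try:
        d = datetime.strptime(input_date, '%d/%m/%Y')
        delta = timedelta(days=int(input_days))
    except (ValueError, TypeError):
        # Mirrors raise_exception(): surface a clear error to the caller.
        raise ValueError('fn_add_days() FAILED: check input parameters: '
                         'date format DD/MM/YYYY and number of days '
                         'should be an integer value')
    return d + delta

print(fn_add_days('10/10/2015', 12).strftime('%d-%m-%Y'))  # 22-10-2015
```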
There's more…

The scripting language in Data Services is a very important tool, extensively used in both simple and complex jobs. In this chapter, we established a good base for building Data Services scripting language skills. You will find a lot more examples throughout this book.
Chapter 4. Dataflow – Extract, Transform, and Load

In this chapter, we will take a look at examples of the most important processing unit in Data Services, the dataflow object, and the most useful types of transformations you can use inside them. We will cover:

Creating a source data object
Creating a target data object
Loading data into a flat file
Loading data from a flat file
Loading data from table to table – lookups and joins
Using the Map_Operation transform
Using the Table_Comparison transform
Exploring the Auto correct load option
Splitting the flow of data with the Case transform
Monitoring and analyzing dataflow execution
Introduction

In this chapter, we move to the most important component of ETL design in Data Services: the dataflow object. The dataflow object is the container that holds all transformations that can be performed on data.

The structure of the dataflow object is simple: one or many source objects are placed on the left-hand side (which we extract the data from), then the source objects are linked to a series of transform objects (which perform manipulation on the data extracted), and finally, the transform objects are linked to one or many target table objects (telling Data Services where the transformed data should be inserted). During the transformation of the dataset inside the dataflow, you can split the dataset into multiple dataset flows or, conversely, merge multiple separately transformed data flows together.

Manipulations performed on data inside dataflows are done on a row-by-row basis. The rows extracted from the source go from left to right through all objects placed inside the dataflow.
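The row-by-row, left-to-right idea can be illustrated with a tiny Python generator pipeline. This is purely conceptual; the row contents and function names are made up:

```python
# A toy sketch of a dataflow: rows stream from a source through a
# transform to a target, one row at a time.
def source():
    # Stand-in for a source table object.
    yield {'first_name': 'anna', 'age': 31}
    yield {'first_name': 'bob', 'age': 42}

def transform(rows):
    # Stand-in for a Query transform applying a mapping expression.
    for row in rows:
        row['first_name'] = row['first_name'].capitalize()
        yield row

# Stand-in for a target table: materialize the transformed stream.
target = list(transform(source()))
print(target)
```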
We will review all major aspects of dataflow design in Data Services, from creating source and target objects to the usage of complex transformations available as part of the Data Services functionality.
Creating a source data object

In a couple of previous recipes, you have already become familiar with data sources, importing tables, and using imported tables inside dataflows as source and target objects. In this recipe, we will create the rest of the datastore objects, linking all our existing databases to a Data Services repository, and will spend more time explaining this process.

How to do it…

In the Understanding the Designer tool recipe in Chapter 2, Configuring the Data Services Environment, we already created our first datastore object, STAGE, for the "Hello World" example.

So, why do you need a datastore object, and what is it exactly? Datastore objects are containers representing the connections to specific databases and storing imported database structures that can be used in your Data Services ETL code. In reality, datastore objects do not store the database objects themselves, but rather the metadata for the objects belonging to the application system or database that the datastore object connects to. These objects most commonly include tables, views, database functions, and stored procedures.

If you have not followed the steps in the "Hello World" example presented in Chapter 2, Configuring the Data Services Environment, you can find here the steps to create all the datastore objects that will be used in the book, explained in better detail. With these steps, we will create datastore objects referencing all the databases we created previously in SQL Server in the first two chapters:
1. Open the Datastores tab in Local Object Library.
2. Right-click on any empty space in the window and choose New from the context menu:
3. First, specify Datastore Type for the datastore object. The datastore type defines the connectivity type and datastore configuration options that will be used by Data Services to communicate with the referenced source/target system objects lying behind this datastore connection. In this book, we will mainly be working with datastores of the Database type. Note that as soon as the datastore type Database is selected, a second Database Type option appears with a list of available databases:
4. The Create New Datastore window, with all options expanded after you choose Datastore Type and Database Type, looks like this screenshot:
5. Leave all advanced options at their default values and configure only the mandatory options in the top window panel: the database connectivity details and user credentials, which will be used by Data Services to access the database and read/insert the data.
6. Using the previous steps, create another datastore named ODS.
7. Altogether, you should have the following list of datastore objects created for all our local test databases. If you do not have all of them, please create the missing ones using the same steps just mentioned:

DS_ODS: This is the datastore linking to the ODS database
DS_STAGE: This is the datastore linking to the STAGE database
DWH: This is the datastore linking to the AdventureWorks_DWH database
OLTP: This is the datastore linking to the AdventureWorks_OLTP database

8. To create a reference to a database table in the datastore OLTP, expand the OLTP datastore in the Local Object Library tab and double-click on the Tables list.
9. The Database Explorer window opens in a workspace Designer section, showing you all the table and view objects in the OLTP database.
10. Find the HumanResources.Employee table in the External Metadata list, right-click on it, and choose the Import option from the context menu:
11. You can see how the table status has changed in Database Explorer to Yes under Imported and No under Changed.
12. Also, you can see the table reference appear in the datastore OLTP table list. As it is not used anywhere in ETL code, the Usage column in the Local Object Library shows 0 for that table.
13. Now, close the Database Explorer window and double-click on the imported table name in the Local Object Library window. The Table Metadata window opens, showing your table attributes and even allowing you to view the contents of the table:

Note

This Table Metadata window is extremely useful for performing a source system analysis when you have to learn the source data to understand it before starting to develop your ETL code and applying transformation rules to it.

14. The View Data tab has three subtabs within it: the Data, Profile, and Column profile tabs. Choose the Column profile tab and select the GENDER column in the drop-down list.
15. Click on the Update button to see the column profile data:

The column profiling data shows that there are 206 male employees (71.03%) against 84 (28.97%) female ones.
How it works…

The most important thing you should understand about datastore objects is that when you import a database object into a datastore, all you do is create a reference to the database object. You are not creating a physical copy of the table in your Data Services datastore when you import a table. Hence, when you use View data in the Table Metadata window for that table, Data Services executes a SELECT query in the background to extract this data for you.

Looking at the browsing external metadata screen again, you can see that there are two other options available in the table context menu: Open and Reconcile:

The Open option allows you to open an external table metadata window, which can display table definition information, partitions, indices, table attributes, and other useful information.

The Reconcile option simply updates the two columns, Imported and Changed, in the External Metadata list. It is useful when you want to check whether the table object has been imported into a datastore already and whether it has changed in the database since the last time it was imported into a datastore.

Note

It is the ETL developer's responsibility to reimport the table objects in the datastore if their definition or structure has been changed at the database level. Data Services does not automatically perform this operation. The most common problem with table object synchronization is when a column populated by ETL gets removed from the table in the database. To reflect this change, the developer has to reimport the table object in the datastore to update the table object structure in Data Services and then update the ETL code to make sure that the non-existing column is not referenced as a target column anymore.

Views as source objects behave exactly like tables. They can be imported in the datastore in the same Tables section along with other table objects. The only difference is that you cannot specify an imported view as a target object in your dataflow.

You may also wonder why, if the datastore object represents the connection to a specific database, you do not see all the database objects straight away after creating it. The answer is simple: you import only those database objects you will be using in your ETL code. If the database has a few hundred tables, it would be extremely time- and resource-intensive for Data Services to automatically synchronize all datastore object references with actual database objects each time you open the Designer application. It is also easier for the developer to be able to see only the tables used in ETL development. Plus, with the datastore configurations feature, you can use the same datastore object to connect to different physical databases that might have different versions of the tables with the same names, so the synchronization of objects imported in the datastore is solely your responsibility and has to be done manually. We will discuss configurations in the future chapters.

The profiling functionality of Data Services that we used in this recipe allows you to look into the data without the need to go to SQL Server Management Studio and manually query the tables. It is easy and convenient to use during ETL development.

There's more…

It is quite difficult to cover all the information about all datastore settings in one chapter, as Data Services is able to connect to so many different databases and application systems. As the datastore options are database-specific, the number of options and their behavior vary depending on which database or system you are trying to connect to.
Creating a target data object

A target data object is the object to which we send the data within a dataflow. There are a few different types of target data objects, but the two main ones are tables and flat files. In this recipe, we will take a look at a target table object.

Note

Views imported into a datastore cannot be target objects within a dataflow. They can only be a source of data.

Getting ready

To prepare for this recipe, we need to create a table in our STAGE database. To do that, please connect to SQL Server Management Studio and create the Person table in the STAGE database using the following command:

CREATE TABLE dbo.Person
(
  FirstName varchar(50),
  LastName varchar(50),
  Age integer
);

This table will be used as a target table, which we will load data into by using Data Services. We will use the data stored in the Person table from the OLTP database as the source data to be loaded.
How to do it…

1. Open the Data Services Designer application.
2. In the DS_STAGE datastore, right-click on Tables and choose the option Import By Name… (another quick method to import a table definition into Data Services without opening Database Explorer). Of course, in order to do that, you should know the exact table name and the schema it was created in.
3. In the opened window, enter the required details, as in the following screenshot:
4. Click on the Import button to finish.
5. Also, in the OLTP datastore, import a new table, Person, from the Person schema inside the AdventureWorks_OLTP database. We will use this table as a source of data.

Note

In the example of using SQL Server as an underlying database, the owner is synonymous with the database schema. When importing a table by name in the datastore or creating template tables, the Owner field defines the schema where the table will be imported from or created in the database. So, keep in mind that you have to use an existing schema created previously.

6. Create a new job with a new dataflow object, open the dataflow, and drag the Person table from the OLTP datastore into this dataflow as a source. Then, drag the Person table from the DS_STAGE datastore as a target.
7. Create a new Query transform between them and link it to both the source and target tables:
8. As soon as you open the Query transform, you will see that both input and output structures were created for you. All column names and data types were imported from the source and target objects you linked the Query transform to, and all you have to do is map the column values from the source to the columns in the target you want to pass them to.
9. Map the source FIRSTNAME column to the target FIRSTNAME column and perform the same mapping for LASTNAME. As there is no AGE column in the source, put NULL as the value for the mapping expression for the AGE target column in the Query transform. This can be done by dragging and dropping from the input to the output schema or by typing the mapping manually:
10. Each target object within a dataflow has a set of options that is available in the Target Table Editor window. To open it, double-click on a target table object in the dataflow workspace:
11. For now, let's just select the Delete data from table before loading checkbox. This option makes sure that each time the dataflow runs, all target table records are deleted by Data Services before populating the target table with data from a source object.
12. Validate the dataflow by clicking on the Validate Current button when the dataflow is opened in the main workspace to make sure that you have not made any design errors.
13. Now execute the job and click on the View Data button in the bottom-right corner of the target table icon within the dataflow to see the data loaded into the target table.
How it works…

You can see that the target table object has a lot of options. Data Services can perform different types of loading of the same dataset, and all those types are configured in the target table object tabs. Some of them are used if the inserted dataset is voluminous, while some of them allow you to insert data without duplicating it. We will discuss all of this in detail in later chapters.

When Data Services selects data from source tables, all it does is execute a SELECT statement in the background. But when Data Services inserts the data, there are risks such as incompatible data types/values, duplicate data (which violates referential integrity in the target table), slow performance, and so on. Do not forget that you insert data after transforming it, so it is your responsibility to understand the target database object requirements and the specifics of the data you are inserting.

That is why the loading mechanism in Data Services has many more settings to configure and is much more flexible than the mechanism for getting source data inside a dataflow.
There's more…

As you might remember from the "Hello World" example in Chapter 2, Configuring the Data Services Environment, there is a great and simple way to create target table objects in a dataflow without the necessity of creating a physical table in the database first and importing it into the DS datastore. We used this type of target table before: I am talking about template tables, the objects that we used in the previous recipes when we wanted Data Services to create a physical target table for us from the mappings we defined in a Query transform inside our ETL code in a dataflow.

Note

Note that the template target table has an extra target table option: Drop and re-create table. By default, it is ticked, and the table gets physically dropped and recreated each time the dataflow runs. Data Services generates a table definition from the output schema of the last transform object in the dataflow linked to the target table object.

As you can see in the following figure, you can specify multiple target tables. They get populated with the same dataset coming from the source table, and as they get populated from the same output schema of the Query transform, they have the same table definition format:

To create a template table, you use the right-hand side tool menu in the Designer and the template table icon shown in the following screenshot:

Click on the template table button in the tool menu and then on the empty space in the dataflow workspace to place it as a target table object.

Specify the template table name, the Data Services datastore where it should be created, and the database owner (schema) name where the table gets created physically when the dataflow is executed:
Loading data into a flat file
This recipe will teach you how to export information from a table into a flat file using Data Services.
Flat files are a popular choice when you need to store data externally for backup purposes, transfer and feed it into another system, or even send it to another company.
The simplest file format usually describes a list of columns in a specific order and a delimiter used to separate field values. In most cases, that is all you need. As you will see a bit later, Data Services has many extra configuration options available in the File Format object, allowing you to load the contents of flat files into a database or export the data from a database table to a delimited text file.
How to do it…
1. Create a new dataflow and use the EMPLOYEE table from the OLTP datastore imported earlier as a source object.
2. Link the source table with a Query transform and drag-and-drop all source columns to the output schema for mapping configuration.
3. In the Query transform, right-click on the parent Query item, which includes all output mapping columns, and choose the Create File Format… option at the bottom of the opened context menu:
4. The main File Format Editor window opens:
5. Refer to the following table for more details about the File Format options and their corresponding values:
File Format options (each entry gives the option, its description, and the value to set):
- Type: Specifies the type of the file format: Delimited, Fixed Width, Unstructured Text, and so on. In this recipe, we are creating a plain text file with row fields separated by a comma. Choose the Delimited option.
- Name: The name of the File Format. Note that this is not the name of the file that will be created but the general name of the File Format. Type F_EMPLOYEE.
- Location: This is the physical location of the file referenced using this file format. In our case, the Job Server and Local locations are the same, as Data Services is installed on the same machine where we executed our Designer application. Choose Job Server.
- Root directory: The directory path to the file. Make sure that this directory exists. Type C:\AW\Files.
- File name(s): The name of the file that we read data from or write into. Type HR_Employee.csv.
- Delimiters | Column: You can either choose from the existing options (Tab, Semicolon, Comma, Space) or type in your own custom delimiter as one character or a sequence of characters. Choose Comma.
- Delimiters | Text: You can specify whether you want character values to be wrapped in quotes/double quotes or not. Choose ".
- Skip row header: When you read from the file, use this option to skip the row header so it is not confused with the first data record. We do not have to change this option, as it would have no effect: we are going to write to a flat file, not read from it.
- Write row header: The same option as the previous one, but for cases when you write into a file. If set to Yes, the row header will be created as the first line in the file. If No, the first line in the file will be a data record. Choose Yes to create a row header when writing to a file.
6. Click on the Save & Close button to close the File Format Editor and save the new File Format.
7. Now you can open the Local Object Library | Formats tab and see your newly created file format F_EMPLOYEE.
8. Open the dataflow workspace, drag-and-drop this file format from the Local Object Library tab into the dataflow, and choose the Make Target… option.
9. Link your Query transform to the target file object and validate your dataflow to make sure that there are no errors.
10. Run the job. You will see that the file HR_Employee.csv appears in C:\AW\Files and gets populated with 292 records (1 header record + 291 data records).
How it works…
File format configuration provides you with a flexible solution for reading data from and loading data into flat files. You can even set up automatic date recognition and configure an error handling mechanism to reject rows that do not fit into a defined file format structure.
Note that editing the file format from the Local Object Library and editing it directly from the dataflow where it was placed to read or write flat files is not the same. If you edit it inside the dataflow, you will notice that some fields in the File Format Editor are grayed out. Opening the same file format for editing from the Local Object Library makes those fields available for editing. This happens because, when imported into a dataflow, the File Format object becomes an instance of the parent File Format object stored in the Local Object Library, and changes applied to an instance inside a dataflow are not propagated to other instances of this File Format object imported into other dataflows. Conversely, when you modify the File Format definition in the Local Object Library, the changes are propagated to all instances of this File Format object imported into different dataflows across your ETL code.
Note
Some file format configuration parameters can be changed only on the parent file format object in the Local Object Library.
You should also keep in mind that export to a flat file in Data Services is quite a forgiving process. For example, if your file format has a varchar(2) character field and you are trying to export a line of 50 characters to a file in this field, Data Services will allow you to do that. In fact, Data Services does not care much about the columns specified in the file format at all if you use your file format to export data to a flat file. The data definition will be sourced from the output schema of the preceding transformation object linked to the target file object.
Importing from a flat file, on the other hand, is a very strict process. Data Services will reject a record immediately if it does not fit the file format definition.
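The contrast between the forgiving export and the strict import can be illustrated with a short sketch. This is a simplified Python model of the reader's behavior, not the actual Data Services engine; the format definition and the reject list are illustrative assumptions.

```python
# Minimal sketch of strict flat-file import: rows that do not match the
# declared format (column count, data types) are rejected rather than loaded.
from datetime import datetime

# Hypothetical format definition: (column name, converter)
FORMAT = [
    ("NAME", str),
    ("DOB", lambda s: datetime.strptime(s, "%d.%m.%Y").date()),
    ("HEIGHT", int),
    ("HOBBY", str),
]

def read_delimited(lines, delimiter="|"):
    loaded, rejected = [], []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(FORMAT):
            # wrong number of columns -> whole record rejected
            rejected.append((line, "column count mismatch"))
            continue
        try:
            row = {name: conv(value) for (name, conv), value in zip(FORMAT, fields)}
        except ValueError as exc:
            # conversion failure (bad date, non-numeric height) -> rejected
            rejected.append((line, str(exc)))
            continue
        loaded.append(row)
    return loaded, rejected

loaded, rejected = read_delimited([
    "JANE|12.05.1985|176|HIKING",
    "STEVE|01.09.1976|152|SLEEPING|10",   # extra column
    "DAVE|27.12.1983|AB5",                # missing column
])
```

Running this leaves only JANE's row in the loaded set; the other two land in the reject list with a reason, much like the reject file discussed in the next recipe.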
There's more…
There are more ways to create a File Format object than shown in this recipe. Some are listed here:
- Creating it in the Local Object Library: Open the Formats tab in the Local Object Library window, right-click on Flat Files, and choose New from the context menu. You can use the Location, Root directory, and File Name(s) options to automatically import the format from an existing external file. Otherwise, you will have to define all columns and their data types manually, one by one.
- Replicating an existing File Format object in the Local Object Library: On the Formats tab, choose the object you want to replicate, right-click on it, and choose the Replicate… option in the context menu.
Loading data from a flat file
You can use the same File Format object created in the previous recipe to load data from a flat file. In the following section, we will take a closer look at the file format options relevant to loading data from files.
How to do it…
1. Create a new job and a new dataflow object in it.
2. Create a new text file, Friends_30052015.txt, with the following lines inside it:
NAME|DOB|HEIGHT|HOBBY
JANE|12.05.1985|176|HIKING
JOHN|07-08-1982|182|FOOTBALL
STEVE|01.09.1976|152|SLEEPING|10
DAVE|27.12.1983|AB5
3. Go to the Local Object Library and create a new file format by right-clicking on Flat Files and choosing New.
4. Populate the File Format options as shown in the following screenshot:
- Delimiters | Column is set to | in this case, as our file has the pipe as a delimiter.
- NULL Indicator is set to NULL, which means that only NULL values in the incoming file are interpreted as NULL when read by Data Services. Other "empty" values will be interpreted as empty strings.
- Date format is set to dd.mm.yyyy, as we specified in the file format that we are loading the DOB (Date of Birth) column as the date data type. Imagine that you have configured the Query transform mapping for that column using the to_date(<date>, 'dd.mm.yyyy') function.
- Skip row header is set to Yes in order to specify that the file has a header row which has to be skipped.
- Then we set all options related to error capturing to Yes to catch all possible errors. Write errors to a file allows you to record the rejected records in a separate file for further analysis. We will be writing them to the Friends_rejected.txt file.
5. Import this file format object as a data source in your newly created dataflow.
6. Map all source columns to a Query transform and create the target template table FRIENDS in the DS_STAGE datastore.
7. Save and run the job.
As a result, you can see that two records were rejected: one because of an extra column in the row, and the other because the row had one column less than the defined file format.
The contents of your target table should look like this:
You can see that some lines are missing here due to the errors in the input file.
What's interesting is that Data Services was smart enough to correctly recognize and convert the date of birth for JOHN. Remember, it was 07-08-1982 in the file, and the date format we specified was dd.mm.yyyy.
How it works…
As you can see, most of the file format options we have used are useful for validating the contents of the source data file in order to reject records with data of an incorrect data type or format.
The main question you have to ask yourself is whether you want all these records to be rejected. The alternative might be to build a dataflow that loads all records as the varchar data type and tries to cleanse and convert incorrect values to an acceptable format, or puts a default value instead of a wrong one to mark the field. Sometimes you do not want to lose the whole record if just one value is incorrect.
Now, let's fix the "number of columns" problem in the source file to see how Data Services deals with conversion problems. Do you remember we put a character symbol in one of the integer data type fields?
Change the records for Steve and Dave to the following lines and rerun the job:
STEVE|1976.01.01|152|SLEEPING
DAVE|27.12.1983|AB5|DREAMING
Both records are rejected, with the following error messages appearing in the error log file when you execute the job:
You can see those messages in the job error log and in the Friends_rejected.txt file along with the rejected records themselves. The name and location of the reject file are defined by two file format options: Error file root directory and Error file name. They become available when you open the file format object instance for editing from within a dataflow:
As we just stated, in order to load those records, you should put in some extra development effort and create logic in your dataflow to deal with all possible scenarios, cleansing and correctly converting the data. Of course, you should also amend the file format, changing all data types to varchar in order to pass those records through for further cleansing.
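The "load everything as varchar, then cleanse" approach can be sketched as follows. This is an illustrative Python model only; the default marker values and helper names are assumptions, not part of the recipe.

```python
# Sketch of loading bad values defensively: instead of rejecting the whole
# record, substitute an agreed default for any value that fails conversion.
from datetime import datetime, date

DEFAULT_DATE = date(1900, 1, 1)   # assumed marker values for illustration
DEFAULT_INT = -1

def to_date_or_default(value, fmt="%d.%m.%Y"):
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return DEFAULT_DATE

def to_int_or_default(value):
    return int(value) if value.isdigit() else DEFAULT_INT

def cleanse(row):
    # the row arrives as varchar fields; convert each one defensively
    name, dob, height, hobby = row
    return (name, to_date_or_default(dob), to_int_or_default(height), hobby)

# DAVE's bad height 'AB5' is replaced by the marker instead of losing the row
cleansed = cleanse(("DAVE", "27.12.1983", "AB5", "DREAMING"))
```

The record survives with a recognizable marker in the bad field, so it can be reported on or fixed later instead of disappearing into a reject file.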
Note
You can use masks in the File name(s) option when configuring the file object in your dataflow. For example, specifying invoice_*.csv as a file name will allow you to load both the invoice_number_1.csv and invoice_number_2.csv files in a single execution of the dataflow. They will be loaded one after another.
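The wildcard matching behind such a mask can be sketched with Python's fnmatch module; the file list here is invented for illustration.

```python
# Sketch of how a file mask like invoice_*.csv resolves to several files
# that are then processed one after another.
import fnmatch

available = ["invoice_number_1.csv", "invoice_number_2.csv", "orders_1.csv"]

def resolve_mask(mask, filenames):
    # files matching the mask, in a stable order for sequential loading
    return sorted(f for f in filenames if fnmatch.fnmatch(f, mask))

matched = resolve_mask("invoice_*.csv", available)
```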
There's more…
Try to experiment further with the contents of the Friends_30052015.txt file by adding extra rows with different data types to see whether they will be rejected or loaded, and which error messages you will get from Data Services.
Loading data from table to table – lookups and joins
When you specify a relational source table in a dataflow, Data Services executes simple SQL SELECT statements in the background to fetch the data. If you want to, you can see the list of statements executed for each source table. In this recipe, we explore what happens under the hood when you add multiple source tables, and how Data Services optimizes the extraction of data from these source tables and even joins them together, executing complex SQL queries instead of multiple SELECT * FROM <table> statements.
How to do it…
In this recipe, we will extract a person's name, address, and phone number from the source OLTP database and populate a new stage table PERSON_DETAILS with this dataset.
1. Create a new job and a new dataflow. Specify your own names for the created objects.
2. To extract the required data, you will need to import the tables PERSON, ADDRESS, and BUSINESSENTITYADDRESS (which is a table linking the first two) into your source OLTP datastore. All these tables are located in the Person schema of the AdventureWorks_OLTP database.
3. Place the imported tables as source objects in your dataflow, as shown in the following figure, and link them with the Query transform. Insert the target template table PERSON_DETAILS to be created in the DS_STAGE datastore:
4. To set the required join conditions, you should use the Join pairs section located on the FROM tab of the Query transform. In this example, these join conditions should be generated automatically as soon as you open the Query transform. If they weren't, you can click on the icon with two intersecting green circles with the hint Click to propose join to generate them, or click on the Join Condition field and type the required join conditions manually for each table pair. Please use the following screenshot as a reference to create two Inner join pairs (PERSON-BUSINESSENTITYADDRESS and BUSINESSENTITYADDRESS-ADDRESS):
5. At this point, you are able to see which SQL statement DS uses to extract the required information by choosing Validation | Display Optimized SQL from the main menu. It opens the following window, showing you the number of datastores queried in the pane on the left and the full SELECT statement executed in each of them on the right:
6. We forgot to add country information for each person. It looks like the Address table has only street information and city but no country or state data. Import another two tables into the OLTP datastore: STATEPROVINCE and COUNTRYREGION.
7. Add them as source tables in the dataflow, but do not join them to the already existing ones in the same Query transform. Create another Query transform and call it Get_Country. Use it to join the Query dataset with the two new source tables, as shown in the following figure:
8. Add two new column mappings in the Get_Country Query transform: BUSINESSENTITYID, mapped from the field with the same name from the Query input schema, and the COUNTRY column, mapped from the NAME column of the COUNTRYREGION table input schema.
9. If you check Validation | Display Optimized SQL again, you will see that the SQL statement has changed, now including the two new tables:
10. We still have missing phone information for our PERSON_DETAILS table. Add a third Query transform on the right and call it Lookup_Phone. To look for the phone information, we will use the lookup_ext() function executed from a function call within a Query transform. The lookup_ext() function is most commonly used in column mappings to perform the lookup operation for values from other tables.
11. Open the Lookup_Phone Query transform and map all source columns to the target ones except for the BUSINESSENTITYID and ADDRESSLINE2 columns (we are not going to propagate those).
12. Right-click on the last mapped column in the target schema (it should be COUNTRY) and select the option New Function Call… from the context menu:
13. Choose Insert Below…, and in the opened Select Function window, choose Lookup Functions | lookup_ext.
14. The opened Lookup_ext | Select Parameters window allows you to set lookup parameters for the table you want to extract information from. Remember that this is basically a form of a join, so you have to specify the join conditions of the input dataset to the lookup table. In our case, the lookup table is PERSONPHONE. If you did not import it earlier into your OLTP datastore, please do that now. Use the lookup parameter details shown in the following screenshot:
15. After you click on Finish, your target schema in the Lookup_Phone transform should look like this:
16. It so happens that the PHONENUMBER field we have extracted from the lookup table is a key column in that table. Data Services automatically defines key columns from source tables in the Query transform as primary keys as well. To change this and make sure that our final dataset does not include duplicates, we are going to create a last Query transform and name it Distinct. Link it on the right to the Lookup_Phone transform, open it, and choose the following options:
- To change the PHONENUMBER column from being a primary key, double-click on the column in the target schema and uncheck the Primary key option.
- To get rid of the duplicate rows, open the SELECT tab and check Distinct rows.
17. Save and run the job and view the data using the dataflow target table option:
As the final step, import the template table PERSON_DETAILS so it is converted into a normal table object inside the DS_STAGE datastore. To do that, right-click on the table either in the Local Object Library or inside the dataflow workspace, as shown in the following screenshot, and choose the Import Table option from the object's context menu:
How it works…
You have seen an example of how multiple tables can be joined in Data Services. The Query transform represents the traditional SQL SELECT statement, with the ability to group the incoming dataset, use various join conditions (INNER, LEFT, or OUTER), use the DISTINCT operator, sort data (on the ORDER BY tab), and apply filtering conditions on the WHERE tab.
The Data Services optimizer tries to build as few SQL statements as possible in order to extract the source data, joining tables in a complex SELECT statement. In future chapters, we will see which factors prevent the propagation of dataflow logic to the database level.
We have also tried to use a function call in the mappings in order to join a table to extract additional data. It would be perfectly valid to import the PERSONPHONE table as a source table and join it with the rest of the tables with the help of the Query transform, but using the lookup_ext() function gives you a great advantage: it always returns only one record from the lookup table for each record we look up values for. Joining with a Query transform, on the other hand, does not prevent you from getting duplicated or multiple records, in the same way as if you had joined two tables in a standard SQL query. Of course, if you want your Query transform to behave exactly like a SELECT statement joining tables in the database, producing multiple output records for each lookup record, the lookup_ext() function should not be used.
If you have ever written a complex SQL SELECT statement, you are probably aware that joining multiple tables can lead to duplicate records in the result dataset. This does not necessarily mean that the joins are incorrectly specified. Sometimes it is the required behavior, or it can be a database design problem or simply the presence of "dirty" data in one of the source tables.
The lookup_ext() function makes sure that, if it finds multiple records in the lookup table for your source record, it picks only one value according to the method specified in the Return policy field of the Lookup_ext parameters window:
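The difference between a join and a lookup can be sketched in a few lines of Python. This is a simplified model of the idea, not the lookup_ext() implementation; the 'MIN'/'MAX' policy names merely mirror the spirit of the Return policy field, and the sample data is invented.

```python
# Sketch of lookup semantics: unlike a SQL join, a lookup returns exactly one
# value per input row even when several lookup rows match, picked by a policy.

def lookup(source_rows, lookup_rows, key, value_col, policy="MAX", default=None):
    result = []
    for row in source_rows:
        matches = [r[value_col] for r in lookup_rows if r[key] == row[key]]
        if not matches:
            picked = default          # no match: default value, row is kept
        elif policy == "MAX":
            picked = max(matches)
        else:
            picked = min(matches)
        result.append({**row, value_col: picked})
    return result

people = [{"BUSINESSENTITYID": 1, "FIRSTNAME": "Ken"}]
phones = [
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "111-555-0100"},
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "222-555-0100"},  # duplicate match
]
rows = lookup(people, phones, "BUSINESSENTITYID", "PHONENUMBER")
```

Despite two matching phone records, the output still has exactly one row per input person; a plain join would have produced two.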
Note
The main disadvantages of using the lookup_ext() function are the low transparency of the ETL code (as it is hidden inside the Query transform) and the fact that lookup_ext() function calls prevent the propagation of execution logic to the database level. Data Services always extracts the full table specified as the lookup table in the lookup_ext() function parameters.
Depending on which version of the product is used and on the database environment configuration, Data Services can automatically generate all join conditions when you join tables in the Query transform and specify join pairs. This is because, when you import the source tables into a datastore, Data Services imports not just table definitions but also information about primary keys, indexes, and other table metadata. So, if Data Services sees that you are joining two tables with identically named fields which are marked as primary or foreign keys on the database level, it automatically assumes that those tables can be joined using those key fields.
Keep in mind that business rules or ETL logic may dictate join conditions different from what Data Services automatically proposes, and you may have to modify those values in the Query transform logic or even write your own join conditions by entering them manually.
Using the Map_Operation transform
Here we explore a very interesting transform available to you in Data Services. In fact, it does not perform any transformation of data per se. What it does is change the type of the SQL Data Manipulation Language (DML) operation that should be applied to the row when it reaches the target object. As you probably know already, the DML operations in the SQL language are the operations which modify the data, in other words, the INSERT, UPDATE, and DELETE statements.
First we will see the effect Map_Operation has when used in a dataflow, and then we will explain in detail how it works. In a few words, the Map_Operation transform allows you to choose what Data Services will do with the migrated row when passing it from Map_Operation to the next transform. Map_Operation assigns one of four statuses to each record passing through: normal, insert, update, or delete. By default, the majority of transforms in Data Services produce records with a normal status. This means that the record will be inserted when it reaches the target table object in a dataflow. With Map_Operation, you can control this behavior.
How to do it…
In this exercise, we are going to slightly change the contents of our PERSON_DETAILS table. We will change the country values for records belonging to Samantha Smith from United States to USA and remove the records for the same person with United Kingdom as the country. That means we will specify the same table both as a source and as a target:
1. Create a new job and a new dataflow object and place the PERSON_DETAILS table from the DS_STAGE datastore as a source table.
2. Join the source table to a new Query transform named Get_Samantha_Smith. Map all columns from source to target and specify filtering conditions, as shown in the following screenshot. Also, double-click on each of the three columns, FIRSTNAME, LASTNAME, and ADDRESSLINE1, to define them as primary key columns:
3. Split the dataflow in two by creating two new Query transforms: US and UK. Link them to two Map_Operation transforms imported from Local Object Library | Transforms | Platform | Map_Operation, named update and delete respectively. Then merge the dataflows together with the Merge transform, which can be found in the same Platform category, and finally link it to the same table PERSON_DETAILS specified as a target table object. The Merge transform does not perform any transformations and does not have any configuration options, as it simply merges two datasets together (like the UNION ALL operation in SQL). Of course, the input schema formats should be identical for the Merge transform to work. See what the dataflow should look like in the following figure:
4. In the US transform, map all key columns and the COUNTRY column to the target and change the mapping for COUNTRY to the hardcoded value 'USA'. Most importantly, specify Get_Samantha_Smith.COUNTRY = 'United States' in the WHERE tab to select only United States records:
5. In the UK transform, map only the key columns and the COUNTRY column to the target as well, and put Get_Samantha_Smith.COUNTRY = 'United Kingdom' in the WHERE tab:
6. Now we have to tell Data Services that we want to update one set of records and delete the other. Double-click on your update Map_Operation transform and set up the following options:
By doing this, we change row types for normal rows (the Query transform produces rows of normal type) to update. This means that Data Services will execute an UPDATE statement for those rows on the target table.
7. Repeat the same for the delete Map_Operation transform, but now change normal to delete and discard the rest of the row types:
8. For Data Services to correctly perform the update and delete operations, we have to define the correct target table key columns. Double-click on the target table object PERSON_DETAILS in the dataflow and change Use input keys to Yes in the Options tab. That tells Data Services to consider primary key information from the source dataset rather than using the target table primary keys:
9. Before executing the job, let's check what our data looks like in the PERSON_DETAILS table for Samantha Smith. Click on the View data button in the target table and apply filters by clicking on the Filters button. Specify filters on the FIRSTNAME and LASTNAME columns and check the records:
10. Set the filters:
11. This is what the data in the table looks like before job execution:
12. Run the job and view the data using the same filters to see the result:
How it works…
This is the kind of task that would be much easier to accomplish with the following two SQL statements:

update dbo.person_details set country = 'USA'
where firstname = 'Samantha' and lastname = 'Smith' and country = 'United States';

delete from dbo.person_details
where firstname = 'Samantha' and lastname = 'Smith' and country = 'United Kingdom';

But for us, this example perfectly illustrates what can be done with the use of the Map_Operation transform in Data Services.
Each row passed from the source to a target table in a dataflow through various transformation objects can be assigned one of four types: normal, insert, update, or delete.
Some transforms can change the type of the row, while others just behave differently depending on which type the incoming row has. For the target table object, the type of the row defines which DML instruction it has to execute on the target table using the source row data. This is listed as follows:
- insert: If the row comes with the normal or insert type, Data Services executes an INSERT statement in order to insert the source row into the target table. It checks the key columns defined on the target table in order to check for duplicates and prevent them from being inserted.
- update: If a row is marked as an update, Data Services determines the key columns it will use to find the corresponding record in the target table and updates all non-key column values of the target table record with the values from the source record.
- delete: Data Services determines the key columns to link source rows marked with the delete type with the corresponding target row(s) and then deletes the rows found in the target table.
- normal: This is treated as an insert when the row comes to a final target table object. It is the default type of row produced by the Query transform and the majority of other transforms in Data Services.
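The dispatch from row type to DML statement described above can be sketched in Python. This is an illustrative model only: the table and column names are hypothetical, and real loaders use parameterized statements rather than string building.

```python
# Sketch of how a target table object could turn a row's type (set upstream,
# for example by Map_Operation) into the corresponding DML statement.

def row_to_dml(row_type, table, key_cols, row):
    keys = " and ".join(f"{c} = '{row[c]}'" for c in key_cols)
    if row_type in ("normal", "insert"):
        # normal is treated as an insert at the target table
        cols = ", ".join(row)
        vals = ", ".join(f"'{v}'" for v in row.values())
        return f"insert into {table} ({cols}) values ({vals})"
    if row_type == "update":
        # non-key columns are updated, key columns locate the target record
        sets = ", ".join(f"{c} = '{row[c]}'" for c in row if c not in key_cols)
        return f"update {table} set {sets} where {keys}"
    if row_type == "delete":
        # only key columns matter for finding the row to delete
        return f"delete from {table} where {keys}"
    raise ValueError(row_type)

stmt = row_to_dml("update", "person_details", ["firstname", "lastname"],
                  {"firstname": "Samantha", "lastname": "Smith", "country": "USA"})
```

For the Samantha Smith row above, the sketch produces an UPDATE keyed on the name columns that sets only the non-key COUNTRY value, mirroring the behavior described for the update row type.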
What the Map_Operation transform allows you to do is change the type of the incoming row. This allows you to implement sophisticated logic in your dataflows, making your data transformation extremely flexible.
Note
Defining primary keys in Data Services objects, such as Query transforms and the table and view objects imported into datastores, does not create the same primary key constraints for the corresponding tables on the database level. If you have them defined on the database level, they will be imported along with the table definition and will appear in Data Services automatically. Otherwise, you define primary key columns manually to help Data Services efficiently and correctly process the data. Many Data Services transforms and target objects rely on this information to correctly process the passing records.
Setting Output row type to Discard in Map_Operation for a specific input row type will completely block the rows of the chosen type, not letting them pass through the Map_Operation transform. This is a great way to make sure that your dataflow does not perform any unexpected inserts when it should, for example, always only update the target table.
Note how our target table in this recipe does not have primary key constraints specified at the database level. It so happens that we analyzed the data in the PERSON_DETAILS table and know that the FIRSTNAME, LASTNAME, and ADDRESSLINE1 columns define the uniqueness of a record. That is why we manually specify them as primary keys in Data Services transforms and use the Update control option Use input keys on the target table object, so it knows where to get information regarding key columns to perform the correct execution of the INSERT, UPDATE, and DELETE statements. In the case of UPDATE, all non-key columns will be updated with the values from the source row. That is why we propagated only the COUNTRY column, as we wanted to update only this field. In the case of DELETE, the set of non-key columns does not matter much, as only the source key columns will be considered in order to find the target row to delete.
The other option would be to modify the table object PERSON_DETAILS in the datastore and specify primary keys there (see the following screenshot). In that case, we would not have to define keys in the transforms or use the target table loading option, as Data Services would pick up this information from the target table object. To do that, expand the datastore object and double-click on the table to open the table editor, then double-click on the column and check Primary key in the newly opened window:
Using the Table_Comparison transform
The Table_Comparison transform compares a dataset generated inside a dataflow to a target table dataset and changes the statuses of dataset rows to different types according to the conditions specified in the Table_Comparison transform.
Data Services uses primary key values for the row comparison and marks each passing row accordingly as: an insert row, which does not exist in the target table yet; an update row, a row for which primary key values exist in the target table but whose non-primary-key fields (or comparison fields) have different values; and finally, a delete row, when the target dataset has rows with primary key values that do not exist in the source dataset generated inside the dataflow. In some ways, Table_Comparison does exactly the same thing as Map_Operation: it changes the row type of passing rows from normal to insert, update, or delete. The difference is that it does it in a smart way: after comparing the dataset to the target table.
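The classification just described can be sketched as a small comparison routine. This is a simplified Python model of the idea, not the transform itself; the sample rows loosely echo this recipe's DimProduct data and are invented for illustration.

```python
# Sketch of table-comparison logic: classify source rows against a comparison
# (target) table keyed by primary key as inserts or updates, and optionally
# detect deletes for target keys missing from the source.

def table_comparison(source, target, key, compare_cols, detect_deletes=True):
    tgt = {r[key]: r for r in target}
    out = []
    for row in source:
        existing = tgt.get(row[key])
        if existing is None:
            out.append(("insert", row))          # key not in target yet
        elif any(row[c] != existing[c] for c in compare_cols):
            out.append(("update", row))          # key exists, compared value differs
        # identical rows produce no output row
    if detect_deletes:
        src_keys = {r[key] for r in source}
        out += [("delete", r) for r in target if r[key] not in src_keys]
    return out

source = [{"ProductKey": 210, "EnglishDescription": "Enhanced Chromoly steel."}]
target = [{"ProductKey": 210, "EnglishDescription": "Chromoly steel."},
          {"ProductKey": 1, "EnglishDescription": "Mountain frame"}]
ops = table_comparison(source, target, "ProductKey", ["EnglishDescription"])
```

Here the changed description yields an update row, the target-only key yields a delete row, and nothing is marked as an insert, matching what the recipe's temporary debug tables will show.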
Getting ready
In order to prepare the source data in the OLTP system for this recipe, please execute the following UPDATE statement in the AdventureWorks_OLTP database. It updates only one row in the table:

update Production.ProductDescription set Description = 'Enhanced Chromoly steel.'
where Description = 'Chromoly steel.';

We performed this modification of the source data so we can use this change to demonstrate the capabilities of the Table_Comparison transform.
How to do it…
Our goal in this recipe is simple. You remember that our DWH database sources data from the OLTP database. One of the tables in the target DWH database we are interested in right now is the DimProduct table, which is a dimension table that holds information about all company products. In this recipe, we are going to build a job which, when executed, will check the product descriptions within the source OLTP tables and, if necessary, will apply any changes to the product description in our data warehouse table DimProduct.
This is a small example of propagating data changes happening in the source systems to the data warehouse tables.
As an example, imagine that we need to change the name of one of the materials used to produce one of our products. Instead of the English description "Chromoly steel", we have to use "Enhanced Chromoly steel" now. People working with the OLTP database via application systems have already made the required change, and now it is our responsibility to develop the ETL code that propagates this change from the source to the target data warehouse tables.
1. Create a new job with one dataflow, sourcing data from the following OLTP tables (Production schema):
- Product: This is a table containing products with some information (price, color, and so on)
- ProductDescription: This is a table containing product descriptions
- ProductModelProductDescriptionCulture: This is a linking table, which holds the key references of both the Product and ProductDescription tables
2. If you do not have these tables imported already into your datastore, please do that in order to be able to reference them within your dataflow object.
3. Add the DimProduct table from DWH as a source table. Yes, do not be surprised, we are going to use the same table as a source and as a target within the same dataflow. The Table_Comparison transform will compare two datasets: the source dataset, which is based on the DimProduct table modified with the help of the source OLTP tables, and the target dataset of the DimProduct table itself.
4. Create a new Join Query transform and modify its properties to join all four tables, as shown in the following screenshot:
You can see that we use the Product and ProductModelProductDescriptionCulture tables just to link the ProductDescription table to our target DimProduct table, in order to get a dataset of DimProduct primary key values and the corresponding English description values for specific products.
5. Next to your Join Query transform, place the Table_Comparison transform, which can be found in Local Object Library | Transforms | Data Integrator | Table_Comparison.
6. Open the Table_Comparison editor in the workspace and specify the following parameters:
7. Then, place a Map_Operation transform called MO_Update and discard all rows of the normal, insert, and delete types, letting through only rows with the update status:
8. Finally, link MO_Update to the target DimProduct table and check whether your dataflow looks like the following figure:
Now, save the job and execute it. Then, run the following command in SQL Server Management Studio to check the result data in the DimProduct table:

select EnglishDescription from dbo.DimProduct where EnglishDescription like '%Chromoly steel%';

You should get the following resulting value: Enhanced Chromoly steel
How it works…
To see what exactly is happening with the dataset before and after the Table_Comparison transform, replicate your dataflow and change the copy in the following manner:
Here we dump the result of the Join Query transform into a temporary table to see which dataset we compare to the DimProduct table inside the Table_Comparison transform.
Extra Map_Operation transforms allow us to capture rows of different types coming out of Table_Comparison. Using Map_Operation, we convert all of them to the normal type in order to insert them into temporary tables to see which rows were assigned which row types by the Table_Comparison transform:
Note
Adding multiple target template tables after your transformations is a very popular method of debugging in ETL development. It allows you to see exactly how your dataset looks after each transformation.
Let's see what is going on in our ETL by analyzing the data inserted into the temporary target tables.
The PRODUCT_TEST_COMPARE table contains rows starting from ProductKey = 210. This is simply because ProductKeys < 210 in the DimProduct table do not have English descriptions in the source system.
The PRODUCT_DESC_INSERT table is empty. Table_Comparison uses the primary key specified in the Input primary key columns section to identify new rows in the input dataset that do not exist in the specified comparison table, DWH.DBO.DIMPRODUCT. As we used the DimProduct table as the source of the PRODUCTKEY values, there couldn't be any new values, of course. So no rows were assigned the insert type.
PRODUCT_DESC_UPDATE contains exactly one row with a new ENGLISHDESCRIPTION value:
As you can see, Data Services has sourced the rest of the row fields from the comparison table: all of them except for the column specified in the Compare columns section of the Table_Comparison transform.
The PRODUCT_DESC_DELETE table, on the other hand, has a lot of records. Those are the target records (from the comparison table DimProduct) for which primary key values do not exist in the dataset coming to the Table_Comparison transform from the Join Query transform. As you may remember, those are records that do not have English description records in the source tables. This is an optional feature of Table_Comparison: Data Services will use the primary key values of those records to execute the DELETE statement on the target table. You can easily prevent delete rows from being generated by unchecking the Detect deleted row(s) from comparison table option in the Table_Comparison transform.
Note
The Filter section of Table_Comparison allows you to apply additional filters to the comparison table in order to restrict the number of rows you are comparing. This is very useful if your comparison table is large. It allows you to optimize the resources consumed by Data Services to extract and store the comparison dataset, and it also speeds up the comparison process itself.
Exploring the Auto correct load option
The Auto correct load option is a convenient means Data Services provides for preventing the insertion of duplicates into your target table. It is a method of inserting data into a target table object inside the dataflow. It can easily be configured by setting the target table option to Yes, with no further configuration required. This recipe describes the details of using this load method.
Getting ready
For this recipe, we will create a new table in the STAGE database and populate it with a list of currencies from the DimCurrency dimension table in the AdventureWorks_DWH data warehouse.
Execute the following statements in SQL Server Management Studio:

SELECT CurrencyAlternateKey, CurrencyName
INTO STAGE.dbo.NewCurrency
FROM AdventureWorks_DWH.dbo.DimCurrency;

ALTER TABLE STAGE.dbo.NewCurrency
ADD PRIMARY KEY (CurrencyAlternateKey);

We will use the Auto correct load option to make sure that our dataflow does not insert rows already existing in the target table.
Howtodoit…First,wearegoingtodesignthedataflowthatwillpopulatethetargettableNewCurrency.
In the dataflow, we will use the Row_Generation transform to generate three new rows, one for each of three different currencies, and try to insert them into the previously created currency stage table NewCurrency. The NewCurrency table already has some data prepopulated from the DimCurrency table. That is required if we want to test the Autocorrect load option.
The first generated row will be for the EUR currency (the CURRENCYALTERNATEKEY column), which already exists in the target table but with a different currency name: CURRENCYNAME = 'New Euro'.
The second generated row will be a new currency which does not exist in the table yet: 'CRO' with CURRENCYNAME = 'CROWN'.
The third generated row will be 'NZD' with CURRENCYNAME = 'New Zealand Dollar', matching both values in the fields CURRENCYALTERNATEKEY and CURRENCYNAME of the existing record in the NewCurrency table.
1. Create a new job and a new dataflow, picking your own names for the created objects.
2. Open the dataflow in the workspace window to edit it and add three new Row_Generation transforms, which we will use as a source of data with default parameters. By default, this transform object generates one row with a single ID column populated with integer values starting from 0. Name the three newly added Row_Generation transforms Generate_EURO, Generate_NZD, and Generate_CROWN:
3. Link each Row_Generation transform to a respective Query transform to create an output schema matching the target table schema with two columns: CURRENCYALTERNATEKEY and CURRENCYNAME. See the example for EURO shown in the following screenshot:
The other two are CRO (CROWN) and NZD (New Zealand Dollar).
4. Finally, merge these three rows into one dataset with the help of the Merge transform (Local Object Library | Transforms | Platform | Merge).
5. Map the Merge transform output to Query transform columns with the same names and link the Query to the target table NewCurrency previously imported into the DS_STAGE datastore.
6. Check the target data in the NewCurrency table before running this code. Apply filters in a View Data window of the target table, as shown in the following screenshot, to see the existing rows we are interested in:
You can see that we have two records in the target table, for EUR and NZD.
7. Save and run the job. You should get the following error message:
Recall how we applied the primary key constraint to the NewCurrency table. The Data Services job fails in an attempt to insert rows with primary key values that already exist in the target table.
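At the database level, the failure is simply a rejected INSERT. As a rough sketch (the statement Data Services actually generates may differ in form), this is what happens for the EUR row:

```sql
-- EUR already exists in STAGE.dbo.NewCurrency, so a plain INSERT
-- violates the primary key on CurrencyAlternateKey:
INSERT INTO STAGE.dbo.NewCurrency (CurrencyAlternateKey, CurrencyName)
VALUES ('EUR', 'New Euro');
-- SQL Server rejects it with error 2627:
-- "Violation of PRIMARY KEY constraint ..."
```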
8. Now, to enable the Autocorrect load option, open the target table editor in the workspace. On the Options tab, change Autocorrect load to Yes:
9. Now save the job and run it again. It runs without errors, and if you browse the data in the target table using the same filters as before, you will see that the new CRO currency appears in the list and the EUR currency has a new currency name:
How it works…
Preventing duplicate data from being inserted is often one of the responsibilities of the ETL solution. In this example, we created a constraint object on our target table, delegating this control to the database level. But this is not a common practice in modern data warehouses.
If not for that constraint, we would have successfully inserted duplicate rows on the first attempt and our job would not have failed. The beauty of the Autocorrect load option is its simplicity. All it takes is setting a single option on a target object. When this option is enabled, Data Services checks each row before inserting it into the target table.
If the target table has a row identical to the incoming dataflow row, then the row is simply discarded. If the target table has a row with the same primary key values but different values in one or more columns, Data Services executes an UPDATE statement, updating all non-primary key columns. And finally, if the target table does not have a row with the same primary key values, Data Services executes an INSERT statement, inserting the row into the target table.
You can build a dataflow with the same logic, preventing duplicates from being inserted, by using the Table_Comparison transform. Autocorrect load performs the comparison between the dataflow dataset and the target table dataset just as well as Table_Comparison does. Both methods produce INSERT/UPDATE row types. The only difference is that Autocorrect load cannot perform the deletion of target table records. Thus, the main purpose of the Autocorrect load option is to provide you with a simple and efficient method of protecting your target data from incoming duplicate records.
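The discard/update/insert behavior described above is conceptually close to a SQL MERGE statement. The following is only an illustration of the logic for our NewCurrency example, not the SQL Data Services actually issues (depending on the database and options, it may use separate lookup, UPDATE, and INSERT statements instead):

```sql
MERGE INTO STAGE.dbo.NewCurrency AS tgt
USING (VALUES ('EUR', 'New Euro'),
              ('CRO', 'CROWN'),
              ('NZD', 'New Zealand Dollar'))
      AS src (CurrencyAlternateKey, CurrencyName)
   ON tgt.CurrencyAlternateKey = src.CurrencyAlternateKey
-- same key but a different non-key value: UPDATE (EUR gets its new name)
WHEN MATCHED AND tgt.CurrencyName <> src.CurrencyName
   THEN UPDATE SET tgt.CurrencyName = src.CurrencyName
-- key not present in the target: INSERT (CRO is added)
WHEN NOT MATCHED
   THEN INSERT (CurrencyAlternateKey, CurrencyName)
        VALUES (src.CurrencyAlternateKey, src.CurrencyName);
-- a fully identical row (NZD) matches but fails the <> test, so it is discarded
```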
We also used the Merge transform in this recipe. The Merge transform does the same thing as the SQL UNION ALL operator (it does not remove duplicates) and has the same requirements: the datasets must have the same format in order to be successfully merged:
Merge is often used in combination with Table_Comparison. First, you split your rows, assigning them different row types with Table_Comparison. Then, you deal with the different types of rows, applying different transformations depending on whether a row is going to be inserted or updated in the target table. Finally, you join the split datasets back into one with the help of Merge transforms, as you cannot link multiple transforms to a single target object.
Splitting the flow of data with the Case transform
The Case transform allows you to put branching logic in a single location inside a dataflow in order to split the dataset and send parts of it to different locations. These might be target dataflow objects, such as tables and files, or just other transforms. The use of the Case transform simplifies ETL development and increases the readability of your code.
Getting ready
In this recipe, we will build a dataflow that reads the contents of the dimension table DimEmployee and updates it according to the following business requirements:
All male employees in the production department get 5 extra vacation hours
All female employees in the production department get 10 extra sick hours
All employees in the quality assurance department get their base rate increased by a factor of 1.5
So, before you begin developing your ETL, make sure you import the DimEmployee table into the DWH datastore. We are going to use it as both the source and target object in our dataflow.
How to do it…
1. First of all, let's calculate the average values per department and gender we are interested in. Execute the following queries in SQL Server Management Studio:
-- Average vacation hours for all males in the Production department
select avg(VacationHours) as AvgVacHrs from dbo.DimEmployee where DepartmentName = 'Production' and Gender = 'M' and Status = 'Current';
-- Average sick hours for all females in the Production department
select avg(SickLeaveHours) as AvgSickHrs from dbo.DimEmployee where DepartmentName = 'Production' and Gender = 'F' and Status = 'Current';
-- Average base rate for all employees in the Quality Assurance department
select avg(BaseRate) as AvgBaseRate from dbo.DimEmployee where DepartmentName = 'Quality Assurance' and Status = 'Current';
2. Please note the resultant values so that you can compare them with the results after we run our dataflow and update those fields:
3. Create a new job and a new dataflow object, and open the dataflow in the workspace window for editing.
4. Put the DimEmployee table object as a source inside your new dataflow and link it to the Case transform, which can be found at Local Object Library | Transforms | Platform | Case.
5. Open the Case Editor in the workspace by double-clicking on the Case transform. Here you can choose one of the three options and specify conditions as label-expression pairs (by modifying the Label and Expression settings), according to which a row will be sent to one output or another:
6. Label values are used to label the different outputs. You will use these labels to route information to different transform objects when you link the Case output to the next objects in the dataflow.
7. Check only the Row can be TRUE for one case only option and add the following condition expressions by clicking on the Add button:
Label: Female_in_Production
Expression: DIMEMPLOYEE.DEPARTMENTNAME = 'Production' AND DIMEMPLOYEE.STATUS = 'Current' AND DIMEMPLOYEE.GENDER = 'F'

Label: Male_in_Production
Expression: DIMEMPLOYEE.DEPARTMENTNAME = 'Production' AND DIMEMPLOYEE.STATUS = 'Current' AND DIMEMPLOYEE.GENDER = 'M'

Label: All_in_Quality_Assurance
Expression: DIMEMPLOYEE.DEPARTMENTNAME = 'Quality Assurance' AND DIMEMPLOYEE.STATUS = 'Current'
8. Your Case Editor should look like the following screenshot:
9. Now we have to link our Case transform output to three different Query transform objects. Each time you link the objects, you will be asked to choose the Case output from the ones we created before.
10. For the Query transform names, let's choose meaningful values that represent the type of transformations we are going to perform inside them.
The Increase_Sick_Hours Query transform is linked to the Female_in_Production Case output
The Increase_Vacation_Hours Query transform is linked to the Male_in_Production Case output
The Increase_BaseRate Query transform is linked to the All_in_Quality_Assurance Case output
11. Lastly, merge all Query outputs with the Merge transform object, link it to the Map_Operation transform object, and finally to the DimEmployee table object brought from the DWH datastore as a target table.
12. Please use the following screenshot as a reference for how your dataflow should look:
13. Now we have to configure the output mappings in our Query transforms. As we are interested in updating only three target columns (VacationHours, SickLeaveHours, and BaseRate), we map them from the source Case transform. The Case transform inherits all column mappings automatically from the source object. We also map the primary key column EmployeeKey so that Data Services will know which rows to update in the target.
14. Then, in each Query transform, modify the mapping expression of the corresponding column according to the business logic. Use the following table for the list of columns and their new mapping expressions. Remember that each of our Query transforms modifies only one corresponding column; the other column mappings should remain intact. We are simply going to propagate them from the source object:
Query transform: Increase_Sick_Hours
Modified column: SICKLEAVEHOURS
Mapping expression: Case_Female_in_Production.SICKLEAVEHOURS + 10

Query transform: Increase_Vacation_Hours
Modified column: VACATIONHOURS
Mapping expression: Case_Male_in_Production.VACATIONHOURS + 5

Query transform: Increase_BaseRate
Modified column: BASERATE
Mapping expression: Case_All_in_Quality_Assurance.BASERATE * 1.5
15. See the example of the Increase_Vacation_Hours mapping configuration:
16. The last object we need to configure is the Map_Operation transform object named Update. You should already know by now that the Query transform generates the normal type of rows, which are inserted into a target object when they reach the end of the dataflow.
17. In our example, as we want to perform an update of the non-key columns defined in our source dataset using matching primary key values in the target table, we need to modify the row type from normal to update:
18. To be absolutely clear about the purpose of this Map_Operation object, we change the other row types to discard, though we would never get insert, update, or delete rows in this dataflow without modifying it.
19. Save and run the job, then run the queries again to see the new average results for the columns updated in the table:
The difference between the "before" and "after" values proves that Data Services correctly updates the required rows in the DimEmployee table.
How it works…
The developed dataflow is a good example of a dataflow performing an update of the target table.
We have split the rows according to the conditions specified, performed the required transformation of the data according to the logic provided in the conditions, and then merged all split datasets back together and modified all row types to update. We did this so that Data Services would execute UPDATE statements for the whole dataset, updating the corresponding rows that have the same primary key values.
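For rows carrying the update row type, Data Services ends up issuing UPDATE statements keyed on the primary key. As a sketch of the shape of those statements (they are generated per row, typically as parameterized SQL, so the exact text may differ):

```sql
-- one UPDATE per incoming update-type row; ? marks are row values
UPDATE dbo.DimEmployee
SET VacationHours  = ?,   -- all mapped non-key columns are updated
    SickLeaveHours = ?,
    BaseRate       = ?
WHERE EmployeeKey = ?;    -- the primary key column mapped in the Query transforms
```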
As we used the target table as a source object as well, we can be sure that we will not have any extra rows in our update dataset that do not exist in the target.
Note that the dataset generated in your dataflow does not have to match the target table structure exactly. When you perform an update of the target table, make sure you have the primary key defined correctly, and keep in mind that all columns defined as non-primary key columns in the source schema structure will be updated in the target table.
Note
Data Services uses the primary key columns defined in the target table to find the matching rows. If you want to use a different set of columns to find the corresponding record to update in the target, set them up as primary key columns in the output schema of the Query transform inside the dataflow, and set Use input keys to Yes in the Update control section of the target table object.
There is another, less elegant way of doing the same thing the Case transform does. It involves using the WHERE tab of the Query transforms to filter the data required for transformation:
That does look like a simpler solution, but there are two main disadvantages:
You lose readability of your code: with the Case transform, you can see the labels of the outputs, which explain the conditions used to split the data.
You lose performance: instead of splitting the dataset, you actually send it three times to different Query transforms, each of which performs the filtering. Technically, you are tripling the dataset, making your dataflow consume much more memory.
Monitoring and analyzing dataflow execution
When you execute a job, Data Services populates the relevant execution information into three log files: the trace, monitor, and error logs. In later chapters, we will take a closer look at the configuration parameters available at the job level in order to gather more detailed information regarding job execution. Meanwhile, in this recipe, we will spend some time analyzing the monitor log file, which logs processing information from inside the dataflow components.
Getting ready
For simplicity, we will use the second dataflow from the recipe Using the Table_Comparison transform, created for a detailed explanation of the flow of the data before and after it passes through the Table_Comparison transform object:
Open the Table_Comparison transform editor in the workspace and change the comparison method to Cached comparison table:
We change this option to slightly alter the behavior of the Data Services optimizer. Now, instead of comparing data row by row, executing a SELECT statement against the comparison table in the database for each input row, Data Services will read the whole comparison table and cache it on the Data Services server side. Only after this will it perform the comparison of the input dataset records with the table records cached on the Data Services server side. This slightly speeds up the comparison process and changes how information about the dataflow execution is logged in the monitor log.
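The difference between the two comparison methods can be sketched in SQL terms (this illustrates the access pattern only; the literal statements Data Services issues may differ):

```sql
-- Row-by-row comparison: one lookup per input row
SELECT * FROM DWH.dbo.DimProduct
WHERE ProductKey = ?;              -- executed once for every input row

-- Cached comparison table: a single full read, cached on the job server
SELECT * FROM DWH.dbo.DimProduct;  -- executed once; comparison then happens in memory
```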
How to do it…
1. Save the dataflow and execute the job with the default parameters as usual.
2. In the main workspace, open the Job Log tab to show the trace log section, which contains information about job execution. To see the monitor log, click on the second button at the top of the workspace area. For convenience, you may select the records from the log you are interested in and copy and paste them into an Excel spreadsheet using the right-click context menu:
3. The monitor log section displays information about the number of records processed by each dataflow component and how long it takes to process them. The reader components shown in the following screenshot are responsible for extracting information from the source database tables. You can see that the DimProduct table is extracted by a separate process (probably because it is located in a different database), whereas the other three tables are joined and extracted with a single SELECT statement by a single component with quite a sophisticated name, as you can see:
4. The component Join_PRODUCT_TEST_COMPARE passes the dataset from the Join Query transform to the first target table, PRODUCT_TEST_COMPARE. You can see that it has processed 396 rows in 0.136 seconds:
5. Finally, the information about the dataflow components responsible for processing data in the Map_Operation transforms shows that 210 rows were processed by the MO_Delete transform and passed to the target PRODUCT_DESC_DELETE template table. Only one row was processed by MO_Update and passed to the corresponding target table, and no rows were processed by MO_Insert as there weren't any rows with the insert row type generated by this dataflow:
6. The last column shows the total time that passed in the executed dataflow object while the component was processing records.
How it works…
Data Services puts processing information from all dataflow objects in a single place. If you have a job with 100 dataflows and some of them run in parallel, you can imagine that the records in the monitor log could be mixed up. That is why copying the log data to a spreadsheet for further searching and filtering with the functionality of Excel is quite useful.
Dataflow execution is a very complex process, and the components you see in the monitor log are not always in a one-to-one relationship with the objects placed inside a dataflow. There are various internal service components performing joins, splits, and the merging of data that will be displayed in the monitor log. Sometimes Data Services creates a few processing components for a single transform object.
If you know what you are looking for, reading the monitor log is much easier. Here is a summary of what the columns mean:
The first column in the monitor log is the name of the component, containing the name of the dataflow and the names of the components inside the dataflow.
The second column is the status of the processing component. READY means that the component has not started processing data; in other words, no records have reached it yet. PROCEED means that the component is processing rows at the moment, and STOP means that all rows have passed through the component and it has finished processing them, passing them further down the dataflow execution sequence.
The third column shows you the number of rows processed by a component. This value is in flux while the component has the PROCEED status and attains a final value when the component's status changes to STOP.
The fourth column shows you the execution time of the component.
The fifth column shows you the total execution time of the dataflow while the component was processing the rows. As soon as the component's status changes to STOP, both execution time values freeze and stop changing.
To illustrate this even further, let's count the rows in the source tables to compare them with what we have seen in the monitor log.
First, see the results of counting the number of records in the tables DIMPRODUCT and PRODUCTMODELPRODUCTDESCRIPTIONCULTURE with the help of the View Data function available on the Profile tab for table objects inside a dataflow. Click on the Records button to calculate the number of records in the table:
Now see the result of counting the number of records in the tables PRODUCT and PRODUCTDESCRIPTION with the same View Data | Profile feature:
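If you prefer, the same counts can be cross-checked straight in SQL Server Management Studio; run each COUNT against the database the table actually lives in (a sketch; adjust database and schema names to your environment):

```sql
SELECT COUNT(*) AS cnt FROM dbo.DimProduct;
SELECT COUNT(*) AS cnt FROM dbo.ProductModelProductDescriptionCulture;
SELECT COUNT(*) AS cnt FROM dbo.Product;
SELECT COUNT(*) AS cnt FROM dbo.ProductDescription;
```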
By using the transform name Join, you can see the components related to the execution of the first Query transform.
You can see that the DIMPRODUCT_11 component (606 rows) is not part of the Join transform components because it was executed separately. Data Services could not include it in a single SELECT statement (remember that this table is in the DWH database) with the three other tables that had join conditions specified inside the Join transform. Data Services recognized those three tables as belonging to the same database and pushed a single SELECT statement down to the database level, extracting 294 rows.
Some components, such as the Map_Operation-related ones, are easily recognizable by name, which includes the name of the current transformation and the next target table object name: Join_PRODUCT_TEST_COMPARE, MO_Update_PRODUCT_DESC_UPDATE, and so on.
The Table_Comparison execution is the most complex one, as you can see from the monitor log. All compared datasets are first cached by separate components and then compared to each other by other components. You can identify components belonging to a Table_Comparison transform by the keywords TCRdr and Table_Comparison.
There's more…
Reading the monitor log, which is the main source of dataflow execution information, can require a lot of experience. In the following chapters, we will spend a lot of time peeking into the monitor log for different kinds of information about dataflow execution. Often, it is very useful for identifying potential performance bottlenecks inside the dataflow.
Chapter 5. Workflow – Controlling Execution Order
This chapter will explain in detail another type of Data Services object: the workflow. Workflow objects allow you to group other workflows, dataflows, and script objects into execution units. In this chapter, we will cover the following topics:
Creating a workflow object
Nesting workflows to control the execution order
Using conditional and while loop objects to control the execution order
Using the bypassing feature
Controlling failures – try-catch objects
Use case example – populating dimension tables
Using a continuous workflow
Peeking inside the repository – parent-child relationships between Data Services objects
Introduction
In this chapter, we will move to the next object in the Data Services hierarchy of objects used in ETL design: the workflow object. Workflows do not perform any movement of data themselves; their main purpose is to group dataflows, scripts, and other workflows together.
In other words, workflows are container objects grouping pieces of ETL code. They help define the dependencies between various pieces of ETL code in order to provide a robust and flexible ETL architecture.
I will also show you how you can query the Data Services repository using database tools in order to examine the hierarchy of objects directly, and how this hierarchy is stored in repository database tables. This may be very useful if you want to understand a bit more about how the software functions "under the hood".
Additionally, we will build real-life use case ETL code by populating dimension tables in a data warehouse. This use case example will include the functionality already reviewed in the previous chapters and will show you how you can augment existing ETL processes and migrate data (dataflows) with the help of workflow objects.
Creating a workflow object
A workflow object is a reusable object in Data Services. Once created, the same object can be used in different places in your ETL code. For example, you can place the same workflow in different jobs or nest it in other workflow objects by placing it in the workflow workspace.
Note
Note that a workflow object cannot be nested inside a dataflow object. Workflows are used to group dataflow objects and other workflows so that you can control their execution order.
Every workflow object has its own local variable scope and can have a set of input/output parameters so that it can "communicate" with the parent object (in which it is nested) by accepting input parameter values or sending values back through output parameters. A script object placed inside the workflow becomes part of the workflow and shares its variable scope. That is why all workflow local variables can be used within the scripts placed directly into the workflow or passed to the child objects by going to Variables and Parameters | Calls.
Later in this chapter, we will explore how this object hierarchy is stored within the Data Services repository.
How to do it…
There are a few ways to create a workflow object. Follow these steps:
1. To create a workflow object in the workspace of another parent object, you can use the tool palette on the right-hand side of the Designer interface. Follow these steps:
1. Create a new job and open it in the workspace for editing.
2. Left-click on the Work Flow icon in the workspace tool palette (see the following screenshot), drag it to the job workspace, and left-click on the empty space in the workspace to place the new workflow object:
3. Name the object WF_example and press Enter to create it. Note that the object immediately appears in the Local Object Library workflow list. The parent object of the WF_example workflow is the job itself.
2. Create another workflow object inside WF_example. This time, we will use a different method: creating workflows directly from the Local Object Library rather than using the workspace tool palette. Perform these steps:
1. Open WF_example in the main workspace window.
2. Go to the Local Object Library window and select the Work Flows tab.
3. Right-click on the empty area of this tab in the Local Object Library and choose New from the context menu.
4. Fill in the workflow name, WF_example_child, and drag and drop the created object from the Local Object Library into the workspace area of WF_example.
How it works…
A workflow object organizes and groups pieces of ETL processes (dataflows and sometimes scripts). It does not perform any data processing itself. When it is executed, it simply starts executing all its child objects, sequentially or in parallel, in the order defined by the user.
You can think of a workflow as a container that holds executable elements. Just as a project object functions like a root folder, a workflow serves the same "folder" functionality with a few extra features, which you will get familiar with in the next few recipes.
Like the folder structure on your disk, you can create sophisticated nested tree structures with the help of workflow objects by putting them inside each other.
One thing to remember is that each workflow has its own scope of variables, or context. To pass variables from a parent workflow to a child object, select the Calls tab on the Variables and Parameters panel. It shows the list of input parameters of the child objects for the object currently open in the main workspace area.
To open the Variables and Parameters window, you can click on the Variables button in the tool menu at the top of your Designer screen.
Here, you see the context of the currently open object, that is, the list of defined local variables, input parameters, and available global variables inherited from the job context:
The Calls section allows you to pass your previously created local variable $WF_example_local_var of the WF_example workflow to the WF_example_child child workflow object's $WF_example_child_var1 input parameter, as shown here:
Of course, you have to open the child object context first and create an input parameter so that its call is visible in the context of the parent.
Scripts are not reusable objects and do not have a local variable scope or parameters of their own. They belong to the workflow or job object they have been placed into. In other words, they can see and operate only on the local variables and parameters defined at the parent object level.
Of course, you can copy and paste the contents of a single script object into another script object in a different workflow. However, it will be a new instance of the script object running in the new context of a different parent workflow. Hence, the variables and parameters used could be completely different.
Nesting workflows to control the execution order
In this recipe, we will see how workflow objects are executed in a nested structure.
Getting ready
We will not create dataflow objects in this recipe, so to prepare an environment, just create an empty job object.
How to do it…
We will create a nested structure of a few workflow objects, each of which, when executed, will run a script. The script will display the current workflow name and the full path to the root job context. Follow these steps:
1. In the job workspace, create a new workflow object, WF_root, and open it.
2. In the Variables and Parameters window, while in the WF_root context, create one local variable, $l_wf_name, and one input parameter, $p_wf_parent_name, both of the varchar(255) data type.
3. Also, inside WF_root, add a new script object named Script with the following code:
$l_wf_name = workflow_name();
print('INFO: running {$l_wf_name} (parent = {$p_wf_parent_name})');
$l_wf_name = $p_wf_parent_name || '>' || $l_wf_name;
4. In the same WF_root workflow workspace, add two other workflow objects, WF_level_1 and WF_level_1_2, and link all of them together.
5. Repeat steps 2 and 3 for both new workflows, WF_level_1 and WF_level_1_2.
6. Open WF_level_1, create a new workflow, WF_parallel, and link it to the script object.
7. Inside the WF_parallel workflow, create two other workflow objects, WF_level_3_1 and WF_level_3_2. Then, create only one input parameter, $p_wf_parent_name, for each, without creating a local variable.
8. Repeat steps 2 and 3 for both the WF_level_3_1 and WF_level_3_2 workflows.
9. Now, we have to specify mappings for the input parameters of the created workflows. To do this, double-click on the parameter name $p_wf_parent_name by going to Variables and Parameters | Calls and input the name of the $l_wf_name local variable.
10. There are two exceptions to the input parameter mapping settings. In the context of the job, for the input parameter of the WF_root workflow, you have to specify the job_name() function as a value. Perform these steps:
1. Open the job in the main workspace (so that the WF_root workflow is visible on the screen).
2. Choose Variables and Parameters | Calls and double-click on the $p_wf_parent_name input parameter name.
3. In the Value field, enter the job_name() function and click on OK.
11. The second exception is the input parameter mappings for the workflows WF_level_3_1 and WF_level_3_2. Perform the following steps:
1. Open the WF_parallel workflow to see both WF_level_3_1 and WF_level_3_2 displayed on the screen.
2. Go to Variables and Parameters | Calls and specify the following value for both input parameter calls:
(($p_wf_parent_name || '>') || workflow_name())
12. Your job should have the following workflow nested structure, as shown in the screenshot here:
The only workflow object that does not have a script object inside it is WF_parallel. This will be explained later in the recipe.
13. Now, open the job in the workspace area and execute it.
14. The trace log shows the order of workflow executions, the currently executed workflow names, and their location in the object hierarchy within the job. See the following screenshot:
How it works…
As we passed values to the input parameters of objects in the previous chapter, dedicated to the creation of dataflow objects, you probably already know how this mechanism works. The object asks for the input parameter value right before its execution in the parent object where it is located.
Every workflow in our structure (except WF_parallel) has a local variable that is used in the script object to save and display the current workflow name and concatenate it to the workflow path in the hierarchy received from the parent object, in order to pass the concatenated value to the child objects in their calls.
Let's follow the execution steps:
When a job executes, it first runs the object located in the job context; in our case, it is WF_root. As we do not specify any local variable for the job, we cannot pass its value to the input parameter of the WF_root object. So, we simply pass it the job_name() function, which returns the name of the job in which it is being executed.
The job_name() function generates the value that is passed to the input parameter right before the WF_root execution.
The WF_root execution runs the script object from left to right. In the script, the local variable gets its value from the output of the workflow_name() function, which returns the name of the workflow in which it is being executed. With the print() function, we display the local variable value and the value of the input parameter received from the parent object (the job). As the next step, the value of the local variable is concatenated with the value of the input parameter to get the current location path in the hierarchy for the child objects WF_level_1 and WF_level_1_2.
As all objects inside WF_root are linked together, they are executed sequentially from left to right. Every next object runs only after the successful completion of the previous object.
Data Services runs WF_level_1 and repeats the same sequence of displaying the current workflow name and current path, with the consequent concatenation and passing of the value to the input parameter of the WF_parallel workflow.
The WF_parallel workflow demonstrates how Data Services executes two workflow objects placed at the same level that are not linked to each other. Here, we cannot use a script to perform our usual sequence of script logic steps. If you try to add a script object not linked to the parallel workflows, Data Services gives you an error message from the job validation process:
If you try to link the script object to one of the workflows, you will get the following error message:
Note
Note how Data Services does not allow you to link the script object to both workflows.
If used within a job or a workflow, script objects disable the parallel execution logic, allowing only sequential execution within the current context:
To make sure that your workflows execute simultaneously and run in parallel, make sure that you do not use a script object in the same workspace.
That is why, when we pass the values to the input parameters of the two workflows executed in parallel, WF_level_3_1 and WF_level_3_2, we specify the concatenation formula right in the input parameter value field:
It is very important to understand that there are two different $p_wf_parent_name parameters in the preceding screenshot. The one on the left-hand side is the $p_wf_parent_name input parameter belonging to the child object WF_level_3_1, which asks for a value. The one on the right-hand side belongs to the current workflow, WF_parallel, in whose context we are located at the moment, and it holds the value received from its parent object, WF_level_1.
After the completion of WF_level_3_1 and WF_level_3_2, Data Services completes the WF_parallel workflow, then the WF_level_1 workflow, and finally runs the WF_level_1_2 workflow. WF_root is the last workflow object to finish its execution within the job, so the job completes its execution successfully.
See the trace log again to follow the sequence of steps executed, and make sure that you understand why they were executed in this particular order.
Using conditional and while loop objects to control the execution order
Conditional and while loop objects are special control objects that branch the execution logic at the workflow level. In this recipe, we will modify the job from the previous recipe to make the execution of our workflow objects more flexible.
Conditional and loop structures in Data Services are similar to the ones used in other programming languages.
For readers with no programming background, here is a brief explanation of conditional and loop structures.
The IF-THEN-ELSE structure allows you to check the result of the conditional expression presented in the IF block and execute either the THEN block or the ELSE block, depending on whether the result of the conditional expression is TRUE or FALSE.
The LOOP structure in a programming language allows you to execute the same code again and again in a loop until the specified condition is met. You should be very careful when creating loop structures and correctly specify the condition that exits or ends the loop. If it is incorrectly specified, the code in the loop could run indefinitely, making your program hang.
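In pseudocode, the two structures that the following recipe builds with Designer objects (in Data Services, the conditional and the while loop are workspace objects, not script statements) look like this:

```
IF (condition evaluates to TRUE)
THEN execute the objects placed in the Then section
ELSE execute the objects placed in the Else section

WHILE (condition evaluates to TRUE)
    execute the objects placed inside the loop
    (something inside the loop must eventually make the condition FALSE,
     otherwise the loop never ends)
```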
Getting ready
Open the job from the previous recipe.
How to do it…
We will get rid of our WF_parallel workflow and execute only one of the underlying WF_level_3_1 or WF_level_3_2 workflows, chosen randomly. This is not a common scenario you will see in real life, but it gives a perfect example of how Data Services allows you to control your execution logic. Perform these steps:
1. Open WF_level_1 in the workspace and remove WF_parallel from it.
2. Using the tool palette on the right-hand side, create a conditional object and link your script object to it. Name the conditional object If_Then_Else:
3. Double-clickontheIf_Then_ElseconditionalobjectorchooseOpenfromtheright-clickcontextmenu.
4. Youcanseethreesections:If,Then,andElse.IntheThenandElsesections,youcanputanyexecutionalelements(workflows,scripts,ordataflows).TheIffieldshouldcontaintheexpressionreturningaBooleanvalue.IfitreturnsTRUE,thenallobjectsintheThensectionareexecutedinsequentialorparallelorder,dependingontheirarrangement.IftheexpressionreturnsFALSE,thenallelementsfromtheElsesectionareexecuted:
5. Put WF_level_3_1 from Local Object Library into the Then section.
6. Put WF_level_3_2 from Local Object Library into the Else section.
7. Map the input parameter calls of each workflow to the local $l_wf_name variable of the parent WF_level_1 workflow object. You can now see that, without the WF_parallel workflow, both WF_level_3_1 and WF_level_3_2 operate within the context of the WF_level_1 workflow (remember that the conditional object does not have its own context and variable scope; it is transparent in that respect).
8. Type the following expression, which randomly generates 0 or 1, in the If section:
cast(round(rand_ext(), 0), 'integer') = 1
We will use this expression to randomly generate either 0 or 1 in order to execute the ETL placed in the THEN or ELSE block every time we run the Data Services job.
9. Save and execute the job. The trace log shows that only one workflow, WF_level_3_2, was executed. To have more visibility of the values generated by the If expression, you can put the expression in the script before If_Then_Else and assign its value to a local variable, which can then be used in the If section of the If_Then_Else object to get the Boolean value:
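The branching logic of the conditional object can be sketched in plain Python. This is an analogy only, not Data Services code; the function names stand in for the two workflows:

```python
import random

def wf_level_3_1():
    print("INFO: running WF_level_3_1")

def wf_level_3_2():
    print("INFO: running WF_level_3_2")

# Analogy of cast(round(rand_ext(), 0), 'integer') = 1:
# round a uniform random number to 0 or 1 and branch on the result.
flag = round(random.random())  # 0 or 1

if flag == 1:        # the "If" expression of the conditional object
    wf_level_3_1()   # the "Then" section
else:
    wf_level_3_2()   # the "Else" section
```

Run it a few times and you will see either workflow name printed, just as the trace log shows only one of the two workflows per job execution.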
Now, let's make our last workflow object in the job run 10 times in a loop, using these steps:
1. Open WF_root in the main workspace.
2. Delete WF_level_1_2 from the workspace.
3. Add a while loop object from the tool palette, name it While_Loop, and link it to WF_level_1, as shown in the following screenshot. As we know that we are going to run the loop for 10 cycles, we need to create a counter that we will use in the loop condition. For this purpose, create a $l_count local integer variable for the WF_root workflow and assign it the value 1 in the initial script. Your code in the Script object should look like this:
$l_wf_name = workflow_name();
print('INFO: running {$l_wf_name} (parent = {$p_wf_parent_name})');
$l_wf_name = $p_wf_parent_name || '>' || $l_wf_name;
$l_count = 1;
4. Open the While_Loop in the workspace and place the WF_level_1_2 workflow by copying or dragging it from Local Object Library.
5. Place two script objects, script and increase_counter, before and after the workflow, and link all three objects together.
6. The initial script will contain the print() function displaying the current loop cycle, and the final script will increase the counter value by 1. You also have to put the conditional expression that checks the current counter value in the While field of the While_Loop object. The expression is $l_count <= 10:
The conditional expression is checked after each loop cycle. The loop completes successfully as soon as the conditional expression returns FALSE.
7. Map the $p_wf_parent_name input parameter of WF_level_1_2 to the local variable from the parent's context, $l_wf_name, by going to Variables and Parameters | Calls.
8. Save and execute the job. Check your trace log file to see that WF_level_1_2 was executed 10 times:
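The counter pattern built above (initialize, check, run, increment) can be sketched in plain Python; the names below simply mirror the recipe's objects and are not Data Services syntax:

```python
def wf_level_1_2(cycle):
    # stands in for the WF_level_1_2 workflow body
    print(f"INFO: running WF_level_1_2, cycle {cycle}")

l_count = 1              # set in the initial script of WF_root
executions = 0
while l_count <= 10:     # the expression in the While field
    wf_level_1_2(l_count)
    l_count += 1         # the increase_counter script
    executions += 1

print(f"WF_level_1_2 was executed {executions} times")
```

As in the trace log, the body runs exactly 10 times; the loop stops when the condition $l_count <= 10 first returns FALSE.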
How it works…
The if-then-else construction is available in the scripting language as well but, as you already know, the usage of script objects with workflows is quite limited: you can only join these objects sequentially. This is where conditional objects come into action.
The main characteristic of the conditional and while loop objects is that they are not workflows and do not have their own context. They operate within the variable scope of their parent objects and can only be placed within a workflow or job object. That is why you need to create and define all local variables used in the if-then-else or while conditional expression inside the parent object context.
Note
Script objects have their own if-then-else and while loop constructions, and to branch logic within dataflows, you can use the Case, Validation, or simply Query (with filtering conditions) transforms.
There's more…
Workflow objects themselves have a few options to control how they are executed within the job, which adds some flexibility to the ETL design. They will be explained in the following recipes of this chapter. For now, we will just take a look at one of them.
This is the Execute only once option available in the workflow object properties window.
To open it, just right-click on the workflow either in the workspace or in Local Object Library and choose Properties… from the context menu:
To see the effect this option has on workflow execution, take the job from this recipe and tick this option for the WF_level_1_2 workflow, the one that runs in the loop.
Then, save the job and execute it. The trace log looks like this now:
What happens here is that, after successfully executing the workflow for the first time in the first cycle, the while loop tries to run it another 9 times. However, as the workflow has already run within this job execution, Data Services skips it with a successful workflow completion status.
This option is rarely used within a loop as, of course, you do not put anything in a loop that can be executed only once, but it shows how Data Services deals with such workflows.
The most common scenario is when you put a specific workflow in multiple branches of the workflow hierarchy as a dependency for other workflows, and you only need it to be executed once, without caring which branch it will be executed in first, as long as it completes successfully.
The scope of this option is restricted to the job level. If you place a workflow with this option enabled in multiple jobs and run them in parallel, the workflow will be executed once in each job.
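The Execute only once behavior can be pictured as a per-job-execution registry of workflows that have already run. The following Python sketch is an illustration under that assumption; all names are made up:

```python
# Registry of workflows that already ran within this job execution.
# In Data Services, this state is reset at the start of every job run.
executed = set()

def run_once(name, body):
    if name in executed:
        print(f"INFO: {name} already executed, skipping with success status")
        return "skipped"
    executed.add(name)
    body()
    return "executed"

# The while loop from the recipe tries to run WF_level_1_2 ten times:
results = [run_once("WF_level_1_2", lambda: print("running WF_level_1_2"))
           for _ in range(10)]
```

The first call actually executes the workflow; the remaining nine are skipped, yet each skip is reported as a successful completion, which is exactly what the trace log shows.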
Using the bypassing feature
The bypassing option allows you to configure a workflow or dataflow object to be skipped during the job execution.
Getting ready…
We will use the same job as in the previous recipe.
How to do it…
Let's configure the WF_level_1 workflow object that belongs to the parent WF_root workflow to be skipped permanently when the job runs.
The configuration of this feature requires two steps: creating a bypassing substitution parameter, and enabling the bypassing feature for the workflow using the created substitution parameter.
1. To create a bypassing substitution parameter, follow these steps:
1. Go to Tools | Substitution Parameter Configurations….
2. In the Substitution Parameter Editor window, you can see the list of default substitution parameters used by Data Services.
3. Click on the empty field at the bottom of the list to create a new substitution parameter.
4. You can choose any name you want, but remember that all substitution parameters start with the double dollar sign.
5. Call your new substitution parameter $$BypassEnabled and choose the default value YES in the Configuration column to the right:
6. As a final step, click on OK to create the substitution parameter.
2. Now, you can "label" any workflow object with this substitution parameter if you want it to be bypassed during job execution. Follow these steps:
1. Open the WF_root workflow within your job to see WF_level_1 in the main workspace window.
2. Right-click on the WF_level_1 workflow and choose Properties… from the context menu to open the workflow properties window.
3. Click on the Bypass field combobox and choose the newly created substitution parameter from the list, [$$BypassEnabled]. By default, the {No Bypass} value is chosen in this field:
4. Click on OK. The workflow becomes marked with a crossed red-circle icon. This means that, during the job execution, this workflow will be skipped and the next object in the sequence will be executed straight away:
How it works…
Now, let's see what happens when you run the job:
During job validation, you can see a warning message telling you that a particular workflow will be bypassed:
When the job is executed, it runs the workflow sequence as usual, except when it gets to the bypassed workflow object. The workflow object is skipped, and all dependent objects (the object next in the sequence and the parent workflow where the bypassed object resides) consider its execution to be successful. If you take a look at the trace log of the job execution, you will see something similar to this screenshot:
There's more…
In Data Services, there is more than one way to set up the workflow object as bypassed. If you right-click on the workflow object, you will see that the Bypass option is available in the context menu directly. It opens the Set Bypass window with the same combobox list of substitution parameter values available for this option.
Note
You can bypass more than just workflows: dataflow objects can be bypassed in the same manner.
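Conceptually, the substitution parameter acts as a global switch, and any object labeled with it is skipped at run time while the rest of the sequence continues. A rough Python sketch of that idea (all names are illustrative, not Data Services API):

```python
# A "substitution parameter" is a repository-wide named value; here it is
# modeled as a plain dictionary.
substitution_params = {"$$BypassEnabled": "YES"}

def run(name, body, bypass_param=None):
    # Skip the object if it is labeled with a parameter whose value is YES.
    if bypass_param and substitution_params.get(bypass_param) == "YES":
        print(f"INFO: {name} is bypassed, continuing with the next object")
        return "bypassed"
    body()
    return "executed"

status = run("WF_level_1", lambda: print("running WF_level_1"),
             bypass_param="$$BypassEnabled")
```

Changing the parameter value to anything other than YES (or leaving the object unlabeled) lets the object run normally, which mirrors switching the Bypass field back to {No Bypass}.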
Controlling failures – try-catch objects
In the Creating custom functions recipe in Chapter 3, Data Services Basics – Data Types, Scripting Language, and Functions, we created a custom function showing an example of try-catch exception handling in the scripting language. As with if-then-else and while loops, Data Services has a variation of the try-catch construction at the workflow/dataflow object level as well. You can put a sequence of executable objects (workflows/dataflows) between Try and Catch objects and then catch potential errors in the Catch object, where you can put the scripts, dataflows, or workflows that you want to run to handle the caught errors.
How to do it…
The steps to deploy and enable the exception handling block in your workflow structure are extremely easy and quick to implement.
All you have to do is place an object, or sequence of objects, from which you want to catch possible exceptions between two special objects, Try and Catch. Then, follow these steps:
1. Open the job from the previous recipe.
2. Open WF_root in the workspace.
3. Choose the Try object from the right-side tool palette and place it at the beginning of the object sequence. Name it Try:
4. Choose the Catch object from the right-side tool palette and place it at the end of the object sequence. Name it Catch:
5. The Try object is not modifiable and does not have any properties except a description. Its only purpose is to mark the beginning of the sequence for which you want to handle exceptions.
6. Double-click on the Catch object to open it in the main workspace. Note that all exception types are selected by default. This way, we make sure that we catch any possible failures that can happen during our code execution. Of course, there can be scenarios in which you want the ETL to fail and do not want to run the code in the Catch block for some types of errors. In that case, you can deselect the exceptions to be handled in the Catch block. In our example, we just want our code to continue to run, putting the error message in the trace log.
7. Create the script object with the following line in it:
print('ERROR: exception has been caught and handled successfully');
8. Save and execute the job. The exception you generated in the script is successfully handled by the try-catch construction, and the job completes successfully.
How it works…
If you take a look at the trace log of your job run, you can see that the WF_level_3_1 and WF_level_1 workflows failed:
WF_level_3_1 failed as the exception was raised in the script inside it, and WF_level_1 failed because its execution depends on the child object WF_level_3_1. You should remember that if any child object within a workflow fails (another workflow, dataflow, or script), the parent object fails immediately. Then, the parent's parent object fails as well, and so on, until the root level of the job hierarchy is reached and the job itself fails and stops its execution.
By placing the try-catch sequence inside WF_root, we made it possible to catch all exceptions inside it, making sure that our WF_root workflow never fails.
Note
Try-catch objects do not prevent a job from failing in the case of a crash of the job server itself. This is, of course, because the successful execution of the try-catch logic depends on the Data Services job server being up and running.
Note that the error log is still generated in spite of the successful job execution. In there, you can see the logging message that was generated by the logic from the Catch object and the context in which the initial exception happened:
Try-catch objects can be a vital part of your recovery strategy. If your workflow contains a few steps that you can think of as a transactional unit, you would want to clean up when some of these steps fail before running the sequence again. As explained in the recipe dedicated to the recovery topic, the Data Services automatic recovery strategy simply skips the steps that have already been executed, and sometimes this is not enough.
It all depends on how thorough you have to be during your recovery.
Another very important aspect to understand is that try-catch blocks prevent the failure of the workflow in whose context they are placed. This means that the error is hidden inside the try-catch and parent workflow, and all subsequent objects down the execution path will be executed by Data Services.
There are situations when you definitely want to fail the whole job to prevent any further execution if some of the data processing inside it fails. You can still use try-catch blocks to catch the error in order to log it properly or do some extra steps, but after all this is done, the raise_exception() function is put at the end of the catch block to fail the workflow.
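The catch-log-then-fail pattern described above maps directly onto exception handling in most languages. A minimal Python analogy (the workflow name is borrowed from the recipe; re-raising plays the role of raise_exception()):

```python
def wf_level_3_1():
    # stands in for the workflow whose inner script raises an exception
    raise RuntimeError("exception raised in the script inside WF_level_3_1")

log = []
try:
    wf_level_3_1()            # the object sequence between Try and Catch
except RuntimeError as exc:   # the Catch object (all exception types)
    log.append(f"ERROR: exception has been caught and handled: {exc}")
    # To deliberately fail the whole job after logging and cleanup,
    # re-raise here (the analogue of calling raise_exception()):
    # raise
```

With the re-raise commented out, execution continues after the except block, just as the job completes successfully once the Catch object has run.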
Use case example – populating dimension tables
In this recipe, we will build the ETL job to populate two dimension tables in the AdventureWorks_DWH database, DimGeography and DimSalesTerritory, with the data from the operational database AdventureWorks_OLTP.
Getting ready
For this recipe, you will have to create a new job. Also, create two new schemas in the STAGE database: Extract and Transform. To do this, open SQL Server Management Studio, expand Databases | STAGE | Security | Schemas, right-click on the Schemas folder, and choose the New Schema… option from the context menu. Specify your administrator user account as the schema owner.
How to do it…
1. In the first step, we will create extraction processes using these steps:
1. Open the job context and create the WF_extract workflow.
2. Open the WF_extract workflow in the workspace and create four workflows, one for each source table we extract from the OLTP database: WF_Extract_SalesTerritory, WF_Extract_Address, WF_Extract_StateProvince, and WF_Extract_CountryRegion. Do not link these workflow objects, so that they run in parallel.
3. Open WF_Extract_SalesTerritory in the main workspace area and create the DF_Extract_SalesTerritory dataflow.
4. Open DF_Extract_SalesTerritory in the workspace area.
5. Add a source table from the OLTP datastore: SalesTerritory.
6. Place the Query transform after the source table, link them, open the Query transform object in the workspace, and map all source columns to the target schema by selecting them together and dragging them to the empty target schema section.
7. Exit Query Editor and add the target template table, SalesTerritory. Choose DS_STAGE as the datastore object and Extract as the owner to create a target stage table in the Extract schema of the STAGE database.
8. Your dataflow and Query transform mapping should look as shown in the screenshots here:
9. In the same manner, using steps 3 to 8, create extract dataflow objects for the other OLTP tables: Address (dataflow DF_Extract_Address), StateProvince (dataflow DF_Extract_StateProvince), and CountryRegion (dataflow DF_Extract_CountryRegion). Place each of the created dataflows inside a parent object with the same name, substituting the prefix DF_ with WF_, and put all extract workflows to run in parallel inside the WF_extract workflow object. To name the target template tables inside each of the dataflows, choose the same name as the source table object, and select DS_STAGE as the database for the table to be created in and Extract as the owner/schema:
2. Now, let's create transformation processes using these steps:
1. Go to the job context level in your Designer and open the WF_transform object.
2. As we will populate two dimension tables, we will create two transformation workflows running in parallel, one for each of them: WF_Transform_DimSalesTerritory and WF_Transform_DimGeography.
3. Open WF_Transform_DimSalesTerritory and create a new dataflow in its workspace: DF_Transform_DimSalesTerritory.
4. Open the dataflow object and design it as shown in the following screenshot:
5. It is important for the transformation dataflows to create the target template tables in the Transform schema created earlier. The name of the target table template object should be the same as the target dimension table in the DWH.
6. The Join Query transform performs the join of the two source tables and maps the columns from each of them to the Query output schema. As we do not migrate image columns, specify NULL as the mapping for the SalesTerritoryImage output column. Also, specify NULL as the mapping for SalesTerritoryKey, as its value will be generated in one of the load processes:
7. To create the transformation process for DimGeography, go back to the WF_transform workflow context level and create a new workflow, WF_Transform_DimGeography, with a dataflow, DF_Transform_DimGeography, inside it.
8. In the dataflow, we will source the data from three OLTP tables, Address, StateProvince, and CountryRegion, to populate the stage transformation table with a table definition that matches the target DWH DimGeography table:
9. Specify join conditions for all three source tables in the Join Query transform and map the source columns to the target output schema:
10. Place another Query transform and name it Mapping. Link the Join Query transform to the Mapping Query transform and map the source columns to the target schema columns, which match the table definition of the DWH DimGeography table. Map one extra column, TERRITORYID, from source to target:
11. In the Mapping Query transform, place NULL in the mapping sections for the columns that we are not going to populate values for.
3. Now, we need to create the final load processes that will move the data from the stage transformation tables into the target DWH dimension tables. Perform these steps:
1. Open the WF_load workflow, add two workflow objects, WF_Load_DimSalesTerritory and WF_Load_DimGeography, and link them together to run sequentially.
2. Open WF_Load_DimSalesTerritory and create a dataflow object, DF_Load_DimSalesTerritory, inside it.
3. This dataflow will compare the source data to the target DimSalesTerritory dimension table data and will produce the set of updates for the existing records whose values have changed in the source system, or will insert records with key column values that do not exist in the dimension table yet:
4. In the Query transform, simply map all source columns from the DimSalesTerritory transformation table to the output schema.
5. Inside the Table_Comparison object, define the target DWH DimSalesTerritory as the comparison table and specify SalesTerritoryAlternateKey as the key column and three compare columns, SalesTerritoryRegion, SalesTerritoryCountry, and SalesTerritoryGroup, as shown here:
6. As the final step in the dataflow, before inserting data into the target table object, the Key_Generation transform helps you populate the SalesTerritoryKey column of the target dimension table with sequential surrogate keys. Surrogate keys are keys usually generated during the population of DWH tables. A surrogate key column identifies the uniqueness of the record. This way, you have a single column with a unique ID that you can use instead of referencing the multiple columns in the table that define the uniqueness of the record:
7. By default, all dimension tables in the DWH database we are using have identity columns. In SQL Server, the identity column feature allows you to delegate the process of surrogate key creation to the SQL Server database. You simply insert a record without specifying a value for the identity column, and SQL Server populates the field for you with a sequential unique number. In our case, we want to have control over the key creation ourselves, to be able to generate the keys in the ETL before inserting the data. To do this, we have to enable IDENTITY_INSERT before inserting the records and disable it after the insert. Otherwise, you will receive an error message from SQL Server informing you that you cannot populate identity columns with values, as this is done automatically by the database engine.
To enable the insertion of surrogate keys into identity columns from Data Services, open the Target Table Editor of the DimSalesTerritory table and populate the Pre-Load Commands and Post-Load Commands tabs with the following two commands, correspondingly:
set identity_insert dimsalesterritory on
set identity_insert dimsalesterritory off
8. Now, let's create the second load process, populating the DimGeography dimension table. Open the DF_Load_DimGeography dataflow in the workspace area.
9. The dataflow will have the same structure as the previous one, except that we will look up SalesTerritoryKey in the already populated DimSalesTerritory dimension table:
10. In the Query transform, map all columns from the stage Transform.DimGeography table and the SalesTerritoryKey column from the DWH DimSalesTerritory table to the output schema. For the join condition, specify the following:
DIMGEOGRAPHY.TERRITORYID = DIMSALESTERRITORY.SALESTERRITORYALTERNATEKEY
11. The Mapping transform output schema definition matches the target table definition, and here we will finally drop the TERRITORYID column from the mappings, as we do not need it anymore.
12. Specify the following settings in the Table_Comparison transform:
13. In the Key_Generation transform, specify DWH.DBO.DIMGEOGRAPHY as the table name and GEOGRAPHYKEY as the generated key column.
14. Also, do not forget to define the commands in the Pre-Load and Post-Load target table settings to switch IDENTITY_INSERT on and switch it off after the insert is complete. Use the following commands:
set identity_insert dimgeography on
set identity_insert dimgeography off
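The combined effect of the Table_Comparison and Key_Generation transforms can be sketched in a few lines of Python. This is an illustration only, with made-up miniature data; it is not how Data Services implements the transforms internally:

```python
# Existing DimSalesTerritory rows: (surrogate key, alternate key, region).
target = [
    (1, 1, "Northwest"),
    (2, 2, "Northeast"),
]
# Rows arriving from the transformation stage (surrogate key still NULL).
incoming = [
    {"alt_key": 2, "region": "North East"},  # changed value -> update
    {"alt_key": 12, "region": "Russia"},     # unknown key   -> insert
]

# Table_Comparison: index the target by its key column.
existing = {alt: (sk, region) for sk, alt, region in target}
# Key_Generation: continue from the current maximum surrogate key.
next_key = max(sk for sk, _, _ in target) + 1

inserts, updates = [], []
for row in incoming:
    if row["alt_key"] not in existing:
        inserts.append((next_key, row["alt_key"], row["region"]))
        next_key += 1
    elif existing[row["alt_key"]][1] != row["region"]:
        updates.append(row)
```

Rows with an unknown key column value become inserts and receive a fresh sequential surrogate key; rows whose key exists but whose compare columns differ become updates, which matches the behavior described in step 3.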
How it works…
Let's review the different aspects of the example we just implemented in the previous steps.
Mapping
Before you start ETL development in Data Services, you have to define the mapping between the source columns of the operational database tables and the target columns of the data warehouse tables, plus the transformation rules for the migrated data, if required. At this step, you also have to identify the dependencies between the source data structures to correctly identify the types of join required to extract the correct dataset.
Target column | Source table | Source column | Transformation rule
SalesTerritoryKey | | NULL | Generated surrogate key in DWH
SalesTerritoryAlternateKey | SalesTerritory | TerritoryID | Direct mapping
SalesTerritoryRegion | SalesTerritory | Name | Direct mapping
SalesTerritoryCountry | CountryRegion | Name | Direct mapping
SalesTerritoryGroup | SalesTerritory | Group | Direct mapping
SalesTerritoryImage | | | Not migrating
Table 1: Mappings for the DimSalesTerritory dimension
Here, you can find the mapping table for the DimGeography dimension:
Target column | Source table | Source column | Transformation rule
GeographyKey | | NULL | Generated surrogate key in DWH
City | Address | City | Direct mapping
StateProvinceCode | StateProvince | StateProvinceCode | Direct mapping
StateProvinceName | StateProvince | Name | Direct mapping
CountryRegionCode | CountryRegion | CountryRegionCode | Direct mapping
EnglishCountryRegionName | CountryRegion | Name | Direct mapping
SpanishCountryRegionName | | NULL | Not migrated
FrenchCountryRegionName | | NULL | Not migrated
PostalCode | Address | PostalCode | Direct mapping
SalesTerritoryKey | DimSalesTerritory | SalesTerritoryKey | Lookup
IpAddressLocator | | NULL | Not migrated
Table 2: Mappings for the DimGeography dimension
The majority are direct mappings, which means that we do not change the migrated data and move it as is from source to target. The information in these mapping tables is used primarily in the Query transforms inside the dataflows, to join the source tables together and map the columns from the source to the target schema:
Dependencies
The next step is to define the dependencies between the populated target tables, to understand in which order the ETL processes loading data into them should be executed. The preceding diagram shows that SalesTerritoryKey from the DimSalesTerritory dimension table is used as a reference key in the DimGeography dimension table. This means that the ETL processes populating each of these tables cannot be executed in parallel and should run sequentially: when we populate the DimGeography table, we require the information in DimSalesTerritory to be already updated.
Development
After defining the mappings and transformation rules and making the decision about the execution order of the ETL elements, you can finally open the Designer application and start developing the ETL job.
Note
Naming conventions for the workflow, dataflow, script, and transformation objects, as well as for the staging table objects, are very important. They allow you to easily read the ETL code and understand what resides in one table or another and what type of operation is performed by a specific dataflow or by a transformation object within a dataflow.
Our ETL job contains three main stages, defined by the three workflow objects created in the job's workspace. Each of these workflows plays the role of a container for the underlying workflow objects containing dataflows:
The first workflow container, WF_extract, contains the processing units that extract the data from the OLTP system into the DWH staging area. There are several advantages to this approach compared to extracting and transforming data within the same dataflow. The main one is that, by copying the data as is into the staging area, you access the production OLTP system only once, creating a consistent snapshot of the OLTP data at a specific time. You can query the extracted tables in staging as many times as you want without affecting the live production system's performance. We do not apply any transformations or mapping logic in these extraction processes; we simply copy the contents of the source tables as is.
The second workflow container, WF_transform, selects the data from the stage tables, assembles it, and transforms it to match the target table definition. At this stage, we leave all surrogate key columns empty and NULL-out the columns for which we are not going to migrate values.
Note
In the DF_Transform_DimGeography dataflow, the target template table does not exactly match the DWH DimGeography table definition. We keep one extra column from the source, TERRITORYID, to reference another dimension table, DimSalesTerritory, at the load stage. Without this column, we would not be able to link these two dimension tables together.
The third workflow container, WF_load, loads the transformed datasets into the target DWH dimension tables. Another important operation this step performs is generating surrogate keys for the new records to be inserted into the target dimension table.
Another important decision you have to make when you populate dimension tables using the Table_Comparison transform is which set of keys defines a new record in the target dimension table and which columns you are checking for updated values.
In this example, we made the decision to select only two comparison columns, PostalCode and SalesTerritoryKey. Whenever there is a new location (City + State + Country), the record is inserted, and if the location exists, Data Services checks whether the source record coming from the OLTP system contains new values in the PostalCode or SalesTerritoryKey column. If yes, then the existing record in the target dimension table is updated.
Note
Note that in the transformation processes we developed, we did not generate DWH surrogate keys for our new records. The main goal of the transformation process is to assemble the dataset so that it matches the target table definition and to apply all the required transformations if the source data does not comply with the data warehouse requirements.
Execution order
All three steps, or three workflows, WF_extract, WF_transform, and WF_load, run sequentially, one after another. The next workflow starts execution only after successful completion of the previous one.
The child objects of both WF_extract and WF_transform run in parallel as, at those stages, we are not trying to link the migrated datasets to each other with reference keys.
The final load stage, WF_load, contains two workflow objects that run sequentially. First, we fully populate and update the DimSalesTerritory dimension; then, after it's done, we can safely reference it when populating the DimGeography table.
Testing ETL
The best way to test ETL is to make changes to the source system, run the ETL job, and check the contents of the target data warehouse tables.
Preparing test data to populate DimSalesTerritory
Let's make some changes to the source data. We will add a new sales territory in the Sales.SalesTerritory table and a new state in the Person.StateProvince table. Run the following code in SQL Server Management Studio:
-- Insert new records into source OLTP tables to test ETL
-- populating DimSalesTerritory
USE [AdventureWorks_OLTP]
GO
-- Insert new sales territory
INSERT INTO [Sales].[SalesTerritory]
    ([Name], [CountryRegionCode], [Group], [SalesYTD], [SalesLastYear]
    ,[CostYTD], [CostLastYear], [rowguid], [ModifiedDate])
VALUES
    ('Russia', 'RU', 'Russia', 9000000.00, 0.00
    ,0.00, 0.00, NEWID(), GETDATE());
-- Insert new state
INSERT INTO [Person].[StateProvince]
    ([StateProvinceCode], [CountryRegionCode], [IsOnlyStateProvinceFlag]
    ,[Name], [TerritoryID], [rowguid], [ModifiedDate])
VALUES
    ('CR', 'RU', 1, 'Crimea', 12, NEWID(), GETDATE());
GO
Preparing test data to populate DimGeography
To update the source tables, run the following script in SQL Server Management Studio. This creates a new address with a new city that does not yet exist in the DimGeography dimension. You could skip this step as, by default, the OLTP database has multiple address records that do not have corresponding rows in the target DWH dimension, but to make the test more transparent, it is recommended that you create your own new record in the source system:
-- Insert new records into source OLTP tables to test ETL
-- populating DimGeography dimension
USE [AdventureWorks_OLTP]
GO
-- Insert new address
INSERT INTO [Person].[Address]
    ([AddressLine1], [AddressLine2], [City], [StateProvinceID]
    ,[PostalCode], [SpatialLocation], [rowguid], [ModifiedDate])
VALUES
    ('10 Suvorova St.', NULL, 'Sevastopol', 182, '299011', NULL, NEWID(), GETDATE());
GO
Now, execute the job and query both dimension tables. There is one new row inserted in DimSalesTerritory, with SalesTerritoryKey = 12, and multiple records were inserted into and updated in the DimGeography table.
Among the new records in DimGeography, you should be able to see the record for the new city of Sevastopol that we inserted manually with the help of the preceding script.
Note
If you run the job again without making changes to the source system's data, it should not create or update any records in the target dimension tables, as all the changes have already been propagated from OLTP to DWH by the first job run. The main object in our ETL driving the change tracking is the Table_Comparison transform.
Using a continuous workflow
In this recipe, we will take a close look at one of the workflow object features that controls how the workflow runs within a job.
How to do it…
1. Create a job with a single workflow inside, named WF_continuous. Create a single global variable, $g_count, of the integer type at the job level context.
2. Open the workflow properties by right-clicking on the workflow object and selecting the Properties… option from the context menu, and change the workflow execution type to Continuous on the General workflow properties tab:
3. Exit the workflow properties by clicking on OK. See how the icon of the workflow object changes when its execution type is changed from Regular to Continuous:
4. Go to Local Object Library | Custom Functions.
5. Right-click on the Custom Functions list and select New from the context menu.
6. Name the custom function fn_check_flag and click on Next to open the custom function editor.
7. Create the following parameters and variables:
Variable/parameter | Description
$p_Directory | Input parameter of the varchar(255) type to store the directory path value
$p_File | Input parameter of the varchar(255) type to store the filename value
$l_exist | Local variable of the integer type to store the result of the file_exists() function
8. Add the following code to the custom function body:
$l_exist = file_exists($p_Directory || $p_File);
if ($l_exist = 1)
begin
    print('Check: file exists');
    return 0;
end
else
begin
    print('Check: file does not exist');
    return 1;
end
Your custom function should look like this:
9. Open the workflow properties again to edit the continuous options using the Continuous Options tab.
10. On the Continuous Options tab, tick the When the result of the function is zero checkbox in the Stop section at the bottom and input the following line into the empty box: fn_check_flag($l_Directory, $l_File).
11. Click on OK to exit the workflow properties and save the changes.
12. Open the workflow in the main workspace and create two local variables in the Variables and Parameters window: $l_Directory of the varchar(255) type and $l_File of the varchar(255) type.
13. Create a single script object within the workflow and add the following code to it:
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');
14. Save and validate the job to make sure that there are no errors.
15. Run the job and, after a few workflow execution cycles, add the flag.txt file to the C:\AW\Files\ directory to stop the continuous workflow execution sequence and the job itself.
How it works…
The Continuous execution type allows you to run the workflow object an indefinite number of times in a loop. There are many restrictions on using the continuous workflow execution mode. Some of them are as follows:
You cannot nest a continuous workflow in another workflow object
Some dataflow transforms are not available for use when placed under a continuous workflow hierarchy structure
A continuous workflow object can be used only in a batch job
The main purpose of the continuous workflow is not to substitute the while loop, as you might have thought at first glance, but to save memory and processing resources for tasks that have to be executed again and again, indefinitely, in non-stop mode or for a very long period of time. Data Services saves resources by initializing, and optimizing for execution, all the underlying structures such as dataflows, datastores, and memory structures required for dataflow processing only once, when the continuous workflow object is executed for the first time.
The Release resources section inside the Continuous Options tab controls how often the resources used by the underlying objects are released and reinitialized.
It is not possible to specify the exact number of cycles for the continuous workflow directly. The only option to add stop logic is to write a custom function that is executed after every cycle; when it returns zero, that value stops the continuous workflow execution sequence.
In the preceding recipe, we created a custom function that checks for the presence of the file in the specified folder. If the file appears there, it returns 0. The job will run indefinitely until the file appears in the folder, the job itself is killed manually, or the job server crashes.
To check for the existence of the file, the file_exists() function is used. It returns 1 if the file exists and 0 if it does not. The function accepts a single parameter: a full filename that includes the path. As in our case we want to stop the continuous workflow execution when the file appears, we had to invert the value returned by file_exists(), and we created a custom function for that.
We added the sleep() function to imitate the execution of the workflow, so that it would be easy to place the file while the execution cycle is still running. The sleep() function accepts an integer parameter in milliseconds, so 10000 is equal to 10 seconds.
The global variable $g_count was added to track the number of cycles executed in the continuous workflow sequence.
Another interesting fact about how a continuous workflow behaves is that it always executes one more cycle after the stop function returns the zero value. Look at the following screenshot:
See that, in spite of the fact that we placed the flag.txt file during the third execution cycle and the stop function found it and returned a zero value (see the Check: file exists print message in the trace log), the fourth cycle was still executed.
Let's try another test to confirm this. Place the flag.txt file before the job is executed and then run it. This is what you see in the trace log file:
You can see that, after the custom function returned 0 at the end of the first cycle, the continuous workflow was still executed a second time.
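The stop behavior observed in the trace log, where the stop function is evaluated after each cycle but one extra cycle still runs, can be modeled in Python. This is an illustration of the observed behavior, not Data Services internals; a temporary directory stands in for C:\AW\Files\:

```python
import os
import tempfile

flag_dir = tempfile.mkdtemp()
flag_path = os.path.join(flag_dir, "flag.txt")

def fn_check_flag(path):
    # Inverts file_exists(): 0 means "stop", 1 means "keep running".
    return 0 if os.path.exists(path) else 1

g_count = 0
stop_detected = False
while True:
    g_count += 1                       # one cycle of the workflow body
    print(f"Execution #{g_count}")
    if stop_detected:
        break                          # the extra cycle after the stop signal
    if g_count == 3:                   # place flag.txt during the 3rd cycle
        open(flag_path, "w").close()
    if fn_check_flag(flag_path) == 0:  # stop function, checked after the cycle
        stop_detected = True           # the next cycle still executes

os.remove(flag_path)
```

The flag file is placed during the third cycle, the stop function detects it, and a fourth cycle still runs before the loop ends, matching the trace log above.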
There's more…
You have to understand that continuous workflow usage is very limited in real life, because of the functional restrictions and also because of the nature of the loop in which the workflow is executed. In the majority of cases, the while loop object is the preferable option to run a workflow or an underlying processing sequence of objects.
Peeking inside the repository – parent-child relationships between Data Services objects
With the introduction of workflow objects, which allow the nesting and grouping of objects, you can see that the ETL code executed within a Data Services job is a hierarchical structure of objects that can be quite complex. Just imagine: real-life jobs can have hundreds of workflows in their structure and twice as many dataflows.
In this recipe, we will look under the hood of Data Services to see how it stores object information (our ETL code) in the local Data Services repository. The techniques learned in this recipe can help you browse the hierarchy of objects within your local repository with the help of the database SQL language toolset. This often proves to be a very convenient method to use.
Getting ready
You will not create any jobs or other objects in the Data Services Designer, as we are just going to browse the ETL code and run a few queries in SQL Server Management Studio.
How to do it…
Follow these simple steps to access the contents of the Data Services local repository in this recipe:
1. Start SQL Server Management Studio and connect to the DS_LOCAL_REPO database created in Chapter 2, Configuring the Data Services Environment.
2. Query the dbo.AL_PARENT_CHILD table for references between Data Services objects and additional info.
3. Query the dbo.AL_LANGTEXT table for extra object properties and script object contents.
How it works…
Querying object-related information from the Data Services repository could be useful if you want to build a report on ETL metadata that does not exist out of the box in Data Services. It could also be useful when troubleshooting potential problems with your ETL code. We will take a look at different scenarios and briefly explain each case.
Get a list of object types and their codes in the Data Services repository
Use the following query:
select
  descen_obj_type, descen_obj_r_type, count(*)
from
  dbo.al_parent_child
group by
  descen_obj_type, descen_obj_r_type;
The main table of the reference is the AL_PARENT_CHILD table. It contains the full hierarchy of the objects, starting from the job object level and finishing with the table object level. The preceding query shows all the possible object types that Data Services registers in the repository.
Display information about the DF_Transform_DimGeography dataflow
Use this query to get the information:
select *
from
  dbo.al_parent_child
where
  descen_obj = 'DF_Transform_DimGeography';
All columns and their values are explained in this table:
Column name | Value | Description
PARENT_OBJ | WF_Transform_DimGeography | This is the name of the parent object that DF_Transform_DimGeography belongs to. See the following figure.
PARENT_OBJ_TYPE | WorkFlow | This is the type of the parent object.
PARENT_OBJ_R_TYPE | 0 | This is the type code of the parent object.
PARENT_OBJ_DESC | No description available | This is the description of the parent object. This is what you input in the Description field inside the workflow properties window in the Designer. If empty, Data Services uses "No description available" in the repo table.
PARENT_OBJ_KEY | 175 | This is the internal parent object key (ID).
DESCEN_OBJ | DF_Transform_DimGeography | This is the object name we are looking up information for.
DESCEN_OBJ_TYPE | DataFlow | This is the type of the object.
DESCEN_OBJ_R_TYPE | 1 | This is the type code of the object.
DESCEN_OBJ_DESC | No description available | This is the contents of the Description field of the dataflow properties in the Designer. It is empty for this specific dataflow.
DESCEN_OBJ_USAGE | NULL | This indicates whether the object is a source or a target within a dataflow. As the object itself is a dataflow, this field is not populated.
DESCEN_OBJ_KEY | 174 | This is the internal object key (ID).
DESCEN_OBJ_DS | NULL | This indicates which datastore the object belongs to. As the object we are looking up is a dataflow, this field is not populated.
DESCEN_OBJ_OWNER | NULL | This is the database owner of the object. It is not applicable to dataflow objects either.
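Because AL_PARENT_CHILD stores one row per parent-child pair, a whole job hierarchy can be walked with a recursive query. Below is a minimal sketch that imitates the table in an in-memory SQLite database; the object names mirror this chapter's examples, and the real repository table has many more columns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE al_parent_child (parent_obj TEXT, descen_obj TEXT)")
# A tiny imitation of the job -> workflow -> dataflow -> table hierarchy
con.executemany(
    "INSERT INTO al_parent_child VALUES (?, ?)",
    [
        ("Job_DWH_DimGeography", "WF_Transform_DimGeography"),
        ("WF_Transform_DimGeography", "DF_Transform_DimGeography"),
        ("DF_Transform_DimGeography", "SALESTERRITORY"),
    ],
)

# Recursive CTE: all descendants of the job, with their depth in the hierarchy
rows = con.execute("""
    WITH RECURSIVE tree(obj, depth) AS (
        SELECT 'Job_DWH_DimGeography', 0
        UNION ALL
        SELECT pc.descen_obj, tree.depth + 1
        FROM al_parent_child pc JOIN tree ON pc.parent_obj = tree.obj
    )
    SELECT obj, depth FROM tree ORDER BY depth
""").fetchall()
for obj, depth in rows:
    print("  " * depth + obj)
```

The same WITH RECURSIVE pattern works in SQL Server (without the RECURSIVE keyword) against the real dbo.AL_PARENT_CHILD table.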
Display information about the SalesTerritory table object
Use the following query:
select
  parent_obj, descen_obj_desc, descen_obj_usage,
  descen_obj_key, descen_obj_ds, descen_obj_owner
from
  dbo.al_parent_child
where
  descen_obj = 'SALESTERRITORY';
The result is in the following screenshot:
From the preceding screenshot, you can see that two different objects with the same name, SALESTERRITORY, exist in the Data Services repository, with unique keys 37 and 38.
The one with OBJ_KEY 37 is imported in the OLTP datastore and belongs to the Sales schema. It is used only in DF_Extract_SalesTerritory, as it has only one record with a parent object of that name.
The SALESTERRITORY object with OBJ_KEY 38 is a stage area table: it is imported into the DS_STAGE datastore and belongs to the Extract database schema. It has two different parent objects, as in Designer it was placed into two different dataflows: as a target table object in DF_Extract_SalesTerritory (you can see it from the DESCEN_OBJ_USAGE column) and as a source table object in DF_Transform_DimSalesTerritory.
See the contents of the script object
The one thing you have probably noticed already from the result of the very first query in this recipe is that Data Services does not have a script object type.
As you probably remember, script objects do not have their own context in Data Services and operate in the context of the workflow object they belong to. That is why you have to query the information about workflow properties using another table, AL_LANGTEXT, to find the script contents in the Data Services repository.
Use the following query:
select *
from dbo.al_langtext txt
join dbo.al_parent_child pc
  on txt.parent_objid = pc.descen_obj_key
where
  pc.descen_obj = 'WF_continuous';
We are extracting information about the script object created in the WF_continuous workflow.
All workflow properties, along with the contents of all scripts that belong to the workflow, are stored in a plain text format.
In this table, we are only interested in two columns: SEQNUM, which represents the sequence number of the properties text row, and TEXTVALUE, which stores the properties text row itself.
See the concatenated version of the information stored in the TEXTVALUE column of the AL_LANGTEXT repository table here:
AlGUIComment("ActaName_1" = 'RSavedAfterCheckOut', "ActaName_2" = 'RDate_created', "ActaName_3" = 'RDate_modified', "ActaValue_1" = 'YES', "ActaValue_2" = 'Sat Jul 04 16:52:33 2015', "ActaValue_3" = 'Sun Jul 05 11:18:02 2015', "x" = '-1', "y" = '-1')
CREATE PLAN WF_continuous::'7bb26cd4-3e0c-412a-81f3-b5fdd687f507' ()
DECLARE
  $l_Directory VARCHAR(255);
  $l_File VARCHAR(255);
BEGIN
AlGUIComment("UI_DATA_XML" = '<UIDATA><MAINICON><LOCATION><X>0</X><Y>0</Y></LOCATION><SIZE><CX>216</CX><CY>-179</CY></SIZE></MAINICON><DESCRIPTION><LOCATION><X>0</X><Y>-190</Y></LOCATION><SIZE><CX>200</CX><CY>200</CY></SIZE><VISIBLE>0</VISIBLE></DESCRIPTION></UIDATA>', "ui_display_name" = 'script', "ui_script_text" = '$l_Directory='C:\\AW\\Files\\';
$l_File='flag.txt';
$g_count=$g_count+1;
print('Execution #'||$g_count);
print('Starting '||workflow_name()||'…');
sleep(10000);
print('Finishing '||workflow_name()||'…');', "x" = '116', "y" = '-175')
BEGIN_SCRIPT
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = ($g_count + 1);
print(('Execution #' || $g_count));
print((('Starting ' || workflow_name()) || '…'));
sleep(10000);
print((('Finishing ' || workflow_name()) || '…'));
END
END
SET("loop_exit" = 'fn_check_flag($l_Directory, $l_File)', "loop_exit_option" = 'yes', "restart_condition" = 'no', "restart_count" = '10', "restart_count_option" = 'yes', "workflow_type" = 'Continuous')
The first highlighted section of the preceding code is the declaration section of the local workflow variables created for WF_continuous. The second highlighted section marks the text that belongs to the underlying script object. You can see that the script object is not considered by Data Services as a separate object entity and is just a property of the parent workflow object. To compare, take a look at what the script contents look like in Designer:
$l_Directory = 'C:\AW\Files\';
$l_File = 'flag.txt';
$g_count = $g_count + 1;
print('Execution #' || $g_count);
print('Starting ' || workflow_name() || '…');
sleep(10000);
print('Finishing ' || workflow_name() || '…');
You can see that the formatting of the same information stored in the TEXTVALUE field is a bit different. So, be careful when extracting and parsing this data from the local repository.
Finally, the third highlighted section marks the workflow properties configured with the Properties… context menu option in Designer.
Note
There is another version of the AL_LANGTEXT table that contains the same properties information but in the XML format. It is the AL_LANGXMLTEXT table.
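If you do parse this data, a small helper can pull the script text back out of a concatenated TEXTVALUE string. A minimal sketch; the sample string is a shortened imitation of the AlGUIComment line shown earlier, not real repository output, and real rows may need more careful handling of escaping:

```python
import re

def extract_script_text(textvalue):
    """Pull the ui_script_text value out of a concatenated TEXTVALUE string.

    Assumes the value is delimited as "ui_script_text" = '...', followed by
    the next "property", as in the AlGUIComment lines shown earlier."""
    match = re.search(r'"ui_script_text"\s*=\s*\'(.*?)\',\s*"', textvalue, re.S)
    return match.group(1) if match else None

# Shortened imitation of a TEXTVALUE row:
sample = ("AlGUIComment(\"ui_display_name\" = 'script', "
          "\"ui_script_text\" = '$g_count = $g_count + 1;\n"
          "print(\\'Execution #\\' || $g_count);', \"x\" = '116')")
script = extract_script_text(sample)
print(script)
```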
Chapter 6. Job – Building the ETL Architecture
In this chapter, we will cover the following topics:
Projects and jobs – organizing ETL
Using object replication
Migrating ETL code through the central repository
Migrating ETL code with export/import
Debugging job execution
Monitoring job execution
Building an external ETL audit and audit reporting
Using built-in Data Services ETL audit and reporting functionality
Auto Documentation in Data Services
Introduction
In this chapter, we will go up to the job level and review the steps in the development process that make a successful and robust ETL solution. All recipes presented in this chapter fall into one of three categories: ETL development, ETL troubleshooting, and ETL reporting. These categories include design techniques and processes usually implemented and executed sequentially within the ETL life cycle.
Here, you can see which topics fall under which category.
Developing ETL:
Projects and jobs – organizing ETL
Using object replication
Migrating ETL code through the central repository
Migrating ETL code with export/import
The developing category discusses issues faced by ETL developers on a daily basis when they work on designing and implementing an ETL solution in Data Services.
Troubleshooting ETL:
Debugging job execution
Monitoring job execution
The troubleshooting category explains in detail the troubleshooting techniques that can be used in Data Services Designer to troubleshoot the ETL code.
Reporting on ETL:
Building an external ETL audit and audit reporting
Using built-in Data Services ETL audit and reporting functionality
Auto Documentation in Data Services
The reporting category reviews the methods used to report on ETL metadata and also explains the Auto Documentation feature available in Data Services to quickly generate and export documentation for the developed ETL code.
Projects and jobs – organizing ETL
Projects are a simple and great mechanism to group your ETL jobs together. They are also mandatory components of ETL code organization for various Data Services features, such as Auto Documentation and the batch job configuration available in the Data Services Management Console.
Getting ready
There are no preparation steps. You have everything you need in your local repository, which has already been created. In this recipe, we will use Job_DWH_DimGeography, developed in Chapter 5, Workflow – Controlling Execution Order, to populate the DWH dimension tables DimSalesTerritory and DimGeography.
How to do it…
To create a project object in Data Services, follow these steps:
1. Open the Local Object Library window and choose the Projects tab.
2. Right-click in the empty space of the Projects tab and select New from the context menu. The Project – New window appears on the screen.
3. Input the project name as DWH_Dimensions in the Project Name field.
4. Open the Project Area window using the Project Area button on the toolbar at the top:
5. Go to Project Area | Designer. You will only see the contents of one selected project. To select the project or make it visible in the Project Area | Designer window, go to Local Object Library | Projects and either double-click on the project you are interested in (in our case, there is only one project created) or choose Open from the context menu of the selected project.
6. To add the job to the project, drag and drop the selected job from Local Object Library | Jobs into the Project Area | Designer tab window, or right-click on the job object in the Local Object Library and choose the Add To Project option from the context menu. Add Job_DWH_DimGeography, created in the previous recipe, to the DWH_Dimensions project:
How it works…
This is all you need to do to create a project and place jobs in it. It is a very simple process that, in fact, brings you a few extra advantages that you can use in ETL development. The process also reveals new functionality not accessible otherwise in Data Services. Let's take a look at some of it.
Hierarchical object view
Available in Project Area | Designer, this view allows you to quickly access any child object within a job. In the following screenshot, the expanding tree shows workflow, dataflow, and transformation objects; by clicking on any of them, you open them in the main workspace window:
History execution log files
These log files are available only if the job was assigned to a project. The Project Area | Log tab allows you to see and access all available log files (trace, performance, and error logs) kept by Data Services for specific jobs:
Executing/scheduling jobs from the Management Console
Yes, this option is available only for jobs that belong to a project.
Use http://localhost:8080/DataServices to start your Data Services Management Console.
Log in to the Management Console using the etluser account created in the Configuring user access recipe of Chapter 2, Configuring the Data Services Environment. It is the same user you use to connect to Data Services Designer.
Go to Administrator | Batch | DS4_REPO.
If you open the Batch Job Configuration tab, you will see that only Job_DWH_DimGeography is available for being executed/scheduled/exported for execution, as it was the only job in our local repository that we added to a created project:
As you can see, projects are the containers for your jobs, allowing you to organize and display your ETL code and perform additional tasks from the Management Console application. Keep in mind that you cannot add anything other than job objects directly at the project level.
Using object replication
Data Services allows you to instantly create an exact replica of almost any object type you are using in ETL development. This feature is useful for creating new versions of an existing workflow or dataflow for testing, or just for creating backups at the object level.
How to do it…
We will replicate a job object using these steps:
1. Go to Local Object Library | Jobs.
2. Right-click on the Job_DWH_DimGeography job and select Replicate from the context menu:
3. A copy of the job with a new name is created in the Local Object Library:
How it works…
All objects in Data Services can be identified as either reusable or not reusable.
A reusable object can be used in multiple locations; that is, a table object imported in a datastore can be used as a source or target object in different dataflows. Nevertheless, all these dataflows will reference the same object, and if it is changed in one place, it changes everywhere it is used.
Not reusable objects represent instances of a specific object type. For example, if you copy and paste a script object from one workflow to another, these two copies will be two different objects, and by changing one of them, you are not making changes to the other.
Let's take another example: a dataflow object. Dataflows are reusable objects. If you copy and paste the selected dataflow object into another workflow, you create a reference to the same dataflow object.
To make a copy of a reusable object so that the copy does not reference the original object, use the replication feature in Data Services. Note that the replicated object cannot have the same name as the original object it has been replicated from. That is because for reusable objects such as workflows and dataflows, their names uniquely identify the object.
Note
The rule of thumb for checking whether an object type is reusable or not is to check whether it exists in the Local Object Library panel. All objects that can be found on the Local Object Library panel tabs are reusable objects, except Projects, as a project is not part of executable ETL code. Instead, it is a location folder that is used to organize job objects. Nevertheless, you cannot create two projects with the same name, as you can with script objects.
The following table shows which object types can be replicated in Data Services and how the replication process behaves for each one of them. All of these are reusable object types.
Job | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original job name>
Workflow | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original workflow name>
Dataflow | A new object is automatically created in the Local Object Library, named Copy_<ID>_<original dataflow name>
File format | A new File Format Editor window is opened. The new name is already defined as Copy_<ID>_<original file format name>, but you can change it by entering a new value in the name field
Custom function | A new Custom Function window is opened. You have to select a new name for the replicated function
The replication process is a convenient and easy way to perform object-level backups. All you have to do to create a copy of an object before editing it is click on the Replicate option in the context menu of the object you are replicating.
It is also an easy way to test code changes before you decide to update the production version of the ETL.
For example, if you want to see how your dataflow object behaves after you change the properties of the Table_Comparison transform inside it, you can perform the following sequence of steps:
1. Replicate the dataflow and set it up to run separately within a test job.
2. Run the test job and check the output dataset to make sure that it generates the expected result.
3. Rename the original dataflow by adding an _archive or _old suffix to it.
4. Rename the new replicated version to the original dataflow name.
5. Replace the archived dataflow object everywhere it is used with the new version.
To see all the parent objects that a specific object belongs to (in other words, all the locations where the specific object was placed), you can use one of the following methods:
1. Choose the DIMGEOGRAPHY object from the DWH datastore in the Local Object Library. Right-click on it and choose the View Where Used option from the context menu. The parent objects that the table object belongs to are displayed in the Information tab of the Output window:
You can also see the number of parent objects (locations) for an object right away in the Local Object Library, in the Usage column available next to the object name. This is useful information that can help you identify unused or "orphaned" objects.
2. Pick the object of interest in the workspace area (for example, a dataflow placed within a workflow workspace or a table object placed in a dataflow), right-click on it, and choose View Where Used from the context menu. The list of parent objects will appear in the Output | Information window:
3. Finally, it is possible to check where the currently opened object is used. When you have the object opened in the workspace area and do not have the ability to right-click on it, instead of going to the Local Object Library lists in order to find the object, try just clicking on the View Where Used button on the top tool menu panel:
Note
Remember that it displays the used-locations list for the object currently displayed on the active tab of the main workspace area.
Migrating ETL code through the central repository
In this recipe, we will take a brief look at the aspects of working in a multiple-user development environment and how Data Services accommodates the need to migrate ETL code between local repositories belonging to different ETL developers.
Getting ready
To use all the functionality available in Data Services for working in a multiuser development environment, we are missing a very important component: the configured central repository. So, to get ready, and before we explore this functionality, we have to create and deploy the central repository into our Data Services environment.
Perform all the following steps to create, configure, and deploy the central repository:
1. Open SQL Server Management Studio and connect to the SQLEXPRESS server engine.
2. Right-click on Databases and choose the New Database… option from the context menu.
3. Name the new database DS_CENTRAL_REPO and keep all its parameters at their default values.
4. Start the SAP Data Services Repository Manager application.
5. Choose the Repository type as Central and specify the connectivity settings to the new database, DS_CENTRAL_REPO. When you finish, click on the Create button to create the central Data Services repository objects in the selected database:
6. The process of creating a repository can take a few minutes. If it is successful, you should see the following output on the screen:
7. Now, we need to register our newly created central repository within the Data Services and Information Platform Services (IPS) configuration. Start the Central Management Console web application by going to http://localhost:8080/BOE/CMC and log in to the administrator account. It is the same account that was created during the installation of Data Services (see Chapter 2, Configuring the Data Services Environment, for details).
8. Choose the Data Services link on the home screen to open the Data Services repository configuration area.
9. Right-click on the Repositories folder or in the empty area of the main window and choose the Configure repository option from the context menu:
10. Name the newly configured repository DS4_CENTRAL and input the connectivity settings. After that, click on Test Connection to see the successful connection message:
11. Close the repository properties window. You should see the new non-secured central repository, DS4_CENTRAL, displayed on the screen along with the local repository DS4_REPO:
12. Right-click on DS4_CENTRAL and choose the User Security option from the context menu.
13. Choose Data Services Administrator Users and click on the Assign Security button.
14. On the Assign Security window, go to the Advanced tab and click on the Add/Remove Rights link.
15. On the Add/Remove Rights window, choose Application | Data Services Repository and select/grant the following options on the right-hand side under the Specific Rights for Data Services Repository section:
16. Click on OK to save the changes and close the User Security window.
17. The final step of configuration is to specify the central repository in your Designer configuration settings. This can be configured on the Designer Options window; open the Central Repository Connections section by going to Tools | Central Repositories… from the top menu.
18. In the Central Repository Connections section, click on the Add button to open the list of available repositories and select DS4_CENTRAL.
19. The Activate button activates the central repository from the list (if you add multiple ones, only one of them can be active at a time). You can also set the Reactivate automatically flag for the central repository to reactivate automatically when the Designer application restarts:
20. After performing all these steps, you should be able to activate the Central Object Library window (see the top tool panel), which looks almost exactly like the Local Object Library:
The preceding steps showed you how to create, configure, and deploy the central repository in Data Services. Next, we will see how you can actually use the central repository to migrate ETL code between different local repositories.
How to do it…
The central repository, or Central Object Library, is a location shared by different ETL developers to exchange and synchronize ETL code. In this recipe, we will copy an existing job into the Central Object Library and see which operations are available in Data Services on the objects stored there. Follow these steps:
1. Go to Local Object Library | Jobs.
2. Right-click on the Job_DWH_DimGeography job object and go to Add to Central Repository | Object and Dependents from the context menu.
3. Open the Central Object Library and see that the job object and all dependent objects, workflows, and dataflows have appeared on the Central Object Library tab sections. The ETL code for Job_DWH_DimGeography has been successfully migrated to the central repository.
4. Now, go to Local Object Library | Dataflows, find the DF_Load_DimGeography dataflow object, and double-click on it to open it in the workspace area for editing.
5. Rename the first Query transform from Query to Join and save the dataflow.
6. Now that you have changed the ETL code migrated from the local to the central repository, you can compare the two versions of your job and see the differences displayed in the Difference Viewer. Right-click on the job in the Local Object Library and go to Compare | Object and dependents to Central from the context menu:
7. When in the Central Object Library, you can do the same thing by clicking on a specific object and choosing the preferable option from the Compare context menu.
8. To get the version of the object from the central repository into a local one, select the DF_Load_DimGeography dataflow object in the Central Object Library, right-click on it, and go to Get Latest Version | Object from the context menu.
9. If you compare the local object version to the one stored in the central repository now, you will see that there is no difference, as the central object version has overwritten the local object version.
How it works…
The purpose of the central repository is to provide a centralized location to store ETL code.
The Central Object Library represents the contents of the central repository in the same way that the Local Object Library represents the contents of the local repository.
The ETL code stored in the central repository cannot be changed directly, as it can in the local repository. So, it provides a level of security, making sure that central repository changes can be tracked and that the history of all operations performed on its objects can be displayed.
Adding objects to and from the Central Object Library
If the object does not exist in the central repository, you can add it using the Add to Central Repository option from the object's context menu.
If the object already exists in the central repository, there are a few extra steps required to update it with a newer version from the local one. We will take a closer look at this functionality in the upcoming chapters.
Getting the object from the central to the local repository is much simpler. All you need to do is use the Get Latest Version option from the object's context menu in the Central Object Library. It does not matter whether the object exists in the local repository or not: it will be created or overwritten. Overwritten means that it will be deleted and copied again from the central repository.
Another important aspect of copying an object into, and from, the central repository is the availability of three modes: Object, Object and dependents, and With filtering:
Object: In this mode, it does not matter which operation you perform, whether it is getting the latest object version from central to local, comparing object versions between central and local, or just placing objects from local to central. The operation is performed on this object only.
Object and dependents: This operation affects all the child objects belonging to the selected object, their child objects, and so on, down to the lowest level of the hierarchy (which is usually the table/file format level).
With filtering: This mode is basically the same as Object and dependents, but with the ability to exclude specific objects from the affected objects. When chosen, a new window opens, allowing you to exclude specific objects from the hierarchy tree. Here is the result of choosing Add to Central Repository | With filtering for the Job_DWH_DimGeography object:
Comparing objects between the Local and Central repositories
Designer has a very useful Compare function available for all objects stored in the local or central repositories. When selected from the context menu of an object stored in a central repository location, there are two Compare methods available: Object to Local and Object with dependents to Local.
When selected from the context menu of an object stored in a local repository location, there are two Compare methods available: Object to Central and Object with dependents to Central.
The result is presented in the Difference Viewer window, which opens in the main workspace area in a separate tab and looks similar to the following screenshot:
This is an example of the Difference Viewer window. Note how we have only renamed the Query transform, yet the Difference Viewer shows the whole structure of the Join Query object as deleted, and on the Central tab, it shows the new Query transform structure. The Mapping and Links sections of the updated dataflow are also affected, as you can see in the preceding screenshot.
There's more…
I have not described one of the most important concepts of the central repository: the ability to check out and check in objects and view the history of changes in the multiuser development environment. I have left it for more advanced chapters, and it will be explained further in the book.
Migrating ETL code with export/import
Data Services Designer has various options to import/export ETL code.
In this recipe, we will review all possible import/export scenarios and take a closer look at the file formats used for import/export in Data Services: ATL files (the main export file format for Data Services code) and XML structures.
Getting ready
To complete this recipe, you will need another local repository created in your environment. Refer to the first two chapters of the book to create another repository named DS4_LOCAL_EXT in the new database, DS_LOCAL_REPO. Do not forget to assign the proper security settings for Data Services Administrator users in CMC after registering the new repository.
How to do it…
Data Services has two main import/export options:
Using ATL/XML external files
Direct import into another local repository
Import/export using ATL files
In the following steps, I will show you an example of how to export ETL code from the Data Services Designer into an ATL file.
1. Export Job_DWH_DimGeography into an ATL file. Right-click on the job object in Local Object Library | Jobs and select Export from the context menu. The Export window opens in the main workspace area. Look at the following screenshot:
2. Using the context menu (right-click on the specific object or objects in the Export window), you can exclude selected objects with the Exclude option, or selected objects with all their dependencies using the Exclude Tree option. Exclude the DF_Extract_SalesTerritory dataflow and all its dependencies from the export, as shown in the following screenshot, using the Exclude Tree option:
3. Objects excluded from the export are marked with red crosses. See both the Objects to export and Datastores to export areas on the Export tab for the objects excluded by the Exclude Tree command executed in the previous step:
4. To execute the export operation, right-click in any area of the Export workspace tab and choose the Export to ATL file… option from the context menu. On the opened Save As screen, choose the name of the ATL file, export.atl, and its location. Then, click on OK and specify the security passphrase for the ATL file.
5. The export could take anything from a few seconds up to a few minutes, depending on the number of objects you are exporting. When it is finished, you will see the following output in the Output | Information window. If you check the chosen location, you should see that the export.atl file was created:
6. Now, log in to the second local repository with Designer. For this, exit the Designer to restart the application. On the logon screen, choose to connect to the other local repository:
7. The new local repository is completely empty. We will use the export.atl file created in the previous step to import the job and its dependent objects into this new repository. Select the Import From File… option from the top Tools menu list. Then, select the export.atl file and click on OK, thus agreeing to import all objects from the file into the currently open local repository.
8. As we exported the job object and its dependents, it does not belong to any project in the new repository. Create a new project called TEST and place the job in it to expand its structure:
See that DF_Extract_SalesTerritory and the tables belonging to it are missing from the job structure, although Data Services keeps the reference for WF_Extract_SalesTerritory. If the dataflow is imported in the future, it will automatically be assigned as a child object to the workflow and will fit into the job structure.
Direct export to another local repository
Let's perform a direct export of the missing DF_Extract_SalesTerritory object and its dependents from the DS4_REPO to the DS4_LOCAL_EXT repository:
1. Log in to DS4_REPO, right-click on the DF_Extract_SalesTerritory dataflow object in the Local Object Library, and select Export from the context menu to open the Export tab in the main workspace area. By default, the selected object and all its dependents are added to the Export tab.
2. Right-click on the Export tab and choose the Export to repository… menu item displayed in bold text. Select DS4_LOCAL_EXT as the target repository:
3. On the Export Confirmation window, which opens next, exclude all objects that already exist in the target repository. These are the datastore objects OLTP and DS_STAGE:
4. The output of the direct export command is displayed in the Output | Information window:
(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Exported 1 DataFlows
(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Exported 2 Tables
(14.2) 07-13-15 21:06:51 (1000:6636) JOB: Completed Export. Exported 3 objects.
5. Now, exit the Designer and reopen it by connecting to the DS4_LOCAL_EXT repository. Expand the full project TEST structure to see that all the missing dependent objects were imported into the structure of the Job_DWH_DimGeography job:
How it works…
Manipulating objects on the Export tab is a preparation step that allows you to exclude the objects that you do not want to export to the ATL file or directly to another local repository. After preparing the ETL structure for export by excluding specific objects (in case you do not want to overwrite versions of the same objects in the target repository or are just not interested in migrating them), you have three options:
Direct export into another local repository (a comparison window opens, allowing you to exclude objects from being exported and showing which objects exist in the target repository)
Export to an ATL file
Export to an XML file (this is exactly the same as the previous option, except that a different flat file format is used to store the ETL code)
An ATL file is a structured file that contains the properties, links, and references of the exported objects.
An ATL file can be opened in any text editor. It can be useful to browse its contents if you want to check which specific objects are included in the export file. For function objects, it is easy to see the text of the exported function if you want to check its version, and so on.
For example, if you open the export.atl file generated in this recipe with Notepad and search for DF_Load_DimGeography, you will see that it can be found in two places within the file:
The first section defines the properties of the object, and the second defines its place within the execution structure.
Debugging job execution
Here, I will explain the use of the Data Services Interactive Debugger. In this recipe, I will debug the DF_Transform_DimGeography dataflow.
The debugging process is the process of defining the points in the ETL code (a dataflow in particular) that you want to monitor closely during job execution. By monitoring closely, I mean actually seeing the rows passing through, or even having the control to pause the execution at those points to investigate the current passing record more closely.
Those points in the code are called breakpoints, and they are usually placed before and after particular transform objects in order to see the effect made by a particular transformation on the passing row.
Getting ready…
The easiest way to debug a specific dataflow is to copy it into a separate test job. Create a new job called Job_Debug and copy DF_Transform_DimGeography into it from the workflow workspace that it is currently located in, or just drag and drop the dataflow object into the Job_Debug workspace from Local Object Library | Dataflows.
How to do it…
Here are the steps to create a breakpoint and execute the job in the debug mode:
1. First, define the breakpoint inside the dataflow. To do this, double-click on the link connecting the two transform objects, Join and Mapping:
2. Created breakpoints are displayed as red dots on the links between transform objects. You can toggle them on/off using the Show Filters/Breakpoints button from the top instrument panel:
3. Go to the Job_Debug context and choose Debug | Start Debug… from the top menu, or just click on the Start Debug… (Ctrl + F8) button on the top instrument panel:
4. The Debug Properties window opens, allowing you to specify or change the debug properties. Do not change them; the default values are suitable for most debugging cases:
5. In the debugging mode, the job executes in the same manner as in the normal execution mode, except that it is possible to pause it at any moment to browse the data between transforms. In our case, the job paused automatically as soon as the first passing row met the specified breakpoint condition. To view the dataset passed between the transforms, click on the magnifying glass icon on the link between the transform objects:
6. When paused or running, the top-level instrument panel changes, activating the debugging buttons and allowing you to stop/continue debugging:
Alternatively, step through the passing rows one by one when viewing the dataset between transforms:
7. Along with the breakpoints, you can define a filter in the same window:
The filter is displayed with a different icon in the dataflow and allows you to filter datasets passing through the dataflow in the debugging mode.
How it works…
Debugging in Data Services is a two-step process:
1. Define the breakpoints where you want the job execution to pause.
2. Run the job in the debugging mode.
Breakpoints allow you to pause job execution on a specific condition so that you are able to investigate the data flowing through your dataflow process. In the debugging mode, it is possible to see all records passed between transform objects inside a dataflow. You can see how a specific record extracted from the source object is transformed and changed while it is making its way into the target object. It is also easy to detect when a record is filtered out by a WHERE clause condition, as it will not appear after the Query transform that filters it out.
You can manage filters and breakpoints with the Filters/Breakpoints… (Alt+F9) button on the instrument panel.
Filters applied to links between transform objects are considered only when the job is executed in the debugging mode. Filters, as well as breakpoints, are not visible to the Data Services engine when the job is executed in the normal execution mode.
Note
Filters are a great way to decrease the number of records passing through the dataflow when you run a job in the debugging mode. If you are interested in debugging/seeing the transformation behavior for a small, specific set of records that can be defined with filtering conditions, then filters can significantly decrease the debugging execution time.
Monitoring job execution
In this recipe, we will take a closer look at the job execution parameters, tracing options, and job monitoring techniques.
Getting ready
We will use the job we developed in the previous chapters, Job_DWH_DimGeography, to see how job execution can be traced and monitored.
Let's perform minor changes to prepare the job for the recipe examples using these steps:
1. In the job-level context, create a global variable, $g_RunDate, of the date data type and assign the sysdate() function to it as a value.
2. At the same job level, before the sequence of workflows, place a new script object with the following code and link it to the first workflow. This script will be the first object executed within the job:
print('*************************************************');
print('INFO: Job ' || job_name() || ' started on ' || $g_RunDate);
print('*************************************************');
How to do it…
Click on the Execute… button to execute the job. Before the job runs, the Execution Properties window opens, allowing you to set up execution options, configure the tracing of the job, or change the predefined values of the global variables for that particular job run.
Let's take a closer look at the tabs available in this window:
Click on the Execution Options tab.
Here are the options available on this tab:
Print all trace messages: This option displays all the possible trace messages from all components participating in the job execution: object parameters and options, internal system queries and internally executed commands, loader parameters, the data itself, and many other kinds of information. The log generated is so enormous that we do not recommend using this option if you have more than a few workflow/dataflow objects inside your job, or if the data passing through your dataflows is big enough that you do not want to see every row of it passing through the transformations.
This option literally shows what is happening in every Data Services internal component participating in the data processing, and all this information is displayed for every row passing through those components.
Monitor sample rate: This option defines how often your logs get updated when the job runs. The default is 5 seconds.
Collect statistics for optimization: This option collects optimization statistics, allowing Data Services to choose optimal cache types for various components when executing dataflows. We will talk about it in more detail in the upcoming chapters.
Collect statistics for monitoring: If set, Data Services will display cache sizes in the trace log when the job runs.
Use collected statistics: This makes Data Services use the statistics collected when the job was executed previously with the Collect statistics for optimization option set.
Click on the second tab, Trace.
This tab has a list of various trace options. Enabling each of these options adds extra information to the contents of the trace log file when the job runs:
By default, only Trace Session, Trace Work Flow, and Trace Data Flow are enabled. Switch their values to No and enable only Trace Row by changing its value to Yes. After you execute the job, you will see the following trace log:
You can see that you no longer get the information about the statuses of the workflow and dataflow execution that you normally see. The trace log file now displays only the output of the print() functions from user script objects and the rows passing through the dataflows. Be extra careful; this is a lot of data. Avoid using this option unless you are specifically in a design/test environment with just a few rows read from the source table.
Click on the third tab, Global Variable.
This tab displays the list of all global variables created within the job, allowing you to modify their values for this specific job execution without changing these values at the job context level:
To change a value, just double-click on the Value field of the specific global variable row and input the new value. Remember that this change applies only to the current job execution. When you run the job next time and open this tab, the global variables will have their default values defined again.
Log in to the Data Services Management Console to monitor job execution and go to Administrator | Batch | DS4_REPO.
The Management Console not only allows web access to the same three log files (trace, log, and monitor) but also to another one: Performance Monitor:
The top-level section allows easy access to the previous versions of the log files for a specific job. It does not matter whether the job has been placed in a Project folder or not.
In the preceding screenshot, we displayed all log files for the last 5 days for the Job_DWH_DimGeography job.
Click on the Performance Monitor link of the last job execution to open the Performance Monitor page:
The first page of Performance Monitor displays the list of dataflows from the job structure. When clicking on a specific dataflow, it is possible to drill down to the dataflow components level to see how many records passed through the specific dataflow components and the execution time of each of them.
In fact, the information displayed in Performance Monitor is based on the same data as the information displayed in the Monitor log. It is just presented differently, which sometimes makes it more convenient for analysis.
How it works…
It is simply a matter of personal choice when deciding what to use to monitor job execution: the web application of the Data Services Management Console or the Designer client. Sometimes, due to restricted access to the environment, the web option is preferable. It is also easier to use if you need to find old log files of a specific job for analysis or performance comparison, or if you simply need to copy and paste a few rows from the trace log file.
Building an external ETL audit and audit reporting
In this recipe, we will implement an external user-built ETL audit mechanism. Our ETL audit will include information about the start and stop times of the workflows running within the job, their statuses, names, and information about which job they belong to.
Getting ready…
We need to create an ETL audit table in our database where we will store the audit results.
Connect to the STAGE database using SQL Server Management Studio and execute the following statement to create the ETL audit table:
create table dbo.etl_audit (
    job_run_id integer,
    workflow_status varchar(50),
    job_name varchar(255),
    start_dt datetime,
    end_dt datetime,
    process_name varchar(255)
);
How to do it…
First, we need to choose objects for auditing. The following steps should be implemented for every workflow or dataflow that you want to collect auditing information about. In this particular example, we will enable ETL auditing for the job object itself.
1. Create extra variables for the job object:
$v_process_name varchar(255)
$v_job_run_id integer
2. Add the following code to the script that starts the job execution:
$v_process_name = job_name();
$v_job_run_id = job_run_id();
# Insert audit record
sql('DS_STAGE',
    'insert into dbo.etl_audit (job_run_id, workflow_status, job_name, start_dt, end_dt, process_name) ' ||
    'values (' || $v_job_run_id || ', \'STARTED\', \'' || job_name() || '\', SYSDATETIME(), NULL, \'' || $v_process_name || '\')'
);
3. Create a new script, ETL_audit_update, at the end of the execution sequence inside the job context and put the following code in it:
# Update ETL audit record
sql('DS_STAGE',
    'update dbo.etl_audit ' ||
    'set workflow_status = \'COMPLETED\', end_dt = SYSDATETIME() ' ||
    'where job_run_id = ' || $v_job_run_id || ' and process_name = \'' || $v_process_name || '\''
);
4. The job content has now been wrapped in the auditing insert/update commands placed in the initial and final scripts:
5. Implement the preceding steps for WF_Extract_SalesTerritory, which can be found in the WF_extract workflow container, to enable the ETL audit for that object as well.
The only change is that in the initial script, the $v_process_name variable value should be assigned from the workflow_name() function instead of the job_name() function, as was done for the job:
How it works…
Now, if you execute the job and query the contents of the ETL audit table within a few seconds, you should see something like this:
A few seconds later, after the job successfully completes, your ETL audit table will look like this:
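The insert/update lifecycle above can be sketched outside Data Services. The following Python snippet uses an in-memory SQLite table in place of the SQL Server dbo.etl_audit table (the column names are kept from the recipe; SYSDATETIME() is replaced by SQLite's datetime('now')):

```python
import sqlite3

# Minimal sketch of the external ETL audit lifecycle using an
# in-memory SQLite table instead of the STAGE database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_audit (
        job_run_id      INTEGER,
        workflow_status TEXT,
        job_name        TEXT,
        start_dt        TEXT,
        end_dt          TEXT,
        process_name    TEXT
    )""")

def audit_start(run_id, job, process):
    # Equivalent of the initial script's sql() insert
    conn.execute(
        "INSERT INTO etl_audit VALUES (?, 'STARTED', ?, datetime('now'), NULL, ?)",
        (run_id, job, process))

def audit_complete(run_id, process):
    # Equivalent of the ETL_audit_update script's sql() update
    conn.execute(
        "UPDATE etl_audit SET workflow_status = 'COMPLETED', "
        "end_dt = datetime('now') "
        "WHERE job_run_id = ? AND process_name = ?",
        (run_id, process))

audit_start(101, "Job_DWH_DimGeography", "Job_DWH_DimGeography")
audit_complete(101, "Job_DWH_DimGeography")
status = conn.execute(
    "SELECT workflow_status FROM etl_audit WHERE job_run_id = 101"
).fetchone()[0]
print(status)  # COMPLETED
```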
A simple analysis of this table can answer the following questions:
Which objects are running within the currently running job? This is very useful information, especially if your job contains hundreds of workflows, with 20 of them running in parallel. In this case, it is hard to obtain this information from the trace log.
What was the status of the object when it was executed last time? To be precise, you also have to implement another piece of logic: a third update that changes the status of the workflow to "ERROR" if something unexpected happens and the workflow cannot be considered successfully completed. This third update usually goes into the catch section of the try-catch block.
What was the execution time for the specific object? The answer speaks for itself.
What was the execution order of the objects? You can compare the execution times. If you know when the objects started and ended, you can easily derive the execution order. When comparable workflows are not directly linked and run within different branches of logic, it is sometimes useful to know which one started or finished earlier.
The advantage of an external user-built ETL audit is that you can build a flexible solution that gathers any information that you want it to gather.
Note
Note that with the insert/update ETL audit statements, you can define the logical borders of a successful object completion. Theoretically, a workflow object and the job itself can still fail right after it successfully executes the sql() command and updates its status in the ETL audit table as successful. However, this is often a good thing, as it is exactly what you are interested in when you make the decision of whether you should rerun a specific workflow or not: has the workflow completed the work it was supposed to?
Information in ETL audit tables can be utilized not only in reports showing the execution statistics of your jobs but also to implement execution logic inside the job.
For example, if you want to run a specific workflow only once a week but it is being executed within a daily job, you could add script objects to your workflow. You could check from the ETL audit tables when the workflow was run the last time and skip it if it was executed and successfully completed less than a week ago.
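A minimal sketch of that skip-if-recent check, assuming the last successful end_dt has already been fetched from the audit table (in Data Services this would be a sql() call followed by a conditional):

```python
from datetime import datetime, timedelta

# Decide whether a weekly workflow inside a daily job should run.
# last_completed stands for max(end_dt) of the workflow's COMPLETED
# rows in the etl_audit table; None means it has never completed.
def should_run(last_completed, now, min_gap_days=7):
    if last_completed is None:
        return True  # never ran successfully: run it
    # Run only if the last success is at least a week old
    return (now - last_completed) >= timedelta(days=min_gap_days)

now = datetime(2015, 6, 15)
print(should_run(datetime(2015, 6, 10), now))  # False: ran 5 days ago
print(should_run(datetime(2015, 6, 1), now))   # True: over a week ago
print(should_run(None, now))                   # True: never ran
```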
Finally, it is even possible to audit not only a Data Services object (dataflow, workflow, job, or script object) but any piece of code: part of a script or a single branch of the logic. You can wrap anything in the insert/update statements sent to an external table that stores audit information.
That is the true power of custom ETL auditing. You can collect all the information you want and easily query this information from the ETL itself to make various decisions.
Using built-in Data Services ETL audit and reporting functionality
Data Services provides ETL reporting functionality through the Management Console web application. It is available in the form of the Operational Dashboard application on the main Management Console Home page.
Getting ready
You do not have to configure or prepare the operational dashboards feature. It is available by default, and all you have to do to access it is start the Data Services Management Console.
How to do it…
Let's review which ETL reporting capabilities are available in Data Services. Perform these steps:
1. Start the Data Services Management Console.
2. Choose the Operational Dashboard application from the home page:
3. The main interface of Operational Dashboard includes three sections. It includes the pie chart of the general job status statistics per interval for a selected repository. Green shows the number of successfully completed jobs for a specific period of time, yellow shows jobs successfully completed with warning messages, and red shows failed jobs:
4. The section below shows more detailed job execution statistics in the form of a vertical bar chart for specific days or an interval of days. Try hovering your mouse cursor over the bars to see the actual numbers behind the graph. The vertical bars show the number of jobs executed on specific days with different statuses: successful with no errors (green), successful with warning messages (yellow), and failed (red):
5. On the right-hand side, you can see the list of jobs whose execution statistics are represented by graphs on the left-hand side. By clicking on a specific row, you can drill down to see the list of executions for this specific job. The most useful information here is the execution time displayed in seconds, the run ID of the job, and the status of the job, as you can see in the following screenshot:
How it works…
Operational Dashboard reporting can be used to provide job execution history data, analyze the percentage of failed jobs for a specific time interval, and compare those numbers between different days or time intervals.
That is pretty much it. To do more, you would have to build your own ETL metadata collection and build your own reporting functionality on top of this data.
Auto Documentation in Data Services
This recipe will guide you through the Auto Documentation feature available in Data Services. Like Operational Dashboard, this feature is also part of the functionality available in the Data Services Management Console.
How to do it…
These steps will create a PDF document containing the graphical representation, descriptions, and relationships between all underlying objects of the Job_DWH_DimGeography job object:
1. Log in to the Data Services Management Console web application.
2. On the home page, click on the Auto Documentation icon:
3. In the following screen, expand the project tree and left-click on the job object. You can see which object is displayed as current by checking the object name in the top tab on the right-hand side of the window:
4. Then, click on the small printer icon located at the top of the window:
5. In the pop-up window, just click on the Print button, leaving all options with default values.
6. Data Services, by default, generates a PDF document in the browser's default Downloads folder:
How it works…
As you have probably noticed, the Auto Documentation feature is only available for the jobs included in projects, as it displays the object tree starting from the root Project level. Jobs that were created in the Local Object Library and were not assigned to a specific project will not be visible for auto-documenting.
Auto Documentation export is available in two formats: PDF and Microsoft Word (see the following screenshot):
On the same screen, you can choose the types of information to be included in the documentation file.
Note
Note that dataflow documentation includes the mapping of each and every column from source to target through all dataflow transformations. This is a very detailed level. Even though our dataflow inside Job_DWH_DimGeography is not at all complex and the datasets we are migrating are relatively small, we still get a 34-page document. So, you can see that the documentation level is extremely detailed.
Another extremely useful feature of Data Services Auto Documentation is the Table Usage tab:
It allows us to see which source and target table objects are used within the Job_DWH_DimGeography object tree.
Information like this about relationships between objects within ETL is extremely useful because, during development, some objects often change, and you need to evaluate how that impacts the ETL code. If a table column is changed (renamed, or its data type changed) at the database level, you have to apply the same changes to your ETL code. Otherwise, it will fail the next time it runs, as Data Services is not aware of the table changes and still operates with the old version of the table.
Table object dependencies can also be visualized with another Data Services feature: Impact and Lineage Analysis. This functionality will be discussed in Chapter 12, Introduction to Information Steward.
Chapter 7. Validating and Cleansing Data
Here are the recipes presented in this chapter:
Creating validation functions
Using validation functions with the Validation transform
Reporting data validation results
Using regular expression support to validate data
Enabling dataflow audit
Data Quality transforms – cleansing your data
Introduction
This chapter introduces the concepts of validation methods that can be applied to the data passing through ETL processes in order to cleanse and conform it according to the defined Data Quality standards. It includes validation methods that consist of defining validation expressions with the help of validation functions and then splitting data into two datasets: valid and invalid data. Invalid data that does not pass the validation function conditions usually gets inserted into a separate target table for further investigation.
Another topic discussed in this chapter is the dataflow audit. This feature of Data Services allows the collection of execution statistics about the data processed by the dataflow, and even controls the execution behavior depending on the numbers collected.
Finally, we will discuss the Data Quality transforms: the powerful set of instruments available in Data Services to parse, categorize, and make cleansing suggestions in order to increase the reliability and quality of the transformed data.
Creating validation functions
One of the ways to implement the data validation process in Data Services is to use validation functions along with the Validation transform in your dataflow to split the flow of data into two: records that pass the defined validation rule and those that do not. Those validation rules can be combined into validation function objects for your convenience and traceability.
In this recipe, we will create a standard but quite simple validation function. We will deploy it in our dataflow, which extracts the address data from the source system into a staging area. The validation function will check whether the city in the migrated record has Paris as a value, and if it does, it will send the record to a separate reject table.
Getting ready
First, we need to create another schema in our STAGE database to contain reject tables. Creating the Reject schema to store these tables allows us to keep the original table names; that makes writing queries and reporting against those tables, as well as locating them, much easier.
1. Open SQL Server Management Studio.
2. Go to STAGE | Security | Schemas in the Object Explorer window.
3. Right-click on the list and choose New Schema… from the context menu.
4. Choose Reject as the schema name and dbo as the schema owner.
5. Click on OK to create the schema.
How to do it…
Follow these steps to create a validation function:
1. Log in to Data Services Designer and connect to the local repository.
2. Go to Local Object Library | Custom Functions.
3. Right-click on Validation Functions and select New from the context menu.
4. Input the function name fn_Check_Paris, check Validation function, as shown in the following screenshot, and populate the description field.
5. Click on Next and input the following code in the main section of Smart Editor:
# Validation function to check if the passed value equals
# 'Paris'.
# Wrap the function in the try-catch block. We do not want
# to fail the dataflow process
# if the function itself fails.
try
begin
    # Assign input parameter values to local variables
    $l_City = $p_City;
    $l_AddressID = $p_AddressID;
    # Default "Success" result status
    $l_Result = 1;
    if ($l_City = 'Paris')
    begin
        # Change to "Failure" result status
        $l_Result = 0;
    end
    # Returning the result status
    Return $l_Result;
end
catch (all)
begin
    # Write information about the failure to the
    # trace log
    print('Validation function fn_Check_Paris() failed with error: ' ||
        error_message() || ' while processing AddressID={$l_AddressID} with City={$l_City}');
    # Returning the result status
    Return $l_Result;
end
6. In the same Smart Editor window, create local variables $l_AddressID int, $l_City varchar(100), and $l_Result int, and the function's input parameters, $p_City varchar(100) and $p_AddressID int.
7. Click on the Validate button to validate the function, and click OK to close Smart Editor and save all changes.
How it works…
The function's body is wrapped in a try-catch block to prevent our main dataflow processes from failing if something goes wrong with the validation function. The validation function is executed for each row passing through, so it would be inefficient to allow a failure inside the function to determine the execution behavior of the main process.
Try to imagine a situation where your dataflow processes 2 million records from the source table and 50 of them make the function fail for some reason or other. To process all 2 million records in one go, you would need to wrap the logic of the entire function in try-catch and output extra information into the trace log, or into an external table in the catch section, to perform further analysis of the data after processing is done.
In our example, we only pass the AddressID field for traceability purposes, so it would be easy to find the exact row on which the function failed.
The validation function should return either 1 or 0. The value 1 means that the processed row against which the validation function was executed successfully passed the validation; 0 means failure.
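For readers more comfortable with a general-purpose language, here is an illustrative Python analogue of fn_Check_Paris (not Data Services syntax): the result defaults to 1, flips to 0 for Paris, and any internal error is reported rather than allowed to abort the calling process:

```python
# Python sketch of the fn_Check_Paris validation function.
# 1 = row passes validation, 0 = row is sent to the Fail output.
def fn_check_paris(address_id, city):
    result = 1  # default "Success" status
    try:
        if city == 'Paris':
            result = 0  # "Failure" status
    except Exception as err:
        # In Data Services this message would go to the trace log
        # via print(); AddressID is passed only for traceability.
        print(f'Validation function failed with error: {err} '
              f'while processing AddressID={address_id} with City={city}')
    return result

print(fn_check_paris(1, 'London'))  # 1: row passes validation
print(fn_check_paris(2, 'Paris'))   # 0: row goes to the Fail output
```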
You can see in the following screenshot that, in the Local Object Library, validation functions are displayed separately from custom functions:
Using validation functions with the Validation transform
This recipe will demonstrate how validation functions are deployed and configured within a dataflow. As the validation function that we created in the previous recipe validates city values, we will deploy it in the DF_Extract_Address dataflow object to perform the validation of data extracted from the Address table located in the source OLTP database.
Getting ready
Open the job containing the dataflow DF_Extract_Address, already created in the Use case example – populating dimension tables recipe in Chapter 5, Workflow – Controlling Execution Order, and copy it into a new job to be able to execute it as a standalone process.
How to do it…
1. Open DF_Extract_Address in the main workspace for editing.
2. Go to Local Object Library | Transforms, find the Validation transform under Platform, and drag it into the DF_Extract_Address dataflow right after the Query transform.
3. Link the output of the Query transform to the Validation transform and double-click on the Validation transform to open it for editing.
4. Open the Validation transform in the workspace and see how Validation splits the flow into three output schemas: Validation_Pass, Validation_Fail, and Validation_RuleViolation:
The Validation_Pass and Validation_Fail output schemas are identical, except that Validation_Fail contains three extra columns: DI_ERRORACTION, DI_ERRORCOLUMNS, and DI_ROWID.
5. Inside the Validation transform, click on the Add button located on the Validation Rules tab to create the first validation rule. Choose the Validation Function option for the created rule and map the columns sent from the previous transform output to the input parameters, also choosing Send To Fail as the value for Action on Fail. Do not forget to specify the validation rule name and description.
6. Click on OK to create the validation rule. It is now displayed in the Validation transform.
7. Now close the Validation transform editor window and add three Query transforms, one for each validation schema output. Name them Validation_Pass, Validation_Fail, and Validation_Rules. Link the Validation transform output to all Query transforms, choosing the correct logic branch each time Data Services asks you to.
8. Map all input schema columns to the output schemas in all created Query transforms without making any changes to the mappings.
9. Create two additional template target tables to output data from the Rules and Fail transforms. Specify the REJECT owner schema for both of them as follows:
The ADDRESS template table for the Fail output
The ADDRESS_RULES template table for the Rules output
10. Your final dataflow version should look like the one in the following screenshot:
11. Save and execute the job.
12. After the execution is finished, open the dataflow again and view the data in the REJECT.ADDRESS and REJECT.ADDRESS_RULES tables:
Note
Note that the rows where the value of CITY equals Paris are not passed to the Transform.ADDRESS stage table anymore.
How it works…
Usually, the Validation transform is deployed right before the target object to perform the validation of data changed by previous transformations.
The Pass output schema of the Validation transform is used to output records that have successfully passed the validation rule defined by either validation function(s) or column condition(s).
Note that you can define as many validation functions or column condition rules as you like, and Data Services is very flexible in allowing you to define different Action on Fail options for different functions. This makes it possible to send some "failed" records to both the Pass and Fail outputs, and others only to the Fail output, depending on the severity of the validation rule.
Let's review another feature of the Validation transform: the ability to modify the values of the passing rows depending on the result of the validation rule. Follow these steps:
1. Open the Validation transform for editing in the main workspace.
2. As we are validating the city name, let's change the behavior of the Validation transform to send the rows which did not pass validation to both Pass and Fail. However, in the rows sent to the Pass output, change the city name value from Paris to New Paris. To do that, in the section located at the bottom of the Validation transform editor, choose the Query.CITY column and specify 'New Paris' in the expression field, as shown here:
3. Save and execute the job.
4. Open the dataflow again and view the data from both the Transform.ADDRESS and Reject.ADDRESS tables. You will see that records with the same ADDRESSID field were inserted into both tables, but in the main staging table, the values for the city name were substituted with New Paris.
See the following table for a description of the extra columns from the Fail and RuleViolation Validation transform output schemas:
DI_ERRORACTION: This shows where the output for the specific rule was sent: B means "both", F means "fail", and P means "pass".
DI_ERRORCOLUMNS: This shows the specific columns that were validated (as part of the input values for the validation function or simply as a source for column validation).
DI_ROWID: This is the unique identifier of the failed row.
DI_RULENAME: This is the name of the rule which generated the failed row.
DI_COLUMNNAME: This is the validated column (part of the validation function input values or the source for column validation in the validation rule). Note that in the ADDRESS_RULES output, one row is generated for each validated column separately. So, if your validation function was using five columns from the source object, all five of them are considered to be validated columns, and in case of failure, five rows will be created in the ADDRESS_RULES table, one for each column, with the same ROWID (see the figure showing the contents of the ADDRESS_RULES table in the first example of job execution in this recipe).
Reporting data validation results
One of the advantages of using the Validation transform is that Data Services provides reporting functionality based on the validation statistics and sample data collected during validation processes.
Validation reports can be viewed in the Data Services Management Console. In this recipe, we will learn how to collect data for validation reports and access them in the Data Services Management Console.
Getting ready
Use the same job and dataflow, DF_Extract_Address, updated with the Validation transform as in the previous recipes of the current chapter.
How to do it…
1. Open the dataflow DF_Extract_Address and double-click on the Validation transform object to open it for editing.
Note
To be able to use Data Services validation reports, validation statistics collection has to be enabled first for a Validation transform object in the ETL code structure that you want to collect the reporting data for.
2. Open the Validation Transform Options tab in the Validation transform editor.
3. Tick both checkboxes: Collect data validation statistics and Collect sample data.
4. Save and run the job to collect the data validation statistics for the dataset processed by DF_Extract_Address. Make sure that you do not have the Disable data validation statistics collection option selected in the job's Execution Properties window:
5. Launch the Data Services Management Console and log in to it.
6. On the Home page, click on the Data Validation link to start the Data Validation dashboard web application:
7. Experiment and hover your mouse over the pie chart to see the detailed information about passed and failed records for your validation rule.
8. Click on a specific area in the pie chart to drill down into another bar chart report showing validation rules. As we have only one validation rule defined in our Validation transform, and in the whole repository, there is only one bar displayed, for the City_not_Paris validation rule.
How it works…
The options Collect data validation statistics and Collect sample data enable Data Services to collect execution statistics for specific Validation transform rules. In our case, we defined one, so there is not much diversity in the dashboard reports that you can see in the Data Services Management Console.
Here is the pie chart you see after implementing steps 7-8 of this recipe:
By clicking on the object in the bar chart, you can drill down to the actual data sample of the failed rows collected by the Validation transform during job execution.
The information presented in these dashboard reports is a very useful graphical representation of the quality of the data which passes through dataflow objects and gets validated. You can easily see what percentage of data does not pass the validation rules, the comparison of validation statistics between different periods of time, and even the actual rows that did not pass the specific validation rule, without running SQL queries on your database tables or using any other application except the Data Services Management Console.
Using regular expression support to validate data
In this recipe, we will see how you can use regular expressions to validate your data. We will take a simple example of validating phone numbers extracted from the source OLTP table PERSONPHONE located in the PERSON schema. The validation rule will identify all records which have phone numbers different from this pattern: ddd-ddd-dddd (d being a numeral). Let's say that we do not want to reject any data. Our goal is to generate a dashboard report showing the percentage of records in the source table which do not comply with the specified requirement for the phone number pattern.
Getting ready
Make sure that you have the PERSON.PERSONPHONE table imported into the OLTP datastore. We will create a new job and a new dataflow, DF_Extract_PersonPhone, which will migrate PersonPhone records from OLTP to the STAGE database, validating them at the same time.
How to do it…
1. Create a new job with a new dataflow, DF_Extract_PersonPhone, designed as a standard extract dataflow with a deployed Validation transform, as shown in the following figure:
2. You should also create target tables for the RuleViolation and Fail output schemas in the Reject schema of the STAGE database.
3. To configure the validation rule, open the Validation transform for editing in the main workspace. Use Column Validation instead of Validation Function and put the following custom condition into Query.PHONENUMBER:
match_regex(Query.PHONENUMBER, '^\d{3}-\d{3}-\d{4}$', NULL) = 1
The validation rule configuration should look like the following screenshot:
Note
Note that for Action on Fail, we set up Send To Both, as we do not want our validation process to affect the migrated dataset.
4. Click on OK to create and save the validation rule.
5. Now go to the second tab, Validation Transform Options, and check all three options: Collect data validation statistics, Collect sample data, and Create column DI_ROWID on Validation_Fail.
6. Your Validation transform should look like this now:
7. Save and execute the job to extract the records into the staging table and collect the validation data for the dashboard report.
How it works…
Regular expressions are a powerful way to validate the data passing through. The match_regex() function used in this recipe returns 1 if the value in the input column matches the pattern specified as the second input parameter.
Data Services supports standard POSIX regular expressions. See the match_regex section (section 6.3.96) in Chapter 6, Functions and Procedures, of the Data Services 4.2 Reference Guide for the full syntax and regular expression support details.
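The pattern used in this recipe can be checked outside Data Services with Python's re module, which understands the same \d{3}-style syntax (this is only an external sanity check of the expression, not Data Services code):

```python
import re

# Equivalent of:
#   match_regex(Query.PHONENUMBER, '^\d{3}-\d{3}-\d{4}$', NULL) = 1
# Returns 1 when the phone number matches ddd-ddd-dddd, 0 otherwise.
PATTERN = re.compile(r'^\d{3}-\d{3}-\d{4}$')

def phone_is_valid(phone):
    return 1 if PATTERN.match(phone) else 0

print(phone_is_valid('330-555-2568'))         # 1: matches the pattern
print(phone_is_valid('1 (11) 500 555-0132'))  # 0: different pattern
```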
Note that in this recipe, we did not reject the records which failed the validation rule. As our goal was simply to evaluate the number of records which do not comply with the phone number standard, both failed and passed records were forwarded to the main target staging table.
Let's see how the dashboard validation report for our job execution looks:
1. Launch the Data Services Management Console and log in to it.
2. Open the Data Validation application on the main Home page.
3. By default, Data Services shows the data validation statistics for all functional areas for the current date (starting from midnight).
4. Hover your mouse pointer over, and click on, the failed (red) section of the pie chart to see the following details: the percentage and number of rows which did not pass the validation rule.
5. If you did not run any other jobs gathering validation statistics today, the pie chart for DF_Extract_PersonPhone created and executed in this recipe shows that 9,188 records (46%) in the PERSONPHONE table have a phone number in a pattern different from ddd-ddd-dddd, and 10,784 records (54%) have phone numbers matching this pattern.
EnablingdataflowauditAuditinginDataServicesallowsthecollectionofadditionalinformationaboutthedatamigratedfromthesourcetothetargetbyaspecificdataflowonwhichtheauditisenabled,andevenallowsmakingdecisionsaccordingtotherulesappliedontheauditdata.Inthisrecipe,wewillseehowauditcanbeenabledandutilizedduringtheextractionofdatafromthesourcesystem.
GettingreadyForthisrecipe,youcanusethedataflowDF_Extract_Addressfromthepreviousrecipesofthischapter.
How to do it…
Perform the following steps to enable auditing for the specific dataflow:

1. Open DF_Extract_Address in the workspace window and select Tools | Audit from the top-level menu.
2. In the newly opened window, select the Label tab, right-click in the empty space, and choose Show All Objects from the context menu.
3. The Label tab displays the list of objects from within a dataflow. Enable auditing on the Query and Pass Query transform objects by right-clicking on them and selecting the Count option from the context menu.
4. Another way to enable auditing on a specific object from within a dataflow is to right-click on the object and select the Properties option from the context menu.
5. Then, go to the Audit tab in the newly opened Schema Properties window and select the respective audit function from the combobox menu. In our case, both audit points were enabled for Query transforms, and the only audit option available in this case is Count.
6. Data Services creates two variables which are used to store the audit values. For the Pass Query transform, two variables were created by default: $Count_Pass, to store the number of successfully passed records, and $CountError_Pass, to store the number of incorrect or rejected records.
7. Let's change the default audit variable names for the Query object by opening its properties and selecting the Audit tab in the Schema Properties window.
8. Specify the audit variable names as $Count_Extract and $CountError_Extract. Then, close the window by clicking on the OK button.
9. Now, close the Audit: DF_Extract_Address window by clicking on the Close button.
10. If you take a look at the dataflow objects in the workspace window, you can see that the created audit points were marked with small green icons. To access the dataflow audit configuration, you can also just click on the Audit button in the tools menu.
How it works…
At this point, you have configured the audit collection for rows passing two Query objects in the DF_Extract_Address dataflow. Auditing, if enabled at the object level, allows only a single audit function: the Count audit function. This audit function simply keeps track of the number of records passing the specific object inside the dataflow.

Auditing can also be enabled at the column level inside an object which resides inside the dataflow, usually on the columns in the Query transforms. In that case, three additional audit functions are available (Sum, Average, and Checksum) if the column is of a numeric data type, and only Checksum is available if the column is of the varchar data type. As you might have guessed, these functions allow you to store either the sum or the average of the values in the specific columns for all passing records, or to calculate the checksum.

The collected audit data can later be accessed from the Operational Dashboard tab in the Data Services Management Console. However, the most useful aspect of the audit feature is the ability to define rules on the collected audit data and perform actions depending on the result of the implemented audit rule.
Here are the steps showing you how to implement a rule on collected audit data:

1. Open DF_Extract_Address in the workspace and click on the Audit button to open the Audit configuration window for this dataflow.
2. Go to the Rule tab.
3. Click on the Add button to add a new audit rule.
4. Choose the Custom option to define a custom audit rule.
5. Input the custom function shown in the following screenshot:
6. Check the Raise exception option in the Action on failure section. The other options are Email to list and Script.

The Email to list option allows you to send notifications about rule violations to specific email recipients. Note that to use this functionality, you have to specify SMTP server details in your Data Services configuration.

The Script option allows you to execute scripts written in the standard Data Services scripting language.

7. The rule that we specified is applied at the very end of the dataflow execution and checks that the percentage of rows which passed the validation rule, out of the total number of rows extracted from the source table, is higher than 80 percent. Remember that our validation rule checks and rejects all Paris records. We know that the number of records with a city value equal to Paris is significantly less than 20 percent of the rows, so not enough records are rejected during validation to fail the defined audit rule. So, if you run your dataflow now, nothing will happen; the audit rule will not be violated and the job will complete successfully. To make the audit rule fail, let's change our validation function to reject all records with a city value not equal to Paris, as shown in the following screenshot:
8. As the final step for utilizing audit functionality, check the Enable auditing option on the job's Execution Properties window. If this is not checked, audit data will not be collected and audit rules will not work.
9. Save and execute the job. Dataflow execution fails and the relevant information is displayed in the error log, as shown here:

Note
Remember that although the dataflow DF_Extract_Address fails, the audit rule check happens after it completes all the previous steps and the data has been successfully inserted into all targets.
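The arithmetic behind the custom audit rule is simple enough to sketch outside the tool. The following Python function is a hypothetical paraphrase (the argument names mirror the $Count_Pass and $Count_Extract audit variables, and the 80 percent threshold comes from the rule defined above); Data Services evaluates the real rule itself at the end of dataflow execution:

```python
# Hypothetical paraphrase of the custom audit rule from this recipe:
# the rule is violated when 80 percent or fewer of the extracted rows
# passed validation.
def audit_rule_passes(count_pass, count_extract, threshold=0.8):
    """True when the share of passed rows exceeds the threshold."""
    if count_extract == 0:
        return False
    return (count_pass / count_extract) > threshold

print(audit_rule_passes(9000, 10000))  # rule holds, job would continue
print(audit_rule_passes(1000, 10000))  # rule violated, exception raised
```

With Raise exception checked, a False result here corresponds to the dataflow failure shown in the error log.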
There's more…
Collected audit numbers can be accessed via the Operational Dashboard tab of the Data Services Management Console.

To access them, open the Operational Dashboard tab and select a specific job to open Job Execution Details. By clicking on the job execution instances further, you can open a Job Details view, which will contain information about all dataflows executed within a job. If a dataflow has audit enabled for its columns, the Contains Audit Data column will show you that.

By clicking on the View Audit Data button, you can open a new window showing the values collected during auditing and the audit rule result for the selected job instance execution.
Data Quality transforms – cleansing your data
Data Quality transforms are available in the Data Quality section of the Local Object Library Transforms tab. These transforms help you to build a cleansing solution for your migrated data.

The subject of implementing Data Quality solutions in ETL processes is so vast that it probably requires a whole chapter, or even a whole book, dedicated to it. That is why we will just scratch the surface in this recipe by showing you how to use the most popular of the Data Quality transforms, Data_Cleanse, to perform the simplest data cleansing task.

Getting ready
To build a data cleansing process, it would be ideal if we had source data which required cleansing. Unfortunately, our OLTP data source, and especially our DWH data source, already contain pretty conformed and clean data. Therefore, we are going to create dirty data by concatenating multiple fields together to see how Data Services cleansing packages will automatically parse and cleanse the data out of the concatenated text field.

As a preparation step, make sure that you have imported these three tables into your OLTP datastore: PERSON, PERSONPHONE, and EMAILADDRESS (all of them are from the PERSON schema of the SQL Server AdventureWorks_OLTP database).
How to do it…
1. As the first step, create a new job with a new dataflow object in it. Name the dataflow DF_Cleanse_Person_Details.
2. Import the three tables (PERSON, PERSONPHONE, and EMAILADDRESS) from the OLTP datastore as source tables inside the dataflow.
3. Join these tables using the Query transform with the join conditions, as shown in the following screenshot:
4. In the output schema of the Query transform, create two columns: ROWID of the data type integer, with the following function as a mapping: gen_row_num(), and a DATA column of the data type varchar(255), with the following mapping:

PERSON.FIRSTNAME || ' ' || PERSON.MIDDLENAME || ' ' || PERSON.LASTNAME || ' ' || PERSONPHONE.PHONENUMBER || ' ' || EMAILADDRESS.EMAILADDRESS

5. Now that we have prepared the source field that we will be cleansing, let's import and configure the Data_Cleanse transforms themselves. Drag and drop the Data_Cleanse transform objects from Local Object Library | Transforms | Data Quality into your dataflow. Please refer to the following steps, as each Data_Cleanse transform object will be imported and configured differently.
6. The first Data_Cleanse object will parse our DATA column to extract the email address of the person. When importing the transform object into the dataflow, choose the Base_DataCleanse configuration.
7. Rename the imported Data_Cleanse transform to Email_DataCleanse and join the Query transform output to it.
8. Open the Email_DataCleanse transform editor in the workspace to configure it.
9. On the Input tab, select EMAIL1 in the Transform Input Field Name column and map it to the DATA source field.
10. On the Options tab, choose PERSON_FIRM as the cleansing package name and configure the rest of the options, as shown in the following screenshot:
11. On the Output tab, select the EMAIL field (of the PARSED field class related to the EMAIL1 parent component) to be produced by the Email_DataCleanse transform. That will create the EMAIL1_EMAIL_PARSED column in the output schema of the Email_DataCleanse transform. Propagate the source ROWID column as well, which will be used to join the cleansed datasets together in the later steps.
12. Close the Email_DataCleanse editor and import the second Data_Cleanse transform with the same Base_DataCleanse configuration. Rename the imported transform object to Phone_DataCleanse, join it to the Query transform output, and open it in the main workspace for editing.
13. Select the same transform options on the Options tab as for the Email_DataCleanse transform example we just saw.
14. Choose PHONE1 as the input parsing component (Transform Input Field Name) and map it to the source DATA column from the Query transform output.
15. On the Output tab of the Phone_DataCleanse transform editor, choose the following output fields from the list:
PARENT_COMPONENT FIELD_NAME FIELD_CLASS
NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE PARSED
NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_EXTENSION PARSED
NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_LINE PARSED
NORTH_AMERICAN_PHONE1 NORTH_AMERICAN_PHONE_PREFIX PARSED
PHONE1 PHONE PARSED
16. Also propagate two source fields, ROWID and DATA, into the output schema of the Phone_DataCleanse transform. Close it to finish editing.
17. When importing the third Data_Cleanse transform, select the predefined EnglishNorthAmerica_DataCleanse configuration and rename the transform to Name_DataCleanse.
18. Open the transform in the workspace for editing. You do not have to configure anything on the Options tab this time. So, select the component NAME_LINE1 on the Input tab and the following fields on the Output tab:
PARENT_COMPONENT FIELD_NAME FIELD_CLASS
PERSON1 FAMILY_NAME1 PARSED
PERSON1 GENDER STANDARDIZED
PERSON1 GIVEN_NAME1 PARSED
PERSON1 GIVEN_NAME2 PARSED
PERSON1 PERSON PARSED
19. Close the Name_DataCleanse transform editor and join all three Data_Cleanse outputs with a single Join Query transform. Use the ROWID column to join the datasets together and remap the default Data_Cleanse output names to more meaningful names, as shown in the following screenshot:
20. Specify Phone_DataCleanse.DATA IS NOT NULL as a join filter in the Join Query transform to exclude the empty records from the migration.
21. Import the target template table CLEANSE_RESULT stored in the STAGE datastore to save the cleansing results in.
22. Finally, your dataflow should look like this:
23. Save and execute the job to see the cleansing results in the CLEANSE_RESULT table.
How it works…
In the first few steps of the preceding sequence, by concatenating multiple fields from the source OLTP database, we prepared our "dirty" data column, DATA, which was used as the source column for all three Data_Cleanse transforms.

When importing the Data_Cleanse transform, Data Services offers you the option to choose one of the predefined configurations. The Base_DataCleanse configuration requires you to configure the mandatory options manually, or your imported transform object will not work.

The Data_Cleanse transform is a mere mapping tool to map your input columns to the required parsing rules and desired output. Parsing rules and reference data are defined in the cleansing package, which can be developed and configured with the Information Steward Cleansing Package Builder tool. This tool provides a graphical user interface for the task. In this recipe, we are using the default cleansing package PERSON_FIRM, which is available in Data Services without the need to have Information Steward installed.

Note
The default PERSON_FIRM cleansing package allows you to parse and standardize dates, emails, firm data, person names, social security numbers, and phone numbers.

The Input tab allows you to choose the type of component you would like to parse from the input dataset. Please note that you cannot specify the same field as a source of data for multiple components. That is why we have to create three distinct Data_Cleanse transform objects to parse the same DATA column for email, person name, and phone data. Each has its own configuration and mappings from input components to a desired set of output fields.

The set of fields available on the Output tab depends on which component you have chosen to be recognized and parsed on the Input tab, but it basically includes all possible information that can be extracted for the selected component. For example, for a Person name component, the output data cleanse fields include given name, second given name, last name, gender, and similar others.

Propagation of an artificial ROWID column allows us to join the split datasets together after they are processed by the Data_Cleanse transforms.

To view the result data, use the View data option on the target table object in the dataflow, or open SQL Server Management Studio and run the following query to see the parsed results:

select DATA, EMAIL, PHONE, GIVEN_NAME, GIVEN_NAME_2ND, FAMILY_NAME,
GENDER_STANDARDIZED
from dbo.CLEANSE_RESULT

As you can see in the following screenshot, the Data_Cleanse transforms did a pretty good job of parsing the input DATA field:

An interesting result is stored in the GENDER_STANDARDIZED column. Based on the parsing rules and reference data available, Data Services suggests how accurately gender can be determined based solely on the available given and last names.
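To get a feel for what the transform is doing, here is a deliberately naive Python sketch that pulls an email address and a ddd-ddd-dddd phone number out of a concatenated DATA value with plain regular expressions. This is not how Data_Cleanse works internally (the transform applies the PERSON_FIRM cleansing package's parsing rules and reference data, which also handle names and gender), and the sample row is invented:

```python
import re

# Toy illustration of parsing the concatenated DATA field built in step 4.
# Real Data_Cleanse parsing is driven by the cleansing package, not regexes.
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

def naive_parse(data):
    """Return whichever email/phone substrings a regex can find in DATA."""
    email = EMAIL_RE.search(data)
    phone = PHONE_RE.search(data)
    return {"EMAIL": email.group() if email else None,
            "PHONE": phone.group() if phone else None}

row = "Ken J Sanchez 697-555-0142 ken0@adventure-works.com"  # invented sample
parsed = naive_parse(row)
print(parsed)
```

Notice what the toy version cannot do: splitting the remaining free text into given name, second given name, and family name, or standardizing gender, is exactly where the cleansing package's reference data earns its keep.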
There's more…
As mentioned before, Data Services has great Data Quality capabilities. This is a huge topic for discussion, and we've just scratched the surface by showing you one transform from this toolset. This powerful functionality works best when Data Services is integrated with Information Steward. You can build your own cleansing packages to parse the migrated data more efficiently and accurately. Please refer to Chapter 12, Introduction to Information Steward, for more details.
Chapter 8. Optimizing ETL Performance
If you tried all the previous recipes from the book, you can consider yourself familiar with the basic design techniques available in Data Services and can perform pretty much any ETL development task. Starting from this chapter, we will begin using advanced development techniques available in Data Services. This particular chapter will help you to understand how existing ETL processes can be optimized further to make sure that they run quickly and efficiently, consuming as few computing resources as possible with the least amount of execution time.

In this chapter, we will cover the following recipes:
Optimizing dataflow execution – push-down techniques
Optimizing dataflow execution – the SQL transform
Optimizing dataflow execution – the Data_Transfer transform
Optimizing dataflow readers – lookup methods
Optimizing dataflow loaders – bulk-loading methods
Optimizing dataflow execution – performance options
Introduction
Data Services is a powerful development tool. It supports a lot of different source and target environments, all of which work differently with regard to loading and extracting data. This is why it is required of you, as an ETL developer, to be able to apply different design methods, depending on the requirements of your data migration processes and the environment that you are working with.

In this chapter, we will review the methods and techniques that you can use to develop data migration processes in order to perform transformations and migrate data from the source to the target more effectively. The techniques described in this chapter are often considered best practices, but do keep in mind that their usage has to be justified. They allow you to move and transform your data faster, consuming fewer processing resources on the ETL engine's server side.
Optimizing dataflow execution – push-down techniques
The Extract, Transform, and Load sequence can be modified to Extract, Load, and Transform by delegating the work of processing and transforming data to the database itself where the data is being loaded to.

We know that to apply transformation logic to a specific dataset we have to first extract it from the database, then pass it through transform objects, and finally load it back to the database. Data Services can (and most of the time should, if possible) delegate some transformation logic to the database itself from which it performs the extract. The simplest example is when you are using multiple source tables in your dataflow joined with a single Query transform. Instead of extracting each table's contents separately onto an ETL box by sending multiple SELECT * FROM <table> requests, Data Services can send a single generated SELECT statement with the proper SQL join conditions defined in the Query transform's FROM and WHERE tabs. As you can probably understand, this can be very efficient: instead of pulling millions of records into the ETL box, you might end up getting only a few, depending on the nature of your Query joins. Sometimes this process shortens to completely zero processing on the Data Services side. Then, Data Services does not even have to extract the data to perform transformations. What happens in this scenario is that Data Services simply sends the SQL statement instructions in the form of INSERT INTO … SELECT or UPDATE … FROM statements to a database, with all the transformations hardcoded in those SQL statements directly.

The scenarios where Data Services delegates parts of or all the processing logic to the underlying database are called push-down operations.

In this recipe, we will take a look at different kinds of push-down operations, what rules you have to follow to make push-down work in your designed ETL processes, and what prevents push-downs from happening.
Getting ready
As a starting example, let's use the dataflow developed in the Loading data from table to table – lookups and joins recipe in Chapter 4, Dataflow – Extract, Transform, and Load. Please refer to this recipe to rebuild the dataflow if, for some reason, you do not have it in your local repository anymore.
Push-down operations can be of two different types:

Partial push-downs: A partial push-down is when the Optimizer sends the SELECT query joining multiple source tables used in a dataflow, or sends one SELECT statement to extract data from a particular table with mapping instructions and filtering conditions from the Query transform hardcoded in this SELECT statement.
Full push-downs: A full push-down is when all dataflow logic is reformed by the Optimizer into a single SQL statement and sent to the database. The most common statements generated in these cases are complex INSERT/UPDATE and MERGE statements, which include all source tables from the dataflow joined together and transformations in the form of database functions applied to the table columns.
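The shape of a full push-down is easy to demonstrate with any SQL database. The following Python sketch uses an in-memory SQLite database with invented table and column names; the point is that the whole dataflow collapses into one INSERT INTO … SELECT statement executed by the database itself, so no rows ever travel to the ETL engine:

```python
import sqlite3

# Miniature stand-in for a full push-down: the join and the load run as a
# single SQL statement inside the database. All names here are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE phone  (person_id INTEGER, number TEXT);
CREATE TABLE person_details (id INTEGER, name TEXT, number TEXT);
INSERT INTO person VALUES (1, 'Jane'), (2, 'Dave');
INSERT INTO phone  VALUES (1, '555-123-4567');
""")

# The equivalent of a fully pushed-down dataflow: extract, join, and load
# expressed as one statement that never leaves the database.
conn.execute("""
INSERT INTO person_details (id, name, number)
SELECT p.id, p.name, ph.number
FROM person p JOIN phone ph ON ph.person_id = p.id
""")
rows = conn.execute("SELECT * FROM person_details").fetchall()
print(rows)
```

Only the joined rows land in the target table; with a partial push-down, by contrast, the SELECT part runs in-database but the resulting rows still pass through the ETL box before being loaded.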
How to do it…
1. To be able to see which SQL queries have been pushed down to the database, open the dataflow in the workspace window and select Validation | Display Optimized SQL….
2. The Optimized SQL window shows all queries generated by the Data Services Optimizer and pushed down to the database level. In the following screenshot, you can see the SELECT query and the part of the dataflow logic which this statement represents:
3. Let's try to push down logic from the rest of the Query transforms. Ideally, we would like to perform a full push-down to the database level.
4. The Lookup_Phone Query transform contains a function call which extracts the PHONENUMBER column from another table. This logic cannot be included as-is because the Optimizer cannot translate internal function calls into a SQL construction which could be included in the push-down statement.
5. Let's temporarily remove this function call by specifying a hardcoded NULL value for the PHONENUMBER column. Just delete the function call and create a new output column instead, of the varchar(25) data type.
6. Validate and save the dataflow and open the Optimized SQL window again to see the result of the changes. Straight away, you can see how logic from both the Lookup_Phone and Distinct Query transforms was included in the SELECT statement: the default NULL value for the new column and the DISTINCT operator at the beginning of the statement:
7. What remains for the full push-down is the loading part, when all transformations and the selected dataset are inserted into the target table PERSON_DETAILS. The reason why this does not happen in this particular example is that the source tables and the target table reside in different datastores which connect to different databases: OLTP (AdventureWorks_OLTP) and STAGE.
8. Substitute the PERSON_DETAILS target table from the DS_STAGE datastore with a new template table, PERSON_DETAILS, created in the DBO schema of OLTP.
9. After this change, you can see that the Optimizer now fully transforms the dataflow logic into a pushed-down SQL statement.
How it works…
The Data Services Optimizer tries to perform push-down operations whenever possible. The most common reasons, as we demonstrated during the preceding steps, for push-down operations not working are as follows:

Functions: When functions used in mappings cannot be converted by the Optimizer into similar database functions in the generated SQL statements. In our example, the lookup_ext() function prevents the push-down from happening. One of the workarounds for this is to substitute the lookup_ext() function with an imported source table object joined to the main dataset with the help of the Query transform (see the following screenshot):
Transform objects: When transform objects used in a dataflow cannot be converted by the Optimizer into equivalent SQL statements. Some transforms are simply not supported for push-down.
Automatic data type conversions: These can sometimes prevent the push-down from happening.
Different data sources: For push-down operations to work for a list of source or target objects, those objects must reside in the same database or must be imported into the same datastore. If they reside in different databases, dblink connectivity should be configured at the database level between those databases, and it should be enabled as a configuration option in the datastore object properties. All Data Services can do is send a SQL statement to one database source, so it is logical that if you want to join multiple tables from different databases in a single SQL statement, you have to make sure that connectivity is configured between the databases, so that such a statement could run directly at the database level even before you start to develop the ETL code in Data Services.

What is also important to remember is that the Data Services Optimizer's capabilities depend on the type of underlying database that holds your source and target table objects. Of course, it has to be a database that supports the standard SQL language, as the Optimizer can send the push-down instructions only in the form of SQL statements.

Sometimes, you actually want to prevent push-downs from happening. This can be the case if:

The database is busy to the extent that it would be quicker to do the processing on the ETL box side. This is a rare scenario, but it still sometimes occurs in real life. If this is the case, you can use one of the methods we just discussed to artificially prevent the push-down from happening.
You want to actually make rows go through the ETL box for auditing purposes, or to apply special Data Services functions which do not exist at the database level. In these cases, the push-down will automatically be disabled and will not be used by Data Services anyway.
Optimizing dataflow execution – the SQL transform
Simply put, the SQL transform allows you to specify SQL statements directly inside the dataflow to extract source data instead of using imported source table objects. Technically, it has nothing to do with optimizing the performance of ETL, as it is not a generally recommended practice to substitute source table objects with a SQL transform containing hard-coded SELECT SQL statements.
How to do it…
1. Take the dataflow used in the previous recipe and select Validation | Display Optimized SQL… to see the query pushed down to the database level. We are going to use this query to configure our SQL transform object, which will substitute all source table objects on the left-hand side of the dataflow.
2. In the Optimized SQL window, click on Save As… to save this push-down query to a file.
3. Drag and drop the SQL transform from Local Object Library | Transforms | Platform into your dataflow.
4. Now you can remove all objects on the left-hand side of the dataflow prior to the Lookup_Phone Query transform.
5. Open the SQL transform for editing in a workspace window. Choose OLTP as the datastore and copy and paste the query saved previously from your file into the SQL text field. To complete the SQL transform configuration, create output schema fields of the appropriate data types which match the fields returned by the SELECT statement.
6. Exit the SQL transform editor and link it to the next Lookup_Phone Query transform. Open Lookup_Phone and map the source columns to the target.
7. Please note that the dataflow does not perform any native push-down queries anymore, and will give you the following warning message if you try to display optimized SQL:
8. Validate the job before executing it to make sure there are no errors.
How it works…
As you can see, the structure of the SQL transform is pretty simple. There are not many options available for configuration:

Datastore: This option defines which database connection will be used to pass the SELECT query to.
Database type: This option pretty much duplicates the value defined for the specified datastore object.
Cache: This option defines whether the dataset returned by the query has to be cached on the ETL box.
Array fetch size: This option basically controls the amount of network traffic generated during dataset transfer from the database to the ETL box.
Update schema: This button allows you to quickly build the list of schema output columns from the SQL SELECT statement specified in the SQL text field.

The most common reasons why you would want to use the SQL transform instead of defining source table objects are as follows:

Simplicity: Sometimes, you do not care about anything else except getting things done as fast as possible. Sometimes you can get the extract requirements in the form of a SELECT statement, or you may want to use an already tested SELECT query in your ETL code straight away.
To utilize database functionality which does not exist in Data Services: This is usually a poor excuse, as experienced ETL developers can do pretty much anything with standard Data Services objects. However, some databases can have internal non-standard SQL functions implemented which can perform complex transformations. For example, in Netezza you can have functions written in C++ which can be utilized in standard SQL statements and, most importantly, will be using the massive parallel processing functionality of the Netezza engine. Of course, the Data Services Optimizer is not aware of these functions, and the only way to use them is to run direct SELECT SQL statements against the database. If you want to call a SQL statement like this from Data Services, the most convenient way to do it from within a dataflow is to use the SQL transform object inside the dataflow.
Performance reasons: Once in a while, you can get a set of source tables joined to each other in a dataflow for which the Optimizer, for some reason or other, does not perform a push-down operation, while you are very restricted in the ways you can create and utilize database objects in this particular database environment. In such cases, using a hard-coded SELECT SQL statement can help you to maintain an adequate level of ETL performance.

As a general practice, I would recommend that you avoid SQL transforms as much as possible. They can come in handy sometimes, but when using them, you not only lose the advantage of utilizing Data Services, the Information Steward reporting functionality, and the ability to perform auditing operations, you also potentially create big problems for yourself in terms of the ETL development process. Tables used in the SELECT statements cannot be traced with the View where used feature. They can be missing from your datastores, which means you do not have a comprehensive view of your environment and the underlying database objects utilized, as source database tables are hidden inside the ETL code rather than being on display in the Local Object Library.

This obviously makes ETL code harder to maintain and support. Not to mention that migration to another database becomes a problem, as you would most likely have to rewrite all the queries used in your SQL transforms.

Note
The SQL transform prevents the full push-down from happening, so be careful. Only the SELECT query inside the SQL transform is pushed down to the database level. The rest of the dataflow logic will be executed on the ETL box, even if the full push-down was working before, when you had source table objects instead of the SQL transform.

In other words, the result dataset of the SQL transform is always transferred to the ETL box. That can affect decisions around ETL design. From the performance perspective, it is preferable to spend more time building a dataflow based on source table objects for which Data Services performs the full push-down (producing the INSERT INTO … SELECT statement), rather than quickly building a dataflow which will transfer datasets back and forth between the ETL box and the database, increasing the load time significantly.
Optimizing dataflow execution – the Data_Transfer transform
The transform object Data_Transfer is a pure optimization tool helping you to push down resource-consuming operations and transformations like JOIN and GROUP BY to the database level.

Getting ready
1. Take the dataflow from the Loading data from a flat file recipe in Chapter 4, Dataflow – Extract, Transform, and Load. This dataflow loads the Friends_*.txt file into a STAGE.FRIENDS table.
2. Modify the Friends_30052015.txt file and remove all lines except the ones about Jane and Dave.
3. In the dataflow, add another source table, OLTP.PERSON, and join it to the source file object in the Query transform by the first-name field. Propagate the PERSONTYPE and LASTNAME columns from the source OLTP.PERSON table into the output Query transform schema, as shown here:
How to do it…
Our goal will be to configure this new dataflow to push down to the database level the insert of the joined dataset of data coming from the file and data coming from the OLTP.PERSON table.

By checking the Optimized SQL window, you will see that the only query sent to a database from this dataflow is the SELECT statement pulling all records from the database table OLTP.PERSON to the ETL box, where Data Services will perform an in-memory join of this data with the data coming from the file. It is easy to see that this type of processing may be extremely inefficient if the PERSON table has millions of records and the FRIENDS table has only a couple of them. That is why we do not want to pull all records from the PERSON table for the join and want to push down this join to the database level.

Looking at the dataflow, we already know that for the logic to be pushed down, the database should be aware of all the source datasets and should be able to access them by running a single SQL statement. The Data_Transfer transform will help us to make sure that the Friends file is presented to the database as a table. Follow these steps to see how it can be done:

1. Add the Data_Transfer object from Local Object Library | Transforms | Data Integrator into your dataflow, putting it between the source file object and the Query transform.
2. Edit the Data_Transfer object by opening it in a workspace window. Set Transfer type to Table and specify the new transfer table in the Table options section as STAGE.DBO.FRIENDS_FILE.
3. Close the Data_Transfer transform editor and select Validation | Display Optimized SQL… to see the queries pushed down to the database. You can see that there are now two SELECT statements generated to pull data from the OLTP.PERSON and STAGE.FRIENDS_FILE tables.

The join between these two datasets happens on the ETL box. Then the merged dataset is sent back to the database to be inserted into the DS_STAGE.FRIENDS table.

4. Add another Data_Transfer transform between the source table PERSON and the Query transform. In the Data_Transfer configuration window, set Transfer type to Table and specify DS_STAGE.DBO.DT_PERSON as the data transfer table.
5. Validate and save the dataflow and display the Optimized SQL window.

Now you can see that we have successfully implemented a full push-down of the dataflow logic, inserting merged data from two source objects (one of which is a flat file) into a staging table. In the preceding screenshot, the logic in the section marked in red is represented by an INSERT SQL statement pushed down to the database level.
How it works…
Under the hood, the Data_Transfer transform creates a subprocess that transfers the data to the specified location (file or table). Simply put, Data_Transfer is a target dataflow object in the middle of a dataflow. It has a lot of options similar to what other target table objects have; in other words, you can set up a bulk-loading mechanism, run Pre-Load Commands and Post-Load Commands, and so on.

The reason why I called Data_Transfer a pure optimization tool is that you can redesign any dataflow to do the same thing that Data_Transfer does without using it. All you have to do is simply split your dataflow in two (or three, for the dataflow in our example). Instead of forwarding your data into a Data_Transfer transform, you forward it to a normal target object and then, in the next dataflow, you use this object as a source.

Note
What Data_Transfer still does, which cannot be done easily when you are splitting dataflows, is automatically clean up the temporary data transfer tables.

It is critical to understand how push-down mechanisms work in Data Services to be able to use the Data_Transfer transform effectively. Putting it to use at the wrong place in a dataflow can decrease performance drastically.

Why we used a second Data_Transfer transform object
Our goal was to modify the dataflow in such a way as to get a full push-down SQL statement generated: INSERT INTO STAGE.FRIENDS SELECT <joined PERSON and FRIENDS datasets>.

As we remember from the previous recipe, there can be multiple reasons why a full push-down does not work. One of these reasons, which is causing trouble in our current example, is that the PERSON table resides in a different database, while our data transfer table, FRIENDS_FILE, and target table, FRIENDS, reside in the same STAGE database.

To make the full push-down work, we had to use a second Data_Transfer transform object to transfer data from the OLTP.PERSON table into a temporary table located in the STAGE database.

When to use the Data_Transfer transform
Use it whenever you encounter a situation where a dataflow has to perform a very "heavy" transformation (say, a GROUP BY operation) or join two very big datasets, and this operation is happening on the ETL box. In these cases, it is much quicker to transfer the required datasets to the database level so that the resource-intensive operation can be completed there by the database.

There's more…
One of the good examples of a use case for the Data_Transfer transform is when you have to perform a GROUP BY operation in a Query transform right before inserting data into a target table object. By placing Data_Transfer right before the Query transform at the end of the dataflow, you can quickly insert the dataset processed by the dataflow logic before the Query transform with the GROUP BY operation, and then push down the INSERT and GROUP BY operations in a single SQL statement to the database level.

When you perform transformations on datasets which include millions of records, using the Data_Transfer transform can save you minutes, and sometimes hours, depending on your environment and the number of processed records.
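The GROUP BY scenario just described can be illustrated the same way as the earlier push-down example. In this Python/SQLite sketch (all table names invented), the transfer table plays the role of the Data_Transfer output, and the aggregation plus the load run as one pushed-down statement instead of being computed row by row on the ETL box:

```python
import sqlite3

# Sketch of the There's more... scenario: once the intermediate dataset sits
# in-database (the job Data_Transfer performs), GROUP BY and INSERT collapse
# into one statement executed by the database. Names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dt_orders (customer TEXT, amount INTEGER);  -- transfer table
CREATE TABLE order_totals (customer TEXT, total INTEGER);
INSERT INTO dt_orders VALUES ('A', 10), ('A', 15), ('B', 7);
""")
conn.execute("""
INSERT INTO order_totals
SELECT customer, SUM(amount) FROM dt_orders GROUP BY customer
""")
totals = conn.execute("SELECT * FROM order_totals ORDER BY customer").fetchall()
print(totals)
```

On millions of rows, keeping this aggregation in-database is exactly the saving of minutes or hours that the recipe describes.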
Optimizingdataflowreaders–lookupmethodsTherearedifferentwaysinwhichtoperformthelookupofarecordfromanothertableinDataServices.Thethreemostpopularonesare:atablejoinwithaQuerytransform,usingthelookup_ext()function,andusingthesql()function.
Inthisrecipe,wewilltakealookatallthesemethodsanddiscusshowtheyaffecttheperformanceofETLcodeexecutionandtheirimpactonadatabaseusedtosourcedatafrom.
GettingreadyWewillbeusingthesamedataflowasinthefirstrecipe,theonewhichpopulatesthePERSON_DETAILSstagetablefrommultipleOLTPtables.
Howtodoit…WewillperformalookupforthePHONENUMBERcolumnofapersonfromtheOLTPtablePERSONPHONEinthreedifferentways.
Lookup with the Query transform join
1. Import the lookup table into a datastore and add the table object as a source in the dataflow where you need to perform the lookup.
2. Use the Query transform to join your main dataset with the lookup table using the BUSINESSENTITYID reference key column, which resides in both tables.
Lookup with the lookup_ext() function
1. Remove the PERSONPHONE source table from your dataflow and clear out the join conditions in the Lookup_Phone Query transform.
2. As you have seen in the recipes in previous chapters, the lookup_ext() function can be executed as a function call in the Query transform output columns list. The other option is to call the lookup_ext() function in the column mapping section. For example, say that we want to put an extra condition on when we want to perform a lookup for a specific value.
Instead of creating a new function call for looking up the PHONENUMBER column for all migrated records, let's put in the condition that we want to execute the lookup_ext() function only when the row has non-empty ADDRESSLINE1, CITY, and COUNTRY columns; otherwise, we want to use the default value UNKNOWN LOCATION.
3. Insert the following lines in the Mapping section of the PHONENUMBER column inside the Lookup_Phone Query transform:
ifthenelse(
  (Get_Country.ADDRESSLINE1 IS NULL) OR
  (Get_Country.CITY IS NULL) OR
  (Get_Country.COUNTRY IS NULL),
  'UNKNOWN LOCATION',
  lookup_ext()
)
4. Now double-click on the lookup_ext() text to highlight only the lookup_ext function, and right-click on the highlighted area for the context menu.
5. From this context menu, select Modify Function Call to open the Lookup_ext parameter configuration window. Configure it to perform a lookup for a PHONENUMBER field value from the PERSONPHONE table.
After closing the function configuration window, you can see the full code generated by Data Services for the lookup_ext() function in the Mapping section.
When selecting the output field, you can see all source fields used in its Mapping section highlighted in the Schema In section on the left-hand side.
Lookup with the sql() function
1. Open the Lookup_Phone Query transform for editing in the workspace and clear out all code from the PHONENUMBER mapping section.
2. Put the following code in the Mapping section:
sql('OLTP', 'select PHONENUMBER from Person.PERSONPHONE where BUSINESSENTITYID =
How it works…
Query transform joins
The advantages of this method are:
Code readability: It is very clear which source tables are used in the transformation when you open the dataflow in a workspace.
Push-down of the lookup to the database level: This can be achieved by including the lookup table in the same SELECT statement. Yes, as soon as you have placed the source table object in the dataflow and joined it properly with the other data sources using the Query transform, there is a chance that it will be pushed down as a single SQL SELECT statement, allowing the joining of source tables at the database level.
DS metadata report functionality and impact analysis.
The main disadvantage of this method comes naturally from its advantage. If a record from the main dataset references multiple records in the lookup table by the key column used, the output dataset will include multiple records with all these values. That is how standard SQL query joins work, and the Data Services Query transform works in the same way. This could potentially lead to duplicated records being inserted into a target table (duplicated by key columns but with different values in the lookup field, for example).
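The duplication risk described above is easy to demonstrate outside of Data Services. The following sketch (plain Python with invented sample rows) joins a main dataset to a lookup table the way a SQL join, and hence a Query transform join, would:

```python
# Illustration only: how a SQL-style join duplicates rows when the
# lookup table holds several records for the same key.
main_rows = [
    {"BUSINESSENTITYID": 1, "NAME": "John"},
    {"BUSINESSENTITYID": 2, "NAME": "Anna"},
]
# Key 1 appears twice in the lookup table (two phone numbers).
lookup_rows = [
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "555-0001"},
    {"BUSINESSENTITYID": 1, "PHONENUMBER": "555-0002"},
    {"BUSINESSENTITYID": 2, "PHONENUMBER": "555-0003"},
]

def join(main, lookup, key):
    """Inner join: every matching pair of rows produces an output row."""
    return [
        {**m, **l}
        for m in main
        for l in lookup
        if m[key] == l[key]
    ]

result = join(main_rows, lookup_rows, "BUSINESSENTITYID")
# John now appears twice in the output - once per lookup match -
# so 2 input rows become 3 output rows.
print(len(result))
```

If the target table is keyed on BUSINESSENTITYID alone, both "John" rows head for the same key, which is exactly the duplicated-insert problem described above.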
lookup_ext()
The opposite of a Query transform, this function hides the source lookup table object from the developer and from some of the Data Services reporting functionality. As you have seen, it can be executed as a function call or used in the mapping logic for a specific column.
This function's main advantage is that it will always return a single value from the lookup table. You can even specify the return policy, which will be used to determine the single value to return (MAX or MIN), with the ability to order the lookup table dataset by any column.
sql()
Although the sql() function can be used similarly to lookup_ext(), as in the presented example, it is rarely used that way: lookup_ext() fetches rows from the lookup table more efficiently if all you want to do is extract values from the lookup table by referencing key columns.
At the same time, the sql() function makes possible the implementation of very complex and flexible solutions, as it allows you to pass any SQL statement that can be executed on the database side. This can be the execution of stored procedures, the generation of sequence numbers, the running of analytical queries, and so on.
As a general rule, though, the usage of the sql() function in dataflow column mappings is not recommended. The main reason for this is performance, as you will see further on. Data Services has a rich set of instruments to perform the same task with a proper set of objects and ETL code design.
Performance review
Let's quickly review the dataflow execution times for each of the explained methods.
The first method: The lookup with the Query transform took 6.4 seconds.
The second method: The lookup with the lookup_ext() function took 6.6 seconds.
The third method: This used the sql() function and took 73.3 seconds.
The first two methods appear similarly effective, but that is only because the number of rows and the size of the dataset used are very small. The lookup_ext() function allows the usage of different cache methods for the lookup dataset, which makes it possible to tune and configure it depending on the nature of your main data and that of the lookup data. It can also be executed as a separate OS process, increasing the effectiveness of fetching the lookup data from the database.
The third figure, for the sql() function, on the contrary, shows a perfect example of the extremely poor performance you get when the sql() function is used in column mappings.
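The gap between these approaches can be reproduced in miniature. The sketch below (Python with an in-memory SQLite table standing in for the lookup table; names are invented for illustration) contrasts one query per row, as with sql() in a column mapping, with a single pre-load of the whole lookup table:

```python
import sqlite3

# A stand-in lookup table with 1000 phone numbers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personphone (businessentityid INTEGER, phonenumber TEXT)")
conn.executemany(
    "INSERT INTO personphone VALUES (?, ?)",
    [(i, f"555-{i:04d}") for i in range(1000)],
)

main_ids = list(range(1000))

# Method 1: like sql() in a column mapping - one round trip per row.
per_row = [
    conn.execute(
        "SELECT phonenumber FROM personphone WHERE businessentityid = ?", (i,)
    ).fetchone()[0]
    for i in main_ids
]  # 1000 separate queries

# Method 2: like lookup_ext() with a pre-loaded cache - one query,
# then in-memory dictionary lookups.
cache = dict(conn.execute("SELECT businessentityid, phonenumber FROM personphone"))
cached = [cache[i] for i in main_ids]  # 1 query total

assert per_row == cached  # same answers, very different query counts
```

Over a network connection, with query parsing and latency on every round trip, the per-row variant is the 73-second column, and the cached variant is the 6-second one.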
Optimizing dataflow loaders – bulk-loading methods
By default, all records inside a dataflow coming to a target table object are sent as separate INSERT commands to the target table at the database level. If millions of records pass through the dataflow and transformation happens on the ETL box without push-downs, the performance of sending millions of INSERT commands over the network back to the database for insertion could be extremely slow. That is why it is possible to configure alternative load methods on the target table object inside a dataflow. These types of loads are called bulk loads. Bulk-load methods differ in nature, but all of them share the same main principle and achieve the same goal: they avoid the execution of millions of INSERT statements, one for each migrated record, by providing alternative ways of inserting data.
The bulk-load methods executed by Data Services for inserting data into a target table are completely dependent on the type of target database. For example, for Oracle Database, Data Services can implement bulk loading through files or through the Oracle API.
The bulk-loading mechanisms for inserting data into Netezza or Teradata are completely different. You will notice this straight away if you create different datastores connecting to different types of databases and compare the target table Bulk Loader Options tab of the target table object from each of these datastores.
For detailed information about each bulk-load method available for each database, please refer to the official SAP documentation.
How to do it…
To see the difference between loading data in normal mode (row by row) and bulk loading, we have to generate quite a significant number of rows. To do this, take the dataflow from the previous recipe, Optimizing dataflow execution – the SQL transform, and replicate it to create another copy for use in this recipe. Name it DF_Bulk_Load.
Open the dataflow in the workspace window for editing.
1. Add a new Row_Generation transform from Local Object Library | Transforms | Platform as a source object and configure it to generate 50 rows, starting with row number 1.
2. The Row_Generation transform is used to multiply the number of rows currently being transformed by the dataflow logic. Previously, the number of rows returned by the Person_OLTP SQL transform was approximately 19,000. By performing a Cartesian join of these records to 50 artificially generated records, we can get almost 1 million records inserted in the target PERSON_DETAILS table. To implement the Cartesian join, use the Query transform, but without specifying any join conditions, leaving the section empty.
3. Your dataflow should look like this:
4. To test the current dataflow execution time, save and run the job which includes this dataflow. Your target table's Bulk Loader Option tab should be disabled, and on the Options tab, the Delete data from table before loading flag should be selected.
5. The execution time of the dataflow is 49 seconds, and as you can see, it took 42 seconds for Data Services to insert 939,900 records into the target table.
6. To enable bulk loading, open the target table configuration in the workspace for editing, go to the Bulk Loader Options tab, and check Bulk load. After that, set Mode to truncate and leave the other options at their default values.
7. Save and execute the job again.
8. The following screenshot shows that the total dataflow execution time was 27 seconds, and it took 20 seconds for Data Services to load the same number of records. That is two times faster than loading records in normal mode into the SQL Server database. Your time could be slightly different depending on the hardware you are using for your Data Services and database environments.
How it works…
The availability of the bulk-load methods is totally dependent on which database you use as a target. Data Services does not perform any magic; it simply utilizes the bulk-loading methods available in the database.
These methods are different for different databases, but the principle of bulk loading is usually as follows: Data Services sends the rows to the database host as quickly as possible, writing them into a local file. Then, Data Services uses the external table mechanism available in the database to present the file as a relational table. Finally, it executes a few UPDATE/INSERT commands to query this external table and insert data into the target table specified as a target object in a Data Services dataflow.
Running one INSERT…SELECT FROM command is much faster than executing 1 million INSERT commands.
Some databases perform these small insert operations quite effectively, while for others this could be a really big problem. In almost all cases, if we talk about a significant number of records, the bulk-loading method will always be the quicker way to insert data.
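As a rough analogy (plain Python with SQLite; this is not what Data Services does internally, which relies on database-specific loaders and staging files), the difference boils down to issuing one INSERT per row versus handing the database a whole batch at once:

```python
import sqlite3

rows = [(i, f"name_{i}") for i in range(10_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_slow (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target_bulk (id INTEGER, name TEXT)")

# Normal mode: one INSERT statement per record, with per-statement
# overhead paid 10,000 times.
for r in rows:
    conn.execute("INSERT INTO target_slow VALUES (?, ?)", r)

# Bulk-style mode: the driver receives the whole batch in one call,
# avoiding the per-row statement overhead.
conn.executemany("INSERT INTO target_bulk VALUES (?, ?)", rows)

slow_count = conn.execute("SELECT count(*) FROM target_slow").fetchone()[0]
bulk_count = conn.execute("SELECT count(*) FROM target_bulk").fetchone()[0]
assert slow_count == bulk_count == 10_000
```

Both paths load identical data; only the number of round trips and statement executions differs, which is where the 49-versus-27-second gap in the recipe comes from.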
When to enable bulk loading?
You have probably noticed that as soon as you enable bulk loading in the target table configuration, the Options tab becomes grayed out. Unfortunately, by enabling bulk loading, you lose all the extra functionality available for loading data, such as auto correct load, for example. This happens because of the nature of the bulk-load operation. Data Services simply passes the data to the database for insertion and cannot perform the extra comparison operations which are available for row-by-row inserts.
The other reason for not using bulk loading is that enabling it prevents full push-downs from occurring. Of course, in most cases push-down is the best possible option in terms of execution performance, so you would never think about enabling bulk loading if you have full push-down working. For partial push-downs, when you push down only SELECT queries to get data onto the ETL box for transformation, bulk loading is perfectly valid. You still want to send records back to the database for insertion and want to do it as quickly as possible.
Most of the time, bulk loading does a perfect job when you are passing a big number of rows for insertion from the ETL box and do not utilize any extra loading options available in Data Services.
The best advice in terms of deciding whether or not to enable bulk loading on your target table is to experiment and try different ways of inserting data. This is a decision which should take into account all parameters, such as environment configuration, workload on the Data Services ETL box, workload on the database, and of course, the number of rows to be inserted into the target table.
Optimizing dataflow execution – performance options
We will review a few extra options available for different transforms and objects in Data Services which affect performance and, sometimes, the way ETL processes and transforms data.
Getting ready
For this recipe, use the dataflow from the recipe Optimizing dataflow readers – lookup methods in this chapter. Please refer to that recipe if you need to create or rebuild this dataflow.
How to do it…
Data Services performance-related configuration options can be put under the following categories:
Dataflow performance options
Source table performance options
Query transform performance options
Lookup functions performance options
Target table performance options
In the following sections, we will review and explain all of them in detail.
Dataflow performance options
To access dataflow performance options, right-click on a dataflow object and select Properties from the context menu.
The Degree of parallelism option replicates transform processes inside the dataflow according to the number specified. Data Services creates separate sub data flow processes and executes them in parallel. At the points in the dataflow where the processing cannot be parallelized, data is merged back together from the different sub data flow processes into the main dataflow process. If the source table used in the dataflow is partitioned and the value in the Degree of parallelism option is higher than 1, Data Services can use multiple reader processes to read the data from the same table. Each reader reads data from the corresponding partitions. Then, data is merged, or continues to be processed in parallel if the next transform object allows parallelization.
For detailed information on how the Degree of parallelism option works, please refer to the official documentation, SAP Data Services: Performance Optimization Guide. You should be very careful with this parameter. The usage and value of Degree of parallelism should depend on the complexity of the dataflow and on the resources available on your Data Services ETL server, such as the number of CPUs and the amount of memory used.
If the Use database links option is configured on both the database and Data Services datastore levels, database links can help to produce push-down operations. Use this option to enable or disable database links usage inside a dataflow.
Cache type defines which type of cache will be used inside a dataflow for caching datasets. A Pageable cache is stored on the ETL server's physical disk, and In-Memory keeps the cached dataset in memory. If the dataflow processes very large datasets, it is recommended that you use a pageable cache so as not to run out of memory.
Source table performance options
Open your dataflow in the workspace and double-click on any source table object to open the table configuration window.
Array fetch size allows you to optimize the number of requests Data Services sends to fetch the source dataset onto the ETL box. The higher the number used, the fewer requests Data Services has to send to fetch the data. This setting should depend on the speed of your network. The faster your network is, the higher the number you can specify to move the data in bigger chunks. By decreasing the number of requests, you can potentially also decrease the CPU usage on your ETL box.
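The effect of Array fetch size can be pictured with the fetchmany() idiom of Python's DB-API (an analogy only; Data Services negotiates fetch sizes through the database client libraries):

```python
import sqlite3

# A stand-in source table with 950 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(950)])

ARRAY_FETCH_SIZE = 100  # bigger chunks -> fewer round trips
cur = conn.execute("SELECT id FROM src")

round_trips = 0
fetched = []
while True:
    chunk = cur.fetchmany(ARRAY_FETCH_SIZE)  # one "request" per chunk
    if not chunk:
        break
    round_trips += 1
    fetched.extend(chunk)

# 950 rows arrive in 10 chunks instead of 950 single-row fetches.
print(round_trips, len(fetched))
```

On a fast network, raising the chunk size trades a little memory per request for far fewer round trips, which is exactly the trade-off the option controls.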
Join rank specifies the "weight" of the table used in Query transforms when you join multiple tables. The higher the rank, the earlier the table will be joined to the other tables. If you have ever optimized SQL statements, you know that specifying big tables earlier in the join conditions can potentially decrease the execution time. This is because the number of records after the first join pair can be decreased dramatically through inner joins, for example. This makes the join pairs further on produce smaller datasets and run quicker. The same principle applies here in Data Services, but to specify the order of join pairs, you can use the rank option.
Cache can be set up if you want the source table to be cached on the ETL server. The type of cache used is determined by the dataflow Cache type option.
Query transform performance options
Open the Query transform in the workspace window:
Join rank offers the same options as described earlier and allows you to specify the order in which the tables are joined.
Cache is, again, the same as described earlier and defines whether the table will be cached on the ETL server.
lookup_ext() performance options
Right-click on the selected lookup_ext function in the column mapping section, or on the function call in the output schema of the Query transform, and select Modify Function Call in the context menu:
Cache spec defines the type of cache method used for the lookup table. NO_CACHE means that, for every row in the main dataset, a separate SELECT lookup query is generated, extracting the value from the database lookup table. When PRE_LOAD_CACHE is used, the lookup table is first pulled onto the ETL box and cached in memory or on the physical disk (depending on the dataflow Cache type option). DEMAND_LOAD_CACHE is a more complex method, and it is most efficient when you are looking up repetitive values. Data Services caches only the values already extracted from the lookup table. If it encounters a new key value that does not exist in the cached table, it makes another request to the lookup table in the database to find it and then caches it too.
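DEMAND_LOAD_CACHE behaves much like memoization. A minimal sketch (plain Python; the db_lookup callable is a stand-in for the SELECT against the lookup table):

```python
def make_demand_load_lookup(db_lookup):
    """Return a lookup function that queries the database only for
    keys it has not seen yet, caching every fetched value."""
    cache = {}
    stats = {"db_hits": 0}

    def lookup(key):
        if key not in cache:
            stats["db_hits"] += 1       # a real query to the lookup table
            cache[key] = db_lookup(key)
        return cache[key]

    lookup.stats = stats
    return lookup

# Hypothetical lookup data: phone number per business entity id.
phone_table = {1: "555-0001", 2: "555-0002"}
lookup = make_demand_load_lookup(phone_table.get)

# Highly repetitive input keys: only two distinct values among seven rows.
results = [lookup(k) for k in [1, 1, 2, 1, 2, 2, 1]]
print(lookup.stats["db_hits"])  # 2 queries for 7 lookups
```

With highly repetitive keys the database is hit once per distinct key; with mostly unique keys it degenerates toward NO_CACHE behavior, which is why the option pays off only for repetitive lookups.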
Run as a separate process can be encountered in many other transform and object configuration options. It is useful when the transform is performing high-intensity operations consuming a lot of CPU and memory resources. If this option is checked, Data Services creates separate sub data flow processes that perform this operation. Potentially, this option can help parallelize object execution within a dataflow and speed up processing and transformations significantly. By default, the OS creates a single process for a dataflow, and if nothing is parallelized, all processing is done within this single OS process. Run as a separate process helps to create multiple processes which assist the main dataflow OS process in performing all extracts, joins, and calculations as fast as possible.
Target table performance options
Click on a target table to open its configuration options in the workspace window:
Rows per commit is similar to Array fetch size but defines how many rows are sent to the database within the same network packet. To decrease the number of packets with rows for insert sent to the database, you can increase this number.
Number of loaders helps to parallelize the loading processes. Enable partitions on the table objects on the Datastores tab if the tables are partitioned at the database level. If they are not partitioned, set the same number of loaders as Degree of parallelism.
Chapter 9. Advanced Design Techniques
The topics we will cover in this chapter include:
Change Data Capture techniques
Automatic job recovery in Data Services
Simplifying ETL execution with system configurations
Transforming data with the Pivot transform
Introduction
This chapter will guide you through advanced ETL design methods. Most of them will utilize Data Services features and functionality already explained in the previous chapters. As you have probably noticed, there are many ways to do the same thing in Data Services. The methods and logic you apply to solve a specific problem often depend on environment characteristics and some other conditions, such as development resources and the extract requirements applied to the source systems. In contrast, some of the methods and techniques explained further on do not depend on all these factors and could be considered ETL development best practices.
In this chapter, we will discuss a very popular method of populating slowly changing dimensions in a data warehouse, which requires the use of a combination of Data Services transforms and dataflow design techniques.
We will also review the automatic recovery methods available in Data Services, which allow you to easily restart previously failed jobs without performing extra recovery steps for various components of ETL code and the underlying target data structures.
Another topic discussed in this chapter is the usage of system configurations in Data Services. This feature allows you to simplify your ETL development and makes it easy to run the same jobs against various source and target environments.
Finally, we will review one of the advanced Data Services transforms, which enables you to implement the pivoting transformation method on the passing data, converting rows into columns and vice versa.
Change Data Capture techniques
Change Data Capture (CDC) is the method of developing ETL processes to propagate changes in the source system into your data warehouse for dimension tables.
Getting ready
CDC is directly related to another DWH concept, Slowly Changing Dimensions (SCD): the dimension tables whose data changes constantly throughout the life of the data warehouse.
A good example would be the Employee dimension table, which holds data on the employees in your company. As you can imagine, this table is in constant flux: new employees are hired and some employees leave the company, change positions and roles, or even transfer between departments. All these changes have to be propagated to the Employee dimension table in DWH from the source systems, which always store only the latest state of the Employee data. In DWH, in most cases, for most of the dimension tables, you want to keep the historical data to be able to derive the state of the Employee data at a specific point of time in the past. That is why SCD tables have extra fields to accommodate historical data and can be populated using various methods, depending on their type.
There are many different types of SCD tables, but we will quickly discuss only the three main ones, as the rest are just combinations of these three. We will refer to SCD type numbers according to Ralph Kimball's methodology in brackets.
As an example, let's take the case of the Employee dimension table when one employee, John, gets transferred from marketing to finance.
No history SCD (Type 1)
Yes, a no history SCD table is one that does not store historical data at all. Records are inserted (new records) and updated (changes). Take a look at the following example.
The original record for John looks like this:
ID NAME DEPARTMENT
1 John Marketing
Here's what the new record looks like after the changes are applied:
ID NAME DEPARTMENT
1 John Finance
This type of SCD does not keep historical records at all; as you can see, there is no information that John has ever worked in a different department.
Limited history SCD (Type 3)
A limited history table uses extra fields in the same record to keep the current value and a previous value, as shown here:
ID NAME DEP_PREV DEP_CUR EFFECTIVE_DATE
1 John Marketing Finance 27/02/2015
It is "limited" as you have to add extra columns for every new "historical state" of the row. In the preceding example, you can keep track of only the current and previous values of the record.
Unlimited history SCD (Type 2)
Unlimited history is possible if you create multiple records for each entity. Only one record represents the current value. One of the variations of an unlimited history SCD is shown in the following table:
KEY ID NAME DEPARTMENT START_DT END_DT CUR_FLAG
1 1 John Marketing 1582/01/01 27/02/2015 N
2 1 John Finance 27/02/2015 9999/12/31 Y
The ID is a natural key in the dimension table. For John, this is 1. This type of SCD requires the creation of a surrogate key to define the uniqueness of the record. The CUR_FLAG field defines the current record. The START_DT and END_DT columns show the period of time when the record was valid/current. Note that these date fields do not represent any business value, such as the start employment date or date of birth. They just show the start and end dates of the period when the record was valid (or current) and are only used to accommodate the preservation of historical records. When populating initial records for the first time in an SCD table, you may often want to use dates from the distant past and future, such as 1582/01/01 and 9999/12/31, called "low" and "high" date values. This allows users to run reports which retrieve more accurate historical information.
By using a low date in the START_DT field, we mark the record as an initial historical record in our dimension table. The same goes for using a high date in the END_DT column: the record with a high END_DT always has its CUR_FLAG field set to Y and is the latest (current) record in the history table.
Each time you make a change to the Employee table, in our case to the NAME or DEPARTMENT fields, you have to update the "current" record by changing the END_DT and CUR_FLAG field values with the date of change and N, respectively, and you also have to insert a new record with START_DT set to the date of change and CUR_FLAG set to Y.
In this recipe, we will build a dataflow that populates an SCD table of the unlimited history type (as shown in the Type 2 example). Data Services has a special transform object called History_Preserving, which allows the automatic update/insert of the changed and new history records.
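The update-plus-insert pattern described above can be sketched in a few lines (a minimal illustration in plain Python, not Data Services code; dictionaries stand in for table rows, and the low/high dates follow the preceding example):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)

def apply_scd2_change(history, natural_id, new_values, change_date):
    """Close the current record for natural_id and append a new
    current record carrying new_values (SCD Type 2)."""
    for row in history:
        if row["ID"] == natural_id and row["CUR_FLAG"] == "Y":
            row["END_DT"] = change_date   # close the old version
            row["CUR_FLAG"] = "N"
    history.append({
        "KEY": max(r["KEY"] for r in history) + 1,  # surrogate key
        "ID": natural_id,
        **new_values,
        "START_DT": change_date,
        "END_DT": HIGH_DATE,
        "CUR_FLAG": "Y",
    })

# The initial record for John, as in the table above.
history = [{
    "KEY": 1, "ID": 1, "NAME": "John", "DEPARTMENT": "Marketing",
    "START_DT": date(1582, 1, 1), "END_DT": HIGH_DATE, "CUR_FLAG": "Y",
}]
apply_scd2_change(history, 1, {"NAME": "John", "DEPARTMENT": "Finance"},
                  date(2015, 2, 27))
# Two rows now: the closed Marketing record and the current Finance one.
```

Note how END_DT of the closed row equals START_DT of the new one, so the validity periods tile without gaps.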
How to do it…
To build the CDC process, which will update our target SCD table in the data warehouse from a source OLTP table, we need two dataflows. The first will extract data from the source OLTP system into a staging table located in the STAGE database, and the second will use this STAGE table to compare its data with the target table contents and will produce the history records (in the form of INSERT and UPDATE SQL statements) to propagate the data changes into the target SCD table.
1. Create a new job and a new extract dataflow, DF_OLTP_Extract_STAGE_Employee, that extracts the Employee table from the HumanResources schema into a staging table, STAGE_EMPLOYEE.
For our future Employee SCD table, we will only be extracting the following list of fields from OLTP.EMPLOYEE:
Field             Description
BUSINESSENTITYID  Primary key for Employee records
NATIONALIDNUMBER  Unique national ID
LOGINID           Network login
ORGANIZATIONLEVEL The depth of the employee in the corporate hierarchy
JOBTITLE          Work title
BIRTHDATE         Date of birth
MARITALSTATUS     M = Married, S = Single
GENDER            M = Male, F = Female
HIREDATE          Employee hired on this date
SALARIEDFLAG      Job classification
VACATIONHOURS     Number of available vacation hours
SICKLEAVEHOURS    Number of available sick leave hours
Map only these fields to the output schema of the Extract Query transform.
2. Create a new dataflow, DF_STAGE_Load_DWH_Employee, and link the first extract dataflow to it in the same job.
3. Create an empty target SCD table, EMPLOYEE, by using the CREATE TABLE statement in SQL Server Management Studio when connected to the AdventureWorks_DWH database:
CREATE TABLE [dbo].[EMPLOYEE](
  [ID] [decimal](22, 0) NULL,
  [BUSINESSENTITYID] [int] NULL,
  [NATIONALIDNUMBER] [varchar](15) NULL,
  [LOGINID] [varchar](256) NULL,
  [ORGANIZATIONLEVEL] [int] NULL,
  [JOBTITLE] [varchar](50) NULL,
  [BIRTHDATE] [date] NULL,
  [MARITALSTATUS] [varchar](1) NULL,
  [GENDER] [varchar](1) NULL,
  [HIREDATE] [date] NULL,
  [SALARIEDFLAG] [int] NULL,
  [VACATIONHOURS] [int] NULL,
  [SICKLEAVEHOURS] [int] NULL,
  [START_DT] [date] NULL,
  [END_DT] [date] NULL,
  [CUR_FLAG] [varchar](1) NULL
) ON [PRIMARY]
4. Import the EMPLOYEE table created in the previous step into the DWH datastore.
5. Open the DF_STAGE_Load_DWH_Employee dataflow in the workspace window to edit it and add the required transformations, as shown in the following figure.
These steps explain the configuration of each of the DF objects we just used:
1. The Query transform is used to create an extra field, START_DT, of the date data type. It will be used by the History_Preserving transform to produce the start date of the history record in the target SCD table.
2. The Table_Comparison transform is used to compare the dataset from the STAGE_EMPLOYEE table to the target SCD table dataset in order to produce rows of the INSERT type (for records which do not exist in the target but do exist in the source, according to the specified key columns) and rows of the UPDATE type (for source records whose key column exists in the target table, which will be used to provide new values for the non-key fields). The input primary key column we specify for Table_Comparison to determine whether the record exists in the comparison table is BUSINESSENTITYID. The rest of the source columns go into the Compare columns section, as we want to use all of them to determine if a value in any of these fields has changed.
3. The History_Preserving transform works in tandem with Table_Comparison to produce "history" records, updating the additional START_DT, END_DT, and CUR_FLAG fields along with the rest of the non-key fields, or creating new history records for the INSERT type of rows defined by the preceding Table_Comparison.
The Compare columns section should have the same list of comparison columns as in the previous Table_Comparison transform. You can also control which format will be used as a high date (9999.12.31) and which values will be used in the Current flag field.
4. The Key_Generation transform generates surrogate unique keys in the ID field for our history SCD table, EMPLOYEE, as BUSINESSENTITYID will no longer represent the uniqueness of the record if multiple history rows are created for the same employee.
5. Save and execute the job to populate the target SCD table with the initial dataset. After running the job, if you check the contents of the target table, you will see that it represents the same dataset as in the OLTP.Employee table, but with the extra start/end date columns populated.
Note
Note that, as this is the initial dataset, no history records have been created for any employee. Thus, the BUSINESSENTITYID column still has unique values in this dataset.
6. Let's generate some history records in our target SCD table. To do that, we have to make changes to the source OLTP table by executing the following statements in SQL Server Management Studio when connected to the AdventureWorks_OLTP database:
select * from HumanResources.Employee where BusinessEntityID in (1, 999);

insert into HumanResources.Employee
  (BusinessEntityID, NationalIDNumber, LoginID, OrganizationNode,
   JobTitle, BirthDate, MaritalStatus, Gender,
   HireDate, SalariedFlag, VacationHours, SickLeaveHours)
values
  (999, '999999999', 'domain\johnny', null, 'Engineer', '1982-01-01',
   'S', 'M', SYSDATETIME(), 1, 99, 10);

update HumanResources.Employee set JobTitle = 'CEO'
where BusinessEntityID = 1;
7. Now run the job a second time and check the contents of the target SCD table, EMPLOYEE, for the employees with BUSINESSENTITYID set to 1 and 999.
How it works…
Another important thing we have to discuss, before we explain in detail how this CDC dataflow works, is the difference between the types of CDC architecture.
There are two basic types of CDC methods, or methods allowing you to populate SCD tables. They are usually called source-based CDC and target-based CDC. You can use either of them, or even both of them simultaneously, to populate any type of SCD table. They differ only in how changes in the source data are determined.
So, imagine that you have, on one hand, the populated Employee DWH dimension table (which has not been updated for a couple of days) and, on the other, the source Employee OLTP table (which might or might not be different from the target DWH table's current snapshot of employee data).
Source-based ETL CDC
This method allows you to determine which employee records have had their values changed since the last time you updated the SCD dimension table in your data warehouse just by looking at the source Employee table. For this to work, the source table should have MODIFY_DATE and CREATE_DATE fields in it, updated with the current date/time each time a record in the source Employee table gets updated or created (if it is a new employee record).
Another component required for source-based CDC is the date/time when the Employee table was last migrated to populate the DWH table (usually stored in an ETL log table and extracted into a variable, $v_last_update_date).
So, each time you perform an extraction of the source Employee table, you add a filtering condition, such as SELECT * FROM EMPLOYEE WHERE MODIFY_DATE >= $v_last_update_date OR CREATE_DATE >= $v_last_update_date. This allows you to extract significantly fewer records from the source system, increasing the ETL processing speed and decreasing your CPU, memory, and network resource consumption.
Then, in the dataflow that populates the target SCD table in DWH, you determine whether each row is a new or an updated record by checking the MODIFY_DATE and CREATE_DATE values. With the Map_Operation transform, change the record operation type to either INSERT or UPDATE to send the rows to the History_Preserving transform for history record generation.
Target-based ETL CDC
In target-based CDC, the whole source table is extracted, and each extracted record is then compared with each target SCD table record. Data Services has an excellent transformation object, Table_Comparison, which performs this operation, producing INSERT/UPDATE/DELETE records and sending them to the History_Preserving transform for history record generation.
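As a rough sketch of what Table_Comparison does conceptually (plain Python, with dictionaries standing in for rows and invented sample data; the real transform also handles DELETE rows and many more options):

```python
def table_comparison(source_rows, target_rows, key, compare_columns):
    """Compare source against target and tag each new or changed
    source row with an operation code."""
    target_by_key = {r[key]: r for r in target_rows}
    ops = []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            ops.append(("INSERT", row))       # new record
        elif any(row[c] != existing[c] for c in compare_columns):
            ops.append(("UPDATE", row))       # changed record
        # unchanged rows produce no operation
    return ops

source = [
    {"BUSINESSENTITYID": 1, "JOBTITLE": "CEO"},        # changed
    {"BUSINESSENTITYID": 2, "JOBTITLE": "Analyst"},    # unchanged
    {"BUSINESSENTITYID": 999, "JOBTITLE": "Engineer"}, # new
]
target = [
    {"BUSINESSENTITYID": 1, "JOBTITLE": "Manager"},
    {"BUSINESSENTITYID": 2, "JOBTITLE": "Analyst"},
]
ops = table_comparison(source, target, "BUSINESSENTITYID", ["JOBTITLE"])
print([op for op, _ in ops])  # ['UPDATE', 'INSERT']
```

The tagged rows are what History_Preserving consumes downstream: UPDATE rows close the current history record and open a new one, while INSERT rows simply become new current records.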
Needless to say, pure target-based CDC is a resource- and time-consuming method, the main advantage of which is its simplicity of implementation. So, why not mix the two together to get the speed of source-based CDC, which extracts fewer records, and the simplicity of target-based CDC, which uses only two transforms, Table_Comparison and History_Preserving, to determine the rows for INSERT and UPDATE and to prepare the history rows which will be sent to the target SCD table?
In the steps of this recipe, we implemented a pure target-based CDC method. The following screenshot shows you one of the possible ways (in a very simplistic form) in which to update our target-based CDC to utilize the techniques of the source-based CDC method in order to restrict the dataset for extraction to only the changed data:
The initial script here uses the log table CDC_LOG to extract the date when data was last successfully extracted and applied to the SCD target table.
The CDC_LOG table has only one field, EXTRACT_DATE, and always holds only one record, showing when the CDC process was last executed. We extract this value from it before running our CDC dataflows and update it right after the successful execution of all CDC dataflows.
The final script updates the log table with the current time, so when the job is executed the next time, it will only extract records that have been modified since that date.
There are many variations of source-based CDC method implementation. They all depend on how often data is extracted, whether there is a MODIFIED_DATE column on the source table, how intensively the source table is updated with new values, and so on.
The main idea here is to extract as few records as possible without losing the changes made to the source table.
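The watermark logic of this combined approach can be sketched as follows (plain Python with invented row data; the $v_last_update_date variable becomes an ordinary Python variable):

```python
from datetime import datetime

# Stand-ins: the CDC_LOG watermark and the source Employee table.
last_update_date = datetime(2015, 2, 26)
employee_rows = [
    {"ID": 1,   "MODIFY_DATE": datetime(2015, 2, 27), "CREATE_DATE": datetime(2010, 1, 1)},
    {"ID": 2,   "MODIFY_DATE": datetime(2014, 6, 1),  "CREATE_DATE": datetime(2014, 6, 1)},
    {"ID": 999, "MODIFY_DATE": datetime(2015, 2, 27), "CREATE_DATE": datetime(2015, 2, 27)},
]

# Equivalent of: SELECT * FROM EMPLOYEE
#   WHERE MODIFY_DATE >= $v_last_update_date
#      OR CREATE_DATE >= $v_last_update_date
changed = [
    r for r in employee_rows
    if r["MODIFY_DATE"] >= last_update_date
    or r["CREATE_DATE"] >= last_update_date
]
print([r["ID"] for r in changed])  # [1, 999]

# After all CDC dataflows succeed, advance the watermark
# (the job's final script updating CDC_LOG).
last_update_date = datetime.now()
```

Only the changed and new rows flow into the Table_Comparison/History_Preserving pair, which is where the speed of the mixed approach comes from.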
Native CDC
Some databases, such as MS SQL Server and Oracle, have native CDC functionality, which can be enabled for specific tables. When any DML operations are performed on the table contents, the database updates its internal CDC structures, logging when and how the table records were last updated. Data Services can utilize this native CDC functionality provided by the database. This configuration can be done at a datastore level by using the datastore options when you create a datastore object.
Using this functionality allows you to always select only the changed records from the source database tables.
We will not discuss the details of using native CDC in Data Services, but you can consider this task as good homework practice and try to create your own CDC dataflows. Just do not forget that CDC has to first be enabled at the database level before you make any configuration changes on the Data Services side and start developing ETL.
Automatic job recovery in Data Services
The recovery process usually kicks in when Data Services jobs fail. A failed job, in most cases, means that some part of it has completed successfully and some part has not. A job which has failed right at the very beginning is rarely a problem and is of hardly any concern for recovery, as all you have to do is start it again.
Complications arise when the job fails in the middle of an insert into a target table, for example. Cases like that require you to either consider deleting the already inserted records or even recovering a copy of the table from a backup using database recovery methods.
Recovery and error handling are an important part of robust ETL code. In this recipe, we will take a look at the methods used to develop ETL in Data Services and the functionality available in the software to make sure that the process of resuming failed processes goes as smoothly as possible.
TheautomaticjobrecoveryfeatureavailableinDataServicesdoesnotfixtheproblemswiththepartiallyinserteddataormissingkeysproblems(whenaninsertintoafacttablecannotfindtherelatedkeyvaluesinthereferenceddimensiontablesbecausetheyhavenotbeenproperlypopulatedafterthelastjobfailure).Also,thisfeaturedoesnotprotectyoufrompoorETLdesignordevelopmenterrorswhen,forexample,yourETLmigrationprocessdoesanautomaticconversionofdatabetweenincompatibledatatypes.Inthatcase,itisyourjobtodevelopyourETLinsuchawaythatyoucancleansethedataifnecessaryanddomanualconversions,makingsurethatyoucaneitherconvertthevalueinthefieldbetweendatatypesorsettherowwiththisvalueasidetoinvestigateordealwithitlater.
Theautomaticjobrecoveryfeaturesimplytracksdowntheexecutionstatusesofalldataflowandworkflowobjectsfromwithinajob,andifthejobfails,itallowsyoutorestartthejobwithouttheneedtorunsuccessfullycompletedprocessesagain.
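The skip-completed-steps mechanism can be sketched in plain Python (this is not Data Services code; the JSON status file and function names are illustrative assumptions standing in for the repository's execution status log):

```python
import json
import os

STATUS_FILE = "job_status.json"  # illustrative stand-in for the status log

def run_job(dataflows, recover=False):
    """Run (name, callable) pairs in order; in recovery mode, skip the ones
    recorded as completed by the previous (failed) execution."""
    done = set()
    if recover and os.path.exists(STATUS_FILE):
        with open(STATUS_FILE) as f:
            done = set(json.load(f))
    for name, dataflow in dataflows:
        if name in done:
            continue                  # already completed in the failed run
        dataflow()                    # may raise, emulating a dataflow failure
        done.add(name)
        with open(STATUS_FILE, "w") as f:
            json.dump(sorted(done), f)
```

If the second dataflow raises, a rerun with recover=True executes only the second dataflow, just as the recipe below demonstrates with the trace log message.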
Let's see how it works.
Getting ready
We will use the job from the previous recipe. This job contains two dataflows: an extract of the Employee table from the OLTP source database into the staging area and the load of the data from the staging table into the target data warehouse history table, Employee.
We have to emulate the failed process. To do that, we will drop the target dimension table populated by the second dataflow.
First of all, generate a CREATE TABLE statement from the table dbo.EMPLOYEE using SQL Server Management Studio. Do this by right-clicking on the table object and selecting Script Table As | CREATE To | New Query Editor Window in the context menu so that you can create a table with the same table definition without any difficulties. Save this code on your physical drive for later use to recreate the table:
CREATE TABLE [dbo].[EMPLOYEE](
    [ID] [decimal](22, 0) NULL,
    [BUSINESSENTITYID] [int] NULL,
    [NATIONALIDNUMBER] [varchar](15) NULL,
    [LOGINID] [varchar](256) NULL,
    [ORGANIZATIONLEVEL] [int] NULL,
    [JOBTITLE] [varchar](50) NULL,
    [BIRTHDATE] [date] NULL,
    [MARITALSTATUS] [varchar](1) NULL,
    [GENDER] [varchar](1) NULL,
    [HIREDATE] [date] NULL,
    [SALARIEDFLAG] [int] NULL,
    [VACATIONHOURS] [int] NULL,
    [SICKLEAVEHOURS] [int] NULL,
    [START_DT] [date] NULL,
    [END_DT] [date] NULL,
    [CUR_FLAG] [varchar](1) NULL
) ON [PRIMARY]
Then, execute the following command to drop the table:
DROP TABLE [dbo].[EMPLOYEE]
How to do it…
1. Open Job_Employee in the workspace window and execute it.
2. On the Execution Properties window, check the option Enable recovery. This option will enable the execution status logging of the workflow and dataflow objects within the job.
3. The first dataflow executes successfully, but the second one fails straight away with an error message from the Key_Generation transform, which sends the SQL statement SELECT max(ID) FROM dbo.EMPLOYEE in order to get the latest key value from the target table.
4. Now, restore our missing table object by executing the previously saved CREATE TABLE command in SQL Server Management Studio.
5. Execute the job again, but this time select the Recover from last failed execution option in the Execution Properties window.
6. The trace log states that DF_OLTP_Extract_STAGE_Employee is successfully recovered from the previous job execution.
How it works…
The automatic recovery feature works only if you enable the flag on the job execution options window to enable the object status logging mechanism. If you haven't enabled it before your job fails, you cannot use the automatic recovery feature.
A very important thing to do before running the job again in recovery mode is to check why the job has failed. If the job failed in the middle of populating one of the tables (dimension or fact), you have to understand the impact of running the same load process again without cleaning up the already inserted records first.
In our recipe, we simulated the failure of the load dataflow, which populates the target dimension table. As it has the Table_Comparison and History_Preserving transforms, it is not a problem to execute it again with the same dataset without any preparatory steps. Records that have already been inserted simply will not be considered by Table_Comparison for either INSERT or UPDATE and will be ignored, so it is safe for us to just restart the job in recovery mode.
Note
Always consider the type of failure, the nature of your data, and how it is populated by your ETL before restarting the job in recovery mode, to prevent inserting duplicates into your target tables or to avoid referencing missing key values.
The workflow object can group several child objects placed inside it as a single recovery transactional unit by using the Recover as a unit option. This is useful when several of your dataflow objects work as a single unit in order to populate a specific target table by preparing data at a specific point in time. In that case, if any of these dataflows fails, you want to execute the whole sequence of dataflows from the beginning. Otherwise, Data Services will execute the job in the default recovery mode, skipping all previously successfully completed dataflows and workflows.
To use this ability, place both dataflow objects into a single workflow. Open the workflow properties and check the option Recover as a unit.
The workflow icon will be marked in the workspace window with a green arrow and a small black cross so that you can visually differentiate which parts of your code behave as a transactional unit during the recovery process.
Note
Note that script objects are not considered by recovery mode, as they are part of the parent workflow object. You should keep that in mind before rerunning the job in recovery mode.
There's more…
Of course, the best way to make your life easier is to try to prevent the necessity of job recovery in the first place. One technique that can be implemented to prevent possible problems with data recovery and job rerun complications is putting extra code in a try-catch block. This code can be a set of scripts that perform a table clean-up with a consequent "clean" failure, so the job can simply be rerun without extra considerations and preparatory steps, or it can even be an alternative workflow that processes the data with a different method than the original one that failed.
For example, if you use a dataflow that loads a flat file into a table, you can wrap it in a try-catch block. If it fails, execute another dataflow from the catch block to try to read the file again, but from a different location or using a different method.
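The try-catch fallback pattern described above can be sketched in plain Python (not Data Services scripting language; the two load functions are hypothetical stand-ins for the primary and alternative dataflows):

```python
def load_with_fallback(load_from_primary, load_from_backup):
    """Attempt the primary load; on any failure, run the alternative load."""
    try:
        return load_from_primary()
    except Exception:
        # "catch block": clean up and/or try a different source location
        return load_from_backup()
```

In Data Services, the same shape is built visually: the primary dataflow goes into the try block, and the alternative dataflow or clean-up script goes into the catch block.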
Simplifying ETL execution with system configurations
Working in multiple source and target environments is very common. The development of ETL processes by accessing data directly from the production system happens very rarely. Most of the time, multiple copies of the source system database are created to provide the working environment for ETL developers.
Basically, the development environment is an exact copy of the production environment, with the only difference being that the development environment holds an old snapshot of the data, or test data in smaller volumes for quick test job execution.
So, what happens after you create a datastore object, import all required tables from the database into it, and finish developing your ETL? You have to switch to the production environment.
Data Services provides a very convenient way of storing multiple datastore configurations in the same datastore object, so you do not need to edit datastore object options each time you want to extract from either the production or development database environments. Instead, you can create multiple configurations that each use different credentials and different database connection settings, and quickly switch between them when executing a job. This allows you to touch datastore object settings only once instead of changing them each time you want to run your job against a different environment.
Getting ready
To implement the steps in this recipe, we will need to create a copy of the AdventureWorks_DWH database. Our sample database copy is named DWH_backup. Use any preferred SQL Server method to copy the contents of AdventureWorks_DWH into DWH_backup. The quickest way of performing this kind of copy is to back up the database using the standard SQL Server methods available in the database object context menu, and then restore this backup copy as a database with a new name.
How to do it…
There is no need to create a separate datastore object for DWH_backup or change the DWH datastore configuration options each time we want to extract from either AdventureWorks_DWH or DWH_backup. Let's just create two configurations for our DWH datastore.
1. Go to Local Object Library | Datastores.
2. Right-click on the DWH datastore and select Edit… from the context menu.
3. On the Edit Datastore DWH window, click on Advanced << to open the advanced configuration part, and then click on the Edit… button against the Configurations: label.
4. In the top-left corner of the Configurations for Datastore DWH window, you can see four buttons that allow you to create a new configuration, duplicate the currently chosen one, and rename or delete configurations. Use them to rename the currently used configuration to DWH_Production and create a new configuration, DWH_Development.
5. Change the new DWH_Development configuration to be the default configuration by setting Default configuration to Yes. Note that this value changes automatically to No in the other configurations.
6. Change the Database name or SID or Service Name option setting for DWH_Development to DWH_backup to point this configuration to another database. There is no need to change the other options, as they will be identical for both configurations.
7. Now let's create system configurations so that we can choose the configuration setup when we run the job without the need to edit the datastore's Default configuration option. Go to Tools | System Configurations… and create two system configurations: Development and Production.
8. For the DWH record, set Development to DWH_Development and Production to DWH_Production.
9. Click on OK to save the changes.
How it works…
Using configurations enables you to quickly switch between environments without the need to modify connectivity and configuration settings inside a datastore object.
System configurations extend the usability of datastore configurations even further by allowing you to select the combination of environments right at job execution time.
Note
For the system configuration functionality to work, datastore configurations have to be created first.
Do you want to be able to extract from the production OLTP source but insert into the development DWH target within the same job, without changing the ETL code or datastore settings? Just create a new system configuration that includes the required combination of datastore configurations and execute the job with that system configuration specified.
Now, if you execute the Job_Employee job, just select the desired configuration in the job execution options:
Use the Browse… button to review all the system configurations created, if necessary.
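At its core, a system configuration is just a named mapping from each datastore to one of its configurations, resolved at job execution time. A minimal Python sketch of that idea (the dict layout is an illustrative assumption; the names mirror this recipe):

```python
# Each system configuration picks one datastore configuration per datastore.
SYSTEM_CONFIGS = {
    "Development": {"DWH": "DWH_Development", "OLTP": "OLTP_Production"},
    "Production":  {"DWH": "DWH_Production", "OLTP": "OLTP_Production"},
}

def resolve(datastore, system_config):
    """Return the datastore configuration active under a system configuration."""
    return SYSTEM_CONFIGS[system_config][datastore]
```

Note how the sketch also captures the mixed case discussed above: the Development system configuration can still point the OLTP datastore at production while the DWH target stays on the development database.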
Transforming data with the Pivot transform
The Pivot transform belongs to the Data Integrator group of transform objects, which are usually all about generation or transformation (changing the structure) of data. Simply put, the Pivot transform allows you to convert columns into rows. The pivoting transformation increases the number of rows in the dataset: for each column converted into a row, an extra row is created for every combination of key (non-pivoted) column values. Converted columns are called pivot columns.
Getting ready
Run the following SQL statements against the AdventureWorks_OLTP database to create a source table and populate it with data:
create table Sales.AccountBalance (
    [AccountID] integer,
    [AccountNumber] integer,
    [Year] integer,
    [Q1] decimal(10,2),
    [Q2] decimal(10,2),
    [Q3] decimal(10,2),
    [Q4] decimal(10,2));
-- Row 1
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (1, 100, 2015, 100.00, 150.00, 120.00, 300.00);
-- Row 2
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (2, 100, 2015, 50.00, 350.00, 620.00, 180.00);
-- Row 3
insert into Sales.AccountBalance
    ([AccountID],[AccountNumber],[Year],[Q1],[Q2],[Q3],[Q4])
    values (3, 200, 2015, 333.33, 440.00, 12.00, 105.50);
The source table should look like the following figure:
Do not forget to import it into the Data Services OLTP datastore.
How to do it…
1. Create a new dataflow and name it DF_OLTP_Pivot_STAGE_AccountBalance.
2. Open the dataflow in the workspace window to edit it and place the source table ACCOUNTBALANCE from the OLTP datastore created in the Getting ready section of this recipe.
3. Link the source table to the Extract Query transform, and propagate all source columns to the target schema.
4. Place the new Pivot transform object into the dataflow and link the Extract Query to it. The Pivot transform can be found in Local Object Library | Transforms | Data Integrator.
5. Open the Pivot transform in the workspace to edit it and configure its parameters according to the following screenshot:
6. Close the Pivot transform and link it to another Query transform named Prepare_to_Load.
7. Propagate all source columns to the target schema of the Prepare_to_Load transform, and finally link it to the target ACCOUNTBALANCE template table created in the DS_STAGE datastore and STAGE database.
8. Before executing the job, open the Prepare_to_Load Query transform in the workspace window, double-click on the PIVOT_SEQ column, and check Primary key to specify an additional column as a primary key column for the migrated dataset.
9. Save and run the job.
10. Open the dataflow again in the workspace window and import the target table by right-clicking on the target table and selecting Import table from the table context menu.
11. Open the target table in the workspace window to edit its properties, and select the flag Delete data from table before loading on the Options tab.
12. Your dataflow and Prepare_to_Load Query transform mapping should now look like the following screenshot:
How it works…
Pivot columns are the columns whose values will be merged into one column after the pivoting operation produces an extra row for each pivoted column. Non-pivot columns are the columns not affected by the pivot operation. As you can see, the pivoting operation denormalizes the dataset, generating more rows. This is why ACCOUNTID no longer defines the uniqueness of a record and why we had to specify the extra key column PIVOT_SEQ.
You might ask: Why pivot? Why not just use the data as is and perform the required operations on the data from columns Q1-Q4?
The answer in the given example is very simple. It is much more difficult to perform an aggregation when the amounts are spread across different columns. Instead of summarizing a single column with the sum(AMOUNT) function, we have to write the expression sum(Q1+Q2+Q3+Q4) each time. Quarters are not even the worst case. Try to imagine the situation when the table has amounts stored in columns defining month periods, or when you have to filter by these time periods.
Of course, contrary cases exist as well, where storing data across multiple columns instead of just one is justified. In these cases, if your data structure is not like that, you can use the Reverse_Pivot transform, which does exactly the opposite thing: converting rows into columns. Look at the example of the Reverse_Pivot configuration given here:
Reverse pivoting, or the transformation of rows into columns, introduces another term: the Pivot axis column. This is the column that holds the categories defining the different columns after the reverse pivot operation. It corresponds to the Header column option in the Pivot transform configuration.
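Both operations from this recipe can be sketched in plain Python (this is not Data Services code; column names mirror the recipe, and everything else is an illustrative assumption):

```python
def pivot(rows, non_pivot, pivot_cols, header="QUARTER", data="AMOUNT"):
    """Turn each pivot column into its own row, as the Pivot transform does,
    adding a header column and a PIVOT_SEQ sequence number."""
    out = []
    for row in rows:
        for seq, col in enumerate(pivot_cols, start=1):
            rec = {c: row[c] for c in non_pivot}
            rec.update({"PIVOT_SEQ": seq, header: col, data: row[col]})
            out.append(rec)
    return out

def reverse_pivot(rows, non_pivot, header="QUARTER", data="AMOUNT"):
    """Opposite operation: fold category rows back into columns, keyed by the
    non-pivot columns (the `header` column plays the pivot axis role)."""
    out = {}
    for row in rows:
        key = tuple(row[c] for c in non_pivot)
        rec = out.setdefault(key, dict(zip(non_pivot, key)))
        rec[row[header]] = row[data]
    return list(out.values())
```

Applying reverse_pivot to the output of pivot (with the same non-pivot columns) recovers the original flat rows, which is exactly the relationship between the two transforms.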
Chapter 10. Developing Real-time Jobs
The recipes and topics that will be discussed in this chapter are as follows:
Working with nested structures
The XML_Map transform
The Hierarchy_Flattening transform
Configuring Access Server
Creating real-time jobs
Introduction
In all previous chapters, we have worked with batch-type job objects in Data Services. As we already know, a batch job in Data Services helps to organize ETL processes so that they can be started on demand or scheduled to be executed at a specific time, either once or regularly.
The main difference between a real-time job and a batch job is the way these two job objects are executed by the Data Services engine. The purpose of a real-time job is to process requests and provide responses. So, technically, a real-time job could be running for hours, days, or even weeks without actually processing any data. The Data Services engine actually executes the ETL code from within the real-time job object only when a new request comes from an external service. Data Services uses this request message as the data source, processes the data, and sends the processed data back to the external service in the form of a response message.
Here, a new Data Services component called Access Server comes into the frame. Access Server plays the role of a messenger servicing real-time jobs. It is Access Server that accepts and sends back the messages used as the source and target data for real-time jobs.
In this chapter, we will also review the concepts of nested structures, and how and when they are commonly used. The main reason for this is that real-time jobs often use XML technology to receive requests and send the responses back, and the XML format is often used to exchange nested data structures.
We will also see how to create and configure Access Server to be able to use the real-time job functionality and, finally, we will create a real-time job itself.
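The request/response cycle described above can be sketched in a few lines of plain Python (not Data Services code; the dict message shape and function names are illustrative assumptions):

```python
def real_time_job(transform, requests):
    """Stay idle until a request message arrives, run the ETL logic on that
    single message, and return the corresponding response message."""
    responses = []
    for message in requests:      # each arrival triggers one execution
        responses.append(transform(message))
    return responses
```

The key difference from a batch job is visible in the shape of the loop: the data source is a single incoming message, not a table or file scanned on a schedule.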
Working with nested structures
Earlier in this book, we worked solely with flat structures: rows extracted from database tables and inserted back into a database table, or exported to a flat text file. In this recipe, we will take a look at how to prepare nested data structures inside a dataflow and then export them into an XML file, as XML is a simple and very convenient way to store nested data and is most commonly used for the source and target objects in real-time jobs.
Getting ready
We will not need to have an XML file prepared for this recipe, as we are going to generate it automatically with the help of Data Services from datasets stored in our relational databases: OLTP and DWH.
We will construct a nested data structure of a job title list, where each record (job title) will have a reference to a list of employees who have the same job title in the OLTP system.
Following is the visual presentation of this nested data structure:
In a flat data structure, these would be two different tables, and we would have to have reference key columns in both tables linking them together in a parent-child relationship.
A nested data structure allows you to avoid reference keys completely. In other words, we do not really need JobTitleID in order to link these two tables together. The list of employees will literally be stored in the same dataset, in one of the fields of the specific job title record.
We will source the list of job titles from the HumanResources.Employee table of our OLTP database. Person data, such as first name and last name, will be sourced from the Person.Person table, which is linked to the Employee table by the BusinessEntityID column.
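The target structure can be sketched in plain Python as nested dicts (not Data Services code; the sample data is an illustrative assumption, the column names mirror the recipe):

```python
def nest_by_job_title(employees):
    """Group flat employee rows into a nested job-title structure: each job
    title record directly contains its list of employees, so no JobTitleID
    reference key is needed to link the two datasets."""
    nested = {}
    for emp in employees:
        nested.setdefault(emp["JOBTITLE"], []).append(
            {"FIRSTNAME": emp["FIRSTNAME"], "LASTNAME": emp["LASTNAME"]})
    return [{"JOBTITLE": title, "EMPLOYEES": people}
            for title, people in sorted(nested.items())]
```

Serialized to XML, each EMPLOYEES list becomes a repeatable nested segment under its job title element, which is exactly what the dataflow below builds.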
How to do it…
1. Create a new dataflow, DF_OLTP_XML, and open it in the workspace window for editing.
2. Import, if necessary, the two tables, Person.Person and HumanResources.Employee, into the OLTP datastore.
3. Place both tables in the dataflow DF_OLTP_XML as source table objects.
4. Place the Get_Person Query transform inside the workspace of DF_OLTP_XML and link it to the Person table object. Propagate three columns from the Person table to the output schema of the Query: BUSINESSENTITYID, FIRSTNAME, and LASTNAME.
5. Create two Query transforms to get the data from the Employee table: Get_JobTitle_Person and Distinct_JobTitle.
Get_JobTitle_Person should select the dataset consisting of two columns: BUSINESSENTITYID and JOBTITLE.
Distinct_JobTitle should only select the JOBTITLE column.
6. In the Distinct_JobTitle Query Editor, tick the checkbox Distinct rows… on the SELECT tab and set up ascending sorting on the JOBTITLE column on the ORDER BY tab.
7. Create the Gen_JobTitle_ID Query transform and link Distinct_JobTitle to it. This Query transform will be used to generate new unique identifiers for the distinct values of job titles.
8. Finally, join all three Query transforms together using another Join Query and propagate four columns to the output schema: JOBTITLE_ID, JOBTITLE, FIRSTNAME, and LASTNAME.
9. Now that we have merged our data from multiple tables into one dataset, let's see what is required to convert this flat dataset to a nested one.
10. To do that, we have to split the flat data again, separating job titles from employee data. Both result datasets should have a reference key column, which will be used to define the relationships between the records. Create two Query transforms, Q_JobTitle and Q_Person, propagating JOBTITLE_ID in both Query objects:
11. The nesting of the data happens in the Query transform object that is used to join the previously split datasets. Create the JobTitle_Tree Query transform and link it to both Q_JobTitle and Q_Person.
12. Open the JobTitle_Tree Query Editor in the workspace window.
13. Drag and drop JOBTITLE_ID and JOBTITLE from the Q_JobTitle input schema to the output schema.
14. Drag and drop the whole Q_Person input schema to the output schema. That will place the Q_Person table schema at the same level as the JOBTITLE_ID and JOBTITLE columns. Q_Person is now a nested segment inside the JobTitle_Tree schema.
15. Now, we can switch between output schemas by double-clicking on either JobTitle_Tree or Q_Person, or you can right-click on the schema name and select Make current… from the context menu. That is necessary if you want to change settings on the Query transform tabs: Mapping, SELECT, FROM, WHERE, and so on. Those tabs are not shared by all nested output schemas, and only the "current" output schema values are displayed.
16. Make the JobTitle_Tree output schema current and select the FROM tab. Make sure that only the Q_JobTitle checkbox is selected.
17. Now, make the Q_Person output schema current.
18. On the FROM tab, tick only the Q_Person checkbox.
19. On the WHERE tab, put the following filtering condition:
(Q_Person.JOBTITLE_ID = Q_JobTitle.JOBTITLE_ID)
20. Finally, we have to output our nested dataset into a proper target object which supports nested data. SQL Server does not support nested data, and that is why we will use an XML file as a target.
21. Select the Nested Schemas Template object from the right-side tool panel and place it as a target object linked to the last JobTitle_Tree Query transform.
22. Name the target object XML_target and open it in the workspace window for editing. Specify the following options:
23. Your dataflow should now look like the following figure:
How it works…
Data Services allows you to view the target data loaded by the last job run from the XML target object in the same way as for target database table objects, as shown in the following screenshot:
If you open the XML_target.xml file created in the C:\AW\Files\ folder, you will see a common XML structure:
XML is just a convenient example of an object that can store a nested data structure. Data Services has other target objects that can accept nested data, such as BAPI functions and IDoc objects, both used to extract/load data from and into SAP systems. These methods and concepts will be introduced in the next chapter.
Data Services also supports the JSON format as another source or target for nested data structures.
Nested data is often called hierarchical data as it resembles a tree structure. If you imagine row fields to be leaves, then one of the leaves could be another tree (one row or multiple rows) stored inside a leaf section.
In other words, nested data simply means mapping a source table as a column in the output object structure inside a dataflow.
In the previous chapters, we worked only with flat table or file data, where datasets consisted of multiple rows and each row consisted of multiple fields, each of which could only have one value (decimal, character, date, and so on). Nested or hierarchical data allows you to reference another table inside a row field.
Note
Converting a flat dataset to a nested dataset normalizes it, as you do not have to duplicate the parent fields for every child set of rows.
You can see how a nested table segment is displayed among the other parent columns. To define whether a nested structure can have multiple records for every parent record, you can right-click on the nested table segment and select the Repeatable menu option. Unselecting this option will make the nested segment a one-record segment and will change the icon of the nested table segment accordingly.
There's more…
Data Services has full support for nested data structures. In the steps of this recipe, we used the good old Query transform to generate one. In the next recipe, we will demonstrate how the same task can be implemented with the help of a special Data Services transform: the XML_Map transform.
The XML_Map transform
In the first recipe of this chapter, Working with nested structures, we built the nested structure with the help of the most universal transform in Data Services: the Query transform. The Query transform has the power to define column mappings, filter data, join datasets together, and merge data into nested segments. In fact, many transforms that you have used before, such as History_Preserving, Table_Comparison, Pivot, and others, can be substituted with a set of Query transforms. Of course, those would be complex ETL solutions requiring more development time, would be harder to maintain and read, and, most importantly, would be less efficient in terms of performance.
In this recipe, we will take a look at another transform, XML_Map, which does exactly the same task as performed in the previous recipe: building and transforming nested structures.
We will use the same source tables, PERSON.PERSON and HUMANRESOURCES.EMPLOYEE, to build a dataset of job titles with nested lists of employees.
Getting ready
We have everything we need for this recipe already: the two source tables, PERSON.PERSON and HUMANRESOURCES.EMPLOYEE, imported into our OLTP datastore.
How to do it…
1. Create a new job and a new dataflow and open it in the workspace.
2. Place the two tables, PERSON and EMPLOYEE, from the OLTP datastore inside the dataflow as source tables.
3. Drag and drop the XML_Map transform from Local Object Library | Transforms | Platform into the dataflow workspace and link both source tables to it. When placing the transform in the workspace, choose the Normal mode option.
4. Left-click on XML_Map to open it in the workspace for editing.
5. First, build the parent data structure of job titles by mapping the JOBTITLE column from the EMPLOYEE source schema to the output XML_Map schema.
6. On the Iteration Rule tab, double-click on the iteration rule field and select the EMPLOYEE input schema.
7. On the DISTINCT tab, drag and drop the EMPLOYEE.JOBTITLE source column into the Distinct columns field.
8. On the ORDER BY tab, specify Ascending sorting by the EMPLOYEE.JOBTITLE source field, as shown in the following screenshot:
9. Now, add a nested dataset containing personal information. Drag the PERSON input schema to the output and make sure that it is added on the same level as the previously propagated JOBTITLE column.
10. Double-click on the output PERSON schema to make it current, or use Make current from the context menu by right-clicking on the output PERSON schema.
11. On the Iteration Rule tab, select the INNER JOIN iteration rule and add both source input schemas underneath it.
12. On the same Iteration Rule tab, in the On field, specify the join condition: PERSON.BUSINESSENTITYID = EMPLOYEE.BUSINESSENTITYID
13. On the WHERE tab, specify the join condition between the parent and nested datasets in the output schema: EMPLOYEE.JOBTITLE = XML_Map.JOBTITLE
14. Close the XML_Map Editor and link XML_Map to a Query transform object called Gen_JobTitle_ID, in which we will generate an ID column for the parent job title dataset. Add the JOBTITLE_ID output column, as shown in the preceding screenshot, and put the mapping expression gen_row_num() for it on the Mapping tab.
15. After the Query transform, add the Nested Schemas Template object as a target object. Configure it as an XML type with the filename C:\AW\Files\XML_map.xml.
How it works…
The XML_Map transform properties are very similar to the Query transform properties, with a few exceptions where XML_Map has some extra functionality that can be used to build nested data structures.
What makes the XML_Map transform a really powerful tool is the ability to join any source input datasets (it does not matter if they come from flat data sources or nested data structures) and iterate on the combined dataset, producing the required output results.
There are multiple types of join operations available:
* (cross-join): This produces a Cartesian product of the joined datasets. In SQL language, it is a normal INNER JOIN without a specified ON clause.
|| (parallel-join): This is a non-standard SQL operation that basically concatenates the corresponding records from two joined datasets. See the example in the following figure:
INNER JOIN: A standard SQL operation where you can specify the join condition in the On field.
LEFT OUTER JOIN: A standard SQL operation where you can specify the join condition in the On field.
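The two non-standard modes in the list above can be sketched in plain Python (not Data Services code; records are represented as tuples for illustration):

```python
from itertools import product

def cross_join(left, right):
    """Cartesian product: every left record paired with every right record."""
    return [l + r for l, r in product(left, right)]

def parallel_join(left, right):
    """Concatenate corresponding records: record 1 with record 1,
    record 2 with record 2, and so on."""
    return [l + r for l, r in zip(left, right)]
```

For two 2-record inputs, the cross-join yields 4 combined records while the parallel-join yields 2, which is the essential difference between the * and || operations.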
In the previous steps of the recipe, we produced one hierarchical dataset with the help of XML_Map which, in fact, has two datasets in it: a parent dataset of distinct job titles sourced from the EMPLOYEE table, and a nested dataset of the personal information of the employees who hold each specific job title.
If we just sourced the personal information from the PERSON table alone, we would not be able to specify which personal information (FIRSTNAME and LASTNAME) belongs to which job title.
By providing a joined dataset for the personal information to iterate on, we could define the dependency for our nested structure by using the expression EMPLOYEE.JOBTITLE = XML_Map.JOBTITLE on the WHERE tab. This could be roughly translated as: build a dataset from the source tables which contains the fields JOBTITLE, FIRSTNAME, and LASTNAME, and nest the records with the FIRSTNAME and LASTNAME fields inside the unique records of the output job title dataset by referencing the corresponding JOBTITLE column.
The final Query transform, which is used to generate an extra output column with a unique ID for the parent job title dataset, is quite simple. We have already produced an alphabetically sorted and unique list of job titles in our parent data structure, and all that is left is to generate sequential numbers for each record, which can easily be done with the help of the gen_row_num() function.
Note
Note how much more concise our ETL code has become with the use of the XML_Map transform compared to the previous recipe, where we built the same hierarchical dataset by only using Query transform objects.
The Hierarchy_Flattening transform
Sometimes, hierarchical data is not represented by nested (hierarchical) data structures but is actually stored within a simple flat structure in normal database tables or flat files. The simplest form of hierarchical relationships in data can be presented as a table that has two fields: parent and child.
Look at the example of a folder hierarchy on a disk (as shown in the following figure). The structure on the left is visually simple to read and understand. You can easily see what the root folder is and what the leaves are, and can easily highlight the specific branch you are interested in.
The table on the right is the simplest way to store hierarchical relationship data in a flat format. This structure is extremely hard to query with standard SQL. Some databases, like Oracle, have special SQL clauses which can help to query hierarchical data in order to analyze it and present it in an understandable and clear way. However, those hierarchical SQL statements can be quite complex, and the majority of other databases do not support them at all, leaving you with the necessity of writing stored procedures in order to parse this hierarchical data, answering even the simplest question, like selecting all "children" of a specific "parent".
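The difficulty is that even "all descendants of a node" requires recursion over the parent-child rows. A minimal plain-Python sketch (not Data Services code; the sample hierarchy is an illustrative assumption, and an acyclic hierarchy is assumed):

```python
def descendants(edges, parent):
    """Return all direct and indirect children of `parent`, where `edges`
    is a list of (parent, child) pairs."""
    result = []
    for p, c in edges:
        if p == parent:
            result.append(c)
            result.extend(descendants(edges, c))  # recurse into the branch
    return result
```

A plain SQL SELECT over the two-column table can only reach direct children; each additional level needs another self-join, which is exactly the problem the Hierarchy_Flattening transform solves by precomputing the flattened structure.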
In this recipe, we will review the method that is available in Data Services to convert data from that simple flat hierarchical presentation of parent-child relationships into a more efficient and easy-to-use data structure that can be queried with standard SQL. This can be done with the Hierarchy_Flattening transform.
Getting ready
As we do not have a multi-level parent-child relationship table, we will artificially create one. Let's build a hierarchy of locations using three source tables from the OLTP database: ADDRESS (to source cities from), STATEPROVINCE (to source states from), and COUNTRYREGION (to source countries from). All of them are from the same Person schema of the AdventureWorks_OLTP SQL Server database.
The resulting dataset will only have two columns, PARENT and CHILD, and each row in it will represent one link of the hierarchical dataset.
1. Create a new job and create a new dataflow in it named DF_Prepare_Hierarchy.
2. Open the dataflow in the workspace window for editing, and place three source tables in it from the OLTP datastore: ADDRESS, STATEPROVINCE, and COUNTRYREGION.
3. Create the Query transform State_City and join ADDRESS and STATEPROVINCE in it using the configuration settings shown in the following screenshot, propagating the STATEPROVINCE.NAME and ADDRESS.CITY source columns as the output PARENT and CHILD columns respectively:
4. Create the Query transform Country_State and join COUNTRYREGION and STATEPROVINCE in it using the configuration settings shown in the following screenshot, propagating the COUNTRYREGION.NAME and STATEPROVINCE.NAME source columns as the output PARENT and CHILD columns respectively:
5. Merge the outputs of both the State_City and Country_State transform objects with the Merge transform.
6. Link the Merge transform output to the Hierarchy Query transform and propagate both the PARENT and CHILD columns without making any other configuration changes to the Query transform.
7. Place the target template table at the end of the dataflow object sequence to forward the result data to. Name the target table LOCATIONS_HIERARCHY and create it in the DS_STAGE datastore.
After saving and executing the job, the LOCATIONS_HIERARCHY table will be created and populated with a three-level hierarchy of locations, which includes cities, states, and countries, as shown in the following screenshot:
Now, let's see how this dataset can be flattened with the Hierarchy_Flattening transform.
How to do it…
There are two different modes in which the Hierarchy_Flattening transform parses and restructures the source hierarchical data: horizontal and vertical. They produce different results, and we will build a separate dataflow for each of them in order to parse and flatten the source hierarchical data and compare the final result datasets.
Horizontal hierarchy flattening
The following are the steps to perform horizontal hierarchy flattening.
1. Create a new dataflow, DF_Hierarchy_Flattening_Horizontal, and link it to the existing DF_Prepare_Hierarchy in the same job. Open it in the workspace for editing.
2. Put the LOCATIONS_HIERARCHY template table from the DS_STAGE datastore in as a source table object.
3. Link the source table to the Hierarchy_Flattening transform object, which can be found in the Local Object Library | Transforms | Data Integrator section.
4. Open the Hierarchy_Flattening transform in the workspace window and choose the horizontal method of hierarchy flattening.
5. Specify the source PARENT and CHILD columns in the corresponding transform configuration settings:
6. Close the transform editor and link the Hierarchy_Flattening transform object to the target template table LOCATIONS_TREE_HORIZONTAL created in the DS_STAGE datastore.
Vertical hierarchy flattening
The following are the steps to perform vertical hierarchy flattening.
1. Create a new dataflow, DF_Hierarchy_Flattening_Vertical, and link it to the previously created DF_Hierarchy_Flattening_Horizontal dataflow in the same job. Open it in the workspace for editing.
2. Put the LOCATIONS_HIERARCHY template table from the DS_STAGE datastore in as a source table object.
3. Link the source table to the Hierarchy_Flattening transform object, which can be found in the Local Object Library | Transforms | Data Integrator section.
4. Open the Hierarchy_Flattening transform in the workspace window and choose the vertical method of hierarchy flattening.
5. Specify the source PARENT and CHILD columns in the corresponding transform configuration settings:
6. Close the transform editor and link the Hierarchy_Flattening transform object to the target template table LOCATIONS_TREE_VERTICAL created in the DS_STAGE datastore.
7. Save and close the dataflow tab in the workspace. Your job should have three dataflows now: the first prepares the hierarchical dataset, the second flattens this dataset horizontally, and the third flattens the dataset vertically. The two result datasets are inserted into two different tables: LOCATIONS_TREE_HORIZONTAL and LOCATIONS_TREE_VERTICAL.
How it works…
The horizontal flattening result table looks like the following:
You can now see why it is called "horizontal". All levels of the hierarchy are spread across different columns horizontally.
CURRENT_LEAF shows the name of the specific node, and LEAF_LEVEL shows which column it can be found in.
The convenience of this method is that you can see the full path to the node in one row in the LEVEL columns, starting from the root node, where LEVEL0 shows the root node.
Vertical flattening looks a bit different:
ANCESTOR and DESCENDENT are basically the same PARENT and CHILD entities, but the output result set after hierarchy flattening has a lot more records, as extra records showing the dependency between two nodes are created even if the nodes are not related directly.
The DEPTH column shows the distance between two related nodes, where 0 means this is the same node, 1 means that the nodes are related directly, and 2 means that there is another parent node between them.
The ROOT_FLAG column flags the root nodes, and the LEAF_FLAG column flags the end leaf nodes that do not have descendants.
As you can see from the steps of this recipe, the configuration of the Hierarchy_Flattening transform is extremely simple. All that is required from you is to specify the parent and child columns that store the relationships between the neighbor nodes of the hierarchy.
Extra parameters specific to each type of hierarchy flattening are explained as follows:
Maximum depth: This exists only for the horizontal method, because this method uses new columns for new levels of hierarchy, and Data Services needs you to specify how many extra columns you want to create in your result target table. Imagine a situation when your hierarchical dataset stores an extremely deep hierarchy, 100 levels or more, and you cannot tell this just from looking at the unflattened hierarchy representation with only parent and child fields. In that case, a table with a few hundred columns, one per hierarchy level, may not be what you are looking for. So, this parameter allows you to control the flattening behavior of the transform.
Use maximum length paths: This parameter is specific to the vertical method of hierarchy flattening only. It affects only the value of the DEPTH field in the result output schema. It applies in situations when there are multiple paths from a descendent to its ancestor and they are of different lengths. Selecting this option will always pick the highest number for the DEPTH field out of these multiple paths.
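The logic of both flattening methods can be sketched in a few lines of plain Python. This is an illustration only, not Data Services code; the tuple layouts loosely mirror the result tables discussed in this recipe.

```python
# A minimal sketch (not Data Services code) of what the Hierarchy_Flattening
# transform computes from (parent, child) pairs. Column order mirrors the
# recipe's result tables; the traversal logic itself is illustrative.

def flatten_vertical(edges):
    """Return (ANCESTOR, DESCENDENT, DEPTH) rows, including DEPTH-0 self-rows."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    nodes = {n for e in edges for n in e}
    rows = []
    for start in sorted(nodes):
        stack = [(start, 0)]
        while stack:
            node, depth = stack.pop()
            rows.append((start, node, depth))
            for c in children.get(node, []):
                stack.append((c, depth + 1))
    return rows

def flatten_horizontal(edges, max_depth=3):
    """Return one row per node: the root-to-node path padded to max_depth levels,
    followed by CURRENT_LEAF and LEAF_LEVEL."""
    parents = {child: parent for parent, child in edges}
    rows = []
    for node in sorted({n for e in edges for n in e}):
        path, cur = [node], node
        while cur in parents:
            cur = parents[cur]
            path.insert(0, cur)
        leaf_level = len(path) - 1
        path += [None] * (max_depth + 1 - len(path))  # pad LEVEL columns
        rows.append(tuple(path) + (node, leaf_level))
    return rows

edges = [("United States", "Colorado"), ("Colorado", "Aurora")]
for row in flatten_vertical(edges):
    print(row)
```

Note how the vertical output also links nodes that are not directly related (United States to Aurora, DEPTH 2), which is exactly why the vertical result table has many more records than the source parent/child list.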
Querying result tables
Now, let's try to query the result tables so that you can see how easy it is to perform analysis of the data. You can run the following queries in SQL Server Management Studio when connected to the STAGE database.
Select all root nodes of the hierarchy:
select CURRENT_LEAF from dbo.LOCATIONS_TREE_HORIZONTAL where LEAF_LEVEL = 0 order by CURRENT_LEAF;
select ANCESTOR from dbo.LOCATIONS_TREE_VERTICAL where DEPTH = 0 and ROOT_FLAG = 1 order by ANCESTOR;
Both SQL statements produce the same result: a list of 13 root nodes (we know that those are countries).
Check whether the “United States” node has a leaf node “Aurora” among its dependents:
select * from dbo.LOCATIONS_TREE_HORIZONTAL where LEVEL0 = 'United States' and CURRENT_LEAF = 'Aurora';
select * from dbo.LOCATIONS_TREE_VERTICAL where ANCESTOR = 'United States' and DESCENDENT = 'Aurora';
The result returned by the two queries looks different:
You can see that the horizontal view is more convenient if you want to see the full path to the leaf node from the top root node.
The vertical view is more convenient to use in SQL queries, as you do not have to figure out which column to use when you want to perform a specific operation on a specific level of the hierarchy. The result columns of vertical hierarchy flattening are always the same and static, whereas horizontal hierarchy flattening produces a number of columns that depends on the depth of the flattened hierarchy.
The decision of which type of hierarchy flattening to use should be made after taking into account the type of SQL queries that will be used to query the flattened data.
Note
If you have experimented with the hierarchy flattening result datasets, you have probably noticed that some queries written against the “horizontal” and “vertical” result tables produce different results that are not exactly what is expected. That happens because our parent and child columns are text fields (names of countries, regions, and cities), and they do not guarantee the uniqueness of every node. For example, there is a state “Ontario” that belongs to Canada and a city “Ontario” that belongs to the state of California. Data Services does not know that these two are different nodes and considers them to be the same node (as the name value matches). You should keep that in mind and use unique identifiers for the nodes in the parent and child fields for hierarchy flattening to produce valid and consistent results.
Configuring Access Server
Access Server is required for real-time jobs to work. In this recipe, we will go through the steps of creating and configuring the Access Server component, which will be required for our next recipe, where we are going to create our first real-time job.
Getting ready
Access Server can be created and configured with the help of two Data Services tools: Data Services Server Manager and Data Services Management Console.
How to do it…
1. Start SAP Data Services Server Manager.
2. Go to the Access Server tab.
3. Click on the Configuration Editor button.
4. On the Access Server Configuration Editor window, click on the Add button.
5. Fill in the Access Server configuration fields, as shown in the following screenshot:
6. Do not forget to enable Access Server by ticking the corresponding option.
7. Click on OK to close and save the changes.
8. Start the SAP Data Services Management Console in your browser and log in.
9. Go to the Administrator | Management section.
10. Click on the Add button to add the previously created Access Server.
11. Specify the hostname and Access Server communication port, and click on Apply to add the Access Server.
How it works…
Access Server is a standard Data Services component that serves as a message broker: it accepts requests and messages from external systems, forwards them to Data Services real-time services for processing, and then passes the response back to the external system.
In other words, this is the key component required in order to feed real-time jobs with the source data and get output data from them.
We will create a real-time job in the next recipe and explain the design process of real-time jobs in detail. In the meantime, you should only know that the main source and target objects of real-time jobs are messages (most commonly in an XML structure) and that Access Server is responsible for delivering those messages.
With the preceding steps, the Access Server service was created and enabled in the Data Services environment and is now ready to accept requests from external systems.
Creating real-time jobs
In this recipe, we will create a real-time job and emulate requests from an external system using the SoapUI testing tool in order to get the response with processed data back. We will go through all the steps needed to configure the components required for real-time jobs to work.
Getting ready
In this section, we will install the open source SoapUI tool and create a new project that will be used to send and receive SOAP messages (an XML-based format) to and from Data Services.
Installing SoapUI
You can download and install SoapUI from http://www.soapui.org/.
The installation process is very straightforward. All you have to do is follow the instructions on the screen.
After the installation is complete, start SoapUI. Use the SOAP button in the top toolbar menu to create a new SOAP project. Specify the project name and initial WSDL address, as shown in the following screenshot:
The initial WSDL address can be obtained from Data Services. To get it, log in to the Data Services Management Console, go to the Administrator section, choose Web Services, and click on the View WSDL button at the bottom of the main window.
In the newly opened window, select and copy the top URL address and paste it in the New SOAP Project configuration window, as shown in the following screenshot:
At this point, we have made the initial configuration and can proceed with actually creating a real-time job on the Data Services side.
How to do it…
Now, we have an “external” system in place, configured to send us request messages. We remember that the Data Services component responsible for accepting these messages and sending them back is Access Server, and it was already configured by us in the previous recipe. Now, we need the last and most important component to be created and configured: the Data Services real-time job, which will process these SOAP messages and return the required result.
The goal of our real-time job will be to provide the full names of the location codes for a specific city, plus the postal code of the city.
1. Go to the Local Object Library | Jobs section, right-click on Real-Time Jobs, and choose New from the context menu.
2. Any real-time job is created with two default mandatory objects that define the borders of the real-time job processing section: RT_Process_begins and Step_ends.
3. Create two scripts, Init_Script and Final_Script, and place them correspondingly before and after the real-time job processing section.
4. Inside the real-time job processing section, create a dataflow and name it DF_RT_Lookup_Geography, as shown in the following figure:
5. Now, open the dataflow DF_RT_Lookup_Geography for editing in the main workspace window. First, we have to create file formats for our request and response messages.
6. Create a request file in your C:\AW\Files folder named RT_request.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Request">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CITY" type="xsd:string"/>
        <xsd:element name="STATEPROVINCECODE" type="xsd:string"/>
        <xsd:element name="COUNTRYREGIONCODE" type="xsd:string"/>
        <xsd:element name="LANGUAGE" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
7. Create a response file in your C:\AW\Files folder named RT_response.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CITY" type="xsd:string"/>
        <xsd:element name="POSTALCODE" type="xsd:string"/>
        <xsd:element name="STATEPROVINCENAME" type="xsd:string"/>
        <xsd:element name="COUNTRYREGIONNAME" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
8. To create a request message file format, open Local Object Library | Formats, right-click on Nested Schemas, and choose New | XML Schema… from the context menu. Specify the following settings in the opened Import XML Schema Format window:
9. To create a response message file format, open Local Object Library | Formats, right-click on Nested Schemas, and choose New | XML Schema… from the context menu. Specify the following settings in the opened Import XML Schema Format window:
10. Import RT_Geography_request as a source object into the dataflow DF_RT_Lookup_Geography and link it to the Request Query transform, propagating all columns to the output schema. Choose the Make Message Source option when importing the object into the dataflow.
11. Import RT_Geography_response as a target object into the dataflow DF_RT_Lookup_Geography. Choose the Make Message Target option when importing the object into the dataflow.
12. Import the DIMGEOGRAPHY table object from the DWH datastore and join it with the Request Query transform using the Lookup_DimGeography Query transform. Configure the mapping settings according to the following screenshots:
13. Go to the FROM tab and configure the join conditions for a LEFT OUTER JOIN between the Request Query transform and the DIMGEOGRAPHY source table:
14. Link the Lookup_DimGeography Query transform to the target RT_Geography_response XML schema object.
15. Your dataflow should look like the following figure:
16. Save and validate the job.
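For reference, the request message that will travel to this service is a small XML document conforming to RT_request.xsd from step 6. Here is a hedged sketch of building such a message with only the Python standard library; the function name and sample values are ours, and the real SOAP envelope sent by SoapUI adds extra wrapping around this payload.

```python
# Build a request payload matching RT_request.xsd (illustration only; the
# element names come from the schema, the sample values are made up).
import xml.etree.ElementTree as ET

def build_request(city, state_code, country_code, language):
    req = ET.Element("Request")
    for tag, value in [("CITY", city),
                       ("STATEPROVINCECODE", state_code),
                       ("COUNTRYREGIONCODE", country_code),
                       ("LANGUAGE", language)]:
        ET.SubElement(req, tag).text = value
    return ET.tostring(req, encoding="unicode")

print(build_request("Aurora", "CO", "US", "EN"))
```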
How it works…
The dataflow we created in our real-time job accepts XML messages (requests) as an input and produces XML messages (responses) as an output.
We use the DIMGEOGRAPHY table from our data warehouse to fetch the postal code, full state/province name, and country name in either French, English, or Spanish, depending on which city and language code were received in the request message.
Basically, our real-time job serves as a lookup mechanism against data warehouse data.
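This lookup behavior can be pictured in plain Python. The sketch below illustrates the left-outer-join semantics of the dataflow, not the generated Data Services code; the sample DIMGEOGRAPHY row and the 'EN'/'FR'/'ES' language keys are assumptions for the example.

```python
# Illustrative left-outer-join lookup: match the request against DIMGEOGRAPHY
# and pick the country name for the requested language. Sample data only.
DIMGEOGRAPHY = [
    {"CITY": "Aurora", "STATEPROVINCECODE": "CO", "COUNTRYREGIONCODE": "US",
     "POSTALCODE": "80011", "STATEPROVINCENAME": "Colorado",
     "EN": "United States", "FR": "États-Unis", "ES": "Estados Unidos"},
]

def lookup_geography(request):
    for row in DIMGEOGRAPHY:
        if (row["CITY"] == request["CITY"]
                and row["STATEPROVINCECODE"] == request["STATEPROVINCECODE"]
                and row["COUNTRYREGIONCODE"] == request["COUNTRYREGIONCODE"]):
            return {"CITY": row["CITY"],
                    "POSTALCODE": row["POSTALCODE"],
                    "STATEPROVINCENAME": row["STATEPROVINCENAME"],
                    "COUNTRYREGIONNAME": row[request["LANGUAGE"]]}
    # Left outer join: an unmatched request still produces a response row,
    # with NULLs in the looked-up columns.
    return {"CITY": request["CITY"], "POSTALCODE": None,
            "STATEPROVINCENAME": None, "COUNTRYREGIONNAME": None}

request = {"CITY": "Aurora", "STATEPROVINCECODE": "CO",
           "COUNTRYREGIONCODE": "US", "LANGUAGE": "FR"}
print(lookup_geography(request))
```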
Let's publish our real-time job as a web service and do our first test run to see how the message exchange mechanism works.
1. Open the Data Services Management Console | Administrator | Real-Time | <YourServerName>:4000 | Real-Time Services | Real-Time Service Configuration tab.
2. Click on Add to add the real-time service Lookup_Geography; use the following settings to configure it, and click on Apply when you have finished:
3. Go to the next Real-Time Services Status tab and start the service just created by selecting it and clicking on the Start button:
The icon of the real-time service should become green.
1. Go to the Administrator | Web Services | Web Services Configuration tab.
2. Select Add Real-Time Services in the combobox below and click on the Apply button on the right.
3. Select Lookup_Geography from the list and click on Add:
4. The Lookup_Geography real-time service should appear on the Web Services Status tab, as shown in the following screenshot:
5. We have successfully published our real-time job as a real-time web service. Now, open SoapUI and make sure that you can see the Lookup_Geography web service. To do that, start the SoapUI tool and expand the DSProject | Real-time_Services tab in the project tree panel.
6. Right-click on the Lookup_Geography item and choose New request from the context menu.
7. Expand Lookup_Geography and double-click on the Geography_request item.
8. You will see that a new window opens on the right, showing two panels: request and response.
9. Fill in values in all the fields of the request XML structure and click on the green triangle button to submit a request. The response is received and displayed in the right panel. As you can see, it has the data from the DIMGEOGRAPHY table, which resides in the data warehouse:
Note
One of the most popular use cases for real-time jobs is cleansing data through web service requests. A real-time job receives a specific value, passes it through the Data Quality transforms available in Data Services in order to cleanse it, and then returns the result in the response message.
Chapter 11. Working with SAP Applications
Introduction
This chapter is dedicated to the topic of reading and loading data from SAP systems, with the example of an SAP ERP system. Data Services provides quick and convenient methods of obtaining information from SAP applications. As this is quite a vast topic to discuss, there will be only one recipe, but, nevertheless, it should cover all aspects of extracting and loading data into SAP ERP.
Loading data into SAP ERP
We will not discuss the topic of configuring the SAP system to communicate with the Data Services environment, as it would require another few chapters on the subject and, most importantly, it is not the purpose of this book. All this information can be found in the detailed SAP documentation available at http://help.sap.com.
We presume that you have exactly the same Data Services and staging environments configured and created as in the previous chapters of this book, and that you have also installed and configured an SAP ERP system that can communicate with the Data Services job server.
In this recipe, we will go through the steps of loading information into the SAP ERP system by using Data Services. In one of our preparation processes, we will be generating data records for insertion right in the dataflow, whereas usually you would have the data, extracted from another system, ready to be loaded in the staging area.
We will also review the main SAP transactions involved in the process of manually creating data objects, monitoring the loading process on the SAP side, and the transaction that might be used for the post-load validation of loaded data.
We will be using the example of creating/loading batch data, which is related to material data in SAP ERP. First, we will create the specific material required for the batch data to be loaded. Then, we will create a test batch manually to see how it is done on the SAP side, and then we will develop ETL code in Data Services, which will prepare the batch record and send it to the SAP side.
Getting ready
The first thing we have to do is create the material for which we will be creating batches in SAP.
1. Log in to the SAP ERP system and run the transaction MM01 to create new material.
2. Specify Material as RAWMAT01, and choose the Material Type and Industry sector:
3. Select the following views for the new material: Basic data 1, Basic data 2, Classification, and General Plant Data / Storage 1:
4. In the next window, specify the Organization Levels:
5. On the next screen, define the Base Unit of Measure and material description:
6. On the Sales: general/plant tab, tick the Batch management checkbox to define the material as batch managed:
7. Finally, on the Classification tab, classify the material as a raw material of class type 023:
Click on the Continue (Enter) button in the top-left corner to save and create the new material.
Now, we can manually create the first batch object for our new material so that we can later compare it to the batch object that will be generated and inserted by the Data Services job automatically.
8. Run the transaction MSC1N to create a new batch, and specify the material number and batch name that you would like to create:
9. Click on Continue, and on the next screen, fill in the values for the following fields: the Date of Manufacture of the batch, the Last Goods Receipt date, and Ctry of Origin:
10. Click on Continue to save and create the new batch, 20151009.
The last preparation step we have to complete is the configuration of a partner profile in our SAP ERP, so that the system can accept the IDoc messages containing the batch data that will be sent to SAP ERP from Data Services.
11. Run the transaction WE20 to configure the partner profile.
12. In the Partner profiles window, select the Partner Type LS section and select the client you are currently using:
Make sure that your Partn. status is Active on the Classification tab and that you have BATMAS specified in the Inbound parmtrs list. If not, then click on the Create inbound parameter button under the Inbound parmtrs tab and define the BATMAS inbound parameter:
Now, everything is ready on the SAP ERP side, and all we have to do is create the Data Services job that will generate and send the data into the SAP ERP system for insertion.
How to do it…
1. Start Data Services Designer and go to Local Object Library | Datastores.
2. Right-click on the empty space of the Datastores tab and choose New from the context menu in order to create a new datastore object.
3. Create a new datastore, SAP_ERP, by specifying the datastore type SAP Applications and the database server name along with your SAP credentials.
4. Click on the Advanced button and specify the additional settings required for setting up the connection to the SAP ERP system, such as Client number and System number. See the following screenshot for the full list of configuration settings:
Click on OK to create the datastore object.
5. Import the following objects into your datastore by right-clicking on the required section of the object you want to import and choosing the Import By Name… option from the context menu:
The IDoc object BATMAS03 will be used as a target object to transfer batch data to the SAP system.
The MARA and MCH1 tables will be used as source objects to extract data from the SAP system for pre-load and post-load validation purposes.
6. Create a new job containing four linked dataflow objects, as shown in the following screenshot:
7. Open the first DF_SAP_MARA dataflow in the workspace window for editing and specify the MARA table object imported in the SAP_ERP datastore as a source and a new SAP_MARA template table in the STAGE database as a target. Propagate all columns from the source MARA table to SAP_MARA using the Query transform. Run the job once and import the target table object:
8. Open DF_Prepare_Batch_Data in the workspace window for editing.
9. Add the Row_Generation transform as a source. Set it up to generate only one record, with the row number starting at 1.
10. Link it to the Create_Batch_Record Query transform, which will be used to define the fields of the created record. Use the following screenshot as a reference for column names and mappings:
11. Add another Query transform named Validate_Material, link Create_Batch_Record to it, and propagate all columns from the input schema to the output schema.
12. Add an extra column as a new function call of the lookup_ext function and configure it as shown in the following screenshot, looking up the MATNR field from the SAP_MARA table by the MATERIAL field value from the input schema:
13. Add the Validation transform, forking the dataset into three outputs (Rule, Pass, and Fail) and sending them to three target tables: BATCH, BATCH_REJECT, and BATCH_REJECT_RULE, as shown in the following screenshot:
14. Open the Validation transform in the workspace window for editing and add a new validation rule:
15. The Validation transform editor should look as shown in the following screenshot:
Close the dataflow and save the job.
16. Open the third dataflow, DF_Batch_IDOC_Load, in the workspace window for editing.
17. Build the structure of the dataflow, as shown in the following screenshot. The steps to configure each of the dataflow components will be provided further on.
18. The Row_Generation transform should be configured to generate one record. Use the following table to define the output schema mappings in the EDI_DC40 Query transform. The table has records only for the mandatory columns of the EDI_DC40 IDoc segment. Populate the rest of them with NULL values.

Column name   Data type     Mapping expression
TABNAM        varchar(10)   'EDI_DC40'
MANDT         varchar(3)    '100'
DOCREL        varchar(4)    '740'
DIRECT        varchar(1)    '2'
IDOCTYP       varchar(30)   'BATMAS03'
MESTYP        varchar(30)   'BATMAS'
SNDPOR        varchar(10)   'TRFC'
SNDPRT        varchar(2)    'LS'
SNDPRN        varchar(10)   'SBECLNT100'
CREDAT        date          sysdate()
CRETIM        time          systime()
ARCKEY        varchar(70)   '1'
Note
Please keep in mind that some of the values in the mapping expressions for this specific segment, EDI_DC40, are specific to your own SAP environment. Among them are MANDT and SNDPRN, which should be obtained from your SAP administrator.
To obtain the full list of columns required for the specific segment, refer to the BATMAS03 object structure itself.
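As a quick reference, the mandatory EDI_DC40 mappings above can be expressed as a plain dictionary. This is an illustrative sketch, not Data Services code: MANDT and SNDPRN are environment-specific placeholders, and the sysdate()/systime() mappings are replaced here with standard datetime calls.

```python
# Sketch of the EDI_DC40 control-record mappings from the table above.
# MANDT and SNDPRN must come from your own SAP environment.
from datetime import datetime

def build_edi_dc40(mandt="100", sndprn="SBECLNT100", now=None):
    now = now or datetime.now()
    return {
        "TABNAM": "EDI_DC40",
        "MANDT": mandt,          # SAP client - ask your SAP administrator
        "DOCREL": "740",
        "DIRECT": "2",           # direction: inbound to SAP
        "IDOCTYP": "BATMAS03",
        "MESTYP": "BATMAS",
        "SNDPOR": "TRFC",
        "SNDPRT": "LS",          # partner type: logical system
        "SNDPRN": sndprn,        # sender partner - environment-specific
        "CREDAT": now.date(),    # stands in for sysdate()
        "CRETIM": now.time(),    # stands in for systime()
        "ARCKEY": "1",
    }

print(build_edi_dc40()["IDOCTYP"])
```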
19. Open the E1BATMAS Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
MATERIAL      varchar(18)   BATCH.MATERIAL
BATCH         varchar(10)   BATCH.BATCH_NUMBER
ROW_ID        int           BATCH.ROW_ID
20. Open the E1BPBATCHATT Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
LASTGRDATE    date          to_date(BATCH.GOODS_RECEIPT_DATE, 'YYYYMMDD')
COUNTRYORI    varchar(3)    BATCH.COUNTRY_OF_ORIGIN
PROD_DATE     date          to_date(BATCH.DATE_OF_MANUFACTURE, 'YYYYMMDD')
ROW_ID        int           BATCH.ROW_ID
21. Open the E1BPBATCHATTX Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
LASTGRDATE    varchar(1)    'X'
COUNTRYORI    varchar(1)    'X'
PROD_DATE     varchar(1)    'X'
ROW_ID        int           BATCH.ROW_ID
22. Open the E1BPBATCHCTRL Query transform in the workspace window for editing and define the following mappings for the output schema columns:

Column name   Data type     Mapping expression
DOCLASSIFY    varchar(1)    'X'
ROW_ID        int           BATCH.ROW_ID
23. Open the IDOC_Nested_Schema Query transform in the workspace window for editing.
24. Drag and drop the EDI_DC40 and E1BATMAS segments from the input schema into the output schema of the IDOC_Nested_Schema Query transform.
25. Double-click on the output schema IDOC_Nested_Schema to make its status “current”, open the FROM tab, and select only the E1BATMAS input schema. Mark the EDI_DC40 segment in the output nested schema as repeatable (the full table icon). If the segment schema is created as repeatable by default, then do not change it. Mark the E1BATMAS output schema segment as non-repeatable. To do that, make it current by double-clicking on it, and then right-click on it, unselecting the Repeatable option from the context menu. Note the difference between the output schema icons of EDI_DC40 and E1BATMAS as repeatable and non-repeatable segments.
26. Double-click on the first EDI_DC40 output segment to make its status “current”. Open the FROM tab and select only the EDI_DC40 input schema:
27. Double-click on the second E1BATMAS output segment to make it current. Open the FROM tab and select only the E1BATMAS input schema, in the same way as for the previous EDI_DC40 output schema. Also, delete the ROW_ID column from the output schema and drag and drop the rest of the input schemas (E1BPBATCHATT, E1BPBATCHATTX, and E1BPBATCHCTRL) inside the E1BATMAS output schema, creating a nested structure:
28. Double-click on the nested E1BPBATCHATT output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHATT input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHATT.ROW_ID = E1BATMAS.ROW_ID).
29. Perform the same steps for the next output segment. Double-click on the nested E1BPBATCHATTX output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHATTX input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHATTX.ROW_ID = E1BATMAS.ROW_ID).
30. Perform the same steps for the next output segment. Double-click on the nested E1BPBATCHCTRL output schema to make it current. Delete the ROW_ID column from the output schema. On the FROM tab, select the E1BPBATCHCTRL input schema. On the WHERE tab, specify the filtering condition: (E1BPBATCHCTRL.ROW_ID = E1BATMAS.ROW_ID).
31. The target object BATMAS03, imported into the SAP_ERP datastore, should be configured using the values shown in the following screenshot. Open the BATMAS03 target object in the dataflow in the main workspace for editing to configure it.
Close the dataflow object. Save and validate the job to make sure that you have not made any syntax errors in your dataflow design.
32. Open the last dataflow, DF_SAP_MCH1, for editing in the workspace window.
33. Add the MCH1 table from the SAP_ERP datastore as a source object.
34. Propagate all the columns from the MCH1 table to the output schema using the linked Query transform.
35. Add a new template table, SAP_MCH1, from the STAGE datastore as a target table object.
36. Save, validate, and run the job.
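The nesting performed in steps 23 to 30 essentially folds flat segment rows, joined on the artificial ROW_ID key, into the nested structure of the IDoc message. A minimal sketch of that nesting logic in plain Python follows; dictionaries stand in for the Data Services schemas, and the field subsets are illustrative.

```python
# Illustrative nesting of flat segment rows into a BATMAS03-like structure.
# Joins on ROW_ID mirror the WHERE conditions from steps 28-30; ROW_ID is
# dropped from the output, as in the recipe.

def nest_idoc(edi_dc40, e1batmas_rows, att_rows, attx_rows, ctrl_rows):
    def matching(rows, row_id):
        # WHERE <segment>.ROW_ID = E1BATMAS.ROW_ID, with ROW_ID removed
        return [{k: v for k, v in r.items() if k != "ROW_ID"}
                for r in rows if r["ROW_ID"] == row_id]
    nested = []
    for head in e1batmas_rows:
        row_id = head["ROW_ID"]
        nested.append({
            "MATERIAL": head["MATERIAL"],
            "BATCH": head["BATCH"],
            "E1BPBATCHATT": matching(att_rows, row_id),
            "E1BPBATCHATTX": matching(attx_rows, row_id),
            "E1BPBATCHCTRL": matching(ctrl_rows, row_id),
        })
    return {"EDI_DC40": [edi_dc40], "E1BATMAS": nested}

idoc = nest_idoc(
    {"IDOCTYP": "BATMAS03"},
    [{"MATERIAL": "RAWMAT01", "BATCH": "2015100901", "ROW_ID": 1}],
    [{"COUNTRYORI": "US", "ROW_ID": 1}],
    [{"COUNTRYORI": "X", "ROW_ID": 1}],
    [{"DOCLASSIFY": "X", "ROW_ID": 1}])
print(idoc["E1BATMAS"][0]["E1BPBATCHATT"])
```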
How it works…
The preceding steps show the common process of loading data into the SAP system using the IDoc mechanism. The load process usually consists of a few steps:
Extracting master data from the SAP system to make sure that we are referencing the correct objects existing in the target system
Building/preparing the dataset for load
Loading the data into SAP
The post-validation process of extracting the data loaded in SAP back into the staging area for validation
Let's review all these processes, built in the form of dataflows, in more detail.
The first dataflow, DF_SAP_MARA, extracts material data from SAP ERP for validation purposes, to make sure that we do not try to create a batch for material that does not exist in the target SAP system.
The second dataflow, DF_Prepare_Batch_Data, prepares the batch record to be loaded in SAP. As you can see from the output schema mapping of one of the Query transforms, we prepare the batch 2015100901 to be created for material RAWMAT01. As you might remember, we have already manually created batch 20151009. The rest of the mappings show that we have also populated the Ctry of origin, Last Goods Receipt, and Date of Manufacture fields.
The third dataflow, DF_Batch_IDOC_Load, transforms the prepared batch record into the nested format of an IDoc message and sends this IDoc message to SAP. Further on, we will take a look at how you can monitor the process of receiving and loading IDocs on the SAP side.
Finally, the fourth dataflow, DF_SAP_MCH1, extracts the SAP table MCH1, which contains information about batches created in SAP, for post-load validation purposes. That allows us to see which batches were actually loaded in SAP and run SQL queries in our staging area to validate field values.
IDoc
IDoc is a format and transfer mechanism that SAP systems use to exchange data. Data Services utilizes this mechanism in order to send and receive information from SAP systems. IDocs that the SAP system receives are called inbound, and IDocs sent by SAP are called outbound. You saw that transaction WE20 was used to configure inbound IDoc parameters so that SAP could successfully accept the BATMAS IDoc messages sent to it from Data Services.
The BATMAS IDoc used to load batch data has a nested structure, and that is why we had to nest multiple datasets with the help of the Query transform. We used the artificial ID key ROW_ID to link all the nested segments together.
Keep in mind that Data Services does not load data into SAP tables directly itself. All Data Services does is prepare the data in the IDoc format so that it can be received by SAP and loaded into SAP tables using internal mechanisms/programs.
Monitoring the IDoc load on the SAP side
Data Services sends IDoc messages to SAP synchronously. An IDoc message is received by SAP and then processed. Only after that does Data Services send the next IDoc message. Sometimes, this process can take quite a long time. All you will see in the trace log on the Data Services side is one record indicating that the dataflow loading the data is still running.
To see what is going on on the SAP side (how many IDocs fail and how many of them are processed successfully by SAP), you can use transaction BD87:
By expanding the BATMAS section and double-clicking on the actual IDoc record that you are interested in, you can see the data in the IDoc nested segments:
Other useful information available on this screen includes:
The status of the IDoc (processed successfully or failed)
Error messages (if failed)
Data records stored in the IDoc message (the E1BATMAS, E1BPBATCHATT, E1BPBATCHATTX, and E1BPBATCHCTRL segments)
As you can see, the EDI_DC40 segment is not visible, as it is the IDoc header itself. The information we have provided in this segment is available in the Short Technical Information panel and defines the behavior of IDoc processing.
By clicking on the Refresh button on the Status Monitor for ALE Messages screen, you can see in real time how the IDocs received by SAP are processed.
Post-load validation of loaded data
We know that one of the tables in SAP where batch master data is stored is MCH1. Knowing which physical tables are actually populated with data, whether you enter data manually via transactional screens or load data coming from external systems via the IDoc mechanism, is useful, as you can always extract the contents of these tables to perform post-validation tasks.
To view our newly created batch 2015100901, we can use transaction MSC3N (Display Batch):
Or, we can see the contents of the MCH1 table directly using the SE16 transaction (Data Browser):
You can see both batches here: the one created manually and the one loaded with the help of Data Services.
Do you remember that we developed a dataflow to extract the MCH1 table to validate loaded data? Let's check the actual records extracted right after the loading process has completed by browsing the contents of the SAP_MCH1 table in our staging area:
The CHARG column in the MCH1 table stores the batch number values.
Tip
As technical names in SAP tables can be quite difficult to understand, you can use transaction SE11 to see the descriptions of the columns of a specific table.
There's more…
We have just scratched the surface of one of the possible methods of reading/loading data from an SAP system.
There are many other methods that can be used to communicate with SAP systems: ABAP dataflows, BAPI calls, direct RFC calls, Open Hub tables, and many others.
Choosing between these methods usually depends on the type of tasks that have to be implemented, the amount of transferred data, and the type of SAP environment used.
Chapter 12. Introduction to Information Steward
In this chapter, we will see the following recipes:
Exploring Data Insight capabilities
Performing Metadata Management tasks
Working with the Metapedia functionality
Creating a custom cleansing package with Cleansing Package Builder
Introduction
SAP Information Steward is a separate product that is installed alongside SAP Data Services and SAP Business Intelligence and provides additional capabilities for business and IT users to analyze data quality and create cleansing packages that can improve the data cleansing processes run by Data Services.
To cover all the functionality of Information Steward, we would have to write another book. In this chapter, we will explore the main functions of Information Steward that have proved to be the most valuable to users of the product.
All these activities relate to specific areas within the SAP Information Steward application.
Note
Log in to the SAP Information Steward application at http://localhost:8080/BOE/InfoStewardApp.
On the main page, you can see five tabs, four of which represent the main areas of the Information Steward product functionality, as shown in the following screenshot:
Exploring Data Insight capabilities
The Data Insight tab is the first tab, and it enables you to profile the data available from different sources, build validation rules for the data, and design a scorecard in order to see a visual representation of the quality of your data.
Getting ready
Before we log in to the SAP Information Steward application, we have to create a couple of Information Steward objects in the standard Central Management Console (CMC). The goal of this preparation step is to define the sources of data that Information Steward can connect to in order to perform data quality and analysis tasks. You can define some data sources, such as a flat file, directly in the Information Steward application, but others should first be created as connections in the CMC Information Steward area.
1. Log in to the CMC at http://localhost:8080/BOE/CMC.
2. Go to the Information Steward section.
3. Click on Connections and click on the Create connection button in the top menu.
4. Fill in all the required fields, as shown in the following screenshot, in order to create a connection object to the AdventureWorks_DWH SQL Server database:
5. Click on the Test Connection button to validate the information entered, and then click on the Save button to save the connection and exit the Create Connection screen.
6. The dwh_profile connection should appear in the list of connections that can be used in Information Steward.
7. Finally, let's create a new Data Insight project called Geography. To do that, go to the Data Insight section and click on the Create a Data Insight project button.
How to do it…
Before you start with the following steps, first log in to SAP Information Steward at http://localhost:8080/BOE/InfoStewardApp.
The common sequence of actions performed on the Data Insight tab in Information Steward includes:
Creating a connection object
Profiling the data
Viewing profiling results
Creating a validation rule
Creating a scorecard
Creating a connection object
The following steps are required to specify the source of data for our Data Insight analysis.
1. Go to Data Insight | Geography Project.
2. Select the Workspace Home tab and click on the Add | Tables… button in the top-left corner.
3. In the opened window, select the dwh_profile connection object, then expand it, select the dbo.DimGeography table, and then click on the Add to Project button, as shown in the following screenshot:
Profiling the data
Profiling, or gathering various kinds of information about the data, can be used for data analysis.
1. To profile the data in the added DimGeography table, you can use various profiling options. Let's collect uniqueness profiling data. On the Workspace Home tab, select the DimGeography table in the dwh_profile connection and click on the Profile | Uniqueness button in the Profile Results toolbar menu.
2. In the Define Tasks: Uniqueness window, specify which columns you want to gather uniqueness profile information for. Select City and CountryRegionCode and click on the Save and Run Now button, as shown in the following screenshot:
3. To gather column profiling information, select the DimGeography table and click on the Profile | Columns button in the toolbar menu of the Workspace Home | Profile Results tab. Specify a name for the column profiling task, Geography_column_profiling, and select all profiling options: Simple, Median & Distribution, and Word Distribution. Then, click on the Save and Run Now button to create and execute the column profiling task.
4. Select the Tasks section on the left-side panel to see both the profiling tasks created in the previous steps. You can run them any time from this tab to refresh the profiling data according to the parameters specified.
Viewing profiling results
The following steps show you how to view the previously gathered profiling results.
1. To see the data profile results, go to the Workspace Home | Profile Results tab.
2. Expand the table you are interested in to see its columns and select it.
3. Click on the Refresh | Profile Results button in the toolbar menu.
4. Then, by clicking on the field or specific number you are interested in, you can see the detailed result for this field in the extra windows on the right-hand side of the screen and at the bottom, as shown in the following screenshot:
5. To see the results of the uniqueness profile information collected, select the Advanced view mode under the Profile Results tab.
6. In the opened window, click on the green icon in the Uniqueness column and select the key combination you have gathered information on. In our case, we have gathered uniqueness profiling information for two columns of the DimGeography table, City and CountryRegionCode, as shown in the following screenshot:
By hovering your cursor over the red zone showing the percentage of non-unique records for the selected combination of columns, you can see detailed information such as the percentage of non-unique rows and the number of non-unique rows. In our case, it is 22.08% and 151. By clicking on the red zone, you can display the non-unique rows at the bottom of the screen.
So far, we have gathered two types of profiling information: column profile data and uniqueness profile data for the DimGeography table located in our data warehouse.
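The uniqueness metric reported above boils down to counting how many rows share a duplicated key combination. The following is a minimal Python sketch of that calculation, under the assumption that this is how the percentage is derived; the sample rows are invented for illustration, and Information Steward performs this computation internally:

```python
# Sketch of a uniqueness profile for a key combination: the share of rows
# whose (City, CountryRegionCode) pair occurs more than once.
from collections import Counter

def uniqueness_profile(rows, key_columns):
    """Return (non_unique_count, non_unique_percent) for the key columns."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    counts = Counter(keys)
    non_unique = sum(1 for k in keys if counts[k] > 1)
    percent = round(100.0 * non_unique / len(keys), 2) if keys else 0.0
    return non_unique, percent

# Hypothetical sample rows, not actual DimGeography data.
rows = [
    {"City": "Paris",  "CountryRegionCode": "FR"},
    {"City": "Paris",  "CountryRegionCode": "FR"},   # duplicate pair
    {"City": "Berlin", "CountryRegionCode": "DE"},
    {"City": "London", "CountryRegionCode": "GB"},
]
count, pct = uniqueness_profile(rows, ["City", "CountryRegionCode"])
print(count, pct)  # 2 non-unique rows out of 4 -> 50.0
```

In the recipe's real dataset, the same calculation yields the 151 rows and 22.08% shown in the red zone of the uniqueness chart.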
Creating a validation rule
Now, let's see how you can create a validation rule in Information Steward and display the result of applying it to the dataset in a graphical form by using scorecards.
1. On the Workspace Home | Profile Results tab, you can find a yellow icon in the Advisor column against the dbo.DimGeography table, as shown in the following screenshot:
2. Click on the yellow icon shown in the preceding screenshot to launch Data Validation Advisor:
3. We are not going to accept the validation rule suggested by Data Validation Advisor and will create our own custom validation rule instead.
Our custom rule will check whether a DimGeography table record has translated values in both of the columns, FrenchCountryRegionName and SpanishCountryRegionName. To create a new rule, open the second vertical tab, Rules, which is next to the Workspace Home tab, and click on the New button in the toolbar menu.
4. Fill in all the configuration fields of the new French_Spanish_CountryRegionName rule, as shown in the following screenshot:
We have created two parameters, $French_translation and $Spanish_translation, of the varchar data type. Each parameter checks the value in one of the two columns, and in the Definition tab, we have specified the condition to be applied to the values.
5. Click on the Submit for Approval button. The rule will be sent to the Tasks tab for approval by the category of users specified in the Approver field of the Rule Editor window.
6. The rule can be approved from the My Worklist section, as shown in the following screenshot:
7. Go to the Workspace Home | Rule Results tab and click on the Bind to Rule button.
8. Bind the rule parameters to the dwh_profile.dbo.DimGeography fields, as shown in the following screenshot, and click on the Save and Close button:
9. Click on Refresh | Rule Results to see the results of applying the rule to the columns of the specified table, as shown in the following screenshot:
The left side of the screen shows the rule scores for the specified fields and the number of records that passed/failed the rule. In our example, 55 rows do not have either a French or Spanish translation in the FrenchCountryRegionName and SpanishCountryRegionName fields.
You can see the actual records in the right-side panel.
10. You can also see the rule result on the Rules tab directly. All you need to do is select the rule and click on the Bind button. The rule result appears on the right side of the screen, as shown in the following screenshot:
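Conceptually, the rule we just bound is a predicate over its two parameters, evaluated once per record. The sketch below models that pass/fail logic in Python; the expression shown is an assumption about what the Definition tab contains (Information Steward has its own rule scripting syntax), and the sample records are invented:

```python
def french_spanish_rule(french_translation, spanish_translation):
    """Pass only when both translation columns hold a non-empty value."""
    def has_value(v):
        return v is not None and str(v).strip() != ""
    return has_value(french_translation) and has_value(spanish_translation)

# Applying the rule to a bound dataset yields the passed/failed counts
# that Rule Results displays. Records below are made up for illustration.
records = [
    ("France", "Francia"),
    (None, "Alemania"),      # fails: missing French translation
    ("Royaume-Uni", ""),     # fails: empty Spanish translation
]
failed = sum(1 for fr, es in records if not french_spanish_rule(fr, es))
print(failed)  # 2
```

Binding then maps $French_translation and $Spanish_translation to the FrenchCountryRegionName and SpanishCountryRegionName columns, so each table row supplies the two arguments.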
Creating a scorecard
Scorecards are a convenient way to visualize and present historical information about validation rule results.
1. A scorecard can be created on the Scorecard Setup tab. This is a very straightforward process where you first specify Key Data Domain and Quality Dimension, then the rule you want to include in the scorecard output, and, finally, perform rule binding to link the rule to the actual dataset, as shown in the following screenshot:
2. To view the scorecard results, go to Workspace Home and select the Scorecard view mode instead of Workspace in the combo box located in the top-right corner, as shown in the following screenshot:
How it works…
Now that we have created our connection object, gathered the profiling data, applied the validation rule, and even created a scorecard to see its results, let's look at the various aspects of the steps performed in more detail.
Profiling
As you can see, working in Information Steward is a very intuitive process.
As mentioned earlier, the Data Insight section of Information Steward is all about understanding your data, which is possible with the profiling capabilities of IS. In the majority of cases, profiling your data is the first step before starting any data quality related work. In the following section, we will review the types of profiling data available in the Profile Results section.
The value section of profiling data shows the actual border and median values from the dataset for a specific field. String Length profiling values provide information about the size of the values. The completeness section helps you to see any gaps in the data. Distribution can be extremely useful for understanding the cardinality of specific fields in your dataset. For example, seeing the number 7 in the Distribution | Value field of the profiling result data against the CountryRegionCode field, we know that we have only seven different values in that field. Clicking on that number shows us those values and their distribution in the right-hand side panel.
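The Distribution figure described above is essentially a value-frequency count per column. A small Python sketch of what that profiling step computes, with invented sample values standing in for a profiled column:

```python
# Distribution profiling: count distinct values and how often each occurs.
from collections import Counter

def value_distribution(values):
    """Return (distinct_count, frequency_map) for a profiled column."""
    counts = Counter(values)
    return len(counts), dict(counts)

# Hypothetical column contents, not the actual DimGeography data.
country_region_codes = ["US", "US", "FR", "DE", "FR", "US", "GB"]
distinct, freq = value_distribution(country_region_codes)
print(distinct)    # 4 distinct values
print(freq["US"])  # "US" appears 3 times
```

The distinct count is what appears in the Distribution | Value field, and the frequency map is what the right-hand side panel lists when you click on that number.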
Rules
Rules allow you to analyze the data according to custom conditions. Rules are created with general rule parameters so that you can apply the same rule to different datasets, if necessary. Linking a rule to a specific dataset is called binding. It is the process of linking rule parameters to actual table fields.
Rules are usually defined by business users to understand how data complies with specific business requirements.
Information Steward offers a Data Validation Advisor feature that proposes preconfigured rules depending on the profiling results of your data.
Scorecards
Scorecards allow you to group your rules and help you to see trends in data scores calculated by specific rules.
There is more…
There is much more to the Data Insight functionality than presented in this recipe. We have just scratched the surface of the basic functions available in this area of Information Steward.
It is possible to specify file formats directly in Information Steward in order to source data from flat files and from Excel spreadsheets.
Another great thing about Information Steward Data Insight is that it allows you to build data views that are based on multiple sources of information.
The intuitive and well-documented interface allows you to easily experiment and play with your data on your own. This is always a fascinating process that does not require any deep technical knowledge of the underlying product.
Performing Metadata Management tasks
The second tool available in Information Steward after Data Insight is Metadata Management.
The Metadata Management tool is used to collect metadata information from various systems in order to get a comprehensive view of it and to analyze the relationships between metadata objects.
In this recipe, we will take a look at an example of using Metadata Management on our Data Services repository, which stores the ETL code developed for the recipes of this book.
Getting ready
As with Data Insight, we first have to establish connectivity to the Data Services repository. This is usually an administration task that can be done in CMC in the Information Steward | Metadata Management section. Click on Create an integrator source and fill in all the required fields, as shown in the following screenshot, to define a connection to the Data Services repository for the Metadata Management tool:
After creating an integrator source object, you have to run it by using the Run Now option in the object's context menu. That operation collects the metadata, that is, information about all the objects in the Data Services repository. Remember that any changes made to the repository after this operation will not be propagated to the collected Metadata Management snapshot, so you need to either run it manually or schedule it to run regularly according to your requirements.
The following screenshot shows you how to use the Run Now option:
The Last Run column shows you when the integrator source data was last updated.
To see the history of runs, just select History from the integrator source object's context menu, as shown in the following screenshot:
This screen can show you how long it took to collect metadata information from the repository and even provides access to the database log of the metadata collection process, which can be used for troubleshooting any potential problems.
How to do it…
Now that we have defined the connection to our Data Services repository and collected the metadata snapshot using this connection in CMC, we can launch the Information Steward application to use the Metadata Management functionality.
1. Log in to Information Steward and go to the Metadata Management section, as shown in the following screenshot:
2. Click on the Data_Services_Repository source in the Data Integration category and, on the opened screen, look for the DimGeography table using the Search field. The Search Results section at the very bottom shows you all the possible matches, so all you have to do is select the object you need, the table from the STAGE database under the Transform schema:
3. To see the impact the table has on other objects in the ETL repository, click on the View Impact button. You should see something like the following screenshot:
You can see that the DimGeography table is used as a source to populate the other DimGeography tables (from the AdventureWorks_DWH and DWH_backup databases).
4. Click on the LINEAGE section in the same window to see the source objects for the DimGeography table of the STAGE database Transform schema, as shown in the following screenshot:
You can see that the data came from three tables: ADDRESS, COUNTRYREGION, and STATEPROVINCE.
5. By switching to Columns Mapping View, you can see the lineage information at the column level, as shown in the following screenshot:
6. Close this window to go back to the main Metadata Management working area. Now, let's define a relationship between two tables from the Data Services repository that are not directly related to each other in the ETL code: STAGE.Transform.DIMGEOGRAPHY and STAGE.Transform.DIMSALESTERRITORY. To do that, you have to select each table in the Search Results section at the bottom and click on the Add to Object Tray button.
7. When both tables are added to the Object Tray, click on the Object Tray (2) link at the top of the screen (to the right of the Search field).
8. In the opened window, select both objects, as shown in the following screenshot:
9. Click on Establish Relationship and configure the desired relationship between these two objects, as shown in the following screenshot:
10. Now, if you click on the View Related To button, you can see that the relationship information appears on the screen, as shown in the following screenshot:
11. To export the information from this screen into an Excel spreadsheet, click on the Export the tabular view to an Excel file button in the top-right corner.
12. Choose the Open with Microsoft Excel option, as shown in the following screenshot:
13. The generated Excel spreadsheet can be sent to other business users, used in further analysis, or simply used as a piece of documentation for ETL metadata.
How it works…
Metadata Management can link information provided by multiple sources in order to perform lineage and impact analysis on objects. In our example, we used only the Data Services repository, but multiple sources, such as Business Intelligence metadata, are often imported along with source database objects and the Data Services metadata. That allows you to see the full picture of what is happening to a specific dataset: its extraction from the database, which ETL transformations are applied to it, which target table the transformed data is loaded into, and, finally, which BI universes and BI reports use it.
On top of that, you can create custom user-defined relationships between objects that are not related to each other either directly or indirectly.
Working with the Metapedia functionality
Think of Metapedia as Wikipedia for your data. Metapedia is used to build a hierarchy of business terms and descriptions for your data, group them into categories, and even associate actual technical objects, such as pieces of ETL code and database tables, with these terms.
In this recipe, we will create a small glossary of business terms in Information Steward and learn how it can be distributed outside of the system to be updated by business users and imported back into Information Steward.
How to do it…
1. Log in to Information Steward and go to the Metapedia section.
2. Click on the New Category button to create a new category, Geography, as shown in the following screenshot:
Specify the keywords to be associated with the category for an easy search and click on the Save button to create the category.
3. Choose All Terms and click on the New button to create a new term, Postcode, as shown in the following screenshot:
Click on Save to create it and close the window.
4. Now, select the created term in the list of terms and click on Category Actions | Add to Category.
5. On the opened category list screen, select the Geography category and click on OK, as shown in the following screenshot:
6. Click on the Export Metapedia to MS Excel file button and select the All Terms option.
7. In the prompt window, select Export term description in plain text format.
8. Save the file on the disk. Now, let's perform some modifications to the file as if we were business users who have been told to create a glossary of terms and categories using this Excel spreadsheet.
9. Add the new terms on the Business Terms tab of the spreadsheet, as shown in the following screenshot:
10. Add the new categories on the Business Categories tab of the spreadsheet, as shown in the following screenshot:
11. Go back to Information Steward | Metapedia and click on Import Metapedia from MS Excel file. Specify the file modified in the previous steps, as shown in the following screenshot:
Note that importing information from this spreadsheet will automatically approve all terms and will change their statuses from Editing to Approved.
12. To associate a term with actual technical objects, double-click on the specific term and click on the Actions | Associate with objects button on the term editor screen. Select the objects you want to associate with the term one by one by clicking on the Associate with term button. Click on Done after you have finished.
13. We have associated two objects, the table CITY and the parameter $p_City, from our Data Services repository with the term City, as shown in the following screenshot:
How it works…
The main function of Metapedia is to provide a glossary to browse and understand the data, presented and categorized in clear business terms. In other words, the purpose of Metapedia is to provide a clear translation of technical terms into terms that can be understood by the business.
It is a simple but very efficient solution, and in this recipe, we demonstrated how a simple glossary can be created in Information Steward Metapedia, exported into a spreadsheet for distribution, and imported back with updated information.
This is very useful if you need to gather this kind of information from users who do not have knowledge of or access to Information Steward to create terms and categories directly in the system.
Creating a custom cleansing package with Cleansing Package Builder
In Chapter 7, Validating and Cleansing Data (see the recipe Data Quality transforms – cleansing your data), we already used the default cleansing package PERSON_FIRM available in Data Services for data cleansing tasks.
In this recipe, we will create a new cleansing package from scratch with the help of Information Steward and publish it so that it can be used in Data Services transforms.
Our new custom cleansing package will be used to determine the type of street used in the address field of the Address table from the OLTP database.
Getting ready
The Information Steward Cleansing Package Builder tool requires a sample flat file with data that is used to define cleansing rules. The following steps describe how to prepare such a flat file with sample data.
As we are going to use our custom cleansing package to cleanse the OLTP.Address table data, we will generate our sample dataset from the same table.
1. Launch Data Services Designer and log in to the local repository.
2. Go to Local Object Library | Formats | Flat Files.
3. Right-click on the Flat Files section and create a new flat file format, PB_sample, as shown in the following screenshot:
4. Create a new job and a new dataflow. Inside the dataflow, put the OLTP.ADDRESS table as a source table.
5. Link the source table to a Query transform and propagate only the ADDRESSLINE1 column to the output schema.
6. Link the output of the Query transform object to the target file based on the PB_sample file format created earlier.
7. Save and run the job. The PB_sample.txt file should appear in the C:\AW\Files\ folder.
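The job above simply projects one column of the source table into a flat file. For illustration only, the same kind of sample file could be produced outside Data Services with a short Python script; the rows below are an in-memory stand-in for OLTP.ADDRESS, and the column name is the one used in the recipe:

```python
import csv
import io

def write_sample_file(rows, column, out):
    """Write a single column of the source rows, one value per line,
    mimicking the PB_sample flat file produced by the dataflow."""
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[column]])

# Hypothetical stand-in for OLTP.ADDRESS; real code would query the table.
address_rows = [
    {"ADDRESSLINE1": "1970 Napa Court"},
    {"ADDRESSLINE1": "9833 Mt. Dias Blvd."},
]
buffer = io.StringIO()
write_sample_file(address_rows, "ADDRESSLINE1", buffer)
print(buffer.getvalue(), end="")
```

Writing to a real file instead of `io.StringIO` would give you the equivalent of C:\AW\Files\PB_sample.txt, ready to feed to Cleansing Package Builder.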
How to do it…
Now that we have created a sample file, we can finally start the Information Steward application and use Cleansing Package Builder to create our new custom cleansing package.
1. Launch the Information Steward application and go to the Cleansing Package Builder area.
2. Click on New Cleansing Package | Custom Cleansing Package and specify the package name and sample data file in the first step of the package builder:
3. Step 2 of the package builder contains information which helps to parse the sample data correctly:
4. At step 3 of the package builder, you should define the number of records taken from the sample file to be used in the package design process. The maximum number of rows is 3,000. Specify the random mechanism of obtaining rows from the sample file, and set the number of rows to 3,000.
5. Step 4 defines the parsing strategy:
6. At step 5, you can choose a category name and assign suggested attributes to it if you want to. In our example, none of the suggested attributes matches our category STREET_TYPE, so we do not tick any of them:
7. At step 6, we create attributes for our STREET_CATEGORY category and categorize the values found in the sample file against the attributes. The Standard Forms column defines the standardized form of the parsed value, and the Variations column defines which variations will be standardized to the value specified in the Standard Forms window. See the following example of the configuration for the DRIVE_ATTR attribute:
8. Another example is the STREET_ATTR attribute:
You can see how we have assigned to the STREET standard form values that are visually and syntactically very different, such as Strase and Rue.
9. After step 6, you might think you have created your package and that the job is done. This is almost true. We have just passed through the basic cleansing package builder wizard steps in order to create the canvas for our new package. The real work starts when you double-click on the package in the Cleansing Package Builder area and the package editor opens. It has two main editing modes: Design and Advanced. We are not going to work with the Advanced design mode, as it would take another book to cover all the aspects of fine-tuning your cleansing package in this mode.
10. In the meantime, you have probably noticed that our custom package was created with the lock icon:
11. Information Steward needs some time to finish its background processes of package creation, so you have to wait for a couple of minutes until the icon changes to a different one:
12. Now the package is ready to be published. Select the package on the left and click on the Publish button in the toolbar menu. The clock icon on the package in the right-side panel means that Information Steward is still performing background operations in order to publish the package and make it available for use in Data Services:
13. When the package publication is finished, the icon changes again:
14. You can continue fine-tuning your package by entering the package Design mode. This mode shows you the result of your actions immediately in the table at the bottom:
How it works…
Let's see how the cleansing package we created can actually be used in Data Services to perform data cleansing tasks.
1. Start Data Services Designer.
2. Create a new job and a new dataflow.
3. Import the OLTP.ADDRESS table as a source table object.
4. Link the source table to the Query transform and propagate only the ADDRESSLINE1 column to the output schema, as we are going to perform cleansing only on this column.
5. Link the Query transform object to the Data_Cleanse transform, which can be found in Local Object Library | Transforms | Data Quality | Data_Cleanse.
6. Open the imported Data_Cleanse object for editing in the main workspace window and go to the first tab, Input.
7. Map the input ADDRESSLINE1 field to the MULTILINE1 transform input field name:
8. Go to the second tab, Options, and configure the following options, specifying our newly created Address_Custom as the cleansing package:
9. Finally, open the third tab, Output, and define the following output columns that will be produced by the Data_Cleanse transform:
10. Close the Data_Cleanse transform object and link it to the newly imported template table, ADDRESS_CLEANSE_STREET_TYPE, created in the DS_STAGE datastore.
11. Your dataflow should look like the one in the following figure:
After you have saved and run the job, you can see that the cleansing package "categorized" and populated columns have been created for each attribute of STREET_CATEGORY:
How well a cleansing package does its job solely depends on your ability to define rules and configure it to accommodate all possible scenarios that can be seen in your data.
For example, "Circle" has not been categorized, as we simply did not define any rule regarding the "Circle" value.
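What the published package does at runtime can be pictured as a variation-to-standard-form lookup per attribute. The toy Python sketch below models that idea; the variation sets are assumptions loosely mirroring the DRIVE_ATTR and STREET_ATTR examples above, and, just as in the recipe, a value like "Circle" with no rule passes through uncategorized:

```python
# Toy model of the STREET_CATEGORY cleansing package: each attribute maps
# known variations to a standard form; unmatched values stay uncategorized.
STREET_CATEGORY = {
    "DRIVE_ATTR":  {"variations": {"dr", "dr.", "drive"},         "standard": "DRIVE"},
    "STREET_ATTR": {"variations": {"st", "st.", "strase", "rue"}, "standard": "STREET"},
}

def cleanse(value):
    """Return (attribute, standard_form), or (None, None) if no rule matches."""
    token = value.strip().lower()
    for attr, rule in STREET_CATEGORY.items():
        if token in rule["variations"]:
            return attr, rule["standard"]
    return None, None

print(cleanse("Rue"))     # ('STREET_ATTR', 'STREET')
print(cleanse("Dr."))     # ('DRIVE_ATTR', 'DRIVE')
print(cleanse("Circle"))  # (None, None) - no rule defined
```

Adding "circle" as a variation of a new attribute is the direct analogue of extending the package in Design mode.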
This is one of the simplest cases of a cleansing task, but it should give you an idea of Information Steward's capabilities in this area.
There is more…
Open a cleansing package by double-clicking on it and going to the Advanced mode to see how many options exist for creating and tuning cleansing rules and algorithms. You can define new rules and change the already created ones to make your cleansing process behave differently. The complexity of a cleansing package is restricted only by your imagination and the complexity of the cleansing process requirements it must accommodate.
IndexA
AccessServerconfiguring/ConfiguringAccessServer,Howtodoit…
administrativetasksRepositoryManager,using/Howtodoit…ServerManager,using/Howtodoit…CMC,usedforregisteringnewrepository/Howtodoit…LicenseManager,using/Howtodoit…
aggregatefunctionsusing/Usingaggregatefunctions,Howtodoit…,Howitworks…
auditreportingabout/BuildinganexternalETLauditandauditreporting,Howtodoit…,Howitworks…
Autocorrectloadoptionabout/ExploringtheAutocorrectloadoption,Howtodoit…,Howitworks…
B
blobdatatype/Howitworks…bulk-load
about/Optimizingdataflowloaders–bulk-loadingmethodsbulk-loadingmethods
about/Optimizingdataflowloaders–bulk-loadingmethods,Howtodoit…,Howitworks…enabling/Whentoenablebulkloading?
bypassingfeatureusing/Usingthebypassingfeature,Howtodoit…,Howitworks…
C
Casetransformused,forsplittingdataflow/SplittingtheflowofdatawiththeCasetransform,Howtodoit…,Howitworks…
CentralConfigurationManagementabout/Howtodoit…
CentralManagementConsoleabout/Introduction,Howtodoit…
CentralManagementConsole(CMC)/GettingreadyCentralObjectLibrary
objects,assigningtoandfrom/AddingobjectstoandfromtheCentralObjectLibrary
centralrepositoryETLcode,migratingthrough/MigratingETLcodethroughthecentralrepository,Gettingreadyobjects,comparingbetweenLocalandCentral/ComparingobjectsbetweentheLocalandCentralrepositories
ChangeDataCapture(CDC)about/ChangeDataCapturetechniquesNohistorySCD(Type1)/NohistorySCD(Type1)limitedhistorySCD(Type3)/LimitedhistorySCD(Type3)unlimitedhistorySCD(Type2)/UnlimitedhistorySCD(Type2)process,building/Howtodoit…,Howitworks…source-basedETLCDC/Source-basedETLCDCtarget-basedETLCDC/Target-basedETLCDCnative/NativeCDC
CleansingPackageBuilderused,forcreatingcustomcleansingpackage/CreatingacustomcleansingpackagewithCleansingPackageBuilder,Gettingready,Howtodoit…,Howitworks…
clienttoolsabout/IntroductionDesignertool/IntroductionRepositoryManager/Introduction
CodePlexURL/Howtodoit…
commandline(cmd)about/Howtodoit…
conditionalandwhileloopobjectsused,forcontrollingexecutionorder/Usingconditionalandwhileloopobjectstocontroltheexecutionorder,Gettingready,Howtodoit…,Thereismore…
connectionobject,DataInsightcreating/Creatingaconnectionobject
continuousworkflow
using/Usingacontinuousworkflow,Howtodoit…,Howitworks…,Thereismore…
conversionfunctionsusing/Usingconversionfunctions,Howtodoit…,Howitworks…
customfunctionscreating/Creatingcustomfunctions,Howtodoit…,Howitworks…
D
dataloading,intoflatfile/Loadingdataintoaflatfile,Howtodoit…,Howitworks…,There’smore…loading,fromflatfile/Loadingdatafromaflatfile,Howtodoit…,Howitworks…,There’smore…loading,fromtabletotable/Loadingdatafromtabletotable–lookupsandjoins,Howtodoit…,Howitworks…flowsplitting,Casetransformused/SplittingtheflowofdatawiththeCasetransform,Howtodoit…,Howitworks…flowexecution,monitoring/Monitoringandanalyzingdataflowexecution,Gettingready,Howtodoit…,Howitworks…flowexecution,analyzing/Monitoringandanalyzingdataflowexecution,Gettingready,Howtodoit…,Howitworks…cleansing/DataQualitytransforms–cleansingyourdata,Howtodoit…,Howitworks…,There’smore…transforming,Pivottransformused/TransformingdatawiththePivottransform,Gettingready,Howtodoit…,Howitworks…loading,intoSAPERP/LoadingdataintoSAPERP,Gettingready,Howtodoit…,Howitworks…
databaseenvironmentpreparing/Preparingadatabaseenvironment,Howitworks…
databasefunctionsusing/Usingdatabasefunctionskey_generation()function/key_generation()total_rows()function/total_rows()sql()function/sql(),Howitworks…
dataflowauditenabling/Enablingdataflowaudit,Howtodoit…,Howitworks…,There’smore…
dataflowexecution,optimizingSQLtransform/Optimizingdataflowexecution–theSQLtransform,Howtodoit…,Howitworks…Data_Transfertransform/Optimizingdataflowexecution–theData_Transfertransform,Howtodoit…,Howitworks…Data_Transfertransform,usage/WhentouseData_Transfertransform,There’smore…performanceoptions/Optimizingdataflowexecution–performanceoptions,Howtodoit…dataflowperformanceoptions/Dataflowperformanceoptionssourcetableperformanceoptions/Sourcetableperformanceoptionsquerytransformperformanceoptions/Querytransformperformanceoptionslookup_ext()performanceoptions/lookup_ext()performanceoptionstargettableperformanceoptions/Targettableperformanceoptions
dataflowflow,optimizingpush-downtechniques/Optimizingdataflowexecution–push-downtechniques,Howtodoit…,Howitworks…
dataflowloaders,optimizingbulk-loadingmethods/Optimizingdataflowloaders–bulk-loadingmethods,Howtodoit…,Howitworks…bulkloading,enabling/Whentoenablebulkloading?
dataflowperformanceoptionsabout/Dataflowperformanceoptions
dataflowreaders,optimizinglookupmethods/Optimizingdataflowreaders–lookupmethodsQuerytransformjoin,lookupwith/LookupwiththeQuerytransformjoinlookup_ext()function,lookupwith/Lookupwiththelookup_ext()functionsql()function,lookupwith/Lookupwiththesql()functionQuerytransformjoin,advantages/Querytransformjoinslookup_ext()function/lookup_ext()sql()function/sql()performancereview/Performancereview
DataInsightcapabilities,exploring/ExploringDataInsightcapabilities,Gettingready,Howtodoit…connectionobject,creating/Creatingaconnectionobjectdata,profiling/Profilingthedataprofilingresults,viewing/Viewingprofilingresultsvalidationrule,creating/Creatingavalidationrulescorecard,creating/Creatingascorecard,Howitworks…profiling/Profilingrules/Rulesscorecards/Scorecards
DataModificationLanguage(DML)operation/UsingtheMap_OperationtransformDataQualitytransforms
about/DataQualitytransforms–cleansingyourdata,Howtodoit…,Howitworks…
DataServicesclienttools/Introductionserver-basedcomponents/Introductioninstalling/InstallingandconfiguringDataServices,Howtodoit…,Howitworks…configuring/InstallingandconfiguringDataServices,Howtodoit…,Howitworks…referenceguide,URL/Howtodoit…autodocumentation/AutoDocumentationinDataServices,Howtodoit…,Howitworks…automaticjobrecovery/AutomaticjobrecoveryinDataServices,Gettingready,Howtodoit…,Howitworks…,There’smore…
DataServicesobjects
andparent-childrelationships/Peekinginsidetherepository–parent-childrelationshipsbetweenDataServicesobjects,Howitworks…objecttypeslist,gettinginDataServicesrepository/GetalistofobjecttypesandtheircodesintheDataServicesrepositoryDF_Transform_DimGeographydataflowinformation,displaying/DisplayinformationabouttheDF_Transform_DimGeographydataflowSalesTerritorytableobjectinformation,displaying/DisplayinformationabouttheSalesTerritorytableobjectscriptobjectcontent,displaying/Seethecontentsofthescriptobject
DataServicesrepositorycreating/CreatingIPSandDataServicesrepositories,Howtodoit…,Howitworks…database,creating/Howtodoit…ODBClayer,configuring/Howtodoit…
datavalidationvalidationfunctions,creating/Creatingvalidationfunctions,Howtodoit…,Howitworks…results,reporting/Reportingdatavalidationresults,Howtodoit…,Howitworks…regularexpressionsupportused/Usingregularexpressionsupporttovalidatedata,Gettingready,Howtodoit…,Howitworks…
Data_Transfertransformabout/Optimizingdataflowexecution–theData_Transfertransform,Howtodoit…,Howitworks…usage/WhentouseData_Transfertransform,There’smore…
datefunctionsusing/Usingdatefunctionscurrentdateandtime,generating/Generatingcurrentdateandtimeparts,extractingfromdates/Extractingpartsfromdates,Howitworks…,There’smore…
Designertoolabout/UnderstandingtheDesignertoolsetting/Howtodoit…defaultoptions,setting/Howtodoit…ETLcode,executing/ExecutingETLcodeinDataServicesETLcode,validating/ValidatingETLcodetemplatetables,using/Templatetablesquerytransform/QuerytransformbasicsHelloWorldexample/TheHelloWorldexample
Dropandre-createtableoption/There’smore…DSManagementConsole
about/Introduction
E
ETLorganizing/Projectsandjobs–organizingETL,Howtodoit…,Howitworks…projects/Projectsandjobs–organizingETL,Howtodoit…,Howitworks…hierarchicalobjectview/Hierarchicalobjectviewhistoryexecutionlogfiles/Historyexecutionlogfilesjobs,schedulingfromManagementconsole/Executing/schedulingjobsfromtheManagementConsolejobs,executingfromManagementconsole/Executing/schedulingjobsfromtheManagementConsole
ETLauditexternalETLaudit,building/BuildinganexternalETLauditandauditreporting,Howtodoit…,Howitworks…built-in,using/Usingbuilt-inDataServicesETLauditandreportingfunctionality,Howtodoit…,Howitworks…
ETLcodemigrating,throughcentralrepository/MigratingETLcodethroughthecentralrepository,Gettingready,Howtodoit…migrating,withexport/import/MigratingETLcodewithexport/import,Howtodoit…
ETLexecutionsimplifying,withsystemconfigurations/SimplifyingETLexecutionwithsystemconfigurations,Gettingready,Howtodoit…,Howitworks…
ETLjobdimensiontables,populating/Usecaseexample–populatingdimensiontables,Howtodoit…building/Usecaseexample–populatingdimensiontables,Howtodoit…mapping,defining/Mappingdependencies,defining/Dependenciesdevelopment/Developmentexecutionorder/Executionordertesting/TestingETLtestdata,preparingtopopulateDimSalesTerritory/PreparingtestdatatopopulateDimSalesTerritorytestdata,preparingtopopulateDimGeography/PreparingtestdatatopopulateDimGeography
executionordercontrolling,bynestingworkflows/Nestingworkflowstocontroltheexecutionorder,Howtodoit,Howitworks…controlling,conditionalandwhileloopsobjectsused/Usingconditionalandwhileloopobjectstocontroltheexecutionorder,Gettingready,Howtodoit…,Howitworks…,Thereismore…
export/import
ETLcode,migratingwith/MigratingETLcodewithexport/import,GettingreadyATLfilesused/Import/ExportusingATLfilestolocalrepository/Directexporttoanotherlocalrepository,Howitworks…
Extract-Transform-Load(ETL)about/Introductionadvantages/Introduction
F
failurescontrolling/Controllingfailures–try-catchobjects,Howtodoit…,Howitworks…
flatfiledata,loadingin/Loadingdataintoaflatfile,Howtodoit…,Howitworks…,There’smore…data,loadingfrom/Loadingdatafromaflatfile,Howtodoit…,Howitworks…
fullpushdown/Gettingready
H
Hierarchy_Flatteningtransformabout/TheHierarchy_Flatteningtransform,Gettingreadyhorizontalhierarchyflattening,performing/Horizontalhierarchyflatteningverticalhierarchyflattening/Verticalhierarchyflattening,Howitworks…resulttables,querying/Queryingresulttables
horizontalhierarchyflatteningabout/Horizontalhierarchyflattening
I
IDocabout/IDocload,monitoringonSAPside/MonitoringIDocloadontheSAPsideloadeddata,post-loadvalidation/Post-loadvalidationofloadeddata
InformationPlatformServices(IPS)configuring/InstallingandconfiguringInformationPlatformServices,Howtodoit…,Howitworks…installing/InstallingandconfiguringInformationPlatformServices,Howtodoit…,Howitworks…
/GettingreadyIPSrepository
creating/CreatingIPSandDataServicesrepositories,Howtodoit…,Howitworks…
J
jobexecutiondebugging/Debuggingjobexecution,Howtodoit…,Howitworks…monitoring/Monitoringjobexecution,Howtodoit…
jobrecovery,automaticinDataServices/AutomaticjobrecoveryinDataServices,Howtodoit…,Howitworks…,There’smore…
joinoperations*-cross-joinoperation/Howitworks…||-parallel-joinoperation/Howitworks…INNERJOIN/Howitworks…LEFTOUTERJOIN/Howitworks…
K
key_generation()function/key_generation()
L
longdatatype/Howitworks…lookupmethods
withQuerytransformjoin/LookupwiththeQuerytransformjoinwithlookup_ext()function/Lookupwiththelookup_ext()functionwithsql()function/Lookupwiththesql()function
lookup_ext()functionlookupwith/Lookupwiththelookup_ext()functionadvantages/lookup_ext()
lookup_ext()performanceoptionsabout/lookup_ext()performanceoptions
M
Map_Operationtransformusing/UsingtheMap_Operationtransform,Howtodoit…,Howitworks…
mathfunctionsusing/Usingmathfunctions,Howtodoit…,There’smore…
MetadataManagementtasksperforming/PerformingMetadataManagementtasks,Gettingready,Howtodoit…,Howitworks…
Metapediaworkingwith/WorkingwiththeMetapediafunctionality,Howtodoit…,Howitworks…
MicrosoftSQLServer2012URL/Howtodoit…
miscellaneousfunctionsusing/Usingmiscellaneousfunctions,Howitworks…
N
nestedstructuresworkingwith/Workingwithnestedstructures,Howtodoit…,Howitworks…,Thereismore…
O
objectreplicationusing/Usingobjectreplication,Howitworks…
OLTPdatastore/Howtodoit…
P
parameterscreating/Creatingvariablesandparameters,Howtodoit…,Howitworks…
parent-childrelationshipsbetweenDataServicesobjects/Peekinginsidetherepository–parent-childrelationshipsbetweenDataServicesobjects,Gettingready
partialpushdown/Gettingreadyperformanceoptions
about/Optimizingdataflowexecution–performanceoptions,Howtodoit…Pivottransform
used,fortransformingdata/TransformingdatawiththePivottransform,Gettingready,Howtodoit…,Howitworks…
profilingdata/Profilingprofilingresults,DataInsight
viewing/Viewingprofilingresultspush-downoperations
about/Optimizingdataflowexecution–push-downtechniques,Howitworks…partialpushdown/Gettingreadyfullpushdown/Gettingready
Q
Querytransformjoinlookupwith/LookupwiththeQuerytransformjoin
querytransformjoinsadvantages/Querytransformjoins
Querytransformperformanceoptionsabout/Querytransformperformanceoptions
R
real-time jobs
  creating / Creating real-time jobs
  SoapUI, installing / Installing SoapUI, How to do it…, How it works…
regular expression support
  used, for validating data / Using regular expression support to validate data, Getting ready, How to do it…, How it works…
replication process
  about / How it works…
rules / Rules
S
SAP ERP
  data, loading into / Loading data into SAP ERP, Getting ready, How to do it…, How it works…
  URL / Loading data into SAP ERP
SAP Information Steward
  about / Introduction
  URL / Introduction
scorecard, Data Insight
  creating / Creating a scorecard, How it works…
scorecards / Scorecards
script
  creating / Creating a script, How to do it…, How it works…
  string functions, using / Using string functions in the script, How it works…
server-based components
  IPS Services / Introduction
  Job Server / Introduction
  access server / Introduction
  web application server / Introduction
services
  starting / Starting and stopping services, How to do it…, See also
  stopping / Starting and stopping services, How to do it…, See also
  web application server / How to do it…
  Data Services Job Server / How to do it…
  Information Platform Services / How to do it…
Slowly Changing Dimensions (SCD)
  about / Getting ready
SoapUI
  installing / Installing SoapUI, How to do it…, How it works…
  URL / Installing SoapUI
source data object
  creating / Creating a source data object, How to do it…, How it works…
source system database
  creating / Creating a source system database, There's more…
source table performance options
  about / Source table performance options
sql() function
  about / sql(), How it works…
  lookup with / Lookup with the sql() function
SQL transform
  about / Optimizing dataflow execution – the SQL transform, How to do it…, How it works…
staging area structures
  defining / Defining and creating staging area structures
  creating / Defining and creating staging area structures
  flat files / Flat files
  RDBMS tables / RDBMS tables, How it works…
string functions
  using / Using string functions, How to do it…
  using, in script / Using string functions in the script, How it works…
system configurations
  used, for simplifying ETL execution / Simplifying ETL execution with system configurations, How to do it…, How it works…
T
Table_Comparison transform
  using / Using the Table_Comparison transform, Getting ready, How to do it…, How it works…
target data object
  creating / Creating a target data object, How to do it…, There's more…
target data warehouse
  creating / Creating a target data warehouse, How it works…, There's more…
target table performance options
  about / Target table performance options
tasks
  administering / Administering tasks, How to do it…, See also
total_rows() function / total_rows()
try-catch objects
  about / Controlling failures – try-catch objects, How to do it…, How it works…
U
user access
  configuring / Configuring user access, How to do it…, How it works…
V
validation functions
  creating / Creating validation functions, How to do it…, How it works…
  using, with Validation transform / Using validation functions with the Validation transform, How to do it…, How it works…
validation rule, Data Insight
  creating / Creating a validation rule
Validation transform
  validation functions, using with / Using validation functions with the Validation transform, How to do it…, How it works…
variables
  creating / Creating variables and parameters, How to do it…, How it works…
vertical hierarchy flattening
  about / Vertical hierarchy flattening, How it works…
W
workflow object
  creating / Creating a workflow object, How to do it…, How it works…
workflows
  nesting, to control execution order / Nesting workflows to control the execution order, How to do it, How it works…
X
XML_Map transform
  about / The XML_Map transform, How to do it…, How it works…
Table of Contents
SAP Data Services 4.x Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Instant updates on new Packt books
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction to ETL Development
Introduction
Preparing a database environment
Getting ready
How to do it…
How it works…
Creating a source system database
How to do it…
How it works…
There's more…
Defining and creating staging area structures
How to do it…
Flat files
RDBMS tables
How it works…
Creating a target data warehouse
Getting ready
How to do it…
How it works…
There's more…
2. Configuring the Data Services Environment
Introduction
Creating IPS and Data Services repositories
Getting ready…
How to do it…
How it works…
See also
Installing and configuring Information Platform Services
Getting ready…
How to do it…
How it works…
Installing and configuring Data Services
Getting ready…
How to do it…
How it works…
Configuring user access
Getting ready…
How to do it…
How it works…
Starting and stopping services
How to do it…
How it works…
See also
Administering tasks
How to do it…
How it works…
See also
Understanding the Designer tool
Getting ready…
How to do it…
How it works…
Executing ETL code in Data Services
Validating ETL code
Template tables
Query transform basics
The Hello World example
3. Data Services Basics – Data Types, Scripting Language, and Functions
Introduction
Creating variables and parameters
Getting ready
How to do it…
How it works…
There's more…
Creating a script
How to do it…
How it works…
Using string functions
How to do it…
Using string functions in the script
How it works…
There's more…
Using date functions
How to do it…
Generating current date and time
Extracting parts from dates
How it works…
There's more…
Using conversion functions
How to do it…
How it works…
There's more…
Using database functions
How to do it…
key_generation()
total_rows()
sql()
How it works…
Using aggregate functions
How to do it…
How it works…
Using math functions
How to do it…
How it works…
There's more…
Using miscellaneous functions
How to do it…
How it works…
Creating custom functions
How to do it…
How it works…
There's more…
4. Dataflow – Extract, Transform, and Load
Introduction
Creating a source data object
How to do it…
How it works…
There's more…
Creating a target data object
Getting ready
How to do it…
How it works…
There's more…
Loading data into a flat file
How to do it…
How it works…
There's more…
Loading data from a flat file
How to do it…
How it works…
There's more…
Loading data from table to table – lookups and joins
How to do it…
How it works…
Using the Map_Operation transform
How to do it…
How it works…
Using the Table_Comparison transform
Getting ready
How to do it…
How it works…
Exploring the Auto correct load option
Getting ready
How to do it…
How it works…
Splitting the flow of data with the Case transform
Getting ready
How to do it…
How it works…
Monitoring and analyzing dataflow execution
Getting ready
How to do it…
How it works…
There's more…
5. Workflow – Controlling Execution Order
Introduction
Creating a workflow object
How to do it…
How it works…
Nesting workflows to control the execution order
Getting ready
How to do it
How it works…
Using conditional and while loop objects to control the execution order
Getting ready
How to do it…
How it works…
There is more…
Using the bypassing feature
Getting ready…
How to do it…
How it works…
There is more…
Controlling failures – try-catch objects
How to do it…
How it works…
Use case example – populating dimension tables
Getting ready
How to do it…
How it works…
Mapping
Dependencies
Development
Execution order
Testing ETL
Preparing test data to populate DimSalesTerritory
Preparing test data to populate DimGeography
Using a continuous workflow
How to do it…
How it works…
There is more…
Peeking inside the repository – parent-child relationships between Data Services objects
Getting ready
How to do it…
How it works…
Get a list of object types and their codes in the Data Services repository
Display information about the DF_Transform_DimGeography dataflow
Display information about the SalesTerritory table object
See the contents of the script object
6. Job – Building the ETL Architecture
Introduction
Projects and jobs – organizing ETL
Getting ready
How to do it…
How it works…
Hierarchical object view
History execution log files
Executing/scheduling jobs from the Management Console
Using object replication
How to do it…
How it works…
Migrating ETL code through the central repository
Getting ready
How to do it…
How it works…
Adding objects to and from the Central Object Library
Comparing objects between the Local and Central repositories
There is more…
Migrating ETL code with export/import
Getting ready
How to do it…
Import/Export using ATL files
Direct export to another local repository
How it works…
Debugging job execution
Getting ready…
How to do it…
How it works…
Monitoring job execution
Getting ready
How to do it…
How it works…
Building an external ETL audit and audit reporting
Getting ready…
How to do it…
How it works…
Using built-in Data Services ETL audit and reporting functionality
Getting ready
How to do it…
How it works…
Auto Documentation in Data Services
How to do it…
How it works…
7. Validating and Cleansing Data
Introduction
Creating validation functions
Getting ready
How to do it…
How it works…
Using validation functions with the Validation transform
Getting ready
How to do it…
How it works…
Reporting data validation results
Getting ready
How to do it…
How it works…
Using regular expression support to validate data
Getting ready
How to do it…
How it works…
Enabling dataflow audit
Getting ready
How to do it…
How it works…
There's more…
Data Quality transforms – cleansing your data
Getting ready
How to do it…
How it works…
There's more…
8. Optimizing ETL Performance
Introduction
Optimizing dataflow execution – push-down techniques
Getting ready
How to do it…
How it works…
Optimizing dataflow execution – the SQL transform
How to do it…
How it works…
Optimizing dataflow execution – the Data_Transfer transform
Getting ready
How to do it…
How it works…
Why we used a second Data_Transfer transform object
When to use the Data_Transfer transform
There's more…
Optimizing dataflow readers – lookup methods
Getting ready
How to do it…
Lookup with the Query transform join
Lookup with the lookup_ext() function
Lookup with the sql() function
How it works…
Query transform joins
lookup_ext()
sql()
Performance review
Optimizing dataflow loaders – bulk-loading methods
How to do it…
How it works…
When to enable bulk loading?
Optimizing dataflow execution – performance options
Getting ready
How to do it…
Dataflow performance options
Source table performance options
Query transform performance options
lookup_ext() performance options
Target table performance options
9. Advanced Design Techniques
Introduction
Change Data Capture techniques
Getting ready
No history SCD (Type 1)
Limited history SCD (Type 3)
Unlimited history SCD (Type 2)
How to do it…
How it works…
Source-based ETL CDC
Target-based ETL CDC
Native CDC
Automatic job recovery in Data Services
Getting ready
How to do it…
How it works…
There's more…
Simplifying ETL execution with system configurations
Getting ready
How to do it…
How it works…
Transforming data with the Pivot transform
Getting ready
How to do it…
How it works…
10. Developing Real-time Jobs
Introduction
Working with nested structures
Getting ready
How to do it…
How it works…
There is more…
The XML_Map transform
Getting ready
How to do it…
How it works…
The Hierarchy_Flattening transform
Getting ready
How to do it…
Horizontal hierarchy flattening
Vertical hierarchy flattening
How it works…
Querying result tables
Configuring Access Server
Getting ready
How to do it…
How it works…
Creating real-time jobs
Getting ready
Installing SoapUI
How to do it…
How it works…
11. Working with SAP Applications
Introduction
Loading data into SAP ERP
Getting ready
How to do it…
How it works…
IDoc
Monitoring IDoc load on the SAP side
Post-load validation of loaded data
There is more…
12. Introduction to Information Steward
Introduction
Exploring Data Insight capabilities
Getting ready
How to do it…
Creating a connection object
Profiling the data
Viewing profiling results
Creating a validation rule
Creating a scorecard
How it works…
Profiling
Rules
Scorecards
There is more…
Performing Metadata Management tasks
Getting ready
How to do it…
How it works…
Working with the Metapedia functionality
How to do it…
How it works…
Creating a custom cleansing package with Cleansing Package Builder
Getting ready
How to do it…
How it works…
There is more…
Index