introduction to configuation and management for sas® grid...
TRANSCRIPT
PaperSAS1117-2017IntroductiontoConfigurationandManagement
forSAS®GridManagerforHadoopMarkLochbihler,HortonworksInc.,SantaClara,CA
ABSTRACT
"HowcanweruntraditionalSASÒjobs,includingSASworkspaceserversonHadoopworkernodes?"TheanswerisSASÒGridManagerforHadoop.IthasbeenintegratedwiththeHadoopecosystemtoprovideresourcemanagement,highavailabilityandenterpriseschedulingforSAScustomers.Thispaperwillprovideanintroductionarchitect,configurationandmanagementofSASGridManagerforHadoop.AnyoneinvolvedwithSASandHadoopshouldfindtheinformationinthispaperuseful.ThefirstareatobecoveredwillbeabreakdowneachrequiredSASandHadoopcomponent.FromtheHadoopecosystem,wewilldefinetheroleofYARNCompute,HDFSStorage,andHadoopClientservices.WewillreviewSASmetadatadefinitionsforSASGridManager,ObjectSpawnerandGridWorkspaceServers.WewillcoverrequiredKerberossecurity,aswellasSASEnterpriseGuideandtheSASGSUButility.YARNqueuesandtheSASGridPolicyfileforoptimizingjobschedulingwillbereviewed.Andfinally,wewilldiscusstraditionalSASmathrunningonaHadoopWorkernode,andhowitcantakeadvantageofSASHighPerformanceMathtoacceleratejobexecution.ByleveragingSASGridManagerforHadoop,sitesaremovingSASjobsinsideaHadoopCluster.Thiswillultimatelycutdownondatamovementandprovidemoreconsistentjobexecution.AlthoughthispaperiswrittenfortheSASandHadoopAdministrators,SASUserswillalsobenefitfromthissession.
WHATISSASGRIDCOMPUTING?
SASGridComputinghasbeenofferingSASshopsalowercost,shared,multi-tenant,highperformingcomputingenvironmenttomeettheiradvancedanalyticandmodelingneeds.ByimplementingaSASGrid,SASadministratorscancentralizeindividualandordepartmentalSAScomputingenvironmentsontoaSASComputeGridandbetterutilizeITresources,providehighavailabilityandacceleratedprocessing.ASASGridrunsontwoormoreSASGridComputeNodes.EachSASGridComputeNodeisacandidatetoexecuteSASjobssubmittedintoaGridqueuebySASusergroupsatasite.
ENTERHADOOPANDYARN
ThebenefitsofSASGridComputingarenotnewtotheSASUserCommunity.Ithasbeensuccessfullyimplementedandrunninginproductionatthousandsofcustomers’sitesaroundtheworldforwelloveradecadeandprovidessignificantbenefits.What’snewwithSASGridManagerforHadoopisthatitoffersYARNasanorchestrationoptioninadditiontotheexistingandwell-provenuseofthePlatformSuiteforSASwhichincludesLSF.SASGridManagerforHadoopwasdesignedtoenablecustomerstoco-locatetheirSASGridandassociatedSASworkloadsontheirneworexistingSASHadoopclusters.
WHYMOVESASJOBSINSIDEHADOOP?
Adecisiontodevelopthistypeofsolutiontypicallystartsbyidentifyingbusinessneedsandassociatedusagecasestosupporttheseneeds.AnycustomerinterestedloweringoverallSASstorageandservercosts,whileatthesametimeconsolidatingandcentralizingdepartmentalSASdatasetsisacandidate.AninitialstepcanbetomigrateselectedSASWorkloaddata,includingrawinboundandoutboundfiles,SASlibrariesandotherRDBMSSASdatasourcestoHadoopStorage.JustbytakingthefirststeptomoveSASstoragetoHadoop,organizationscansaveover50%inannualSASstoragecosts.Fornewmodeldeveloporexistingmodeloptimizationeffortsthatplanon
usingIoTHadoopdatasources(includingSensor,ClickStream,WebLog,MachineandIoTdevice),thenmovingexistingSASdatasetstoApacheHiveandHadoopStorage(HDFS)couldyieldsignificantcostsavings.ToaccessthesemigratedSASdatasets,userscanturntoSASÒAccesstoHadoop.ItoffersaSASlibnameengineforHDFS,aswellasHive,andafilenamestatementtoHDFS.Oncethedataismigrated,carefullyselectedSASWorkloadscanbemovedontoHadoopWorkernodesandSASuserscantransparentlyleverageSASGridManagerforHadooptoruntheirdailySASjobs.OncethedecisionhasbeenmadetomovenewandorexistingSASdataandworkloadstoHadoop,andleverageSASGridManager,itishighlyrecommendedthatyoursiteinvestupfrontinSASandHadoopadministrativetrainingandprofessionalservices.ItisalsocriticalthatyouhaveadetailedunderstandingofhowYARNconfigurationandschedulingworksonHadoop.Inaddition,involvingSASandHortonworksProfessionalservicesduringtheprojectstartupphaseinimportant.
HOWITWORKS
YARN101 IfyouarenewtoHadoopandYARN,letmeprovideabitofbackground.YARNistheorchestrationengineforHadoop2.x.Figure1belowisahigh-levelviewofanEnterpriseDataLake.WithYARNastheasthecentralorchestratorandoperatingsystemforHadoop,sitescanrunmultipleBatch,InteractiveandRealtimecomputeengineswithinthesameHadoopcluster(includingSASGridManagerasindicatedbythearrow).HadoophasbeenofferingaLowCost,MassiveScaleStorageandComputeArchitectureforoveradecade.WithSASGridManagerforHadoop,SASuserscanruntraditionalSASjobsonHadoopWorkernodes,insidethecluster.
Figure1:AHadoopClusterrunningBatch,InteractiveandRealTimeEngines,includingSASGridManager.WithSASGridManagerforHadoop,acommunityofSASuserstransparentlyleveragingSASClientsandsubmitinteractiveandbatchSASjobstotheSASGridComputinginfrastructureonHadoop.ThesejobsarescheduledbyYARNbasedonqueuesandsitepoliciestorunonanoptimalSASGridComputeNode(HadoopWorkerNode).BelowisaConceptualViewofthearchitecture:
Clickstream Web&Social
Geolocation Sensor& Machine
ServerLogs
Unstructured
SO
UR
CE
S
Existing Systems
ERP CRM SCM
AN
ALY
TIC
S
Data Marts
Business Analytics
Visualization& Dashboards
AN
ALY
TIC
S
Applications Business Analytics
Visualization& Dashboards
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
°
HDFS (Hadoop Distributed File System)
YARN: Data Operating System
Interactive Real-TimeBatch SAS GridManager
Batch BatchMPP
EDW
Figure2:SASGridManagerforHadoopConceptualArchitecture(Reference:SASGridManagerforHadoop)
BREAKINGDOWNTHISSASARCHITECTUREONHADOOPWithSASGridManagerforHadoop,SASjobsarescheduledbyYARN’sResourceManagerintoaclusterrunninginsideoftheHadoopfirewall.SASjobscanrequestadditionalHadoopresourcesaftertheyhavebeeninitiallylaunched.ThisnewarchitecturecanreducethecomplexityofconfigurationbysimplifyingportmappingbetweenSASjobsandtheservicestheywillneedtocomplete.Inthisdeploymentmodel,SASmathisrunningclosertotheHadoopdataandnolongerrequiresnegotiationwiththeHadoopfirewalltoaccess,ifrequested,additionalHadoopandSASHighPerformanceserviceports.
Figure3:SASGridManagerforHadoopwYARNArchitectureOverviewTheYARNResourceManager’sroleinFigure3aboveistodeterminethemostoptimalHadoopWorkerNodestorunatraditionalSASjoborWorkspaceServerintheHadoopcluster.Inthisillustration,youcanseetwoYARNjobsalreadyrunningonthecluster.EachoftheseHadoopjobshasasingleYARNApplicationMaster(AM)Containerassigned.Job1isnotaSASjob.ItconsistsofAM1alongwiththreeadditionalJobTaskContainers(C1.1,C1.2,andC1.3).ThereisalsoaSASjobrunningonthesamecluster.ThisSASjobhasitsownApplicationMaster(SASAM1)
YARNNodeManager YARNNodeManager YARNNodeManager YARNNodeManager
Job1Container1.1
YARNNodeManager YARNNodeManager YARNNodeManager YARNNodeManager
YARNNodeManager YARNNodeManager YARNNodeManager YARNNodeManager
Job1Container1.2
Job1Container 1.3
Job1AM 1 SASAM 2
SAS Grid Manager w YARN Architecture Overview
SASClient• SASGSUB• SASBatch• SASEG
YARNResourceManager
YARNCapacityScheduler
SASGridControlServer
SASObjectSpawner
SAS MetadataServer
SASContainer2.1
andoneSASJobTaskContainer(SASC2.1).Fromanarchitectureandadministrativeperspective,letstakealookattheprimarySASandHadoopservicesthatwillbeconfigured.KEYSASGRIDARCHITECTURECOMPONENTSSASMetadataServer–ASASservicesupporting,amongotherobjects,thelogicaltophysicalmappingofSASLogicalServerstoYARN.Inourexamples,wewillbeusingthelocalSASServer,SASGrid.SASGridControlServer–ASASservicerunningontheYARNResourceManagernode.ItiscalledbySASclientstocommunicatewithYARNResourceManagertonegotiateresourcesforSASjobs.SASObjectSpawner–ASASServicerunningontheYARNResourceManagernode.ItisusedtolaunchSAScontainswithYARN.
Figure4:ViewofSASManagementConsole,withexpandedSASGridLogicalServer.SASClients–includesSASbatchjobs,SASGSUB(abatchgridutility),andinteractiveClientslikeSASEnterpriseGuide.SASClientswill“Connect”and“Disconnect”fromaSASLogicalServer,likeSASGrid,definedinSASMetadataandshownaboveinFigure4.KEYHADOOPINTEGRATIONPOINTSFORSASGRIDYARNResourceManager–AHadoopYARNMasterServiceresponsibleforcontrollingglobalHadoopclusterresourceusage.ResourceManagerenablesmulti-tenancyandSLAs.ItisalsoresponsibleformonitoringNodeManagerState,submittingApplicationMasterrequests,verifyingcontainerlaunchandmonitoringApplicationMasterstate.
YARNNodeManager–ThisHadoopYARNWorkerNodeServicemanageslocalresourcesonbehalfoftherequestingservice.ItalsotracksnodehealthandcommunicatesstatustotheResourceManager.YARNCapacityScheduler–AHadoopYARNservice,whichcanbeconfiguredtoprovideJobSchedulingpoliciesforSLAs,Users,Groups,andResources.HadoopDataNodes–HadoopDistributedFileSystem(HDFS)storagenodes.KerberosService–TheHadoopclustermustbeKerborized.HADOOPMASTERNODEDECISIONSASASGridControlServerandSASObjectSpawnermustbedeployedonthesameHadoopMasterNodeastheYARNResourceManager.HADOOPWORKERNODEDECISIONSSASHOMEandSASCONFIGForeachHadoopWorkerNodewhichisacandidatetorunSASjobsmustbeconfiguredsothatSASHOMEandSASCONFIGareavailable.SASWORKandSASUTILItiscriticalthateachHadoopWorkernodewhichisacandidatetorunSASjobsisconfiguredcorrectly.AlargepartoftheI/OrequiredwhenrunningSASanalyticsistothescratchortemporarylocationsofSASWORKandSASUTIL.SASrequiredI/Othroughputforthesefilesystems,toprovidethenecessaryperformancetoaheavilyloadedsystem,is100MB/sec/core.AdequatesizingforSASWORKisalsonecessary.TraditionalStorageandComputeverseComputeOnlyWorkerNodesWithinHadoop,itisacommonpracticetohavedualpurposeworkernodeswhichrunmathorprogramsnearonthesamenodeswheretheHadoopdataresides.WithHadoop2.x,theconceptofdedicatedComputeOnlyHadoopWorkerNodesisanoption.ForSASGridManagerforHadoop,bothoptionsareanoption.ForComputeOnly,theseHadoopWorkerNodeswillnolongerhosttherequiredservicesanddataforHDFS,givingmorecomputingresourcesdedicatedtotheprogramsrunningonthesenodes.ThetradeoffforComputeOnlyHadoopNodesisthelossofHDFSdatalocality.YoursitesSASworkloadrequirementswilldeterminewhichtypeofWorkerNodestodeployforSASGridManagerforHadoop.
REALWORLDCONFIGURATIONEXAMPLEInthissection,wewillwalkthroughaSASGridManagerforHadoopconfigurationexercise.Forthisdeployment,ourHadoopClustercurrentlyhastwenty-eightHadoopWorkerNodes.Eachofthesenodeshas256GBofRAM.AftersubtractingtherequiredRAMneededtoruntheOSandadditionalkeyservices,thereis192GBofRAMleftforHadoopcontainers.ThismeansthatthetotalclusterwideRAMcapacityforHadoopContainers5.376TB.Wehavedecidedtoallocate50%ofthisRAMtoSASjobsor2.688TB.
TotalRAMPerCluster
Node
AvailableContainerRAM
PerNode
#WorkerNodesinCluster
TotalContainer
RAMAvailable
AmountofClusterRAMAllocatedto
SASQueue
TotalContainerRAMforSAS
Queue
256GB 192GB 28 5.376TB 50% 2.688TBTable1:TotalClusterYARNContainerRAMAvailableforSASUsersForourexercise,wehadpreviouslydeterminedthattheaveragenumberofSASbatchorinteractivesessionsforeachactive,concurrentuseratoursiteistwo.FromaHadoopperspective,foreachoftheseSASjobs,weknowthatYARNwillspawnsatleasttwocontainers(anApplicationMasterandaJobTaskContainer).AndonceSASjobsarerunninginHadoop,theycancalladditionalHadoop(ie.Pig,Hive,MapReduce)andSASSASHP(ie.prochpsummary,hpmeans,hpfreq)services.TheaveragenumberofadditionalHadoopContainersspawnedfromeachSASjobrunningonHadoopisfour.Thismeanswecananticipateeachactive,concurrentSASusertobeallocating,onaverage,eightYARNContainers.
Avg#ofBatchJobsorInteractiveSessionsper
SASUser
#ContainersPerJoborSession
Avg#ofAdditionalHadoopContainersSpawnedfrominitial
SASJobContainer
AvgTotal#ofContainersperSAS
user
2 2 4 8Table2:AverageNumberofContainersperSASUsersNowthatweknow,onaverage,eachactiveSASuserwillallocateeightcontainers,andwehave2.688TBoftotalclusterRAMavailable,weneedtodeterminethetotalnumberofconcurrentSASuserswhocanrunatanyonetimeonourHadoopcluster.OneadditionalfactortoconsideristhatnotallSASusersneedthesameamountofcomputingresources.Inourexercise,wehavethreeSASuserresourcecategories(Low,Medium,andHigh).EachuserfallingintotheLowcategory,bydefault,willallocate2GBYARNContainersfortheirjobs.TheSASMediumcategorywilldefaulttoaYARNContainerSizeof4GB.AndtheHighcategorywilldefaultto8GBContainersize.Withthesenumbersinmind,wewillbetohave134concurrentSASusersonoursystemasindicatedbelow.
SASAppType(UserType) ContainerSize
Anticipated%ofUsersTypeon
Server
AvailableCluster
MemoryforSASjobs
Max#ofContainers
Avg#ContainersPerSASUser
Total#ofSASUsers
Low(General/Analyst) 2GB 70% 1.881TB 940 8 117
Medium 4GB 20% 537GB 134 8 16
High 8GB 10% 268GB 33 8 4
Totals 2.688TB 1107 134Table3:BreakdownofSASApplicationTypestobeconfiguredforSASUsers
YARNCapacitySchedulerInthisexercise,weareallocating50%oftheHadoopCluster’sRAMtoSASUsers.WewillusetheYARNCapacitySchedulertosetupthispolicy.WeareabletosetupHierarchicalYARNQueues,includingsas94_queue,asafirstclusterwideconfigurationstep.
Figure5:YARNCapacitySchedulerLogicalView-SASQueue-50%HadoopClusterRAM
Below is a view of the YARN Capacity Scheduler UI, we will see that the sas94_queue has 50% cluster RAMallocated.SASUsersLinuxidsshouldalsobemappedtothisqueue.
Figure6:YARNCapacitySchedulerAdminView-SASQueue-50%HadoopClusterRAM
YARN Capacity SchedulerExample: 50% of Cluster RAM allocated to SAS Queue
ResourceManager
Scheduler
root
Adhoc30%
SAS50%
Mrkting20%
Dev10%
Reserved20%
Prod70%
Prod80%
Dev20%
P070%
P130%
Capacity Scheduler
HierarchicalQueues
OurHadoopclusterhasYARNMinimumContainerSizesetclusterwideto2GB.WearegoingtoneedtoimplementamechanismtoadjustContainersizebasedonSASUserresourceneeds.Wewillusethesasgrid-policy.xmlconfigurationfileandSASMetadataUserandGroupmappingstoaccomplishthistask.First,wewillreviewasasgrid-policy.xmlfilewhichwillensurethatSASusersobtaintherightamountofmemoryfortheirjobs.WithinthisSASGridconfigurationfile,youwillseethreeGridApplicationTypes:Low,Medium,andHigh.
SASGridPolicyFileExample<?xmlversion="1.0"encoding="UTF-8"standalone="yes"?><GridPolicydefaultAppType="low"> <GridApplicationTypename="normal"> <jobname>NormalJob</jobname>
<priority>20</priority> <nice>10</nice> <memory>2048</memory> <vcores>1</vcores> <runlimit>480</runlimit> <queue>default</queue> <hosts> <hostGroup>development</hostGroup> </hosts> </GridApplicationType> <GridApplicationTypename="low"> <jobname>SASLow</jobname> <priority>10</priority> <nice>0</nice> <memory>2048</memory> <vcores>1</vcores> <runlimit>480</runlimit> <queue>sas94_queue</queue> <hosts> <hostGroup>sas94_work</hostGroup> </hosts></GridApplicationType> <GridApplicationTypename="medium"> <jobname>SASMedium</jobname> <priority>10</priority> <nice>0</nice> <memory>4096</memory> <vcores>1</vcores> <runlimit>480</runlimit>
<queue>sas94_queue</queue> <hosts> <hostGroup>sas94_work</hostGroup> </hosts></GridApplicationType> <GridApplicationTypename="high"> <jobname>SASHigh</jobname> <priority>10</priority> <nice>0</nice> <memory>8192</memory> <vcores>1</vcores> <runlimit>480</runlimit> <queue>sas94_queue</queue> <hosts> <hostGroup>sas94_work</hostGroup> </hosts></GridApplicationType><HostGroupname="development"> <host>partnerd1worker-1.field.hortonworks.com</host> <host>partnerd1worker-2.field.hortonworks.com</host> <host>partnerd1worker-3.field.hortonworks.com</host> <host>partnerd1worker-4.field.hortonworks.com</host> <host>partnerd1worker-5.field.hortonworks.com</host></HostGroup><HostGroupname="sas94_work"> <host>partnerd1worker-4.field.hortonworks.com</host>
<host>partnerd1worker-5.field.hortonworks.com</host></HostGroup> </GridPolicy>Theabovesasgrid-policy.xmlfileisasample.ItillustrateshowwecansetupGridApplicationTypesfordifferentSASusergroups.NotethatwehavespecifiedaYARNContainermemoryandtheSAS94QueueforeachAppType.WethatwehaveconfiguredtheappropriateGridApplicationTypes,weneedawaytomapSASApplications(SASGridManagerClientUtility,SASEnterpriseGuide,others)totheseSASGridApplicationTypes.ThiscanbedoneusingtheGridOptionsSetMappingWizardwithintheSASManagementConsole.ThisrequireseachSASApplicationtoberegisteredtotheSASMetadataServerandconfiguredfor“IsGridEnabled”.Oncethisisdone,usethefollowingSASGridServerPropertiesOptionsTabforyourLogicalGridServer(SASGridinourcase)andconfigureSASusersandgroupstotheirappropriatedefaultGridAppTypefoundinsasgrid-policy.xmlabove.
Figure7:ConfiguringSASMetadataGroupstoSASGridApplicationTypes.SASClientIdleandRuntimeConfigurationsItisrecommendedthatyousetthefollowingthresholdsforSASusersasadefaultandadjustaccordingtoyoursidesworkloadneeds.ThiswillfreeupidleresourcesforotherHadoopusers.ItisrecommendedtobeveryconservativeinthesesettingssothatyoudonotinadvertentlydisruptSASuserproductivity.
o SASClientIdleTimeout–inourexample–wesetto120minutes§ YoucancontroltheamountoftimeaclientsuchasSASEnterpriseGuidecanstayconnected
butnotactiveusingtheInactiveClienttimeoutsetting.o SASRunLimitTimeout–inourexample–wesetto480minutes
§ Withinthesasgrid-policy.xmlfile,youcancontrolthisusingthe“runlimit=xx”GridApplicationTypeparameter.
Also,acoupleofotherHadoopparametersthatweshouldpaycloseattentiontoareMapReduceMinimumContainerSize,aswellastheengineyouarerunningforHive.WerecommenduseHive’sTezEngineinmostcasesasitprovidesin-memoryqueryexecutionthatisconsiderablyfasterthantheMapReduceengine.WearecurrentlyinvestigatingYARNNodeLabels(PolicyScheduling)andtheHiveLLAPengine(SQLQueryAcceleration)forfutureconfigurationconsiderations.
SASWORKLOADMIGRATIONCONSIDERATIONSSASGridManagerforHadoopshouldbeleveragedasacomplimenttoyourexistingSASinfrastructureandnotareplacement.WhilecertaintraditionalstoragearchitecturesprovideextremelyfastIOforSASworkloads,thereisasignificantcosttotheseinfrastructures.SelectingtherightSASworkloadsanddatatomovetoHadoopisa
criticallyimportantstepwhenmovingSASjobstoHadoop.Akeybusinessdriverisreducingoverallstoragecosts.AnotherkeydriveracceleratingtheperformanceofSASjobsthatrequireaccesstolargedatasetsinHadoop.ThemostexpensiveoperationforanybigdatajobismovinglargeamountsofdatafromstoragetoacomputetieroutsideofHadoop.WithHadoop,aprimarygoaldesigngoalforthelastdecadehasbeentomoveaprogramscomputeormathtowherethedataresides,inHadoop.Thisenablesavoidingcostlydatamovementoverthenetwork.Withourbusinessdriversinmind,SASshopsshouldinitiallyidentifyahandfulofSASjobsrunningoutsideofHadoopwhichhavesignificantlylargeephemeral(SASWORK)andpermanentstorage(SASdatasets,RDBMS,otherinboundandoutbounddatasets)needs.IftheSASjobdataiscurrentlystoredonexpensiveSANstorage,costsavingswillbesignificant.AnySASjobsusingBaseSASprocedureslikeProcSummary,MeansandFreqwhichsupportSASIn-DatabasePushdownshouldconsidered.AnySASjobsplanningtouseHadoopdatasetsshouldbeonthelist.WhenmovingexistingSASWorkloaddataintoHadoop,thereareseveralstorageoptions.ExistingSASdatacanbemovedintoHadoopDistributedFileSystem(HDFS)andthen,withinHive,aschemaonreadycanbeestablished.IftheexistingSASdatasetsanddatabasetablesarestoredinHive,asmentioned,SASInDatabasecapabilitiescanbeleveraged.ThiswillenabletheSASenginetohaveanopportunitytoimplicitlytranslateBaseSASStatprocedurestocomplexHiveQL,ultimately,movingmostofthemathtothedata.SASAccesstoHadoopisanaddonSASproductthatplaysacriticalroleinthismigration.SASuserswillneedtohavesomebasictrainingonhowtomosteffectivelyleveragetheHiveandHDFSlibnameenginesfromtheirSASjobs,aswellastheSASfilenamestatementtoHFDS.ThisbasicsyntaxchangeintheirprogramswillberequiredtotakeadvantagedirectlyofSASdatanowresidinginHadoop.
o LibnametoHive§ UsesaJDBCconnecttoHive§ ProvidesanopportunityfordistributedparallelcomputeonHive
• UsingSASInDatabasePushDowncapabilities
o LibnametoHDFS§ HDFSAPIsareleveragedfordirect,parallelreadandwritetoHadoop.
o FilenametoHDFS
§ SupportsinboundandoutboundreadandwriteofrawdatatoHadoop.
SASdoesincludesupportforHadoopClientsandRESTfulAPIs.TheSASAccesstoHadoopcanalsoallowasasprogrammertorunexplicitHDFS,Pig,Hive,andMapReducecodeinlinewithintheirSASjob.Inaddition,SAShasanSQLEnginewhichsupportsbothimplicitandexplicitPassThruHiveQLexecutiondirectlytotheHadoopcluster.
ADAYINTHELIFEOFASASUSERLEVERAGESASGRID WhenusingSASclients,itshouldbetransparenttotheSASusersfromafunctionalityperpectivethattheyareinteractingwithaSASGrid.HereisadetailedwalkthroughofaSASuserleveragingSASEnterpriseGuiderunningSASProcessFlowsandTasksinsideofHadoop.Below,aSASuserlogsintoSASEnterpriseGuide’sUI,andopensanexistingproject.Toexecuteworkfromthisproject,theusermustlaunchaSASWorkspaceServer.Inthiscase,iftheuserexpandsthe+SASGridonthelefthandsideofthescreen,SASGridManagerforHadoopwillrequestYARN’sResourceManagertolaunchaSASWorkspaceServer(WSS)insidetheHadoopCluster.
Figure8:Clickon+SASGridtolaunchSASWorkspaceServeronHadoopWorkerNodeThegreencheckmarknexttoSASGridbelowindicatesthattheSASEGusercanrunanyProcessFloworTaskinHadoop,becauseYARNhassuccessfullylaunchedtheSASWSSonaHadoopWorkerNode.
Figure9:Thegreenarrow(SASGrid)indicatesSASWorkspaceServerisestablishedinHadoop
IfwelookattheYARNUIbelow,typicallyreservedforHadoopAdministrators,wecanseeaSASEnterpriseGuide–WorkspaceServerinaRUNNINGstate.ThisindicatestotheHadoopAdministratorthataSASjobisrunningandhasreservedtwoYARNcontainersandiscurrentlyactiveontheHadoopcluster:
Figure10:YARNUI-SASEGWorkspaceServer(WSS)runninginsas94_queueonHadoop.Switchingtothe“Nodes”linkintheYARNUI(thisisasmallersampleclustertoillustrateourexampleonly),youcanseethetwoYARNcontainersassociatedwiththeSASWSS,runningin1GBYARNContainers.
Figure11:YARNUI–“Nodes”linkshowingtwo1GBcontainerssupportingtheSASWSS.
And,switchingtotheYARNUI–“Scheduler”link,wecanseeClusterMetricsbelow.
Figure12:YARNUI–“Scheduler”linkHadoopClusterMetrics. SwitchingbacktotheSASUserEGUI,theSASusercanseethataSASLibraryhasbeenassignedpointingtoHive.WecanalsoseeintheSASEGExplorerwindowonthebottomlefthandsideofthescreen,alltheHiveTablesavailabletotheSASuserwhichareassociatedwithSASlibrefHIVE_TPC.Atthispoint,theSASEGusercanrunSASanalytictasksdirectlyagainstHivetablesusingtheHivedatabaseschematpcds_bin_partitioned_orc_10.
Figure13:SASEnterpriseGuide–LibraryHIVE_TPChasbeenassignedtoHive.
Atthispoint,theSASEGusercanrunanytraditionalSASjoborcode.TheSASusercanalsocallanyHDPservice(i.e.HDFS,Pig,Hive,MapReduce)in-linewithinthisSASworkspaceserver.Ifinstalled,theSASusercanalsocallSASHighPerformanceAnalytics,whichcanrunonYARNintheclusteraswell.WhentheSASEGuserissuesa“Disconnect”(seebelow),thiswillinitiatearequesttotheSASWorkspaceServertoshutdown.YarnwillreleasethecontainerthatwasusedforrunningtheSASWorkspaceServerandreclaimtheresourcesassociatedwiththatcontainer.
Figure14:SASEnterpriseGuide–“Disconnect”fromHadoop. Oncethe“Disconnect”iscompleteinEG,theHadoopAdministratorwillseetheSASEnterpriseGuideWorkspaceServerisnow“FINISHED”.(Seebelow).AdisconnectwillalsooccurautomaticallyifaSASEGusershutsdowntheUI.
Figure14:YARNUI-SASEGWorkspaceServer(WSS)FINISHEDstatusonHadoop. ForsitesinterestedinmovingtraditionalSASworkloadsco-locatedontoHadoop,SASGridManagerforHadoopisanidealsolutiontomeetthisneed.Inthispaper,wehavediscussedthebenefitsofmovingtoSASGridManageronHadoop,requiredarchitectureandrecommendedconfigurations.WehavealsosharedaseamlessSASuserexperiencewithyou.IfyouwouldliketolearnmoreaboutSASGridManagerforHadoop,pleaseseethefollowinglinksbelow.
REFERENCES“SASAnalyticsonYourHadoopClusterManagedbyYARN.”July14th,2015.Availableathttp://support.sas.com/rnd/scalability/grid/hadoop/SASAnalyticsOnYARN.pdf.SASInstituteInc.“IntroductiontoGridComputing.”Availableathttp://support.sas.com/rnd/scalability/grid/.“HortonworksApacheHadoopYARNOverviewWebPage.”Availableathttps://hortonworks.com/apache/yarn/.“ApacheHadoopYarn.”TheApacheSoftwareFoundation.June29,2015.Availableathttps://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.“Introducing-SASÒGridManagerforHadoop.”SASGlobalForumPaperSAS6281-2016Availableathttp://support.sas.com/resources/papers/proceedings16/SAS6281-2016.pdf
CONTACTINFORMATIONYourcommentsandquestionsarevaluedandencouraged.Contacttheauthorat: MarkLochbihler PrincipalArchitect,HortonworksInc. mark.lochbihler@hortonworks.comSASandallotherSASInstituteInc.productorservicenamesareregisteredtrademarksortrademarksofSASInstituteInc.intheUSAandothercountries.ÒindicatesUSAregistration.