the science dmz - globusworld.org
The Science DMZ
Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory
Building the Modern Research Data Portal
GlobusWorld
Chicago, IL
April 20, 2016
Outline
• Science DMZ in brief
• Context – Science DMZ in the community
• Science DMZ and Data Portals
• This assumes you already have a Science DMZ
  – If you don’t have one, we can chat about how you might build one
  – If it would be helpful, I can talk to your systems and networking folks
  – Or check out the fasterdata knowledge base:
    • http://fasterdata.es.net/science-dmz/
Science DMZ Design Pattern (Abstract)
[Diagram. Labeled elements: WAN; Border Router; Science DMZ Switch/Router; Enterprise Border Router/Firewall; Site / Campus LAN; high-performance Data Transfer Node with high-speed storage; three perfSONAR hosts; 10GE links. Annotations: per-service security policy control points; clean, high-bandwidth WAN path; site / campus access to Science DMZ resources.]
ESnet Science Engagement (engage@es.net) - 4/26/16 © 2015, Energy Sciences Network
Supercomputer Center Deployment
• High-performance networking is assumed in this environment
  – Data flows between systems, between systems and storage, wide area, etc.
  – Global filesystem often ties resources together
    • Portions of this may not run over Ethernet (e.g. IB)
    • Implications for Data Transfer Nodes
• “Science DMZ” may not look like a discrete entity here
  – By the time you get through interconnecting all the resources, you end up with most of the network in the Science DMZ
  – This is as it should be – the point is appropriate deployment of tools, configuration, policy control, etc.
• Office networks can look like an afterthought, but they aren’t
  – Deployed with appropriate security controls
  – Office infrastructure need not be sized for science traffic
HPC Center
[Diagram. Labeled elements: WAN; Border Router; Core Switch/Router; Firewall; Offices; Supercomputer; Parallel Filesystem; Data Transfer Nodes; two front-end switches; three perfSONAR hosts. One connection is labeled “Routed”.]
HPC Center Data Path
[Diagram. Labeled elements: WAN; Border Router; Core Switch/Router; Firewall; Offices; Supercomputer; Parallel Filesystem; Data Transfer Nodes; two front-end switches; three perfSONAR hosts. One connection is labeled “Routed”. Two paths are highlighted: High Latency WAN Path and Low Latency LAN Path.]
Context: Science DMZ Adoption
• DOE National Laboratories
  – HPC centers, LHC sites, experimental facilities
  – Both large and small sites
• NSF CC* programs have funded many Science DMZs
  – Significant investments across the US university complex
  – Big shoutout to the NSF – these programs are critically important
• Other US agencies
  – NIH
  – USDA Agricultural Research Service
• International
  – Australia https://www.rdsi.edu.au/dashnet
  – Brazil
  – UK
Strategic Impacts
• What does this mean?
  – We are in the midst of a significant cyberinfrastructure upgrade
  – Enterprise networks need not be unduly perturbed :)
• Significantly enhanced capabilities compared to 3 years ago
  – Terabyte-scale data movement is much easier
  – Petabyte-scale data movement possible outside the LHC experiments
    • ~3.1 Gbps = 1 PB/month
    • ~14 Gbps = 1 PB/week
  – Widely-deployed tools are much better (e.g. Globus)
• Metcalfe’s Law of Network Utility
  – Value of Science DMZ proportional to the number of DMZs
    • n^2 or n(log n) doesn’t matter – the effect is real
  – Cyberinfrastructure value increases as we all upgrade
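The petabyte-scale rates quoted on this slide can be checked with a few lines of arithmetic. The sketch below is only a sanity check under stated assumptions: decimal units (1 PB = 10^15 bytes), a 30-day month, and no allowance for protocol overhead or idle time. The function name `sustained_gbps` is made up for this example. Under these assumptions the results are roughly 3.1 Gbps for a month and 13.2 Gbps for a week, in line with the slide's rounded figures.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_PETABYTE = 8 * 10**15  # assumption: decimal units, 1 PB = 10**15 bytes


def sustained_gbps(petabytes: float, days: float) -> float:
    """Average rate in Gbps needed to move `petabytes` within `days` days."""
    bits = petabytes * BITS_PER_PETABYTE
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / 1e9


if __name__ == "__main__":
    # Assumption: a "month" is 30 days.
    print(f"1 PB in 30 days: {sustained_gbps(1, 30):.1f} Gbps")
    print(f"1 PB in 7 days:  {sustained_gbps(1, 7):.1f} Gbps")
```

These are averages sustained around the clock; a transfer that only runs part of the time needs a proportionally higher peak rate.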
Next Steps – Building On The Science DMZ
• Enhanced cyberinfrastructure substrate now exists
  – Wide area networks (ESnet, GEANT, Internet2, Regionals)
  – Science DMZs connected to those networks
  – DTNs in the Science DMZs
• What does the scientist see?
  – Scientist sees a science application
    • Data transfer
    • Data portal
    • Data analysis
  – Science applications are the user interface to networks and DMZs
    • The underlying cyberinfrastructure components (networks, Science DMZs, DTNs, etc.) are part of the instrument of discovery
    • Large-scale data-intensive science requires that we build larger structures on top of those components
Science Data Portals
• Large repositories of scientific data
  – Climate data
  – Sky surveys (astronomy, cosmology)
  – Many others
  – Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
  – Single-web-server design
  – Data browse/search, data access, user awareness all in a single system
  – All the data goes through the portal server
    • In many cases by design
    • E.g. embargo before publication (enforce access control)
Legacy Portal Design
[Diagram. Labeled elements: WAN; Border Router; Firewall; Enterprise; Portal Server; Filesystem (data store); two perfSONAR hosts; 10GE links. Legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]
• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren’t scalable
• What does architectural change mean?
Example of Architectural Change – CDN
• Let’s look at what Content Delivery Networks did for web applications
• CDNs are a well-deployed design pattern
  – Akamai and friends
  – Entire industry in CDNs
  – Assumed part of today’s Internet architecture
• What does a CDN do?
  – Store static content in a separate location from dynamic content
    • Complexity isn’t in the static content – it’s in the application dynamics
    • Web applications are complex, full-featured, and slow
      – Databases, user awareness, etc.
      – Lots of integrated pieces
    • Data service for static content is simple by comparison
  – Separation of application and data service allows each to be optimized
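The split between static and dynamic content described above can be illustrated with a small URL-routing sketch. Everything named here is an assumption made for illustration: the two host names, the list of "static" file suffixes, and the helper `url_for` are hypothetical and do not describe any particular CDN product.

```python
from pathlib import PurePosixPath

# Hypothetical hosts, used only for illustration.
APP_HOST = "https://app.example.org"
CDN_HOST = "https://cdn.example.net"

# Assumed set of file types treated as static content.
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".mp4", ".zip"}


def url_for(path: str) -> str:
    """Point static objects at the CDN host and everything else at the application."""
    suffix = PurePosixPath(path).suffix.lower()
    host = CDN_HOST if suffix in STATIC_SUFFIXES else APP_HOST
    return f"{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    for p in ("/search", "/images/logo.png", "/downloads/archive.zip"):
        print(p, "->", url_for(p))
```

The application keeps its full logic; only the links it emits change, which is why each side can then be tuned independently.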
Classical Web Server Model
• Web browser fetches pages from web server
  – All content stored on the web server
  – Web applications run on the web server
    • Web server may call out to local database
    • Fundamentally all processing is local to the web server
  – Web server sends data to client browser over the network
• Perceived client performance changes with network conditions
  – Several problems in the general case
  – Latency increases time to page render
  – Packet loss + latency cause problems for large static objects
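One way to see why packet loss combined with latency hurts large objects is the widely cited Mathis et al. approximation for single-stream TCP throughput, rate <= MSS / (RTT * sqrt(p)). The slides do not mention this formula; it is brought in here as an illustration, with the constant factor taken as 1. The MSS, RTT, and loss values in the example are assumed numbers, not measurements.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput, in bits per second.

    Simplified Mathis et al. form: rate <= MSS / (RTT * sqrt(p)),
    with the constant factor assumed to be 1.
    """
    if rtt_seconds <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_seconds must be positive and loss_rate must be in (0, 1]")
    return (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed values: 1460-byte MSS, 0.01% loss, 10 ms versus 100 ms round trip.
    near = mathis_throughput_bps(1460, 0.010, 1e-4)
    far = mathis_throughput_bps(1460, 0.100, 1e-4)
    print(f"10 ms RTT:  {near / 1e6:.1f} Mbps")
    print(f"100 ms RTT: {far / 1e6:.1f} Mbps")
```

For a fixed loss rate, the bound falls in direct proportion to round-trip time, so the same small amount of loss costs far more on a long path than on a short one.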
[Diagram. Browser (residential broadband) reaches the Web Server (hosting provider) across a transit network; the WEB path is labeled “Long Distance / High Latency”.]
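Solution: Place Large Static Objects Near Client
[Diagram. Browser (residential broadband) still reaches the Web Server (hosting provider) across the transit network on the WEB path, labeled “Long Distance / High Latency”; the DATA path to the CDN Data Server is labeled “Short Distance / Low Latency”.]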
• CDN provides static content “close” to client
  – Latency goes down
    • Time to page render goes down
    • Static content performance goes up
  – Load on web server goes down (no need to serve static content)
  – Web server still manages complex behavior
    • Local reasoning / fast changes for application owner
• Significant win for web application performance
Client Simply Sees Increased Performance
• Client doesn’t see the CDN as a separate thing
  – Web content is all still viewed in a browser
    • Browser fetches what the page tells it to fetch
    • Different content comes from different places
    • User doesn’t know/care
• CDNs provide an architectural solution to a performance problem
  – Not brute-force
  – Work smarter, not harder
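[Diagram, two panels, each showing a Browser reaching a Web Server across “The ’Net” on the WEB path. In the CDN panel the WEB path to the Web Server is labeled “Rich, Slow” and the DATA path to the CDN Data Server is labeled “Simple, Fast”.]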
Architectural Examination of Data Portals
• Common data portal functions (most portals have these)
  – Search/query/discovery
  – Data download method for data access
  – GUI for browsing by humans
  – API for machine access – ideally incorporates search/query + download
• Performance pain is primarily in the data handling piece
  – Rapid increase in data scale eclipsed legacy software stack capabilities
  – Portal servers often stuck in enterprise network
• Can we “disassemble” the portal and put the pieces back together better?
  – Use Science DMZ as a platform for the data piece
  – Avoid placing complex software in the Science DMZ
Legacy Portal Design
[Diagram. Labeled elements: WAN; Border Router; Firewall; Enterprise; Portal Server; Filesystem (data store); two perfSONAR hosts; 10GE links. Legend: browsing path, query path, data path. Portal server applications: web server, search, database, authentication, data service.]
Next-Generation Portal Leverages Science DMZ
[Diagram. Labeled elements: WAN; Border Router; Science DMZ Switch/Router; Firewall; Enterprise; Portal Server; four DTNs labeled “API DTNs (data access governed by portal)”; Filesystem (data store); three perfSONAR hosts; 10GE links. Legend: browsing path, query path, data path; data transfer path; portal query/browse path. Portal server applications: web server, search, database, authentication.]
Put The Data On Dedicated Infrastructure
• We have separated the data handling from the portal logic
• Portal is still its normal self, but enhanced
  – Portal GUI, database, search, etc. all function as they did before
  – Query returns pointers to data objects in the Science DMZ
  – Portal is now freed from ties to the data servers (run it on Amazon if you want!)
• Data handling is separate, and scalable
  – High-performance DTNs in the Science DMZ
  – Scale as much as you need to without modifying the portal software
• Outsource data handling to computing centers
  – Computing centers are set up for large-scale data
  – Let them handle the large-scale data, and let the portal do the orchestration of data placement
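The idea that a query "returns pointers to data objects" rather than the data itself can be sketched as follows. This is a minimal illustration, not a real portal: the catalog contents, the endpoint names such as `lab1#dtn`, and the `DataPointer` and `query` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPointer:
    """Where a data object lives; the portal never serves the bytes itself."""
    endpoint: str    # hypothetical DTN endpoint name in the Science DMZ
    path: str        # path on the filesystem behind that endpoint
    size_bytes: int


# Hypothetical catalog: the portal database maps dataset names to pointers.
CATALOG = {
    "climate/run-001": DataPointer("lab1#dtn", "/data/climate/run-001.nc", 4_000_000_000),
    "climate/run-002": DataPointer("lab1#dtn", "/data/climate/run-002.nc", 6_000_000_000),
    "sky/survey-a": DataPointer("lab2#dtn", "/data/sky/survey-a.fits", 9_000_000_000),
}


def query(prefix: str) -> list[DataPointer]:
    """Search the catalog and return pointers, not file contents."""
    return [ptr for name, ptr in sorted(CATALOG.items()) if name.startswith(prefix)]


if __name__ == "__main__":
    for ptr in query("climate/"):
        print(ptr.endpoint, ptr.path, ptr.size_bytes)
```

Because the response is only metadata, the portal's own network and storage never carry the large objects, which is what lets it run anywhere.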
Ecosystem Is Ready For This
• Science DMZs are deployed at Labs, Universities, and computing centers
  – XSEDE sites
  – DOE HPC facilities
  – Many campus clusters
• Globus DTNs are present in many of those Science DMZs
  – XSEDE sites
  – DOE HPC facilities
  – Many campus clusters
• Architectural change allows data placement at scale
  – Submit a query to the portal, Globus places the data at an HPC facility
  – Run the analysis at the HPC facility
  – The results are the only thing that ends up on a laptop or workstation
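The placement step in the last bullet can be pictured as turning query results into one transfer request per source site. The sketch below assumes query results arrive as (endpoint, path) pairs and uses a made-up request layout; it does not reproduce the Globus API, and a real portal would hand requests like these to a transfer service.

```python
from collections import defaultdict


def plan_placement(pointers, destination):
    """Group (endpoint, path) pairs into one transfer request per source endpoint.

    The dictionary layout is hypothetical and only illustrates the orchestration step.
    """
    by_source = defaultdict(list)
    for endpoint, path in pointers:
        by_source[endpoint].append(path)
    return [
        {"source": source, "destination": destination, "paths": paths}
        for source, paths in sorted(by_source.items())
    ]


if __name__ == "__main__":
    # Hypothetical query results and destination endpoint.
    results = [
        ("lab1#dtn", "/data/climate/run-001.nc"),
        ("lab2#dtn", "/data/sky/survey-a.fits"),
        ("lab1#dtn", "/data/climate/run-002.nc"),
    ]
    for request in plan_placement(results, "hpc#dtn"):
        print(request)
```

Only the analysis output would then need to travel to the user's laptop or workstation.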
Links and Lists
– ESnet fasterdata knowledge base
  • http://fasterdata.es.net/
– Science DMZ paper
  • http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list
  • Send mail to sympa@lists.lbl.gov with subject "subscribe esnet-sciencedmz"
– perfSONAR
  • http://fasterdata.es.net/performance-testing/perfsonar/
  • http://www.perfsonar.net
– Globus
  • https://www.globus.org/
Thanks!
Eli Dart
dart@es.net
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory
http://fasterdata.es.net/
http://my.es.net/
http://www.es.net/
Extra Slides
DTN Cluster Detail
[Diagram. Labeled elements: WAN; Border Router; Science DMZ Switch/Router; Firewall; Enterprise; four DTNs labeled “Sealed” DTNs (Globus only, no shell access), annotated “Configure as DTN Cluster”; Filesystem; Cluster Head/Login Nodes (HEAD); Cluster compute nodes; three perfSONAR hosts; 10GE links.]
DTN Cluster Design
• Configure all four DTNs as a single Globus endpoint
  – Globus has docs on how to do this
  – https://support.globus.org/entries/71011547-How-do-I-add-multiple-I-O-nodes-to-a-Globus-endpoint-
• Recent options for increased performance
  – Use additional parallel connections
  – Distribute transfers across multiple DTNs (Globus I/O Nodes)
  – Critical – only do this when all DTNs in the endpoint mount the same shared filesystem
• Use the Globus CLI command endpoint-modify
  – Use the --network-use option
  – Adjusts concurrency and parallelism
  – More info at globus.org (http://dev.globus.org/cli/reference/endpoint-modify/)
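To make the two tuning knobs concrete, the sketch below assumes the usual GridFTP reading of the terms: concurrency is the number of files in flight at once, and parallelism is the number of TCP streams per file. The round-robin assignment is purely illustrative and is not a description of how Globus schedules work across I/O nodes; the DTN names and both function names are hypothetical. It also shows why the shared-filesystem condition matters: any file may land on any DTN.

```python
from itertools import cycle


def assign_files(files, dtns):
    """Spread files over DTNs round-robin (illustrative only).

    Only valid if every DTN mounts the same shared filesystem,
    since any file may be handled by any node.
    """
    if not dtns:
        raise ValueError("at least one DTN is required")
    assignment = {dtn: [] for dtn in dtns}
    for name, dtn in zip(files, cycle(dtns)):
        assignment[dtn].append(name)
    return assignment


def total_tcp_streams(concurrency: int, parallelism: int) -> int:
    """Streams in flight, assuming `concurrency` files with `parallelism` streams each."""
    return concurrency * parallelism


if __name__ == "__main__":
    dtns = ["dtn1", "dtn2", "dtn3", "dtn4"]  # hypothetical node names
    files = [f"file{i:02d}.dat" for i in range(10)]
    print(assign_files(files, dtns))
    print(total_tcp_streams(concurrency=4, parallelism=8))
```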
Security Footprint of a Globus Transfer
[Diagram. Labeled elements: Lab1 DTN, Lab1 Science DMZ, Lab1 Border Router; ESnet Routers (ESnet 100GE); Lab2 Border Router, Lab2 Science DMZ, Lab2 DTN; Orchestration (Amazon AWS); Lab1 DTN security filters and Lab2 DTN security filters; 10GE and 100GE links. Port labels: DATA on TCP ports 50000-51000; TCP ports 443, 2811, 7512 at each lab. Legend: logical data path, physical data path, logical control path, physical control path.]
Security Footprint of a Globus DTN
[Diagram. Labeled elements: Local DTN; DTN security filters; Science DMZ; Site / Campus Border Router; World; Remote DTNs; Orchestration (Amazon AWS); 10GE and 100GE links. Port labels: DATA on TCP ports 50000-51000; TCP ports 443, 2811, 7512. Legend: logical data path, physical data path, logical control path, physical control path.]
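The small port footprint shown in these diagrams lends itself to a simple filter rule. The sketch below assumes the diagram labels pair TCP ports 50000-51000 with the data path and TCP ports 443, 2811, and 7512 with the control/orchestration path; the function name and the "blocked" default are choices made for this example, not a real firewall configuration.

```python
# Port sets taken from the diagram labels; the control/data pairing is an assumption.
CONTROL_PORTS = {443, 2811, 7512}
DATA_PORT_RANGE = range(50000, 51001)  # 50000-51000 inclusive


def classify_tcp_port(port: int) -> str:
    """Return which DTN path a TCP port belongs to, or 'blocked' if neither."""
    if port in CONTROL_PORTS:
        return "control"
    if port in DATA_PORT_RANGE:
        return "data"
    return "blocked"


if __name__ == "__main__":
    for port in (22, 443, 2811, 7512, 50000, 51000, 51001):
        print(port, classify_tcp_port(port))
```

A default-deny rule with these few exceptions is what keeps the per-service policy on a DTN short enough to audit.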