The NOMAD (Novel Materials Discovery) Laboratory – a European Centre of Excellence

HPC Platform Requirements and Architecture Report

Deliverable No: D6.1

Lead Beneficiary: CSC - IT Center for Science (CSC)
Contributing Beneficiaries: Max Planck Computing and Data Facility (MPCDF-MPG), Barcelona Supercomputing Centre (BSC), Leibniz Supercomputing Centre (LRZ), Humboldt-Universitaet zu Berlin (HUB)


Post on 02-Oct-2021




NOMAD* – Project No. 676580

* The acronym has been changed from NoMaD to NOMAD - NoMaD is used here in reference to the acronym used in the Grant Agreement.

Copyright 2016 by the NOMAD Consortium. The information in this document is proprietary to the NOMAD Consortium. This document contains preliminary information and is not subject to any license agreement or any other agreement with the NOMAD Consortium. This document contains only intended strategies, developments, and functionalities and is not intended to be binding upon to any particular course of business, product strategy, and/or development of the NOMAD Consortium. The NOMAD Consortium assume no responsibility for errors or omissions in this document. Furthermore, the NOMAD Consortium does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. The NOMAD Consortium shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitation shall not apply in cases of intent or gross negligence. The statutory liability for personal injury and defective products is not affected. In addition, the materials presented and views expressed here are the responsibility of the author(s) only. The EU Commission takes no responsibility for any use made of the information set out.

D6.1 HPC Platform Requirements and Architecture Report

Executive Summary

In the first year, the focus of work package 6 (WP6), responsible for HPC services and infrastructure, was on planning and designing the NOMAD technology platform according to the different requirements of the developers from other WPs, early adopters and future end users. The architecture of the NOMAD technology platform was designed to best serve the needs of these different groups. Accordingly, the technology platform was split into two parts: a development and testing platform and a production platform.

The development and testing platform was implemented and deployed at the earliest convenience to allow for a quick start of the development work of the other WPs, while the production platform – needed only in the second year of the project – was planned and designed in detail. The development platform for the NOMAD developers was put in place shortly after the project start. In the last year, it has been constantly extended according to the needs of the developers. The development platform consists of virtual machines (VMs) for many different purposes (login, databases, parsing, web server, etc.), a shared storage system and a user management system, based on existing systems in the NOMAD computing centers.

The production platform will host the production-ready NOMAD services for the materials science community worldwide. The design of the production platform has been based on that of the development platform, with the addition of further functionalities (single sign-on, security features, replication features, etc.) and resources (for computing, storage and for software maintenance).


HPC Platform Requirements and Architecture Report

1 Introduction
2 Goals
3 Results
3.1 Development Platform
3.2 Future production platform
3.2.1 Virtual Machines
3.2.2 Storage
3.2.3 Authentication & authorization
3.2.4 Usage statistics and accounting
3.3 Deployment of the NOMAD production platform in the NOMAD computing centers
3.3.1 Data repositories and replication
3.3.2 Security Aspects
3.3.3 Software deployment options
3.3.4 Extended Compute services
4 Conclusion


1 Introduction

WP6 is responsible for the design and operation of a technological platform to handle the demands of the NOMAD Encyclopedia (WP2), visualization services (WP3), and NOMAD Analytics Toolkit (WP4), as well as the provision of application-enabling services for modeling on Big-Data and corresponding workflows.

In the first year, the focus was on planning and designing the platform according to user requirements and providing a development platform for all WPs to be able to execute their development and testing tasks. This task has therefore been carried out in very close collaboration with other WPs. According to the Description of Action (DoA), the four computing centers in the NOMAD Laboratory Centre of Excellence (CoE) (BSC, CSC, LRZ and MPCDF-MPG) are providing the necessary hosting and computing capabilities for both the development and the production platforms.

2 Goals

The goal of the work reported in this deliverable was to plan and design the technology platform according to the requirements of both the development and production phases of the NOMAD Laboratory CoE. In line with these two complementary needs, the hardware and software setups were separated into two platforms: first, a development and testing platform that will be used by developers and direct NOMAD members, plus early adopters during the implementation phase; second, a production platform that will host the stable production versions of NOMAD applications (see Figure 1). The production platform will give any potential user worldwide access to the NOMAD Laboratory CoE data and functionality. The development system has been installed at MPCDF-MPG but can be used from all participating sites. The production platforms will be installed and operated at all participating computing centers.

Figure 1 Separation of development and production platforms for the NOMAD Laboratory CoE


One key goal in the NOMAD Laboratory CoE is to utilize already existing resources at the participating HPC centers, with other sites hopefully joining later. Making this possible requires a flexible overall architecture that can make use of heterogeneous hardware environments.

3 Results

As explained, WP6 has established a development platform right from the beginning of the project and will provide a production platform at a later stage.

3.1 Development Platform

The development platform for all WPs has been put into operation. It is configured to support the current and future development tasks of the WPs.

As a basis, the respective requirements of all WPs were collected and assembled. The corresponding hardware was configured in the NOMAD computing centers.

The prototype development and implementation was carried out at MPCDF-MPG, close to the NoMaD Repository and its data, as a template for replication at the other NOMAD computing centers. The development platform has to support the:

a) needs of WP1 for the creation of the normalized data from the NoMaD Repository data,
b) implementation of the NOMAD Encyclopedia by WP2,
c) integration of visualization services by WP3, and
d) Big-Data analytics, including machine-learning approaches, by WP4.

Figure 2 Current master setup of the development platform at MPCDF-MPG

According to this design, the master setup of the complete development platform has been implemented by and deployed at MPCDF-MPG (see Figure 2).


This setup is highly flexible and can easily be adapted to growing resource and software requirements and, optionally, extended to other computing centers. This has already been done by installing parts of the system at CSC as well.

3.2 Future production platform

The latest version of the development platform is expected to become the basis for the future NOMAD production platform, with additional features and resources (see Figure 3).

Additional features required for the production platform are described below.

All NOMAD Laboratory CoE services will have to be web-based in order to support all potential NOMAD users from the materials science communities, including industry. No local software installations will be necessary to use the NOMAD services; only a standard browser will be required.

Due to the number of services and the distributed character of the NOMAD Laboratory CoE infrastructure, a "Single Sign-On" (SSO) system for external users will be necessary (see section 3.2.3). An SSO system will handle user accounts in a central service ("Identity Provider" (IDP)) and will validate user logins for any other NOMAD service ("Service Provider" (SP)).

The four computing centers have taken responsibility for hosting the final production platform. To be prepared for any number of users and the resources they require, the NOMAD production platform has to be implemented in a scalable and redundant way. Extensive use of Docker containers running in virtual machines (VMs) enables flexible scale-up, and running the NOMAD production service in several locations (HPC centers) provides both additional scale and redundancy.

Production and development platforms will also differ in resource requirements.

Figure 3 Setup of the production platform


For the development platform, compute resource requirements are currently not high, as only NOMAD developers are accessing the development platform for implementation purposes.

Additionally, NOMAD members and early adopters will be using the development platform in testing phases. After the completion of the implementation and testing phases for certain WPs (e.g. NOMAD Encyclopedia), the production platform will offer that service to NOMAD Laboratory CoE users worldwide. Since it is currently difficult to estimate the amount of resources needed for operating the overall services in the production phase, the design of the production platform also fulfills the same flexibility criteria as the development platform.

3.2.1 Virtual Machines

More VM instances can easily be assigned to a specific NOMAD service, and the storage capacity can be dynamically increased, at all computing centers. To be well prepared, the services are being deployed in such a way that additional resources (VMs, cores, RAM, storage) can be added "on the fly" to ensure good performance of all services. The options are to implement a cloud-like environment, e.g. on the basis of OpenStack, or to adjust virtual machines individually. This flexibility allows optimal use of existing heterogeneous resources at participating HPC sites without requiring all sites to set up an identical infrastructure.
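The idea of adding VMs "on the fly" can be illustrated with a small sizing calculation. The following Python sketch is not part of the NOMAD platform; the metric names (`requests_per_second`, `capacity_per_vm`), the 25% headroom and the service figures are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ServiceLoad:
    """Observed load of one service (all fields are hypothetical metrics)."""
    name: str
    requests_per_second: float  # current request rate
    capacity_per_vm: float      # requests/s one VM is assumed to sustain
    running_vms: int            # VMs currently assigned to the service


def vms_needed(load: ServiceLoad, headroom: float = 0.25, min_vms: int = 1) -> int:
    """Return the VM count that serves the load with some spare headroom."""
    required = load.requests_per_second * (1.0 + headroom) / load.capacity_per_vm
    return max(min_vms, math.ceil(required))


def scaling_delta(load: ServiceLoad) -> int:
    """Positive: VMs to add; negative: VMs that could be released."""
    return vms_needed(load) - load.running_vms


if __name__ == "__main__":
    busy = ServiceLoad("encyclopedia", requests_per_second=180.0,
                       capacity_per_vm=50.0, running_vms=3)
    print(busy.name, "needs", scaling_delta(busy), "more VM(s)")
```

Because the calculation only depends on per-service capacity figures, each site could plug in its own numbers, which fits the heterogeneous hardware described above.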

3.2.2 Storage

Two kinds of storage need to be distinguished:

1) storage for developers using the development platform: These storage resources are for storing and creating the data on which the NOMAD Laboratory CoE is based. This includes shared storage systems as well as local storage mounted directly on servers or VMs.

2) storage when using the production platform: The data made available through the production platform will be separated from the storage of the development platform and will be accessible read-only via the public NOMAD services. Data produced by external users through the NOMAD tools (for analysis, visualization, etc.) will not be stored back in the production platform, but will have to remain available to further NOMAD services.

3.2.3 Authentication & authorization

In the development and testing phase, authentication and authorization will only be necessary for developers and early adopters. This is being handled by existing local user management systems of the computing centers. Before activation of the production platform (later in the project), a more complex "Single Sign-On" system (SSO, see Figure 4) for the end user will have to be implemented. Options will be evaluated and it will be decided whether a) an already existing SSO would be sufficient and could be used in NOMAD, or b) a NOMAD-specific solution will have to be developed and implemented.

An SSO system consists of two parts: 1) an Identity Provider (IDP), which is responsible for the whole user account management and login procedure, and 2) an arbitrary number of Service Providers (SPs). These SPs are the NOMAD applications, which use the IDP for authentication and authorization of the users. Some already existing SSO solutions are being considered and are under evaluation, including:


• the user management system implemented in the NoMaD Repository database (highly tailored to the needs and requirements of the NoMaD Repository; in principle, an extension to the NOMAD Laboratory CoE would be possible, but this would require the adoption of standardized user management interfaces (via the Lightweight Directory Access Protocol (LDAP)) into the NoMaD Repository),

• standardized and common protocols for remote user management (e.g. OAuth2 [1], Security Assertion Markup Language 2.0 (SAML 2.0) [2] or Shibboleth [3] based web login, for example on the basis of eduGAIN [4]), and

• identity brokers like EUDAT's B2Access service, which allows a user to utilize a set of SSO systems like Google or Facebook.

In general, authentication and authorization have to be done in the NOMAD services themselves. Regardless of which SSO system will be put in place, any NOMAD service will have to implement the SSO functions for user management and login by itself (as a Service Provider).

[1] See https://oauth.net/2/
[2] See https://en.wikipedia.org/wiki/SAML_2.0
[3] See https://shibboleth.net/
[4] See http://www.geant.org/Services/Trust_identity_and_security/eduGAIN

Figure 4 Workflow in an SSO system


3.2.4 Usage statistics and accounting

Monitoring services of the involved computing centers will be extended to include monitoring of the usage of the NOMAD services. This would reveal potential performance bottlenecks, as well as potential misuse of the NOMAD services (e.g. denial of service attacks). Reports on resource usage can be generated on the basis of information and monitoring system output.
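One simple way monitoring output could reveal misuse is to count requests per client in a sliding time window. The Python sketch below assumes access events are available as time-ordered `(timestamp, client_id)` pairs; that event format, the window length and the threshold are illustrative assumptions, not details of the centers' monitoring systems.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple


def flag_heavy_clients(events: Iterable[Tuple[float, str]],
                       window_seconds: float = 60.0,
                       max_requests: int = 100) -> Set[str]:
    """Return clients that exceed max_requests within any sliding window.

    events must be sorted by timestamp (seconds).
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: Set[str] = set()
    for timestamp, client in events:
        window = recent[client]
        window.append(timestamp)
        # Drop requests that have fallen out of the window.
        while window and timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(client)
    return flagged
```

A flagged client is only a candidate for closer inspection; a burst may equally be a legitimate batch analysis.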

If hardware resources fall short, they will be expanded as far as the available financial resources allow.

Accounting and limiting of resources for external usage can be done in two ways:

• Users gain access to a platform on which the NOMAD services are installed and are given a dedicated amount of resources (CPUs, RAM, storage, etc.).
• NOMAD applications track user resource use and potentially limit further usage.
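The second option, in which an application tracks usage and limits it, might look roughly like the following Python sketch. The `Quota` fields, the `UsageTracker` class and all limits are hypothetical; the report does not specify which resources would be metered or how.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Quota:
    cpu_seconds: float
    storage_bytes: int


class UsageTracker:
    """Per-user accounting with a hard limit (illustrative only)."""

    def __init__(self, default_quota: Quota) -> None:
        self._default = default_quota
        self._quotas: Dict[str, Quota] = {}
        self._used: Dict[str, Quota] = {}

    def set_quota(self, user: str, quota: Quota) -> None:
        self._quotas[user] = quota

    def charge(self, user: str, cpu_seconds: float = 0.0,
               storage_bytes: int = 0) -> bool:
        """Record usage if it fits the quota; otherwise record nothing."""
        quota = self._quotas.get(user, self._default)
        used = self._used.setdefault(user, Quota(0.0, 0))
        new_cpu = used.cpu_seconds + cpu_seconds
        new_storage = used.storage_bytes + storage_bytes
        if new_cpu > quota.cpu_seconds or new_storage > quota.storage_bytes:
            return False  # request would exceed the user's allowance
        used.cpu_seconds, used.storage_bytes = new_cpu, new_storage
        return True

    def report(self) -> Dict[str, Tuple[float, int]]:
        """Usage per user as (cpu_seconds, storage_bytes)."""
        return {u: (q.cpu_seconds, q.storage_bytes) for u, q in self._used.items()}
```

The same `report()` data could feed the usage reports mentioned earlier, so accounting and limiting share one source of numbers.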

3.3 Deployment of the NOMAD production platform in the NOMAD computing centers

3.3.1 Data repositories and replication

The NoMaD Repository is the main source for data in the NOMAD Laboratory CoE. It is the only place where users can upload further data. Internally, the data of the NoMaD Repository is normalized and then made available for the NOMAD Laboratory CoE applications.

While databases are holding the metadata, both normalized data and metadata have to be synchronized across the NOMAD computing centers, currently MPCDF-MPG and CSC. This can be done by the NOMAD developers via conventional command-line tools like rsync and via master-slave database replication.
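As an illustration of the file-level part of this synchronization, the Python sketch below assembles an rsync command line. The host name and directory paths are placeholders, not the actual NOMAD locations, and the choice of options is an assumption; database replication is configured in the database system itself and is not covered here.

```python
import subprocess
from typing import List


def build_rsync_command(source_dir: str, target_host: str, target_dir: str,
                        dry_run: bool = True, delete: bool = False) -> List[str]:
    """Assemble an rsync call that mirrors source_dir to a remote site."""
    command = ["rsync", "--archive", "--compress"]
    if delete:
        command.append("--delete")   # also remove files deleted at the source
    if dry_run:
        command.append("--dry-run")  # only report what would be transferred
    # A trailing slash makes rsync copy the directory contents, not the directory.
    command.append(source_dir.rstrip("/") + "/")
    command.append(f"{target_host}:{target_dir}")
    return command


def run_sync(command: List[str]) -> None:
    """Run the command; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths and host, shown only to illustrate the call.
    print(" ".join(build_rsync_command("/data/normalized",
                                       "mirror.example.org",
                                       "/data/normalized/")))
```

Defaulting to a dry run keeps an accidental invocation from modifying the mirror; a real transfer requires passing `dry_run=False` explicitly.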

3.3.2 Security Aspects

The NOMAD Laboratory CoE services are web-based and accessible from everywhere for any potential user. This setup requires close attention to security-related issues.

The computing centers are hosting the NOMAD Laboratory CoE services. On the level of hardware and operating systems, the computing centers will take care of upcoming security issues. This includes regular checks for necessary updates of the operating system and the basic software that is installed. Individual configuration of databases, application servers, etc. cannot be done by the computing centers and should be handled by the individual WPs.

Backup capabilities will be provided for all NOMAD-related data, configurations and application code. For example, MPCDF-MPG already offers backup functionality on the GPFS cluster it operates for the NoMaD Repository and NOMAD Laboratory CoE: a file set ("/nomad/backup") is automatically backed up to tape via Tivoli Storage Manager.

At the application level, the developers of the NOMAD Laboratory CoE services have to take care of security. From the infrastructure point of view, using continuous deployment and largely automated deployment scripts facilitates frequent security updates and rapid fixing of reported vulnerabilities.


3.3.3 Software deployment options

For the provisioning of VMs, configurations and application setups, the NOMAD Laboratory CoE can benefit from tools like Ansible, a free-software platform for configuring and managing servers. [5] Ansible scripts will cover two different use cases:

• basic level cases through the computing centers (setup of operating system and basic tools; will be used to duplicate (virtual) machines across computing centers), and

• application level cases through NOMAD service developers (installation and configuration of user space applications like databases, application servers etc.).
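The split into a basic level and an application level could be reflected in how playbooks are invoked. The Python sketch below builds `ansible-playbook` command lines and assumes, purely for illustration, that tasks are tagged `base` or `application` and that sites are inventory groups; the file names, tag names and group names are hypothetical.

```python
from typing import List, Optional

LAYERS = ("base", "application")  # assumed tag names, one per use case


def playbook_command(playbook: str, inventory: str, layer: str,
                     site: Optional[str] = None, check: bool = True) -> List[str]:
    """Build an ansible-playbook call restricted to one layer and one site."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    command = ["ansible-playbook", "--inventory", inventory, "--tags", layer]
    if site is not None:
        command += ["--limit", site]  # restrict the run to one inventory group
    if check:
        command.append("--check")     # predict changes without applying them
    command.append(playbook)
    return command


if __name__ == "__main__":
    # A computing center preparing operating system and basic tools at one site:
    print(" ".join(playbook_command("site.yml", "hosts.ini", "base", site="csc")))
    # Service developers installing databases and application servers:
    print(" ".join(playbook_command("site.yml", "hosts.ini", "application",
                                    check=False)))
```

Keeping both layers in one playbook and selecting by tag is only one possible layout; separate playbooks per layer would serve the same division of responsibility.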

During the whole development process, MPCDF-MPG will offer its GitLab platform for use by all NOMAD developers. GitLab is first and foremost a versioning system for distributed code development on the basis of Git, a version control system that can be used for software development and other version control tasks. Besides this basic functionality, GitLab offers wikis, issue trackers and a broad and deep integration with further development tools (e.g. Integrated Development Environments). Continuous integration can also be done directly from within GitLab (see Figure 5).

Figure 5 GitLab service at MPCDF-MPG

[5] http://docs.ansible.com/


3.3.4 Extended Compute services

In addition to the handling of computing needs through the NOMAD computing centers, there will be collaborations with PRACE concerning HPC resources in case gaps in the simulation data have to be closed. If heavy re-parsing of NOMAD data is needed, compute resources at NOMAD computing centers will be made available for parsing the data in a geographically distributed way, together with mechanisms for synchronizing the results.

4 Conclusion

WP6, responsible for HPC services and infrastructure, has developed and successfully provided a technology platform for the NOMAD Laboratory CoE. The development platform is an essential infrastructure for internal use by the NOMAD developers and has proven its appropriateness in practice. Its functionality was extended according to the growing needs of the developers. The production platform, which will provide NOMAD services publicly for scientific and industrial communities, has been designed. Its implementation will be undertaken in the second year of the project. Overall, WP6 is on track and in line with the tasks outlined in the DoA.

Authors: T. Zastrow, H. Lederer (MPCDF-MPG), Atte Sillanpää, Sri H. Vathsavayi (CSC)