1. simulation 2. archive and distribute 3. analysis 4. understanding
![Page 1: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/1.jpg)
1. Simulation
2. Archive and distribute
3. Analysis
4. Understanding
![Page 2: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/2.jpg)
Heterogeneous HPC environments
Large community
SSH is king
No global view
Very complex workflow
etc, etc, etc
Problem Space
![Page 3: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/3.jpg)
[Diagram: TGCC machines, storage spaces and transfers. Recoverable labels:]
Machines: curie hybrid nodes (-q hybrid), curie thin nodes (-q standard), curie large nodes (-q xlarge), curie front-end, airain nodes, airain front-end
Roles: compute, login
File system: $HOME, $CCCSTOREDIR, $CCCWORKDIR, $SCRATCHDIR, with quotas
HPSS: robotic tapes
Contents: sources; small results; IGCM_OUT: MONITORING/ATLAS; temporary REBUILD; IGCM_OUT: files to be packed, outputs of post-proc jobs; IGCM_OUT: packed results (Output, Analyse SE and TS); small precious files
Space types: temporary space, saved space, non saved space, space on tapes
Transfers: cp, dods_cp, ccc_hsm get, ESGF
Visible from www
TGCC in a nutshell
![Page 4: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/4.jpg)
[Diagram: simulation and post-processing job chain at TGCC. Recoverable labels:]
Compute (curie): Job_EXP00, PeriodLength, ESGF=TRUE/FALSE
Job outputs: $SCRATCHDIR/IGCM_OUT/.../REBUILD and $SCRATCHDIR/IGCM_OUT/XXX/Restart, Debug
Post (curie): rebuild, RebuildFrequency, $SCRATCHDIR/IGCM_OUT/XXX/Output
Post (curie): pack_output (ncrcat), PackFrequency, $CCCSTOREDIR/IGCM_OUT/XXX/Output
Post (curie): pack_restart and pack_debug (tar), PackFrequency, $CCCSTOREDIR/IGCM_OUT/.../RESTART, DEBUG
Post (curie): create_ts (TimeSerieFrequency), create_se (SeasonalFrequency), monitoring, Atlas/metrics
TS and SE: $CCCSTOREDIR/IGCM_OUT/…, dods/store
MONITORING and ATLAS: $CCCWORKDIR, dods/work
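To make the role of the frequency parameters concrete, here is a minimal Python sketch of how a job chain might decide which post-processing jobs to launch. It is only an illustration: the names RebuildFrequency, PackFrequency, TimeSerieFrequency and SeasonalFrequency come from the slide, but the function `due_post_jobs`, the month-based units and the numeric values are assumptions, not the actual workflow code.

```python
def due_post_jobs(months_done, frequencies):
    """Return the post-processing jobs due after `months_done` simulated months.

    `frequencies` maps a job name to its frequency in months.
    A frequency of None or 0 means the job is disabled.
    Assumption: a job fires whenever the simulated length is a whole
    multiple of its frequency.
    """
    return [
        job
        for job, every in frequencies.items()
        if every and months_done % every == 0
    ]


# Hypothetical settings, expressed in months (not taken from the slides).
FREQUENCIES = {
    "rebuild": 1,        # RebuildFrequency
    "pack_output": 12,   # PackFrequency
    "create_ts": 24,     # TimeSerieFrequency
    "create_se": 120,    # SeasonalFrequency
}

if __name__ == "__main__":
    for months in (1, 12, 24, 120):
        print(months, due_post_jobs(months, FREQUENCIES))
```

With these assumed values, rebuild runs after every month while the packing and analysis jobs run far less often, which is the point of having separate frequencies.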
![Page 5: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/5.jpg)
[Diagram: messaging architecture. Recoverable labels:]
MQ Relay at each site: TGCC, IDRIS, CINES, IPSL, XXX
msg (one per relay)
MQ Cluster, MQ Apps, API, DB's
IPSL User @ Browser | Command Line | Desktop (json)
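As a rough illustration of the relay idea, the Python sketch below builds a JSON message at a computing centre and forwards it to a central queue. Everything here is hypothetical: the field names, the functions `make_message` and `relay`, and the use of an in-memory `queue.Queue` as a stand-in for the MQ cluster are assumptions, since the slide shows only the labels.

```python
import json
import queue


def make_message(centre, simulation, event):
    """Serialise one monitoring event as JSON (field names are hypothetical)."""
    return json.dumps(
        {"centre": centre, "simulation": simulation, "event": event},
        sort_keys=True,
    )


def relay(local_messages, cluster):
    """Forward well-formed JSON messages to the central queue.

    Returns the number of messages forwarded. A malformed message raises
    json.JSONDecodeError before anything is put on the queue for it.
    """
    forwarded = 0
    for raw in local_messages:
        json.loads(raw)   # validate before forwarding
        cluster.put(raw)
        forwarded += 1
    return forwarded


if __name__ == "__main__":
    cluster = queue.Queue()  # stand-in for the MQ cluster
    sent = relay([make_message("TGCC", "EXP00", "job_started")], cluster)
    print(sent, cluster.get_nowait())
```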
![Page 6: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/6.jpg)
Simulation monitoring & control
ESG-F integration: data publishing
ES-DOC integration: documentation publishing
PCMDI simulation metrics publishing
HPC diagnostics aggregation
Controlled vocabulary management
Push notifications: Web Socket, SMS, SMTP, MQ
Solution Space
![Page 7: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/7.jpg)
Metrics Garden User Web Interface
Test Gleckler-like metrics on CMIP5 versions of IPSL models
Metrics Garden
![Page 8: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/8.jpg)
1. Simulation
2. Archive and distribute
3. Analysis
4. Understanding
![Page 9: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/9.jpg)
How do we usually present ourselves
• Prodiguer, the national level
– Coordination between French partners
– IPSL, CNRM-CERFACS, TGCC, IDRIS, CINES
– Accompanying the community
• IS-ENES, the European level
– Coordination between European partners
– Heavy workload on the operational implementation of ESGF (the biggest source of climate model results)
– Strengthening the infrastructure
• ESGF, ES-DOC, the international level
– WGCM Infrastructure Panel
– ESGF Governance (Executive Committee)
– ES-DOC Governance (Principal Investigator)
![Page 10: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/10.jpg)
Many, many processes, many, many communities!
Interconnected communities, all needing access to (some of) the data!
![Page 11: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/11.jpg)
Resolution
Complexity
Duration and ensemble size
Enhanced computing resources produce MORE DATA
Earth Observations
![Page 12: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/12.jpg)
The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration of people and institutions working together to build an open source software infrastructure for the management and analysis of Earth Science data on a global scale.
• Software development and project management: ANL, ANU, BADC, CMCC, DKRZ, ESRL, GFDL, GSFC, JPL, IPSL, ORNL, LLNL (lead), PMEL, …
• Operations: tens of data centers across Asia, Australia, Europe and North America
Worldwide distributed system
![Page 13: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/13.jpg)
Storage growth in six years (from CMIP3 to CMIP5): a factor of 30
Worldwide distributed system
● Operational since 2011
● Hundreds of users per month
● Hundreds of TB per month
● About 10 000 registered users
CMIP3: centralized
CMIP5: distributed system
60 climate models
2 PB of data
![Page 14: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/14.jpg)
ESGF France
- Working framework for ESGF node administrators in France
- IPSL tests and validates ESGF releases, then publishes detailed deployment procedures tailored to each centre
- Knowledge sharing
- Deployment synchronisation
- Production support
- Annual meetings at IPSL
- The community contributes to the Big Data theme of the ANR Convergence project and to the data working group of the European IS-ENES2 project.
http://forge.ipsl.jussieu.fr/prodiguer/wiki/ESGF-FR [email protected]
![Page 15: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/15.jpg)
1. ESGF IWT Missions and Challenges
Missions
• Release management
• Build, test and validate
• Provide installation tools
• Secure deployments
• Administrator training and support
Challenges
• Automated builds and tests
• Easier installation
• Node set up in less than one hour
![Page 16: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/16.jpg)
2. ESGF IWT RM Process
Release Management Process
The development of the ESGF software stack follows a release management process that ensures the quality of deliverables. Three distinct roles are identified:
• Developers push new features into the system
• The IWT Release Manager is responsible for code freeze, cutting releases and compilation
• IWT Administrators are asked to test and validate release candidates
![Page 17: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/17.jpg)
3. ESGF IWT Continuous Build
Continuous Build
The ESGF software stack source code is hosted in GitHub repositories. Development teams continuously update the 'devel' branches with new features. GitHub webhooks trigger compilation of the project on a dedicated machine running Jenkins. Distribution binaries are then made available to the community for testing via a web server. Continuous build gives real-time visibility into source code quality and the consistency of inter-project dependencies throughout the development phases.
[Diagram: continuous build flow. Recoverable labels:]
Developers -> push code -> GitHub devel branches -> trigger builds automatically -> Jenkins continuous build server
If the build completes: publish binaries to the binaries web server (wars, jars)
If the build breaks: warning email (@)
http://esgf-build.ipsl.upmc.fr/jenkins
http://esgf-build.ipsl.upmc.fr/builds
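The decision logic in this flow can be sketched in a few lines of Python. This is a simplified illustration, not the project's configuration: in practice Jenkins and its GitHub integration handle the webhook, and the function names `should_trigger_build` and `after_build` are hypothetical. The only external fact relied on is that a GitHub push event payload carries a `ref` field of the form `refs/heads/<branch>`.

```python
def branch_from_ref(ref):
    """Extract the branch name from a git ref, or None for tags and others."""
    prefix = "refs/heads/"
    return ref[len(prefix):] if ref.startswith(prefix) else None


def should_trigger_build(payload, branch="devel"):
    """Trigger a build only for pushes to the watched branch."""
    return branch_from_ref(payload.get("ref", "")) == branch


def after_build(build_ok):
    """Mirror the slide: publish binaries on success, warn by email on failure."""
    return "publish_binaries" if build_ok else "send_warning_email"


if __name__ == "__main__":
    push = {"ref": "refs/heads/devel"}
    if should_trigger_build(push):
        print(after_build(build_ok=True))
```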
![Page 18: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/18.jpg)
4. ESGF IWT Integration Testing
Integration Testing
The ESGF Test Federation runs on VMware virtual machines. It is completely independent of the production federation and is used to run the ESGF test suite, which performs tests from the user's perspective in order to validate release candidates as well as new installations and upgrades.
ESGF Test Infrastructure: http://vesgint-data.ipsl.jussieu.fr
ESGF Test Suite: https://github.com/ESGF/esgf-test-suite
• Python Nose: test framework
• Python Requests: HTTP support
• Python Subprocess: system execution
• Python Selenium: browser simulation
• Python Multiprocessing: parallelisation
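A user's-perspective check can be as simple as asking whether a service answers over HTTP. The sketch below is not taken from the ESGF test suite: to stay self-contained it swaps Requests for the standard library's `urllib` and Multiprocessing for a thread pool, and the function names `endpoint_is_up` and `check_endpoints` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def endpoint_is_up(url, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Unreachable host, HTTP error status, protocol error or malformed URL.
        return False


def check_endpoints(urls, workers=4):
    """Probe several URLs in parallel and map each URL to its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(endpoint_is_up, urls))
    return dict(zip(urls, results))
```

A test runner would then assert that every service of a freshly installed node reports True, which is one way to validate an installation from the outside.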
![Page 19: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/19.jpg)
5. ESGF IWT Installer and Distribution Mirrors
Installer and Distribution Mirrors
Freshly cut and validated releases are then deployed into production. The installer helps each node administrator across the federation pull the new binaries. Three synchronized distribution mirrors (1 master @IPSL, 1 slave @PCMDI, 1 slave @BADC) improve the availability of binaries and reduce transfer delays, as the installer identifies the fastest mirror.
[Diagram: installer flow. Recoverable labels:]
Node Admins -> execute -> ESGF Installer -> calls -> get_fastest_mirror()
Mirrors: FR, U.K., U.S.
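The slide names `get_fastest_mirror()` but does not show how it works. One plausible approach, sketched here in Python purely as an assumption, is to time a small download from each mirror and keep the quickest. The function names `time_download` and `pick_fastest_mirror` and the mirror URLs are hypothetical stand-ins, not the installer's actual code.

```python
import time
import urllib.request


def time_download(url, timeout=5.0):
    """Seconds needed to fetch `url`, or infinity if it cannot be fetched."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (OSError, ValueError):
        return float("inf")
    return time.monotonic() - start


def pick_fastest_mirror(mirrors, probe=time_download):
    """Return the name of the mirror with the smallest probe time.

    `mirrors` maps a mirror name to a URL. `probe` is injectable so the
    selection logic can be exercised without network access.
    """
    timings = {name: probe(url) for name, url in mirrors.items()}
    fastest = min(timings, key=timings.get)
    if timings[fastest] == float("inf"):
        raise RuntimeError("no mirror reachable")
    return fastest


if __name__ == "__main__":
    # Placeholder URLs; the real mirror addresses are not given in the slides.
    mirrors = {
        "FR": "http://fr.mirror.example/probe",
        "UK": "http://uk.mirror.example/probe",
        "US": "http://us.mirror.example/probe",
    }
    fake_timings = {
        "http://fr.mirror.example/probe": 0.8,
        "http://uk.mirror.example/probe": 0.3,
        "http://us.mirror.example/probe": 1.5,
    }
    print(pick_fastest_mirror(mirrors, probe=fake_timings.get))
```

Passing the probe as a parameter keeps the choice of measurement (latency, throughput, a fixed test file) separate from the selection itself.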
![Page 20: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/20.jpg)
Original timing: O(2) PB of requested output from 20+ modelling centres, finished early 2010!
Actual timing? Years late.
IPSL
CMIP3: 35 TB
CNRM-CERFACS
Our data perspective
![Page 21: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/21.jpg)
Depending on security constraints (e.g., computing centres), two architectures are possible:
ESG-F data node and data on the same network. Example: CICLAD (IPSL - Jussieu)
Network + ESG-F data node + data
ESG-F index node (remote from the data node or not)
![Page 22: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/22.jpg)
Depending on security constraints (e.g., computing centres), two architectures are possible:
ESG-F data node in a DMZ. Example: TGCC-CCRT
DMZ + ESG-F data node
Local network + data
ESG-F index node
No interactive access
One-way network flow
Read-only NFS export
![Page 23: 1. Simulation 2. Archive and distribute 3. Analysis 4. Understanding](https://reader036.vdocument.in/reader036/viewer/2022062409/56649d815503460f94a669a5/html5/thumbnails/23.jpg)
Login
(1) Data Reference Syntax
(2) {datanode}: file system visible to the data node
(3) {project}: project (e.g. CMIP5)
Overview