Distributed Data Management at DKRZ - ECMWF
TRANSCRIPT
27-Oct-2004 11. HPC Workshop at ECMWF, Reading, WFS 1
Distributed Data Management at DKRZ
Wolfgang Sell
Deutsches Klimarechenzentrum GmbH, [email protected]
Table of Contents
• DKRZ - a German HPC Center
• HPC System Architecture suited for Earth System Modeling
• The HLRE Implementation at DKRZ
• Some Results
• Some Lessons Learnt
• Summary
DKRZ - a German HPCC
• Mission of DKRZ
• DKRZ and its Organization
• DKRZ Services
• DKRZ Restructuring
Mission of DKRZ
In 1987 DKRZ was founded with the mission to
• provide state-of-the-art supercomputing and data services to the German scientific community to conduct top-of-the-line Earth System and Climate Modelling, and
• provide associated services, including high-level visualisation.
DKRZ and its Organization (1)
Deutsches KlimaRechenZentrum (DKRZ) = German Climate Computer Center
• organised under private law (GmbH) with 4 shareholders
• investments funded by the federal government, operations funded by the shareholders
DKRZ and its Organization (2)
DKRZ internal structure:
• 3 departments: systems and networks; visualisation and consulting; administration
• 20 staff in total
• until restructuring at the end of 1999, a fourth department supported climate model applications and climate data management
DKRZ Services
• operations center: DKRZ
• technical organization of computational resources (compute, data and network services, infrastructure)
• advanced visualisation
• assistance for parallel architectures (consulting and training)
Model & Data Services
Application center: Model & Data
• professional handling of community models
• specific scenario runs, e.g. for the IPCC
• scientific data handling
The Model & Data group is external to DKRZ, administered by the MPI for Meteorology and funded by the BMBF.
HPC System Architecture suited for Earth System Modeling
• Principal HPC System Configuration
• Configuration Variants
• Links between Different Services
• The Data Problem
• Pros and Cons of Shared Filesystems
Generic HPC System Configuration
[Figure: global system architecture with a compute server (CS, 80%) and a data server (DS, 20%), connected to the rest of the world]
Variants of System Configuration (1)
[Figure: CS (80%) and DS (20%) coupled via a shared filesystem, connected to the rest of the world]
Variants of System Configuration (2)
[Figure: CS (80%) and DS (20%) with classical LAN coupling, connected to the rest of the world]
Link between Compute Power and Non-Computing Services
• Functionality and performance requirements for the data service
• Transparent access to migrated data
• High bandwidth for data transfer
• Shared filesystem
• Possibility for adaptation in upgrade steps due to changes in the usage profile
• Balance between computational and data management capabilities
Evolution of Computing Power at DKRZ
[Figure: growth of installed compute power at DKRZ]
Adaptation Problem for Data Server
[Figure: "Dataproblem in HPC", data creation rate (Datenerzeugungsrate) in TByte/year, 0 to 3,000, versus effective compute power P in GFlops, 0 to 500, for the data-increase laws linear (P^1), P^3/4 and P^2/3]
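The growth laws on the chart lend themselves to a quick numerical sketch. The calibration point below (500 TByte/year at 100 GFlops) is a hypothetical reading of the plot's scale, not a DKRZ figure:

```python
# Sketch of the three data-growth laws from the chart above.
# The calibration point (p0, r0) is hypothetical, chosen only to match
# the axis ranges of the plot (0-500 GFlops, 0-3,000 TByte/year).

def projected_rate(p, alpha, p0=100.0, r0=500.0):
    """Yearly data-creation rate in TByte/year for compute power p (GFlops),
    scaled so that projected_rate(p0) == r0."""
    return r0 * (p / p0) ** alpha

for label, alpha in [("linear, P^1", 1.0), ("P^3/4", 0.75), ("P^2/3", 2 / 3)]:
    print(f"{label:12s} rate at 500 GFlops: {projected_rate(500.0, alpha):7.0f} TByte/yr")
```

At five times the compute power, the sub-linear laws (P^3/4, P^2/3) yield roughly 1,670 and 1,460 TByte/year instead of the linear 2,500, which is why the exponent matters when sizing the data server.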
Pros of Shared Filesystem Coupling
• High bandwidth between the coupled servers
• Scalability supported by the operating system
• No need for multiple copies
• Record-level access to data with high performance
• Minimized data transfers
Cons of Shared Filesystem Coupling
• Proprietary software needed
• Standardisation still missing
• Limited number of systems that can be connected
HLRE Implementation at DKRZ
HöchstLeistungsRechnersystem für die Erdsystemforschung (HLRE) = High Performance Computer System for Earth System Research
• Principal HLRE System Configuration
• Requirements and Constraints
• Links between Different Services
• Options for System Operation
Principal HLRE System Configuration
[Figure: HLRE system configuration diagram]
Hardware at DKRZ (October 2004)
• 24 SX-6 nodes (192 vector CPUs, 1.5 TByte CM, 1.5 TFlops peak)
• IXS crossbar switch (24 x 24, 2*8*24 GByte/s cross-section bandwidth)
• 10 NEC AsAmA nodes (132 Itanium-2, 1.0 and 1.5 GHz, Linux)
• 1 NEC AzusA (8 Itanium-1, 800 MHz, Linux)
• 4 STK silos (total capacity ca. 3.5 PetaByte)
• 4 Sun Fire 4800 (Oracle application service)
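The quoted SX-6 figures are internally consistent. The sketch below assumes the usual SX-6 node layout of 8 vector CPUs per node at 8 GFlops peak per CPU and 64 GByte central memory (CM) per node, which the slide does not state explicitly:

```python
# Consistency check of the quoted HLRE hardware figures.
# Assumed (not on the slide): 8 vector CPUs per SX-6 node, 8 GFlops peak
# per CPU, 64 GByte CM per node.
nodes = 24
cpus_per_node = 8
gflops_per_cpu = 8
cm_gbyte_per_node = 64

total_cpus = nodes * cpus_per_node                   # 192 vector CPUs
peak_tflops = total_cpus * gflops_per_cpu / 1000.0   # 1.536, quoted as 1.5 TFlops
total_cm_tbyte = nodes * cm_gbyte_per_node / 1024.0  # 1.5 TByte CM

print(total_cpus, peak_tflops, total_cm_tbyte)
```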
DKRZ Hardware - Current Configuration
[Figure: current hardware configuration diagram]
Filesystem Systematics - CS View
[Table: CS-visible filesystems, split into permanent ($HOME, $UT, $UTF) and transient ($TMPSHR, $WRKSHR, $TMPDIR); capacities range from 2 TB to 46.5 TB; lifetimes range from job-temporal ($TMPDIR, node-local) through O(days) and O(weeks) up to 1 year and "for ever"; $UT is held on tape, with no backup for the transient filesystems; all other filesystems are visible on all nodes and are quota-controlled, partly by number of files]
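A hypothetical batch-job skeleton illustrating how these filesystem classes are typically combined; all paths and file names are invented for the demo, and temporary directories stand in for the real $UT, $WRKSHR and $TMPDIR mounts:

```python
# Demo of the usual job-side division of labour between the filesystem
# classes above: permanent tape-backed space for inputs and archives,
# node-local scratch for the run itself, transient shared space for
# hand-off to post-processing.  The "model run" is simulated by writing
# a small file.
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())          # demo stand-in for the real mounts
UT = root / "ut"                         # permanent, tape-backed ($UT)
WRKSHR = root / "wrkshr"                 # transient shared, O(days) ($WRKSHR)
TMPDIR = root / "tmpdir"                 # node-local scratch, job-temporal
for d in (UT / "exp01", WRKSHR, TMPDIR):
    d.mkdir(parents=True)

(UT / "exp01" / "input.nml").write_text("namelist")  # permanent input data
shutil.copy(UT / "exp01" / "input.nml", TMPDIR)      # stage to local scratch
(TMPDIR / "results.nc").write_text("output")         # stand-in for the model run
shutil.copy(TMPDIR / "results.nc", WRKSHR)           # hand-off to post-processing
shutil.copy(TMPDIR / "results.nc", UT / "exp01")     # archive copy back to $UT
```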
Some Results
• Point of Operation in CS-DS Space
• Growth of the Data Archive
• Growth of Transfer Rate
• Variability of Transfer Rates
The Data Problem in ESM-related HPC
[Figure: data creation rate in TByte/yr, 0 to 3,000, versus effective compute power P in GFlops, 0 to 500, for the data-increase laws linear (P^1), P^3/4 and P^2/3]
DS Archive Capacity (1)
[Figure: archive capacity in TB, 0 to 2,500, from 1992 to 2004, split into originals and duplicates]
DS Transfer Rates (1)
[Figure: daily transfer volume in GB, 0 to 7,000, from 1992 to 2004, split into fetch/sys, fetch/usr, store/sys and store/usr]
DS Archive Capacity (2001-2004)
[Figure: archive capacity in TB, 0 to 2,500, monthly from Sep 2001 to Sep 2004, split into originals and duplicates]
DS Archive Capacity (2001-2004)
[Figure: archive capacity in TB, 0 to 2,500, Sep 2001 to Sep 2004, split into dumps, Oracle, User/ut and User/pf]
DS Transfer Rates (2001-2004)
[Figure: daily transfer volume in GB, 0 to 8,000, Sep 2001 to Sep 2004, split into fetch/sys, fetch/usr, store/sys and store/usr]
DS Transfer Rates (2001-2004)
[Figure: daily transfer volume in GB, 0 to 15,000, Sep 2001 to Sep 2004, showing minimum, average and maximum]
Tape Transfer Rates (2001-2004)
[Figure: daily transfer volume in GB, 0 to 15,000, Sep 2001 to Sep 2004, showing minimum, average and maximum]
Tape Transfer Rates (2001-2004)
[Figure: daily transfer volume in GB, 0 to 9,000, Sep 2001 to Sep 2004, split into repack and client traffic]
DS Transfer Requests (2001-2004)
[Figure: daily transfer requests, 0 to 90,000, Sep 2001 to Sep 2004, split into fetch/sys, fetch/usr, store/sys and store/usr]
DS Archive Capacity (2001-2004)
[Figure: number of files stored, in millions, 0 to 30, Sep 2001 to Sep 2004, by tape media type: 9840C, 9940B, 9940A, 9840A/B, SD3, VHS, 9490]
Some Lessons Learnt
• Current implementations of non-computing services need a significant amount of local disk space, e.g. HSM and DBMS need their own cache
• Lack of standardisation for shared filesystems means dependence on vendor co-operativeness, e.g. for graphics server integration or pre/post-processing servers from different vendors
• Fail-over solutions are needed in complex distributed systems
Some Lessons Learnt, cont.
• Server scalability is needed, but is no problem; client scalability may be a problem, e.g. the 128-LUN limitation of Linux 2.4
• Distributed servers may generate intriguing dependencies, i.e. clearly structured high-level services do not guarantee ease of performant operation
Invocation Period and Lifetime of Dirty Pages for kupdated
[Figures: measurements of the kupdated invocation period and of the lifetime of dirty pages]
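The kupdated behaviour examined on these slides was tunable on Linux 2.4 through /proc/sys/vm/bdflush. The sketch below only manipulates the parameter line as a string; the field positions (5th = kupdated wakeup interval, 6th = maximum dirty-buffer age, both in jiffies) and the example values follow 2.4-era documentation and should be treated as assumptions, and actually applying the result requires root on a 2.4 kernel:

```python
# Illustrative only: rewrite the kupdated-related fields of the Linux 2.4
# /proc/sys/vm/bdflush parameter line.  Assumed 2.4-era layout: field 5 is
# the kupdated wakeup interval, field 6 the maximum dirty-buffer age.
def tune_bdflush(line, interval_jiffies, age_jiffies):
    fields = line.split()
    fields[4] = str(interval_jiffies)   # kupdated wakeup interval (jiffies)
    fields[5] = str(age_jiffies)        # lifetime of a dirty buffer (jiffies)
    return " ".join(fields)

# Example, starting from a default-looking 2.4 parameter line:
print(tune_bdflush("30 500 0 0 500 3000 60 20 0", 100, 1000))
```

On a 2.4 system the returned line would be written back to /proc/sys/vm/bdflush as root; waking kupdated more often and aging dirty buffers out sooner smooths the bursts that large sequential writes otherwise produce.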
Summary
• DKRZ provides computing resources for climate research in Germany at a competitive international level
• The HLRE system architecture is suited to cope with a compute- and data-intensive usage profile
• Shared filesystems are operational today in heterogeneous system environments
• Standardisation efforts for shared filesystems are needed
Thank you for your attention!