Large Scale Parallel File System and Cluster Management
ICT, CAS
About ICT, CAS
• Institute of Computing Technology, Chinese Academy of Sciences
• The first (since 1958) and largest national IT research institute in China
• The largest graduate school of computer science in China
• Builder of most of the Chinese systems in the HPC TOP500 list
• Focus on computing system architecture: CPU, compiler, network, grid, HPC, and storage
Storage Centre of ICT
• Founded in 2001
• Leader: Dr. Xu Lu (from HP Lab)
• Storage for scientific computing
  – BWFS: parallel cluster file system
  – Service on Demand system: storage-based cluster management system
• Storage for business computing
  – VSDS: virtual storage research project
  – Backup / Virtual Computing ...
The Storage Bottleneck of Clusters
• NFS (Network File System)
  – Most widely used in clusters to provide shared data access
  – Simple and easy to use and manage
• Scalability problem
  – Multiple NFS servers mean multiple name spaces
  – Hard to extend in capacity
  – Performance does not increase with capacity
• Parallel access problem
  – Poor performance in I/O-intensive computing
  – Weak MS Windows support
[Figure: data throughput (KB, axis 0 to 80000) versus number of compute nodes (1, 2, 4, 8, 16, 32); series: 4k, 8k, 32k, 64k, 1M, 2M]
What’s BWFS
• Parallel network file system
  – Supports multiple storage appliances (8-128) in a single name space (up to 512 TB)
  – Separates data access from metadata access, so clients can access different storage appliances in parallel
• Global name space shared by clients on different platforms
  – Fully compatible with NFS (not 100% POSIX)
  – Supports data sharing between Linux and Windows clients
  – Supports IA32, IA64 and x86_64 hardware platforms
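The separation of metadata access from data access can be illustrated with a minimal Python sketch. This is not BWFS code: `MetadataServer`, `StorageAppliance`, `write_file`, and `read_file` are hypothetical names, and the round-robin chunk placement is an assumption made only so that the example is complete. The point is that a client asks the metadata server once for a file's layout and then fetches the chunks from several appliances in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataServer:
    """Hypothetical metadata controller: stores only file layouts, no user data."""

    def __init__(self):
        self.layouts = {}  # path -> ordered list of (appliance index, chunk key)

    def lookup(self, path):
        return self.layouts[path]


class StorageAppliance:
    """Hypothetical storage appliance: stores only user data chunks."""

    def __init__(self):
        self.chunks = {}

    def write(self, key, data):
        self.chunks[key] = data

    def read(self, key):
        return self.chunks[key]


def write_file(mds, appliances, path, data, chunk_size):
    """Split data into chunks, place them round-robin, record the layout."""
    layout = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        idx = n % len(appliances)  # assumed placement policy
        key = (path, n)
        appliances[idx].write(key, data[start:start + chunk_size])
        layout.append((idx, key))
    mds.layouts[path] = layout


def read_file(mds, appliances, path):
    """One metadata lookup, then parallel chunk reads from the appliances."""
    layout = mds.lookup(path)
    with ThreadPoolExecutor(max_workers=len(appliances)) as pool:
        # pool.map returns results in layout order, so chunks join correctly.
        parts = list(pool.map(lambda loc: appliances[loc[0]].read(loc[1]), layout))
    return b"".join(parts)


mds = MetadataServer()
appliances = [StorageAppliance() for _ in range(4)]
payload = bytes(range(256)) * 10
write_file(mds, appliances, "/demo/file", payload, chunk_size=64)
restored = read_file(mds, appliances, "/demo/file")
```

In this sketch the metadata server never touches user data, so it does not sit on the bandwidth-critical path; the single NFS server in the earlier slide handles both roles.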
• Centralized management
  – Web-based management for the storage appliances and the storage subsystem
  – Client management integrated with the Service on Demand system
• Online extension
  – Add storage appliances to increase capacity without stopping applications
  – New data is automatically striped across all storage appliances for high performance
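The slides do not give the striping layout, so the following Python sketch assumes plain round-robin striping with a fixed stripe size. `locate` is a hypothetical helper that maps a byte offset in a file to an appliance and an offset within that appliance's share of the file.

```python
MB = 1024 * 1024


def locate(offset, stripe_size, num_appliances):
    """Map a file offset to (appliance index, offset on that appliance).

    Assumes round-robin striping: stripe 0 goes to appliance 0, stripe 1 to
    appliance 1, and so on, wrapping around after the last appliance.
    """
    stripe_index = offset // stripe_size
    appliance = stripe_index % num_appliances
    local_stripe = stripe_index // num_appliances
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return appliance, local_offset


# A file written while 4 appliances are online.
before = locate(5 * MB + 10, stripe_size=MB, num_appliances=4)

# A new file written after a fifth appliance has been added online.
after = locate(5 * MB + 10, stripe_size=MB, num_appliances=5)
```

Under this assumption a file written after the extension spreads over five appliances instead of four, which is one way new data could benefit from the added hardware without existing files being moved.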
Data Access on NFS
[Diagram: two application servers and one storage appliance; both metadata and user data go through the storage appliance]
Data Access on BWFS
[Diagram: two node servers, a metadata controller, and two storage devices; metadata and user data travel on separate paths]
Bandwidth of BWFS
[Figure: write large files (20 GB per node, 1 MB record size); aggregate bandwidth (MB/s, axis 0 to 350) versus number of client nodes (1, 2, 4, 8, 16); series: 1SN, 2SN, 4SN, NFS]
[Figure: read large files (20 GB per node, 1 MB record size); aggregate bandwidth (MB/s, axis 0 to 350) versus number of client nodes (1, 2, 4, 8, 16); series: 1SN, 2SN, 4SN, NFS]
Paradigm Epos3 (China Petrol, Xinjiang)
[Figure: vertical axis "/computing capacity (line-hours)" (0 to 10) versus number of nodes (32, 64, 96, 128); series: BWFS, NAS9500, RackServer+DawningNFS]
Paradigm Disco (China Petrol, Xinjiang)
[Figure: average run time (seconds, axis 0 to 2500) versus number of nodes (1 to 15, one job per node); series: BWFS, RaidsysNFS, NAS9500, NAS8500, Linear (BWFS)]
Management Interface
Service on Demand System
• Initially developed as a subsystem of BWFS to provide cluster management
• Reduces management work, especially during system deployment
• Increases availability when storage components fail
• Enables fast scheduling in large server farms with multiple clusters
• Boots systems directly from the BWFS storage appliance, with no need for local hard disks
Traditional Cluster Deployment System
[Diagram: one system image and six node hard disks; label: 20 mins]
Shortcoming 1: Inefficiency in Scheduling
[Diagram: two system images (system image, system image 2) and six node hard disks; label: 20 mins]
Shortcoming 2: Inefficiency in Maintenance
[Diagram: two system images (system image, system image 2) and six node hard disks]
Hard disk errors account for 30%-50% of all computer system errors
Shortcoming 3: Inefficiency in Capacity
[Diagram: two system images (system image, system image 2) and six node hard disks]
A 5 GB system on a 74 GB hard disk
Disks keep getting larger, but system images are kept small to reduce deployment time
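The capacity argument can be made concrete with the two figures on the slide (a 5 GB system image on a 74 GB disk). The helper name `disk_utilization` is hypothetical; the arithmetic uses only the slide's numbers.

```python
def disk_utilization(image_gb, disk_gb):
    """Fraction of a local disk actually occupied by the system image."""
    return image_gb / disk_gb


image_gb, disk_gb = 5, 74  # figures from the slide
used = disk_utilization(image_gb, disk_gb)
idle_gb = disk_gb - image_gb

print(f"utilization: {used:.1%}, idle capacity per node: {idle_gb} GB")
# prints "utilization: 6.8%, idle capacity per node: 69 GB"
```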
Service on Demand System
• Diskless OS boot over TCP/IP
  – Virtual SCSI disk to support Windows and Linux
  – Fully compatible with applications
• High-performance snapshots to support fast cloning of system images
  – Copy-on-write when the system image is modified
  – Online backup of system images with snapshots
• Automatic takeover of failed clients
• Integrated monitoring engine to support automatic scheduling or adaptive computing (still under research)
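The copy-on-write snapshot idea behind fast image cloning can be sketched in a few lines of Python. This is an illustration under assumptions, not the Service on Demand implementation: `BaseImage` and `Snapshot` are hypothetical classes, and blocks are modelled as byte strings in a list.

```python
class BaseImage:
    """Hypothetical shared system image, treated as read-only once created."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class Snapshot:
    """Hypothetical per-node clone that stores only the blocks it has changed."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # block number -> private copy of the modified block

    def read(self, block_no):
        # Unmodified blocks are served from the shared base image.
        return self.delta.get(block_no, self.base.blocks[block_no])

    def write(self, block_no, data):
        # Copy-on-write: the base image is never touched.
        self.delta[block_no] = data


base = BaseImage([b"boot", b"kernel", b"etc"])
node_a = Snapshot(base)  # cloning copies no blocks, so it is nearly instant
node_b = Snapshot(base)
node_a.write(2, b"etc-node-a")
```

Because a clone starts as an empty delta, creating one costs almost nothing regardless of image size, in contrast with the 20-minute full-image copy in the traditional deployment slides.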
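Service on Demand System
[Diagram: users access services (Service 1, Service 2, ... Service N, for example an email system and a web system) running on application nodes; the storage appliance is mapped to each node as a local disk over the network]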
Fast Deployment and Schedule
[Diagram: a Paradigm image and a CGG image, each with its snapshots (Paradigm snapshots, CGG snapshot), serving Paradigm services and CGG services]
Easy to Maintain
[Diagram: one system image with several system snapshots; label: Maintenance]
Management UI
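[Diagram: platform topology, with the following labels]
• Deployment and management network: 100 Mb Ethernet
• Service network: Gigabit Ethernet
• Compute nodes: 17 machines, each with a 73 GB hard disk, 2 CPUs, 4 GB memory
• Deployment system appliance: 1 TB
• Two Loongson NCs; Dawning PC
• InfiniBand
• 4 TB disk array
• Large-memory node: 4 CPUs, 8 GB memory (note: DVD burner attached)
• SERVER: 73 GB × 3 hard disks, 4 GB memory, 2 CPUs
• Spare node: 4 CPUs, 4 GB memory
• Virtual storage management server (the physical machine is the Console)
• 3 TB disk array × 3
• IP SAN
• Console (Dawning PC)
• Platform management access LAN
• Internet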
Thanks!