Large Scale Parallel File System and Cluster Management, ICT, CAS

Upload: baina

Post on 23-Jan-2016



Page 1: Large Scale Parallel File System and Cluster Management ICT, CAS

Large Scale Parallel File System and Cluster Management

ICT, CAS


Page 2: Large Scale Parallel File System and Cluster Management ICT, CAS

About ICT, CAS

• Institute of Computing Technology, Chinese Academy of Sciences
• The first (founded in 1958) and largest national IT research institute in China
• The largest graduate school of computer science in China
• Builder of most of the Chinese systems in the HPC TOP500 list
• Focus on computing system architecture: CPU, compiler, network, grid, HPC and storage

Page 3: Large Scale Parallel File System and Cluster Management ICT, CAS

Storage Centre of ICT

• Founded in 2001
• Leader: Dr. Xu Lu (from HP Labs)
• Storage for scientific computing
– BWFS: parallel cluster file system
– Service on Demand system: storage-based cluster management system
• Storage for business computing
– VSDS: virtual storage research project
– Backup / virtual computing ...

Page 4: Large Scale Parallel File System and Cluster Management ICT, CAS

The Storage Bottleneck of Clusters

• NFS (Network File System)
– Most widely used in clusters to provide shared data access
– Simple and easy to use and manage
• Scalability problem
– Multiple NFS servers mean multiple name spaces
– Hard to extend in capacity
– Performance does not increase with capacity
• Parallel access problem
– Poor performance in I/O-intensive computing
– Weak MS Windows support

[Figure: data throughput (KB, axis from 0 to 80000) versus number of compute nodes (1, 2, 4, 8, 16, 32); series: 4k, 8k, 32k, 64k, 1M, 2M]

Page 5: Large Scale Parallel File System and Cluster Management ICT, CAS

What’s BWFS

• Parallel network file system
– Supports multiple storage appliances (8-128) in a single name space (up to 512 TB)
– Separates data and metadata access so that different storage appliances can be accessed in parallel
• Global name space shared by clients on different platforms
– Fully compatible with NFS (not 100% POSIX)
– Supports data sharing between Linux and Windows clients
– Supports IA32, IA64 and x86_64 hardware platforms

Page 6: Large Scale Parallel File System and Cluster Management ICT, CAS

What’s BWFS

• Centralized management
– Web-based management for the storage appliances and the storage subsystem
– Client management integrated with the Service on Demand system
• Online extension
– Add storage appliances to increase capacity without stopping the application
– New data is automatically striped across all the storage appliances for high performance
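The striping mentioned above can be illustrated with a small sketch. It assumes plain round-robin striping with a fixed stripe unit; the slides do not state the layout policy BWFS actually uses, and `STRIPE_SIZE`, `locate` and `split_write` are hypothetical names chosen for this example only.

```python
from typing import List, Tuple

STRIPE_SIZE = 1 << 20  # assumed 1 MiB stripe unit; not taken from the slides


def locate(offset: int, num_appliances: int,
           stripe_size: int = STRIPE_SIZE) -> Tuple[int, int]:
    """Map a file offset to (appliance index, offset within that appliance)."""
    stripe_no = offset // stripe_size
    appliance = stripe_no % num_appliances        # round-robin placement
    local_stripe = stripe_no // num_appliances    # stripes already on that appliance
    return appliance, local_stripe * stripe_size + offset % stripe_size


def split_write(offset: int, length: int, num_appliances: int,
                stripe_size: int = STRIPE_SIZE) -> List[Tuple[int, int, int]]:
    """Split one write into (appliance, local offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        # Never cross a stripe boundary within a single piece.
        chunk = min(end - offset, stripe_size - offset % stripe_size)
        appliance, local = locate(offset, num_appliances, stripe_size)
        pieces.append((appliance, local, chunk))
        offset += chunk
    return pieces
```

Under this assumed policy, a large sequential write touches every appliance in turn, so adding an appliance adds both capacity and a further target for parallel I/O.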

Page 7: Large Scale Parallel File System and Cluster Management ICT, CAS
Page 8: Large Scale Parallel File System and Cluster Management ICT, CAS

Data Access on NFS

[Diagram labels: Application Server (two), Storage Appliance, Meta-Data, User Data]

Page 9: Large Scale Parallel File System and Cluster Management ICT, CAS

Data Access on BWFS

[Diagram labels: Node server (two), Metadata controller, Storage device (two), Meta-Data, User-Data]
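A minimal in-memory sketch of the separated data path, assuming the client first asks the metadata controller where a file's chunks live and then reads the chunks from the storage devices in parallel. All class and function names here (`MetadataController`, `StorageDevice`, `write_file`, `read_file`) are hypothetical; the slides do not describe the real BWFS protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class StorageDevice:
    """Holds user data only, keyed by (file name, chunk number)."""

    def __init__(self) -> None:
        self.chunks: Dict[Tuple[str, int], bytes] = {}

    def read(self, name: str, chunk_no: int) -> bytes:
        return self.chunks[(name, chunk_no)]


class MetadataController:
    """Holds metadata only: which device stores each chunk of a file."""

    def __init__(self) -> None:
        self.layouts: Dict[str, List[int]] = {}

    def lookup(self, name: str) -> List[int]:
        return self.layouts[name]


def write_file(name: str, data: bytes, chunk_size: int,
               mdc: MetadataController, devices: List[StorageDevice]) -> None:
    """Place chunks round-robin on the devices and record the layout."""
    layout = []
    for chunk_no, start in enumerate(range(0, len(data), chunk_size)):
        dev = chunk_no % len(devices)
        devices[dev].chunks[(name, chunk_no)] = data[start:start + chunk_size]
        layout.append(dev)
    mdc.layouts[name] = layout


def read_file(name: str, mdc: MetadataController,
              devices: List[StorageDevice]) -> bytes:
    """One metadata lookup, then parallel chunk reads from the devices."""
    layout = mdc.lookup(name)
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        parts = pool.map(lambda item: devices[item[1]].read(name, item[0]),
                         enumerate(layout))
        return b"".join(parts)  # pool.map preserves chunk order
```

The point of the sketch is only that user data never passes through the metadata controller, which is the contrast with the single-appliance NFS path.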

Page 10: Large Scale Parallel File System and Cluster Management ICT, CAS

Bandwidth of BWFS

[Figure, two panels: "write large files (20G per node, 1MB record size)" and "read large files (20G per node, 1MB record size)". Each panel plots aggregate bandwidth (MB/s, axis from 0 to 350) against number of client nodes (1, 2, 4, 8, 16); series: 1SN, 2SN, 4SN, NFS]

Page 11: Large Scale Parallel File System and Cluster Management ICT, CAS

Paradigm Epos3 (China Petrol, Xinjiang)

[Figure: computing capacity (lines per hour, axis from 0 to 10) versus number of nodes (32, 64, 96, 128); series: BWFS, NAS9500, RackServer+DawningNFS]

Page 12: Large Scale Parallel File System and Cluster Management ICT, CAS

Paradigm Disco (China Petrol, Xinjiang)

[Figure: average run time (seconds, axis from 0 to 2500) versus number of nodes (1 to 15, one job per node); series: BWFS, RaidsysNFS, NAS9500, NAS8500, linear (BWFS)]

Page 13: Large Scale Parallel File System and Cluster Management ICT, CAS

Management Interface

Page 14: Large Scale Parallel File System and Cluster Management ICT, CAS

Service on Demand System

• Initially developed as a subsystem of BWFS to provide cluster management
• Reduces management work, especially during system deployment
• Increases availability when storage components fail
• Enables fast scheduling in large server farms with multiple clusters
• Boots the system directly from the BWFS storage appliance, with no need for local hard disks

Page 15: Large Scale Parallel File System and Cluster Management ICT, CAS

Traditional Cluster Deployment System

[Diagram labels: System image, Hard disk (six), 20 mins]

Page 16: Large Scale Parallel File System and Cluster Management ICT, CAS

Shortcoming 1: Inefficient Scheduling

[Diagram labels: System image, System image 2, Hard disk (six), 20 mins]

Page 17: Large Scale Parallel File System and Cluster Management ICT, CAS

Shortcoming 2: Inefficient Maintenance

[Diagram labels: System image, System image 2, Hard disk (six)]

Hard disk errors account for 30%-50% of all computer system errors

Page 18: Large Scale Parallel File System and Cluster Management ICT, CAS

Shortcoming 3: Inefficient Use of Capacity

[Diagram labels: System image, System image 2, Hard disk (six)]

A 5 GB system on a 74 GB hard disk

Disks keep getting larger, but system images are kept small to reduce deployment time
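As a rough worked example of the capacity figure above, the following computes how much of a local disk a system image actually occupies. The function name `image_utilization` is hypothetical; the 5 GB and 74 GB values are the ones quoted on the slide.

```python
def image_utilization(image_gb: float, disk_gb: float) -> float:
    """Fraction of a local disk occupied by the system image."""
    if disk_gb <= 0:
        raise ValueError("disk_gb must be positive")
    return image_gb / disk_gb


used = image_utilization(5, 74)
print(f"used: {used:.1%}, idle: {1 - used:.1%}")  # used: 6.8%, idle: 93.2%
```

In other words, with these numbers more than nine tenths of each local disk holds nothing.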

Page 19: Large Scale Parallel File System and Cluster Management ICT, CAS

Service on Demand System

• Diskless OS boot over TCP/IP
– Virtual SCSI disk to support Windows and Linux
– Fully compatible with applications
• High-performance snapshots to support fast cloning of system images
– Copy on write when the system image is modified
– Online backup of system images with snapshots
• Automatic takeover of failed clients
• Integrated monitoring engine to support automatic scheduling or adaptive computing (still under research)
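The copy-on-write cloning described above can be sketched as follows. This is a toy in-memory model under stated assumptions, not the Service on Demand implementation: `BaseImage` and `Snapshot` are hypothetical names, and an image is modelled as a dictionary of fixed-size blocks.

```python
from typing import Dict


class BaseImage:
    """A read-only system image: block number -> block contents."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self._blocks = dict(blocks)

    def read(self, block_no: int) -> bytes:
        return self._blocks[block_no]


class Snapshot:
    """A writable clone that stores only the blocks it has modified."""

    def __init__(self, base: BaseImage) -> None:
        self._base = base
        self._delta: Dict[int, bytearray] = {}

    def read(self, block_no: int) -> bytes:
        # Modified blocks come from the private delta; the rest are shared.
        if block_no in self._delta:
            return bytes(self._delta[block_no])
        return self._base.read(block_no)

    def write(self, block_no: int, offset: int, data: bytes) -> None:
        block = self._delta.get(block_no)
        if block is None:
            # First modification: copy the shared block before changing it.
            block = bytearray(self._base.read(block_no))
        if offset < 0 or offset + len(data) > len(block):
            raise ValueError("write falls outside the block")
        block[offset:offset + len(data)] = data
        self._delta[block_no] = block

    def private_blocks(self) -> int:
        """Number of blocks this snapshot no longer shares with the base."""
        return len(self._delta)
```

Creating a snapshot copies nothing, which is why cloning one image for many nodes can be fast; storage is consumed only as each node modifies its own blocks.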

Page 20: Large Scale Parallel File System and Cluster Management ICT, CAS

Service on Demand System

[Diagram labels: User, Application Node, Service 1, Service 2, Service N, Storage Appliance, Map to Local Disk, Network]

Page 21: Large Scale Parallel File System and Cluster Management ICT, CAS

Fast Deployment and Schedule

[Diagram labels: Email system, Web system, Paradigm Image, CGG Image, Paradigm Snapshot (five), CGG Snapshot, Paradigm Services, CGG Services]

Page 22: Large Scale Parallel File System and Cluster Management ICT, CAS

Easy to maintain

[Diagram labels: System Image, System Snapshot (five), Maintenance]

Page 23: Large Scale Parallel File System and Cluster Management ICT, CAS

Management UI

Page 24: Large Scale Parallel File System and Cluster Management ICT, CAS

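[Network diagram labels:]
• Deployment and management network: 100 Mb Ethernet
• Service network: Gigabit Ethernet
• Gigabit
• InfiniBand
• Compute nodes: 17 units, 73 GB hard disk, 2 CPUs, 4 GB memory
• Deployment system appliance: 1 TB
• Loongson NC (two units), Dawning PC
• 4 TB disk array
• Large-memory node: 4 CPUs, 8 GB memory (note: DVD burner attached)
• Server: 3 × 73 GB hard disks, 4 GB memory, 2 CPUs
• Spare node: 4 CPUs, 4 GB memory
• Virtual storage management server (the physical machine is the Console)
• 3 TB disk array × 3
• IP SAN
• Console (Dawning PC)
• Platform management access LAN
• Internet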

Page 25: Large Scale Parallel File System and Cluster Management ICT, CAS

Thanks!