Ceph Distributed File System: Simulating a Site Failure
Mohd Bazli Ab Karim, Ming-Tat Wong, Jing-Yuan Luke
Advanced Computing Lab, MIMOS Berhad, Malaysia
emails: {bazli.abkarim, mt.wong, jyluke}@mimos.my
PRAGMA 26, Tainan, Taiwan, 9-11 April 2014
Outline
• Motivation
• Problems
• Solution
• Demo
• Moving forward
Motivations
• The explosion of both structured and unstructured data, in cloud computing as well as in traditional datacenters, presents a challenge for existing storage solutions in terms of cost, redundancy, availability, scalability, performance, policy, etc.
• Our motivation is thus to leverage commodity hardware, storage, and networking to create a highly available storage infrastructure to support future cloud computing deployments in a wide area network, multi-site/multi-datacenter environment.
Problems
[Diagram: a data center with SAN/NAS storage serves reads/writes and replicates/de-duplicates its data to one or more disaster recovery sites, each with its own SAN/NAS. The key concerns with this architecture are performance, redundancy, and availability/reliability.]
Solution
[Diagram: data centers 1 through n, each with local/DAS storage, are aggregated into one or more virtual volumes. Clients read/write the virtual volume(s) while data is striped, replicated, and read/written in parallel across the data centers.]
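In Ceph terms, such a virtual volume corresponds to a pool: objects written to it are striped across OSDs in parallel and replicated according to the pool's size. As a minimal sketch (the pool name and placement-group count below are illustrative assumptions, not our production values):

    ceph osd pool create vvol 128 128   # a pool named vvol with 128 placement groups
    ceph osd pool set vvol size 3       # keep three replicas of every object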
Challenging the CRUSH algorithm
• CRUSH (Controlled, Scalable, Decentralized Placement of Replicated Data) is an algorithm that determines how to store and retrieve data by computing data storage locations.
• Why? We use the algorithm to organize and distribute the data to different datacenters.
CRUSH Map
[Diagram: a CRUSH map hierarchy with a root bucket holding three datacenter buckets; each datacenter bucket holds four host buckets, and each host bucket holds one OSD (osd.0 through osd.11). Objects A, B, C, and D are each placed with their replicas in different datacenters, so replication ensures data safety. The default map suits a sandbox environment; at large scale, a custom CRUSH map is needed.]
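A custom map of this shape could look roughly like the fragment below, written in the decompiled crushtool text format. All bucket names, IDs, and weights here are illustrative assumptions, not values from our cluster; the rule places each replica under a different datacenter bucket.

    # one host bucket per storage server (host-02 .. host-12 are defined the same way)
    host host-01 {
        id -2
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
    }

    # group four hosts into a datacenter bucket (dc2 and dc3 are defined the same way)
    datacenter dc1 {
        id -20
        alg straw
        hash 0  # rjenkins1
        item host-01 weight 1.000
        item host-02 weight 1.000
        item host-03 weight 1.000
        item host-04 weight 1.000
    }

    # the root bucket ties the three datacenters together
    root default {
        id -1
        alg straw
        hash 0  # rjenkins1
        item dc1 weight 4.000
        item dc2 weight 4.000
        item dc3 weight 4.000
    }

    # placement rule: each replica lands in a different datacenter
    rule replicated_across_dcs {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        step chooseleaf firstn 0 type datacenter
        step emit
    }

The edited map is compiled and injected with crushtool -c map.txt -o map.bin followed by ceph osd setcrushmap -i map.bin, and a pool is pointed at the rule with ceph osd pool set <pool> crush_ruleset 1.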
DEMO
Demo Background
• It started as a proof of concept for Ceph as a DFS over a wide area network.
• Two sites were identified to host the storage servers: MIMOS HQ (Technology Park Malaysia, Kuala Lumpur) and MIMOS Kulim (Kulim Hi-Tech Park), about 350 km apart.
• The work is a collaboration between MIMOS and SGI.
• At PRAGMA 26, we will use this Ceph POC setup to demonstrate a site failure of a geo-replicated distributed file system over a wide area network.
[Diagram: DC1 and DC2 at MIMOS Berhad, Technology Park Malaysia, Kuala Lumpur; DC3 at MIMOS Berhad, Kulim Hi-Tech Park; the sites are 350 km apart and connected over the WAN.]
This Demo…
Simulate a node/site failure while doing read/write operations.

Test plan (a rough shell sketch of these steps follows the diagram):
(a) From DC1, continuously ping the servers in Kulim.
(b) Upload a 500 MB file to the file system.
(c) While uploading, take down the nodes in Kulim. Using (a), check that the nodes are down.
(d) Once the upload completes, download the same file.
(e) While downloading, bring the nodes in Kulim back up.
(f) Checksum both files; they should be identical.
[Diagram: DC1 and DC2 at MIMOS Berhad, Technology Park Malaysia, Kuala Lumpur; DC3 at MIMOS Berhad, Kulim Hi-Tech Park, 350 km away over the WAN.]
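Steps (a), (b), (d), and (f) can be scripted roughly as below; the hostname, mount point, and file names are illustrative assumptions, and the actual demo drives the upload and download through ownCloud.

    # (a) from DC1, continuously ping a Kulim node (hypothetical hostname; Ctrl-C to stop)
    ping mon-03

    # (b) create a 500 MB test file and upload it to the Ceph-backed mount
    dd if=/dev/urandom of=testfile bs=1M count=500
    cp testfile /mnt/ceph/testfile

    # (d) after the upload completes, download the same file
    cp /mnt/ceph/testfile testfile.out

    # (f) both checksums should match
    md5sum testfile testfile.out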
Demo in progress…
[Diagram: each datacenter runs one monitor and four OSD nodes: mon-01 with osd01-1..osd01-4 in Datacenter 1 @ MIMOS HQ, mon-02 with osd02-1..osd02-4 in Datacenter 2 @ MIMOS HQ, and mon-03 with osd03-1..osd03-4 in Datacenter 3 @ MIMOS Kulim. The sites connect through edge and core switches over the WAN. Clients at 10.4.133.20 and 10.11.21.16 access the cluster; the pinging client and ownCloud sit at HQ, and the site failure is simulated by disconnecting the ports on the Kulim edge switch.]
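While the Kulim ports are disconnected, the cluster state can be watched from any client with the standard Ceph CLI; a minimal sketch (output depends on the cluster):

    ceph health      # overall status: HEALTH_OK, HEALTH_WARN, or HEALTH_ERR
    ceph osd tree    # shows which OSDs are up/down under each datacenter bucket
    ceph -w          # streams cluster events (OSDs going down, PGs degrading/recovering)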
Moving forward…
• There were challenges during the POC, which ran on top of our production network infrastructure.
• Next, can we set up the distributed storage system with virtual machines plus SDN?
– Simulate DFS performance over WAN in a virtualized environment.
– Fine-tune and run experiments: client file layout, TCP parameters for the network, routing, bandwidth/throughput, multiple VLANs, etc.