How to build a scalable storage system with OSS (TLUG Meeting 2008/09/13)
Gosuke Miyashita
My company
paperboy&co.
Web hosting, blog hosting, EC hosting, and so on for individuals
About 1,000 Linux servers
Many single servers ...
My goal for a scalable storage system
A storage system for a web hosting service:
High resource availability
Flexible I/O distribution
Easy to extend
Mountable by multiple hosts
No SPoF
Built with OSS
Without expensive hardware
I’m now trying out technologies for these purposes:
cman CLVM GFS2 GNBD DRBD DM-MP
Technologies
cman
Cluster Manager
A component of Red Hat Cluster Suite
Membership management
Messaging among cluster nodes
Needed for CLVM and GFS2
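cman reads its node list from /etc/cluster/cluster.conf. A minimal sketch for a three-node cluster (the cluster name, node names, and the empty fencing section are placeholder assumptions; a real deployment needs working fence devices):

```xml
<?xml version="1.0"?>
<!-- /etc/cluster/cluster.conf: minimal three-node sketch (names are placeholders) -->
<cluster name="storage" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
    <clusternode name="node3" nodeid="3"/>
  </clusternodes>
  <fencedevices/>
</cluster>
```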
CLVM
Cluster Logical Volume Manager
Cluster-wide version of LVM2
Automatically shares LVM2 metadata among all cluster nodes,
so logical volumes created with CLVM are available to all cluster nodes
CLVM
[Diagram: three cluster nodes, each running clvmd with a copy of the LVM2 metadata, attached to a logical volume on shared storage]
clvmd distributes the metadata among cluster nodes, so logical volumes are presented to each cluster node.
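Bringing CLVM up looks roughly like this (a procedure sketch, not runnable outside a real cluster; Red Hat-style init scripts are assumed, and device/volume names follow the later slides):

```shell
# Switch LVM2 to cluster-wide locking (sets locking_type = 3 in lvm.conf)
lvmconf --enable-cluster

# On every node: start the cluster manager, then the CLVM daemon
service cman start
service clvmd start

# On any one node: create a clustered volume group and a logical volume;
# clvmd propagates the metadata, so /dev/VG0/LV0 appears on all nodes
pvcreate /dev/mapper/mpath0
vgcreate -c y VG0 /dev/mapper/mpath0
lvcreate -L 100G -n LV0 VG0
```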
GNBD
Global Network Block Device
Provides block-device access over TCP/IP
Similar to iSCSI; its advantage over iSCSI is built-in fencing
GNBD
[Diagram: three GNBD clients connected over a TCP/IP network to a GNBD server with an exported block device]
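Exporting and importing might look like this (a sketch requiring root on real hosts; host and export names are made up, and client-side device naming can vary by version):

```shell
# On the server: start gnbd_serv and export a local block device.
# Uncached mode (the default) is what allows fencing to work
# correctly when GFS2 sits on top.
gnbd_serv
gnbd_export -d /dev/sdb1 -e export0

# On each client: load the module and import every export from the server;
# imported devices show up as /dev/gnbd* (e.g. /dev/gnbd0 in these slides)
modprobe gnbd
gnbd_import -i gnbd-server1
```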
GFS2
Global File System 2
One of the cluster-aware file systems
Multiple nodes can access the file system simultaneously
Uses the DLM (Distributed Lock Manager) of cman to maintain file system integrity
OCFS is another cluster-aware file system
GFS2
[Diagram: three GNBD client nodes, each running cman, attached to a GFS2 file system on a GNBD server]
These nodes can access the GFS2 file system simultaneously.
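Creating and mounting a GFS2 volume could look like the following (the cluster name "storage", file system name, and journal count are assumptions; -t must match the cluster name in cluster.conf, and -j needs one journal per node that will mount the file system):

```shell
# Make a GFS2 file system on the clustered logical volume
mkfs.gfs2 -p lock_dlm -t storage:gfs0 -j 3 /dev/VG0/LV0

# On each node (cman must already be running, since GFS2 locks via DLM)
mount -t gfs2 /dev/VG0/LV0 /mnt
```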
DRBD
Distributed Replicated Block Device
RAID 1 over a network
Mirrors a whole block device over TCP/IP
Active/active operation is available with cluster file systems
DRBD
[Diagram: two servers, each with a block device, replicating to each other over the network]
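A minimal DRBD resource for such a pair might look like this (DRBD 8 syntax; host names, disks, and addresses are placeholders; allow-two-primaries is what enables active/active under a cluster file system):

```
# /etc/drbd.conf sketch (names and addresses are placeholders)
resource r0 {
  protocol C;                # synchronous replication
  net {
    allow-two-primaries;     # required for active/active with GFS2
  }
  on storage1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on storage2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
```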
DM-MP
Device-Mapper Multipath
Bundles multiple I/O paths into one virtual I/O path
You can choose active/passive or active/active
DM-MP with SAN storage
[Diagram: a node with HBA1 and HBA2, connected through SAN switch 1 and SAN switch 2 to storage controllers CNTRLR1 and CNTRLR2; the two paths, /dev/sda1 and /dev/sdb1, are seen as one device, /dev/mapper/mpath0, in either active/passive or active/active mode]
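The active/passive vs. active/active choice is made in /etc/multipath.conf through the path grouping policy. A configuration sketch (the wwid is a placeholder; the real one comes from `multipath -ll`):

```
defaults {
    user_friendly_names yes            # devices named mpath0, mpath1, ...
}
multipaths {
    multipath {
        wwid                 <device-wwid>   # placeholder
        path_grouping_policy multibus        # active/active; "failover" = active/passive
    }
}
```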
A scalable storage system
[Diagram: client nodes running cman and GNBD mount /dev/VG0/LV0, a CLVM logical volume, at /mnt. Behind them sit two pairs of GNBD servers running GFS2, each pair replicating via DRBD. DM-MP bundles the first pair's exports, /dev/gnbd0 and /dev/gnbd1, into /dev/mapper/mpath0, and the second pair's exports, /dev/gnbd2 and /dev/gnbd3, into /dev/mapper/mpath1]
mount /dev/VG0/LV0 /mnt
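On a client node, the whole stack would come up in roughly this order (a procedure sketch, not runnable outside the cluster; Red Hat-style init scripts and the server host names are assumptions):

```shell
service cman start               # join the cluster
modprobe gnbd
gnbd_import -i storage1          # import /dev/gnbd0 and /dev/gnbd1
gnbd_import -i storage2          # import /dev/gnbd2 and /dev/gnbd3
service multipathd start         # bundle the pairs into mpath0 / mpath1
service clvmd start              # pick up the shared LVM2 metadata
vgchange -ay VG0                 # activate the clustered volume group
mount -t gfs2 /dev/VG0/LV0 /mnt
```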
How to extend
[Diagram: the same architecture with a third GNBD server pair added; its exports, /dev/gnbd4 and /dev/gnbd5, are bundled by DM-MP into /dev/mapper/mpath2, which is then added to /dev/VG0/LV0 (CLVM), still mounted at /mnt]
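Once the new pair's multipath device exists, extending the volume is the usual LVM2 sequence plus an online GFS2 grow (a sketch; the size is illustrative):

```shell
pvcreate /dev/mapper/mpath2           # initialize the new multipath device
vgextend VG0 /dev/mapper/mpath2       # add it to the volume group
lvextend -L +100G /dev/VG0/LV0        # grow the logical volume
gfs2_grow /mnt                        # grow GFS2 online, while mounted
```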
I wonder ...
Do this many components cause trouble?
How about overhead and performance?
How about stability?
Is there a better way?
How about distributions other than Red Hat Linux?