TRANSCRIPT
© 2007 Open Grid Forum
Grid provisioning from cloned golden boot images
Alan G. Yoder, Ph.D.
Network Appliance Inc.
Outline
• Types of grids
• Storage provisioning in various grid types
• Case study
  • performance
  • stability
Grid types
• Cycle scavenging
• Clusters
• Data center grids
Cycle scavenging grids
• Widely distributed as a rule
  • campus- or department-wide
  • global grids
• Typically for
  • collaborative science
  • resource scavenging
• Main focus to date of GGF, OGF, Globus, et al.
• Category includes "grid of grids"
Clusters
• Grid-like systems
  • good scale-out
  • cluster-wide namespace
• Especially attractive in HPC settings
• Many concepts in common with cycle-scavenging systems
  • but proprietary infrastructure
  • no management standards yet
Data Center Grids
• Focus of this talk
• Typically fairly homogeneous
  • standard compute node hardware
  • two or three OS possibilities
• Two variants
  • Nodes have disks
    • topologically homomorphic to cycle-scavenging grids
    • may use cycle-scavenging grid technology
  • Nodes are diskless
    • storage becomes much more important → storage grids
Storage technology adoption curves
[Figure: market adoption cycles for the enterprise storage market and for grid frameworks, each with a "today" marker, spanning direct-attached storage, networked storage, storage grids and global storage network; storage grids are the focus of this talk]
Diskless compute farms
• Connected to storage grids
• Boot over iSCSI or FCP
• OS is provisioned in a boot LUN on a storage array
• Applications can be provisioned as well
Key benefit – nodes can be repurposed at any time from a different boot image
Key benefit – smart storage and provisioning technology can use LUN cloning to deliver storage efficiencies through block sharing
Key benefit – no rotating rust in compute nodes
  • reduced power and cooling requirements
  • no OS/applications to provision on new nodes
Local fabric technologies
[Diagram: servers connected over an iSCSI or FC SAN to storage holding a golden image and clones; cloning products e.g. ShadowImage, FlexClone]
• Servers boot over iSCSI or FCP SAN
• Storage server(s) maintain golden image + clones
Global deployment technologies
[Diagram: a central data center replicating over a WAN to multiple local iSANs; replication products e.g. SnapMirror, TrueCopy]
Long-haul replication from central data center to local centers
Diskless booting
LU – Logical Unit
LUN – Logical Unit Number
Mapping – LUNs :: initiator ports
Masking – Initiators :: LUNs ("views")
• Node shuts down
• Storage maps desired image to LUN 0 for the zone (FCP) or initiator group (iSCSI) the node is in
• Node restarts
• Node boots from LUN 0
  • mounts scratch storage space if also provided
  • starts up grid-enabled application
• Node proceeds to compute until done or repurposed
Example
[Diagram: a storage grid serving boot LUNs to a compute grid of gridsrv1, gridsrv2 and gridsrv3: /vol/vol1/geotherm2 is LUN 0 mapped to gridsrv1; /vol/vol1/mysql_on_linux is LUN 0 mapped to gridsrv2 and to gridsrv3]
What makes this magic happen?
[Diagram: the same storage grid / compute grid mappings as the previous slide, with an SGME orchestrating them]
SGME
• Storage Grid Management Entity
• Component of overall GME in OGF Reference Model
  • GME is the collection of software that assembles the components of a grid into a grid
  • Provisioning, monitoring, etc.
• Many GME products: Condor et al.
• Current storage grid incarnations are often home-rolled scripts
• Also Stork, Lustre, Qlusters
Provisioning a diskless node
• Add HBAs to white box if necessary
• Fiddle with CMOS to boot from SAN
• For iSCSI:
  • DHCP supplies address, node name
  • SGME provisions igroup for node address
  • SGME creates LU for node
  • SGME maps LU to igroup
• For FC:
  • zone, mask, map, etc.
SGME – Grid Storage Management software
HBA – Host Bus Adapter
CMOS – BIOS settings
DHCP – IP boot management
Provisioning a diskless node
• Add HBAs to white box if necessary
  • We used QLogic 4052 adapters
• Fiddle with CMOS to boot from SAN
  • Get your white box vendor to do this
  • Blade server racks generally easily configurable for this as well
Preparing a gold image
• On a client – this is manual, one-time work
• Install Windows server (e.g.)
• Set up HBA
  • e.g. QLogic needs iscli.exe and commands in startup.bat:
    C:\iscli.exe -n 0 KeepAliveTO 180 IP_ARP_Redirect on
• Software initiators must be prevented from paging out
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management: DisablePagingExecutive => 1
• Run Microsoft sysprep setup mgr and seal image
Preparing a gold lun
• On storage server – manual, one-time work
• Copy the golden image to a new base LUN (over CIFS)
    des-3050-2> lun show
    /vol/vol1/gold/win2k3_hs 10g ...
• Create a snapshot of the volume with the gold LUN
    des-3050-2> snap create vol1 windows_lun
• Create an igroup for each initiator
    des-3050-2> igroup create -i -t windows kc65b1 \
      iqn.2000-04.com.qlogic:qmc4052.zj1ksw5c9072.1
Note: commands shown with the des-3050-2> prompt are NetApp-specific, for purposes of illustration only
Preparing cloned LUNs
• SGME: for each client (a scripted sketch follows after this list)
  • Create a qtree
    des-3050-2> qtree create /vol/vol1/iscsi/
  • Create a LUN clone from the gold LUN
    des-3050-2> lun create -b \
      /vol/vol1/.snapshot/windows_lun/gold/win2k3_hs \
      /vol/vol1/iscsi/kc65b1
  • Map the LUN to the igroup
    des-3050-2> lun map /vol/vol1/iscsi/kc65b1 kc65b1 0
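As noted earlier, storage-grid SGMEs are often home-rolled scripts. Here is a minimal sketch of the per-client loop above, assuming passwordless ssh to the filer, the gold LUN and windows_lun snapshot from the previous slide, and a hypothetical clients.txt of "hostname initiator-iqn" lines; this is illustrative only, not the SGME used in the case study.

    #!/bin/sh
    # Illustrative SGME-style provisioning sketch (not the case-study script).
    FILER=des-3050-2
    VOL=vol1
    GOLD=/vol/$VOL/.snapshot/windows_lun/gold/win2k3_hs

    ssh "$FILER" qtree create /vol/$VOL/iscsi          # shared qtree for the cloned boot LUNs

    while read host iqn; do
        ssh "$FILER" igroup create -i -t windows "$host" "$iqn"    # one igroup per client initiator
        ssh "$FILER" lun create -b "$GOLD" /vol/$VOL/iscsi/"$host" # block-sharing clone of the gold LUN
        ssh "$FILER" lun map /vol/$VOL/iscsi/"$host" "$host" 0     # expose the clone as the client's LUN 0
    done < clients.txt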
Getting clients to switch horses
• SGME: for each client (see the sketch after this list)
  • Notify client to clean up
  • Bring down client
    • remote power strips/blade controllers
  • Remap client LUN on storage
    des-3050-2> lun offline /vol/vol1/iscsi/kc65b1
    des-3050-2> lun unmap /vol/vol1/iscsi/kc65b1 kc65b1 0
    des-3050-2> lun online /vol/vol1/iscsi2/kc65b1
    des-3050-2> lun map /vol/vol1/iscsi2/kc65b1 kc65b1 0
  • Bring up client
    • DHCP
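A minimal sketch of the whole switch for one client, assuming the NetApp commands above, ssh access to the filer, and hypothetical notify-client and pdu-cycle helpers standing in for the site's cleanup notification and remote power strip / blade controller interfaces:

    #!/bin/sh
    # Illustrative repurposing sketch: move one client onto a clone of a different gold image.
    # Usage: repurpose.sh <client-hostname> <path-of-new-boot-lun>
    FILER=des-3050-2
    host=$1
    new_lun=$2                         # e.g. a LUN under /vol/vol1/iscsi2, cloned beforehand
    old_lun=/vol/vol1/iscsi/$host      # naming convention from the previous slides

    notify-client "$host"              # hypothetical: tell the node to clean up and halt
    pdu-cycle --off "$host"            # hypothetical: remote power strip / blade controller

    ssh "$FILER" lun offline "$old_lun"
    ssh "$FILER" lun unmap "$old_lun" "$host" 0
    ssh "$FILER" lun online "$new_lun"
    ssh "$FILER" lun map "$new_lun" "$host" 0

    pdu-cycle --on "$host"             # node powers up, DHCPs, and boots the new LUN 0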
Lab results
• Experiment conducted at Network Appliance
  • FAS 3050 clustered system
  • 224 clients (112 per cluster node)
    • dual-core 3.2 GHz / 2 GB Intel Xeon IBM H20 Blades
    • QLogic QMC 4052 adapters
    • Windows Server 2003 SE SP1
• Objectives
  • determine robustness and performance characteristics of the configuration under conditions of storage failover and giveback
  • determine viability of keeping the paging file on central storage
Network configuration
Not your daddy’s network
Client load
• Program to generate heavy CPU and paging activity (2 GB memory area, lots of reads and writes)
• Several instances per client
Client load, cont.
• ~400 pages/sec
Load on storage
Near 100% disk utilization on storage system in takeover mode
des-3050-1(takeover)> sysstat -u 1
 CPU   Total   Net kB/s    Disk kB/s    Tape kB/s  Cache Cache   CP  CP  Disk
       ops/s   in    out   read  write  read write   age   hit time  ty  util
 18%    1318  3129   4413 167328     48    0     0    13   98%   0%   -  100%
 42%    2708 67637   6165 166210      8    0     0    13   99%   0%   -  100%
 53%    2035 71519   5258 155134  52419    0     0    13   99%  45%   D  100%
 54%    1852 62163   4488 124647  99591    0     0    13   99% 100%   :   79%
 49%    2021 70115   5083 123828  58347    0     0    13   99% 100%   D   73%
 83%    1005 24380   2414 110473  54491    0     0    13   99% 100%   :   83%
 42%    2892 65357   7878 211645  56495    0     0    13   99% 100%   :  128%
 39%    2250 29027   7839 155554  19597    0     0    13   99%  35%   D   93%
 74%    1671 39249   4393 184457  57014    0     0    15  100% 100%   :  112%
 38%    2323 57148   6777 161911  69163    0     0    15   99% 100%   :  100%
 51%    2105 52591   5354 147766  95826    0     0    12   99%  90%   D   86%
 29%     382   957    988 163609  60946    0     0    12   98% 100%   :  100%
 19%    1331  2232   4305 163301   6938    0     0    12   98%  49%   :  100%
 18%    1247  1547   4390 164802     24    0     0    13   98%   0%   -  100%
 30%    2037 31462   5717 167336      0    0     0    13   99%   0%   -  100%
 33%    2000  4047   5909 169060     24    0     0    13   98%   0%   -  100%
 67%    1580  2177   5471 167101     32    0     0    13   99%   0%   -  100%
Observations
• Failover and giveback transparent
• No BSOD when failover times stay within the timeout window
  • recall: KeepAliveTO = 180
  • some tuning opportunities here
  • actual failover was < 60 seconds
  • iscsi stop + start used to increase "failover time" for testing
• Slower client access during takeover
  • expected behavior
• Heavy paging activity not an issue
• Higher number of clients per storage server an option, depending on application behavior
Economic analysis
• Assume
  • 256 clients / storage server
  • 20 W / drive
  • $80 / client-side drive
  • 80 GB client-side drive, 10 GB used per application
  • $3000 / server-side drive
  • 300 GB server-side drive
• Calculate
  • server-side actual usage
  • cost of client-side drives vs. cost for server space
  • cost of power + cooling for client-side drives and server space
Results
• Server-side usage
  • 512 clients x 10 GB per application = 5 TB
  • Assume
    • 50% usable space on server
    • 20 W typical per drive
    • 2.3x multiplier to account for cooling
  • 5000 GB * 2 / 300 GB/drive * 20 W/drive * 2.3 ≈ 1.53 kW
  • 10 TB raw @ $10/GB → $100,000
• Workstation-side drives
  • Same assumptions (note: power supply issue)
  • 512 drives * 20 W/drive * 2.3 ≈ 23.5 kW
  • 512 drives * $80/drive → $40,960
• At $0.10/kWh, cost curves cross over in three years
  • in some scenarios, it's less than two years
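The crossover claim is easy to sanity-check. A minimal sketch of the arithmetic, using only the assumptions from the two slides above (illustrative only; usage rounded to 5 TB as on the slide):

    #!/bin/sh
    # Back-of-the-envelope check of the cost-crossover claim.
    awk 'BEGIN {
        clients    = 512                         # 256 per storage server, two servers
        used_gb    = 5000                        # 512 x 10 GB, rounded to 5 TB as on the slide
        raw_gb     = used_gb * 2                 # 50% usable space on server
        srv_drives = raw_gb / 300                # 300 GB server-side drives
        srv_kw     = srv_drives * 20 * 2.3 / 1000   # 20 W/drive, 2.3x for cooling
        srv_cost   = raw_gb * 10                 # 10 TB raw @ $10/GB
        cli_kw     = clients * 20 * 2.3 / 1000   # one drive per client
        cli_cost   = clients * 80                # $80 per client-side drive
        printf "server side: %.2f kW, $%d\n", srv_kw, srv_cost
        printf "client side: %.2f kW, $%d\n", cli_kw, cli_cost
        # power savings pay back the extra server-side capital cost
        savings_per_year = (cli_kw - srv_kw) * 24 * 365 * 0.10    # $0.10/kWh
        printf "crossover: %.1f years\n", (srv_cost - cli_cost) / savings_per_year
    }'

Run as-is, this prints roughly 1.53 kW / $100,000 for the server side, 23.5 kW / $40,960 for the client side, and a crossover of about 3 years, matching the slide.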
Conclusion
• Dynamic provisioning from golden images is here
• Incredibly useful technology in diskless workstation farms
  • Fast turnaround
  • Central control
  • Simple administration
  • Nearly effortless client replacement
• Green!
Questions?