
Page 1

© 2007 Open Grid Forum

Grid provisioning from cloned golden boot images

Alan G. Yoder, Ph.D.

Network Appliance Inc.

Page 2

Outline

• Types of grids
• Storage provisioning in various grid types
• Case study
  • performance
  • stability

Page 3

Grid types

• Cycle scavenging
• Clusters
• Data center grids

Page 4

Cycle scavenging grids

• Widely distributed as a rule
  • campus- or department-wide
  • global grids
• Typically for
  • collaborative science
  • resource scavenging
• Main focus to date of GGF, OGF, Globus, et al.
• Category includes "grid of grids"

Page 5

Clusters

• Grid-like systems
  • good scaleout
  • cluster-wide namespace
• Especially attractive in HPC settings
• Many concepts in common with cycle-scavenging systems
  • but proprietary infrastructure
  • no management standards yet

Page 6

Data Center Grids

• Focus of this talk
• Typically fairly homogeneous
  • standard compute node hardware
  • two or three OS possibilities
• Two variants
  • Nodes have disks
    • topologically homomorphic to cycle scavenging grids
    • may use cycle scavenging grid technology
  • Nodes are diskless
    • storage becomes much more important → storage grids

Page 7

Storage technology adoption curves

[Figure: market adoption cycles — direct attached storage → networked storage → storage grids → global storage network. "Today" is marked on the curve for the enterprise storage market, and with a "?" for grid frameworks. Storage grids are the focus of this talk.]

Page 8

Diskless compute farms

• Connected to storage grids
• Boot over iSCSI or FCP
• OS is provisioned in a boot LUN on a storage array
• Applications can be provisioned as well

Key benefit – nodes can be repurposed at any time from a different boot image

Key benefit – smart storage and provisioning technology can use LUN cloning to deliver storage efficiencies through block sharing

Key benefit – no rotating rust in compute nodes
  • reduced power and cooling requirements
  • no OS/applications to provision on new nodes

Page 9

Local fabric technologies

[Figure: compute servers attached over an iSCSI or FC SAN to storage servers holding the golden image and its clones; products e.g. ShadowImage, FlexClone]

• Servers boot over iSCSI or FCP SAN
• Storage server(s) maintain golden image + clones

Page 10

Global deployment technologies

[Figure: a central data center replicating over a WAN to several local iSAN sites; products e.g. SnapMirror, TrueCopy]

Long-haul replication from central data center to local centers

Page 11

Diskless booting

LU – Logical Unit
LUN – Logical Unit Number
Mapping – LUNs :: initiator ports
Masking – Initiators :: LUNs ("views")

• Node shuts down
• Storage maps desired image to LUN 0 for the zone (FCP) or initiator group (iSCSI) the node is in
• Node restarts
• Node boots from LUN 0
  • mounts scratch storage space if also provided
  • starts up grid-enabled application
• Node proceeds to compute until done or repurposed
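A minimal sketch of this repurposing cycle as an SGME might script it, in Python. The ssh transport and the power-controller command strings are assumptions for illustration; only the lun unmap/map syntax comes from the NetApp commands shown later in this deck:

  import subprocess

  def run(host, cmd):
      # drive a storage or power-controller CLI over ssh (assumed transport)
      subprocess.run(["ssh", host, cmd], check=True)

  def repurpose(node, power_ctrl, filer, old_lun, new_lun, igroup):
      run(power_ctrl, f"power off {node}")           # node shuts down (hypothetical CLI)
      run(filer, f"lun unmap {old_lun} {igroup} 0")  # retire the old boot image
      run(filer, f"lun map {new_lun} {igroup} 0")    # desired image becomes LUN 0
      run(power_ctrl, f"power on {node}")            # node restarts and boots from LUN 0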

Page 12

Example

[Figure: a compute grid of gridsrv1, gridsrv2 and gridsrv3 booting from a storage grid — /vol/vol1/geotherm2 is LUN 0 mapped to gridsrv1; /vol/vol1/mysql_on_linux is LUN 0 mapped to gridsrv2 and to gridsrv3]

Page 13

What makes this magic happen?

[Figure: the same compute grid and storage grid as the previous slide, with an SGME component orchestrating the LUN 0 mappings]

Page 14

SGME

• Storage Grid Management Entity
• Component of overall GME in OGF Reference Model
  • GME is the collection of software that assembles the components of a grid into a grid
  • provisioning, monitoring etc.
• Many GME products: Condor et al.
  • also Stork, Lustre, qlusters
• Current storage grid incarnations are often home-rolled scripts

Page 15

Provisioning a diskless node

• Add HBAs to white box if necessary
• Fiddle with CMOS to boot from SAN
• For iSCSI (sketched in code below):
  • DHCP supplies address, node name
  • SGME provisions igroup for node address
  • SGME creates LU for node
  • SGME maps LU to igroup
• For FC:
  • zone, mask, map, etc.

SGME – grid storage management software
HBA – Host Bus Adapter
CMOS – BIOS settings
DHCP – IP boot management
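The three SGME steps for iSCSI collected into one routine. A sketch in Python: the filer name, ssh transport and function names are assumptions; the igroup create, lun create and lun map syntax matches the NetApp examples on the next slides:

  import subprocess

  def provision_iscsi_node(filer, igroup, initiator_iqn, lun_path, size="10g"):
      def filer_cmd(cmd):
          subprocess.run(["ssh", filer, cmd], check=True)
      # SGME provisions an igroup for the node's initiator address
      filer_cmd(f"igroup create -i -t windows {igroup} {initiator_iqn}")
      # SGME creates an LU for the node (in this deck the LU is usually
      # a clone of the gold LUN instead -- lun create -b, two slides ahead)
      filer_cmd(f"lun create -s {size} -t windows {lun_path}")
      # SGME maps the LU to the igroup as LUN 0
      filer_cmd(f"lun map {lun_path} {igroup} 0")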

Page 16

Provisioning a diskless node

• Add HBAs to white box if necessary
  • We used QLogic 4052 adapters
• Fiddle with CMOS to boot from SAN
  • Get your white box vendor to do this
  • Blade server racks generally easily configurable for this as well

Page 17

Preparing a gold image

• On a client – this is manual one-time work
• Install Windows server (e.g.)
• Setup HBA
  • e.g. QLogic needs iscli.exe and commands in startup.bat:
    C:\iscli.exe -n 0 KeepAliveTO 180 IP_ARP_Redirect on
• Software initiators must be prevented from paging out
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management: DisablePagingExecutive => 1
• Run the Microsoft sysprep setup manager and seal the image
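The paging tweak can be scripted into the image-prep step rather than set by hand in regedit. A minimal sketch using Python's standard winreg module, run as Administrator on the client being imaged; the script itself is an illustration, only the key and value come from the slide:

  import winreg

  KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
  with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_SET_VALUE) as k:
      # keep the software initiator's kernel code out of the pagefile
      winreg.SetValueEx(k, "DisablePagingExecutive", 0, winreg.REG_DWORD, 1)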

Page 18

Preparing a gold LUN

• On storage server – manual one-time work
• Copy the golden image to a new base LUN (over CIFS)
  des-3050-2> lun show
    /vol/vol1/gold/win2k3_hs  10g ...
• Create a snapshot of the volume with the gold LUN
  des-3050-2> snap create vol1 windows_lun
• Create an igroup for each initiator
  des-3050-2> igroup create -i -t windows kc65b1 \
    iqn.2000-04.com.qlogic:qmc4052.zj1ksw5c9072.1

Note: the des-3050-2> commands are NetApp-specific, for purposes of illustration only

Page 19

Preparing cloned LUNs

• SGME: for each client
  • create a qtree
    des-3050-2> qtree create /vol/vol1/iscsi/
  • create a LUN clone from the gold LUN
    des-3050-2> lun create -b \
      /vol/vol1/.snapshot/windows_lun/gold/win2k3_hs \
      /vol/vol1/iscsi/kc65b1
  • map the LUN to the igroup
    des-3050-2> lun map /vol/vol1/iscsi/kc65b1 kc65b1 0
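Since the SGME repeats these steps for every client, they are a natural loop to script. A sketch in Python; the ssh transport and function names are assumptions, the commands are the ones above, and igroups are assumed to be named after their clients, as in the kc65b1 example:

  import subprocess

  FILER = "des-3050-2"
  GOLD = "/vol/vol1/.snapshot/windows_lun/gold/win2k3_hs"

  def filer_cmd(cmd):
      subprocess.run(["ssh", FILER, cmd], check=True)

  def clone_boot_luns(clients):
      filer_cmd("qtree create /vol/vol1/iscsi/")      # one tree holds all the clones
      for client in clients:
          clone = f"/vol/vol1/iscsi/{client}"
          filer_cmd(f"lun create -b {GOLD} {clone}")  # space-efficient clone of the gold LUN
          filer_cmd(f"lun map {clone} {client} 0")    # becomes the client's boot LUN 0

  clone_boot_luns(["kc65b1"])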

Page 20

Getting clients to switch horses

• SGME: for each client
  • Notify client to clean up
  • Bring down client
    • remote power strips/blade controllers
  • Remap client LUN on storage
    des-3050-2> lun offline /vol/vol1/iscsi/kc65b1
    des-3050-2> lun unmap /vol/vol1/iscsi/kc65b1 kc65b1 0
    des-3050-2> lun online /vol/vol1/iscsi2/kc65b1
    des-3050-2> lun map /vol/vol1/iscsi2/kc65b1 kc65b1 0
  • Bring up client
    • DHCP

Page 21

Lab results

• Experiment conducted at Network Appliance• FAS 3050 clustered system• 224 clients (112 per cluster node)

• dual core 3.2GHz/2GB Intel Xeon IBM H20 Blades • Qlogic QMC 4052 adapters • Windows Server 2003 SE SP1

• Objectives• determine robustness and performance characteristics of

configuration, under conditions of storage failover and giveback

• determine viability of keeping paging file on central storage

Page 22

Network configuration

Not your daddy’s network

Page 23

Client load

• Program to generate heavy CPU and paging activity (2 GB memory area, lots of reads and writes)

• Several instances per client
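A sketch of such a load generator in Python; the slide specifies only the 2 GB memory area and "lots of reads and writes", the rest is illustrative. Touching random offsets in a buffer as large as the blade's RAM forces both CPU activity and steady paging against the central LUN:

  import random

  SIZE = 2 * 1024**3            # 2 GB memory area, as in the test
  buf = bytearray(SIZE)         # working set the OS must page in and out

  def thrash(iterations=10**7):
      for _ in range(iterations):
          i = random.randrange(SIZE)
          buf[i] = (buf[i] + 1) & 0xFF   # read-modify-write touches the page

  thrash()   # run several instances per client, as in the experiment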

Page 24

Client load, cont.

• ~400 pages/sec

Page 25

Load on storage

Near 100% disk utilization on storage system in takeover mode

des-3050-1(takeover)> sysstat -u 1
 CPU  Total    Net kB/s     Disk kB/s    Tape kB/s  Cache Cache   CP  CP  Disk
      ops/s    in    out    read  write  read write   age   hit  time ty  util
 18%   1318   3129   4413  167328     48    0    0     13   98%    0%  -  100%
 42%   2708  67637   6165  166210      8    0    0     13   99%    0%  -  100%
 53%   2035  71519   5258  155134  52419    0    0     13   99%   45%  D  100%
 54%   1852  62163   4488  124647  99591    0    0     13   99%  100%  :   79%
 49%   2021  70115   5083  123828  58347    0    0     13   99%  100%  D   73%
 83%   1005  24380   2414  110473  54491    0    0     13   99%  100%  :   83%
 42%   2892  65357   7878  211645  56495    0    0     13   99%  100%  :  128%
 39%   2250  29027   7839  155554  19597    0    0     13   99%   35%  D   93%
 74%   1671  39249   4393  184457  57014    0    0     15  100%  100%  :  112%
 38%   2323  57148   6777  161911  69163    0    0     15   99%  100%  :  100%
 51%   2105  52591   5354  147766  95826    0    0     12   99%   90%  D   86%
 29%    382    957    988  163609  60946    0    0     12   98%  100%  :  100%
 19%   1331   2232   4305  163301   6938    0    0     12   98%   49%  :  100%
 18%   1247   1547   4390  164802     24    0    0     13   98%    0%  -  100%
 30%   2037  31462   5717  167336      0    0    0     13   99%    0%  -  100%
 33%   2000   4047   5909  169060     24    0    0     13   98%    0%  -  100%
 67%   1580   2177   5471  167101     32    0    0     13   99%    0%  -  100%

Page 26

Observations

• Failover and giveback transparent
• No BSOD when failover time stays within the timeout window
  • recall: KeepAliveTO = 180
  • some tuning opportunities here
  • actual failover was < 60 seconds
  • iscsi stop + start used to increase "failover time" for testing
• Slower client access during takeover
  • expected behavior
• Heavy paging activity not an issue
• Higher number of clients per storage server an option, depending on application behavior

Page 27

Economic analysis

• Assume
  • 256 clients / storage server
  • 20 W / drive
  • $80 / client-side drive
  • 80 GB client-side drive, 10 GB used per application
  • $3000 / server-side drive
  • 300 GB server-side drive
• Calculate
  • server-side actual usage
  • cost of client-side drives vs. cost for server space
  • cost of power + cooling for client-side drives and server space

Page 28

Results

• Server-side usage
  • 512 clients × 10 GB per application = 5 TB
  • Assume
    • 50% usable space on server
    • 20 W typical per drive
    • 2.3× multiplier to account for cooling
  • 5000 GB × 2 / 300 GB/drive × 20 W/drive × 2.3 → 1.53 kW
  • 10 TB raw @ $10/GB → $100,000
• Workstation-side drives
  • Same assumptions (note: power supply issue)
  • 512 drives × 20 W/drive × 2.3 → 23.5 kW
  • 512 drives × $80/drive → $40,960
• At $0.10/kWh, cost curves cross over in three years
  • in some scenarios, it's less than two years
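The crossover figure follows from the slide's own numbers; a few lines of Python to check it (8760 hours/year is the only added assumption):

  server_kw = 5000 * 2 / 300 * 20 * 2.3 / 1000   # ≈ 1.53 kW for server-side drives
  client_kw = 512 * 20 * 2.3 / 1000              # ≈ 23.55 kW for client-side drives

  server_capex = 10_000 * 10                     # 10 TB raw @ $10/GB = $100,000
  client_capex = 512 * 80                        # $40,960

  # annual power + cooling savings of going diskless, at $0.10/kWh
  annual_savings = (client_kw - server_kw) * 8760 * 0.10   # ≈ $19,290/yr

  years = (server_capex - client_capex) / annual_savings
  print(round(years, 1))                         # ≈ 3.1 -> "three years"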

Page 29

Conclusion

• Dynamic provisioning from golden images is here

• Incredibly useful technology in diskless workstation farms
  • Fast turnaround
  • Central control
  • Simple administration
  • Nearly effortless client replacement

• Green!

Page 30

Questions?