cluster boot issues
![Page 1: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/1.jpg)
CLUSTER BOOT ISSUES
Shawn Marriott
COSC 3P93
![Page 2: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/2.jpg)
What is a Cluster?
A cluster is a collection of individual autonomous nodes working in concert to create the illusion of a single system.
![Page 3: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/3.jpg)
The trouble with nodes
If you must consider the state and configuration of each node in a cluster, the illusion of a single system is lost. As a cluster grows, managing individual nodes becomes tedious and error-prone.
![Page 4: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/4.jpg)
A Paradox
How does a node get the information it needs to boot, if it needs to boot to get the information it needs?
![Page 5: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/5.jpg)
Booting a computer
Part of the startup procedure for most computers is to initialize a bootstrap, which in turn starts the operating system. Where to find the bootstrap is generally configured on a per-machine basis in the BIOS. Common places searched for the bootstrap are hard drives, optical drives, solid-state devices, and floppies.
![Page 6: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/6.jpg)
How do you boot a cluster?
Or at least convince the nodes to boot…
![Page 7: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/7.jpg)
A sub-ideal solution:
Manually configure every node in the cluster…
![Page 8: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/8.jpg)
An ideal solution:
Unpack the computer. Plug in the computer. Turn on the computer. Done.
![Page 9: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/9.jpg)
How could this be done?
If the nodes in our cluster are computers connected to a network, then when a computer starts up, have it use the network to broadcast information about itself to a server, and have the server provide configuration information and a bootstrap.
![Page 10: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/10.jpg)
In the beginning….
There was the Reverse Address Resolution Protocol (RARP). RFC 903, 1984.
![Page 11: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/11.jpg)
Next try……
Bootstrap protocol (BOOTP). RFC 951, 1985
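As a rough illustration of what a node "broadcasts about itself", the sketch below packs a BOOTP request using the fixed 300-byte layout from RFC 951. The function name and the transaction id are made up for the example, and nothing is sent on the network; the MAC address is borrowed from the sample configuration in these slides.

```python
import struct

# Fixed-size BOOTP message layout from RFC 951 (300 bytes in total):
# op, htype, hlen, hops, xid, secs, unused, ciaddr, yiaddr, siaddr,
# giaddr, chaddr, sname, file, vend.
BOOTP_FORMAT = "!BBBBIHH4s4s4s4s16s64s128s64s"
BOOTREQUEST = 1
HTYPE_ETHERNET = 1


def build_bootp_request(mac: bytes, xid: int) -> bytes:
    """Pack a BOOTREQUEST for a client that does not yet know its own IP.

    Hypothetical helper for illustration only.
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet address")
    zero_ip = bytes(4)
    return struct.pack(
        BOOTP_FORMAT,
        BOOTREQUEST,     # op: this is a request
        HTYPE_ETHERNET,  # htype: hardware address type
        6,               # hlen: hardware address length
        0,               # hops
        xid,             # transaction id chosen by the client
        0,               # secs since the client started booting
        0,               # unused field in RFC 951
        zero_ip,         # ciaddr: client IP, unknown at this point
        zero_ip,         # yiaddr: filled in by the server
        zero_ip,         # siaddr: filled in by the server
        zero_ip,         # giaddr: filled in by a relay, if any
        mac,             # chaddr: struct pads this to 16 bytes
        b"",             # sname: optional server host name
        b"",             # file: boot file name, filled in by the server
        b"",             # vend: vendor-specific area
    )


packet = build_bootp_request(bytes.fromhex("080009030166"), xid=0x12345678)
print(len(packet))  # 300
```

The only thing the client really knows about itself here is its hardware address; everything else is left zeroed for the server to fill in.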
![Page 12: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/12.jpg)
Sample BootP config
node01:ht=ether:ha=080009030166:ip=15.19.8.2:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader
node02:ht=ether:ha=080009030176:ip=15.19.8.3:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader
node03:ht=ether:ha=080009030186:ip=15.19.8.4:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader
node04:ht=ether:ha=080009030196:ip=15.19.8.5:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader
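To make the entry format above easier to read, here is a minimal sketch that splits one entry into its tags and looks a node up by hardware address, which is roughly the lookup a BOOTP server has to perform. The function names are hypothetical, and the parser assumes the simple one-line form shown on this slide; it does not handle continuation lines, quoting, or templates.

```python
def parse_bootptab_entry(line: str) -> dict:
    """Split one 'name:tag=value:tag=value' entry into a dictionary."""
    name, *fields = line.strip().split(":")
    entry = {"name": name}
    for field in fields:
        if not field:
            continue  # tolerate a trailing colon
        tag, _, value = field.partition("=")
        entry[tag] = value
    return entry


def find_by_mac(entries: list, mac: str):
    """Return the entry whose 'ha' tag matches the given MAC, or None."""
    for entry in entries:
        if entry.get("ha", "").lower() == mac.lower():
            return entry
    return None


sample = [
    "node01:ht=ether:ha=080009030166:ip=15.19.8.2:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader",
    "node02:ht=ether:ha=080009030176:ip=15.19.8.3:sm=255.255.248.0:gw=15.19.8.1:bf=/bootloader",
]
entries = [parse_bootptab_entry(line) for line in sample]
match = find_by_mac(entries, "080009030176")
print(match["name"], match["ip"], match["bf"])  # node02 15.19.8.3 /bootloader
```

Note that every node needs its own line keyed by MAC address, which is the per-node bookkeeping the later protocols try to reduce.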
![Page 13: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/13.jpg)
Another go around
Dynamic Host Configuration Protocol (DHCP).
RFC 1531, 1993.
![Page 14: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/14.jpg)
Sample DHCP Configuration
subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.10 192.168.0.49;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.0.255;
    option routers 192.168.0.1;
    filename "pxelinux.0";
    next-server 192.168.0.100;
}
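A quick sanity check of the numbers in this configuration can be sketched with Python's standard `ipaddress` module. The variable names are only for illustration; the values are copied from the sample above.

```python
import ipaddress

# Values copied from the sample DHCP configuration.
subnet = ipaddress.ip_network("192.168.0.0/255.255.255.0")
range_start = ipaddress.ip_address("192.168.0.10")
range_end = ipaddress.ip_address("192.168.0.49")
router = ipaddress.ip_address("192.168.0.1")
next_server = ipaddress.ip_address("192.168.0.100")

# Number of addresses the server can hand out dynamically.
pool_size = int(range_end) - int(range_start) + 1

# Every address mentioned in the block should sit inside the subnet.
all_inside = all(
    addr in subnet for addr in (range_start, range_end, router, next_server)
)

print(pool_size)                 # 40
print(subnet.broadcast_address)  # 192.168.0.255
print(all_inside)                # True
```

With this range, at most 40 nodes can hold a dynamically assigned address at the same time, so the range has to grow with the cluster.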
![Page 15: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/15.jpg)
One more try?
In 1999 Intel released the Wired for Management (WfM) framework.
The framework was mostly ignored, but two interesting parts of the specification endure:
Preboot Execution Environment (PXE)
Boot Integrity Services (BIS)
![Page 16: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/16.jpg)
What is PXE?
PXE is a client designed to work with DHCP and TFTP to retrieve bootstrap information. It also defines APIs that allow a loaded bootstrap to query the local host for additional information.
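The TFTP half of that exchange starts with a very small packet. The sketch below builds a TFTP read request as laid out in RFC 1350: a two-byte opcode, the file name, a zero byte, the transfer mode, and another zero byte. The helper name is hypothetical, and the file name is simply the one used in the sample DHCP configuration.

```python
import struct

TFTP_OPCODE_RRQ = 1  # read request opcode from RFC 1350


def build_tftp_read_request(filename: str, mode: str = "octet") -> bytes:
    """Build the first packet a client sends to fetch a file over TFTP."""
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)
        + filename.encode("ascii")
        + b"\x00"
        + mode.encode("ascii")
        + b"\x00"
    )


request = build_tftp_read_request("pxelinux.0")
print(request)  # b'\x00\x01pxelinux.0\x00octet\x00'
```

In a real boot, the client would send this to the server named in the DHCP reply and then receive the file in numbered data blocks.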
![Page 17: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/17.jpg)
What is BIS?
BIS enables a PXE client to examine a digitally signed boot image. This provides a mechanism to verify the integrity of a supplied bootstrap image.
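The idea of an integrity check can be sketched in a few lines. This is a deliberately simplified stand-in: BIS works with digital signatures, whereas the example below only compares a SHA-256 digest against a value the client is assumed to already trust. The function name and the sample image bytes are invented for the illustration.

```python
import hashlib
import hmac


def image_matches_digest(image: bytes, expected_hex: str) -> bool:
    """Return True if the image hashes to the trusted digest.

    Simplified stand-in for a signature check, not the BIS API.
    """
    actual_hex = hashlib.sha256(image).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(actual_hex, expected_hex.lower())


boot_image = b"example bootstrap image"
trusted_digest = hashlib.sha256(boot_image).hexdigest()

print(image_matches_digest(boot_image, trusted_digest))            # True
print(image_matches_digest(boot_image + b"!", trusted_digest))     # False
```

A real signature scheme adds what this sketch lacks: a way to trust the expected value itself without having to distribute it to every node in advance.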
![Page 18: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/18.jpg)
How is PXE implemented?
If you want to use a network to provide configuration information for a node, then the node must have a network interface card (NIC).
Since the node must have a NIC, it seems reasonable to put the PXE client on the NIC and have the BIOS treat the NIC as a bootable device.
![Page 19: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/19.jpg)
Intel’s vision
![Page 20: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/20.jpg)
What is a Network Bootstrap Program (NBP)?
An NBP is a binary executable file, specific to a given CPU architecture.
NBPs are small, usually less than 512 KB in size.
What an NBP does is up to whoever creates it. The PXE specification does not go into any detail on NBPs.
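The size limit matters because of how the file is delivered. As a back-of-the-envelope sketch, assuming plain TFTP as in RFC 1350 (512-byte data blocks, each acknowledged before the next is sent), the number of packets and a rough transfer time for an NBP at the 512 KB upper end can be estimated as follows. The function names and the 1 ms round-trip time are assumptions for the example.

```python
BLOCK_SIZE = 512  # default TFTP data block size in bytes (RFC 1350)


def tftp_data_packets(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of DATA packets needed to send a file.

    The transfer ends with a block shorter than block_size, so a file that
    is an exact multiple of the block size needs one extra, empty block.
    """
    return file_size // block_size + 1


def estimated_seconds(file_size: int, round_trip_seconds: float) -> float:
    """Lower bound on transfer time: one round trip per acknowledged block."""
    return tftp_data_packets(file_size) * round_trip_seconds


nbp_size = 512 * 1024
print(tftp_data_packets(nbp_size))  # 1025
print(estimated_seconds(nbp_size, 0.001))
```

Because each block waits for its acknowledgement, transfer time grows with both file size and network latency, which is one reason to keep the NBP small and let it fetch larger images by other means.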
![Page 21: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/21.jpg)
PXE workflow with an NBP
![Page 22: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/22.jpg)
What is involved in setting up a PXE environment?
A Node with a PXE client set as the boot device.
A DHCP server
A TFTP server
![Page 23: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/23.jpg)
Sample DHCP Configuration
subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.10 192.168.0.49;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.0.255;
    option routers 192.168.0.1;
}
![Page 24: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/24.jpg)
Sample proxy DHCP config
subnet 192.168.0.0 255.255.0.0
{
    vendor pxe
    {
        bootstrapserver 192.168.0.100 # TFTP server IP address.
        # Type, SystemArch, MajorVers
        pxebootfile 1 2 1 window.one 1 0
        pxebootfile 2 2 1 linux.one 2 3
        pxebootfile 1 2 1 hello.one 3 4
        client 6 10005a8ad14d
        {
            pxebootfile 1 2 1 aix.one 5 6
            pxebootfile 2 2 1 window.one 6 7
        }
    }
}
![Page 25: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/25.jpg)
What PXE does right
It is a standard feature on modern NICs and motherboards with integrated NICs.
A PXE client knows the architecture of the node it is running on, and can request an appropriate NBP
It builds upon existing technologies (DHCP, TFTP)
It has extensions to authenticate the boot server, and verify the integrity of a downloaded NBP.
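The architecture point can be illustrated with a small lookup. The sketch assumes the client reports its architecture as a 16-bit code in DHCP option 93, as described in RFC 4578, and that the server maps that code to a boot file. The three codes shown are taken from that RFC; the file names and the function name are purely hypothetical.

```python
import struct

# Client system architecture codes (DHCP option 93, RFC 4578).
# The boot file names are made up for this example.
NBP_BY_ARCH = {
    0: "nbp/x86pc.0",      # Intel x86PC
    2: "nbp/itanium.efi",  # EFI Itanium
    6: "nbp/ia32.efi",     # EFI IA32
}


def select_nbp(option_93_payload: bytes, default=None):
    """Pick a boot file from the first architecture code in the option."""
    (arch,) = struct.unpack("!H", option_93_payload[:2])
    return NBP_BY_ARCH.get(arch, default)


print(select_nbp(b"\x00\x00"))  # nbp/x86pc.0
print(select_nbp(b"\x00\x06"))  # nbp/ia32.efi
print(select_nbp(b"\x00\x63"))  # None
```

This lets one server hand different bootstraps to different hardware without listing each node individually, as long as architecture is the only thing that distinguishes them.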
![Page 26: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/26.jpg)
The problems with PXE
PXE has limited knowledge of its host.
PXE relies on DHCP, which is inefficient on large networks.
PXE relies on TFTP, which is impractical for large files or many concurrent file transfers.
If you need different NBPs for different nodes, you must either uniquely identify each node (by MAC address) and group them accordingly in DHCP, or use a DHCP proxy server and separate the management of network addresses and images.
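The first option in the last point, grouping nodes by MAC address in DHCP, can be sketched as a small generator for ISC dhcpd style `group` and `host` declarations. The node-to-NBP assignment and the second file name are invented for the example, and the MAC addresses are reused from the BOOTP sample; treat the output as a starting point to check against your own server's documentation rather than a finished configuration.

```python
def format_mac(raw: str) -> str:
    """Turn '080009030166' into '08:00:09:03:01:66'."""
    raw = raw.lower()
    return ":".join(raw[i:i + 2] for i in range(0, len(raw), 2))


def dhcpd_groups(nodes_by_nbp: dict) -> str:
    """Emit one group per boot file, with one host declaration per node."""
    lines = []
    for nbp, nodes in nodes_by_nbp.items():
        lines.append("group {")
        lines.append(f'    filename "{nbp}";')
        for name, mac in nodes:
            lines.append(
                f"    host {name} {{ hardware ethernet {format_mac(mac)}; }}"
            )
        lines.append("}")
    return "\n".join(lines)


# Hypothetical assignment of nodes to boot files.
assignment = {
    "pxelinux.0": [("node01", "080009030166"), ("node02", "080009030176")],
    "other-nbp.0": [("node03", "080009030186")],
}
config_text = dhcpd_groups(assignment)
print(config_text)
```

Even when generated, the list still has one line per node, so the bookkeeping grows with the cluster; that is the cost the proxy DHCP approach tries to move elsewhere.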
![Page 27: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/27.jpg)
Questions and Discussion
![Page 28: Cluster Boot Issues](https://reader036.vdocument.in/reader036/viewer/2022081603/56813f54550346895daa1592/html5/thumbnails/28.jpg)
References
Thomas L. Sterling, 1998. Beowulf Cluster Computing with Linux. Cambridge, Massachusetts: The MIT Press.
Intel Corporation, 1998. Boot Integrity Services Application Programming. Version 1.0. ftp://download.intel.com/design/archives/wfm/downloads/bisspec.pdf
Intel Corporation, 1999. Preboot Execution Environment (PXE) Specification. Version 2.1. http://download.intel.com/design/archives/wfm/downloads/pxespec.pdf
http://www.ietf.org/rfc/rfc903.txt
http://www.ietf.org/rfc/rfc951.txt
http://www.ietf.org/rfc/rfc1531.txt
http://www.beowulf.org
http://en.wikipedia.org/wiki/Preboot_Execution_Environment
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp