Introduction to the Cluster Infrastructure and the Systems Provisioning Engineering teams
Cluster Infrastructure & System Provisioning Engineering
Angelo Failla, Production Engineer – ClusterInfra Dublin
supporting rapid infrastructure and user growth
What do we do?
Efficiently bring up new capacity and manage the health of the core services required to operate our infra.
Cluster Infrastructure Team Responsibilities
• DNS infrastructure
• NTP infrastructure
• Provisioning infrastructure (DHCP, TFTP, Grub2, etc.)
• Cluster/DC-level automation
System Provisioning Engineering Team Responsibilities
• Cyborg
  • Built on top of the provisioning infra
  • Orchestrates server/TOR provisioning
• Image parameters tool
• Repair ticketing system
• Hardware checking systems
(some of the) challenges
The number of machines
PROVISIONING: IT’S HANDS-FREE
The number of variables is too high
Photo: https://www.flickr.com/photos/curveto/2698598542/ (CC BY 2.0)
Let’s talk about TFTP…
TFTP: D.O.B. 1981
Angelo: D.O.B. 1981
POP TFTP: Asia -> Oregon
Latency: 150ms
POP TFTP: Asia -> Oregon
RRQ: 150ms
ACK: 150ms
GET DATA BLOCK 0: 150ms
DATA BLOCK 0 PAYLOAD: 150ms
GET DATA BLOCK N: 150ms
DATA BLOCK N PAYLOAD: 150ms
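Every step in that exchange waits a full trip for the previous one. A minimal sketch of the packets involved, following the TFTP wire format in RFC 1350 plus the option extensions of RFC 2347/2348 (this is not fbtftp's code, and the function names are illustrative): the client sends a read request (RRQ), optionally asking for a larger `blksize`; when the server accepts options it replies with an OACK, which the client confirms with ACK 0; from then on the server sends one DATA block and waits for its ACK before sending the next.

```python
import struct
from typing import Optional, Tuple

# Opcodes from RFC 1350 (RRQ, DATA, ACK) and RFC 2347 (OACK).
OP_RRQ, OP_DATA, OP_ACK, OP_OACK = 1, 3, 4, 6


def build_rrq(filename: str, blksize: Optional[int] = None) -> bytes:
    """Read request in binary ("octet") mode, optionally negotiating blksize (RFC 2348)."""
    packet = struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0octet\0"
    if blksize is not None:
        # Options are appended as NUL-terminated name/value string pairs.
        packet += b"blksize\0" + str(blksize).encode("ascii") + b"\0"
    return packet


def build_ack(block: int) -> bytes:
    """ACK for one block; the server sends the next block only after this arrives."""
    return struct.pack("!HH", OP_ACK, block)


def parse_data(packet: bytes) -> Tuple[int, bytes]:
    """Split a DATA packet into (block number, payload)."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError(f"expected DATA (3), got opcode {opcode}")
    return block, packet[4:]
```

For example, `build_rrq("vmlinuz", blksize=1400)` (the file name is just an example) produces `b"\x00\x01vmlinuz\x00octet\x00blksize\x001400\x00"`. Because each DATA packet carries at most `blksize` bytes and needs its own ACK, the block size directly sets how many round trips a file costs.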
POP TFTP: Asia -> Oregon

| File size | Block size | Latency | Time to download |
|---|---|---|---|
| 80 MB | 512 B | 150ms | 12.5 hours |
| 80 MB | 1400 B | 150ms | 4.5 hours |
| 80 MB | 512 B / 1400 B | 1ms | <1 minute |
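The hours in the table come from round trips, not bandwidth. A rough, latency-only estimate can be sketched as below, assuming "80 MB" means 80 × 10^6 bytes, that the 150ms figure is one-way latency (so each DATA/ACK step costs a 300 ms round trip), and that `blksize` was negotiated via an RRQ/OACK exchange; the function name is hypothetical.

```python
def tftp_transfer_seconds(file_size: int, blksize: int, one_way_latency_s: float) -> float:
    """Latency-only estimate for a lockstep (stop-and-wait) TFTP download.

    Ignores bandwidth, processing time and packet loss, so the result is a
    lower bound driven purely by round trips.
    """
    rtt = 2 * one_way_latency_s
    # A transfer ends with the first DATA packet shorter than blksize, so a file
    # that is an exact multiple of blksize needs one extra, empty DATA packet.
    data_packets = file_size // blksize + 1
    # Each DATA packet costs one round trip (the ACK that triggers it), plus one
    # extra round trip for the RRQ/OACK option negotiation at the start.
    return (data_packets + 1) * rtt


if __name__ == "__main__":
    size = 80 * 10**6  # assumption: "80 MB" read as 80 x 10^6 bytes
    for blksize in (512, 1400):
        hours = tftp_transfer_seconds(size, blksize, 0.150) / 3600
        print(f"{blksize:>5} B blocks, 150 ms: {hours:.1f} h")
```

This prints 13.0 h for 512 B blocks and 4.8 h for 1400 B blocks, the same range as the table's 12.5 and 4.5 hours. Since the estimate is linear in latency, moving the server next to the client (150ms down to 1ms) divides it by 150, which is why the solutions focus on either shortening the path or leaving lockstep UDP behind.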
Solutions

Solution 1: let’s use iPXE, as it talks TCP/HTTP!
- It had a 10-minute watchdog (which we had to patch)
- After the patch it was still taking more than 10 minutes

Solution 2: put an fbtftp server in every POP
- Our own home-made TFTP server
- Streams files from HTTP
- Caches files locally
- A couple of minutes to download initrd/kernel

Solution 3 (currently investigating): use Grub2 and download initrd/kernel via HTTP
- Configurable TCP window size; patch sent upstream
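Solution 2 amounts to a read-through cache in each POP: the first request for a file streams it from an HTTP origin (paying the cross-region latency once, over TCP rather than lockstep UDP) and later requests are served from local disk. The sketch below shows only that caching layer under stated assumptions: `LocalCache`, `read_chunks` and the injected `fetch` callable are hypothetical names, not fbtftp's API, and the TFTP handler that would consume the chunks is left out.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator

# Hypothetical fetcher: given a path, yields the file's bytes in chunks
# (in practice an HTTP GET against the origin, e.g. via urllib.request).
Fetcher = Callable[[str], Iterable[bytes]]


class LocalCache:
    """Read-through file cache for a POP-local TFTP server (illustrative only)."""

    def __init__(self, cache_dir: str, fetch: Fetcher) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, path: str) -> str:
        # Flatten the requested path into a single file name; a real server
        # must also validate it to reject "../" traversal.
        return os.path.join(self.cache_dir, path.strip("/").replace("/", "__"))

    def read_chunks(self, path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        """Yield the file's bytes, contacting the origin only on a cache miss."""
        local = self._local_path(path)
        if not os.path.exists(local):
            # Miss: stream the origin response to a temp file, then rename it
            # into place so a partial download is never served as a hit.
            fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
            try:
                with os.fdopen(fd, "wb") as out:
                    for chunk in self.fetch(path):
                        out.write(chunk)
                os.replace(tmp, local)
            except BaseException:
                os.unlink(tmp)
                raise
        with open(local, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk
```

Writing to a temporary file in the cache directory and renaming it with `os.replace` means readers see either no file or a complete one; two concurrent misses may both fetch from the origin, but each rename is atomic on the same POSIX filesystem.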
Vendors tell you they are IPv6 compliant, but are they really?
Bring up/down clusters as fast as possible
Come talk to us at our poster sessions!