isn't it ironic - managing a bare metal cloud (osl tes 2015)

40
Isn't it Ironic? Managing a bare metal cloud Devananda van der Veen http://github.com/devananda/talks/isntitironic.html twitter: @devananda

Upload: devananda-van-der-veen

Post on 15-Jul-2015

244 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Isn't it Ironic?Managing a bare metal cloud

Devananda van der Veen

http://github.com/devananda/talks/isnt­it­ironic.htmltwitter: @devananda

Page 2: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Who am IMaster Engineer at HPOpenStack TechnicalCommitteeOpenStack Ironic PTL

Page 3: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

What we're going to talk aboutVirtualization & OpenStackIronic's architectureConfiguration choices you need tomakeOperators <­> DevelopersWalk through a deployMultitenancySome other cool stuff

Page 4: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Why is virtualization important?

Page 5: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Page 6: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

"OpenStack is not a virtualizationlayer. It is an abstraction layer."

­ Daniel Sabbah, CTO @ IBM

Page 7: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Compute Virtualization Drivers(KVM, Xen, HyperV ...)

flexibility: dev controls their wholeenvironmentisolation: guests don't share resources... but noisy neighbors

Page 8: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Compute Containerization Drivers(lxc, docker ...)

specificity: just your app, nothing moredensity: far more instances per server... but shared kernel, memory, and otherresources

Page 9: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Compute Physicalization Drivers(Ironic)

raw performancenon­virtualizable workloads (eg,GPUs)... but no hypervisor

Page 10: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Ironic is:An OpenStack serviceA Nova driveralso capable of being used stand­alone

Page 11: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Design GoalsBe resilient to hardware and service failuresPredictable and horizontal scale­outCommon API across hardware vendorsCommon API for both virtual and physicalresources

Page 12: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Key Componentsironic­api: RESTful API serviceironic­conductor: interacts directly with hardware;asynchronous handling of both requested and periodicactions.deploy ramdisk: temporary agent, booted on machines, toprovide remote access to hardware for provisioning andmanagement.Nova driver: interface for Nova; enables OpenStack toprovide common abstraction for virtual and physicalmachines.discoverd ramdisk: optional tool for hardware inventorymanagement

Page 13: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Page 14: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Key TechnologiesPXE: pre­boot execution environment, allows host to bootfrom networkDHCP: dynamic host configuration protocol, used to locatethe NBP on the network, and provide the host OS with IPaddress during initIPMI: intelligent platform management interface, for remotecontrol of machine power state, boot device, serial console,etc.TFTP: trivial file transfer protocol, copies the NBP over thenetworkiSCSI: used to remotely attach to HDD and copy themachine image

Page 15: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

But you have options...! PXE: use virtual media channel instead of PXE. RequiresSwift; available in Kilo; limited to "ilo" drivers.! DHCP: static IP injection is possible, but not really suitablefor larger or dynamic environments! IPMI: many other out­of­band power management toolssupported! TFTP: iPXE / HTTP(S) supported, too! iSCSI: image can be fetched directly by "agent" drivers

Page 16: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

... and options ...Homogeneous hardware? Easy!Heterogeneous hardware? Use nova­scheduler to match flavors to node.propertiesSingle tenant / small deployment? Flat network. Maybe use Ironic stand­aloneService provider for multiple tenants? Use keystone for auth, nova for quota management, ... Basically, use OpenStackUntrusted tenants? Network isolation is possible (but not trivial) via Neutron Secure­erase disks, flash firmware between each use

Page 17: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Operators <­> Developers

Page 18: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)
Page 19: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Examples

Page 20: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Enroll Hardware$ ironic node-create -d agent_ipmitool \ -i ipmi_username=admin -i ipmi_password=fake -i ipmi_address=10.1.2.3 \ -p cpus=4 -p memory_mb=8192 -p local_gb=500 \ -e note='spare server' -n mytest+--------------+-------------------------------------------------------------+| Property | Value |+--------------+-------------------------------------------------------------+| chassis_uuid | None || driver | agent_ipmitool || driver_info | u'ipmi_address': u'10.1.2.3', u'ipmi_username': u'admin', || | u'ipmi_password': u'******' || extra | u'note': u'spare server' || properties | u'memory_mb': u'8192', u'local_gb': u'500', u'cpus': u'4' || uuid | 7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f || name | mytest |+--------------+-------------------------------------------------------------+ $ ironic port-create -n 7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f -a 00:11:22:00:11:22+-----------+--------------------------------------+| Property | Value |+-----------+--------------------------------------+| node_uuid | 7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f || extra | || uuid | 024e52b2-6ae4-483b-a039-d6afae7f6a22 || address | 00:11:22:00:11:22 |+-----------+--------------------------------------+

Page 21: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Validate provided info$ ironic node-validate 7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f+------------+--------+--------------------------------------------------------------| Interface | Result | Reason+------------+--------+--------------------------------------------------------------| console | False | Missing 'ipmi_terminal_port' parameter in node's driver_info.| deploy | False | Node 7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f failed to validatedeploy image info. Some parameters were missing. Missing are:['driver_info.deploy_kernel', 'driver_info.deploy_ramdisk', 'instance_info.image_source'] || inspect | None | not supported| management | True || power | True |+------------+--------+--------------------------------------------------------------

Page 22: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

(I forgot a few options)

Oops

Page 23: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Add or change options$ ironic node-update mytest add \ instance_info/image_source=http://192.168.1.1/myimage.qcow2 \ instance_info/image_checksum=e1d99d6d0ef2144a8d672b0420c547b5

$ ironic node-update mytest add \ driver_info/deploy_ramdisk=http://192.168.1.1/deploy.initrd \ driver_info/deploy_kernel=http://192.168.1.1/deploy.vmlinuz

$ ironic node-update mytest replace extra/note='database' name=db01.example+------------------------+-------------------------------------------------| Property | Value+------------------------+-------------------------------------------------| extra | u'note': u'database'| name | db01.example

Page 24: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Validate info (again)$ ironic node-validate db01.example+------------+--------+---------------------------------------------------------------+| Interface | Result | Reason |+------------+--------+---------------------------------------------------------------+| console | False | Missing 'ipmi_terminal_port' parameter in node's driver_info. || deploy | True | || inspect | None | not supported || management | True | || power | True | |+------------+--------+---------------------------------------------------------------+

Page 25: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Show details$ ironic node-show db01.example+------------------------+------------------------------------------------------------| Property | Value+------------------------+------------------------------------------------------------| target_power_state | None| last_error || maintenance_reason || provision_state | available| console_enabled | False| target_provision_state | None| maintenance | False| power_state | power off| driver | agent_ipmitool| reservation | None| instance_uuid | None| driver_internal_info | | chassis_uuid |

Page 26: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Maintenance Mode$ ironic node-set-maintenance --reason 'replacing disks' db01.example true$ ironic node-show db01.example+------------------------+------------------------------------------------------------| Property | Value+------------------------+------------------------------------------------------------| target_power_state | None| last_error || maintenance_reason | replacing disks| provision_state | available| console_enabled | False| target_provision_state | None| maintenance | True| power_state | power off| instance_uuid | None| driver_internal_info |

Page 27: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Power Status Loop$ ironic node-show my.broken.node+-----------------+----------------------------------------------------------------------+| Property | Value |+-----------------+----------------------------------------------------------------------+| last_error | During sync_power_state, max retries exceeded for node || | 9729f0b2-b270-4d06-aa87-40f2b2cad6ee, node state None does not match || | expected state 'off'. Updating DB state to 'None' Switching node to || | maintenance mode. |

$ cat /var/log/upstart/ironic-conductor.log2015-03-24 04:29:19.349 26317 WARNING ironic.conductor.manager [-]During sync_power_state, could not get power state for node 9729f0b2-b270-4d06-aa87-40f2b2cad6ee.Error: IPMI call failed: power status.

Page 28: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Deployment (via Ironic)$ ironic node-set-provision-state db01.example active

The provisioning operation can't be performed on node7a1ce8d0-9679-4d87-8f54-b11f6e8adb8f because it's in maintenance mode.$ ironic node-set-maintenance db01.example false$ ironic node-set-provision-state db01.example active$# ... time goes on ...

Page 29: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Deployment (via Ironic)$ ironic node-show db01.example+------------------------+-------------------------------------| Property | Value+------------------------+-------------------------------------| target_power_state | None| last_error || maintenance_reason | None| provision_state | active| console_enabled | False| target_provision_state | None| maintenance | False| power_state | power on| instance_uuid | None| driver_internal_info |

Page 30: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Deployment (via Nova)$ nova boot –flavor baremetal -image myimage -key-name my_ssh_key ...

$ tail -f /var/log/upsart/nova-compute.log...2014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 81922014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 5002014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 4

...2014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 02014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 02014-05-01 03:47:05.878 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 0

Page 31: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Two methods for image deploymentDirect from source || Cache on conductoragent_ipmitoolagent_pyghmiagent_ilo

pxe_ipmitoolpxe_ipminativepxe_seamicropxe_ibootpxe_ilopxe_snmppxe_dracpxe_irmcpxe_amtiscsi_ilo

Page 32: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

PXE Deploy Process

Page 33: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

PXE Deploy Process (cont)

Page 34: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Agent Deploy Process

Page 35: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Agent Deploy Process (cont)

Page 36: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

(there's a hash ring)

Fault Tolerance

Each node is mapped across the set of conductors which haveenabled that node's driver, using a consistent hash ring.Starting or stopping a conductor will initiate a rebalance.Periodic task ensures consistent local state for mapped nodes.No impact to running instances, but may disrupt reboots (ifnodes netboot)

Page 37: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Example$ ironic driver-list+---------------------+----------------+| Supported driver(s) | Active host(s) |+---------------------+----------------+| agent_ilo | cdr-A | << all the ilo nodes go here| agent_ipmitool | cdr-A,cdr-B | << ipmi nodes are split 50/50+---------------------+----------------+

Page 38: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

What happens when I stop cdr­B?$ ironic driver-list+---------------------+----------------+| Supported driver(s) | Active host(s) |+---------------------+----------------+| agent_ilo | cdr-A | << conductor "A" is now managing| agent_ipmitool | cdr-A | << 100% of all my nodes+---------------------+----------------+

Actually, I was just migrating to cdr­C$ ironic driver-list+---------------------+----------------+| Supported driver(s) | Active host(s) |+---------------------+----------------+| agent_ilo | cdr-A,cdr-C | << both ilo & ipmi nodes are| agent_ipmitool | cdr-A,cdr-C | << shared evenly across conductors+---------------------+----------------+

Page 39: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Is there a command to view themapping?

Nope. It's dynamically generated as the cluster changes.

Page 40: Isn't it ironic - managing a bare metal cloud (OSL TES 2015)

Thanks!