EMC World 2016 - CODE.05: Automating your physical data center with RackHD


Automate your data center with RackHD
Kenny Coleman, EMC {code}
Joseph Heck, RackHD

# Copyright 2016 EMC Corporation. All rights reserved.



Who?

KENNY COLEMAN
One-time virtualization junkie
Ruby/Rails/Node/JS & a wee bit of Go
Had the first Google result for 'kendrick' until 2012
Bourbon aficionado
@kendrickcoleman github.com/kacole2

JOSEPH HECK
DevOps from before it was cool
Open source advocate and contributor
OpenStack PTL in the early years
@heckj github.com/heckj

Agenda
What's the problem?
What's RackHD?
Why again do we need this?
How does it work?
Architecture
Integrations
Demos!


Ask yourselves a question: what do you want your data center to be like when it grows up? It's a pretty safe assumption that many of us want to run our data centers like AWS, Google, or Azure. For a few years now we have been moving more operations to the cloud, and we are starting to embrace it.


However, this article in Wired discussed in depth how Dropbox was able to move away from the cloud. Apple did the same thing. Many factors play into it, but a number of companies are starting to bring workloads back on-premises. There is a scale factor, and at the same time new tools are being developed: tools that let you run your data center the same way AWS does.

The goal is to treat our infrastructure as code. The old world was rack and stack; then we moved on to converged infrastructure. But there has to be a next step. How do we operate infrastructure in a more hands-off way? You can't be like AWS unless you orchestrate at the very lowest levels. We have to be able to treat physical components as if they were virtual machines. At the end of the day, a predictable workload lets us reduce cost.


The biggest problem to solve is this: what do we do with a server after it gets rolled into the data center and fitted with power and a network connection? What happens after we press the power button? The goal is to get the server operational as fast as possible. That may mean adding it to a cluster of servers as a resource, or it could mean running a bare-metal application.

Day 1 problem

New image. Run infrastructure as code. Old world = rack, stack, and load a personality. You can't be like AWS unless you orchestrate at the very lowest levels. This is the next step after Vblock: the modern data center. Treat physical machines as if they were VMs. Bring it back to private because the technology has changed.

Articles about companies coming back from AWS. Cheaper with a predictable workload. Now we can orchestrate.

Dropbox and Apple went back to internal infrastructure.

RackHD

OSS

What is RackHD? Give a high-level summary.

RackHD is a technology stack for enabling automated hardware management and orchestration through cohesive APIs. It serves as an abstraction layer between other management and orchestration (M&O) layers and the underlying physical hardware.

Developers can use the RackHD APIs to incorporate RackHD functionality into a larger orchestration system or to create a user interface for managing hardware services regardless of the underlying hardware in place.

Available under the Apache 2.0 license.

Developers can also use the RackHD API to create a user interface that serves as a single point of access for managing hardware services, regardless of the specific hardware in place.

RackHD has the ability to discover the existing hardware resources, catalog each component, and retrieve detailed telemetry information from each resource. The retrieved information can then be used to perform low-level hardware management tasks, such as BIOS configuration, OS installation, and firmware management.

RackHD sits between the other M&O layers and the underlying physical hardware devices. User interfaces at the higher M&O layers can request hardware services from RackHD. RackHD handles the details of connecting to and managing the hardware devices.

Why?
Why YACMT? (yet another configuration management tool)
Managing and maintaining each individual node
Hardware has seen less automation
Cobbler, SystemImager, Razor/Hanlon
Provides a significant step toward enabling converged infrastructure automation

In a data center that contains many bare-metal machines, managing and maintaining each individual node can quickly become very time-consuming and unscalable, so it's essential to have an automated service, like RackHD, to manage the nodes. The primary goals of RackHD are to provide REST APIs and live data feeds that enable automated solutions for managing hardware resources. The technology and architecture are built to provide a platform-agnostic solution.

Application automation services such as Heroku or Cloud Foundry are built on top of infrastructure service API layers (AWS, Google Compute Engine, SoftLayer, OpenStack, and others). Those services, in turn, are often installed, configured, and managed by automation in the form of software configuration management: Puppet, Chef, Ansible, etc. Automating data center rollouts and managing racks of machines relies on yet another layer of automation that rolls software out onto servers: Cobbler, Razor, etc.

The closer you get to hardware, the less automated systems tend to become. Cobbler and SystemImager were mainstays of early data center management tooling. Razor (or Hanlon, depending on where you're looking) expanded on that base system, supported mainly by people working to implement further automation solutions.

RackHD expands the capabilities of hardware management and operations beyond those mainstay features.

RackHD enables deeper and fuller automation by playing nicely with both existing and future systems. It adds to existing open source efforts by providing a significant step toward enabling converged infrastructure automation.


DISCOVERY & CATALOGING

TELEMETRY & GENEALOGY

DEVICE MANAGEMENT
CONFIGURATION
PROVISIONING
FIRMWARE MANAGEMENT
LOGGING
ENVIRONMENTALS
FAULT DETECTION
ANALYTICS DATA

Discovery and Cataloging: Discovers the compute, network, and storage resources and catalogs their attributes and capabilities.
Telemetry and Genealogy: Telemetry data includes genealogical details, such as hardware revisions, serial numbers, and date of manufacture.
Device Management: Powers devices on and off. Manages the firmware, power, OS installation, and base configuration of the resources.
Configuration: Configures the hardware per application requirements. This can range from the BIOS configuration on compute devices to the port configurations in a network switch.
Provisioning: Provisions a node to support the intended application workflow, for example by laying down ESXi from an image repository. Reprovisions a node to support a different workload, for example by changing the ESXi platform to bare-metal CentOS.
Firmware Management: Manages all infrastructure firmware versioning.
Logging: Log information can be retrieved for particular elements or collated into a single timeline for multiple elements within the management neighborhood.
Environmental Monitoring: Aggregates environmental data from hardware resources. The data to monitor is configurable and can include power information, component status, fan performance, and other information provided by the resource.
Fault Detection: Monitors compute and storage devices for both hard and soft faults. Performs suitable responses based on predefined policies.
Analytics Data: Data generated by environmental and fault monitoring can be provided to analytic tools for analysis, particularly around predictive failure.
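
The Fault Detection entry mentions responses driven by predefined policies. As a minimal sketch, assuming readings arrive as a flat map from sensor name to value, a policy check can be a min/max threshold comparison that returns the response to trigger. The sensor names, thresholds, and action strings are hypothetical; this is not RackHD's poller or alert schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Policy:
    sensor: str                        # hypothetical sensor name
    action: str                        # hypothetical response, e.g. "alert"
    min_value: float = float("-inf")   # readings below this count as a fault
    max_value: float = float("inf")    # readings above this count as a fault


def evaluate(readings: Dict[str, float],
             policies: List[Policy]) -> List[Tuple[str, float, str]]:
    """Return (sensor, reading, action) for every policy the readings violate.

    Sensors with no reading are skipped; a fuller version might treat a
    missing sensor as a fault of its own.
    """
    triggered = []
    for policy in policies:
        value = readings.get(policy.sensor)
        if value is None:
            continue
        if value < policy.min_value or value > policy.max_value:
            triggered.append((policy.sensor, value, policy.action))
    return triggered


# Hypothetical policies and one round of readings from a node.
policies = [
    Policy("cpu0_temp_c", "alert", max_value=85.0),
    Policy("fan1_rpm", "alert", min_value=1000.0),
    Policy("psu0_watts", "notify", max_value=750.0),
]
readings = {"cpu0_temp_c": 91.0, "fan1_rpm": 600.0, "psu0_watts": 410.0}
print(evaluate(readings, policies))
# [('cpu0_temp_c', 91.0, 'alert'), ('fan1_rpm', 600.0, 'alert')]
```

Keeping the policy as data rather than code means new thresholds can be added without changing the evaluation loop.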

Forward vision
Discovery and Cataloging
Telemetry and Genealogy
Device Management
Configuration
Provisioning
Firmware Management
Logging
Environmental Monitoring
Fault Detection
Analytics Data


capabilities

SNMP
IPMI

RackHD is focused on being the lowest level of automation: it interrogates hardware in a vendor-agnostic way and provisions machines with operating systems. The API can be used to pass in data through variables in the workflow configuration, so you can parameterize workflows. Since workflows also have access to all of the SKU information and other catalogs, they can be authored to react to that information.

The real power of RackHD, therefore, is that you can develop your own workflows and use the REST API to pass in dynamic configuration details. This allows you to execute a specific sequence of arbitrary tasks that satisfy your requirements.

When creating your initial workflows, we recommend starting from the existing workflows in the code repository to see how different actions are performed.
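
To make the parameterization idea concrete, here is a minimal sketch that builds (without sending) the kind of REST call a caller might use to run a named workflow against one node with per-node options. The `/api/2.0/nodes/<id>/workflows` path, the `{"name", "options"}` body shape, the `Graph.InstallUbuntu` graph name, the node ID, and the option keys are all assumptions for illustration; check the API reference for the RackHD version you run.

```python
import json
import urllib.request


def build_workflow_request(base_url: str, node_id: str, graph_name: str,
                           options: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs a named graph on one node.

    Assumption: endpoint path and body shape are illustrative, not a
    confirmed RackHD contract.
    """
    url = f"{base_url.rstrip('/')}/api/2.0/nodes/{node_id}/workflows"
    body = json.dumps({"name": graph_name, "options": options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: the same graph can be reused with different options.
req = build_workflow_request(
    "http://rackhd.example:8080/",
    "5a1b2c3d",
    "Graph.InstallUbuntu",
    {"defaults": {"hostname": "web-01", "version": "trusty"}},
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req) against a live server.
```

Separating request construction from sending keeps the parameterization testable without a running RackHD instance.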

Talk the talk
Task
Workflow/Graph
SKU
Install Ubuntu
Run Command
Reboot Node
Custom Ubuntu Install
SKU

Manufacturer = Intel & RAM = 32GB

Attach Custom Ubuntu Install
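
The SKU slide pairs a hardware rule (Manufacturer = Intel & RAM = 32GB) with a workflow to attach. A minimal sketch of that matching step, assuming the node's catalog is a nested dict and each SKU lists rules as a dotted `path` plus one comparison. The rule keys are loosely modeled on RackHD's SKU rule style, but the catalog layout, the paths, and the `workflow` field are hypothetical.

```python
from typing import Any, List, Optional


def lookup(catalog: dict, path: str) -> Any:
    """Follow a dotted path such as 'dmi.manufacturer' through nested dicts."""
    value: Any = catalog
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def rule_matches(catalog: dict, rule: dict) -> bool:
    """Check one rule; a missing catalog value never matches."""
    value = lookup(catalog, rule["path"])
    if value is None:
        return False
    if "equals" in rule:
        return value == rule["equals"]
    if "contains" in rule:
        return rule["contains"] in str(value)
    if "greaterThan" in rule:
        return value > rule["greaterThan"]
    raise ValueError(f"unsupported rule: {rule}")


def match_sku(catalog: dict, skus: List[dict]) -> Optional[dict]:
    """Return the first SKU whose rules all match the node's catalog."""
    for sku in skus:
        if all(rule_matches(catalog, rule) for rule in sku["rules"]):
            return sku
    return None


# Hypothetical SKU mirroring the slide: Intel + 32 GB RAM -> custom install.
skus = [{
    "name": "intel-32gb",
    "rules": [
        {"path": "dmi.manufacturer", "contains": "Intel"},
        {"path": "memory.totalGB", "equals": 32},
    ],
    "workflow": "Custom Ubuntu Install",
}]
node = {"dmi": {"manufacturer": "Intel Corporation"}, "memory": {"totalGB": 32}}
print(match_sku(node, skus)["workflow"])  # Custom Ubuntu Install
```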

MONORAIL ENGINE

Need to add in animations

Rubber duck

Graph Examples
OBM with IPMI
Provisioning with Kickstart / Ansible

PXE boot with iPXE / ping sweep / SNMP
Manual entry
Active
Passive
Query of the node and any parent enclosures (IPMI, BMC, DMI, etc.), then match to a SKU definition

Ongoing health and configuration trackers

The good stuff happens here (Kickstart, unattend, zero-touch, Ansible, etc.)
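
A graph is a set of tasks plus ordering constraints. As a sketch, assuming each task has a `label` and an optional `waitOn` map from upstream labels to a required state (similar in shape to RackHD graph definitions, though the graph and labels here are hypothetical), Python's standard `graphlib` can compute an order in which every task runs after the tasks it waits on, and reject cycles.

```python
from graphlib import TopologicalSorter


def execution_order(graph: dict) -> list:
    """Return task labels ordered so each task follows everything it waits on.

    Raises ValueError for a dependency on an unknown label and
    graphlib.CycleError if the waitOn links form a cycle.
    """
    labels = {task["label"] for task in graph["tasks"]}
    sorter = TopologicalSorter()
    for task in graph["tasks"]:
        # waitOn maps an upstream label to the state it must reach
        # (e.g. "succeeded"); only the dependency matters for ordering.
        upstream = list(task.get("waitOn", {}))
        unknown = set(upstream) - labels
        if unknown:
            raise ValueError(f"{task['label']} waits on unknown tasks: {unknown}")
        sorter.add(task["label"], *upstream)
    return list(sorter.static_order())


# Hypothetical discovery-to-provisioning graph mirroring the slide.
discovery_graph = {
    "name": "Graph.Example.DiscoverAndProvision",
    "tasks": [
        {"label": "bootstrap"},
        {"label": "catalog-dmi", "waitOn": {"bootstrap": "succeeded"}},
        {"label": "catalog-bmc", "waitOn": {"bootstrap": "succeeded"}},
        {"label": "match-sku", "waitOn": {"catalog-dmi": "succeeded",
                                          "catalog-bmc": "succeeded"}},
        {"label": "provision", "waitOn": {"match-sku": "succeeded"}},
    ],
}
print(execution_order(discovery_graph))
```

`graphlib` needs Python 3.9 or later. A real engine would also act on the required state (for example, running a cleanup task only when an upstream task fails), which this ordering-only sketch ignores; the two catalog tasks have no mutual dependency, so either may come first.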

Software components

monorail engine

HTTP
AMQP

SOFTWARE ARCHITECTURE


GLOBAL COLLABORATION

TRANSPARENCY

INTEGRATIONS

COMMUNITY

SMALL INVESTMENT

FRICTIONLESS DEVELOPMENT

ATTRACT TALENT

NO VENDOR LOCK-IN

Oss advantage

As software transforms industries across the world, more companies are embracing software as a core competency to differentiate themselves with customers and capture new opportunities.

(Mobile is changing consumer access; data is generated; intelligence is gathered; new features ship on constant feedback.)

Companies like Square, Uber, Netflix, Airbnb, and Tesla continue to command rapidly growing private market valuations and turn the heads of executives at their industries' historical leaders. What do these innovative companies have in common? (How can they go from idea to product so quickly?)
Speed of innovation (innovate, experiment, and deliver software quickly)
Always-available services
Web scale
Mobile-centric user experiences

Enterprises are following:

Kroger: DevOps adoption with PCF; automated build pipeline
Allstate: major IT transformation; wants to "Uberize" the insurance industry
Lockheed Martin: building apps using PCF and Spring (Java FMW)
Home Depot: software transformation; facing major competition from Amazon, it has to deliver new capabilities quickly and efficiently


OTHER:

Businesses today are constantly pressured to adopt the myriad technical driving forces impacting software development and delivery. These driving forces include anything as a service, cloud computing, containers, Agile, automation, DevOps, microservices, business-capability teams, and cloud-native applications.

Moving to the cloud is a natural evolution of focusing on software, and cloud-native application architectures are at the center of how these companies obtained their disruptive character.

Speed: It's become clear that speed wins in the marketplace. Businesses that are able to innovate, experiment, and deliver software-based solutions quickly are outcompeting those that follow more traditional delivery models.

Safety: It's not enough to go extremely fast. If you get in your car and push the pedal to the floor, eventually you're going to have a rather expensive (or deadly!) accident. Transportation modes such as aircraft and express bullet trains are built for both speed and safety. Cloud-native application architectures balance the need to move rapidly with the needs of stability, availability, and durability. It's possible, and essential, to have both. So how do we go fast and safe?
Visibility: Our architectures must provide us with the tools necessary to see failure when it happens.

Fault isolation: To limit the risk associated with failure, we need to limit the scope of components or features that could be affected by a failure. (Microservices)

Recovery

Scale: Rather than relying on vertical scaling, innovative companies dealt with this problem through two pioneering moves. First, rather than continuing to buy larger servers, they horizontally scaled application instances across large numbers of cheaper commodity machines, which were easier to acquire (or assemble) and deploy quickly. Second, they improved the poor utilization of existing large servers by virtualizing several smaller servers in the same footprint and deploying multiple isolated workloads to them.


OSS advantage
Well, no charge for using it
Community engagement
Real-time updates
No vendor lock-in
Modify and adapt to fit your needs
Use existing CI/CD integrations

Demo!


integrations




integrations

Docker-Machine Driver
https://github.com/emccode/docker-machine-rackhd

0.1 release; work is in progress on full state management (i.e., support for the start, stop, remove, and restart commands).

RackHD driver for Kubernetes

SKU
Use a RackHD SKU as a pool of nodes for Kubernetes.
./kube_up.sh


Vagrant Test Setup
Uses VMs on VirtualBox
Some configuration changes:
Import source
Change a template
Import some workflow and SKU definitions
Get it here: http://bit.ly/rackhd-docker
Demo!!

Shovel
Finds bare-metal compute nodes dynamically discovered by RackHD and registers/unregisters them with Ironic (the OpenStack Bare Metal Provisioning Program)
Provides a poller service that monitors compute nodes and logs errors from the SEL into the Ironic database

Shovel is an application that provides a service with a set of APIs that wrap the existing RackHD and Ironic APIs, allowing users to find bare-metal compute nodes dynamically discovered by RackHD and register/unregister them with Ironic (the OpenStack Bare Metal Provisioning Program). Shovel also provides a poller service that monitors compute nodes and logs errors from the SEL into the Ironic database.

A Shovel Horizon plugin is also provided to interface with the Shovel service. The plugin adds a new panel called rackhd to the admin dashboard that displays a table of all the bare-metal systems discovered by RackHD. It also lets the user view the node catalog in a table, register/unregister a node in Ironic, display the node SEL, and enable/register a failover node.
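
The Shovel poller logs SEL errors, which implies parsing the event log and reporting only entries not seen on an earlier poll. A minimal sketch of that step, assuming input in the pipe-separated format that `ipmitool sel list` approximately prints (hex record ID first). The keyword list that decides what counts as an error is a hypothetical policy, and writing into the Ironic database is out of scope.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical policy: event text containing any of these counts as an error.
ERROR_KEYWORDS = ("failure", "failed", "error", "fault", "critical")


@dataclass(frozen=True)
class SelEntry:
    record_id: int
    date: str
    time: str
    sensor: str
    event: str
    direction: str


def parse_sel(text: str) -> List[SelEntry]:
    """Parse pipe-separated SEL lines; skip blank or malformed ones."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 6:
            continue
        try:
            record_id = int(fields[0], 16)  # record IDs are assumed to be hex
        except ValueError:
            continue
        entries.append(SelEntry(record_id, *fields[1:6]))
    return entries


def new_errors(entries: List[SelEntry], seen: Set[int]) -> List[SelEntry]:
    """Return error entries not reported on an earlier poll, updating `seen`."""
    fresh = []
    for entry in entries:
        if entry.record_id in seen:
            continue
        seen.add(entry.record_id)
        if any(word in entry.event.lower() for word in ERROR_KEYWORDS):
            fresh.append(entry)
    return fresh


sample = """\
   1 | 04/20/2016 | 10:15:32 | Power Supply #0x51 | Failure detected | Asserted
   2 | 04/20/2016 | 10:16:02 | System Event #0x83 | Timestamp Clock Sync | Asserted
   a | 04/21/2016 | 08:00:00 | Memory #0x02 | Correctable ECC | Asserted"""
seen: Set[int] = set()
print(new_errors(parse_sel(sample), seen))  # first poll: record 1 only
print(new_errors(parse_sel(sample), seen))  # second poll: nothing new
```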

System level

Services diagram


OpenStack Shovel integration


CLOUDFOUNDRY BOSH RACKHD CPI


future




Get related switch port information (the MAC address/switch port table from the switch) from a remote catalog
Process those information sources and create the relevant "links" to represent a topology through the RackHD APIs, showing which compute servers are connected to which switches, and at which port

network to compute node topology

(ORFS-152)

Background

This relates to the V2 API work and its notion of enabling relationships. We have sufficient information in the existing LLDP catalog, plus the capability of getting related switch port information (the MAC address/switch port table from the switch) from a remote catalog. With the combined information, we should be able to process those sources and create the relevant "links" to represent a topology through the RackHD APIs, showing which compute servers are connected to which switches, and at which port.

Goals

A mechanism that captures the needed data from a top-of-rack switch and combines it with LLDP catalogs or node data, where available, to create or amend the underlying data that exposes the topology connections.
The ability to unplug one of those physical cables and have this mechanism update the topology correctly.
The ability to plug in a physical cable, adding a second, independent network connection between compute node and switch, and have that connection represented in the topology.
REST resource API outputs in the V2 API that show the linkages, using the relationship structure pattern defined in the V2 API.
Defined and documented events on the AMQP bus that are sent when a topology is calculated and a change is detected: specifically, an event when a new link is formed (with the details of that link), and an event when a previously existing link is broken.
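
The topology goals boil down to two steps: join each switch's MAC address table with the NIC MACs already in node catalogs to get (node, switch, port) links, then diff successive snapshots to emit link-formed and link-broken events. A minimal sketch, assuming both inputs have already been collected as plain dicts; the event names, switch and node IDs, and port names are hypothetical and not RackHD's AMQP event schema.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str, str]  # (node_id, switch_id, switch_port)


def build_links(mac_tables: Dict[str, Dict[str, str]],
                node_macs: Dict[str, List[str]]) -> Set[Link]:
    """Join switch MAC tables with node NIC MACs to get node-to-port links.

    mac_tables: switch_id -> {mac: port}, as read from each top-of-rack switch.
    node_macs: node_id -> NIC MAC addresses from the node's catalog.
    """
    owner = {mac.lower(): node_id
             for node_id, macs in node_macs.items() for mac in macs}
    links = set()
    for switch_id, table in mac_tables.items():
        for mac, port in table.items():
            node_id = owner.get(mac.lower())  # MACs compared case-insensitively
            if node_id is not None:
                links.add((node_id, switch_id, port))
    return links


def diff_links(old: Set[Link], new: Set[Link]) -> List[dict]:
    """Events for links that formed or broke between two topology snapshots."""
    events = [{"event": "link.formed", "link": link} for link in sorted(new - old)]
    events += [{"event": "link.broken", "link": link} for link in sorted(old - new)]
    return events


# Hypothetical data: node-1 has two NICs; a second cable is then plugged in.
node_macs = {"node-1": ["00:1e:67:aa:00:01", "00:1e:67:aa:00:02"]}
before = build_links({"tor-1": {"00:1E:67:AA:00:01": "Ethernet1/1"}}, node_macs)
after = build_links({"tor-1": {"00:1E:67:AA:00:01": "Ethernet1/1",
                               "00:1e:67:aa:00:02": "Ethernet1/2"}}, node_macs)
print(diff_links(before, after))  # one link.formed event for Ethernet1/2
```

On a real switch the MAC table also holds addresses learned through uplink ports, so a fuller version would restrict the join to edge-facing ports or cross-check against LLDP data, as the goals suggest.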

In-band management
Access to systems after the OS has been laid down
Knowing about network configurations and connections
Access via SSH to the host OS, and the potential to grab significant additional telemetry or install additional packages

Extending the concepts of workflow orchestration to have more knowledge of, and (potential) access to, systems after the OS has been laid down means extending RackHD in a number of new ways: knowing about network configurations and connections, access via SSH to the host OS, and the potential to grab significant additional telemetry or install additional packages. The first steps are to enable SSH access to a host OS and to reflect similar information in our data models/API resources.

Goals

Add representation and an appropriate schema for NICs and networks to compute nodes and switches, including VLAN-specific interfaces.
Add representation of an IP address and credentials to inquire about OS-level details.
A workflow task that uses this mechanism to capture OS packages with versions and store them in a catalog, extending catalogs to include package collection.
Expand workflow tasks to run arbitrary SSH commands with credentials from the node (https://github.com/mscdex/ssh2, https://github.com/tsmith/node-control, https://github.com/mikeal/sequest).
Expand catalogs to include an OS-level view of network connections as a catalog: NICs (interface names for the OS), IP address, gateway, subnet mask, and VLAN if provided/appropriate.
Enable IP lookups that map data from IP addresses assigned to compute servers, so that ancillary services can know which node the data relates to.
A workflow task to set an updated SSH key.
A workflow task to SSH into a switch and set it into ZTP/boot mode to reset it.
Extend OS.Install workflows to use an in-band connection to verify that the machine is responding via SSH before completing the OS.Install workflow.
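
The last goal, confirming that a freshly installed machine answers over SSH before OS.Install completes, can be approximated by polling TCP port 22 until the server sends its SSH identification string (RFC 4253 requires it to begin with `SSH-`). A minimal sketch using only the standard library; it checks reachability only, does not authenticate, and the timeout and interval defaults are assumptions rather than RackHD settings.

```python
import socket
import time


def wait_for_ssh(host: str, port: int = 22, timeout: float = 300.0,
                 interval: float = 5.0) -> bool:
    """Poll until the host sends an SSH identification line, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval) as sock:
                sock.settimeout(interval)
                data = sock.recv(255)
                # RFC 4253 allows other lines before the version string,
                # so look for any line that starts with "SSH-".
                if any(line.startswith(b"SSH-") for line in data.splitlines()):
                    return True
        except OSError:
            pass  # not up yet: refused, unreachable, or timed out
        time.sleep(interval)
    return False


# e.g. wait_for_ssh("10.1.1.23") before marking the install complete
# (hypothetical address).
```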

Open source implementation of the Redfish 1.0 schema and management APIs
Management of hardware supporting Redfish interfaces in place of IPMI

REDFISH 1.0 implementation
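
Redfish replaces IPMI's binary protocol with JSON over HTTPS: clients start at the `/redfish/v1/` service root, follow `@odata.id` links to each `ComputerSystem`, and power-control a system by POSTing a `ResetType` to the target of its `#ComputerSystem.Reset` action. A minimal sketch of that traversal over already-fetched JSON documents; the sample payloads are trimmed, hypothetical examples, and the HTTP and authentication layer is left out.

```python
from typing import Dict, List, Tuple


def system_uris(collection: Dict) -> List[str]:
    """List member links from a Redfish collection such as /redfish/v1/Systems."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def reset_request(system: Dict, reset_type: str) -> Tuple[str, Dict]:
    """Return (target URI, JSON body) for a ComputerSystem.Reset action."""
    action = system["Actions"]["#ComputerSystem.Reset"]
    allowed = action.get("ResetType@Redfish.AllowableValues")
    if allowed is not None and reset_type not in allowed:
        raise ValueError(f"{reset_type!r} not in {allowed}")
    return action["target"], {"ResetType": reset_type}


# Trimmed, hypothetical payloads of the shape a Redfish service returns.
systems = {"Members": [{"@odata.id": "/redfish/v1/Systems/1"}]}
system_1 = {
    "@odata.id": "/redfish/v1/Systems/1",
    "Actions": {
        "#ComputerSystem.Reset": {
            "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
            "ResetType@Redfish.AllowableValues": ["On", "ForceOff", "GracefulRestart"],
        }
    },
}
print(system_uris(systems))
print(reset_request(system_1, "On"))
```

Reading the action target from the resource, instead of hard-coding the path, follows Redfish's model of discovering URIs through links.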

Annotations and documentation for tasks and workflows (graphs)
Workflow Editor for composing and debugging workflows
Additional tasks and toolchains extending existing hardware support capabilities
Expansion of SKU packs for internal distributions

EXTENDED WORKFLOW TASKS


Key takeaways
OSS
BE LIKE THE BIG DOGS
TECHNICAL NUGGETS
INTEGRATIONS

FUTURES



Learn more & get started
Main: github.com/RackHD/RackHD
Vagrant: github.com/RackHD/RackHD/tree/master/example
Futures: github.com/RackHD/specs
Community: community.emccode.com

