
WHITE PAPER – MAY 2019

First Principles of Kubernetes

Considerations and best practices for planning and implementation


Table of contents

Section 1: Cloud native architecture and Kubernetes
  The need to modernize application delivery
  Cloud native architecture is the solution
  Uniform and effective management
  Kubernetes solves enterprise challenges
  Benefits for developers
  Benefits for operators
  A path to DevOps

Section 2: Kubernetes fundamentals
  An introduction to Kubernetes
  The Kubernetes cluster
  Kubernetes master server: Key elements
  Kubernetes worker node: Key elements
  Kubernetes process flow
  What Kubernetes does not do

Section 3: Planning and deploying Kubernetes
  The importance of upstream Kubernetes
  Is upstream Kubernetes practical for the enterprise?
  Before you deploy
  How many Kubernetes clusters do you need?
  Where will you deploy?
  What will you deploy?
  What are your skills gaps?
  Kubernetes design tips and tricks
    Private, public, and hybrid clouds
    Multi-cloud
  Infrastructure and application availability
  Kubernetes cluster architecture

Section 4: Operating Kubernetes
  Backup and recovery options
  Operating Kubernetes
  API server
  Custom controllers and custom API servers
  Secrets to streamline Kubernetes upgrades

Section 5: Final thoughts and learning more


Section 1: Cloud native architecture and Kubernetes

The need to modernize application delivery

In a recent global survey of business decision-makers, 98 percent said that the delivery of digital applications and services is critical to their future business success. However, monolithic development teams—walled off from IT operations teams—cannot deliver new applications quickly enough or achieve a rapid cadence of new features and improvements.

Much of the infrastructure in your data centers is likely still in silos around individual applications, making it inelastic and difficult to scale. Resources are more or less isolated on dedicated, bare-metal servers, so utilization rates are low and expenses are high. Too much of your IT budget is spent on just keeping the lights on.

Ticket-based infrastructure—where developers submit resource requests and wait days, weeks, or even months for resources to be provisioned—lengthens application development cycles and is wholly inadequate to address the dynamic scaling needs of successful customer-facing digital services.

Cloud native architecture is the solution

Although the cloud might seem like the solution to these agility issues, you have to be careful not to trade one set of problems for another. You can quickly get locked into a single cloud provider with no easy way to move applications between cloud providers or back on premises. The answer to these challenges is a multi-cloud architecture built on containers and microservices:

• Containers encapsulate an application and its dependencies in a form that’s portable and easy to deploy. Containers can run on any compatible system—in any cloud—and consume resources much more efficiently than servers, enabling higher density and greater utilization.

• Microservices architecture breaks down an application into multiple component services, enabling greater parallelism during both development and execution.

Correctly implemented, cloud native architecture is inherently multi-cloud, allowing you to deploy applications in different public clouds or on premises to avoid lock-in and increase operational flexibility.

Uniform and effective management

Managing containerized applications uniformly and effectively is an essential element of cloud native architecture. Kubernetes has emerged as the leading solution for container management. Since it was first open sourced in 2014, the Kubernetes system has become the most popular container orchestrator, eclipsing other solutions such as Docker Swarm, Mesos, and many other early contenders. According to a study by the Cloud Native Computing Foundation, Kubernetes has been adopted by 69 percent of organizations—more than three times the footprint of any other platform.

This paper helps you understand Kubernetes and the rapidly expanding ecosystem of related projects. It includes tips on planning, designing, and managing a successful Kubernetes deployment.


Kubernetes solves enterprise challenges

The fundamental challenges faced by enterprises as they undergo digital transformation are how to deliver more and better applications and digital services more quickly and how to operate them at scale. Meeting these challenges boils down to two key goals:

• Facilitating the success of development teams

• Enabling operators to manage infrastructure resources more efficiently

Adopting a cloud native architecture with Kubernetes is the best way to achieve these goals.

Benefits for developers

For developers, the ticket-based infrastructure of conventional IT is replaced by self-service infrastructure that enables them to access the resources they need when they need them. Team structures become much more modular and continuous development practices take the place of extended development cycles.

Containers by themselves do a lot to simplify a developer’s life. As defined by Docker, a container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings. Containerizing an application gives developers an advantage in portability. But Docker alone is not enough for a large-scale container deployment, either for development or production. Kubernetes provides a clustered environment with uniform deployment, management, scaling, and availability services for containerized applications.

The Kubernetes community focuses on making the developer experience as clear and straightforward as possible. Everything is rigorously documented, and there’s a wealth of resources available for those getting started. A variety of open-source tools in the Kubernetes ecosystem addresses specific challenges and increases developer productivity. Developers have flexible options for developing on Kubernetes, which allows the development environment to be as similar to the production environment as desired.

Benefits for operators

Kubernetes also offers significant advantages for IT administrators and operators. Resources are clustered and can be consumed and released elastically, enabling seamless scaling and higher resource utilization. That translates to more services from a smaller footprint. Because common tasks are automated, Kubernetes eliminates many of the manual provisioning and other tasks of conventional enterprise IT. A self-healing approach to infrastructure reduces the criticality of failures, making fire drills less common and your teams more productive.

For IT teams trying to come to grips with cloud deployments in addition to on-premises operations, Kubernetes offers the promise of multi-cloud portability. Kubernetes clusters in multiple private and public clouds provide a uniform (or at least very similar) management environment and identical principles of operation, reducing the learning curve associated with adding a new cloud environment and minimizing the risk of operator errors.


A path to DevOps

Many enterprises are aligning their development and operations teams and embracing DevOps as a way to accelerate code delivery and improve operations. Kubernetes can accelerate and simplify the transition to DevOps. It is useful to think of DevOps as a cultural shift in which developers learn to care about how their applications are run in production and operations staff know how the application works so they can actively help make it more reliable. A cloud native approach with Kubernetes creates a pathway to understanding and empathy between teams with significant benefits for both developers and operators:

• More efficient and happier teams – Big problems can be broken down into smaller pieces for more focused and nimble teams.

• Less drudgery – Much of the manual work that causes operations pain and downtime is automated. Developers have better access to resources with less effort.

• More reliable infrastructure and applications – Building automation to handle expected churn results in better failure modes during unexpected events and failures.

• Auditable, visible, and debuggable applications – Complex applications can be opaque. Cloud native tools provide more insight into what is happening within an application.

• Deep security – A cloud native approach enables application developers to play an active role in creating securable applications.

• More efficient usage of resources – Cloud-like ways of deploying and managing applications and services open up new opportunities to apply algorithmic automation.

Section 2: Kubernetes fundamentals

An introduction to Kubernetes

The open-source Kubernetes project was started to make Google’s extensive experience managing containers available to developers and operations teams at large. Established in 2014 by Joe Beda, Brendan Burns, and Craig McLuckie, Kubernetes is the most active project on GitHub with more than 1,700 contributors. It has been under the governance of the Cloud Native Computing Foundation (CNCF) since 2015.

Developers find Kubernetes to be a practical, platform-agnostic framework for application development and management. Developers and operators appreciate Kubernetes for its open and extensible design, failure tolerance, ability to scale, and its open and transparent community.

While a complete explanation of the intricacies of Kubernetes is beyond the scope of this paper, it is important to understand a few basic concepts. If you’re already familiar with Kubernetes terms such as nodes and pods, you can skip this section.

The Kubernetes cluster

The term orchestration implies that there is a central conductor executing a set plan the way a musical conductor directs a score. Kubernetes is more like jazz improv. Like musicians playing jazz, the actors in Kubernetes play off each other to coordinate activities and react to events.


FIGURE 1: A Kubernetes cluster. The master node runs etcd, the API server, the scheduler, and the controller manager; each worker node runs a kubelet and a container runtime such as Docker.

In the architecture of Kubernetes, the master server provides the cluster control plane; it makes scheduling decisions and responds to cluster events. Application containers run on Kubernetes worker nodes. The containers that make up an application or service are grouped into a pod, simplifying management and discovery. Multiple pods that support a mix of different application microservices can be distributed across a cluster of nodes.

The desired number of instances of each pod are kept running by the system, which also ensures that aggregate demands don’t exceed cluster resources. Regular health checks verify the status of running pods.
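In practice, that desired state is declared in a manifest. For example, a ReplicaSet (the Kubernetes object that maintains a set number of pod replicas) asking the system to keep three instances running might look like the following minimal sketch; the name and image are illustrative:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web
spec:
  replicas: 3              # the system keeps three pod instances running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:1.0   # illustrative image name
```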

Kubernetes master server: Key elements

• etcd is the core state store for a cluster. etcd is the system of record.

• The API server is the only component in the cluster that talks to etcd. It allows the various cluster components to create, read, update, delete, and watch for resource changes. Coordination results from one component writing to an API server resource that another component is watching; the watching component then reacts to the change almost immediately.

• Built-in or external mechanisms provide authentication and authorization.

• Admission controllers within the API server reject or modify API requests if needed and ensure that data entering the system is valid.

• The scheduler determines where to run unassigned pods by picking a node that meets each pod’s resource requirements and other constraints.

• The controller manager implements the behavior of a ReplicaSet, which ensures a set number of replicas of each pod are running. The controller manager also creates and destroys pods as needed to maintain a stable set.
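The write-and-watch coordination described above can be sketched as a toy, in-memory analogy. This is not the real Kubernetes API machinery, and all names are illustrative: a "controller" watches for a ReplicaSet record and creates the missing pod records when one appears.

```python
# Toy analogy of API-server coordination: one component writes a resource,
# another component watching that resource reacts almost immediately.

class ApiServer:
    """Single reader/writer of cluster state, standing in for API server + etcd."""
    def __init__(self):
        self.store = {}          # stands in for etcd
        self.watchers = []       # components watching for changes

    def watch(self, callback):
        self.watchers.append(callback)

    def write(self, key, value):
        self.store[key] = value
        for callback in self.watchers:
            callback(key, value)  # watchers react to the change

def replica_controller(api):
    """Reacts to a ReplicaSet record by creating the missing pod records."""
    def on_change(key, value):
        if key == "replicaset/web":
            desired = value["replicas"]
            running = [k for k in api.store if k.startswith("pod/web-")]
            for i in range(len(running), desired):
                api.store[f"pod/web-{i}"] = {"phase": "Pending"}
    api.watch(on_change)

api = ApiServer()
replica_controller(api)
api.write("replicaset/web", {"replicas": 3})
print(sorted(k for k in api.store if k.startswith("pod/")))
# → ['pod/web-0', 'pod/web-1', 'pod/web-2']
```

No component calls another directly; everything flows through shared, watched state, which is why the paper's "jazz improv" metaphor fits better than "orchestration."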

Kubernetes worker node: Key elements

• Kubelet agent: The agent ensures the necessary containers for each pod assigned to the node are running and healthy.

• Container runtime: Docker or another container runtime executes on each node to allow containers to run on the underlying operating system.


Kubernetes process flow

The following example shows the (somewhat rare) case where a user creates a pod directly. This example serves to illustrate how coordination occurs across a Kubernetes cluster. Typically, a user will create a ReplicaSet and the ReplicaSet creates the pod:

• A user creates a pod with a request to the API server and the API server writes it to etcd.

• The scheduler notices an unbound pod and determines on which node to run the pod. It writes the binding back to the API server.

• The kubelet notices a change in the set of pods bound to its node. It then runs the container with the container runtime.

• The kubelet monitors the status of the pod via the container runtime. As things change, the kubelet reflects the current status back to the API server.
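The pod the user creates in the first step is typically declared as a manifest and submitted to the API server (for example, with kubectl apply -f pod.yaml). A minimal sketch follows; the name, image, and probe endpoint are illustrative. The livenessProbe is one form of the regular health check the kubelet performs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: example/web:1.0     # illustrative image name
    livenessProbe:             # the kubelet's regular health check
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
```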

What Kubernetes does not do

Kubernetes does not:

• Provide a platform as a service (PaaS) – While Kubernetes provides many of the elements for creating development environments, it doesn’t prescribe specific tools or methods.

• Implement continuous integration and continuous delivery (CI/CD) – Kubernetes provides building blocks that make CI/CD easier but doesn’t lock you into a particular tool set.

• Limit you to a microservices architecture – Any application that can be containerized can run on Kubernetes.

• Implement a collection of application services – Middleware, databases, clustered storage, caches, and other services can run on or be accessed from Kubernetes but are not prescribed by the system.

One of the main design principles of Kubernetes is extensibility. Kubernetes is extensible to work with the solutions you already rely on, including logging, monitoring, and alerting services. The CNCF and Kubernetes community are working on a variety of open-source solutions that complement Kubernetes, creating a rich ecosystem that enables you to flexibly address your operational requirements.

Section 3: Planning and deploying Kubernetes

The importance of upstream Kubernetes

If you’re new to Kubernetes and open-source software, you’re probably thinking about the best way for your organization to get started. The open-source software that enterprises are most familiar with is Linux. Instead of compiling the source code yourself, you can license a Linux distribution, such as Red Hat Enterprise Linux or SUSE Linux, that supplies compiled binaries, some features, and support services. Similarly, there are a number of Kubernetes distributions, and one option for getting started on your cloud native journey is to choose one of them as a first step.

A better alternative, however, is to choose upstream Kubernetes—Kubernetes built from the latest stable source version in the Kubernetes repository. Upstream Kubernetes is the best way to ensure access to the latest capabilities and to avoid lock-in to any one vendor’s distribution. To date, all the major public cloud providers have committed to deliver products that conform with upstream Kubernetes; therefore, organizations that are serious about multi-cloud operations should demand the portability that is only possible with upstream Kubernetes.


Nothing is intrinsically bad about creating a distribution. It is a way for a distributor to control what its customers run. A distribution lets a distributor test more efficiently and safely invest resources in creating a better experience for developers and operators. Distributors have strong incentives to deliver differentiated experiences—and differentiated capabilities. As a distribution develops a following, customers clamor for features. The community cannot move as fast as the distributor can, so the distributor delivers a patch. As a result, the community becomes fragmented one customer request at a time.

When it comes to distributions, however, the circumstances that surround Kubernetes are different from those surrounding Linux. There are four important reasons to choose upstream Kubernetes over Kubernetes distributions, at least for now:

• Kubernetes is evolving quickly – Because vendors need to integrate and vet every new Kubernetes release with their proprietary code, distribution-based offerings often lag behind open-source Kubernetes by two to three versions. This lag creates a substantial delay before you get new and important features.

• Avoiding lock-in is essential – Distributions can be based on a deep fork of Kubernetes, where vendors build proprietary capabilities that aren’t necessarily compatible with the upstream Kubernetes source. You can get locked in if you create applications that rely on closed-source capabilities. Understanding where community-provided technology ends and vendor-proprietary capabilities begin can be difficult. Distributions may also only work with a limited set of technologies for networking, host OS, and more.

• Kubernetes needs to be compatible everywhere – When the first Linux distributions were released, there was no cloud. Linux was something you ran in your data centers; you could pick a single distribution and install it everywhere. You may want to run Kubernetes on top of your existing virtualization environment or on a variety of bare-metal infrastructure and in multiple clouds. Many distributions limit your flexibility to run on a hosted solution such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), or Amazon Elastic Container Service for Kubernetes (EKS). And distributions often involve an opinionated installation mechanism that is unlikely to be flexible enough for enterprise deployments.

• Cost – Distributions are often priced based on environment size; licensing costs can run up to $15,000 annually per node. As you add nodes, you add cost in a linear fashion, so the cost of these distributions can quickly spiral out of control.

Is upstream Kubernetes practical for the enterprise?

Many enterprises may initially be uncomfortable with the idea of running open-source software directly, especially for critical infrastructure. In the early days, deploying and operating an open-source Kubernetes environment was difficult, but the project has come a long way. Today, there is little reason to be intermediated by a vendor charging enterprise software rates to deliver technology where the real value is delivered by the Kubernetes community.

Distributions mostly ignore a key capability that Kubernetes was created to address: cloud independence. Most distributions do not run on cloud-hosted container-as-a-service solutions. Although your organization may not be using those services now, you shouldn’t make technology decisions today that preclude their use down the road.

UPSTREAM VS. DOWNSTREAM

In software development, the terms upstream and downstream describe how close code is to the source repository. Kubernetes distributions are by definition downstream from the Kubernetes source repository. Upstream Kubernetes is built directly from the latest stable version of the Kubernetes project on GitHub.


Some vendors recognize this need and offer innovative support and subscription plans. These options deliver the full benefits of upstream Kubernetes with the services and support you need to address complex technology transformations and stringent SLAs.

Before you deploy

When planning a Kubernetes deployment, it is helpful to begin by stating your goals and success criteria. Who are the end users of your infrastructure, and what kinds of applications or workloads will they run? What are the uptime, performance, recovery, and other requirements for each workload? With answers to these questions, you can decide what to deploy, and where and how you deploy it.

How many Kubernetes clusters do you need?

Many enterprises run multiple clusters and distribute those clusters across deployment environments. The most common use case for multiple clusters is to separate a production environment from development or testing environments.

If your IT resources are already distributed across multiple locations, it often makes sense to set up a separate cluster for each set of resources. You might want multiple locations to reduce latency across geographic regions, support redundancy, or make efficient use of physical or virtual resources already in place. Multiple clusters provide better support for workload containment than a single cluster, minimizing the effects of an outage.

However, a single cluster might be the right choice if your initial deployment is for proof of concept or development only, your availability requirements are minimal, or the resources you have to devote to your Kubernetes project are limited.

Where will you deploy?

Once you decide how many clusters you need, the next question is where to deploy them. For example, you could run clusters in multiple AWS availability zones within a single region. Or you could run your clusters on a mix of different cloud providers, a mix of cloud and on-premises environments, or multiple on-premises environments. If you’re deploying on premises, will the infrastructure be virtualized or bare metal? New or existing hardware? If you’re deploying in the cloud, which clouds will you choose?

What will you deploy?

Knowing the mix of environments you intend to deploy to will help you zero in on what to deploy. There are three general Kubernetes options to consider:

• Container as a service, such as GKE, EKS, or AKS

• Upstream Kubernetes

• A Kubernetes distribution

The previous section advocated upstream Kubernetes over a Kubernetes distribution. However, you may still opt for a distribution, especially if all your environments will be on premises. (But see the discussion of multi-cloud environments in the next section.)

VMWARE ESSENTIAL PKS AT A GLANCE

With VMware® Essential PKS, you can build a cloud native operation that makes the best use of upstream Kubernetes and open-source technology, giving you a flexible foundation to select the right tools and the right clouds for your workloads. VMware Essential PKS gives you access to the support and expertise needed to build a production-grade open-source Kubernetes platform.

KEY BENEFITS OF VMWARE ESSENTIAL PKS

• Signed and fully supported upstream Kubernetes binaries

• Proven reference architectures for running upstream Kubernetes in production on physical hardware, public clouds, and VMware vSphere®

• Expert guidance at every step of your Kubernetes journey

• Flexibility to implement a multi-cloud strategy that lets you select the right tools and clouds for your workloads without vendor lock-in


If you’re deploying in the cloud alone, you could opt for a particular provider or a pair of providers and use their Kubernetes services. For the greatest flexibility in the future, deploy upstream Kubernetes on premises, in the cloud, or both to ensure compatibility across multiple environments while avoiding lock-in.

In addition to Kubernetes, you also need to think about and plan other software, tools, and services needed in your environment. These include the host operating system, authentication, data protection, disaster recovery, and CI/CD and other development tools.

What are your skills gaps?

A final consideration is to identify the skills gaps that exist in your team. You’ll want to make sure that people receive the training they need ahead of time, and you may need to hire people with new skills, as well. Enterprises new to Kubernetes often seek outside professional services or consulting help to fill gaps and assist with initial planning, deployment, and operations.

Kubernetes design tips and tricks

Most enterprises are still trying to work out the best strategies to balance on-premises resources with the public cloud. Here are some tips to help clarify your thinking on hybrid and multi-cloud environments, as well as how to think about availability and cluster architecture.

Private, public, and hybrid clouds

A simplistic view of cloud is that it is an easy way to outsource your infrastructure management problems. But, for sophisticated enterprises with relatively predictable workloads, there are still cost advantages to running your own infrastructure. Here are several cloud-related tips:

• While you may be able to achieve cost-efficiency without using the public cloud today, it is likely to become increasingly difficult to maintain that advantage as cloud providers continue to grow and compete.

• You don’t truly have a private cloud unless developers can provision infrastructure and a robust, useful set of standard services on demand. Kubernetes was developed to decouple workloads from the underlying infrastructure and is the starting point for a true hybrid cloud. Kubernetes includes a set of services that are provisioned by API, instead of a ticket.

• Look at Kubernetes as a way to become cloud ready even if you are not moving applications to a public cloud today. Kubernetes helps ensure portability of applications and skills.

Multi-cloud

The three major public cloud providers in the U.S.—AWS, Google Cloud Platform, and Microsoft Azure—each compete to offer differentiated services and are making massive investments and a fast cadence of improvements.

Here’s another tip: Even if you are operating on premises or in a single cloud today, it makes sense to hedge your bets. Try to avoid doing anything that will prevent you from taking advantage of public cloud services and capabilities in the future. Kubernetes makes this easier by decoupling developer workflows and apps from your cloud provider.


The public cloud providers offer a variety of great services that you will want to use. Just recognize that detangling your infrastructure from deep service dependencies can be difficult. To minimize lock-in, you have to be smart about your approach to these services. Here are two more tips with that in mind:

• Be judicious about the service dependencies you take on and approach each service dependency on an ROI basis.

• When possible, consider building your own reusable services instead of using cloud provider services.

Efforts are under way to create a structured means to map services into a cluster environment such as Kubernetes to improve portability. The result of this effort will allow an application to move from one environment to another and automatically connect to the equivalent service for that environment.

Infrastructure and application availability

If you are down for 10 seconds a day for 10 years, there is a chance no one will notice. If, however, you are down for a whole business day, people will notice. During a 10-year time frame, both are examples of the same SLA: 99.99 percent availability.
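The arithmetic behind that claim checks out, assuming an 8-hour business day; both figures round to 99.99 percent:

```python
# Two very different outage patterns that amount to (roughly) the same SLA.

SECONDS_PER_DAY = 86_400
SECONDS_PER_10_YEARS = SECONDS_PER_DAY * 365 * 10

# Case 1: down 10 seconds every day, for 10 years.
downtime_1 = 10 * 365 * 10
availability_1 = 1 - downtime_1 / SECONDS_PER_10_YEARS

# Case 2: down for one whole 8-hour business day, once in 10 years.
downtime_2 = 8 * 3_600
availability_2 = 1 - downtime_2 / SECONDS_PER_10_YEARS

print(f"{availability_1:.4%}")  # 99.9884%
print(f"{availability_2:.4%}")  # 99.9909%
```

The SLA number hides the difference between many tiny interruptions and one long outage, which is exactly the distinction the next paragraph draws.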

People building real-world systems often fail to separate the idea of regularity of service from the impact of a single interruption event. More specifically, a relentless focus on driving up mean time to failure (MTTF) can introduce complexities that might increase mean time to repair (MTTR). Although MTTF can be difficult to quantify, MTTR can be tested and understood, making it a smart move to take steps to reduce MTTR.

One of the great things about Kubernetes is that it significantly reduces the MTTR for most application-level outages from potentially hours (if operators are involved in recovery operations) to seconds. You can move a container to a new node in a matter of seconds.

Many Kubernetes adopters are giddy when they realize that many formerly pageable events become a non-issue—the cluster detects a failure, restarts the container, and no one is the wiser. You can significantly boost availability for a traditional application by containerizing it, making sure the health-checking model is correct, and deploying it on Kubernetes. Moreover, application-level outages usually dominate MTTF. Kubernetes can smooth over outages and isolate your application from the effects of node failures.

Kubernetes cluster architecture

Almost every significant Kubernetes outage event has been the result of operator error. Often, an operator mistypes a command and pushes a broken configuration to a large number of nodes. While federation technologies that stitch together deployments across multiple zones solve the problem of zone failure, when a single control plane stretches across zones, the impact radius of mistakes also stretches across zones.


FIGURE 2: Examples of Kubernetes cluster architecture: a single cluster spanning multiple availability zones with one control plane (etcd, API server, scheduler, controller manager) and workers in each zone, versus a cluster per availability zone, each with its own control plane and workers.

Here are several tips to keep in mind for your Kubernetes cluster architecture:

• A modest substitution of human toil for fancy federation will often insulate you from operator-driven outages spanning zones. Simpler systems might achieve better overall availability, particularly when you factor in MTTR.

• To achieve availability without federation, run two or more independent Kubernetes clusters in different failure zones with a load balancer in front of them to distribute the load. In each zone, implement an isolated scaling mechanism that allows a cluster to grow if all the load is delivered to that single zone. If an operator breaks something in a zone, a simple configuration update allows you to recover with a low MTTR.

• If you do decide to use a technology that federates or spreads loads across failure domains to achieve very high availability, consider using something built and run by a public cloud provider. Build in technical or procedural controls to avoid pushing configurations that affect multiple failure domains simultaneously.

• Unless you are on the absolute bleeding edge of distributed systems design and have deep operations experience, you are not going to get better top-line availability for a service by actively running it across multiple public cloud providers.
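The two-independent-clusters tip above can be approximated at the load-balancing layer with a simple health-checked failover. A toy sketch; the cluster endpoints and health check are illustrative:

```python
def pick_cluster(clusters, is_healthy):
    """Route to the first healthy cluster; raise if every zone is down."""
    for endpoint in clusters:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy cluster available")

clusters = ["k8s-zone-a.example.com", "k8s-zone-b.example.com"]

# Normal operation: traffic goes to zone A.
print(pick_cluster(clusters, lambda c: True))  # k8s-zone-a.example.com

# An operator breaks zone A; the health check (or a one-line config change)
# shifts all traffic to zone B with a low MTTR.
down = {"k8s-zone-a.example.com"}
print(pick_cluster(clusters, lambda c: c not in down))  # k8s-zone-b.example.com
```

Because each cluster is administered separately, a mistyped command in one zone cannot take the other zone down with it.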

Actively running applications across multiple cloud providers is not recommended today; it likely creates more problems than it solves. It can’t hurt, however, to have a playbook that allows you to get critical services up and running in another cloud if absolutely necessary. Consider:

• A framework that allows you to turn up a critical service somewhere else to achieve a business-survivable MTTR

• Technology that supports the propagation of critical data to another cloud for safekeeping

• Testing this failover from time to time

Section 4: Operating Kubernetes

Backup and recovery options

Data protection in a Kubernetes environment is somewhat different—but no less important—than in a traditional IT environment.

With traditional backup, there’s generally a strong correspondence between an application and a server—either physical or virtual. Back up that server or VM, and you can be pretty sure you can restore the application if a problem occurs. In Kubernetes, the master and worker nodes are stateless. If one of them fails for any reason, you don’t restore it from a backup; you just make sure that another node is started to take its place.


FIGURE 3: The API request process.

However, there are two things that absolutely must be backed up:

• etcd – Because all configuration and state information for a Kubernetes cluster is stored in etcd, it is critical to protect it. Backups of etcd may be used to recover in case of catastrophe and may also be used to clone an existing cluster.

• Persistent volumes – Applications running in a Kubernetes environment may use persistent storage volumes that allow them to save data and state, even as container instances come and go.

etcd has built-in backup and restore tooling that is useful for recovering from data loss in a single etcd cluster. For example, it is a good idea to take a backup of etcd before upgrading it. Keep in mind that this means of recovery requires a complete cluster outage.

There’s currently nothing built into Kubernetes for backup and recovery of persistent volumes. Well-known backup vendors for traditional enterprise IT haven’t done much work to accommodate Kubernetes—at least not yet.

For backups of persistent storage, and more sophisticated management of Kubernetes cluster backups and restores, there are a number of open-source projects. One popular example is Velero, which helps manage disaster recovery for Kubernetes cluster resources and persistent volumes. It provides a simple, configurable, and operationally robust way to back up and restore applications and persistent volumes from a series of checkpoints, reducing time to recover in the case of infrastructure loss, data corruption, and service outages. Velero works with Kubernetes deployments both on premises and in the cloud.

In addition to backup and recovery, Velero can also do the following:

• Migrate Kubernetes resources from one cluster to another

• Replicate production environments to create development and testing environments

The flow depicted in Figure 3: an incoming API request passes through authentication (has this user proven their identity?), access control (is this user allowed to perform this action?), and admission control (does this request look good?). If every check passes, the request is processed; otherwise it fails.


FIGURE 4: Examples of custom controllers and custom API servers.

Operating Kubernetes

Think proactively about Kubernetes operations rather than reacting to problems as they arise. Successful Kubernetes deployments are just as much a cultural shift as they are a technical one. Although the underlying goals are similar to conventional IT, operating Kubernetes requires a different mindset. Kubernetes is simple relative to the frameworks of the past. It is best to adopt the patterns of Kubernetes and work with the framework rather than against it.

Think about the user experience up front and avoid trying to deliver everything. If you have great tools that your users already know, use them. That helps ensure that users stay happy during the transition.

There are three important areas to understand:

• Platform – Think carefully about how you deploy etcd and nodes, and the architecture of your Kubernetes environment as a whole.

• API server – Take advantage of the management capabilities that the API server provides by default.

• Controllers – Utilize controllers to extend Kubernetes functionality for specific needs.

The platform decisions you make during installation affect operations. Configuration parameters need to be adjusted to match your hardware, and they are not dynamically tunable in production—at least not yet. Other platform considerations include the following:

• etcd – Most Kubernetes components are ephemeral; etcd is the exception. It should be deployed on three to five nodes, and those nodes should be external to the Kubernetes cluster.

The left-hand pattern in Figure 4, "Custom controllers/CRD," adds a custom controller alongside the standard control plane components (etcd, API server, scheduler, controller manager) to act on pods. The right-hand pattern, "Aggregate API," adds a custom API server and a custom controller manager, backed by their own etcd, next to the core control plane.


• Additional services – What additional services will you deploy to help you manage your cluster?

According to a CNCF survey, some of the most popular services include:

• Conformance testing with Sonobuoy

• Packaging with Helm

• Ingress with nginx, HAProxy, and Envoy

• Separation with namespaces, customers, and labels

• Recovery with Velero

• Availability – Application availability—and therefore cluster architecture—can be a critical consideration.

API server

The Kubernetes API server gives you control over important cluster management functions:

• Authentication – Most Kubernetes installers provide you with a single user certificate. Many new sites end up using that single certificate for everything with no real authentication strategy. Kubernetes itself does not define users. It delegates handling of users to another process and integrates with existing site-wide user management. OpenID Connect (OIDC) is recommended for authentication. You will need Dex to integrate OIDC with Active Directory and LDAP. The Gangway project can automate user onboarding.

• Access control – Always implement role-based access control (RBAC) and don’t forget to lock down the Kubernetes dashboard; it provides access to everything.

• Admission control – This is where you can begin to implement business logic and exercise greater control over user actions.

• Resource quotas can prevent a single pod from taking control of an entire worker node.

• Pod security policies provide a broad set of rules to control what pods can and cannot do.

• Dynamic admission controls allow you to modify or mutate incoming user requests on the fly to conform to rules and standards.
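A mutating (dynamic) admission webhook receives the object a user submitted and returns a patch that brings it up to standard. The sketch below shows only the mutation logic; the `cost-center` label and its defaulting rule are hypothetical examples, and a real webhook wraps this logic in an HTTPS handler that speaks the AdmissionReview protocol:

```python
def mutate_pod(pod: dict) -> list:
    """Return a JSONPatch that brings a submitted pod spec up to site standards."""
    patch = []
    labels = pod.get("metadata", {}).get("labels", {})
    # Hypothetical site rule: every pod must carry a cost-center label.
    if "cost-center" not in labels:
        if "labels" not in pod.get("metadata", {}):
            # Create the labels map first so the second patch path is valid.
            patch.append({"op": "add", "path": "/metadata/labels", "value": {}})
        patch.append({"op": "add",
                      "path": "/metadata/labels/cost-center",
                      "value": "unassigned"})
    return patch

# A pod submitted without any labels receives two patch operations.
pod = {"metadata": {"name": "web"}, "spec": {"containers": []}}
print(mutate_pod(pod))
```

The user’s request is modified on the fly before it is persisted, so the standard is enforced without changing any developer workflow.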

Custom controllers and custom API servers

In Kubernetes, controllers are implemented to ensure that the observed state of the cluster is as close as possible to the desired state.

Kubernetes allows you to define custom resources using a custom resource definition (CRD) and act on those resources with custom controllers and custom API servers. This allows you to flexibly satisfy unique application and business requirements.

For example, suppose you want each container in a pod to connect to a specific service, such as a database. A custom controller can ensure this happens every time that type of container is started and also unregister the container when it terminates. A custom controller can be implemented with as few as 20 lines of code.
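That line count is plausible once you see the shape of a controller: a loop that compares desired state with observed state and acts on the difference. A conceptual sketch only (real controllers watch the API server rather than taking dictionaries as arguments):

```python
def reconcile(desired: dict, observed: dict, create, delete):
    """Drive observed state toward desired state, one pass at a time."""
    for name in desired.keys() - observed.keys():
        create(name)   # e.g., register the container with its database
    for name in observed.keys() - desired.keys():
        delete(name)   # e.g., unregister a terminated container

# Toy example: two containers should exist, none are registered yet.
registered = set()
reconcile(
    desired={"web-1": {}, "web-2": {}},
    observed={},
    create=registered.add,
    delete=registered.discard,
)
print(sorted(registered))  # ['web-1', 'web-2']
```

Running this loop repeatedly against live state is the whole trick: the controller never assumes its last action succeeded, it simply re-derives the difference each pass.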

Kubernetes API aggregation provides a further level of Kubernetes customization, allowing you to create your own API server. When the Kubernetes API server receives a request for an object it doesn’t know, it proxies that request to your custom API server. Setup is slightly more difficult because a custom API server has its own etcd for storing state. Aggregation is most useful in situations where there is a lot of churn and the potential to impact the cluster etcd.


There are samples available to help you get started creating a custom controller and a custom API server.

Secrets to streamline Kubernetes upgrades

Any change to running software involves risk. This is doubly true for infrastructure software such as Kubernetes, where production services depend on the software.

The risks associated with upgrading a Kubernetes cluster include the following:

• etcd data corruption – A cluster can become unusable. Every node may need to be reset to restore proper operation.

• Control plane downtime – Applications remain running, but you are unable to make changes.

Paying attention to these tips can help ensure upgrade success.

Here’s a pre-upgrade checklist to consider before upgrading Kubernetes:

• Review the release notes for the new release, especially any dependencies. Use supported versions of dependencies such as Docker.

• Perform a backup and verify it can be restored.

• Create and review your upgrade plan.

• Verify the upgrade plan in a non-production cluster if possible.

• Upgrade masters first when the upgrade commences. The control plane components are designed to be as backward-compatible as possible, so you can upgrade the worker nodes after the masters. Upgrading the worker nodes first runs the risk of introducing changes not understood by the control plane.

• Don’t skip versions. While skipping versions during an upgrade may appear to work, it is not advised. Pay special attention to the package versions when updating binaries; it’s easy to accidentally upgrade to the latest version.

• Test cluster conformance both before and after upgrades. While there are different methods available to test cluster conformance, there is no better way to determine that a cluster is functioning as intended than running the end-to-end conformance tests. These tests will give you a better understanding of the state of your cluster. Sonobuoy makes running end-to-end conformance easier.
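The "don’t skip versions" rule from the checklist can be enforced mechanically before an upgrade begins. A small sketch that rejects any jump of more than one minor version:

```python
def minor_step_ok(current: str, target: str) -> bool:
    """Allow upgrades of at most one minor version (e.g., 1.13 -> 1.14)."""
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.split(".")[:2])
    if tgt_major != cur_major:
        return False  # major-version jumps need their own plan
    return 0 <= tgt_minor - cur_minor <= 1

print(minor_step_ok("1.13.5", "1.14.1"))  # True
print(minor_step_ok("1.13.5", "1.15.0"))  # False: skips 1.14
```

A guard like this in your upgrade automation turns the "it’s easy to accidentally upgrade to the latest version" mistake into a failed pre-check instead of an outage.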

Section 5: Final thoughts and learning more

Cloud native architecture and Kubernetes are creating a sea change in the way enterprises design, create, deploy, and operate applications. Many enterprises are moving quickly to adopt these technologies to stay competitive. The topics discussed in this white paper serve as a useful starting point for anyone beginning the Kubernetes journey. The following resources can help you learn more:

• White paper – How to Think Cloud Native

• eBooks:

– Kubernetes Up and Running

– Cloud Native Infrastructure

– Managing Kubernetes

LEARN MORE ABOUT CLOUD NATIVE TECHNOLOGY FROM VMWARE

To find out more about how VMware can help you build, run, and manage cloud native applications, visit https://cloud.vmware.com/.

VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 vmware.com Copyright © 2019 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. and its subsidiaries in the United States and other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item No: 217250aq-vmw-wp-principles-kubernetes-uslet 5/19