JOURNEY FROM DOCKER TO KUBERNETES

Eshwar Chandra
[email protected]

Knowledge Sharing Article © 2017 Dell Inc. or its subsidiaries.


Table of Contents

Preface
Introduction to Docker and Kubernetes
Kubernetes Architecture
Automated Container Management
Container Engine features
Running Containers
Kubernetes work units
Conclusion

Disclaimer: The views, processes or methodologies published in this article are those of the author. They do not necessarily reflect Dell EMC’s views, processes or methodologies.


Preface

Containers have changed the way applications are packaged, distributed, and deployed in the modern datacenter. They provide the perfect abstraction for complex applications in the form of an image, which bundles an application, along with its dependencies, into a single artifact that’s easy to distribute and run under a container runtime engine such as Docker or rkt.

Containers offer a lighter, more agile alternative to virtual machines for isolation between applications. The ease of building and running containers has led to very high-density application deployments, which in turn has driven a need for more robust container management tools that remain optimal in performance, resource utilization, and portability between different platforms.


Introduction to Docker and Kubernetes

Kubernetes is a powerful system for managing containerized applications in a clustered environment. Developed by Google, it aims to provide better ways of managing related, distributed components across varied infrastructure.

In this guide, I will discuss some basic concepts of Kubernetes, including the architecture of the system, the problems it solves, and the model it uses to handle containerized deployments and scaling.

What is Kubernetes?

Kubernetes is a system for managing containerized applications across a cluster of nodes. In many ways, it is designed to address the disconnect between the way that modern, clustered infrastructure is designed, and some of the assumptions that most applications and services have about their environments.

Most clustering technologies strive to provide a uniform platform for application deployment. The user should not have to care much about where work is scheduled. The unit of work presented to the user is at the "service" level and can be accomplished by any of the member nodes.

However, in many cases, it does matter what the underlying infrastructure looks like. When scaling out an app, an administrator cares that the various instances of a service are not all being assigned to the same host.

On the other side of things, many distributed applications built with scaling in mind are actually made up of smaller component services. These services must be scheduled on the same host as related components if they are going to be configured in a trivial way. This becomes even more important when they rely on specific networking conditions in order to communicate appropriately.

While it is possible with most clustering software to make these types of scheduling decisions, operating at the level of individual services is not ideal. Applications composed of different services should still be managed as a single application in most cases. Kubernetes provides a layer over the infrastructure to allow for this type of management.


Components of Kubernetes

Infrastructure-level systems like CoreOS strive to create a uniform environment where each host is disposable and interchangeable. Kubernetes, on the other hand, operates with a certain level of host specialization.

The controlling services in a Kubernetes cluster are called the master, or control plane, components. These operate as the main management contact points for administrators, and also provide many cluster-wide systems for the relatively dumb worker nodes. These services can be installed on a single machine or distributed across multiple machines.

The servers running these components have a number of unique services that are used to manage the cluster's workload and direct communications across the system. Below, we will cover these components.


Etcd

One of the fundamental components that Kubernetes needs to function is a globally available configuration store. etcd is a lightweight, distributed key-value store that can be distributed across multiple nodes.

Kubernetes uses etcd to store configuration data that can be used by each of the nodes in the cluster. This can be used for service discovery, and it represents the state of the cluster that each component can reference to configure or reconfigure itself. Values can be set and retrieved through a simple HTTP/JSON API.

Like most other components in the control plane, etcd can be configured on a single master server or, in production scenarios, distributed among a number of machines. The only requirement is that it be network accessible to each of the Kubernetes machines.
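As a minimal sketch of that HTTP/JSON interface, the snippet below sets and reads a key through etcd's v2 keys API. The endpoint address and key name are assumptions for illustration:

```python
import requests

ETCD = "http://127.0.0.1:2379"  # assumed etcd endpoint; adjust for your cluster

# Store a configuration value under a key (etcd v2 keys API).
requests.put(ETCD + "/v2/keys/config/db_host", data={"value": "10.0.0.5"})

# Any node that can reach etcd can read the value back as JSON.
resp = requests.get(ETCD + "/v2/keys/config/db_host")
print(resp.json()["node"]["value"])  # -> 10.0.0.5
```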

API Server

This is the management point of the entire cluster. The API server:

- allows a user to configure Kubernetes' workloads and organizational units;
- ensures that the etcd store and the service details of deployed containers are in agreement;
- acts as the bridge between various components to maintain cluster health and disseminate information and commands;
- implements a RESTful interface, so many different tools and libraries can readily communicate with it.

kubecfg is the client packaged along with the server-side tools and can be used from a local computer to interact with the Kubernetes cluster.
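Because the API server speaks plain REST, even a generic HTTP client can talk to it. A minimal sketch, assuming `kubectl proxy --port=8001` is running locally so the API is reachable without credentials:

```python
import requests

# Assumes `kubectl proxy --port=8001` is forwarding the API server locally.
API = "http://127.0.0.1:8001"

# List pods in the default namespace through the RESTful interface.
pods = requests.get(API + "/api/v1/namespaces/default/pods").json()
for item in pods["items"]:
    print(item["metadata"]["name"], item["status"]["phase"])
```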

Controller Manager Service

This is responsible for a number of controllers that regulate the state of the cluster and perform routine tasks. For example, the replication controller ensures that the number of replicas defined for a service matches the number currently deployed on the cluster. The details of these operations are written to etcd, where the controller manager watches for changes through the API server.

When a change is seen, the controller reads the new information and implements the procedure that fulfills the desired state. This can involve scaling an application up or down, adjusting endpoints, and so on.
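The heart of such a controller is a reconciliation loop: observe the actual state, compare it against the desired state, and act on the difference. The sketch below is a deliberately simplified illustration; the callback names are hypothetical, and a real controller watches the API server rather than polling:

```python
import time

def reconcile(desired_replicas, list_running, start_pod, stop_pod):
    """Toy reconciliation loop: converge the observed state toward the
    desired state. list_running/start_pod/stop_pod are hypothetical hooks."""
    while True:
        running = list_running()  # observed state, e.g. queried via the API server
        if len(running) < desired_replicas:
            for _ in range(desired_replicas - len(running)):
                start_pod()       # scale up to reach the desired count
        elif len(running) > desired_replicas:
            for pod in running[desired_replicas:]:
                stop_pod(pod)     # scale down the surplus
        time.sleep(5)             # real controllers watch for changes instead
```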

Page 7: JOURNEY FROM DOCKER TO KUBERNETES

2018 Dell EMC Proven Professional Knowledge Sharing 7

Scheduler Service

This is the process that assigns workloads to specific nodes in the cluster. The scheduler service reads in a service's operating requirements, analyzes the current infrastructure environment, and places the work on an acceptable node or nodes.

The scheduler is responsible for tracking resource utilization on each host to make sure that workloads are not scheduled in excess of the available resources. The scheduler must know the total resources available on each server, as well as the resources allocated to existing workloads assigned on each server.
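In the simplest terms, that bookkeeping lets the scheduler filter out nodes without enough headroom and then rank the rest. A naive illustration, not the actual Kubernetes scheduling algorithm:

```python
def pick_node(nodes, cpu_request, mem_request):
    """Naive placement: keep nodes with enough free capacity, then
    prefer the one with the most free CPU (a simple spreading strategy)."""
    candidates = [
        n for n in nodes
        if n["cpu_total"] - n["cpu_used"] >= cpu_request
        and n["mem_total"] - n["mem_used"] >= mem_request
    ]
    if not candidates:
        raise RuntimeError("no node has enough free resources")
    return max(candidates, key=lambda n: n["cpu_total"] - n["cpu_used"])

nodes = [
    {"name": "node-1", "cpu_total": 4, "cpu_used": 3, "mem_total": 8, "mem_used": 6},
    {"name": "node-2", "cpu_total": 4, "cpu_used": 1, "mem_total": 8, "mem_used": 2},
]
print(pick_node(nodes, cpu_request=1, mem_request=2)["name"])  # -> node-2
```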

Node Server Components

In Kubernetes, nodes are the servers that perform work. Node servers have a few requirements that are necessary to communicate with the master components, configure the networking for containers, and run the actual workloads assigned to them.

Docker Running on a Dedicated Subnet

The first requirement of each individual node server is Docker. The Docker service is used to run encapsulated application containers in a relatively isolated but lightweight operating environment. Each unit of work is, at its basic level, implemented as a series of containers that must be deployed.

One key assumption that Kubernetes makes is that a dedicated subnet is available to each node server. This is not the case with many standard clustered deployments. For instance, with CoreOS, a separate networking fabric called flannel is needed for this purpose. Docker must be configured to use this so that it can expose ports in the correct fashion.

Kubelet Service

The main contact point for each node with the cluster group is through a small service called kubelet. This service is responsible for relaying information to and from the control plane services, as well as interacting with the etcd store to read configuration details or write new values.

The kubelet service communicates with the master components to receive commands and work. Work is received in the form of a "manifest" which defines the workload and the operating parameters. The kubelet process then assumes responsibility for maintaining the state of the work on the node server.

Proxy Service

To deal with individual host subnetting and make services available to external parties, a small proxy service is run on each node server. This process forwards requests to the correct containers, can execute primitive load balancing, and is generally responsible for making sure the networking environment is predictable and accessible, but isolated.
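To make the "primitive load balancing" concrete, here is a toy stand-in that simply rotates requests across a service's backend addresses. The pod IPs are invented for illustration:

```python
import itertools

class RoundRobinProxy:
    """Toy stand-in for the node proxy's primitive load balancing:
    each lookup returns the next backend in turn."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def route(self):
        return next(self._cycle)

# Assumed pod addresses on the container subnet.
proxy = RoundRobinProxy(["10.244.1.3:8080", "10.244.2.7:8080"])
for _ in range(4):
    print(proxy.route())  # alternates between the two backends
```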


Automated Container Management

Google Container Engine is a powerful cluster manager and orchestration system for running your Docker containers. Container Engine schedules your containers into the cluster and manages them automatically based on requirements you define (such as CPU and memory). It's built on the open source Kubernetes system, giving you the flexibility to take advantage of on-premises, hybrid, or public cloud infrastructure.

Set Up a Cluster in Minutes

Set up a managed container cluster of virtual machines, ready for deployment, in just minutes. Your cluster is equipped with capabilities, such as logging and container health checking, to make application management easier.


Declarative Management

Declare your containers' requirements – such as the amount of CPU/memory to reserve, the number of replicas, and the keep-alive policy – in a simple JSON config file. Container Engine will schedule your containers as declared, and actively manage your application to ensure requirements are met.
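For a feel of what such a declarative file might contain, here is a small sketch. The field names are illustrative only, not the exact Container Engine schema:

```python
import json

# Illustrative declarative spec; field names are not an exact schema.
spec = {
    "name": "web-frontend",
    "image": "nginx:1.9",
    "resources": {"cpu": "500m", "memory": "256Mi"},  # reserve half a CPU, 256 MiB
    "replicas": 3,
    "restartPolicy": "Always",                        # the keep-alive policy
}
print(json.dumps(spec, indent=2))  # the JSON the orchestrator would consume
```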

Flexible & Interoperable

With companies like CoreOS, Huawei, IBM, OpenStack, Red Hat, and VMware (and the list keeps growing) integrating Kubernetes into their platforms, you'll be able to move workloads, or take advantage of multiple cloud providers, more easily.


Container Engine features

Kubernetes and Docker Swarm are probably the two most commonly used tools to deploy containers inside a cluster. Both are created as helper tools that can be used to manage a cluster of containers and treat all servers as a single unit. However, they differ greatly in their approach.

Kubernetes

Kubernetes is based on Google’s experience of many years working with Linux containers. It is, in a way, a replica of what Google has been doing for a long time but, this time, adapted to Docker. That approach is great in many ways, the most important being that they used their experience from the start. If you started using Kubernetes around Docker version 1.0 (or earlier), the experience with Kubernetes was great. It solved many of the problems that Docker itself had: we could mount persistent volumes that allowed us to move containers without losing data, it used flannel to create networking between containers, it had a load balancer integrated, it used etcd for service discovery, and so on. However, Kubernetes comes at a cost. It uses a different CLI, API, and YAML definitions. In other words, you cannot use the Docker CLI, nor can you use Docker Compose to define containers. Everything needs to be done from scratch exclusively for Kubernetes. It’s as if the tool was not written for Docker (which is partly true). Kubernetes brought clustering to a new level, but at the expense of usability and a steep learning curve.


Docker Swarm

Swarm is native clustering for Docker. The best part is that it exposes the standard Docker API, meaning that any tool you used to communicate with Docker (the Docker CLI, Docker Compose, Dokku, Krane, and so on) can work equally well with Docker Swarm. That in itself is both an advantage and a disadvantage. Being able to use familiar tools of your own choosing is great, but for the same reason we are bound by the limitations of the Docker API. If the API doesn’t support something, there is no way around it through the Swarm API, and some clever tricks need to be performed.

We’ll explore these two tools in more detail based on their setup and the features they provide for running containers in a cluster.

Running Containers

With Swarm, there is no need to redefine all the arguments used for running Docker containers. If you run containers through the Docker CLI, you can continue using it with nearly the same commands. If you prefer to use Docker Compose to run containers, you can continue using it to run them inside the Swarm cluster. Whichever way you’re used to running your containers, chances are that you can continue doing the same with Swarm, but on a much larger scale, as the sketch below illustrates.
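For instance, because (classic) Swarm exposes the standard Docker API, the Docker SDK for Python works against a Swarm manager exactly as it does against a single engine. The manager address here is an assumption for illustration:

```python
import docker  # the Docker SDK for Python (pip install docker)

# Assumed: a classic Swarm manager exposing the Docker API on this address.
client = docker.DockerClient(base_url="tcp://swarm-manager:2375")

# The same call you would make against a single Docker engine.
container = client.containers.run("nginx:latest", detach=True)
print(container.short_id)
```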

Kubernetes requires you to learn its own CLI and configurations. You cannot use the docker-compose.yml definitions you created earlier; you’ll have to create their Kubernetes equivalents. You cannot use the Docker CLI commands you learned before; you’ll have to learn the Kubernetes CLI and, likely, make sure that the whole organization learns it as well.

No matter which tool you choose for deployments to your cluster, chances are you are already familiar with Docker. You are probably already used to Docker Compose as a way to define arguments for the containers you’ll run. If you played with it for more than a few hours, you are using it as a substitute for the Docker CLI. You run containers with it, tail their logs, scale them, and so on. On the other hand, you might be a hardcore Docker user who does not like Docker Compose and prefers running everything through the Docker CLI, or you might have your own bash scripts that run containers for you. No matter what you choose, it should work with Docker Swarm.

If you adopt Kubernetes, be prepared to have multiple definitions of the same thing. You will need Docker Compose to run your containers outside Kubernetes. Developers will continue needing to run containers on their laptops, your staging environments might or might not be a big cluster, and so on. In other words, once you adopt Docker, Docker Compose or the Docker CLI is unavoidable. You have to use them one way or another. Once you start using Kubernetes, you will discover that all your Docker Compose definitions (or whatever else you might be using) need to be translated to the Kubernetes way of describing things and, from there on, you will have to maintain both. With Kubernetes, everything will have to be duplicated, resulting in a higher maintenance cost. And it’s not only about duplicated configurations. Commands you’ll run outside of the cluster will be different from those inside the cluster. All those Docker commands you know and love will have to get their Kubernetes equivalents inside the cluster.

The people behind Kubernetes are not trying to make your life miserable by forcing you to do things “their way”. The reason for such big differences lies in the different approaches Swarm and Kubernetes use to tackle the same problem. The Swarm team decided to match their API with the one from Docker. As a result, we have (almost) full compatibility. Almost everything we can do with Docker we can do with Swarm as well, only on a much larger scale. There’s nothing new to do, no configurations to be duplicated, and nothing new to learn. Whether you use the Docker CLI directly or go through Swarm, the API is (more or less) the same. The negative side is that if there is something you’d like Swarm to do and that something is not part of the Docker API, you’re in for a disappointment. Let us simplify this a bit. If you’re looking for a tool for deploying containers in a cluster that will use the Docker API, Swarm is the solution. On the other hand, if you want a tool that will overcome Docker’s limitations, you should go with Kubernetes. It is power (Kubernetes) against simplicity (Swarm). Or, at least, that’s how it was until recently. But I’m jumping ahead of myself.

The only question unanswered is what those limitations are. Two of the major ones were networking and persistent volumes. Until Docker Swarm release 1.0 we could not link containers running on different servers. Actually, we still cannot link them, but now we have multi-host networking to help us connect containers running on different servers. It is a very powerful feature. Kubernetes used flannel to accomplish networking and now, since Docker release 1.9, that feature is available as part of the Docker CLI.

Another problem was persistent volumes. Docker introduced them in release 1.9. Until recently, if you persisted a volume, that container was tied to the server where the volume resided. It could not be moved around without, again, resorting to some nasty tricks like copying the volume directory from one server to another. That in itself is a slow operation that defies the goals of tools like Swarm. Besides, even if you have time to copy a volume from one server to another, you do not know where to copy it, since clustering tools tend to treat your whole datacenter as a single entity. Your containers will be deployed to the location most suitable for them (least number of containers running, most CPUs or memory available, and so on). Now we have persistent volumes supported by Docker natively.

Both networking and persistent volumes were features supported by Kubernetes for quite some time and a reason why many were choosing it over Swarm. That advantage disappeared with Docker release 1.9.

The Choice

We should think in the following terms when trying to make a choice between Docker Swarm and Kubernetes. Do you want to depend on Docker itself to solve problems related to clustering? If so, choose Swarm. If something is not supported by Docker, it is unlikely that it will be supported by Swarm, since Swarm relies on the Docker API. On the other hand, if you want a tool that works around Docker’s limitations, Kubernetes might be the right one for you. Kubernetes was not built around Docker but is based on Google’s experience with containers. It is opinionated and tries to do things in its own way.

The real question is whether Kubernetes’ way of doing things, which is quite different from how we use Docker, is overshadowed by the advantages it gives. Or should we place our bets on Docker itself and hope that it will solve those problems? Before you answer those questions, take a look at Docker release 1.9. We got persistent volumes and software networking. We also got the unless-stopped restart policy that will manage unwanted failures. That is three fewer differences between Kubernetes and Swarm. Actually, these days there are very few advantages Kubernetes has over Swarm.

On the other hand, Swarm uses the Docker API, meaning that you get to keep all your commands and Docker Compose configurations. Personally, I’m placing my bets on the Docker engine getting improvements and Docker Swarm running on top of it. The difference between the two is very small. Both are production-ready, but Swarm is easier to set up and easier to use, and we get to keep everything we built before moving to the cluster; there is no duplication between cluster and non-cluster configurations.

My recommendation is to go with Docker Swarm. Kubernetes is too opinionated, hard to set up, and too different from the Docker CLI/API, and at the same time it doesn’t have real advantages over Swarm since Docker release 1.9. That doesn’t mean that there are no features available in Kubernetes that are not supported by Swarm. There are feature differences in both directions. However, those differences are, in my opinion, not major, and the gap is getting smaller with each Docker release. Actually, for many use cases there is no gap at all, while Docker Swarm is easier to set up, learn, and use.


Kubernetes work units

While containers are used to deploy applications, the workloads that define each type of work are specific to Kubernetes. We will go over the different types of "work" that can be assigned below.

Pods

The basic unit of Kubernetes is a pod. Containers themselves are not assigned to hosts. Instead, closely related containers are grouped together in a pod. A pod generally represents one or more containers that should be controlled as a single "application".

This association leads all of the involved containers to be scheduled on the same host. They are managed as a unit and they share an environment. This means that they can share volumes and IP space, and can be deployed and scaled as a single application. You can, and generally should, think of a pod as a single virtual computer in order to best conceptualize how its resources and scheduling should work.

The general design of a pod usually consists of a main container that satisfies the general purpose of the pod, and optionally some helper containers that facilitate related tasks. These are programs that benefit from being run and managed in their own containers but are heavily tied to the main application. Horizontal scaling is generally discouraged at the pod level because there are other units better suited to the task.

Services

We have been using the term "service" throughout this guide in a very loose fashion, but Kubernetes actually has a very specific definition for the word when describing work units. A service, when described this way, is a unit that acts as a basic load balancer and ambassador for other containers. A service groups together logical collections of pods that perform the same function to present them as a single entity.

This allows you to deploy a service unit that is aware of all of the backend containers to pass traffic to. External applications only need to worry about a single access point, but benefit from a scalable backend, or at least a backend that can be swapped out when necessary. A service's IP address remains stable, abstracting any changes to the pod IP addresses that can happen as nodes die or pods are rescheduled.

Services are an interface to a group of containers so that consumers do not have to worry about anything beyond a single access location. By deploying a service, you easily gain discoverability and can simplify your container designs.


Replication Controllers

A more complex version of a pod is a replicated pod. These are handled by a type of work unit known as a replication controller.

A replication controller is a framework for defining pods that are meant to be horizontally scaled. The work unit is, in essence, a nested unit. A template is provided, which is basically a complete pod definition. This is wrapped with additional details about the replication work that should be done.

The replication controller is delegated responsibility over maintaining a desired number of copies. This means that if a container temporarily goes down, the replication controller might start up another container. If the first container comes back online, the controller will kill off one of the containers.
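To make the nesting concrete, here is a sketch of a replication controller definition with a complete pod template embedded inside it. The structure follows the Kubernetes v1 API shape, but the names and image are illustrative:

```python
# Illustrative replication controller definition (Kubernetes v1 API shape).
replication_controller = {
    "kind": "ReplicationController",
    "apiVersion": "v1",
    "metadata": {"name": "web-rc"},
    "spec": {
        "replicas": 3,                   # desired number of copies
        "selector": {"app": "web"},      # which pods this controller owns
        "template": {                    # the nested, complete pod definition
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{"name": "web", "image": "nginx:1.9"}],
            },
        },
    },
}
```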

Labels

A Kubernetes organizational concept outside of the work-based units is labeling. A label is basically an arbitrary tag that can be placed on the above work units to mark them as a part of a group. These can then be selected for management purposes and action targeting.

Labels are fundamental to how both services and replication controllers function. To get a list of the backend servers that a service should pass traffic to, it usually selects containers based on label. Similarly, replication controllers give all of the containers spawned from their templates the same label. This makes it easy for the controller to monitor each instance. The controller or the administrator can manage all of the instances as a group, regardless of how many containers have been spawned.

Labels are given as key-value pairs. Each unit can have more than one label, but each unit can only have one entry for each key. You can stick with giving pods a "name" key as a general-purpose identifier, or you can classify them by various criteria such as development stage, public accessibility, application version, etc.

In many cases, you'll want to assign many labels for fine-grained control. You can then select based on single or combined label requirements.
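A minimal sketch of equality-based selection, in the spirit of how a service picks its backends; the pods and labels here are invented for illustration:

```python
def matches(labels, selector):
    """A unit matches when its labels include every key-value
    pair in the selector (equality-based selection)."""
    return all(labels.get(k) == v for k, v in selector.items())

pods = [
    {"name": "web-1", "labels": {"app": "web", "stage": "prod"}},
    {"name": "web-2", "labels": {"app": "web", "stage": "dev"}},
    {"name": "db-1",  "labels": {"app": "db",  "stage": "prod"}},
]

# Combined label requirements, e.g. a service selecting its backends.
backends = [p["name"] for p in pods
            if matches(p["labels"], {"app": "web", "stage": "prod"})]
print(backends)  # -> ['web-1']
```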

Conclusion

In conclusion, Kubernetes aims to offer a better management system. It is an exciting project that implements many functional improvements on top of clustered infrastructure, while other technologies do a great job of handling the clustering aspects themselves.


Appendix

https://www.docker.com/products/docker#/windows

https://www.digitalocean.com/community/tutorials/an-introduction-to-kubernetes

Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.