Building A Better Test Platform: A Case Study of Improving Apache HBase Testing with Docker Aleks Shulman, Dima Spivak


Post on 20-May-2020


Page 1: Building A Better Test Platform...© 2014 Cloudera, Inc. All rights reserved. 3 About Cloudera • Distributes CDH, Cloudera's 100% open-source distribution including Apache Hadoop

Building A Better Test Platform: A Case Study of Improving Apache HBase Testing with Docker

Aleks Shulman, Dima Spivak


2 © 2014 Cloudera, Inc. All rights reserved.

Outline
•  About Cloudera
•  Apache HBase
    •  Overview
    •  API compatibility
•  API compatibility testing framework
    •  Implementation and challenges
•  Intro. to containers and Docker
•  API compatibility testing framework, revisited
•  Conclusions


About Cloudera
•  Distributes CDH, Cloudera's 100% open-source distribution including Apache Hadoop.
•  Distributes Cloudera Manager (CM), a proprietary monitoring and management layer atop CDH.
•  Contributes heavily to the open source community.
    •  Employs 50 committers, holding 84 committerships.
    •  18 Apache projects started by Clouderans, including Flume, Sqoop, Sentry, etc.


What is Apache HBase?
•  “Apache HBase is the Hadoop database, a distributed, scalable, big data store.”
•  Built atop HDFS.
•  Low-latency, consistent, random-access.
•  Modeled after Google’s BigTable paper.
•  Non-relational.
•  NoSQL.


SQL vs. HBase

SQL       HBase API
CREATE    create
ALTER     modify*
DROP      disable, drop
INSERT    put
SELECT    get, scan
UPDATE    put
DELETE    delete

HBase API
Java      Full-featured.
REST      Easy to use.
Thrift    Multi-language support.
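The SQL-to-HBase correspondence above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary copies the slide's table, while the helper name `hbase_operations` is hypothetical and not part of any HBase API.

```python
# Mapping copied from the slide's table. The slide marks "modify" with an
# asterisk; its footnote is not part of the transcript, so it is omitted here.
SQL_TO_HBASE = {
    "CREATE": ["create"],
    "ALTER": ["modify"],
    "DROP": ["disable", "drop"],
    "INSERT": ["put"],
    "SELECT": ["get", "scan"],
    "UPDATE": ["put"],
    "DELETE": ["delete"],
}


def hbase_operations(sql_statement):
    """Return the HBase operations listed for the statement's leading SQL keyword.

    Hypothetical helper for illustration; raises KeyError for unlisted keywords.
    """
    keyword = sql_statement.strip().split()[0].upper()
    return SQL_TO_HBASE[keyword]
```

Note that INSERT and UPDATE both map to put, and that DROP needs two HBase steps (disable, then drop).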


CDH and Apache HBase

[Figure: timeline of the CDH release lines (CDH3.x, CDH4.x, CDH5.x) and their point releases alongside the HBase release lines 0.90.x, 0.92.x, 0.94.x, 0.96.x, and 0.98.x.]


API Compatibility
•  RPC compatibility:
    •  Incompatibilities around serialization/deserialization.

17:22:15 Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
17:22:15 *
17:22:15 api-compat-8.ent.cloudera.com(
17:22:15     at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
17:22:15     at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
17:22:15     at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
17:22:15     at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
17:22:15     at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
17:22:15     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
17:22:15     at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
17:22:15     at Client_4_3_0.setup(Client_4_3_0.java:716)
17:22:15     at Client_4_3_0.main(Client_4_3_0.java:63)
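The trace suggests that the client tried to read a value beginning with PBUF as a plain host:port string. The toy reproduction below shows that failure mode in isolation. It is not HBase code: `parse_host_port` is a hypothetical stand-in for the client-side parser, and both payloads (including the port number) are made up for illustration.

```python
def parse_host_port(data):
    """Toy stand-in for a client that expects 'host:port' text (not HBase code)."""
    text = data.decode("utf-8", errors="replace")
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("Not a host:port pair: " + text)
    return host, int(port)


# Hypothetical payloads: a plain-text address the old client understands, and
# a value with a "PBUF" prefix like the one echoed in the stack trace.
OLD_STYLE = b"api-compat-8.ent.cloudera.com:60000"
NEW_STYLE = b"PBUF\n*\napi-compat-8.ent.cloudera.com("


def client_can_read(data):
    """Return True if the toy client can parse the payload."""
    try:
        parse_host_port(data)
        return True
    except ValueError:
        return False
```

In this sketch `client_can_read(OLD_STYLE)` succeeds while `client_can_read(NEW_STYLE)` fails, which is the kind of mismatch an RPC compatibility test is meant to surface.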


API Compatibility
•  Binary compatibility:
    •  Client dependencies changing in an incompatible way can result in a runtime exception.
    •  Examples:
        •  HTable constructors removed in CDH4.2.

            public HTable(final String tableName)
            public HTable(final byte[] tableName)

        •  HBASE-8273: Change of HColumnDescriptor setter return types.
            •  Previously returned void; the builder pattern changed this.
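Why a changed return type breaks already-compiled clients: the JVM links a call site by method name plus descriptor, and the descriptor includes the return type. The sketch below builds descriptors for a setter before and after such a change. The helper functions are hypothetical, and the `int` parameter is chosen only for illustration; the slide does not name a specific setter.

```python
# JVM descriptor codes for a few primitive types; class types are written
# as L<internal/name>; with dots replaced by slashes.
PRIMITIVES = {"void": "V", "int": "I", "boolean": "Z", "long": "J"}


def type_descriptor(java_type):
    """Return the JVM descriptor for a primitive or class type name."""
    if java_type in PRIMITIVES:
        return PRIMITIVES[java_type]
    return "L" + java_type.replace(".", "/") + ";"


def method_descriptor(param_types, return_type):
    """Return the JVM method descriptor, e.g. (I)V for void f(int)."""
    params = "".join(type_descriptor(t) for t in param_types)
    return "(" + params + ")" + type_descriptor(return_type)


# Illustrative setter taking one int: void return before, builder-style after.
BEFORE = method_descriptor(["int"], "void")
AFTER = method_descriptor(["int"], "org.apache.hadoop.hbase.HColumnDescriptor")
```

Here `BEFORE` is `(I)V` and `AFTER` is `(I)Lorg/apache/hadoop/hbase/HColumnDescriptor;`. A client compiled against the first descriptor will not find it in a jar that only offers the second, even though the source code would recompile unchanged.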


API Compatibility

[Figure: HBase server versions (0.92.x, 0.94.x, 0.96.x), HBase client versions (0.92.x, 0.94.x, 0.96.x), and HBase application versions (v1.0, v1.1), spanning CDH4 and CDH5. Labels: "Binary compatibility", "RPC compatibility", "Partial rewrite likely necessary".]


Testing
1.  Build cluster with version X.
2.  Run API client with version Y.

X ≠ Y
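The two steps above imply a test matrix of every (server, client) pair with X ≠ Y. A minimal sketch of enumerating that matrix follows; the version strings are illustrative rather than the deck's actual matrix, and the function name is hypothetical.

```python
from itertools import permutations


def compatibility_pairs(versions):
    """Return every (server_version, client_version) pair with server != client.

    Assumes the input list contains no duplicate versions.
    """
    return list(permutations(versions, 2))


# Illustrative versions only; the real matrix is not given in the slides.
VERSIONS = ["cdh4.2.0", "cdh4.3.0", "cdh5.0.0"]
PAIRS = compatibility_pairs(VERSIONS)
```

With n versions this yields n × (n − 1) ordered pairs, so three versions give six cluster/client combinations to test.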


API testing implementation
•  Provisioning
    •  Provision host on private cloud.
    •  Static VMs.
•  Tarball framework
    •  Compile and package CDH server bits (version X).
    •  Start up the services as a single-node pseudo-distributed cluster (version X).
    •  Compile and run CDH client (version Y).
•  Problems
    •  Failure modes
        •  Infrastructure (VMs, hosts).
        •  Artifacts.
    •  Additional complexity
        •  Special configurations (e.g. Kerberos).
        •  Clean-up jobs for hosts.
        •  Adding/maintaining server and client versions.

No buffer between provisioning, packaging, deployment, and running tests.


Implementation alternatives
•  Can we reuse existing tools?
    •  CDH packaging
    •  Cloudera Manager*
•  Can we maximize utilization of computing resources?


Linux Containers

•  Isolated environments on a Linux host
•  cgroups (2.6.24+)
    •  Isolating resources (memory, CPU, networking)
•  Namespace isolation (filesystems, process trees)
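On a Linux host, a process's cgroup membership is visible in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. A small parsing sketch follows; the type and function names are hypothetical, and the sample path in the usage note is made up.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line):
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' line."""
    # Split on the first two colons only, so a path containing ':' stays whole.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = controllers.split(",") if controllers else []
    return CgroupEntry(int(hierarchy_id), names, path)
```

For example, `parse_cgroup_line("4:cpu,cpuacct:/docker/abc123")` yields hierarchy 4 with the cpu and cpuacct controllers. On a real Linux machine the same function can be applied to each line of /proc/self/cgroup.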


Linux Containers

[Figure: Virtual Machines vs. Containers. Virtual Machines: Hypervisor / Host Operating System at the base, with four stacks of Guest OS, Libraries, and User processes on top. Containers: Host Operating System and Libraries at the base, with four sets of User processes on top.]


Linux Containers

Advantages
•  Low overhead
•  Fast startup and shutdown

Disadvantages
•  Only host kernel-compatible operating systems
•  Less isolation*


Docker

•  User front-end for containers
•  Container management (start, stop, pause)
•  Images (templates for containers)
•  Registries (repository for images)


Docker

•  docker run
    •  Start container from image
•  docker build
    •  Create image from Dockerfile
•  docker commit
    •  Create image from container


Use cases
•  Single-process applications

    root@savard:~# docker run ubuntu:12.04 echo "Hello world"
    Hello world
    root@savard:~# _

•  Persistent containers
    •  Containers as daemons.
    •  init process

    root@savard:~# docker run -d ubuntu:12.04 /sbin/init
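The two use cases differ only in whether the container is detached with -d. A test harness might assemble both command lines with one helper, roughly as sketched below. The function name is hypothetical, and the helper only builds the argument list; it does not start Docker.

```python
def docker_run_argv(image, command, detach=False):
    """Build the argument list for 'docker run', optionally detached with -d."""
    argv = ["docker", "run"]
    if detach:
        argv.append("-d")  # run the container in the background
    argv.append(image)
    argv.extend(command)
    return argv


# The two invocations shown on the slide.
SINGLE_PROCESS = docker_run_argv("ubuntu:12.04", ["echo", "Hello world"])
PERSISTENT = docker_run_argv("ubuntu:12.04", ["/sbin/init"], detach=True)
```

On a host with Docker installed, either list could be passed to `subprocess.run`. Keeping the command as a list rather than a string avoids shell quoting problems with arguments such as "Hello world".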


API testing implementation, revisited
•  Problems
    •  Failure modes
        •  Infrastructure (VMs, hosts).
        •  Artifacts.
    •  Additional complexity
        •  Special configurations (e.g. Kerberos).
        •  Clean-up jobs for hosts.
        •  Adding/maintaining server and client versions.
•  Solutions
    •  Portability
        •  Images, internal registry.
        •  Artifacts built in.
    •  Reduce complexity
        •  Package configurations in images.
        •  Containers initialize to same state.
        •  No need to maintain all-in-one image creation logic.

Separated packaging, cluster deployment, and running tests; environment becomes a test parameter.


API testing implementation, revisited
•  HBase server
    •  Persistent containers.
    •  Run CM/CDH in a set of containers.
        •  1 container per CM host.
        •  1 container for DNS server.
    •  Create any size cluster from 2 images.
    •  Provision physical host once, switch cluster versions easily.
•  API client
    •  Single-process application.


Building cluster images

[Diagram: workflow steps "Start container cluster" and "Build cluster".]

•  DNS server: dnsserver (10.0.0.2)
•  Node: node-1 (10.0.0.3)
•  Node: node-2 (10.0.0.4)


Building cluster images

[Diagram: workflow steps "Deploy CM/CDH", "Validate cluster health", "Stop services", and "docker commit". Labels: Master, Slave.]

•  DNS server: dnsserver (10.0.0.2)
•  Node: node-1 (10.0.0.3)
•  Node: node-2 (10.0.0.4)


Starting cluster containers

[Diagram: workflow step "Start cluster".]

•  DNS server: dnsserver (10.0.0.2)
•  Node: node-1 (10.0.0.3), Master
•  Node: node-2 (10.0.0.4), Slave
•  Node: node-3 (10.0.0.5), Slave
•  Node: node-4 (10.0.0.6), Slave
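The addressing in the diagram follows a simple pattern: the DNS server takes 10.0.0.2 and node-i takes the i-th address after it. A sketch of generating that layout for a cluster of any size is below. The function name and dictionary keys are hypothetical, and treating node-1 as the only master is an assumption read off the diagram.

```python
import ipaddress


def cluster_layout(node_count, first_ip="10.0.0.2"):
    """Return container names, IPs, and roles for a DNS server plus N nodes.

    Assumption: node-1 is the master and every other node is a slave.
    """
    base = ipaddress.IPv4Address(first_ip)
    layout = [{"name": "dnsserver", "ip": str(base), "role": "DNS server"}]
    for i in range(1, node_count + 1):
        role = "Master" if i == 1 else "Slave"
        layout.append({"name": "node-%d" % i, "ip": str(base + i), "role": role})
    return layout


FOUR_NODE_CLUSTER = cluster_layout(4)
```

Calling `cluster_layout(4)` reproduces the five containers on the slide, from dnsserver at 10.0.0.2 through node-4 at 10.0.0.6. Growing the cluster only changes `node_count`, which matches the idea of creating any size cluster from two images.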


Starting cluster containers

[Diagram: workflow steps "Configure cluster", "Start services", and "Validate cluster health".]

•  DNS server: dnsserver (10.0.0.2)
•  Node: node-1 (10.0.0.3)
•  Node: node-2 (10.0.0.4)
•  Node: node-3 (10.0.0.5)
•  Node: node-4 (10.0.0.6)


Running API client
•  Script executing org.apache.hadoop.hbase.client.TestFromClientSide on distributed cluster.
•  Pass in environment variables:

    -e "HBASE_ZK_QUORUM=node-1.internal"
    -e "MODE=<rpc|binary>"

    docker run ubuntu12.04_cdh5.0.0_hbase_client

•  DNS server: dnsserver (10.0.0.2)
•  Node: node-1 (10.0.0.3)
•  Node: node-2 (10.0.0.4)
•  Node: node-3 (10.0.0.5)
•  Node: node-4 (10.0.0.6)
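Putting the pieces of this slide together, the client invocation is a docker run with two -e options and an image whose name encodes the OS and CDH version. A sketch of assembling it follows. Both function names are hypothetical, and the image naming pattern is inferred from the single image name on the slide, so treat it as an assumption.

```python
def client_image(os_tag, cdh_version):
    """Build an image name following the pattern seen on the slide (assumed)."""
    return "%s_cdh%s_hbase_client" % (os_tag, cdh_version)


def api_client_argv(image, zk_quorum, mode):
    """Build the 'docker run' argument list for one API client test run."""
    if mode not in ("rpc", "binary"):
        raise ValueError("MODE must be 'rpc' or 'binary', got %r" % mode)
    return [
        "docker", "run",
        "-e", "HBASE_ZK_QUORUM=%s" % zk_quorum,  # where the client finds the cluster
        "-e", "MODE=%s" % mode,                   # which compatibility check to run
        image,
    ]


ARGV = api_client_argv(client_image("ubuntu12.04", "5.0.0"), "node-1.internal", "rpc")
```

Because the client version lives in the image name and the cluster location in an environment variable, the same helper covers every client/server combination in the test matrix.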


Running API client

[Diagram: cluster containers dnsserver (10.0.0.2) and node-1 through node-4 (10.0.0.3 to 10.0.0.6), with workflow steps "docker run ubuntu12.04_cdh5.0.0_hbase_client", "Archive output", and "Reuse/destroy cluster".]


Conclusions
•  Recognized automation that was more trouble than it was worth.
    •  More effort went into setting up the test environment than into running the tests.
•  Overhauled test framework using Docker.
    •  Maximized utilization of computing resources.
    •  Made environment a test parameter.
•  Further work:
    •  Consider workflows beyond compatibility testing.
        •  Upgrade testing.
        •  Failure injection.
