Post on 07-Mar-2018

OpenTSDB + Bigtable

Integrating time series database with Google Cloud Bigtable

Danil Zburivsky, Big Data Practice Lead - @zburivsky

Christos Soulios, Big Data Architect - @c_soulios

Pythian specializes in design, implementation, and management of systems that directly contribute to revenue and business success.

History: 19 years in business

Growing at 30+% per year

400+ employees

300+ customers worldwide

HQ Ottawa, Canada - global reach

Technology agnostic = trusted advisor

Deep expertise: Oracle, Oracle Apps, MySQL, AWS, SQL Server, Cassandra/DataStax, Azure, PostgreSQL, Cloudera, MapR, Hortonworks etc.

Google Premier Partner Status (as of end Aug)

5 Certified Developers (soon to be 12)

Dedicated Google Technical Champion

Launch partner for: Kubernetes, Dataflow, Cloud SQL, Dataproc

Integrated OpenTSDB with Bigtable

DW Explorers Program Partner

Upcoming BigQuery & Cloud ML Launch Partner

Time series data

• (time, metric, value)

• OS and apps metrics

• Industrial equipment

• Web traffic

Storing time series data is a challenge

• Volume can be explosive

• Data arrival and access patterns are different

Better alternatives: specialized stores

• NoSQL

• Data model and storage optimized for time series

• Separate query language

OpenTSDB

• Open source

• Uses HBase as a data store

• Data model optimized for time series

• REST API
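To illustrate the REST API, the sketch below builds the JSON body for a single (time, metric, value) data point and an HTTP POST aimed at OpenTSDB's `/api/put` endpoint. The host, port 4242, metric name, and tag are placeholder assumptions, the class and method names are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PutRequestSketch {

    // Build the JSON body for one data point. No escaping is done here: this
    // assumes metric and tag names contain no quotes or backslashes.
    static String toJson(String metric, long epochSeconds, double value,
                         Map<String, String> tags) {
        // Sort the tags so the output is deterministic.
        String tagJson = new TreeMap<>(tags).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return "{\"metric\":\"" + metric + "\",\"timestamp\":" + epochSeconds
                + ",\"value\":" + value + ",\"tags\":{" + tagJson + "}}";
    }

    // Build (but do not send) a POST request for the /api/put endpoint.
    // baseUrl is an assumption, e.g. "http://localhost:4242".
    static HttpRequest putRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/put"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

A request built this way could be handed to `java.net.http.HttpClient` to write the point into a running TSD.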

Row key: <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]

Columns: <col_t+1>[...<col_t+N>]
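A minimal sketch of how a key in the layout above could be assembled. The 3-byte UID width, the 4-byte timestamp rounded down to the hour, and the use of the second offset within that hour to pick a column are assumptions based on OpenTSDB's default schema, not details given on the slide. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.SortedMap;

class RowKeySketch {
    static final int UID_WIDTH = 3;            // assumed UID width in bytes
    static final int ROW_SPAN_SECONDS = 3600;  // assumed: one row per hour

    // Encode the low UID_WIDTH bytes of an integer id, big-endian.
    static byte[] uid(int id) {
        byte[] out = new byte[UID_WIDTH];
        for (int i = UID_WIDTH - 1; i >= 0; i--) {
            out[i] = (byte) (id & 0xFF);
            id >>>= 8;
        }
        return out;
    }

    // <metric_uid><timestamp><tagk1><tagv1>...<tagkN><tagvN>
    // Tags are passed as a sorted map of tag-key UID to tag-value UID so the
    // same tag set always produces the same key.
    static byte[] rowKey(int metricUid, long epochSeconds,
                         SortedMap<Integer, Integer> tagUids) {
        long base = epochSeconds - (epochSeconds % ROW_SPAN_SECONDS);
        ByteBuffer buf = ByteBuffer.allocate(
                UID_WIDTH + 4 + tagUids.size() * 2 * UID_WIDTH);
        buf.put(uid(metricUid));
        buf.putInt((int) base);  // row timestamp, rounded down to the hour
        for (Map.Entry<Integer, Integer> tag : tagUids.entrySet()) {
            buf.put(uid(tag.getKey()));
            buf.put(uid(tag.getValue()));
        }
        return buf.array();
    }

    // Offset of a data point inside its row; this selects the column.
    static int offsetInRow(long epochSeconds) {
        return (int) (epochSeconds % ROW_SPAN_SECONDS);
    }
}
```

With this layout, all points of one metric and tag set within the same hour share a row, and a scan over a time range for a metric reads a contiguous block of row keys.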

OpenTSDB Architecture

[Diagram: servers send metrics to the TSDs (TSD RPC); the TSDs read and write HBase (HBase RPC); the Web UI (HTTP) and Scripts/Alerting (TSD RPC) query the TSDs.]

HBase can be too much

• HBase requires a full Hadoop setup (3xZK, 2xNN, 3xDN, 2xHMaster, 3xHRegion)

• HBase tuning is a job for the brave (HFiles, WAL, MemStore, BucketCache, BlockCache)

But all I wanted was a time series database

Google Cloud Bigtable

• Highly Scalable NoSQL database

• Low latency, high throughput

• Powers most Google products

• Available as a Google Cloud Service

Migrate HBase apps to Cloud Bigtable

• The Bigtable client is API compatible with HBase client

• Only replace hbase-client.jar with bigtable-hbase.jar

• No code changes required!

Migrate OpenTSDB to Cloud Bigtable

• OpenTSDB does not use standard hbase-client.jar

• OpenTSDB is based on AsyncHBase library

AsyncHBase library

• Open source HBase client library

• Multi-threaded: multiple threads use the same instance

• Fully asynchronous, non-blocking

• Implements the low level HBase RPCs

Detour: Asynchronous programming

Detour: Why asynchronous?

• Efficient thread usage

• Fewer threads = less memory

• CPU scheduler friendly

• Extremely high concurrency
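The points above can be illustrated with a small sketch in which one timer thread serves a thousand in-flight requests, because no thread sits blocked waiting on any single request. It uses the JDK's `CompletableFuture` as a stand-in; AsyncHBase itself uses its own `Deferred` type. `fakeGet` is a hypothetical simulated read, not a real client call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class AsyncDetourSketch {

    // Hypothetical non-blocking read: returns immediately and completes the
    // future about 10 ms later with a fake value (key * 2).
    static CompletableFuture<Integer> fakeGet(ScheduledExecutorService timer,
                                              int key) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        timer.schedule(() -> { result.complete(key * 2); },
                10, TimeUnit.MILLISECONDS);
        return result;
    }

    // Issue n reads at once, attach a callback to each, and sum the results.
    static int sumOfManyGets(int n) {
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
            for (int key = 0; key < n; key++) {
                // thenApply registers a callback instead of blocking.
                inFlight.add(fakeGet(timer, key).thenApply(v -> v + 1));
            }
            // Only the caller waits, once, after everything was issued.
            return inFlight.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            timer.shutdown();
        }
    }
}
```

A blocking client would need one thread per outstanding request to reach the same concurrency, which is where the memory and scheduler costs come from.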

AsyncHBase library

http://www.tsunanet.net/~tsuna/asynchbase/benchmark/viz.html

AsyncHBase library

“AsyncHBase client differs significantly from HBase's client. Switching to it is not easy as it requires to rewrite all the code that was interacting with any HBase API”

AsyncHBase documentation

AsyncBigtable library

● Complete rewrite of AsyncHBase API

● Uses standard hbase-client for Bigtable access

● Compatible with the bigtable-hbase API

AsyncBigtable challenges

● OpenTSDB jar dependencies

● AsyncBigtable is not async!

● BufferedMutator + Threadpool to emulate async
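The emulation idea in the last bullet can be sketched as follows: blocking reads are pushed onto a thread pool and exposed as futures, and writes are buffered and flushed in batches. This is an illustrative sketch under stated assumptions, not the AsyncBigtable source. `BlockingStore` is a hypothetical stand-in for a blocking client, and the buffer only mimics what HBase's `BufferedMutator` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EmulatedAsyncSketch {

    /** Hypothetical stand-in for a blocking client. */
    interface BlockingStore {
        String get(String rowKey);
        void putBatch(List<String> rows);
    }

    private final BlockingStore store;
    private final ExecutorService pool;
    private final int flushSize;
    private final List<String> buffer = new ArrayList<>();

    EmulatedAsyncSketch(BlockingStore store, int threads, int flushSize) {
        this.store = store;
        this.pool = Executors.newFixedThreadPool(threads);
        this.flushSize = flushSize;
    }

    // The blocking call runs on a pool thread; the caller gets a future.
    // A pool thread is still blocked, so this is emulated, not true, async.
    CompletableFuture<String> get(String rowKey) {
        return CompletableFuture.supplyAsync(() -> store.get(rowKey), pool);
    }

    // Buffer writes and flush once the buffer reaches flushSize entries.
    synchronized void put(String row) {
        buffer.add(row);
        if (buffer.size() >= flushSize) {
            flush();
        }
    }

    // Hand the buffered rows to the pool as a single batch write.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        pool.execute(() -> store.putBatch(batch));
    }

    // Flush what is left and wait briefly for pending work to finish.
    void close() {
        flush();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that concurrency is capped by the pool size, which is consistent with "Fully asynchronous" appearing under future work.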

AsyncBigtable library

Future work

● Native Bigtable API

● Fully asynchronous

● Improve performance

● Add more unit tests

Questions?

https://github.com/opentsdb/asyncbigtable
