OpenTSDB + Bigtable Integrating time series database with Google Cloud Bigtable Danil Zburivsky, Big Data Practice Lead - @zburivsky Christos Soulios, Big Data Architect - @c_soulios

Post on 07-Mar-2018

Pythian specializes in design, implementation, and management of systems that directly contribute to revenue and business success.

History: 19 years in business

Growing at 30+% per year

400+ employees

300+ customers worldwide

HQ Ottawa, Canada - global reach

Technology agnostic = trusted advisor

Deep expertise: Oracle, Oracle Apps, MySQL, AWS, SQL Server, Cassandra/DataStax, Azure, PostgreSQL, Cloudera, MapR, Hortonworks etc.

Google Premier Partner Status (as of end Aug)

5 Certified Developers (soon to be 12)

Dedicated Google Technical Champion

Launch partner for: Kubernetes, Dataflow, Cloud SQL, Dataproc

Integrated OpenTSDB with Bigtable

DW Explorers Program Partner

Upcoming BigQuery & Cloud ML Launch Partner

Time series data

• Data points of the form (time, metric, value)

• OS and apps metrics

• Industrial equipment

• Web traffic

Storing time series data is a challenge

• Volume can be explosive

• Data arrival and access patterns are different

Better alternatives: specialized stores

• NoSQL

• Data model and storage optimized for time series

• Separate query language

OpenTSDB

• Open source

• Uses HBase as a data store

• Data model optimized for time series

• REST API

Row key layout:

<metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]

Columns within a row (one per data point, keyed by offset from the row timestamp):

<col_t+1>[...<col_t+N>]
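The row key layout above can be sketched in a few lines. This is an illustration only, written in Python for brevity rather than OpenTSDB's Java. The 3-byte UID width, the 4-byte timestamp, and the one-hour row span are assumptions based on OpenTSDB's default schema, and the UID values passed in are hypothetical.

```python
import struct

UID_WIDTH = 3            # assumption: default width of metric/tagk/tagv UIDs, in bytes
ROW_SPAN_SECONDS = 3600  # assumption: one row holds one hour of data points


def uid_bytes(uid: int) -> bytes:
    """Encode a numeric UID as a fixed-width big-endian byte string."""
    return uid.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid: int, timestamp: int, tags: dict) -> bytes:
    """Build <metric_uid><base_timestamp><tagk1><tagv1>...<tagkN><tagvN>.

    `tags` maps tag-key UIDs to tag-value UIDs. Pairs are sorted by tag key
    so that the same tag set always produces the same row key.
    """
    base_time = timestamp - (timestamp % ROW_SPAN_SECONDS)
    key = uid_bytes(metric_uid) + struct.pack(">I", base_time)
    for tagk, tagv in sorted(tags.items()):
        key += uid_bytes(tagk) + uid_bytes(tagv)
    return key


def column_offset(timestamp: int) -> int:
    """Offset of a data point from its row's base timestamp, in seconds."""
    return timestamp % ROW_SPAN_SECONDS
```

Under these assumptions, two data points of the same series that fall in the same hour share a row key and differ only in their column offset, which is what keeps a series contiguous on disk and cheap to scan by time range.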

OpenTSDB Architecture

[Diagram: servers send metrics to TSD instances over TSD RPC; TSDs talk to HBase over HBase RPC; the Web UI reaches TSDs over HTTP, and scripts/alerting over TSD RPC.]

HBase can be too much

• HBase requires a full Hadoop setup (3x ZooKeeper, 2x NameNode, 3x DataNode, 2x HMaster, 3x HRegionServer)

• HBase tuning is a job for the brave (HFiles, WAL, MemStore, BucketCache, BlockCache)

But all I wanted was a time series database

Google Cloud Bigtable

• Highly Scalable NoSQL database

• Low latency, high throughput

• Powers most Google products

• Available as a Google Cloud Service

Migrate HBase apps to Cloud Bigtable

• The Bigtable client is API-compatible with the HBase client

• Just replace hbase-client.jar with bigtable-hbase.jar

• No code changes required!

Migrate OpenTSDB to Cloud Bigtable

• OpenTSDB does not use the standard hbase-client.jar

• OpenTSDB is based on the AsyncHBase library

AsyncHBase library

• Open source HBase client library

• Multi-threaded: multiple threads use the same client instance

• Fully asynchronous, non-blocking

• Implements the low level HBase RPCs

Detour: Asynchronous programming

Detour: Why asynchronous?

• Efficient thread usage

• Fewer threads = less memory

• CPU scheduler friendly

• Extremely high concurrency
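To make the points above concrete, here is a minimal sketch of many in-flight requests sharing a single thread. It uses Python's asyncio purely as an illustration; `fake_rpc` is a hypothetical stand-in for a network call and is not part of AsyncHBase or OpenTSDB.

```python
import asyncio
import threading


async def fake_rpc(i: int, delay: float = 0.01):
    """Hypothetical non-blocking call: yields to the event loop while 'waiting'."""
    await asyncio.sleep(delay)
    # Report which OS thread completed the call.
    return i, threading.get_ident()


async def run_many(n: int):
    """Start n calls at once and wait for all of them to finish."""
    return await asyncio.gather(*(fake_rpc(i) for i in range(n)))


def main(n: int = 1000):
    results = asyncio.run(run_many(n))
    thread_ids = {tid for _, tid in results}
    # Returns (number of completed calls, number of distinct threads used).
    return len(results), len(thread_ids)
```

All calls complete on one thread because waiting never blocks it. A thread-per-request design would need one thread, and one stack, per outstanding call.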

AsyncHBase library

http://www.tsunanet.net/~tsuna/asynchbase/benchmark/viz.html

AsyncHBase library

“AsyncHBase client differs significantly from HBase's client. Switching to it is not easy as it requires to rewrite all the code that was interacting with any HBase API”

AsyncHBase documentation

AsyncBigtable library

● Complete rewrite of AsyncHBase API

● Uses standard hbase-client for Bigtable access

● Compatible with the bigtable-hbase API

AsyncBigtable challenges

● OpenTSDB jar dependencies

● AsyncBigtable is not async!

● BufferedMutator + Threadpool to emulate async
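The general idea of emulating an asynchronous API on top of a blocking client can be sketched as follows. This is not the AsyncBigtable code: `BlockingStore` and `AsyncFacade` are hypothetical names, the write buffer only loosely mirrors what a BufferedMutator does, and Python stands in for Java to keep the example short.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingStore:
    """Hypothetical synchronous client: every call blocks the calling thread."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put_batch(self, items: dict) -> None:
        time.sleep(0.01)  # simulated network round trip
        with self._lock:
            self._rows.update(items)

    def get(self, key):
        time.sleep(0.01)  # simulated network round trip
        with self._lock:
            return self._rows.get(key)


class AsyncFacade:
    """Returns futures to callers while worker threads make the blocking calls."""

    def __init__(self, store: BlockingStore, workers: int = 4, buffer_size: int = 3):
        self._store = store
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._buffer = {}
        self._buffer_size = buffer_size
        self._lock = threading.Lock()

    def put(self, key, value):
        """Buffer a write. Returns a Future once a full batch is submitted, else None."""
        with self._lock:
            self._buffer[key] = value
            if len(self._buffer) < self._buffer_size:
                return None
            batch, self._buffer = self._buffer, {}
        return self._pool.submit(self._store.put_batch, batch)

    def flush(self):
        """Submit whatever is buffered and return a Future for that batch."""
        with self._lock:
            batch, self._buffer = self._buffer, {}
        return self._pool.submit(self._store.put_batch, batch)

    def get(self, key):
        """Run a blocking read on the pool and return a Future for its result."""
        return self._pool.submit(self._store.get, key)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

The caller never blocks, but each in-flight call still occupies a worker thread, so concurrency is capped by the pool size. That is the sense in which such a wrapper is "not async".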

Future work

● Native Bigtable API

● Fully asynchronous

● Improve performance

● Add more unit tests

Questions?

https://github.com/opentsdb/asyncbigtable