xephon k a time series database with multiple backends

Xephon-K A lightweight TSDB with multiple backends Pinglei Guo https://github.com/xephonhq/xephon-k

Upload: pinglei-guo

Post on 11-Apr-2017

Xephon-K: A lightweight TSDB with multiple backends

Pinglei Guo https://github.com/xephonhq/xephon-k

Agenda

● Overview

● Time Series Data Revisited

● Time Series Database state of the art

● Xephon-K Design

● Xephon-K Implementation

● Evaluation

● Lessons learned

● Related & Future work

● Conclusion

Overview

● Written in Golang (1,700 loc including bench and test)

● Use Cassandra as main backend

● Simple data model

● It is working

Time Series Data Revisited

NOT just data with timestamp

‘What happened, happened and couldn’t have happened another way’

- The Matrix

Time Series Data Revisited

Single record, update in place, tell current state:

Name    Saving  Update time
Rabbit  $100    2017/03/20:12:59:33
Tiger   $250    2017/03/20:12:59:33

A series of events, immutable, tell the history:

Name    Daily Transaction  Date
Rabbit  +$100,000          2017/03/19
Rabbit  -$99,900           2017/03/20
Tiger   +$125              2017/03/19
Tiger   +$125              2017/03/20
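The relationship between the two tables can be sketched in Go: the "current state" record is just a fold over the immutable event history. The type and function names here (Event, CurrentState, SampleEvents) are illustrative assumptions, not part of Xephon-K.

```go
package main

// Event is one immutable entry in an account's history.
// Hypothetical type for illustration; not taken from Xephon-K.
type Event struct {
	Name   string
	Date   string // e.g. "2017/03/19"
	Amount int    // signed change in dollars
}

// SampleEvents returns the daily transactions from the slide.
func SampleEvents() []Event {
	return []Event{
		{Name: "Rabbit", Date: "2017/03/19", Amount: 100000},
		{Name: "Rabbit", Date: "2017/03/20", Amount: -99900},
		{Name: "Tiger", Date: "2017/03/19", Amount: 125},
		{Name: "Tiger", Date: "2017/03/20", Amount: 125},
	}
}

// CurrentState folds the full history into the "single record" view:
// one balance per name. The events are only read, never modified.
func CurrentState(events []Event) map[string]int {
	state := make(map[string]int)
	for _, e := range events {
		state[e.Name] += e.Amount
	}
	return state
}
```

Calling CurrentState(SampleEvents()) reproduces the savings in the first table. The reverse is not possible: the history cannot be recovered from the current state alone.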

Time Series Database state of the art

Xephon-K Cassandra Yes Golang at15 N/A 1

Full list on: https://github.com/xephonhq/awesome-time-series-database

Xephon-K Design

Xephon-K Implementation

● Naive schema and Cassandra data model

● Internal representation

● In Memory storage

● API

Xephon-K Implementation - Naive schema

cqlsh> SELECT * FROM metrics

metric_name  metric_timestamp        value
cpu          2017/03/17:13:24:00:20  10.2
cpu          2017/03/17:13:25:00:00  3.3
cpu          2017/03/17:13:26:00:00  5.6
mem          2017/03/17:13:24:00:20  80.3
mem          2017/03/17:13:25:00:00  60.2
mem          2017/03/17:13:26:00:00  90.3

Xephon-K Implementation - Naive schema

name  metric_timestamp        val
cpu   2017/03/17:13:24:00:20  10.2
cpu   2017/03/17:13:25:00:00  3.3
cpu   2017/03/17:13:26:00:00  5.6
mem   2017/03/17:13:24:00:20  80.3
mem   2017/03/17:13:25:00:00  60.2
mem   2017/03/17:13:26:00:00  90.3

The table is an abstraction of the underlying map
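A rough Go sketch of what "the table is an abstraction of the underlying map" could mean. It assumes name is the partition key and metric_timestamp is the clustering key; the slides do not show the CREATE TABLE statement, so that split, and all type names below, are assumptions.

```go
package main

import "sort"

// Partition holds every cell that shares one partition key,
// keyed by the clustering key (here: a timestamp).
type Partition map[int64]float64

// Table maps a partition key (here: the metric name) to its partition.
type Table map[string]Partition

// Row is the flat, table-like view that a SELECT presents.
type Row struct {
	Name      string
	Timestamp int64
	Val       float64
}

// Insert writes one cell into the nested map.
func (t Table) Insert(name string, ts int64, val float64) {
	if t[name] == nil {
		t[name] = make(Partition)
	}
	t[name][ts] = val
}

// Rows flattens the nested maps into rows. Within a partition, cells are
// ordered by clustering key. Partitions are sorted by name here only to
// make the output deterministic; Cassandra orders them by token instead.
func (t Table) Rows() []Row {
	names := make([]string, 0, len(t))
	for name := range t {
		names = append(names, name)
	}
	sort.Strings(names)

	var rows []Row
	for _, name := range names {
		p := t[name]
		stamps := make([]int64, 0, len(p))
		for ts := range p {
			stamps = append(stamps, ts)
		}
		sort.Slice(stamps, func(i, j int) bool { return stamps[i] < stamps[j] })
		for _, ts := range stamps {
			rows = append(rows, Row{Name: name, Timestamp: ts, Val: p[ts]})
		}
	}
	return rows
}
```

Under this view, reading one metric over a time range is a lookup of one partition followed by a scan over sorted clustering keys, which is why the naive schema fits time series data.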

Xephon-K Implementation - Internal representation

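// IntPoint is a single integer sample: timestamp T and value V.
type IntPoint struct {
	T int64
	V int
}

// DoublePoint is a single floating-point sample.
// Go has no `double` type; float64 is the equivalent.
type DoublePoint struct {
	T int64
	V float64
}

// IntSeries is a named, tagged sequence of integer points.
type IntSeries struct {
	Name   string
	Tags   map[string]string
	Points []IntPoint
}

// DoubleSeries is a named, tagged sequence of floating-point points.
type DoubleSeries struct {
	Name   string
	Tags   map[string]string
	Points []DoublePoint
}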

Xephon-K Implementation - In Memory storage

type Data map[SeriesID]*IntSeriesStore

type IntSeriesStore struct {
	mu     sync.RWMutex
	series common.IntSeries
	length int
}

type Index []IndexRow

type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}
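A minimal sketch of how these types might be used, under several assumptions: SeriesID is taken to be a string (its definition is not shown in the slides), the store keeps a plain slice of points instead of common.IntSeries so the example is self-contained, and the methods Append, Len and Lookup are hypothetical names, not necessarily those in the Xephon-K source.

```go
package main

import "sync"

// SeriesID identifies one series. Assumed to be a string here.
type SeriesID string

// IntPoint mirrors the point type from the internal representation.
type IntPoint struct {
	T int64
	V int
}

// IntSeriesStore guards one series with a read-write lock so that
// many readers can proceed concurrently while writers are exclusive.
type IntSeriesStore struct {
	mu     sync.RWMutex
	points []IntPoint
}

// Append adds points under the write lock.
func (s *IntSeriesStore) Append(pts ...IntPoint) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.points = append(s.points, pts...)
}

// Len reports the number of stored points under the read lock.
func (s *IntSeriesStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.points)
}

// IndexRow links one tag key/value pair to a series.
type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}

// Index is a flat list of rows, as in the slide.
type Index []IndexRow

// Lookup returns the IDs of all series carrying the given tag.
// It is a linear scan, which is simple but O(n) in the index size.
func (idx Index) Lookup(key, value string) []SeriesID {
	var ids []SeriesID
	for _, row := range idx {
		if row.key == key && row.value == value {
			ids = append(ids, row.seriesID)
		}
	}
	return ids
}
```

A per-series lock keeps writers to different series from blocking each other, but every write to the same series still serializes on one mutex.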

Xephon-K Implementation - API Write

[
  {
    "name": "archive_file_tracked",
    "tags": { "host": "server1", "data_center": "DC1" },
    "points": [
      [1359788400000, 123],
      [1359788300000, 13],
      [1359788410000, 23]
    ]
  }
]

http://localhost:2333/write

Use array instead of object for each point; all numeric values are JSON numbers

Array form (used):
"points": [ [1359788400000, 123], [1359788300000, 13] ]

Object form (not used):
"points": [ {"t": 1359788400000, "v": 123}, {"t": 1359788300000, "v": 13} ]
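A short sketch of decoding the array-based write payload with Go's standard encoding/json package. It handles integer values only, and the names WriteSeries and DecodeWrite are illustrative assumptions, not the actual Xephon-K handler.

```go
package main

import "encoding/json"

// WriteSeries is one element of the write payload.
// Each point is a two-element JSON array: [timestamp, value].
type WriteSeries struct {
	Name   string            `json:"name"`
	Tags   map[string]string `json:"tags"`
	Points [][2]int64        `json:"points"`
}

// DecodeWrite parses the body of a write request.
// Note: encoding/json does not reject a point with the wrong number of
// elements; missing elements become zero and extra ones are discarded,
// so a real handler would likely validate further.
func DecodeWrite(body []byte) ([]WriteSeries, error) {
	var series []WriteSeries
	if err := json.Unmarshal(body, &series); err != nil {
		return nil, err
	}
	return series, nil
}
```

Compared with the object form, the array form avoids repeating the "t" and "v" keys for every point, which shrinks the payload when a batch carries many points.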

Evaluation Environment Setup

● i7-6700 CPU @ 3.40GHz 32 GB RAM HDD Ubuntu 16.10 ( kernel 4.8.0-39 )

● Docker 1.13 without resource limits on container

● InfluxDB 1.2

● KairosDB 1.12 + Cassandra 2.2

● Xephon-K (Go 1.7.4) + Cassandra 3.10

● Write to one series with one tag `cpi{agent:xephon-bench}` with fixed value

● Batch size 100 points, client timeout 30 seconds

● No QPS limit, No retry, No backoff
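The load pattern described above (a fixed number of workers, no QPS limit, no retry, no backoff) could look roughly like the following. RunBench and its send callback are hypothetical; the real benchmark code lives in the Xephon-K repository and may differ, and whether its "total requests" counts only successful requests is an assumption here.

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// RunBench starts `workers` goroutines that call send in a tight loop
// until duration d has elapsed, and returns how many calls succeeded.
// There is no rate limit, no retry and no backoff: a failed call is
// simply not counted and the worker moves on to the next one.
func RunBench(workers int, d time.Duration, send func() error) int64 {
	var total int64
	deadline := time.Now().Add(d)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				if err := send(); err == nil {
					atomic.AddInt64(&total, 1)
				}
			}
		}()
	}
	wg.Wait()
	return total
}
```

In a real run, send would post one batch of 100 points over HTTP with a 30 second client timeout; passing it in as a function keeps the loop independent of the database under test.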

Evaluation - Throughput

Database  Total Requests
XKM       12327
XKC       7931
KairosDB  15561
InfluxDB  118

5 seconds, 10 workers

● InfluxDB performance is extremely poor (my bad?)

● KairosDB outperformed Xephon-K (K is from KairosDB …)

● Prometheus can’t be benchmarked (no HTTP API)

Evaluation Analysis

Q: Why is InfluxDB so slow?
A: Good question, I am still figuring it out (see #15). You can’t blame Docker; running it locally gives the same result.

Q: Why is KairosDB faster? Java > Golang?

● Lock

● Buffer (batch size)

Q: That’s it?
A: Bingo! But https://github.com/xephonhq/xephon-k/tree/master/doc/bench has a bunch of results I haven’t dealt with.

Q: The chart looks good, what are you using?
A: echarts3 http://echarts.baidu.com/ (One JavaScript a day, keep Microsoft Excel away)

Lessons learned

● Write ugly code and make things work

● Hardware improves productivity: double the monitors, double the LoC/hr

● Source code is your best friend; don’t blindly believe what people say in the doc, blog, conference, paper, twitter, stackoverflow

Related work

Xephon-B: A TSDB benchmark tool and benchmark result sharing platform

● https://github.com/xephonhq/xephon-b

● Is a never-finished course project with @zchen

Reika: A DSL for TSDB

● https://github.com/xephonhq/tsdb-proxy-java/tree/master/ql

● Is also a course project (number two)

Xephon-K: I am course project three QvQ


Future work

● Refactor (everyday I am blaming the code of yesterday)

● Storage without Cassandra (yeah, this is course project four)

● Dashboard

● Benchmark driven development using Xephon-B

Acknowledgement

● Zheyuan Chen and Prof. Peter Alvaro for Xephon-B

● Chujiao Hou for Reika

Conclusion

● Time series data is a series of immutable data points; it tells the history

● CQL is an illusion created for RDBMS people

● Cassandra is a map of maps that contains maps

● http://echarts.baidu.com/ is a good charting library

● Ugly code works, perfect is the enemy of deadline (well, video games to be honest)

● Xephon-K is awesome

● What people say in their presentations may not be true; use the source, Luke

Thank You!

No question, please, just let me go.