influx db talk-20150415

Post on 17-Jul-2015

Intro to InfluxDB

Richard.Elling@RichardElling.com

Features

HTTP(S) API with user access controls

Scalability

Billions of data points

Hundreds of thousands of series

Multiple nodes

Managed retention policies

Simple to install and manage — no external dependencies

Dev Features

github.com/influxdb

Written in Go

SQL-like query language

Client libraries available for your favorite dev environment

python, javascript, node.js, java, R, ruby, C#, PHP, …

HTTP: curl, httpie, wget

MIT license

Ops Features

Security model separates admins from users

Active and vibrant community

Flexible data retention policies

Time-based sharding

Downsample data using different time windows

Expand storage space by adding nodes

Why We Chose InfluxDB

Need telemetry, events, and status from systems

Information, not just numbers

100k+ metrics per system, 2-4k are interesting to measure forever

Events and configuration

Collecting more relational data, extensible, JSON works well

Requirements rule out many “metrics-oriented” time-series solutions

Feed from collectd and HTTP POST

Open source, redistribution and contribution friendly license (MIT)
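
As a rough illustration of the HTTP POST feed, the sketch below builds a JSON body in the name/columns/points layout that the 0.8 API uses for series data. The endpoint path mentioned in the docstring, the helper name make_write_body, and the sample timestamp and value are assumptions for illustration; check them against the docs for your version.

```python
import json


def make_write_body(series_name, columns, rows):
    """Build a JSON write body in the 0.8-style name/columns/points layout.

    Assumption: the body would be POSTed to something like
    http://localhost:8086/db/<db_name>/series?u=<user>&p=<password>
    """
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column")
    series = {
        "name": series_name,
        "columns": list(columns),
        "points": [list(row) for row in rows],
    }
    return json.dumps([series])


# Hypothetical series name, timestamp, and reading.
body = make_write_body(
    "datacenter.0.server.elvis.temperature",
    ["time", "value"],
    [[1429056000, 21.5]],
)
print(body)
```

The body is a list, so several series can go into a single POST.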

Deployment Architecture

Schema

Version 0.8

Embed metadata into series name

Similar to graphite

name1.value1.name2.value2.metric

datacenter.0.server.elvis.temperature
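
The naming convention above can be sketched as a pair of helpers that build and parse series names. The function names are hypothetical, and the parser assumes no component contains a dot.

```python
def build_series_name(tags, metric):
    """Join (name, value) pairs and a metric into name1.value1.name2.value2.metric."""
    parts = []
    for name, value in tags:
        parts.extend([str(name), str(value)])
    parts.append(metric)
    return ".".join(parts)


def parse_series_name(series_name):
    """Split a series name back into (name, value) pairs and the metric.

    Assumes no individual component contains a dot.
    """
    *pairs, metric = series_name.split(".")
    if len(pairs) % 2:
        raise ValueError("expected name/value pairs before the metric")
    return list(zip(pairs[0::2], pairs[1::2])), metric


name = build_series_name([("datacenter", 0), ("server", "elvis")], "temperature")
print(name)
print(parse_series_name(name))
```

Because the metadata lives only in the name, every consumer has to agree on the pair order, which is one reason regular-expression queries matter in 0.8.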

Version 0.9

Spoiler alert

Queries

SQL-like query language

select * from series_name

select value from series_name where time > '2015-04-15'

select value from series_name where time > now() - 1h

select value from series_name where time > now() - 1d limit 100
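
Queries like these can be sent over the HTTP API with curl or any HTTP client. The sketch below only builds the request URL; the /db/<db_name>/series path and the u, p, and q parameters follow the 0.8-style API as an assumption, and query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def query_url(host, db_name, query, user, password, port=8086):
    """Return a GET URL for a query (assumed 0.8-style /db/<db_name>/series endpoint)."""
    params = urlencode({"u": user, "p": password, "q": query})
    return "http://{}:{}/db/{}/series?{}".format(host, port, db_name, params)


url = query_url(
    "localhost",
    "db_name",
    "select value from series_name where time > now() - 1h",
    "user",
    "password",
)
print(url)
```

URL-encoding the query matters because it contains spaces, parentheses, and comparison operators.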

Regular expressions are handy

select * from /.*\.elvis\..*/ limit 10

select value from /^MyCompany\..*/ limit 1
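
To see which series these two patterns would pick up, the same expressions can be tried locally with Python's re module. This is only an approximation of the server-side matching, and every series name except the first is made up for the example.

```python
import re

# Sample series names; only the first appears in the talk, the rest are hypothetical.
series_names = [
    "datacenter.0.server.elvis.temperature",
    "datacenter.0.server.priscilla.temperature",
    "MyCompany.web.requests",
]


def select_series(pattern, names):
    """Return the names that a /pattern/ in a from clause would be expected to match."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


print(select_series(r".*\.elvis\..*", series_names))
print(select_series(r"^MyCompany\..*", series_names))
```

Escaping the dots keeps "elvis" from matching inside a longer component such as a hypothetical "notelvisx".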

Queries do math

count, top, bottom

min, max, mean, mode, median, stddev

distinct

percentile

histogram

first, last, difference, sum, derivative

select mean(value) from series_name where time > now() - 1h

select derivative(value) from series_name where time > now() - 1h group by time (60s) order asc

Continuous Queries

Useful for downsampling

Choices:

downsample every time you query

downsample in advance and store the results

Restricted query: only admins can create continuous queries

Powerful with many different options and applications

select mean(value) from series_name group by time(5m) into series_name.mean.5m
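
What the continuous query stores can be pictured with a small pure-Python sketch: raw points are grouped into fixed 5-minute windows and each window is reduced to its mean. This illustrates the idea only, not how InfluxDB implements it; the function name and sample points are made up.

```python
from collections import defaultdict


def downsample_mean(points, window_seconds=300):
    """Average (timestamp_seconds, value) points per fixed window.

    Mirrors the idea of mean(value) ... group by time(5m).
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 10.0), (60, 20.0), (299, 30.0), (300, 40.0), (540, 60.0)]
print(downsample_mean(raw))  # [(0, 20.0), (300, 50.0)]
```

Downsampling in advance pays this cost once per window instead of on every query.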

Python plugin

import json

from influxdb import InfluxDBClient

# Connect to a local server and list every series in the database.
client = InfluxDBClient('localhost', 8086, 'user', 'password', 'db_name')
print(json.dumps(client.query('list series'), indent=4))

[
    {
        "points": [
            [
                0,
                "Node.elvis.CPU_stats.0.derive.cpu_nsec_idle"
            ],
            …
        ],
        "name": "list_series_result",
        "columns": [
            "time",
            "name"
        ]
    }
]

Managing Shards

Set up shard spaces when creating databases (!)

{
  "spaces": [{
    "name": "detail",
    "retentionPolicy": "10d",
    "shardDuration": "2d",
    "regex": "/.*/",
    "replicationFactor": 1,
    "split": 1
  }]
}

object {
  array {
    object {
      string name;               // space name
      string retentionPolicy;    // minimum time to keep
      string shardDuration;      // max expected group by time()
      number replicationFactor;  // number of replicas
      number split;              // shards per period
    };
  } spaces;
};
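
A shard space definition like this one can be generated rather than hand-written. The sketch below builds the same structure and estimates how many shards are live at once, taking that to be roughly the number of shardDuration periods within the retention window times split. The helper names, the estimate, and the set of duration suffixes are assumptions, not part of InfluxDB.

```python
import json

SECONDS_PER_UNIT = {"m": 60, "h": 3600, "d": 86400}  # assumed unit suffixes


def to_seconds(duration):
    """Convert a duration string such as '10d' or '2d' to seconds."""
    return int(duration[:-1]) * SECONDS_PER_UNIT[duration[-1]]


def shard_space(name, retention, shard_duration,
                regex="/.*/", replication_factor=1, split=1):
    """Build one entry for the "spaces" array, using the field names from the slide."""
    return {
        "name": name,
        "retentionPolicy": retention,
        "shardDuration": shard_duration,
        "regex": regex,
        "replicationFactor": replication_factor,
        "split": split,
    }


def approx_live_shards(space):
    """Rough estimate: shard periods inside the retention window, times split."""
    periods = to_seconds(space["retentionPolicy"]) // to_seconds(space["shardDuration"])
    return periods * space["split"]


config = {"spaces": [shard_space("detail", "10d", "2d")]}
print(json.dumps(config, indent=2))
print(approx_live_shards(config["spaces"][0]))  # 5
```

With the values from the slide, 10 days of retention in 2-day shards gives about five shards per replica.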

InfluxDB Version 0.8

current stable release

end of the road for 0.8 (0.8.8)

database back-ends: LevelDB (use this), RocksDB, HyperLevelDB, and LMDB

caveat: clustering is completely redesigned in 0.9

Futures

Version 0.9 in release-candidate stage (start testing now!)

Significant redesign; migration may be challenging

Tags for fast, efficient queries — see docs and begin schema planning now

Dropping multiple database backends — using BoltDB

Clustering, replication, high-availability

Streaming raft implementation

Role = broker for raft consensus

Role = data for hosting data and answering queries

www.influxdb.com

https://groups.google.com/forum/#!forum/influxdb

@InfluxDB

Richard.Elling@RichardElling.com #richardelling

Demo and Questions
