time series data with influxdb

Working with time series data with InfluxDB Paul Dix @pauldix paul@influxdb.com

Upload: dato-inc

Post on 12-Aug-2015

TRANSCRIPT

Page 1: Time Series Data with InfluxDB

Working with time series data with InfluxDB

Paul Dix @pauldix

paul@influxdb.com

Page 2: Time Series Data with InfluxDB

What is time series data?

Page 3: Time Series Data with InfluxDB

Stock trades and quotes

Page 4: Time Series Data with InfluxDB

Metrics

Page 5: Time Series Data with InfluxDB

Analytics

Page 6: Time Series Data with InfluxDB

Events

Page 7: Time Series Data with InfluxDB

Sensor data

Page 8: Time Series Data with InfluxDB

Two kinds of time series data…

Page 9: Time Series Data with InfluxDB

Regular time series

t0 t1 t2 t3 t4 t6 t7

Samples at regular intervals

Page 10: Time Series Data with InfluxDB

Irregular time series

t0 t1 t2 t3 t4 t6 t7

Events whenever they come in

Page 11: Time Series Data with InfluxDB

Inducing a regular time series from an irregular one

query: select count(customer_id) from events where time > now() - 1h group by time(1m), customer_id
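What the query above does can be sketched in plain Python: irregular events are dropped into fixed one-minute buckets per customer, which yields a regular series of counts. This is only an illustration of the idea, not how InfluxDB executes the query; the function name and the sample events are made up for the example, and unlike a real regular series it does not emit empty buckets.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_per_minute(events):
    """Count events per (minute bucket, customer_id).

    `events` is an iterable of (timestamp, customer_id) pairs arriving
    at arbitrary times. The name and shape are assumptions for this sketch.
    """
    counts = Counter()
    for ts, customer_id in events:
        # Truncate the timestamp to the start of its minute.
        bucket = ts.replace(second=0, microsecond=0)
        counts[(bucket, customer_id)] += 1
    return dict(counts)


# Hypothetical irregular events.
base = datetime(2015, 8, 12, 9, 0, tzinfo=timezone.utc)
events = [
    (base + timedelta(seconds=5), "a"),
    (base + timedelta(seconds=42), "a"),
    (base + timedelta(seconds=61), "b"),
    (base + timedelta(seconds=130), "a"),
]
per_minute = count_per_minute(events)
```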

Page 12: Time Series Data with InfluxDB

Data that you ask questions about over time

Page 13: Time Series Data with InfluxDB

InfluxDB is an open source distributed* time series database

* still working on the distributed part

Page 14: Time Series Data with InfluxDB

Why would you want a database for time series data?

Page 15: Time Series Data with InfluxDB

Scale

Page 16: Time Series Data with InfluxDB

Example from DevOps

• 2,000 servers, VMs, containers, or sensor units

• 200 measurements per server/unit

• every 10 seconds

• = 3,456,000,000 distinct points per day
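The arithmetic behind that total, worked through as a short check. The variable names are chosen for this sketch; the inputs are the figures from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

units = 2_000                # servers, VMs, containers, or sensor units
measurements_per_unit = 200
interval_seconds = 10

# Each measurement is sampled 8,640 times a day at a 10 second interval.
samples_per_day = SECONDS_PER_DAY // interval_seconds

points_per_day = units * measurements_per_unit * samples_per_day
points_per_second = units * measurements_per_unit // interval_seconds

print(f"{points_per_day:,} points per day")        # 3,456,000,000 points per day
print(f"{points_per_second:,} points per second")  # 40,000 points per second
```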

Page 17: Time Series Data with InfluxDB

Sharding Data

usually requires application level code

Page 18: Time Series Data with InfluxDB

Data retention

application level code and sharding

Page 19: Time Series Data with InfluxDB

Rollups and aggregation

Page 20: Time Series Data with InfluxDB

InfluxDB features

Page 21: Time Series Data with InfluxDB

SQL style query language

Page 22: Time Series Data with InfluxDB

Retention policies

automatically managed data retention

Page 23: Time Series Data with InfluxDB

Continuous queries

for rollups and aggregation

Page 26: Time Series Data with InfluxDB

HTTP API - 2 endpoints

Write: HTTP POST /write?db=mydb&rp=foo

Read: HTTP GET /query?db=mydb&rp=foo&q=
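A small sketch of how a client might build the two URLs. Only the paths and the `db`, `rp`, and `q` parameters come from the slide; the helper names are made up, the base address is the one that appears on the CLI slide, and no request is actually sent here.

```python
from urllib.parse import urlencode

# Assumed server address (the one printed on the Influx CLI slide).
BASE_URL = "http://localhost:8086"


def write_url(db, rp):
    """URL to POST points to: /write?db=...&rp=..."""
    return f"{BASE_URL}/write?{urlencode({'db': db, 'rp': rp})}"


def query_url(db, rp, q):
    """URL to GET query results from: /query?db=...&rp=...&q=..."""
    # urlencode percent-encodes the query text so spaces and symbols are safe.
    return f"{BASE_URL}/query?{urlencode({'db': db, 'rp': rp, 'q': q})}"


print(write_url("mydb", "foo"))
print(query_url("mydb", "foo", "select * from cpu where time > now() - 1h"))
```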

Page 30: Time Series Data with InfluxDB

InfluxDB Schema

• Measurements (e.g. cpu, temperature, event, memory)

• Tags (e.g. region=uswest, host=serverA, sensor=23)

• Fields (e.g. value=23.2, info='this is some extra stuff', present=true)

• Timestamp (nanosecond epoch)

Page 31: Time Series Data with InfluxDB

All data is indexed by measurement, tagset, and time
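One way to picture that indexing is a toy in-memory store where each series is identified by its measurement plus its sorted tag set, and points inside a series are kept in time order. The class and method names are hypothetical and this says nothing about InfluxDB's actual storage engine.

```python
from collections import defaultdict


class SeriesIndex:
    """Toy store keyed by (measurement, tagset), ordered by time."""

    def __init__(self):
        # (measurement, sorted tag pairs) -> list of (timestamp_ns, fields)
        self._series = defaultdict(list)

    @staticmethod
    def series_key(measurement, tags):
        # Sorting makes the key independent of the order tags were given in.
        return (measurement, tuple(sorted(tags.items())))

    def write(self, measurement, tags, fields, timestamp_ns):
        points = self._series[self.series_key(measurement, tags)]
        points.append((timestamp_ns, fields))
        points.sort(key=lambda point: point[0])  # keep each series in time order

    def read(self, measurement, tags, start_ns, end_ns):
        points = self._series.get(self.series_key(measurement, tags), [])
        return [(ts, fields) for ts, fields in points if start_ns <= ts < end_ns]


index = SeriesIndex()
index.write("cpu", {"region": "uswest", "host": "serverA"}, {"value": 23.2}, 2_000)
index.write("cpu", {"host": "serverA", "region": "uswest"}, {"value": 24.0}, 1_000)
index.write("cpu", {"host": "serverB", "region": "uswest"}, {"value": 50.0}, 1_500)
recent = index.read("cpu", {"host": "serverA", "region": "uswest"}, 0, 3_000)
```

Both writes for serverA land in the same series even though the tags were passed in a different order, and `recent` comes back sorted by timestamp.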

Page 32: Time Series Data with InfluxDB

Influx CLI

$ ./influx
Connected to http://localhost:8086 version 0.9
InfluxDB shell 0.9
>

Page 33: Time Series Data with InfluxDB

Create a database

CREATE DATABASE foo

Page 36: Time Series Data with InfluxDB

Create a retention policy

CREATE RETENTION POLICY <rp-name> ON <db-name> DURATION <duration> REPLICATION <n> [DEFAULT]

CREATE RETENTION POLICY high_precision ON mydb DURATION 7d REPLICATION 3 DEFAULT

Writes will go into this RP unless otherwise specified
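The effect of a duration such as 7d can be sketched as "anything older than now minus the duration is dropped". The functions below are hypothetical helpers for illustration only: they handle just a few duration units and filter point by point, which is a simplification and not a description of how InfluxDB enforces a retention policy internally.

```python
import re
from datetime import datetime, timedelta, timezone

# Duration suffixes handled by this sketch (an assumption, not the full grammar).
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Turn a string like '7d' or '1h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def expire(points, duration, now):
    """Keep only (timestamp, value) points that are within the retention window."""
    cutoff = now - parse_duration(duration)
    return [(ts, value) for ts, value in points if ts >= cutoff]


now = datetime(2015, 8, 12, tzinfo=timezone.utc)
points = [(now - timedelta(days=8), 1.0), (now - timedelta(days=1), 2.0)]
kept = expire(points, "7d", now)  # only the one-day-old point survives
```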

Page 37: Time Series Data with InfluxDB

Discovery

Page 38: Time Series Data with InfluxDB

Inverted index

of measurements and tags

Page 45: Time Series Data with InfluxDB

Discovery

SHOW MEASUREMENTS

SHOW MEASUREMENTS where host = 'serverA'

SHOW TAG KEYS

SHOW TAG KEYS from CPU

SHOW TAG VALUES from CPU WITH KEY = 'region'

SHOW SERIES

SHOW SERIES where service = 'redis'
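A rough sketch of how an inverted index of measurements and tags can answer discovery questions like the ones above. The class, its method names, and the sample series are all invented for the illustration; it mirrors the kind of lookup involved, not InfluxDB's implementation.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps tag key/value pairs back to the series that carry them."""

    def __init__(self):
        self._all = set()                   # every series key seen so far
        self._by_tag = defaultdict(set)     # (tag key, tag value) -> series keys
        self._tag_keys = defaultdict(set)   # measurement -> tag keys

    def add_series(self, measurement, tags):
        key = (measurement, tuple(sorted(tags.items())))
        self._all.add(key)
        for tag_key, tag_value in tags.items():
            self._by_tag[(tag_key, tag_value)].add(key)
            self._tag_keys[measurement].add(tag_key)

    def measurements(self, tag_key=None, tag_value=None):
        """Like SHOW MEASUREMENTS, optionally with a tag filter."""
        if tag_key is None:
            series = self._all
        else:
            series = self._by_tag.get((tag_key, tag_value), set())
        return sorted({measurement for measurement, _ in series})

    def tag_keys(self, measurement):
        """Like SHOW TAG KEYS FROM <measurement>."""
        return sorted(self._tag_keys.get(measurement, set()))

    def tag_values(self, measurement, tag_key):
        """Like SHOW TAG VALUES FROM <measurement> WITH KEY = <tag_key>."""
        return sorted(
            value
            for (key, value), series in self._by_tag.items()
            if key == tag_key and any(m == measurement for m, _ in series)
        )


idx = InvertedIndex()
idx.add_series("cpu", {"host": "serverA", "region": "uswest"})
idx.add_series("cpu", {"host": "serverB", "region": "useast"})
idx.add_series("memory", {"host": "serverB", "region": "useast"})
```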

Page 46: Time Series Data with InfluxDB

Queries

Page 47: Time Series Data with InfluxDB

SQL-ish

select * from some_series where time > now() - 1h

Page 48: Time Series Data with InfluxDB

Aggregates

select percentile(90, value) from cpu where time > now() - 1d group by time(10m)

Page 49: Time Series Data with InfluxDB

Aggregates

select percentile(90, value) from cpu where time > now() - 1d group by time(10m), region

Group by a tag
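What grouping by time(10m) and a tag amounts to can be sketched as: bucket each point by its 10 minute window and tag value, then compute the percentile inside each group. The percentile below uses the nearest-rank definition, which is an assumption made for this sketch and may differ from the database's own method; timestamps are plain seconds and the sample points are invented.

```python
import math
from collections import defaultdict


def percentile(values, n):
    """Nearest-rank percentile of a non-empty list (definition assumed here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_by_bucket(points, n, bucket_seconds, tag):
    """Group (timestamp_s, tags, value) points by time bucket and one tag."""
    groups = defaultdict(list)
    for ts, tags, value in points:
        bucket_start = ts - ts % bucket_seconds  # start of the time window
        groups[(bucket_start, tags[tag])].append(value)
    return {key: percentile(values, n) for key, values in groups.items()}


points = [
    (0, {"region": "uswest"}, 10.0),
    (60, {"region": "uswest"}, 20.0),
    (120, {"region": "uswest"}, 30.0),
    (30, {"region": "useast"}, 5.0),
    (600, {"region": "uswest"}, 40.0),
]
result = percentile_by_bucket(points, 90, 600, "region")
```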

Page 50: Time Series Data with InfluxDB

Where against Regex (field)

select value from some_log_series where value =~ /.*ERROR.*/ and time > '2014-03-01' and time < '2014-03-03'

Page 51: Time Series Data with InfluxDB

Where against Regex (tag)

select value from some_log_series where host =~ /.*asdf.*/ and time > '2014-03-01' and time < '2014-03-03' group by host

Page 52: Time Series Data with InfluxDB

Functions

min, max, percentile, first, last, stddev, mean, count, sum, median, distinct, count(distinct)

more soon: difference, histogram, moving_average

Page 53: Time Series Data with InfluxDB

Continuous queries

CREATE CONTINUOUS QUERY "10m_event_count"
ON mydb
BEGIN
  SELECT count(value)
  INTO "6_months".events
  FROM events
  GROUP BY time(10m)
END;

Page 54: Time Series Data with InfluxDB

Other tools

Page 55: Time Series Data with InfluxDB

Telegraf

data collection

Page 56: Time Series Data with InfluxDB

Chronograf

Page 57: Time Series Data with InfluxDB

Grafana

Page 58: Time Series Data with InfluxDB

More coming

• Compression

• Clustering

• Custom functions

Page 59: Time Series Data with InfluxDB

Thank you!

Paul Dix @pauldix

paul@influxdb.com