Michael Hausenblas - Scalable Time Series and Stream Processing for IoT Applications

Post on 17-Jan-2017


© 2016 Mesosphere, Inc. All Rights Reserved.

SCALABLE TIME SERIES AND STREAM PROCESSING FOR IOT APPLICATIONS

Michael Hausenblas, Developer & Cloud Advocate | 2016-01-16

MOTIVATION

AIRLINES

LOGISTICS

HEALTH CARE

TRADERS

FARMERS

CITIES

© 2014, Wired magazine

YOU

THE TOOLBOX

LET'S TALK ABOUT WORKLOADS* …

*) kudos to Timothy St. Clair, @timothysc

Workload types: batch (e.g. MapReduce), streaming, PaaS

MESSAGE QUEUES & ROUTERS

• Apache Kafka

• ØMQ, RabbitMQ, Disque (Redis-based), etc.

• fluentd, Logstash, Flume

• Akka streams

• cloud-only: AWS SQS, Google Cloud Pub/Sub

• see also queues.io

APACHE KAFKA

• High-throughput, distributed, persistent publish-subscribe messaging system

• Originates from LinkedIn

• Typically used as buffer/de-coupling layer in online stream processing

Message queues & routers

kafka.apache.org
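To make the buffer/de-coupling idea concrete, here is a minimal in-memory sketch of a persistent publish-subscribe log. It is not the Kafka API: the `TopicLog` class, its method names, and the sensor readings are hypothetical, and a real broker adds partitioning, replication, and durable storage on top of this idea.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log; NOT the Kafka API, only the idea behind it."""

    def __init__(self):
        self._messages = []               # retained messages, in arrival order
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for a consumer group and advance its offset."""
        start = self._offsets[group]
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for reading in ({"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.0}):
    log.publish(reading)

# Two independent consumer groups read the same retained messages at their own pace.
alerts_batch = log.poll("alerts", max_messages=1)
archive_batch = log.poll("archive")
```

Because messages stay in the log after being read, a slow consumer does not block the producer or other consumers; each group only tracks its own offset.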

FLUENTD

Message queues & routers

www.fluentd.org

STREAM PROCESSING PLATFORMS

• Apache Storm

• Apache Spark

• Apache Samza

• Apache Flink

• Concord

• cloud-only: AWS Kinesis, Google Cloud Dataflow

• see also my webinar on stream processing

APACHE STORM

• Distributed, fault-tolerant stream-processing platform

• Guaranteed message processing (replaying messages on failure)

• Concepts: tuples, streams, spouts, bolts, topologies

Stream processing platforms

storm.apache.org
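The Storm concepts listed above can be illustrated with plain Python generators. This is a single-process sketch and not the Storm API: the function names, the sensor data, and the 30-degree threshold are all made up for illustration, and distribution, fault tolerance, and message replay are left out entirely.

```python
from collections import Counter


def sensor_spout():
    """Spout: the source of a stream; emits (sensor_id, temperature) tuples."""
    readings = [("s1", 21.5), ("s2", 35.2), ("s1", 36.1), ("s3", 18.9)]
    for reading in readings:
        yield reading


def threshold_bolt(stream, limit):
    """Bolt: consumes a stream and emits a new one with only the hot readings."""
    for sensor_id, temperature in stream:
        if temperature > limit:
            yield (sensor_id, temperature)


def count_bolt(stream):
    """Bolt: terminal step that counts the tuples seen per sensor."""
    counts = Counter()
    for sensor_id, _ in stream:
        counts[sensor_id] += 1
    return counts


def run_topology():
    """Topology: the wiring of spouts and bolts into one processing graph."""
    return count_bolt(threshold_bolt(sensor_spout(), limit=30.0))


hot_counts = run_topology()
```

In a real topology each spout and bolt would run as many parallel tasks across the cluster; the generator chain only shows how tuples flow from one stage to the next.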

APACHE SPARK

Stream processing platforms

spark.apache.org

• Libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)

• Spark core (RDD)

• Cluster managers: Mesos, YARN, Standalone

• Filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)

TIME SERIES DATASTORES

• InfluxDB

• OpenTSDB

• KairosDB

• Prometheus

• see also iot-a.info

OPENTSDB

• Distributed time series database on top of HBase

• Store, index, query & plot metrics

• Extremely scalable

• Low-level monitoring

Time series datastores

opentsdb.net
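As a small illustration of the "store" step, the sketch below formats a data point in the style of OpenTSDB's telnet `put` command (metric, timestamp, value, then tags). The helper name `to_put_line`, the metric name, and the tags are hypothetical; sending the line to a server is left out.

```python
def to_put_line(metric, timestamp, value, tags):
    """Format one data point as a telnet-style OpenTSDB `put` line."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    # Sort tags so the same point always produces the same line.
    tag_part = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_part}"


line = to_put_line(
    "iot.temperature",                    # hypothetical metric name
    1452902400,                           # Unix timestamp in seconds
    21.5,
    {"site": "berlin", "sensor": "s1"},   # hypothetical tags
)
```

Tags such as `sensor` and `site` are what make a metric queryable per device or location, which matters for IoT data where one metric name is shared by many sources.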

INFLUXDB

• No-dependency, time series database written in Go

• SQLish query language (incl. regex, fan out)

• Single-node mode or Raft-based distributed mode

Time series datastores

influxdb.com
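A typical time series query downsamples raw points into fixed time windows, which a SQL-like language can express with a clause along the lines of `GROUP BY time(...)`. The sketch below shows the same computation in plain Python rather than through InfluxDB; the function name and the sample points are assumptions for illustration.

```python
from collections import defaultdict


def mean_per_bucket(points, bucket_seconds):
    """Group (unix_timestamp, value) points into fixed windows and average each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the window it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical temperature readings as (seconds, degrees) pairs.
points = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 25.0), (150, 19.0)]
downsampled = mean_per_bucket(points, bucket_seconds=60)
```

A time series datastore performs this kind of aggregation close to the data, so a dashboard can ask for one value per minute instead of pulling every raw reading.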

CHALLENGES

• Setup and operation of components

• Elasticity: static vs. dynamic partitioning

• Efficient usage of resources (TCO)

MEET THE DATACENTER OPERATING SYSTEM (DCOS)

LOCAL OS VS. DISTRIBUTED OS

http://bitly.com/os-vs-dcos

DCOS IS A DISTRIBUTED OPERATING SYSTEM

• local OS per node (+container enabled)

• scheduling (long-lived, batch)

• networking

• service discovery

• stateful services

• security

• monitoring, logging, debugging

BENEFITS

DCOS

• Run stateless services such as web servers or app servers together with Big Data services like Kafka, Spark, or Cassandra on one cluster

• Dynamic partitioning of your cluster, depending on your business requirements

• Increased utilization (10% → 80%++)
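The utilization argument can be checked with a little arithmetic. The sketch below compares static partitioning (each workload gets its own partition sized for its own peak) with dynamic partitioning (one shared cluster sized for the combined peak). The demand figures are invented for illustration; they are not measurements and do not reproduce the 10% and 80% figures quoted above.

```python
# Hypothetical demand in CPU cores per hour for three workloads on one cluster.
demand = {
    "web":   [40, 10, 10, 30],
    "spark": [10, 40, 10, 10],
    "kafka": [10, 10, 40, 10],
}

# Total cores actually needed in each hour, summed across workloads.
total_per_hour = [sum(hour) for hour in zip(*demand.values())]
hours = len(total_per_hour)

# Static partitioning: every workload owns a partition sized for its own peak.
static_capacity = sum(max(series) for series in demand.values())

# Dynamic partitioning: one shared pool sized for the peak of the combined load.
shared_capacity = max(total_per_hour)

static_utilization = sum(total_per_hour) / (static_capacity * hours)
shared_utilization = sum(total_per_hour) / (shared_capacity * hours)

print(f"static:  {static_capacity} cores, {static_utilization:.0%} utilized")
print(f"dynamic: {shared_capacity} cores, {shared_utilization:.0%} utilized")
```

The gain comes from workloads peaking at different times: the shared pool only has to cover the largest combined load, not the sum of the individual peaks.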

AN EXAMPLE

https://mesosphere.com/blog/2015/11/18/dcos-time-series-demo

https://github.com/mesosphere/time-series-demo

Q & A

• @mhausenblas

• mhausenblas.info

• @mesosphere

• mesosphere.io/product

• mesosphere.com/infinity
