PROJECT NAUTILUS Unified Data Stack for IoT

Uploaded by vucong; posted on 18-Apr-2018



© Copyright 2017 Dell Inc.

Market Drivers For Streaming Analytics

• Open Source Community
• Massive Data Growth
• Emergence of Real-Time Apps
• Infrastructure Commoditization and Scale-Out
• Rapid Dissemination of Data to Apps
• Monetize Data with Analytics
• Data Velocity and Variety


Streaming Use Cases Across Verticals: Market Snapshot by Verticals

Source: MarketsandMarkets, Stream Analytics Forecast

• Algorithmic Trading
• Real-time Customer Engagement
• Location Intelligence
• Operations Management
• Supply Chain Optimization
• Vehicle & Route Tracking
• Network Monitoring
• Real-time Patient Monitoring
• Real-Time Call-Center Analysis

• Use Cases Across Different Verticals

• BFSI (banking, financial services, and insurance) Use Cases
  – Algorithmic Trading
  – Fraud Detection
  – Real-time Transaction Analysis
  – Transaction Cost Analysis
  – Smart Order Routing


Today’s challenges with streaming analytics

• Deriving actionable real-time business insights
• Complex app ecosystem (>300)
• Siloed real-time and batch
• Extreme stream and storage scale
• DIY architecture is complex, siloed, and expensive

The industry needs a new storage paradigm that solves for extreme stream velocity and unified real-time analytics.

Pravega Streams As A Storage Primitive For IoT


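Today’s “Accidental Architecture”

[Diagram: data from Sensors, Mobile Devices, and App Logs flows through separate Batch and Real-Time paths, with MirrorMaker replicating to a DR Site; the paths serve interactive exploration by Data Scientists and real-time intelligence at the NOC]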


A New Architecture Emerges: Streaming

• A new class of streaming systems is emerging to address the accidental architecture’s problems and to enable new applications that were not possible before

• Some of the unique characteristics of streaming applications:
  – Treat data as continuous and infinite
  – Compute correct results in real time with stateful, exactly-once processing

• These systems are applicable to real-time applications, batch applications, and interactive applications

• Web-scale companies (Google, Twitter) are beginning to demonstrate the disruptive value of streaming systems

• What are the implications for storage in a streaming world?
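To make "stateful, exactly-once processing" concrete: one common way to get there (Flink's checkpointing works along these lines) is to snapshot an operator's state together with its position in a replayable input, and after a failure restore both and replay from that position. The sketch below is a minimal illustration under that assumption; `Checkpoint`, `CountingOperator`, and the in-memory list standing in for a durable log are hypothetical and not part of any particular stream processor.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    position: int  # offset of the next event to read from the log
    state: dict    # snapshot of the operator's state at that offset


class CountingOperator:
    """Hypothetical stateful operator that counts events per key."""

    def __init__(self):
        self.position = 0
        self.counts = {}

    def process(self, log, upto):
        """Consume events from the current position up to (not including) `upto`."""
        while self.position < upto:
            key = log[self.position]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.position += 1

    def checkpoint(self):
        # Position and state are captured together, so they always agree.
        return Checkpoint(self.position, dict(self.counts))

    def restore(self, cp):
        self.position = cp.position
        self.counts = dict(cp.state)


log = ["a", "b", "a", "c", "b", "a"]  # assumed replayable input, oldest first

reference = CountingOperator()
reference.process(log, len(log))      # failure-free run

op = CountingOperator()
op.process(log, 3)
cp = op.checkpoint()                  # checkpoint after 3 events
op.process(log, 5)                    # progress made after the checkpoint...
del op                                # ...is lost when the operator "crashes"

recovered = CountingOperator()
recovered.restore(cp)                 # rewind input position and state together
recovered.process(log, len(log))      # events 3 and 4 are replayed, counted once

print(recovered.counts == reference.counts)  # True
print(recovered.counts)                      # {'a': 3, 'b': 2, 'c': 1}
```

The crash discards the counts for events 3 and 4, but because the checkpoint pairs state with position, replaying those events rebuilds exactly the failure-free result. This only works if the input can be re-read from an earlier position, which is the storage requirement the question above points at.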


Let’s Rewind A Bit: The Importance of Log Storage

Traditional Apps/Middleware:
• BLOCKS: Structured Data; Relational DBs
• FILES: Unstructured Data; Pub/Sub; NoSQL DBs
• OBJECTS: Unstructured Data; Internet Friendly (REST); Scale over Semantics; Geo

Streaming Apps/Middleware:
• LOGS: Append-only; Low-latency; Tail Read/Write


The Importance of Log Storage: The Fundamental Data Structure for Scale-out Distributed Systems

APPEND-ONLY LOG (older → newer): x=5 z=6 y=2 x=4 a=7 y=5 …

• Catch-up reads (older data): high throughput
• Tailing reads and writes (newer end): low latency
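The two access patterns on this slide can be sketched with a small in-memory log, assuming string records of the form `key=value` as in the figure. `AppendOnlyLog` and `materialize` are hypothetical names; a real log would persist records and serve catch-up reads in large sequential batches, which this toy does not model.

```python
import threading


class AppendOnlyLog:
    """In-memory append-only log: records are only ever added at the tail."""

    def __init__(self):
        self._records = []
        self._cond = threading.Condition()

    def append(self, record):
        """Write at the tail and return the new record's offset."""
        with self._cond:
            self._records.append(record)
            self._cond.notify_all()  # wake any tailing readers
            return len(self._records) - 1

    def read_from(self, offset):
        """Catch-up read: copy out everything from `offset` up to the current tail."""
        with self._cond:
            return list(self._records[offset:])

    def tail(self, offset, timeout=None):
        """Tailing read: wait until a record exists at `offset`, then return it.

        Returns None if `timeout` seconds pass first.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._records) > offset, timeout):
                return None
            return self._records[offset]


def materialize(records):
    """Replay key=value updates from oldest to newest; the last write per key wins."""
    state = {}
    for record in records:
        key, value = record.split("=")
        state[key] = int(value)
    return state


log = AppendOnlyLog()
for rec in ["x=5", "z=6", "y=2", "x=4", "a=7", "y=5"]:
    log.append(rec)

# Catch-up read of the whole history, folded into the current state.
print(materialize(log.read_from(0)))  # {'x': 4, 'z': 6, 'y': 5, 'a': 7}

# Tailing read: a writer appends shortly after the reader starts waiting.
writer = threading.Timer(0.1, log.append, args=("b=1",))
writer.start()
print(log.tail(6, timeout=5))  # b=1
writer.join()
```

Replaying the figure's six records yields x=4, z=6, y=5, a=7: a later write to a key supersedes an earlier one in the derived state, while the log itself keeps every version.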


Our Goal: Refactor the “Accidental Storage Stack”

Today: each system carries its own storage stack
• Ingest Buffer & Pub/Sub (Kafka) → Proprietary Log Storage → Local Files → DAS
• NoSQL DB (Cassandra et al) → Proprietary Log Storage → Local Files → DAS

Refactored: using logs as a shared storage primitive
• Ingest Buffer & Pub/Sub, NoSQL DB, Search, Analytics Engines → “Pravega Streams” → Scale-out SDS
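One way to picture "logs as a shared storage primitive": instead of each system keeping its own private log, several derived views consume one shared log, each tracking its own read position. A minimal sketch, assuming in-memory storage; `SharedLog`, `KeyValueView`, `SearchView`, and `catch_up` are hypothetical stand-ins for a NoSQL table and a search index, not Pravega, Cassandra, or search-engine APIs.

```python
class SharedLog:
    """Hypothetical append-only log shared by several derived views."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        self.records.append((key, value))


class KeyValueView:
    """NoSQL-style view: latest value per key."""

    def __init__(self):
        self.position = 0  # next log offset this view has not yet applied
        self.table = {}

    def apply(self, key, value):
        self.table[key] = value


class SearchView:
    """Search-style view: inverted index from word to keys (never prunes old terms)."""

    def __init__(self):
        self.position = 0
        self.index = {}

    def apply(self, key, value):
        for word in value.lower().split():
            self.index.setdefault(word, set()).add(key)


def catch_up(view, log):
    """Apply every record the view has not seen yet, then advance its position."""
    for key, value in log.records[view.position:]:
        view.apply(key, value)
    view.position = len(log.records)


log = SharedLog()
log.append("sensor-1", "temperature high")
log.append("sensor-2", "pressure normal")
log.append("sensor-1", "temperature normal")

kv, search = KeyValueView(), SearchView()
catch_up(kv, log)
catch_up(search, log)

# A new record arrives; only the key-value view catches up for now.
log.append("sensor-3", "humidity high")
catch_up(kv, log)

print(kv.table["sensor-1"])          # temperature normal
print(kv.position, search.position)  # 4 3
```

Because every view rebuilds itself from the same shared log, a lagging view (here the search view) loses nothing; it simply resumes from its own position.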


Introducing Pravega Streams: A new log primitive designed for streaming architectures

• Pravega is an open source distributed storage service offering a new storage abstraction called a stream

• A stream is the foundation for building reliable streaming systems: a high-performance, durable, elastic, and infinite append-only log with strict ordering and consistency

• A stream is as lightweight as a file – you can create millions of them in a single cluster

• Streams greatly simplify the development and operation of a variety of distributed systems: messaging, databases, analytic engines, search engines, and so on
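Pravega splits a stream into segments and attaches a routing key to each event so that events with the same key stay in order. The sketch below models that idea under stated assumptions: each segment owns an equal slice of a [0, 1) key space, and a SHA-256 hash maps routing keys into it. `Stream`, `key_to_point`, and `segment_for` are hypothetical names, not Pravega's client API.

```python
import hashlib
from bisect import bisect_right


def key_to_point(routing_key: str) -> float:
    """Map a routing key to a point in [0, 1). The hash choice is an assumption."""
    digest = hashlib.sha256(routing_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class Stream:
    """Hypothetical stream: segment i owns key range [i/n, (i+1)/n)."""

    def __init__(self, num_segments):
        self.bounds = [i / num_segments for i in range(num_segments)]
        self.segments = [[] for _ in range(num_segments)]

    def segment_for(self, routing_key):
        """Index of the segment whose key range contains this routing key."""
        return bisect_right(self.bounds, key_to_point(routing_key)) - 1

    def write(self, routing_key, event):
        # Each segment is append-only, so per-key write order is preserved.
        self.segments[self.segment_for(routing_key)].append((routing_key, event))


stream = Stream(num_segments=4)
for i in range(5):
    stream.write("truck-1", f"t1-pos-{i}")
    stream.write("truck-2", f"t2-pos-{i}")

seg = stream.segments[stream.segment_for("truck-1")]
print([event for key, event in seg if key == "truck-1"])
# ['t1-pos-0', 't1-pos-1', 't1-pos-2', 't1-pos-3', 't1-pos-4']
```

Per-key order holds because every event for a given routing key lands in the same segment, and each segment is itself an append-only log; the sketch gives no ordering guarantee across different keys.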


Pravega Architecture: Streaming Storage System

• Applications: Messaging Apps; Real-Time / Batch / Interactive Predictive Analytics; Stream Processors (Spark, Flink, …); Other Apps & Middleware
• Stream Abstraction: Pravega Streaming Service
• Cache (Rocks)
• Low-Latency Storage: Apache BookKeeper
• Auto-Tiering to Cloud Scale Storage (Isilon or ECS): High-Throughput; High-Scale, Low-Cost
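The auto-tiering box can be illustrated with a segment whose newest events sit in a bounded low-latency tier while older events move to long-term storage, all behind a single offset space. A minimal sketch: `TieredSegment` and `hot_capacity` are hypothetical, both tiers are in-memory containers, and Pravega's real path through BookKeeper, the cache, and Isilon or ECS is not modeled.

```python
from collections import deque


class TieredSegment:
    """Hypothetical segment with a bounded hot tier and an unbounded long-term tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = deque()    # stands in for the low-latency tier (newest events)
        self.long_term = []   # stands in for cloud-scale storage (oldest events)

    def append(self, event):
        self.hot.append(event)
        # Once the hot tier is full, move the oldest events to long-term storage.
        while len(self.hot) > self.hot_capacity:
            self.long_term.append(self.hot.popleft())

    def read(self, offset):
        """Readers use one offset space; which tier serves the read is hidden."""
        if offset < len(self.long_term):
            return self.long_term[offset]
        return self.hot[offset - len(self.long_term)]

    def __len__(self):
        return len(self.long_term) + len(self.hot)


seg = TieredSegment(hot_capacity=3)
for i in range(10):
    seg.append(f"e{i}")

print(len(seg.long_term), list(seg.hot))  # 7 ['e7', 'e8', 'e9']
print(seg.read(0), seg.read(9))           # e0 e9
```

Because readers see one continuous offset range, a stream can grow without bound while only its tail occupies the fast tier, which is the point of "Tiering for Infinite Streams".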

Pravega Design Innovations

1. Zero-Touch Dynamic Scaling
   - Automatically scale read/write parallelism based on load and SLO
   - No service interruptions
   - No manual reconfiguration of clients
   - No manual reconfiguration of service resources

2. Smart Workload Distribution
   - No need to over-provision servers for peak load

3. I/O Path Isolation
   - For tail writes
   - For tail reads
   - For catch-up reads

4. Tiering for “Infinite Streams”

5. Transactions For “Exactly Once”
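Zero-touch dynamic scaling can be pictured as a control loop over key-range segments: split a segment whose load exceeds a target, and merge neighbors that are both cold. A sketch under stated assumptions: load is measured in events per second per segment, a split divides load evenly between the halves, and a merge requires the combined rate to stay below half the target so the merged segment does not split again right away. `Segment`, `rescale`, and `target_rate` are hypothetical names; the slide does not specify Pravega's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    low: float   # inclusive start of the key range
    high: float  # exclusive end of the key range
    rate: float  # observed events per second (assumed load metric)


def rescale(segments, target_rate):
    """One scaling round over segments sorted by key range."""
    result = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        if seg.rate > target_rate:
            # Hot: split the key range in half (assumes load spreads evenly).
            mid = (seg.low + seg.high) / 2
            result.append(Segment(seg.low, mid, seg.rate / 2))
            result.append(Segment(mid, seg.high, seg.rate / 2))
            i += 1
        elif i + 1 < len(segments) and seg.rate + segments[i + 1].rate < target_rate / 2:
            # Two adjacent cold segments: merge their key ranges.
            nxt = segments[i + 1]
            result.append(Segment(seg.low, nxt.high, seg.rate + nxt.rate))
            i += 2
        else:
            result.append(seg)
            i += 1
    return result


segments = [Segment(0.0, 0.5, 250.0), Segment(0.5, 0.75, 10.0), Segment(0.75, 1.0, 20.0)]
round1 = rescale(segments, target_rate=100.0)
print([(s.low, s.high, s.rate) for s in round1])
# [(0.0, 0.25, 125.0), (0.25, 0.5, 125.0), (0.5, 1.0, 30.0)]
```

The key space stays fully covered after every round, so writers never need reconfiguring: a routing key always maps to exactly one current segment. Running `rescale` again on `round1` splits the two 125-event segments once more.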

Nautilus: Unified Storage & Analytics Processing Platform


Nautilus Vision: A Unified Platform For Big & Fast Data For The Enterprise

Enable Real-time Ingestion

• Support for low-latency ingestion of continuous as well as trickle streams

• In-place analytics via multi-protocol access: Kafka, HDFS & S3

• Multi-tenancy & role-based access for streaming data

Unify Stream & Batch Processing

• Access to storage at memory speeds

• Fast checkpoint from streaming engine to storage

• Support for data science workflows

Reduce Storage & Operational costs

• Single cluster for storage (streaming, long-term) & analytics

• Seamless disaster recovery via geo-enabled storage

• Simplified architecture results in lower operational cost


Nautilus: A Unified Storage Architecture

• An extensible collection of elastic storage micro-services

[Diagram: layered stack. Applications (Infrastructure Monitoring, Patient Surveillance, Intrusion Detection Systems, Fleet Tracking, Supply Chain Control, Intelligent Shopping Applications, Smart Grid, Snow Level Monitoring, Traffic Congestion Analytics, Wearable) run on storage micro-services (Ingest Buffer, Pub/Sub, Search, Persistent Data Structures, …). Streaming Storage Layer: Pravega Streams. Storage Access Layer: object or file data access. Deep Storage Layer: open source HDFS, or Isilon / ECS as commercial deployment options]


Nautilus: A Unified Data Pipeline
Strongly Consistent Storage | Exactly Once Processing | Unified Analytics

[Diagram: Sensors, Mobile Devices, and App Logs feed Unified Storage (Pravega Streams serving Ingest Buffer, Pub/Sub, Search, and Persistent Data Structures, backed by Isilon / ECS), which feeds Unified Analytics (Real-Time, Batch, Interactive) for interactive exploration by Data Scientists and real-time intelligence at the NOC]


Nautilus Platform: Secure | Integrated | Efficient | Elastic | Scalable

Nautilus: A Turn-Key Streaming Data Platform

• Pravega: Streaming Storage (Streaming Storage API)
• Flink: Stateful Stream Processor (Streaming SQL API)
• Zeppelin: Notebook Experience (Notebook API)
• Mesos + DC/OS on Commodity Servers or Cloud
• Cross-cutting: Security, Serviceability
• Around the stack: Digital World; Real-Time/Batch Analytics; Frameworks and Apps; Interactive Exploration


Operational Scenario: Today
How to test & deploy a new version of analytics business logic?

[Diagram: the Streaming Application (Business Logic) reads recent data from Kafka logs (x=5 z=6 y=2 x=4 a=7 y=5 …, older → newer); archived data lives in HDFS, NFS, or object storage and is reached via ETL and the HDFS API]

Requirements
• Run new business logic against historical data sets
• Validate correct results for problematic scenarios
• Ensure no regression
• Deploy new business logic in production
• Ensure the new version is functioning properly before switching users/applications
• Revert to the prior version if something fails


Operational Scenario: Today
How to test & deploy a new version of analytics business logic?

[Diagram: recent data in Kafka logs (x=5 z=6 y=2 x=4 a=7 y=5 …, older → newer) and archived data in HDFS, NFS, or object storage (via ETL and the HDFS API) feed the production Streaming Application (Business Logic). A new version (Business Logic’) is deployed with different data access methods: historical data is accessed via files (not logs!) from the HDFS archive]

Challenges
• Custom scripting and deployment are required because historical data is located in a different storage system and accessed via a different data type (e.g., files vs. logs)
• The test run is not exactly like production due to mismatches between log/file access and deployment differences; this requires more time, is error prone, and leads to inaccurate test results
• Often requires downtime if upgrading in place
• Upgrading “next to existing” requires a complex workflow


Operational Scenario: With Nautilus
How to test & deploy a new version of analytics business logic?

[Diagram: the production Streaming Application (Business Logic) and the new version (Business Logic’) both read from the same Pravega Streams; tiering to/from ECS is handled automatically by the Streaming Storage Subsystem]

1. The new version of the app is deployed exactly like production
2. Historical data is accessed via the same stream as production: just rewind the stream!
3. Once history is consumed, seamlessly start reading real-time data!
4. Once you are confident things are working, turn off the old version and redirect the NOC consoles
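The four steps above can be sketched as a shadow run: replay the stream from the beginning through both the production logic and the new logic, record where they disagree, then keep reading live events from where history ended. A minimal sketch, assuming a Python list stands in for one stream and readings are dicts; the business rules, `shadow_run`, and the switch criterion are all hypothetical.

```python
def business_logic_v1(reading):
    """Hypothetical production rule: alert when temperature exceeds 80."""
    return reading["temp"] > 80


def business_logic_v2(reading):
    """Hypothetical new rule: same threshold, but suppress readings flagged as sensor faults."""
    return reading["temp"] > 80 and not reading.get("fault", False)


def shadow_run(stream, start, old, new):
    """Feed every event from `start` to the current end through both versions.

    Returns the disagreements and the offset at which to resume reading.
    """
    diffs = []
    for offset in range(start, len(stream)):
        reading = stream[offset]
        old_out, new_out = old(reading), new(reading)
        if old_out != new_out:
            diffs.append((offset, old_out, new_out))
    return diffs, len(stream)


# A plain list stands in for one stream; offset 0 is the oldest (tiered) data.
stream = [{"temp": 70}, {"temp": 95}, {"temp": 99, "fault": True}, {"temp": 60}]

# Step 2: rewind to offset 0 and replay history through both versions.
history_diffs, next_offset = shadow_run(stream, 0, business_logic_v1, business_logic_v2)
print(history_diffs)  # [(2, True, False)]

# Step 3: new events arrive; keep reading from where history ended.
stream.extend([{"temp": 85}, {"temp": 120, "fault": True}])
live_diffs, next_offset = shadow_run(stream, next_offset, business_logic_v1, business_logic_v2)
print(live_diffs)  # [(5, True, False)]

# Step 4 (decision): switch only if every disagreement is an intended change.
safe_to_switch = all(stream[offset].get("fault") for offset, _, _ in history_diffs + live_diffs)
print(safe_to_switch)  # True
```

Because history and live data come through one offset sequence, the replay and the live run share the same code path; there is no separate file-based test harness to drift from production.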

Demo


Summary

1. “Streaming Architecture” replaces “Accidental Architecture”
   – Data: infinite/continuous vs. static/finite
   – Correctness in real time: exactly-once processing + consistent storage

2. Pravega Streaming Storage enables storage refactoring
   – Infinite, durable, scalable, re-playable, elastic append-only log
   – Open source project

3. Nautilus = Unified Storage + Unified Data Pipeline
   – The New Data Stack!