aws apac webinar week - real time data processing with kinesis

Upload: amazon-web-services

Post on 16-Apr-2017

TRANSCRIPT

Page 1: AWS APAC Webinar Week - Real Time Data Processing with Kinesis
Page 2

aws.amazon.com/webinars/apac/webinar-week | #AWSWebinarWeek

Page 3

Real-time Data Processing: Kinesis and Beyond

Santanu Dutt

Page 4

INDEX

A. What is real-time?

B. Examples and Challenges

C. Kinesis and Beyond

1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. Elasticsearch
6. IoT Rule Engine

D. Demo

Page 5

Examples
• Algorithmic trading: < 10 msec
• Real-time bidding: < 100 msec
• Common IoT scenarios: < 5 to 10 sec
• Infrastructure monitoring dashboard: < 1 min
• Google Maps traffic: < 5 min
• Social network and media recommendation: 15 min to a day
• Most business analytics scenarios: < 30 min
• Social network listening: depends on how fast you want to respond!

Page 6

Page 7

Page 8

Page 9

Page 10

Challenges

A. Speed of Analytics and Response

B. Volume of data

C. Maturity or Capabilities of Analytics Framework

D. Storing and Presentation of results

Page 11

The Motivation for Continuous Processing

Page 12

Some statistics about AWS data services

• Metering service
  • 10s of millions of records per second
  • Terabytes per hour
  • Hundreds of thousands of sources
  • Auditors guarantee 100% accuracy at month end

• Data warehouse
  • 100s of extract-transform-load (ETL) jobs every day
  • Hundreds of thousands of files per load cycle
  • Hundreds of daily users
  • Hundreds of queries per hour

Page 13

Metering Service

Page 14

Internal AWS Metering Service

Workload
• 10s of millions of records/sec
• Multiple TB per hour
• 100,000s of sources

Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage

Page 15

Our Big Data Transition

Old requirements
• Capture huge amounts of data and process it in hourly or daily batches

New requirements
• Make decisions faster, sometimes in real time
• Scale the entire system elastically
• Make it easy to “keep everything”
• Let multiple applications process data in parallel

Page 16

A General-Purpose Data Flow

Many different technologies, at different stages of evolution

[Diagram: Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting; labels include Kafka and "?"]

Page 17

Kinesis

Movement or activity in response to a stimulus.

A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.

Page 18

Customer View

Page 19

Customer Scenarios across Industry Segments

Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis

Data types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data

Software/Technology
(1) IT server and app log ingestion
(2) IT operational metrics dashboards
(3) Devices/sensor operational intelligence

Digital Ad Tech/Marketing
(1) Advertising data aggregation
(2) Advertising metrics like coverage, yield, conversion
(3) Analytics on user engagement with ads, optimized bid/buy engines

Financial Services
(1) Market/financial transaction order data collection
(2) Financial market data metrics
(3) Fraud monitoring, Value-at-Risk assessment, auditing of market order data

Consumer Online/E-Commerce
(1) Online customer engagement data aggregation
(2) Consumer engagement metrics like page views, CTR
(3) Customer clickstream analytics, recommendation engines

Page 20

What business problem needs to be solved?

Mobile/Social Gaming
• Problem: continuous, real-time delivery of game insight data from 100s of game servers
• Before: custom-built solutions that were operationally complex to manage and not scalable
• Pain points: delays in delivering critical business data; developer burden in building a reliable, scalable platform for real-time data ingestion/processing; slower real-time customer insights
• Goal: accelerate time to market of elastic, real-time applications while minimizing operational overhead

Digital Advertising Tech.
• Problem: generate real-time metrics and KPIs on online ad performance for advertisers/publishers
• Before: a store-and-forward fleet of log servers and a Hadoop-based processing pipeline
• Pain points: data lost in the store/forward layer; operational burden of managing a reliable, scalable platform for real-time data ingestion/processing; customer insights that were batch-driven rather than real-time
• Goal: generate the freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients

Page 21

Page 22

Page 23

Amazon Kinesis Streams

Build your own data streaming applications

• Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.

• Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.

• Low cost: Cost-efficient for workloads of any scale.

Page 24

Kinesis Architecture

Run code in response to an event and automatically manage compute.

Page 25

Amazon Kinesis – An Overview

Page 26

Kinesis Stream: Managed ability to capture and store data

• Streams are made of shards

• Each shard ingests data at up to 1 MB/sec and up to 1,000 records/sec (TPS)

• Each shard emits up to 2 MB/sec

• All data is stored for 24 hours by default

• Scale Kinesis streams by adding or removing shards

• Replay data inside the 24-hour window
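The per-shard limits above translate into a simple sizing rule: a stream needs enough shards to satisfy whichever limit binds first (write bandwidth, write records per second, or read bandwidth). A minimal Python sketch, assuming the 1 MB/sec and 1,000 records/sec write limits and the 2 MB/sec read limit listed on this slide, treating 1 MB as 1,024 KB, and assuming every consumer application reads the full stream. `shards_needed` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard limits as listed on the slide.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(avg_record_kb: float, records_per_sec: float, consumers: int = 1) -> int:
    """Estimate the minimum shard count for a workload (hypothetical helper).

    Assumes each of `consumers` applications reads every record once,
    so read bandwidth is write bandwidth times the consumer count.
    """
    write_mb_per_sec = avg_record_kb * records_per_sec / 1024
    read_mb_per_sec = write_mb_per_sec * consumers

    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)

    # A stream always has at least one shard; the binding limit wins.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    print(shards_needed(avg_record_kb=1, records_per_sec=5000))                # 5
    print(shards_needed(avg_record_kb=10, records_per_sec=2000, consumers=3))  # 30
```

In the second case it is read bandwidth, not ingest, that drives the shard count, because three applications each consume the full stream.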

Page 27

Putting Data into Kinesis

Simple Put interface to store data in Kinesis

• Producers use a PUT call to store data in a stream

• PutRecord {Data, PartitionKey, StreamName}

• A partition key is supplied by the producer and used to distribute the PUTs across shards

• Kinesis MD5-hashes the supplied partition key and maps the result onto the hash key range of a shard

• A unique sequence number is returned to the producer upon a successful PUT call
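To make the partition-key routing concrete: the MD5 digest of the key is treated as a 128-bit unsigned integer, and the record goes to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash key space into equal, contiguous ranges (as for a newly created stream that has not been resharded) and reads the digest big-endian. `even_hash_ranges`, `hash_key` and `shard_for_key` are hypothetical helpers for illustration, not part of any AWS SDK.

```python
import hashlib
from typing import List, Tuple

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split [0, 2**128 - 1] into contiguous, near-equal inclusive ranges, one per shard."""
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned big-endian integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard range")


if __name__ == "__main__":
    ranges = even_hash_ranges(4)
    for key in ["user-17", "user-42", "sensor-9"]:
        print(key, "-> shard", shard_for_key(key, ranges))
```

Because the same key always hashes to the same value, all records with one partition key land on the same shard, which is what keeps them in order relative to each other.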

Page 28

Creating and Sizing a Kinesis Stream

Page 29

Building Kinesis Processing Apps: Kinesis Client Library

Client library for fault-tolerant, at-least-once, continuous processing

o Java client library, source available on GitHub

o Build and deploy your app with the KCL on your EC2 instance(s)

o The KCL is the intermediary between your application and the stream

   Automatically starts a record processor for each shard

   Simplifies reading by abstracting individual shards

   Increases / decreases the number of record processors as the number of shards changes

   Checkpoints to keep track of each shard's position in the stream, so processing resumes where it left off if a worker fails

o Integrates with Auto Scaling groups to redistribute workers to new instances
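Checkpointing is what makes KCL processing at-least-once: after a failure, a shard is read again from its last checkpoint, so anything handled after that checkpoint is handled again. The Python sketch below models only that idea; it is not the KCL API (the KCL itself is a Java library and keeps its checkpoints in a DynamoDB table). `CheckpointStore` and `process_shard` are hypothetical, and sequence numbers are simplified to integers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CheckpointStore:
    """In-memory stand-in for a durable checkpoint table (hypothetical)."""
    positions: Dict[str, int] = field(default_factory=dict)

    def get(self, shard_id: str) -> int:
        return self.positions.get(shard_id, -1)  # -1: nothing checkpointed yet

    def put(self, shard_id: str, sequence_number: int) -> None:
        self.positions[shard_id] = sequence_number


def process_shard(
    shard_id: str,
    records: List[Tuple[int, str]],
    store: CheckpointStore,
    handle: Callable[[str], None],
    checkpoint_every: int = 2,
    crash_after: Optional[int] = None,
) -> bool:
    """Handle (sequence_number, data) records that follow the last checkpoint.

    Checkpoints every `checkpoint_every` records and once more at the end.
    `crash_after` simulates a worker failure after that many records.
    Returns True if the run completed, False if it "crashed".
    """
    last_checkpoint = store.get(shard_id)
    handled = 0
    last_seq = None
    for seq, data in records:
        if seq <= last_checkpoint:
            continue  # already processed and checkpointed
        handle(data)
        handled += 1
        last_seq = seq
        if crash_after is not None and handled == crash_after:
            return False  # failure before this record was checkpointed
        if handled % checkpoint_every == 0:
            store.put(shard_id, seq)
    if last_seq is not None:
        store.put(shard_id, last_seq)
    return True


if __name__ == "__main__":
    shard = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
    store, seen = CheckpointStore(), []
    process_shard("shard-0", shard, store, seen.append, crash_after=3)
    process_shard("shard-0", shard, store, seen.append)  # restarted worker
    print(seen)  # ['a', 'b', 'c', 'c', 'd', 'e']: 'c' is processed twice
```

Checkpointing more often narrows the replay window at the cost of more writes to the checkpoint store; either way, record handlers should be idempotent.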

Page 30

Amazon Kinesis Connector Library

Customizable, open-source code to connect Kinesis with S3, Redshift and DynamoDB

[Diagram: Kinesis → S3, DynamoDB, Redshift]

ITransformer

• Defines the transformation of records from the Amazon Kinesis stream to suit the user-defined data model

IFilter

• Excludes irrelevant records from processing

IBuffer

• Buffers the set of records to be processed, up to a specified size limit (number of records) and total byte count

IEmitter

• Makes client calls to other AWS services and persists the records stored in the buffer
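The four interfaces form a transform, filter, buffer, emit pipeline. The connector library itself is Java; the Python sketch below only mirrors the division of responsibilities, using a hypothetical `ConnectorPipeline` class whose callables play the ITransformer, IFilter and IEmitter roles and whose buffer flushes on a record-count or byte-count limit, as the IBuffer description suggests.

```python
import json
from typing import Any, Callable, List


class ConnectorPipeline:
    """Transform -> filter -> buffer -> emit, mirroring the four connector roles (hypothetical)."""

    def __init__(
        self,
        transform: Callable[[bytes], Any],  # ITransformer role
        keep: Callable[[Any], bool],        # IFilter role
        emit: Callable[[List[Any]], None],  # IEmitter role
        max_records: int = 3,               # IBuffer limits
        max_bytes: int = 1024,
    ) -> None:
        self.transform, self.keep, self.emit = transform, keep, emit
        self.max_records, self.max_bytes = max_records, max_bytes
        self.buffer: List[Any] = []
        self.buffered_bytes = 0

    def process(self, raw: bytes) -> None:
        item = self.transform(raw)
        if not self.keep(item):
            return  # irrelevant record, dropped before buffering
        self.buffer.append(item)
        self.buffered_bytes += len(raw)
        if len(self.buffer) >= self.max_records or self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.emit(list(self.buffer))  # e.g. a write to S3, DynamoDB or Redshift
            self.buffer.clear()
            self.buffered_bytes = 0


if __name__ == "__main__":
    batches: List[List[Any]] = []
    pipeline = ConnectorPipeline(
        transform=json.loads,
        keep=lambda event: event["status"] == "ok",
        emit=batches.append,
    )
    for i, status in enumerate(["ok", "error", "ok", "ok", "ok"], start=1):
        pipeline.process(json.dumps({"id": i, "status": status}).encode("utf-8"))
    pipeline.flush()  # push out the partially filled final buffer
    print([[event["id"] for event in batch] for batch in batches])  # [[1, 3, 4], [5]]
```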

Page 31

Use Cases

Ultra-low-latency analytics (seconds)

Complex computations:
• Complex algorithm execution
• Tuple processing: every piece of data is processed independently, as opposed to an aggregation that runs from the first row to the last
• Moving-window analysis: e.g. tracking a moving car from minute 2 to minute 3, and then from minute 5 to minute 6
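As a small illustration of window analysis, the sketch below groups position readings into fixed one-minute windows per vehicle and reports how far each vehicle moved inside each window (e.g. between minute 2 and minute 3). It uses tumbling (fixed, non-overlapping) windows, one simple kind of windowing; the event layout `(timestamp_sec, vehicle_id, position_m)` and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, vehicle id, position in metres)


def displacement_per_window(events: Iterable[Event], window_sec: int = 60) -> Dict[Tuple[str, int], float]:
    """Distance covered by each vehicle inside each fixed, non-overlapping time window."""
    positions: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for ts, vehicle, pos in sorted(events):  # order the batch by timestamp
        window_index = int(ts // window_sec)  # e.g. 125 s falls in window 2 (minute 2 to 3)
        positions[(vehicle, window_index)].append(pos)
    # Last position minus first position within each (vehicle, window) bucket.
    return {key: ps[-1] - ps[0] for key, ps in positions.items()}


if __name__ == "__main__":
    readings = [(125, "car-1", 0.0), (150, "car-1", 300.0), (170, "car-1", 500.0),
                (305, "car-1", 900.0), (350, "car-1", 1400.0)]
    print(displacement_per_window(readings))  # {('car-1', 2): 500.0, ('car-1', 5): 500.0}
```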

Page 32

Page 33

Amazon Kinesis Firehose

Load massive volumes of streaming data into Amazon S3 and Amazon Redshift

• Zero administration: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.

• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations.

• Seamless elasticity: scales automatically to match data throughput, without intervention

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into S3 and Redshift

Analyze streaming data using your favorite BI tools
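Firehose batches incoming records and delivers a batch when either a size threshold or a time interval is reached, whichever comes first (configured through its buffering hints). This Python model of that rule is a sketch only: the class name, its parameters and the explicit `now` timestamps are hypothetical, and the real service also handles compression, encryption and retries.

```python
from typing import List, Optional


class SizeOrTimeBuffer:
    """Deliver buffered records when size >= max_bytes or age >= max_seconds (hypothetical model)."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records: List[bytes] = []
        self.size = 0
        self.opened_at: Optional[float] = None  # arrival time of the oldest buffered record
        self.delivered: List[bytes] = []        # stands in for objects written to S3

    def add(self, data: bytes, now: float) -> None:
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(data)
        self.size += len(data)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Called periodically so that a quiet stream still flushes on time."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self.records:
            return
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_seconds:
            self.delivered.append(b"".join(self.records))
            self.records, self.size, self.opened_at = [], 0, None


if __name__ == "__main__":
    buf = SizeOrTimeBuffer(max_bytes=10, max_seconds=60)
    buf.add(b"abcd", now=0)
    buf.add(b"efgh", now=1)
    buf.add(b"ij", now=2)    # reaches 10 bytes: size-triggered delivery
    buf.add(b"k", now=5)
    buf.tick(now=30)         # nothing due yet
    buf.tick(now=65)         # 60 s after b"k" arrived: time-triggered delivery
    print(buf.delivered)     # [b'abcdefghij', b'k']
```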

Page 34

Amazon Kinesis Firehose to Redshift

A two-step process

1. Use a customer-provided S3 bucket as an intermediate destination
• Still the most efficient way to do large-scale loads to Redshift
• Never lose data: it is always safe and available in your S3 bucket

2. Firehose issues the customer-provided COPY command synchronously. It continuously issues a COPY command once the previous COPY command has finished and been acknowledged by Redshift.
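The synchronous COPY step can be pictured as a loop that never runs two loads at once. The sketch below models it under two simplifying assumptions that are not documented Firehose internals: every COPY takes a fixed time, and each COPY loads all S3 objects that have arrived by the time it starts. Each batch is expressed as a Redshift COPY manifest (a JSON document listing S3 object URLs); the bucket name, object keys and helper names are hypothetical, and no AWS call is made.

```python
from typing import Dict, List, Tuple


def build_manifest(bucket: str, keys: List[str]) -> Dict:
    """Redshift COPY manifest: a JSON document listing the S3 objects one COPY should load."""
    return {"entries": [{"url": f"s3://{bucket}/{key}", "mandatory": True} for key in keys]}


def plan_copies(objects: List[Tuple[float, str]], copy_seconds: float) -> List[List[str]]:
    """Group (arrival_time, s3_key) objects into sequential COPY batches.

    Assumes every COPY takes `copy_seconds` and loads all objects that
    arrived by the time it starts.
    """
    pending = sorted(objects)
    batches: List[List[str]] = []
    clock = 0.0
    i = 0
    while i < len(pending):
        clock = max(clock, pending[i][0])  # an idle loader waits for the next object
        batch = []
        while i < len(pending) and pending[i][0] <= clock:
            batch.append(pending[i][1])
            i += 1
        batches.append(batch)
        clock += copy_seconds  # synchronous: the next COPY starts only after this one finishes
    return batches


if __name__ == "__main__":
    arrivals = [(0, "a.gz"), (10, "b.gz"), (20, "c.gz"), (50, "d.gz"), (55, "e.gz"), (130, "f.gz")]
    for batch in plan_copies(arrivals, copy_seconds=40):
        print(build_manifest("example-staging-bucket", batch))
```

With a 40-second COPY, objects that land while a load is running are picked up together by the next one, giving the batches [a], [b, c], [d, e] and [f].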

Page 35

Use Cases

Kinesis Firehose is used when you need to run batches more frequently, as long as the analysis can be done with SQL.

Micro-batching scenarios where latencies of more than 60 seconds are tolerable

With a Redshift target: analytics that can be achieved with standard SQL and user-defined functions (UDFs)

Most “real-time business insights” scenarios can be easily supported with Kinesis Firehose + Redshift!

Page 36

Page 37

Amazon Kinesis Analytics

Analyze data streams continuously with standard SQL

• Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.

• Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies

• Scale elastically: Elastically scales to match data throughput without any operator intervention.

Announcement Only!


Connect to Kinesis streams and Firehose delivery streams

Run standard SQL queries against data streams

Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time

Page 38

Use Cases

Low-latency time-series analytics

Analytics that can be achieved within the confines of the supported SQL:
• Running totals
• Moving averages
• Number of people entering a stadium
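In Kinesis Analytics these would be written as SQL over a window of the stream. To show what such a query computes, here is a plain-Python sketch of a running total and a moving average over the last N readings. The count-based window is chosen for simplicity, and the function name and sample turnstile counts are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def running_total_and_moving_avg(values: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Yield (running total, average of the last `window` values) after each new value."""
    total = 0.0
    recent: Deque[float] = deque()
    window_sum = 0.0
    for v in values:
        total += v
        recent.append(v)
        window_sum += v
        if len(recent) > window:
            window_sum -= recent.popleft()  # slide the window forward
        yield total, window_sum / len(recent)


if __name__ == "__main__":
    turnstile_counts = [2, 4, 6, 8]  # people entering per interval (sample data)
    for total, avg in running_total_and_moving_avg(turnstile_counts, window=2):
        print(total, avg)
```

The running total answers "how many people have entered so far", while the moving average smooths the entry rate over the most recent intervals.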

Page 39

Page 40

Amazon DynamoDB Streams: a time-ordered sequence of item-level changes

• Time- and partition-ordered log
• Provides a stream of inserts, deletes and updates, including:
  • Old item
  • New item
  • Primary key
  • Change type
• Each stream record appears exactly once in the stream
• Streams are asynchronous
• Scales with your table

[Diagram: DynamoDB → DynamoDB Streams]
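A consumer such as an AWS Lambda function receives these changes as records carrying an `eventName` (INSERT, MODIFY or REMOVE) plus `Keys`, `OldImage` and `NewImage` in DynamoDB's typed attribute format; which images are present depends on the stream view type. The sketch below assumes a stream configured with both old and new images and reports which attributes each MODIFY changed. The `userId` and `job` attributes are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def changed_attributes(record: Dict[str, Any]) -> List[str]:
    """Names of attributes whose typed value differs between OldImage and NewImage."""
    ddb = record.get("dynamodb", {})
    old = ddb.get("OldImage", {})
    new = ddb.get("NewImage", {})
    return sorted(name for name in set(old) | set(new) if old.get(name) != new.get(name))


def handler(event: Dict[str, Any], context: Any) -> List[Tuple[Dict[str, Any], List[str]]]:
    """Lambda-style entry point: collect (key, changed attribute names) for every MODIFY."""
    updates = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # inserts and removes are ignored in this sketch
        updates.append((record["dynamodb"]["Keys"], changed_attributes(record)))
    return updates


if __name__ == "__main__":
    sample_event = {
        "Records": [{
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"userId": {"S": "u-123"}},
                "OldImage": {"userId": {"S": "u-123"}, "job": {"S": "Engineer"}},
                "NewImage": {"userId": {"S": "u-123"}, "job": {"S": "Architect"}},
                "StreamViewType": "NEW_AND_OLD_IMAGES",
            },
        }]
    }
    print(handler(sample_event, None))  # [({'userId': {'S': 'u-123'}}, ['job'])]
```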

Page 41

Use Cases

Ultra-low-latency analytics (seconds) when data is available in Kinesis or DynamoDB Streams, e.g.:

Energy-meter data coming into Kinesis, used to continuously update billing information.

Changes to a social network profile stored in DynamoDB, used to push updates to the user's connections immediately (e.g. a user adds a new job to their profile).
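For the energy-meter case, a Lambda function subscribed to the Kinesis stream receives a batch of records whose payloads arrive base64-encoded under `kinesis.data`. A minimal sketch, assuming each payload is JSON with hypothetical `meter_id` and `kwh` fields and a hypothetical flat tariff; a real function would add these charges to persistent billing state (for example a DynamoDB table) rather than just return them.

```python
import base64
import json
from collections import defaultdict
from typing import Any, Dict

RATE_PER_KWH = 0.20  # hypothetical flat tariff


def handler(event: Dict[str, Any], context: Any) -> Dict[str, float]:
    """Sum kWh per meter in this batch and price it at the flat rate."""
    usage: Dict[str, float] = defaultdict(float)
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        usage[payload["meter_id"]] += payload["kwh"]
    return {meter: round(kwh * RATE_PER_KWH, 2) for meter, kwh in usage.items()}


def make_record(reading: Dict[str, Any]) -> Dict[str, Any]:
    """Build a test record shaped like a Kinesis event record (only the fields used here)."""
    encoded = base64.b64encode(json.dumps(reading).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": encoded}}


if __name__ == "__main__":
    event = {"Records": [make_record({"meter_id": "m-1", "kwh": 1.5}),
                         make_record({"meter_id": "m-2", "kwh": 2.0}),
                         make_record({"meter_id": "m-1", "kwh": 0.5})]}
    print(handler(event, None))  # {'m-1': 0.4, 'm-2': 0.4}
```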

Page 42

INDEX

A. What is real-time?

B. Examples and Challenges

C. Kinesis and Beyond

1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams and Kinesis Streams processing using Lambda
5. Elasticsearch
6. IoT Rule Engine

D. Demo

Page 43

How Elasticsearch can help

• Combined with Logstash and Kibana, the ELK stack provides a tool for real-time analytics and data visualization

Page 44

Plug-ins

A. Kibana 3
B. Kibana 4
C. Jetty
D. cloud-aws
E. Kuromoji
F. icu

Page 45

Page 46

Page 47

Elasticsearch API

QUERY

AGGREGATION

Page 48

Page 49

Page 50

Page 51

Page 52

Aggregation and Filtering

[Diagram, built up over pages 48-52: Documents → Query → Buckets → Metrics (example values 123, 420, 510)]
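The build-up maps onto an Elasticsearch search request: a query selects documents, a bucket aggregation groups them, and a metric aggregation computes a value per bucket. The sketch shows such a request body as a Python dict, using the standard `bool`/`filter`, `terms` and `avg` clauses (valid for Elasticsearch 2.x and later), next to a plain-Python equivalent that runs without a cluster. The document fields `status`, `region` and `latency_ms` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Search request body: aggregations only (size 0); it is not sent anywhere in this sketch.
SEARCH_BODY = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"status": "error"}}]}},
    "aggs": {
        "by_region": {
            "terms": {"field": "region"},
            "aggs": {"avg_latency": {"avg": {"field": "latency_ms"}}},
        }
    },
}


def query_bucket_metric(docs: List[Dict[str, Any]]) -> Dict[str, float]:
    """Local equivalent of SEARCH_BODY: filter, bucket by region, average latency per bucket."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for doc in docs:
        if doc.get("status") == "error":                      # query: keep matching documents
            buckets[doc["region"]].append(doc["latency_ms"])  # buckets: group by region
    return {region: sum(v) / len(v) for region, v in buckets.items()}  # metric: average


if __name__ == "__main__":
    docs = [
        {"status": "error", "region": "syd", "latency_ms": 100},
        {"status": "ok", "region": "syd", "latency_ms": 10},
        {"status": "error", "region": "syd", "latency_ms": 300},
        {"status": "error", "region": "sin", "latency_ms": 50},
    ]
    print(query_bucket_metric(docs))  # {'syd': 200.0, 'sin': 50.0}
```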

Page 53

Use Cases

Real-time dashboards (Kibana)

Alerting (Percolator API)

Real-time text analytics, as in social media listening

Real-time geospatial queries and geospatial analysis
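The Percolator inverts normal search: queries are registered up front, and each incoming document is checked against all of them, which is what makes it useful for alerting. This toy Python version stands in for that idea with hypothetical keyword-set queries (every term must appear in the document); the real Percolator stores full Elasticsearch queries and is invoked through the Elasticsearch API.

```python
import re
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens; a crude stand-in for an Elasticsearch analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordPercolator:
    """Store queries, then report which of them each new document matches (hypothetical)."""

    def __init__(self) -> None:
        self.queries: Dict[str, Set[str]] = {}

    def register(self, query_id: str, required_terms: Iterable[str]) -> None:
        self.queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, document: str) -> List[str]:
        tokens = tokenize(document)
        # A query matches when all of its required terms occur in the document.
        return sorted(qid for qid, terms in self.queries.items() if terms <= tokens)


if __name__ == "__main__":
    alerts = KeywordPercolator()
    alerts.register("brand-mention", ["acme"])
    alerts.register("brand-outage", ["acme", "down"])
    print(alerts.percolate("Is Acme down again?"))        # ['brand-mention', 'brand-outage']
    print(alerts.percolate("Acme launches a new phone"))  # ['brand-mention']
```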

Page 54

Page 55

AWS IoT

“Securely connect one or one-billion devices to AWS, so they can interact with applications and other devices”

Page 56

AWS IoT

DEVICE SDK: set of client libraries to connect, authenticate and exchange messages

DEVICE GATEWAY: communicate with devices via MQTT and HTTP

AUTHENTICATION / AUTHORIZATION: secure with mutual authentication and encryption

RULES ENGINE: transform messages based on rules and route them to AWS services and third-party (3P) services

DEVICE SHADOW: persistent thing state during intermittent connections

DEVICE REGISTRY: identity and management of your things

APPLICATIONS: connect through the AWS IoT API
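Devices publish to MQTT topics, and both subscriptions and the FROM clause of a rules-engine SQL statement (e.g. `SELECT * FROM 'sensors/+/temperature'`) select messages with topic filters, where `+` matches exactly one topic level and `#` matches all remaining levels. A small Python sketch of those matching rules as defined by the MQTT specification; `topic_matches` is a hypothetical helper, and AWS IoT's own implementation may differ in details.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style topic filter matching with '+' (one level) and '#' (all remaining levels)."""
    # Wildcards at the first level do not match topics starting with '$' (e.g. '$aws/...').
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches this level and everything below it, including the parent
        if i >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("sensors/+/temperature", "sensors/42/temperature"))  # True
    print(topic_matches("sensors/+/temperature", "sensors/42/humidity"))     # False
    print(topic_matches("sensors/#", "sensors/42/temperature"))              # True
```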

Page 57

Use Cases

Processing sensor data (millions of data points from hundreds of thousands of sensors) in real time for alerting

Redirecting sensor data to Kinesis or DynamoDB for multi-data-point analysis
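Both use cases come down to rules that inspect each message and route it: one rule raises an alert when a reading crosses a threshold, another forwards every reading to Kinesis or DynamoDB for multi-data-point analysis. In AWS IoT each rule is a SQL statement plus one or more actions; the Python sketch below replaces the SQL WHERE clause with a predicate, and the rule names, threshold and target labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # plays the role of the rule's WHERE clause
    target: str                        # hypothetical label for the action, e.g. "alert" or "kinesis"


def route(message: dict, rules: List[Rule]) -> List[str]:
    """Return the targets of every rule whose condition matches the message."""
    return [rule.target for rule in rules if rule.condition(message)]


def route_all(messages: List[dict], rules: List[Rule]) -> Dict[str, int]:
    """Count how many messages each target receives."""
    counts: Dict[str, int] = {rule.target: 0 for rule in rules}
    for message in messages:
        for target in route(message, rules):
            counts[target] += 1
    return counts


if __name__ == "__main__":
    rules = [
        Rule("overheat-alert", lambda m: m.get("temperature", 0) > 80, "alert"),
        Rule("archive-all", lambda m: True, "kinesis"),
    ]
    readings = [{"sensor": "s1", "temperature": 95}, {"sensor": "s2", "temperature": 20}]
    print(route(readings[0], rules))   # ['alert', 'kinesis']
    print(route_all(readings, rules))  # {'alert': 1, 'kinesis': 2}
```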

Page 58

[Chart: difficulty of working with each tool]

Spark/Storm

Lambda (arbitrary code: Node, Python, Java)

Redshift (structured, SQL)

Elasticsearch (unstructured, JSON)

Hive SQL

QuickSight (GUI)

Kinesis Analytics (limited SQL)

IoT Rule Engine (SQL)

Page 59

[Chart: capabilities vs. latency (sub-second, few seconds, 2-5 minutes)]

Options plotted: Spark/Storm + Kinesis; Storm + Kafka; Elasticsearch + Logstash; Lambda + Kinesis; Kinesis Analytics; IoT Rule Engine; Redshift + DMS; Redshift + Firehose; MR/Hive/Impala/Presto + Firehose; QuickSight

Page 60

Page 61

Demo Time.

Website: https://secure.amitksh.net/cdn/webinarWeek.html

Real-time updates from Kinesis: https://secure.amitksh.net/rtChart.html

Page 62

Interesting Possibilities!

Page 63

QuickSight

Page 64

Page 65

Online Labs & Training

Gain confidence and hands-on experience with AWS. Watch free instructional videos and explore self-paced labs.

Instructor-Led Classes

Learn how to design, deploy and operate highly available, cost-effective and secure applications on AWS in courses led by qualified AWS instructors.

AWS Certification

Validate your technical expertise with AWS and use practice exams to help you prepare for AWS Certification.

More info at http://aws.amazon.com/training