AWS re:Invent 2016: Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second (ARC308)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Diego Macadar - Michael Fort

December 1, 2016

From 0 to 100M Records in 1 Second: AWS Metering

ARC308



What to expect from the session

Tools and techniques to deal with exponential growth of data.

Three principles

• 100% accurate

    • Once and only once guarantee

    • Idempotent processing

• Horizontally scalable

    • Loosely coupled components

    • Elasticity: Automated scaling

• Focus on the business

    • Operationally excellent

    • Use managed frameworks

Architecture

[Diagram: pipeline stages Collect, Transform, Analyze, Aggregate, Deliver; supporting components Global Data, Global State, Audit]

Streaming components

[Diagram: same architecture overview]

Batch components

[Diagram: same architecture overview]

Three logical entities

• Data

• State

• Compute

Data

Global data

Unstructured data (Amazon S3) vs structured data (Amazon DynamoDB, Amazon RDS)

• Must be immutable

• Avoid performance bottlenecks by using storage best practices

• Monitoring with Amazon CloudWatch

• Secure data using versioning and encryption

Global data example

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "energyUsage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "000000000001",
  "value": "23" }

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "lumens",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "000000000001",
  "value": "300" }

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "000000000001",
  "value": "1" }

Architecture

[Diagram: Collect, Transform, Analyze, Aggregate, Deliver; Audit; Global State; Global Store on Amazon S3]

Local data

[Diagram: a server with a local store, alongside the AWS Cloud stores Amazon S3, Amazon DynamoDB, and Amazon RDS]

• Can be mutable

• Cache data locally to speed up processing

• Invalidate local data once processed

• Persist all long-term data in globally accessible cloud store

Local data example

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "energyUsage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "000000000001",
  "value": "23" }

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "lumens",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "000000000001",
  "value": "300" }

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "000000000001",
  "value": "1" }

Transform

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "energyUsage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "LED_A1",
  "value": "23" }

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "lumens",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd891",
  "lightIdentifier": "LED_A1",
  "value": "300" }

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "lightIdentifier": "LED_A1",
  "value": "1" }
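The transformation shown in these records can be sketched as a lookup-and-rewrite step. This is a minimal sketch, not the presenters' implementation: the lookup tables `CLIENT_NAMES` and `LIGHT_NAMES`, the function name `transform`, and the rule that outage events drop `socketIdentifier` are assumptions inferred from the example records.

```python
from datetime import datetime, timezone

# Hypothetical lookup tables; the slides do not say where these mappings are stored.
CLIENT_NAMES = {"a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6": "bestHotel"}
LIGHT_NAMES = {"000000000001": "LED_A1"}


def transform(record):
    """Return a transformed copy of a raw record; the input is not modified."""
    out = dict(record)
    out["clientId"] = CLIENT_NAMES[record["clientId"]]
    out["lightIdentifier"] = LIGHT_NAMES[record["lightIdentifier"]]
    # Epoch milliseconds -> MM/DD/YYYY, evaluated in UTC.
    moment = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    out["timestamp"] = moment.strftime("%m/%d/%Y")
    # Assumption: outage events drop socketIdentifier, as in the third example record.
    if out["eventType"] == "outage":
        out.pop("socketIdentifier", None)
    return out


raw_outage = {
    "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
    "timestamp": 1476477276000,
    "eventType": "outage",
    "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
    "lightIdentifier": "000000000001",
    "value": "1",
}
transformed_outage = transform(raw_outage)
print(transformed_outage)
```

Working on a copy keeps the raw record intact, which fits the rule that global data must be immutable.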

State

Global state

[Diagram: one source and two sinks, with the AWS Cloud stores Amazon S3, Amazon DynamoDB, and Amazon RDS]

Global state examples

• Failed

• Created

• Completed

• Transformed

Architecture

[Diagram: Collect, Transform, Analyze, Aggregate, Deliver; Audit; Global State on Amazon DynamoDB; Global Store on Amazon S3]

Mutually shared state

[Diagram: a Source and a Sink exchanging State through a Channel]

Channel selection

Channel attributes    Amazon SQS       Amazon Kinesis
Order                 Not ordered      Ordered
Locality              Not localized    Localized
Delivery              At-least-once    At-least-once

Hot partition example

[Diagram: a grid of record keys in which H1 appears 7 times and H2 3 times]

Amazon Kinesis hotspot management

[Diagram: Producer (PutRecord), Amazon Kinesis Stream, Consumer (GetRecords)]

Partition Key = Hotel ID + Entropy

Entropy = MD5 % Partition Key Size

CONSTRAINTS:

• Hash function calculation is idempotent

• Partition Key Size changes are time versioned

• Partition Key Size is selected based on time information of the entity
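The partition key scheme and its three constraints can be sketched as follows. This is an illustration under stated assumptions: the slides do not say what the MD5 is computed over, so the sketch hashes a per-record identifier (here the socket identifier); the `KEY_SIZE_VERSIONS` table, its timestamps, and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_right

# Hypothetical time-versioned Partition Key Sizes per hotel:
# (effective-from epoch ms, size), sorted by time. H1 was widened to 2 at some point.
KEY_SIZE_VERSIONS = {
    "H1": [(0, 1), (1476400000000, 2)],
    "H2": [(0, 1)],
}


def partition_key_size(hotel_id, timestamp_ms):
    """Pick the size that was in effect at the record's own timestamp."""
    versions = KEY_SIZE_VERSIONS[hotel_id]
    starts = [start for start, _ in versions]
    index = max(bisect_right(starts, timestamp_ms) - 1, 0)
    return versions[index][1]


def partition_key(hotel_id, record_id, timestamp_ms):
    """Build 'hotel+entropy'; the same inputs always give the same key."""
    digest = hashlib.md5(record_id.encode("utf-8")).hexdigest()
    entropy = int(digest, 16) % partition_key_size(hotel_id, timestamp_ms)
    return f"{hotel_id}+{entropy}"


socket_id = "dac06b790cb5b0856437b3efa92bd891"
print(partition_key("H2", socket_id, 1476477276000))  # always H2+0: size is 1
print(partition_key("H1", socket_id, 1476477276000))  # H1+0 or H1+1
```

Because the size is looked up from the record's timestamp rather than the current time, a replayed record gets the same key it had originally, even after the size has changed.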

Hot partition example

[Diagram: the same grid with entropy applied; the keys are now H1 + 0, H1 + 1, and H2 + 0]

Amazon Kinesis hotspot management

[Diagram: Producer (PutRecord), Amazon Kinesis Stream, Consumer (GetRecords); Hotspot Manager; AWS Cloud stores Amazon S3, Amazon DynamoDB, and Amazon RDS]

• Capture stream I/O statistics

• Hotspot Manager reads stream I/O statistics

• Hotspot Manager calls DescribeStream, SplitShard, MergeShards

• Read partition information

• Update partition information
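One way a hotspot manager could turn captured statistics into split decisions is sketched below. It is a hedged illustration, not the presenters' code: the threshold `HOT_BYTES_PER_SECOND`, the stream name `metering-stream`, and the function `plan_splits` are hypothetical. The dictionaries mirror the shape of the Kinesis DescribeStream response (`ShardId`, `HashKeyRange`) and the SplitShard parameters (`StreamName`, `ShardToSplit`, `NewStartingHashKey`), but the sketch makes no AWS calls.

```python
# Assumed alarm threshold; not a figure from the talk.
HOT_BYTES_PER_SECOND = 800_000


def plan_splits(stream_name, shards, bytes_per_second):
    """Return SplitShard-style requests for every shard above the threshold.

    `shards` mirrors the Shards list of a DescribeStream response;
    `bytes_per_second` maps shard IDs to captured stream I/O statistics.
    """
    requests = []
    for shard in shards:
        if bytes_per_second.get(shard["ShardId"], 0) <= HOT_BYTES_PER_SECOND:
            continue
        start = int(shard["HashKeyRange"]["StartingHashKey"])
        end = int(shard["HashKeyRange"]["EndingHashKey"])
        requests.append({
            "StreamName": stream_name,
            "ShardToSplit": shard["ShardId"],
            # Split the shard's hash key range in half.
            "NewStartingHashKey": str((start + end) // 2),
        })
    return requests


shards = [
    {"ShardId": "shardId-000000000000",
     "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": str(2**127 - 1)}},
    {"ShardId": "shardId-000000000001",
     "HashKeyRange": {"StartingHashKey": str(2**127), "EndingHashKey": str(2**128 - 1)}},
]
stats = {"shardId-000000000000": 950_000, "shardId-000000000001": 120_000}
split_requests = plan_splits("metering-stream", shards, stats)
print(split_requests)
```

Keeping the decision logic free of network calls makes it easy to test; a real manager would pass each request to the SplitShard API and handle merges of cold shards in a similar way.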

Local state

[Diagram: a server with a local cache, alongside the AWS Cloud stores Amazon S3, Amazon DynamoDB, and Amazon RDS]

• Cache state with Write-Once-Read-Many (WORM) characteristics locally

• Validate the state cache against the global store as often as possible

• Read state that changes often directly from the global store
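The three caching rules can be sketched as a small read-through cache. This is a simplified illustration: the class name `LocalStateCache`, the `changes_often` flag, and the use of an age limit as the validation strategy are assumptions, and a plain dict stands in for the global store.

```python
import time


class LocalStateCache:
    """Caches rarely-changing state locally and revalidates it by age."""

    def __init__(self, fetch_from_global_store, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch_from_global_store
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time fetched)

    def get(self, key, changes_often=False):
        if changes_often:
            # Frequently changing state bypasses the cache entirely.
            return self._fetch(key)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]
        # Missing or stale: re-read from the global store.
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value


# Usage with a dict standing in for the global store and a fake clock.
global_store = {"bestHotel/name": "Best Hotel", "batch-42/status": "Created"}
reads = []
fake_now = [0.0]


def fetch(key):
    reads.append(key)
    return global_store[key]


cache = LocalStateCache(fetch, max_age_seconds=30, clock=lambda: fake_now[0])
cache.get("bestHotel/name")
cache.get("bestHotel/name")  # served locally, no second read
fake_now[0] = 31.0
cache.get("bestHotel/name")  # stale, read again
cache.get("batch-42/status", changes_often=True)
print(reads)
```

A shorter `max_age_seconds` validates more often at the cost of more reads against the global store.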

Compute

Server-based compute (Amazon EC2)

• Fine-grained controls

• Time-sensitive response

• Co-location of resources

• Clustering

vs

Serverless compute (AWS Lambda)

• No server management

• Out-of-the-box scaling

• Out-of-the-box metrics

• Out-of-the-box logging

Architecture

[Diagram: Collect, AWS Lambda, Amazon EC2, Aggregate, Deliver; Audit; Global State on Amazon DynamoDB; Global Store on Amazon S3]

Amazon EC2 Auto Scaling

[Diagram: Amazon EC2 w/ Auto Scaling and Amazon CloudWatch]

• EC2 emits metrics to CloudWatch

• Auto Scaling monitors CloudWatch alarms

Architecture

[Diagram: Collect, AWS Lambda, Amazon EC2 w/ Auto Scaling, Aggregate, Deliver; Audit; Global State on Amazon DynamoDB; Global Store on Amazon S3]

Map Reduce workflow

Lock input dataset for idempotent execution

[Diagram: List of Manifests, List of Batches, and Map and Reduce Records, with Amazon DynamoDB and Amazon S3]
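Locking the input dataset can be sketched as a write-once record of which manifests belong to a batch. The sketch below is an assumption-laden illustration: `ManifestLockTable` is an in-memory stand-in for the state table, and the names `lock_if_absent` and `run_batch` are hypothetical. With Amazon DynamoDB, a comparable effect could plausibly be achieved with a conditional write that only succeeds when the item does not exist yet.

```python
class ManifestLockTable:
    """In-memory stand-in for a state table that supports conditional writes."""

    def __init__(self):
        self._locked = {}

    def lock_if_absent(self, batch_id, manifest_keys):
        """Store the manifest list only if the batch has none yet."""
        if batch_id in self._locked:
            return False
        self._locked[batch_id] = tuple(sorted(manifest_keys))
        return True

    def locked_manifests(self, batch_id):
        return self._locked[batch_id]


def run_batch(table, batch_id, discovered_manifests, map_reduce):
    # The first attempt wins; retries leave the locked list unchanged.
    table.lock_if_absent(batch_id, discovered_manifests)
    # Always process the locked list, never the freshly discovered one.
    return map_reduce(table.locked_manifests(batch_id))


table = ManifestLockTable()
first = run_batch(table, "batch-1", ["m1", "m2"], list)
# A retry sees a late manifest "m3" but still processes the locked input.
retry = run_batch(table, "batch-1", ["m1", "m2", "m3"], list)
print(first, retry)
```

Because every attempt reads the same locked input, re-running a failed batch produces the same output, which is what makes the execution idempotent.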

Architecture

[Diagram: Collect, AWS Lambda, Amazon EC2 w/ Auto Scaling, AWS Lambda, Amazon EMR, Deliver; Audit; Global State on Amazon DynamoDB; Global Store on Amazon S3]

Cluster management

[Diagram: Controller and Cluster Manager, with Amazon DynamoDB and Amazon EMR]

• Gather backlog information

• Find and lease cluster

• Enqueue step

• Spin up / tear down clusters

External-facing API

Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Amazon API Gateway

• Authorization

• Authentication

• Caching

• Scale

• Version control

• DDoS prevention

• Throttling

Audit

Architecture

[Diagram: Collect, Deliver; Amazon API Gateway (twice); AWS Lambda (four times); Amazon EC2 w/ Auto Scaling; Amazon EMR; Global State on Amazon DynamoDB; Global Store on Amazon S3]

Incremental auditing

Transitive property of equality

If A = B and B = C, then A = C

[Diagram: processing steps Color() and Unique(), each paired with an Audit() step]

Checksum auditing

Checksum = HF(Fixed + Transformed) * Aggregating Value

Fixed – Static through the end of processing

Source

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "000000000001",
  "value": "1" }

{ "clientId": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "timestamp": 1476477276000,
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "000000000001",
  "value": "1" }

Result

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "lightIdentifier": "LED_A1",
  "value": "2" }

Transformed – Changed in the lifetime of processing

Perform transformations and filters on source data

Source (after transformation)

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "LED_A1",
  "value": "1" }

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "socketIdentifier": "dac06b790cb5b0856437b3efa92bd8f3",
  "lightIdentifier": "LED_A1",
  "value": "1" }

Run a hashing function over the fixed and transformed fields

Source (after filtering)

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "lightIdentifier": "LED_A1",
  "value": "1" }

{ "clientId": "bestHotel",
  "timestamp": "10/14/2016",
  "eventType": "outage",
  "lightIdentifier": "LED_A1",
  "value": "1" }

Hash: 1ae035081ed6c9a40f1c6eb1177350a9

Aggregating Value – Field used for aggregation during processing

Source

1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" }

1ae035081ed6c9a40f1c6eb1177350a9 { "value": "1" }

Result

1ae035081ed6c9a40f1c6eb1177350a9 { "value": "2" }

Multiply hash * aggregating value

Source

1AE035081ED6C9A40F1C6EB1177350A9

1AE035081ED6C9A40F1C6EB1177350A9

Result

35C06A103DAD93481E38DD622EE6A152

Sum results, compare source vs results

Assert(sum(sourceChecksums) = sum(resultChecksums))

Source

1AE035081ED6C9A40F1C6EB1177350A9

1AE035081ED6C9A40F1C6EB1177350A9

Sum: 35C06A103DAD93481E38DD622EE6A152

Result

35C06A103DAD93481E38DD622EE6A152

Sum: 35C06A103DAD93481E38DD622EE6A152
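The checksum audit can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions: the slides only name a hash function HF, so MD5 is assumed here; the split of fields into fixed and transformed, the `|` separator, and the helper names are hypothetical, and the digests it produces are not expected to match the hash values shown on the slides.

```python
import hashlib

# Assumed field roles for the example records; the slides do not list them.
FIXED_FIELDS = ("eventType",)
TRANSFORMED_FIELDS = ("clientId", "timestamp", "lightIdentifier")
AGGREGATING_FIELD = "value"


def checksum(record):
    """HF(Fixed + Transformed) * Aggregating Value, as an integer."""
    key = "|".join(str(record[name]) for name in FIXED_FIELDS + TRANSFORMED_FIELDS)
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest * int(record[AGGREGATING_FIELD])


def audit(source_records, result_records):
    """True when the summed checksums of both sides agree."""
    return sum(map(checksum, source_records)) == sum(map(checksum, result_records))


def outage(value):
    return {"clientId": "bestHotel", "timestamp": "10/14/2016",
            "eventType": "outage", "lightIdentifier": "LED_A1", "value": value}


source = [outage("1"), outage("1")]  # already transformed and filtered
print(audit(source, [outage("2")]))  # True: aggregation kept both records
print(audit(source, [outage("1")]))  # False: one record was lost
```

The audit works because multiplication distributes over the sum: two source records with value 1 contribute the same total as one aggregated record with value 2, so any dropped or duplicated record changes the sum.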

Architecture

[Diagram: Collect, Deliver; Amazon API Gateway (twice); AWS Lambda (four times); Amazon EC2 w/ Auto Scaling; Amazon EMR; Audit (twice); Global State on Amazon DynamoDB; Global Store on Amazon S3]

Thank you!

Remember to complete your evaluations!