aws december 2015 webinar series - design patterns using amazon dynamodb

Rick Houlihan, Principal Solutions Architect

12/8/2015

DynamoDBDesign Patterns and Best Practices

What to expect from the session

• Brief history of data processing• DynamoDB Internals

• Tables, API, data types, indexes • Scaling and data modeling

• Design patterns and best practices• Event driven applications and DDB Streams

Timeline of Database Technology

Data Volume Since 2010

• 90% of stored data generated in last 2 years

• 1 Terabyte of data in 2010 equals 6.5 Petabytes today

• Linear correlation between data pressure and technical innovation

• No reason these trends will not continue over time

Technology Adoption and the Hype Curve

Why NoSQL?

SQL NoSQL

SQL vs. NoSQL Access Pattern

TableTable

Attributes

PartitionKey

SortKey

MandatoryKey-value access patternDetermines data distribution Optional

Model 1:N relationshipsEnables rich query capabilities

All items for key==, <, >, >=, <=“begins with”“between”“contains”“in”sorted resultscountstop/bottom N values

00 55 A954 FFAA00 FF

Partition KeysPartition Key uniquely identifies an itemPartition Key is used for building an unordered hash indexAllows table to be partitioned for scale

Id = 1Name = Jim

Hash (1) = 7B

Id = 2Name = AndyDept = EngHash (2) = 48

Id = 3Name = KimDept = Ops

Hash (3) = CD

Key Space

Partition:Sort KeyPartition:Sort Key uses two attributes together to uniquely identify an ItemWithin unordered hash index, data is arranged by the sort keyNo limit on the number of items (∞) per partition key

• Except if you have local secondary indexes

00:0 FF:∞

Hash (2) = 48

Customer# = 2Order# = 10Item = Pen

Customer# = 2Order# = 11Item = Shoes

Customer# = 1Order# = 10Item = Toy

Customer# = 1Order# = 11Item = Boots

Hash (1) = 7B

Customer# = 3Order# = 10Item = Book

Customer# = 3Order# = 11Item = Paper

Hash (3) = CD

55 A9:∞54:∞ AA

Partition 1 Partition 2 Partition 3

Partitions are three-way replicated

Id = 2Name = AndyDept = Engg

Id = 1Name = Jim

Replica 1

Replica 2

Replica 3

Partition 1 Partition 2 Partition N

Indexes

Local secondary index (LSI)

Alternate sort key attributeIndex is local to a partition key

A1(partition)

A3(sort)

A2(item key)

A1(partition)

A2(sort)

A3 A4 A5

LSIs A1(partition)

A4(sort)

A2(item key)

A3(projected)

KEYS_ONLY

INCLUDE A3

A1(partition)

A5(sort)

A2(item key)

A3(projected)

A4(projected) ALL

10 GB max per partition key, i.e. LSIs limit the # of range keys!

Global secondary index (GSI)Alternate partition and/or sort keyIndex is across all partition keys

A1(partition)

A2 A3 A4 A5

GSIs A5(partition)

A4(sort)

A1(item key)

A3(projected)

INCLUDE A3

A4(partition)

A5(sort)

A1(item key)

A2(projected)

A3(projected) ALL

A2(partition)

A1(itemkey) KEYS_ONLY

RCUs/WCUs provisioned separately for GSIs

Online indexing

How do GSI updates work?

Primary tablePrimary

tablePrimary

tableGlobal

Secondary Index

Client1. Update request

2. Asynchronous update (in progress)

2. Update response

If GSIs don’t have enough write capacity, table writes will be throttled!

LSI or GSI?

LSI can be modeled as a GSIIf data size in an item collection > 10 GB, use GSIIf eventual consistency is okay for your scenario, use GSI!

Scaling

Throughput• Provision any amount of throughput to a table

Size• Add any number of items to a table

• Max item size is 400 KB• LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

Throughput

Provisioned at the table level• Write capacity units (WCUs) are measured in 1 KB per second• Read capacity units (RCUs) are measured in 4 KB per second

• RCUs measure strictly consistent reads• Eventually consistent reads cost 1/2 of consistent reads

Read and write throughput limits are independent

WCURCU

Partitioning math

In the future, these details might change…

Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500

RCUs per partition = 5000/3 = 1666.67WCUs per partition = 500/3 = 166.67Data/partition = 10/3 = 3.33 GBRCUs and WCUs are uniformly spread across partitions

What causes throttling?

If sustained throughput goes beyond provisioned throughput per partition

Non-uniform workloads• Hot keys/hot partitions• Very large bursts

Mixing hot data with cold data• Use a table per time period

From the example before:• Table created with 5000 RCUs, 500 WCUs• RCUs per partition = 1666.67• WCUs per partition = 166.67• If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB

may throttle requests• Solution: Increase provisioned throughput

What bad NoSQL looks like…

Getting the most out of DynamoDB throughput

“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”

—DynamoDB Developer Guide

Space: access is evenly spread over the key-space

Time: requests arrive evenly spaced in time

Much better picture…

Data modeling

1:1 relationships or key-values

Use a table or GSI with an alternate partition keyUse GetItem or BatchGetItem APIExample: Given an SSN or license number, get attributes

Users TablePartiton key AttributesSSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234

Users-Email-GSIPartition key AttributesLicense = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789

1:N relationships or parent-children

Use a table or GSI with partition and sort keyUse Query API

Example:• Given a device, find all readings between epoch X, Y

Device-measurementsPartition Key Sort key AttributesDeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90

N:M relationships

Use a table and GSI with partition and sort key elements switchedUse Query API Example: Given a user, find all games. Or given a game, find all users.

User-Games-TablePartition Key Sort keyUserId = bob GameId = Game1UserId = fred GameId = Game2UserId = bob GameId = Game3

Game-Users-GSIPartition Key Sort keyGameId = Game1 UserId = bobGameId = Game2 UserId = fredGameId = Game3 UserId = bob

Hierarchical Data

Tiered relational data structures

Hierarchical Data Structures as Items…Use composite sort key to define a HierarchyHighly selective result sets with sort queriesIndex anything, scales to any size

… or as Documents (JSON)JSON data types (M, L, BOOL, NULL)Document SDKs AvailableIndexing only via Streams/Lambda400KB max item size (limits hierarchical data structure)

Scenarios and best practices

Event logging

Storing time series data

Time series tables

Events_table_2015_AprilEvent_id(Partition)

Timestamp(Sort)

Attribute1 …. Attribute N

Events_table_2015_MarchEvent_id(Partition)

Timestamp(Sort)

Events_table_2015_FeburaryEvent_id(Partition)

Timestamp(Sort)

Events_table_2015_JanuaryEvent_id(Partition)

Timestamp(Sort)

RCUs = 1000WCUs = 1

RCUs = 10000WCUs = 10000

RCUs = 100WCUs = 1

RCUs = 10WCUs = 1

Current table

Older tables

Don’t mix hot and cold data; archive cold data to Amazon S3

Dealing with time series data

Use a table per time period

Pre-create daily, weekly, monthly tablesProvision required throughput for current tableWrites go to the current tableTurn off (or reduce) throughput for older tables

Product catalog

aws december 2015 webinar series - design patterns using amazon dynamodb

Technology

valmet industrial internet & snowflake - learnings · -...

yinlin chen, jim tuttle, william a. ingram information...

cloud scaling of visual weather - ecmwf · aws sqs asw sns...

aws september webinar series - getting started with dynamodb...

aws prescriptive guidance - patterns · aws prescriptive...

introduction to amazon dynamodb · © 2020, amazon web...

aws re:invent 2016: streaming etl for rds and dynamodb...

(sdd407) amazon dynamodb: data modeling and scaling best...

amazon dynamodb design patterns for ultra-high performance...

(sdd424) simplifying scalable distributed applications using...

deep dive on amazon dynamodb - aws online tech talks

intro to big data on aws igor roiter big data cloud ......

design patterns using amazon dynamodb

amazon dynamodb deep dive: advanced design patterns - aws

getting started with aws - sysfore · aws cloudtrail, aws...

autoscale dynamodb with dynamic dynamodb

aws webcast - data modeling and best practices for scaling...

dat302 under the covers of amazon dynamodb - aws re: invent...

asa: amazon shopping assistant€¦ · aws lambda & aws api...

lab 8 - streaming data to dynamodb - amazon s3 · pdf...