![Page 1: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/1.jpg)
Introduction to Azure DocumentDB
Denny Lee, Principal Program Manager, Azure DocumentDB
![Page 2: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/2.jpg)
Denny Lee
• Principal Program Manager for Azure DocumentDB
• 20+ years of experience in databases, distributed systems, data sciences, and software development at Microsoft, Concur, and Databricks
• Notable Projects:
  • Project Isotope: Incubation team for HDInsight
  • Yahoo! 24TB cube: Largest SSAS cube in production
@dennylee
![Page 3: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/3.jpg)
A Brief Overview...
![Page 4: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/4.jpg)
{ "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ]}
Perfect for these Documents
schema-agnostic JSON store for hierarchical and de-normalized data at scale
![Page 5: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/5.jpg)
Not these documents
![Page 6: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/6.jpg)
![Page 7: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/7.jpg)
Elastically Scalable Throughput + Storage
![Page 8: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/8.jpg)
Guaranteed low latency
Reads <10ms @ P99
Writes <15ms @ P99
![Page 9: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/9.jpg)
Globally Distributed
![Page 10: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/10.jpg)
Speaks your language
![Page 11: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/11.jpg)
DocumentDB Query Playground
Demo
Code: https://www.documentdb.com/sql/demo
![Page 12: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/12.jpg)
A Primer on Scale...
![Page 13: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/13.jpg)
The 4 Vs of Big Data
• Volume: Exceeds physical limits of vertical scalability
• Variety: Many different formats making integration expensive
• Velocity: Small decision window compared to data change rate
• Variability: Many options or variables confounding analysis
![Page 14: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/14.jpg)
The 4 Vs of Big Data
Volume Variety Velocity Variability
Mobile Apps Retail Learning Telematics IoT Gaming
![Page 15: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/15.jpg)
Let’s talk about scale.
Volume and Velocity
![Page 16: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/16.jpg)
![Page 17: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/17.jpg)
More users, more problems
Ability to Scale from Day 1
• Bursty
• Unpredictable traffic
Gaming + Social Experience
• Lag-free
• Responsive experiences
Move fast without breaking things
• Iterative development needs
![Page 18: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/18.jpg)
The Walking Dead, results
• Game scores, guilds and social membership
• Leaderboards by country and social
• Guild management and messaging
• #1 in Apple app store for free apps
<10ms 99P query latency
>1M game downloads
~1B requests / day
![Page 19: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/19.jpg)
The right tool for the job?
Caches
• Scores are continuously updated
• Write heavy without locality
RDBMS
• Scale-out requires partitioning
• Schema and index management
Other NoSQL Stores
• Longer tail on latencies
• Need to specify secondary indexes for lookups
![Page 20: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/20.jpg)
Azure DocumentDB
The answer for low latency @ massive scale
• Fully managed NoSQL database
• Horizontal scaling for TB and RPS
• High performance, write optimized
• Schema agnostic indexing
![Page 21: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/21.jpg)
Fact: Managing shards is really painful.
Managing shards or partitions
Good news: DocumentDB has done all the heavy lifting.
![Page 22: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/22.jpg)
Elastic scale
![Page 23: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/23.jpg)
Measuring Throughput (Request Units)
• Replica gets a fixed budget of request units
• Request Unit/sec (RU) is the normalized currency (% IOPS, % CPU, % Memory)
• Operations consume request units (RUs):
  • READ: GET Document
  • INSERT: POST Documents
  • REPLACE: PUT Document
  • Query: POST Documents
  • …
• Requests get rate limited if they exceed the SLA
• Customers pay for reserved request units by the hour
[Figure: Incoming Requests against Min RU/sec and Max RU/sec, with regions labeled Replica Quiescent, No throttling, and Rate limit]
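The request unit budget described above can be pictured as a per-second counter: each operation is charged some RUs, and once the budget for the current second is spent, further requests are rate limited. The sketch below is illustrative only. The class name, the charge of 30 RUs per request, and the fixed one-second window are assumptions, not DocumentDB's actual implementation.

```python
class RequestUnitBudget:
    """Toy per-second RU budget (hypothetical; not the service's real algorithm)."""

    def __init__(self, ru_per_second):
        self.ru_per_second = ru_per_second
        self.window = None      # the one-second window currently being charged
        self.consumed = 0.0     # RUs charged so far in that window

    def try_consume(self, charge, now):
        """Return True if the request fits in this second's budget, else False."""
        window = int(now)
        if window != self.window:
            # A new second has started, so the budget resets.
            self.window = window
            self.consumed = 0.0
        if self.consumed + charge > self.ru_per_second:
            return False        # rate limited
        self.consumed += charge
        return True


# Assumed numbers, for illustration only: a 100 RU/sec budget and 30 RU requests.
budget = RequestUnitBudget(ru_per_second=100)
results = [budget.try_consume(charge=30, now=0.1 * i) for i in range(5)]
next_second = budget.try_consume(charge=30, now=1.0)
```

Here the first three requests in the second fit the budget, the next two are rejected, and a request arriving in the following second is accepted again.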
![Page 25: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/25.jpg)
Configured @10,100 RUs
~940 writes / s
~9800 RUs
![Page 26: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/26.jpg)
Configured @250,000 RUs
~12,100 writes / s
~128,800 RUs
VM @ 99% CPU
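As a rough check on the two runs above, the slide figures can be turned into an average RU charge per write and a share of the configured budget. This is simple arithmetic on the approximate numbers shown (the helper names are made up for illustration), so treat the results as ballpark values.

```python
def ru_per_write(total_rus, writes_per_second):
    """Average request units charged per write."""
    return total_rus / writes_per_second


def utilization(consumed_rus, configured_rus):
    """Fraction of the configured RU budget actually consumed."""
    return consumed_rus / configured_rus


# Figures taken from the slides (all approximate).
small = (ru_per_write(9_800, 940), utilization(9_800, 10_100))
large = (ru_per_write(128_800, 12_100), utilization(128_800, 250_000))

print(f"{small[0]:.1f} RU/write, {small[1]:.0%} of budget")
print(f"{large[0]:.1f} RU/write, {large[1]:.0%} of budget")
```

Both runs work out to a little over 10 RUs per write. The second run uses only about half of its configured budget, which would be consistent with the client VM at 99% CPU being the limiting factor rather than the RU budget, although the slides do not state this explicitly.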
![Page 27: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/27.jpg)
A Global Distribution Primer…
![Page 28: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/28.jpg)
Globally Distributed
Azure DocumentDB gives you the ability to circumvent the speed of light!
High Availability and Disaster Recovery
Replicate to any number of regions
Global low latency access
Dynamically configure write and read regions
![Page 29: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/29.jpg)
… with well-defined consistency models!
| Consistency Level | Strong | Bounded Staleness | Session | Eventual |
| --- | --- | --- | --- | --- |
| Total Global Order | Yes | Yes (outside of the “staleness window”) | No, partial “session” order | No |
| Consistent prefix guarantee | Yes | Yes | Yes | Yes |
| Monotonic Reads | Yes | Yes (within region and across regions outside of the staleness window) | Yes (for the given session) | No |
| Monotonic Writes | Yes | Yes | Yes | Yes |
| Read your writes | Yes | Yes (in the write region) | Yes | No |
stronger consistency
faster performance
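The trade-off in the table above can be expressed as a small lookup: given the guarantees an application needs, pick the weakest (and therefore fastest) level that still provides them. This is a sketch for illustration, not an SDK API. The dictionary keys and the `weakest_level` helper are hypothetical names, and qualified cells such as "Yes (in the write region)" are simplified to `True`.

```python
# Levels ordered from strongest consistency to fastest performance.
LEVELS = ["Strong", "Bounded Staleness", "Session", "Eventual"]


def row(strong, bounded, session, eventual):
    """Map one table row onto the four consistency levels."""
    return dict(zip(LEVELS, (strong, bounded, session, eventual)))


# Transcribed from the table; caveats in the original cells are not modeled.
GUARANTEES = {
    "total_global_order": row(True, True, False, False),
    "consistent_prefix": row(True, True, True, True),
    "monotonic_reads": row(True, True, True, False),
    "monotonic_writes": row(True, True, True, True),
    "read_your_writes": row(True, True, True, False),
}


def weakest_level(required):
    """Return the weakest level that offers every required guarantee."""
    for level in reversed(LEVELS):
        if all(GUARANTEES[g][level] for g in required):
            return level
    raise ValueError("no level satisfies the requirements")


choice_for_sessions = weakest_level(["read_your_writes", "monotonic_reads"])
choice_for_ordering = weakest_level(["total_global_order"])
```

With these inputs, an application that only needs read-your-writes and monotonic reads lands on Session, while one that needs a total global order has to move up to Bounded Staleness.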
![Page 30: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/30.jpg)
Global Distribution
Demo
Code: https://aka.ms/docdb-latency-script-nodejs
![Page 31: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/31.jpg)
![Page 32: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/32.jpg)
![Page 33: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/33.jpg)
![Page 34: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/34.jpg)
Common Scenarios
![Page 35: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/35.jpg)
Common scenarios
Retail Gaming IoT Social
Product Catalog
Recommendations
Personalization
User Store
Recommendations
Personalization
Event Store
Device Registry
Telemetry Store
User Behavior
Telemetry
Personalization
![Page 36: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/36.jpg)
Common scenarios
IoT
Event Store
Device Registry
Telemetry Store
IoT / Sensor Data Challenges:
• Hardware is relatively hard to update
• Different generations of devices => different schemas (variety)
• Many sensors emitting telemetry => high rate of ingestion (volume + variety)
![Page 37: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/37.jpg)
Top 5 Automotive Manufacturer in the World
Telematics services include:
• Safety service
• Diagnostic service
• Remote service
Ingest and query 100+ TB of semi-structured data
IoT : Vehicle Telematics
![Page 38: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/38.jpg)
IoT : Vehicle Telematics
Ingress API
Inbound Interface (Web API)
Raw Event Store (Hot) (DocumentDB)
Aggregated Event Store (Warm) (DocumentDB)
Aggregated Event Store (Cold) (Blob Storage)
Outbound Interface (Web API)
Message Queue (Event Hubs)
Stream Processor (Stream Analytics)
![Page 39: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/39.jpg)
Common scenarios
Social + AdTech Challenges:
• Ingest + Analyze Third Party Data
  => Who dictates schema? (variety)
  => How do you index?
• A lot of social and user data
  => high rate of ingestion (volume + variety)
Social
User Behavior
Telemetry
Personalization
![Page 40: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/40.jpg)
• Startup - Advanced Marketing Intelligence Platform
• Utilizes deep learning to analyze billions of relational network connections to build a social fingerprint for each user
• Extracts knowledge and cultural insights by analyzing what people choose to follow
Social Analytics + Ad Technology
>1B Social Media Profiles
>50M Tweets per Day
![Page 41: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/41.jpg)
• Store tweets, geo-location data, and ML results in DocumentDB
• Data from each social media producer has its own schema that evolves independently
• Need to iterate rapidly… no time for managing VMs
Social Analytics + Ad Technology
>1B Social Media Profiles
>50M Tweets per Day
![Page 42: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/42.jpg)
Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them.
Stephen Hankinson, CTO, Affinio
Quote
![Page 43: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/43.jpg)
Geospatial Support, including polygons
Demo
Want to try? Go to DocumentDB Query Playground: https://www.documentdb.com/sql/demo
![Page 44: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/44.jpg)
Polygon Query Example
https://www.keene.edu/campus/maps/tool/
Polygon of coordinates
-124.630000, 48.360000
-123.870000, 46.140000
-122.230000, 45.540000
-119.170000, 45.950000
-116.920000, 45.960000
-116.990000, 49.000000
-123.050000, 49.020000
-123.150000, 48.310000
-124.630000, 48.360000
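To use the coordinates above in a query, they can be wrapped in a GeoJSON Polygon (longitude first, closed ring) and passed to the `ST_WITHIN` function of DocumentDB SQL. The sketch below builds that query text and adds a plain ray-casting check to sanity-test the polygon locally. The collection alias `volcanoes`, the property `v.Location`, and the two test points (approximate positions of Mount Rainier and Mount Hood) are assumptions for illustration, not taken from the slides.

```python
import json

# Coordinates from the slide as (longitude, latitude) pairs. The ring is
# closed: the first and last points are identical, as GeoJSON requires.
RING = [
    (-124.63, 48.36), (-123.87, 46.14), (-122.23, 45.54),
    (-119.17, 45.95), (-116.92, 45.96), (-116.99, 49.00),
    (-123.05, 49.02), (-123.15, 48.31), (-124.63, 48.36),
]

polygon = {"type": "Polygon", "coordinates": [[list(p) for p in RING]]}

# Assumed query shape: "volcanoes" and "v.Location" are hypothetical names.
query = (
    "SELECT * FROM volcanoes v WHERE ST_WITHIN(v.Location, "
    + json.dumps(polygon)
    + ")"
)


def point_in_ring(lon, lat, ring):
    """Planar ray-casting test: is (lon, lat) inside the closed ring?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the horizontal line at lat.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


rainier_inside = point_in_ring(-121.76, 46.85, RING)   # approximate location
hood_inside = point_in_ring(-121.70, 45.37, RING)      # approximate location
```

The local check treats coordinates as planar, which is good enough for a sanity test at this scale; the service evaluates `ST_WITHIN` itself when the query runs.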
![Page 45: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/45.jpg)
Finding Volcanos with DocumentDB
https://www.documentdb.com/sql/demo
![Page 46: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/46.jpg)
Data Sciences: Apache Spark + DocumentDB
![Page 47: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/47.jpg)
Example: Graph Structures
![Page 48: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/48.jpg)
Example: Graph Structures
![Page 49: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/49.jpg)
Classic Graph Scenario: Flights
vertices = airports
edges = flights
![Page 50: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/50.jpg)
Data Sciences: Apache Spark + DocumentDB
Demo
Notebook View: https://aka.ms/docdb-spark-graph
py View: https://aka.ms/pydocdb-spark-graph
Code: https://aka.ms/docdb-spark-graph-code
![Page 51: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/51.jpg)
Graph Calculations: Degrees, PageRank
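What is the most important airport (most flights in / out)?
# tripGraph is assumed to be a GraphFrame of airports (vertices) and flights (edges)
from pyspark.sql.functions import desc
(tripGraph.inDegrees
    .sort(desc("inDegree"))
    .limit(10))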
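For readers without a Spark cluster at hand, the same in-degree calculation can be sketched in plain Python. This is a stand-in for the GraphFrames call, not the demo's code, and the flight list is made-up sample data.

```python
from collections import Counter

# Hypothetical flights as (origin, destination) edges between airport vertices.
flights = [
    ("SEA", "SFO"), ("SEA", "JFK"), ("SFO", "JFK"),
    ("JFK", "SEA"), ("YVR", "SEA"), ("SFO", "SEA"),
]


def top_in_degrees(edges, n=10):
    """Count incoming edges per vertex and return the n largest, highest first."""
    in_degree = Counter(dst for _src, dst in edges)
    return in_degree.most_common(n)


top = top_in_degrees(flights)
```

In this sample, SEA has the most inbound flights, followed by JFK and SFO.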
![Page 52: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/52.jpg)
Advantages
Data Science Scenarios
• Blazing Fast IoT Scenarios
• Updateable columns
• Push-down predicate filtering
![Page 53: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/53.jpg)
Advantages: Blazing Fast IoT Scenarios
Flight information
global safety alerts
weather
Data Science Scenarios
Device Notifications
Web / REST API
![Page 54: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/54.jpg)
Advantages: Updateable Columns
Flight information
Data Science Scenarios
Device Notifications
Web / REST API
{ tripid: "100100", delay: -5, time: "01:00:01" }
{ tripid: "100100", delay: -30, time: "01:00:01" }
{delay:-30}
{delay:-30}
{delay:-30}
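The two documents above describe the same trip, with the delay changing from -5 to -30, and every consumer then sees the updated value. A minimal way to picture this is an upsert keyed on `tripid`. The in-memory dictionary below is a hypothetical stand-in for a collection, not the DocumentDB SDK.

```python
# Hypothetical in-memory stand-in for a document collection keyed by tripid.
store = {}


def upsert(doc):
    """Insert the document, or merge its fields into the existing one."""
    existing = store.setdefault(doc["tripid"], {})
    existing.update(doc)
    return existing


upsert({"tripid": "100100", "delay": -5, "time": "01:00:01"})
upsert({"tripid": "100100", "delay": -30, "time": "01:00:01"})

current_delay = store["100100"]["delay"]
```

After the second call there is still a single document for trip 100100, and its delay field holds the latest value.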
![Page 55: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/55.jpg)
Advantages: Pushdown Predicate Filtering
Data Science Scenarios
{city:SEA}
[Figure: document index tree with locations, headquarter, and exports nodes; country leaves (Germany, France, Belgium) and city leaves (Seattle, Paris, Moscow, Athens)]
{city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ...
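The idea behind pushdown predicate filtering is that the `{city:SEA}` filter is evaluated at the data source, so only matching documents travel to the analytics engine. The sketch below simulates that difference with a made-up `scan` function and sample documents. It illustrates the concept only and is not how the Spark connector is implemented.

```python
# Hypothetical flight documents standing in for a collection.
FLIGHTS = [
    {"city": "SEA", "dst": "POR"}, {"city": "SEA", "dst": "JFK"},
    {"city": "PDX", "dst": "SFO"}, {"city": "SEA", "dst": "YVR"},
    {"city": "JFK", "dst": "YUL"},
]


def scan(predicate=None):
    """Simulated source: returns (rows, number of documents transferred)."""
    rows = [d for d in FLIGHTS if predicate is None or predicate(d)]
    return rows, len(rows)


# Without pushdown: transfer everything, then filter on the client.
all_rows, moved_without = scan()
client_side = [d for d in all_rows if d["city"] == "SEA"]

# With pushdown: the source applies the filter, so fewer documents move.
pushed_down, moved_with = scan(lambda d: d["city"] == "SEA")
```

Both approaches return the same rows, but in this sample the pushed-down version moves three documents instead of five.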
![Page 56: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/56.jpg)
References
Get direct access to the engineering team -> [email protected]
Resources
• Schema Agnostic Indexing with DocumentDB, VLDB 2015
• Consistency Levels in DocumentDB
• SQL Queries with DocumentDB
• Language Integrated JavaScript queries and transactions with DocumentDB
• Distribute your data globally with DocumentDB
![Page 57: Introduction to Azure DocumentDB](https://reader035.vdocument.in/reader035/viewer/2022062503/589ee7a01a28abe97f8b4e41/html5/thumbnails/57.jpg)
More Resources
AskDocDB@microsoft
Follow @DocumentDB
Use #DocumentDB
documentdb.com
#azure-documentDB