Download - Executing Queries on a Sharded Database
![Page 1: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/1.jpg)
Executing Queries on a Sharded Database
Neha NarulaSeptember 25, 2012
![Page 2: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/2.jpg)
Choosing and Scaling Your Datastore
Neha NarulaSeptember 25, 2012
![Page 3: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/3.jpg)
Who Am I?
FroogleBlobstoreNative Client
![Page 4: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/4.jpg)
In This Talk
• What to think about when choosing a datastore for a web application
• Myths and legends• Executing distributed queries• My research: a consistent cache
![Page 5: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/5.jpg)
Every so often…
Friends ask me for advice when they are building a new application
![Page 6: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/6.jpg)
Friend
![Page 7: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/7.jpg)
“Hi Neha! I am making a new application.
![Page 8: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/8.jpg)
I have heard MySQL sucks and I should use NoSQL.
![Page 9: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/9.jpg)
I am going to be iterating on my app a lot
![Page 10: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/10.jpg)
I don't have any customers yet
![Page 11: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/11.jpg)
and my current data set could fit on a thumb drive from 2004, but…
![Page 12: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/12.jpg)
Can you tell me which NoSQL database I should use?
![Page 13: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/13.jpg)
And how to shard it?"
![Page 14: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/14.jpg)
Neha
![Page 15: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/15.jpg)
http://knowyourmeme.com/memes/facepalm
![Page 16: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/16.jpg)
![Page 17: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/17.jpg)
![Page 18: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/18.jpg)
What to Think about When You’re Choosing a Datastore
Hint: Not Scaling
![Page 19: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/19.jpg)
Development Cycle
• Prototyping– Ease of use
• Developing– Flexibility, multiple developers
• Running a real production system– Reliability
• SUCCESS! Sharding and specialized datastorage – We should only be so lucky
![Page 20: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/20.jpg)
Getting Started
First five minutes – what’s it like?
Idea credit: Adam Marcus, The NoSQL Ecosystem, HPTS 2011
via Justin Sheehy from Basho
![Page 21: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/21.jpg)
First Five Minutes: Redis
http://simonwillison.net/2009/Oct/22/redis/
![Page 22: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/22.jpg)
First Five Minutes: MySQL
![Page 23: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/23.jpg)
Developing
• Multiple people working on the same code• Testing new features => new access patterns• New person comes along…
![Page 24: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/24.jpg)
Redis
Time to go get lunch
![Page 25: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/25.jpg)
MySQL
![Page 26: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/26.jpg)
Questions
• What is your mix of reads and writes?• How much data do you have?• Do you need transactions?• What are your access patterns?• What will grow and change?• What do you already know?
![Page 27: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/27.jpg)
Reads and Writes
• Write optimized vs. Read optimized• MongoDB has a global write lock per process• No concurrent writes!
![Page 28: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/28.jpg)
Size of Data
• Does it fit in memory?• Disk-based solutions will be slower• Worst case, Redis needs 2X the memory of
your data!– It forks with copy-on-write
sudo echo 1 > /proc/sys/vm/overcommit_memory
![Page 29: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/29.jpg)
Requirements
• Performance– Latency tolerance
• Durability– Data loss tolerance
• Freshness– Staleness tolerance
• Uptime– Downtime tolerance
![Page 30: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/30.jpg)
Performance
• Relational databases are considered slow• But they are doing a lot of work!• Sometimes you need that work, sometimes
you don’t.
![Page 31: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/31.jpg)
Simplest Scenario
• Responding to user requests in real time• Frequently read data fits in memory• High read rate• No need to go to disk or scan lots of data
Datastore CPU is the bottleneck
![Page 32: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/32.jpg)
Cost of Executing a Primary Key Lookup
QueryCost
• Receiving message• Instantiating thread• Parsing query• Optimizing• Take out read locks• (Sometimes) lookup in
an index btree• Responding to the client
Actually retrieving data
![Page 33: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/33.jpg)
Options to Make This Fast
• Query cache• Prepared statements• Handler Socket
– 750K lookups/sec on a single server
Lesson: Primary key lookups can be fast no matter what datastore you use
![Page 34: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/34.jpg)
Flexibility vs. Performance
• We might want to pay the overhead for query flexibility
• In a primary key datastore, we can only ask queries on primary key
• SQL gives us flexibility to change our queries
![Page 35: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/35.jpg)
Durability
• Persistent datastores– Client: write– Server: flush to disk, then send “I completed your
write”– CRASH– Recover: See the write
• By default, MongoDB does not fsync() before returning to client on write– Need j:true
• By default, MySQL uses MyISAM instead of InnoDB
![Page 36: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/36.jpg)
Specialization
• You know your – query access patterns and traffic– consistency requirements
• Design specialized lookups– Transactional datastore for consistent data– Memcached for static, mostly unchanging content– Redis for a data processing pipeline
• Know what tradeoffs to make
![Page 37: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/37.jpg)
Ways to Scale
• Reads– Cache– Replicate– Partition
• Writes– Partition data amongst multiple servers
![Page 38: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/38.jpg)
Lots of Folklore
![Page 39: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/39.jpg)
Myths of Sharded Datastores
• NoSQL scales better than a relational database• You can’t do a JOIN on a sharded datastore
![Page 40: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/40.jpg)
MYTH: NoSQL scales better than a relational database
• Scaling isn’t about the datastore, it’s about the application– Examples: FriendFeed, Quora, Facebook
• Complex queries that go to all shards don’t scale
• Simple queries that partition well and use only one shard do scale
![Page 41: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/41.jpg)
Problem: Applications Look Up Data in Different Ways
![Page 42: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/42.jpg)
Post Page
SELECT * FROM commentsWHERE post_id = 100
zrange comments:100 0 -1
HA!
![Page 43: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/43.jpg)
Example Partitioned Database
Database
Database
Database
comments table
post_id
100-199
0-99
200-199
Webservers
![Page 44: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/44.jpg)
MySQL
MySQL
MySQL
Query Goes to One Partition
MySQL
MySQL
MySQL
Comments on post 100
100-199
0-99
200-299
![Page 45: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/45.jpg)
MySQL
MySQL
MySQL
Many Concurrent Queries
MySQL
MySQL
MySQL
Comments on post 100
100-199
0-99
200-299
Comments on post 52
Comments on post 289
![Page 46: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/46.jpg)
User Comments Page
Fetch all of a user's comments:
SELECT * FROM comments WHERE user = 'sam'
![Page 47: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/47.jpg)
Query Goes to All Partitions
MySQL
MySQL
MySQL
Query goes to all servers
100-199
0-99
200-299
Sam’s comments
![Page 48: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/48.jpg)
Costs for Query Go Up When Adding a New Server
MySQL
0-99
Sam’s comments
MySQL
100-199
MySQL
200-299
MySQL
300-399
![Page 49: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/49.jpg)
CPU Cost on Server of Retrieving One Row
Two costs: Query overhead and row retrieval
MySQL
QueryOverhead
97%
![Page 50: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/50.jpg)
Idea: Multiple Partitionings
Partition comments table on post_id and user.
Reads can choose appropriate copy so queries go to only one server.
Writes go to all copies.
![Page 51: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/51.jpg)
Multiple Partitionings
Database
Database
Database
comments table
post_id
100-199
0-99
200-199
user
K-R
A-J
S-Z
![Page 52: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/52.jpg)
All Read Queries Go To Only One Partition
MySQL
MySQL
MySQL
100-199
0-99
S-Z
Sam’s comments
Comments on post 100
![Page 53: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/53.jpg)
Writes Go to Both Table Copies
MySQL
MySQL
MySQL
100-199
0-99
S-Z
Write a new comment by Vicki on post
100
![Page 54: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/54.jpg)
Scaling
• Reads scale by N – the number of servers• Writes are slowed down by T – the number of
table copies• Big win if N >> T• Table copies use more memory
– Often don’t have to copy large tables – usually metadata
• How to execute these queries?
![Page 55: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/55.jpg)
Dixie
• Is a query planner which executes SQL queries over a partitioned database
• Optimizes read queries to minimize query overhead and data retrieval costs
• Uses a novel cost formula which takes advantage of data partitioned in different ways
![Page 56: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/56.jpg)
Wikipedia Workload with Dixie
3.2 X
Each query
going to 1 server
Many queries
going to all servers
Wikipedia• Database dump from 2008• Real HTTP traces• Sharded across 1, 2, 5, and 10 servers• Compare by adding a copy of the page
table (< 100 MB), sharded another way
![Page 57: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/57.jpg)
Myths of Sharded Datastores
• NoSQL scales better than a relational database• You can’t do a JOIN on a sharded datastore
![Page 58: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/58.jpg)
MYTH: You can’t execute a JOIN on a sharded database
• What are the JOIN keys? What are your partition keys?
• Bad to move lots of data• Not bad to lookup a few pieces of data• Index lookup JOINs expressed as primary key
lookups are fast
![Page 59: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/59.jpg)
Join Query
Fetch all of Alice's comments on Max's posts.
3 Alice First post!
comments table
post_id user text
6
7
22
Alice
Alice
Alice
Like.
You think?
Nice work!
1 Max http://…
posts table
id author link
3
22
37
Max
Max
Max
www.
Ask Hacker
http://…
![Page 60: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/60.jpg)
Join Query
Fetch all of Alice's comments on Max's posts.
3 Alice First post!
comments table
post_id user text
6
7
22
Alice
Alice
Alice
Like.
You think?
Nice work!
1 Max http://…
posts table
id author link
3
22
37
Max
Max
Max
www.
Ask Hacker
http://…
![Page 61: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/61.jpg)
Distributed Query Planning
• R*• Shore• The State of the Art in Distributed Query
Processing, by Donald Kossman• Gizzard
![Page 62: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/62.jpg)
Conventional Wisdom
Partition on JOIN keys, Send query as is to each server.
0-99 0-99
100-199
100-199
200-299
200-299
SELECT comments.text FROM posts, commentsWHERE comments.post_id = posts.idAND posts.author = 'Max'AND comments.user = 'Alice'
![Page 63: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/63.jpg)
…
Conventional Plan Scaling
MySQL
Alice’s comments on Max’s posts
MySQL MySQL
![Page 64: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/64.jpg)
Index Lookup JoinPartition on filter keys, retrieve Max's posts, then retrieve Alice's comments on Max's posts.
0-99
100-199
200-299
J-R
S-Z
100-199
200-299
J-R
S-Z
A-I0-99A-I
SELECT posts.id FROM postsWHERE posts.author = 'Max’
![Page 65: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/65.jpg)
0-99
100-199
200-299
J-R
S-Z
100-199
200-299
J-R
S-Z
A-I0-99A-I
ids of Max’s posts
Index Lookup Join
Partition on filter keys, retrieve Max's posts, then retrieve Alice's comments on Max's posts.
![Page 66: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/66.jpg)
0-99
100-199
200-299
J-R
S-Z
100-199
200-299
A-I
J-R
S-Z
0-99A-I
SELECT comments.text FROM commentsWHERE comments.post_id IN […]AND comments.user = 'Alice'
Index Lookup JoinPartition on filter keys, retrieve Max's posts, then retrieve Alice's comments on Max's posts.
![Page 67: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/67.jpg)
0-99
100-199
200-299
J-R
S-Z
100-199
200-299
A-I
J-R
S-Z
0-99A-I
Alice’s comments on Max’s posts
Index Lookup JoinPartition on filter keys, retrieve Max's posts, then retrieve Alice's comments on Max's posts.
![Page 68: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/68.jpg)
…MySQL
Alice’s comments on Max’s
posts
MySQL MySQL
Index Lookup Join Plan Scaling
![Page 69: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/69.jpg)
Comparison
Conventional Plan Index Lookup Plan
Intermediate data, Max’s posts Alice DIDN’T comment on
![Page 70: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/70.jpg)
Comparison
Conventional Plan Index Lookup Plan
Intermediate data, Max’s posts Alice DIDN’T comment on
![Page 71: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/71.jpg)
Dixie
• Cost model and query optimizer which predicts the costs of the two plans
• Query executor which executes the original query, designed for a single database, on a partitioned database with table copies
![Page 72: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/72.jpg)
Dixie's Predictions for the Two Plans
• Varying the size of the intermediate data• Max’s posts, Alice didn’t comment on
• Fixing the number of results returned• Partitioned over 10 servers
Conventional Plan, each
query going to all servers
Index Lookup Plan, each
query going to two servers
![Page 73: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/73.jpg)
Performance of the Two Plans
Blue beats yellow after this.Dixie predicts it!
![Page 74: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/74.jpg)
JOINs Are Not Evil
• When only transferring small amounts data• And with carefully partitioned data
![Page 75: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/75.jpg)
Lessons Learned
• Don’t worry at the beginning: Use what you know
• Not about SQL vs. NoSQL systems – you can scale any system
• More about complex vs. simple queries• When you do scale, try to make every query
use one (or a few) shards
![Page 76: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/76.jpg)
Research: Caching
• Applications use a cache to store results computed from a webserver or database rows
• Expiration or invalidation?• Annoying and difficult for app developers to
invalidate cache items correctly
Work in progress with Bryan Kate, Eddie Kohler, Yandong Mao, and Robert MorrisHarvard and MIT
![Page 77: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/77.jpg)
Solution: Dependency Tracking Cache
• Application indicates what data went into a cached result
• The system will take care of invalidations• Don’t need to recalculate expired data if it has
not changed• Always get the freshest data when data has
changed• Track ranges, even if empty!
![Page 78: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/78.jpg)
Example Application: Twitter
• Cache a user’s view of their twitter homepage (timeline)
• Only invalidate when a followed user tweets• Even if many followed users tweet, only
recalculate the timeline once when read• No need to recalculate on read if no one has
tweeted
![Page 79: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/79.jpg)
Challenges
• Invalidation + expiration times– Reading stale data
• Distributed caching• Cache coherence on the original database
rows
![Page 80: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/80.jpg)
Benefits
• Application gets all the benefits of caching– Saves on computation– Less traffic to the backend database
• Doesn’t have to worry about freshness
![Page 81: Executing Queries on a Sharded Database](https://reader033.vdocument.in/reader033/viewer/2022051412/54c2e7f44a7959896c8b4622/html5/thumbnails/81.jpg)
Summary
• Choosing a datastore• Dixie• Dependency tracking cache