pebblesdb: building key-value stores using fragmented log ... › ~vijay › papers ›...
TRANSCRIPT
![Page 1: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/1.jpg)
PebblesDB: Building Key-Value Stores using Fragmented Log
Structured Merge Trees
Pandian Raju1, Rohan Kadekodi1, Vijay Chidambaram1,2, Ittai Abraham2
1The University of Texas at Austin2VMware Research
![Page 2: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/2.jpg)
What is a key-value store?
• Store any arbitrary value for a given key
123
124
Keys{“name”:“JohnDoe”,“age”:25}
{“name”:“RossGel”,“age”:28}
Values
2
![Page 3: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/3.jpg)
What is a key-value store?
• Store any arbitrary value for a given key
• Insertions:• Point lookups:• Range Queries:
123
124
Keys{“name”:“JohnDoe”,“age”:25}
{“name”:“RossGel”,“age”:28}
Values
3
![Page 4: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/4.jpg)
What is a key-value store?
• Store any arbitrary value for a given key
• Insertions: put(key, value)• Point lookups:• Range Queries:
123
124
Keys{“name”:“JohnDoe”,“age”:25}
{“name”:“RossGel”,“age”:28}
Values
4
![Page 5: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/5.jpg)
What is a key-value store?
• Store any arbitrary value for a given key
• Insertions: put(key, value)• Point lookups: get(key)• Range Queries:
123
124
Keys{“name”:“JohnDoe”,“age”:25}
{“name”:“RossGel”,“age”:28}
Values
5
![Page 6: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/6.jpg)
What is a key-value store?
• Store any arbitrary value for a given key
• Insertions: put(key, value)• Point lookups: get(key)• Range Queries: get_range(key1, key2)
123
124
Keys{“name”:“JohnDoe”,“age”:25}
{“name”:“RossGel”,“age”:28}
Values
6
![Page 7: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/7.jpg)
Key-Value Stores - widely used
• Google’s BigTable powers Search, Analytics, Maps and Gmail• Facebook’s RocksDB is used as storage engine in production
systems of many companies
7
![Page 8: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/8.jpg)
Write-optimized data structures• Log Structured Merge Tree (LSM) is a write-optimized data structure
used in key-value stores• Provides high write throughput with good read throughput, but
suffers high write amplification
8
![Page 9: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/9.jpg)
• Log Structured Merge Tree (LSM) is a write-optimized data structure used in key-value stores • Provides high write throughput with good read throughput, but
suffers high write amplification• Write amplification - Ratio of amount of write IO to amount of user
data
KV-storeClient10GB
Userdata
IftotalwriteI/Ois200GB
Writeamplification=20
9
Write-optimized data structures
![Page 10: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/10.jpg)
• Inserted 500M key-value pairs• Key: 16 bytes, Value: 128 bytes• Total user data: ~45 GB
450
300
600
900
1200
1500
1800
2100
RocksDB LevelDB PebblesDB UserData
WriteIO(G
B)
Write amplification in LSM based KV stores
10
![Page 11: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/11.jpg)
• Inserted 500M key-value pairs• Key: 16 bytes, Value: 128 bytes• Total user data: ~45 GB
1868(42x)
1222(27x)
756(17x)
450
300
600
900
1200
1500
1800
2100
RocksDB LevelDB PebblesDB UserData
WriteIO(G
B)
11
Write amplification in LSM based KV stores
![Page 12: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/12.jpg)
Why is write amplification bad?
• Reduces the write throughput• Flash devices wear out after limited write cycles
(Intel SSD DC P4600 – can last ~5 years assuming ~5 TB write per day)
RocksDB can write ~500 GB of user data per day to a SSD to last 1.25 years
Data source: https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p4600-series/dc-p4600-1-6tb-2-5inch-3d1.html12
![Page 13: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/13.jpg)
PebblesDB
Built using new data structure Fragmented Log-Structured Merge Tree
High performance write-optimized key-value store
Achieves 3-6.7x higher write throughput and 2.4-3xlesser write amplification compared to RocksDB
Gets the highest write throughput and least write amplification as a backend store to MongoDB
13
![Page 14: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/14.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
14
![Page 15: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/15.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
15
![Page 16: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/16.jpg)
Log Structured Merge Tree (LSM)
Data is stored both in memory and storage
Memory
Storage
In-memory
16
File1
![Page 17: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/17.jpg)
Writesaredirectlyputtomemory
In-memoryMemory
Storage
Write(key,value)
17
File1
Log Structured Merge Tree (LSM)
![Page 18: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/18.jpg)
Memory
File1
File2
In-memory data is periodically written as files to storage (sequential I/O)
In-memory
18
Storage
Log Structured Merge Tree (LSM)
![Page 19: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/19.jpg)
Files on storage are logically arranged in different levels
In-memoryMemory
Level0
Level1
Leveln
19
Storage
Log Structured Merge Tree (LSM)
![Page 20: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/20.jpg)
Compaction pushes data to higher numbered levels
In-memoryMemory
Level0
Level1
Leveln
20
Storage
Log Structured Merge Tree (LSM)
![Page 21: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/21.jpg)
Files are sorted and have non-overlapping key ranges
In-memoryMemory
1.…12 15….19 25….75 79….99
Searchusingbinarysearch
Level0
Level1
Leveln
21
Storage
Log Structured Merge Tree (LSM)
![Page 22: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/22.jpg)
Level 0 can have files with overlapping (but sorted) key ranges
In-memoryMemory
2….57 23….78Level0
Level1
Leveln
Limitonnumberoflevel0files
22
Storage
Log Structured Merge Tree (LSM)
![Page 23: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/23.jpg)
Write amplification: Illustration
Max files in level 0 is configured to be 2
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
In-memory58….68
Level1re-writecounter:1
23
Storage
![Page 24: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/24.jpg)
Write amplification: Illustration
Level 0 has 3 files (> 2), which triggers a compaction
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1
24
Storage
![Page 25: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/25.jpg)
Write amplification: Illustration
* Files are immutable * Sorted non-overlapping files
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1
25
Storage
![Page 26: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/26.jpg)
Write amplification: Illustration
Set of overlapping files between levels 0 and 1
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1
26
Storage
![Page 27: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/27.jpg)
Write amplification: Illustration
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1
27
Storage
Set of overlapping files between levels 0 and 1
![Page 28: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/28.jpg)
Write amplification: Illustration
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1
28
Storage
Set of overlapping files between levels 0 and 1
![Page 29: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/29.jpg)
1….2347….6824….461….68
Write amplification: Illustration
Compacting level 0 with level 1
Memory
2….37 23….48
1….12 15….25 39….62 77….95
Level0
Level1
Leveln
58….68
In-memory
Level1re-writecounter:1Level1re-writecounter:2
29
Storage
![Page 30: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/30.jpg)
Write amplification: Illustration
Level 0 is compacted
Memory
1….23 24….46 47….68 77….95
Level0
Level1
Leveln
In-memory
Level1re-writecounter:2
30
Storage
![Page 31: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/31.jpg)
Write amplification: Illustration
Data is being flushed as level 0 files after some Write operations
Memory
1….23 24….46 47….68 77….95
Level0
Level1
Leveln
10….3317….531….121
Level1re-writecounter:2
31
Storage
![Page 32: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/32.jpg)
Write amplification: Illustration
Compacting level 0 with level 1
Memory
1….23 24….46 47….68 77….95
Level0
Level1
Leveln
10….33 17….53 1….121
Level1re-writecounter:2
32
Storage
![Page 33: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/33.jpg)
92….12162….9031….601….30
Write amplification: Illustration
Memory
Level0
Level1
Leveln
1….121 Level1re-writecounter:2Level1re-writecounter:3
33
Storage
Compacting level 0 with level 1
![Page 34: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/34.jpg)
Write amplification: Illustration
Existing data is re-written to the same level (1) 3 times
Memory
1….30 31….60 62….90 92….121
Level0
Level1
Leveln
Level1re-writecounter:3
34
Storage
![Page 35: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/35.jpg)
Root cause of write amplification
Rewriting data to the same levelmultiple times
To maintain sorted non-overlapping files in each level
35
![Page 36: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/36.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
36
![Page 37: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/37.jpg)
Naïve approach to reduce write amplification
• Just append the file to the end of next level• Many (possibly all) overlapping files within a level
• Affects the read performance
1….89 6….915….65 9….99 1….102 1…2718….95Leveli
(all files have overlapping key ranges)
37
![Page 38: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/38.jpg)
Partially sorted levels
• Hybrid between all non-overlapping files and all overlapping files• Inspired from Skip-List data structure• Concrete boundaries (guards) to group together overlapping files
1….12 18….3113….34 42….65 72….8745….5640….47Leveli
(filesofsamecolorcanhaveoverlappingkeyranges)
38
13 35 70
![Page 39: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/39.jpg)
Fragmented Log-Structured Merge Tree
Novel modification of LSM data structure
Uses guards to maintain partially sorted levels
Writes data only once per level in most cases
39
![Page 40: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/40.jpg)
FLSM structure
Note how files are logically grouped within guards
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
40
Storage
![Page 41: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/41.jpg)
Guards get more fine grained deeper into the tree
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
41
Storage
FLSM structure
![Page 42: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/42.jpg)
How does FLSM reduce write amplification?
42
![Page 43: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/43.jpg)
In-memory
How does FLSM reduce write amplification?
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
15 70
40 7015 95
30….68
Max files in level 0 is configured to be 2
43
Storage
![Page 44: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/44.jpg)
2….1415….68
Compacting level 0
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
30….68
2….68
44
15
Storage
How does FLSM reduce write amplification?
![Page 45: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/45.jpg)
15….59
2….14 15….68
Fragmented files are just appended to next level
Memory
1….12
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15
40 7015 95
77….87 82….95
70
45
15
Storage
How does FLSM reduce write amplification?
![Page 46: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/46.jpg)
15….592….14 15….68
Guard 15 in Level 1 is to be compacted
Memory
1….12
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15
40 7015 95
77….87 82….95
70
15….68
46
Storage
How does FLSM reduce write amplification?
![Page 47: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/47.jpg)
15….3940….68
2….14
Files are combined, sorted and fragmented
Memory
1….12
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15
40 7015 95
77….87 82….95
70
15….68
47
40
Storage
How does FLSM reduce write amplification?
![Page 48: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/48.jpg)
15….39 40….68
2….14
Fragmented files are just appended to next level
Memory
1….12
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15
40 7015 95
77….87 82….95
70
48
40
Storage
How does FLSM reduce write amplification?
![Page 49: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/49.jpg)
FLSM maintains partially sorted levels to efficiently reduce the search space
How does FLSM reduce write amplification?
FLSM doesn’t re-write data to the same levelin most cases
How does FLSM maintain read performance?
49
![Page 50: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/50.jpg)
Selecting Guards
50
• Guards are chosen randomly and dynamically• Dependent on the distribution of data
![Page 51: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/51.jpg)
Selecting Guards
51
1 1e+9Keyspace
• Guards are chosen randomly and dynamically• Dependent on the distribution of data
![Page 52: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/52.jpg)
Selecting Guards
52
1 1e+9Keyspace
• Guards are chosen randomly and dynamically• Dependent on the distribution of data
![Page 53: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/53.jpg)
Selecting Guards
• Guards are chosen randomly and dynamically• Dependent on the distribution of data
53
1 1e+9Keyspace
![Page 54: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/54.jpg)
Operations: Write
FLSM structure
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Put(1,“abc”)Write(key,value)
54
Storage
![Page 55: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/55.jpg)
FLSM structure
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Get(23)
55
Storage
Operations: Get
![Page 56: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/56.jpg)
Search level by level starting from memory
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Get(23)
56
Storage
Operations: Get
![Page 57: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/57.jpg)
All level 0 files need to be searched
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Get(23)
57
Storage
Operations: Get
![Page 58: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/58.jpg)
Level 1: File under guard 15 is searched
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Get(23)
58
Storage
Operations: Get
![Page 59: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/59.jpg)
Level 2: Both the files under guard 15 are searched
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Get(23)
59
Storage
Operations: Get
![Page 60: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/60.jpg)
High write throughput in FLSM• Compaction from memory to level 0 is stalled• Writes to memory is also stalled
Memory
Storage1….37 18….48Level0
In-memory
2….98 23….48
Write(key,value)
Ifrateofinsertionishigherthanrateofcompaction,writethroughputdependsontherateofcompaction
60
![Page 61: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/61.jpg)
High write throughput in FLSM• Compaction from memory to level 0 is stalled• Writes to memory is also stalled
Memory
Storage1….37 18….48Level0
In-memory
2….98 23….48
Write(key,value)
Ifrateofinsertionishigherthanrateofcompaction,writethroughputdependsontherateofcompaction
61
FLSMhasfastercompaction becauseoflesserI/Oandhencehigherwritethroughput
![Page 62: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/62.jpg)
Challenges in FLSM
• Every read/range query operation needs to examine multiple files per level• For example, if every guard has 5 files, read latency is
increased by 5x (assuming no cache hits)
Trade-off between write I/O and read performance
62
![Page 63: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/63.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
63
![Page 64: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/64.jpg)
PebblesDB
• Built by modifying HyperLevelDB (±9100 LOC) to use FLSM• HyperLevelDB, built over LevelDB, to provide improved
parallelism and compaction• API compatible with LevelDB, but not with RocksDB
64
![Page 65: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/65.jpg)
Optimizations in PebblesDB
• Challenge (get/range query): Multiple files in a guard• Get() performance is improved using file level bloom filter
65
![Page 66: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/66.jpg)
Optimizations in PebblesDB
• Challenge (get/range query): Multiple files in a guard• Get() performance is improved using file level bloom filter
66
BloomfilterIskey25
present?Definitelynot
Possiblyyes
![Page 67: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/67.jpg)
Optimizations in PebblesDB
1….12 15….39 82….95Level1
15 70
BloomFilterBloomFilterBloomFilterBloomFilter
77….97 Maintainedin-memory
67
• Challenge (get/range query): Multiple files in a guard• Get() performance is improved using file level bloom filter
![Page 68: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/68.jpg)
Optimizations in PebblesDB
1….12 15….39 82….95Level1
15 70
BloomFilterBloomFilterBloomFilterBloomFilter
77….97 Maintainedin-memory
68
• Challenge (get/range query): Multiple files in a guard• Get() performance is improved using file level bloom filter
PebblesDBreads samenumberoffilesasanyLSMbasedstore
![Page 69: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/69.jpg)
Optimizations in PebblesDB
• Challenge (get/range query): Multiple files in a guard• Get() performance is improved using file level bloom filter• Range query performance is improved using parallel threads
and better compaction
69
![Page 70: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/70.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
70
![Page 71: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/71.jpg)
Evaluation
Micro-benchmarks
71
LowmemorySmalldataset
Crashrecovery
CPUandmemoryusage
Agedfilesystem
Realworldworkloads- YCSB
NoSQLapplications
![Page 72: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/72.jpg)
Evaluation
Micro-benchmarks
72
LowmemorySmalldataset
Crashrecovery
CPUandmemoryusage
Agedfilesystem
Realworldworkloads- YCSB
NoSQLapplications
![Page 73: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/73.jpg)
Real world workloads - YCSB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
73
![Page 74: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/74.jpg)
35.08Ko
ps/s
25.8Kop
s/s
33.98Ko
ps/s
22.41Ko
ps/s
57.87Ko
ps/s
34.06Ko
ps/s
5.8Ko
ps/s
32.09Ko
ps/s
952.93GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
74
Real world workloads - YCSB• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
![Page 75: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/75.jpg)
35.08Ko
ps/s
25.8Kop
s/s
33.98Ko
ps/s
22.41Ko
ps/s
57.87Ko
ps/s
34.06Ko
ps/s
5.8Ko
ps/s
32.09Ko
ps/s
952.93GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
LoadA- 100%writesRunA - 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE - 100%writesRunE - 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
75
Real world workloads - YCSB• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
![Page 76: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/76.jpg)
35.08Ko
ps/s
25.8Kop
s/s
33.98Ko
ps/s
22.41Ko
ps/s
57.87Ko
ps/s
34.06Ko
ps/s
5.8Ko
ps/s
32.09Ko
ps/s
952.93GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
LoadA- 100%writesRunA - 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
76
Real world workloads - YCSB• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
![Page 77: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/77.jpg)
35.08Ko
ps/s
25.8Kop
s/s
33.98Ko
ps/s
22.41Ko
ps/s
57.87Ko
ps/s
34.06Ko
ps/s
5.8Ko
ps/s
32.09Ko
ps/s
952.93GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE - 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
77
Real world workloads - YCSB• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
![Page 78: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/78.jpg)
35.08Ko
ps/s
25.8Kop
s/s
33.98Ko
ps/s
22.41Ko
ps/s
57.87Ko
ps/s
34.06Ko
ps/s
5.8Ko
ps/s
32.09Ko
ps/s
952.93GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE - 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
78
Real world workloads - YCSB• Yahoo! Cloud Serving Benchmark - Industry standard macro-benchmark• Insertions: 50M, Operations: 10M, key size: 16 bytes and value size: 1 KB
![Page 79: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/79.jpg)
NoSQL stores - MongoDB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
79
![Page 80: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/80.jpg)
20.73Ko
ps/s
9.95Kop
s/s
15.52Ko
ps/s
19.69Ko
ps/s
23.53Ko
ps/s
20.68Ko
ps/s
0.65Kop
s/s
9.78Kop
s/s
426.33GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
80
NoSQL stores - MongoDB
![Page 81: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/81.jpg)
20.73Ko
ps/s
9.95Kop
s/s
15.52Ko
ps/s
19.69Ko
ps/s
23.53Ko
ps/s
20.68Ko
ps/s
0.65Kop
s/s
9.78Kop
s/s
426.33GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
81
NoSQL stores - MongoDB
![Page 82: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/82.jpg)
20.73Ko
ps/s
9.95Kop
s/s
15.52Ko
ps/s
19.69Ko
ps/s
23.53Ko
ps/s
20.68Ko
ps/s
0.65Kop
s/s
9.78Kop
s/s
426.33GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
82
NoSQL stores - MongoDB
![Page 83: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/83.jpg)
20.73Ko
ps/s
9.95Kop
s/s
15.52Ko
ps/s
19.69Ko
ps/s
23.53Ko
ps/s
20.68Ko
ps/s
0.65Kop
s/s
9.78Kop
s/s
426.33GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
83
NoSQL stores - MongoDB
![Page 84: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/84.jpg)
20.73Ko
ps/s
9.95Kop
s/s
15.52Ko
ps/s
19.69Ko
ps/s
23.53Ko
ps/s
20.68Ko
ps/s
0.65Kop
s/s
9.78Kop
s/s
426.33GB
0
0.5
1
1.5
2
2.5
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Wire
dTiger
• YCSB on MongoDB, a widely used key-value store• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes
84
NoSQL stores - MongoDB
PebblesDBcombineslowwriteIOofWiredTigerwithhighperformanceofRocksDB
![Page 85: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/85.jpg)
Outline
• Log-Structured Merge Tree (LSM)• Fragmented Log-Structured Merge Tree (FLSM)• Building PebblesDB using FLSM• Evaluation• Conclusion
85
![Page 86: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/86.jpg)
Conclusion
• PebblesDB: key-value store built on Fragmented Log-Structured Merge Trees• Increases write throughput and reduces write IO at the same time• Obtains 6X the write throughput of RocksDB
• As key-value stores become more widely used, there have been several attempts to optimize them• PebblesDB combines algorithmic innovation (the FLSM data
structure) with careful systems building
86
![Page 87: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/87.jpg)
https://github.com/utsaslab/pebblesdb
![Page 88: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/88.jpg)
https://github.com/utsaslab/pebblesdb
![Page 89: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/89.jpg)
Backup slides
89
![Page 90: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/90.jpg)
Operations: Seek
• Seek(target): Returns the smallest key in the database which is >= target• Used for range queries (for example, return all entries
between 5 and 18)
Get(1)Level 0 – 1, 2, 100, 1000Level 1 – 1, 5, 10, 2000Level 2 – 5, 300, 500
90
![Page 91: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/91.jpg)
Operations: Seek
• Seek(target): Returns the smallest key in the database which is >= target• Used for range queries (for example, return all entries
between 5 and 18)
Seek(200)Level 0 – 1, 2, 100, 1000Level 1 – 1, 5, 10, 2000Level 2 – 5, 300, 500
91
![Page 92: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/92.jpg)
Operations: Seek
• Seek(target): Returns the smallest key in the database which is >= target• Used for range queries (for example, return all entries
between 5 and 18)
92
![Page 93: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/93.jpg)
Operations: Seek
FLSM structure
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Seek(23)
93
Storage
![Page 94: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/94.jpg)
Operations: Seek
All levels and memtable need to be searched
Memory
2….37 23….48
1….12 15….59 77….87 82….95
2….8 15….2316….32 70….90 96….9945….65
Level0
Level1
Level2
In-memory
15 70
40 7015 95
Seek(23)
94
Storage
![Page 95: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/95.jpg)
Optimizations in PebblesDB
• Challenge with reads: Multiple sstable reads per level• Optimized using sstable level bloom filters• Bloom filter: determine if an element is in a set
BloomfilterIskey25
present?Definitelynot
Possiblyyes95
![Page 96: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/96.jpg)
Optimizations in PebblesDB
• Challenge with reads: Multiple sstable reads per level• Optimized using sstable level bloom filters• Bloom filter: determine if an element is in a set
1….12 15….39 82….95Level1
15 70
Get(97)True
BloomFilterBloomFilterBloomFilterBloomFilter
77….97 Maintainedin-memory
96
![Page 97: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/97.jpg)
Optimizations in PebblesDB
• Challenge with reads: Multiple sstable reads per level• Optimized using sstable level bloom filters• Bloom filter: determine if an element is in a set
1….12 15….39 82….95Level1
15 70
Get(97)False True
BloomFilterBloomFilterBloomFilterBloomFilter
77….97
97
![Page 98: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/98.jpg)
Optimizations in PebblesDB
• Challenge with reads: Multiple sstable reads per level• Optimized using sstable level bloom filters• Bloom filter: determine if an element is in a set
1….12 15….39 82….95Level1
15 70
BloomFilterBloomFilterBloomFilterBloomFilter
77….97
PebblesDBreads atmostonefileperguardwithhighprobability98
![Page 99: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/99.jpg)
Optimizations in PebblesDB• Challenge with seeks: Multiple sstable reads per level• Parallel seeks: Parallel threads to seek() on files in a guard
1….12 15….39 77….97 82….95Level1
15 70
Seek(85)
Thread1 Thread2
99
![Page 100: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/100.jpg)
Optimizations in PebblesDB• Challenge with seeks: Multiple sstable reads per level• Parallel seeks: Parallel threads to seek() on files in a guard• Seek based compaction: Triggers compaction for a level
during a seek-heavy workload• Reduce the average number of sstables per guard• Reduce the number of active levels
SeekbasedcompactionincreaseswriteI/O butasatrade-offtoimproveseekperformance
100
![Page 101: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/101.jpg)
Tuning PebblesDB
• PebblesDB characteristics like• Increase in write throughput,• decrease in write amplification and• overhead of read/seek operationall depend on one parameter, maxFilesPerGuard (default 2 in PebblesDB)
• Setting this to a very high value favors write throughput• Setting this to a very low value favors read throughput
101
![Page 102: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/102.jpg)
Horizontal compaction
• Files compacted within the same level for the last two levels in PebblesDB• Some optimizations to prevent huge increase in write IO
102
![Page 103: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/103.jpg)
Experimental setup
• Intel Xeon 2.8 GHz processor• 16 GB RAM• Running Ubuntu 16.04 LTS with the Linux 4.4 kernel• Software RAID0 over 2 Intel 750 SSDs (1.2 TB each)• Datasets in experiments 3x bigger than DRAM size
103
![Page 104: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/104.jpg)
Write amplification
7.2GB
100.7GB
756GB
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
10M 100M 500M
WriteIOra
tiowrtPe
bblesD
B
Numberofkeysinserted
• Inserted different number of keys with key size 16 bytes and value size 128 bytes
104
![Page 105: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/105.jpg)
Micro-benchmarks
11.72Ko
ps/s
6.89Kop
s/s
7.5Ko
ps/s
0
0.5
1
1.5
2
2.5
3
Random-Writes Reads Range-Queries
Throug
hputra
tiowrtHy
perLevelDB
Benchmark
• Used db_bench tool that ships with LevelDB• Inserted 50M key-value pairs with key size 16 bytes and value size 1 KB• Number of read/seek operations: 10M
105
![Page 106: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/106.jpg)
Micro-benchmarks
239.05Kop
s/s
11.72Ko
ps/s
6.89Kop
s/s
7.5Ko
ps/s
126.2Ko
ps/s
0
0.5
1
1.5
2
2.5
3
Seq-Writes Random-Writes Reads Range-Queries Deletes
Throug
hputra
tiowrtHy
perLevelDB
Benchmark
• Used db_bench tool that ships with LevelDB• Inserted 50M key-value pairs with key size 16 bytes and value size 1 KB• Number of read/seek operations: 10M
106
![Page 107: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/107.jpg)
Multi threaded micro-benchmarks
44.4Kop
s/s
40.2Kop
s/s
38.8Kop
s/s
0
0.5
1
1.5
2
2.5
Writes Reads MixedThroug
hputra
tiowrtHy
perLevelDB
Benchmark
• Writes – 4 threads each writing 10M• Reads – 4 threads each reading 10M• Mixed – 2 threads writing and 2 threads reading (each 10M)
107
![Page 108: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/108.jpg)
Small cached dataset• Insert 1M key-value pairs with 16 bytes key and 1 KB value• Total data set (~1 GB) fits within memory• PebblesDB-1: with maximum one file per guard
108
45.25Ko
ps/s
205.76Kop
s/s
205.34Kop
s/s
0
0.5
1
1.5
2
2.5
Writes Reads Range-QueriesThroug
hputra
tiowrtHy
perLevelDB
Benchmark
![Page 109: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/109.jpg)
Small key-value pairs• Inserted 300M key-value pairs• Key 16 bytes and 128 bytes value
109
44.48Ko
ps/s
6.34Kop
s/s
6.31Kop
s/s
0
0.5
1
1.5
2
2.5
3
3.5
Writes Reads Range-Queries
Throug
hputra
tiowrtHy
perLevelDB
Benchmark
![Page 110: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/110.jpg)
Aged FS and KV store
17.37Ko
ps/s
5.65Kop
s/s
6.29Kop
s/s
0
0.5
1
1.5
2
2.5
Writes Reads Range-Queries
Throug
hputra
tiowrtHy
perLevelDB
Benchmark
• File system aging: Fill up 89% of the file system• KV store aging: Insert 50M, delete 20M and update 20M key-value
pairs in random order
110
![Page 111: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/111.jpg)
Low memory micro-benchmark
27.78Ko
ps/s
2.86Kop
s/s
4.37Kop
s/s
0
0.5
1
1.5
2
2.5
Writes Reads Range-Queries
Throug
hputra
tiowrtHy
perLevelDB
Benchmark
• 100M key-value pairs with 1KB (~65 GB data set)• DRAM was limited to 4 GB
111
![Page 112: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/112.jpg)
Impact of empty guards
• Inserted 20M key-value pairs (0 to 20M) in random order with value size 512 bytes• Incrementally inserted new 20M keys after deleting the older
keys• Around 9000 empty guards at the start of the last iteration• Read latency did not reduce with the increase in empty
guards
112
![Page 113: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/113.jpg)
22.08Ko
ps/s
21.85Ko
ps/s
31.17Ko
ps/s
32.75Ko
ps/s
38.02Ko
ps/s
7.62Kop
s/s
0.37Kop
s/s
19.11Ko
ps/s
1349.5GB
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
LoadA RunA RunB RunC RunD LoadE RunE RunF TotalIO
Throug
hputra
tiowrt
Hype
rLevelDB
• HyperDex – distributed key-value store from Cornell• Inserted 20M key-value pairs with 1 KB value size and 10M operations
LoadA- 100%writesRunA- 50%reads,50%writesRunB- 95%reads,5%writesRunC- 100%reads
RunD- 95%reads(latest),5%writesLoadE- 100%writesRunE- 95%rangequeries,5%writesRunF- 50%reads,50%read-modify-writes 113
NoSQL stores - HyperDex
![Page 114: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/114.jpg)
CPU usage
• Median CPU usage by inserting 30M keys and reading 10M keys• PebblesDB: ~171%• Other key-value stores: 98-110%• Due to aggressive compaction, more CPU operations due to
merging multiple files in a guard
114
![Page 115: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/115.jpg)
Memory usage
• 100M records (16 bytes key, 1 KB value) – 106 GB data set• 300 MB memory space• 0.3% of data set size
• Worst case: 100M records (16 bytes key, 16 bytes value) ~3.2 GB• 9% of data set size
115
![Page 116: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/116.jpg)
Bloom filter calculation cost
• 1.2 sec per GB of sstable• 3200 files – 52 GB – 62 seconds
116
![Page 117: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/117.jpg)
Impact of different optimizations
• Sstable level bloom filter improve read performance by 63%• PebblesDB without optimizations for seek – 66%
117
![Page 118: PebblesDB: Building Key-Value Stores using Fragmented Log ... › ~vijay › papers › pebblesdb-sosp17-slides.pdfPebblesDB: Building Key-Value Stores using Fragmented Log Structured](https://reader033.vdocument.in/reader033/viewer/2022060510/5f270cb303a0e520762bbfd1/html5/thumbnails/118.jpg)
Thank you!Questions?
118