fragmentation in in-line deduplication backup systems 1. reducing impact of data fragmentation...
TRANSCRIPT
![Page 1: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/1.jpg)
Fragmentation in in-line deduplication backup systems
1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin Barczynski, Wojciech Kilian, Cezary Dubnicki . Systor 2012.
2. Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication, Mark Lillibridge and Kave Eshghi and Deepavali Bhagwat. FAST 2013
5/6/2013
Speaker: Oren Kishon, Advanced Topics in Storage Systems, Graduate Seminar
External slides taken from FAST 13 websitehttps://www.usenix.org/conference/fast13/improving-restore-speed-backup-systems-use-inline-chunk-based-deduplication
![Page 2: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/2.jpg)
Talk outline
The problemThe new ideas for solutionsExperimental resultsSummary / discussion / questions
![Page 3: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/3.jpg)
Talk outline
The problemThe new ideas for solutionsExperimental resultsSummary / discussion / questions
![Page 4: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/4.jpg)
The problem: Fragmentation
In-line deduplication: handle blocks as they come, without post-processing:
If block already exist: only update dedup dataIf new: “throw” at the end of the current data set.
At restore time, the data is already very fragmented, and so the restore is slow.
![Page 5: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/5.jpg)
The problem: Fragmentation
![Page 6: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/6.jpg)
The problem: Fragmentation
![Page 7: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/7.jpg)
The problem: Fragmentation
![Page 8: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/8.jpg)
The problem: Fragmentation
![Page 9: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/9.jpg)
The problem: Fragmentation
![Page 10: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/10.jpg)
The problem: Fragmentation
![Page 11: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/11.jpg)
The problem: Fragmentation
![Page 12: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/12.jpg)
The problem: Fragmentation
![Page 13: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/13.jpg)
The problem: Fragmentation
![Page 14: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/14.jpg)
Talk outline
The problemThe new ideas for solutionsExperimental resultsSummary / discussion / questions
![Page 15: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/15.jpg)
The two new ideas
Trade off deduplication for lower fragmentation. In other words: write more, but read faster.
Caching: smarter algorithm than LRU.
![Page 16: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/16.jpg)
The two new ideas
Trade off deduplication for lower fragmentation. In other words: write more, but read faster.
Caching: smarter algorithm than LRU.
![Page 17: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/17.jpg)
Trade off deduplication
9LivesData: “Context based Rewriting” (CBR)
HP: “Capping”
![Page 18: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/18.jpg)
Trade off deduplication
9LivesData: “Context based Rewriting” (CBR)
HP: “Capping”
![Page 19: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/19.jpg)
Context base rewriting (CBR)
Each dup:
Stream context(5MB)
Disk context(2MB)
![Page 20: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/20.jpg)
Context base rewriting (CBR)
Large intersection:no need to rewrite.
Small intersection:Rewriting will speed-up restore time.
![Page 21: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/21.jpg)
Context base rewriting (CBR)
Reaching a decision
Metric used - “Rewrite Utility” (for context): disk blocks not in stream / total disk blocks
Two thresholds:1. Global: rewrite-utility > 70%2. Adjusting: rewrite-utility > 95% of other scores so far.
5% limit: if 5% actually rewritten, stop rewriting.
![Page 22: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/22.jpg)
Trade off deduplication
9LivesData: “Context based Rewriting” (CBR)
HP: “Capping”
![Page 23: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/23.jpg)
Capping
Input: Break into 20 MB segments.
Disk: Read/seek chunks. Chunk container: 4MB.
Metric used: “Fragmentation” = #containers read (at restore) / MB actually restored
Deduplicate each segment against at most T containers. At cost of not deduplicating some chunks... Fragmentation is limited to (T+5)/20 MB
![Page 24: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/24.jpg)
Capping
![Page 25: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/25.jpg)
Capping
![Page 26: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/26.jpg)
Trade off deduplication
The major difference
CBR Capping
Focus on limiting:Rewriting
(lose of deduplication)
Fragmentation
Don't limit: Fragmentation Rewriting
![Page 27: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/27.jpg)
The two new ideas
Trade off deduplication for lower fragmentation. In other words: write more, but read faster.
Caching: smarter algorithm than LRU.
![Page 28: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/28.jpg)
Better caching:Forward assembly area
An insight: LRU and other cache algorithms solve a problem of non-deterministic requests, while deduplicated data restoration is totally deterministic.
![Page 29: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/29.jpg)
Better caching
![Page 30: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/30.jpg)
Better caching
![Page 31: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/31.jpg)
Better caching
![Page 32: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/32.jpg)
Better caching
![Page 33: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/33.jpg)
Better caching
![Page 34: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/34.jpg)
Better caching
![Page 35: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/35.jpg)
Better caching
![Page 36: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/36.jpg)
Better caching
![Page 37: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/37.jpg)
Better caching
![Page 38: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/38.jpg)
Better caching
![Page 39: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/39.jpg)
Better caching:Forward assembly area
Wrap up: Knowledge of the future helps us decide what to keep / throw from cache (no need to make
assumptions). Caching in sizes of chunks (rather than containers) saves space and later copying.
![Page 40: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/40.jpg)
Talk outline
The problemThe new ideas for solutionsExperimental resultsSummary / discussion / questions
![Page 41: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/41.jpg)
Experimental results
CBR: up to X2 restore time, 1-5% dedup loss Capping: 2-6X, 8% dedup loss Forward assembly area: 2-4X
![Page 42: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/42.jpg)
Experimental results
CBR: up to X2 restore time, 1-5% dedup loss Capping: 2-6X, 8% dedup loss Forward assembly area: 2-4X
But:They had different metrics of “restore speed”And:CBR was tested on some sets of 14 updates. Capping/Assembly on sets of hundreds...
![Page 43: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/43.jpg)
Talk outline
The problemThe new ideas for solutionsExperimental resultsSummary / discussion / questions
![Page 44: Fragmentation in in-line deduplication backup systems 1. Reducing Impact of Data Fragmentation Caused By In-Line Deduplication. Michal Kaczmarczyk, Marcin](https://reader036.vdocument.in/reader036/viewer/2022081520/5697c0021a28abf838cc2cd3/html5/thumbnails/44.jpg)
Summary
In-line deduplication has a downfall of fragmentation causing slow restore speeds.
The restore speed can be greatly increased by losing only little deduplication, and smarter caching.
Questions? Comments?