Btrfs Specific Dedup · Liu Bo · 2017. 12. 14.
Btrfs Specific Dedup
Liu Bo
Why does btrfs need dedup?
What is Dedup?
Dedup
• A specialized compression technique
• Eliminate duplicate copies
• Improve storage utilization
But we already have
Compression?
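One way to see why compression alone does not cover this case, as a minimal sketch rather than anything taken from the talk: DEFLATE (zlib) only looks back 32 KiB, so a second copy of the same data that lies further away is not detected, while block-level dedup keeps one copy of each distinct block. The 4 KiB block size and the 64 KiB payload are assumptions chosen for illustration.

```python
import random
import zlib

BLOCK = 4096                 # assumed dedup block size
rng = random.Random(0)       # fixed seed so the run is repeatable

# 64 KiB of incompressible data, written twice (think: a file and its backup).
payload = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = payload + payload

# zlib cannot reach back 64 KiB, so the second copy is compressed from scratch.
compressed_size = len(zlib.compress(data))

# Block-level dedup keeps one copy of every distinct block.
unique_blocks = {data[i:i + BLOCK] for i in range(0, len(data), BLOCK)}
deduped_size = len(unique_blocks) * BLOCK

print(len(data), compressed_size, deduped_size)
```

The compressed size stays close to the full 128 KiB, while the deduplicated size is 64 KiB. The two techniques target different kinds of redundancy, which is why they can be combined.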
A Good FS For Backup!
Btrfs:
● CoW B+tree
● 2^64 byte == 16 EiB maximum file size
● Dynamic inode allocation
● Checksum on both data and metadata
● Compression (zlib, lzo supported)
● Integrated multiple device support
● Subvolume, writable/readonly snapshot
● Send/receive
● Etc.
Btrfs Deduplication:
● Inline
● Block level
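"Inline" means duplicates are detected while data is being written, not by a later scan. A toy sketch of that write path, assuming a 4 KiB block size and SHA-256 fingerprints; the class `InlineDedupStore` and its fields are hypothetical names and say nothing about the actual kernel code.

```python
import hashlib

class InlineDedupStore:
    """Toy block store that deduplicates at write time (inline)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.extents = {}   # fingerprint -> stored block (stand-in for disk)
        self.refs = {}      # fingerprint -> number of logical references
        self.files = {}     # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Store data block by block; return how many blocks hit the 'disk'."""
        written = 0
        layout = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).digest()
            if fp not in self.extents:      # new content: allocate and write
                self.extents[fp] = block
                written += 1
            # Every logical block holds one reference to its physical block.
            self.refs[fp] = self.refs.get(fp, 0) + 1
            layout.append(fp)
        self.files[name] = layout
        return written

    def read(self, name):
        return b"".join(self.extents[fp] for fp in self.files[name])

store = InlineDedupStore()
image = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
first = store.write("vm.img", image)          # two distinct blocks are written
second = store.write("vm-backup.img", image)  # nothing new reaches the disk
print(first, second)
```

The second write only adds references, which is the effect the backup workloads in the talk rely on.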
Back Reference:
● Fingerprint
● Hash algorithm: crc32c vs sha256
● B+tree: dedup tree
● Keys: dedup keys
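The fingerprint and key lookup can be sketched as below. This is an illustration under assumptions: `zlib.crc32` is plain CRC-32 and stands in for crc32c, a sorted list searched with `bisect` stands in for the B+tree, and the `(fingerprint, bytenr)` key layout is hypothetical, not the on-disk format. The hash choice is a trade-off: a 4-byte checksum is cheap to compute and store but collides far more often than a 32-byte SHA-256 digest.

```python
import bisect
import hashlib
import zlib

def fp_crc32(block):
    # Plain CRC-32, used here only as a stand-in for crc32c.
    return zlib.crc32(block).to_bytes(4, "big")

def fp_sha256(block):
    return hashlib.sha256(block).digest()

class DedupIndex:
    """Sorted (fingerprint, bytenr) keys; a stand-in for the dedup tree."""

    def __init__(self):
        self.keys = []

    def insert(self, fingerprint, bytenr):
        bisect.insort(self.keys, (fingerprint, bytenr))

    def lookup(self, fingerprint):
        """Return the bytenr stored for this fingerprint, or None."""
        # bytenr is never negative, so (fingerprint, -1) sorts before any match.
        i = bisect.bisect_left(self.keys, (fingerprint, -1))
        if i < len(self.keys) and self.keys[i][0] == fingerprint:
            return self.keys[i][1]
        return None

index = DedupIndex()
block = b"x" * 4096
index.insert(fp_sha256(block), 1 << 20)

hit = index.lookup(fp_sha256(block))
miss = index.lookup(fp_sha256(b"y" * 4096))
sizes = (len(fp_crc32(block)), len(fp_sha256(block)))
print(hit, miss, sizes)
```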
Dedup Engine:
● Dedup is an I/O filter, like compression
● Takes a bunch of locked pages to process
● Asynchronous helper threads, aiming to work across all online processors
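The helper-thread idea can be approximated in user space as follows. This is a rough analogy, not the kernel worker implementation: a batch of blocks (standing in for the locked pages) is handed to a thread pool sized to the number of processors, and fingerprints come back in input order. The names `fingerprint` and `fingerprint_batch` are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def fingerprint(block):
    return hashlib.sha256(block).digest()

def fingerprint_batch(blocks, workers=None):
    """Hash a batch of blocks on worker threads, keeping input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input blocks.
        return list(pool.map(fingerprint, blocks))

batch = [bytes([i]) * 4096 for i in range(8)]   # eight distinct 4 KiB blocks
digests = fingerprint_batch(batch)
print(len(digests), len(set(digests)))
```

CPython's `hashlib` releases the GIL while hashing buffers larger than 2047 bytes, so 4 KiB blocks can be hashed on several cores at once.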
Flexible Control:
● Register (create the dedup tree)
● Unregister (delete the out-of-date dedup tree)
● Mount options
  – "-o dedup"
  – "-o dedup_bs=xxx", e.g. 4k, 128k
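To make the option format concrete, here is a small sketch that picks the two options named on the slide out of a mount option string and converts sizes such as 4k or 128k to bytes. The helpers `parse_size` and `parse_dedup_opts` are hypothetical; how the kernel patch validates these values is not stated in the talk, so no extra validation is assumed here.

```python
def parse_size(text):
    """Turn '4k' or '128k' into a byte count (suffixes k/m, base 1024)."""
    units = {"k": 1024, "m": 1024 ** 2}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def parse_dedup_opts(optstring):
    """Extract the dedup settings from a comma-separated option string."""
    opts = {"dedup": False, "dedup_bs": None}
    for item in optstring.split(","):
        key, _, value = item.partition("=")
        if key == "dedup":
            opts["dedup"] = True
        elif key == "dedup_bs":
            opts["dedup_bs"] = parse_size(value)
    return opts

parsed = parse_dedup_opts("noatime,dedup,dedup_bs=128k")
print(parsed)
```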
Conclusion:
● Transparent dedup
● Synchronous, block level
● Compression support
● Tunable granularity, i.e. dedup blocksize
● Not default, easy to control
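Why the dedup blocksize is worth tuning can be shown with a constructed example. This is a sketch on synthetic data, not a measurement: the same sixteen 4 KiB blocks are written once in order and once in reverse order. The function `dedup_stats` is a hypothetical helper that reports how many index entries are needed and how many bytes end up stored.

```python
import hashlib

def dedup_stats(data, block_size):
    """Return (index_entries, bytes_stored) for fixed-size block dedup."""
    seen = set()
    stored = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        fp = hashlib.sha256(block).digest()
        if fp not in seen:          # only the first copy of a block is stored
            seen.add(fp)
            stored += len(block)
    return len(seen), stored

# Sixteen distinct 4 KiB blocks, then the same blocks in reverse order.
blocks = [bytes([i]) * 4096 for i in range(16)]
data = b"".join(blocks) + b"".join(reversed(blocks))

stats = {bs: dedup_stats(data, bs) for bs in (4096, 8192, 65536)}
for bs, (entries, stored) in stats.items():
    print(bs, entries, stored)
```

With 4 KiB blocks the second half is fully shared (16 entries, 64 KiB stored). With 64 KiB blocks nothing matches, but only 2 index entries are needed. Small blocks find more duplicates at the cost of more fingerprints to compute and index.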
Limit:
● Effective on backup, virtualization
● Ineffective on structured data
Performance
[Figure: bar chart "1G Zero Write (compress: OFF)". X-axis: default, dedup(bs=4k), dedup(bs=8k), dedup(bs=64k), dedup(bs=128). Series: First write, Backup-1, Backup-2. Y-axis: 0 to 700, unit not given.]
Bar labels in extraction order: 85.9, 136, 163, 195, 199, 88.7, 155, 175, 199, 243, 83.8, 178, 440, 602, 648.
Performance, cont.
[Figure: bar chart "1G Zero Write (compress: ON)". X-axis: default, dedup(bs=4k), dedup(bs=8k), dedup(bs=64k), dedup(bs=128). Series: First write, Backup-1, Backup-2. Y-axis: 0 to 900, unit not given.]
Bar labels in extraction order: 323, 136, 163, 195, 199, 327, 154, 175, 198, 202, 843, 155, 207, 239, 290.
Demo
Known Issues:
● ENOSPC
● Byte-to-byte comparison
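The slide does not elaborate on the byte-to-byte comparison. One common reading is that a fingerprint match alone cannot prove two blocks are identical, so the data must be compared before blocks are shared. The sketch below assumes that reading and uses a deliberately weak 8-bit checksum (`weak_fingerprint`, a hypothetical stand-in for a real hash) so that a collision is easy to produce.

```python
def weak_fingerprint(block):
    # Deliberately weak 8-bit checksum so that a collision is easy to show.
    return sum(block) % 256

class VerifyingStore:
    """Shares a block only after a byte-to-byte comparison confirms the match."""

    def __init__(self):
        self.by_fp = {}   # fingerprint -> list of stored blocks

    def write(self, block):
        """Return True if the block was shared with an identical stored block."""
        candidates = self.by_fp.setdefault(weak_fingerprint(block), [])
        for stored in candidates:
            if stored == block:       # byte-to-byte comparison
                return True
        candidates.append(block)      # no identical block yet: store it
        return False

store = VerifyingStore()
a = b"\x01\x02" * 2048
b = b"\x02\x01" * 2048    # same checksum as a, different content

results = [store.write(a), store.write(b), store.write(a)]
print(results)
```

Without the comparison, the second write would have been treated as a duplicate of the first and its data silently replaced. The cost is an extra read and compare on every fingerprint hit.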
Q&A
Reference
● http://en.wikipedia.org/wiki/Data_deduplication
● http://media.netapp.com/documents/tr-3505.pdf
● http://www.druva.com/blog/2009/01/09/understanding-data-deduplication
● https://btrfs.wiki.kernel.org/index.php/Main_Page
● https://communities.netapp.com/community/netapp-blogs/drdedupe/blog/2010/04/07/how-netapp-deduplication-works--a-primer
● http://en.wikipedia.org/wiki/Fingerprint_%28computing%29