Liquid: A Scalable Deduplication File System for Virtual Machine Images
GUIDED BY AP: REMYA, DEPT OF COMPUTER SCIENCE & ENGINEERING
SUBMITTED BY SANOJ A S, ROLL NO: R11U016, S7 CSE
CONTENTS
INTRODUCTION
VIRTUAL MACHINE
DEDUPLICATION
ISSUES IN VM STORAGE
LIQUID SYSTEM ARCHITECTURE
COMMUNICATION AMONG COMPONENTS: HEART BEAT PROTOCOL
DEDUPLICATION IN LIQUID
OPTIMIZATIONS ON FINGER PRINT CALCULATION
STORAGE FOR DATA BLOCKS
ADVANTAGES OF LIQUID
CONCLUSION
INTRODUCTION
Cloud computing means storing and accessing data and programs over the Internet instead of on your computer's hard drive.
VIRTUAL MACHINE
Serves as a critical component in cloud computing.
Virtual machine – a hypothetical computer.
Emulates the functions of a real-world computer.
Executes programs like a physical machine.
The initial state of a virtual machine is stored in a file called a virtual machine image.
DEDUPLICATION
Data deduplication – a data compression technique.
Eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference to the single stored copy instead of being stored multiple times.
Improves storage utilization.
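The idea can be sketched in a few lines of Python. This is a minimal illustration, not Liquid's implementation: the `BlockStore` class and its method names are hypothetical, and SHA-1 is simply one possible fingerprint function.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: one copy per distinct block (hypothetical)."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> block bytes
        self.refcount = {}   # fingerprint -> number of references

    def put(self, block: bytes) -> str:
        """Store a block and return its fingerprint, which acts as the reference."""
        fp = hashlib.sha1(block).hexdigest()
        if fp not in self.blocks:
            self.blocks[fp] = block          # first copy: keep the data
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]


store = BlockStore()
image = [b"A" * 4096, b"B" * 4096, b"A" * 4096]   # two of three blocks are identical
refs = [store.put(block) for block in image]

logical = sum(len(block) for block in image)
physical = sum(len(block) for block in store.blocks.values())
print(logical, physical)   # 12288 8192
```

The image is still fully recoverable from `refs`, but the repeated block occupies storage only once.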
ISSUES IN VM STORAGE
High demand on VM storage remains a challenging problem.
Existing systems have made efforts to reduce storage consumption.
They typically use a SAN cluster.
A SAN cluster cannot satisfy the increasing demand because of its cost.
Hence LIQUID is proposed.
LIQUID SYSTEM ARCHITECTURE
Three components: a single meta server with hot backup, multiple data servers, and multiple clients.
Each component runs as a user-level service process.
VM images are split into fixed-size data blocks.
Meta server – maintains the namespace, fingerprints, and reference counts.
Meta server – mirrored to a hot-backup shadow meta server.
LIQUID SYSTEM ARCHITECTURE (CONT)
Data servers – in charge of managing the data blocks of VM images.
Organized in a distributed hash table.
A Liquid client provides a POSIX-compatible file system.
Client – critical component (provides deduplication).
Fault tolerance – the meta server is mirrored.
Replicas of data blocks are stored.
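One way to picture data servers organized in a distributed hash table is to give each server a contiguous range of the fingerprint space. The sketch below assumes equal-sized ranges, SHA-1 fingerprints, and replicas placed on the following servers; the slides do not say how ranges or replicas are actually assigned, and the server names and function names are hypothetical.

```python
import hashlib

SERVERS = ["ds0", "ds1", "ds2", "ds3"]   # hypothetical data server names
SPACE = 2 ** 160                         # size of the SHA-1 fingerprint space


def fingerprint(block: bytes) -> int:
    """SHA-1 of the block contents, as an integer in [0, SPACE)."""
    return int.from_bytes(hashlib.sha1(block).digest(), "big")


def server_for(fp: int, servers=SERVERS) -> str:
    """Map a fingerprint to the server that owns its range (assumed equal ranges)."""
    return servers[fp * len(servers) // SPACE]


def replicas_for(fp: int, copies: int = 2, servers=SERVERS) -> list:
    """Primary server plus the next ones in order (assumed replica placement)."""
    first = fp * len(servers) // SPACE
    return [servers[(first + i) % len(servers)] for i in range(copies)]


print(server_for(0), server_for(SPACE // 2), server_for(SPACE - 1))   # ds0 ds2 ds3
print(replicas_for(SPACE - 1))                                        # ['ds3', 'ds0']
```

Because the owner is computed from the fingerprint alone, any client can locate a block without asking every data server.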
LIQUID SYSTEM ARCHITECTURE (CONT)
Fig: Liquid architecture. Components shown: shadow meta server (hot backup), meta server, data servers (heart beat), and three client file systems, each with a cache.
COMMUNICATION AMONG COMPONENTS: HEART BEAT PROTOCOL
Meta server – manages all data servers.
Exchanges a regular heart beat message with each data server in a round-robin fashion.
Slow to detect failed data servers when there are many data servers.
To speed up failure detection, data servers also send an error signal to the meta server.
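A small simulation shows why round-robin heart beats alone scale poorly. It is a sketch under stated assumptions: one probe per tick, a hypothetical `MetaServer` class, and an error signal sent by a data server that could not reach a peer (the slides do not say what triggers the signal).

```python
from itertools import cycle


class MetaServer:
    """Hypothetical meta server that probes one data server per tick."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.failed = set()
        self._turn = cycle(self.servers)

    def heartbeat_tick(self, is_alive):
        """Probe the next data server in round-robin order."""
        server = next(self._turn)
        if not is_alive(server):
            self.failed.add(server)
        return server

    def error_signal(self, suspect):
        """Assumed trigger: a data server reports a peer it could not reach."""
        self.failed.add(suspect)


def ticks_until_detected(n_servers, dead):
    """Count probes needed before heart beats alone notice the dead server."""
    meta = MetaServer(f"ds{i}" for i in range(n_servers))
    ticks = 0
    while dead not in meta.failed:
        meta.heartbeat_tick(lambda server: server != dead)
        ticks += 1
    return ticks


print(ticks_until_detected(4, "ds3"))       # 4
print(ticks_until_detected(1000, "ds999"))  # 1000
```

In the worst case the detection delay grows linearly with the number of data servers, whereas a call to `error_signal` marks the server as failed immediately.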
DEDUPLICATION IN LIQUID
Liquid chooses fixed-size chunking instead of variable-size chunking.
Fixed-size chunking works well because all files stored in VM images are aligned on disk block boundaries.
Advantage – simplicity.
Block size choice:
Block size – a balancing factor that is hard to choose.
It has a great impact on both deduplication ratio and IO performance.
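Fixed-size chunking is simple enough to show directly. The following sketch (function names are hypothetical, and the 4 KB block size is used only to keep the example small) splits an image into blocks and shows that an aligned in-place write changes the fingerprint of exactly one block.

```python
import hashlib


def split_fixed(data: bytes, block_size: int):
    """Yield consecutive fixed-size blocks; the last one may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def block_fingerprints(data: bytes, block_size: int):
    return [hashlib.sha1(block).hexdigest() for block in split_fixed(data, block_size)]


BLOCK = 4096
original = bytes(4 * BLOCK)                      # 16 KB of zeros
modified = bytearray(original)
modified[BLOCK:2 * BLOCK] = b"\x01" * BLOCK      # overwrite block 1 in place

before = block_fingerprints(original, BLOCK)
after = block_fingerprints(bytes(modified), BLOCK)
changed = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
print(changed)   # [1]
```

The other three blocks keep their fingerprints, so they remain deduplicated against the original image.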
DEDUPLICATION IN LIQUID (CONT)
Smaller block size – more random seeks when accessing a VM image, which is not tolerable.
A large block size is also not preferable, since it reduces the deduplication ratio.
Liquid allows a different block size to be chosen for different situations.
It is advised to use a multiple of 4 KB between 256 KB and 1 MB to achieve a good balance between IO performance and deduplication ratio.
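The advice and the trade-off can both be checked with a short sketch. The helper names are hypothetical, and the synthetic image is constructed so that duplicates exist at 256 KB granularity but disappear at 512 KB; real VM images will behave less sharply.

```python
import hashlib

KB = 1024
MB = 1024 * KB


def is_advised_block_size(size: int) -> bool:
    """True for a multiple of 4 KB between 256 KB and 1 MB inclusive."""
    return size % (4 * KB) == 0 and 256 * KB <= size <= 1 * MB


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Number of logical blocks divided by the number of distinct blocks."""
    fps = [hashlib.sha1(data[i:i + block_size]).digest()
           for i in range(0, len(data), block_size)]
    return len(fps) / len(set(fps))


unit = 256 * KB
chunk = {name: name.encode() * unit for name in "ABCDE"}
image = b"".join(chunk[name] for name in "ABACADAE")   # 2 MB synthetic image

print(is_advised_block_size(256 * KB), is_advised_block_size(258 * KB))  # True False
print(dedup_ratio(image, 256 * KB))   # 1.6
print(dedup_ratio(image, 512 * KB))   # 1.0
```

At 256 KB the repeated `A` unit is stored once; at 512 KB every block pairs `A` with a different neighbour, so no two blocks match.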
OPTIMIZATIONS ON FINGER PRINT CALCULATION
Deduplication relies on comparing data block fingerprints to detect redundancy.
Fingerprint – a collision-resistant hash value calculated from the data block contents.
MD5[26] and SHA-1[12] are frequently used for this purpose.
Probability of a fingerprint collision – very small, orders of magnitude smaller than hardware error rates.
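To make "very small" concrete, the sketch below computes a fingerprint with Python's `hashlib` and evaluates the standard birthday upper bound n(n-1)/2^(b+1) for n blocks and b-bit fingerprints. It assumes fingerprints behave like uniformly random values, so it says nothing about deliberately constructed collisions; the function names and the 1 PB example are illustrative assumptions.

```python
import hashlib
from fractions import Fraction


def fingerprint(block: bytes, algorithm: str = "sha1") -> str:
    """Hex digest of the block contents."""
    return hashlib.new(algorithm, block).hexdigest()


def collision_bound(n_blocks: int, bits: int) -> Fraction:
    """Birthday upper bound on the probability of any collision among n blocks."""
    return Fraction(n_blocks * (n_blocks - 1), 2 ** (bits + 1))


# Illustrative assumption: 1 PB of data in 256 KB blocks is 2**32 blocks.
n = 2 ** 50 // (256 * 1024)
bound = collision_bound(n, 160)          # SHA-1 produces 160-bit fingerprints
print(n == 2 ** 32, bound < Fraction(1, 2 ** 96))   # True True
```

Under that assumption the chance of any accidental SHA-1 collision across a petabyte of blocks is below 2^-96.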
OPTIMIZATIONS ON FINGER PRINT CALCULATION (CONT)
So we can safely assume that two data blocks are identical when they have the same fingerprint.
Fingerprint calculation – expensive.
Liquid delays fingerprint calculation for recently modified data blocks.
It runs deduplication lazily, only when it is necessary.
The client side maintains a shared cache, which contains recently accessed data blocks.
OPTIMIZATIONS ON FINGER PRINT CALCULATION (CONT)
A portion of memory is used by the client side of Liquid as a private cache.
Private cache – holds modified data blocks and delays fingerprint calculation on them.
A modified data block is ejected from the shared cache and added to the private cache.
Modified data blocks are ejected from the private cache when it becomes full.
OPTIMIZATIONS ON FINGER PRINT CALCULATION (CONT)
Ejection from the private cache follows an LRU policy.
Only then is the modified data block's fingerprint calculated.
Liquid uses multiple threads for fingerprint calculation.
Multiple threads process different data blocks concurrently.
This provides good IO performance.
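The private cache behaviour described in these slides can be sketched as an LRU map that fingerprints a block only when it is ejected, plus a thread pool for fingerprinting many blocks at once. The `PrivateCache` class, its capacity, and the `flush` step are hypothetical and simplified; this is not Liquid's code.

```python
import hashlib
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


def fingerprint(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


class PrivateCache:
    """Holds modified blocks and fingerprints only what gets ejected (hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> bytes, least recently used first
        self.ejected = {}             # block id -> fingerprint

    def write(self, block_id, data: bytes):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)         # mark as most recently used
        while len(self.blocks) > self.capacity:
            old_id, old_data = self.blocks.popitem(last=False)   # LRU victim
            self.ejected[old_id] = fingerprint(old_data)         # hashed only now

    def flush(self, workers: int = 4):
        """Assumed step: fingerprint all remaining blocks using several threads."""
        ids = list(self.blocks)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            fps = list(pool.map(fingerprint, (self.blocks[i] for i in ids)))
        self.ejected.update(zip(ids, fps))
        self.blocks.clear()


cache = PrivateCache(capacity=2)
cache.write(0, b"a")
cache.write(1, b"b")
cache.write(0, b"a2")        # rewriting block 0 costs no fingerprint calculation
cache.write(2, b"c")         # cache is full: block 1 is the LRU victim
print(sorted(cache.ejected))  # [1]
cache.flush()
print(sorted(cache.ejected))  # [0, 1, 2]
```

Block 0 is written twice but hashed once, which is the saving that lazy fingerprint calculation aims for.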
FILE SYSTEM LAYOUT
All file system metadata is stored on the meta server.
It is organized in a file system tree.
The client side can cache portions of the file system metadata for fast access.
When a VM is stopped, modified metadata and data blocks are pushed back to the meta server and data servers.
This ensures that modifications to the VM image are visible to other client nodes.
FILE SYSTEM LAYOUT
Fig. Process of look-up by fingerprint.
ADVANTAGES OF LIQUID
Fast virtual machine deployment with peer-to-peer data transfer.
Low storage consumption by means of deduplication.
Instant cloning for virtual machine images.
On-demand fetching through a network, and caching with local disks.
LIQUID files have no specific limit.
CONCLUSION
Presented LIQUID, a deduplication file system with good IO performance.
Good IO performance is achieved by caching frequently accessed data blocks in a memory cache.
Caching avoids additional disk operations.
Deduplication of VM images has proved to be effective.