thesis presentation
![Page 1: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/1.jpg)
Measuring and Benchmarking Personal Clouds
Advisors: Dr. Pedro García López and Dr. Marc Sánchez Artigas
M. Sc. Thesis Presentation
Cristian Cotes González
![Page 2: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/2.jpg)
Contents
1. Introduction
2. Background and Related Work
3. Measuring Personal Cloud Services
4. Benchmarking Personal Clouds Synchronization
5. Conclusions and Future Work
![Page 3: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/3.jpg)
Introduction
![Page 4: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/4.jpg)
Introduction
● 3S Personal Cloud definition:
The Personal Cloud is a unified digital locker for our personal data offering three key services: Storage, Synchronization and Sharing.
● Well-known Personal Clouds: Dropbox, Box, Google Drive...
![Page 5: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/5.jpg)
Introduction
● Little is known about the architecture of commercial Personal Cloud solutions.
● Open source solutions do not meet all the Personal Cloud requirements.
● Solution: We developed StackSync, an open source Personal Cloud.
![Page 6: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/6.jpg)
Motivation
● Very little is known about the Quality of Service (QoS) of Personal Clouds.
● After two years developing StackSync, we wanted to compare it with private and commercial solutions to understand how it performs.
● To make this comparison, we needed to:
○ Simulate user behavior.
○ Use a benchmarking framework for Personal Clouds.
![Page 7: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/7.jpg)
Contributions
● Analysis of the state of the art in Personal Clouds.
● Measurement of Personal Cloud services (QoS).
● Improvement of an existing benchmarking framework.
● Generation of realistic traces to simulate user behavior.
● Benchmarking of Personal Cloud synchronization protocols.
![Page 8: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/8.jpg)
Publications
● Raúl Gracia Tinedo, Marc Sánchez Artigas, Adrián Moreno Martínez, Cristian Cotes and Pedro García López. "Actively Measuring Personal Cloud Storage". In the 6th IEEE International Conference on Cloud Computing. 2013, Santa Clara Marriott, CA, USA.
● Pedro García López, Marc Sánchez Artigas, Sergi Toda and Cristian Cotes. "StackSync: Bringing Elasticity to Dropbox-like File Synchronization". (Submitted to the 15th International Middleware Conference, December 2014, Bordeaux, France.)
![Page 9: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/9.jpg)
Background and Related Work
![Page 10: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/10.jpg)
Open source Personal Clouds
ownCloud
● Uses a pull strategy to synchronize files.
● Uses the WebDAV protocol to discover new changes.
SparkleShare
● Built on top of Git.
● Push notifications.
● Not prepared to process large binary files.
Syncany
● Discovers changes by polling the server.
● Metadata stored in files.
![Page 11: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/11.jpg)
StackSync
None of the current open source solutions fits the Personal Cloud definition well. For this reason, we developed StackSync.
StackSync is an open source Personal Cloud that synchronizes, stores and shares files.
![Page 12: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/12.jpg)
StackSync Architecture
● StackSync can be divided into four main blocks:
○ Clients: Synchronize file data and metadata (file size, filename...).
○ Sync service: Receives and processes client metadata, and notifies clients of new changes.
○ Storage backend: Stores data files.
○ Communication middleware: Used to exchange metadata between clients and the sync service.
![Page 13: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/13.jpg)
Related Work
● Measurements and Benchmarks
○ Performance evaluation of Cloud services is a current hot topic.
○ Few works have measured the performance of Cloud storage services.
● Synchronization Algorithms
○ Little is known about the design and implementation of commercial sync protocols.
○ Recent work by Idilio Drago characterizes Dropbox: "Inside Dropbox: Understanding Personal Cloud Storage Services".
![Page 14: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/14.jpg)
Measuring Personal Cloud Services
![Page 15: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/15.jpg)
Methodology and Platform
● Measure performance of: Dropbox, Box and SugarSync.
● Based on REST API.
● Two different platforms:
○ University laboratories: 30 machines.
○ PlanetLab: 40 nodes divided into two geographic regions (Western Europe and North America).
![Page 16: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/16.jpg)
Workload Model
● Up/Down Workload:
○ Objective: Measure up/down transfer speed.
○ Upload files until the account is full.
○ If the account is full: download and delete all files.
● Service Variability Workload:
○ Objective: Maintain every node with a continuous transfer flow to analyze the variability of the service over time.
○ Each node had two threads:
■ Upload thread: Upload files continuously and delete some files when the account is full.
■ Download thread: Download files continuously.
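The Up/Down workload can be sketched as a simple loop. This is only an illustration: `FakeAccount` is a hypothetical in-memory stand-in, not the REST API of Dropbox, Box or SugarSync, and the quota and file sizes are made up.

```python
class FakeAccount:
    """In-memory stand-in for a Personal Cloud account (hypothetical, not a real API)."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.files = {}  # file name -> size in bytes

    def used(self):
        return sum(self.files.values())

    def upload(self, name, size):
        # Refuse the upload when it would exceed the quota ("account is full").
        if self.used() + size > self.quota_bytes:
            return False
        self.files[name] = size
        return True

    def download(self, name):
        return self.files[name]

    def delete(self, name):
        del self.files[name]


def up_down_workload(account, file_sizes):
    """Upload files until the account is full, then download and delete all files."""
    log = []
    for i, size in enumerate(file_sizes):
        name = f"file-{i}"
        if not account.upload(name, size):
            # Account is full: download and delete everything, then retry once.
            for stored in list(account.files):
                account.download(stored)
                log.append(("download", stored))
                account.delete(stored)
                log.append(("delete", stored))
            if not account.upload(name, size):
                continue  # file is larger than the whole quota; skip it
        log.append(("upload", name))
    return log


account = FakeAccount(quota_bytes=10)
log = up_down_workload(account, [4, 4, 4])
```

In a real measurement, each `upload` and `download` call would be timed to obtain the transfer speed.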
![Page 17: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/17.jpg)
Transfer Speed: Download
● Dropbox and Box present a download speed faster than SugarSync.
● Dropbox exhibits the best download speed.
● SugarSync download transfer speed is constant and low.
● Small range of download bandwidth ([200,1300] KB/sec)
![Page 18: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/18.jpg)
Transfer Speed: Upload
● As in download, Dropbox and Box present an upload speed faster than SugarSync.
● Distributions present irregular shapes.
● Box presents the fastest upload.
● Upload transfer capacity is better than download capacity due to the pricing policies of Cloud providers (inbound traffic is free while outbound traffic is not).
![Page 19: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/19.jpg)
Transfer & Geographic location
● Results obtained during 3 weeks executing the up/down workload in PlanetLab.
● Better QoS in North America than in European countries due to datacenter locations.
![Page 20: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/20.jpg)
Variability over time: SugarSync and Box
● Results obtained from the Service Variability workload.
● Box exhibits a stable service for downloads but upload transfer speed varies significantly.
● SugarSync exhibits a stable service for uploads and downloads.
● Downloads are more reliable and predictable.
![Page 21: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/21.jpg)
Variability over time: Dropbox
● Dropbox exhibits daily upload speed patterns.
● Upload transfer speed at night is between 15% and 35% higher than during daytime hours.
![Page 22: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/22.jpg)
Benchmarking Personal Clouds Synchronization
![Page 23: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/23.jpg)
Traces
● As there were no public traces containing files and the history of modifications, we developed a trace generator.
● File size: We use the distribution presented in the article "Understanding data characteristics and access patterns in a cloud storage system".
● 90% of the files are smaller than 4MB.
● To imitate the real behavior of users, we create three different actions:
○ ADD: File creation.
○ UPDATE: File modification.
○ REMOVE: File removal.
![Page 24: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/24.jpg)
Traces
● To determine the action to be performed, we applied the Markov Model proposed in "Generating realistic datasets for deduplication analysis".
● We use the probabilities from the "Homes" dataset proposed in the same article.
● To modify a file, the tool supports 3 modification types:
○ B: Beginning of the file
○ E: End of the file
○ M: Middle of the file
● Only files smaller than 4MB are modified.
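A minimal sketch of how such a trace generator could look. The transition probabilities and file sizes below are placeholders chosen for illustration; they are not the "Homes" values from the cited article, and `generate_trace` is a hypothetical name, not the thesis tool.

```python
import random

ACTIONS = ("ADD", "UPDATE", "REMOVE")

# Placeholder transition probabilities (NOT the "Homes" dataset values).
TRANSITIONS = {
    "ADD":    {"ADD": 0.80, "UPDATE": 0.05, "REMOVE": 0.15},
    "UPDATE": {"ADD": 0.70, "UPDATE": 0.10, "REMOVE": 0.20},
    "REMOVE": {"ADD": 0.75, "UPDATE": 0.05, "REMOVE": 0.20},
}

MAX_MODIFIABLE = 4 * 1024 * 1024  # only files smaller than 4 MB are modified


def generate_trace(steps, seed=0, sizes=(100_000, 1_000_000, 8_000_000)):
    """Return a list of (action, file name, detail) tuples driven by a Markov chain."""
    rng = random.Random(seed)
    files = {}  # file name -> size in bytes
    trace = []
    state = "ADD"
    next_id = 0
    for _ in range(steps):
        action = state
        small = [name for name, size in files.items() if size < MAX_MODIFIABLE]
        # Fall back to ADD when the drawn action has no valid target.
        if action == "UPDATE" and not small:
            action = "ADD"
        if action == "REMOVE" and not files:
            action = "ADD"

        if action == "ADD":
            name = f"file-{next_id}"
            next_id += 1
            files[name] = rng.choice(sizes)
            trace.append(("ADD", name, files[name]))
        elif action == "UPDATE":
            name = rng.choice(small)
            # B, E or M: modify the beginning, end or middle of the file.
            trace.append(("UPDATE", name, rng.choice("BEM")))
        else:
            name = rng.choice(sorted(files))
            del files[name]
            trace.append(("REMOVE", name, None))

        # Draw the next state from the current state's transition row.
        row = TRANSITIONS[state]
        state = rng.choices(ACTIONS, weights=[row[a] for a in ACTIONS])[0]
    return trace


trace = generate_trace(200, seed=1)
```

The same seed always yields the same trace, which makes it possible to replay an identical workload against each service.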
![Page 25: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/25.jpg)
Traces
● The trace used for these experiments contains:
○ 940 ADDs, which generate a total of 535 MB of data.
○ 72 UPDATEs
○ 228 REMOVEs
● The average file size is 583 KB.
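As a quick consistency check of these figures, assuming 1 MB = 1024 KB (the slides do not state the unit convention):

```python
total_mb = 535   # data generated by the ADD actions
adds = 940       # number of ADD actions

avg_kb = total_mb * 1024 / adds
print(round(avg_kb))  # 583
```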
![Page 26: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/26.jpg)
Benchmarking Framework
● Proposed by Drago et al. in the article "Benchmarking Personal Cloud Storage".
● As the initial tool was too simple, we implemented new functionalities to capture traffic while executing the generated trace.
● The test measures the overhead of the different file syncing protocols.
![Page 27: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/27.jpg)
Protocol Overhead
● In this test we compared the protocol overhead of StackSync with other commercial services.
● StackSync has low overhead compared with Dropbox and Google Drive, the two services with the highest overhead.
● Dropbox exhibits the highest overhead, sending up to 150 MB of additional traffic.
![Page 28: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/28.jpg)
StackSync vs Dropbox
● For a deeper understanding of the overhead, we ran further experiments with Dropbox and StackSync only.
● This test grouped all the actions of the same type to generate 3 separate traces.
● The figure depicts the overhead ratio generated by the storage traffic.
● For ADDs, StackSync transferred a total of 565 MB while Dropbox needed 660 MB.
● For UPDATEs, StackSync is negatively affected by static chunking mechanisms.
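A minimal sketch of why static (fixed-size) chunking penalizes UPDATEs. The 512-byte chunk size, the SHA-1 hashes and the 4-byte edit are assumptions for illustration; the slides do not give StackSync's actual chunking parameters.

```python
import hashlib
import random


def chunk_hashes(data, chunk_size):
    """Split data into fixed-size chunks and hash each one."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def new_chunks(old, new, chunk_size):
    """Count chunks of `new` whose content is not already known from `old`."""
    known = set(chunk_hashes(old, chunk_size))
    return sum(1 for h in chunk_hashes(new, chunk_size) if h not in known)


CHUNK = 512  # hypothetical chunk size
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8 * CHUNK))
edit = b"XXXX"

at_beginning = edit + original                     # modification type B
at_end = original + edit                           # modification type E
half = len(original) // 2
in_middle = original[:half] + edit + original[half:]  # modification type M

for label, modified in (("B", at_beginning), ("E", at_end), ("M", in_middle)):
    print(label, new_chunks(original, modified, CHUNK))
```

Inserting bytes at the beginning shifts every chunk boundary, so every chunk must be re-uploaded, while an edit at the end only adds one new chunk. Content-defined chunking or delta encoding would avoid most of this re-upload.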
![Page 29: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/29.jpg)
StackSync vs Dropbox
● The figure depicts the amount of control traffic generated, in MB.
● Dropbox produces a huge amount of control traffic when adding new files: 25 MB.
● StackSync only needs 3.2 MB to add all the files.
● For UPDATE and REMOVE actions, Dropbox also exhibits higher amounts of traffic than StackSync.
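The ADD figures reported for storage and control traffic can be combined into a rough overhead ratio. This sketch assumes the ADD-only trace carries the same 535 MB of payload as the ADDs of the full trace, which the slides do not state explicitly.

```python
PAYLOAD_MB = 535.0  # assumption: payload of the ADD-only trace

STORAGE_MB = {"StackSync": 565.0, "Dropbox": 660.0}  # storage traffic for ADDs
CONTROL_MB = {"StackSync": 3.2, "Dropbox": 25.0}     # control traffic for ADDs


def overhead_ratio(service):
    """Total traffic (storage + control) divided by the payload size."""
    return (STORAGE_MB[service] + CONTROL_MB[service]) / PAYLOAD_MB


for service in STORAGE_MB:
    print(f"{service}: {overhead_ratio(service):.2f}x the payload")
```

Under that assumption, StackSync moves about 1.06 times the payload and Dropbox about 1.28 times.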
![Page 30: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/30.jpg)
StackSync vs ownCloud
● Unlike StackSync, ownCloud uses a pull-based synchronization protocol.
● In this test, we used 2 PCs:
○ Uploader: Executes the trace.
○ Downloader: Synchronizes files uploaded by the uploader.
● StackSync (Push):
○ Uploader: 20 KB/min
○ Downloader: 10 KB/min
● ownCloud (Pull):
○ Uploader: 600-800 KB/min
○ Downloader: 100-300 KB/min
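A rough model of why a pull-based protocol generates more control traffic than a push-based one. All byte costs and intervals below are hypothetical parameters, not measured values from ownCloud or StackSync.

```python
def pull_control_bytes(duration_s, poll_interval_s, poll_bytes, changes, change_bytes):
    """Pull: the client polls at a fixed interval, whether or not anything changed."""
    polls = duration_s // poll_interval_s
    return polls * poll_bytes + changes * change_bytes


def push_control_bytes(changes, notification_bytes):
    """Push: the server only sends a notification when a change happens."""
    return changes * notification_bytes


# One minute with two file changes (hypothetical costs in bytes).
pull = pull_control_bytes(duration_s=60, poll_interval_s=5, poll_bytes=1000,
                          changes=2, change_bytes=500)
push = push_control_bytes(changes=2, notification_bytes=500)
```

In this model the pull cost grows with elapsed time even when nothing changes, while the push cost grows only with the number of changes.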
![Page 31: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/31.jpg)
StackSync: Synchronization time
● We analyzed StackSync's synchronization time in depth.
● ADD and REMOVE actions follow a normal distribution.
● UPDATE actions have a median synchronization time of 2.75 seconds, but times are often higher due to static chunking.
● Files > 2 MB: Sync time increases linearly.
● Files < 2 MB: Sync time is constant due to processing time of the synchronization server.
![Page 32: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/32.jpg)
Conclusions and Future Work
![Page 33: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/33.jpg)
Conclusions
In this Thesis, we have examined central aspects of Personal Cloud storage services to characterize their performance in two different ways:
○ Data transfers
○ Synchronization protocols
Data transfers
● Transfer performance of commercial Personal Clouds varies from one provider to another.
● The variability of transfers depends on:
○ Traffic type: Upload or download.
○ Hour of the day.
![Page 34: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/34.jpg)
Conclusions
Synchronization Protocols
● Personal Clouds generate overhead depending on their synchronization features and mechanisms (chunking, delta encoding, pull or push synchronization...)
● StackSync implements an efficient synchronization protocol.
![Page 35: Thesis presentation](https://reader033.vdocument.in/reader033/viewer/2022051411/547d38e2b4af9f70588b45a5/html5/thumbnails/35.jpg)
Future Work
● CPU and RAM monitoring for the benchmarking tool. This will provide information about the computing power needed by the desktop clients to process user actions.
● Generate realistic files. Currently, the benchmark synchronizes random binary files.
● Improvements in the StackSync desktop client. Try to reduce overhead when updating files using advanced synchronization mechanisms.