distributed caching - computer measurement group · distributed caching: gaining speed by...
Distributed Caching: Gaining Speed by Reduplicating Data
Christopher R. Hertel
Samba Geek
Senior Principal Software Engineer, Red Hat
Novemberly, 2012

Introductions
MSPCMG, November 2012
A ruminant mammal (Geekus geekus) with long legs, humped shoulders, and broadly palmated antlers.
Me: Your Friendly Neighborhood CIFS Geek
Samba Team member (since 1998-ish)
jCIFS Project co-founder
CIFS Author (shameless plug)
Network Storage Geek
Incurable Idealist
Etc., etc., ad nauseam

Introductions
The opinions expressed are my own and not necessarily those of my employer, my spouse, kids, pets, so-called friends, or “the Voices”.

Introductions
Members of the Samba Team gather at the 10th annual Samba eXPerience conference in Göttingen, Germany.

Introductions
BranchCache Overview
The Prequel Project
PrequelD
PrequelHC
Client Plans
Tools
Where are we going?
...and what am I doing in this handbasket?

Getting Sidetracked

Sidetracks
Cellphones/Tablets as Cloud Terminals
The Cloud stores Mass Quantities
...but Bandwidth is low
...but you only need a little at a time

Sidetracks
Common Data Transfer Methods:
  Local (USB, Bluetooth)
  Web (HTTP/HTTPS)
  Proprietary (Dropbox, etc.)
  Windows (SMB/CIFS via jCIFS)

Sidetracks
jCIFS
A little project I started many years ago
A Java client for SMB/CIFS

BranchCache Overview
Accessing content over a WAN link
  Minimize content copies over the WAN
  Cache the copy on the local network
  Ensure that the cached copy is still valid
  Retrieve fingerprints from the server

BranchCache Overview
Clients request “fingerprints”
Each fingerprint maps to a chunk of content
Fingerprints are used to find cached content
If content is not found in the local cache, it is retrieved over the WAN
Cache keeps fingerprint-to-content mapping
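The fingerprint lookup described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the PeerDist algorithm: the plain SHA-256 per fixed-size chunk, the `CHUNK_SIZE` value, and the names `fingerprints`, `FingerprintCache`, and `fetch_over_wan` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, an assumption of this sketch


def fingerprints(content, chunk_size=CHUNK_SIZE):
    """Server side: one SHA-256 fingerprint per fixed-size chunk."""
    return [
        hashlib.sha256(content[i:i + chunk_size]).digest()
        for i in range(0, len(content), chunk_size)
    ]


class FingerprintCache:
    """Local cache that keeps the fingerprint-to-content mapping."""

    def __init__(self):
        self._chunks = {}      # fingerprint -> chunk bytes
        self.wan_fetches = 0   # how many chunks had to cross the WAN

    def read(self, prints, fetch_over_wan):
        """Reassemble content, going to the WAN only for missing chunks."""
        parts = []
        for index, fp in enumerate(prints):
            chunk = self._chunks.get(fp)
            if chunk is None:
                chunk = fetch_over_wan(index)   # hypothetical WAN fetch callback
                self.wan_fetches += 1
                self._chunks[fp] = chunk
            parts.append(chunk)
        return b"".join(parts)
```

Because the cache is keyed by fingerprint rather than by file name, two identical chunks cost only one WAN transfer, and a second read of the same content costs none.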

BranchCache Overview
Distributed Cache Mode:
  Each node keeps a cache of content it has downloaded
  Clients broadcast to find content
  The cache is distributed across peers
  Limited to the local LAN
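As a rough in-memory model of Distributed Cache Mode, the sketch below replaces the real LAN broadcast with a loop over peer objects. The names `Peer`, `find_on_lan`, `get_chunk`, and `fetch_over_wan` are hypothetical, and nothing here reflects the actual discovery protocol on the wire.

```python
class Peer:
    """One node on the LAN with its own cache of downloaded chunks."""

    def __init__(self, name):
        self.name = name
        self.cache = {}   # fingerprint -> chunk this node has downloaded

    def answer(self, fingerprint):
        return self.cache.get(fingerprint)


def find_on_lan(fingerprint, asker, lan_peers):
    """Stand-in for a broadcast: ask every other peer on the same LAN."""
    for peer in lan_peers:
        if peer is asker:
            continue
        chunk = peer.answer(fingerprint)
        if chunk is not None:
            return peer.name, chunk
    return None, None


def get_chunk(fingerprint, asker, lan_peers, fetch_over_wan):
    """Try our own cache, then the peers, then the WAN as a last resort."""
    if fingerprint in asker.cache:
        return "self", asker.cache[fingerprint]
    source, chunk = find_on_lan(fingerprint, asker, lan_peers)
    if chunk is None:
        source, chunk = "wan", fetch_over_wan(fingerprint)
    asker.cache[fingerprint] = chunk   # we can now serve it to other peers
    return source, chunk
```

In this model the first peer to ask pays the WAN cost, and every later request on the same LAN is answered by a neighbor.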

BranchCache Overview
Hosted Cache Mode:
  Clients tell cache node that they have cache-able content
  Cache node retrieves cache-able content from the client node
  Other clients always query the cache node for content
  Not LAN-locked
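The Hosted Cache Mode flow (offer, retrieve, query) can be modeled with two small classes. This is a toy sketch under stated assumptions: `HostedCache`, `Client`, `offer`, `serve`, and `fetch_over_wan` are names invented for the example, and method calls stand in for the HTTP exchanges of the real protocols.

```python
class HostedCache:
    """The single cache node that all clients query."""

    def __init__(self):
        self.store = {}   # fingerprint -> chunk

    def offer(self, fingerprints, client):
        """A client announces content; pull whatever we lack from that client."""
        wanted = [fp for fp in fingerprints if fp not in self.store]
        for fp in wanted:
            self.store[fp] = client.serve(fp)
        return wanted

    def query(self, fingerprint):
        return self.store.get(fingerprint)


class Client:
    def __init__(self, hosted_cache):
        self.hosted_cache = hosted_cache
        self.local = {}   # chunks this client downloaded over the WAN

    def serve(self, fingerprint):
        """Called by the cache node to retrieve offered content."""
        return self.local[fingerprint]

    def get(self, fingerprint, fetch_over_wan):
        chunk = self.hosted_cache.query(fingerprint)
        if chunk is not None:
            return "hosted-cache", chunk
        chunk = fetch_over_wan(fingerprint)
        self.local[fingerprint] = chunk
        self.hosted_cache.offer([fingerprint], self)   # tell the cache node
        return "wan", chunk
```

Unlike the distributed mode, clients never ask each other directly, which is why this mode does not depend on everyone sharing one LAN.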

BranchCache Overview
Any questions about BranchCache basics?

Prequel Project
The Prequel Project
An Open Source implementation of the PeerDist protocol
PeerDist is the protocol suite underlying BranchCache

Prequel Project
Prequel Project Goals
PrequelD: Server-side hash generation
  Interface with:
    Samba
    HTTP server (e.g. Apache)
PrequelHC: Hosted Cache
Prequel Client for Linux
Prequel Tools

Prequel Project
Websites:
  http://fedorahosted.org/prequel/
    Source code repository
  http://ubiqx.org/proj/Prequel/
    Project home page
Microsoft Docs:
  [MS-CCROD] Content Caching and Retrieval Protocols Overview
  [MS-PCCRC] Peer Content Caching and Retrieval: Content Identification
  [MS-PCHC] Peer Content Caching and Retrieval: Hosted Cache Protocol Specification

PrequelD: the Prequel Server Daemon

PrequelD
PrequelD is a Userland Dæmon
Make “nice”
Background hash generation
Hashes stored in cache files
Cache files are “shared read”
Speak to Dæmon over a socket
Threaded communication
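To make "speak to the dæmon over a socket" with "threaded communication" concrete, here is a minimal Python sketch of a threaded Unix-domain socket server and a matching client. It is not PrequelD's code (the slides do not show it), and the one-line request and reply format, along with the names `HashRequestHandler`, `ThreadedUnixServer`, `ask`, and `demo`, are purely hypothetical.

```python
import os
import socket
import socketserver
import tempfile
import threading


class HashRequestHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; each connection gets its own thread."""

    def handle(self):
        # Hypothetical request format: a single line holding a source path.
        path = self.rfile.readline().decode().strip()
        reply = self.server.lookup(path)
        self.wfile.write((reply + "\n").encode())


class ThreadedUnixServer(socketserver.ThreadingMixIn,
                         socketserver.UnixStreamServer):
    daemon_threads = True   # do not block process exit on open connections

    def __init__(self, socket_path, lookup):
        self.lookup = lookup   # callable: path -> status string
        super().__init__(socket_path, HashRequestHandler)


def ask(socket_path, path):
    """Client side: send one path, read one reply line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall((path + "\n").encode())
        return s.makefile().readline().strip()


def demo():
    """Start a throwaway server on a temporary socket and query it twice."""
    socket_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    known = {"/data/music/a.ogg": "hashes-ready"}
    server = ThreadedUnixServer(socket_path,
                                lambda p: known.get(p, "not-cached"))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return ask(socket_path, "/data/music/a.ogg"), ask(socket_path, "/data/x")
    finally:
        server.shutdown()
        server.server_close()
```

A local socket keeps the hash generator in its own low-priority process while letting a file server or web server ask for hashes without linking against it.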

PrequelD
Currently “works”
Needs signal handling
  SIGHUP: Reload Config
  SIGTERM: Clean shutdown
Should traverse directories in the background (feature)
Should do stale cache cleanup
Cache File Access
  API defined
  Code should be done soon
Supports only PeerDist v1
  Design allows for PeerDist v2
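The signal handling listed as a to-do item above commonly follows a "set a flag, act in the main loop" pattern. The Python sketch below shows that pattern only; `DaemonState`, `load_config`, and `poll` are hypothetical names, and this is an illustration rather than how PrequelD does or will do it.

```python
import signal


class DaemonState:
    """Tracks pending reload and shutdown requests for a daemon main loop."""

    def __init__(self, load_config):
        self._load_config = load_config   # hypothetical config loader callable
        self.config = load_config()
        self.reload_requested = False
        self.shutdown_requested = False

    def install(self):
        """Register the handlers (POSIX only; call from the main thread)."""
        signal.signal(signal.SIGHUP, self.on_sighup)
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sighup(self, signum, frame):
        self.reload_requested = True      # handlers only set flags

    def on_sigterm(self, signum, frame):
        self.shutdown_requested = True

    def poll(self):
        """Call once per main-loop pass; returns False when it is time to exit."""
        if self.reload_requested:
            self.reload_requested = False
            self.config = self._load_config()
        return not self.shutdown_requested
```

Keeping the handlers trivial means the reload and the clean shutdown both happen at a known point in the loop, never in the middle of writing a cache file.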

PrequelD
Configuration File

socket  /var/run/prequeld.sock;
logfile /var/log/prequel.log;

cachedir /var/prequel_cache {
  hash1 sha256;   # Most common hash type
  hash2 none;
  keyfile /etc/prequel/prequeld.key;
  sourcedir /data/music {
    keyfile /etc/prequel/music.key;
    minblocks 4;
    verbosity 0;
    exclude *.tmp, Queen;
  }
}
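For readers who want to experiment with a file in this shape, a small recursive parser is enough. The sketch below is in Python and infers the grammar from the sample alone (statements end in `;`, sections are wrapped in braces, `#` starts a comment); the real PrequelD parser may differ, and `tokenize` and `parse_block` are hypothetical names.

```python
import re

SAMPLE = """
socket /var/run/prequeld.sock;
logfile /var/log/prequel.log;

cachedir /var/prequel_cache {
  hash1 sha256;   # Most common hash type
  hash2 none;
  keyfile /etc/prequel/prequeld.key;
  sourcedir /data/music {
    keyfile /etc/prequel/music.key;
    minblocks 4;
    verbosity 0;
    exclude *.tmp, Queen;
  }
}
"""


def tokenize(text):
    text = re.sub(r"#[^\n]*", "", text)           # drop comments (simplistic)
    return re.findall(r"[{};]|[^\s{};]+", text)   # punctuation or bare words


def parse_block(tokens, pos=0, nested=False):
    """Return (entries, next_pos); entries keep their order in the file."""
    entries, words = [], []
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == ";":
            if words:
                entries.append((words[0], " ".join(words[1:])))
            words = []
        elif tok == "{":
            if not words:
                raise ValueError("section without a name")
            body, pos = parse_block(tokens, pos, nested=True)
            entries.append((words[0], " ".join(words[1:]), body))
            words = []
        elif tok == "}":
            if not nested:
                raise ValueError("unexpected '}'")
            return entries, pos
        else:
            words.append(tok)
    if nested:
        raise ValueError("missing '}'")
    return entries, pos
```

Calling `parse_block(tokenize(SAMPLE))` yields plain settings as `(name, value)` pairs and sections as `(name, argument, entries)` triples, in file order.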

PrequelD
Configuration File
Scoped configuration
Global section
  Most settings have reasonable defaults
cachedir sections
  Identify the target directory
  Contained settings are section local
  Settings apply to sourcedirs that follow
  Order is important!
sourcedir lines or sections
  Identify directories of source files
Comments start with a '#'
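One way to read "settings apply to sourcedirs that follow" is that a cachedir section is processed top to bottom, and each sourcedir picks up whatever settings are in force at the point where it appears. The Python sketch below models that reading only; the tuple layout, the `effective_settings` name, and the `/data/video` path are assumptions for illustration.

```python
def effective_settings(cachedir_entries):
    """Map each sourcedir to the settings in force where it appears.

    Entries are ("name", "value") for plain settings,
    ("sourcedir", path) for a sourcedir line, and
    ("sourcedir", path, [(name, value), ...]) for a sourcedir section.
    """
    in_force = {}
    resolved = {}
    for entry in cachedir_entries:
        if entry[0] == "sourcedir":
            path = entry[1]
            local = dict(entry[2]) if len(entry) > 2 else {}
            resolved[path] = {**in_force, **local}   # local settings win
        else:
            name, value = entry
            in_force[name] = value   # affects only sourcedirs that follow
    return resolved


cachedir = [
    ("minblocks", "4"),
    ("sourcedir", "/data/music"),
    ("minblocks", "16"),
    ("sourcedir", "/data/video", [("verbosity", "2")]),
]
```

Under this reading, `/data/music` is hashed with `minblocks 4` while `/data/video` gets `minblocks 16`, because the second setting appears after the first sourcedir.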

PrequelD
Design Challenges
The hash cache is not directly connected to the source content
Cache files can get out of sync
May need to be re-hashed
Hashes should be removed on source file write or delete
Cache files can become “orphans”
Would it be better to keep the cache within the file system?
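Because the cache lives apart from the source files, something has to notice when the two drift. A simple heuristic, sketched here in Python, is to record the source file's size and modification time when hashing and compare them later. This is an assumption for illustration (the slides do not say how PrequelD detects staleness), and `snapshot` and `cache_state` are hypothetical names.

```python
import os


def snapshot(source_path):
    """Record what the source looked like when its hashes were generated."""
    st = os.stat(source_path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def cache_state(source_path, recorded):
    """Return 'fresh', 'stale', or 'orphan' for one cache entry."""
    try:
        current = snapshot(source_path)
    except FileNotFoundError:
        return "orphan"    # source was deleted; the cache file has no owner
    return "fresh" if current == recorded else "stale"
```

A check like this can miss a same-size edit that lands within the timestamp granularity, which is one argument for the slide's closing question about keeping the hashes inside the file system itself.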

PrequelHC: the Prequel Hosted Cache Server
(Coming Soon)

PrequelHC
Stand-alone HTTP server
Implements two sub-protocols:
PeerDist Hosted Cache Protocol ([MS-PCHC])
PeerDist Retrieval Protocol ([MS-PCCRR])
Written in Python

PrequelHC
Hosted Cache Protocol
  Used by clients to offer content to the hosted cache server
  Used by servers to fetch content information from clients
  PeerDist v1 requires HTTPS
  PeerDist v2 requires HTTP

PrequelHC
Retrieval Protocol
  Used by hosted cache server to fetch offered content from clients
  Transmitted over HTTP
  Data blocks are encrypted over the wire

PrequelHC
The Future
C libraries for sub-protocols
Apache module? CGI script?
Maintain stand-alone server?

Prequel Client: the Uncharted Territory

Prequel Client
A user-land client would be fairly easy
Simple user management
Applications would need to call it directly
An in-kernel client is more daunting
Could integrate with the “CIFS” client
Could mesh with the file system cache
Available to sync with de-duplication

Prequel Tools: Catch as Catch Can

Prequel Tools
Tools we've slapped together as we build and test our implementation.
PdDump
  PeerDist v1 Content Information Dump
pq_size_calc
  Calculate the Content Information size from the original file size
oSSL_key_dx
  Decrypt a BranchCache key extracted from Windows
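The arithmetic behind a size calculation like pq_size_calc can be sketched as follows. This is not the tool's source. The constants are assumptions based on my reading of the PeerDist v1 Content Information layout in [MS-PCCRC] (32 MiB segments, 64 KiB blocks, 32-byte SHA-256 hashes, an 18-byte header, and per-segment description and block-count fields) and should be checked against the specification before being relied on.

```python
SEGMENT_SIZE = 32 * 1024 * 1024   # assumed v1 segment size
BLOCK_SIZE = 64 * 1024            # assumed v1 block size
HASH_LEN = 32                     # SHA-256 digest length

# Assumed fixed header: Version(2) + dwHashAlgo(4) + dwOffsetInFirstSegment(4)
# + dwReadBytesInLastSegment(4) + cSegments(4)
HEADER_LEN = 2 + 4 + 4 + 4 + 4

# Assumed per-segment description: ullOffsetInContent(8) + cbSegment(4)
# + cbBlockSize(4) + SegmentHashOfData + SegmentSecret
SEGMENT_DESC_LEN = 8 + 4 + 4 + 2 * HASH_LEN

BLOCK_COUNT_LEN = 4               # assumed cBlocks field, one per segment


def ceil_div(a, b):
    return -(-a // b)


def content_info_size(file_size):
    """Estimate the Content Information size, in bytes, for a whole file."""
    segments = ceil_div(file_size, SEGMENT_SIZE)
    # Segments are a whole number of blocks, so blocks can be counted globally.
    blocks = ceil_div(file_size, BLOCK_SIZE)
    return (HEADER_LEN
            + segments * (SEGMENT_DESC_LEN + BLOCK_COUNT_LEN)
            + blocks * HASH_LEN)
```

Under these assumptions a 1 MiB file needs 614 bytes of Content Information and a 100 MiB file needs 51,554 bytes, roughly 0.05% of the content it describes.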

Prequel Tools
Tools we've slapped together as we build and test our implementation.
STiB
  Retrieve Content Information over HTTP (also implements BITS protocol)
pq_cgi
  CGI program generates Content Information on the fly
The End

Prequel Project
Websites:
  http://fedorahosted.org/prequel/
    Source code repository
  http://ubiqx.org/proj/Prequel/
    Project home page
Microsoft Docs:
  [MS-CCROD] Content Caching and Retrieval Protocols Overview
  [MS-PCCRC] Peer Content Caching and Retrieval: Content Identification
  [MS-PCHC] Peer Content Caching and Retrieval: Hosted Cache Protocol Specification