Posted on 29-Mar-2015
Cache Storage For the Next Billion
Students: Anirudh Badam, Sunghwan Ihm
Research Scientist: KyoungSoo Park
Presenter: Vivek Pai
Collaborator: Larry Peterson
The Next Billion
Developing regions are not all alike
  Many people have stable food, clean water, reasonable power
  Connectivity, however, is bad
Growing middle class with desire for education & technology
  These people are the next billion
Bad Networking & Options
Africa often backhauled through Europe
Satellite latency not fun
Ghana: 2Mbps, $6000/month!
Emerging option: disk
  1TB disk now $200
  Even latency better than satellite
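To put the Ghana figures in perspective, here is a back-of-the-envelope sketch of how long filling a 1TB disk over the 2Mbps link would take, and what that link time would cost. It assumes decimal units (1TB = 10^12 bytes, 2Mbps = 2×10^6 bits/s), a fully utilized link, and a 30-day month; none of those assumptions appear on the slide.

```python
# Back-of-the-envelope: filling a 1TB disk over the 2Mbps link quoted for Ghana.
# Assumptions (not from the slides): decimal units, 100% link utilization,
# and a 30-day billing month.
DISK_BYTES = 10**12            # 1TB
LINK_BITS_PER_SEC = 2 * 10**6  # 2Mbps
MONTHLY_COST_USD = 6000
DISK_COST_USD = 200


def transfer_days(num_bytes: int, bits_per_sec: int) -> float:
    """Days needed to move num_bytes over a link of bits_per_sec."""
    return num_bytes * 8 / bits_per_sec / 86_400


days = transfer_days(DISK_BYTES, LINK_BITS_PER_SEC)
link_cost = days / 30 * MONTHLY_COST_USD

print(f"{days:.1f} days, ~${link_cost:,.0f} of link time vs ${DISK_COST_USD} for the disk")
# 46.3 days, ~$9,259 of link time vs $200 for the disk
```

Even under these generous assumptions, shipping a preloaded disk wins on both time and money.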
Enter the Tiny Laptops
Problem – memory in 256MB range
Making Storage Work
Populate disk with content
  Preloaded HTTP cache
  Preloaded WAN accelerator cache
  Preloaded Web sites – Wikipedia, etc.
Ship disk to schools
Update as needed
  Pull: update caches on demand during peak
  Push: updates off peak, overnight
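The pull/push split can be sketched as a tiny scheduler: during peak hours the cache fetches only what a user is actually waiting for, and every other update is queued for an overnight push. The 08:00 to 18:00 peak window, the `UpdateScheduler` class, and its methods are hypothetical; the slides only distinguish "peak" from "off peak, overnight".

```python
from collections import deque
from typing import Deque, List

# Hypothetical peak window; the slides only distinguish "peak" from
# "off peak, overnight".
PEAK_START_HOUR = 8
PEAK_END_HOUR = 18


def is_peak(hour: int) -> bool:
    return PEAK_START_HOUR <= hour < PEAK_END_HOUR


class UpdateScheduler:
    """Fetches urgent updates immediately and defers the rest to off-peak."""

    def __init__(self) -> None:
        self.deferred: Deque[str] = deque()

    def request_update(self, url: str, hour: int, user_waiting: bool) -> str:
        # Pull: a user is waiting, or the link is quiet anyway.
        if user_waiting or not is_peak(hour):
            return "fetch-now"
        # Otherwise keep peak bandwidth free and push this update later.
        self.deferred.append(url)
        return "deferred"

    def run_overnight_push(self, hour: int) -> List[str]:
        """Drain the deferred queue, but only outside the peak window."""
        if is_peak(hour):
            return []
        pushed = list(self.deferred)
        self.deferred.clear()
        return pushed


scheduler = UpdateScheduler()
now = scheduler.request_update("http://example.org/a", hour=10, user_waiting=True)
later = scheduler.request_update("http://example.org/b", hour=10, user_waiting=False)
pushed = scheduler.run_overnight_push(hour=2)
print(now, later, pushed)  # fetch-now deferred ['http://example.org/b']
```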
Deployment Scenarios
Special servers per school
  2 for redundancy
Average school size: 100 students
  @ $100/laptop, $10K/school
Problems
  2 servers @ $5K doubles per-school cost
  Servers don’t ride laptop commodity curves
Solution: no servers, just laptops
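The "doubles per-school cost" claim follows directly from the slide's numbers; this short sketch just makes the arithmetic explicit, using only the figures quoted above.

```python
# Per-school cost, using only the figures on the slide.
STUDENTS_PER_SCHOOL = 100
LAPTOP_COST_USD = 100
SERVERS_PER_SCHOOL = 2  # two servers for redundancy
SERVER_COST_USD = 5_000

laptops = STUDENTS_PER_SCHOOL * LAPTOP_COST_USD  # $10,000
servers = SERVERS_PER_SCHOOL * SERVER_COST_USD   # $10,000
total = laptops + servers

print(f"laptops ${laptops:,}, servers ${servers:,}, total ${total:,} "
      f"({total / laptops:.1f}x the laptop-only cost)")
# laptops $10,000, servers $10,000, total $20,000 (2.0x the laptop-only cost)
```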
Goal: 1 TB Cache Store on a 256MB Laptop
Why caching?
  Improves Web access
  Improves WAN access
Problem
  Large disks are really slow
  Disk storage requires index
  In-memory indices optimize disk access
Memory Index Sizing
Squid: popular HTTP cache
  72 bytes/object
  Web objects average 8KB each
  1TB = 125M objects
  125M objects = 9GB RAM just for index
Commercial caches: better RAM usage
  32 bytes/object
  1TB disk = 4GB RAM
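The sizing above is one division and one multiplication per cache; this sketch reproduces it in decimal units, as the slide does. The helper name `index_ram_bytes` is illustrative only.

```python
# Reproduces the slide's index-sizing arithmetic in decimal units.
DISK_BYTES = 10**12       # 1TB
AVG_OBJECT_BYTES = 8_000  # "Web objects average 8KB each"


def index_ram_bytes(bytes_per_entry: int,
                    disk_bytes: int = DISK_BYTES,
                    avg_object_bytes: int = AVG_OBJECT_BYTES) -> int:
    """RAM for an in-memory index holding one entry per cached object."""
    num_objects = disk_bytes // avg_object_bytes  # 125M objects for 1TB
    return num_objects * bytes_per_entry


squid = index_ram_bytes(72)       # Squid: 72 bytes/object
commercial = index_ram_bytes(32)  # commercial caches: 32 bytes/object
print(f"Squid: {squid / 1e9:.0f}GB, commercial: {commercial / 1e9:.0f}GB")
# Squid: 9GB, commercial: 4GB
```

Both totals are far beyond the 256MB of RAM on the laptops discussed earlier.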
Revisiting Cache Indexing
Seek reduction important
  Most objects small
  Access largely random
High insert rate
  Assume hit rate is 50%
  Assume cacheable rate is 50%
  Insert rate = 50% misses × 50% cacheable = 25% of request rate
High delete rate
  Caches largely full
  If insert rate = 25%, delete rate = 25% (every insert into a full cache forces an eviction)
  Deletion using LRU, etc.
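The insert and delete rates follow from one multiplication and one observation: a request becomes an insert only if it misses and is cacheable, and a full cache must evict once per insert. The sketch below checks both with an LRU cache built on `collections.OrderedDict`; the `LRUCache` class and its counters are illustrative, and capacity is counted in objects rather than bytes to keep it short.

```python
from collections import OrderedDict
from typing import Optional

HIT_RATE = 0.5        # assumed on the slide
CACHEABLE_RATE = 0.5  # assumed on the slide

# A request turns into an insert only if it misses *and* is cacheable.
insert_rate = (1 - HIT_RATE) * CACHEABLE_RATE  # fraction of requests


class LRUCache:
    """LRU cache bounded by object count, counting inserts and evictions."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.inserts = 0
        self.evictions = 0

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self.items:
            self.items[key] = value
            self.items.move_to_end(key)
            return
        self.items[key] = value
        self.inserts += 1
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
            self.evictions += 1


# Once the cache is full, every new insert forces exactly one eviction.
cache = LRUCache(capacity=3)
for i in range(10):
    cache.put(f"obj{i}", b"...")
print(insert_rate, cache.inserts, cache.evictions)  # 0.25 10 7
```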
Restarting the Design
Eliminate in-memory index
  Treat disk like memory
  Optimize data structures for locality
  Use location-sensitive algorithms
  Measure performance
Now consider what to add
  For each addition, measure performance
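One way to "treat disk like memory" with no in-memory index at all is to hash each URL straight to a fixed-size slot in a preallocated file, so a lookup needs a single seek. The sketch below is hypothetical: `DiskHashTable`, the 4KB slot size, the SHA-1 slot choice, and the digest-plus-length header are illustrative assumptions, not HashCache's actual on-disk format. A colliding insert simply overwrites the previous occupant, which doubles as eviction.

```python
import hashlib
import struct
import tempfile
from typing import BinaryIO, Optional, Tuple

SLOT_SIZE = 4096                 # hypothetical fixed slot size
HEADER = struct.Struct("<20sI")  # SHA-1 of the URL + payload length
MAX_PAYLOAD = SLOT_SIZE - HEADER.size


class DiskHashTable:
    """Fixed-size on-disk hash table: no per-object state is kept in RAM."""

    def __init__(self, f: BinaryIO, num_slots: int) -> None:
        self.f = f
        self.num_slots = num_slots
        f.truncate(num_slots * SLOT_SIZE)  # preallocate; new space reads as zeros

    def _locate(self, url: str) -> Tuple[bytes, int]:
        """Hash the URL to (digest, byte offset of its slot)."""
        digest = hashlib.sha1(url.encode()).digest()
        slot = int.from_bytes(digest[:8], "little") % self.num_slots
        return digest, slot * SLOT_SIZE

    def put(self, url: str, data: bytes) -> bool:
        if len(data) > MAX_PAYLOAD:
            return False  # objects larger than one slot are out of scope here
        digest, offset = self._locate(url)
        self.f.seek(offset)
        # Overwrite the slot: a colliding URL silently evicts the old object.
        self.f.write(HEADER.pack(digest, len(data)) + data)
        return True

    def get(self, url: str) -> Optional[bytes]:
        digest, offset = self._locate(url)
        self.f.seek(offset)  # one seek per lookup; header and body are contiguous
        stored, length = HEADER.unpack(self.f.read(HEADER.size))
        if stored != digest:
            return None  # empty slot (all zeros) or another URL's object
        return self.f.read(length)


with tempfile.TemporaryFile() as f:
    table = DiskHashTable(f, num_slots=1024)
    table.put("http://example.org/", b"<html>hello</html>")
    hit = table.get("http://example.org/")
    miss = table.get("http://example.org/missing")
    too_big = table.put("http://example.org/big", b"x" * SLOT_SIZE)
print(hit, miss, too_big)  # b'<html>hello</html>' None False
```

The price of zero index memory is that every lookup, including a miss, touches the disk, which is why the slide says to measure each addition.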
What This Yields
HashCache family
  One basic storage engine
  Pluggable algorithms & indexing
HashCache proxy
  Web proxy using HashCache engine
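"Pluggable algorithms & indexing" can be sketched as proxy logic written against a small interface, with interchangeable index policies behind it. Below, a hypothetical set-associative policy maps each URL to a set of N slots and keeps only a small tag per slot in RAM, so most misses cost no disk read and a hit costs one. The `CacheIndex` protocol, the 8-way sets, the 8-bit tags, and round-robin replacement are assumptions made for illustration, not details taken from HashCache; a dict stands in for the disk.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Protocol, Tuple

WAYS = 8       # hypothetical associativity: each URL may live in one of 8 slots
TAG_BITS = 8   # hypothetical in-memory tag size per slot


class CacheIndex(Protocol):
    """Interface the proxy codes against; index policies plug in behind it."""

    def get(self, url: str) -> Optional[bytes]: ...
    def put(self, url: str, data: bytes) -> None: ...


class SetAssociativeIndex:
    """N-way set-associative placement with a small in-memory tag per slot."""

    def __init__(self, num_sets: int) -> None:
        self.num_sets = num_sets
        self.tags: List[List[int]] = [[0] * WAYS for _ in range(num_sets)]  # 0 = empty
        self.disk: Dict[Tuple[int, int], Tuple[str, bytes]] = {}  # stand-in for disk
        self.next_victim = [0] * num_sets  # round-robin replacement within a set
        self.disk_reads = 0

    def _set_and_tag(self, url: str) -> Tuple[int, int]:
        h = int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "little")
        tag = (h >> 32) % ((1 << TAG_BITS) - 1) + 1  # in 1..255, never 0
        return h % self.num_sets, tag

    def _find(self, s: int, tag: int, url: str) -> Optional[int]:
        for way, t in enumerate(self.tags[s]):
            if t != tag:
                continue  # tag mismatch: ruled out without touching disk
            self.disk_reads += 1  # tags can collide, so confirm the full URL on disk
            if self.disk[(s, way)][0] == url:
                return way
        return None

    def get(self, url: str) -> Optional[bytes]:
        s, tag = self._set_and_tag(url)
        way = self._find(s, tag, url)
        return None if way is None else self.disk[(s, way)][1]

    def put(self, url: str, data: bytes) -> None:
        s, tag = self._set_and_tag(url)
        way = self._find(s, tag, url)
        if way is None:  # new object: replace the next victim in this set
            way = self.next_victim[s]
            self.next_victim[s] = (way + 1) % WAYS
        self.tags[s][way] = tag
        self.disk[(s, way)] = (url, data)


def serve(index: CacheIndex, url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Proxy logic written against the interface, not a specific index."""
    data = index.get(url)
    if data is None:
        data = fetch(url)
        index.put(url, data)
    return data


idx = SetAssociativeIndex(num_sets=64)
first = serve(idx, "http://example.org/", lambda u: b"fresh copy")
second = serve(idx, "http://example.org/", lambda u: b"not fetched again")
print(first, second, idx.disk_reads)  # b'fresh copy' b'fresh copy' 1
```

Here the RAM cost is `TAG_BITS` bits per stored object; a different policy could trade more bits for fewer disk reads without changing `serve`.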
Performance Comparison
Index Bits Per Object
[Figure: bar chart of index bits per object]
Index Bits Per Object
[Figure: the same bar chart with additional bars]
HashCache Memory
Storage Limits w/2GB Index
Beyond Diminishing Returns
HTTP cacheability has an upper limit
  Beyond that, revalidating items helps
  Revalidation on demand, or in the background
  Uncached content still cacheable
Wide-area accelerators
  Must still contact servers, though
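Revalidation relies on standard HTTP conditional requests: the cache resends the validators it stored (`ETag` as `If-None-Match`, `Last-Modified` as `If-Modified-Since`), and a `304 Not Modified` reply lets it keep the stored body without downloading it again. The sketch covers only that decision logic; `CachedEntry`, `conditional_headers`, and `apply_response` are hypothetical helpers, and the network round trip is omitted. The same functions could be called on demand, when a user requests the object, or from a background sweep.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CachedEntry:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    validated_at: float = field(default_factory=time.time)


def conditional_headers(entry: CachedEntry) -> Dict[str, str]:
    """Headers asking the origin whether the cached copy is still current."""
    headers: Dict[str, str] = {}
    if entry.etag is not None:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified is not None:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def apply_response(entry: CachedEntry, status: int,
                   headers: Dict[str, str], body: bytes) -> CachedEntry:
    """Fold the origin's reply to a conditional GET into the cache entry."""
    if status == 304:
        # Not Modified: keep the stored body, refresh validation metadata only.
        entry.validated_at = time.time()
        entry.etag = headers.get("ETag", entry.etag)
        return entry
    if status == 200:
        # Changed: replace the stored copy with the new representation.
        return CachedEntry(body=body, etag=headers.get("ETag"),
                           last_modified=headers.get("Last-Modified"))
    return entry  # simplification: keep serving the old copy on other statuses


entry = CachedEntry(body=b"old page", etag='"v1"',
                    last_modified="Wed, 21 Oct 2015 07:28:00 GMT")
request_headers = conditional_headers(entry)
entry = apply_response(entry, 304, {}, b"")
kept = entry.body
entry = apply_response(entry, 200, {"ETag": '"v2"'}, b"new page")
print(request_headers, kept, entry.body, entry.etag)
```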
Why WAN Acceleration?
Lots of slowly-changing data
  Wikipedia
  News sites
  “Customized” sites
WAN acceleration middleboxes
  Custom protocol between boxes
  Standard protocols to rest of net
  Less desirable than caches for Web
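Between the two middleboxes, a common form of the "custom protocol" replaces chunks the far side already holds with short references, while the end hosts keep speaking standard protocols. The sketch assumes fixed-size chunks, SHA-1 chunk names, a list of `("ref", digest)` / `("raw", bytes)` tokens as the wire format, and chunk stores that stay in sync; all of these are illustrative assumptions rather than a description of any particular product.

```python
import hashlib
from typing import Dict, List, Set, Tuple

CHUNK_SIZE = 1024          # hypothetical fixed chunk size
Token = Tuple[str, bytes]  # ("ref", SHA-1 digest) or ("raw", chunk bytes)


def encode(stream: bytes, peer_has: Set[bytes]) -> List[Token]:
    """Sender box: replace chunks the receiver already holds with references."""
    tokens: List[Token] = []
    for i in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[i:i + CHUNK_SIZE]
        name = hashlib.sha1(chunk).digest()
        if name in peer_has:
            tokens.append(("ref", name))  # 20 bytes instead of the chunk
        else:
            tokens.append(("raw", chunk))
            peer_has.add(name)            # the receiver will store it
    return tokens


def decode(tokens: List[Token], store: Dict[bytes, bytes]) -> bytes:
    """Receiver box: rebuild the original stream for the end host."""
    out = bytearray()
    for kind, payload in tokens:
        if kind == "ref":
            out += store[payload]
        else:
            store[hashlib.sha1(payload).digest()] = payload
            out += payload
    return bytes(out)


sender_view: Set[bytes] = set()
receiver_store: Dict[bytes, bytes] = {}
page = b"A" * 3000 + b"B" * 1000

first = encode(page, sender_view)
second = encode(page, sender_view)  # the same page again, e.g. a repeat visit
out_first = decode(first, receiver_store)
out_second = decode(second, receiver_store)
raw_first = sum(len(p) for k, p in first if k == "raw")
raw_second = sum(len(p) for k, p in second if k == "raw")
print(raw_first, raw_second)  # 2976 0
```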
WAN Acceleration Dilemma
WAN accelerators use chunks
  Transit stream broken into chunks
Small chunks = high compression
  Also lots of small objects
Large chunks = high performance
  But worse for compression
Memory & disk important
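Chunk boundaries are often chosen by content rather than by position, so inserting a byte early in a stream does not shift every later boundary. The sketch uses a simple polynomial rolling hash over a 48-byte window, a stand-in for the Rabin fingerprints often used in practice, and cuts wherever the hash's low bits are all zero; `avg_size` (a power of two) sets the expected chunk size and so makes the small-versus-large tradeoff above concrete. The window, base, and modulus are arbitrary illustrative choices.

```python
import random
from typing import List

WINDOW = 48  # bytes covered by the rolling hash (also the minimum chunk size)
BASE = 257
MOD = (1 << 61) - 1


def chunk_boundaries(data: bytes, avg_size: int) -> List[int]:
    """Content-defined chunk end offsets; avg_size must be a power of two."""
    mask = avg_size - 1
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h, start, cuts = 0, 0, []
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + b) % MOD
        # Cut where the window's hash has its low bits all zero.
        if i + 1 - start >= WINDOW and (h & mask) == 0:
            cuts.append(i + 1)
            start = i + 1
    if start < len(data):
        cuts.append(len(data))  # final partial chunk
    return cuts


rng = random.Random(0)
data = rng.getrandbits(8 * 256 * 1024).to_bytes(256 * 1024, "little")

small = chunk_boundaries(data, avg_size=256)
large = chunk_boundaries(data, avg_size=8192)
print(len(small), len(large))  # many small chunks vs. few large ones

# Insert one byte at the front: later boundaries just shift by one.
shifted = chunk_boundaries(b"x" + data, avg_size=256)
kept = len({c + 1 for c in small} & set(shifted)) / len(small)
print(f"{kept:.0%} of boundaries survive the insertion")
```

Smaller `avg_size` finds more redundancy but multiplies the number of chunks to index and look up, which is exactly the memory and disk pressure the slide points to.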
Merging WAN Acc & HashCache
Easily index huge # of chunks
  Small chunks OK
  Large chunks better
Store chunks redundantly
  Optimize for performance & compression
  Communicate tradeoffs to cache layer
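Storing chunks redundantly can be sketched as keeping each large chunk together with the small chunks it is made of: the sender first looks for a large-chunk match, where one lookup covers many bytes, and falls back to small chunks only inside large chunks that changed. The fixed 4096/512-byte sizes, the `MultiResolutionStore` class, and the `(kind, length)` token format are hypothetical simplifications; a real design would more likely use content-defined boundaries at each size.

```python
import hashlib
import random
from typing import List, Set, Tuple

LARGE = 4096  # hypothetical large-chunk size
SMALL = 512   # hypothetical small-chunk size; LARGE must be a multiple of it


def _name(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


class MultiResolutionStore:
    """Keeps names of both large chunks and their small sub-chunks."""

    def __init__(self) -> None:
        self.known: Set[bytes] = set()
        self.lookups = 0

    def add(self, chunk: bytes) -> None:
        # Store redundantly: the large chunk and each of its small pieces.
        self.known.add(_name(chunk))
        for j in range(0, len(chunk), SMALL):
            self.known.add(_name(chunk[j:j + SMALL]))

    def encode(self, stream: bytes) -> List[Tuple[str, int]]:
        """Return (kind, length) tokens: "ref" for known bytes, "raw" otherwise."""
        tokens: List[Tuple[str, int]] = []
        for i in range(0, len(stream), LARGE):
            big = stream[i:i + LARGE]
            self.lookups += 1
            if _name(big) in self.known:
                tokens.append(("ref", len(big)))  # one lookup covers LARGE bytes
                continue
            for j in range(0, len(big), SMALL):   # fall back to small chunks
                small = big[j:j + SMALL]
                self.lookups += 1
                kind = "ref" if _name(small) in self.known else "raw"
                tokens.append((kind, len(small)))
            self.add(big)
        return tokens


rng = random.Random(1)
v1 = rng.getrandbits(8 * 16384).to_bytes(16384, "little")
edited = bytearray(v1)
edited[5000] ^= 0xFF  # one changed byte, e.g. a small edit to an article
v2 = bytes(edited)

store = MultiResolutionStore()
store.encode(v1)
before = store.lookups
tokens = store.encode(v2)
raw = sum(n for kind, n in tokens if kind == "raw")
print(raw, store.lookups - before)  # 512 12
```

Only the one 512-byte piece containing the edit is resent, while the unchanged large chunks still cost a single lookup each.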
Deployments
Two cache instances deployed
  Both in Africa
  Shared machines, multiple services
Working with OLPC on deployment
  Working on licensing
  Hopefully resolved this year
  Goal: all-in-one server for schools
Longer Term Goals
Effort started around server consolidation
  Virtualization nice, except for memory
  Many apps very page-fault sensitive
  Extracting & sharing components desirable
More work in developing regions
  Even within the US: poor, rural, etc.
  Customization for school-like workloads
  More work on peak/off-peak behavior