
Cache Storage For the Next Billion

Students: Anirudh Badam, Sunghwan Ihm

Research Scientist: KyoungSoo Park

Presenter: Vivek Pai

Collaborator: Larry Peterson


The Next Billion

Developing regions are not all alike
  Many people have stable food, clean water, reasonable power
  Connectivity, however, is bad

Growing middle class with desire for education & technology
  These people are the next billion

Bad Networking & Options

Africa often backhauled through Europe
  Satellite latency not fun
  Ghana: 2Mbps, $6000/month!

Emerging option: disk
  1TB disk now $200
  Even latency better than satellite

Enter the Tiny Laptops

Problem – memory in 256MB range

Making Storage Work

Populate disk with content
  Preloaded HTTP cache
  Preloaded WAN accelerator cache
  Preloaded Web sites: Wikipedia, etc.

Ship disk to schools

Update as needed
  Pull update caches on-demand during peak
  Push updates off peak, overnight

Deployment Scenarios

Special servers per school
  2 for redundancy

Average school size: 100 students @ $100/laptop, $10K/school

Problems
  2 servers @ $5K doubles per-school cost
  Servers don’t ride laptop commodity curves

Solution: no servers, just laptops

Goal: 1 TB Cache Store on a 256MB Laptop

Why caching?
  Improves Web access
  Improves WAN access

Problem
  Large disks are really slow
  Disk storage requires index
  In-memory indices optimize disk access

Memory Index Sizing

Squid: popular HTTP cache
  72 bytes/object
  Web objects average 8KB each
  1TB = 125M objects
  125M objects = 9GB RAM just for index

Commercial caches: better RAM usage
  32 bytes/object
  1TB disk = 4GB RAM
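The sizing figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1TB = 10^12 bytes, 8KB = 8000 bytes), which is what makes the slide's round numbers come out exactly. The function names are illustrative and do not come from the talk.

```python
# Decimal units, chosen because they reproduce the slide's round numbers.
KB, GB, TB = 10**3, 10**9, 10**12

AVG_OBJECT_BYTES = 8 * KB  # "Web objects average 8KB each"


def index_ram_bytes(disk_bytes: int, index_bytes_per_object: int) -> int:
    """RAM needed to index a full disk of average-sized objects."""
    objects = disk_bytes // AVG_OBJECT_BYTES
    return objects * index_bytes_per_object


def disk_indexable_bytes(ram_bytes: int, index_bytes_per_object: int) -> int:
    """Inverse view: how much disk a fixed index RAM budget can cover."""
    objects = ram_bytes // index_bytes_per_object
    return objects * AVG_OBJECT_BYTES


objects_per_tb = TB // AVG_OBJECT_BYTES
squid_ram = index_ram_bytes(TB, 72)        # Squid: 72 bytes/object
commercial_ram = index_ram_bytes(TB, 32)   # commercial: 32 bytes/object

print(f"{objects_per_tb:,} objects per 1TB disk")
print(f"Squid index: {squid_ram / GB:.0f}GB, commercial: {commercial_ram / GB:.0f}GB")
print(f"2GB of index at 32 bytes/object covers "
      f"{disk_indexable_bytes(2 * GB, 32) / GB:.0f}GB of disk")
```

Either way, the index alone is more than an order of magnitude larger than the 256MB of memory on the target laptop.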

Revisiting Cache Indexing

Seek reduction important
  Most objects small
  Access largely random

High insert rate
  Assume hit rate is 50%
  Assume cachable rate is 50%
  Insert rate = 25% of request rate

High delete rate
  Caches largely full
  If insert rate = 25%, delete rate = 25%
  Deletion using LRU, etc.
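The 25% figure follows if the cachable rate is read as the fraction of misses that can be stored, which is an interpretation rather than something the slide states. A minimal sketch under that assumption, with a hypothetical request load for illustration:

```python
def insert_rate(hit_rate: float, cachable_rate: float) -> float:
    """Fraction of requests that cause a new object to be written.

    Assumes only misses are candidates for insertion and that
    cachable_rate applies to those misses.
    """
    return (1.0 - hit_rate) * cachable_rate


def steady_state_delete_rate(hit_rate: float, cachable_rate: float) -> float:
    """In a full cache, every insert has to evict something."""
    return insert_rate(hit_rate, cachable_rate)


rate = insert_rate(hit_rate=0.5, cachable_rate=0.5)
requests_per_second = 1000  # hypothetical load, not a figure from the talk
print(f"insert rate: {rate:.0%} of requests")
print(f"at {requests_per_second} req/s: {rate * requests_per_second:.0f} inserts/s "
      f"and as many deletes/s once the cache is full")
```

Half of all requests therefore modify the cache (one quarter inserting, one quarter evicting), so the index has to handle updates cheaply, not just lookups.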

Restarting the Design

Eliminate in-memory index
  Treat disk like memory
  Optimize data structures for locality
  Use location-sensitive algorithms
  Measure performance

Now consider what to add
  For each addition, measure performance
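One way to read "eliminate in-memory index, treat disk like memory" is to use the disk itself as a hash table: hash the URL, seek straight to the slot it names, and verify the stored key there. The slides do not give an on-disk layout, so the following is only a sketch under that assumption. The class name `DiskHashTable`, the 8KB bin size, and the header format are all hypothetical, and an ordinary file stands in for the raw disk.

```python
import hashlib
import os
import struct
import tempfile

BIN_SIZE = 8 * 1024                  # one fixed-size slot per hash value (assumed)
HEADER = struct.Struct("<20sI")      # SHA-1 of the key, then payload length


class DiskHashTable:
    """File-backed hash table that keeps no per-object state in memory."""

    def __init__(self, path: str, num_bins: int):
        self.num_bins = num_bins
        self.f = open(path, "w+b")
        self.f.truncate(num_bins * BIN_SIZE)   # zero-filled, possibly sparse

    def _locate(self, key: bytes):
        digest = hashlib.sha1(key).digest()
        bin_no = int.from_bytes(digest[:8], "big") % self.num_bins
        return digest, bin_no * BIN_SIZE

    def put(self, key: bytes, value: bytes) -> None:
        if len(value) > BIN_SIZE - HEADER.size:
            raise ValueError("object too large for one bin in this sketch")
        digest, offset = self._locate(key)
        self.f.seek(offset)
        # Overwriting whatever lived in this bin doubles as eviction.
        self.f.write(HEADER.pack(digest, len(value)) + value)

    def get(self, key: bytes):
        digest, offset = self._locate(key)
        self.f.seek(offset)
        stored_digest, length = HEADER.unpack(self.f.read(HEADER.size))
        if stored_digest != digest:
            return None                         # empty bin or a different object
        return self.f.read(length)

    def close(self) -> None:
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "cache.bin")
cache = DiskHashTable(path, num_bins=1024)
cache.put(b"http://example.org/a", b"hello")
hit = cache.get(b"http://example.org/a")
miss = cache.get(b"http://example.org/never-stored")
cache.close()
print(hit, miss)
```

In this sketch every lookup costs one seek, including misses, and collisions silently evict. Those are the kinds of costs the "measure, then consider what to add" step is meant to expose.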

What This Yields

HashCache family
  One basic storage engine
  Pluggable algorithms & indexing

HashCache proxy
  Web proxy using HashCache engine

Performance Comparison

Index Bits Per Object


HashCache Memory

Storage Limits w/2GB Index

Beyond Diminishing Returns

HTTP cachability has upper limit
  Beyond that, revalidating items helps
  Revalidation on demand, or in background

Uncached content still cachable
  Wide-area accelerators
  Must still contact servers, though

Why WAN Acceleration?

Lots of slowly-changing data
  Wikipedia
  News sites
  “Customized” sites

WAN acceleration middleboxes
  Custom protocol between boxes
  Standard protocols to rest of net
  Less desirable than caches for Web

WAN Acceleration Dilemma

WAN accelerators use chunks
  Transit stream broken into chunks

Small chunks = high compression
  Also lots of small objects

Large chunks = high performance
  But worse for compression

Memory & disk important
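The chunk-size tradeoff can be seen on a toy stream. The sketch below is an illustration only: it uses fixed-size chunks for simplicity, whereas real WAN accelerators commonly use content-defined chunk boundaries, and the function name `chunk_stats` and the synthetic data are assumptions, not anything from the talk. The stream is a 4KB page sent twice, with a single byte changed in the second copy.

```python
import hashlib


def chunk_stats(data: bytes, chunk_size: int):
    """Return (chunks seen, index entries needed, bytes actually sent).

    A chunk is "sent" only the first time its hash appears; repeats
    would be replaced by a short reference in a real accelerator.
    """
    seen = set()
    total_chunks = 0
    bytes_sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).digest()
        total_chunks += 1
        if digest not in seen:
            seen.add(digest)
            bytes_sent += len(chunk)
    return total_chunks, len(seen), bytes_sent


# Deterministic, non-repeating 4096-byte "page".
base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(128))
edited = bytearray(base)
edited[100] ^= 0xFF                      # one-byte change in the second copy
stream = base + bytes(edited)

small = chunk_stats(stream, 64)
large = chunk_stats(stream, 1024)
print("64B chunks:   chunks=%d index entries=%d bytes sent=%d" % small)
print("1024B chunks: chunks=%d index entries=%d bytes sent=%d" % large)
```

With 64-byte chunks the one-byte edit costs a single extra 64-byte chunk, but the index holds 65 entries. With 1024-byte chunks the index holds only 5 entries, but the edit forces a whole 1024-byte chunk to be resent.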

Merging WAN Acc & HashCache

Easily index huge # chunks
  Small chunks OK
  Large chunks better

Store chunks redundantly
  Optimize for performance & compression
  Communicate tradeoffs to cache layer

Deployments

Two cache instances deployed
  Both in Africa
  Shared machines, multiple services

Working with OLPC on deployment
  Working on licensing
  Hopefully resolved this year
  Goal: all-in-one server for schools

Longer Term Goals

Effort started around server consolidation
  Virtualization nice, except for memory
  Many apps very page-fault sensitive
  Extracting & sharing components desirable

More work in developing regions
  Even within the US: poor, rural, etc.
  Customization for school-like workloads
  More work on peak/off-peak behavior
