Celebrating Diversity in Volunteer Computing
David P. Anderson
Space Sciences Lab
Sept. 1, 2008
Background
Volunteer computing: distributed scientific computing using volunteered resources (desktops, laptops, game consoles, cell phones, etc.)
BOINC: middleware for volunteer (and desktop grid) computing
Diversity of resources
- CPU: type, number, speed
- RAM, disk
- Coprocessors
- OS type and version
- Network: performance, availability, proxies
- System: availability, reliability (crashes, invalid results, cheating)
Diversity of applications
- Resource requirements: CPU, coprocessors, RAM, storage, network
- Completion time constraints
- Numerical properties: same result on all CPUs, a little different, or unboundedly different
IBM World Community Grid
"Umbrella" project sponsored by IBM:
- Rice genome study: Univ. of Washington
- Protein X-ray crystallography: Ontario Cancer Inst.
- African climate study: Univ. of Cape Town
- Dengue fever drug discovery: Univ. of Texas
- Human protein folding: NYU, Univ. of Washington
- HIV drug discovery: Scripps Institute
Started Nov. 2004; 390,000 volunteers; 167,000 years of CPU time total; currently ~170 TeraFLOPS
CPU type
[Chart: number of computers (log scale, 1 to 100,000) by processor type, spanning AMD families (Athlon, Athlon 64/FX/MP/X2/XP, Duron, Geode, K6, K7, Opteron, Phenom), Intel families (Celeron, Core 2/Duo/Quad, Pentium, Pentium 4/D/II/III/M, Xeon), Transmeta, Centaur Hauls, and IBM PowerPC]
# cores
[Chart: number of computers (log scale) by core count: 1, 2, 3, 4, 6, 8, 14, 16, 24, 32, 64]
OS type
[Chart: number of computers (log scale, up to 1,000,000) by operating system type]
RAM
[Chart: number of computers (log scale) by RAM size in MB, from 512 MB up to ~16 GB]
Free disk space
[Chart: number of computers (log scale) by available disk space, 8 to 248 GB]
Availability
[Chart: number of computers by percent availability, 0 to 100%]
Job error rate
[Chart: number of computers (log scale) by percent job error rate, 0 to 100%]
Average turnaround time
[Chart: number of computers by average turnaround time in hours, up to ~520 hours]
Current WCG applications
Job dispatching
[Diagram: ~1,000,000 jobs → scheduler → client]
Goals:
- maximize system throughput
- minimize time to batch completion
- minimize time to grant credit
- scale to >100 requests/sec
BOINC scheduler architecture
[Diagram: job queue (DB) → feeder → job cache (shared memory) → scheduler → client]
Issues:
- What if the cache fills up with unsendable jobs?
- What if a client needs a job that's not in the cache?
Homogeneous replication
Different platforms do FP math differently, which makes result validation difficult.
Divide platforms into equivalence classes; send all instances of a job to a single class.
- "Census" program computes the class distribution
- Scheduler: send committed jobs if possible
[Diagram: job replicas committed to Win/Intel, Win/AMD, etc., or uncommitted]
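The equivalence-class idea can be sketched as follows. The class numbering and the two-attribute key (OS, CPU vendor) are illustrative assumptions, not BOINC's actual homogeneous-redundancy implementation:

```cpp
#include <string>

// Hypothetical sketch of homogeneous redundancy (HR) classes.
// Platforms whose FP results agree bit-for-bit are grouped together;
// all replicas of an HR-committed job go to one class.
// Class 0 means "uncommitted" (job not yet sent to any host).
int hr_class(const std::string& os, const std::string& cpu_vendor) {
    if (os == "Windows" && cpu_vendor == "GenuineIntel") return 1; // Win/Intel
    if (os == "Windows" && cpu_vendor == "AuthenticAMD") return 2; // Win/AMD
    if (os == "Linux"   && cpu_vendor == "GenuineIntel") return 3; // Linux/Intel
    if (os == "Linux"   && cpu_vendor == "AuthenticAMD") return 4; // Linux/AMD
    return 0;  // unknown combination: treat as ineligible for HR jobs
}

// A host may receive a replica of a job iff the job is uncommitted
// or committed to the host's own class.
bool can_send(int job_class, int host_class) {
    return job_class == 0 || job_class == host_class;
}
```

Once the first replica is dispatched, the job becomes committed to that host's class, so all later replicas validate against bit-identical arithmetic.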
Retry acceleration
Retries are needed when:
- a job times out
- an error (crash) is returned
- results fail to validate
Send retries to hosts that are fast (low turnaround) and reliable, and shorten the latency bound of retries.
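A minimal sketch of this retry-targeting rule; the specific thresholds and the halved deadline are illustrative assumptions, not BOINC's actual values:

```cpp
// Retry acceleration sketch: a retry goes only to hosts with low average
// turnaround and low error rate, and its deadline is tightened relative
// to the original job's deadline.

struct HostStats {
    double avg_turnaround_hours;  // measured per host by the server
    double error_rate;            // fraction of jobs errored or invalid
};

// Hypothetical eligibility test for receiving a retry.
bool eligible_for_retry(const HostStats& h) {
    return h.avg_turnaround_hours < 24.0 && h.error_rate < 0.05;
}

// Hypothetical shortened latency bound for a retry.
double retry_deadline(double original_deadline_hours) {
    return original_deadline_hours * 0.5;
}
```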
Volunteer app selection
Volunteers can select apps, and can opt to accept jobs from non-selected apps as well.
Fast feasibility checks (no DB)
Client sends:
- hardware spec
- availability info
- list of jobs queued and in progress
Checks:
- resource checks
- completion time check: EDF simulation (any deadlines missed?)
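The completion-time check can be sketched as a simple earliest-deadline-first simulation. This is a single-CPU simplification under assumed inputs, not the actual BOINC scheduler code:

```cpp
#include <algorithm>
#include <vector>

// EDF feasibility sketch: simulate the client's queue in
// earliest-deadline-first order and test whether adding a candidate
// job would cause any deadline miss.

struct Job {
    double est_runtime_hours;  // estimated CPU time
    double deadline_hours;     // deadline, relative to now
};

// Returns true if all jobs (queued + candidate) meet their deadlines
// under EDF, given the host's fractional CPU availability (0..1].
bool edf_feasible(std::vector<Job> jobs, const Job& candidate,
                  double cpu_availability) {
    jobs.push_back(candidate);
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) {
                  return a.deadline_hours < b.deadline_hours;
              });
    double clock = 0;
    for (const Job& j : jobs) {
        clock += j.est_runtime_hours / cpu_availability;  // wall time used
        if (clock > j.deadline_hours) return false;       // deadline missed
    }
    return true;
}
```

Because this check needs only data the client sent in its request, it can run without touching the database, which is what makes it a "fast" feasibility check.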
Slow feasibility checks (DB)
- Is the job still needed?
- Has another replica already been sent to this volunteer?
Platform mechanism
Jobs are associated with apps, not app versions.
[Diagram: job → application → app versions (Win/x86, Win/x64, Linux/x86)]
The request message lists the client's platforms, e.g.:
  platform 0: Win64
  platform 1: Win32
Host punishment
The problem: hosts that error out all their jobs.
- Maintain M(h): max jobs per day for host h
- On each error, decrement M(h)
- On each valid job, double M(h)
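The quota policy above can be sketched in a few lines. The lower and upper bounds are illustrative assumptions; the slide specifies only "decrement on error, double on valid result":

```cpp
#include <algorithm>

// Host punishment sketch: per-host daily job quota M(h).
struct Host {
    int max_jobs_per_day;  // M(h)
};

const int QUOTA_MIN = 1;    // never drop to zero, so hosts can recover
const int QUOTA_MAX = 100;  // assumed cap on the daily quota

// Each error shrinks the quota linearly...
void on_error(Host& h) {
    h.max_jobs_per_day = std::max(QUOTA_MIN, h.max_jobs_per_day - 1);
}

// ...while each valid result restores it exponentially.
void on_valid_result(Host& h) {
    h.max_jobs_per_day = std::min(QUOTA_MAX, h.max_jobs_per_day * 2);
}
```

The asymmetry (slow decay, fast recovery) limits the damage a broken host can do while letting a repaired host return to full throughput quickly.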
Anonymous platform mechanism
Rather than downloading apps from the server, the client uses preexisting local apps.
Scheduler: if a client has its own apps, send it only jobs for those apps.
Usage scenarios:
- computers with unsupported platforms
- people who optimize apps
- security-conscious people who want to inspect the source code
Old scheduling policy
Job cache scan:
- start from a random point
- do fast feasibility checks
- lock the job, then do slow feasibility checks
Multiple scans:
- send jobs committed to an HR class
- if the host is fast, send retries
- send work for selected apps
- if allowed, send work for non-selected apps
Problems:
- rigid policy
- assumes app == 1 CPU
Coprocessor and multi-thread apps
How do we select the best version for a given host? How do we estimate its performance on that host?
[Diagram: one application with Win/x86 app versions: single-threaded, multi-threaded, CUDA]
Multithread/coprocessor (cont.)
How to decide which app version to use?
- App versions have a "plan class" string.
- The scheduler has a project-supplied function:
  bool app_plan(SCHEDULER_REQUEST& sreq, char* plan_class, HOST_USAGE&);
- It returns: whether the host can run the app, coprocessor usage, CPU usage (possibly fractional), expected FLOPS, and the command line to pass to the app.
- It embodies knowledge about sublinear speedup, etc.
- Scheduler: call app_plan() for each version and use the one with the highest expected FLOPS.
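The selection step can be sketched as follows; the `Usage` struct is a simplified stand-in for BOINC's HOST_USAGE, not its real definition:

```cpp
#include <vector>

// Version-selection sketch: evaluate each app version's plan result
// and keep the one with the highest expected FLOPS.

struct Usage {            // simplified stand-in for HOST_USAGE
    bool can_run;         // can this host run the version at all?
    double ncpus;         // CPU usage, possibly fractional
    double ncoprocs;      // coprocessor usage
    double expected_flops;
};

// Pick the usable version with the highest expected FLOPS;
// returns -1 if the host can run none of them.
int best_version(const std::vector<Usage>& versions) {
    int best = -1;
    for (int i = 0; i < (int)versions.size(); i++) {
        if (!versions[i].can_run) continue;
        if (best < 0 ||
            versions[i].expected_flops > versions[best].expected_flops) {
            best = i;
        }
    }
    return best;
}
```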
Multithread/coprocessor (cont.)
Client coprocessor handling (currently just CUDA):
- hardware check/report
- scheduling (coprocessors are not timesliced)
CPU scheduling: run enough apps to use at least N cores
Score-based scheduling
[Diagram: take N random feasible jobs from the cache, rank them by score, send the M highest-scoring jobs]
Terms in the score function
Bonus if:
- the host is fast and the job is a retry
- the job is committed to the host's HR class
- the app was selected by the volunteer
Job size matching
Goal: send large jobs to fast hosts, small jobs to slow hosts
- reduces credit-granting delay
- reduces server occupancy time
The census program maintains host speed statistics; the feeder maintains job size statistics.
Score penalty: |job size - host speed|^2
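The score terms and the size-matching penalty can be combined into a single score function. The weights below are illustrative, and the sketch assumes job size and host speed have been normalized (e.g. from the census/feeder statistics) so the squared difference is comparable across jobs:

```cpp
// Score-function sketch combining the bonus terms with the
// |job - host|^2 size-matching penalty. Weights are assumptions.

struct ScoreInput {
    bool is_retry;
    bool host_is_fast;
    bool committed_to_host_hr_class;
    bool app_selected;
    double job_size;    // normalized job size
    double host_speed;  // normalized host speed
};

double job_score(const ScoreInput& s) {
    double score = 0;
    if (s.is_retry && s.host_is_fast) score += 3;  // accelerate retries
    if (s.committed_to_host_hr_class) score += 2;  // finish committed jobs
    if (s.app_selected) score += 1;                // respect volunteer choice
    double d = s.job_size - s.host_speed;
    score -= d * d;  // size-matching penalty: |job - host|^2
    return score;
}
```

The scheduler would then send the M highest-scoring feasible jobs, as in the diagram above.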
Adaptive replication
Goal: achieve a target level of reliability while reducing average replication to 1+ε
Idea: replicate less (but always some) as a host becomes more trusted
Policy: maintain an "invalid rate" E(h) per host:
- if E(h) > X, replicate (e.g., 2-fold)
- else replicate with probability E(h)/X
Is there a counterstrategy?
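The replication decision itself is a one-liner. In this sketch the random draw is passed in as a parameter (a uniform value in [0, 1)) to keep it deterministic and testable:

```cpp
// Adaptive replication sketch. E(h) is the host's recent invalid-result
// rate; X is a project-chosen threshold; rand01 is a uniform random
// value in [0, 1) supplied by the caller.
int replicas_needed(double invalid_rate_E, double X, double rand01) {
    if (invalid_rate_E > X) return 2;           // untrusted: always replicate
    if (rand01 < invalid_rate_E / X) return 2;  // trusted: spot-check sometimes
    return 1;                                   // trusted: accept single result
}
```

Because even a fully trusted host is still spot-checked with probability E(h)/X, a host cannot earn trust and then cheat with impunity, though the slide's question of a counterstrategy remains open.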
Server simulation
How do we know these policies are any good? How can we study alternatives?
In situ study is difficult. SIMBA emulator (U. of Delaware):
[Diagram: SIMBA (emulates N clients) ↔ BOINC server (not emulated)]
Upcoming scheduler changes
Problems:
- only one app version is used
- the completion-time simulation is antiquated (it doesn't reflect multithread, coprocessor, or RAM limitations)
New concept: resource signature (#CPUs, #coprocessors, RAM)
- do the simulation based on "greedy EDF scheduling" using the resource signature
- select the app version that can use the available resources
Conclusion
Volunteer computing has diverse resources and workloads
BOINC has mechanisms that deal effectively and efficiently with this diversity
Lots of fun research problems here!
davea@ssl.berkeley.edu