Download - 1 1 Work per Task: a Useful Measure for Increasing Throughput While saving Energy Yitzhak (Tsahi) Birk Technion birk

1

Work per Task: a Useful Measure for Increasing ThroughputWhile saving Energy

Yitzhak (Tsahi) Birk

Technionwww.ee.technion.ac.il/~birk

www.psl.technion.ac.il

2

Energy-Performance Tradeoff• “Performance” and energy conservation are

often conflicting goals.– A faster CPU is often less energy-efficient

– Driving faster consumes more energy for the trip– A disk that seeks faster consumes more energy

per seek

• Is this always the case?

• Can we do anything?

3

Outline

• Performance measures and the relationship among them

• “Work per task”

• Energy-performance win-win situations

• Conclusions

4

Performance Measures

• User perspective: time to accomplish mission (delay; response time; latency)

– queuing delay

– service time

• Service-provider perspective: throughput (Missions/sec; bits/sec; transcations/sec)

Is service time reduction the common goal?

Service Time = 1/throughput?

Queue

ServerRequester

100%50%Load ( = throughput / capacity)

Res-ponsetime

5

Pipelining

Tasks:– Wash:

55min

– Dry: 45 min

– (Transfer: 5 min)Maximum throughput

Service

time

Combo 1/110 110

Separate 1/65 115

Increases both throughput and service time!

5+100+5

1 5+55+5+45+5 2

6

Work Per Task• Unit: seconds (time), but

• Unlike latency, this is actually (resource x time), like “person hours”

• In time interval T<sec>, an N-resource system can do NT<sec> of work

• If a task requires less work, more tasks can be done during time T

Less work per task higher throughput

7

Work Per Task and Energy

• If a resource consumes fixed power, reducing the amount of work (as defined!) per given task reduces energy consumption for a given workload.

Less work per task potentially save energy!

8

Reducing work per task may lead to

win-win for energy and performance!

Example 1

Data Layout for Random Block-Set Access

10

Scenario

• Data: numerous disjoint sets of 4 blocks.

• Workload: numerous requests to read entire sets, chosen randomly.

• Hardware: 4 disk drives

– Time to access a block: ts– Time to transfer a block: tt

• Controller can immediately assign a request to an idle disk (all disks are kept busy at all times).

• Goal: maximize throughput (block-sets / sec).

11

Data Layout Options

• Striping each block set across all disks

• Each block set ona single disk

• For both options: assume perfect load balancing and that all disks are busy all the time.

• With Striping:

• No Striping:

12

Work Per Task (reading a block set)

The unstriped layout is more efficient

W = ts + 4∙tt

W = 4(ts + tt)

• Smax= 4/W, where W is the amount of disk work per task

13

Throughput Comparison

The non-striped layout wins on throughput

4

4unstriped s t

striped s t

S T T

S T T

• Seek:– Same energy per seek– Striped: 4 seeks per task– Non-Striped: 1 seek per task

• Rotation:– Fixed power W

• Signal processing: same

• Communication protocols: less with no striping

14

Energy Comparison

The non-striped layout also wins on Energy

W = ts + 4∙tt

W = 4∙ts + 4∙tt

What About Response Time?• Wasn’t part of the requirement!, but let’s

examine it anyhow.

• “Zero load”:– Small blocks: ~equal– Large blocks: transfer time dominates, so stripe the

data at negligible throughput and energy penalty.

• Heavy load:– Response time is dominated by queuing delay– Higher maximum throughput shorter queues for

any given workload non-striped wins!

Example 2

Web Server

17

Background• Web server:

– Stores data on disk– Retrieves it in response to requests

• Web page:– HTML “skeleton”– Multiple small embedded objects

• Each page requires multiple disk accesses:– Wasted disk time low maximum throughput– Extra disk seeks and time more energy per

workload

• Goal: reduce work per page

18

PSL Home Pagewww.psl.technion.ac.il

• Total size: ~150kB

• Composition:– Skeleton

– 8 embedded objects

• Disk work: – ~80ms

– 9 seeks

• Max disk throughput:~ 12 pages/sec

19

Possible Approaches

• Prefetching:– Mostly latency hiding, not a reduction in disk

work per web page– (Smart scheduling can reduce mean seek

distance and thus some reduction in work)

• Contiguous placement of page’s objects:– Rotational latency not saved

– Seek may not be saved if server is busy, as other requests are interlaced with those of page

– Not effective unless also read together

20

Packaging

• All page’s objects:– placed contiguously on disk

– read in a single disk access.

• After reading from disk:– Server separates in memory (unaware client), or

– Sends entire package to client (participating client).

• Price: replication of objects that are part of multiple pages (negligible or don’t do it)

• Complication: need to update upon object change– Server uses “check if modified since” after reading package

– Various policies are possible upon change, ranging from “do nothing” to look for all copies and update them.

21

Packaging: Impact (PSL example)

• 9 seeks 1 seek per page

• Disk work: 80ms 10ms

• Disk throughput: 12 pages/sec 100 pages/sec

• Disk energy per web page:– Seek: down 9X

– Motor: down ~9X (same power but for a shorter time)

– Electronics: some savings

Packaging increases throughput & saves energy!

Ex 3: Same Work for Less Energy

• Disk access:– Seek

– Rotational Latency

– Transfer

• Important insight: – Slower seek need not prolong the time to first bit

– Slower seek consumes less energy

– Seek speed can be chosen per seek (unlike rpm)

• Implication: can adjust seek speed based on seek distance and relative rotational position so as to save energy with no performance penalty! [Toshiba]

22

Ex 4: Scheduling for Efficient Access

• Reduces work and increases throughput.

• Fairness:– Avoid starvation by gating

– At heavy load, higher max throughput lower load shorter queue shorter response times for all!

Caution: Power, Work and Energy

• Energy = Work Power,

so

• Reducing Work by increasing Power may not reduce Energy consumption

Example 5

RandomRAID Video Server

Random RAID Video Server [1995]

ReconstructionBuffer

DISK DRIVES

RAM BUFFERSTREAMER

NETWORK

CONTROL

XOR

RandomRAID - Overview

• N disk drives

• Parity groups:– Stripe size k: arbitrary k < N

– Choice of disk drives constituting a parity group (at write time): “random”

• Full-stripe reads (e.g., data streaming):– pick any (k-1) of the k relevant disks

– Example: the ones with the shortest queues or the ones that will do minimum work

• Special case: random mirroring

DISK DRIVES

RandomRAID – Salient Features• Load balancing

– During normal operation

– In degraded mode

– During reconstruction

• Work reduction or shorter response time(depends on disk-choice policy)

• Incremental scale-up possible, including non-identical disk drives

• Offers the benefits of randomization without the negative side-effects!

28

Judicious exploitation of redundancy for performance enhancement and energy efficiency!

DISK DRIVES

Example 6

On-Line Transaction Processing

via Satellite

(e.g., cash register)

30

Maximizing Deadline-Constrained Capacity in Multi-channel ALOHA Networks [Birk et al]

• Multichannel ALOHA:– Upstream contention channels (shared)

– Private downstream channel for hub transmissions and Acks

– New message: draw a channelrandomly, transmit and wait for Ack

– If no Ack, redraw channel and retransmit

• Transaction processing – performance goals:– User: deadline and permissible Pr(failure)

– Service provider’s goal: max. transactions/sec

• For battery-operated terminals and sensors: minimum mean energy per transaction.

31

Multi-Channel ALOHA: Example

• Scenario:

– Delay permits 2 attempts (rounds).

– P(collision) = 0.1

– P(failure) = 0.001

• Greedy approach: send 3 copies in 1st attempt:

– minimum mean delay

– 3 copies per message

• Better approach:– Send 1 in first attempt

– Send 2 in 2nd attempt iff 1st fails

• Analysis– longer mean delay (who cares?)

– 1.2 copies per message

– 2.5-fold higher capacity

– 2.5x less energy per transaction!

Higher Throughput and Less Energy!

32

Arrow 2

The Arrow ABM System [Dov Raviv]• Arrow miss probability: 0.1

• System requirement: 0.001

• Possible solution: 3 arrows

• Key observations:– Advance radar warning allows a

2nd attempt upon failure.

– Interception time is not important.

• Optimal solution:– Fire one arrow

– Iff it misses, fire two more

– Probability of failure: 0.001

– Average of 1.2 Arrows per target

Exact requirement + smart solution 60% savings!

Learn from other applications!

33

Conclusions

• While energy reduction may be at odds with other goals, reduction of the amount of work per task (not service time!) can create win-win situations.

• Redundancy can be beneficially exploited for performance enhancement, not only for fault tolerance.

• Designing to the true requirements may lead to interesting opportunities.

Download - 1 1 Work per Task: a Useful Measure for Increasing Throughput While saving Energy Yitzhak (Tsahi) Birk Technion birk

Top Related