1
The Difference Engine, Charles Babbage
Images from Wikipedia (Joe D and Andrew Dunn)
Slides courtesy Anselmo Lastra
2
COMP 740: Computer Architecture and Implementation
Montek Singh
Wed, Jan 12, 2011
Lecture 2: Fundamentals and Trends
3
Quantitative Principles of Computer Design
Execution time = Response time = Latency = T, measured in time per unit of work (time/program, time/result, time/instruction, time/bit)
Performance = Rate of producing results = Throughput = Bandwidth = P, measured in work per unit time (results/time, programs/time, instructions/ns, bits/ns)
$$P = \frac{1}{T}$$
4
Topics
Performance
Chips
Trends in:
  “Bandwidth” (or Throughput) vs. Latency
  Power
  Cost
  Dependability
Measuring Performance
5
Trends: Moore’s Law
Era of the microprocessor. Increases due to transistors and architectural improvements.
6
Performance
By 2002, performance was 7X higher than technology improvements alone would have delivered.
What has slowed the trend?
Note what is really being built: a commodity device! So cost is very important.
Problems:
  Amount of heat that can be removed economically
  Limits to instruction-level parallelism
  Memory latency
7
Moore’s Law
Number of transistors on a chip at the lowest cost per component
It’s not quite clear what the “law” states:
  Moore’s original paper: doubling yearly
  Yearly doubling did not hold up through 1975
  Often quoted as doubling every 18 months
  Sometimes as doubling every two years
Moore’s article is worth reading if you haven’t yet.
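The three readings of Moore’s Law diverge quickly. A minimal sketch of the growth factor each doubling period implies; the 10-year horizon and the function name `growth_factor` are arbitrary choices for illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` if the count doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Doubling periods quoted above: 1 year, 18 months, 2 years.
    for period in (1.0, 1.5, 2.0):
        print(f"doubling every {period} years -> x{growth_factor(10, period):,.0f} after 10 years")
```

Over a single decade the yearly and two-year readings already differ by a factor of 32 (1,024 vs. 32), which is why the exact period matters when the law is quoted.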
8
Quick Look: Classes of Computers
Used to be: mainframe, mini, and micro
Now:
  Desktop (portable?): price/performance, single app, graphics
  Server: reliability, scalability, throughput
  Embedded: not only “toasters”, but also cell phones, etc.; cost, power, real-time performance
9
Chip Performance
Based on a number of factors:
  Feature size (or “technology” or “process”)
    Determines transistor & wire density
    Used to be measured in microns, now nanometers
    Currently: 90 nm, 65 nm, even 45 nm
  Die size
  Device speed
Note the section on wires in HP4:
  Thin wires -> more resistance and capacitance
  Wire delay scales poorly
10
Wafer, Die, Yield
11
Packaging
12
ITRS
International Technology Roadmap for Semiconductors
http://www.itrs.net/
An industry consortium
Predicts trends
Take a look at the yearly report on their website
13
ITRS Predictions (2006 update)
Aside: Ray Kurzweil
Kurzweil: futurist, author
Book in 2005: “The Singularity is Near”
Movie in 2010
14
15
Trends
Now let’s look at trends in:
“Bandwidth” (Throughput) vs. Latency, Power, Cost, Dependability, Performance
16
Bandwidth over Latency
Very important to understand: the section in HP4 on page 15
What they mean by bandwidth also covers processor performance (throughput), maybe memory size, etc.
Let’s look at charts.
17
Disk
[Figure: log-log plot of relative bandwidth improvement (1 to 10,000) vs. relative latency improvement (1 to 100) for Disk, with a reference line where latency improvement = bandwidth improvement]
18
RAM
[Figure: same axes, now showing Memory and Disk, with the reference line where latency improvement = bandwidth improvement]
19
LAN
[Figure: same axes, now showing Memory, Network, and Disk, with the reference line where latency improvement = bandwidth improvement]
20
Processor
[Figure: same axes, now showing Processor, Memory, Network, and Disk, with the reference line where latency improvement = bandwidth improvement; annotation: CPU high, Memory low (“Memory Wall”)]
21
Summary
In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth).
Stated alternatively: bandwidth improves by more than the square of the improvement in latency.
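The two statements of the summary are equivalent because 1.2² = 1.44 and 1.4² = 1.96 are both below 2. A minimal check of that arithmetic; the helper name is hypothetical.

```python
def bandwidth_beats_latency_squared(bw_gain: float, latency_gain: float) -> bool:
    """True if the bandwidth improvement exceeds the square of the latency improvement."""
    return bw_gain > latency_gain ** 2


if __name__ == "__main__":
    # Bandwidth doubles while latency improves by 1.2x to 1.4x (range from the slide).
    for latency_gain in (1.2, 1.4):
        print(f"latency x{latency_gain}: squared = {latency_gain ** 2:.2f} -> "
              f"{bandwidth_beats_latency_squared(2.0, latency_gain)}")
```

A latency improvement of 1.5x would break the rule (1.5² = 2.25 > 2), so the 1.4x ceiling is what makes the “square” phrasing hold.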
22
Why Less Improvement?
Moore’s Law helps bandwidth more than latency:
  Longer distances for signals to travel, so longer latency, which offsets faster transistors
Distance limits latency: the speed of light is a lower bound
Bandwidth sells: capacity, processor “speed”, and benchmark scores
Latency can help bandwidth: often bandwidth is increased by adding latency
The OS introduces latency
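Distance alone sets a floor on latency. A minimal sketch, assuming signals travel at the speed of light in vacuum (real wires and fiber are slower, so true latencies are higher); the two distances are arbitrary examples.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum; an optimistic bound for real signals


def min_one_way_latency_s(distance_m: float) -> float:
    """Lower bound on one-way latency: no signal covers distance_m faster than light."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    print(f"30 cm:    {min_one_way_latency_s(0.30) * 1e9:.2f} ns")
    print(f"3,000 km: {min_one_way_latency_s(3_000_000) * 1e3:.2f} ms")
```

Adding bandwidth does not lower either number, which is one concrete reason latency lags.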
23
Techniques to Ameliorate
Caching: use capacity (“bandwidth”) to reduce average latency
Replication: again, leverage capacity
Prediction: use extra processing transistors to pre-fetch; maybe also to recompute instead of fetch
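A minimal sketch of how caching spends capacity to cut average latency. It assumes a simple two-level model in which each access either hits in the cache (`hit_time`) or pays the full cost of the slower level (`miss_time`); the 1 ns / 100 ns figures are hypothetical.

```python
def average_latency(hit_rate: float, hit_time: float, miss_time: float) -> float:
    """Expected access latency when a fraction `hit_rate` of accesses hit in the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time


if __name__ == "__main__":
    # Hypothetical: 1 ns cache, 100 ns memory.
    for hit_rate in (0.0, 0.90, 0.95, 0.99):
        print(f"hit rate {hit_rate:.2f}: {average_latency(hit_rate, 1.0, 100.0):.2f} ns")
```

The slow level’s latency is unchanged; only the fraction of accesses that pay it shrinks as the cache grows.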
24
Trends
Now let’s look at trends in:
“Bandwidth” vs. Latency, Power, Cost, Dependability, Performance
25
Power
For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power:
$$\text{Power}_{\text{dynamic}} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
For mobile devices, energy is the better metric:
$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$
For a fixed task, slowing the clock rate reduces power, not energy.
Capacitive load is a function of the number of transistors connected to an output and of the technology, which determines the capacitance of wires and transistors.
Dropping voltage helps both; supply voltages have moved from 5 V to 1 V.
Clock gating: stop the clock to inactive modules so they do not switch
26
Example
Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?
$$\text{Power}_{\text{new}} = \tfrac{1}{2} \times \text{Capacitive load} \times (0.85 \times \text{Voltage})^2 \times (0.85 \times \text{Frequency switched}) = 0.85^3 \times \text{Power}_{\text{old}} \approx 0.6 \times \text{Power}_{\text{old}}$$
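The example can be checked numerically with a normalized baseline (capacitive load, voltage, and frequency all set to 1, an arbitrary choice since only ratios matter). The sketch also applies the energy formula to show that the voltage cut reduces energy per task as well.

```python
def dynamic_power(cap_load: float, voltage: float, freq_switched: float) -> float:
    """Power_dynamic = 1/2 * Capacitive load * Voltage^2 * Frequency switched."""
    return 0.5 * cap_load * voltage ** 2 * freq_switched


def dynamic_energy(cap_load: float, voltage: float) -> float:
    """Energy_dynamic = Capacitive load * Voltage^2 (independent of clock rate)."""
    return cap_load * voltage ** 2


if __name__ == "__main__":
    # Scale voltage and frequency down by 15% relative to a normalized baseline.
    power_ratio = dynamic_power(1.0, 0.85, 0.85) / dynamic_power(1.0, 1.0, 1.0)
    energy_ratio = dynamic_energy(1.0, 0.85) / dynamic_energy(1.0, 1.0)
    print(f"power:  {power_ratio:.3f}")   # 0.85**3
    print(f"energy: {energy_ratio:.4f}")  # 0.85**2
```

Lowering only the frequency (voltage unchanged) would leave `dynamic_energy` untouched, matching the point that slowing the clock reduces power but not energy for a fixed task.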
27
Trends in Power
Because leakage current flows even when a transistor is off, static power is now important too:
$$\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}$$
Leakage current increases in processors with smaller transistor sizes.
Increasing the number of transistors increases power even if they are turned off.
In 2006, the goal for leakage was 25% of total power consumption; high-performance designs were at 40%.
Very low-power systems even gate the voltage to inactive modules to control loss due to leakage.
28
Trends
Now let’s look at trends in:
“Bandwidth” vs. Latency, Power, Cost, Dependability, Performance
29
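Cost of Integrated Circuits
$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Cost of testing die} = \frac{\text{Cost of testing per hour} \times \text{Average die test time}}{\text{Die yield}}$$
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} - \text{Test dies per wafer}$$
Dingwall’s Equation:
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$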
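The cost equations translate directly into small functions. A minimal sketch; the function names are hypothetical, and the example inputs (30 cm wafer, 1.5 cm × 1.5 cm die, 0.4 defects/cm², α = 3, 100% wafer yield, $5,000 per wafer) are assumptions chosen for illustration, not data from the slides.

```python
import math


def dies_per_wafer(wafer_diameter: float, die_area: float, test_dies: float = 0) -> float:
    """pi*(d/2)^2/A - pi*d/sqrt(2*A) - test dies. Units must agree (e.g. cm and cm^2)."""
    return (math.pi * (wafer_diameter / 2) ** 2 / die_area
            - math.pi * wafer_diameter / math.sqrt(2 * die_area)
            - test_dies)


def die_yield(defects_per_area: float, die_area: float, alpha: float,
              wafer_yield: float = 1.0) -> float:
    """Dingwall's equation: Wafer yield * (1 + D*A/alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_area * die_area / alpha) ** (-alpha)


def cost_of_die(wafer_cost: float, n_dies: float, yield_: float) -> float:
    return wafer_cost / (n_dies * yield_)


def cost_of_testing_die(cost_per_hour: float, avg_test_time_hours: float, yield_: float) -> float:
    return cost_per_hour * avg_test_time_hours / yield_


def cost_of_ic(die: float, testing: float, packaging_and_final_test: float,
               final_test_yield: float) -> float:
    return (die + testing + packaging_and_final_test) / final_test_yield


if __name__ == "__main__":
    n = dies_per_wafer(30.0, 2.25)   # hypothetical 30 cm wafer, 2.25 cm^2 die
    y = die_yield(0.4, 2.25, alpha=3)
    print(f"dies/wafer = {n:.1f}, die yield = {y:.3f}, die cost = ${cost_of_die(5000, n, y):.2f}")
```

With these assumed inputs the sketch gives about 269.7 dies per wafer, a die yield of about 0.455, and a die cost of about $40.73; packaging, testing, and final-test yield then feed `cost_of_ic`.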
30
Explanations
The second term in “Dies per wafer” corrects for the rectangular dies near the periphery of round wafers.
“Die yield” assumes a simple empirical model: defects are randomly distributed over the wafer, and yield is inversely proportional to the complexity of the fabrication process (indicated by α).
α = 3 for modern processes implies that the cost of a die is proportional to (Die area)⁴.
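The (Die area)⁴ claim follows from keeping only the dominant terms. A sketch of the reasoning, assuming large dies (Defects per unit area × Die area much larger than α) and ignoring the edge-correction and test-die terms in Dies per wafer:

$$\text{Dies per wafer} \approx \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} \propto \text{Die area}^{-1}$$
$$\text{Die yield} \approx \text{Wafer yield} \times \left(\frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha} \propto \text{Die area}^{-\alpha}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}} \propto \text{Die area}^{1+\alpha} = \text{Die area}^{4} \quad (\alpha = 3)$$

For small dies the 1 inside the yield formula is not negligible, so cost grows more gently than the fourth power there.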
31
Real World Examples
“Revised Model Reduces Cost Estimates”, Linley Gwennap, Microprocessor Report 10(4), 25 Mar 1996

| | Intel Pentium | AMD 5K86 | Cyrix 6x86 | MIPS R5000 | PowerPC 603e | PowerPC 604 | Pentium Pro | Sun UltraSparc | Hitachi SH7604 |
|---|---|---|---|---|---|---|---|---|---|
| Process | BiCMOS | CMOS | CMOS | CMOS | CMOS | CMOS | BiCMOS | CMOS | CMOS |
| Line width (microns) | 0.35 | 0.35 | 0.44 | 0.35 | 0.64 | 0.44 | 0.35 | 0.47 | 0.8 |
| Metal layers | 4 | 3 | 5 | 3 | 4 | 4 | 4 | 4 | 2 |
| Wafer size (mm) | 200 | 200 | 200 | 200 | 200 | 200 | 200 | 200 | 150 |
| Wafer cost | $2,700 | $2,200 | $2,400 | $2,600 | $2,500 | $2,300 | $2,700 | $2,200 | $500 |
| Die area (sq mm) | 91 | 181 | 204 | 84 | 98 | 196 | 196 | 315 | 82 |
| Effective area | 85% | 75% | 85% | 48% | 65% | 72% | 85% | 68% | 75% |
| Dice/wafer | 297 | 159 | 122 | 325 | 275 | 128 | 128 | 74 | 177 |
| Defects/sq cm | 0.6 | 0.8 | 0.7 | 0.8 | 0.5 | 0.8 | 0.6 | 0.8 | 0.5 |
| Yield | 65% | 40% | 36% | 74% | 74% | 38% | 42% | 26% | 75% |
| Die cost | $14 | $40 | $55 | $11 | $9 | $47 | $50 | $116 | $4 |
| Package size (pins) | 296 | 296 | 296 | 272 | 240 | 304 | 387 | 521 | 144 |
| Package type | PGA | PGA | PGA | PBGA | CQFP | CQFP | MCM | PGA | PQFP |
| Package cost | $18 | $21 | $21 | $11 | $14 | $21 | $40 | $45 | $3 |
| Test & assembly cost | $8 | $10 | $10 | $6 | $6 | $12 | $21 | $28 | $1 |
| Total mfg cost | $40 | $71 | $86 | $28 | $29 | $80 | $144 | $189 | $8 |
32
Trends
Now let’s look at trends in:
“Bandwidth” vs. Latency, Power, Cost, Dependability, Performance
33
Dependability
When is a system operating properly?
Infrastructure providers now offer Service Level Agreements (SLAs) to guarantee that their networking or power service will be dependable.
Systems alternate between 2 states of service with respect to an SLA:
1. Service accomplishment, where the service is delivered as specified in the SLA
2. Service interruption, where the delivered service is different from the SLA
Failure = transition from state 1 to state 2
Restoration = transition from state 2 to state 1
34
Definitions
Module reliability = a measure of continuous service accomplishment (or time to failure). Two key metrics:
  Mean Time To Failure (MTTF) measures reliability
  Failures In Time (FIT) = 1/MTTF, the rate of failures, traditionally reported as failures per billion hours of operation
Derived metrics:
  Mean Time To Repair (MTTR) measures service interruption
  Mean Time Between Failures (MTBF) = MTTF + MTTR
Module availability measures service as the module alternates between the 2 states of accomplishment and interruption (a number between 0 and 1, e.g., 0.9):
$$\text{Module availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
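The definitions map onto three one-line functions. A minimal sketch; the function names and the example modules (a 9-hour MTTF with 1-hour repair, and a 1,000,000-hour MTTF with 24-hour repair) are hypothetical.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean Time Between Failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time in service accomplishment: MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


def fit(mttf_hours: float) -> float:
    """Failure rate expressed as failures per billion hours of operation."""
    return 1e9 / mttf_hours


if __name__ == "__main__":
    print(availability(9.0, 1.0))  # hypothetical module; matches the 0.9 example
    print(f"{availability(1_000_000, 24):.6f}, FIT = {fit(1_000_000):.0f}")
```

A 24-hour repair barely dents availability when failures are a million hours apart, but the same repair time would matter a great deal for a module that fails daily.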
35
Example: Calculating Reliability
If modules have exponentially distributed lifetimes (the age of a module does not affect its probability of failure), the overall failure rate is the sum of the failure rates of the modules.
Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):
Failure rate = ?
MTTF = ?
(Solution next)
36
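Solution
$$\text{FailureRate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} = \frac{17{,}000}{1{,}000{,}000{,}000}\ \text{per hour} = 17{,}000\ \text{FIT}$$
$$\text{MTTF} = \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000\ \text{hours}$$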
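The reliability example can be checked with exact rational arithmetic, which keeps the 17/1,000,000 intermediate result visible. A minimal sketch using Python’s `fractions.Fraction`; the helper name `system_failure_rate` is hypothetical, and it relies on the exponential-lifetime assumption stated in the example.

```python
from fractions import Fraction


def system_failure_rate(module_mttfs_hours):
    """Sum of module failure rates (1/MTTF each); valid for exponentially distributed lifetimes."""
    return sum(Fraction(1, m) for m in module_mttfs_hours)


# 10 disks (1M h), 1 controller (0.5M h), 1 power supply (0.2M h)
modules = [1_000_000] * 10 + [500_000, 200_000]
rate = system_failure_rate(modules)  # failures per hour, kept exact
fit = rate * 1_000_000_000           # failures per billion hours
mttf = 1 / rate                      # hours

if __name__ == "__main__":
    print(f"rate = {rate} per hour, FIT = {fit}, MTTF = {float(mttf):,.0f} hours")
```

The exact MTTF is 1,000,000/17 ≈ 58,824 hours, which the solution rounds to about 59,000.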
37
Trends
Now let’s look at trends in:
“Bandwidth” vs. Latency, Power, Cost, Dependability, Performance
38
First, What is Performance?
The starting point is universally accepted:
“The time required to perform a specified amount of computation is the ultimate measure of computer performance.”
How should we summarize (reduce to a single number) the measured execution times (or measured performance values) of several benchmark programs? Two properties:
1. A single-number performance measure for a set of benchmarks expressed in units of time should be directly proportional to the total (weighted) time consumed by the benchmarks.
2. A single-number performance measure for a set of benchmarks expressed as a rate should be inversely proportional to the total (weighted) time consumed by the benchmarks.
From “Characterizing Computer Performance with a Single Number”, J. E. Smith, CACM, October 1988, pp. 1202-1206
39
Quantitative Principles of Computer Design
Performance is in units of things per second, so bigger is better.
What if we are primarily concerned with response time?
Execution time = Response time = Latency = T, measured in time per unit of work (time/program, time/result, time/instruction, time/bit)
Performance = Rate of producing results = Throughput = Bandwidth = P, measured in work per unit time (results/time, programs/time, instructions/ns, bits/ns)
$$P = \frac{1}{T}$$
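Because P = 1/T, comparing response times and comparing performance give the same answer, just inverted. A minimal sketch; the function names and the 10 s / 15 s timings are hypothetical.

```python
def performance(execution_time_s: float) -> float:
    """P = 1 / T: bigger is better."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """How many times faster X is than Y: P_X / P_Y, which equals T_Y / T_X."""
    return performance(time_x_s) / performance(time_y_s)


if __name__ == "__main__":
    # Hypothetical: X runs a program in 10 s, Y runs it in 15 s.
    print(f"X is {times_faster(10.0, 15.0):.2f} times faster than Y")
```

Working in performance keeps “bigger is better”; working directly in time gives the same ratio as T_Y / T_X.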
40
Performance: What to Measure?
What about just MIPS and MFLOPS?
Usually rely on benchmarks vs. real workloads
Older measures were:
  Kernels, or
  Small programs designed to mimic real workloads: Whetstone, Dhrystone
http://www.netlib.org/benchmark
Note LINPACK and Top500
41
MIPS
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{CPU time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}, \qquad \text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
Machines with different instruction sets?
Programs with different instruction mixes?
Uncorrelated with performance
Marketing metric: “Meaningless Indicator of Processor Speed”
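A small counterexample shows why MIPS can be uncorrelated with performance even on one machine. A minimal sketch with hypothetical numbers: two compilations of the same program on a 1 GHz machine, where version B executes fewer but slower (higher-CPI) instructions.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction count * CPI / Clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count: float, cpu_time_s: float) -> float:
    """MIPS = Instruction count / (CPU time * 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)


CLOCK_HZ = 1e9  # hypothetical 1 GHz machine

# Two hypothetical compilations of the same program for the same machine:
# A executes more, simpler instructions; B executes fewer, slower ones.
time_a = cpu_time(10e6, 1.0, CLOCK_HZ)
time_b = cpu_time(5e6, 1.5, CLOCK_HZ)

if __name__ == "__main__":
    print(f"A: {time_a * 1e3:.1f} ms at {mips(10e6, time_a):.0f} MIPS")
    print(f"B: {time_b * 1e3:.1f} ms at {mips(5e6, time_b):.0f} MIPS")
```

B finishes in 7.5 ms versus 10 ms for A, yet rates about 667 MIPS versus 1000: the higher MIPS figure belongs to the slower program.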
42
MFLOP/s
$$\text{MFLOP/s} = \frac{\text{Number of FP operations}}{\text{CPU time} \times 10^6}$$
Popular in the supercomputing community
Often not where time is spent
Not all FP operations are equal: “normalized” MFLOP/s
Can magnify performance differences
A better algorithm (e.g., with better data reuse) can run faster even with a higher FLOP count
43
Peak Performance?
44
Benchmarks
To increase predictability, collections of benchmark applications, called benchmark suites, are popular.
SPECCPU: popular desktop benchmark suite
  CPU only, split between integer and floating-point programs
  SPECint2000 has 12 integer programs; SPECfp2000 has 14 floating-point programs
  SPECCPU2006 was announced in Spring 2006
  SPECSFS (NFS file server) and SPECWeb (web server) added as server benchmarks
  www.spec.org
Transaction Processing Council measures server performance and cost-performance for databases:
  TPC-C: complex query for Online Transaction Processing
  TPC-H: models ad hoc decision support
  TPC-W: a transactional web benchmark
  TPC-App: application server and web services benchmark
45
SPEC2006 Programs
46
How to Summarize Performance?
Arithmetic average of execution times?
But programs vary in basic speed, so long-running programs would dominate an arithmetic average.
Could add weights per program, but how do we pick the weights? Different companies want different weights for their products.
SPECRatio: normalize execution times to a reference computer, yielding a ratio proportional to performance:
  SPECRatio = time on reference computer / time on computer being rated
SPEC uses an older Sun machine as the reference.
47
Ratios
If the SPECRatio of a program on Computer A is 1.25 times bigger than on Computer B, then
$$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_A}{\text{ExecutionTime}_{\text{reference}} / \text{ExecutionTime}_B} = \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$
Note that when comparing 2 computers as a ratio, execution times on the reference computer drop out, so the choice of reference computer is irrelevant.
48
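Means
Let $r_1, \ldots, r_n$ be an $n$-tuple of positive numbers, $n \ge 1$.
Quadratic mean:
$$Q(r_1, \ldots, r_n) = \sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{n}}$$
Arithmetic mean:
$$A(r_1, \ldots, r_n) = \frac{r_1 + r_2 + \cdots + r_n}{n}$$
Geometric mean:
$$G(r_1, \ldots, r_n) = \sqrt[n]{r_1 r_2 \cdots r_n}$$
Harmonic mean:
$$H(r_1, \ldots, r_n) = \frac{n}{\frac{1}{r_1} + \frac{1}{r_2} + \cdots + \frac{1}{r_n}}$$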
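The four means follow directly from their definitions. A minimal sketch (function names are hypothetical; Python’s `statistics` module also provides `geometric_mean` and `harmonic_mean`). The geometric mean is computed through logarithms so that multiplying many large ratios cannot overflow.

```python
import math


def _check(r):
    if not r or any(x <= 0 for x in r):
        raise ValueError("need at least one positive number")


def quadratic_mean(r):
    _check(r)
    return math.sqrt(sum(x * x for x in r) / len(r))


def arithmetic_mean(r):
    _check(r)
    return sum(r) / len(r)


def geometric_mean(r):
    _check(r)
    # exp of the mean log equals the n-th root of the product.
    return math.exp(sum(math.log(x) for x in r) / len(r))


def harmonic_mean(r):
    _check(r)
    return len(r) / sum(1 / x for x in r)


if __name__ == "__main__":
    r = [1.0, 4.0]  # hypothetical values
    for f in (harmonic_mean, geometric_mean, arithmetic_mean, quadratic_mean):
        print(f"{f.__name__}: {f(r):.4f}")
```

For positive inputs that are not all equal the means are strictly ordered, H < G < A < Q (here 1.6, 2, 2.5, and about 2.92), so the choice of mean alone can shift a summary noticeably.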
49
Geometric Mean
Since these are ratios, the proper mean is the geometric mean (SPECRatio is unitless, so the arithmetic mean is meaningless):
$$\text{GeometricMean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}$$
1. The geometric mean of the ratios is the same as the ratio of the geometric means.
2. Ratio of geometric means = geometric mean of performance ratios, so the choice of reference computer is irrelevant!
These two points make the geometric mean of ratios attractive for summarizing performance.
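The reference-independence property can be checked numerically. A minimal sketch with hypothetical execution times for two computers and two arbitrary reference machines; the function names are illustrative.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spec_ratios(ref_times, times):
    """SPECRatio_i = time on reference computer / time on computer being rated."""
    return [ref / t for ref, t in zip(ref_times, times)]


def gm_ratio(ref_times, times_a, times_b):
    """Ratio of geometric-mean SPECRatios of A and B for one reference machine."""
    return (geometric_mean(spec_ratios(ref_times, times_a))
            / geometric_mean(spec_ratios(ref_times, times_b)))


# Hypothetical execution times (seconds) on three benchmarks.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 20.0, 10.0]
ref_1 = [100.0, 200.0, 50.0]
ref_2 = [30.0, 900.0, 7.0]

if __name__ == "__main__":
    print(gm_ratio(ref_1, times_a, times_b), gm_ratio(ref_2, times_a, times_b))
```

Both reference machines give the same ratio, 2^(1/3) ≈ 1.26, which is also the geometric mean of the per-benchmark performance ratios T_B / T_A = (2, 0.5, 2).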
50
Different Take
Smith (CACM 1988, see references) takes a different view on means.
First, let’s look at an example.
51
Rates
Change to MFLOPS and also look at different means.
52
Avoid the Geometric Mean?
If benchmark execution times are normalized to some reference machine, and means of normalized execution times are computed, only the geometric mean gives consistent results no matter what the reference machine is.
This has led to declaring the geometric mean the preferred method of summarizing execution time (e.g., SPEC).
Smith’s comments:
“The geometric mean does provide a consistent measure in this context, but it is consistently wrong.”
“If performance is to be normalized with respect to a specific machine, an aggregate performance measure such as total time or harmonic mean rate should be calculated before any normalizing is done. That is, benchmarks should not be individually normalized first.”
He advocates using time, or normalizing after taking the mean.
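A two-program example makes Smith’s objection concrete. A minimal sketch with hypothetical execution times: normalizing each program first and then taking the geometric mean reports a tie, while aggregating total time first (Smith’s recommendation) shows a large difference.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times (seconds) for two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Normalize each program to A first, then take the geometric mean.
gm_normalized = geometric_mean([b / a for a, b in zip(times_a, times_b)])

# Smith's alternative: aggregate (total time) first, then normalize.
total_time_ratio = sum(times_a) / sum(times_b)

if __name__ == "__main__":
    print(f"GM of normalized times (B relative to A): {gm_normalized:.2f}")
    print(f"Total time A / total time B: {total_time_ratio:.2f}")
```

Running both programs once takes 1,001 s on A and 110 s on B, so B is about 9.1 times faster on this workload, yet the geometric mean of the normalized times reports the machines as equal.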
53
Variability
Does a single mean summarize the performance of programs in a benchmark suite?
We can decide whether it is a good predictor by characterizing the variability of the distribution using the standard deviation.
Like the geometric mean, the geometric standard deviation is multiplicative rather than arithmetic.
We can simply take the logarithm of the SPECRatios, compute the standard mean and standard deviation, and then take the exponent to convert back:
$$\text{GeometricMean} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \ln(\text{SPECRatio}_i)\right)$$
$$\text{GeometricStDev} = \exp\left(\text{StDev}\left(\ln(\text{SPECRatio}_i)\right)\right)$$
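The log, compute, exponentiate recipe takes only a few lines. A minimal sketch: the slides do not say whether StDev is the population or the sample form, so this assumes the population form (`statistics.pstdev`); swapping in `statistics.stdev` gives the sample form. The input ratios are hypothetical.

```python
import math
import statistics


def geometric_mean_and_stdev(spec_ratios):
    """Return exp(mean(ln r)) and exp(pstdev(ln r)) for positive ratios."""
    logs = [math.log(r) for r in spec_ratios]
    return math.exp(statistics.mean(logs)), math.exp(statistics.pstdev(logs))


if __name__ == "__main__":
    gm, gsd = geometric_mean_and_stdev([2.0, 8.0])  # hypothetical ratios
    print(f"GM = {gm:.2f}, GSTDEV = {gsd:.2f}")     # GM = 4.00, GSTDEV = 2.00
```

Both ratios sit exactly one factor of 2 from the geometric mean of 4, which is what a multiplicative standard deviation of 2 expresses.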
54
Form of Standard Deviation
Standard deviation is more informative if we know the distribution has a standard form:
  bell-shaped normal distribution, whose data are symmetric around the mean
  lognormal distribution, where the logarithms of the data (not the data itself) are normally distributed, i.e., symmetric on a logarithmic scale
For a lognormal distribution, we expect that
  68% of samples fall in the range $[\text{mean}/\text{gstdev},\ \text{mean} \times \text{gstdev}]$
  95% of samples fall in the range $[\text{mean}/\text{gstdev}^2,\ \text{mean} \times \text{gstdev}^2]$
55
Example (1/2)
GM and multiplicative StDev of SPECfp2000 for Itanium 2
[Figure: bar chart of SPECfpRatio (axis 0 to 14,000) for the 14 SPECfp2000 benchmarks wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; horizontal lines mark GM = 2712 and one geometric standard deviation either side (1372 and 5362); GSTEV = 1.98]
56
Example (2/2)
GM and multiplicative StDev of SPECfp2000 for AMD Athlon
[Figure: bar chart of SPECfpRatio (axis 0 to 14,000) for the same 14 SPECfp2000 benchmarks; horizontal lines mark GM = 2086 and one geometric standard deviation either side (1494 and 2911); GSTEV = 1.40]
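The 68% and 95% ranges can be computed from the reported GM and GSTEV values. A minimal sketch; the function name is hypothetical, and the inputs are the rounded values printed on the two example slides.

```python
def lognormal_ranges(gm: float, gstdev: float):
    """Ranges expected to hold about 68% and 95% of samples for a lognormal distribution."""
    one = (gm / gstdev, gm * gstdev)
    two = (gm / gstdev ** 2, gm * gstdev ** 2)
    return one, two


if __name__ == "__main__":
    # GM and GSTEV values reported on the two example slides.
    for name, gm, gsd in (("Itanium 2", 2712, 1.98), ("AMD Athlon", 2086, 1.40)):
        (lo1, hi1), (lo2, hi2) = lognormal_ranges(gm, gsd)
        print(f"{name}: 68% in [{lo1:.0f}, {hi1:.0f}], 95% in [{lo2:.0f}, {hi2:.0f}]")
```

The one-sigma bounds come out near [1370, 5370] and [1490, 2920], close to the 1372/5362 and 1494/2911 lines on the charts; the small gaps are presumably because the reported GSTEV values are rounded to two decimals.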
57
Comments
A geometric standard deviation of 1.98 for Itanium 2 is much higher than the 1.40 for Athlon, so Itanium 2 results will differ more widely from the mean and are therefore likely less predictable.
Falling within one standard deviation:
  10 of 14 benchmarks (71%) for Itanium 2
  11 of 14 benchmarks (79%) for Athlon
Thus, the results are quite compatible with a lognormal distribution (expect 68%).
58
Next Time
Principles of Computer Design
Amdahl’s Law
Then on to Instruction Set Architecture
59
Readings/References
Gordon Moore’s paper:
  http://www.intel.com/pressroom/kits/events/moores_law_40th/index.htm
  http://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf
Paper on which the latency section is based:
  D. A. Patterson, “Latency Lags Bandwidth”, Commun. ACM 47, 10 (October 2004), pp. 71-75
J. E. Smith, “Characterizing Computer Performance with a Single Number”, Commun. ACM (October 1988), pp. 1202-1206