Lecture 2: Quantifying Performance
Topics: Speedup, Amdahl's law, Execution time
Readings: Chapter 1
August 26, 2015
CSCE 513 Computer Architecture
– 2 – CSCE 513 Fall 2015
Overview
Last time: speed-up; power wall, ILP wall, the move to multicore; definition of computer architecture (Lecture 1, slides 1-29?)
New: syllabus and other course pragmatics; website (not shown); dates
Figure 1.9 trends: CPUs, memory, network, disk. Why geometric mean? Speed-up again; Amdahl's Law
– 3 – CSCE 513 Fall 2015
Instruction Set Architecture (ISA)
"Myopic view of computer architecture"
ISAs – appendices A and K
• 80x86
• ARM
• MIPS
– 4 – CSCE 513 Fall 2015
MIPS Register Usage (Figure 1.4)
Ref. CAAQA
– 5 – CSCE 513 Fall 2015
MIPS Instructions Fig 1.5: Data Transfers
Ref. CAAQA
– 6 – CSCE 513 Fall 2015
MIPS Instructions Fig 1.5: Arithmetic/Logical
Most significant bit is bit 0; least significant bit is #63
Ref. CAAQA
– 7 – CSCE 513 Fall 2015
MIPS Instructions Fig 1.5: Control
Condition codes set by ALU operations
PC-relative branches
Jumps
Jump-and-link
Return address on function call?
Return address
Ref. CAAQA
– 8 – CSCE 513 Fall 2015
MIPS Instruction Format (RISC)
Ref. CAAQA
– 9 – CSCE 513 Fall 2015
New World: "Computer Architecture is back"
"Computer architects must design a computer to meet functional requirements as well as price, power, performance, and availability goals"
Patterson, David A.; Hennessy, John L. (2011). Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design), Kindle Locations 944-945. Elsevier Science. Kindle Edition.
YouTube: Google "Computer Architecture is back Patterson"
– 10 – CSCE 513 Fall 2015
Fig 1.7 Requirement Challenges for Computer Architects
Level of software compatibility
Operating system requirements
Standards
Ref. CAAQA
– 11 – CSCE 513 Fall 2015
Fig 1.10 Performance over last 25-40 years
Processors
Ref. CAAQA
– 12 – CSCE 513 Fall 2015
Fig 1.10 Performance over last 25-40 years
Memory
Ref. CAAQA
– 13 – CSCE 513 Fall 2015
Fig 1.10 Performance over last 25-40 years
Networks
Disk
Ref. CAAQA
– 14 – CSCE 513 Fall 2015
Fig 1.10 Performance over last 25-40 years
Processors
Ref. CAAQA
– 15 – CSCE 513 Fall 2015
Quantitative Principles of Design
Take advantage of parallelism
Principle of locality: temporal locality, spatial locality
Focus on the common case
Amdahl's Law
Ref. CAAQA
– 16 – CSCE 513 Fall 2015
Taking Advantage of Parallelism
Logic parallelism – carry-lookahead adder
Word parallelism – SIMD
Instruction pipelining – overlap fetch and execute
Multithreading – executing independent instructions at the same time
Speculative execution – doing work before it is known to be needed (e.g., past an unresolved branch)
Ref. CAAQA
– 17 – CSCE 513 Fall 2015
Principle of Locality
Rule of thumb (Zipf's law?? Not really):
A program spends 90% of its execution time in only 10% of the code.
So what do you try to optimize?
Locality of memory references:
Temporal locality
Spatial locality
– 18 – CSCE 513 Fall 2015
Amdahl's Law
Suppose you have an enhancement or improvement in a design component. The improvement in the performance of the system is limited by the fraction of the time the enhancement can be used:

Speedup_overall = 1 / [(1 - Frac_enhanced) + Frac_enhanced / Speedup_enhanced]
Ref. CAAQA
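As a minimal sketch (function and variable names are my own, not from the text), the formula above can be computed directly:

```python
def amdahl_speedup(frac_enhanced, speedup_enhanced):
    """Overall speedup when a fraction frac_enhanced of execution time
    is accelerated by a factor of speedup_enhanced (Amdahl's Law)."""
    return 1.0 / ((1.0 - frac_enhanced) + frac_enhanced / speedup_enhanced)

# Web-server example from the next slide: 60% of time is I/O (unchanged),
# so 40% of the time is computation sped up 10x.
print(round(amdahl_speedup(0.4, 10), 4))  # 1.5625
```

Note how the unenhanced 60% caps the overall gain far below the 10x component speedup.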
– 19 – CSCE 513 Fall 2015
Amdahl's with Fractional Use Factor
Example: Suppose we are considering an enhancement to a web server. The enhanced CPU is 10 times faster on computation but the same speed on I/O. Suppose also that 60% of the time is waiting on I/O.

Speedup_overall = 1 / [(1 - Frac_enhanced) + Frac_enhanced / Speedup_enhanced]
Ref. CAAQA
– 20 – CSCE 513 Fall 2015
Amdahl's Law revisited
Speedup = (execution time without enhancement) / (execution time with enhancement) = T_wo / T_with
Notes
1. The enhancement will be used only a portion of the time.
2. If it will be rarely used, then why bother trying to improve it?
3. Focus on the improvements that have the highest fraction of use time, denoted Frac_enhanced.
4. Note Frac_enhanced is always less than 1.
Then
Ref. CAAQA
– 21 – CSCE 513 Fall 2015
Amdahl’s with Fractional Use FactorAmdahl’s with Fractional Use Factor
])1[(*ExecTimeExecTime oldnewenhanced
enhancedenhanced Speedup
FracFrac
])1[(
1
)/()(
enhanced
enhancedenhanced
newoldoverall
SpeedupFrac
Frac
ExecTimeExecTimeSpeedup
Ref. CAAQA
– 22 – CSCE 513 Fall 2015
Amdahl's with Fractional Use Factor
Example: Suppose we are considering an enhancement to a web server. The enhanced CPU is 10 times faster on computation but the same speed on I/O. Suppose also that 60% of the time is waiting on I/O. Then Frac_enhanced = 0.4 and Speedup_enhanced = 10:

Speedup_overall = 1 / [(1 - 0.4) + 0.4/10] = 1 / (0.6 + 0.04) = 1/0.64 = 1.5625
Ref. CAAQA
– 23 – CSCE 513 Fall 2015
Graphics Square Root Enhancement (p. 40)
NewDesign1 FPSQRT
• FPSQR is 20% of execution time; speed it up 10 times
NewDesign2 FP
• improve all FP by 1.6; FP = 50% of execution time
Ref. CAAQA
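As a sketch of how the two designs compare under Amdahl's Law (reading the slide as: FPSQR is 20% of execution time, sped up 10 times; all FP is 50% of execution time, sped up 1.6 times):

```python
def amdahl_speedup(frac, enhancement):
    # Amdahl's Law: overall speedup from enhancing a fraction of execution time
    return 1.0 / ((1.0 - frac) + frac / enhancement)

design1 = amdahl_speedup(0.20, 10)   # NewDesign1: speed up FPSQR only
design2 = amdahl_speedup(0.50, 1.6)  # NewDesign2: speed up all FP
print(round(design1, 4), round(design2, 4))  # 1.2195 1.2308
```

Improving the frequent case (all FP) wins, even though its per-operation speedup is much smaller.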
– 24 – CSCE 513 Fall 2015
Geometric Means vs Arithmetic Means
Ref. CAAQA
– 25 – CSCE 513 Fall 2015
Comparing 2 computers: Spec_Ratios
Ref. CAAQA
– 26 – CSCE 513 Fall 2015
Performance Measures
Response time (latency) -- time between start and completion
Throughput (bandwidth) -- rate -- work done per unit time
Processor speed -- e.g., 1 GHz
When does it matter? When does it not?

Speedup = (execution time without enhancement) / (execution time with enhancement)
Ref. CAAQA
– 27 – CSCE 513 Fall 2015
Availability

Module Availability = MTTF / (MTTF + MTTR)
Ref. CAAQA
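A minimal sketch of the availability formula; the MTTF and MTTR values below are illustrative assumptions, not from the text:

```python
def module_availability(mttf_hours, mttr_hours):
    """Fraction of time a module is operational: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical disk: MTTF of 1,000,000 hours, MTTR of 24 hours
print(module_availability(1_000_000, 24))
```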
– 28 – CSCE 513 Fall 2015
MTTF Example
Ref. CAAQA
– 29 – CSCE 513 Fall 2015
Comparing Performance (fig 1.15)

             Computer A   Computer B   Computer C
Program P1        1           10           20
Program P2     1000          100           20
Total time     1001          110           40

Comparing two programs executing on three machines.
Faster-than relationships:
A is 10 times faster than B on program 1
B is 10 times faster than A on program 2
C is 50 times faster than A on program 2
... giving 3 × 2 = 6 comparisons (3-choose-2 computer pairs × 2 programs)
So what is the relative performance of these machines?
Ref. CAAQA
– 30 – CSCE 513 Fall 2015
fig 1.15 Total Execution Times

             Computer A   Computer B   Computer C
Program P1        1           10           20
Program P2     1000          100           20
Total time     1001          110           40

Comparing two programs executing on three machines.
So now what is the relative performance of these machines? B is 1001/110 = 9.1 times as fast as A.
Arithmetic mean execution time = (1/n) × Σ Time_i
Ref. CAAQA
– 31 – CSCE 513 Fall 2015
Weighted Execution Times (fig 1.15)

             Computer A   Computer B   Computer C
Program P1        1           10           20
Program P2     1000          100           20
Total time     1001          110           40

Now assume that we know that P1 will run 90% and P2 10% of the time.
So now what is the relative performance of these machines?
time_A = 0.9 × 1 + 0.1 × 1000 = 100.9
time_B = 0.9 × 10 + 0.1 × 100 = 19
Relative performance of A to B = 100.9/19 = 5.31
Ref. CAAQA
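The weighted-time calculation above can be sketched directly; the numbers come from fig 1.15 and the 90%/10% workload mix on the slide:

```python
times = {"A": {"P1": 1, "P2": 1000}, "B": {"P1": 10, "P2": 100}}
weights = {"P1": 0.9, "P2": 0.1}  # how often each program runs

# Weighted execution time per machine: sum of weight x time over programs
weighted = {m: sum(weights[p] * t[p] for p in weights) for m, t in times.items()}
print(weighted["A"], weighted["B"])             # 100.9 19.0
print(round(weighted["A"] / weighted["B"], 2))  # 5.31
```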
– 32 – CSCE 513 Fall 2015
Geometric Means
Compare ratios of performance to a standard.
Using A as the standard:
program 1: B ratio = 10/1 = 10; C ratio = 20/1 = 20
program 2: B ratio = 100/1000 = .1; C ratio = 20/1000 = .02
B is "twice as fast" as C using A as the standard.
Using B as the standard:
program 1: A ratio = 1/10 = .1; C ratio = 20/10 = 2
program 2: A ratio = 1000/100 = 10; C ratio = 20/100 = .2
So now compare the A-standard and B-standard ratios to each other: you get the same 10 and .1. So what?
Ref. CAAQA
– 33 – CSCE 513 Fall 2015
Geometric Means (fig 1.17)
Measure performance ratios to a standard machine

                  Normalized to A       Normalized to B       Normalized to C
                  A     B      C        A      B     C        A      B     C
P1                1.0   10.0   20.0     .1     1.0   2.0      .05    .5    1.0
P2                1.0   .1     .02      10     1.0   .2       50.    5.0   1.0
Arithmetic mean   1.0   5.05   10.01    5.05   1.0   1.1      25.03  2.75  1.0
Geometric mean    1.0   1.0    .63      1.0    1.0   .63      1.58   1.58  1.0
Total time        1.0   .11    .04      9.1    1.0   .36      25.03  2.75  1.0
Ref. CAAQA
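A sketch of the geometric-mean computation behind fig 1.17, using the execution times from fig 1.15 (variable names are my own):

```python
from math import prod

# Execution times from fig 1.15 (per program, per machine)
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

def geo_mean(xs):
    # Geometric mean: n-th root of the product of n values
    return prod(xs) ** (1.0 / len(xs))

# Ratios normalized to machine A (machine's time / A's time, per program)
ratios = {m: [t / a for t, a in zip(ts, times["A"])] for m, ts in times.items()}
gmeans = {m: geo_mean(r) for m, r in ratios.items()}
print(round(gmeans["B"], 2), round(gmeans["C"], 2))  # 1.0 0.63, matching the table
```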
– 34 – CSCE 513 Fall 2015
CPU Performance Equation
Almost all computers use a clock running at a fixed rate.
Clock rate, e.g., 1 GHz (clock period = 1 ns)
Instruction Count (IC)
CPI = CPUclockCyclesForProgram / InstructionCount
CPUtime = IC × CyclesPerInstruction × ClockCycleTime

CPUtime = CPUclockCyclesForProgram × ClockCycleTime
        = CPUclockCyclesForProgram / ClockRate
Ref. CAAQA
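The equation can be exercised with a tiny helper; the program size, CPI, and clock rate below are illustrative assumptions:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPUtime = IC x CPI / ClockRate, in seconds."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 10^9 instructions, CPI = 2.0, 1 GHz clock
print(cpu_time(1e9, 2.0, 1e9))  # 2.0 (seconds)
```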
– 35 – CSCE 513 Fall 2015
CPU Performance Equation
CPUtime = Instruction Count × CPI × Clock cycle time

CPUtime = (Instructions/Program) × (ClockCycles/Instruction) × (Seconds/ClockCycle)
        = Seconds/Program

CPUcycles = Σ (i = 1 to n) IC_i × CPI_i
Ref. CAAQA
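The summation can be sketched over a hypothetical instruction mix (class names, counts, and CPIs below are invented for illustration):

```python
# Hypothetical instruction mix: (class, IC_i, CPI_i)
mix = [
    ("ALU",    5_000_000, 1.0),
    ("load",   2_000_000, 2.0),
    ("branch", 1_000_000, 1.5),
]

cpu_cycles = sum(ic * cpi for _name, ic, cpi in mix)  # sum of IC_i x CPI_i
total_ic = sum(ic for _name, ic, _cpi in mix)
print(cpu_cycles)              # 10500000.0 total cycles
print(cpu_cycles / total_ic)   # average CPI = 1.3125
```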
– 36 – CSCE 513 Fall 2015
Fallacies and Pitfalls
1. Pitfall: Falling prey to Amdahl's law.
2. Pitfall: A single point of failure.
3. Fallacy: The cost of the processor dominates the cost of the system.
4. Fallacy: Benchmarks remain valid indefinitely.
5. Fallacy: The rated mean time to failure of disks is 1,200,000 hours (almost 140 years), so disks practically never fail.
6. Fallacy: Peak performance tracks observed performance.
7. Pitfall: Fault detection can lower availability.
Ref. CAAQA
– 37 – CSCE 513 Fall 2015
List of Appendices
Ref. CAAQA
– 38 – CSCE 513 Fall 2015
Homework Set #2
1. 1.8 a-d (change 2015 to 2025 throughout the question)
2. 1.9
3. 1.12
4. 1.18
5. Matrix multiply (mm.c will be emailed and placed on the website)
   a. Compile with gcc -S
   b. Compile with gcc -O2 -S and note differences

George K. Zipf (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley.
– 39 – CSCE 513 Fall 2015
1.8 [10/15/15/10/10] <1.4, 1.5> One challenge for architects is that the design created today will require several years of implementation, verification, and testing before appearing on the market. This means that the architect must project what the technology will be like several years in advance. Sometimes, this is difficult to do.
a. [10] <1.4> According to the trend in device scaling observed by Moore's law, the number of transistors on a chip in 2015 should be how many times the number in 2005?
b. [15] <1.5> The increase in clock rates once mirrored this trend. Had clock rates continued to climb at the same rate as in the 1990s, approximately how fast would clock rates be in 2015?
c. [15] <1.5> At the current rate of increase, what are the clock rates now projected to be in 2015?
d. [10] <1.4> What has limited the rate of growth of the clock rate, and what are architects doing with the extra transistors now to increase performance?
Patterson, David A.; Hennessy, John L. (2011). Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design), Kindle Locations 2203-2217. Elsevier Science. Kindle Edition.
– 40 – CSCE 513 Fall 2015
1.9 [10/10] <1.5> You are designing a system for a real-time application in which specific deadlines must be met. Finishing the computation faster gains nothing. You find that your system can execute the necessary code, in the worst case, twice as fast as necessary.
a. [10] <1.5> How much energy do you save if you execute at the current speed and turn off the system when the computation is complete?
b. [10] <1.5> How much energy do you save if you set the voltage and frequency to be half as much?
Patterson, David A.; Hennessy, John L. (2011). Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design), Kindle Locations 2218-2224. Elsevier Science. Kindle Edition.
– 41 – CSCE 513 Fall 2015
1.12 [20/20/20] <1.1, 1.2, 1.7> In a server farm such as that used by Amazon or eBay, a single failure does not cause the entire system to crash. Instead, it will reduce the number of requests that can be satisfied at any one time.
a. [20] <1.7> If a company has 10,000 computers, each with an MTTF of 35 days, and it experiences catastrophic failure only if 1/3 of the computers fail, what is the MTTF for the system?
b. [20] <1.1, 1.7> If it costs an extra $1000, per computer, to double the MTTF, would this be a good business decision? Show your work.
c. [20] <1.2> Figure 1.3 shows, on average, the cost of downtimes, assuming that the cost is equal at all times of the year. For retailers, however, the Christmas season is the most profitable (and therefore the most costly time to lose sales). If a catalog sales center has twice as much traffic in the fourth quarter as every other quarter, what is the average cost of downtime per hour during…
Patterson, David A.; Hennessy, John L. (2011). Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design), Kindle Locations 2250-2257. Elsevier Science. Kindle Edition.
– 42 – CSCE 513 Fall 2015
1.18 [10/20/20/20/25] <1.10> When parallelizing an application, the ideal speedup is speeding up by the number of processors. This is limited by two things: the percentage of the application that can be parallelized and the cost of communication. Amdahl's law takes into account the former but not the latter.
a. [10] <1.10> What is the speedup with N processors if 80% of the application is parallelizable, ignoring the cost of communication?
b. [20] <1.10> What is the speedup with 8 processors if, for every processor added, the communication overhead is 0.5% of the original execution time?
c. [20] <1.10> What is the speedup with 8 processors if, for every time the number of processors is doubled, the communication overhead is increased by 0.5% of the original execution time?
– 43 – CSCE 513 Fall 2015
d. [20] <1.10> What is the speedup with N processors if, for every time the number of processors is doubled, the communication overhead is increased by 0.5% of the original execution time?
e. [25] <1.10> Write the general equation that solves this question: What is the number of processors with the highest speedup in an application in which P% of the original execution time is parallelizable, and, for every time the number of processors is doubled, the communication is increased by 0.5% of the original execution time?
Patterson, David A.; Hennessy, John L. (2011). Computer Architecture: A Quantitative Approach (The Morgan Kaufmann Series in Computer Architecture and Design), Kindle Locations 2327-2331. Elsevier Science. Kindle Edition.