![Page 1: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/1.jpg)
Computer Architecture CSE 3322
Lecture 7
Assignment: 2.10, 2.11, 2.13, 2.18 Due Mon 9/22
TEST 1 - Mon 9/29
Lectures 1 - 7, Ch 2, 3
Web Site: crystal.uta.edu/~jpatters/cse3322
![Page 2: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/2.jpg)
Send email to Pramod Kumar, [email protected], with the names and emails of your four project team members by Mon Sept 15. If you are not on a team, send your email address to Pramod.
![Page 6: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/6.jpg)
CPI
Invest Resources where time is Spent!
CPI = Clock Cycles / Instruction Count = (CPU Time * Clock Rate) / Instruction Count
“Average clock cycles per instruction”
CPU Time = Instruction Count x CPI / Clock Rate = Instruction Count x CPI x Clock Cycle Time
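$$\text{Average CPI} = \frac{\sum_{i=1}^{n} \text{CPI}(i) \cdot I(i)}{\text{Instruction Count}}$$
$$\text{Average CPI} = \sum_{i=1}^{n} \text{CPI}(i) \cdot F(i)$$
where I(i) is the count of instructions of class i and F(i) = I(i) / Instruction Count is the Instruction Frequency.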
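The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample mix are assumptions made for this sketch, not part of the lecture.

```python
def average_cpi(mix):
    """Average CPI = sum of CPI(i) * F(i).

    mix is a list of (frequency, cycles) pairs whose frequencies sum to 1.
    """
    return sum(freq * cycles for freq, cycles in mix)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count x CPI / Clock Rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical two-class mix, chosen only to exercise the functions.
sample_mix = [(0.5, 1), (0.5, 3)]
cpi = average_cpi(sample_mix)
print(f"average CPI = {cpi:.2f}")
print(f"CPU time    = {cpu_time(1_000_000, cpi, 100e6):.4f} s")
```

Passing the clock rate rather than the cycle time is an arbitrary choice here; the two forms of the CPU Time equation are equivalent.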
![Page 11: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/11.jpg)
CPI Example
Suppose we have two implementations of the same instruction set. For some program with instruction count I:
Machine A has a clock cycle time of 10 ns and a CPI of 2.0. CPU Time = I * 2.0 * 10 ns = I * 20 ns
Machine B has a clock cycle time of 20 ns and a CPI of 1.2. CPU Time = I * 1.2 * 20 ns = I * 24 ns
Which machine is faster for this program, and by how much? A is 24/20 = 1.2 times faster than B.
Note: CPI is smaller for B, yet B is the slower machine.
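A quick check of this comparison, as a sketch. Both machines run the same program on the same instruction set, so the instruction count cancels and only CPI times cycle time matters. The helper name is made up for this illustration.

```python
def time_per_instruction_ns(cpi, cycle_time_ns):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ns


machine_a = time_per_instruction_ns(cpi=2.0, cycle_time_ns=10)
machine_b = time_per_instruction_ns(cpi=1.2, cycle_time_ns=20)

# Same instruction count on both machines, so the ratio of per-instruction
# times equals the ratio of CPU times.
ratio = machine_b / machine_a
print(f"A: {machine_a:.0f} ns/instr, B: {machine_b:.0f} ns/instr")
print(f"A is {ratio:.1f} times faster than B")
```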
![Page 15: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/15.jpg)
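Example (RISC processor)
Typical Mix, Base Machine (Reg / Reg)

| Op | Freq | Cycles | F(i) * CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | 23% |
| Load | 20% | 5 | 1.0 | 45% |
| Store | 10% | 3 | 0.3 | 14% |
| Branch | 20% | 2 | 0.4 | 18% |
| Total | | | 2.2 = CPI ave | |

$$\%\,\text{Time}(i) = \frac{\text{CPU Time}(i)}{\text{CPU Time}} = \frac{\text{Instr Cnt}(i) \cdot \text{CPI}(i) \cdot \text{Clk Cycle Time}}{\text{Inst Cnt} \cdot \text{CPI}_{ave} \cdot \text{Clk Cycle Time}} = \frac{F(i) \cdot \text{CPI}(i)}{\text{CPI}_{ave}}$$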
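The CPI ave and % Time columns can be reproduced with a short sketch. The dictionary layout and function name are assumptions of this illustration; the frequencies and cycle counts are the ones from the slide.

```python
# (frequency, cycles) per instruction class, taken from the slide.
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def cpi_breakdown(mix):
    """Return (average CPI, fraction of time spent per class)."""
    contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    cpi_ave = sum(contribution.values())
    # % Time(i) = F(i) * CPI(i) / CPI ave
    share = {op: value / cpi_ave for op, value in contribution.items()}
    return cpi_ave, share


cpi_ave, share = cpi_breakdown(MIX)
print(f"CPI ave = {cpi_ave:.1f}")
for op, fraction in share.items():
    print(f"{op:<6} {fraction:.0%}")
```

Loads are only 20% of the instructions but account for almost half of the time, which is where the investment should go.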
![Page 17: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/17.jpg)
Example (RISC processor)
Typical Mix, Base Machine (Reg / Reg). Values in parentheses are with the better data cache.

| Op | Freq | Cycles | F(i) * CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | 23% |
| Load | 20% | 5 (2) | 1.0 (0.4) | 45% |
| Store | 10% | 3 | 0.3 | 14% |
| Branch | 20% | 2 | 0.4 | 18% |
| Total | | | 2.2 (1.6) | |

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
CPU Time = Inst Cnt * CPI ave * Clk Cycle Time
Speedup = 2.2 / 1.6 = 1.375
![Page 19: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/19.jpg)
Example (RISC processor)
Typical Mix, Base Machine (Reg / Reg). Values in parentheses are with branch prediction.

| Op | Freq | Cycles | F(i) * CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 | 0.5 | 23% |
| Load | 20% | 5 | 1.0 | 45% |
| Store | 10% | 3 | 0.3 | 14% |
| Branch | 20% | 2 (1) | 0.4 (0.2) | 18% |
| Total | | | 2.2 (2.0) | |

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time? CPI = 2.0
![Page 21: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/21.jpg)
Example (RISC processor)
Typical Mix, Base Machine (Reg / Reg). Values in parentheses are with two ALU instructions executed at once.

| Op | Freq | Cycles | F(i) * CPI(i) | % Time |
|---|---|---|---|---|
| ALU | 50% | 1 (0.5) | 0.5 (0.25) | 23% |
| Load | 20% | 5 | 1.0 | 45% |
| Store | 10% | 3 | 0.3 | 14% |
| Branch | 20% | 2 | 0.4 | 18% |
| Total | | | 2.2 (1.95) | |

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? CPI = 1.6
How does this compare with using branch prediction to shave a cycle off the branch time? CPI = 2.0
What if two ALU instructions could be executed at once? CPI = 1.95
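The three what-if questions can be compared side by side with a small sketch. It assumes, as the slides do, that the instruction count and clock cycle time stay fixed, so speedup is just the ratio of average CPIs. Modeling dual ALU issue as 0.5 cycles per ALU instruction follows the slide; the names `BASE_MIX` and `speedup_if` are invented for this illustration.

```python
# (frequency, cycles) per instruction class for the base machine.
BASE_MIX = {
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def average_cpi(mix):
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_if(op, new_cycles, mix=BASE_MIX):
    """Speedup from changing one class's cycle count, all else unchanged."""
    improved = dict(mix)
    improved[op] = (mix[op][0], new_cycles)
    return average_cpi(mix) / average_cpi(improved)


scenarios = [
    ("better data cache (load = 2 cycles)", "Load", 2.0),
    ("branch prediction (branch = 1 cycle)", "Branch", 1.0),
    ("two ALU ops at once (ALU = 0.5 cycle)", "ALU", 0.5),
]
for label, op, cycles in scenarios:
    print(f"{label}: speedup {speedup_if(op, cycles):.3f}")
```

Under these assumptions the data cache improvement gives the largest gain, consistent with loads having the largest share of execution time.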
![Page 25: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/25.jpg)
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions:
Class A has 1 cycle, Class B has 2 cycles, Class C has 3 cycles
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C. Cycles: 2*1 + 1*2 + 2*3 = 10
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Cycles: 4*1 + 1*2 + 1*3 = 9
Which sequence will be faster? How much? The second sequence, by 10 / 9 = 1.11
What is the CPI for each sequence? First: 10/5 = 2. Second: 9/6 = 1.5
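The same arithmetic as a sketch, with the class cycle counts and instruction counts taken from the problem statement. The dictionary and function names are assumptions for this illustration.

```python
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """Total clock cycles for a sequence given its per-class instruction counts."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def cpi(counts):
    """CPI = clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


first = {"A": 2, "B": 1, "C": 2}   # 5 instructions
second = {"A": 4, "B": 1, "C": 1}  # 6 instructions

for name, seq in [("first", first), ("second", second)]:
    print(f"{name}: {total_cycles(seq)} cycles, CPI = {cpi(seq):.2f}")
print(f"second is {total_cycles(first) / total_cycles(second):.2f} times faster")
```

The second sequence executes more instructions but finishes in fewer cycles, so instruction count alone does not predict which sequence is faster.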
![Page 29: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/29.jpg)
A popular performance metric is MIPS, the number of millions of instructions per second.
For a given program,
$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution time} \times 10^{6}}$$
1. Cannot compare if instruction set is different
2. Highly dependent on the program
3. Can be inversely proportional to performance
![Page 30: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/30.jpg)
MIPS example
Two different compilers are being tested for a 100 MHz machine with three different classes of instructions: Class A has 1 cycle, Class B has 2 cycles, Class C has 3 cycles

Instruction counts (billions)

| Code from | A | B | C |
|---|---|---|---|
| Compiler 1 | 5 | 1 | 1 |
| Compiler 2 | 10 | 1 | 1 |

• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
![Page 37: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/37.jpg)
MIPS example
Two different compilers are being tested for a 100 MHz machine with three different classes of instructions:

| | Class A | Class B | Class C |
|---|---|---|---|
| CPI | 1 | 2 | 3 |

Instruction counts (billions)

| Code from | A | B | C | Total |
|---|---|---|---|---|
| Compiler 1 | 5 | 1 | 1 | 7 |
| Compiler 2 | 10 | 1 | 1 | 12 |

| | CPU cycles | Exec Time | MIPS |
|---|---|---|---|
| Compiler 1 | 5 + 1x2 + 1x3 = 10 billion | $10^{10} \times 10^{-8}$ = 100 sec | $7 \times 10^{3} / 100$ = 70 |
| Compiler 2 | 15 billion | 150 sec | $12 \times 10^{3} / 150$ = 80 |

Compiler 2 has the higher MIPS rating (80 vs. 70), but Compiler 1 has the shorter execution time (100 sec vs. 150 sec).
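The table values can be recomputed with a short sketch. The 100 MHz clock, class CPIs, and instruction counts come from the slide; the function name and data layout are assumptions for this illustration.

```python
CLOCK_RATE_HZ = 100e6                 # 100 MHz, so one cycle is 10 ns
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def evaluate(counts_in_billions):
    """Return (execution time in seconds, MIPS) for one compiler's output."""
    instructions = sum(counts_in_billions.values()) * 1e9
    cycles = sum(CLASS_CPI[c] * n for c, n in counts_in_billions.items()) * 1e9
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips


compilers = {
    "Compiler 1": {"A": 5, "B": 1, "C": 1},
    "Compiler 2": {"A": 10, "B": 1, "C": 1},
}
for name, counts in compilers.items():
    exec_time, mips = evaluate(counts)
    print(f"{name}: {exec_time:.0f} sec, {mips:.0f} MIPS")
```

The extra class A instructions from Compiler 2 are cheap, so they raise the instruction rate while still lengthening the run, which is how MIPS ends up moving opposite to performance here.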
![Page 39: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/39.jpg)
Benchmarks
• Performance best determined by running a real application
– Use programs typical of expected workload
– Or, typical of expected class of applications, e.g., compilers/editors, scientific applications, graphics, etc.
• Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused
![Page 43: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/43.jpg)
Benchmarks
• SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real programs and inputs
– can still be abused
– valuable indicator of performance (and compiler technology)
![Page 47: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/47.jpg)
SPEC95
• Eighteen application benchmarks (with inputs) reflecting a technical computing workload
• Eight integer applications
– go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
• Ten floating-point intensive applications
– tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
• Must run with standard compiler flags
– eliminate special undocumented incantations that may not even generate working code for real programs
![Page 48: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/48.jpg)
SPEC fp
[Figure: SPECfp rating (axis 0 to 10) versus clock rate (axis 50 to 250 MHz) for the Pentium and Pentium Pro]
![Page 52: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/52.jpg)
Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
Improved Time = (100 - 80) + 80/n = 100/4
20 + 80/n = 25
80 = 5n ; n = 16
![Page 55: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/55.jpg)
Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + ( Execution Time Affected / Amount of Improvement )
• Example: Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?
Improved Time = (100 - 80) + 80/n = 100/5
20 + 80/n = 20
80/n = 0. Impossible! The 20 seconds unaffected by the improvement already equal the target time, so no finite n is enough.
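Both cases of the example can be checked by solving Amdahl's Law for the required improvement n. This is a sketch: the function name and the choice to return `None` for an unreachable target are assumptions of this illustration.

```python
def required_improvement(total_time, affected_time, target_speedup):
    """Solve unaffected + affected / n = total / target_speedup for n.

    Returns None when the target cannot be reached by any finite n,
    i.e. when the unaffected time alone meets or exceeds the target time.
    """
    unaffected = total_time - affected_time
    target_time = total_time / target_speedup
    remaining = target_time - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None
    return affected_time / remaining


for speedup in (4, 5):
    n = required_improvement(total_time=100, affected_time=80, target_speedup=speedup)
    if n is None:
        print(f"{speedup} times faster: impossible")
    else:
        print(f"{speedup} times faster: multiply must be {n:.0f} times faster")
```

With 20 of the 100 seconds untouched, the overall speedup approaches 5 only in the limit as n grows without bound.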
![Page 59: Computer Architecture CSE 3322](https://reader036.vdocument.in/reader036/viewer/2022062315/56814d52550346895dba8b97/html5/thumbnails/59.jpg)
Remember
• Performance is specific to a particular program/s
– Total execution time is a consistent summary of performance
• For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count