© Copyright by Alaa Alameldeen and Haitham Akkary 2018
ECE 588/688
Advanced Computer
Architecture II
Instructor: Alaa Alameldeen
alaa@ece.pdx.edu
Winter 2018
Portland State University
Portland State University – ECE 588/688 – Winter 2018
When and Where?
When: Monday & Wednesday 5:00-6:50 PM
Where: PCC Willow Creek 312
Office hours: After class, or by appointment
TA: Aarti Rane
Office hours: TBD (starting next week)
Webpage: http://www.cecs.pdx.edu/~alaa/ece588/
Go to webpage for:
Class slides and papers
Homework and project assignments
Changes in class schedule
Course Description
Parallel computing and multiprocessors
Symmetric Multiprocessors (SMPs)
Chip Multiprocessors (CMPs), aka multi-core processors
Multithreading and parallel programming models
Multiprocessor memory systems
Emphasis on paper readings, NOT on a textbook
Tutorial papers
Original sources and ideas papers
Papers covering most recent trends
Expected Background
ECE 587/687 or equivalent
Superscalar processor microarchitecture
Branch prediction
Cache organization
Memory ordering
Speculative execution
Multithreading
Programming experience in “C”
Grading Policy
Class Participation 5%
Paper reviews 10%
Homeworks 20%
Project 25%
Midterm Exam 20%
Final Exam 20%
Grading Scale:
A: 92-100%
A-: 86-91.5%
B+: 80-85.5%
B: 76-79.5%
B-: 72-75.5%
C+: 68-71.5%
C: 64-67.5%
C-: 60-63.5%
D+: 57-59.5%
D: 54-56.5%
D-: 50-53.5%
F: Below 50%
Important Dates
Midterm Exam: Monday, February 12th, in class
Final Exam: Wednesday March 14th, in class
Project Presentations: Friday March 16th, 7-9 PM, location TBD
Project Reports Due: Tuesday March 20th (email)
No class next Monday (holiday). Other classes may get combined/cancelled/moved.
Why Study Computer Architecture
Technology advancements require continuous optimization of cost, performance, and power
Moore’s law
Original version: transistor scaling is exponential
Popular version: processor performance increases exponentially
Innovation needed to satisfy market trends
User and software requirements keep changing
Software developers expect improvements in computing power
Why Study Parallel Computing
Technology made multi-core processors both feasible AND necessary for performance
Moore’s law: more transistors on a die than can be used (efficiently) by a single processor
Traditional out-of-order processors face memory and power walls
Software requirements need more computing power than a single processor can provide
Scientific computations
Commercial applications
Moore’s Law (1965)
[Chart: #Transistors per chip (Intel), 1971-2004, log scale from 1,000 to 10,000,000,000]
Almost 75% increase per year
Memory Wall
[Chart: CPU cycle time vs. DRAM latency, time (ns) on a log scale, 1982-2006]
CPU cycle time: 500 times faster since 1982
DRAM latency: only ~5 times faster since 1982
Why Not Build Larger OoO Processors?
Memory Wall
1980: memory latency ~ 1 instruction
Now: ~ 1000 instructions
Power Wall
Dynamic power increases quadratically with frequency; static power increases exponentially
ILP Limitations
Serial programs don’t have an infinite amount of instructions to execute in parallel
Other Problems
Temperature & cooling
On-chip interconnect
Complexity & testing
Technology Trends: Growing #Transistors Causes Inflection Points
Most famous inflection point enabled the simple microprocessor: the 1971 Intel 4004
2300 transistors
Could access 300 bytes of memory (0.0003 megabytes)
Bigger and better-performing processors were enabled by more transistors
©2006, Prof. Mark Hill
Microprocessor Design Complexity
Millions of Transistors
Bit-level Parallelism (BLP)
64-bit multiply in one/two cycles
Instruction-Level Parallelism (ILP)
Dozens of instructions in flight
Branch prediction
Out-of-order execution
Dominating Use of Caches
Level-one cache
Level-two cache
Emerging Chip Multiprocessing
[Die photo: PowerPC 750 (1998), showing L1 instruction cache, L1 data cache, and L2 cache tags]
Multi-core Processors (Chip Multiprocessors)
Replicate processor “core” & caches
Uses more transistors
Tolerates “memory wall”
Simpler lower-power cores
Reduces complexity
[Diagram: CPU 0 + L1 cache and CPU 1 + L1 cache above a shared L2 cache — IBM Power 4]
Software Trends
TPC-C throughput increased by more than 75% per year over eight years
[Chart: Transactions per minute (tpmC), 0 to 7,000,000, Dec-99 through Jul-09]
What about desktop applications?
What about scientific applications?
Multi-Core Performance
Recall Popular Moore’s Law:
Microprocessor performance doubles every 1.5-2 years
Future Multi-Core Performance Doublings Require:
Effective multithreaded programming
Better communication
Faster synchronization
More cores
Multi-core Processor Architecture
Which design is better?
Balance cores, caches, and pin bandwidth
[Diagram: two alternatives — a few large cores (Core0, Core1) with large caches vs. many small cores (Core0 through Core15) with smaller caches]
Programming Models
Single-core
One program runs on the core
Programmers write efficient programs
Architects design for fast execution of instructions
Multi-core
Each core can run a thread/task/program
Programmers can split up their programs?
Multiprocessors
Shared memory
Message passing
Threads
Data Parallel
Introduction to Parallel Computing
Serial vs. Parallel Computation (tutorial pictures)
Parallel computing includes:
Break up a problem into smaller discrete parts that can be solved concurrently
Break each part further into a sequence of (serial) instructions
All parts run simultaneously on different CPUs
Why use parallel computing?
Save time
Solve larger problems
Provide concurrency (do more than one thing at the same time)
Save money (use multiple cheaper resources vs. a single expensive supercomputer)
Use more memory than is available on a single computer
Use non-local resources
Who uses parallel computing and why? (tutorial graphs)
Reading Assignment
Introduction to Parallel Computing, Lawrence Livermore National Laboratory tutorial
Read before Wednesday’s class
Wed 1/17: Olukotun et al., “The Case for a Single-Chip Multiprocessor,” ASPLOS 1996
Submit a review before the beginning of class:
Paper summary
Strong points (2-4 points)
Weak points (2-4 points)
Review due before 5 PM on the day of class
Send by email to aarane AT pdx DOT edu
Plain text, no attachments
Subject line must be: “ECE588_REVIEW_01_17”