© Copyright by Alaa Alameldeen and Haitham Akkary 2018

ECE 588/688 Advanced Computer Architecture II

Instructor: Alaa Alameldeen

[email protected]

Winter 2018

Portland State University

Portland State University – ECE 588/688 – Winter 2018 2

When and Where?

When: Monday & Wednesday 5:00-6:50 PM

Where: PCC Willow Creek 312

Office hours: After class, or by appointment

TA: Aarti Rane

Office hours: TBD (starting next week)

Webpage: http://www.cecs.pdx.edu/~alaa/ece588/

Go to webpage for:

Class slides and papers

Homework and project assignments

Changes in class schedule


Course Description

Parallel computing and multiprocessors

Symmetric Multiprocessors (SMPs)

Chip Multiprocessors (CMPs), aka multi-core processors

Multithreading and parallel programming models

Multiprocessor memory systems

Emphasis on paper readings, NOT on a textbook

Tutorial papers

Original sources and ideas papers

Papers covering most recent trends


Expected Background

ECE 587/687 or equivalent

Superscalar processor microarchitecture

Branch prediction

Cache organization

Memory ordering

Speculative execution

Multithreading

Programming experience in “C”


Grading Policy

Class Participation 5%

Paper reviews 10%

Homework 20%

Project 25%

Midterm Exam 20%

Final Exam 20%

Grading Scale: A: 92-100%

A-: 86-91.5%

B+: 80-85.5%

B: 76-79.5%

B-: 72-75.5%

C+: 68-71.5%

C: 64-67.5%

C-: 60-63.5%

D+: 57-59.5%

D: 54-56.5%

D-: 50-53.5%

F: Below 50%


Important Dates

Midterm Exam: Monday, February 12th, in class

Final Exam: Wednesday, March 14th, in class

Project Presentations: Friday, March 16th, 7-9 PM, location TBD

Project Reports Due: Tuesday, March 20th (email)

No class next Monday (holiday). Other classes may get combined/cancelled/moved.


Why Study Computer Architecture

Technology advancements require continuous optimization of cost, performance, and power

Moore’s law

Original version: Transistor scaling exponential

Popular version: Processor performance exponentially increasing

Innovation needed to satisfy market trends

User and software requirements keep on changing

Software developers expecting improvements in computing power


Why Study Parallel Computing

Technology made multi-core processors both feasible AND necessary for performance

Moore’s law: more transistors on a die than can be used (efficiently) by a single processor

Traditional out-of-order processors face memory and power walls

Software requirements need more computing power than a single processor can provide

Scientific computations

Commercial applications


Moore’s Law (1965)

[Figure: number of transistors per chip (Intel), log scale from 1,000 to 10,000,000,000, by year from 1971 to 2004. Annotation: almost 75% increase per year.]
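As a rough check on what a constant annual growth rate implies, the sketch below converts a yearly rate into a doubling time and a cumulative growth factor. The function names are illustrative, and the 75% figure is simply the chart annotation, not a value fitted to data.

```python
import math

def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate (0.75 = 75%)."""
    return math.log(2) / math.log(1 + annual_growth)

def cumulative_factor(annual_growth, years):
    """Total growth factor after compounding for the given number of years."""
    return (1 + annual_growth) ** years

if __name__ == "__main__":
    rate = 0.75  # "almost 75% increase per year", taken from the chart annotation
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
    print(f"growth over 10 years: {cumulative_factor(rate, 10):.0f}x")
```

At 75% per year the doubling time works out to roughly 1.24 years, which is why a small change in the yearly rate makes a large difference over a decade.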


Memory Wall

[Figure: CPU cycle time and DRAM latency, time (ns) on a log scale from 0.1 to 1000, by year from 1982 to 2006]

CPU cycle time: 500 times faster since 1982

DRAM Latency: Only ~5 times faster since 1982
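The two factors above can be combined into a single number: how much more expensive a DRAM access has become when counted in CPU cycles. A minimal sketch using only the two figures quoted on this slide; the function name is illustrative.

```python
def relative_latency_growth(cpu_speedup, dram_speedup):
    """Factor by which DRAM latency, counted in CPU cycles, has grown.

    Latency in cycles = dram_latency / cycle_time, so if cycle time shrinks
    by cpu_speedup and DRAM latency shrinks by dram_speedup, the cycle count
    grows by cpu_speedup / dram_speedup.
    """
    return cpu_speedup / dram_speedup

if __name__ == "__main__":
    # Factors quoted on the slide: CPU cycle time 500x faster, DRAM ~5x faster.
    growth = relative_latency_growth(500, 5)
    print(f"DRAM latency in CPU cycles grew about {growth:.0f}x since 1982")
```

With the slide's numbers, a memory access costs about 100 times more cycles than it did in 1982, before counting the extra instructions a wider processor could have issued per cycle.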


Why Not Build Larger OoO Processors?

Memory Wall

1980 Memory Latency ~ 1 instruction

Now ~ 1000 instructions

Power Wall

Dynamic power is proportional to C·V²·f, so it grows faster than linearly with frequency when supply voltage must rise too; static power increases exponentially

ILP Limitations

Serial programs don’t have an unlimited number of independent instructions to execute in parallel

Other Problems

Temperature & cooling

On-chip interconnect

Complexity & testing
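To illustrate the power wall, the sketch below uses the standard first-order model for dynamic power, P = a·C·V²·f (activity factor a, switched capacitance C, supply voltage V, frequency f). The assumption that supply voltage rises in proportion to frequency is a simplification made here for illustration, and all values are normalized rather than measured.

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """First-order CMOS dynamic power: P = a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency

def power_ratio_when_scaling(freq_scale, voltage_tracks_frequency=True):
    """Dynamic power relative to baseline after scaling frequency.

    If voltage is assumed to scale linearly with frequency (a simplification),
    power grows with the cube of the frequency scale; otherwise linearly.
    """
    v_scale = freq_scale if voltage_tracks_frequency else 1.0
    base = dynamic_power(1.0, 1.0, 1.0, 1.0)
    return dynamic_power(1.0, 1.0, v_scale, freq_scale) / base

if __name__ == "__main__":
    for s in (1.0, 1.5, 2.0):
        ratio = power_ratio_when_scaling(s)
        print(f"{s:.1f}x frequency -> {ratio:.2f}x dynamic power")
```

Under these assumptions, doubling frequency costs about 8x the dynamic power, whereas two cores at the original frequency and voltage cost about 2x, which is one way to read the case for multi-core designs.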


Technology Trends: Growing #Transistors Causes Inflection Points

Most famous inflection point enabled a simple microprocessor: the 1971 Intel 4004

2300 transistors

Could access 300 bytes of memory (0.0003 megabytes)

Bigger and better-performing processors were enabled by more transistors

©2006, Prof. Mark Hill


Microprocessor Design Complexity

Millions of Transistors

Bit-level Parallelism (BLP)

64-bit multiply in one/two cycles

Instruction-Level Parallelism (ILP)

Dozens of instructions in flight

Branch prediction

Out-of-order

Dominating Use of Caches

Level-one cache

Level-two cache

Emerging Chip Multiprocessing

[Figure: PowerPC 750 (1998), with labels for the L1 Instruction Cache, L1 Data Cache, and L2 Cache Tags]


Multi-core Processors (Chip Multiprocessors)

[Figure labels: CPU 0 + L1 Cache, CPU 1 + L1 Cache, Shared L2 Cache]

Replicate processor “core” & caches

Uses more transistors

Tolerates “memory wall”

Simpler lower-power cores

Reduces complexity

IBM Power 4


Software Trends

TPC-C throughput increased by more than 75% per year over eight years

What about desktop applications?

What about scientific applications?

[Figure: TPC-C throughput in transactions per minute (tpmC), from 0 to 7,000,000, by date from Dec-99 to Jul-09]


Multi-Core Performance

Recall Popular Moore’s Law:

Microprocessor performance doubles every 1.5-2 years

Future Multi-Core Performance Doublings Require

Effective multithreaded programming

Better communication

Faster synchronization

More cores
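One common way to see why more cores alone do not guarantee further doublings is Amdahl's law. The slide does not name it; it is brought in here purely as an illustrative model. The sketch assumes the parallel fraction of a program divides perfectly across cores, with no communication or synchronization cost.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

if __name__ == "__main__":
    for p in (0.50, 0.90, 0.99):
        speedups = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (2, 4, 16)
        )
        print(f"parallel fraction {p:.2f} -> {speedups}")
```

In this model the serial fraction caps the achievable speedup, so adding cores pays off only if the software is also restructured to expose more parallel work.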


Multi-core Processor Architecture

Which design is better?

Balance cores, caches, and pin bandwidth

[Figure: two candidate designs, one with 2 cores (Core0, Core1) and caches, the other with 16 cores (Core0 to Core15) and caches]


Programming Models

Single-core

One program runs on the core

Programmers write efficient programs

Architects design for fast execution of instructions

Multi-core

Each core can run a thread/task/program

Programmers can split up their programs?

Multiprocessors

Shared memory

Message passing

Threads

Data Parallel
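The difference between the shared-memory and message-passing models can be sketched with the same small task written both ways. This is only an illustration: both versions use Python threads in one process, whereas real message-passing systems typically run separate processes with separate address spaces. The function names are hypothetical.

```python
import queue
import threading

def sum_shared_memory(values, num_threads=4):
    """Shared-memory style: threads update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # private work, no sharing needed
        with lock:                # synchronize the shared update
            total += partial

    threads = [threading.Thread(target=worker, args=(values[i::num_threads],))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_message_passing(values, num_threads=4):
    """Message-passing style: workers send partial sums; no shared total."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))   # communicate the result as a message

    threads = [threading.Thread(target=worker, args=(values[i::num_threads],))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(num_threads))

if __name__ == "__main__":
    data = list(range(1000))
    print(sum_shared_memory(data), sum_message_passing(data))
```

In the first version correctness depends on the lock around the shared variable; in the second, the only interaction between workers is the explicit message, which is the trade-off the two models represent.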


Introduction to Parallel Computing

Serial vs. Parallel Computation (tutorial pictures)

Parallel computing includes

Break up problem into smaller discrete parts that can be solved concurrently

Break each part further into a sequence of (serial) instructions

All parts are run simultaneously on different CPUs
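The three steps above can be sketched as follows, using a toy problem (counting even numbers) chosen only for illustration. A thread pool stands in for "different CPUs" here; in standard CPython builds threads do not execute Python code simultaneously, so a process pool or another language would be needed for real speedup. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(items, num_parts):
    """Break the problem into num_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts

def count_evens(part):
    """The serial instruction sequence applied to a single part."""
    return sum(1 for x in part if x % 2 == 0)

def parallel_count_evens(items, num_workers=4):
    """Submit every part to the pool, then combine the partial results."""
    parts = split_into_parts(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(count_evens, parts))

if __name__ == "__main__":
    print(parallel_count_evens(list(range(10))))
```

The decomposition works here because each part can be processed without looking at any other part; problems with dependencies between parts need communication or synchronization on top of this pattern.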

Why use parallel computing?

Save time

Solve larger problems

Provide concurrency (do more than one thing at the same time)

Save money (use multiple cheaper resources vs. a single expensive supercomputer)

Use more memory than available on a single computer

Use non-local resources

Who uses parallel computing and why? (tutorial graphs)


Reading Assignment

Introduction to Parallel Computing, Lawrence Livermore National Laboratory tutorial

Read before Wed class

Wed 1/17: Olukotun et al., “The Case for a Single-Chip Multiprocessor,” ASPLOS 1996

Submit review before the beginning of class:

Paper summary

Strong points (2-4 points)

Weak points (2-4 points)

Review due before 5PM on day of class

Send by email to aarane AT pdx DOT edu

Plain text, no attachments

Subject line has to be: “ECE588_REVIEW_01_17”