lecture 1 introduction - nus computing - homesooyj/cs5222_l1.pdf · 2015-01-08 · parallelism) for...

CS5222 Advanced Computer Architecture Lecture 1 Introduction

Upload: vancong

Post on 25-Jun-2018


CS5222 Advanced Computer Architecture

Lecture 1: Introduction

Overview
• Teaching Staff
• Introduction to Computer Architecture
  • History
  • Future / Trends
  • Significance
• The course
  • Content
  • Workload
  • Administrative Matters

2[ CS5222 Adv. Comp. Arch. AY1415S2 ]

Who am I?
• Dr. Soo Yuen Jien
• Contact information:
  • Room: COM2 #02-61
  • Consultation hours: Friday 3pm-5pm; Wednesday after lecture; email me for other timings
  • Email: [email protected]
• Comments / suggestions welcome

WHAT IS COMPUTER ARCHITECTURE?

Computer Architecture: Definition
• Architecture (in computing):
  • The organization of the components and functionalities of a system
• Computer architecture:
  • The study of computer (processor) architecture
  • Aims to maximize performance within constraints
  • Typically classified into 3 categories:
    • Instruction Set
    • MicroArchitecture
    • System Design

The 3 Categories
• Instruction Set
  • The hardware/software interface
  • Exposes the functionalities to the programmer
• MicroArchitecture
  • Organization of components
  • Techniques / mechanisms for performance
• System Design
  • Interconnection, data path
  • Memory hierarchy

Computer Architecture vs Hardware Engineering
• Computer Architecture:
  • Describes the behavior of the processor
  • Describes the high-level mechanisms / techniques for better performance
• Hardware Engineering:
  • Concerned with the actual implementation of the architecture
  • Logic / circuit implementation, packaging, cooling, transistor process technology, etc.

Computer System: The brief history
• Let's review the progress of computer systems in the past:
  1. Follow the thread of the "personal" computer
  2. Another thread on high-end supercomputers
• Observe the progress in terms of:
  • Speed (operations / second)
  • Size
  • Availability and cost

The Brief History: 1946 - ENIAC
• ENIAC: world's first programmable electronic digital computer
• 1,900 additions per second
• 18,000 vacuum tubes
• 30 tons, 80 by 8.5 feet

The Brief History: 1951 - UNIVAC
• UNIVAC: first commercial computer in the US
• Uses the von Neumann design
• 2,000 additions per second for $1 million
• Sold 48 units

The Brief History: 1964 – IBM 360
• IBM System/360: six implementations with varying price and performance
• An example: 2 MHz, 128KB-256KB memory, 500K operations/sec for $1M
• All binary compatible; redefined the industry!

The Brief History: 1965 – PDP-8
• DEC PDP-8: first minicomputer
• 4K of 12-bit words
• 4 registers
• 330K operations per second for $16,000
• Sold 50,000 units!

The Brief History: 1971 – Intel 4004
• Intel 4004: first microprocessor (single-chip CPU)
• 4-bit processor for calculators
• 1KB data + 4KB program memory
• Only 2,300 transistors
• 16-pin package
• 740 kHz
• 100K operations per second

The Brief History: 1977 – Apple II
• Apple II: first personal computer
• 1 MHz clock, 4 kB of RAM, $1,300
• ~200K operations per second

The Brief History: 1981 – IBM PC
• IBM PC: the system that shaped the IT industry as we know it
• Intel 8088 processor
• 4.77 MHz, 16-256 kB RAM
• 240K operations per second for $3,000!

The Brief History: 2003 – Pentium 4
• Intel Pentium 4 processor
• Clock speed 3.0 GHz for around $300
• 169 million transistors
• 6,000M operations/sec

The Brief History: 2011 – Intel i7
• Intel Core i7 processor
• Clock speed 3.2 GHz for around $500
• ~120 GFLOPS

The Brief History: Supercomputer

Linpack Performance (teraflops)

Date        System                Linpack (teraflops)
Nov 2008    Roadrunner (US)       1,105.0
Nov 2009    Jaguar (US)           1,759.0
Nov 2010    TianHe (China)        2,566.0
Nov 2011    K Computer (Japan)    10,510.0
Nov 2012    Titan (US)            17,590.0
Nov 2013    TianHe-2 (China)      33,826.7

Summary: From a few to many
• The transistor has been the building block of the CPU since the 1960s

Period         Transistor count
1970 - 1980    2K – 100K
1980 - 1990    100K – 1M
1990 - 2000    1M – 100M
2000 - 2011    100M – 2.2B

• Current world population = 7 billion: about the number of transistors in 3 CPU chips!

Summary: From BIG to small
• Process size = minimum length of a transistor

Processor    Year    Process size
80286        1982    1.5 µm
Pentium      1993    0.80 µm - 0.25 µm
Pentium 4    2000    0.180 µm - 0.065 µm
Core i7      2010    0.045 µm - 0.032 µm

• Wavelength of visible light = 350nm (violet) to 780nm (red)
• Process size is now smaller than the wavelength of violet light!

Summary: From S-L-O-W to fast
• FLOPS = FLoating-point Operations Per Second

Processor    Year    Performance
80286        1982    1.8 MIPS*
Pentium      1993    200 MFLOPS#
Pentium 4    2000    4 GFLOPS#
Core i7      2011    120 GFLOPS#

Summary: The Brief History
• Unprecedented progress since the late 1940s
• Performance doubling every ~2 years (1971-2005):
  • Total of 36,000X improvement!
  • If the transportation industry had matched this improvement, we could travel from Singapore to Shanghai, China in about a second for roughly a few cents!
• An incredible amount of innovation has revolutionized the computing industry again and again
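The doubling claim on this slide can be checked with a little arithmetic. The Python sketch below derives the doubling period implied by a total gain over a span of years, and the gain implied by a fixed doubling period. The function names are ours, chosen for illustration; the inputs are the slide's own figures.

```python
import math

def implied_doubling_period(total_gain: float, years: float) -> float:
    """Years per doubling if performance grew by total_gain over `years`."""
    return years / math.log2(total_gain)

def gain_from_doubling(period_years: float, years: float) -> float:
    """Total gain after `years` if performance doubles every period_years."""
    return 2.0 ** (years / period_years)

span = 2005 - 1971  # 34 years, as quoted on the slide
print(f"36,000X over {span} years -> doubling every "
      f"{implied_doubling_period(36_000, span):.2f} years")
print(f"strict 2-year doubling over {span} years -> "
      f"{gain_from_doubling(2.0, span):,.0f}X")
```

Under these assumptions, 36,000X over 34 years corresponds to a doubling roughly every 2.25 years, while a strict two-year doubling would give 2^17 = 131,072X, so "~2 years" is best read as an approximation.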

GREAT!! (BUT IS THERE ANYTHING LEFT TO DO?)

Moore’s Law
• Intel co-founder Gordon Moore "predicted" in 1965 that transistor density would double every year; he revised this in 1975 to every two years, and the figure is often quoted as 18 months

Growth in Processor Performance
• Prior to the mid-80s:
  • Largely technology driven
  • Average 25% performance gain per year
• Mid-80s to 2002:
  • Driven by technology, instruction set (RISC), and organization
  • Average 52% performance gain per year
  • Factor of seven gain from organization
• 2002 onwards:
  • Average 20% performance gain per year
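To make the three growth rates easier to compare, the following Python sketch converts an average annual gain into a doubling time and a cumulative gain over a number of years. It assumes simple compound growth at a constant rate, which is a simplification of the measured trend; the helper names are illustrative.

```python
import math

def doubling_time(annual_gain: float) -> float:
    """Years needed to double performance at a constant annual gain."""
    return math.log(2) / math.log(1 + annual_gain)

def cumulative_gain(annual_gain: float, years: float) -> float:
    """Total speedup after `years` of compound growth."""
    return (1 + annual_gain) ** years

eras = [("prior to mid-80s", 0.25), ("mid-80s to 2002", 0.52), ("2002 onwards", 0.20)]
for label, rate in eras:
    print(f"{label}: doubles every {doubling_time(rate):.1f} years, "
          f"{cumulative_gain(rate, 10):.0f}X per decade")
```

At 52% per year performance doubles in well under two years, while at 20% per year it takes almost four.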

The Three Walls
• Three major reasons why the growth in uniprocessor performance is unsustainable:
  1. The Memory Wall: increasing gap between CPU and main memory speed
  2. The ILP Wall: decreasing amount of "work" (instruction-level parallelism) available to the processor
  3. The Power Wall: increasing power consumption of the processor

The Memory Wall
• Memory access speed increases at about 10% / yr
• Processor speed increases at about 50% / yr
• Memory is now an order of magnitude slower than the processor
  • E.g. Intel Core i7 has a 0.3ns cycle; DDR3 SDRAM latency is ~10ns
• An increasing amount of chip area is dedicated to on-chip cache
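The size of the gap follows directly from the two growth rates and the two latencies quoted above. The Python sketch below assumes constant compound growth and uses the slide's example figures; the function names are illustrative only.

```python
def speed_gap(cpu_gain: float, mem_gain: float, years: int) -> float:
    """Factor by which the processor pulls ahead of memory after `years`."""
    return ((1 + cpu_gain) / (1 + mem_gain)) ** years

def cycles_per_access(mem_latency_ns: float, cycle_ns: float) -> float:
    """Processor cycles that elapse during one memory access."""
    return mem_latency_ns / cycle_ns

print(f"gap after 10 years: {speed_gap(0.50, 0.10, 10):.1f}x")
print(f"cycles per DRAM access: {cycles_per_access(10.0, 0.3):.0f}")
```

With these inputs the relative gap grows by a factor of about 22 per decade, and a single DRAM access costs roughly 33 processor cycles, which is why so much chip area goes to caches.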

The ILP Wall
• Instruction-Level Parallelism (ILP) is the number of instructions that can be executed in parallel
  • The main source of performance for superscalar processors
• Implicit ILP, discovered on the fly by the processor, is very limited
  • Average ~3 instructions (depends!!)
• Move to explicit ILP
  • Parallel programming and execution
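One simple way to see why implicit ILP is limited is to measure it on a dependency graph. The Python sketch below estimates the available ILP of a short instruction sequence as instruction count divided by the length of the longest dependency chain. It assumes unit latency per instruction and unlimited functional units, and the six-instruction program is a made-up example, not taken from the lecture.

```python
from functools import lru_cache

def available_ilp(deps):
    """deps maps each instruction to the list of instructions it depends on.

    Returns instructions / critical-path length, an upper bound on ILP
    under unit latency and unlimited hardware resources.
    """
    @lru_cache(maxsize=None)
    def depth(instr):
        # Earliest cycle in which instr can complete.
        return 1 + max((depth(d) for d in deps[instr]), default=0)

    critical_path = max(depth(i) for i in deps)
    return len(deps) / critical_path

# Hypothetical program: e = (x + y) * z, then store e
program = {
    "i1": [],            # load x
    "i2": [],            # load y
    "i3": ["i1", "i2"],  # c = x + y
    "i4": [],            # load z
    "i5": ["i3", "i4"],  # e = c * z
    "i6": ["i5"],        # store e
}
print(available_ilp(program))
```

Six instructions with a critical path of four cycles give an ILP of only 1.5, even with ideal hardware; data dependences, not execution units, set the limit.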

The Power Wall
• We can now cram more transistors into a chip than we have the ability (power) to turn on!

Power Consumption: A comparison
• ~10 megawatts
• 1 HDB block: ~50 kilowatts
• ~500 watts
• Fridge: ~600 watts

The Power Wall: Challenges
• Mobile / portable (cell phone, laptop, PDA)
  • Battery life is critical
• Desktop
  • 400 million computers in the world
  • 0.16PW (PetaWatt = 10^15 Watt) of power dissipation
  • Equivalent to 26 nuclear power plants
• Data centers
  • A single server rack draws between 5 and 20 kW
  • 100s of those racks in a single room

SO, HOW DO WE FIGHT THE WAR (WALL)?

Meeting the challenge
• Hyper-Threading Technology (HTT) in Xeon and Pentium 4
  • Allows one physical processor to appear and behave as two virtual processors to the operating system
  • Two independent threads give more ILP!
• Intel dual-core (Pentium D)
  • Multiple microprocessor cores on a single chip

Copyright © 2005 Intel

Parallelism saves Power
• Dynamic Power = C x V^2 x f
  • C = capacitance, V = voltage, f = clock frequency
  • Performance is proportional to clock frequency
• Exploit explicit parallelism to reduce power using additional cores
  • Increased density (= more transistors = more capacitance)
  • Can increase cores (2x) and performance (2x)
  • Or increase cores (2x) but decrease frequency (f/2)
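The trade-off between cores and frequency can be worked through with the dynamic power formula. The Python sketch below uses normalized units (baseline C = V = f = 1) and models a second core as a doubling of capacitance. The 0.8 supply voltage in the last case is a hypothetical value added for illustration; the slide itself only mentions halving the frequency.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic power = C x V^2 x f, in normalized units."""
    return c * v ** 2 * f

baseline = dynamic_power(1.0, 1.0, 1.0)        # one core at full speed
two_fast = dynamic_power(2.0, 1.0, 1.0)        # 2x cores: 2x performance, 2x power
two_slow = dynamic_power(2.0, 1.0, 0.5)        # 2x cores at f/2: same performance, same power
two_slow_low_v = dynamic_power(2.0, 0.8, 0.5)  # assumed: slower clock permits a lower voltage

for name, p in [("baseline", baseline), ("2 cores @ f", two_fast),
                ("2 cores @ f/2", two_slow), ("2 cores @ f/2, 0.8V", two_slow_low_v)]:
    print(f"{name}: {p:.2f}")
```

Halving the frequency alone only brings two cores back to the baseline power. The saving comes from the quadratic voltage term: if the slower clock allows a lower supply voltage, the same throughput costs less power, provided the workload parallelizes.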

Multicore Revolution
• Chip density is continuing to increase ~2x every 2 years
  • Clock speed is not
  • The number of processor cores may double instead

Multicore Revolution: Industry
• All microprocessor companies switch to MP (2X CPUs / 2 yrs)
  • Procrastination results in 2X sequential perf. / 5 yrs
• Current state:
  • Intel i7 has 6 cores
  • The STI Cell processor (PS3) has 8 cores
  • nVidia Tesla GPU has up to 512 cores
  • Intel MIC has > 50 cores

“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
Paul Otellini, President, Intel (2005)

Multicore/Manycore Roadmap
• Multicore: 2X / 2 yrs ≈ 64 cores in 8 years
• Manycore: 8X to 16X multicore

[Chart: number of cores on a log scale (1 to 1000), 2003 to 2015; data labels: 1, 2, 4, 8, 16, 32, 64, 64, 128, 256, 512]
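The roadmap figures follow from repeated doubling. The Python sketch below projects a core count forward at one doubling every two years. The starting point of 4 cores in 2007 is an assumption chosen so that the result matches the "64 cores in 8 years" figure; the function name is illustrative.

```python
def projected_cores(start_cores: int, start_year: int, year: int,
                    doubling_years: int = 2) -> int:
    """Core count in `year`, doubling every `doubling_years` from the start."""
    doublings = (year - start_year) // doubling_years
    return start_cores * 2 ** doublings

multicore_2015 = projected_cores(4, 2007, 2015)             # 4 doublings in 8 years
manycore_2015 = (8 * multicore_2015, 16 * multicore_2015)   # 8X to 16X multicore

print(multicore_2015)
print(manycore_2015)
```

Eight years at 2X every two years is four doublings, a factor of 16, so a 4-core chip grows to 64 cores and the manycore range becomes 512 to 1024 cores.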

Architecture Outlook
• Expect modestly pipelined processors
  • Small cores not much slower than large cores
• Parallelism is the energy-efficient path to performance
  • Lower threshold and supply voltages lower the energy per operation
• Small, regular processing elements are easier to verify
• Heterogeneous processors
  • Special function units to accelerate popular functions

Multicore: Impacts
• All major processor vendors are producing multicore chips
  • Every machine will soon be a parallel machine
  • All programmers will be parallel programmers???
• Complexity may eventually be hidden in libraries, compilers, and high-level languages
  • But a lot of work is needed to get there
• Big open questions:
  • What will be the killer apps for multicore machines?
  • How should the chips be designed, and how will they be programmed?
  • Many others...

Parallel Revolution May Fail

“…when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.”
John Hennessy, President, Stanford University, 1/07

• 100% failure rate of parallel computer companies
  • Convex, Encore, MasPar, NCUBE, Kendall Square Research, Sequent, (Silicon Graphics), Transputer, Thinking Machines, …
• What if IT goes from a growth industry to a replacement industry?
  • If SW can’t effectively use multiple cores per chip
  • SW no faster on new computer
  • Only buy if computer wears out

Parallel Computing: A view from Berkeley
• Applications
  1. What are the applications?
  2. What are common kernels of the applications?
• Architecture and Hardware
  3. What are the HW building blocks?
  4. How to connect them?
• Programming Model and Systems Software
  5. How to describe applications and kernels?
  6. How to program the hardware?
• Evaluation
  7. How to measure success?

Compiler Challenges
• Heterogeneous processors
  • Increase in the design space for code optimization
• Auto-tuners: optimizing code at runtime
• Software-controlled memory management
  • Example: Cell processor

Parallel Programming Challenges
• Finding enough parallelism (Amdahl’s Law)
• Granularity
• Locality
• Load balance
• Coordination and synchronization
• Debugging
• Performance modeling
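Amdahl's Law, mentioned in the first item, bounds the speedup by the fraction of the work that stays sequential. The Python sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / n); the 90% parallel fraction is an example value, not a figure from the lecture.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup on `cores` processors when parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for n in (2, 8, 64, 512):
    print(f"{n:>3} cores: {amdahl_speedup(0.90, n):.2f}x")
```

With 90% of the work parallelizable, 8 cores give about 4.7x and no number of cores can exceed 10x, which is why finding enough parallelism heads the list.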


BACK TO THE COURSE

What will we learn in CS5222?
• Instruction-Level Parallelism (ILP)
  • Pipelining
  • Dynamic scheduling (superscalar out-of-order)
  • Static scheduling (VLIW processors)
  • Branch prediction
• Multi-threaded processors
• Multiprocessors
  • Symmetric shared-memory architectures
  • Synchronization
  • Memory consistency
• Memory hierarchy design

Where can CS5222 take you?
• Advanced Compiler
• System Software
• Operating System
• High Performance Computing
• Parallel Computing

We expect you to know
• Computer Organization (CS2100)
• Multi-Core Architecture (CS4223)
  • Significant overlap in topics, but more in-depth
• Instruction set concepts:
  • RISC instruction set design philosophy
  • Registers, instructions, etc.
• Simple pipelining
• Basic caches, main memory
• Low-level programming experience
  • C is very likely to be needed

Reference
• Computer Architecture: A Quantitative Approach, 4th Edition
• Hennessy & Patterson
• Published by Morgan Kaufmann

Resources
• The primary and only information source is IVLE
• Workbin:
  • Lecture notes
  • Assignment submissions
• Forum:
  • Ask course-related technical questions in the forum
  • Email is only for your personal concerns

Assessment
• Final Exam: 50%
• Assignments: 30%
  • 2-3 assignments
• Midterm: 20%
  • Tentatively in week 7 (after term break)
  • During normal lecture hours