E6200, Fall 07, Oct 24
Ambale: CMP (chip multiprocessors)
Bharath Ambale Venkatesh, 10/24/2007


Trends
- Increasing processor size and clock speed increase the power required.
- The extra heat generated requires additional cooling components.
- Network speeds are increasing much more slowly than processor speeds.
- Future applications are increasingly parallel: multimedia, face recognition, voice recognition, etc.
- Future applications are also becoming more data-intensive.

Instruction-level parallelism (ILP)
- Re-orders instructions so that they can be executed in parallel
- Exploited through pipelining and superscalar execution
- Limited to a maximum of 6-10 instructions per cycle for real applications
- Bottleneck: branch prediction
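The following is a minimal Python sketch of the re-ordering idea. It assumes an idealized machine with unlimited issue width, one-cycle latency, perfect branch prediction, and register renaming, so only true read-after-write dependences matter. The instruction list and the names `Instr` and `schedule_levels` are hypothetical. The code places each instruction in the earliest cycle its operands allow, so an independent instruction can issue before instructions that precede it in program order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str    # human-readable form, e.g. "add r3, r1, r2"
    dest: str    # register written by the instruction
    srcs: tuple  # registers read by the instruction

def schedule_levels(program):
    """Place each instruction in the earliest cycle its source operands allow.

    Idealized model (assumption): unlimited issue width, one-cycle latency,
    perfect branch prediction, and register renaming, so only true
    read-after-write dependences constrain the order.
    """
    ready_cycle = {}  # register -> cycle in which its newest value is available
    levels = []       # levels[c] = instructions issued in cycle c
    for ins in program:
        cycle = max((ready_cycle.get(r, 0) for r in ins.srcs), default=0)
        while len(levels) <= cycle:
            levels.append([])
        levels[cycle].append(ins.text)
        ready_cycle[ins.dest] = cycle + 1
    return levels

# Hypothetical straight-line code (no branches).
PROGRAM = [
    Instr("mul r1, r4, r5", "r1", ("r4", "r5")),
    Instr("add r2, r6, r7", "r2", ("r6", "r7")),
    Instr("sub r3, r1, r2", "r3", ("r1", "r2")),
    Instr("add r8, r9, r9", "r8", ("r9",)),
    Instr("mul r10, r3, r8", "r10", ("r3", "r8")),
]

levels = schedule_levels(PROGRAM)
for cycle, group in enumerate(levels):
    print(cycle, group)
print(f"ILP = {len(PROGRAM) / len(levels):.2f}")  # prints: ILP = 1.67
```

In this example, the independent `add r8, r9, r9` moves ahead of `sub r3, r1, r2`. The five instructions finish in three cycles, an ILP of about 1.67. Adding real constraints such as a finite issue width or mispredicted branches can only lower that figure.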

Thread-level parallelism (TLP)
- Multiple threads spawned from the same process (simultaneous multithreading, SMT)
- Loop-level parallelism
- Threads interact with each other
- The Pentium 4 uses Hyper-Threading, Intel's implementation of SMT
- Bottleneck: memory (cache)
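The following is a minimal sketch of loop-level parallelism using Python's standard `concurrent.futures.ThreadPoolExecutor`. The iterations of one loop are split into chunks and each thread computes a private partial result. The threads interact only when the partial results are combined. The function names and the default of four threads are illustrative assumptions. In CPython, the global interpreter lock prevents these threads from executing Python bytecode truly in parallel, so the sketch shows how the work is decomposed rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(lo, hi):
    """Work done by one thread: its own chunk of the loop's iteration space."""
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, num_threads=4):
    """Split `for i in range(n): total += i * i` across threads.

    Each thread builds a private partial sum, so the threads interact only
    when the partial sums are combined at the end (no lock needed).
    """
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    bounds = [(t * chunk, min((t + 1) * chunk, n)) for t in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda b: partial_sum_of_squares(*b), bounds))
    return sum(partials)

print(parallel_sum_of_squares(1000))  # prints 332833500
```

If the threads instead updated one shared total, every update would need a lock. That is the kind of thread interaction the slide refers to.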

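Process-level parallelism (PLP)
- Runs multiple independent processes, controlled by the OS
- Multiprocessor systems: multiple independent processors connected by an interconnect, from symmetric multiprocessors (SMP) sharing a common memory to clusters connected by a network
- Bottleneck: the interconnection network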
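The following is a minimal sketch of process-level parallelism using Python's standard `subprocess` module. Each job is a separate process with its own address space, and the operating system decides when and where each one runs. The worker program and the function name `run_independent_processes` are hypothetical. On one machine, results come back through pipes. In a cluster, the processes would run on different nodes and results would cross the network, which is why the network becomes the bottleneck.

```python
import subprocess
import sys

# Hypothetical worker: an independent Python program that sums range(n).
WORKER_CODE = "import sys; n = int(sys.argv[1]); print(sum(range(n)))"

def run_independent_processes(sizes):
    """Start one OS process per job, let the OS run them concurrently, then collect results."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_CODE, str(n)],
                         stdout=subprocess.PIPE, text=True)
        for n in sizes
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # wait for this process and read its stdout
        results.append(int(out.strip()))
    return results

print(run_independent_processes([10, 100, 1000]))  # prints [45, 4950, 499500]
```

All processes are launched before any is waited on, so the OS is free to run them at the same time.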

Memory access
- Cache miss: the data has to be fetched from main memory
- A cache miss in a superscalar processor causes a significant delay
- With SMT, multiple threads access a shared cache, so the cache needs more ports
- Memory bandwidth is a problem
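The following is a minimal sketch of why misses are costly. It pairs a direct-mapped cache simulator that counts hits and misses with the standard average memory access time formula: AMAT = hit time + miss rate * miss penalty. The cache geometry (8 lines of 16 bytes) and the latencies (1-cycle hit, 100-cycle miss penalty) are illustrative assumptions, not figures from the slides.

```python
def simulate_direct_mapped(addresses, num_lines=8, line_size=16):
    """Return (hits, misses) for byte addresses run through a direct-mapped cache."""
    tags = [None] * num_lines  # tag currently stored in each cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this byte
        index = block % num_lines   # the single line this block can occupy
        tag = block // num_lines    # identifies which block is in that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1             # block must be fetched from main memory
            tags[index] = tag
    return hits, misses

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Sequential scan of 64 bytes: one miss per 16-byte block, then hits.
hits, misses = simulate_direct_mapped(range(64))
rate = misses / (hits + misses)
print(hits, misses, amat(1, rate, 100))  # prints 60 4 7.25

# Two addresses that map to the same line evict each other on every access.
print(simulate_direct_mapped([0, 128] * 4))  # prints (0, 8)
```

Even a 6.25% miss rate raises the average access time from 1 to 7.25 cycles under these assumed latencies. When several SMT threads share one cache, their working sets compete for the same lines, much like the second case.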

CMP (chip multiprocessor)
- Places multiple superscalar processors on a single chip, each capable of running multiple threads
- Each processor has its own private cache, and all processors also share a common cache
- The processors need not be homogeneous
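The following is a minimal sketch of a CMP cache organization in which each core has a private L1 cache and all cores share an L2 cache. Both levels are modeled as fully associative LRU caches built on `collections.OrderedDict`. The class names, capacities, and the L1/L2 labels are hypothetical. The model handles reads only, so it ignores the coherence protocol a real CMP needs when cores write shared data.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> True, ordered oldest to newest use

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        return False

    def insert(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

class ChipMultiprocessor:
    """Hypothetical CMP model: one private L1 per core plus one L2 shared by all cores."""

    def __init__(self, num_cores, l1_capacity=4, l2_capacity=64):
        self.l1 = [LRUCache(l1_capacity) for _ in range(num_cores)]
        self.l2 = LRUCache(l2_capacity)

    def access(self, core, block):
        """Return which level served a read: 'L1', 'L2', or 'memory'."""
        if self.l1[core].lookup(block):
            return "L1"
        if self.l2.lookup(block):
            self.l1[core].insert(block)
            return "L2"
        # Miss at both levels: fetch from main memory and fill both levels.
        self.l2.insert(block)
        self.l1[core].insert(block)
        return "memory"

chip = ChipMultiprocessor(num_cores=2)
print(chip.access(0, 42))  # prints memory (first touch by any core)
print(chip.access(0, 42))  # prints L1
print(chip.access(1, 42))  # prints L2 (core 1 reuses the block core 0 fetched)
```

The shared level is what lets one core benefit from another core's memory traffic. The private level keeps each core's most frequent accesses off the shared cache's ports.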

CMP advantages
- Multitasking
- Shorter signal paths
- Lower power consumption
- Memory bandwidth is not the limiting factor

Disadvantages
- Software must be written specifically to exploit multithreading
- Thermal management is more difficult

Commercial CMPs
- Intel and AMD dual-core, quad-core, ... processors
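The following is a rough worked example of the lower-power advantage. It uses the standard CMOS dynamic power estimate P = a * C * V^2 * f, where a is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. It compares one core against two cores running at half the clock. Two assumptions are hypothetical: the lower clock lets the supply voltage drop to 60% of nominal, and the workload splits perfectly across both cores, so total throughput is unchanged. Actual savings depend on how far the voltage can really be lowered and how well the software parallelizes.

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """Classic CMOS dynamic (switching) power estimate: P = a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency

# Baseline: one core at normalized voltage 1.0 and frequency 1.0.
one_fast_core = dynamic_power(activity=1.0, capacitance=1.0, voltage=1.0, frequency=1.0)

# Two cores, each at half the clock. Assumption: supply voltage drops to
# 60% of nominal (an illustrative figure, not measured data).
two_slow_cores = 2 * dynamic_power(activity=1.0, capacitance=1.0, voltage=0.6, frequency=0.5)

print(f"relative power: {two_slow_cores / one_fast_core:.2f}")  # prints: relative power: 0.36
```

The two cores double the capacitance term and halve each core's frequency, so those factors cancel. The saving comes entirely from the V^2 term, which is why voltage scaling is the key assumption here.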

Cell Processor
