Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am ics keio ac jp


Post on 15-Dec-2015


Page 1

Computer Architecture Guidance

Keio University

AMANO, Hideharu

hunga@am.ics.keio.ac.jp

Page 2

Contents

Techniques of two key architectures for future system LSIs
  Parallel Architectures
  Reconfigurable Architectures

Advanced uniprocessor architecture → Special Course of Microprocessors (by Prof. Yamasaki, Fall term)

Page 3

Class

Lecture using PowerPoint (70 min.)
  The ppt file is uploaded to the web site http://www.am.ics.keio.ac.jp, and you can download and print it before the lecture.
  When the file is uploaded, a notice is sent to you by e-mail.

Textbook: “Parallel Computers” by H. Amano (Sho-ko-do)

Exercise (20 min.)
  Simple design or calculation on design issues

Page 4

Evaluation

Exercise on Parallel Programming using SCore on RHiNET-2 (50%)

Exercise after every lecture (50%)

Page 5

Computer Architecture 1: Introduction to Parallel Architectures

Keio University

AMANO, Hideharu

hunga@am.ics.keio.ac.jp

Page 6

Parallel Architecture

Purposes
Classifications
Terms
Trends

A parallel architecture consists of multiple processing units which work simultaneously.

Page 7

Boundary between Parallel machines and Uniprocessors

ILP (Instruction Level Parallelism): Uniprocessors
  A single program counter
  Parallelism inside/between instructions

TLP (Thread Level Parallelism): Parallel machines
  Multiple program counters
  Parallelism between processes and jobs

Definition: Hennessy & Patterson’s “Computer Architecture: A Quantitative Approach”

Page 8

Increasing the number of simultaneously issued instructions vs. tight coupling

[Figure: diagram labeled "Tight coupling" and "Performance improvement", showing: single pipeline, multiple instruction issue, multiple thread execution, connecting multiple processors, shared memory / shared register, on-chip implementation.]

Page 9

Purposes of providing multiple processors

Performance
  A job can be executed quickly with multiple processors.
Fault tolerance
  If a processing unit is damaged, the total system remains available: redundant systems.
Resource sharing
  Multiple jobs share memory and/or I/O modules for cost-effective processing: distributed systems.
Low power
  High performance with low-frequency operation.

Parallel Architecture: Performance Centric!
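The low-power purpose can be illustrated with the usual CMOS approximation that dynamic power is proportional to C x V^2 x f. The sketch below compares one processor with two processors running at half the clock. All numbers are made up for illustration, and the assumption that the slower clock allows a supply voltage of 0.8 V is hypothetical, not taken from the lecture.

```python
def dynamic_power(capacitance, voltage, frequency, units=1):
    """Approximate CMOS dynamic power: units * C * V^2 * f."""
    return units * capacitance * voltage ** 2 * frequency

# Illustrative, made-up numbers (not from the lecture).
C = 1.0e-9  # switched capacitance per processor [F]

# One processor at 2 GHz and 1.0 V.
single = dynamic_power(C, voltage=1.0, frequency=2.0e9, units=1)

# Two processors at 1 GHz each; assume the lower clock lets the
# supply voltage drop to 0.8 V (hypothetical value).
dual = dynamic_power(C, voltage=0.8, frequency=1.0e9, units=2)

ratio = dual / single
print(f"single: {single:.2f} W, dual: {dual:.2f} W, ratio: {ratio:.2f}")
```

If the job parallelizes perfectly, both configurations deliver the same aggregate clock rate, while the two-processor version draws roughly two thirds of the power under these assumptions.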

Page 10

Flynn’s Classification

The number of instruction streams: M (Multiple) / S (Single)
The number of data streams: M / S

SISD: Uniprocessors (including superscalar and VLIW)
MISD: Does not exist (analog computer)
SIMD
MIMD
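As a small illustration of how the two stream counts determine the class, the following sketch maps them to Flynn's four names. The function name `flynn_class` is hypothetical and exists only for this example.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return Flynn's class from the stream counts (each must be >= 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn_class(1, 1))       # a uniprocessor
print(flynn_class(1, 65536))   # one instruction stream, many data streams
print(flynn_class(16, 16))     # independent processors
```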

Page 11

SIMD (Single Instruction Stream, Multiple Data Streams)

[Figure: an instruction memory sends one instruction to all processing units; each processing unit has its own data memory.]

•All processing units execute the same instruction
•Low degree of flexibility
•Illiac-IV / MMX (coarse grain)
•CM-2 type (fine grain)
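The SIMD idea can be sketched in a few lines: one instruction stream, with every instruction applied to the data of all processing units in the same step. This is only a sequential model; the list index stands in for a processing unit, and the helper `simd_step` is a hypothetical name.

```python
import operator

def simd_step(op, data_a, data_b):
    """Apply one instruction `op` to the data of every processing unit."""
    # Each list index plays the role of one processing unit.
    return [op(a, b) for a, b in zip(data_a, data_b)]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# A single instruction stream: every unit executes add, then mul.
program = [operator.add, operator.mul]
for instruction in program:
    a = simd_step(instruction, a, b)

print(a)
```

No unit can choose a different instruction, which is the low degree of flexibility noted in the slide.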

Page 12

Two types of SIMD

Coarse grain: each node performs floating-point numerical operations
  ILLIAC-IV, BSP, GF-11
  Multimedia instructions in recent high-end CPUs
  Dedicated on-chip approach: NEC’s IMEP

Fine grain: each node performs operations on only a few bits
  ICL DAP, CM-2, MP-2
  Image/signal processing
  The Connection Machine extended the application area to artificial intelligence (CmLisp)

Page 13

A processing unit of CM-2

[Figure: a CM-2 processing unit built around a 1-bit serial ALU, with a 256-bit memory, flags, and context; signal labels A, B, F, C, s, c, OP.]
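To show what a 1-bit serial ALU implies, the sketch below adds two multi-bit numbers one bit per cycle, keeping the carry between cycles. It is a generic bit-serial adder written for illustration, not the actual CM-2 instruction set; the function names are hypothetical.

```python
def full_adder(a, b, carry):
    """One step of a 1-bit ALU: returns (sum bit, carry out)."""
    s = a ^ b ^ carry
    c = (a & b) | (a & carry) | (b & carry)
    return s, c

def bit_serial_add(x, y, width):
    """Add two unsigned integers one bit per cycle, least significant bit first."""
    carry = 0   # kept in a flag between cycles
    result = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << bit
    return result, carry

print(bit_serial_add(13, 6, width=8))
```

An n-bit addition takes n cycles on such a unit, which is why fine grain machines rely on a very large number of processing elements for throughput.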

Page 14

Element of CM-2

[Figure: one LSI chip with a 4x4 processor array (16 PEs), a router, and an instruction input, with 256 bit x 16 PE RAM. Each chip has 12 links, forming a hypercube connection of 4096 chips.]

4096 chips = 64K PEs
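The numbers on this slide fit together: 12 links per chip give a 12-dimensional hypercube of 2^12 = 4096 chips, and 16 PEs per chip give 64K PEs. The sketch below checks that arithmetic and lists hypercube neighbors using the standard rule that neighboring addresses differ in exactly one bit. The constant and function names are hypothetical.

```python
PES_PER_CHIP = 16   # 4x4 processor array on one chip
CHIPS = 4096
DIMENSIONS = 12     # 12 links per chip

def hypercube_neighbors(node, dimensions=DIMENSIONS):
    """Nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dimensions)]

# A 12-dimensional hypercube has exactly 2**12 nodes.
assert 2 ** DIMENSIONS == CHIPS

print(CHIPS * PES_PER_CHIP)            # total number of PEs
print(hypercube_neighbors(0)[:4])      # first four neighbors of chip 0
print(len(hypercube_neighbors(4095)))  # every chip has 12 neighbors
```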

Page 15

Thinking Machines’ CM-2 (1987)

Page 16

The future of SIMD

Coarse grain SIMD
  A large-scale supercomputer like Illiac-IV or GF-11 will not be revived.
  Multimedia instructions will be used in the future.
  Special-purpose on-chip systems will become popular.

Fine grain SIMD
  Advantageous for specific applications like image processing.
  General-purpose machines are difficult to build, e.g. CM-2 → CM-5.

Page 17

MIMD

[Figure: processors connected through interconnection networks to memory modules (instructions and data).]

•Each processor executes individual instructions
•Synchronization is required
•High degree of flexibility
•Various structures are possible
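A minimal sketch of the MIMD points above, using Python threads as stand-ins for processors: each thread runs a different function (its own instruction stream), and a lock provides the synchronization needed when both update shared data. The function names and workloads are made up for illustration.

```python
import threading

def sum_range(shared, lock, lo, hi):
    """First instruction stream: sum of the integers in [lo, hi)."""
    partial = sum(range(lo, hi))
    with lock:                      # synchronize access to shared data
        shared["total"] += partial

def sum_squares(shared, lock, lo, hi):
    """Second instruction stream: sum of squares in [lo, hi)."""
    partial = sum(i * i for i in range(lo, hi))
    with lock:
        shared["total"] += partial

shared = {"total": 0}
lock = threading.Lock()

# Two "processors" running different programs at the same time.
workers = [
    threading.Thread(target=sum_range, args=(shared, lock, 0, 100)),
    threading.Thread(target=sum_squares, args=(shared, lock, 0, 10)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()                        # wait until both have finished

print(shared["total"])
```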

Page 18

Classification of MIMD machines

Structure of shared memory

UMA (Uniform Memory Access Model)
  Provides shared memory that can be accessed from all processors in the same manner.
NUMA (Non-Uniform Memory Access Model)
  Provides shared memory, but it is not uniformly accessed.
NORA/NORMA (No Remote Memory Access Model)
  Provides no shared memory. Communication is done with message passing.

Page 19

UMA

The simplest structure of shared memory machine
The extension of uniprocessors
An OS that is an extension of a single-processor OS can be used.
Programming is easy.
System size is limited.
  Bus connected
  Switch connected
A total system can be implemented on a single chip:
  On-chip multiprocessor
  Chip multiprocessor
  Single-chip multiprocessor

Page 20

An example of UMA: Bus connected

[Figure: four PUs, each with a snoop cache, connected to main memory through a shared bus.]

SMP (Symmetric MultiProcessor)
On-chip multiprocessor
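The slide only names the snoop cache, so the following is a rough sketch of one possible scheme, assumed here for illustration: a write-through, write-invalidate protocol in which every cache watches writes on the shared bus and drops its stale copy. The class and method names are hypothetical.

```python
class Bus:
    """Shared bus with main memory and the attached caches."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, writer, address, value):
        self.memory[address] = value          # write-through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(address)          # other caches observe the write

class SnoopCache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:         # miss: fetch over the bus
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.broadcast_write(self, address, value)

    def snoop(self, address):
        self.lines.pop(address, None)         # invalidate a stale copy

bus = Bus()
cache0, cache1 = SnoopCache(bus), SnoopCache(bus)
cache0.write(0x10, 1)
first = cache1.read(0x10)    # cache1 now holds a copy
cache0.write(0x10, 2)        # invalidates cache1's copy
second = cache1.read(0x10)   # refetched from main memory
print(first, second)
```

Because every write must appear on the one bus, this style of coherence is simple but limits the system size, as noted for UMA.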

Page 21

Switch connected UMA

[Figure: CPUs, each with local memory and an interface, connected to main memory through a switch.]

The gap between switch and bus is becoming small.

Page 22

NUMA

Each processor provides a local memory, and accesses other processors’ memory through the network.
Address translation and cache control often make the hardware structure complicated.
Scalable:
  Programs for UMA can run without modification.
  Performance improves as the system size grows.
Competes with WS/PC clusters using software DSM

Page 23

Typical structure of NUMA

[Figure: Node 0 to Node 3 connected by an interconnection network; the memories of the nodes together form one logical address space.]
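One way to picture the single logical address space is a block mapping from addresses to home nodes. The sketch below assumes four nodes with equal-sized local memories laid out consecutively; the memory size and the cost values are hypothetical numbers chosen only to show that remote accesses are more expensive.

```python
NODES = 4
LOCAL_MEMORY_SIZE = 0x1000   # hypothetical size per node, in bytes

def home_node(address):
    """Return (node, offset) for a logical address."""
    if not 0 <= address < NODES * LOCAL_MEMORY_SIZE:
        raise ValueError("address outside the logical address space")
    return divmod(address, LOCAL_MEMORY_SIZE)

def access_cost(requesting_node, address, local=1, remote=10):
    """Illustrative cost units: remote accesses cross the network."""
    node, _ = home_node(address)
    return local if node == requesting_node else remote

print(home_node(0x2345))
print(access_cost(2, 0x2345), access_cost(0, 0x2345))
```

The same address is valid on every node, which is why UMA programs still run, but the cost depends on which node issues the access.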

Page 24

Classification of NUMA

Simple NUMA:
  Remote memory is not cached.
  Simple structure, but the access cost of remote memory is large.
CC-NUMA: Cache Coherent
  Cache consistency is maintained with hardware.
  The structure tends to be complicated.
COMA: Cache Only Memory Architecture
  No home memory
  Complicated control mechanism

Page 25

Cray’s T3D: A simple NUMA supercomputer (1993)

Using Alpha 21064

Page 26

The Earth Simulator (2002)

Page 27

SGI Origin

[Figure: hub chips, each with main memory, connected by a network in a bristled hypercube.]

Main memory is connected directly to the hub chip.
One cluster consists of 2 PEs.

Page 28

SGI’s CC-NUMA Origin 3000 (2000)

Using R12000

Page 29

DDM (Data Diffusion Machine)

Page 30

NORA/NORMA

No shared memory
Communication is done with message passing
Simple structure but high peak performance
  The fastest machine is always a NORA (except the Earth Simulator)
Hard to program
  Inter-PU communications
Cluster computing
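The message passing style can be sketched with two queues standing in for the network. Threads in one Python process do share memory, so this is only a model: the two sides deliberately exchange data through send and receive operations and nothing else. A real cluster program would use a message passing library instead; the names below are hypothetical.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive numbers until None arrives, then send back their sum."""
    total = 0
    while True:
        message = inbox.get()       # "receive"
        if message is None:         # end-of-data marker
            break
        total += message
    outbox.put(total)               # "send" the result

to_worker = queue.Queue()
from_worker = queue.Queue()
node = threading.Thread(target=worker, args=(to_worker, from_worker))
node.start()

for value in [1, 2, 3, 4]:
    to_worker.put(value)            # "send" one data item
to_worker.put(None)

result = from_worker.get()          # "receive" the answer
node.join()
print(result)
```

Every data exchange has to be written out explicitly by the programmer, which is the main reason such machines are hard to program.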

Page 31

Early Hypercube machine nCUBE2

Page 32

Fujitsu’s NORA AP1000 (1990)

Mesh connection
SPARC

Page 33

Intel’s Paragon XP/S (1991)

Mesh connection
i860

Page 34

PC Cluster

Beowulf cluster (NASA’s Beowulf Project, 1994, by Sterling)
  Commodity components
  TCP/IP
  Free software

Others
  Commodity components
  High-performance networks like Myrinet / InfiniBand
  Dedicated software

Page 35

RHiNET-2 cluster

Page 36

Terms (1)

Multiprocessors:
  MIMD machines with shared memory (strict definition, by Enslow Jr.)
    Shared memory
    Shared I/O
    Distributed OS
    Homogeneous
  Extended definition: all parallel machines (wrong usage)

Page 37

Terms (2)

Multicomputer
  MIMD machines without shared memory, that is, NORA/NORMA
Array computer
  A machine consisting of an array of processing elements: SIMD
  A supercomputer for array calculation (wrong usage)
Loosely coupled, tightly coupled
  Loosely coupled: NORA; tightly coupled: UMA
  But are NORAs really loosely coupled?
  Don’t use these terms if possible.

Page 38

Classification

Stored-program based
  SIMD
    Fine grain
    Coarse grain
  MIMD
    Multiprocessors
      Bus connected UMA
      Switch connected UMA
      NUMA
        Simple NUMA
        CC-NUMA
        COMA
    Multicomputers
      NORA

Others
  Systolic architecture
  Data flow architecture
  Mixed control
  Demand driven architecture

Page 39

Systolic Architecture

[Figure: data streams x and y flow into a computational array.]

Data streams are inserted into an array of special-purpose computing nodes in a certain rhythm.

Introduced in Reconfigurable Architectures
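As a rough illustration of data moving through an array in rhythm, the sketch below simulates a one-dimensional systolic array that computes a matrix-vector product. Each cell keeps one accumulator, and the vector elements shift one cell to the right on every tick. This particular array layout is an assumption chosen for the example, not a design from the lecture.

```python
def systolic_matvec(matrix, vector):
    """Simulate a linear systolic array computing matrix * vector."""
    n_rows = len(matrix)
    n_cols = len(vector)
    acc = [0] * n_rows        # one accumulator per cell (one per row)
    pipe = [None] * n_rows    # x value currently held by each cell
    for tick in range(n_cols + n_rows - 1):
        # Shift x values one cell to the right; feed a new one at the left.
        new_x = vector[tick] if tick < n_cols else None
        pipe = [new_x] + pipe[:-1]
        for cell, x in enumerate(pipe):
            if x is not None:
                col = tick - cell   # index of the x value now in this cell
                acc[cell] += matrix[cell][col] * x
    return acc

print(systolic_matvec([[1, 2], [3, 4], [5, 6]], [1, 2]))
```

Each cell only talks to its neighbor and does one multiply-add per tick, which is what makes this style attractive for special-purpose hardware.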

Page 40

Data flow machines

[Figure: data flow graph of (a+b)x(c+(dxe)) with inputs a, b, c, d, e.]

A process is driven by the data

Also introduced in Reconfigurable Systems
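The statement that a process is driven by the data can be sketched with the expression from the figure, (a+b)x(c+(dxe)). In the toy interpreter below, a node fires as soon as both of its operands are available, with no program counter deciding the order. The node names n1 to n3 and the function `run_dataflow` are hypothetical.

```python
import operator

# Graph for (a + b) x (c + (d x e)); node names are hypothetical.
graph = {
    "n1": (operator.mul, "d", "e"),
    "n2": (operator.add, "c", "n1"),
    "n3": (operator.add, "a", "b"),
    "out": (operator.mul, "n3", "n2"),
}

def run_dataflow(graph, inputs):
    """Fire any node whose operands are available until nothing can fire."""
    values = dict(inputs)
    fired = []
    progress = True
    while progress:
        progress = False
        for name, (op, left, right) in graph.items():
            if name not in values and left in values and right in values:
                values[name] = op(values[left], values[right])
                fired.append(name)
                progress = True
    return values, fired

values, fired = run_dataflow(graph, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
print(values["out"], fired)
```

Nodes n1 and n3 depend only on the inputs, so a data flow machine could evaluate them in parallel.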

Page 41

Exercise 1

In this class, a PC cluster, RHiNET, is used for exercises in parallel programming. Is RHiNET classified as a Beowulf cluster? Why do you think so?

If you take this class, send the answer with your name and student number to [email protected]