architectures - 1

7/27/2019 Architectures - 1


31/07/2013 1

ARCHITECTURES - 1

Mariagiovanna Sami



Architecture: which definition?

Abstract architecture: the functional specification of a computer.

Concrete architecture: an implementation of an abstract architecture.

Abstract architecture: a black-box specification of a machine. It can be seen in two ways:



Architecture definition (2)

From the programmer's point of view, we deal with a programming model, equivalent to a description of the machine language.

From the designer's point of view, we deal with a hardware model (a black-box description for the designer, which must include additional information, e.g., interface protocols).



Architecture definition (3)

Usually, "architecture" denotes the abstract architecture. The concrete architecture is often called the microarchitecture (a term originally coined for microprogrammed CPUs, later extended to the structural description in terms of functional units and interconnections).



Where do we start from?

Background: the Von Neumann paradigm (and the Harvard alternative).

Extension to a reactive paradigm: still Von Neumann!



An Architectural Paradigm:

Composition of hardware and program execution mode.

Does not include software, but implies the execution mode of object code!



The classical V.N. abstract architecture:

[Block diagram: Memory, I/O, and the CPU, which comprises the Control Unit and the ALU]



Programming style: imperative, control-flow dominated

One address space in memory: information is identified by its address.

Machine instructions are stored sequentially, and the natural order of fetching is by increasing address values, so execution follows the same sequential order.

Variables are identified by names, translated as addresses.



The Control Flow:

The C.U. determines the address of the next instruction to be executed, as contained in the Program Counter (PC), and fetches it from memory.

The C.U. decodes the instruction and controls its execution by issuing proper commands to the ALU and memory.

Simultaneously, the address of the next instruction is computed: as a rule, the next instruction is immediately sequential to the one being executed (address computed by incrementing the PC), unless otherwise explicitly stated by a control instruction.
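
The control flow described above can be sketched as a small interpreter loop. This is only an illustrative sketch: the opcodes (LOAD, ADD, STORE, JUMP, HALT), the single accumulator and the function name `run` are assumptions made for the example, not part of the slides.

```python
def run(memory, max_steps=1000):
    """Fetch-decode-execute loop over one shared memory (hypothetical toy ISA)."""
    pc = 0    # Program Counter: address of the next instruction
    acc = 0   # single accumulator standing in for the ALU state
    for _ in range(max_steps):
        opcode, operand = memory[pc]   # fetch the instruction addressed by the PC
        next_pc = pc + 1               # default: the immediately sequential address
        if opcode == "LOAD":           # decode and execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JUMP":         # control instruction: overrides the default
            next_pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc = next_pc
    raise RuntimeError("step limit reached without HALT")


# Instructions at addresses 0..3 and data at addresses 4..6, in one address space.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
result = run(memory)
print(result, memory[6])  # prints "5 5"
```

Instructions and data share the same list here on purpose: it mirrors the single address space of the V.N. model, where only the address distinguishes one item from another.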



Control-dominated execution:

Control is implicitly determined by the ordering of instructions in the program, or explicitly modified by jump/branch instructions.

Execution is inherently sequential and serial.



The basic approach: the C.U. is the only active unit

All transfers to/from memory are controlled by the C.U.

I/O is initiated by instructions in the program (program-controlled I/O, polling): the C.U. activates the transfer channels.

All actions are de facto synchronized by the execution of the program.



The Harvard variant...

Basically, separates program and data memory:

[Block diagram: Program Memory, Data Memory, Control Unit, ALU, I/O]



Performance Evaluation...

Made with reference to a set of benchmark programs (often synthetic).

For every instruction in the machine's Instruction Set (IS), the total time required (fetch + execute) is known.

Profiling (execution of the program with suitable sets of data) gives the dynamic sequence of instructions executed.



Performance Evaluation (2)

Total time required by execution of the program = sum of the times required by all instructions in the dynamic sequence of execution. (Instructions may have different latencies depending on the specific operation, the need to access memory to read/write data, even the length of the instruction itself.)

Performance is optimized through the choice of the best algorithm and of less time-consuming instructions.
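
The evaluation rule above amounts to a sum over the profiled instruction trace. The following is a minimal sketch; the latency table, the opcode names and the trace are invented for illustration and do not describe any real machine.

```python
# Hypothetical per-instruction latencies (fetch + execute), in clock cycles.
LATENCY_CYCLES = {"LOAD": 4, "STORE": 4, "ADD": 1, "BRANCH": 2}


def total_time(dynamic_sequence, latency):
    """Sum the latency of every instruction in the dynamic (profiled) sequence."""
    return sum(latency[opcode] for opcode in dynamic_sequence)


# A dynamic sequence as profiling might report it (assumed data).
trace = ["LOAD", "ADD", "ADD", "STORE", "BRANCH"]
print(total_time(trace, LATENCY_CYCLES))  # 4 + 1 + 1 + 4 + 2 = 12
```

Note that the sum runs over the dynamic sequence, not over the static program text: an instruction inside a loop is counted once per execution.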



(Some of) the bottlenecks

Memory is slower than logic: larger (and less costly) memory means a wider gap, yet an ever larger addressable memory space is requested!

Execution is totally serial: an instruction must be completed before its successor is fetched from memory, so technology dominates instruction latency and overall performance.



Bottlenecks (2)

If a reactive system is designed (typically, an application-specific or embedded system), an external event created by an I/O device is serviced only when the device is polled by the program: real-time behaviour is only as good as the programmer can make it!
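
To make the polling bottleneck concrete, the sketch below computes how long an event waits before the next poll. It assumes, purely for illustration, that the program polls the device at regular instants 0, P, 2P, ... and that times are integers; `service_delay` is a hypothetical helper name.

```python
def service_delay(event_time, poll_period):
    """Time an event waits if the device is polled at 0, P, 2P, ... (integer times)."""
    polls_needed = -(-event_time // poll_period)   # ceiling division on integers
    return polls_needed * poll_period - event_time


# The worst-case wait grows with the interval the programmer leaves between polls.
for period in (10, 100, 1000):
    worst = max(service_delay(t, period) for t in range(1, 2001))
    print(period, worst)
```

Under these assumptions the worst-case delay is one time unit short of the polling period, so the reaction time is set by how the program is written rather than by the hardware.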



So, how to achieve better performance?

Modify the memory structure so that the programmer sees a very large addressable space, but the CPU sees a fast equivalent memory.

Achieve better efficiency in the execution of the instruction sequence.

Allow servicing external events when they arise, asynchronously with respect to program execution.



Starting from the bottom...

Servicing external events? A solution born with the first minicomputers (early '60s): the interrupt (an external unit may initiate an action; execution of the servicing routine is then controlled by the C.U.).
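
The interrupt mechanism can be sketched as a check performed by the C.U. between instructions. This is a simplified model under stated assumptions: a straight-line program, a single interrupt request, and hypothetical names (`run_with_interrupt`, `request_at`); real CPUs also save status registers and handle priorities.

```python
def run_with_interrupt(program, service_routine, request_at):
    """Run a straight-line program; an external request arrives while the
    instruction at address `request_at` is executing (toy model)."""
    executed = []
    pending = False
    pc = 0
    while pc < len(program):
        if pc == request_at:
            pending = True                      # raised by the external unit
        executed.append(program[pc])            # current instruction completes first
        pc += 1
        if pending:
            return_address = pc                 # C.U. saves where to resume
            executed.extend(service_routine)    # servicing routine runs under C.U. control
            pc = return_address                 # resume the interrupted program
            pending = False
    return executed


trace = run_with_interrupt(["i0", "i1", "i2"], ["isr0", "isr1"], request_at=1)
print(trace)  # ['i0', 'i1', 'isr0', 'isr1', 'i2']
```

Unlike polling, the program itself contains no instruction that checks the device: the request is noticed as soon as the current instruction completes.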



Getting better efficiency for instruction execution?

A first approach: create instructions capable of executing complex operations (object code is more compact; one instruction fetched from memory executes actions previously performed by a sequence of instructions).

Drawbacks: a more complex C.U. (longer clock period); identifying useful complex instructions for general-purpose CPUs is difficult.
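
The trade-off between a complex instruction and a longer clock period can be illustrated with a small calculation. All cycle counts and clock periods below are invented numbers, chosen only to show the two effects; `program_time_ns` is a hypothetical helper.

```python
def program_time_ns(instruction_cycles, clock_period_ns):
    """Execution time = total cycles of the executed instructions x clock period."""
    return sum(instruction_cycles) * clock_period_ns


# Same task: four simple instructions vs. one complex instruction (assumed figures).
simple_machine = program_time_ns([2, 2, 2, 2], clock_period_ns=10)
complex_machine = program_time_ns([5], clock_period_ns=12)
print(simple_machine, complex_machine)  # 80 60

# Code that never uses the complex instruction still pays the longer clock period.
other_code_simple = program_time_ns([2] * 10, clock_period_ns=10)
other_code_complex = program_time_ns([2] * 10, clock_period_ns=12)
print(other_code_simple, other_code_complex)  # 200 240
```

With these assumed figures the complex instruction wins on the task it was designed for, and loses everywhere else, which is why choosing such instructions for a general-purpose CPU is hard.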



Complex instructions. Still...

The solution has been widely adopted: CISC machines were a winning approach for a long time.

May be very useful when specialized tasks are widely used (e.g., DSP or image processing) or for application-specific CPUs.



Getting better efficiency for instruction execution: the alternative

Modify the structure of the CPU and the execution paradigm to introduce parallelism and overcome the serial execution bottleneck.

But...

Which kind of parallelism? Parallelism has to be detected within the application: at which level?



What about the memory problem?

Introduce a hierarchy of memories: large, slow (and cheap) ones at the bottom; fast, small (and costly) ones at the top (nearest the CPU).

Allow a wider memory bandwidth: more than one unit of information at a time is transferred from memory to CPU (or between memories).
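
The benefit of a hierarchy can be estimated with the usual textbook relation for a two-level memory: average access time = hit time + miss rate x miss penalty. The relation is standard but does not appear in the slides, and the numbers below are assumptions chosen for illustration.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time seen by the CPU for a two-level hierarchy."""
    return hit_time + miss_rate * miss_penalty


# Assumed figures: fast level answers in 1 cycle, 5% of accesses miss,
# and each miss costs 100 extra cycles to reach the slow level.
with_hierarchy = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(with_hierarchy)  # about 6 cycles, against roughly 100 for the slow memory alone
```

The CPU thus sees a memory that behaves almost like the fast level, as long as most accesses hit in it.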



Memory (2)

In fact:

Hierarchy: does not imply any assumption on the mode of execution other than serial; requires extensions to the hw structure controlling memory access.

Larger bandwidth: meaningful only if some form of parallelism is adopted.



What these lectures will be about:

Memory hierarchy: it is assumed that the basic points are already known (e.g., virtual memory and its hw supports; the scope of cache memory...). Attention will be given to cache organization and performance; technological aspects are not discussed here (other courses...).



What these lectures will be about (2):

Parallelism: from within the CPU to system level, taking into account the characteristics of application-specific systems:

Pipelining

Instruction-Level Parallelism (ILP)

Multi-threading

Multi-processor systems.



Course organization

Lectures

Exercises

Use of tools for architecture evaluation and

design:

Analysis of an application's behaviour given

a fixed architecture

Design of a specific architecture for a given

application.



Texts:

Slides are available in the Master's repository.

Suggested readings: a list will be circulated (books available in the Library, papers accessible via Internet or provided in hard copy).

Manuals of software tools: available in the repository.
