corrrrr unit iv

8/10/2019 COrrrrr Unit IV


Mailam Engineering College
(Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai
& Accredited by National Board of Accreditation (NBA), New Delhi)
Mailam (Po), Villupuram (Dt). Pin: 604 304

DEPARTMENT OF COMPUTER APPLICATIONS
Computer Organization MC9211

Part A

1. What do you mean by pipelining? (Jan 2012)

A pipeline may be visualized as a collection of segments called pipe stages through which binary information flows. Each segment performs partial processing as dictated by the task. The result obtained in each segment is transferred to the next segment in the pipeline. The final result is obtained after the data passes through all the segments.

2. Explain latency and throughput.

Latency: Each instruction takes a certain amount of time to complete. This is called latency. It is the time difference between when an instruction is issued and when it is completed.

Throughput: The number of instructions completed in a given time is called throughput.
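The two definitions can be illustrated with a minimal sketch. The function name, the event list, and the 4-stage timing below are assumptions made only for this example.

```python
def latency_and_throughput(events):
    """events: list of (issue_time, completion_time) pairs, one per instruction."""
    # Latency: time between issue and completion of each instruction.
    latencies = [done - issued for issued, done in events]
    first_issue = min(issued for issued, _ in events)
    last_done = max(done for _, done in events)
    # Throughput: instructions completed per unit of elapsed time.
    throughput = len(events) / (last_done - first_issue)
    return latencies, throughput

# Hypothetical 4-stage pipeline: one instruction is issued per cycle and
# each instruction completes 4 cycles after it is issued.
events = [(t, t + 4) for t in range(5)]
latencies, throughput = latency_and_throughput(events)
```

Here every instruction still has a latency of 4 cycles, while the throughput over the 8 elapsed cycles is 5/8 instructions per cycle.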

3. What are the major characteristics of a pipeline?

Pipelining cannot be implemented on a single task, as it works by splitting multiple tasks into a number of subtasks and operating on them simultaneously.

The speedup or efficiency achieved by using pipelining depends on the number of pipe stages and the number of available tasks that can be subdivided.

4. Define control word. (May 2012)

The combination of control steps used for the generation of control signals is a control word. A control word is a word whose individual bits represent the various control signals.
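As a rough sketch of "one bit per control signal", the snippet below packs a set of active signals into an integer. The signal list and the bit positions are hypothetical; a real processor defines its own assignment.

```python
# Hypothetical signal-to-bit assignment (bit 0 is the first name in the list).
SIGNALS = ["PCin", "PCout", "MARin", "Read", "Select4", "Add", "Zin", "Zout"]
BIT = {name: i for i, name in enumerate(SIGNALS)}

def control_word(active):
    """Set one bit for every active control signal."""
    word = 0
    for name in active:
        word |= 1 << BIT[name]
    return word

def decode(word):
    """Recover the list of active signals from a control word."""
    return [name for name in SIGNALS if word & (1 << BIT[name])]

# Signals of a typical first fetch step.
cw = control_word(["PCout", "MARin", "Read", "Select4", "Add", "Zin"])
```

With this assignment `cw` is `0b01111110`, and `decode(cw)` returns the same six signal names.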

5. What are the

Data hazards.
Instruction hazards.
Structural hazards.

8. What are Hazards?

A hazard is also called a hurdle. It is a situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle. A stall is introduced by a hazard.

9. What is meant by Data hazards?

A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result some operation has to be delayed, and the pipeline stalls.

10. What is meant by Instruction hazards?

The pipeline may be stalled because of a delay in the availability of an instruction. For example, this may be the result of a miss in the cache, requiring the instruction to be fetched from the main memory. Such hazards are called instruction hazards or control hazards.

11. What is meant by Structural hazards?

A structural hazard is the situation in which two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is access to memory.

12. What do you mean by out-of-order execution? Is it desirable?

In a pipelined processor, where several instructions are in process concurrently, it is possible for instructions to finish out of sequence: one instruction finishes before another that was issued earlier. As far as the main computation is concerned no hazards will happen, but if an interrupt occurs it creates a problem.

13. List out various branching techniques used in micro program control unit?

Bit-ORing
Using Conditional Variable
Wide Branch Addressing

14. What is micro programming and micro programmed control unit?

Microprogramming is a method of control unit design in which the control signal selection and sequencing information are stored in ROMs and RAMs called the control store or control memory.

A micro programmed control unit is a general approach used for the implementation of a control unit. Here control signals are generated by a program similar to machine language programs.

15. Define the term hardwired control. (Jan 2012)

It is the one that contains control units that use fixed logic circuits to interpret instructions and generate control signals from them. The fixed logic circuit block includes a combinational circuit that generates the required control outputs for decoding and encoding functions.

16. What is the necessity of grouping signals?

It is used to reduce the number of bits in the microinstruction.

Prepared by Mrs. V. Rekha, AP / MCA

It is used to overcome the drawback that assigning individual bits to each control signal results in long microinstructions.

17. Define Job Sequencing.

It is a process of scheduling tasks that are awaiting initiation in order to avoid collision and achieve high throughput.

18. Write control signals for storing a word in memory.

R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC

19. What are the problems faced in Instruction Pipeline?

Resource Conflicts
Data Dependency
Branch Difficulties

20. What is Register Renaming?

When a temporary register assumes the role of the permanent register whose data it is holding and is given the same name, this is called register renaming.

21. How data hazard can be pre

All operations and data transfers within the processor take place within time periods defined by the processor clock.

27. Define multiphase clocking.

When edge-triggered flip-flops are not used, two or more clock signals may be needed to guarantee proper transfer of data. This is known as multiphase clocking.

28. What are the three steps required for the memory read operation?

R1out, MARin, Read
MDRinE, WMFC
MDRout, R2in

29. What are the actions required for executing a complete instruction?

Fetch the instruction
Fetch the first operand (the contents of the memory location pointed to by R3)
Perform the addition
Load the result into R1

30. Define register file.

A three-bus structure is used to connect the registers and the ALU of a processor. All general-purpose registers are combined into a single block called the register file.

31. Define control store.

The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store.

32. Define

The instruction fetch unit has executed the branch instruction concurrently with the execution of other instructions. This technique is referred to as branch folding.

39. Define branch delay slot.

When execution of I2 is completed and a branch is to be made, the processor must discard I3 and fetch the instruction at the branch target. The location following a branch instruction is called a branch delay slot.

40. What is delayed branching?

A technique called delayed branching can minimize the penalty incurred as a result of conditional branch instructions. The idea is simple. The instructions in the delay slots are always fetched.

41. Define static branch prediction.

With either of these schemes, the branch prediction decision is always the same every time a given instruction is executed. Any approach that has this characteristic is called static branch prediction.

42. Define dynamic branch prediction.

An approach in which the prediction decision may change depending on execution history is called dynamic branch prediction.
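One common way to let the prediction depend on execution history is a 2-bit saturating counter. The notes do not name a specific scheme, so the sketch below is an assumption chosen for illustration, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes predicted correctly, starting from state 0."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += (predictor.predict() == taken)
        predictor.update(taken)
    return correct / len(outcomes)
```

For a loop branch that is taken eight times and then falls through, the predictor is wrong twice while warming up and once at loop exit, so 6 of the 9 predictions are correct.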

43. Define multiple-issue.

A more aggressive approach is to equip the processor with multiple processing units to handle several instructions in parallel in each processor stage. With this arrangement, several instructions start execution in the same clock cycle, and the processor is said to use multiple-issue.

44. Define commitment unit.

When out-of-order execution is allowed, a special control unit is needed to guarantee in-order commitment. This is called the commitment unit.

45. Explain deadlock?

A deadlock is a situation that can arise when two units, A and B, use a shared resource. Suppose that unit B cannot complete its task until unit A completes its task. At the same time, unit B has been assigned a resource that unit A needs. If this happens, neither unit can complete its task. Unit A is waiting for the resource it needs, which is being held by unit B. At the same time, unit B is waiting for unit A to finish before it can release that resource.
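The circular wait described above can be checked mechanically. The following is only a sketch: the `waits_for` mapping is a hypothetical representation in which each unit waits on at most one other unit.

```python
def has_deadlock(waits_for):
    """waits_for maps each unit to the unit it is waiting on (or None)."""
    for start in waits_for:
        seen = set()
        unit = start
        # Follow the chain of waits until it ends or revisits a unit.
        while unit is not None and unit not in seen:
            seen.add(unit)
            unit = waits_for.get(unit)
        if unit is not None:
            # The chain came back to a unit already on it: a circular wait.
            return True
    return False
```

With `{"A": "B", "B": "A"}` (A waits for B's resource, B waits for A to finish) the function reports a deadlock; with `{"A": "B", "B": None}` it does not.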

46. Define Superscalar operation.

Superscalar describes a microprocessor design that makes it possible for more than one instruction at a time to be executed during a single clock cycle. In a superscalar design, the processor or the instruction compiler is able to determine whether an instruction can be carried out independently of other sequential instructions, or whether it has a dependency on another instruction and must be executed in sequence with it.

47. List out the disa

The branch instruction processing.

48. What information determines the control signals? (Dec 2011)

Instruction opcode is fetched
2nd half of instruction is fetched with I/O address
Contents of AC written out to device over data bus

49. Differentiate precise and imprecise exceptions. (Dec 2011)

A machine is said to support precise interrupts when it guarantees that all the instructions before the instruction causing the exception will be executed and retired without being affected by the exception being raised, and that all instructions after the faulting instruction will not change the state of the machine before the exception is handled. Any machine that does not give such a guarantee is said to have imprecise exceptions.

Precise exception is a desired attribute as it helps the programmer to reason about the logic in the program, especially when debugging in the presence of an exception. Moreover, imprecise exceptions can make the behavior of even a single-threaded program with the same input non-deterministic.

50. List the techniques used for o

1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are interpreted as an instruction to be executed. Hence, they are loaded into the IR.

IR ← [[PC]]

2. Assuming that the memory is byte addressable, increment the contents of the PC by 4, that is,

PC ← [PC] + 4

3. Carry out the actions specified by the instruction in the IR.
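The three steps can be sketched as a loop. This is an illustration only: the dictionary standing in for memory and the instruction names are hypothetical, and "executing" an instruction is reduced to recording it.

```python
def run(memory, steps):
    """Repeat the fetch/execute steps `steps` times, starting at address 0."""
    pc = 0
    trace = []
    for _ in range(steps):
        ir = memory[pc]      # step 1: IR <- [[PC]]
        pc = pc + 4          # step 2: PC <- [PC] + 4 (byte-addressable, 4-byte words)
        trace.append(ir)     # step 3: stand-in for carrying out the instruction in IR
    return pc, trace

# Hypothetical program stored at word-aligned addresses.
memory = {0: "Move", 4: "Add", 8: "Store"}
pc, trace = run(memory, 3)
```

After three iterations the PC holds 12 and the trace lists the instructions in program order.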

Where an instruction occupies more than one word, steps 1 and 2 must be repeated as many times as necessary to fetch the complete instruction. These two steps are usually referred to as the fetch phase; step 3 constitutes the execution phase. In the single-bus organization, the arithmetic and logic unit (ALU) and all the registers are interconnected via a single common bus. This bus is internal to the processor and should not be confused with the external bus that connects the processor to the memory and I/O devices.

The data and address lines of the external memory bus are connected to the internal processor bus via the memory data register, MDR, and the memory address register, MAR, respectively. Register MDR has two inputs and two outputs. Data may be loaded into MDR either from the memory bus or from the internal processor bus. The data stored in MDR may be placed on either bus.

The input of MAR is connected to the internal bus, and its output is connected to the external bus. The control lines of the memory bus are connected to the instruction decoder and control logic block. This unit is responsible for issuing the signals that control the operation of all the units inside the processor and for interacting with the memory bus.

The number and use of the processor registers R0 through R(n - 1) vary considerably from one processor to another. Registers may be provided for general-purpose use by the programmer. Some may be dedicated as special-purpose registers, such as index registers or stack pointers.

Three registers, Y, Z, and TEMP, have not been mentioned before. These registers are transparent to the programmer, that is, the programmer need not be concerned with them because they are never referenced explicitly by any instruction.

The multiplexer MUX selects either the output of register Y or a constant value 4 to be provided as input A of the ALU. The constant 4 is used to increment the contents of the program counter. The two possible values of the MUX control input Select are referred to as Select4 and SelectY, for selecting the constant 4 or register Y, respectively.

As instruction execution progresses, data are transferred from one register to another, often passing through the ALU to perform some arithmetic or logic operation. The instruction decoder and control logic unit is responsible for implementing the actions specified by the instruction loaded in the IR register.

The decoder generates the control signals needed to select the registers involved and direct the transfer of data. The registers, the ALU, and the interconnecting bus are collectively referred to as the datapath.




Single bus organization of the data path inside a processor

An instruction can be executed by performing one or more of the following operations in some specified sequence:

Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.

Register Transfers:

Instruction execution involves a sequence of steps in which data are transferred from one register to another. For each register, two control signals are used to place the contents of that register on the bus or to load the data on the bus into the register. The input and output of register Ri are connected to the bus via switches controlled by the signals Riin and Riout respectively. When Riin is set to 1, the data on the bus are loaded into Ri. Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus. While Riout is equal to 0, the bus can be used for transferring data from other registers. Suppose that we wish to transfer the contents of register R1 to register R4. This can be accomplished as follows:

Enable the output of register R1 by setting R1out to 1. This places the contents of R1 on the processor bus.
Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus into register R4.




Input and output gating for the registers

All operations and data transfers within the processor take place within time periods defined by the processor clock. The control signals that govern a particular transfer are asserted at the start of the clock cycle.

Performing Arithmetic And Logical Operation:

The ALU is a combinational circuit that has no internal storage. It performs arithmetic and logic operations on the two operands applied to its A and B inputs. One of the operands is the output of the multiplexer MUX and the other operand is obtained directly from the bus. The result produced by the ALU is stored temporarily in register Z. Therefore, a sequence of operations to add the contents of register R1 to those of register R2 and store the result in register R3 is

1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
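The three-step add sequence can be traced with a small simulation of the single bus. This is a sketch under simplifying assumptions: registers are dictionary entries, one register drives the bus per step, and the helper `step` is hypothetical.

```python
def step(regs, out, ins, alu=None):
    """One control step: register `out` drives the bus, registers in `ins` are loaded."""
    bus = regs[out]
    new = dict(regs)
    for name in ins:
        if name == "Z":
            # Z is loaded from the ALU output: input A is Y (SelectY), input B is the bus.
            new["Z"] = alu(regs["Y"], bus)
        else:
            new[name] = bus
    return new

regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}
regs = step(regs, out="R1", ins=["Y"])                          # 1. R1out, Yin
regs = step(regs, out="R2", ins=["Z"], alu=lambda a, b: a + b)  # 2. R2out, SelectY, Add, Zin
regs = step(regs, out="Z", ins=["R3"])                          # 3. Zout, R3in
```

After the third step R3 holds 12, the sum of the original contents of R1 and R2. A plain register transfer such as R1 to R4 is the special case of a single step with no ALU involved.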

Fetching a Word from Memory:

The connection for register MDR has four control signals: MDRin and MDRout control the connection to the internal bus, and MDRinE and MDRoutE control the connection to the external bus. The circuit is easily modified to provide the additional connections.




Input and output gating for one register bit.

Connections and control signals for register MDR

Example:

MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]

Storing a Word In Memory:

Writing a word into a memory location follows a similar procedure. The desired address is loaded into MAR. Then, the data to be written are loaded into MDR, and a Write command is issued. Hence, executing the instruction Move R2,(R1) requires the following sequence:

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC

As in the case of the read operation, the Write control signal causes the memory bus interface hardware to issue a Write command on the memory bus. The processor remains in step 3 until the memory operation is completed and an MFC response is received.

2. List and explain the steps in

The updated value is moved from register Z back into the PC during step 2, while waiting for the memory to respond. In step 3, the word fetched from the memory is loaded into the IR.

Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions. The instruction decoding circuit interprets the contents of the IR at the beginning of step 4. This enables the control circuitry to activate the control signals for steps 4 through 7, which constitute the execution phase. The contents of register R3 are transferred to the MAR in step 4, and a memory read operation is initiated.

Then the contents of R1 are transferred to register Y in step

Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is performed to load a new value into the PC, thus performing the branch operation.

3. Discuss multiple bus organization.

All general-purpose registers are combined into a single block called the register file. The register file is said to have three ports.

There are two outputs, allowing the contents of two different registers to be accessed simultaneously and have their contents placed on buses A and B. The third port allows the data on bus C to be loaded into a third register during the same clock cycle.

Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an arithmetic or logic operation may be performed. The result is transferred to the destination over bus C. If needed, the ALU may simply pass one of its two input operands unmodified to bus C.

The ALU control signals for such an operation are R=A or R=B. A second feature is the introduction of the Incrementer unit, which is used to increment the PC by 4. Using the Incrementer eliminates the need to add 4 to the PC using the main ALU, as was done in the single bus organization.




Consider the three-operand instruction Add R4,R5,R6

In step 1, the contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the MAR to start a memory read operation. At the same time the PC is incremented by 4. Note that the value loaded into MAR is the original contents of the PC. The incremented value is loaded into the PC at the end of the clock cycle and will not affect the contents of MAR.

In step 2, the processor waits for MFC and loads the data received into MDR, then transfers them to IR in step 3.

Finally, the execution phase of the instruction requires only one control step to complete, step 4. By providing more paths for data transfer a significant reduction in the number of clock cycles needed to execute an instruction is achieved.

4. Explain Hardwired control with the block diagram, Micro Programmed control, Micro instruction (May 2012, Dec 2011 and 2013)

The processor must have some means of generating the control signals needed in the proper sequence. Computer designers use a wide variety of techniques to solve this problem. The approaches used fall into one of two categories:

Hardwired control
Microprogrammed control

The required control signals are determined by the following information:

Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals, such as MFC and interrupt requests

The decoder/encoder block is a combinational circuit that generates the required control outputs, depending on the state of all its inputs. Separating the decoding and encoding functions gives a more detailed organization: for any instruction loaded in the IR, one of the output lines INS1 through INSm is set to 1, and all other lines are set to 0.

The input signals to the encoder block are combined to generate the individual control signals Yin, PCout, Add, End, and so on. As an example of how the encoder generates the Zin control signal for the processor organization, the circuit implements the logic function

Zin = T1 + T6 . ADD + T4 . BR + ...

That is, the Zin signal is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on. A similar circuit generates the End control signal from its own logic function.

The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting value. A further control signal is called RUN. When set to 1, RUN causes the counter to be incremented by one at the end of every clock cycle. When RUN is equal to 0, the counter stops counting.
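The encoder logic and the step counter can be sketched as two small functions. Only the Zin terms named in the text (time slot T1 for all instructions, T6 for Add, T4 for an unconditional branch) are modelled; the function names, and the choice that End takes priority over RUN, are assumptions made for this example.

```python
def z_in(step, instr):
    """Zin = T1 + T6.ADD + T4.BR (only the terms named in the text)."""
    return (
        step == 1
        or (step == 6 and instr == "ADD")
        or (step == 4 and instr == "BR")
    )

def next_step(step, run, end):
    """Control step counter: End resets it to 1; otherwise RUN = 1 lets it advance."""
    if end:
        return 1
    return step + 1 if run else step
```

For instance, `z_in(6, "ADD")` is true while `z_in(6, "BR")` is false, and `next_step(3, run=False, end=False)` stays at 3, which models the counter being held while RUN is 0.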

The control hardware can be viewed as a state machine that changes from one state to another in every clock cycle, depending on the contents of the instruction register, the condition codes, and the external inputs. The outputs of the state machine are the control signals. The sequence of operations carried out by this machine is determined by the wiring of the logic elements, hence the name "hardwired." A controller that uses this approach can operate at high speed. However, it has little flexibility, and the complexity of the instruction set it can implement is limited.




A Complete Processor:

This structure has an instruction unit that fetches instructions from an instruction cache or from the main memory when the desired instructions are not already in the cache. It has separate processing units to deal with integer data and floating-point data. A data cache is inserted between these units and the main memory. Using separate caches for instructions and data is common practice in many processors today.

Micro programmed control (May 2012)

An alternative scheme to hardwired control is called microprogrammed control, in which control signals are generated by a program similar to machine language programs.

A control word (CW) is a word whose individual bits represent the various control signals. Each of the control steps in the control sequence of an instruction defines a unique combination of 1s and 0s in the CW.




In the CWs corresponding to the 7 steps of the control sequence, SelectY is represented by Select = 0 and Select4 by Select = 1. A sequence of CWs corresponding to the control sequence of a machine instruction constitutes the microroutine for that instruction, and the individual control words in this microroutine are referred to as microinstructions.

The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store. The control unit can generate the control signals for any instruction by sequentially reading the CWs of the corresponding microroutine from the control store. This suggests organizing the control unit as follows.

To read the control words sequentially from the control store, a microprogram counter (µPC) is used. Every time a new instruction is loaded into the IR, the output of the block labeled "starting address generator" is loaded into the µPC.

In microprogrammed control, an alternative approach is to use conditional branch microinstructions. In addition to the branch address, these microinstructions specify which of the external inputs, condition codes, or, possibly, bits of the instruction register should be checked as a condition for branching to take place.




The instruction Branch<0 may now be implemented by a microroutine. After loading this instruction into IR, a branch microinstruction transfers control to the corresponding microroutine, which is assumed to start at location 25 in the control store. This address is the output of the starting address generator block. The microinstruction at location 25 tests the N bit of the condition codes. If this bit is equal to 0, a branch takes place to location 0 to fetch a new machine instruction. Otherwise, the microinstruction at location 26 is executed to put the branch target address into register Z. The microinstruction in location 27 loads this address into the PC.
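The flow through the control store for Branch<0 can be traced with a short sketch. The locations 25, 26, 27 and 0 come from the description above; the function name is hypothetical, and returning to location 0 after location 27 is an assumption (the fetch of the next instruction has to begin somewhere).

```python
def branch_lt0_microroutine(n_flag):
    """Return the control-store locations visited while executing Branch<0."""
    upc = 25                  # output of the starting address generator
    visited = []
    while True:
        visited.append(upc)
        if upc == 25:
            # Conditional branch microinstruction: if N = 0, go back to location 0.
            upc = 0 if n_flag == 0 else 26
        elif upc == 26:
            upc = 27          # branch target address has been placed in Z
        elif upc == 27:
            upc = 0           # PC loaded; assumed return to the fetch microroutine
        else:
            return visited    # location 0: fetch of the next machine instruction
```

With N = 0 the trace is `[25, 0]` (branch not taken); with N = 1 it is `[25, 26, 27, 0]`.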

Microinstructions:

Horizontal and vertical organizations represent the two organizational extremes in microprogrammed control. Many intermediate schemes are also possible, in which the degree of encoding is a design parameter. The layout is a horizontal organization because it groups only mutually exclusive microoperations in the same fields. As a result, it does not limit in any way the processor's ability to perform various microoperations in parallel.

Highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction are referred to as a vertical organization. On the other hand, the minimally encoded scheme, in which many resources can be controlled with a single microinstruction, is called a horizontal organization.

The horizontal approach is useful when a higher operating speed is desired and when the machine structure allows parallel use of resources. The vertical approach results in considerably slower operating speeds because more microinstructions are needed to perform the desired control functions.
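The effect of encoding on microinstruction length can be estimated with a small calculation. The signal groups below are hypothetical; the sketch assumes that the signals within a group are mutually exclusive and that each encoded field reserves one code for "no signal active".

```python
def one_bit_per_signal(groups):
    """Microinstruction width when every control signal gets its own bit."""
    return sum(len(group) for group in groups)

def encoded_fields(groups):
    """Width when each group of mutually exclusive signals shares one encoded field."""
    # A group of n signals needs n + 1 codes (one means "none"), and
    # n.bit_length() equals ceil(log2(n + 1)) for n >= 1.
    return sum(len(group).bit_length() for group in groups)

# Hypothetical grouping of control signals.
groups = [
    ["R0out", "R1out", "R2out", "R3out", "Zout", "PCout", "MDRout"],  # bus drivers
    ["Add", "Sub", "And"],                                            # ALU functions
    ["Read", "Write"],                                                # memory commands
]
```

For these groups the unencoded layout needs 12 bits, while the grouped layout needs 3 + 2 + 2 = 7 bits, at the cost of a decoder for each field.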




5. Explain in detail the implementation of pipeline with a neat diagram. (Jan 2012)

In computer architecture, pipelining means executing machine instructions concurrently. Pipelining is used in modern computers to achieve high performance. The speed of execution of programs is influenced by many factors. One way to improve performance is to use faster circuit technology to build the processor and the main memory. Another possibility is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

Pipelining is a particularly effective way of organizing concurrent activity in a computer system. The basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation. The processor executes a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps.

Now consider a computer that has two separate hardware units, one for fetching instructions and another for executing them. The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The results of execution are deposited in the destination location specified by the instruction. The data operated on by the instructions are inside the block labeled "Execution unit".

The computer is controlled by a clock whose period is such that the fetch and execute steps of any instruction can each be completed in one clock cycle. Operation of the computer proceeds as follows. In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In the second clock cycle, the instruction fetch unit proceeds with the fetch operation for instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified by instruction I1, which is available to it in buffer B1 (step E1). By the end of the second clock cycle, the execution of instruction I1 is completed and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and execute units are kept busy all the time.
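The overlap of fetch and execute steps in the two-stage pipeline can be tabulated with a short sketch. It assumes an ideal pipeline with no stalls; the function name is hypothetical.

```python
def two_stage_schedule(n):
    """Return {clock cycle: (fetch step, execute step)} for n instructions."""
    schedule = {}
    for cycle in range(1, n + 2):
        # The fetch unit works on instruction `cycle`; the execution unit
        # works on the instruction fetched in the previous cycle.
        fetch = f"F{cycle}" if cycle <= n else None
        execute = f"E{cycle - 1}" if cycle >= 2 else None
        schedule[cycle] = (fetch, execute)
    return schedule
```

For three instructions the schedule is cycle 1: F1; cycle 2: F2 and E1; cycle 3: F3 and E2; cycle 4: E3. Three instructions therefore finish in four cycles instead of six.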

A pipelined processor may process each instruction in four steps:

F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.

Role of Cache Memory:

Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence, the clock period should be sufficiently long to complete the task being performed in any stage. Pipelining is most effective in improving performance if the tasks being performed in different stages require about the same amount of time.

Pipeline Performance:

The pipelined processor completes the processing of one instruction in each clock cycle, which means that the rate of instruction processing is four times that of sequential operation. The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
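The claim that the gain approaches the number of stages can be checked with a little arithmetic. The sketch assumes an ideal pipeline: every stage takes one clock cycle and there are no stalls. The function names are chosen for this example.

```python
def cycles_sequential(n, k):
    """n instructions, each passing through all k steps before the next starts."""
    return n * k

def cycles_pipelined(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1)

def speedup(n, k):
    return cycles_sequential(n, k) / cycles_pipelined(n, k)
```

With k = 4 stages, four instructions take 7 cycles instead of 16, and for 1000 instructions the speedup is 4000/1003, which is just under 4.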



6. What is a Data hazard? How will you o

Operand forwarding:

The data hazard just described arises because one instruction, instruction I2, is waiting for data to be written in the register file. However, these data are available at the output of the ALU once the Execute stage completes step E1. Hence, the delay can be reduced, or possibly eliminated, if we arrange for the result of instruction I1 to be forwarded directly for use in step E2.

The processor datapath involves the ALU and the register file. This arrangement is similar to the three-bus structure, except that registers SRC1, SRC2, and RSLT have been added. These registers constitute interstage buffers needed for pipelined operation. Registers SRC1 and SRC2 are part of buffer B2 and RSLT is part of B3. The data forwarding mechanism is provided by the blue connection lines. The two multiplexers connected at the inputs to the ALU allow the data on the destination bus to be selected instead of the contents of either the SRC1 or SRC2 register.

When the instructions are executed in this datapath, the operations performed in each clock cycle are as follows. After decoding instruction I2 and detecting the data dependency, a decision is made to use data forwarding. The operand not involved in the dependency, register R2, is read and loaded in register SRC1 in clock cycle 3. In the next clock cycle, the product produced by instruction I1 is available in register RSLT, and because of the forwarding connection, it can be used in step E2. Hence, execution of I2 proceeds without interruption.

Handling data hazards in software:

I1: Mul R2,R3,R4
    NOP
    NOP
I2: Add R5,R4,R6

Side effect:

The data dependencies encountered in the preceding examples are explicit and easily detected because the register involved is named as the destination in instruction I1 and as a source in I2. Sometimes an instruction changes the contents of a register other than the one named as the destination.

Classification of data dependent hazards:

The data dependent hazards can be classified into three types according to various data update patterns. Consider two instructions I1 and I2, with I1 occurring before I2 in program order. Here D(i) denotes the set of locations read by Ii (its domain) and R(i) the set of locations it writes (its range).

I. Read After Write (RAW) (flow dependence hazard) ( R(1) ∩ D(2) ≠ ∅ )

A RAW data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved.

II. Write After Read (WAR) (anti dependence hazard) ( D(1) ∩ R(2) ≠ ∅ )

A write after read (WAR) data hazard represents a problem with concurrent execution.

III. Write After Write (WAW) (output dependence hazard) ( R(1) ∩ R(2) ≠ ∅ )

A write after write (WAW) data hazard may occur in a concurrent execution environment.
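The three hazard types can be detected by comparing the registers each instruction reads and writes. The sketch below assumes each instruction has a single destination register and is represented by a hypothetical (destination, sources) pair, with the destination being the last operand as in Mul R2,R3,R4.

```python
def classify(i1, i2):
    """Each instruction is (destination, set of sources); i1 precedes i2."""
    dest1, sources1 = i1
    dest2, sources2 = i2
    hazards = []
    if dest1 in sources2:      # I2 reads what I1 writes
        hazards.append("RAW")
    if dest2 in sources1:      # I2 writes what I1 reads
        hazards.append("WAR")
    if dest1 == dest2:         # both write the same register
        hazards.append("WAW")
    return hazards

mul = ("R4", {"R2", "R3"})     # Mul R2,R3,R4
add = ("R6", {"R5", "R4"})     # Add R5,R4,R6
```

`classify(mul, add)` returns `["RAW"]`, because the Add reads R4 before the Mul has necessarily written it.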

7. Discuss Instruction hazards. (Jan 2012, 2013)

Pipelined execution of instructions reduces the execution time and improves performance. Whenever the stream of instructions is interrupted, the pipeline stalls, as in the case of a cache miss. A branch instruction may also cause the pipeline to stall. The effect of branch instructions and the techniques that can be used for mitigating their impact are discussed for unconditional branches and conditional branches.




Unconditional branches:

Consider a sequence of instructions being executed in a two-stage pipeline. Instructions I1 to I3 are stored at successive memory addresses, and I2 is a branch instruction. Let the branch target be instruction Ik. In clock cycle 3, the fetch operation for instruction I3 is in progress at the same time that the branch instruction is being decoded and the target address computed. In clock cycle 4, the processor must discard I3, which has been incorrectly fetched, and fetch instruction Ik. In the meantime, the hardware unit responsible for the Execute (E) step must be told to do nothing during that clock period.

Either a cache miss or a branch instruction stalls the pipeline for one or more clock cycles. To reduce the effect of these interruptions, many processors employ sophisticated fetch units that can fetch instructions before they are needed and put them in a queue. Typically, the instruction queue can store several instructions. A separate unit, which we call the dispatch unit, takes instructions from the front of the queue and sends them to the execution unit. This leads to an organization in which the fetch unit and the execution unit are decoupled by the instruction queue. The dispatch unit also performs the decoding function.

To be effective, the fetch unit must have sufficient decoding and processing capability to recognize and execute branch instructions. It attempts to keep the instruction queue filled at all times to reduce the impact of occasional delays when fetching instructions. If there is a delay in fetching instructions because of a branch or a cache miss, the dispatch unit continues to issue instructions from the instruction queue. The fetch unit continues to fetch instructions and add them to the queue.

Consider how the queue length changes and how it affects the relationship between different pipeline stages. Suppose that instruction I1 introduces a 2-cycle stall. Since space is available in the queue, the fetch unit continues to fetch instructions and the queue length rises to 3 in clock cycle 6. Instruction I5 is a branch instruction. Instructions I1, I2, I3, I4 and Ik complete execution in successive clock cycles. Hence, the branch instruction does not increase the overall execution time. This technique is referred to as branch folding.

Reading more than one instruction in each clock cycle may reduce delay. The instruction queue mitigates the impact of branch instructions on performance through the process of branch folding, and it has a similar effect on stalls caused by cache misses. The effectiveness of this technique is enhanced when the instruction fetch unit is able to read more than one instruction at a time from the instruction cache.

Conditional branches and branch prediction:

I. Delayed branching

The processor fetches the next instruction before it determines whether the current instruction is a branch instruction. The location following a branch instruction is called the branch delay slot. The instruction in it is always fetched and executed, so the compiler tries to place a useful instruction there.

II. Branch Prediction (Static)

Another technique for reducing the branch penalty associated with conditional branches is to attempt to predict whether or not a particular branch will be taken.

III. Dynamic Branch Prediction

The idea is that the processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that instruction is executed.
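One common way of keeping track of branch decisions is a 2-bit saturating counter per branch. The notes do not name a specific scheme, so the following is only a sketch of that well-known technique. The class name TwoBitPredictor, the initial counter value, and the branch address 0x40 are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """

    def __init__(self):
        self.counters = {}             # branch address -> counter in 0..3

    def predict(self, address):
        # Unseen branches start at 1 (weakly not taken), an arbitrary choice.
        return self.counters.get(address, 1) >= 2

    def update(self, address, taken):
        c = self.counters.get(address, 1)
        self.counters[address] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, address, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct / len(outcomes)


# A loop-closing branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
p = TwoBitPredictor()
print(accuracy(p, 0x40, loop_branch))   # first pass through the loop
print(accuracy(p, 0x40, loop_branch))   # second pass, counters already warm
```

Because the counter must be wrong twice in a row before the prediction flips, the single fall-through at the end of the loop does not disturb the prediction for the next pass.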

8. Explain Datapath and control considerations

The three-bus structure is suitable for pipelined execution with a slight modification to support a 4-stage pipeline. There are separate instruction and data caches that use separate address and data connections to the processor. This requires two versions of the MAR register, IMAR for accessing the instruction cache and DMAR for accessing the data cache.

The PC is connected directly to the IMAR, so that the contents of the PC can be transferred to IMAR at the same time that an independent ALU operation is taking place. The data address in DMAR can be obtained directly from the register file or from the ALU to support the register indirect and indexed addressing modes. Separate MDR registers are provided for read and write operations. Data can be transferred directly between these registers and the register file during load and store operations without the need to pass through the ALU.




Buffer registers have been introduced at the inputs and output of the ALU. These are registers SRC1, SRC2, and RSLT. Forwarding connections may be added if desired. The instruction register has been replaced with an instruction queue, which is loaded from the instruction cache. The output of the instruction decoder is connected to the control signal pipeline. This pipeline holds the control signals in buffers B2 and B3.

The following operations can be performed independently in the processor:

Reading an instruction from the instruction cache
Incrementing the PC
Decoding an instruction
Reading from or writing into the data cache
Reading the contents of up to two registers from the register file
Writing into one register in the register file
Performing an ALU operation

9. Discuss about Superscalar Operation.

Pipelining makes it possible to execute instructions concurrently. Several instructions are present in the pipeline at the same time, but they are in different stages of their execution. While one instruction is performing an ALU operation, another instruction is being decoded and yet another is being fetched from the memory. Instructions enter the pipeline in strict program order.

The maximum throughput of a pipelined processor is one instruction per clock cycle. Processors equipped with multiple execution units can achieve an instruction execution throughput of more than one instruction per cycle. They are known as superscalar processors. Many modern high-performance processors use this approach.




In a superscalar processor, the detrimental effect on performance of various hazards becomes even more pronounced. The compiler can avoid many hazards through judicious selection and ordering of instructions. For example, the compiler should strive to interleave floating-point and integer instructions. This would enable the dispatch unit to keep both the integer and floating-point units busy most of the time. In general, high performance is achieved if the compiler is able to arrange program instructions to take maximum advantage of the available hardware units.

Out-of-order execution:

Instructions are dispatched in the same order as they appear in the program. However, their execution may be completed out of order. One issue arises from dependencies among instructions; another arises when exceptions occur.

To guarantee a consistent state when exceptions occur, the results of the execution of instructions must be written into the destination locations strictly in program order. In the example, this means we must delay the write step W2 of instruction I2 until cycle 6. In turn, the integer execution unit must retain the result of instruction I2, and hence it cannot accept instruction I4 until cycle 6. If an exception occurs during an instruction, all subsequent instructions that may have been partially executed are discarded. This is called a precise exception.

It is easier to provide precise exceptions in the case of external interrupts. When an external interrupt is received, the dispatch unit stops reading new instructions from the instruction queue, and all instructions whose execution is pending continue to completion. At this point, the processor and all its registers are in a consistent state, and interrupt processing can begin.
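The in-order write requirement can be expressed as a simple rule: an instruction may not write its result in an earlier cycle than the instruction before it. The sketch below assumes that more than one result may be written in the same cycle. The function name in_order_write_cycles and the ready cycles in the example are illustrative values, chosen so that the second instruction's write is delayed to cycle 6 as in the text.

```python
def in_order_write_cycles(ready_cycles):
    """Return the cycle in which each instruction writes its result.

    ready_cycles[i] is the earliest cycle in which instruction i could
    write if there were no ordering constraint (program order by index).
    """
    write_cycles = []
    previous = 0
    for ready in ready_cycles:
        cycle = max(ready, previous)   # never write before the preceding instruction
        write_cycles.append(cycle)
        previous = cycle
    return write_cycles


# Hypothetical ready cycles for I1..I4: long floating-point operations
# (I1, I3) interleaved with short integer operations (I2, I4).
print(in_order_write_cycles([6, 4, 7, 5]))
```

The delayed writes are the cost of precise exceptions: the short integer instructions finish early but must hold their results until the longer instructions ahead of them have written theirs.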




10. Difference between micro programmed and hardwired control. (Jan 2012)

Hardwired control is a control mechanism that generates control signals by using an appropriate finite state machine (FSM). Microprogrammed control is a control mechanism that generates control signals by using a memory called control storage (CS), which contains the control signals. Although microprogrammed control seems to be advantageous for CISC machines, since CISC requires systematic development of sophisticated control signals, there is no intrinsic difference between these 2 control mechanisms.

The pair of "microinstruction register" and "control storage address register" can be regarded as a "state register" for the hardwired control. Note that the control storage can be regarded as a kind of combinational logic circuit. We can assign any 0, 1 values to each output corresponding to each address, which can be regarded as the input for a combinational logic circuit. This is a truth table.
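The view of control storage as a truth table can be illustrated with a lookup table that maps each address to a control word, one bit per control signal. This is only a sketch. The signal names, the three-step sequence, and the function names decode and run are hypothetical and not taken from any particular processor.

```python
SIGNALS = ("PC_out", "MAR_in", "Read", "IR_in")    # hypothetical signal names

# Control storage: address -> control word (leftmost bit is SIGNALS[0]).
CONTROL_STORE = {
    0: 0b1100,    # PC_out, MAR_in
    1: 0b0010,    # Read
    2: 0b0001,    # IR_in
}


def decode(word):
    """Return the names of the control signals asserted by a control word."""
    width = len(SIGNALS)
    return [name for i, name in enumerate(SIGNALS)
            if (word >> (width - 1 - i)) & 1]


def run(start=0):
    """Step the control storage address register through consecutive words."""
    address = start
    steps = []
    while address in CONTROL_STORE:
        steps.append(decode(CONTROL_STORE[address]))
        address += 1
    return steps


for step in run():
    print(step)
```

The address plays the role of the input and the control word the role of the output, so the same table could equally be realized as a combinational circuit driven by a state register, which is the hardwired view.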

Microprogrammed control is not always necessary to implement CISC machines. Hardwired control can also be used for implementing sophisticated CISC machines.

Hardwired systems are made to perform in a set manner, implemented with logic, switches, etc. between any input and output in the system. Once the manner in which the control is executed has been fixed in the hardware, it cannot be changed without altering the circuit.

Microprogrammed systems are centered around a computer of some sort, often a microcontroller in small systems, that controls the system using a program. Input is sent to the computer, and the program determines what should be done with the input to come up with an output. So the processor is between the input and the output, rather than a direct link between the input and output.

The versatility of the microprogrammed system far exceeds that of the hardwired system. The systems can also be considerably smaller. The size of a complex microcontroller can be quite a bit smaller than a bunch of logic and switches for the same functionality.

11. What is branch penalty? Explain how branch penalty is reduced. (Dec 2011)

A branch instruction loads the processor's program counter with a new non-sequential value. Consequently, all the instructions whose execution was started before the branch was taken are suddenly redundant and the pipeline has to be refilled with instructions following the branch target address. The cost of executing an operation that causes a non-sequential flow of control is known as the branch penalty.
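The cost of the branch penalty can be estimated with a simple first-order model: average cycles per instruction equals the ideal value plus the fraction of instructions that are taken branches times the penalty per taken branch. The model, the function name effective_cpi, and the numbers in the example are assumptions for illustration, not figures from these notes.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when only taken branches incur a penalty.

    branch_fraction: fraction of instructions that are branches
    taken_fraction:  fraction of those branches that are taken
    penalty_cycles:  cycles lost each time a branch is taken
    """
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


# Illustrative numbers: 20% branches, 60% of them taken, 2-cycle penalty.
print(round(effective_cpi(0.20, 0.60, 2), 2))
```

Under this model, any technique that lowers the penalty per branch or the fraction of branches that pay it moves the average back toward the ideal of one cycle per instruction.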

The techniques for handling instructions that modify the flow of control aim at reducing or even eliminating the bubble in the RISC's pipeline caused when a branch is taken; that is, they are concerned with ways of reducing the branch penalty. Some of the techniques involve limiting the damage done by a branch and some techniques attempt to predict the outcome of a branch before it has been executed.

Several instructions modify the flow of control; for example, the unconditional branch, the conditional branch, the subroutine call, and the subroutine return. Internally generated traps and exceptions and externally generated interrupts also modify the flow of control. Subroutine calls and returns are not normally regarded as branch operations from the computer architect's point of view, but they have similar characteristics from the computer designer's point of view; that is, they also incur a branch penalty. The unconditional branch is always taken and forces execution to continue at the target address. An unconditional branch is equivalent to the high-level language go to and its outcome is known at compile time.

Reduce branch penalty:

The outcome of a conditional branch is determined by the state of one or more flag bits in the processor's condition code register and is therefore not known until runtime. A conditional branch may or may not be taken. When a branch is not taken, the outcome is sometimes called in line because the next instruction immediately following the branch is executed. A subroutine call is a type of unconditional branch that saves the return address. Similarly, a subroutine return is an unconditional branch that fetches the target address from a register or the stack. Some computers support conditional subroutine calls and returns.

The branch penalty can be reduced by the following steps:

1. Predict branch/jump instructions AND branch direction (taken or not taken)
2. Predict branch/jump target address (for taken branches)
3. Speculatively execute instructions along the predicted path

Anna Uni

Q. What is branch penalty? Explain how branch penalty is reduced. (Ref. No.: 11)
