corrrrr unit iv
8/10/2019 COrrrrr Unit IV
Mailam Engineering College
(Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai
& Accredited by National Board of Accreditation (NBA), New Delhi)
Mailam (Po), Villupuram (Dt). Pin: 604 304
DEPARTMENT OF COMPUTER APPLICATIONS
Computer Organization MC9211
Part A
1. What do you mean by pipelining? (Jan 2012)
A pipeline may be visualized as a collection of segments called pipe stages through which binary information flows. Each segment performs partial processing as dictated by the task. The result obtained in each segment is transferred to the next segment in the pipeline. The final result is obtained after the data passes through all the segments.
2. Explain latency and throughput.
Latency: Each instruction takes a certain amount of time to complete. This is called latency. It is the time difference between when an instruction is issued and when it is completed.
Throughput: The number of instructions completed in a given time is called throughput.
3. What are the major characteristics of a pipeline?
Pipelining cannot be implemented on a single task, as it works by splitting multiple tasks into a number of subtasks and operating on them simultaneously.
The speedup or efficiency achieved by using pipelining depends on the number of pipe stages and the number of available tasks that can be subdivided.
4. Define control word. (May 2012)
The combination of control steps used for the generation of control signals is a control word. A control word is a word whose individual bits represent the various control signals.
5. What are the
Data hazards.
Instruction hazards.
Structural hazards.
8. What are hazards?
A hazard, also called a hurdle, is a situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle. A hazard introduces a stall.
9. What is meant by data hazards?
A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.
10. What is meant by instruction hazards?
The pipeline may be stalled because of a delay in the availability of an instruction. For example, this may be the result of a miss in the cache, requiring the instruction to be fetched from the main memory. Such hazards are called instruction hazards or control hazards.
11. What is meant by structural hazards?
A structural hazard is the situation in which two instructions require the use of a given hardware resource at the same time. The most common case in which this hazard may arise is access to memory.
12. What do you mean by out-of-order execution? Is it desirable?
In a pipelined processor in which several instructions are in process concurrently, it is possible for instructions to finish out of sequence: one instruction finishes before another that was issued earlier. As far as the main computation is concerned, no hazards will happen, but if an interrupt occurs, out-of-order completion creates a problem.
13. List out various branching techniques used in micro program control unit.
Bit-ORing
Using Conditional Variable
Wide Branch Addressing
14. What is micro programming and micro programmed control unit?
Microprogramming is a method of control unit design in which the control signal selection and sequencing information are stored in ROMs or RAMs called the control store or control memory.
A micro programmed control unit is a general approach used for the implementation of a control unit. Here control signals are generated by a program similar to machine language programs.
15. Define the term hardwired control. (Jan 2012)
A hardwired control unit uses fixed logic circuits to interpret instructions and generate control signals from them. The fixed logic circuit block includes a combinational circuit that generates the required control outputs for decoding and encoding functions.
16. What is the necessity of grouping signals?
It is used to reduce the number of bits in the microinstruction.
Prepared By Mrs. V. Rekha AP / MCA
It is used to overcome the drawback that assigning an individual bit to each control signal results in long microinstructions.
17. Define job sequencing.
It is a process of scheduling tasks that are awaiting initiation in order to avoid collisions and achieve high throughput.
18. Write control signals for storing a word in memory.
R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC
19. What are the problems faced in instruction pipeline?
Resource conflicts
Data dependency
Branch difficulties
20. What is register renaming?
When a temporary register assumes the role of the permanent register whose data it is holding and is given the same name, this is called register renaming.
21. How data hazard can be pre
All operations and data transfers within the processor take place within time periods defined by the processor clock.
27. Define multiphase clocking.
When edge-triggered flip-flops are not used, two or more clock signals may be needed to guarantee proper transfer of data. This is known as multiphase clocking.
28. What are the three steps required for the memory read operation?
R1out, MARin, Read
MDRinE, WMFC
MDRout, R2in
29. What are the actions required for executing a complete instruction?
Fetch the instruction.
Fetch the first operand (the contents of the memory location pointed to by R3).
Perform the addition.
Load the result into R1.
30. Define register file.
A three-bus structure is used to connect the registers and the ALU of a processor. All general-purpose registers are combined into a single block called the register file.
31. Define control store.
The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store.
32. Define
The instruction fetch unit executes the branch instruction concurrently with the execution of other instructions. This technique is referred to as branch folding.
39. Define branch delay slot.
When execution of I2 is completed and a branch is to be made, the processor must discard I3 and fetch the instruction at the branch target. The location following a branch instruction is called a branch delay slot.
40. What is delayed branching?
A technique called delayed branching can minimize the penalty incurred as a result of conditional branch instructions. The idea is simple. The instructions in the delay slots are always fetched.
41. Define static branch prediction.
When the branch prediction decision is always the same every time a given instruction is executed, the approach is called static branch prediction.
42. Define dynamic branch prediction.
An approach in which the prediction decision may change depending on execution history is called dynamic branch prediction.
43. Define multiple-issue.
A more aggressive approach is to equip the processor with multiple processing units to handle several instructions in parallel in each processor stage. With this arrangement, several instructions start execution in the same clock cycle, and the processor is said to use multiple-issue.
44. Define commitment unit.
When out-of-order execution is allowed, a special control unit is needed to guarantee in-order commitment. This is called the commitment unit.
45. Explain deadlock.
A deadlock is a situation that can arise when two units, A and B, use a shared resource. Suppose that unit B cannot complete its task until unit A completes its task. At the same time, unit B has been assigned a resource that unit A needs. If this happens, neither unit can complete its task. Unit A is waiting for the resource it needs, which is being held by unit B. At the same time, unit B is waiting for unit A to finish before it can release that resource.
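The circular wait described above can be checked mechanically. The following is a minimal sketch, assuming the waiting relationships are given as a dictionary (the name waits_for and the function has_deadlock are hypothetical, chosen only for illustration); it reports a deadlock when the wait-for relation contains a cycle.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def visit(unit):
        if unit in done:
            return False
        if unit in visiting:
            return True          # reached a unit still being explored: cycle
        visiting.add(unit)
        for other in waits_for.get(unit, ()):
            if visit(other):
                return True
        visiting.discard(unit)
        done.add(unit)
        return False

    return any(visit(u) for u in waits_for)


# Unit A waits for the resource held by B; B waits for A to finish.
deadlocked = has_deadlock({"A": ["B"], "B": ["A"]})
# If B waits for nothing, A eventually gets the resource.
no_deadlock = has_deadlock({"A": ["B"], "B": []})
```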
46. Define superscalar operation.
Superscalar describes a microprocessor design that makes it possible for more than one instruction at a time to be executed during a single clock cycle. In a superscalar design, the processor or the instruction compiler is able to determine whether an instruction can be carried out independently of other sequential instructions, or whether it has a dependency on another instruction and must be executed in sequence with it.
47. List out the disa
The branch instruction processing.
48. What information determines the control signals? (Dec 2011)
Instruction opcode is fetched
2nd half of instruction is fetched with I/O address
Contents of AC written out to device over data bus
49. Differentiate precise and imprecise exceptions. (Dec 2011)
A machine is said to support precise exceptions when it guarantees that all the instructions before the instruction causing the exception will be executed and retired without being affected by the exception being raised, and that all instructions after the faulting instruction will not change the state of the machine before the exception is handled. Any machine that does not give such a guarantee is said to have imprecise exceptions.
A precise exception is a desirable attribute, as it helps the programmer reason about the logic of the program, especially when debugging in the presence of an exception. Moreover, imprecise exceptions can make the behavior of even a single-threaded program with the same input non-deterministic.
50. List the techniques used for o
Step 1: Fetch the contents of the memory location pointed to by the PC. The contents of this location are interpreted as an instruction to be executed. Hence, they are loaded into the IR.
IR ← [[PC]]
Step 2: Assuming that the memory is byte addressable, increment the contents of the PC by 4, that is,
PC ← [PC] + 4
Step 3: Carry out the actions specified by the instruction in the IR.
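The three steps above can be sketched as a small loop. This is only an illustration under stated assumptions: the memory is modelled as a dictionary keyed by byte address, and the two toy instructions ADDI and HALT, the accumulator acc, and the function run are all hypothetical and do not come from the notes.

```python
def run(memory, pc=0):
    """Toy fetch/execute loop; memory maps byte addresses to instructions."""
    acc = 0
    while True:
        ir = memory[pc]          # Step 1: IR <- [[PC]]
        pc = pc + 4              # Step 2: PC <- [PC] + 4
        op, arg = ir             # Step 3: carry out the instruction in IR
        if op == "ADDI":         # hypothetical instruction: add a constant
            acc += arg
        elif op == "HALT":       # hypothetical instruction: stop
            return acc, pc


program = {0: ("ADDI", 5), 4: ("ADDI", 7), 8: ("HALT", 0)}
result, final_pc = run(program)
```

Steps 1 and 2 form the fetch phase and are identical for every instruction; only step 3 depends on the opcode.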
Where an instruction occupies more than one word, steps 1 and 2 must be repeated as many times as necessary to fetch the complete instruction. These two steps are usually referred to as the fetch phase; step 3 constitutes the execution phase. In the single-bus organization, the arithmetic and logic unit (ALU) and all the registers are interconnected via a single common bus. This bus is internal to the processor and should not be confused with the external bus that connects the processor to the memory and I/O devices.
The data and address lines of the external memory bus are connected to the internal processor bus via the memory data register, MDR, and the memory address register, MAR, respectively. Register MDR has two inputs and two outputs. Data may be loaded into MDR either from the memory bus or from the internal processor bus. The data stored in MDR may be placed on either bus.
The input of MAR is connected to the internal bus, and its output is connected to the external bus. The control lines of the memory bus are connected to the instruction decoder and control logic block. This unit is responsible for issuing the signals that control the operation of all the units inside the processor and for interacting with the memory bus.
The number and use of the processor registers R0 through R(n-1) vary considerably from one processor to another. Registers may be provided for general-purpose use by the programmer. Some may be dedicated as special-purpose registers, such as index registers or stack pointers.
Three registers, Y, Z, and TEMP, have not been mentioned before. These registers are transparent to the programmer, that is, the programmer need not be concerned with them because they are never referenced explicitly by any instruction.
The multiplexer MUX selects either the output of register Y or a constant value 4 to be provided as input A of the ALU. The constant 4 is used to increment the contents of the program counter. The two possible values of the MUX control input Select are referred to as Select4 and SelectY, for selecting the constant 4 or register Y, respectively.
As instruction execution progresses, data are transferred from one register to another, often passing through the ALU to perform some arithmetic or logic operation. The instruction decoder and control logic unit is responsible for implementing the actions specified by the instruction loaded in the IR register.
The decoder generates the control signals needed to select the registers involved and direct the transfer of data. The registers, the ALU, and the interconnecting bus are collectively referred to as the datapath.
Single bus organization of the data path inside a processor
An instruction can be executed by performing one or more of the following operations in some specified sequence:
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Register Transfers:
Instruction execution involves a sequence of steps in which data are transferred from one register to another. For each register, two control signals are used to place the contents of that register on the bus or to load the data on the bus into the register. The input and output of register Ri are connected to the bus via switches controlled by the signals Riin and Riout, respectively. When Riin is set to 1, the data on the bus are loaded into Ri. Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus. While Riout is equal to 0, the bus can be used for transferring data from other registers. Suppose that we wish to transfer the contents of register R1 to register R4. This can be accomplished as follows:
Enable the output of register R1 by setting R1out to 1. This places the contents of R1 on the processor bus.
Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus into register R4.
Input and output gating for the registers
All operations and data transfers within the processor take place within time periods defined by the processor clock. The control signals that govern a particular transfer are asserted at the start of the clock cycle.
Performing Arithmetic And Logical Operation:
The ALU is a combinational circuit that has no internal storage. It performs arithmetic and logic operations on the two operands applied to its A and B inputs. One of the operands is the output of the multiplexer MUX and the other operand is obtained directly from the bus. The result produced by the ALU is stored temporarily in register Z. Therefore, a sequence of operations to add the contents of register R1 to those of register R2 and store the result in register R3 is:
R1out, Yin
R2out, SelectY, Add, Zin
Zout, R3in
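The three control steps of this addition can be traced with a small simulation. It is a rough sketch, assuming one register drives the single bus per clock cycle; the helper step and its parameters out, ins and alu are hypothetical names introduced only for this illustration.

```python
def step(regs, out, ins, alu=None):
    """One clock cycle on a single bus: one register drives, others load."""
    bus = regs[out]                      # <out>out places the register on the bus
    if alu == "Add":                     # SelectY: ALU input A comes from Y
        regs["Z"] = regs["Y"] + bus      # Zin latches the ALU result
    for r in ins:                        # every <r>in signal loads from the bus
        regs[r] = bus
    return regs


regs = {"R1": 10, "R2": 32, "R3": 0, "Y": 0, "Z": 0}
step(regs, out="R1", ins=["Y"])              # R1out, Yin
step(regs, out="R2", ins=[], alu="Add")      # R2out, SelectY, Add, Zin
step(regs, out="Z", ins=["R3"])              # Zout, R3in
```

Because only one value can be on the bus at a time, the first operand has to be parked in Y before the second one is driven, which is why the addition needs three cycles.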
Fetching a Word from Memory:
The connection for register MDR has four control signals: MDRin and MDRout control the connection to the internal bus, and MDRinE and MDRoutE control the connection to the external bus. The circuit is easily modified to provide the additional connections.
Input and output gating for one register bit.
Connections and control signals for register MDR
Example:
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
Storing a Word in Memory:
Writing a word into a memory location follows a similar procedure. The desired address is loaded into MAR. Then, the data to be written are loaded into MDR, and a Write command is issued. Hence, executing the instruction Move R2,(R1) requires the following sequence:
R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus interface hardware to issue a Write command on the memory bus. The processor remains in step 3 until the memory operation is completed and an MFC response is received.
2. List and explain the steps in
The updated value is moved from register Z back into the PC during step 2, while waiting for the memory to respond. In step 3, the word fetched from the memory is loaded into the IR.
Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions. The instruction decoding circuit interprets the contents of the IR at the beginning of step 4. This enables the control circuitry to activate the control signals for steps 4 through 7, which constitute the execution phase. The contents of register R3 are transferred to the MAR in step 4, and a memory read operation is initiated.
Then the contents of R1 are transferred to register Y in step
Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is performed to load a new value into the PC, thus performing the branch operation.
3. Discuss multiple bus organization.
All general-purpose registers are combined into a single block called the register file. The register file is said to have three ports.
There are two outputs, allowing the contents of two different registers to be accessed simultaneously and have their contents placed on buses A and B. The third port allows the data on bus C to be loaded into a third register during the same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an arithmetic or logic operation may be performed. The result is transferred to the destination over bus C. If needed, the ALU may simply pass one of its two input operands unmodified to bus C.
The ALU control signals for such an operation are R=A or R=B. A second feature is the introduction of the incrementer unit, which is used to increment the PC by 4. Using the incrementer eliminates the need to add 4 to the PC using the main ALU, as was done in the single-bus organization.
Consider the three-operand instruction Add R4,R5,R6.
In step 1, the contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the MAR to start a memory read operation. At the same time the PC is incremented by 4. Note that the value loaded into MAR is the original contents of the PC. The incremented value is loaded into the PC at the end of the clock cycle and will not affect the contents of MAR.
In step 2, the processor waits for MFC and loads the data received into MDR, then transfers them to IR in step 3.
Finally, the execution phase of the instruction requires only one control step to complete, step 4. By providing more paths for data transfer, a significant reduction in the number of clock cycles needed to execute an instruction is achieved.
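The single execution step of Add R4,R5,R6 can be illustrated with a toy three-port register file. This is a sketch under assumptions: the class RegisterFile, its method cycle, and the use of Python functions to stand for ALU operations are all hypothetical and serve only to show two reads and one write happening in the same cycle.

```python
class RegisterFile:
    """Three-port register file: two read ports (buses A and B), one write port (bus C)."""

    def __init__(self, n=8):
        self.r = [0] * n

    def cycle(self, src_a, src_b, dst, alu):
        a, b = self.r[src_a], self.r[src_b]   # both operands are read in the same cycle
        self.r[dst] = alu(a, b)               # the result arrives over bus C


def add(a, b):
    return a + b


def pass_b(a, b):
    return b                                   # models the R=B control signal


rf = RegisterFile()
rf.r[4], rf.r[5] = 20, 22
rf.cycle(4, 5, 6, add)        # step 4 of Add R4,R5,R6
rf.cycle(4, 5, 7, pass_b)     # the ALU passes its B input through unmodified
```

Compared with the single-bus case, no temporary Y or Z register is needed, because both source operands and the result each have their own bus.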
4. Explain hardwired control with the block diagram, microprogrammed control, and microinstructions. (May 2012, Dec 2011 and 2013)
The processor must have some means of generating the control signals needed in the proper sequence. Computer designers use a wide variety of techniques to solve this problem. The approaches used fall into one of two categories:
Hardwired control
Microprogrammed control
The required control signals are determined by the following information:
Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals, such as MFC and interrupt requests
The decoder/encoder block is a combinational circuit that generates the required control outputs, depending on the state of all its inputs. By separating the decoding and encoding
functions, the instruction decoder gets its own set of output lines. For any instruction loaded in the IR, one of the output lines INS1 through INSm is set to 1, and all other lines are set to 0.
The input signals to the encoder block are combined to generate the individual control signals Yin, PCout, Add, End, and so on. As an example of how the encoder generates the Zin control signal for the single-bus processor organization, the circuit implements the logic function
Zin = T1 + T6·ADD + T4·BR + ...
This signal is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on. A similar circuit generates the End control signal from its own logic function.
The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting value. The counter is also governed by a control signal called RUN. When set to 1, RUN causes the counter to be incremented by one at the end of every clock cycle. When RUN is equal to 0, the counter stops counting.
The control hardware can be viewed as a state machine that changes from one state to another in every clock cycle, depending on the contents of the instruction register, the condition codes, and the external inputs. The outputs of the state machine are the control signals. The sequence of operations carried out by this machine is determined by the wiring of the logic elements, hence the name "hardwired." A controller that uses this approach can operate at high speed. However, it has little flexibility, and the complexity of the instruction set it can implement is limited.
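The encoder logic and the control step counter can be sketched in a few lines. The sketch covers only the three Zin terms named in the text (T1 for all instructions, T6 for Add, T4 for an unconditional branch); the function z_in, the class StepCounter, and the choice to let End take priority over RUN are assumptions made for illustration.

```python
def z_in(t, instr):
    """Zin = T1 + T6.ADD + T4.BR (terms for other instructions are omitted)."""
    return t == 1 or (t == 6 and instr == "ADD") or (t == 4 and instr == "BR")


class StepCounter:
    """Control step counter driven by RUN and End."""

    def __init__(self):
        self.t = 1

    def clock(self, run, end):
        if end:
            self.t = 1        # End resets the counter to its starting value
        elif run:
            self.t += 1       # RUN = 1: advance at the end of the clock cycle
        return self.t         # RUN = 0: the counter holds its value


counter = StepCounter()
after_run = counter.clock(run=True, end=False)
after_hold = counter.clock(run=False, end=False)
after_end = counter.clock(run=True, end=True)
```

Holding the counter with RUN = 0 is how the control unit waits in a WMFC step until the memory responds.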
A Complete Processor:
This structure has an instruction unit that fetches instructions from an instruction cache or from the main memory when the desired instructions are not already in the cache. It has separate processing units to deal with integer data and floating-point data. A data cache is inserted between these units and the main memory. Using separate caches for instructions and data is common practice in many processors today.
Microprogrammed control (May 2012)
An alternative scheme to hardwired control is called microprogrammed control, in which control signals are generated by a program similar to machine language programs.
A control word (CW) is a word whose individual bits represent the various control signals. Each of the control steps in the control sequence of an instruction defines a unique combination of 1s and 0s in the CW.
In the CWs corresponding to the 7 control steps, SelectY is represented by Select = 0 and Select4 by Select = 1. A sequence of CWs corresponding to the control sequence of a machine instruction constitutes the microroutine for that instruction, and the individual control words in this microroutine are referred to as microinstructions.
The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store. The control unit can generate the control signals for any instruction by sequentially reading the CWs of the corresponding microroutine from the control store. This suggests organizing the control unit around the control store.
To read the control words sequentially from the control store, a microprogram counter (µPC) is used. Every time a new instruction is loaded into the IR, the output of the block labeled "starting address generator" is loaded into the µPC.
In microprogrammed control, an alternative approach is to use conditional branch microinstructions. In addition to the branch address, these microinstructions specify which of the external inputs, condition codes, or, possibly, bits of the instruction register should be checked as a condition for branching to take place.
The instruction Branch<0 may now be implemented by a microroutine. After loading this instruction into IR, a branch microinstruction transfers control to the corresponding microroutine, which is assumed to start at location 25 in the control store. This address is the output of the starting address generator block. The microinstruction at location 25 tests the N bit of the condition codes. If this bit is equal to 0, a branch takes place to location 0 to fetch a new machine instruction. Otherwise, the microinstruction at location 26 is executed to put the branch target address into register Z. The microinstruction in location 27 loads this address into the PC.
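The walk through control-store locations 25 to 27 can be traced in code. This is a simplified sketch: the function branch_if_negative and its trace list are hypothetical, the branch target is passed in directly rather than computed from an offset, and the return to the fetch routine after location 27 is not modelled.

```python
def branch_if_negative(n_flag, target, pc):
    """Follow the microroutine for Branch<0; return (new PC, visited locations)."""
    upc, z, trace = 25, None, []
    while True:
        trace.append(upc)
        if upc == 25:                  # test the N bit of the condition codes
            if n_flag == 0:
                trace.append(0)        # branch to location 0: fetch next instruction
                return pc, trace
            upc = 26
        elif upc == 26:                # put the branch target address into Z
            z = target
            upc = 27
        elif upc == 27:                # load Z into the PC
            return z, trace


not_taken = branch_if_negative(n_flag=0, target=200, pc=104)
taken = branch_if_negative(n_flag=1, target=200, pc=104)
```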
Microinstructions:
Horizontal and vertical organizations represent the two organizational extremes in microprogrammed control. Many intermediate schemes are also possible, in which the degree of encoding is a design parameter. A layout that groups only mutually exclusive microoperations in the same fields is a horizontal organization. As a result, it does not limit in any way the processor's ability to perform various microoperations in parallel.
Highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction are referred to as a vertical organization. On the other hand, the minimally encoded scheme, in which many resources can be controlled with a single microinstruction, is called a horizontal organization.
The horizontal approach is useful when a higher operating speed is desired and when the machine structure allows parallel use of resources. The vertical approach results in considerably slower operating speeds because more microinstructions are needed to perform the desired control functions.
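The saving from grouping mutually exclusive control signals into encoded fields can be estimated with a short calculation. The group sizes below are hypothetical and chosen only for illustration; each field is assumed to reserve one extra code meaning "no signal in this group is active".

```python
from math import ceil, log2


def field_bits(n_signals):
    """Bits needed for one encoded field of mutually exclusive signals."""
    return ceil(log2(n_signals + 1))     # +1 for the "no action" code


groups = [8, 6, 4, 3]                    # hypothetical sizes of four signal groups
unencoded = sum(groups)                  # one bit per control signal
encoded = sum(field_bits(n) for n in groups)
```

With these assumed groups the microinstruction shrinks from 21 bits to 12 bits, at the cost of a decoder for each field.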
5. Explain in detail the implementation of pipeline with a neat diagram. (Jan 2012)
In computer architecture, pipelining means executing machine instructions concurrently. Pipelining is used in modern computers to achieve high performance. The speed of execution of programs is influenced by many factors. One way to improve performance is to use faster circuit technology to build the processor and the main memory. Another possibility is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system. The basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation. The processor executes a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps.
Now consider a computer that has two separate hardware units, one for fetching instructions and another for executing them. The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The results of execution are deposited in the destination location specified by the instruction. The data operated on by the instructions are inside the block labeled "Execution unit".
The computer is controlled by a clock whose period is such that the fetch and execute steps of any instruction can each be completed in one clock cycle. Operation of the computer proceeds as follows. In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and stores it in buffer B1 at the end of the clock cycle. In the second clock cycle, the instruction
fetch unit proceeds with the fetch operation for instruction I2 (step F2). Meanwhile, the execution unit performs the operation specified by instruction I1, which is available to it in buffer B1 (step E1). By the end of the second clock cycle, the execution of instruction I1 is completed and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is no longer needed. Step E2 is performed by the execution unit during the third clock cycle, while instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and execute units are kept busy all the time.
A pipelined processor may also process each instruction in four steps:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
Role of Cache Memory:
Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence, the clock period should be sufficiently long to complete the task being performed in any stage. Pipelining is most effective in improving performance if the tasks being performed in different stages require about the same amount of time.
Pipeline Performance:
The pipelined processor completes the processing of one instruction in each clock cycle, which means that the rate of instruction processing is four times that of sequential operation. The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
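The claim that the speedup approaches the number of stages can be checked with a small calculation. The sketch assumes an ideal pipeline with no stalls, in which the first instruction takes k cycles and each later instruction completes one cycle after the previous one; the function names are hypothetical.

```python
def cycles_sequential(n, k):
    """n instructions, each taking k one-cycle steps with no overlap."""
    return n * k


def cycles_pipelined(n, k):
    """Ideal k-stage pipeline: k cycles to fill, then one instruction per cycle."""
    return k + (n - 1)


def speedup(n, k):
    return cycles_sequential(n, k) / cycles_pipelined(n, k)


short_run = speedup(4, 4)        # only 4 instructions: the fill time dominates
long_run = speedup(1000, 4)      # long instruction stream: close to 4
```

For a short instruction stream the fill time keeps the speedup well below the stage count; the factor of four is only approached when the pipeline stays full for many instructions.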
6. What is a data hazard? How will you o
Operand forwarding:
The data hazard just described arises because one instruction, instruction I2, is waiting for data to be written in the register file. However, these data are available at the output of the ALU once the Execute stage completes step E1. Hence, the delay can be reduced, or possibly eliminated, if we arrange for the result of instruction I1 to be forwarded directly for use in step E2.
Consider the processor datapath involving the ALU and the register file. This arrangement is similar to the three-bus structure, except that registers SRC1, SRC2, and RSLT have been added. These registers constitute interstage buffers needed for pipelined operation. Registers SRC1 and SRC2 are part of buffer B2 and RSLT is part of B3. The data forwarding mechanism is provided by dedicated connection lines (drawn in blue in the figure). The two multiplexers connected at the inputs to the ALU allow the data on the destination bus to be selected instead of the contents of either the SRC1 or SRC2 register. When the instructions are executed in this datapath, the operations performed in each clock cycle are as follows. After decoding instruction I2 and detecting the data dependency, a decision is made to use data forwarding. The operand not involved in the dependency, register R5, is read and loaded in register SRC1 in clock cycle 3. In the next clock cycle, the product produced by instruction I1 is available in register RSLT, and because of the forwarding connection, it can be used in step E2. Hence, execution of I2 proceeds without interruption.
Handling data hazards in software:
I1: Mul R2,R3,R4
NOP
NOP
I2: Add R5,R4,R6
Side effects:
The data dependencies encountered in the preceding examples are explicit and easily detected because the register involved is named as the destination in instruction I1 and as a source in I2. Sometimes an instruction changes the contents of a register other than the one named as the destination.
Classification of data dependent hazards:
Data-dependent hazards can be classified into three types according to the data update pattern. Consider two instructions I1 and I2, with I1 occurring before I2 in program order. Let D(i) denote the set of locations read by Ii (its domain) and R(i) the set of locations written by Ii (its range).
I. Read After Write (RAW) (flow dependence hazard) ( R(1) ∩ D(2) ≠ ∅ )
A RAW data hazard refers to a situation where an instruction refers to a result that has not yet been calculated or retrieved: I2 reads a location before I1 has written it.
II. Write After Read (WAR) (anti dependence hazard) ( D(1) ∩ R(2) ≠ ∅ )
A write after read (WAR) data hazard represents a problem with concurrent execution: I2 writes a location before I1 has read it, so I1 picks up the new value.
III. Write After Write (WAW) (output dependence hazard) ( R(1) ∩ R(2) ≠ ∅ )
A write after write (WAW) data hazard may occur in a concurrent execution environment: I2 writes a location before I1 does, so the older value is left in place.
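The three set conditions translate directly into code. The sketch below assumes each instruction is described by a pair of Python sets, the registers it reads (taken here as D) and the registers it writes (taken here as R); the function hazards and this encoding are hypothetical. The first example uses the Mul and Add instructions from the software-handling example above.

```python
def hazards(i1, i2):
    """Classify hazards between i1 and a later i2; each is (reads, writes)."""
    d1, r1 = i1
    d2, r2 = i2
    found = []
    if r1 & d2:
        found.append("RAW")     # i2 reads what i1 writes
    if d1 & r2:
        found.append("WAR")     # i2 writes what i1 reads
    if r1 & r2:
        found.append("WAW")     # both write the same location
    return found


mul = ({"R2", "R3"}, {"R4"})    # I1: Mul R2,R3,R4
add = ({"R5", "R4"}, {"R6"})    # I2: Add R5,R4,R6
raw_case = hazards(mul, add)

# Hypothetical pair in which the later instruction overwrites a source of the earlier one.
war_case = hazards(({"R1"}, {"R2"}), ({"R3"}, {"R1"}))
```

A compiler that handles hazards in software could use such a check to decide where NOP instructions are needed between dependent instructions.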
7. Discuss instruction hazards. (Jan 2012, 2013)
Pipelined execution of instructions reduces execution time and improves performance, provided the pipeline receives a continuous stream of instructions. Whenever this stream is interrupted, the pipeline stalls, as in the case of a cache miss. A branch instruction may also cause the pipeline to stall. The effect of branch instructions and the techniques that can be used for mitigating their impact are discussed for unconditional branches and conditional branches.
Unconditional branches:
Consider a sequence of instructions being executed in a two-stage pipeline. Instructions I1 to I3 are stored at successive memory addresses, and I2 is a branch instruction. Let the branch target be instruction Ik. In clock cycle 3, the fetch operation for instruction I3 is in progress at the same time that the branch instruction is being decoded and the target address computed. In clock cycle 4, the processor must discard I3, which has been incorrectly fetched, and fetch instruction Ik. In the meantime, the hardware unit responsible for the Execute (E) step must be told to do nothing during that clock period.
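The cycle-by-cycle behaviour described above can be traced with a toy model. This is a sketch under stated assumptions: the function `simulate_two_stage`, the (name, target) encoding and the choice to resolve the branch in its Execute cycle are illustrative, and every stage is assumed to take exactly one clock cycle.

```python
def simulate_two_stage(program, max_cycles=50):
    """Simulate a Fetch/Execute pipeline; return (cycle, fetched, executed) rows.

    `program` is a list of (name, branch_target_index_or_None). A branch is
    resolved in its Execute cycle, so the instruction fetched in that same
    cycle is discarded and Execute idles for one cycle.
    """
    rows, pc, in_execute, cycle = [], 0, None, 0
    while (pc < len(program) or in_execute is not None) and cycle < max_cycles:
        cycle += 1
        fetched = pc if pc < len(program) else None
        executed = in_execute
        target = program[executed][1] if executed is not None else None
        rows.append((cycle,
                     program[fetched][0] if fetched is not None else None,
                     program[executed][0] if executed is not None else None))
        if target is not None:
            pc, in_execute = target, None   # discard the wrongly fetched instruction
        else:
            pc, in_execute = pc + 1, fetched
    return rows


# I2 branches to Ik (index 3); I3 is fetched in cycle 3 and then discarded.
program = [("I1", None), ("I2", 3), ("I3", None), ("Ik", None)]
for row in simulate_two_stage(program):
    print(row)
```

In the trace, I3 is fetched in cycle 3, Ik is fetched in cycle 4 while the Execute column is empty, and Ik executes in cycle 5. The empty Execute slot is the one-cycle branch penalty.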
Either a cache miss or a branch instruction stalls the pipeline for one or more clock cycles. To reduce the effect of these interruptions, many processors employ sophisticated fetch units that can fetch instructions before they are needed and put them in a queue. Typically, the instruction queue can store several instructions. A separate unit, which we call the dispatch unit, takes instructions from the front of the queue and sends them to the execution unit. This leads to an organization in which the fetch unit, the instruction queue and the dispatch unit together feed the execution unit. The dispatch unit also performs the decoding function.
To be effective, the fetch unit must have sufficient decoding and processing capability to recognize and execute branch instructions. It attempts to keep the instruction queue filled at all times to reduce the impact of occasional delays when fetching instructions. If there is a delay in fetching instructions because of a branch or a cache miss, the dispatch unit continues to issue instructions from the instruction queue. Conversely, when the dispatch unit cannot issue instructions, for example because of a data hazard, the fetch unit continues to fetch instructions and add them to the queue.
Consider how the queue length changes and how it affects the relationship between different pipeline stages. Suppose that instruction I1 introduces a 2-cycle stall. Since space is available in the queue, the fetch unit continues to fetch instructions and the queue length rises to 3 in clock cycle 6. Instruction I5 is a branch instruction with target Ik. Because the fetch unit executes the branch while instructions queued ahead of it are still being dispatched, instructions I1, I2, I3, I4 and Ik complete execution in successive clock cycles. Hence, the branch instruction does not increase the overall execution time. This technique is referred to as branch folding.
Reading more than one instruction in each clock cycle may reduce delay. Having an instruction queue is also beneficial in dealing with cache misses: the queue mitigates the impact of branch instructions on performance through branch folding, and it has a similar effect on stalls caused by cache misses. The effectiveness of this technique is enhanced when the instruction fetch unit is able to read more than one instruction at a time from the instruction cache.
Conditional branches and branch prediction:
I. Delayed Branching
The processor fetches the next instruction before it determines whether the current instruction is a branch instruction. In delayed branching, the instruction in the slot following the branch is always executed, whether or not the branch is taken, so that the slot does useful work instead of being discarded.
II. Branch Prediction (Static)
Another technique for reducing the branch penalty associated with conditional branches is to attempt to predict whether or not a particular branch will be taken.
III. Dynamic Branch Prediction
The idea is that the processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that instruction is executed.
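The notes say only that the hardware keeps track of past branch decisions. One common way to do this is a 2-bit saturating counter per branch, which is assumed here purely for illustration; the class name `TwoBitPredictor`, the initial counter value and the use of a dictionary keyed by branch address are all hypothetical choices.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial=1):
        self.initial = initial
        self.counters = {}   # branch address -> counter value in 0..3

    def predict(self, addr):
        return self.counters.get(addr, self.initial) >= 2

    def update(self, addr, taken):
        c = self.counters.get(addr, self.initial)
        # Move toward 3 on a taken branch, toward 0 otherwise, without wrapping.
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def count_mispredictions(outcomes, addr=0x100):
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(addr) != taken:
            misses += 1
        predictor.update(addr, taken)
    return misses


# A loop-closing branch: taken nine times, then falls through once.
print(count_mispredictions([True] * 9 + [False]))
```

With this starting state the loop branch is mispredicted on its first execution and again on the final fall-through, and is predicted correctly in between.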
8. Explain Datapath and control considerations.
The three-bus structure is suitable for pipelined execution, with a slight modification to support a 4-stage pipeline. There are separate instruction and data caches that use separate address and data connections to the processor. This requires two versions of the MAR register, IMAR for accessing the instruction cache and DMAR for accessing the data cache. The PC is connected directly to the IMAR, so that the contents of the PC can be transferred to IMAR at the same time that an independent ALU operation is taking place. The data address in DMAR can be obtained directly from the register file or from the ALU to support the register indirect and indexed addressing modes. Separate MDR registers are provided for read and write operations. Data can be transferred directly between these registers and the register file during load and store operations without the need to pass through the ALU.
Buffer registers have been introduced at the inputs and output of the ALU. These are registers SRC1, SRC2, and RSLT. Forwarding connections may be added if desired. The instruction register has been replaced with an instruction queue, which is loaded from the instruction cache. The output of the instruction decoder is connected to the control signal pipeline. This pipeline holds the control signals in buffers B2 and B3.
The following operations can be performed independently in the processor:
Reading an instruction from the instruction cache
Incrementing the PC
Decoding an instruction
Reading from or writing into the data cache
Reading the contents of up to two registers from the register file
Writing into one register in the register file
Performing an ALU operation
9. Discuss about Superscalar Operation.
Pipelining makes it possible to execute instructions concurrently. Several instructions are present in the pipeline at the same time, but they are in different stages of their execution. While one instruction is performing an ALU operation, another instruction is being decoded and yet another is being fetched from the memory. Instructions enter the pipeline in strict program order.
The maximum throughput of a simple pipelined processor is one instruction per clock cycle. Processors equipped with multiple processing units can start several instructions in the same clock cycle and are therefore capable of achieving an instruction execution throughput of more than one instruction per cycle. They are known as superscalar processors. Many modern high-performance processors use this approach.
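The throughput claim can be made concrete with the standard idealized cycle count for a pipeline that never stalls. The sketch below is an assumption-laden illustration: the function `ideal_cycles`, the 4-stage depth and the issue width of 2 are hypothetical values, and hazards, cache misses and branches are ignored entirely.

```python
import math

def ideal_cycles(n_instructions, stages, issue_width=1):
    """Cycles to finish `n_instructions` in a stall-free pipeline.

    The first group of instructions needs `stages` cycles to drain;
    each further group of `issue_width` instructions adds one cycle.
    """
    groups = math.ceil(n_instructions / issue_width)
    return stages + groups - 1


n = 100
scalar = ideal_cycles(n, stages=4)                      # one instruction per cycle
superscalar = ideal_cycles(n, stages=4, issue_width=2)  # two instructions per cycle
print(scalar, n / scalar)
print(superscalar, n / superscalar)
```

Under these assumptions the scalar pipeline approaches, but never exceeds, one instruction per cycle, while the two-wide case approaches two.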
In a superscalar processor, the detrimental effect on performance of various hazards becomes even more pronounced. The compiler can avoid many hazards through judicious selection and ordering of instructions. For example, the compiler should strive to interleave floating-point and integer instructions. This would enable the dispatch unit to keep both the integer and floating-point units busy most of the time. In general, high performance is achieved if the compiler is able to arrange program instructions to take maximum advantage of the available hardware units.
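Out-of-order execution:
Instructions are dispatched in the same order as they appear in the program. However, their execution may be completed out of order. One issue arises from dependencies among instructions, which must be handled correctly; a further complication arises when an instruction causes an exception.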
To guarantee a consistent state when exceptions occur, the results of the execution of instructions must be written into the destination locations strictly in program order. For example, if an integer instruction I2 finishes executing before an earlier, longer floating-point instruction I1, this means we must delay step W2 until cycle 6. In turn, the integer execution unit must retain the result of instruction I2, and hence it cannot accept instruction I4 until cycle 6. If an exception occurs during an instruction, all subsequent instructions that may have been partially executed are discarded. This is called a precise exception. It is easier to provide precise exceptions in the case of external interrupts. When an external interrupt is received, the dispatch unit stops issuing new instructions, and all instructions whose execution is pending continue to completion. At this point, the processor and all its registers are in a consistent state, and interrupt processing can begin.
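The rule that results are written strictly in program order can be sketched as a simple scheduling calculation. The function name `in_order_writeback` and the cycle numbers below are hypothetical, chosen only so that the second instruction's write step lands in cycle 6 as in the text; the sketch also assumes that more than one result may be written in the same cycle.

```python
def in_order_writeback(exec_done):
    """Return the write-back cycle of each instruction, in program order.

    exec_done[i] is the cycle in which instruction i finishes executing.
    A result is written in the following cycle at the earliest, and never
    before the result of an earlier instruction has been written.
    """
    writes = []
    earliest = 0
    for done in exec_done:
        write_cycle = max(done + 1, earliest)
        writes.append(write_cycle)
        earliest = write_cycle
    return writes


# Assumed timings: I1 (floating-point) finishes in cycle 5, I2 (integer) in cycle 3.
print(in_order_writeback([5, 3, 6, 6]))
```

Here I2 is ready to write in cycle 4 but is held until cycle 6, when I1 writes its result, so the register state always corresponds to a prefix of the program.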
10. Difference between microprogrammed and hardwired control. [Jan 2012]
Hardwired control is a control mechanism that generates control signals by using an appropriate finite state machine (FSM). Microprogrammed control is a control mechanism that generates control signals by using a memory called control storage (CS), which contains the control signals. Although microprogrammed control seems to be advantageous for CISC machines, since CISC requires systematic development of sophisticated control signals, there is no intrinsic difference between these two control mechanisms.
The pair of "microinstruction register" and "control storage address register" can be regarded as a "state register" for the hardwired control. Note that the control storage can be regarded as a kind of combinational logic circuit. We can assign any 0, 1 values to each output corresponding to each address, which can be regarded as the input for a combinational logic circuit. This is a truth table.
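The equivalence described above, with control storage acting as a truth table from address to control signals, can be illustrated with a toy three-step sequence. Everything named in the sketch is hypothetical: the signal names, the three-state sequence and the two functions are assumptions made for illustration, not the control signals of any particular processor.

```python
# Control storage: address -> (set of asserted control signals, next address).
CONTROL_STORE = {
    0: ({"PC_out", "MAR_in", "Read"}, 1),
    1: ({"MDR_out", "IR_in"}, 2),
    2: ({"ALU_add", "R_in"}, 0),
}

def microprogrammed_step(address):
    """Look the control word up in control storage (a truth table)."""
    return CONTROL_STORE[address]

def hardwired_step(state):
    """Produce the same outputs with explicit logic on the state register."""
    signals = set()
    if state == 0:
        signals |= {"PC_out", "MAR_in", "Read"}
    elif state == 1:
        signals |= {"MDR_out", "IR_in"}
    elif state == 2:
        signals |= {"ALU_add", "R_in"}
    next_state = (state + 1) % 3
    return signals, next_state


for s in range(3):
    print(s, microprogrammed_step(s) == hardwired_step(s))
```

Both functions map the same input (address or state) to the same outputs; they differ only in whether the mapping is stored as data or built as logic.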
Microprogrammed control is not always necessary to implement CISC machines. Hardwired control can also be used for implementing sophisticated CISC machines.
Hardwired systems are made to perform in a set manner, implemented with logic, switches, etc. between any input and output in the system. Once the system is built, the manner in which the control is executed is fixed.
Microprogrammed systems are centered around a computer of some sort, often a microcontroller in small systems, that controls the system using a program. Input is sent to the computer, and the program determines what should be done with the input to come up with an output. So the processor sits between the input and the output, rather than there being a direct link between the input and output.
The versatility of the microprogrammed system far exceeds that of the hardwired system. The systems can also be considerably smaller. A complex microcontroller can be quite a bit smaller than a collection of logic and switches providing the same functionality.
11. What is branch penalty? Explain how branch penalty is reduced. [Dec 2011]
A branch instruction loads the processor's program counter with a new non-sequential value. Consequently, all the instructions whose execution was started before the branch was taken are suddenly redundant and the pipeline has to be refilled with instructions following the branch target address. The cost of executing an operation that causes a non-sequential flow of control is known as the branch penalty.
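The cost of the branch penalty is often summarized as an increase in the average number of cycles per instruction. The sketch below uses the standard first-order estimate; the function name `effective_cpi` and all numeric values (20% branches, 60% taken, 2-cycle penalty) are assumed for illustration and are not figures from the notes.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when only taken branches incur a penalty."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


# Assumed workload: 20% of instructions are branches, 60% of those are taken,
# and each taken branch discards 2 cycles of fetched instructions.
print(effective_cpi(0.2, 0.6, 2))
```

Lowering any of the three factors lowers the result, which is why the techniques that follow try either to shrink the penalty itself or to avoid paying it by predicting the branch outcome.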
Techniques for handling instructions that modify the flow of control aim at reducing or even eliminating the bubble in a RISC pipeline caused when a branch is taken; that is, they are concerned with ways of reducing the branch penalty. Some of the techniques involve limiting the damage done by a branch and some techniques attempt to predict the outcome of a branch before it has been executed.
Several instructions modify the flow of control; for example, the unconditional branch, the conditional branch, the subroutine call, and the subroutine return. Internally generated traps and exceptions and externally generated interrupts also modify the flow of control. Subroutine calls and returns are not normally regarded as branch operations from the computer architect's point of view, but they have similar characteristics from the computer designer's point of view; that is, they also incur a branch penalty. The unconditional branch is always taken and forces execution to continue at the target address. An unconditional branch is equivalent to the high-level language go to, and its outcome is known at compile time.
Reducing the branch penalty:
The outcome of a conditional branch is determined by the state of one or more flag bits in the processor's condition code register and is therefore not known until runtime. The conditional branch may or may not be taken. When a branch is not taken, the outcome is sometimes called in line because the next instruction immediately following the branch is executed. A subroutine call is a type of unconditional branch that saves the return address. Similarly, a subroutine return is an unconditional branch that fetches the target address from a register or the stack. Some computers support conditional subroutine calls and returns.
1. Predict branch/jump instructions and branch direction (taken or not taken)
2. Predict branch/jump target address (for taken branches)
3. Speculatively execute instructions along the predicted path
Anna Uni
Q. What is branch penalty? Explain how branch penalty is reduced. Ref. No.: 11