The Design of Application Specific Integrated Circuits with High Level Synthesis Approaches
Shiann-Rong Kuang (鄺獻榮)
Assistant Professor, Dept. of Computer Science and Engineering
National Sun Yat-Sen University
NSYSU CSE, 2002/9
Outline
Introduction
Novel High Level Synthesis Approaches
– Integrated Data Path Synthesis Approach
– Pipelined Control Path Synthesis Approach
– Dynamic Pipelining Approach
ASIC Designs
– Binary Arithmetic Coder
– Low-Error Fixed-Width Multipliers
– Fuzzy Color Corrector
Future Work
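Introduction
High level synthesis
– Behavioral description → register transfer level (RTL) description
– Data path synthesis and control path synthesis
t1=a-b; t2=c+t1; t3=e-f; x=d-t2; y=t1+t3;
[Figure: the data path synthesized from the behavior above (an adder, a subtractor, and registers holding a to f, t1, t2, t3, x, y) together with its FSM controller]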
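As a rough illustration of the scheduling half of data path synthesis, the sketch below applies ASAP (as-soon-as-possible) scheduling to the five-operation behavior above. ASAP is a textbook baseline, not the approach proposed in this talk, and the one-control-step-per-operation delay model is an assumption made only for this example.

```python
# ASAP scheduling sketch for: t1=a-b; t2=c+t1; t3=e-f; x=d-t2; y=t1+t3;
# Assumption: every operation takes exactly one control step.
OPS = {
    "t1": ("-", ["a", "b"]),
    "t2": ("+", ["c", "t1"]),
    "t3": ("-", ["e", "f"]),
    "x": ("-", ["d", "t2"]),
    "y": ("+", ["t1", "t3"]),
}


def asap(ops):
    """Return {result: control step}; an op runs one step after its latest producer."""
    step = {}

    def visit(name):
        if name not in ops:  # primary input, available before step 0
            return -1
        if name not in step:
            step[name] = 1 + max(visit(src) for src in ops[name][1])
        return step[name]

    for name in ops:
        visit(name)
    return step


schedule = asap(OPS)
for result, cstep in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"c-step {cstep}: {result} = {OPS[result][1][0]} {OPS[result][0]} {OPS[result][1][1]}")
```

Under this model two subtractions land in the same control step, so the schedule alone already fixes a lower bound on the number of subtractors, which is one way to see why scheduling and allocation interact.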
Integrated Data Path Synthesis Approach
Data Path Synthesis
– module selection, scheduling, and allocation are highly interdependent
– solving them separately may leave the best designs unexplored
Proposed Data Path Synthesis Approach
– combines module selection, scheduling, and allocation
– general module selection model: module types with different attributes (delay, area, …)
– a mixed-vertex compatibility graph (MCG) model, solved globally using partial clique partitioning
Example: clock cycle = 100 ns, latency = 5, performance constraint = 500 ns
[Figure: data flow graph of the example with operations -1, +2, -3, -4, +5 on inputs a to f, producing t1, t2, t3, x, y]
Module library:
m-type        ADD_1  ADD_2  SUB_1  SUB_2  SUB_3  Register  Mux_2  Mux_3  wire
index           1      2      3      4      5
cost (area)    200    100    240    120     60     150       80    120    100
delay (ns)      70    160     70    160    380     2/2        3      3      5
[Figure: two alternative schedules over c-steps 0 to 4 with their data paths, one built from ADD_1, SUB_2, and SUB_3, the other from ADD_2 and SUB_1]
Cost comparison:
circuit           1      2
module cost     340    380
MUX cost        200     80
wire cost      1200   1100
register cost   900    900
total cost     2640   2460
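The totals in the cost comparison can be rechecked from the library areas. The sketch below does that arithmetic; pairing circuit 1 with {ADD_2, SUB_1} and circuit 2 with {ADD_1, SUB_2, SUB_3} is an inference from the module-cost row (340 and 380), not something the slide states, and the function names are illustrative.

```python
# Areas copied from the module library of the example.
LIBRARY_AREA = {"ADD_1": 200, "ADD_2": 100, "SUB_1": 240, "SUB_2": 120, "SUB_3": 60}


def module_cost(modules):
    """Sum of the library areas of the selected functional units."""
    return sum(LIBRARY_AREA[m] for m in modules)


def total_cost(module, mux, wire, register):
    """Total area as the sum of the four cost components in the table."""
    return module + mux + wire + register


# Assumed pairing of module sets with circuit numbers (see the lead-in).
circuit_1 = total_cost(module_cost(["ADD_2", "SUB_1"]), mux=200, wire=1200, register=900)
circuit_2 = total_cost(module_cost(["ADD_1", "SUB_2", "SUB_3"]), mux=80, wire=1100, register=900)
print(circuit_1, circuit_2)
```

The design with the more expensive functional units still ends up smaller overall because it saves on multiplexers and wiring, which is the kind of trade-off a separate module selection step would not see.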
Find all feasible assignments
MCG transformations
Initial MCG: |V1|=30, |V2|=0
A130 A131 A132 A140 A141
A211 A212 A213 A221 A222
A332 A333 A334 A342 A343
A430 A431 A432 A433 A440 A441 A442 A450
A511 A512 A513 A514 A521 A522 A523
[Figure: the initial MCG over these 30 vertices, with the candidate c-steps (0 to 4) of operations -1, +2, -3, -4, +5]
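The 30 vertices of the initial MCG can be reproduced by enumerating every (operation, module type, start c-step) triple that fits the latency. The sketch below is a reconstruction under stated assumptions: each label Aijk is read as operation i, module type index j, start c-step k; a module occupies ceil(delay / clock) c-steps; and the precedence edges 1→2→3, 1→5, 4→5 are inferred from the vertex labels rather than given on the slide.

```python
from math import ceil

CLOCK_NS, LATENCY = 100, 5
DELAY_NS = {"ADD_1": 70, "ADD_2": 160, "SUB_1": 70, "SUB_2": 160, "SUB_3": 380}
TYPE_INDEX = {"ADD_1": 1, "ADD_2": 2, "SUB_1": 3, "SUB_2": 4, "SUB_3": 5}
KIND = {1: "SUB", 2: "ADD", 3: "SUB", 4: "SUB", 5: "ADD"}
# Assumed precedence edges (operation -> its predecessors), inferred from the labels.
PREDS = {1: [], 2: [1], 3: [2], 4: [], 5: [1, 4]}


def cycles(module):
    """Number of c-steps a module occupies (multi-cycle assumption)."""
    return ceil(DELAY_NS[module] / CLOCK_NS)


def candidates(op):
    return [m for m in DELAY_NS if m.startswith(KIND[op])]


def fastest(op):
    return min(cycles(m) for m in candidates(op))


def earliest_start(op):
    return max((earliest_start(p) + fastest(p) for p in PREDS[op]), default=0)


def latest_finish(op):
    succs = [s for s in PREDS if op in PREDS[s]]
    return min((latest_finish(s) - fastest(s) for s in succs), default=LATENCY)


def feasible_assignments():
    """All labels Aijk whose execution window fits between predecessors and successors."""
    nodes = []
    for op in PREDS:
        for m in candidates(op):
            for start in range(earliest_start(op), latest_finish(op) - cycles(m) + 1):
                nodes.append(f"A{op}{TYPE_INDEX[m]}{start}")
    return nodes


nodes = feasible_assignments()
print(len(nodes), nodes)
```

With these assumptions the enumeration yields exactly the 30 labels listed for the initial MCG, including the single slot A450 for the slow SUB_3 module.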
MCG after iteration 1: best decision A140, placed on a new instance (subtractor), instance 1
MCG after iteration 2: best decision A343, placed on instance 1 (reusing the existing subtractor instance)
MCG after iteration 3: instance 1 = {A140, A343}, instance 2 = {A450}; A212 and A514 remain
Final MCG: |V1|=0, |V2|=3; instance 1 = {A140, A343}, instance 2 = {A450}, instance 3 = {A212, A514}
[Figure: the MCG after each iteration and the resulting schedule over c-steps 0 to 4, using ADD_1, SUB_2, and SUB_3]
Integrated Data Path Synthesis Approach
Experiments and Results
Module libraries Lib 1 to Lib 3 (shared delays):
m-type     Lib 1 area  Lib 2 area  Lib 3 area  delay (ns)
LT            1020        102         102         50
ADD           1920        192         192         50
SUB           2240        224         224         50
MUL          80460       8046        4000         50
Register       150       1500         150         2/2
Mux_2           64         64        6400          2
Mux_3           96         96        9600          2
Mux_4          128        128       12800          2
Mux_5          160        160       16000          2
Mux_6          192        192       19200          2
wire           100        100        1000          2

Module library Lib 4:
m-type      area   delay (ns)
ADD_1       1500      55
ADD_2        500     170
ADD_3        300     220
MUL_1      17000      80
MUL_2       8000     180
MUL_3       3000     260
Register    1200     2/2
Mux_2          0       0
Mux_3          0       0
Mux_4          0       0
wire           0       0
Pipelined Control Path Synthesis Approach
Main Idea of Pipelining Control Path
[Figure: panels (a) to (f), transforming the controller step by step (inputs i(t), state s(t), state logic SL, output logic OL, state registers SRs) into pipelined circuit 2, which adds control registers (CRs) and pipeline registers (PRs)]
Pipelined Control Path Synthesis Approach
Proposed Control Path Synthesis Approach
Problem: pipelining the control path may violate control dependencies
Solution: modify the original BSTG by inserting no-operation (NOOP) states
Theorem
A BSTG satisfies all control dependencies if the distance Dij between the states of each produce-consume state pair <Si, Sj>c satisfies one of the following conditions:
Condition 1: if Sj is not a branch state, then Dij ≥ k.
Condition 2: if Sj is a branch state, then Dij ≥ 2k-1.
Nij: the minimal number of NOOPs that must be inserted between <Si, Sj>c
Nij = 2k-Dij-1, if Sj is a branch state;
Nij = k-Dij, otherwise.
The total number of NOOPs is minimized using an ILP formulation
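The per-pair NOOP count follows directly from the two conditions of the theorem. The helper below is a minimal sketch of that formula only; clamping the result at zero for pairs that already satisfy their condition is an assumption added here, and the ILP that minimizes the total is not modeled.

```python
def noops_needed(d_ij: int, k: int, consumer_is_branch: bool) -> int:
    """Minimal NOOPs to insert between a produce-consume state pair <Si, Sj>c.

    d_ij: current distance between Si and Sj
    k:    pipeline depth of the control path
    """
    required = 2 * k - 1 if consumer_is_branch else k  # Condition 2 / Condition 1
    # Assumption: a pair that already meets its condition needs no NOOP.
    return max(0, required - d_ij)


# Example: k = 3 and two adjacent states (distance 1).
print(noops_needed(1, 3, consumer_is_branch=False))  # k - Dij
print(noops_needed(1, 3, consumer_is_branch=True))   # 2k - Dij - 1
```

A NOOP inserted for one pair also lengthens the distance of other pairs that span the same edge, which is presumably why the talk formulates the overall minimization as an ILP instead of summing these values.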
[Figure: (a) SCDFG of an example with comparisons >1, >2, >3 and arithmetic operations numbered up to 11, and (b) its BSTG with states S1 to S9 and branch conditions c1 (>1), c2 (>2), c3 (>3)]
[Figure: the BSTG after inserting NOOP states N1 to N4, and the structure of pipelined circuit k: combinational logic stages CL1 to CLk, state registers, control registers, and pipeline registers PR1 to PRk-1]
[Figure: four plots against k (0 to 12) for the benchmarks 5_EWF and Cond2; recoverable axis labels: PCPk, lits, PRs]
Dynamic Pipelining Approach
Pipelining
– In most existing pipelining techniques, the latency is fixed or takes one of a few fixed values
– In some ASIC loops, a varying loop execution length and time-relative data dependencies between iterations make pipelining inefficient or impossible
Dynamic pipelining
– A new loop scheduling approach that pipelines the loop using variable latencies
– The controller consists of two interacting finite state machines
while(c1) { while(c2) { } }
Dynamic Pipelining Approach
[Figure: iterations i to i+4 plotted against time across three phases with latency 5, 6, and 4; marked stages are those in which no operation is performed]
An Example of Dynamic Pipelining
j=1;
while (N>j) { /* N is the number of data items to be sorted */
  i=j-1;
  temp=a[j];
  while (i>=0 && temp<a[i]) { /* test i first so that a[-1] is never read */
    a[i+1]=a[i];
    i=i-1;
  }
  a[i+1]=temp;
  j++;
}
S1: j=1;                                       /* o1 */
S2: O_loop: if (N<=j) goto End_O;              /* o2 */
    i=j-1;                                     /* o3 */
    r_add=j;                                   /* o4 */
S3: j++;                                       /* o5 */
    temp=a[r_add];                             /* o6 */
S4: I_loop: r_add=i;                           /* o7 */
S5: data=a[r_add];                             /* o8 */
S6: w_add=i+1;                                 /* o9 */
    if (!(temp<data && i>=0)) goto End_I;      /* o10 */
S7: a[w_add]=data;                             /* o11 */
    i=i-1; goto I_loop;                        /* o12 */
S8: End_I: a[w_add]=temp; goto O_loop;         /* o13 */
End_O:
[Figure: BSTG with states Init and S1 to S8 carrying the operations [o1], [o2,o3,o4], [o5,o6], [o7], [o8], [o9,o10], [o11,o12], [o13]; S4 to S7 form the inner loop; transitions are labeled with the conditions ci and co]
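To make the state-level description concrete, the sketch below interprets states S1 to S8 sequentially and counts state visits. It is an illustrative software model, not the synthesized controller: one visit is taken to stand for one control step, and the read at S5 is guarded when i is -1, a detail the state listing leaves open.

```python
def run_sequential(data):
    """Execute states S1..S8 on a copy of `data`; return (sorted list, state visits)."""
    a, n = list(data), len(data)
    visits = 0
    state = "S1"
    j = i = r_add = w_add = temp = value = 0
    while state != "End_O":
        visits += 1
        if state == "S1":
            j = 1
            state = "S2"
        elif state == "S2":
            if n <= j:
                state = "End_O"
            else:
                i, r_add = j - 1, j
                state = "S3"
        elif state == "S3":
            j += 1
            temp = a[r_add]
            state = "S4"
        elif state == "S4":
            r_add = i
            state = "S5"
        elif state == "S5":
            # Assumption: a read at address -1 returns a dummy value; S6 rejects it anyway.
            value = a[r_add] if r_add >= 0 else 0
            state = "S6"
        elif state == "S6":
            w_add = i + 1
            state = "S7" if (temp < value and i >= 0) else "S8"
        elif state == "S7":
            a[w_add] = value
            i -= 1
            state = "S4"
        else:  # S8
            a[w_add] = temp
            state = "S2"
    return a, visits


print(run_sequential([3, 1, 2]))
print(run_sequential(range(10))[1], run_sequential(range(9, -1, -1))[1])
```

Each outer iteration costs six visits plus four per shifted element, so an already sorted input of 10 items takes 56 visits and a reverse-sorted one 236; for 100 items the counts are 596 and 20396. These coincide with four entries in the sequential column of the insertion sorter results reported later in the deck, although the deck does not say what its data sets contain.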
BSTG Partitioning
[Figure: the BSTG split into an outer graph BSTGo (Init, S1, S2, S3, Noopi, S8; signals co, done, [start=1]) and an inner graph BSTGi (S4, S5, S6, S7, Noopo; signals start, [done=ci])]
Inner Loop Pipelining
[Figure: the original PBSTGi and the new PBSTGi, which overlaps states of inner iterations i and i+1 in pipeline states PS1 and PS2]
Outer Loop Pipelining
[Figure: successive outer iterations, each executing the states S2, S3, S4, S5, N1, N2, N3, S8; the loop body is unwound four times to expose the repeating pipeline body; the new BSTGo is shown for L=1, L=2, and L=3]
[Figure: the final PBSTGo, whose pipeline states PS1 to PS3 form the repeating pipeline body over outer iterations i, i+1, and i+2, and the final PBSTGi, with pipeline states PS1 and PS2 over inner iterations i and i+1; the two graphs interact through the start and done signals]
Controller Architecture
[Figure: an outer controller and an inner controller, each built from combinational logic and state registers, coupled through the start, done, and run signals and a Mux; the conditions ci and co come from the datapath and the control signals go to the datapath; Eq. (3.3), Eq. (3.4), and Eq. (3.5) label parts of the logic]
Datapath Allocation
An execution example
[Figure: cycle-by-cycle trace of the inner controller (PS1, PS2, Nop) and the outer controller (PS1 to PS3, Nop) with the done, start, and run signals, over outer iterations with latencies 7, 3, 5, and 3]
Sequential state sequences of the traced outer iterations:
S2 S3 S4 S5 S6 S7 S4 S5 S6 S8
S2 S3 S4 S5 S6 S7 S4 S5 S6 S7 S4 S5 S6 S8
S2 S3 S4 S5 S6 S7 S4 S5 S6 S7 S4 S5 S6 S7 S4 S5 S6 S8
S2 S3 S4 S5 S6 S8
Experimental Results
Comparing results of insertion sorter
example  data size  sequential  dynamic pipelining  speedup
Data1       10           56             31            1.81
Data2       10          236            121            1.95
Data3       10          108             57            1.89
Data4       10          112             59            1.90
Data5      100          596            301            1.98
Data6      100        20396          10201            2.00
Data7      100         9980           4993            2.00
Data8      100        10176           5091            2.00
Other examples
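The speedup column of the insertion sorter results is the ratio of the sequential figure to the dynamic pipelining figure. The short check below recomputes it from the two columns; the dictionary layout and names are illustrative only.

```python
# (sequential, dynamic pipelining) pairs copied from the insertion sorter results.
RESULTS = {
    "Data1": (56, 31),
    "Data2": (236, 121),
    "Data3": (108, 57),
    "Data4": (112, 59),
    "Data5": (596, 301),
    "Data6": (20396, 10201),
    "Data7": (9980, 4993),
    "Data8": (10176, 5091),
}


def speedup(sequential, pipelined):
    """Ratio of sequential to pipelined execution length."""
    return sequential / pipelined


speedups = {name: speedup(*pair) for name, pair in RESULTS.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.2f}")
```

The ratio approaches 2 as the runs get longer, which fits a picture in which dynamic pipelining roughly halves the cost per iteration while a small fixed overhead remains.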
Binary Arithmetic Coder
Adaptive Binary Arithmetic Coder
– Q-coder: compresses mainly bilevel image data
– Goal: a compression chip universal enough to compress any type of data quickly while still achieving a good compression ratio
Proposed modified hardware algorithm
– a new probability estimation modeler using a table-look-up approach
– a technique that solves the carry-over and source termination problems
– a fixed-width parallel multiplier
VLSI chip
Encoding Algorithm
Encoding() {
  C=0x00; A=0xff; R=0x0000; S=0000000000;
  for (each input binary symbol) {
    phase1: Generate P('0'|S) by Eq. (4.5);
    phase2: AP=A*P('0'|S);
      if (input symbol=='0') A=AP;
      else {
        A=A-AP; C=C+AP;
        if (carry occurs) R++;
      }
      Update the adaptive modeler by Eq. (4.6);
      Shift the input symbol into S;
    phase3: while (MSB of A==0) normalization_of_encoding();
  }
  Encode LPS and then output 17 consecutive '1's;
}
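The phase-2 interval update of the encoding algorithm can be sketched in a few lines. Several details below are assumptions, not taken from the slides: A and C are treated as 8-bit registers (suggested by their initial values 0xff and 0x00), P('0'|S) is passed in as an 8-bit fraction p0/256, and the fixed-width product keeps the upper 8 bits of the 16-bit result. The modeler of Eq. (4.5) and Eq. (4.6) and the normalization of phase 3 are not modeled.

```python
def encode_step(A, C, R, p0, symbol):
    """One phase-2 update of interval width A, code register C, and carry counter R.

    p0 is P('0'|S) scaled to 0..255 (assumed encoding); symbol is 0 or 1.
    """
    AP = (A * p0) >> 8  # fixed-width multiply: keep the upper 8 bits of the 16-bit product
    if symbol == 0:
        A = AP
    else:
        A = A - AP
        C = C + AP
        if C > 0xFF:  # carry out of the assumed 8-bit code register
            C &= 0xFF
            R += 1
    return A, C, R


# Starting from the initial register values with P('0'|S) = 128/256:
print(encode_step(0xFF, 0x00, 0, 128, 0))
print(encode_step(0xFF, 0x00, 0, 128, 1))
```

Because only the upper half of the product A*P is kept, this step is where a fixed-width multiplier fits into the coder.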
System Architecture
[Figure: block diagram of the adaptive coder: Adaptive Modeler (producing P('0'|S) from the context S), Arithmetic Operation Unit (registers A and C, updated to A' and C'), Normalization Unit, Asynchronous Input/Output Path (In_Data, Out_Data, handshaking signals), and Control Path (En_CL, De_CL, state registers); control inputs Init and En/De; symbol ports En_Input, En_Output, De_Input, De_Output, Shift_In]
characteristic         proposed chip
technology             TSMC 0.8 µm SPDM
supply voltage         5 V
package                40 LD S/B
operation frequency    25 MHz
chip area              4.2 × 4.5 mm²
chip complexity        54k gates
scan-mode              yes
File compression ratio of different coding methods
type    name           ST_32   MF1     MF2     MDF     Huffman  LZW
Text    t07            62.60%  58.02%  61.27%  64.55%  47.13%   63.32%
Text    t18            63.00%  58.05%  60.57%  63.83%  44.70%   63.59%
Text    t19            65.31%  60.89%  63.86%  67.65%  48.39%   65.78%
Text    t20            60.99%  57.32%  59.37%  62.70%  43.52%   61.04%
Text    t21            66.97%  61.61%  63.46%  67.35%  46.57%   67.11%
Text    t22            57.69%  55.21%  58.15%  61.67%  45.25%   59.13%
Text    average        63.57%  59.03%  61.51%  65.03%  45.91%   64.00%
Binary  t03            26.83%  18.59%  20.87%  21.42%  14.74%   14.41%
Binary  t09            40.90%  32.74%  35.50%  37.72%  27.43%   38.55%
Binary  t10            43.73%  29.68%  31.49%  32.38%  20.21%   42.06%
Binary  t13            49.87%  35.16%  37.34%  38.87%  22.82%   47.99%
Binary  t14            27.44%  20.89%  22.65%  23.06%  16.44%   18.62%
Binary  t23            46.57%  36.82%  40.03%  43.39%  29.61%   44.71%
Binary  average        44.04%  30.65%  32.65%  33.79%  21.04%   41.58%
Image   t15            49.37%  45.01%  46.55%  48.12%  10.31%   38.82%
Image   t05            19.69%  20.15%  21.26%  21.46%   4.22%    6.40%
Image   t12            49.77%  47.14%  48.62%  49.82%  12.43%   39.84%
Image   t24             9.41%   8.44%   9.15%   8.69%  10.56%    0.00%
Image   t16            24.86%  22.66%  23.55%  23.46%   3.09%   10.23%
Image   t17            36.64%  35.58%  36.79%  37.48%  12.20%   26.18%
Image   average        36.19%  34.12%  35.35%  36.04%   9.80%   25.22%
        total average  42.32%  33.61%  35.36%  36.47%  18.39%   36.91%
Dynamic Pipelining Design
files    file size (bytes)  C-rate  C-speed (S) (Mbit/sec)  C-speed (P) (Mbit/sec)  speedup
Text1          73622        49.1%          2.99                    6.10              2.04
Text2          51740        52.8%          3.00                    6.12              2.04
Text3         108501        51.5%          3.00                    6.12              2.04
Text4          23680        49.6%          2.97                    6.13              2.06
Binary1       163840        23.2%          2.86                    6.01              2.10
Binary2       147456        34.8%          2.91                    6.03              2.07
Binary3      1064960        36.3%          2.94                    6.05              2.06
Binary4        98304        40.9%          2.93                    6.06              2.07
Image1        345600        42.0%          2.99                    6.01              2.01
Image2        245760        13.9%          2.85                    5.98              2.10
Image3        921856        41.8%          2.94                    6.01              2.04
Image4        345600        28.1%          2.96                    5.98              2.02
Low-Error Fixed-Width Multipliers
Fixed-Width Multiplier
– multiplication operations used in many ASICs have the special fixed-width property
– directly omitting about half of the adder cells of a conventional parallel multiplier introduces a significant error in the product
Low-Error Fixed-Width Multiplier
– low-error fixed-width sign-magnitude multipliers
– low-error fixed-width two's complement multipliers
– reduced width multiplier (n < m < 2n)
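Low-Error Fixed-Width Multipliers
Fixed-width sign-magnitude multipliers
[Equations, partly recoverable: the omitted low-order part of the product is written as a weighted sum of partial-product columns,
\frac{1}{2}\left(x_{n-1}y_0 + x_{n-2}y_1 + \cdots + x_1y_{n-2} + x_0y_{n-1}\right) + \frac{1}{4}\left(x_{n-2}y_0 + x_{n-3}y_1 + \cdots + x_1y_{n-3} + x_0y_{n-2}\right) + \frac{1}{8}\left(x_{n-3}y_0 + \cdots + x_0y_{n-3}\right) + \cdots + \frac{1}{2^{n-1}}\left(x_1y_0 + x_0y_1\right) + \frac{1}{2^n}x_0y_0,
where the leading column sum is \sum_{i+j=n-1} x_i y_j. The symbols that name these terms and the bound stated in the theorem did not survive extraction.]
Theorem: statement not recoverable; it distinguishes the case in which the given quantity equals 0 from the case in which it is greater than 0.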
Sign-magnitude multiplier
X = x5 x4 x3 x2 x1 x0, Y = y5 y4 y3 y2 y1 y0
[Figure: the conventional 6x6 parallel array multiplier (half adders Ha, full adders Fa, product bits P0 to P11) next to the fixed-width version, which keeps P6 to P11 and replaces the omitted cells with AO cells (AO1 to AO4) and a carry generator Cg]
Two's complement multiplier
[Figure: the conventional 6x6 array multiplier (product bits P0 to P11) next to the fixed-width two's complement version with outputs P6 to P11, which uses OR cells and a carry generator Cg]
Reduced width multiplier
[Figure: the conventional 6x6 array multiplier (product bits P0 to P11) next to a reduced width version with outputs P5 to P11, built with AO cells and a carry generator Cg]
Low-Error Fixed-Width Multipliers
Error comparison (two's complement multipliers)
The symbols naming the three error measures were lost in extraction; only the subscript M of the first and the note (0.01) on the third survive.
measure          multiplier   n=4      n=8      n=12       n=16
first (..M)      MP'          48       1792     45056      983040
                 M1           32       1280     32768      720896
                 M2           16       768      20480      458752
                 MF           16       512      8192       196608
                 MR           16       256      4096       131072
second           MP'          13.750   450.75   11267.75   245764.7
                 M1           5.375    187.89   3927.92    74497.4
                 M2           1.500    130.50   4099.50    98308.5
                 MF           0.938    65.04    1570.32    30403.7
                 MR           0.125    26.76    731.39     14629.6
third ((0.01))   MP'          12.9     3.7      5.1        7.1
                 M1           5.6      2.5      2.3        2.8
                 M2           1.6      1.9      2.5        3.2
                 MF           1        1        1          1
                 MR           0.1      0.4      0.5        0.5
Application
[Figure: images produced with each multiplier: (a) original, (b) M1, (c) MF, (d) MR1, (e) MR2, (f) MS]
Fuzzy Color Corrector
Fuzzy Color Correction
– in previous literature, the color correction process was modeled as a three-level fuzzy tree inference process
– that algorithm is inefficient, so its hardware implementation is costly and slow
– a new efficient fuzzy tree inference algorithm suitable for the center of gravity defuzzification method is proposed
[Figure: Original Color Image → Color Scanner → (RGB) → Fuzzy Correction System → (CMY) → Color Printer → Printed Color Image]
Modified fuzzy color correction algorithm
Init: L=1;
S1: while (input pattern Xi != NULL) {
S1:   Calculate the address of rule memory (ROM);
S2, S3: s1=ROM[address++]; D=s1;
S4:   k=0; PathL=0; d=ROM[address];
S5:   while (k<8 && D>0) {
S6:     D=d; PathL=k; k++;
      }
S5:   if (1<=k<=7 && |D|<=d/2) PathL=k;
S7~S13: Calculate Xo using Eq. (6.6);
S7:   if (++L==4) L=1;
}
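Fuzzy Color Corrector: Proposed Sequential Architecture
[Figure: data path of the fuzzy color corrector: rule memories ROM and ROM2, the address register, an ALU, a two's complement unit, shifters (<<4, <<1, >>1), and registers for s1, d, D, |D|, k, L, Path1, Path2, temp1, temp2, and wk, with input Xi and output Xo]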
Dynamic Pipelined Design
pictures  file size (bytes)  sequential  dynamic pipelining    L     speedup
Pic1           148416          2704900        1370164         9.25    1.97
Pic2           230604          4345076        2193924        10.39    1.98
Pic3           974916         16898438        8139846         8.35    2.08
Pic4          1137198         20268056       10356836         9.11    1.96
Future Work
System-on-a-Chip (SoC) Platform
NNI: NoC Network Interface (ISO-OSI 7-Layer RM)
[Figure: SoC platform in which a CPU core, video decoder IP, video encoder IP, rate control IP, memory (MEM), R-FPGA, I/O (video camera, display), and other components attach through VCI and NNI blocks to an interconnection network, which also connects to external networks]
References
[1] Jer-Min Jou, Shiann-Rong Kuang, Yeu-Horng Shiau, and Ren-Der Chen, “Design of a Dynamic Pipelined Architecture for Fuzzy Color Correction,” to be published in IEEE Transactions on VLSI Systems, 2002.
[2] Jer-Min Jou, Yeu-Horng Shiau, Pei-Yin Chen, and Shiann-Rong Kuang, “A Low Cost Gray Prediction Search Chip for Motion Estimation,” Vol. 49, No. 7, pp. 928-938, July 2002.
[3] Shiann-Rong Kuang, Jer-Min Jou, Ren-Der Chen, and Yeu-Horng Shiau, “Dynamic Pipeline Design of an Adaptive Binary Arithmetic Coder,” IEEE Transactions on Circuits & Systems Part II, Vol. 48, No. 9, pp. 813-825, September 2001.
[4] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, “Design of Low-Error Fixed-Width Multipliers for DSP Applications,” IEEE Transactions on Circuits & Systems Part II, Vol. 46, No. 6, pp. 836-842, June 1999.
[5] Jer-Min Jou, Shiann-Rong Kuang, and Ren-Der Chen, “A New Efficient Fuzzy Algorithm for Color Correction,” IEEE Transactions on Circuits & Systems Part I, Vol. 46, No. 6, pp. 773-775, June 1999.
[6] Shiann-Rong Kuang, Jer-Min Jou, and Yuh-Lin Chen, “The Design of an Adaptive On-Line Binary Arithmetic Coding Chip,” IEEE Transactions on Circuits & Systems Part I, Vol. 45, No. 7, pp. 693-706, July 1998.
[7] Jer-Min Jou and Shiann-Rong Kuang, “Design of a Low-Error Fixed-Width Multiplier for DSP Applications,” Electronics Letters, Vol. 33, No. 19, pp. 1597-1598, 1997.
[8] Jer-Min Jou and Shiann-Rong Kuang, “A Library-Adaptively Integrated High Level Synthesis System,” Proceedings of NSC – Part A: Physical Science and Engineering, Vol. 19, No. 3, pp. 220-234, May 1995.