Post on 22-Dec-2015
M2: Team Paradigm
:: Milestone 2 2-D Discrete Cosine Transform
Group M2: Tommy Taylor, Brandon Hsiung, Changshi Xiao, Bongkwan Kim
Project Manager: Yaping Zhan
Project status
Design Proposal (Complete)
Architecture Proposal (Almost Complete)
: Algorithm description (Done)
: High level simulation (Done)
: Mapping algorithm into hardware (Done)
: Behavioral Verilog and test bench (Debugging)
Size estimates/floor plan (To be completed)
: Structural Verilog
: More accurate transistor count
: Floor plan
Design decisions
Do not include motion prediction
Go with 2-D DCT
Use SRAM
No pipelining
Will not run in real-time
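Distributed algorithm of 1D DCT:

$$A = \cos(\pi/4),\quad B = \cos(\pi/8),\quad C = \sin(\pi/8),\quad D = \cos(\pi/16),\quad E = \cos(3\pi/16),\quad F = \sin(3\pi/16),\quad G = \sin(\pi/16)$$

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}$$

$$\begin{bmatrix} X_1 \\ X_3 \\ X_5 \\ X_7 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} D & E & F & G \\ E & -G & -D & -F \\ F & -D & G & E \\ G & -F & E & -D \end{bmatrix} \begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}$$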
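The even/odd split above can be checked numerically. The following is a minimal Python sketch, not the team's C/C++ simulation: it assumes the reference definition X_k = (c_k/2) Σ x_n cos((2n+1)kπ/16) with c_0 = 1/√2, which is the normalization the two matrices imply. The function names `dct8_definition` and `dct8_even_odd` are hypothetical.

```python
import math

# Constants as defined on the slide.
A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]


def dct8_definition(x):
    """Reference 8-point DCT (assumed normalization, see lead-in)."""
    out = []
    for k in range(8):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * c * s)
    return out


def dct8_even_odd(x):
    """8-point DCT through the two 4x4 matrix products shown on the slide."""
    sums = [x[i] + x[7 - i] for i in range(4)]    # x0+x7, x1+x6, ...
    diffs = [x[i] - x[7 - i] for i in range(4)]   # x0-x7, x1-x6, ...
    even = [0.5 * sum(c * u for c, u in zip(row, sums)) for row in EVEN]
    odd = [0.5 * sum(c * v for c, v in zip(row, diffs)) for row in ODD]
    out = [0.0] * 8
    out[0::2] = even   # X0, X2, X4, X6
    out[1::2] = odd    # X1, X3, X5, X7
    return out
```

Calling both functions on the same eight samples should give matching outputs up to floating-point rounding, while the split version needs two 4x4 products instead of one 8x8 product.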
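Distributed algorithm of 1D DCT (continued):

In two's complement representation:

$$u_i = -b_i^{B-1} + \sum_{j=1}^{B-1} 2^{-j}\, b_i^{B-1-j}$$

where $b_i^{j}$ is the $j$th bit of $u_i$ and $b_i^{B-1}$ is the MSB, i.e. the sign bit. For the even outputs, $u_i = x_i + x_{7-i}$.

$$X_n = \sum_{j=1}^{B-1} 2^{-j} D_n(b^{B-1-j}) - D_n(b^{B-1}), \quad \text{where} \quad D_n(b^{j}) = \sum_{i=0}^{3} C_{i,n}\, b_i^{j}$$

With $B = 16$ and each input written out as its bit string (up to the common scale factor 1/2):

$$\begin{bmatrix} X_0 \\ X_2 \\ X_4 \\ X_6 \end{bmatrix} = \begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix} \begin{bmatrix} b_0^{15}\, b_0^{14} \ldots b_0^{0} \\ b_1^{15}\, b_1^{14} \ldots b_1^{0} \\ b_2^{15}\, b_2^{14} \ldots b_2^{0} \\ b_3^{15}\, b_3^{14} \ldots b_3^{0} \end{bmatrix}$$

For example, $D_0(b^{14}) = A b_0^{14} + A b_1^{14} + A b_2^{14} + A b_3^{14}$.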
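The bit-level expansion can be tried out in software. The sketch below is only an illustration under stated assumptions: inputs are 16-bit two's complement integers read as fractions u_i / 2^15, bit 15 is the sign bit, and the common factor 1/2 is left out as in the bit-level formula. The helper names `bit`, `d_n`, `da_even_outputs`, and `direct_even_outputs` are hypothetical.

```python
import math

B_BITS = 16  # word length B, assumed from the 16-bit data path

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]


def bit(value, pos):
    """Bit `pos` of the 16-bit two's complement encoding of `value`."""
    return ((value & 0xFFFF) >> pos) & 1


def d_n(row, u, pos):
    """D_n(b^pos): sum of the coefficients whose input has bit `pos` set."""
    return sum(c * bit(ui, pos) for c, ui in zip(row, u))


def da_even_outputs(u):
    """X0, X2, X4, X6 built bit by bit; u holds four 16-bit integers."""
    result = []
    for row in EVEN:
        acc = -d_n(row, u, B_BITS - 1)            # sign-bit term is subtracted
        for j in range(1, B_BITS):
            acc += 2.0 ** -j * d_n(row, u, B_BITS - 1 - j)
        result.append(acc)
    return result


def direct_even_outputs(u):
    """Same outputs by ordinary multiplication, for comparison."""
    return [sum(c * ui / 2 ** 15 for c, ui in zip(row, u)) for row in EVEN]
```

Because each D_n term only adds coefficients selected by single bits, no multiplier is needed, which is the property the hardware mapping relies on.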
1D DCT architecture
[Figure: block diagram of the 1D DCT module. Blocks: two 8x16 register files, selector, adder (+) and subtractor (-), two bit address generators, two ROMs, two adders with registers (R), parallel to serial, control logic. Signals: in_data(16), in_valid, out_data(16), out_valid, out_ready, out_done, clk, reset, vdd, vss.]
2D DCT:
Two 1D DCT modules can operate in a pipeline to boost throughput. This requires a RAM that can be read and written at the same time, with each 1D DCT module accessing the RAM alternately in row order and in column order.
[Figure: Data in, 1D DCT (on rows), Transpose RAM, 1D DCT (on columns), Data out, with control logic.]
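The row-column structure can be sketched in a few lines of Python. This is a software illustration only: the list transposition stands in for the transpose RAM, the 1D DCT uses an assumed normalization (c_0 = 1/√2, overall factor 1/2), and the names `dct8`, `transpose`, and `dct2d` are hypothetical. Whether the hardware transposes the result back before output is not stated in the slides; the sketch does so to keep the usual orientation.

```python
import math


def dct8(x):
    """8-point 1D DCT (assumed normalization, see lead-in)."""
    out = []
    for k in range(8):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8))
        out.append(0.5 * c * s)
    return out


def transpose(m):
    """Swap rows and columns; plays the role of the transpose RAM."""
    return [list(col) for col in zip(*m)]


def dct2d(block):
    """2D DCT of an 8x8 block as two passes of the 1D DCT."""
    rows_done = [dct8(row) for row in block]   # first pass: 1D DCT on rows
    by_column = transpose(rows_done)           # written by rows, read by columns
    cols_done = [dct8(col) for col in by_column]  # second pass: 1D DCT on columns
    return transpose(cols_done)                # back to row-major orientation
```

For a constant block of ones, this sketch yields a DC coefficient of 8 and (up to rounding) zeros elsewhere.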
Transistor count and performance estimation:

1DDCT module:
adder      register    ROM      Control logic   total   pins
4x16x30    18x16x20    8x16x2   1000            ~9k     40

2DDCT = 2 x 1DDCT + SRAM ~ 24k

throughput              latency
8 samples/64 cycles     528 cycles
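The totals can be reproduced with simple arithmetic. In the Python sketch below, the reading of each product as (number of units) x (bit width) x (transistors per bit) is an assumption, and the SRAM figure is only what the ~24k total implies, not a number given in the slides.

```python
# Per-block estimates copied from the table (factor meaning is assumed).
adder = 4 * 16 * 30        # 1920
register = 18 * 16 * 20    # 5760
rom = 8 * 16 * 2           # 256
control = 1000

total_1d = adder + register + rom + control   # 8936, i.e. ~9k

# 2DDCT = 2 x 1DDCT + SRAM ~ 24k, so the SRAM share implied by the estimate is:
implied_sram = 24_000 - 2 * total_1d           # 6128

# Throughput of 8 samples per 64 cycles, expressed per sample.
cycles_per_sample = 64 / 8                     # 8.0

print(total_1d, implied_sram, cycles_per_sample)
```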
High level simulation (in C/C++): three implementations of the 1D DCT:
1. Based on definition
2. Based on fast algorithm
3. Based on distributed algorithm
[Figure: the same input feeds Function 1, Function 2, and Function 3; their outputs are compared with Matlab to give pass/fail.]
Step 1:
We begin by loading eight 16-bit values into individual registers (R0 to R7).
We use a selector to choose the registers that will be added and subtracted.
The R0 and R7 values are added and subtracted in parallel, and likewise for R1 & R6, R2 & R5, and R3 & R4.
It will take 8 clock cycles to get all the data.
[Figure: registers R0 ... R7 feeding the selector, followed by the adder and the subtractor.]
Step 1 (Verilog)

// Write operation: store incoming samples in the register file.
always @(posedge clk or negedge rst) begin
  if (rst == 0) begin
    count <= 0;
  end else begin
    if (in_clr == 1) begin
      count <= 0;
    end else begin
      if (in_valid && ~out_full) begin
        data_buf[count] <= in_data;
        count <= count + 1;
      end
    end
  end
end // always @ (posedge clk or negedge rst)

// Read operation: fetch the pair (R[i], R[7-i]) for the adder and subtractor.
always @(posedge clk) begin
  if (in_read) begin
    out_data1 <= data_buf[in_addr];
    out_data2 <= data_buf[7 - in_addr];
  end
end
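A small behavioral model can help when checking this stage against the test bench. The Python sketch below mirrors the write and read operations under one assumption that the slides do not confirm: `out_full` is taken to mean that all eight entries have been written. The class `InputBuffer` and the function `sums_and_differences` are hypothetical names.

```python
class InputBuffer:
    """Software model of the 8-entry input register file."""

    DEPTH = 8

    def __init__(self):
        self.regs = [0] * self.DEPTH
        self.count = 0

    @property
    def full(self):
        # Assumed meaning of out_full: every entry has been written.
        return self.count == self.DEPTH

    def clear(self):
        """Models in_clr: restart filling from entry 0."""
        self.count = 0

    def write(self, in_data, in_valid=True):
        """Models the write operation: one sample per clock while not full."""
        if in_valid and not self.full:
            self.regs[self.count] = in_data
            self.count += 1

    def read_pair(self, addr):
        """Models the read operation: returns (R[addr], R[7 - addr])."""
        return self.regs[addr], self.regs[7 - addr]


def sums_and_differences(samples):
    """Load eight samples, then form the four sums and four differences."""
    buf = InputBuffer()
    for s in samples:
        buf.write(s)
    pairs = [buf.read_pair(a) for a in range(4)]
    return [a + b for a, b in pairs], [a - b for a, b in pairs]
```

For the samples 1 through 8 this gives sums of 9 for every pair and differences of -7, -5, -3, -1.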
Step 2
Bit Address Generator
Store the results from the addition and subtraction in eight 16-bit registers.
Taking the same bit from each of the four registers (the addition results, and likewise the subtraction results), the bit address generator combines the four bits into the address used to read the ROM.
[Figure: bit 1 of each of four registers forms the address 1011, which selects an entry in Rom0 ... Rom7.]
Step 2 (Verilog)

// Buffer write: store the values arriving from the previous stage.
always @(posedge clk or negedge rst) begin
  if (rst == 0) begin
    count <= 0;
  end else begin
    if (in_clr == 1) begin
      count <= 0;
    end else begin
      if (in_read & ~out_full) begin
        data_buf[count] <= in_data;
        count <= count + 1;
      end
    end
  end
end

// Bit address generator (read operation): bit in_bitpos of each of the
// four buffered words forms the 4-bit ROM address.
always @* begin
  out_addr[3] = data_buf[0][in_bitpos];
  out_addr[2] = data_buf[1][in_bitpos];
  out_addr[1] = data_buf[2][in_bitpos];
  out_addr[0] = data_buf[3][in_bitpos];
end
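The address formation can be modeled in Python as follows. The bit ordering follows the Verilog (word 0 supplies the most significant address bit). The ROM contents are not given in the slides, so `build_rom` is an assumption: it fills each entry with the sum of the coefficients whose address bit is set, which is the usual table for this kind of bit-serial scheme. Both function names are hypothetical.

```python
def rom_address(regs, bit_pos):
    """4-bit ROM address from bit `bit_pos` of four register values.

    regs[0] supplies address bit 3 and regs[3] supplies address bit 0.
    """
    addr = 0
    for value in regs[:4]:
        addr = (addr << 1) | ((value >> bit_pos) & 1)
    return addr


def build_rom(coeffs):
    """Assumed ROM contents: entry a is the sum of coeffs[i] over set bits of a."""
    return [
        sum(c for i, c in enumerate(coeffs) if (a >> (3 - i)) & 1)
        for a in range(16)
    ]
```

With register bits 1, 0, 1, 1 at the chosen position, `rom_address` returns 0b1011, matching the example address in the slide.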
Step 3
Parallel to Serial
The data read from the ROM at each address is added to a running sum stored in a register, and the result is then shifted (a multiplication by a factor of two in two's complement).
[Figure: Rom0 ... Rom7 outputs, registers R5 and R6, select lines S1 and S0, parallel to serial.]
Step 3 (Verilog)

// Accumulate one ROM word per clock, stepping from bit 15 downward.
always @(posedge clk or negedge rst) begin
  if (rst == 0) begin
    out_data <= 0;
    bit_pos <= 15;
  end else begin
    if (in_clr == 1) begin
      out_data <= 0;
      bit_pos <= 15;
    end else begin
      if (~out_done) begin
        out_data <= out_data + in_data;
        bit_pos <= bit_pos - 1;
      end
    end // else: !if(in_clr==1)
  end
end
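The add-and-shift behavior described for Step 3 can be sketched in Python. This is an assumption-laden model rather than a transcription of the Verilog: it processes bits from the MSB down (as `bit_pos` starting at 15 suggests), doubles the running sum before each addition, and subtracts the sign-bit term, neither of which appears explicitly in the Verilog shown. Integer coefficients are used so the result is exact, and `shift_accumulate` is a hypothetical name.

```python
def shift_accumulate(coeffs, regs, width=16):
    """Compute sum(coeffs[i] * regs[i]) bit-serially, MSB first.

    coeffs: four integer coefficients (assumed fixed-point ROM values).
    regs:   four integers in the `width`-bit two's complement range.
    """
    # Lookup table: entry a holds the sum of coefficients selected by a's bits.
    rom = [
        sum(c for i, c in enumerate(coeffs) if (a >> (3 - i)) & 1)
        for a in range(16)
    ]
    acc = 0
    for bit_pos in range(width - 1, -1, -1):
        addr = 0
        for value in regs:
            addr = (addr << 1) | ((value >> bit_pos) & 1)
        term = rom[addr]
        if bit_pos == width - 1:
            acc = -term               # sign bit carries negative weight
        else:
            acc = (acc << 1) + term   # shift (multiply by two), then add
    return acc
```

After sixteen steps the accumulator holds the same value as four ordinary multiply-adds, using only table lookups, additions, and shifts.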
C Code Result
::conclusion & questions
: Implementing 2D DCT
: Roughly 24k transistor count
: Verilog needs debugging