ECE 656M
Embedded Systems Design and Prototyping
Term 3, 2011-2012
Cesar A. Llorente
Research and teaching interests:
• reconfigurable computing
• machine vision
• energy systems
Contact: Electronics and Communications Engineering
College of Engineering
Contact: [email protected]
ECE 545
Lecture
• Homework 10 %
• Exams: Quiz 20 % (in class), Final 20 % (take home)
Projects
• Project 1 30 %
• Project 2 20 %
Lecture (1)
Lecture 1 – Introduction to Embedded Systems
Lecture 2 – Introduction to VHDL Combinational Logic. Packages and Components.
Hands-on Session 1: XST Synthesis and Simulation
Lecture 3 – Behavioral Modeling of Sequential Logic. Registers, Counters, Shift Registers. Simple Testbenches.
Lecture 4 – Introduction to FPGA Devices & Tools
Hands-on Session 2: Tools for FPGA Synthesis and Implementation
Lecture 5 – Finite State Machines
Lecture 6 – Algorithmic State Machines. Memories: RAM, ROM.
Lecture 7 – Advanced Testbenches. File I/O.
Lecture 8 – Mixed Style RTL Modeling
Quiz 1
Lecture (2)
Textbooks
Required Textbooks:
Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004
Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998
Supplementary Textbooks:
Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd Edition, McGraw-Hill, 2005
Peter J. Ashenden, The Designer's Guide to VHDL, 2nd Edition, San Francisco: Morgan Kaufmann, 1996, 2002
Quiz
2 hours 30 minutes
in class
design-oriented
open-books, open-notes
Tentative date:
Final Examination
take-home
full design, including logic synthesis and timing analysis
for FPGAs
Tentative date:
Project technologies
FPGA: Field Programmable Gate Arrays
World of Integrated Circuits
Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
– PLD: PAL, PLA, PML
– FPGA: LUT (Look-Up Table), MUX, Gates
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
Source: [Brown99]
What is an FPGA Chip?
• Field Programmable Gate Array
• A chip that can be configured by the user to implement different digital hardware
• Built from Configurable Logic Blocks and Programmable Switch Matrices
• A bitstream configures the function of each block and the interconnection between logic blocks
[Figure: FPGA chip diagram; the surviving labels are four I/O Blocks]
CLB Structure
[Figure: CLB slice containing two Look-Up Tables (inputs G4..G1 and F4..F1), two Carry & Control Logic blocks and two D flip-flops; labeled signals include COUT, CIN, CLK, CE, SR, BY, F5IN, YB, Y, XB, X]
CLB Slice
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: a LUT with inputs x1, x2, x3, x4 and output y, shown with two example truth tables. With rows ordered x1 x2 x3 x4 = 0000 to 1111, the y columns are 0100010101001100 (a function of all four inputs) and 1111111111110000 (a function of x1 and x2 only).]
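The look-up table idea can be sketched in software. The following is a minimal Python model, not vendor code: `make_lut` is a hypothetical helper name, and the two 16-bit strings are the y columns from the slide, read top to bottom with x1 as the most significant address bit (an assumption based on the row order shown).

```python
def make_lut(y_column):
    """Return a 4-input Boolean function backed by a 16-entry truth table.

    y_column is the output column of the truth table as a string of
    sixteen '0'/'1' characters, listed for x1 x2 x3 x4 = 0000 .. 1111.
    """
    if len(y_column) != 16 or set(y_column) - {"0", "1"}:
        raise ValueError("a 4-input LUT needs exactly 16 output bits")
    bits = [int(c) for c in y_column]

    def lut(x1, x2, x3, x4):
        # The four inputs form the address into the stored table;
        # x1 is taken as the most significant bit (assumed row order).
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return bits[address]

    return lut


# The two example configurations from the slide.
lut_a = make_lut("0100010101001100")
lut_b = make_lut("1111111111110000")
```

Changing only the stored 16 bits changes the implemented function, which is the sense in which one LUT can implement any function of 4 inputs. Under the assumed row order, `lut_b` ignores x3 and x4 and behaves as a NAND of x1 and x2.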
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
(Xilinx and Altera together share over 60% of the market)
• Atmel
• Lattice Semiconductor
Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.
Xilinx FPGA Families
• Old families
– XC3000, XC4000, XC5200
– Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• Low-cost families
– Spartan/XL – derived from XC4000
– Spartan-II – derived from Virtex
– Spartan-IIE – derived from Virtex-E
– Spartan-3
• High-performance families
– Virtex (0.22µm)
– Virtex-E, Virtex-EM (0.18µm)
– Virtex-II, Virtex-II PRO (0.13µm)
– Virtex-4 (0.09µm)
Design process (1)
Specification
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
VHDL description (Your VHDL Source Files)
  library IEEE;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_unsigned.all;

  entity RC5_core is
    port(
      clock, reset, encr_decr: in std_logic;
      data_input: in std_logic_vector(31 downto 0);
      data_output: out std_logic_vector(31 downto 0);
      out_full: in std_logic;
      key_input: in std_logic_vector(31 downto 0);
      key_read: out std_logic  -- no semicolon after the last port
    );
  end RC5_core;  -- name must match the entity declaration
Functional simulation
Synthesis
Post-synthesis simulation
Design process (2)
Implementation (Mapping, Placing & Routing)
Configuration
Timing simulation
On chip testing
Design Process control from Active-HDL
Simulation Tools
Many others…
architecture MLU_DATAFLOW of MLU is
  signal A1: STD_LOGIC;
  signal B1: STD_LOGIC;
  signal Y1: STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
  A1 <= A when (NEG_A='0') else
        not A;
  B1 <= B when (NEG_B='0') else
        not B;
  Y  <= Y1 when (NEG_Y='0') else
        not Y1;

  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
Logic Synthesis: VHDL description → Circuit netlist
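To check a synthesized netlist of the MLU against the intended behavior, expected outputs can be generated from a small software model. The sketch below is a Python reference model of the dataflow architecture above; the function name `mlu` and the use of integers 0/1 for STD_LOGIC values are assumptions made for illustration, and metavalues such as 'U' or 'X' are not modeled.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of the MLU_DATAFLOW architecture (inputs are 0 or 1)."""
    # Optional input inversion: A1 <= A when NEG_A='0' else not A.
    a1 = a ^ neg_a
    b1 = b ^ neg_b
    # The four candidate results MUX_0 .. MUX_3.
    mux = (a1 & b1, a1 | b1, a1 ^ b1, 1 - (a1 ^ b1))
    # with (L1 & L0) select: L1 is the high select bit, L0 the low one.
    y1 = mux[(l1 << 1) | l0]
    # Optional output inversion: Y <= Y1 when NEG_Y='0' else not Y1.
    return y1 ^ neg_y


def all_vectors():
    """Yield (inputs, expected_output) for all 128 input combinations."""
    for n in range(128):
        inputs = tuple((n >> k) & 1 for k in range(6, -1, -1))
        yield inputs, mlu(*inputs)
```

The vectors from `all_vectors()` could be written to a file and compared against simulation results in a testbench.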
Synthesis Tools
… and others
Features of synthesis tools
• Interpret RTL code
• Produce synthesized circuit netlist in a standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to EDIF netlist
Implementation
• After synthesis the entire implementation process is performed by FPGA vendor tools
Mapping
[Figure: netlist elements LUT0 to LUT5, FF1 and FF2 being grouped for mapping]
Placing
[Figure: CLB slices placed on the FPGA]
Routing
[Figure: programmable connections routed on the FPGA]
Top Level ASIC Digital Design Flow
Design Inception → RTL Design → Synthesis + Macro Development → Place + Route → Physical Verification → Design Complete
RTL Design (follows Design Inception; leads to Synthesis + Macro Development)
Design Function:
• RTL Design
• Testbench Development
• Mixed Mode Simulation
• FPGA Verification (user's discretion)
• Lint Checking (user's discretion)
• Code Coverage (user's discretion)
• Formal Verification
• System Interface Simulation
Digital Tool:
• Cadence NC Verilog / Mentor Graphics ModelSim
• Cadence AMS Designer
• Xilinx ISE
• Cadence Hal
• Cadence ICT
• Agilent ADS / Matlab
• Cadence Conformal
Synthesis + Macro Development (follows RTL Design; leads to Place + Route and Verification)
Design Function:
• Synthesis
• Static Timing Analysis
• Logical Equivalency
• DFT
• Gate-Level Simulation
• Macro Generation
• Macro Verification
• Macro Rules Generation / Library Generation
Digital Tool:
• Synopsys DC / Cadence RC
• Synopsys PrimeTime
• Cadence Conformal
• Synopsys DFT Compiler / Cadence RC
• Cadence NC Verilog / Mentor Graphics ModelSim
• Mentor Graphics Calibre
• Artisan / Cadence DFII
• Artisan
Place + Route (follows Synthesis; leads to Verification)
Design Function:
• Floorplan
• Macro Placement / Std Cell Placement
• Placement-Based Optimization
• Clock Tree Synthesis
• Route
• RC Extraction
• Signal Integrity
• Static Timing Analysis
• ATPG
• Metal Fill
• Spare Cells / Decoupling Cap Filler Cells
Digital Tool:
• Cadence Encounter
• Cadence NanoRoute
• Cadence Fire&Ice QX
• Cadence CeltIC / Voltage Storm
• Synopsys PrimeTime
• Mentor Graphics FastScan
Physical Verification (follows Placed + Routed Design; leads to Design Complete)
Design Function:
• GDSII Preparation / Schematic Preparation
• DRC
• LVS
• ERC
• Simulation Preparation
• Back Annotated Simulation
• Layout Chip Finishing
• Top-Level Simulation
Digital Tool:
• Cadence DFII
• Cadence NC Verilog
• Cadence Virtuoso
• Mentor Graphics Calibre
• Synopsys Nanosim
• Cadence AMS Designer
CAD software available at DLSU (1)
VHDL simulators
• Xilinx ISE 12.3 (under Windows)
• VCS (under Linux)
• available in the STRC111 Intel Microprocessors Lab
Free Student Edition: ISE WebPack
CAD software available at DLSU (2)
Tools used for logic synthesis
FPGA synthesis
• Xilinx XST / EDK / SDK (under Windows)
• available in the STRC111 Intel Microprocessors Lab
CAD software available at DLSU (3)
Tools used for implementation (mapping, placing & routing) in the FPGA technology
• Xilinx XST (under Windows)
• available in the STRC111 Intel Microprocessors Lab
Projects – Overview
Project 1 (35 points) January – February (~6 weeks)
Project 2 (35 points) March (~4 weeks)

Application: Game Application using MicroBlaze Processor
Technology: FPGA
Target: synthesizable code, downloadable code

Application: Game Software using state machines
Technology: FPGA
Target: synthesizable code, downloadable code
Projects 1, 2
• choice between two project topics
– cryptography (e.g., encryption, authentication, hash)
– digital signal processing (e.g., digital filter, FFT, image processing, etc.)
• both topics specified by the instructor
• initial specification in the form of
– pseudocode and/or flowchart
– detailed interface
• design and source code are required to be scalable, i.e., to work for different parameters and operand sizes, specified at the time of synthesis
Example: Last year’s project – RC6 cipher

Encryption
Input: (A, B, C, D), Table S[0..2r+3]
  B = B + S[0]
  D = D + S[1]
  for i = 1 to r do {
    t = (B * (2B + 1)) <<< log2 w
    u = (D * (2D + 1)) <<< log2 w
    A = ((A ⊕ t) <<< u) + S[2i]
    C = ((C ⊕ u) <<< t) + S[2i+1]
    (A, B, C, D) = (B, C, D, A)
  }
  A = A + S[2r+2]
  C = C + S[2r+3]
Output: (A, B, C, D)

Decryption
Input: (A, B, C, D), Table S[0..2r+3]
  C = C – S[2r+3]
  A = A – S[2r+2]
  for i = r downto 1 do {
    (A, B, C, D) = (D, A, B, C)
    u = (D * (2D + 1)) <<< log2 w
    t = (B * (2B + 1)) <<< log2 w
    C = ((C – S[2i+1]) >>> t) ⊕ u
    A = ((A – S[2i]) >>> u) ⊕ t
  }
  D = D – S[1]
  B = B – S[0]
Output: (A, B, C, D)

Here w is the word size in bits, + and – are taken modulo 2^w, ⊕ is bitwise XOR, and <<< and >>> are left and right rotations.
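A software reference model of the RC6 round function is useful for producing expected values for a hardware testbench. The Python sketch below follows the RC6 encryption and decryption pseudocode, assuming a word size of w = 32 bits. The round-key table used in the usage example is an arbitrary placeholder, not the output of the real RC6 key schedule, so the ciphertext it produces is not a standard test vector.

```python
W = 32                       # word size w in bits (assumed here)
MASK = (1 << W) - 1          # arithmetic is modulo 2**w
LGW = W.bit_length() - 1     # log2(w), valid because w is a power of two


def rotl(x, n):
    """Rotate the w-bit word x left by n (only n mod w matters)."""
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def rotr(x, n):
    """Rotate the w-bit word x right by n (only n mod w matters)."""
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK


def encrypt(block, S, r):
    A, B, C, D = block
    B = (B + S[0]) & MASK
    D = (D + S[1]) & MASK
    for i in range(1, r + 1):
        t = rotl((B * (2 * B + 1)) & MASK, LGW)
        u = rotl((D * (2 * D + 1)) & MASK, LGW)
        A = (rotl(A ^ t, u) + S[2 * i]) & MASK
        C = (rotl(C ^ u, t) + S[2 * i + 1]) & MASK
        A, B, C, D = B, C, D, A
    A = (A + S[2 * r + 2]) & MASK
    C = (C + S[2 * r + 3]) & MASK
    return A, B, C, D


def decrypt(block, S, r):
    A, B, C, D = block
    C = (C - S[2 * r + 3]) & MASK
    A = (A - S[2 * r + 2]) & MASK
    for i in range(r, 0, -1):
        A, B, C, D = D, A, B, C
        u = rotl((D * (2 * D + 1)) & MASK, LGW)
        t = rotl((B * (2 * B + 1)) & MASK, LGW)
        C = rotr((C - S[2 * i + 1]) & MASK, t) ^ u
        A = rotr((A - S[2 * i]) & MASK, u) ^ t
    D = (D - S[1]) & MASK
    B = (B - S[0]) & MASK
    return A, B, C, D


# Usage: round trip with a placeholder table of 2r + 4 words.
ROUNDS = 20
PLACEHOLDER_S = [(0x9E3779B9 * (k + 1)) & MASK for k in range(2 * ROUNDS + 4)]
PLAINTEXT = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
CIPHERTEXT = encrypt(PLAINTEXT, PLACEHOLDER_S, ROUNDS)
RECOVERED = decrypt(CIPHERTEXT, PLACEHOLDER_S, ROUNDS)
```

Because the model is parameterized by `W` and the number of rounds, it mirrors the scalability requirement placed on the hardware design.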
Required interface
[Figure: encryption/decryption unit with control & I/O interface, connected to a key memory unit. Labeled signals: clock, reset, enc_dec, data_in, data_available, data_read, data_out, write, full, ready, key_available, key_read, S_i, round number, round key(s). Bus widths are marked m and w.]
Projects 1, 2
Optimization Criteria
Maximum ratio: Throughput / Circuit Area
or
Minimum product: Latency · Circuit Area
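Primary timing parameters
Latency: time to process a single block of data
[Figure: a circuit taking input block Xi and producing output block Yi]
Throughput: number of bits processed in a unit of time
[Figure: a circuit taking input blocks Xi, Xi+1, Xi+2 and producing output blocks Yi, Yi+1, Yi+2]
Throughput = (Block_size · Number_of_blocks_processed_simultaneously) / Latency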
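The throughput formula and the two optimization criteria can be evaluated with a few lines of Python. This is only a sketch: the function names are hypothetical, and the two example designs (block size, latency, and area in slices) are made-up numbers chosen for illustration, not measured results.

```python
def throughput(block_size_bits, blocks_in_parallel, latency_s):
    """Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency."""
    return block_size_bits * blocks_in_parallel / latency_s


def throughput_per_area(block_size_bits, blocks_in_parallel, latency_s, area):
    """First criterion: to be maximized."""
    return throughput(block_size_bits, blocks_in_parallel, latency_s) / area


def latency_area_product(latency_s, area):
    """Second criterion: to be minimized."""
    return latency_s * area


# Hypothetical designs: (block size in bits, blocks in flight, latency in s, area in slices).
ITERATIVE = (128, 1, 1e-6, 1000)
PIPELINED = (128, 4, 1e-6, 3000)

ratio_iterative = throughput_per_area(*ITERATIVE)
ratio_pipelined = throughput_per_area(*PIPELINED)
product_iterative = latency_area_product(ITERATIVE[2], ITERATIVE[3])
product_pipelined = latency_area_product(PIPELINED[2], PIPELINED[3])
```

With these made-up numbers the pipelined design has the better throughput-to-area ratio while the iterative design has the smaller latency-area product, so the two criteria can rank the same pair of designs differently.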
Infinite Impulse Response (IIR) Filter
Equations (1)
Transfer function
Two investigated architectures
Architecture 1: Direct II Form
Architecture 2: Cascade of second-order systems
[Figure (b): cascade of second-order sections Fi(z)]
Example of coefficients: Butterworth filter
Order O=10, Passband Fp=0.3
Architecture 1: Direct II Form
Architecture 2: Cascade of second-order systems
a[1..10] =
b[1..10] =
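The two architectures named above can be compared against a software model before writing any VHDL. The Python sketch below implements a generic Direct Form II filter and a cascade of sections. It assumes the common convention H(z) = (b[0] + b[1] z^-1 + ...) / (1 + a[1] z^-1 + ...) with a[0] = 1; the slides do not show their coefficient convention, so the indexing and signs here are an assumption, and the first-order coefficients in the usage example are made up for illustration.

```python
def direct_form_ii(b, a, x):
    """Filter the sequence x using the Direct Form II structure.

    b and a are numerator and denominator coefficient lists with a[0] == 1.
    """
    if a[0] != 1:
        raise ValueError("this sketch assumes a normalized filter, a[0] == 1")
    order = max(len(a), len(b)) - 1
    b = list(b) + [0.0] * (order + 1 - len(b))
    a = list(a) + [0.0] * (order + 1 - len(a))
    delay = [0.0] * order            # delay[k-1] holds w[n-k]
    y = []
    for sample in x:
        # Recursive part: w[n] = x[n] - sum(a[k] * w[n-k]).
        w = sample - sum(a[k] * delay[k - 1] for k in range(1, order + 1))
        # Feed-forward part: y[n] = sum(b[k] * w[n-k]).
        y.append(b[0] * w + sum(b[k] * delay[k - 1] for k in range(1, order + 1)))
        # Shift the single shared delay line.
        delay = ([w] + delay[:-1]) if order else []
    return y


def cascade(sections, x):
    """Run x through a chain of (b, a) sections, each in Direct Form II."""
    for b, a in sections:
        x = direct_form_ii(b, a, x)
    return x


# Usage: two identical first-order sections versus the equivalent second-order filter.
IMPULSE = [1.0, 0.0, 0.0, 0.0]
SECTION = ([1.0], [1.0, -0.5])
Y_CASCADE = cascade([SECTION, SECTION], IMPULSE)
Y_DIRECT = direct_form_ii([1.0], [1.0, -1.0, 0.25], IMPULSE)
```

In exact arithmetic both structures give the same output; in fixed-point hardware they differ in rounding behavior and coefficient sensitivity, which is one reason to investigate both.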
Required interface
[Figure: IIR filter with control unit & I/O interface. Labeled signals: clock, reset, data_in (wi bits), a_i (wc bits), b_i (wc bits), ab_write, process, data_out (wo bits), ready, valid]
Project 2b from FALL 2005
to be modified in FALL 2006
Using high-level behavioral VHDL, describe an 8-bit microcontroller MC68HC11E1, working in the expanded mode, with the following simplifications:
1. Inputs and outputs of the microcontroller are reduced to E (clock), RESETn (reset active low), RW (read/write), AS (address strobe), ADDR15..8 (also denoted as PB7..0), ADDR7..0/DATA7..0 (multiplexed address & data, also denoted as PC7..0), PORTD and PORTE.
[Figure: Microcontroller]
2. Internal registers are reduced to the registers A, IX, SP, CC (Condition Codes NZVC), and PC.
3. The only parts of 68HC11E1 implemented in your model are:
a. CPU
b. RAM (512 B in the range $0000-$01FF)
c. parallel I/O (PORTD and PORTE)
4. Internally generated clock E has a frequency of 2 MHz.
5. Internal I/O registers are limited to
PORTD at the memory address $1008
DDRD at the memory address $1009
PORTE at the memory address $100A
6. Instruction set of the microcontroller is reduced to the following instructions
a. Data transfer instructions: LDAA, LDX, LDS, STAA, STX
b. Arithmetic instructions: CLRA, NEGA, ADDA, SUBA, ASRA, ASLA
c. Logic instructions: ANDA, ORAA, EORA
d. Data test instructions: CMPA, CPX, TSTA
e. Control instructions: BEQ, BGT, BHI, BSR, JSR, RTS, JMP
f. Stack instructions: PSHA, PULA, PSHX, PULX
7. Addressing modes of the microcontroller are reduced to the following modes
a. immediate
b. extended
c. indexed
d. inherent
e. relative
8. Main program is stored in the external RAM starting at the address $4000.
9. After reset, PC is set to the address $0000 (internal RAM of MC68HC11) where the instruction JMP $4000 is located.
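The memory map implied by simplifications 3 and 5 can be captured in a small lookup that a behavioral model (or a testbench checker) would consult on every bus access. The following Python sketch uses only the addresses stated above; the function name `decode` and the region labels are hypothetical, and every address that is neither internal RAM nor a listed I/O register is simply classified as an external bus access, since the exact external RAM range depends on the address decoder chosen.

```python
INTERNAL_RAM_FIRST = 0x0000
INTERNAL_RAM_LAST = 0x01FF       # 512 B of internal RAM

IO_REGISTERS = {
    0x1008: "PORTD",
    0x1009: "DDRD",
    0x100A: "PORTE",
}

RESET_PC = 0x0000                # holds JMP $4000 after reset
MAIN_PROGRAM_START = 0x4000      # main program in external RAM


def decode(address):
    """Classify a 16-bit address according to the simplified memory map."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if INTERNAL_RAM_FIRST <= address <= INTERNAL_RAM_LAST:
        return "internal RAM"
    if address in IO_REGISTERS:
        return IO_REGISTERS[address]
    return "external bus"
```

For example, the reset address resolves to internal RAM, while the main program start address resolves to an external bus access.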
Microcontroller system
The implemented microcontroller system should consist of:
1. Microcontroller MC68HC11E1
2. 8 kB RAM, such as 6164
3. 74HC373 8-bit latch
4. 74HC138 decoder chip
5. Auxiliary gates, if needed
Write Cycle
Features of the model
1. Your model should allow cycle accurate modeling of the circuit behavior.
2. Your model should contain debugging features equivalent to the debugging features of the DLX model, discussed in class and described in Ashenden, Chapter 15.
3. Generic parameters passed to the model should include
a. name of the file with the contents of the external RAM
b. clk-to-output delay
c. debugging mode
4. Your model should report all undefined opcodes, treat them as NOP, and proceed to the next RAM address.
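The required handling of undefined opcodes (report, treat as NOP, continue at the next address) can be prototyped in software before it is written in VHDL. The Python sketch below is a hypothetical tracer, not part of the assignment: the decode table holds only two entries for illustration (LDAA immediate, opcode $86, and CLRA, opcode $4F), and any byte not in the table counts as undefined for the purposes of the model.

```python
# Illustrative subset of a decode table: opcode -> (mnemonic, length in bytes).
OPCODES = {
    0x86: ("LDAA #imm", 2),
    0x4F: ("CLRA", 1),
}


def trace(memory, start, steps):
    """Walk through memory for a number of steps and log what was decoded.

    Returns a list of (address, text) pairs. Undefined opcodes are reported
    in the log, treated as a one-byte NOP, and execution proceeds to the
    next address.
    """
    pc = start
    log = []
    for _ in range(steps):
        opcode = memory[pc]
        if opcode in OPCODES:
            text, length = OPCODES[opcode]
        else:
            text, length = "UNDEFINED $%02X (treated as NOP)" % opcode, 1
        log.append((pc, text))
        pc += length
    return log


# Usage: LDAA #$12, an opcode missing from the table, then CLRA.
PROGRAM = bytes([0x86, 0x12, 0x02, 0x4F])
LOG = trace(PROGRAM, 0, 3)
```

A log of this shape is also a convenient reference for comparing against the trace_each_step output of the VHDL model.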
Testing and debugging
The behavior of your model should be carefully verified using a testbench instantiating your model with
a. the external RAM containing a valid program composed of a substantial subset of instructions implemented in the model
b. debugging mode set to the most detailed mode (trace_each_step)
Deliverables
1. All source code files.
2. Contents of the external RAM used for the model verification, in the hexadecimal notation, and expressed using the corresponding 68HC11 assembly language mnemonics.
3. The detailed log/report generated by your model for a given contents of RAM, and with the debugging mode set to trace_each_step.
All Projects - Organization
• Projects divided into phases
• Intermediate code submitted through WebCT at selected checkpoints and evaluated by the instructor and/or TA
• Penalty points for falling behind the schedule (below 50% of the work that was supposed to be done by a certain deadline)
• Feedback provided to students on a fair and best effort basis
• Final report and code submitted through WebCT and graded using a full scale
• Contest for the best results (bonus points awarded to the winners)
• Penalty and bonus points added to the final grade
Honor Code Rules
• All students are expected to write and debug their code individually
• Students are encouraged to help and support each other in all problems related to the
– operation of the CAD tools,
– basic understanding of the problem.