digital system design with vhdl - george mason...
ECE 545
Digital System Design with VHDL
Fall 2017
Kris Gaj
Office hours: Thursday, 3:15-4:15 PM, Tuesday, 6:00-7:00 PM, and by appointment
Research and teaching interests:
• reconfigurable computing
• hardware/software codesign
• computer arithmetic
• cryptography
Contact: The Engineering Building, room 3225, [email protected]
Course Web Page: Google “Kris Gaj”
ECE 545 Digital System Design with VHDL
ECE 545
Part of:
MS in Electrical Engineering
MS in Computer Engineering
Fundamental course for the specialization areas: Digital Systems Design, Digital Signal Processing
Elective
Elective course in the remaining specialization areas
One of five core courses (must be passed with B or better)
ECE 545
Part of:
PhD in Electrical and Computer Engineering
Knowledge tested at the Technical Qualifying Exam (TQE), Topic 2: Digital Design and Computer Organization
I am interested in…
I want to specialize primarily in…
VLSI
Digital Systems Design
ASICs & FPGAs
VHDL/Verilog
CAD Tools
Reconfigurable Computing
Microelectronics
VLSI Fabrication
Nanoelectronics
CAD tools & Design Automation
Hardware Description Languages
FPGAs & Reconfigurable computing
Computer Arithmetic
Front-end ASIC Design (algorithmic downto gate level)
Back-end ASIC Design (circuit and mask layout levels)
Analog & Digital Circuit Design
VLSI Fabrication
Microelectronics
Nanoelectronics
Semiconductor Devices
Recommended program & specialization:
MS CpE, Digital Systems Design
MS EE, Microelectronics/Nanoelectronics
[Figure: courses arranged by design level]
Design levels: algorithmic, register-transfer, gate, transistor, layout, devices
Courses shown:
• ECE 545 Digital System Design with VHDL
• ECE 586 Digital Integrated Circuits
• ECE 590 Internet of Things
• ECE 615 SW/HW Codesign
• ECE 645 Computer Arithmetic
• ECE 680 Physical VLSI Design
• ECE 681 VLSI Design for ASICs
• ECE 699 Low-Power & Secure VLSI
• ECE 584 Semiconductor Device Fundamentals
• ECE 684 MOS Device Electronics
CpE: Digital Systems Design
ECE 545 Digital System Design with VHDL
Pre-Approved Electives:
• ECE 586 Digital Integrated Circuits
• ECE 590 Internet of Things
• ECE 615 Software/Hardware Codesign
• ECE 645 Computer Arithmetic
• ECE 681 VLSI Design for ASICs
• ECE 699 Low-Power & Secure VLSI
• ECE 740 DSP Hardware Architectures
Suggested Electives:
• ECE 584, 684, … (technology)
• ECE 511, 611, … (microprocessors)
• ECE 646, 746, 747 … (applications)
Professors: K. Gaj, H. Homayoun, J.-P. Kaps, A. Sasan
CpE: Microprocessors and Embedded Systems
Pre-Approved Electives:
• ECE 510 Real-Time Concepts
• ECE 511 Microprocessors
• ECE 590 Mobile Systems & Apps
• ECE 611 Advanced Microprocessors
• ECE 612 Real-Time Embedded Systems
• ECE 615 Software/Hardware Codesign
• ECE 699 Heterogeneous & Green Computing
Suggested Electives:
• CS 540, 583 (languages, algorithms)
• CS 635 (parallel machines)
• ECE 542, 642, 742 (networks)
• ECE 645, 681 (digital design)
• ECE 548 (sequential mach. theory)
• ECE 699 (advanced mobile systems)
• ECE 590 (small spacecraft design)
Professors: X. Chen, H. Homayoun, J.-P. Kaps, P. Pachowicz, C. Sabzevari, A. Sasan
DIGITAL SYSTEMS DESIGN
1. ECE 545 Digital System Design with VHDL
   – K. Gaj, J.-P. Kaps, project, FPGA design with VHDL
2. ECE 586 Digital Integrated Circuits
   – D. Ioannou, homework
3. ECE 590 Internet of Things
   – A. Sasan, homework, literature review
4. ECE 615 Software/Hardware Codesign
   – K. Gaj, project, SoC design with VHDL and C
5. ECE 645 Computer Arithmetic
   – K. Gaj, project, FPGA design with VHDL or Verilog
6. ECE 681 VLSI Design for ASICs
   – H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools
7. ECE 740 Digital Signal Processing Hardware Architectures
   – A. Cohen, project, FPGA design with VHDL and Matlab/Simulink
8. ECE 699 Low-Power & Secure VLSI
   – A. Sasan, project TBD
MICROPROCESSOR AND EMBEDDED SYSTEMS
1. ECE 510 Real-Time Concepts
   – P. Pachowicz, project, design of real-time systems
2. ECE 511 Microprocessors
   – J.P. Kaps, K. Lilly, M. Garcia, project, system based on MSP430 microcontroller
3. ECE 590 Mobile Systems and Applications
   – X. Chen, project, mobile app development
4. ECE 611 Advanced Microprocessors
   – H. Homayoun, project, computer architecture simulation tools
5. ECE 612 Real-Time Embedded System
   – C. Sabzevari, project, programming distributed real-time systems
6. ECE 615 Software/Hardware Codesign
   – K. Gaj, homework, SoC design with VHDL and C
7. ECE 699 Heterogeneous Architectures and Green Computing
   – H. Homayoun, project, computer architecture simulation tools
CpE: Digital Signal Processing
Pre-Approved Electives:
• ECE 535 Digital Signal Processing
• ECE 537 Introduction to Digital Image Processing
• ECE 545 Digital System Design with VHDL
• ECE 645 Computer Arithmetic
• ECE 740 DSP Hardware Architectures
• ECE 768 Advanced Digital Signal Processing
Suggested Electives:
• ECE 681 (ASIC)
• ECE 511, 611 (microprocessors)
• ECE 528 (math background)
• ECE 635, 754 (advanced DSP)
• ECE 731, 735 (applications)
Professors: A. Cohen, K. Gaj, K. Hintz, J. Nelson, K. Wage
CpE: Network & System Security
• ECE 542 Computer Network Architectures and Protocols
• ECE 646 Cryptography and Computer Network Security
• ECE 746 Advanced Applied Cryptography
• ECE 747 Cryptographic Engineering
• ISA 656 Network Security
• ISA 562, 564, 674, 765, 767 (network security)
• ECE 642, 741, 742 (computer networks)
• ECE 545, 645 (hardware implementations)
• ECE 511, 611 (microprocessors)
K. Gaj, J. Kaps, K. Zeng
A few words about You
13 MS CpE
6 MS EE
5 PhD ECE
1 MS AEP
1 CERT RS&IP
MS AEP – Master’s in Applied and Engineering Physics
CERT RS&IP – Graduate Certificate in Remote Sensing & Image Processing
TA: Farnoud Farahmand
Ph.D. Student. Member of the Cryptographic Engineering Research Group (CERG) since Fall 2014.
MS Thesis in Nov. 2016.
Internship in the Crypto Group of Google in Summer 2017.
• help with the installation and configuration of CAD tools
• help with understanding of tutorials and the operation of tools
• help with VHDL and tool-oriented homework assignments
• limited help with debugging your project codes
Getting Help Outside of Office Hours
• System for asking questions 24/7
• Answers can be given by students and instructors
• Student answers endorsed (or corrected) by instructors
• Average response time in Fall 2015 = 2 hours
• You can submit your questions anonymously
• You can ask private questions visible only to
the instructors
Grading Scheme
• Homework - 15%
• Project - 35%
• Midterm Exam - 20%
• Final Exam - 30%
• Class Activity - Bonus 5%
Bonus Points for Class Activity
• Based on class exercises during lecture
• “Small” points earned each week posted on BlackBoard
• Up to 5 “big” bonus points
• Scaled based on the performance of the best student
For example:
Rank | Student | Small points | Big points
1. | Alice | 40 | 5
2. | Bob | 36 | 4.5
… | … | … | …
27. | Charlie | 8 | 1
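The three rows in this example are consistent with a linear scaling in which the best student's small-point total maps to the full 5 big points. The formula below is an inference from those rows, not a rule stated on the slide:

```latex
\[
\text{big}_i \;=\; 5 \cdot \frac{\text{small}_i}{\max_j \text{small}_j},
\qquad
\text{Alice: } 5 \cdot \tfrac{40}{40} = 5, \quad
\text{Bob: } 5 \cdot \tfrac{36}{40} = 4.5, \quad
\text{Charlie: } 5 \cdot \tfrac{8}{40} = 1 .
\]
```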
Homework (1)
• Several assignments, including
  – Installation & configuration of tools
  – Getting familiar with the basic and advanced tool features
  – Solving exercises devoted to the design and analysis of simple digital circuits
  – Drawing block diagrams and Algorithmic State Machine charts
  – Writing effective Testbenches
  – Writing & debugging RTL synthesizable VHDL code
Homework (2)
• All homework assignments should be done individually
• Using any code not developed by a given student, or using any external help needs to be acknowledged
• Students are encouraged to help and support each other in all problems related to the
  - installation & operation of the CAD tools
  - understanding of homework tasks
Midterm exam 1
✓ 2 hours 40 minutes
✓ in class
✓ design-oriented
✓ open-books, cheat sheet
✓ practice exams available on the web
Tentative date: Last week of October
Final exam
✓ 2 hours 45 minutes
✓ in class
✓ design-oriented
✓ open-books, cheat sheet
✓ practice exams available on the web
Date: Thursday, December 14, 4:30-7:15pm
Textbooks
Required Textbook
Pong P. Chu, RTL Hardware Design Using VHDL, Wiley-Interscience, 2006.
Supplementary Textbook – Basics Refresher
Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, McGraw-Hill, 3rd Edition, 2008.
Supplementary Textbook – Advanced
Ricardo Jasinski, Effective Coding with VHDL: Principles and Best Practice, The MIT Press; 1st Edition, 2016.
Supplementary Textbook – Advanced
Hubert Kaeslin, Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication, Cambridge University Press; 1st Edition, 2008.
Technology
What is an FPGA?
[Figure: FPGA floorplan. Labels: Xilinx: Configurable Logic Blocks (CLBs) / Altera: Adaptive Logic Modules (ALMs); I/O Blocks; Xilinx: Block RAMs / Altera: Memory Blocks]
Modern FPGA
[Figure: columns of RAM blocks, Multipliers/DSP units, and Logic resources (CLBs or ALMs). Resource triple shown: (#Logic resources, #Multipliers/DSP units, #RAM_blocks)]
Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
General structure of an FPGA
[Figure: programmable logic blocks connected by programmable interconnect. Source: The Design Warrior’s Guide to FPGAs, as above]
Xilinx Configurable Logic Block (CLB)
ECE 448 – FPGA and ASIC Design with VHDL
Basic Components of the Slice
LUTs
Storage Elements
4-input LUT (Look-Up Table) (used in earlier families of FPGAs)
• Look-Up tables are primary elements for logic implementation
• Each 4-input LUT can implement any function of 4 inputs
[Figure: a LUT block with inputs x1, x2, x3, x4 and output y, shown with two example configurations; the second is drawn as a gate on x1 and x2 only]
x1 x2 x3 x4 | y (example 1) | y (example 2)
0  0  0  0  | 0 | 1
0  0  0  1  | 1 | 1
0  0  1  0  | 0 | 1
0  0  1  1  | 0 | 1
0  1  0  0  | 0 | 1
0  1  0  1  | 1 | 1
0  1  1  0  | 0 | 1
0  1  1  1  | 1 | 1
1  0  0  0  | 0 | 1
1  0  0  1  | 1 | 1
1  0  1  0  | 0 | 1
1  0  1  1  | 0 | 1
1  1  0  0  | 1 | 0
1  1  0  1  | 1 | 0
1  1  1  0  | 0 | 0
1  1  1  1  | 0 | 0
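A minimal behavioral sketch of the 4-input LUT idea: the 16-bit truth table is stored as a constant and the four inputs simply index into it. The entity name `lut4`, the generic `INIT`, and the bit ordering (x1 as the most significant address bit) are assumptions made for this illustration; vendor LUT primitives use their own names and conventions. The default `INIT` value is the first y column from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 4-input look-up table.
entity lut4 is
    generic (
        -- INIT(i) is the output for input combination i, where
        -- i = x1 x2 x3 x4 read as a 4-bit unsigned number (x1 = MSB).
        INIT : std_logic_vector(0 to 15) := "0100010101001100"
    );
    port (
        x1, x2, x3, x4 : in  std_logic;
        y              : out std_logic
    );
end entity lut4;

architecture behavioral of lut4 is
    signal addr : std_logic_vector(3 downto 0);
begin
    -- Collect the four inputs into one address
    addr <= x1 & x2 & x3 & x4;

    -- The "logic function" is just a table read
    y <= INIT(to_integer(unsigned(addr)));
end architecture behavioral;
```

Changing only the generic changes the function: with `INIT => "1111111111110000"` (the second y column) the same hardware behaves as a NAND of x1 and x2, which is why a LUT can implement any function of its inputs.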
6-Input LUT of Xilinx FPGAs
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in semiconductor foundry
FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
FPGAs vs. ASICs
ASICs:
• High performance
• Low power
• Low cost (but only in high volumes)
FPGAs:
• Off-the-shelf
• Low development costs
• Short time to the market
• Reconfigurability
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc. (~51% of the market)
• Altera Corp. (subsidiary of Intel since 2015) (~34% of the market)
  Xilinx and Altera together: ~85%
• Lattice Semiconductor
• Atmel
• Achronix
• Tabula (went out of business in 2015)
Flash & antifuse FPGAs
• Microsemi SoC Products Group (formerly Actel Corp.)
• Quick Logic Corp.
Xilinx FPGA Families
Technology | Low-cost | Mid-range | High-performance
220 nm | - | - | Virtex
180 nm | Spartan-II, Spartan-IIE | - | -
120/150 nm | - | - | Virtex-II, Virtex-II Pro
90 nm | Spartan-3 | - | Virtex-4
65 nm | - | - | Virtex-5
45 nm | Spartan-6 | - | -
40 nm | - | - | Virtex-6
28 nm | Artix-7 | Kintex-7 | Virtex-7
20 nm | - | Kintex UltraScale | Virtex UltraScale
16 nm | - | Kintex UltraScale+ | Virtex UltraScale+
Altera/Intel FPGA Families
Technology | Low-cost | Mid-range | High-performance
130 nm | Cyclone | - | Stratix
90 nm | Cyclone II | - | Stratix II
65 nm | Cyclone III | Arria I | Stratix III
40 nm | Cyclone IV | Arria II | Stratix IV
28 nm | Cyclone V | Arria V | Stratix V
20 nm / 14 nm tri-gate | Cyclone 10 | Arria 10 | Stratix 10
Altera/Intel FPGA Families
FPGA Family
Artix-7 FPGA Family
FPGA Design Process
FPGA Design process (1)
Specification / Pseudocode:
    Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files):
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
        port(
            clock, reset, encr_decr: in std_logic;
            data_input: in std_logic_vector(31 downto 0);
            data_output: out std_logic_vector(31 downto 0);
            out_full: in std_logic;
            key_input: in std_logic_vector(31 downto 0);
            key_read: out std_logic  -- no semicolon after the last port
        );
    end RC5_core;  -- name must match the entity name
Functional simulation
Synthesis
Post-synthesis simulation
FPGA Design process (2)
Implementation
Configuration
Timing simulation
On chip testing
Results
Levels of design description
Algorithmic level
Register Transfer Level
Logic (gate) level
Circuit (transistor) level
Physical (layout) level
Level of description most suitable for synthesis
Levels supported by HDL
Register Transfer Level (RTL) Design Description
Combinational Logic
Combinational Logic
Registers
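A minimal sketch of the register-transfer style named above: combinational logic computes the next value, and a register stores it on the clock edge. The entity name `accumulator`, its ports, and the default width are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: an accumulator written at the register-transfer level.
entity accumulator is
    generic (WIDTH : positive := 8);
    port (
        clk   : in  std_logic;
        reset : in  std_logic;  -- synchronous, active high
        en    : in  std_logic;  -- accumulate when '1'
        din   : in  std_logic_vector(WIDTH-1 downto 0);
        sum   : out std_logic_vector(WIDTH-1 downto 0)
    );
end entity accumulator;

architecture rtl of accumulator is
    signal acc_reg, acc_next : unsigned(WIDTH-1 downto 0);
begin
    -- Combinational logic: next value computed from the input and the register output
    acc_next <= acc_reg + unsigned(din);

    -- Register: updated only on the rising clock edge
    reg : process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                acc_reg <= (others => '0');
            elsif en = '1' then
                acc_reg <= acc_next;
            end if;
        end if;
    end process reg;

    -- Combinational output logic (here only a type conversion)
    sum <= std_logic_vector(acc_reg);
end architecture rtl;
```

Keeping the next-value logic and the clocked process separate mirrors the block diagram: every signal is either the output of a register or a pure function of register outputs and inputs.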
George Mason University
Synthesis
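architecture MLU_DATAFLOW of MLU is

    signal A1: STD_LOGIC;
    signal B1: STD_LOGIC;
    signal Y1: STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    -- Selector collected into one vector so that the type of the
    -- "with ... select" expression is unambiguous
    signal L: STD_LOGIC_VECTOR(1 downto 0);

begin
    -- Optional negation of the inputs and of the output
    A1 <= A when (NEG_A='0') else
          not A;
    B1 <= B when (NEG_B='0') else
          not B;
    Y  <= Y1 when (NEG_Y='0') else
          not Y1;

    -- Candidate logic functions
    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;

    -- 4-to-1 multiplexer selecting the function
    L <= L1 & L0;
    with L select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;

end MLU_DATAFLOW;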
VHDL description Circuit netlist
Logic Synthesis
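The slide shows only the architecture. To simulate it, an entity declaration and a testbench are needed; the sketch below supplies both. The port list is inferred from the signals the architecture uses and is an assumption, as is the testbench name `MLU_tb`. Compile order would be: entity `MLU`, then the `MLU_DATAFLOW` architecture from the slide, then the testbench.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed entity for the MLU_DATAFLOW architecture (ports inferred from its body)
entity MLU is
    port (
        NEG_A, NEG_B, NEG_Y : in  std_logic;
        A, B                : in  std_logic;
        L1, L0              : in  std_logic;
        Y                   : out std_logic
    );
end entity MLU;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench
entity MLU_tb is
end entity MLU_tb;

architecture sim of MLU_tb is
    signal NEG_A, NEG_B, NEG_Y, A, B, L1, L0 : std_logic := '0';
    signal Y : std_logic;
begin
    uut : entity work.MLU(MLU_DATAFLOW)
        port map (NEG_A => NEG_A, NEG_B => NEG_B, NEG_Y => NEG_Y,
                  A => A, B => B, L1 => L1, L0 => L0, Y => Y);

    stimulus : process
    begin
        -- L1 L0 = "00" selects AND; nothing negated
        A <= '1'; B <= '1'; L1 <= '0'; L0 <= '0';
        wait for 10 ns;
        assert Y = '1' report "AND of 1 and 1 should be 1" severity error;

        -- L1 L0 = "10" selects XOR
        L1 <= '1'; L0 <= '0';
        wait for 10 ns;
        assert Y = '0' report "XOR of 1 and 1 should be 0" severity error;

        -- Same XOR, with the output negated
        NEG_Y <= '1';
        wait for 10 ns;
        assert Y = '1' report "negated XOR of 1 and 1 should be 1" severity error;

        report "MLU testbench finished";
        wait;  -- suspend forever so the simulation can end
    end process stimulus;
end architecture sim;
```

Only three of the possible input combinations are checked here; a fuller testbench would loop over all 128.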
Circuit netlist (RTL view)
Implementation
Mapping
[Figure: logic mapped onto device primitives LUT0, LUT1, LUT2 and flip-flops FF1, FF2]
Placing
[Figure: the mapped primitives placed into CLB slices of the FPGA]
Routing
[Figure: placed slices connected through the FPGA's programmable connections]
Two main stages of the FPGA Design Flow
Synthesis
  RTL Synthesis (technology independent)
  - Code analysis
  - Derivation of main logic constructions
  - Technology independent optimization
  - Creation of “RTL View”
  Map (technology dependent)
  - Mapping of extracted logic structures to device primitives
  - Technology dependent optimization
  - Application of “synthesis constraints”
  - Netlist generation
  - Creation of “Technology View”
Implementation
  Place & Route
  - Placement of generated netlist onto the device
  - Choosing best interconnect structure for the placed design
  - Application of “physical constraints”
  Configure
  - Bitstream generation
  - Burning device
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
• This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
FPGA Tools
Xilinx ISE (Integrated Software Environment)
• Older development environment
• Frozen at version 14.7 in 2012 (although new releases of v14.7 still posted periodically)
• To be replaced by Xilinx Vivado for the new FPGA families, starting at Series-7 FPGAs (Artix-7, Kintex-7, Virtex-7)
• Used in all previous offerings of ECE 545 and other GMU FPGA-oriented courses (except ECE 699 Software/Hardware Codesign)
Support for Xilinx Families
90 nm: Spartan-3, Virtex-4
65 nm: Virtex-5
45 nm: Spartan-6
40 nm: Virtex-6
28 nm: Artix-7, Kintex-7, Virtex-7
20 nm: Kintex UltraScale, Virtex UltraScale
16 nm: Kintex UltraScale+, Virtex UltraScale+
Future Families
[Figure: brackets mark which of these families are supported by ISE and which by Vivado]
Vivado Design Suite
• 4 years of development and 1 year of beta testing
• first version released in Summer 2012
• scalable data model, supporting designs with up to 100 million ASIC gate equivalents (GEs)
• based on industry standards, such as
  • AMBA AXI4 interconnect
  • IP-XACT IP packaging metadata
  • Tool Command Language (Tcl)
  • Synopsys Design Constraints (SDC)
Productivity Gains
• Synthesis 3x faster than Xilinx XST (part of ISE)
• Substantial improvement in runtime and maximum design size compared to Xilinx ISE
• Vivado Simulator 3x faster than ISim
• Much better visibility into key design metrics, such as timing, power, resource utilization, and routing congestion, much earlier during the design process
• Estimates become progressively more accurate
Vivado vs. ISE vs. Competing Tools
Source: Xcell, no. 79, 2012
Design Entry Methods
• VHDL, Verilog
• System Verilog
• C, C++
• System C
• Matlab
• Simulink
Multidimensional Analytical Placer
ISE:
• One-dimensional, timing-driven place-and-route algorithms
• Simulated annealing algorithms that determine randomly where the tool should place logic cells
• Does adequate job for FPGAs below 1 million GEs
Vivado:
• Modern multidimensional analytic placement algorithm
• Deterministically finds a solution that primarily minimizes: timing, congestion, and wire length
• Better results, fewer iterations
• Efficient up to 100 million GEs
Vivado’s Multidimensional Optimization
Source: Xcell, no. 79, 2012
Hierarchical Chip Planning & Advantages of Standards
• ability to partition the design for processing by synthesis, implementation and verification
• divide-and-conquer team approach to big projects
• design preservation feature enabling repeatable timing results
• access to state of the art third-party EDA tools for tasks such as
  • constraint generation
  • formal verification
  • static timing analysis
Power Optimization and Analysis
• capable of analyzing design logic and removing unnecessary switching activity
• advanced clock gating techniques
• up to 30% reduction in dynamic power
• power estimates at every stage of the design flow
Flow Automation, Not Flow Dictation
• GUI-based push-button flow
• GUI-based step-by-step analysis at each design stage
• Command line
• Batch
High-Level Synthesis
• extensive evaluation of commercial tools for Electronic System Level (ESL) design (including study by research firm BDTI)
• 2010 acquisition of AutoESL Design Technologies, Inc. (25 employees) with flagship product AutoPilot
• AutoPilot further developed and fully incorporated into Vivado Design Suite as Vivado HLS
• Design and verification orders of magnitude faster than at the RTL level
• Results dependent on the application domain
Vivado HLS
[Figure: High-Level Synthesis. Vivado HLS translates a High Level Language (C, C++, System C) into a Hardware Description Language (VHDL or Verilog)]
HLS-Based Development and Benchmarking Flow
[Figure labels: Reference Implementation in C; Manual Modifications (pragmas, tweaks); HLS-ready C code; Test Vectors; High-Level Synthesis (Vivado HLS); HDL Code; Functional Verification; Physical Implementation (FPGA Tools); Netlist; Timing Verification; Post Place & Route Results]
Additional Simulation Tool
ModelSim-Intel FPGA Starter Edition
ModelSim:
• Industry standard for simulation
• Significantly faster than Vivado Simulator
• Windows, Linux OS
• Mixed-language support: VHDL, Verilog, System Verilog
• Recommended for advanced users and more complex designs
• To be used primarily as a standalone tool for functional simulation (configuration for the timing simulation more time-consuming)
Features of the Starter Edition:
• Free, no license required
• 10,000 executable line limit
Lab Access Rules and Behavior Code
Please refer to
ECE Labs website
and in particular to
Access rules & behavior code
Project
Project
✓ Cryptography Projects - proposed by the Instructor
✓ Projects in your domain of expertise, e.g., DSP, Applied and Engineering Physics, Remote Sensing & Image Processing, Big Data, Bioengineering, etc.
  ✓ you will be responsible for defining and specifying a topic & scope of these projects by yourselves
  ✓ an additional advisor, such as your MS/PhD Thesis advisor or manager at work, highly recommended
Cryptography Project
✓ related to the research project conducted by Cryptographic Engineering Research Group (CERG) at GMU
✓ supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for new cryptographic standards
CERG @ GMU: http://cryptography.gmu.edu
10 PhD students, 3 MS students
Cryptographic Standard Contests
[Figure: timeline, 1997-2017]
Contest | Start | End | Candidates → Outcome
AES | IX.1997 | X.2000 | 15 block ciphers → 1 winner
NESSIE | I.2000 | XII.2002 | -
CRYPTREC | - | - | -
eSTREAM | XI.2004 | IV.2008 | 34 stream ciphers → 4 HW winners + 4 SW winners
SHA-3 | X.2007 | X.2012 | 51 hash functions → 1 winner
CAESAR | I.2013 | TBD | 57 authenticated ciphers → multiple winners
Why a Contest for a Cryptographic Standard?
• Avoid back-door theories
• Speed-up the acceptance of the standard
• Stimulate non-classified research on methods of designing a specific cryptographic transformation
• Focus the effort of a relatively small cryptographic community
Features Required from Today’s Ciphers
FUNCTIONALITY
• easy key distribution
• digital signatures
STRENGTH
PERFORMANCE
• software
• hardware
Secret-key (Symmetric) Ciphers
[Figure: Alice encrypts and Bob decrypts over the network, both using the same key of Alice and Bob, KAB]
Most Popular Standards: AES, Triple DES
Features of Secret-Key Ciphers
FUNCTIONALITY
• easy key distribution
• digital signatures
STRENGTH
PERFORMANCE
• software
• hardware
Best attack: Exhaustive-key search, 2^k trials for a k-bit key
Primary Application: Bulk data encryption
Public-key (Asymmetric) Ciphers
[Figure: Alice encrypts with the public key of Bob, KB; Bob decrypts with the private key of Bob, kB; the ciphertext travels over the network]
Most Popular Standards: RSA, Elliptic Curve Cryptography (ECC)
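Digital Signature Schemes
[Figure: Alice computes the hash value of the message with a hash function and applies the public key cipher with Alice’s private key to obtain the signature; the message and signature are sent to Bob. Bob hashes the received message (hash value 1), applies the public key cipher with Alice’s public key to the signature (hash value 2), and compares the two values: yes / no]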
Features of Public-Key Ciphers
FUNCTIONALITY
• easy key distribution
• digital signatures
STRENGTH
PERFORMANCE
• software
• hardware
Best attack: Solving the underlying math problem, such as factoring of large integers: Given N = P·Q, find P and Q.
Primary Applications: Exchange of keys for secret-key ciphers, Digital signatures
Five security levels & corresponding key sizes allowed by American government (NIST SP 800-56)
Level | Symmetric ciphers | ECC | RSA
I | 80 | 160 | 1024
II | 112 | 224 | 2048
III | 128 | 256 | 3072
IV | 192 | 384 | 8192
V | 256 | 512 | 15360
Evaluation Criteria in Cryptographic Contests
Security
Software Efficiency Hardware Efficiency
Simplicity
FPGAs ASICs
Flexibility Licensing
µProcessors µControllers
Hardware Benchmarking in Cryptographic Contests
• Focus on ranking, rather than absolute values
• Only relatively large differences (>20-30%) matter
• Winner in use for the next 20-30 years, implemented using technologies not in existence today
• Very wide range of possible applications, and as a result performance and cost targets
• Large number of candidates
• Limited time for evaluation
• Results are final
AES Contest 1997-2000, Final Round
Hardware results matter!
[Figures: Speed in FPGAs; Votes at the AES 3 conference]
NIST SHA-3 Contest - Timeline
51 candidates (Oct. 2008) → Round 1 → 14 (July 2009) → Round 2 → 5 (Dec. 2010) → Round 3 → 1 (Oct. 2012)
Throughput vs. Area Normalized to Results for SHA-256 and Averaged over 11 FPGA Families – 256-bit variants
[Figure: Overall Normalized Throughput vs. Overall Normalized Area; one candidate marked “Early Leader”]
SHA-3 finalists in high-performance FPGA families
[Figure]
GMU/ETH Zurich ASIC
• standard-cell CMOS 65nm UMC ASIC process
• 256-bit variants of algorithms
• Taped-out in Oct. 2011, successfully tested in Feb. 2012
Correlation Between ASIC Results and FPGA Results
[Figures: ASIC vs. Stratix III FPGA]
CAESAR Competition
Goal: Portfolio of new-generation authenticated ciphers
First-round submissions: March 15, 2014
Announcement of final portfolio: 2018
Organizer: An informal committee of leading cryptographic experts
Number of candidate families: Round 1: 57, Round 2: 29, Round 3: 15
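Authenticated Ciphers
[Figure: Alice feeds the Message, AD, IV, and key KAB into the Authenticated Cipher, producing the Ciphertext and Tag, which are sent with the IV and AD to Bob. Bob's Authenticated Cipher uses KAB to recover the Message and to report whether the Tag is valid]
KAB – Secret key of Alice and Bob, IV – Initialization Vector, AD – Associated Data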
Submitters of Hardware Implementations
1. CCRG NTU (Nanyang Technological University) Singapore – ACORN, AEGIS, JAMBU, & MORUS
2. CLOC-SILC Team, Japan – CLOC & SILC
3. Ketje-Keyak Team – Ketje & Keyak
4. Lab Hubert Curien, St. Etienne, France – ELmD & TriviA-ck
5. Axel Y. Poschmann and Marc Stöttinger – Deoxys & Joltik
6. NEC Japan – AES-OTR
7. IAIK TU Graz, Austria – Ascon
8. DS Radboud University Nijmegen, Netherlands – HS1-SIV
9. IIS ETH Zurich, Switzerland – NORX
10. Pi-Cipher Team – Pi-Cipher
11. EmSec RUB, Germany – POET
12. CG UCL, INRIA – SCREAM
13. Shanghai Jiao Tong University, China – SHELL
Total: 19 Candidate Families
Submitters - GMU Benchmarking Team
• “Ice” Homsirikamol: AES-GCM, AEZ, Ascon, Deoxys, HS1-SIV, ICEPOLE, Joltik, NORX, OCB, PAEQ, Pi-Cipher, STRIBOB
• Will Diehl: Minalpher, OMD, POET, SCREAM
• Ahmed Ferozpuri: PRIMATEs-GIBBON & HANUMAN, PAEQ
• Farnoud Farahmand: AES-COPA, CLOC
• Mike X. Lyons: TriviA-ck
Total: 19 Candidate Families + AES-GCM
Relative Throughput/Area in Virtex 6 vs. AES-GCM
Throughput/Area of AES-GCM = 1.020 (Mbit/s)/LUTs
E – Throughput/Area for Encryption
D – Throughput/Area for Decryption
A – Throughput/Area for Authentication Only
Default: Throughput/Area the same for all 3 operations
Relative Throughput in Virtex 6
Ratio of a given Cipher Throughput/Throughput of AES-GCM
Throughput of AES-GCM = 3239 Mbit/s
E – Throughput for Encryption
D – Throughput for Decryption
A – Throughput for Authentication Only
Default: Throughput the same for all 3 operations
ATHENa Database of Results
• Available at http://cryptography.gmu.edu/athena
• Developed by John Pham, a Master’s-level student of Jens-Peter Kaps, as a part of the SHA-3 Hardware Benchmarking project, 2010-2012 (sponsored by NIST)
• In June 2015 extended to support Authenticated Ciphers
• In July 2017 extended to support the CAESAR Use Cases and ranking of candidate variants
Two Views
• Rankings View: https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/rankings_view
  • Easier to use
  • Provides Rankings
• Table View: https://cryptography.gmu.edu/athenadb/fpga_auth_cipher/table_view
  • More comprehensive
  • Allows close investigation of all designs & comparative analysis
  • Geared toward more advanced users
  • On-line help
Participation of the ECE 545 students
• Fall 2009: SHA-3 Contest, Round 2
• Fall 2010: SHA-3 Contest, Round 2
• Fall 2011: SHA-3 Contest, Round 3
• Fall 2012: Pilot study on Authenticated Ciphers
• Fall 2013: Pilot studies on Authenticated Ciphers, Block Ciphers, and Stream Ciphers
• Fall 2014: CAESAR Contest, Round 1
• Fall 2015: CAESAR Contest, Round 2
• Fall 2016: CAESAR Contest, Round 3
• Fall 2017: Post-Quantum Cryptography Pilot Study
Threat of Quantum Computers
• First perceived by physicists (R. Feynman, D. Deutsch) in 1980s
• First significant quantum algorithms (capable of running on quantum computers only) developed in 1990s
• First practical realization in 1998 (2 qubits)
• Significant technological breakthroughs during the last 20 years
• Quantum Artificial Intelligence lab started by Google in 2013
• IBM quantum processor (16-17 qubits) in 2017
Photo: Vandersypen, PQCrypto 2017
Major advances during the last 20 years
[Figure. Source: Vandersypen, PQCrypto 2017]
Timeline of Quantum Computing: https://en.wikipedia.org/wiki/Timeline_of_quantum_computing
Effect on Secret-Key Algorithms
1996: Grover’s Algorithm reduces the time of the exhaustive-key search for secret key ciphers
from 2^k to 2^(k/2) operations, for a k-bit key, e.g., from 2^128 to 2^64 operations, for a 128-bit key, or
from 2^256 to 2^128 operations, for a 256-bit key
assuming a sufficiently powerful and reliable quantum computer available
Easy Countermeasure: Double the size of a key
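The countermeasure follows directly from the square-root speedup. A short worked form, ignoring constant factors and using $T$ for the number of operations (notation introduced here, not on the slide):

```latex
\[
T_{\text{classical}}(k) = 2^{k},
\qquad
T_{\text{Grover}}(k) = \sqrt{2^{k}} = 2^{k/2}
\]
\[
T_{\text{Grover}}(128) = 2^{64},
\qquad
T_{\text{Grover}}(2 \cdot 128) = 2^{256/2} = 2^{128} = T_{\text{classical}}(128)
\]
```

In other words, a 256-bit key attacked with Grover's algorithm costs about as much as a 128-bit key attacked by classical exhaustive search.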
Effect on Public-Key Algorithms
1994: Shor’s Algorithm, breaks major public key cryptosystems based on
factoring: RSA
discrete logarithm problem: DSA, Diffie-Hellman
Elliptic Curve discrete logarithm problem: Elliptic Curve Cryptosystems
independently of the key size, assuming a sufficiently powerful and reliable quantum computer available
No known countermeasures
New algorithms and standards required
Remaining Challenges in Quantum Computing
1. High sensitivity to manufacturing variations
   Solution: Best industry cleanrooms, e.g., QuTech-Intel collaboration toward quantum-dot arrays made @ Intel 300mm wafers
2. Scalable control circuits (currently bulky & expensive)
   Solution: Tailored cryo-CMOS digital control
3. Multitude of interconnects and external pins
   Solution: Multiplexing electronics co-integrated with qubits
4. Non-standard architecture & limited programmability
   Solution: System layer approach
Likely to be overcome in the next 10-15 years
Source: Vandersypen, PQCrypto 2017
System Layer Approach
[Figure. Source: Vandersypen, PQCrypto 2017]
Projected Progress
[Figure. Source: Vandersypen, PQCrypto 2017]
Can we accelerate building quantum computers?
[Figure. Source: Vandersypen, PQCrypto 2017]
Can we accelerate software development?
[Figure. Source: Vandersypen, PQCrypto 2017]
Post-Quantum Cryptography
Public-key cryptographic algorithms for which there are no known attacks using quantum computers
Capable of
• being implemented using any traditional methods, including software and hardware
• running efficiently on any modern computing platforms: PCs, tablets, smartphones, servers with FPGA accelerators, etc.
Post-Quantum Cryptography Efforts
• New public-key cryptographic families: mid-1990s-present
• D.J. Bernstein introduces the term post-quantum cryptography: 2003
• Series of PQCrypto Conferences: 2006-present
• NIST Workshop on Cybersecurity in a Post-Quantum World: 2015
• NIST announcement of standardization plans at PQCrypto 2016, Fukuoka, Japan, Feb. 2016
• NIST Call for Proposals and Request for Nominations for Public-Key Post-Quantum Cryptographic Algorithms: Dec. 2016
  Deadline for submitting candidates: November 30, 2017
Post-Quantum Cryptography NIST Project
[Figure. Source: Moody, NIST 2017]
Promising PQC Families
Family Encryption Signature Key Agreement
Hash-based XX
Code-based XX X
Lattice-based XX X
Multivariate X XX
Supersingular Elliptic Curve Isogeny XX
XX – high-confidence candidates, X – medium-confidence candidates
Promising PQC Algorithms
Family | Encryption & Key Exchange | Signature
Hash-based | - | XMSS (2011), SPHINCS (2015)
Code-based | McEliece (1978), Niederreiter (1986) | CFS (2001)
Lattice-based | NTRUEncrypt (1996), Ring-LWE (2010), NewHope (2016), Kyber (2017) | pqNTRUSign (2001-2017), BLISS (2013), Dilithium (2017)
Multivariate | PMI+ (2004), SRP (2015) | Unbalanced Oil and Vinegar (1999), HFEv-, QUARTZ (2001), Rainbow (2005)
Algorithms Selected for a Pilot CERG Study
1. NTRUEncrypt Short Vector Encryption Scheme (SVES), fully compliant with IEEE 1363.1 Standard Specification for Public Key Cryptographic Techniques Based on Hard Problems over Lattices
   Parameter sets:
   • Optimized for speed
   • 192-bit security: ees1087ep1: p=3, q=2048, N=1087, df=dr=63
   • 256-bit security: ees1499ep1: p=3, q=2048, N=1499, df=dr=79
2. Multivariate Rainbow Signature Scheme
   Parameter set:
   • (17,12)(1,12)
   • 80-bit security level
Our Objectives
Paving the way for the future comprehensive, fair, and efficient hardware benchmarking of PQC candidates through
1. Uniform Hardware API
2. Uniform & Efficient Development Process
Proposed Uniform Hardware API
Minimum Compliance Criteria
• Encryption & decryption, or signature generation & verification
• External key generation (e.g., in software)
• Permitted data port widths, etc.
Communication Protocol
Interface Timing Characteristics
Comparative Analysis of Implementation Difficulties
Feature | NTRUEncrypt | Rainbow SS
High-security levels | Easy to implement | Challenging to implement
Key sizes | Small | Very Large
Support for multiple parameter sets swapped at run time | Relatively easy to implement | Challenging to implement
Component operations | Standard: variable rotator, hash function | Complex: System of Linear Equation Solver
Dependence of the execution time on message size | Strong | Weak
Outcomes of the CERG Pilot Study
• First hardware implementation of the full NTRUEncrypt-SVES scheme
• Hardware optimization for speed revealed the hash function bottleneck
• Changes in the NTRUEncrypt standards recommended to overcome this bottleneck
• State of the art implementation of the Rainbow Signature Scheme comparable to the earlier results by Tang et al. from PQCrypto 2011
• New PQC Hardware API, paving the way for the fair evaluation of candidates in the NIST standardization process
Cryptography Project
✓ RTL VHDL implementation of a post-quantum cryptographic algorithm, or its sub-function, based on
  • algorithm specification
  • reference implementation in C
  • Hardware API specification
✓ groups of 1, 2, or 3 students (2-person groups preferred)
✓ desirable expertise within a group, not taught in this class
  ▪ C or Python programming
  ▪ Basics of math (modular arithmetic, operations on matrices, solving systems of linear equations, etc.)
Combining Projects from Two Different Courses
• ECE 545 & ECE 646
  • ECE 545 project can be extended into an ECE 646 hardware project by adding additional ciphers/sub-functions, architectures, modes of operation, parameter sets, etc.
  • ECE 646 students must write a final report, give an oral presentation, & submit project deliverables
  • ECE 545 students submit only project deliverables
• ECE 545 & ECE 798/799/998/999
  • ECE 545 project can be extended into a Research Project, Master’s Thesis, or PhD Thesis
Project Organization
• Project divided into phases
• Deliverables for each phase submitted using Blackboard at selected checkpoints and evaluated by the instructor and/or TA
• Feedback provided to the students on the best effort basis
• Periodical group meetings devoted to the discussion of each phase deliverables and encountered difficulties
• Final deliverables submitted using Blackboard at the end of the semester
• Final project score based only on the final deliverables
Questions?
Thank you!
Comments?
Suggestions?
ATHENa: http://cryptography.gmu.edu/athena
CERG: http://cryptography.gmu.edu