a universal hardware api for authen6cated ciphers• 2015.07.07: announcement of second-round...
TRANSCRIPT
1
AUniversalHardwareAPIforAuthen6catedCiphers
EkawatHomsirikamol,WilliamDiehl,AhmedFerozpuri,
FarnoudFarahmand,MalikUmarSharif,andKrisGaj
GeorgeMasonUniversityUSA
http://cryptography.gmu.edu https://cryptography.gmu.edu/athena
2
Co-Authors from the GMU CAESAR Team
“Ice” Will Farnoud Umar Homsirikamol Diehl Farahmand Sharif
Kris Gaj
3
CAESAR Contest
2013-2017
4
Secret-Key Ciphers MACs & Hash Functions
Authenticated Ciphers
Cryptographic Transformations
Block Ciphers Stream Ciphers
confidentiality message integrity & authentication
confidentiality + message integrity & authentication
5time
97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17
AES
NESSIE
CRYPTREC
eSTREAM
SHA-3
34 stream 4 HW winners ciphers → + 4 SW winners
51 hash functions → 1 winner
15 block ciphers → 1 winner
IX.1997 X.2000
I.2000 XII.2002
IV.2008
X.2007 X.2012
XI.2004
CAESAR I.2013
57 authenticated ciphers → multiple winners
XII.2017
Cryptographic Standard Contests
6
• 2014.03.15: Deadline for first-round submissions • 2014.04.15: Deadline for first-round software • 2015.07.07: Announcement of second-round candidates • 2015.09.15: Deadline for second-round software • 2015.12.15: Deadline for second-round Verilog/VHDL • 2016.03.15: Announcement of third-round candidates • 2016.12.15: Announcement of finalists • 2017.12.15: Announcement of final portfolio
CAESAR Contest Timeline
7
Evaluation Criteria
Security
Software Efficiency Hardware Efficiency
Simplicity
FPGAs ASICs
Flexibility Licensing
µProcessors µControllers
8
Software API vs. Hardware API - Similarities
Software API Hardware API
Function Arguments
Argument Format
Core Ports = Interface
Input/Output Format
9
Software API vs. Hardware API - Differences
Software API Hardware API
All inputs available at the same time
Inputs accepted, outputs produced sequentially
Physically constrained
Abstracted
10
• Hardware API can have a high influence on Area and Throughput/Area ratio of all implementations
• Hardware API typically much more difficult to modify than Software API
• Without a comprehensive hardware API, the comparison of existing and future implementations highly unreliable and potentially unfair
• Need for a uniform hardware API, endorsed by the CAESAR Committee, and adopted by all future implementers
Hardware API and the CAESAR Contest
11
• Popular general-purpose interfaces • ARM: AXI4, AXI4-Lite, AXI4-Stream (Advanced eXtensible Interface) • IBM: PLB (Processor Local Bus), OPB (On-chip Peripheral Bus) • Altera: Avalon • Xilinx: FSL (Fast Simplex Link) • Silicore Corp.: Wishbone (used by opencores.org)
• Hardware APIs used during the SHA-3 Contest • GMU, Virginia Tech, University College Cork, etc.
• Interfaces used so far in the CAESAR competition • minimalistic, algorithm specific • AXI4-Stream-based proposed by ETH
(non-uniform, algorithm-specific user and data ports)
Previous Work
12
Proposed API Specification
13
Input and Output of an Authenticated Cipher
Npub(PublicMessageNumber),typicallyNonceNsec(SecretMessageNumber)[supportedbyfewalgorithms]
EncNsec–EncryptedSecretMessageNumberAD–AssociatedData
or
Key
TagNpub AD CiphertextNsecEnc
Key
Encryption
Npub Nsec AD Message
Npub AD CiphertextNsecEnc Tag Nsec AD MessageInvalid
Decryption
14
• inputs of arbitrary size in bytes (but a multiple of a byte only) • size of the entire message/ciphertext does not need to be
known before the encryption/decryption starts (unless required by the algorithm itself)
• independent data and key inputs • wide range of data port widths, 8 ≤ w ≤ 256 • simple high-level communication protocol • support for the burst mode • possible overlap among processing the current input block,
reading the next input block, and storing the previous output block
Proposed Features (1)
15
• storing decrypted messages internally, until the result of authentication is known
• support for encryption and decryption within the same core, but only one of these two operations performed at a time
• ability to communicate with very simple, passive devices, such as FIFOs
• ease of extension to support existing communication interfaces and protocols, such as • AMBA-AXI4 from ARM - a de-facto standard for the System-on-Chip
buses • PCI Express – high-bandwidth serial communication between PCs
and hardware accelerator boards
Proposed Features (2)
16
Authenticated Encryption with Associated Data – AEAD Interface
clk
sdi
sdi_valid
sdi_ready
sw
pdi
pdi_valid
pdi_ready
w
Public Data Input
Ports
PDI
Secret Data Input
Ports
SDI
clk rst
Data Output
Ports
DOw
do
do_ready
do_valid
AEAD
rst
17
Typical External Circuits (1) - FIFOs
full
pdi_valid
pdi_ready
clk rst
dout
empty
read
wdout
empty
read
rst clkwr_clkrd_clk =
sw
rstclk rd_clkwr_clk =
din
write
pdiw
AEAD
rstclk
SDI
FIFO
FIFO
PDI
rd_clk =rstwr_clkclk
sdi
sdi_valid
sdi_ready
do
do_ready
do_validFIFO
DO
18
Typical External Circuits (2) – AXI4 IPs
rst
w
sw
pdi
pdi_valid
pdi_ready
sdi
sdi_valid
sdi_ready
dout
empty
read
m_axis_tdata
m_axis_tvalid
m_axis_tready
Master
AXI4−Stream
clk rstclk rst
clk rst
clk rst
s_axis_tdata
s_axis_tvalid
s_axis_tready
AXI4−Stream
Slavew
do
do_ready
do_valid
AEAD
clk
SDI
FIFO
19
Input/Output Format
status = FAIL
w−bit
instruction = ACTKEY instruction = ACTKEY
instruction = DEC
seg_0_header
seg_0 = Npub
seg_1_header
seg_2_header
seg_2 = AD
seg_3_header
seg_3 = Ciphertext
seg_1 = Enc Nsec
seg_0 = Npub
seg_2_header
seg_2 = AD
seg_3_header
seg_3 = Message
seg_1_header
seg_1 = Nsec
(a) (b)(d)
(c)
seg_0_header
instruction = ENCseg_0_header
seg_1_header
seg_2_header
seg_2 = Message
seg_1 = AD
seg_0 = Nsec
status = PASS
seg_4_header seg_4 = Tag
PDI for encryption DO for encryption = PDI for decryption DO for decryption
20
High-Speed Implementations
of AEAD
npubnpub
TAG_SIZE
sdi_ready
sdi
sdi_valid
byp
ass_
full
byp
ass_
wr
KEY_SIZE
W
sdi
sdi_ready
pdi_ready
pdipdi_valid
sdi_valid
write
din
full empty
dout
read
FIFO
Bypass
DBLK_SIZE/8
DBLK_SIZE/8
bdi_decryptbdi_decrypt
nsec_readynsec_ready
bdi_pad_loc
bdi_valid_bytesbdi_valid_bytes
bdi_pad_loc
bdi_sizebdi_size
bdi_read bdi_read
exp_tag_ready exp_tag_ready
bdi_eot
bdi_eoi
bdi_eot
bdi_eoi
bdi_nodatabdi_nodata
BS_BYTES
bdi_proc
bdi_readybdi_ready
bdi_proc
bdi_adbdi_ad
nsec_readnsec_read
npub_readnpub_read
key_updated
key_needs_update
key_ready
key_needs_update
key_ready
key_updated
rdkey_ready
rdkey_read
rdkey_ready
rdkey_read
npub_ready npub_ready
W
W 4 W 3
do
do_ready
do_valid
do_ready
do
tag_ready
tag_write
msg_auth_valid
msg_auth_done
byp
ass_
em
pty
byp
ass_
rd
statusdoutctrldin
au
x_fifo
_sta
tus
au
x_fifo
_d
ou
t
au
x_fifo
_d
in
au
x_fifo
_ctrl
byp
ass_
da
ta
do_validProcessorPost
AUX FIFO
bdo_nsec
bdo_size
bdo_ready
bdo_write
bdo_data tag_dataProcessorPre
Controller
CipherCore
Datapath
CipherCore
W
TAG_SIZE
len_a
len_d
len_a
len_d
exp_tag exp_tag
CTR_AD_SIZE
tag
bdo
bdibdi
keykey
rdkeyrdkey
nsecnsecNSEC_SIZE
AEAD Core
CipherCore
pdi
pdi_valid
pdi_ready
DBLK_SIZE
RDKEY_SIZE
AEAD
SW
msg_auth_done
tag_ready
tag_write
msg_auth_valid
msg_auth_done
bdo_nsec
bdo_ready
bdo_write
bdo_size
DBLK_SIZE
BS_BYTES+1
CTR_D_SIZE
NPUB_SIZE
Block Diagram of a High-Speed Implementation of AEAD
22
PreProcessor: • parsing segment headers • loading and activating keys • Serial-In-Parallel-Out loading of input blocks • padding input blocks • keeping track of the number of data bytes left to process
PostProcessor: • clearing any portions of output blocks not belonging to ciphertext
or plaintext • Parallel-In-Serial-Out conversion of output blocks into words • formatting output words into segments • storing decrypted messages in AUX FIFO, until the result of
authentication is known • generating an error word if authentication fails
PreProcessor and PostProcessor for High-Speed Implementations (1)
23
Features: • Ease of use • No influence on the maximum clock frequency of AEAD
(up to 300 MHz in Virtex 7) • Limited area overhead • Clear separation between the AEAD Core unit and internal FIFOs
• Bypass FIFO – for passing headers and associated data directly to PostProcessor
• AUX FIFO – for temporarily storing unauthenticated messages after decryption
Benefits: • The designers can focus on designing the CipherCore specific to a
given algorithm, without worrying about the functionality common for multiple algorithms
• Full-block width interface of the CipherCore
PreProcessor and PostProcessor for High-Speed Implementations (2)
24
Verification & Result
Generation
25
Manual Design
HDLCode
Automated Optimization FPGATools[+ATHENa]
PostP&RNetlist
PostPlace&Route
Results
Functional Verification
GMU Testbench
Timing Verification
GMU Testbench
InformalSpecifica6on TestVectors
Proposed Development & Benchmarking Flow
1. Xilinx ISE + ATHENa 2. Vivado + Default Strategies
AETVgen Ccode
Implementa6onParameters
26
• Universal Testbench supporting any authenticated cipher core following GMU AEAD API
• Change of cipher requires only changing test vector file • A Python script, AETVgen, created to automatically
generate test vector files representing multiple test cases • Encryption and Decryption • Empty Associated Data and/or Empty Message/Ciphertext • Various, randomly selected sizes of AD and Message/Ciphertext • Valid tag and invalid tag cases
• All source codes made available at GMU ATHENa website
Universal Testbench & Automated Test Vector Generation
27
• Generation of results possible for • CipherCore – full block width interface, incomplete functionality • AEAD Core - recommended • AEAD – difficulty with setting BRAM usage to 0 (if desired)
• Use of wrappers: • Out-of-context (OOC) mode available in Xilinx Vivado (no pin limit) • Generic wrappers available in case the number of port bits exceeds
the total number of user pins, when using Xilinx ISE • GMU Wrappers: 5 ports only (clk, rst, sin, sout, piso_mux_sel)
• Recommended Automated Optimization Procedure • ATHENa for Xilinx ISE and Altera Quartus II • 25 default optimization strategies for Xilinx Vivado
Result Generation
28
1. AES-COPA 2. Ascon 3. CLOC 4. Deoxys 5. ICEPOLE 6. Joltik 7. Keyak
Authenticated Ciphers Implemented Using Our Hardware API
8. Minalpher 9. OCB 10. PAEQ 11. POET 12. PRIMATEs-HANUMAN 13. PRIMATES-GIBBON 14. SCREAM
15. AES-GCM
CAESAR Candidates:
Current Standard:
29
Preliminary Results for 13 Round 1 CAESAR Candidates & AES-GCM
30
ATHENa Database of Results for Authenticated Ciphers http://cryptography.gmu.edu/athena
31
ATHENa Database of Results for Authenticated Ciphers
32
AEAD Core vs. CipherCore Area Overhead for Virtex-6
clk, rst, sin, sout, piso_mux_sel clk, rst, sin, sout, piso_mux_sel
Overhead = LUT(AEAD_Core)-LUT(CipherCore)
LUT(AEAD_Core) × 100%
33
Conclusions
clk, rst, sin, sout, piso_mux_sel clk, rst, sin, sout, piso_mux_sel
• Complete Hardware API for authenticated ciphers • The Detailed Specification - ePrint 2015/669
• GMU hardware API with supporting materials • Available at: https://cryptography.gmu.edu/athena • Universal testbench and Automated Test Vector Generation • PreProcessor and PostProcessor Units for high-speed
implementations • Universal wrappers and scripts for generating results • Ease of recording and comparing results using ATHENa database
• GMU proposal open for discussion and possible improvements through • Better specification • Better implementation of supporting codes
34
• Support for two-pass algorithms • Pipelining of multiple streams of data • Detection and reporting of input formatting errors • Accepting inputs with padding done in software
Future Work
Comments?
Thank you!
35
Questions?
http://cryptography.gmu.edu/athena Choose: Download
http://cryptography.gmu.edu
Suggestions?