e cient simultaneous deployment of multiple lightweight authenticated ciphers · 2020. 5. 23. · e...

22
Efficient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers Behnaz Rezvani, Thomas Conroy, Luke Beckwith, Matthew Bozzay, Trevor Laffoon, David McFeeters, Yijia Shi, Minh Vu, and William Diehl Virginia Tech, Blacksburg, VA 24061, USA email: {behnaz, tconroy, luke98, mb20022, tslaffoon98, davidmm, yijiashi, minv17, wdiehl}@vt.edu Abstract. Cryptographic protections are ubiquitous in information tech- nology, including the emerging Internet of Things (IoT). As a result of technology migration to a resource-challenged landscape and new threats to cryptographic security, governments and industry are exploring new cryptographic algorithms. While new standards will emerge, however, old standards will not disappear for the time being. It is therefore important to explore platforms where multiple cryptographic deployments can be dynamically interchanged and even share resources. In this research we build on the Development Package for the Applications Programming In- terface for Hardware Implementations of Lightweight Cryptography (DP API HW LWC). In this construct, developers design hardware implemen- tations of authenticated encryption with associated data (AEAD) inside a cryptographic core (CryptoCore) encapsulated by input/output utilities. While CryptoCore is intended for single register-transfer level (RTL) implementations, we install a custom-designed soft core microprocessor inside CryptoCore to run underlying block ciphers, along with a shell to facilitate AEAD processing. Through dynamic loading and execution of block ciphers on the core, we demonstrate a single LWC deployment on an Artix-7 FPGA, capable of executing 3 NIST LWC Standardization Process Round 2 AEAD candidates (COMET-AES, COMET-CHAM and GIFT-COFB) using only 55% of the combined area of separate RTL implementations of the same ciphers. Keywords: NIST · Lightweight cryptography · FPGA · Implementation · Authenticated encryption · AES · CHAM · GIFT · COMET · GIFT- COFB · Microprocessor · Instruction Set Extension 1 Introduction Cryptographic protections, consisting of mathematically secure and ideally effi- cient algorithms, are an important part of many information technology sectors, including national security, finance, transportation, energy, medical, and personal privacy. Cryptographic services include confidentiality, i.e., preventing a 3rd party from reading a transmission, authenticity, i.e., verification that a transmis- sion originated from a particular sender, and integrity, i.e., assurance that the transmission was not altered between sender and receiver.

Upload: others

Post on 06-Aug-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of MultipleLightweight Authenticated Ciphers

Behnaz Rezvani, Thomas Conroy, Luke Beckwith, Matthew Bozzay, TrevorLaffoon, David McFeeters, Yijia Shi, Minh Vu, and William Diehl

Virginia Tech, Blacksburg, VA 24061, USAemail: {behnaz, tconroy, luke98, mb20022, tslaffoon98,

davidmm, yijiashi, minv17, wdiehl}@vt.edu

Abstract. Cryptographic protections are ubiquitous in information tech-nology, including the emerging Internet of Things (IoT). As a result oftechnology migration to a resource-challenged landscape and new threatsto cryptographic security, governments and industry are exploring newcryptographic algorithms. While new standards will emerge, however, oldstandards will not disappear for the time being. It is therefore importantto explore platforms where multiple cryptographic deployments can bedynamically interchanged and even share resources. In this research webuild on the Development Package for the Applications Programming In-terface for Hardware Implementations of Lightweight Cryptography (DPAPI HW LWC). In this construct, developers design hardware implemen-tations of authenticated encryption with associated data (AEAD) inside acryptographic core (CryptoCore) encapsulated by input/output utilities.While CryptoCore is intended for single register-transfer level (RTL)implementations, we install a custom-designed soft core microprocessorinside CryptoCore to run underlying block ciphers, along with a shell tofacilitate AEAD processing. Through dynamic loading and execution ofblock ciphers on the core, we demonstrate a single LWC deployment onan Artix-7 FPGA, capable of executing 3 NIST LWC StandardizationProcess Round 2 AEAD candidates (COMET-AES, COMET-CHAMand GIFT-COFB) using only 55% of the combined area of separate RTLimplementations of the same ciphers.

Keywords: NIST · Lightweight cryptography · FPGA · Implementation· Authenticated encryption · AES · CHAM · GIFT · COMET · GIFT-COFB · Microprocessor · Instruction Set Extension

1 Introduction

Cryptographic protections, consisting of mathematically secure and ideally effi-cient algorithms, are an important part of many information technology sectors,including national security, finance, transportation, energy, medical, and personalprivacy. Cryptographic services include confidentiality, i.e., preventing a 3rdparty from reading a transmission, authenticity, i.e., verification that a transmis-sion originated from a particular sender, and integrity, i.e., assurance that thetransmission was not altered between sender and receiver.

Page 2: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

2 B. Rezvani et al.

At present, the state of cryptography is at a point of inflection not observedin the past several decades. One driver is the ubiquitous nature of the Internetof Things (IoT), where small, intelligent devices pervade many aspects of ourdaily lives and communicate with one another as well as with centralized ap-plications. Much of the data handled by IoT devices is sensitive and requirescryptographic protections. However, existing secret key cryptographic standards,e.g., the Advanced Encryption Standard (AES) [41], were designed for largerdesktop computers and centralized servers. Government and industry are seekingalternatives for resource-constrained devices; an example of such a search isthe U.S. National Institute of Standards and Technology (NIST) LightweightCryptography (LWC) Standardization Process, which is investigating lightweightauthenticated encryption with associated data (AEAD) and optional hash func-tions [30].

A second driver is a potential upending of security premises due to theemergence of quantum computing. Known quantum algorithms can handilydefeat security premises based on integer factorization and discrete logarithms,which would render current public key cryptographic standards such as RSA(Rivest-Shamir-Adelman) and ECC (Elliptic Curve Cryptography) ineffective[38]. While secret key algorithms would not be completely defeated by knownquantum attacks, security could be reduced necessitating the adoption of longerkeys [16]. NIST is preparing to adopt new standards through its Post-QuantumCryptography (PQC) Standardization Process [3].

At the end of both of the above standardization efforts, it is unlikely thatthere will be a single secret or public key standard. Rather, legacy and newerstandards will likely be simultaneously employed in the near- and mid-term. Asan example, consider the case of Google’s experimentation with dual public keydeployments in its Chrome browser [12]. In this experiment, transmissions wouldbe authenticated using simultaneous ECC and a post-quantum solution (i.e., anearlier version of the NIST PQC candidate New Hope [4]). Additionally, the ideaof investigating convergence in proposed PQC and LWC algorithms was proposedin [34]. Similar situations can be contemplated for secret key cryptography inIoT Devices; for example, a lightweight hub might communicate with wirelesssensors using ZigBee protocol using a very lightweight cipher, but use a higherthroughput AES-GCM/CCM scheme with a remote command post using WiFi.

Candidate ciphers in cryptographic contests and standardization processesare evaluated by many metrics, including performance, required resources, andsecurity, prior to being finally standardized. However, most evaluations considerciphers individually, e.g., “which of ciphers A, B, or C is most efficient?” However,an emerging question to ask is “which group of ciphers {A, B, C} or {D, E, F}is most efficient when operated together?”

To address the former question, the Applications Programming Interface forHardware Implementations of Lightweight Cryptography (API HW LWC) hasbeen proposed for fair benchmarking and evaluation of candidates in the NISTLWC Standardization Process [25]. To facilitate adaption of HW implementationsto the API HW LWC, a developer’s package, the Developer’s Package (DP) for

Page 3: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3

API HW LWC (DP API HW LWC) is available [40]. Using the DP, a designerinserts a hardware implementation of an authenticated cipher in a centralizedwrapper called “CryptoCore,” and encapsulates CryptoCore in a top modulecalled “LWC,” which also contains input and output modules called “Preprocessor”and “Postprocessor” which comply with the API HW LWC.

However, the DP API HW LWC is only designed to operate with one authen-ticated cipher at a time. In fact, while HW implementations (e.g., designed usingregister transfer level (RTL) or high level synthesis (HLS)) of ciphers are moreefficient than software implementations running on generalized processors, theyrapidly lose advantage when combining functionality of multiple ciphers.

In this research, we prepare an environment to answer the latter question of“which groups of ciphers are more efficient?” We modify CryptoCore in the DPAPI HW LWC to include a common controller and datapath capable of handlingAEAD functionality for a number of authenticated ciphers, and embed a soft coreprocessor (i.e., “core”) to run the underlying block cipher associated with one ofseveral authenticated ciphers. The core is a lightweight 8-bit custom-designed softcore processor called “HOKSTER,” although any number of existing or futureprocessors could be substituted. An advantage of this architecture is that testvectors and other facilities of the DP API HW LWC are fully compatible (althoughwe have made a small addition to the API to dynamically select algorithm inuse), which enables comparison of multiple HW and SW combinations of AEADalong many dimensions.

We demonstrate the common CryptoCore with embedded processor usinga concatenation of test vectors (TVs) consisting of TVs for the COMET-AES,COMET-CHAM, and GIFT-COFB NIST LWC Round 2 candidate authenticatedciphers. Whenever a TV from a new algorithm commences processing, theCryptoCore controller loads the corresponding block cipher into the core (i.e., AES,CHAM, and GIFT, respectively). We then evaluate the resulting implementation,capable of executing multiple ciphers, in terms of area and throughput versusprevious RTL implementations of the individual ciphers.

Our contributions in this work are as follows:

1. We demonstrate a model for reducing the costs of simultaneous deploymentof existing and newly-emerging cryptographic standards, which could berequired during extended transitional periods between standards.

2. We open new arenas for comparison of cipher hardware implementations,including hybrid AEAD implementations consisting of a rigid outer hardwarelayer and a flexible inner software core, and combinations of groups of ciphersversus single implementations or other groups, using the existing API HWLWC framework.

3. We develop, test, and integrate a new 8-bit soft core microprocessor, whichadds to the variability of 8-bit platforms on which NIST candidate authenti-cated ciphers and associated primitives can be evaluated.

4. We investigate instruction set extensions for cryptographic applications, withfocus on newly-fielded CHAM and GIFT block ciphers.

Page 4: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

4 B. Rezvani et al.

2 Background

2.1 Authenticated encryption with associated data

Authenticated encryption with associated data (AEAD) is introduced in [33], andin one algorithm can provide confidentiality, authenticity and integrity withoutrelying on composition of distinct ciphers and hashes. “Associated data” (AD)refers to data which does not require encryption, but for which authenticity andintegrity are important, such as protocol or header information. Inputs to AEADinclude message or plaintext (PT ), a public message number (npub) (e.g., nonceor “number used once”), AD, and secret key (K). Outputs from authenticatedencryption are ciphertext (CT ) and a tag (Tag). During authenticated decryption,CT is converted back to PT , but only released if Tag is verified to be correct.

Underlying cryptographic primitives of AEAD, such as block ciphers, are“data-intensive;” they are highly parallel and repeatable transactions with simpleand highly-predictable control process. Therefore, they can be implemented veryefficiently in hardware using a variety of architectures. By contrast, AEAD algo-rithms are, with the exception of underlying primitives, “control-intensive;” theyhave complex and irregular control processes which must allow for a wide varietyof circumstances (e.g., missing AD or PT , partial or incomplete blocks, padding,etc.) Therefore, fully-functional single-algorithm AEAD HW implementationsare more complex than block ciphers, and multi-algorithm implementations canbe very costly.

COMET COunter Mode Encryption with authentication Tag (COMET) isa combination of Beetle and CTR modes of operation that provides AEADfunctionality [24]. COMET is single-pass and inverse-free. It has three members;in this paper, we consider two of them: COMET-128 AES-128/128 (the primarymember) and COMET-128 CHAM-128/128. The authors’ recommendations arekey size k = 128 bits, data block size n = 128 bits, nonce size n = 128 bits, andtag size τ = 128 bits. The n-bit Y -state (cipher state) and the k-bit Z-state (keystate) are concatenated to form the (n + k)-bit state of COMET. The initialstate (Y0||Z0) is formed by using the nonce (Z0 = Ek(N)) and the secret key(Y0 = K). There are 5-bit control constants that are used as domain separators todifferentiate between AD, PT , and Tag processes, and moreover, to distinguishbetween partial and full blocks.

GIFT-COFB GIFT-COFB refers to the COmbined FeedBack (COFB) modeof operation using the underlying cipher GIFT-128 [6]. Similar to COMET,GIFT-COFB is also inverse-free and single-pass. It processes 128-bit data blocksusing a 128-bit key and a 128-bit nonce, and it also provides a 128-bit tag. Theinitial state is loaded by a nonce (N) and after 40 iterations of the GIFT roundfunction, the state is updated. The upper 64 bits of the updated state are storedin a register called delta state. For each block of AD or PT , the delta state ismultiplied by 2 in GF(264) and then added to the state before the start of the

Page 5: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 5

round function process. However, in some special cases, such as partial blocks ofAD or PT , or null AD or PT segments, the delta state is multiplied by 3.

2.2 Block Cipher Primitives

In most hardware and software implementations of authenticated ciphers, under-lying primitives such as block ciphers can be invoked in a “black box” approach,from which the possibility of dynamically-varying implementations arises. TheAES, CHAM, and GIFT block ciphers are described below.

AES AES [41], or Advanced Encryption Standard, has been a U.S. and worldwidesecret key standard for two decades. AES-128 (with 128-bit plaintext blocksand 128-bit key) consists of 10 rounds of several transformations, including thelinear ShiftRows and MixColumns, the non-linear SubBytes consisting of 16 8-bitS-Boxes, and the addition of round keys, including an initial prewhitening stagebefore Round 1. A simplified rendition of AES is shown in Fig. 1; MixColumnsis not performed in Round 10, which is indicated by the path with dashed linesin the figure.

CHAM CHAM [27] is in the family of ARX (Addition, Rotation, XOR) cipherswhich use modulo-addition to develop non-linearity. It is designed to be smaller inarea in hardware than SIMON [8] and is made of features which are simpleto implement in lightweight processors, including 1) simple key scheduling;2) 1- and 8-bit left rotations only; and 3) round numbers reused as roundconstants. CHAM128/128 consists of a 4-branch Generalized Feistel Structure(GFS), executes a block encryption in 80 rounds, and is shown in Fig. 1.

GIFT GIFT [7] is an update to the popular PRESENT cipher [11]. SeveralNIST LWC Round 2 candidates use GIFT, including ESTATE [15], GIFT-COFB [6], LOTUS/LOCUS-AED [14] and SUNDAE-GIFT [5]. GIFT is a simplesubstitution-permutation network (SPN), consisting of 32 4-bit S-Boxes, a 128-bitpermutation, round constant addition, and round key updates at each of 40rounds. A simplified block diagram is shown in Fig. 2.

2.3 NIST LWC Standardization Process

In 2018, NIST established the Lightweight Cryptographic (LWC) StandardizationProcess to investigate new AEAD and optional hash algorithms which performsignificantly better than current standards. 56 candidates were accepted toRound 1 in April 2019, and 32 candidates were selected to Round 2 in August2019. The standardization process is expected to last several years, after whichnew LWC standards could be formalized. Submissions should be optimized forshort messages (e.g., 8 bytes), and should demonstrate good performance inresource-constrained environments, including 8-bit processors. NIST especiallyencourages 3rd-party (i.e., parties other than a submission’s author) evaluationsof candidates.

Page 6: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

6 B. Rezvani et al.

Fig. 1: AES (left) and CHAM (right). AES consists of 16 8-bit S-Boxes (Sub-Bytes), permutations (ShiftRows and MixColumns), and round key addition(AddRoundKey). CHAM (depicting rounds i and i + 1) consists of a Feistelstructure with 32-bit modular addition, 1-bit and 8-bit rotations, and XORs,including round constant additions. Round key computation is not shown.

2.4 Applications Programming Interface for HardwareImplementations of Lightweight Cryptography

Experience from previous cryptographic competitions, such as the NIST SHA-3 and Competition for Authenticated Encryption: Security, Applicability andRobustness (CAESAR) have shown the importance of having a standardizedAPI for HW implementations, in order to facilitate fair comparisons among alarge number of ciphers. Accordingly the Applications Programming Interfacefor Hardware Implementations of Lightweight Cryptography (API HW LWC)was proposed [25]. This API establishes a standard set of external AXI-capableinterfaces and protocols, such as pdi (public data interface), sdi (secret datainterface), and do (data output), on which all data arrive and depart. pdi, sdiand do can have bus widths w = 8, 16, or 32 bits; we use 32-bit external buswidths in our implementations. TVs are formatted using a protocol which defines

Page 7: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 7

Fig. 2: GIFT block cipher, consisting of substitution (32 4-bit S-Boxes), 128-bitpermutation, and round constant addition. Round key computation is not shown.Note: the permutation (middle) block is an abstract rendering not intended toconvey the full permutation.

all authenticated encryption and decryption (and optionally hash) operationson npub, PT , AD, K, CT and Tag. To facilitate easy deployment of the APIHW LWC, a Development Package (DP API HW LWC) is provided [40]. Usingthe DP, a designer encapsulates their design in CryptoCore (Fig. 3), with nochanges required to surrounding modules. Automated TVs are generated usinga Python script called cryptotvgen, which can be functionally verified usinga Hardware Description Language (HDL) test bench in an simulator such asVivado Simulator or ModelSim. However, the API and DP are only designed toencapsulate a single AEAD or hash algorithm at a time; there is no support fordeployment and evaluation of simultaneous multiple algorithms.

Fig. 3: LWC top module used in the Development Package for the ApplicationsProgramming Interface for Hardware Implementations of Lightweight Cryptogra-phy.

Page 8: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

8 B. Rezvani et al.

2.5 Hardware implementations of NIST LWC AEAD candidates

Hardware implementations in either FPGA or ASIC have been provided byNIST LWC candidate authors for many ciphers, including Ascon, ESTATE,SAEAES, Oribadita, LOTUS, and ACE [23, 13, 29, 10, 14, 1]. However, theseimplementations do not use a uniform implementation standard such as the APIHW LWC, making a cross-comparison difficult. One study performs a directcomparison of several NIST LWC Round 2 candidate HW implementations in theAPI HW LWC, including Ascon, COMET-AES, COMET-CHAM, GIFT-COFB,SpoC, and Schwaemm & Esch (SPARKLE) [32]. Data derived from individualcipher implementations in the Artix-7 FPGA from that work can be used for adirect comparison with the simultaneous multiple deployment in this research;results are reported subsequently.

2.6 Software implementations of NIST LWC AEAD and blockcipher primitives

Since our architecture includes a processor core to run software implementationsof AEAD-associated block ciphers, it is helpful to examine the history of softwarebenchmarking on lightweight processors.

A significant study of multiple NIST LWC AEAD implementations is con-ducted in [31]. Authors benchmark applications on the Arduino Uno R3 with8-bit ATmega328P, STM32F1 “bluepill” with 32-bit ARM Cortex M3, STM32NUCLEO-F746ZG with 32-bit ARM Cortex M4, and Espressif ESP32 WROOMwith 32-bit Xtensa LX6. All 56 NIST Round 1 candidate ciphers were bench-marked, including 213 variants using C/C++ implementations. Results pertinentto this research are reported and compared subsequently.

Another prominent study is FELICS (Fair Evaluation of Lightweight Crypto-graphic Systems), which evaluates multiple AEAD candidates and block ciphersusing C and assembly language (ASM) implementations in 8-bit AVR ATmega128,16-bit MSP430, and 32-bit ARM Atmel SAM3X8 with Cortex M3 [36]. Specif-ically, ciphers are evaluated in many lightweight-application scenarios, and aFELICS-AEAD framework has been proposed for authenticated ciphers. However,to date there has been no evaluation of the authenticated or block ciphers in thisresearch, with the exception of AES.

We note that both of the above software studies propose tailored APIs andtest frameworks which are a close analogies to the API for HW implementations,and all APIs closely resemble the SW API specified in [30].

2.7 Custom processor implementations including Instruction SetExtensions

Custom instruction set extensions (ISEs) for reconfigurable processors are anopen and fast-moving area of research. The study of ISEs for cryptographicapplications can generally be grouped into those that seek to improve efficiencyand those that address security concerns, such as side channel and fault attacks.

Page 9: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 9

The long-term desire is for security and efficiency enhancements to converge, butthis is a challenging open problem.

Efficiency In [28] the authors present XCrypto, an ISE for RISC-V to performbit permutations, S-Boxes, look up tables, and randomness generation. They alsoinclude a 16-entry 32-bit register file, which makes their enhancement similar toa co-processor. In [21] authors explore tightly coupled accelerators, including 28new instructions, to address lattice-based post-quantum cryptography (PQC).In [35], authors show that the inclusion of two simple custom instructions tothe RISC-V RV321 ISA leads to a 5-fold speed up of the permutation in theSNEIK authenticated cipher. Additionally, in [39], authors estimate the effect ofimplementing bit manipulations instructions in the RISC-V on the speed up forcryptographic operations, such as AES, ChaCha, and Keccak.

Security In [26] authors introduce SKIVA, based on the SPARC-V8 instructionset architecture and 32-bit LEON3 open-source soft core processor [17], withcustom instructions to defend against side channel and fault attacks. Basedon programs partially or wholly transformed into bitsliced applications, theISE provide support for instruction redundancy and conversion to and frombitslice representations. In [22], authors introduce an ISE concept called FENL toprevent micoarchitectural leakage present during interactions between instructionsdue to Hamming Weight, distance across pipeline stages or physical electricaltransactions. The ISE concept is agnostic to ISA or platform, but demonstratedon RISC-V. In [18], authors develop a custom side channel protected soft coreprocessor, which includes ISEs for lightweight block ciphers such as PRESENT,LED, and TWINE protected against power analysis side channel attacks.

In this research, we introduce several ISEs, with efficiency and security as thenear-term and long-term goals, respectively.

3 Design

3.1 Modified CryptoCore

In this research, we modify the internals of CryptoCore to enable simultaneousdeployment of multiple lightweight authenticated ciphers, while all deployedciphers remain compliant with the API HW LWC. Specifically, CryptoCore ismodified from its traditional RTL single-cipher structure in Fig. 3 to a new formshown in Fig. 4.

We make one modification to the API HW LWC, which is to add an m-bitcode (4 bits in this research) indicating the block cipher in use. The code isappended to the TV by creating a new API instruction (INS = 0x1) per theformat in [25]. In this research, 0x2 = COMET-AES, 0x1 = COMET-CHAM,and 0x0 = GIFT-COFB. For example, a TV for COMET-CHAM would bepreceeded by INS = 11000000, which is padded to 32 bits since w = 32 in these

Page 10: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

10 B. Rezvani et al.

implementations. The Preprocessor is also modified to read and interpret thecipher initialization instruction, which is subsequently passed to CryptoCore.

The modified CryptoCore is further detailed in Fig. 5, and includes a datapathand controller which are common to the AEAD-layer functions of the COMETand GIFT-COFB authenticated ciphers, and the soft core processor, itself encap-sulated in a loader. Common AEAD-layer functionality includes buffering andassembling strings of PT , CT , AD, npub; loading the expected Tag and perform-ing tag verification during authenticated decryption, loading and buffering ofsecret keys, assembly of initialization vectors, control decisions based on variouslengths of PT , CT and AD (including null PT or AD), and padding.

The combined datapath for this research is shown in Fig. 6; the solid linesare common between COMET and GIFT-COFB, the dashed lines are only usedin COMET, and the dotted lines are only used in GIFT-COFB. The datapathfirst designates the specified cipher to the loader on the 8-bit din bus in oneclock cycle. Then, it stores the 128-bit secret key (K) from the 32-bit key busin the Key register in 4 clock cycles. GIFT-COFB uses the same secret key forall encryption blocks (Ek) during each encryption/decryption process. However,as we mentioned before, COMET is based on the CTR mode of operation, so itexploits the ϕ permutation to update the Z-state for each block of data. Thedomain separation constants (Ctrl) are also combined with the ϕ permutation toindicate the proper location of the encryption/decryption process. Then, basedon the specified cipher, either the secret key (for GIFT-COFB) or the Z-state(for COMET) is selected to be passed to the core via the loader. Since the Ek-keyis 128 bits and the din size is 8 bits, this process takes 16 clock cycles.

The datapath gets the 128-bit nonce (N), AD, PT , and expected tag (T )from the 32-bit bdi bus and stores them in the iData register, for which loadingeach block takes 4 clock cycles. Then if it is necessary, the AD and PT blocks arepadded before inserting to the state. In GIFT-COFB, the updated state (Ek-out)is passed through the G module and then combined with both the delta state(explained in the GIFT-COFB section) and the AD/PT block. In COMET, theAD/PT block and the Ek-out are combined in the % module and then updatethe Y -state. Similar to Ek-key, either the GIFT-COFB state or the Y -state ischosen as the core input (Ek-in) and passed to the loader in 16 clock cycles.

The datapath receives the updated state (Ek-out) from the core on the 8-bitdout bus in 16 clock cycles. The CT is obtained from the combination of Ek-outand AD/PT , and if it is necessary, is truncated. The 32-bit output bus (bdo)receives either the CT block or the Tag (during encryption only) in 4 clock cycles.In decryption, the expected Tag and the computed Tag′ are compared, andbased on the result, the msg auth signal is set to one or zero, which respectivelyshows whether the tag verification step passed or failed.

The communication and signaling to the core is simple, and helps preserveflexibility to enable additional authenticated and block ciphers. Data is exchangedvia din and dout, which are n-bit (8 bits in this case, corresponding to an 8-bitprocessor core). Data exchanged includes bytes of PT , AD, CT , Tag, npub,or K, using a protocol agreed to by the CryptoCore and loader controllers. A

Page 11: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 11

Fig. 4: LWC top module with modified CryptoCore and Preprocessor, includingmulti-algorithm datapath and controller and embedded processor core.

Fig. 5: Internal construction of CryptoCore, including encapsulation of core inloader, and signaling to loader.

header signal (2 bits in this version) signals the context of din, i.e., PT , K, orcontrol information. Upon receipt of a new TV, the CryptoCore can take severalactions: 1) if the desired block cipher is not currently loaded into the core, theCryptoCore controller directs the loader controller to load the new block cipher;or 2) if the desired block cipher is already loaded, applicable data (e.g., PT , K,etc.) is loaded to core, and the block cipher commences.

The loader (detailed in Fig. 7) contains its own API to load and retrieve datafrom the core, load and run programs, reset the core, and perform other systemcalls from the core; it functions as an operating system for the core. Block cipherprograms and initialization data for AES, CHAM, and GIFT block ciphers are

Page 12: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

12 B. Rezvani et al.

bdi32

iData

Delta

||

×2×364 64 64

Ek_out[127:64]

64

64

6400...0}

64

G

Pad

Ystate

Nϱ Ek_outEk_out

N

Hard

Loader

Trunc

msg auth1 φ

Zstate

Key

key32

Ctrl

K

Ek_outK

K

Ek

_key

bdo

8

Cipher

= =ϱ

HL_D

8

8

8 8

Ek_out

Ek

_ou

t

Ek_out

32

32

32

32

Ek

_in

Fig. 6: Common datapath for AEAD-layer functionality for COMET and GIFT-COFB authenticated ciphers.

prestored in .vhd look up tables (ROM), and are loaded on command by theloader controller. By convention in our implementation, PT is always loadedinto, and CT extracted from, Data RAM locations 0x00 – 0x0F, and K is alwaysloaded to 0x10 – 0x1F. Other conventions are possible.

3.2 Custom-Designed Soft Core Processor

An embedded soft core processor is used to compute block cipher encryptionsin this research. We introduce the HOKSTER custom-designed 8-bit soft coreprocessor; HOKSTER stands for Hardware-oriented Kustom Security Test &Evaluation Resource, since it is ultimately intended for prototyping protectionsagainst active and passive side channel attacks. Although any number of popularsoft core microprocessors could be included as the embedded core, our motivationsfor including a custom-designed processor are as follows:

Page 13: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 13

Fig. 7: Loader encapsulates core, and loads and runs block cipher operations incore upon command from CryptoCore.

1. [30] calls for evaluation of candidates in 8-bit processors, yet most modern8-bit evaluations take place exclusively on the AVR ATmega series. Weprovide an alternative to increase breadth of comparison.

2. The ATmega128 is billed as a Reduced Instruction Set Computer (RISC)but has 133 instructions with 32 general purpose (GP) registers. We providea more parsimonious architecture consisting of only 38 instructions with 16(nearly) GP registers, which allows analysis of SW efficiency from a differentperspective.

3. Our architecture allows for experimentation with cryptographic-specific in-struction set extensions (ISEs) and memory-mapped accelerators at lowoverhead.

4. Microprocessor design is a living art that did not “die” with the roll-out ofthe RISC-V; alternative designs should be investigated and encouraged.

HOKSTER is an 8-bit RISC processor with a baseline Instruction Set Archi-tecture (ISA) of 38 instructions and 16 (nearly) GP registers. It uses a Harvardarchitecture with up to 4K of Program RAM (PRAM) and 64K of Data RAM(DRAM), each of which are instantiated by the user in blocks of 2n bytes atsynthesis time. Instructions (8 or 16 bits long) are drawn 8 bits at a time fromPRAM, and executed in 1 or 2 clock cycles per instruction. It is an expandedand improved version of the soft core processor used in [19, 18].

Page 14: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

14 B. Rezvani et al.

HOKSTER is a classic load-store architecture where arithmetic operations areonly permitted on values in registers, although increments and decrements arepermitted on immediate values of 1 to 16. HOKSTER has 8 8-bit “a” registersand 8 8-bit“r” registers. While all arithmetic operations are permitted on a andr registers, concatenations of like-numbered a and r registers (labeled ai : ri)are used to represent 16-bit values, such as DRAM addresses or interrupt masks.There are also special purpose registers such as pc (program counter), sp (stackpointer), sr (status register), and ie (interrupt enable) which are accessible onlythrough ISA instructions.

There are two functional units: the Arithmetic Logic Unit (ALU) and thecustom ALU (ALUc). While 16 operations are pre-defined on ALU, ALUcallows user implementation of 2-operand ISEs, which can execute in an arbitrarynumber of clock cycles. In our research we implement ISEs for hardware-friendlycryptographic operations such as S-Boxes, column multiplications in finite fields,and complex bitwise permutations. For more complex operations HOKSTERsupports memory-mapped peripherals and up to 16 priority encoded vectoredinterrupts, including user-defined interrupt service routines (ISR).

The HOKSTER source code, development tools, documentation, and applica-tion base are provided at [9]. Development tools include assembler and simulatorcoded in Python, and simulation tools designed primarily for Xilinx Vivado.Sample applications include the AES, CHAM and GIFT source code used inthis research, classic computational benchmarks like BubbleSort and TreeSort,and a Direct Memory Access (DMA) external peripheral used to demonstratememory-mapped accelerators and interrupts.

A simplified overview of HOKSTER is shown in Fig. 8. HOKSTER signalsare grouped into a number of buses, including d-bus (n-bit data communications),p-bus (8-bit words and 12-bit program addresses), a-bus (16-bit DRAM addressbus), i-bus (16-bit incoming pending interrupts), and s-bus (8-bit communicationfor core to external entities, including interrupt acknowledgement). The i-bus isnot used in this research.

3.3 Instruction Set Extensions

We implement instruction set extensions (ISEs) for several transformationsassociated with the subject block ciphers. ISEs are executed by the custom ALU(ALUc) in the format <opcode> <op1, op2>, e.g., asb r1, r2 computes theAES S-Box on contents of Register r1 and puts the result in Register r2. ISEsimplemented in this research are shown in Table 1. The numbers of clock cyclesfor an atomic ISE operation are shown in the “Cycles per operation” column, andthe effective cycles per byte are shown in the “cpb” column. In this architecture,ISEs can occur on long word or block sizes, e.g., swd or gsp, which require callsin a particular sequence; otherwise indeterminate results can occur.

For AES, the asb instruction computes 1 8-bit S-Box per call. The amc

instruction loads and computes a 32-bit column multiplication on Galois Fields(i.e., AES MixColumns transformation). For CHAM, the swd instruction loadsa 32-bit field (i.e., one Feistel branch) and performs arbitrary rotations on the

Page 15: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 15

Fig. 8: Simplified block diagram of HOKSTER soft core processor.

Table 1: Instruction Set Extensions.Cipher ISE Description Cycles per operation cpb

AES asb 8-bit S-Box 2 per S-Box=2 2.0000AES amc 32-bit Col. Mult. 2x5 calls/column+3=13 3.2500CHAM swd ROT (1 - 31 bits) on 32-bit word 2x6 calls/32-bit word+1=13 3.2500GIFT gsp 32 4-bit S-Box & 128-bit Perm. 2x23 calls/128-bit block+1=47 2.9375

field. For GIFT, the gsp instruction loads an entire 128-bit block, and computes32 4-bit S-Boxes and 128-bit permutations in parallel.

ISEs must be designed to not adversely affect the critical path of the core,which sometimes necessitates distributing ISE computations over multiple clockcycles. As shown in Table 1, effectively 2 to 3 cycles per byte are required forthese ISEs.

4 Results

4.1 Simultaneous deployment of multiple ciphers vs. individualciphers

The modified CryptoCore enabling simultaneous deployment of 3 authenticatedciphers, COMET-AES, COMET-CHAM, and GIFT-COFB, is implemented inVHDL as described above, and functionally verified in Vivado Simulator usingthe test benches included in the DP API HW LWC. The combined LWC platformis implemented on the Xilinx Artix-7 (xc7a100tcsg324-3) FPGA, and furtheroptimized using the Minerva Automated Hardware Optimization Tool [20].

Page 16: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

16 B. Rezvani et al.

Table 2: AEAD Implementations.CipherImplementation

FrequencyMHz

AreaLUT

TP FormulaCycles/Block

TPMbps

SW Comparisonµs [31]

Individual [32] - - - - -

COMET-AES 251.0 2753 20 1606 -COMET-CHAM 201.0 2214 91 283 -GIFT-COFB 263.0 1932 53 635 -

Multiple [TW] 104.0 3785 - - -

COMET-AES - - 14380 0.926 529COMET-CHAM - - 20456 0.651 1534GIFT-COFB - - 59661 0.223 3495

Although no hybrid HW/SW implementations of this type are known to exist,we can confidently compare our simultaneous deployment of 3 ciphers againstindividual RTL hardware implementations constructed with similar methodology.For example, individual implementations of COMET-AES, COMET-CHAM, andGIFT-COFB, which are compliant with the API HW LWC, use the associateddevelopment package, and are implemented in the Artix-7 FPGA (optimizedby Minerva), are available in [32]. In our implementation, resource savings interms of FPGA area (e.g., Look Up Tables – LUTs) are significant. Specifically,our combined LWC deployment requires 3785 LUTS, while the combined areaof COMET-AES, COMET-CHAM and GIFT-COFB implementations in [32] is6899 LUTs, i.e., our implementation is 55% the size of the separately packagedimplementations. However, software implementations can rarely rival the highperformance (e.g., throughput (TP)) of tailored RTL implementations, as shownin our results. Specifically, TP of our COMET-AES, COMET-CHAM, and GIFT-COFB implementations are only 0.06%, 2.3%, and 0.035% of their respectiveHW implementations; our implementation sacrifices performance for resourcesand flexibility.

An interesting indirect comparison can be made with NIST LWC AEADsoftware benchmarking performed in [31], where the authors compute total runtimes for a number of TVs on various 32-bit processors; we include their leastcapable 32-bit processor, the STM32F1 with Arm Cortex M3, for purposes ofcomparison. Even though the F1 implementations in [31] do not employ ISEs,the order of latencies (least to greatest) corresponds to order in this work (TW),namely COMET-AES, COMET-CHAM, and GIFT-COFB, whereas GIFT-COFBand COMET-CHAM are reversed when comparing basic-iterative architecturesusing pure RTL design in [32].

4.2 Comparison of HOKSTER core with popular cores

Although designing a microprocessor to outperform existing processors was nota goal of this research, a comparison of HOKSTER with some contemporarysoft core processors is provided for the interested reader (shown in Table 3).HOKSTER results are as implemented in this research; Microblaze and VexRiscv

Page 17: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 17

results are based on [45], and [43, 42], respectively. All FPGA areas in LUTs aresynthesized results on the Artix-7 FPGA. A HOKSTER core implemented at50 MHz frequency with no ISEs, including 512 bytes of Program RAM and 128bytes of Data RAM, uses 803 LUTs; The HOKSTER core as implemented inthis project with 4 ISEs requires 1130 LUTs (results are shown in parenthesis).A HOKSTER core with these 4 ISEs capable of 105 MHz, implemented using[20], required 1323 LUTs. Microblaze and VexRiscv do not include memory orinstruction extensions in the listed results.

Table 3: Soft Core Processor Comparisons. “GP” is general purpose; “ISE” isinstruction set extension; “FSL” is Fast Simplex Link.Field HOKSTER Microblaze (Default) RISC-V (VexRiscv)

LUTs 803(1130) 1190 556Max Frequency (MHz) 105 174 240Architecture Harvard Harvard HarvardBus Size 8-bit 32-bit 32-bitGP Registers 16 32 31Co-processors ISE FSL Accelerators ISE; Core extensionInterrupt Support Yes Yes YesMem. Mapped Devices Allowed AXI DMA Allowed

Memory0-4kB Program0-64kB Data

0-4GB Data0-4GB Program0-64kB Data Cache0-64kB Instruction Cache

0-4GB Data0-4GB ProgramVariable Data +Instruction Caches

ISA Instruction Count 38(42) 7947 Base,>115 with extensions

4.3 SW Implementations of Block Ciphers

Results of block cipher implementations are shown in Table 4. Cipher implemen-tations are shown without, and with ISEs (indicated by ∗). Program and DataRAM bytes are shown; Data RAM includes any stack usage. Latency (cycles)is encryption time for 1 128-bit block, and cpb is cycles per byte. Speed Up(Spd. Up) is latency improvement using ISE. ISE area (LUTs) is the synthesizedarea of additional HW components in the Artix-7 FPGA. Latency-to-area (L/A)ratio is shown, where area is basic core area (Table 3), along with change in L/Aratio L/A’ (denoting (L/A)/(L/A)∗), where area is (basic core + ISE) area. “NE”indicates that this combination was not evaluated.

Based on these results, we provide the following points of analysis:

Bitwise Permutations Bitwise permutations in GIFT, PRESENT, and otherblock ciphers are known to be easy in HW but challenging in SW [2]. This iswhy Feistel structures often have advantages in SW, as they are able to conduct

Page 18: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

18 B. Rezvani et al.

Table 4: SW Implementations.Cipher Program Data Latency cpb Spd. Up ISE Area L/A L/A’∗=ISE Bytes Bytes Cycles LUT Ratio

AES NE - - - - - - -AES* 363 64 14332 896 - 98 15.90 -

CHAM 396 64 25832 1615 - - 32.17 -CHAM* 320 64 20312 1270 1.27 86 22.85 0.71

GIFT 448 128 156884 9805 - - 195.37 -GIFT* 355 128 59564 3722 2.63 140 63.16 0.32

wholesale permutations on fields of 8, 16, 32, or 64 bits, sometimes at the cost ofmore rounds. In our non-ISE GIFT implementations, permutations account for81200 cycles out of 156884, or 52% of total cycles. In contrast, computation of32 4-bit S-Boxes requires only 21160 cycles (13% of total). However, since we arerequired to load all 128 bits into an ISE “permutator,” we can easily performall S-Box conversions at only the cost of LUT area for S-Boxes. Therefore, weimplement a single substitution + permutation ISE gsp. Since a substitution+ permutation sequence requires 47 clock cycles per round, 102360 cycles arereduced to 1880 in the ISE version, for a transactional speed up of 54× (withoverall block encryption speed up of 2.6×). The speed up comes at a 17% increasein processor area, but reduces the L/A ratio by 68%. Further improvements inGIFT and permutation-based ciphers can be realized using innovative bitslicedimplementations e.g., [2], and the impending adoption of bit manipulation ISEsand peripherals for open-source architectures, e.g., [44].

Light processor-friendly rotations Our CHAM results back up claims in [27,37] of CHAM’s efficiency in software. In particular, CHAM uses only 1-bit and8-bit left rotations. The 8-bit rotation can be implemented in a series of byteswaps in 8- or 16-bit processors; the 1-bit rotations are more complex, and areimplemented as a series of shifts and concatenations across 4 consecutive byteswith wrap-around. We provide an ISE for rotations on a 32-bit field called swd,which solves any rotations in a 32-bit Feistel branch. However, we note a speedup of only 1.3× using this ISE. This speed up comes with an 11% increase inprocessor area, which reduces L/A ratio by 29%. If we are resource-limited forISEs, we should first choose a generalized bitwise permutator (e.g., for GIFT)before fixed short-distance rotators used in CHAM.

Key scheduling Another CHAM efficiency claim in [27] is “stateless-on-the-fly” key scheduling. Since our block ciphers all include organic on-the-fly keyscheduling, the total key scheduling overhead significantly impacts performance.CHAM key scheduling, for example, incurs a one-time cost of 687 clock cyclesper block encryption (with the same key), while GIFT requires 236 cycles perround (9204 cycles total), and AES requires 648 per round (6480 total). CHAM

Page 19: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 19

key scheduling uses only 7.4% of the cycles of GIFT key scheduling; this leads toa significant performance improvement.

Auto post-increment indexing Although the HOKSTER architecture con-tains nearly only essential instructions for an orthogonal ISA, it includes autopost-increment loads and stores (lpb and spb, respectively), which load/store abyte from/to memory, and then increment the register holding the address (index).We note that without the use of l/spb, CHAM∗ takes 5520 clock cycles (or 20%)longer, than with the auto post-increment, while added core HW complexity isonly negligibly increased.

5 Conclusions

In this work we expanded the applicability of the Development Package for Ap-plications Programming Interface for Hardware Implementations of LightweightCryptography (DP API HW LWC), from its base-case of one cryptographicimplementation, to a construct allowing simultaneous deployments of multiplealgorithms. We demonstrated a modification of CryptoCore to include a commondatapath and control segment for the authenticated encryption layers of 3 NISTLWC Standardization Process Round 2 candidates COMET-AES, COMET-CHAM and GIFT-COFB, and a processor core running AES, CHAM, or GIFTblock ciphers. For the core, we designed, tested, and installed a custom 8-bit softcore processor, although substitutions with other popular cores is possible. Thecore implementations consisted of tailored software for AES, CHAM, and GIFT,and selected instruction set extensions (ISE) shown to improve performance for amodest cost in additional resources. We showed that our simultaneous deploymentof multiple ciphers consumed only 55% of the combined FPGA resources of sepa-rate individual cipher implementations using identical benchmarking frameworks.This work has demonstrated a model in which legacy and emerging cryptographicstandards can be simultaneously deployed, during a lengthy standards transitionperiod, at reduced cost. Additionally, this research opens the door to a soundbenchmarking methodology to compare hybrid hardware and software cipherimplementations to pure hardware implementations, or for comparing variousgroups and combinations of ciphers to determine their ability to efficiently shareresources.

Acknowledgement

This research was supported by National Institute of Standards and Technology(NIST) Award 70NANB18H219 for Lightweight Cryptography in Hardware andEmbedded Systems.

Page 20: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

20 B. Rezvani et al.

References

1. Aagaard, M.D., Sattarov, M., Zidaric, N.: Hardware Design and Analysis of theACE and WAGE Ciphers. arXiv:1909.12338 [cs] (Jan 2020), http://arxiv.org/abs/1909.12338, arXiv: 1909.12338

2. Adomnicai, A., Najm, Z., Peyrin, T.: Fixslicing: A New GIFT Representation.Cryptology ePrint Archive, Report 2020/412 (2020), https://eprint.iacr.org/2020/412

3. Alagic, G., Alperin-Sheriff, J., Apon, D., Cooper, D., Dang, Q., Miller, C., Moody,D., Peralta, R., Perlner, R., Robinson, A., Smith-Tone, D., Liu, Y.: Status Reporton the First Round of the NIST Post-Quantum Cryptography StandardizationProcess. Tech. rep. (2019)

4. Alkim, E., Avanzi, R., Bos, J., Ducas, L., de la Piedra, A., Poppelmann, T., Schwabe,P., Stebila, D., Albrecht, M., Orsini, E., Osheter, V., Paterson, K., Peer, G., Smart,N.: NewHope Algorithm Specifications and Supporting Documentation Version1.03 (Jul 2019), https://newhopecrypto.org/data/NewHope\_2019\_07\_10.pdf

5. Banik, S., Bogdanov, A., Peyrin, T., Sasaki, Y., Sim, S.M., Tischhauser, E., Todo, Y.:SUNDAE-GIFT: An Authenticated Cipher Submission to the NIST LWC Competi-tion (Mar 2019), https://csrc.nist.gov/Projects/Lightweight-Cryptography/Round-2-Candidates

6. Banik, S., Chakraborti, A., Iwata, T., Minematsu, K., Nandi, M., Peyrin, T.,Sasaki, Y., Sim, S.M., Todo, Y.: GIFT-COFB: An Authenticated Cipher Submissionto the NIST LWC Competition (Mar 2019), https://csrc.nist.gov/Projects/lightweight-cryptography/Round-2-candidates

7. Banik, S., Pandey, S.K., Peyrin, T., Sasaki, Y., Sim, S.M., Todo, Y.: GIFT: A SmallPresent - Towards Reaching the Limit of Lightweight Encryption. In: InternationalConference on Cryptographic Hardware and Embedded Systems (CHES). pp. 321–345 (Sep 2017)

8. Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: Si-mon and Speck: Block Ciphers for the Internet of Things (July 2015), https://csrc.nist.gov/csrc/media/events/lightweight-cryptography-workshop-2015/

documents/papers/session1-shors-paper.pdf

9. Beckwith, L., Bozzay, M., Conroy, T., Diehl, W., Laffoon, T., McFeeters, D.,Rezvani, B., Shi, Y., Vu, M.: HOKSTER Custom Microprocessor (2020), https://github.com/willja001/hokster

10. Bhattacharjee, A., List, E., Lopez, C.M., Nandi, M.: The Oribatida Family ofLightweight Authenticated Encryption Schemes (Mar 2019), https://csrc.nist.gov/Projects/Lightweight-Cryptography/Round-2-Candidates

11. Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw,M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher.In: Paillier, P., Verbauwhede, I. (eds.) Cryptographic Hardware and EmbeddedSystems - CHES 2007. pp. 450–466. Springer Berlin Heidelberg, Berlin, Heidelberg(2007)

12. Braithwaite, M.: Experimenting with Post-Quantum Cryptog-raphy (Jul 2016), https://security.googleblog.com/2016/07/

experimenting-with-post-quantum.html

13. Chakraborti, A., Datta, N., Jha, A., Lopez, C.M., Nandi, M., Sasaki, Y.: ESTATE:Hardware Benchmarking and Security Analysis (Nov 2019), https://csrc.nist.gov/Events/2019/lightweight-cryptography-workshop-2019

Page 21: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

Efficient Simultaneous Deployment of Multiple LW Authenticated Ciphers 21

14. Chakraborti, A., Datta, N., Jha, A., Lopez, C.M., Nandi, M., Sasaki, Y.: LOTUSand LOCUS AEAD: Hardware Benchmarking and Security (Nov 2019), https://csrc.nist.gov/Events/2019/lightweight-cryptography-workshop-2019

15. Chakraborti, A., Datta, N., Jha, A., Lopez, C.M., Nandi, M., Sasaki, Y.: ESTATE(Mar 2019), https://csrc.nist.gov/Projects/Lightweight-Cryptography/

Round-2-Candidates

16. Chen, L., Jordan, S., Liu, Y., Moody, D., Peralta, R., Perlner, R., Smith-Tone, D.:Report on Post-Quantum Cryptography. Tech. rep., National Institute of Standardsand Technology (2016)

17. Cobham Gaisler Research: LEON-3 Processor (2018), https://www.gaisler.com/index.php/products/processors/leon3

18. Diehl, W., Abdulgadir, A., Kaps, J., Gaj, K.: Side-channel resistant soft coreprocessor for lightweight block ciphers. In: 2017 International Conference on Re-ConFigurable Computing and FPGAs (ReConFig). pp. 1–8 (2017)

19. Diehl, W., Farahmand, F., Yalla, P., Kaps, J., Gaj, K.: Comparison of hardwareand software implementations of selected lightweight block ciphers. In: 2017 27thInternational Conference on Field Programmable Logic and Applications (FPL).pp. 1–4 (2017)

20. Farahmand, F., Ferozpuri, A., Diehl, W., Gaj, K.: Minerva (Dec 2017), https://cryptography.gmu.edu/athena/index.php?id=Minerva

21. Fritzmann, T., Sigl, G., Sepulveda, J.: RISQ-V: Tightly Coupled RISC-V Accelera-tors for Post-Quantum Cryptography. Cryptology ePrint Archive, Report 2020/446(2020), https://eprint.iacr.org/2020/446

22. Gao, S., Marshall, B., Page, D., Pham, T.: FENL: an ISE to mitigateanalogue micro-architectural leakage. IACR Transactions on CryptographicHardware and Embedded Systems (TCHES) 2020(2), 73—-98 (3 2020).https://doi.org/https://doi.org/10.13154/tches.v2020.i2.73-98

23. Gross, H., Wenger, E., Dobraunig, C., Ehrenhofer, C.: Ascon hardware implemen-tations and side-channel evaluation. Microprocessors and Microsystems 52, 470–479 (Jul 2017). https://doi.org/10.1016/j.micpro.2016.10.006, https://linkinghub.elsevier.com/retrieve/pii/S0141933116302721

24. Gueron, S., Jha, A., Nandi, M.: COMET: COunter Mode Encryption withauthentication Tag. Tech. rep. (Nov 2019), https://csrc.nist.gov/Projects/

lightweight-cryptography/round-2-candidates

25. Kaps, J.P., Diehl, W., Tempelmeier, M., Homsirikamol, E., Gaj, K.: Hardware APIfor Lightweight Cryptography. Tech. rep. (Oct 2019), https://cryptography.gmu.edu/athena/index.php?id=LWC

26. Kiaei, P., Mercadier, D., Dagand, P.E., Heydemann, K., Schaumont, P.: CustomInstruction Support for Modular Defense against Side-channel and Fault Attacks.Cryptology ePrint Archive, Report 2020/466 (2020), https://eprint.iacr.org/2020/466

27. Koo, B., Roh, D., Kim, H., Jung, Y., Lee, D.G., Kwon, D.: CHAM: A Family ofLightweight Block Ciphers for Resource-Constrained Devices. In: Kim, H., Kim,D.C. (eds.) Information Security and Cryptology – ICISC 2017, vol. 10779, pp.3–25. Springer International Publishing, Cham (2018)

28. Marshall, B., Page, D., Pham, T.: XCrypto: a cryptographic ISE for RISC-V (2019),https://github.com/scarv/xcrypto

29. Naito, Y., Matsui, M., Sakai, Y., Suzuki, D., Sakiyama, K., Sugawara, T.: SAEAES(Feb 2019), https://csrc.nist.gov/Projects/Lightweight-Cryptography/

Round-2-Candidates

Page 22: E cient Simultaneous Deployment of Multiple Lightweight Authenticated Ciphers · 2020. 5. 23. · E cient Simultaneous Deployment of Multiple LW Authenticated Ciphers 3 API HW LWC

22 B. Rezvani et al.

30. National Institute for Standards and Technology: Submission Requirements andEvaluation Criteria for the Lightweight Cryptography Standardization Process. Tech.rep. (Aug 2018), https://csrc.nist.gov/projects/lightweight-cryptography

31. Renner, S., Pozzobon, E., Mottok, J.: Benchmarking Software Implementations of1st Round Candidates of the NIST LWC Project on MCUs. Tech. rep., Laboratoryfor Safe and Secure Systems, OTH Regensburg (2019)

32. Rezvani, B., Coleman, F., Sachin, S., Diehl, W.: Hardware Implementations ofNIST Lightweight Cryptographic Candidates: A First Look p. 26 (Feb 2020),https://eprint.iacr.org/2019/824/20190716:135314

33. Rogaway, P.: Authenticated-encryption with associated-data. p. 98 (01 2002).https://doi.org/10.1145/586123.586125

34. Saarinen, M.J.O.: Exploring NIST LWC/PQC Synergy with R5Sneik: How SNEIK1.1 Algorithms were Designed to Support Round5. Cryptology ePrint Archive,Report 2019/685 (2019), https://eprint.iacr.org/2019/685

35. Saarinen, M.J.O.: SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-Vwith Custom Instructions. Cryptology ePrint Archive, Report 2019/936 (2019),https://eprint.iacr.org/2019/936

36. dos Santos, L.C., Großschadl, J., Biryukov, A.: FELICS-AEAD: Benchmarkingof Lightweight Authenticated Encryption Algorithms. Tech. rep., University ofLuxembourg – CryptoLux (2020)

37. Seok, B., Lee, C.: Fast implementations of ARX-based lightweight block ciphers(SPARX, CHAM) on 32-bit processor. International Journal of Distributed SensorNetworks 15(9), 10 (Sep 2019). https://doi.org/10.1177/1550147719874180

38. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring.In: Proceedings 35th Annual Symposium on Foundations of Computer Science. pp.124–134 (1994)

39. Stoffelen, K.: Efficient Cryptography on the RISC-V Architecture. In: Schwabe, P.,Theriault, N. (eds.) Progress in Cryptology – LATINCRYPT 2019. pp. 323–340.Springer International Publishing, Cham (2019)

40. Tempelmeier, M., Farahmand, F., Homsirikamol, E., Diehl, W., Kaps, J.P., Gaj, K.:Implementer’s Guide to Hardware Implementations Compliant with the HardwareAPI for Lightweight Cryptography (Nov 2019), https://cryptography.gmu.edu/athena/index.php?id=LWC

41. U.S. Commerce Department: Announcing the Advanced Encryption Standard(AES). Tech. rep., National Bureau of Standards (2001)

42. Verbeure, T.: SpinalHDL/VexRiscv (2018), https://github.com/SpinalHDL/

VexRiscv#area-usage-and-maximal-frequency

43. Waterman, A., Asanovic, K.: The RISC-V Instruction Set Manual Volume I:User-Level ISA (2017), https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

44. Wolf, C.: RISC-V Bitmanip Extension (2020), https://github.com/riscv/

riscv-bitmanip

45. Xilinx: Microblaze Processor Reference Guide (2008), https://www.xilinx.com/support/documentation/sw_manuals/mb_ref_guide.pdf