efficient implementation of sha-3 hash function on 8-bit avr … · 2020. 11. 27. · efficient...
TRANSCRIPT
-
Efficient Implementation of SHA-3 Hash Function on 8-bit AVR-based Sensor Nodes
YoungBeom Kim, Hojin Choi, Seog Chung SeoCryptography Optimization & Application Lab,
Department of Information Security, Cryptology, and Mathematics, Kookmin University
Cryptography Optimization & Application Lab
-
Contents
• Introduction
• Memory optimization
• Chaining optimization methodology
• Experimental result
• Conclusions
Cryptography Optimization & Application Lab
-
Introduction
Cryptography Optimization & Application Lab
-
Some Context• Hash Function provides data integrity
• Fatal reverse attack has been filed against the existing SHA-2 Family
• The importance and demand of SHA-3 is increasing
• No single implementation method is more efficient than all others on ever possible platforms
• Existing efficient designs are usually hardware or specific architectures (Parallel system) oriented
• SHA-3 is a core algorithm used in MAC, digest, digital signature, DRBG, PQC, and so on.
• General software optimization method for various platforms is an important issue
• As 5G industry increases, a efficient implementation method of SHA-3 in embedded devices is important.
Cryptography Optimization & Application Lab
-
Overview of SHA-3• Keccak algorithm selected to be next-generation hash function in SHA-3 competition held by NIST
• SHA-3 based on Sponge structure• Absorbing Process : Compressing message and updating internal state by 𝑓𝑓-function• Squeezing Process : Computing digest
Fig. 1: Overview of Sponge structure
Cryptography Optimization & Application Lab
-
Overview of SHA-3• 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 of 𝑓𝑓-function is a three-dimensional 𝑥𝑥 × 𝑦𝑦 × 𝑧𝑧 matrix
• Row 𝑥𝑥 and Column 𝑦𝑦 are both fixed to five• Consisting of 25 lanes
Fig. 2: State of SHA-3
Cryptography Optimization & Application Lab
-
Overview of SHA-3• 𝜃𝜃 process
• XOR each bit in 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 with parties of two columns • XORing sum of columns ((𝑥𝑥 − 1),𝑧𝑧) and ((𝑥𝑥 + 1),(𝑧𝑧 − 1))
Alg. 1: Algorithm of 𝜃𝜃 process Fig. 3: Overview of 𝜃𝜃 process
Cryptography Optimization & Application Lab
-
Overview of SHA-3• 𝜋𝜋 process
• Rearranging the positions of the lanes• Not changing value of lanes
Fig. 4: Overview of 𝜋𝜋 process Fig. 5: Detail Structure of 𝜋𝜋 process
Alg. 2: Algorithm of 𝜋𝜋 process
Cryptography Optimization & Application Lab
-
Overview of SHA-3• 𝜌𝜌 process
• Right-rotating the bits of each lane as much as offset • Not changing position of lanes• Implemented in combination with 𝜋𝜋 process in standard implementation method
Fig. 5: Overview of 𝜌𝜌 process Alg. 3: Algorithm of 𝜌𝜌 process
Cryptography Optimization & Application Lab
-
Overview of SHA-3• 𝜒𝜒 process
• XORing each bit with a nonlinear function of two other bits in its row• Operating in row form
• 𝜄𝜄 process• XORing Round-constant and S[12] of 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠• Operating for single lane
Fig. 6: Overview of 𝜒𝜒 process Alg. 4: Algorithm of 𝜒𝜒 process
Cryptography Optimization & Application Lab
-
Standard Method• The standard implementation method of SHA-3 follows as: 𝜃𝜃 → 𝜋𝜋~𝜌𝜌 → 𝜒𝜒~𝜄𝜄• Combing 𝜋𝜋 process and 𝜌𝜌 process into 𝜋𝜋~𝜌𝜌 process• Accessing 7 times to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 during 𝑓𝑓-function
• When b = 1600, 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 is 200 bytes and 𝑓𝑓-function comprise 24 round • Requiring 168 memory access to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 during 𝑓𝑓-function
• Memory access cause higher overhead than arithmetic and logical operations in low-end-processor
StandardMethod Initial 𝜽𝜽 𝜽𝜽 process 𝝅𝝅~𝝆𝝆 process 𝝌𝝌~𝜾𝜾 process Total Access
Load O O O O7 times
Store X O O O
Table. 1: Number of memory access to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 in Standard Method
Cryptography Optimization & Application Lab
-
Memory Optimization
Cryptography Optimization & Application Lab
-
Memory Optimization• Proposed implementation method of SHA-3 follows as: 𝜃𝜃~𝜌𝜌 (𝜋𝜋) → 𝜒𝜒~𝜄𝜄• Implementing 𝝅𝝅 process implicitly in 𝜃𝜃~𝜌𝜌 process• Combing 𝜃𝜃 process and 𝜌𝜌 process into 𝜃𝜃 ~𝜌𝜌 process• Accessing 5 times to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 during 𝑓𝑓-function
• Requiring 120 memory access to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 during 𝑓𝑓-function• Proposed Method : 120 < Standard Method 168
• Reducing memory access twice compared to the standard implementation method
ProposedMethod Initial 𝜽𝜽 𝜽𝜽~𝝆𝝆 process 𝝅𝝅 process 𝝌𝝌~𝜾𝜾 process Total Access
Load O O X (Implicitly) O5 times
Store X O X (Implicitly) O
Table. 2: Number of memory access to 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 in Proposed Method
Cryptography Optimization & Application Lab
-
Memory Optimization• 𝜽𝜽 and 𝝆𝝆 process execute independent operation for lane
• Appling 𝝆𝝆 process before storing in 𝜽𝜽 process
• Appling 𝝅𝝅 process implicitly when updating state (store)• 𝝅𝝅 process is a rearrange process for each lane• 𝝅𝝅 process can be executed implicitly
• Memory address translation operation occurs only once• 𝜽𝜽 and 𝝅𝝅~ 𝝆𝝆 process require twice translation in standard method• Standard Method : 𝜽𝜽 → 𝝅𝝅~𝝆𝝆; twice• Proposed Method : 𝜽𝜽~𝝆𝝆 𝝅𝝅 ; once Fig. 7: Overview of Proposed Method
Cryptography Optimization & Application Lab
-
Chaining optimization methodology
Cryptography Optimization & Application Lab
-
Target Platforms
• 8-bit AVR MCUs• ATmega 128• Popularly used in WSNs (Wireless Sensor Networks)
• Spec of ATmega 128• Flash Memory : 128 KB• SRAM : 4KB• EEPROM : 4KB• 32 8-bit general-purpose registers
Fig. 8: ATmega 128
Cryptography Optimization & Application Lab
-
Register Scheduling• The generally used parameter is b = 1600, where the state is 200 bytes• R8-R15 and R16-R23 hold two lanes• R2-R5 are used to translate the memory address
• Initial 𝜽𝜽 and lanes of 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠
Fig. 9: Register Scheduling for Proposed Method in 8-bit AVR MCUs
Cryptography Optimization & Application Lab
-
• To apply 𝝅𝝅 process implicitly, we propose a Chaining optimization methodology in 8-bit AVR• Data Load to register (𝜽𝜽 process) Memory translation in register (𝝅𝝅 process) Data Store to Memory (𝝆𝝆 process)• 𝜽𝜽~𝝆𝝆 (𝝅𝝅) process uses R8-R15, R16-R23 alternately we call it “Chain Implementation”
• 𝑆𝑆𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 (200 bytes) cannot be held in the register operating lane unit• Here, memory address translation (cost 𝛼𝛼) is occurred in each process• Combining 𝜽𝜽~𝝆𝝆 process, memory address translation cost reduced to two times (4 2)
StandardMethod Initial 𝜽𝜽 𝜽𝜽 process
𝝅𝝅~𝝆𝝆process
Load O + 𝛼𝛼 O + 𝛼𝛼 O + 𝛼𝛼
Store X O O + 𝛼𝛼
ProposedMethod Initial 𝜽𝜽
𝜽𝜽~𝝆𝝆process 𝝅𝝅 process
Load O + 𝛼𝛼 O + 𝛼𝛼 X (Implicitly)
Store X O X (Implicitly)
Cryptography Optimization & Application Lab
Chaining optimization methodology
-
Chaining optimization methodology
Fig. 11: Proposed ImplementationTemp : Empty
State : S’ [0]
Cryptography Optimization & Application Lab
-
Temp : S’[4]
State : S’ [0]
Fig. 11: Proposed Implementation
Cryptography Optimization & Application Lab
Chaining optimization methodology
-
Temp : S’[4]
State : S’[14]
Fig. 11: Proposed Implementation
Cryptography Optimization & Application Lab
Chaining optimization methodology
-
Temp : S’[17]
State : S’[14]
Fig. 11: Proposed Implementation
Cryptography Optimization & Application Lab
-
Temp : S’[17]
State : S’[15]
Fig. 11: Proposed Implementation
Cryptography Optimization & Application Lab
-
Temp : S’[8]
State : S’[5]
Cryptography Optimization & Application Lab
Chaining optimization methodology
-
Experimental Result
Cryptography Optimization & Application Lab
-
• 25.7% performance improvement over Balasch et al.’s implementation• Our Work is the fastest implementation of SHA-3 in 8-bit AVR microcontroller• Narrowing the difference in performance by about two times compared to the SHA-2 Family
• Existing implementations have nearly three times the difference in performance
Experimental Result
Reference Algorithm LanguageLength of message byte
50 byte 100 byte 500 byte
This Work SHA-3 (256-bit) Asm 2667(+25.1%)1333
(+25.7%)1073
(+25.0%)
Otte et al. SHA-3 (256-bit) C, Asm 12854 6427 1672
Balasch et al. SHA-3 (256-bit) Asm 3560( - )1795 ( - )
1432( - )
Balasch et al. SHA-256 Asm 672 668 532
Balasch et al. Blake (256-bit) Asm 714 708 562
Balasch et al. Photon (256-bit) Asm 9723 7892 4788
Table. 3: Performance of SHA-3 by hash rate (CPB), when hashing a byte of various message in 8-bit AVR
Cryptography Optimization & Application Lab
-
Conclusion
Cryptography Optimization & Application Lab
-
• We introduced a new generic fast implementation method of SHA-3
• Proposed Method not requires a lookup table or additional operations
• Proposed Chaining optimization methodology of SHA-3 is the fastest implementation
• Our Work is efficiently applicable in PQC, DRBG, MAC, and so on
• Our Work is a generic method that can be a applied to various platforms
Conclusion
Cryptography Optimization & Application Lab
-
Question?Contact me : darania @ kookmin.ac.kr
Cryptography Optimization & Application Lab
-
Thank You~Cryptography Optimization & Application Lab
Efficient Implementation of SHA-3 Hash Function on 8-bit AVR-based Sensor Nodes�ContentsIntroductionSome ContextOverview of SHA-3Overview of SHA-3Overview of SHA-3Overview of SHA-3Overview of SHA-3Overview of SHA-3Standard MethodMemory OptimizationMemory OptimizationMemory OptimizationChaining optimization methodologyTarget PlatformsRegister SchedulingChaining optimization methodologyChaining optimization methodologyChaining optimization methodologyChaining optimization methodology슬라이드 번호 22슬라이드 번호 23Chaining optimization methodologyExperimental ResultExperimental ResultConclusionConclusionQuestion?Thank You~