vermi: veriﬁcation tool for masked implementations · ti provides security without the ideal-gate...

VerMI: Verification Toolfor Masked Implementations

Victor ArribasKU Leuven, imec-COSIC, Belgium

[email protected]

Svetla NikovaKU Leuven, imec-COSIC, Belgium

[email protected]

Vincent RijmenKU Leuven, imec-COSIC, Belgium

[email protected]

Abstract—Masking is a widely used countermeasure againstSide-Channel Attacks, nonetheless, the implementation of thesecountermeasures is challenging. Experimental security evaluationrequires special equipment, a considerable amount of time, andextensive technical knowledge. Therefore, to automate and tospeed up this process, a formal verification can be performed toasses the security of a design. In this work we present VerMI, averification tool in the form of a logic simulator that checks theproperties defined in Threshold Implementations to address thesecurity of a hardware implementation for meaningful ordersof security. The tool is designed so that any masking schemecan be evaluated. It accepts combinational and sequential logicand is able to analyze an entire cipher in short time. Withthe tool we have managed to spot a flaw in the round-basedKECCAK implementation by Gross et al., published in DSD 2017.

Index Terms—VerMI, Masking, Side-Channel Analysis,Glitches, Formal Verification, Logic Simulator, Non-completeness,Uniformity

I. INTRODUCTION

Implementing secure cryptographic algorithms in hardwarehas become quintessential, since physical attacks on crypto-graphic devices are getting increasingly popular and easier toreproduce. Side-Channel Attacks (SCA) exploit the leakagefrom different physical properties of the device. Severalcountermeasures have been proposed in the literature, includinghiding and masking [1]. Masking has caught most attention,with multiple schemes published to date [2]–[5], where securityagainst SCA is achieved by randomizing the intermediate valuesof a cryptographic implementation.

Checking if a cryptographic implementation is in factsecure is a very demanding exercise. One way to do it isby using attack based evaluations. Nevertheless, this is highlyinefficient, since it requires a considerable effort implementingdifferent attacks, solely ensuring security against those specificattacks. Following the same principle, there is the non-specifictest vector leakage assessment (TVLA) [6] over the powerconsumption or the electromagnetic radiation of the device todetect leakage. Yet, TVLA and attack based evaluations needthe actual ASIC or a correctly programmed FPGA, as well asspecial equipment to be performed.

Formal verification is another way of evaluating the securityof a cryptographic implementation. This method studies thetheoretical conditions for a design to be secure [7]–[9] ormodels the implementation to simulate its behavior [10], [11].

Our contribution: We propose VerMI, a Verification Tool toaid during the design flow towards secure hardware implemen-tations. The tool helps to spot any mistake with bit precisionwhen masking a cryptographic hardware implementation, andprovides an initial univariate security evaluation without anyspecial equipment. VerMI works at both algorithmic andimplementation level, with an inherent capability to checkdesigns in the presence of glitches. We build a structuralmodel of the circuit, providing highly precise informationon the behavior of every bit. The tool is tested for differentimplementations from the literature, finding a flaw in theround-based KECCAK implementation in [12]. The authorsacknowledged our findings and updated their implementationsacordingly in [13].

Fig. 1: VerMI’s flow diagram.

II. PRELIMINARIES

A. Threshold Implementations

Threshold Implementations (TI) [2], [3] is a boolean maskingscheme based on secret sharing, where each sensitive, i.e.key dependent, data element x is divided into s pieces (x =(x1, . . . , xs)), such that x = x1 ⊕ . . .⊕ xs, where all s sharesare needed to derive x. The aim of masking is to provideformal methods to randomize intermediate variables and henceprovide security in hardware implementations. TI imposes threemandatory properties for a design to be secure:

• Correctness: a shared function f such that fi(x) = yi,where i = 1, . . . , s, is correctly shared if

∑yi = y =

f(x).• Non-completeness: a shared function f is dth-order non-

complete if any combination of up to d componentfunctions fi is independent of at least one input share. Inthis case d is the degree of security.

• Uniformity: the outputs of a shared function have toconform a uniform distribution.

TI provides security without the ideal-gate assumption.Any gate can glitch depending on prior inputs of the sameclock cycle before stabilizing, creating glitch dependenciesor unexpected combinations. The non-completeness propertyavoids these dependencies to cause security losses.

The number of shares s is defined by the degree of securityd and the algebraic degree of the function t. We provide anexample (Ex. 1) of how (and how not) to protect with TI:z = a⊕ bc, where (d, t, sin, sout) = (1, 1, 3, 3):

Example 1

z1 = a1 ⊕ b1c1 ⊕ b1c2 ⊕ b2c1

z2 = a2 ⊕ b2c2 ⊕ b2c3 ⊕ b3c2

z3 = a3 ⊕ b3c3 ⊕ b1c3 ⊕ b3c1

Non-complete

z1 = a1 ⊕ b1c1 ⊕ b1c2 ⊕ b3c1

z2 = a2 ⊕ b2c2 ⊕ b2c3 ⊕ b3c2

z3 = a3 ⊕ b3c3 ⊕ b1c3 ⊕ b2c1

Complete

Consolidated Masking Schemes (CMS) [4] and DomainOriented Masking (DOM) [5] are two masking schemes thatsatisfy the TI properties. Both aim to reduce the number ofshares, by introducing additional randomness and relying onthe independence of the shared input bits.

B. Previous work

Formal verification of masked implementations is a widelyresearched topic [7], [8], [14]. These works target softwareimplementations and none of them take into account glitches.Therefore, they can not be applied to verify hardware imple-mentations.

Formal verification of masked hardware implementationscan be done in several stages of the design flow: at algorithmic,implementation or physical level. In the work of Reparaz [11]a tool for verification at algorithmic level is presented. Thetool receives a software implementation of the secured functionand performs a leakage assessment over simulated traces.However, glitches are not taken into account. At implementationlevel, Bertoni et al. [10] present a tool suitable for hardwareimplementations in the presence of glitches. Nevertheless, theirtool only analyzes combinational logic with a simple powermodel. Bloem et al. [9] also present a formal verificationmethod for hardware implementations in the presence ofglitches. Given a hardware implementation, they extract thenetlist and model the logic gates using a Fourier representation.To simulate glitches, the model of a gate is extended to computeany Boolean function from its original inputs. Additionally, toanalyze higher-orders, a SAT solver is instantiated. In both casesthe analysis complexity increases enormously. At physiscallevel there is TVLA or attack based evaluations.

III. VERMI

VerMI practically evaluates the TI properties (excluding cor-rectness) to ensure the correct application of the TI/CMS/DOMsharing. The tool creates a specific .tcl file for the design toanalyze, automating the RTL synthesis as well as the netlistgeneration. VerMI offers two main functionalities: the checkingof non-completeness and uniformity. Fig. 1 presents the tool’sflow diagram.

A. Logic Simulator

Our tool receives an HDL design that may be written inVHDL or Verilog. To unify the structure of the files, we firstsynthesize the HDL using RTL synthesis tools with open sourcelibraries to extract the netlist. The netlist is parsed, and astructural model of the circuit in the shape of a Binary DecisionDiagram (BDD) is built using wires, pins, and gates as objectsconnected to one another. Then, we implement a logic simulatorthat performs an event-driven gate-level simulation.

B. Header

The top module of the design to analyze must include a smallheader in which important data for the tool is specified. Thedata include the names of the sensitive inputs, the outputs ofevery layer of registers, random numbers, and the outputs to beanalyzed. Thus, the tool can distinguish sensitive data (protecteddata) from the inputs and from every stage of registers, randomdata, and sensitive outputs. Variables and shares are sorted in aN by M matrix, where N is the number of variables and Mthe number of shares (Fig. 2a). This data structure will helpthe tool perform the evaluation of Non-completeness.

C. Non-completeness

VerMI is able to check any-order non-completeness, and tooutput the specific bits on which this property is not fulfilled.

1) Input dependencies: We first collect the input dependen-cies of every Primary Output with a recursive search throughthe BDD.

2) Decision algorithm: Then, we check whether all sharesfor the same variable are included. If they are all included, thevariable fails non-completeness. The tool goes through everyoutput performing this check, reporting the ones that fail.

To check whether all shares for the same variable areincluded is not a trivial task, since the tool is unable todistinguish between variables and shares. The tool constructsthe same matrix as in III-B with the input dependencies ofevery output and checks whether any variable uses all shares.Fig. 2 presents the tables for both third output shares of Ex. 1.

a) b) c)

Fig. 2: Left to right: complete sensitive data set, third share(Ex. 1) non-complete, and third share (Ex. 1) complete.

3) Single bit variables: Variables in the analyzed HDL mod-ule are handled bitwise. Managing an entire bus, for example,would be faster, but also less detailed. This distinction is crucialto make the tool able to catch all possible dependencies andthus to have a robust non-completeness analysis.As a recentexample, VerMI was able to detect a flaw [15] in the round-based KECCAK implementation of [12].

4) dth-order non-completeness.: All possible combinationsof d outputs are analyzed with the same method presentedabove. The difference now is that the dependencies of the doutputs are combined, building the data matrix with all ofthem. All combination of d output variables that together failnon-completeness are reported. Currently VerMI supports upto third-order non-completeness verification. However, it canbe extended to support verification of any meaningful order.

5) Dealing with glitches: The structural model built byVerMI ignores the timing of the gates. How, then, are glitchesdetected? We do not detect glitch dependencies themselves,instead we check for non-completeness. Hence, if this propertyis fulfilled ensures security in the presence of glitches [2].

D. Uniformity

Next functionality is to check whether uniformity is pre-served across operations. The tool calculates the outputdistribution of a given module by simulating all possibleinputs. The frequencies of occurrence of each output vector arecalculated for the different input sharings. If all frequenciesare the same inside every group of input sharings, then themodule is said to preserve uniformity. To generate uniformity-preserving shares is easy for invertible linear operations, but notfor non-linear operations, since interaction between shares isneeded. This functionality is meant to analyze these functionsseparately and check whether they preserve uniformity.

1) Evaluation: We use the formulas from [3] to evaluatethe uniformity. We check whether, for certain unshared input,every output has the same frequency of occurrence when allpossible sharings of this input are evaluated. This is repeatedfor every possible unshared input together with, if added, allrandom bits specified.

2) Limitations: It is important to note that the uniformitycheck is limited by the total number of input bits and randombits. As explained above, the program has to generate allpossible inputs for the circuit and evaluate each of them to getthe output. Naturally, it is only feasible to check modules witha small number of inputs, for instance the non-linear operations(S-boxes) in a cryptographic algorithm. The designer needs toprovide to the tool the sub-module that implements the S-box.

IV. SEQUENTIAL CIRCUITS

In this section, we present how VerMI handles sequentialcircuits. The analyzed circuit is divided into all possiblecombinational sub-circuits to analyze non-completeness, orsimulated altogether to analyze uniformity.

When designing for security, other than for functionality,the registers are required to prevent propagation of glitchdependencies and to synchronize refreshing. Uniformity is notaffected by the presence of registers. Thus, the main goal of theregisters, security wise, is to split the design in different sub-circuits in order to prevent the glitch propagation. Therefore,the tool analyzes non-completeness over the combinationallogic in between registers, while uniformity is analyzed withregisters included.

PO4

Fig. 3: Top: sequential circuit. Bottom: combinational sub-circuits for first and second stages.

For a tool it is not trivial to divide a netlist into differentsub-circuits. It is imperative to include the exact componentsthat correspond to a certain sub-circuit in order to perform anaccurate analysis on the security afterwards. Missing an elementor including extra components, would result in a possibly non-complete output that actually should have failed the test, or,on the other hand, in extra dependencies that are not in thecircuit, producing a false warning. We design a Back-and-forth recursive algorithm to correctly split the netlist. Fig. 3illustrates how the algorithm works.

The simulation of registers is only needed for uniformityanalysis, which is not affected by the fact that registers aresynchronization elements. Thus, the simulator ignores theconcept of time and registers are treated as plain buffers.

V. EVALUATION

Experiments are run in an Intel Core i5-4590 CPU, 3.3GHzclk, and 7.6 GB RAM running a 64-bit Linux CentOS7.

1) Non-completeness: Fig. 4 presents the time that thetool needs to evaluate first-, second- and third-order non-completeness for the full round of several KECCAK imple-mentations to see the effect of the different number of outputson the execution time.

Tim

e (

ln s

)

Tim

e (

ln s

- m

ean(l

n s

))

Tim

e (

ln s

)

Tim

e (

ln s

- m

ean(l

n s

))

Tim

e (

ln s

)

Tim

e (

ln s

)

Fig. 4: From left to right and top to bottom: first-, second- andthird-order non-completeness evaluation, time versus numberof output sensitive bits

The first picture shows the actual time for the different ordersin the logarithmic scale. A second picture is provided with theplots centered to zero. As it can be seen, not only the actualtime of higher-order secure implementations is greater, but also

TABLE I: Verification Tool benchmark

Design Cells Variables 1st N-C 2nd N-C 3rd N-C Uniformity

Comb. Regs. Ins Rand Outs Time Result Time Result Time Result Time Result

Shared AND gatesTrich [16] 7 0 4 1 2 2.2 ms 7 2.5 ms 7 2.5 ms 7 442 µs 3

ISW [17] 7 0 4 1 2 2.2 ms 7 2.6 ms 7 2.4 ms 7 365 µs 3

TI [2] 12 0 6 0 3 2 ms 3 2.7 ms 7 2.6 ms 7 739 µs 7

DOM [5] 8 4 4 1 2 1.7 ms 3 2.2 ms 7 2.5 ms 7 189 µs 3

Shared KECCAK S-boxesDOM [12] 45 20 10 5 10 1.3 ms 3 2.6 ms 7 10.1 ms 7 1.6 s 3

TI3sh. [18] 99 0 15 0 15 1.9 ms 3 2.8 ms 7 11 ms 7 1.7 s 7

CG3sh. [19] 104 0 15 4 15 2 ms 3 4.5 ms 7 9.8 ms 7 29 s 3

TI4sh. [20] 82 0 20 0 20 1.2 ms 3 4.4 ms 7 16.3 ms 7 62 s 3

Shared d + 1 AES S-boxes2sh. [21] 862 144 16 64 16 4.8 ms 3 55.7 ms 7 417 ms 7 - -3sh. [21] 1.95k 270 24 162 24 6.5 ms 3 106 ms 3 4.03 s 7 - -

Shared round-based KECCAK ImplementationsDOM [12] 2.7k 800 400 200 800 22.86 ms 7 7.28 s 7 45.2 m 7 - -TI3sh. 4.2k 0 600 0 600 37.82 ms 3 11.12 s 7 58.8 m 7 - -TI4sh. 6.3k 0 800 0 800 60.98 ms 3 21.47 s 7 150 m 7 - -TI6sh. 17k 0 1200 0 1200 137.28 ms 3 87 s 3 14.8 h 7 - -

Related work [9] Analysis time w. glitches/ResultDOM AND 8 2 - - - - - - - - - ≤ 2 s 3

DOM KEC. SB 50 10 - - - - - - - - - ≤ 20 s 3

DOM AES SB 536 208 - - - - - - - - - ≤ 5-10 h 3

the higher the order of the analysis, the faster the time growswith the number of output bits.

2) Uniformity: We evaluate different first-order KECCAK S-boxes from the literature [18]–[20], getting the same result asclaimed in the literature. The biggest design analyzed was asix shares 2nd-order KECCAK S-box, which meant evaluating230 test vectors, and which took almost 2 days.

3) Benchmarking: Table I gathers the results for severalimplementations analyzed: protected AND gates from the liter-ature, various sharing schemes of the KECCAK S-box and AESS-box, and few whole round-based KECCAK implementations.

Contrary to [9], VerMI is limited to univariate analysis(probes in the same cycle). However, this results in a greatlyimproved performance, which allows testing complete ciphers,while [9] is limited by the inner complexity. The possibilityto check non-completeness on a whole cipher led us to find aflaw in the round-based KECCAK implementation from [12],where only analyzing the S-box the problem would not befound since it is caused by the surrounding operations.

VI. CONCLUSIONS

We have presented VerMI, a formal verification tool to helpin automating the security evaluation of cryptographic hardwareimplementations. Our tool is able to address univariate securityof any logic circuit, even in the presence of glitches, by check-ing non-completeness. Furthermore, the tool is able to checkwhether uniformity is preserved in smaller blocks. Thus, it isable to give a complete security assessment for any hardwaremasking scheme published to date. While benchmarking VerMI,we found a flaw in the round-based KECCAK implementationfrom Gross et al. [12]. They aknowledged our findings andupdated their work in [13].

ACKNOWLEDGMENT

This work was supported in part by the Research CouncilKU Leuven: C16/15/058 and OT/13/071, by the NIST ResearchGrant 60NANB15D346 and the EU H2020 project FENTE.

REFERENCES

[1] T. P. S. Mangard, E. Oswald, Power Analysis Attacks: Revealing thesecrets of smart cards. Springer Science+Business Media, LLC, 2007.

[2] S. Nikova, C. Rechberger, and V. Rijmen, Threshold implementationsagainst side-channel attacks and glitches. In ICICS, 2006.

[3] B. Bilgin, B. Gierlichs, S. Nikova, V. Nikov, and V. Rijmen, Higher-orderthreshold implementations. In ASIACRYPT, 2014.

[4] O. Reparaz, B. Bilgin, S. Nikova, B. Gierlichs, and I. Verbauwhede,Consolidating Masking Schemes. In CRYPTO, 2015.

[5] H. Groß, S. Mangard, and T. Korak, An Efficient Side-Channel ProtectedAES Implementation with Arbitrary Protection Order. In CT-RSA, 2017.

[6] J. Cooper, E. D. Mulder, G. Goodwill, J. Jaffe, G. Kenworthy, andP. Rohatgi, Test Vector Leakage Assessment (TVLA) methodology inpractice. International Cryptographic Module Conference, 2013.

[7] G. Barthe, S. Belaıd, F. Dupressoir, P. Fouque, B. Gregoire, and P. Strub,Verified Proofs of Higher-Order Masking. In EUROCRYPT, 2015.

[8] J.-S. Coron, Formal Verification of Side-channel Countermeasures viaElementary Circuit Transformations. Cryptology ePrint A., 2017/879.

[9] R. Bloem, H. Groß, R. Iusupov, B. Konighofer, S. Mangard, and J. Winter,Formal Verification of Masked Hardware Implementations in the Presenceof Glitches. In EUROCRYPT, 2018.

[10] G. Bertoni and M. Martinoli, A Methodology for the Characterisationof Leakages in Combinatorial Logic. In SPACE, 2016.

[11] O. Reparaz, Detecting Flawed Masking Schemes with Leakage DetectionTests. In FSE, 2016.

[12] H. Gross, D. Schaffenrath, and S. Mangard, Higher-Order Side-ChannelProtected Implementations of KECCAK. In Euromicro DSD, 2017.

[13] ——, Higher-Order Side-Channel Protected Implementations of Keccak.Cryptology ePrint A., 2017/395.

[14] A. G. Bayrak, F. Regazzoni, D. Novo, and P. Ienne, “Sleuth: Automatedverification of software power analysis countermeasures,” In CHES, 2013.

[15] V. Arribas, B. Bilgin, G. Petrides, S. Nikova, and V. Rijmen, “Rhythmickeccak: Sca security and low latency in hw,” In IACR TCHES, 2018.

[16] E. Trichina, Combinational Logic Design for AES SubByte Transforma-tion on Masked Data. Cryptology ePrint A., 2003/236, 2003.

[17] Y. Ishai, A. Sahai, and D. Wagner, Private Circuits: Securing Hardwareagainst Probing Attacks. In CRYPTO, 2003.

[18] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, Building poweranalysis resistant implementations of Keccak. In Second SHA-3Candidate Conf., 2010.

[19] J. Daemen, Changing of the Guards: A Simple and Efficient Method forAchieving Uniformity in Threshold Sharing. In CHES, 2017.

[20] B. Bilgin, J. Daemen, V. Nikov, S. Nikova, V. Rijmen, and G. V. Assche,Efficient and First-Order DPA Resistant Implementations of Keccak. InCARDIS, 2014.

[21] T. D. Cnudde, O. Reparaz, B. Bilgin, S. Nikova, V. Nikov, and V. Rijmen,Masking AES with d+1 Shares in Hardware. In CHES, 2016.

vermi: veriﬁcation tool for masked implementations · ti provides security without the ideal-gate...

Documents