software watermarking via opaque predicates: implementation
TRANSCRIPT
ICECR 2004
Software Watermarking via Opaque Predicates:
Implementation, Analysis, and Attacks
Ginger Myles
Christian Collberg
{mylesg,collberg}@cs.arizona.edu
University of Arizona
Department of Computer Science
What is Software Watermarking?
Technique used to aid in the prevention of software piracy.
embed(P, w, key) → P′
recognize(P′, key) → w
Watermark: w uniquely identifies the owner of P.
Fingerprint: w uniquely identifies the purchaser of P.
[Figure: Embed Watermark takes the original program P, the watermark W, and the key K and produces the watermarked program P′; Extract Watermark takes P′ and K and returns W.]
What is Software Watermarking?
Static: the watermark is stored directly in the data or code sections of a native executable or class file. Static techniques make use of the features of an application that are available at compile time.
Dynamic: the watermark is stored in the run-time structures of the program.
What is Software Watermarking?
Blind: the recognizer is given the watermarked program and the watermark key as input.
Informed: the recognizer is given the watermarked program and the watermark key as input, and it also has access to the unwatermarked program.
Why use Software Watermarking?
Discourages illegal copying and redistribution.
A copyright notice can be used to provide proof of ownership.
A fingerprint can be used to trace the source of the illegal redistribution.
Does not prevent illegal copying and redistribution.
How can we watermark software?
Insert new (non-functional or non-executed) code
Reorder code where it does not change the functionality
Manipulate instruction frequencies
    switch (E) {
        case 1: { ... }
        case 5: { ... }
        case 9: { ... }
    }
⇒
    switch (E) {
        case 5: { ... }
        case 1: { ... }
        case 9: { ... }
    }
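Reordering can carry a watermark because n case blocks admit n! orderings. The following is a minimal sketch, assuming the case blocks are independent (each ends in a break, so their order does not affect behaviour) and that the watermark is read as the index of the permutation in the factorial number system. The class and method names are illustrative and are not taken from SandMark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CaseOrderWatermark {
    // Reorder the sorted case labels so that their order encodes w, 0 <= w < n!.
    static List<Integer> embed(List<Integer> sortedLabels, int w) {
        List<Integer> pool = new ArrayList<>(sortedLabels);
        int n = pool.size();
        // Factorial-base digits: digits[0] has radix n, digits[n - 1] has radix 1.
        int[] digits = new int[n];
        for (int radix = 1; radix <= n; radix++) {
            digits[n - radix] = w % radix;
            w /= radix;
        }
        if (w != 0) throw new IllegalArgumentException("watermark does not fit in n! orderings");
        List<Integer> ordered = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ordered.add(pool.remove(digits[i])); // remove(int index): pick the digit-th remaining label
        }
        return ordered;
    }

    // Recover w from the observed order of the case labels.
    static int recognize(List<Integer> observed) {
        List<Integer> pool = new ArrayList<>(observed);
        Collections.sort(pool);
        int n = pool.size();
        int w = 0;
        for (int i = 0; i < n; i++) {
            int idx = pool.indexOf(observed.get(i));
            pool.remove(idx);
            w = w * (n - i) + idx; // Horner evaluation of the factorial-base digits
        }
        return w;
    }

    public static void main(String[] args) {
        List<Integer> labels = List.of(1, 5, 9);
        List<Integer> marked = embed(labels, 2);
        System.out.println(marked + " encodes " + recognize(marked));
    }
}
```

Under this assumed numbering, the order 5, 1, 9 shown above corresponds to w = 2. Three case blocks give only 3! = 6 distinct values, so the data rate of this kind of embedding is low.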
Opaque Predicates
Used to make it more difficult for an adversary to analyze the control-flow of the application.
Can make it more difficult to identify that certain portions of the application are superfluous.
Opaque Predicates
Opaque Predicate: A predicate P is opaque at a program point p if, at point p, the outcome of P is known at embedding time.
Opaque Method: A boolean method M is opaque at an invocation point p if, at point p, the return value of M is known at embedding time.
Opaque Predicates
A variety of techniques have been suggested:
number theoretic results
pointer aliases
concurrency
Example: x(x + 1) ≡ 0 (mod 2)
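The number-theoretic example can be checked directly in Java. This is a small sketch with illustrative names: one of x and x + 1 is always even, and because int overflow wraps modulo 2^32 the low bit of the product is unaffected, so the predicate holds for every int value.

```java
final class OpaquelyTrue {
    // Always true: the product of two consecutive integers is even,
    // and wrap-around on overflow does not change the low bit.
    static boolean evenProduct(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, -8, 46341, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (!evenProduct(x)) throw new AssertionError("predicate failed for " + x);
        }
        System.out.println("x(x + 1) % 2 == 0 held for every sample");
    }
}
```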
Arboit Algorithm
Originally proposed by Genevieve Arboit at ICECR-5
Embeds the watermark at select branching points in the application through the use of opaque predicates.
Arboit Algorithm
Two techniques were originally proposed:
1. An opaque predicate is appended to the predicate at the selected branch.
    class C {
        void m1(int a, int b) {
            ...
            if (a <= b) { ... }
            else { ... }
            ...
        }
    }
W ⇒
    class C {
        void m1(int a, int b) {
            ...
            if ((a <= b) &&
                (c*c >= 0)) { ... }
            else { ... }
            ...
        }
    }
Arboit Algorithm
Two techniques were originally proposed:
2. A call to an opaque method is appended to the predicate at the selected branch.
    class C {
        void m1(int a, int b) {
            ...
            if (a <= b) { ... }
            else { ... }
            ...
        }
    }
W ⇒
    class C {
        boolean m2() {
            int c = 1;
            return (c*c >= 0);
        }
        void m1(int a, int b) {
            ...
            if ((a <= b) &&
                m2()) { ... }
            else { ... }
            ...
        }
    }
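Both techniques must leave the branch outcome unchanged. The sketch below puts the original branch and the two watermarked variants side by side so the equivalence can be tested. The class name, the string return values, and the choice c = 1 in technique 1 are assumptions for illustration, since the slides do not show where c is defined.

```java
final class ArboitBranchDemo {
    // Original branch.
    static String original(int a, int b) {
        if (a <= b) return "then";
        else return "else";
    }

    // Technique 1: an opaque predicate is appended to the branch condition.
    static String technique1(int a, int b) {
        int c = 1; // assumed local variable
        // Caveat: for Java ints, c*c >= 0 only holds while c*c does not overflow
        // (|c| <= 46340), so c must be chosen with that in mind.
        if ((a <= b) && (c * c >= 0)) return "then";
        else return "else";
    }

    // Technique 2: the opaque predicate is moved into an opaque method.
    static boolean m2() {
        int c = 1;
        return (c * c >= 0);
    }

    static String technique2(int a, int b) {
        if ((a <= b) && m2()) return "then";
        else return "else";
    }

    public static void main(String[] args) {
        for (int a = -3; a <= 3; a++) {
            for (int b = -3; b <= 3; b++) {
                String expected = original(a, b);
                if (!expected.equals(technique1(a, b)) || !expected.equals(technique2(a, b))) {
                    throw new AssertionError("behaviour changed for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("all three variants agree");
    }
}
```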
Arboit Algorithm
Watermark Encoding:
Watermark string is encoded as an integer.
The integer is split into k pieces {w_1, ..., w_k}, where 0 ≤ w_i ≤ n.
Order of the pieces is unimportant.
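The slides do not say how the integer is split so that the order of the pieces does not matter. One possible scheme, shown here purely as an assumption and not as the scheme used in the implementation, tags each base-16 digit with its position: piece = index * 16 + digit. Every piece then satisfies 0 ≤ piece ≤ n with n = 16k − 1, and the pieces can be recombined from any order. All names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WatermarkSplitter {
    static final int RADIX = 16; // hypothetical digit size

    // Encode a non-empty watermark string as a non-negative integer.
    static BigInteger encode(String watermark) {
        return new BigInteger(1, watermark.getBytes(StandardCharsets.UTF_8));
    }

    // Inverse of encode, assuming the string does not start with a NUL character.
    static String decode(BigInteger value) {
        byte[] bytes = value.toByteArray();
        // toByteArray may prepend a zero sign byte; skip it.
        int start = (bytes.length > 1 && bytes[0] == 0) ? 1 : 0;
        return new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
    }

    // Split into pieces; each piece carries its own position in its high part.
    static List<Integer> split(BigInteger value) {
        List<Integer> pieces = new ArrayList<>();
        BigInteger radix = BigInteger.valueOf(RADIX);
        int index = 0;
        do {
            int digit = value.mod(radix).intValue();
            pieces.add(index * RADIX + digit);
            value = value.divide(radix);
            index++;
        } while (value.signum() > 0);
        return pieces;
    }

    // Recombine pieces given in any order.
    static BigInteger combine(List<Integer> pieces) {
        List<Integer> sorted = new ArrayList<>(pieces);
        Collections.sort(sorted); // ascending value means ascending index
        BigInteger radix = BigInteger.valueOf(RADIX);
        BigInteger value = BigInteger.ZERO;
        for (int i = sorted.size() - 1; i >= 0; i--) {
            value = value.multiply(radix).add(BigInteger.valueOf(sorted.get(i) % RADIX));
        }
        return value;
    }

    public static void main(String[] args) {
        List<Integer> pieces = split(encode("ICECR"));
        Collections.shuffle(pieces);
        System.out.println(decode(combine(pieces)));
    }
}
```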
Arboit Algorithm
Watermark Encoding:
Each piece w_i is encoded in an opaque predicate in one of two ways:
1. Use constants in the predicate. 4 | x^2(x + 1)(x + 1) encodes the value 6. New constants can be inserted into the predicate: multiplying both sides by 18 encodes the value 42, giving (18)(4) | (18)x^2(x + 1)(x + 1).
2. Assign a rank to each opaque predicate in the library.
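A quick check of the constant-based encoding, as a sketch under two assumptions. First, the predicate is taken to be 4 | x^2(x + 1)(x + 1); with a final factor of x + 2 the product would be 6 at x = 1, which 4 does not divide. Second, the encoded value is read as the sum of the constants (4 + 1 + 1 = 6 and 18 + 4 + 18 + 1 + 1 = 42), which matches both numbers quoted but is not stated explicitly. All names are illustrative.

```java
final class DivisibilityPredicate {
    // 4 | x^2 (x + 1)(x + 1): one of x and x + 1 is even, and it appears squared.
    static boolean base(long x) {
        return Math.floorMod(x * x * (x + 1) * (x + 1), 4L) == 0;
    }

    // Both sides multiplied by 18. Only valid while the product fits in a long.
    static boolean scaled(long x) {
        return Math.floorMod(18L * x * x * (x + 1) * (x + 1), 18L * 4L) == 0;
    }

    // Assumed reading of the encoding: the value is the sum of the constants.
    static int sumOfConstants(int... constants) {
        int sum = 0;
        for (int c : constants) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        for (long x = -1000; x <= 1000; x++) {
            if (!base(x) || !scaled(x)) throw new AssertionError("failed at x = " + x);
        }
        System.out.println(sumOfConstants(4, 1, 1));         // prints 6
        System.out.println(sumOfConstants(18, 4, 18, 1, 1)); // prints 42
    }
}
```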
Arboit Algorithm
Added features to improve strength:
Identify local variables to use in the opaque predicate through slicing.
If w_i is encoded using rank and opaque methods, it may be possible to reuse methods instead of adding k new methods.
Dynamic Arboit Algorithm
Motivation:
To examine whether converting a known static algorithm to a dynamic algorithm creates a technique that is inherently more resilient to attack.
It is believed that truly dynamic algorithms are more resilient.
Dynamic Arboit Algorithm
Uses the program’s execution state for both embedding and recognition.
The application is run with a specific input.
Example input: the order of X’s and O’s placed on a Tic-Tac-Toe board.
The execution path is used to identify the branching points.
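The idea of selecting branching points from an execution path can be sketched as follows. This is a toy model, not the SandMark implementation: the branch(...) hook stands in for hypothetical instrumentation, the Tic-Tac-Toe logic is a stand-in program, and picking the first k distinct branches on the trace is an assumed selection rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class BranchTrace {
    private final List<String> trace = new ArrayList<>();

    // Hypothetical instrumentation hook: records each branch as it is evaluated.
    boolean branch(String id, boolean condition) {
        trace.add(id);
        return condition;
    }

    // Toy program: place alternating X and O moves and count the accepted ones.
    int play(int[] moves) {
        char[] board = new char[9];
        int placed = 0;
        for (int i = 0; i < moves.length; i++) {
            int cell = moves[i];
            if (branch("inRange", cell >= 0 && cell < 9)) {
                if (branch("isFree", board[cell] == 0)) {
                    board[cell] = (i % 2 == 0) ? 'X' : 'O';
                    placed++;
                }
            }
        }
        return placed;
    }

    // Assumed selection rule: the first k distinct branches seen on the trace.
    List<String> selectBranchPoints(int k) {
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(trace));
        return distinct.subList(0, Math.min(k, distinct.size()));
    }

    public static void main(String[] args) {
        BranchTrace run = new BranchTrace();
        run.play(new int[] {4, 0, 4}); // plays the role of the secret input
        System.out.println(run.selectBranchPoints(2));
    }
}
```

A different input exercises a different path: with the single move 9, only the inRange branch appears on the trace, so the recognizer would look at a different set of branching points.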
Implementation
Implemented in Java using the BCEL bytecode editor.
Incorporated into the SandMark framework.
Static algorithms can be applied to an entire application or a single class file, but the dynamic algorithm can only be applied to an entire application.
Arboit Algorithm Evaluation
Performed a variety of empirical tests to evaluate each algorithm’s overall effectiveness.
Implementation within SandMark facilitated the study of manual attacks and the application of obfuscations.
The evaluation examined six software watermarking properties.
Watermark Evaluation Properties
Credibility: The watermark should be readily detectable for proof of authorship while minimizing the probability of coincidence.
Data-rate: Maximize the length of the message that can be embedded.
Perceptual Invisibility (Stealth): A watermark should exhibit the same properties as the code around it so as to make detection difficult.
Part Protection: A good watermark should be distributed throughout the software in order to protect all parts of it.
Overhead: A watermark should have little impact on the performance of the application, and the embedding/recognition procedure should not be costly.
Watermark Evaluation Properties
Resilience: A watermark should withstand a variety of attacks
Subtractive Attack: The adversary attempts to remove all or part of the watermark.
[Figure: Alice embeds W in P using key K to produce P′; Bob's subtractive attack turns P′ into P′′.]
Additive Attack: The adversary adds a new watermark.
[Figure: Alice embeds W in P using key K to produce P′; Bob's additive attack embeds a second watermark W1 using key K1, producing P′′.]
Distortive Attack: The attacker applies a series of semantics-preserving transformations to render the watermark useless.
[Figure: Alice embeds W in P using key K to produce P′; Bob's distortive attack produces P′′, in which the watermark is distorted to W′.]
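Collusive Attack: The adversary compares two differently fingerprinted copies of the software to identify the location of the fingerprint.
[Figure: Alice produces two copies of P: P1 with fingerprint F1 (key K1) and P2 with fingerprint F2 (key K2); Bob's collusive attack compares P1 and P2.]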
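The comparison step of a collusive attack can be sketched as a positional diff. This is a simplified illustration, assuming each copy is available as an aligned list of instruction strings; a real attack would need a proper alignment of the two programs first. The class name and the sample instructions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CollusionDiff {
    // Report the positions at which two fingerprinted copies differ.
    static List<Integer> differingPositions(List<String> copy1, List<String> copy2) {
        List<Integer> positions = new ArrayList<>();
        int common = Math.min(copy1.size(), copy2.size());
        for (int i = 0; i < common; i++) {
            if (!copy1.get(i).equals(copy2.get(i))) positions.add(i);
        }
        // Any trailing instructions present in only one copy also differ.
        for (int i = common; i < Math.max(copy1.size(), copy2.size()); i++) {
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> p1 = List.of("load a", "load b", "if_le && opaque#3", "return");
        List<String> p2 = List.of("load a", "load b", "if_le && opaque#7", "return");
        System.out.println(differingPositions(p1, p2));
    }
}
```

Positions that differ between the copies are candidates for the fingerprint location, since the rest of the two programs is identical.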
Summary of Results
Technique 1 is stronger than Technique 2.
Technique 1 has lower overhead, is more resilient to attack, and demonstrates a higher degree of stealth.
The dynamic algorithm is only minimally stronger than the static version.
Summary of Contributions
We added features to the original techniques to improve their strength:
slicing to identify live variables
method reuse
Presented a novel extension of the technique to study static versus dynamic algorithms.
Implemented and evaluated all techniques.
A shameless plug to conclude
http://www.cs.arizona.edu/sandmark