

    Adaptive Traitor Tracing with Bayesian Networks

Philip Zigoris
Department of Computer Science
University of California, Santa Cruz
Santa Cruz, California 95060
[email protected]

Hongxia Jin
Content Protection Group
IBM Almaden Research Center
San Jose, California 95120
[email protected]

    Abstract

The practical success of broadcast encryption hinges on the ability to (1) revoke the access of compromised keys and (2) determine which keys have been compromised. In this work we focus on the latter, the so-called traitor tracing problem. We present an adaptive tracing algorithm that selects forensic tests according to the information gain criterion. The results of the tests refine an explicit, Bayesian model of our beliefs that certain keys are compromised. In choosing tests based on this criterion, we significantly reduce the number of tests required to identify compromised keys, as compared to the state-of-the-art techniques. As part of this work we developed an efficient, distributable inference algorithm that is suitable for our application, and we also give an efficient heuristic for choosing the optimal test.

Introduction

In 1996, the first DVDs were released just in time for the Christmas season. A relatively weak copy protection system, the Content Scrambling System (CSS), was hastily put in place in order to stave off digital piracy. Under CSS, every device manufacturer is assigned a single, confidential key that is embedded in the devices they produce, allowing those devices to descramble the contents of a DVD (Lotspiech 2005).

In 1999, a 16-year-old Norwegian reverse engineered a software player and exposed the first of 400 manufacturer keys. The remaining 399 keys were exposed and published on the internet within weeks. In every sense, the cat was out of the bag (Lotspiech 2005).

As bandwidth has increased and file sharing on the internet has become rampant, the stakes are even higher than before. Swearing not to make the same mistake twice, the entertainment industry has invested quite a bit in new broadcast encryption schemes. There are two key features in broadcast encryption: first, it is possible to revoke access at the level of an individual player and, second, there is a method for identifying compromised keys in use in illegitimate players/decoders. Coupled, these features make it impossible for someone to produce software or hardware that is not sanctioned by the licensing agency.

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Of interest here is the second task, identifying compromised keys, which is referred to as traitor tracing. Initially, there was interest in tracing algorithms that treat the illegitimate player, or clone box, as a stateless black box, avoiding the need for any kind of reverse engineering. Traitor tracing is done by constructing informative tests that are submitted to the box, whose output provides some information about which keys the clone box contains. Existing tracing algorithms, while efficient in principle, may require many years of operation for realistic scenarios. In practical terms, the clone box will have effectively defeated the tracing strategy. We will describe this process in more detail in the next section.

As we will also see, we can, in some sense, deduce the best adversarial strategy for the clone box. That is, given a series of tests, what is the response of the clone box that will maximize the number of tests required for the tracing algorithm to succeed? Assuming we know the clone box's strategy, we can leverage this extra bit of information to greatly accelerate tracing. In this sense, ours is not truly a black-box tracing algorithm, but it represents a significant improvement over the existing technology.

Underlying our approach is a Bayesian network used to explicitly represent our belief that certain keys are compromised. Not only does this allow for accurate diagnosis, we can also quantify how informative a test is, which we then use to guide the tracing process. In this paper we present a scalable inference algorithm that takes advantage of sufficient statistics in the model, as well as an effective heuristic method for selecting the optimal test.

Finally, we demonstrate empirically that our method offers a dramatic reduction in the number of tests needed for tracing, as compared to the state-of-the-art tracing algorithms.

NNL Traitor Tracing

In this section we present a simplified version of the traitor tracing procedure given by Naor et al. (Naor, Naor, & Lotspiech 2001).

We formalize traitor tracing as follows: let F be the set of valid device keys; the clone box owns a subset C ⊆ F of these keys. Let m = |F| and n = |C|. The set C is, of course, unknown to the tracing algorithm. The task is to determine, within some specified confidence ε, at least


Algorithm 1
1: NNLTrace(F, a, b, pT_a, pT_b)
2: if a = b − 1 then
3:   return b
4: else
5:   c ← ⌊(a + b)/2⌋
6:   T_c ← {f_{c+1}, . . . , f_{|F|}}
7:   if |pT_c − pT_a| ≥ |pT_c − pT_b| then
8:     return NNLTrace(F, a, c, pT_a, pT_c)
9:   else
10:    return NNLTrace(F, c, b, pT_c, pT_b)

one key k ∈ F that is owned by the clone box. That is, Pr(k ∈ C) > 1 − ε. We can then invalidate key k, i.e. remove it from F, and reiterate the tracing procedure.

Broadcast encryption works by first encoding the media, e.g. an HD movie, with some key K, referred to as the media key. Then, the media key is encrypted separately with each of the valid device keys in D. As long as a device owns a valid key, it can decrypt the media key and then decrypt the media.

To construct a test, we disable a subset of the valid device keys. We do this by encrypting a random bit string instead of K. The remaining device keys are said to be enabled. Now, if all of the device keys owned by a device are disabled, there is no way for it to recover the media. However, built into the protocol is a method for the device to validate the media key, meaning that it can recognize when it is under test if it contains a disabled key. Therefore, if a device owns both disabled and enabled keys, it can respond strategically: it has a choice whether or not to play/decrypt the media, and it can respond non-deterministically. We will say more about this shortly.

From here on, a test can be thought of as simply a set T ⊆ F of keys. The keys in T are enabled and the keys in F \ T are disabled. Let pT be the probability that the clone box plays test T. If the probabilities of playing two tests, T and T′, are not equal, then it must be the case that the clone box owns a device key in the exclusive-or (symmetric difference) of the two sets. This motivates NNLTrace (Algorithm 1), a binary-search-like method for identifying a compromised key. We initially call the procedure with the arguments (F, 0, m, pF, 0). The algorithm proceeds by progressively reducing the interval (a, b) in which we know there must be a compromised device key. It does this by determining the probability that the midpoint test T_c plays and recursing on the side with the larger difference in the endpoint probabilities.
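As a concrete illustration, the following is a minimal Python sketch of NNLTrace run against a simulated clone box that uses the uniform choice strategy described below. The function names and the sampling-based estimator `p_hat` are our own illustrative choices, not the paper's code; in practice each probability must be estimated to the confidence dictated by ε.

```python
import random

def make_clone_box(compromised):
    """Uniform choice strategy: pick one owned key at random; play iff enabled."""
    keys = list(compromised)
    def box(enabled):
        return 1 if random.choice(keys) in enabled else 0
    return box

def p_hat(box, test, trials=2000):
    """Estimate pT, the probability that the clone box plays test T."""
    return sum(box(test) for _ in range(trials)) / trials

def nnl_trace(frontier, a, b, pa, pb, box):
    """Binary search for a key that must be compromised.
    frontier is the ordered list f_1..f_m; test T_c enables f_{c+1}..f_m."""
    if a == b - 1:
        return frontier[b - 1]          # key f_b is compromised
    c = (a + b) // 2
    pc = p_hat(box, set(frontier[c:]))  # T_c = {f_{c+1}, ..., f_m}
    if abs(pc - pa) >= abs(pc - pb):
        return nnl_trace(frontier, a, c, pa, pc, box)
    return nnl_trace(frontier, c, b, pc, pb, box)
```

Calling `nnl_trace(frontier, 0, m, p_hat(box, set(frontier)), 0.0, box)` mirrors the initial call (F, 0, m, pF, 0) from the text.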

The challenge in this style of tracing algorithm is that we must estimate pT_c by repeatedly submitting T_c to the clone box. The smaller |pT_a − pT_b|, the more tests are needed to confidently decide which is the larger subinterval. This is advantageous to the clone box. However, the subset tracing procedure always recurses on the larger subinterval, and so at every step the gap is reduced by, at most, a factor of 2. This gives us the following lower bound on the gap at the last iteration:

|p_a − p_{a+1}| ≥ 1 / 2^{log₂ |C|} = 1 / |C|

This lower bound is achieved by the uniform choice strategy: the clone box tries to decode the media key with one of its device keys chosen uniformly at random. If the key chosen is enabled then the clone box successfully plays the test. If it is disabled then it does not. Formally,

Pr(C plays T) = |T ∩ C| / |C|.

Previous empirical work also substantiates the claim that this is the best clone box strategy (Zigoris 2006), and it will remain our focus in subsequent sections.

Naor et al. show that, regardless of the strategy, the number of tests required for NNLTrace to succeed with probability at least 1 − ε is upper bounded by O(m² log log m log³ m). However, a clone box can take measures to delay its response to each test, and for realistic scenarios, which include details beyond the description given here, we can expect this tracing algorithm to take upwards of 15 years to defeat a clone box. One would hardly call this a success.

Traitor Tracing with a Known Strategy

The second shortcoming is that NNLTrace cannot take advantage of any prior knowledge about the clone box's strategy. It is by leveraging this information, using the machinery of Bayesian networks and the insights of information theory, that we will be able to demonstrate significant gains over NNLTrace.

First, we introduce some additional notation. Denote by T the random variable associated with the outcome of a test, or the set of enabled keys comprising a test; the intent should be clear from the context. Let R be the set of possible responses to a test. In previous sections the set of responses R was limited to {0, 1}: the box either plays or it does not. Our approach can accommodate a richer set of responses, a possibility we will study more closely in the experimental section. Associated with each key k ∈ F is a binary random variable F_k; F_k = 1 is meant to imply that k ∈ C. We refer to these random variables, collectively, as F. In many contexts it is useful to treat F as the set of keys for which F_k = 1. That is, F represents our belief about which keys are in the clone box.

The question we would like to answer is: provided with a set of test–response pairs T = t, where the T_i are independent (the clone box is stateless) but not identical, what is the posterior probability that the clone box contains a particular set of keys F? Applying Bayes' rule we have:

Pr(F | T = t) = Pr(T = t | F) Pr(F) / Pr(T = t)
             = ∏_i Pr(T_i = t_i | F) Pr(F) / Pr(T = t)

where the second equality derives from the fact that the results of tests are independent conditioned on F. The term Pr(T_i = t_i | F) is specified by the uniform choice strategy.

The term Pr(F) specifies our belief, prior to any observations, that the clone box contains a set of keys F. Without any background knowledge we choose the uniform distribution. It is possible to embed domain-specific knowledge in


the prior to give the process a head start. For example, if we knew that a particular subset of devices were more likely to have their keys compromised, then we could increase the prior probability that the clone box contained one of those keys.

The denominator Pr(T = t) is the marginal probability of observing the responses to the set of tests and is defined as:

Pr(T = t) = Σ_F Pr(T = t | F) Pr(F),

where the sum ranges over all subsets F of the frontier. The sum is taken over a set of size 2^m, making this a difficult quantity to calculate. We will address this issue later. For now, we will assume that it can be calculated efficiently and exactly.
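For small frontiers the posterior can be computed by direct enumeration of all 2^m subsets, exactly as in the Bayes-rule derivation above. The sketch below is our own illustrative code, with the uniform choice strategy as the likelihood:

```python
from itertools import chain, combinations

def powerset(keys):
    """All subsets of `keys`, as tuples."""
    s = list(keys)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def likelihood(t, test, C):
    """Uniform choice strategy: Pr(T = t | clone box holds exactly C)."""
    if not C:
        return 1.0 if t == 0 else 0.0
    p_play = len(set(test) & set(C)) / len(C)
    return p_play if t == 1 else 1.0 - p_play

def posterior(prior, observations):
    """Bayes-rule update.  prior: dict frozenset(C) -> Pr(F = C);
    observations: (test, response) pairs, independent given C."""
    post = dict(prior)
    for test, t in observations:
        for C in post:
            post[C] *= likelihood(t, test, C)
    z = sum(post.values())            # the marginal Pr(T = t)
    return {C: p / z for C, p in post.items()}
```

With a uniform prior over the 2^m subsets, a single observation that the box plays T = {1, 2} zeroes out every hypothesis disjoint from the test.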

The basic procedure is specified in Algorithm 2. We iteratively choose tests to submit to the clone box and update our beliefs about the contents of the clone box. If at any point we can diagnose a compromised key (line 6) we return the key as well as our current set of beliefs.

Algorithm 2 Strategy-based traitor tracing procedure
1: IGTrace(F, Pr(F))
2: if response of clone box to T = F is 0 then
3:   return [∅, Pr(F)]  // in this case, C = ∅
4: loop
5:   for all k ∈ F do
6:     if Pr(F_k = 1) > 1 − ε then
7:       return [k, Pr(F)]
8:   select a test T
9:   submit T to the clone box and get response t
10:  Pr(F) ← Pr(F | T = t)

Test Selection with Information Gain

We can imagine the testing process as a decision tree where every node specifies the next test to submit to the clone box. The response of the clone box dictates the subtree on which to recurse. Every leaf in the decision tree has a key associated with it, and reaching it in the recursion implies the associated key has been compromised. Ideally, we could minimize the expected number of tests needed by mapping out an entire testing strategy. However, this task was shown to be NP-complete (Hyafil & Rivest 1976).

Instead of devising such a decision tree all at once, we take a greedy approach to test selection. First, we quantify the uncertainty about F as the entropy:

H(F | T = t) = −Σ_F Pr(F | T = t) log₂ Pr(F | T = t)

We measure the quality of a new test T as the mutual information¹ between T and F. This is equal to the expected reduction in entropy from seeing the result of test T. This expectation is taken with respect to the marginal probabilities of each of the possible responses t ∈ R. Note that it is possible that the marginal probability Pr(T = t | T = t) is very different from the true probability of response t, Pr(T = t | F = C). For more background on information theory see, e.g., (MacKay 2003).

¹ Sometimes referred to in the machine learning literature as the information gain.

Now we can view test selection as the following optimization problem:

T* = argmax_{T ⊆ F} I(F; T | T = t)

The maximum is taken over 2^m possible tests, and we know of no algorithm for efficiently solving this problem. It is necessary to approximate it by only considering a small subset of the possible tests. To do this, we maintain a set S of tests called the retention set. At every iteration we try adding a new key to each test in S, creating a new set of tests S′. We then update S to be the top s tests in S′. If after an iteration the retention set does not change, then we return the top test in S. The total number of tests evaluated by this procedure is O(s²m²) (Zigoris 2006). Technically, evaluating the gain of each test is exponential in the number of device keys m, but we will introduce a method for approximating this quantity.
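The retention-set heuristic can be sketched in a few lines of Python. Everything here (function names, the brute-force information-gain computation over an enumerated belief table, the uniform choice likelihood) is our illustrative reconstruction, feasible only for small frontiers:

```python
import math

def likelihood(t, test, C):
    """Uniform choice strategy: Pr(T = t | clone box holds exactly C)."""
    if not C:
        return 1.0 if t == 0 else 0.0
    p = len(test & C) / len(C)
    return p if t == 1 else 1.0 - p

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def info_gain(test, belief, responses=(0, 1)):
    """I(F; T): expected reduction in H(F) from the response to `test`."""
    gain = entropy(belief)
    for t in responses:
        joint = {C: p * likelihood(t, test, C) for C, p in belief.items()}
        pt = sum(joint.values())          # marginal Pr(T = t)
        if pt > 0:
            gain -= pt * entropy({C: p / pt for C, p in joint.items()})
    return gain

def select_test(keys, belief, s=2):
    """Retention-set search: grow the s best tests one key at a time."""
    retained = [frozenset()]
    for _ in range(len(keys) + 1):        # guard against cycling
        candidates = set(retained)
        for T in retained:
            for k in keys:
                candidates.add(T | {k})
        ranked = sorted(candidates, key=lambda T: info_gain(T, belief),
                        reverse=True)
        if set(ranked[:s]) == set(retained):
            break
        retained = ranked[:s]
    return retained[0]
```

The empty test carries zero information (the box never plays it), so the search immediately prefers non-trivial tests.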

Related Work

Information gain is a popular criterion for selecting tests in adaptive diagnosis tasks. Rish et al. (Rish et al. 2005) apply a method similar to ours to the task of diagnosing faults in distributed systems. There are a few key differences. First, they choose tests from a small pool, whereas we are faced with an exponential number of tests. Second, they assume there is only a single fault in the system, whereas in our work there can be multiple faults, i.e. compromised device keys. They also use an approximate inference algorithm, mini-buckets, whereas we devise an efficient and exact inference algorithm for our model.

Bayesian networks and information gain have also been applied to a probabilistic version of the game of Mastermind with reasonable results (Vomlel 2004). Mastermind is a popular board game created in the 1970s in which two players engage in a query–response loop. In the original formulation, game play is completely deterministic. The goal is for one player, the code-breaker, to identify the code put down by the code-maker. In many ways, it is similar to the task we are faced with. We can imagine the clone box as a secret code that we are trying to reveal. In our game, though, we are only required to find one bit in the code.

There are other measures of uncertainty, of course. In BayesNNLTrace, for instance, we measured our uncertainty as the variance of the Beta distribution corresponding to the endpoints of each subinterval. Measures such as the Gini impurity and misclassification impurity are also popular for learning decision trees (Duda, Hart, & Stork 2000).

Representation and Inference with Bayesian Networks

So far we have avoided any discussion of how to compute the necessary probabilities for our algorithm. There are two points to keep in mind. The first is that an erroneous diagnosis, which results in the disabling of a device, is very


costly. This imposes the constraint that the marginal probability that F_k = 1 must be calculated exactly. Therefore, we cannot rely on approximate inference algorithms such as loopy belief propagation (Pearl 1988). However, test selection, which depends on the full joint distribution of F, is already approximate, so it is reasonable to approximate the joint. In this section we present an efficient (polynomial time) inference algorithm that does exactly these two things.

It helps to visualize our statistical model as a Bayesian network. For our application, the nodes in our network correspond to the tests T and the keys F_k in the frontier. As a first approach, consider the complete bipartite network in Figure 1. The conditional probability distribution (CPD) for each test is specified by the clone box strategy. For every key k we must specify the prior probability that F_k = 1. Without any background information, we set this to 0.5. However, in practice we may have reason to believe that some keys are more likely to be compromised. Or, if we expect that |C| = c, then we set Pr(F_k = 1) = c/m.

Figure 1: A simple bipartite network for our task. Every variable in the test layer (T1 = t1, . . . , Tm = tm) is conditioned on every variable in the frontier (K1, . . . , Kn).

The problem with using this representation is that adding a test result as evidence creates a dependency between the key nodes. Thus, in order to find the marginal probability Pr(F_k | T = t) for one key k in the frontier, standard inference algorithms, such as variable elimination (Dechter 1999) and junction tree (Huang & Darwiche 1996), must create intermediate factors whose table sizes are exponential in |F|.

For arbitrary clone-box strategies, we cannot expect to do better than this. However, the uniform choice strategy has a very concise expression; it depends only on the ratio of enabled to disabled compromised keys. It does not depend on the specific keys that are compromised but only on how many are enabled and disabled.

We can exploit these sufficient statistics to gain efficiency in our model. The modified network is illustrated in Figure 2. Let F_E = {F_k : k ∈ T} be the set of keys enabled in T; similarly, let F_D = {F_k : k ∉ T} be the set of disabled keys. Form two binary trees with F_E and F_D as the leaves of each and with edges pointing towards the root. Denote, respectively, the roots of these trees as E and D; connect an edge from both E and D to T. Each of the internal nodes is simply the sum of its parents: Pr(X = x | pa(X)) = 𝟙(x = Σ_{Y ∈ pa(X)} Y). The CPD of T is now Pr(T = 1 | E, D) = E / (E + D). Note that in tabular form the CPDs are O(|F|²) in size.

Figure 2: An alternative network structure for the fixed-p strategy. The internal sum nodes act as an interface between the frontier and the test layer, calculating the sufficient statistics E and D. Reducing the frontier in this way makes efficient inference possible.

We now present a specific method for storing and updating beliefs that is particularly well-suited to our application. In order to approximate the joint distribution of F we first partition the frontier into disjoint sets {F_i} such that 2^{|F_i|} is a reasonable size for all i, allowing us to store Pr(F_i) as a table in memory. We approximate our belief as

Pr(F) ≈ ∏_i Pr(F_i),

which is equivalent to assuming the partitions are independent. If X and Y are independent random variables then

H(X, Y) = H(X) + H(Y),

where H is the entropy function introduced in the section on test selection. Whereas before, calculating information gain, and implicitly entropy, required us to sum over 2^{|F|} terms, we can now decompose the calculation into a sum of sums over 2^{|F_i|} terms. Note, too, that since Pr(F_i) is in memory, calculating Pr(k ∈ C) for some key k ∈ F_i can be done in O(2^{|F_i|}) time. With a simple caching strategy, this can also be reduced to constant time.
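The entropy decomposition can be checked numerically. In this toy sketch the partition contents and probabilities are made up for illustration; the entropy of the product belief equals the sum of the per-partition entropies:

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical beliefs over two frontier partitions {k1, k2} and {k3}.
p1 = {frozenset(): 0.25, frozenset({'k1'}): 0.25,
      frozenset({'k2'}): 0.25, frozenset({'k1', 'k2'}): 0.25}
p2 = {frozenset(): 0.5, frozenset({'k3'}): 0.5}

# Under the independence assumption Pr(F) ~= prod_i Pr(F_i), the joint
# entropy decomposes as H(F) = sum_i H(F_i), so each sum ranges over
# 2^{|F_i|} terms instead of 2^{|F|}.
joint = {a | b: pa * pb for a, pa in p1.items() for b, pb in p2.items()}
assert abs(entropy(joint) - (entropy(p1) + entropy(p2))) < 1e-9
```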

Algorithm 3 gives the procedure for finding the posterior distribution of a partition given test result T = t. The procedure getCountDistribution finds the probability distribution over the number of enabled and disabled keys in a set of partitions. Also, Pr(T = t), which appears in the information gain, is calculated as a side effect of the procedure. The terms |T ∩ F_i| and |F\T ∩ F_i| in line 13 correspond to the number of enabled and disabled keys in F_i, respectively. The bottleneck in this procedure is calculating Pr(E, D), which comprises an O(m²) size table. One nice feature of this approach is that the posterior distribution for each partition can be calculated independently, making it easy to distribute across multiple servers.


Algorithm 3 Finds Pr(F_a | T = t) and Pr(T = t)
1: getPosterior(T, t, {F_i}, {Pr(F_i)}, a)
2:   Pr(E, D) ← getCountDistribution(T, {F_i : i ≠ a}, {Pr(F_i) : i ≠ a})
3:   Pr(T = t | F_a) ← Σ_{e,d} Pr(T = t | F_a, E = e, D = d) Pr(E = e, D = d)
4:   Pr(T = t) ← Σ_{F_a} Pr(T = t | F_a) Pr(F_a)
5:   Pr(F_a | T = t) ← Pr(T = t | F_a) Pr(F_a) / Pr(T = t)
6:   return Pr(F_a | T = t) and Pr(T = t)
7:
8:
9: getCountDistribution(T, {F_i}, {Pr(F_i)})
10:   Pr(E, D) ← 0
11:   Pr(E = 0, D = 0) ← 1
12:   for all i do
13:     Pr(E, D) ← Σ_{F_i} Pr(F_i) · Pr(E − |T ∩ F_i|, D − |F\T ∩ F_i|)
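A runnable Python sketch of this procedure follows. Partition beliefs are stored as tables mapping each subset of a partition's keys to its probability; `count_distribution` plays the role of getCountDistribution (the convolution over partitions), and `partition_posterior` mirrors getPosterior. Names and data layout are our own illustrative choices:

```python
from collections import defaultdict

def count_distribution(test, beliefs):
    """Pr(E, D): distribution over the number of enabled/disabled
    compromised keys, convolved across independent partitions."""
    dist = {(0, 0): 1.0}
    for belief in beliefs:
        new = defaultdict(float)
        for C, p in belief.items():
            e, d = len(C & test), len(C - test)
            for (E, D), q in dist.items():
                new[(E + e, D + d)] += q * p
        dist = dict(new)
    return dist

def play_prob(e, d):
    """Uniform choice strategy: Pr(play) with e enabled, d disabled keys."""
    return 0.0 if e + d == 0 else e / (e + d)

def partition_posterior(test, t, beliefs, a):
    """Posterior over partition a, plus the marginal Pr(T = t)."""
    ed = count_distribution(test, beliefs[:a] + beliefs[a + 1:])
    post, pt = {}, 0.0
    for C, prior in beliefs[a].items():
        e0, d0 = len(C & test), len(C - test)
        lik = sum(q * (play_prob(e0 + E, d0 + D) if t == 1
                       else 1.0 - play_prob(e0 + E, d0 + D))
                  for (E, D), q in ed.items())
        post[C] = lik * prior
        pt += lik * prior
    return {C: p / pt for C, p in post.items()}, pt
```

Because each partition's posterior depends on the others only through Pr(E, D), the per-partition updates can run independently, matching the distributable design noted above.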

Experiments

In this section we present empirical work comparing the number of tests needed by the described methods. All experiments were repeated 20 times for random choices of compromised keys. In all experiments, we use ε = 0.001 and R = {0, 1}. Preliminary experiments showed that the number of tests is not particularly sensitive to s, the retention set size, so in all experiments s = 2. We only report results for the uniform choice strategy since it is, theoretically, the most challenging strategy to defeat. Preliminary experiments also confirmed this was the case.

All experiments were run using Matlab under Linux, with 3 GB of memory and a 2 GHz Pentium 4 processor. The code was optimized to take advantage of Matlab's efficient matrix operations; for m = 18, exact inference and test selection take around 45 seconds. The code for doing inference with partitioned device keys did not scale very well, and so we were unable to evaluate it for very large m.

Watermarking, |R| > 2

In this experiment we assume there is some watermarking mechanism built into the encryption scheme. That is, every key k has a watermark r(k) associated with it. If the clone box decrypts the test T with k then the response to T is r(k). This puts the clone box at a disadvantage, since it may reveal much more information about its contents. For instance, if each of the keys in the frontier has a different watermark, then we can immediately identify a compromised key when the clone box plays. To clarify, we restate the CPD for test T = t, for t ≠ 0, as:

Pr(T = t | C) = |{k ∈ C : r(k) = t}| / |C|
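A simulated watermarked clone box might look as follows (our own sketch; the watermark map r(·) and the names are illustrative). Each response now reveals which key decrypted the content, rather than a bare play/no-play bit:

```python
import random

def watermarked_box(compromised, watermark):
    """Uniform choice strategy with watermarking: the box decrypts with a
    randomly chosen owned key k; if k is enabled, the response is its
    (non-zero) watermark r(k); otherwise the box fails to play (response 0)."""
    keys = list(compromised)
    def box(enabled):
        k = random.choice(keys)
        return watermark[k] if k in enabled else 0
    return box
```

If every frontier key carries a distinct watermark, a single non-zero response identifies a compromised key outright, which is why the number of tests collapses as |R| grows.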

For every trial, both the set of compromised keys and the watermark assignment are chosen randomly. The frontier size is fixed at 16, the number of compromised keys is varied from 1 to 8, and |R| − 1 is varied from 1 to 16. The results are plotted in Figure 3; for clarity we omit the standard error bars.

Figure 3: Results of the watermarking experiment. The number of tests (log scale) is plotted against |R| − 1 for n = 1, 2, 4, and 8 compromised keys.

Note that the y-axis is on a logarithmic scale; the number of tests therefore decreases extremely fast. In the limit, the number of tests, for any number of compromised keys, will converge to 1 with |R|, since we expect each key to have a unique watermark.

Something to note, but that is difficult to read off of the plot, is that as |R| increases, having more compromised keys becomes beneficial. The reason for this is that the clone box is more likely to contain a key with a unique watermark. Also, we gain more information when the clone box gives a non-zero response. If there are only a few compromised keys then many of the responses will be 0, since a test is more likely, at least in the early stages, to disable a large fraction of the clone box's keys.

    Partition size

This experiment is designed to study the effect of the partitioning size on the number of tests. Again, we vary the number of compromised keys n from 1 to 8 and leave the frontier size fixed at 16. The size of the frontier partitions is varied from 2 to 16, corresponding to exact inference. The results are plotted in Figure 4.

Intuition tells us that the number of tests should decrease as the partition size increases. What is surprising is that for n = 8 the number of tests increases with the partition size. And note that the slope of the other lines seems to increase with n. It seems that smaller partitions actually benefit the tracing process when there are a large number of compromised keys. Why this is the case is an intriguing question, one for which we don't have a complete answer. Our hypothesis is that information gain is not always the best optimization criterion for selecting a test. Selection based on information gain represents a greedy step towards minimizing the entropy of F. The true aim of our procedure is to diagnose a compromised key, not to minimize the entropy. Information gain seeks to minimize our uncertainty about all keys, but in actuality we only need to minimize our uncertainty about one key. By finely partitioning the frontier we


Figure 4: Results of the experiment varying partition size. The number of tests is plotted against the size of the frontier partitions for n = 1, 2, 4, and 8 compromised keys.

|F|  Method     |C|=2           |C|=4            |C|=8
 8   IG           19 ± 8.0        41 ± 12.6        42 ± 16.1
     NNL         605 ± 181.5     919 ± 423.8     1424 ± 160.6
10   IG           24 ± 8.4        52 ± 11.1        86 ± 38.7
     NNL         948 ± 299.2    1408 ± 601.2     2632 ± 607.4
12   IG           32 ± 9.4        57 ± 18.6       113 ± 36.5
     NNL        1263 ± 415.1    1686 ± 949.3     3562 ± 1223.1
14   IG           38 ± 9.5        64 ± 14.7       129 ± 26.1
     NNL        1366 ± 399.4    2326 ± 1221.7    4506 ± 1547.4
16   IG           47 ± 9.0        68 ± 14.0       159 ± 41.6
     NNL        1493 ± 478.0    3660 ± 1239.4    4278 ± 1588.7
18   IG           57 ± 7.3        86 ± 15.9       150 ± 30.3
     NNL        1845 ± 488.9    4525 ± 1457.6    6229 ± 3199.4

Table 1: Comparative performance of the two tracing methods for a variety of frontier sizes (|F|) and clone box sizes (|C|). Each entry lists the mean ± standard deviation of the number of tests. Column headers indicate the number of compromised keys.

    seem to deemphasize the global nature of information gainand instead focus on diagnosing particular keys.

Overall comparison

Finally, in Table 1 we present a comparison of IGTrace to NNLTrace. It is immediately clear that IGTrace outperforms NNLTrace in all cases. Note, too, that there is substantially less variance in the number of tests needed by IGTrace.

Conclusion

This work introduced a new method for traitor tracing when the strategy of the clone box is known and demonstrated, empirically, a significant improvement over existing black-box tracing methods. We took advantage of the specific functional form of the uniform choice strategy to devise an efficient, and distributable, inference algorithm. This algorithm does not sacrifice any accuracy in diagnosis and only approximates the test selection step.

A surprising observation was that the number of tests does not necessarily decrease as the partition sizes increase. This seems to suggest that information gain is not the best criterion for test selection, and it may be worth exploring other criteria. Nevertheless, it seems to work very well and offers a dramatic improvement over the other methods.

Our work does beg the question: how does one obtain information about the clone box strategy? In the future, it is worth investigating methods for learning this strategy along the way, perhaps by representing the clone box strategy as a latent variable. Similarly, it would also be interesting to investigate the sensitivity of IGTrace to errors in the clone box strategy. We would like it to be such that if the actual strategy deviates from the modeled strategy, we are still guaranteed to make accurate diagnoses.

References

[Dechter 1999] Dechter, R. 1999. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence 113(1-2):41–85.

[Duda, Hart, & Stork 2000] Duda, R. O.; Hart, P. E.; and Stork, D. G. 2000. Pattern Classification. Wiley-Interscience.

[Huang & Darwiche 1996] Huang, C., and Darwiche, A. 1996. Inference in belief networks: A procedural guide. Intl. J. Approximate Reasoning 15(3):225–263.

[Hyafil & Rivest 1976] Hyafil, L., and Rivest, R. L. 1976. Constructing optimal binary decision trees is NP-complete. Inf. Process. Lett. 5(1):15–17.

[Vomlel 2004] Vomlel, J. 2004. Bayesian networks in Mastermind. Proceedings of the 7th Czech-Japan Seminar on Data Analysis and Decision Making under Uncertainty, 185–190.

[Lotspiech 2005] Lotspiech, J. 2005. Digital Rights Management for Consumer Devices, chapter 23.

[MacKay 2003] MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. Available from http://www.inference.phy.cam.ac.uk/mackay/itila/.

[Naor, Naor, & Lotspiech 2001] Naor, D.; Naor, M.; and Lotspiech, J. B. 2001. Revocation and tracing schemes for stateless receivers. In CRYPTO '01: Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology, 41–62. London, UK: Springer-Verlag.

[Pearl 1988] Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

[Rish et al. 2005] Rish, I.; Brodie, M.; Ma, S.; Odintsova, N.; Beygelzimer, A.; Grabarnik, G.; and Hernandez, K. 2005. Adaptive diagnosis in distributed systems. IEEE Transactions on Neural Networks 16:1088–1109.

[Zigoris 2006] Zigoris, P. 2006. Improved traitor tracing algorithms. Technical report, UC Santa Cruz.