* Corresponding author. Tel.: 860-486-2195; e-mail: [email protected]
1 Present address: Citicorp, New York City.
2 Supported in part by ONR/BMDO-IST grant N00014-91-J-1950 and AFOSR grant F49620-95-1-0299.
Journal of the Franklin Institute 336 (1999) 323—359
Optimum quantization for detector fusion: some proofs, examples, and pathology
Douglas Warren^{a,1}, Peter Willett^{b,*,2}
^a Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, IL 60657, USA
^b Department of Electrical/Systems Engineering, University of Connecticut, U-157, Storrs, CT 06269, USA
Abstract
We consider optimal decentralized (or, equivalently, quantized) detection for the Neyman-Pearson, Bayes, Ali-Silvey distance, and mutual (Shannon) information criteria. In all cases, it is shown that the optimal sensor decision rules are quantizers that operate on the likelihood ratio of the observations. We further show that randomized fusion rules are suboptimal for the mutual information criterion.
We also show that if the processes observed at the sensors are conditionally independent and identically distributed, and the criterion for optimization is either a member of a subclass of the Ali-Silvey distances or local mutual information, then the quantizers used at all of the sensors are identical. We give an example to show that for the Neyman-Pearson and Bayes criteria this is not generally true. We go into some detail with respect to this last point, and derive necessary conditions for an assumption of identical sensor quantizer maps to be reasonable. © 1998 The Franklin Institute. Published by Elsevier Science Ltd.
1. Introduction
The initial paper on the subject of distributed detection, by Tenney and Sandell [1], showed that under a fixed fusion rule, for two sensors with one-bit outputs, the optimal Bayes sensor decision rule is a likelihood ratio test. In [2], it is shown that the optimal fusion rule for N sensors is a likelihood ratio test on the data received from the sensors. Reibman and Nolte [3] and Hoballah and Varshney [4] have generalized
0016-0032/98/$ - see front matter © 1998 The Franklin Institute. Published by Elsevier Science Ltd. PII: S0016-0032(98)00024-6
the results in [1] to N sensors with optimal fusion, again with the restriction of one-bit sensor outputs. Hoballah and Varshney [5] have also investigated system optimization under the aforementioned conditions for a mutual information criterion. The restriction of the sensor outputs to single bits seems unduly harsh since it implies either very rapid decision rates or extremely narrowband channels. In this paper, we remove this restriction and assume instead that the ith sensor produces a $\log_2 M_i$-bit output. The authors have derived some results concerning this more general case in [6]; using different techniques, Tsitsiklis has come to conclusions similar to those presented in this paper [7, 8], as have, for example, Thomopoulos, Viswanathan et al. [9, 10].
As always, when speaking of optimality, it is necessary to impose a criterion for judgment. For detectors, several such measures have been used. In this paper, we study the structure of optimal decentralized detectors for the Neyman-Pearson, Bayes, Ali-Silvey, and mutual (Shannon) information criteria. We assume that, conditioned on the actual hypothesis (state of nature), the random processes observed at any two different sensors are independent. Given this assumption we show that for each criterion, the optimal strategy is to quantize the local likelihood ratio at the sensors (to the maximum number of bits allowable), and transmit this result to the fusion center. The fusion center then performs a likelihood ratio test on this received data.
That quantizing the likelihood ratio is the optimal thing to do is scarcely a surprising result. Even prior to a proof of its optimality it was used by a number of researchers; and most certainly an excellent and rigorous proof is available in [7]. In that work, however, reference is made to the different and equally rigorous proof in [6], and since [6] has never been published archivally, it seems appropriate to offer it in the present paper. Certainly the field has not stood still since [6, 7], and some papers we particularly admire are [11-13]; but to present a complete bibliography is not the aim of this offering.
Following presentation of background material in Section 2, Sections 3 and 5, demonstrating optimality of likelihood ratio quantization under Neyman-Pearson, Bayes, and Ali-Silvey distance criteria, are more detailed versions of the material in [6]. We show in Section 4 that the detector that maximizes mutual information is identical to the minimum Bayes cost detector for some set of prior probabilities. We also demonstrate that randomization at the fusion center reduces mutual information. Section 6 provides discussion and a number of examples and comparisons. One of these will deal with a case in which sensors observe (conditionally) iid data, yet optimally should not use an identical quantization; in Section 7 we discuss this pathology in some depth. Finally, in Section 8 we summarize.
2. Statement of the problem
We consider the binary detection problem with N sensors. The purpose of the detector is to discriminate optimally between two states of nature $H_0$ and $H_1$. For example, $H_0$ may represent a noise-only hypothesis whereas $H_1$ represents signal plus additive noise. For the Bayes and mutual information criteria, we consider the state of nature to be a random variable; for the Neyman-Pearson and Ali-Silvey criteria, it is assumed to be deterministic. The ith sensor input is a realization of the random vector $X_i \in (\Omega_i, \mathcal{B}_i)$ (of arbitrary dimension). The statistical behavior of this random vector is defined by

$$H_0: X_i \sim P_{0i}, \qquad H_1: X_i \sim P_{1i}$$

where $H_0$ and $H_1$ are the null and alternative hypotheses, respectively, and the $P_{ki}$, $i = 1, 2, \ldots, N$, $k = 0, 1$, are probability distributions. We assume throughout that $P_{0i}$ and $P_{1i}$ are absolutely continuous with respect to each other; hence, the same $\sigma$-algebra is used under both hypotheses.
If the channels between the sensors and the fusion center are not bandlimited, the decentralized detection problem becomes, in effect, centralized. The sensors will then transmit the received data directly. Let $x = \{x_1, x_2, \ldots, x_N\}$ be a realization of the complete (at all sensors) random data $X = \{X_1, X_2, \ldots, X_N\}$ with distribution $P_k$ under hypothesis $H_k$. The optimal centralized test for the Neyman-Pearson and Bayes criteria is well known [14]. As we show in Section 4 (see also [15]), the same test is also optimal for the mutual information criterion [16]. This test is given by

$$\phi(x) = \begin{cases} 1 & \text{if } L(x) > q \\ c & \text{if } L(x) = q \\ 0 & \text{if } L(x) < q \end{cases} \quad (1)$$

where $\phi(x)$ is the probability of deciding for $H_1$ when $x$ is observed, $L$ is the likelihood ratio $dP_1/dP_0$, and $c$ and $q$ are chosen to conform to the specific performance goals of the system designer. For Bayes detection, $c$ is irrelevant, and for mutual information it is optimally 0 or 1 (see Section 4). If we assume no bandwidth constraint and that the data at different sensors are conditionally independent, then the sensors can transmit the local likelihood ratio (since it is then a sufficient statistic for the detection problem [17, pp. 145-147]) rather than the data itself. We assume henceforth that the sensor data are, in fact, conditionally independent.
In general, for decentralized detection, the channels between the sensors and the fusion center will be bandlimited. Suppose that, for a given decision, the maximum number of bits that can be transmitted by sensor i is $b_i = \log_2 M_i$. Suppose also that the number of possible values of $L_i$ (the local likelihood ratio at the ith sensor) is greater than $M_i$. Under these circumstances, we must transmit a non-sufficient statistic $U_i$ from the ith sensor to the fusion center. The major subject of this paper is the optimal form of such a statistic. Since the channel bit rate is limited to $b_i$, it is apparent that $U_i$ must belong to a set containing at most $M_i$ elements. The elements themselves are irrelevant; for convenience we assume that they form the set $\{0, 1, \ldots, M_i - 1\}$, and we define the probabilities $a_{ij} = \Pr\{U_i = j \mid H_0\}$ and $b_{ij} = \Pr\{U_i = j \mid H_1\}$. Hence, the situation we are studying is the following: the ith sensor receives the random data $X_i$; according to some yet-to-be-determined decision rule, it produces the statistic $U_i$, which takes on one of the values in the set $\{0, 1, \ldots, M_i - 1\}$ in accordance with the probabilities $\{a_{ij}\}$ and $\{b_{ij}\}$; the set of statistics $\{U_i\}$, $i = 1, 2, \ldots, N$, is transmitted to the fusion center where the final binary decision $U_0$ is made. The objective is to have a procedure (at both the sensors and the fusion center) that selects the value of $U_0$ in an optimal way.
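To make the quantities $a_{ij}$ and $b_{ij}$ concrete, the following sketch (ours; the shift-in-mean model and the thresholds are invented for illustration) quantizes the likelihood ratio of a Gaussian observation into $M_i = 4$ levels and estimates the two conditional pmfs by simulation.

```python
import bisect
import math
import random

random.seed(0)
M = 4
thresholds = [0.5, 1.0, 2.0]  # made-up likelihood-ratio thresholds t_1 < t_2 < t_3

def L(x, mu1=1.0):
    """Likelihood ratio of a unit-variance Gaussian shift-in-mean observation."""
    return math.exp(mu1 * (x - mu1 / 2))

def quantize(x):
    """U_i = j when t_j <= L(x) < t_{j+1}, with t_0 = 0 and t_M = infinity."""
    return bisect.bisect_right(thresholds, L(x))

def pmf(mu, n=200_000):
    """Estimate Pr{U_i = j} by simulation when X_i ~ N(mu, 1)."""
    counts = [0] * M
    for _ in range(n):
        counts[quantize(random.gauss(mu, 1.0))] += 1
    return [k / n for k in counts]

a = pmf(0.0)  # a_{ij} = Pr{U_i = j | H0}
b = pmf(1.0)  # b_{ij} = Pr{U_i = j | H1}
ratios = [bj / aj for aj, bj in zip(a, b)]
print(ratios)  # for a monotone partition, b_j/a_j increases with the level index j
```

The increasing per-level ratios $b_{ij}/a_{ij}$ are the property that the likelihood ratio partitions of Section 3 exploit.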
We assume throughout this paper that $X_i$ has a density. This can be done without loss of generality since we are concerned with the relative probabilities under the two hypotheses of sets of values for $X_i$, but never with the actual values of $X_i$ itself. Hence, for any $X_i$ that does not have a density and any partition $U_i$ of the set $\Omega_i$, we can substitute the random variable $X'_i$ and the partition $U'_i$ so that $X'_i$ has a density and $U'_i$ has the same distribution as $U_i$ under both hypotheses. We do not, however, assume that the likelihood ratio of $X_i$ has a density, since this excludes many situations of interest (for example, additive signals in Laplace noise). One of the main points to be demonstrated in this paper is that it is the distributions of the likelihood ratio with which we are most interested.
Lastly, as a notational convenience, we let $I^+_P$ be the set of positive integers less than or equal to $P$, and we let $I_Q$ be the set of non-negative integers less than or equal to $Q$.
3. Neyman–Pearson and Bayes criteria
A binary detection system makes a decision that can take on one of two values. The selection of this value amounts to deciding, based on the available data, that one of the two possible states of nature $H_0$ and $H_1$ is true. Since the data used in the decision process are random, there is generally a non-zero probability of error associated with the system decision. Throughout this paper, we will refer to the probability of deciding $H_1$ when $H_0$ is actually true as the false alarm probability, level, or size of the test. We will refer to the probability of correctly deciding $H_1$ as the probability of detection or the power of the test. These two probabilities exhaustively define the statistical properties of the test.
3.1. Decentralized Neyman-Pearson detection
The objective for systems designed under the Neyman-Pearson criterion is to achieve maximum power for a fixed test level. For an arbitrary fusion rule (with given sensor decision rules for the N sensors) and maximum overall level $a_0$, the ultimate decision $U_0$ is determined by a partition of the observation space $D$ of the vector $\{U_1, U_2, \ldots, U_N\}$. The space $D$ is the Cartesian product of the sets $I_{M_i-1}$ for $i \in I^+_N$. In the following, we denote a decision for $H_k$ by $U_0 = k$. Let $u = \{u_1, u_2, \ldots, u_N\}$ be a realization of $U$. If the fusion rule is chosen according to the Neyman-Pearson criterion (that is, the rule which maximizes the resulting power of the test), then we use
a likelihood ratio test given by

$$\phi(u) = \begin{cases} 1 & \text{if } L(u) > q \\ c & \text{if } L(u) = q \\ 0 & \text{if } L(u) < q \end{cases} \quad (2)$$

where $\phi(u) = \Pr\{U_0 = 1 \mid U = u\}$,

$$L(u) = \prod_{n=1}^{N} b_{n u_n} \left[ \prod_{n=1}^{N} a_{n u_n} \right]^{-1} \quad (3)$$

and $c \in [0, 1]$. If we define $D_1 = \{u : L(u) > q\}$, $D_c = \{u : L(u) = q\}$, and $D_0 = D \setminus (D_1 \cup D_c)$, then for the fixed test size $a_0$, $c$ and $q$ are chosen so that

$$a_0 = \sum_{D_1} \prod_{n=1}^{N} a_{n u_n} + c \sum_{D_c} \prod_{n=1}^{N} a_{n u_n} \quad (4)$$

The power of this test is

$$\beta = \sum_{D_1} \prod_{n=1}^{N} b_{n u_n} + c \sum_{D_c} \prod_{n=1}^{N} b_{n u_n} \quad (5)$$
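Eqs (3)-(5) can be evaluated directly by enumerating the product space $D$. The sketch below is our illustration, with made-up sensor pmfs; it uses exact rational arithmetic so that the boundary case $L(u) = q$ is detected reliably.

```python
from fractions import Fraction as F
from itertools import product

# Illustrative conditional pmfs for two 2-level sensors (invented numbers):
# a[n][u] = Pr{U_n = u | H0}, b[n][u] = Pr{U_n = u | H1}
a = [[F(8, 10), F(2, 10)], [F(7, 10), F(3, 10)]]
b = [[F(3, 10), F(7, 10)], [F(4, 10), F(6, 10)]]
q, c = F(2), F(1, 2)  # threshold and boundary randomization of Eq. (2)

alpha0 = beta = F(0)
for u in product(range(2), repeat=2):
    p0 = a[0][u[0]] * a[1][u[1]]  # Pr{U = u | H0}
    p1 = b[0][u[0]] * b[1][u[1]]  # Pr{U = u | H1}
    Lu = p1 / p0                  # Eq. (3)
    w = F(1) if Lu > q else (c if Lu == q else F(0))
    alpha0 += w * p0              # Eq. (4): test size
    beta += w * p1                # Eq. (5): test power
print(alpha0, beta)
```

With these numbers the point $u = (1, 0)$ sits exactly on the threshold, so the randomization $c$ contributes half of that point's probability to both the size and the power.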
It is well known [18] that the optimal Bayes test, with $c = 0$ and an appropriately chosen value for $q$, can be written in the form of (2). Hence, for a given set of priors, the optimal Bayes test can be found by choosing an appropriate Neyman-Pearson test. Thus, it is clear that the optimal Bayes decentralized test will be determined by the set of ordered pairs $(a_0, \beta^*)$, where $a_0 \in (0, 1)$ is the test size and $\beta^*$ is the maximum power possible at that size. Which particular pair is used will depend on the prior probabilities of $H_0$ and $H_1$.
Let $\phi_{i,j}(x_i) = \Pr(U_i = j \mid X_i = x_i)$ for each $j \in I_{M_i-1}$ and each $i \in I^+_N$. If, for each $x_i$, $\sum_{j=0}^{M_i-1} \phi_{i,j}(x_i) = 1$, then the set $\{\phi_{i,j}\}$ defines a mapping of $\Omega_i$ into $I_{M_i-1}$ and can be used in a decentralized detector as a sensor decision rule. We call such a mapping an $M_i$-level mapping. With $b_{ij}$ and $a_{ij}$ as previously defined, we have the following lemma.

Lemma 1. If $b_{ij}/a_{ij} = b_{ik}/a_{ik}$ for $j \neq k$, then the performance of a decentralized detection system using an $M_i$-level mapping as the ith sensor decision rule is equivalent to that of a system using an $(M_i - 1)$-level mapping.
Proof. Let $Q_n$ be the probability mass function of $U_i$ under the hypothesis $H_n$, $n = 0, 1$. We define $u^i = \{u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_N\}$ and use the notation $u = (u^i, u_i)$. Assume that $X_p$ and $X_q$ are conditionally independent (given the hypothesis) for $p \in I^+_N$, $q \in I^+_N$, and $p \neq q$. We then have

$$L(u) = L(u^i, u_i) = L(u^i)\,L(u_i)$$

For $q' = q / L(u^i)$, the fusion rule of Eq. (2) can be rewritten as

$$\phi(u) = \begin{cases} 1 & \text{if } L(u_i) > q' \\ c & \text{if } L(u_i) = q' \\ 0 & \text{if } L(u_i) < q' \end{cases}$$

Then $(u^i, u_i = k)$ and $(u^i, u_i = j)$ are either both in $D_0$, both in $D_1$, or both in $D_c$. The probabilities of these two points are added together in Eqs (4) and (5). Hence, without affecting the level or power of the test, we can conjoin the decision regions for $u_i = j$ and $u_i = k$ into one region with probability $Q_n(j) + Q_n(k)$, $n = 0, 1$. □
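Lemma 1 is easy to check numerically: if two levels of a sensor share the ratio $b_{ij}/a_{ij}$, conjoining them leaves the size and power of the fused test unchanged. A sketch with invented pmfs (ours, not the paper's), using exact rationals and the deterministic fusion rule ($c = 0$):

```python
from fractions import Fraction as F
from itertools import product

def size_power(a, b, q):
    """Size and power of the fusion LRT of Eq. (2) with c = 0 (no boundary mass here)."""
    alpha = beta = F(0)
    for u in product(*[range(len(an)) for an in a]):
        p0 = p1 = F(1)
        for n, un in enumerate(u):
            p0 *= a[n][un]
            p1 *= b[n][un]
        if p1 / p0 > q:   # u is in D_1
            alpha += p0
            beta += p1
    return alpha, beta

q = F(3, 2)
# Sensor 1 has three levels; levels 1 and 2 share the ratio b/a = 3/2.
a3 = [[F(6, 10), F(2, 10), F(2, 10)], [F(7, 10), F(3, 10)]]
b3 = [[F(4, 10), F(3, 10), F(3, 10)], [F(3, 10), F(7, 10)]]
# Same system with levels 1 and 2 of sensor 1 conjoined.
a2 = [[F(6, 10), F(4, 10)], [F(7, 10), F(3, 10)]]
b2 = [[F(4, 10), F(6, 10)], [F(3, 10), F(7, 10)]]
print(size_power(a3, b3, q), size_power(a2, b2, q))  # identical (size, power) pairs
```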
Generally, there is an uncountable number of possible decision rules for the sensors. A decision rule of particular interest is given by the following definition.
Definition 1. A monotone partition of the set $\Omega$ by the likelihood ratio $L$ is a collection of disjoint, non-trivial sets $\{R_0, \ldots, R_N\}$ with $\bigcup_{i=0}^{N} R_i = \Omega$ such that if $x \in R_j$, $x' \in R_j$, $y \in R_k$, $y' \in R_k$, and $L(x) > L(y)$, then $L(x') \geq L(y')$ a.e. $(P_0)$.

We henceforth refer to such a partition as a likelihood ratio partition. Let $L_i = dP_{1i}/dP_{0i}$ be the likelihood ratio of $X_i$. We show later that the optimal decision rule for the ith sensor is a likelihood ratio partition of $\Omega_i$ by $L_i$.
Lemma 2. For fixed test size, there exists an $M_i$-level mapping such that the power of a decentralized detection system using this $M_i$-level mapping as the ith sensor decision rule is no smaller than that of a system using any $(M_i - 1)$-level mapping.
Proof. Subdivide the probability in the jth level by a likelihood ratio partition into levels $j_1$ and $j_2$. Assume $j_2$ includes the $x_i$ with larger likelihood ratios. We then have

$$b_{ij_2}/a_{ij_2} \geq b_{ij}/a_{ij} \geq b_{ij_1}/a_{ij_1} \quad (6)$$

and hence

$$L(u^i, u_i = j_2) \geq L(u^i, u_i = j) \geq L(u^i, u_i = j_1)$$

for all $u^i$. Assign $(u^i, u_i = j_1)$ and $(u^i, u_i = j_2)$ to the same $D_k$, $k \in \{0, c, 1\}$, as that to which the original fusion rule assigned $(u^i, u_i = j)$. Then the power and the level of the test remain unchanged. If strict inequality holds in Eq. (6), it may be necessary to reformulate the fusion rule so that it performs a Neyman-Pearson test on $U$. Under these circumstances, the performance may improve. □
Lemma 3. Assume that for any finite set $F \subset \mathbb{R}^+$ containing $M_i$ or fewer elements, $P_{0i}\{L_i(X_i) \in F\} < 1$. Then a decentralized detector using a likelihood ratio partition of $\Omega_i$ is at best equivalent to one that uses some sensor decision rule of the form

$$\Pr(U_i = j \mid X_i = x_i) = \phi_{i,j}(x_i) = \begin{cases} 1 & \text{if } t_{i,j} < L_i(x_i) < t_{i,j+1} \\ c_{i,j} & \text{if } L_i(x_i) = t_{i,j} \\ 1 - c_{i,j+1} & \text{if } L_i(x_i) = t_{i,j+1} \\ 0 & \text{otherwise} \end{cases} \quad (7)$$

where $j \in I_{M_i-1}$, $t_{i,0} = 0$, $t_{i,M_i} = \infty$, $t_{i,j} \leq t_{i,j+1}$, and $c_{i,k} \in [0, 1]$, $k \in I_{M_i}$. Furthermore, there always exists a likelihood ratio partition for which detector performance will be equivalent to that achieved using the decision rule of Eq. (7).
Proof. Note that Eq. (7) is not always equivalent to a likelihood ratio partition, since $x_i$ may be assigned to more than one level with non-zero probability. If, however, $L_i(X_i)$ has a density with respect to Lebesgue measure, the assertion of equivalence is trivially true with $c_{i,k} = 1$ for all $k$. Assume that this is not the case. Without loss of generality, we can reorder the levels at the ith sensor so that

$$\frac{b_{ij}}{a_{ij}} \geq \frac{b_{ik}}{a_{ik}} \quad \text{for } j > k \quad (8)$$

Assume, also without loss of generality, that $X_i$ has a density with respect to Lebesgue measure under both hypotheses. Let $R_{i,j}$ be the set of $x_i$ to which the reordered likelihood ratio partition assigns the value $j$. If $L_i(x_i)$ has the same value a.e. $(P_{0i})$ in $R_{i,j}$ and $R_{i,k}$, $j \neq k$, then by Lemma 1, performance is not degraded by conjoining these sets. Because $L_i(X_i)$ cannot place all of its mass in $F$, there must be some $R_{i,m}$, $m \neq j$, $m \neq k$, in which $L_i(X_i)$ is not constant. By Lemma 2, we can then add another level, again without degrading performance. Hence, no more than three $R_{i,j}$ will contain points with the same value of $L_i$. Lastly, assume that $L_i(x_i) = c$ for all $x_i \in R_{i,j}$ and that there is a subset $T$ of $R_{i,m}$, $m \neq j$, on which $L_i$ is identically equal to $c$. By first applying Lemma 2 (making $T$ a separate partition set) and then Lemma 1 (conjoining $T$ to $R_{i,j}$), no degradation occurs. Hence, no more than two $R_{i,j}$ will contain points with the same value of $L_i$. This is accounted for in Eq. (7) by the randomization at the boundaries. A further implication is that we may say that $t_{i,j}$ is strictly less than $t_{i,j+1}$.

Because of the assumption that $X_i$ has a density, it is possible to partition the set $S = \{x_i : L_i(x_i) = t_{i,j}\}$ into $S_j$ and $S_{j-1}$ such that $\Pr\{S_j \mid H_0\} = c_{i,j} \Pr\{S \mid H_0\}$ and $\Pr\{S_{j-1} \mid H_0\} = (1 - c_{i,j}) \Pr\{S \mid H_0\}$. We then redefine our sensor decision rule so that $\phi_{i,j}(x_i) = 1$ for $x_i \in S_j$ and $\phi_{i,j-1}(x_i) = 1$ for $x_i \in S_{j-1}$. Since the likelihood ratio is constant over $S$, the $H_1$ probabilities of $U_i$ are also unaffected and, hence, so is the test performance. With this partition, the sets $R_{i,j}$ and $R_{i,j-1}$ will be disjoint. Hence, the boundary randomization of Eq. (7) never improves performance and is used merely as a mathematical convenience. We conclude, then, that test performance using the decision rule of Eq. (7) is always equivalent to that achieved when some likelihood ratio partition is used. □
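Read operationally, Eq. (7) assigns an observation strictly between two thresholds to a level deterministically, while one landing exactly on $t_{i,j}$ goes to level $j$ with probability $c_{i,j}$ and to level $j - 1$ otherwise. A minimal sketch of this sampling rule, with made-up thresholds and randomization constants (ours, for positive likelihood-ratio values only):

```python
import random

def sensor_rule(L_value, t, c, rng):
    """Eq. (7) as a sampling rule: quantize a positive likelihood-ratio value.

    t = [0, t_1, ..., t_{M-1}, inf] with t_j < t_{j+1}; c[j] in [0, 1] is the
    boundary randomization at threshold t_j.
    """
    M = len(t) - 1
    for j in range(1, M):
        if L_value < t[j]:
            return j - 1  # strictly inside level j-1
        if L_value == t[j]:
            # on the boundary: level j with probability c[j], else level j-1
            return j if rng.random() < c[j] else j - 1
    return M - 1          # above the last finite threshold

rng = random.Random(0)
t = [0.0, 0.5, 2.0, float("inf")]  # M = 3 levels (invented thresholds)
c = {1: 0.25, 2: 0.75}             # invented boundary randomizations
print([sensor_rule(v, t, c, rng) for v in (0.1, 1.0, 5.0)])  # -> [0, 1, 2]
```

As the proof of Lemma 3 notes, this randomization never improves performance; it is only a convenient way to split the probability mass sitting exactly on a threshold.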
We note in passing that if $X_i$ does not have a density, the definition of a likelihood ratio partition may be modified by omitting the word disjoint. In that event, Lemma 3 will hold. We also note that if $F$ is any finite set of real numbers and $L_i(X_i) \in F$ with probability one, then the performance of a decentralized detector using likelihood ratio partitions cannot be improved by increasing the number of levels at the ith sensor beyond the number of elements in $F$.
Theorem 1. The Neyman-Pearson optimal sensor decision rule is a likelihood ratio partition.
Proof. We assume that $b_{ij}/a_{ij} \geq b_{ik}/a_{ik}$. Then $L(u^i, j) \geq L(u^i, k)$. Hence $(u^i, u_i = k) \in D_1$ implies that $(u^i, u_i = j) \in D_1$, and $(u^i, u_i = k) \in D_c$ implies that $(u^i, u_i = j) \in D_c \cup D_1$; that is, if there is a non-zero probability that we would decide for $H_1$ when $u_i = k$, then the probability of so deciding is at least as large if we change $u_i$ from $k$ to $j$. Assume that the levels are ordered as in Eq. (8). Then for each $(N-1)$-dimensional vector $u^i$ there is some integer $b(u^i)$ such that for $j \geq b(u^i)$, $(u^i, u_i = j) \in D_1$. Furthermore, there is some integer $a(u^i) < b(u^i)$ such that for $a(u^i) < j < b(u^i)$, $(u^i, u_i = j) \in D_c$.
Suppose we are given an arbitrary decentralized detection system which satisfies (2) and (8). Let $\phi_{i,j}(x_i) = \Pr(U_i = j \mid X_i = x_i)$ define the sensor partition rules for this arbitrary system. Here $i \in I^+_N$ and $x_i \in \Omega_i$. For this partition, let $Q_{ki}(j) = \int_{\Omega_i} \phi_{i,j}(x_i)\, dP_{ki}(x_i)$ be the probability that $U_i = j$ under the hypothesis $H_k$, $k = 0, 1$. Also construct a likelihood ratio partition for the ith sensor with the probability functions $Q^*_{ki}$, $k = 0, 1$. We choose the latter partition so that

$$Q_{0i}(j) = Q^*_{0i}(j), \quad j = 0, 1, \ldots, M_i - 1 \quad (9)$$

and assume that it can be described by Eq. (7) with $t_{i,0} = 0$, $t_{i,M_i} = \infty$, $t_{i,j} < t_{i,j+1}$, and the $c_{i,k} \in [0, 1]$, $k \in I_{M_i}$. The $\{t_{i,j}\}$, $j \in I^+_{M_i-1}$, and $\{c_{i,j}\}$ are chosen to satisfy the equality constraints of Eq. (9). Using the Neyman-Pearson fusion rule used for the arbitrary partition, $Q^*_{0i}$ and $Q^*_{1i}$ will also satisfy Eq. (8) and the two tests will have the same level.

For the likelihood ratio partition we have, for any $n \in I_{M_i-1}$,

$$Q^*_{1i}\{U_i \geq n\} = P_{1i}\{L_i(X_i) > t_{i,n}\} + c_{i,n} P_{1i}\{L_i(X_i) = t_{i,n}\}$$
For Neyman-Pearson fusion and by Eq. (8), the power of the test using the arbitrary partition is

$$\beta = \sum_{u^i} \Pr\{u^i \mid H_1\} \left[ Q_{1i}\{u_i \geq b(u^i)\} + c\, Q_{1i}\{a(u^i) < u_i < b(u^i)\} \right] \quad (10)$$

where $c \in [0, 1]$ is a randomization chosen so that the test level may be exactly specified. Suppose that $v^i$ is some arbitrary value of $u^i$ and that $b(v^i) = b$ and $a(v^i) = a$. Then for $u^i = v^i$ the bracketed term in Eq. (10) can be written as

$$Q_{1i}\{u_i \geq b\} + c\, Q_{1i}\{a < u_i < b\} = p + c\rho$$
where

$$p = \int_{\Omega_i} \sum_{j \geq b} \phi_{i,j}(x_i)\, dP_{1i}(x_i)$$

and

$$\rho = \int_{\Omega_i} \sum_{a < j < b} \phi_{i,j}(x_i)\, dP_{1i}(x_i)$$
For the likelihood ratio partition with Neyman-Pearson fusion, the power is

$$\beta^* = \sum_{u^i} \Pr\{u^i \mid H_1\} \left[ Q^*_{1i}\{u_i \geq b(u^i)\} + c\, Q^*_{1i}\{a(u^i) < u_i < b(u^i)\} \right] \quad (11)$$

For $u^i = v^i$ the bracketed term in Eq. (11) can be written as

$$Q^*_{1i}\{u_i \geq b\} + c\, Q^*_{1i}\{a < u_i < b\} = p^* + c\rho^* \quad (12)$$
where

$$p^* = P_{1i}\{L_i(x_i) > t_{i,b}\} + c_{i,b} P_{1i}\{L_i(x_i) = t_{i,b}\}$$

and

$$\rho^* = P_{1i}\{t_{i,a+1} \leq L_i(x_i) < t_{i,b}\} + (1 - c_{i,b}) P_{1i}\{L_i(x_i) = t_{i,b}\} - (1 - c_{i,a+1}) P_{1i}\{L_i(x_i) = t_{i,a+1}\}$$
Since $v^i$ is arbitrary, to complete the proof it suffices to show that the quantity in Eq. (12) is at least as large as $p + c\rho$, or

$$p^* - p - c(\rho - \rho^*) \geq 0 \quad (13)$$
By Eq. (9), the $H_0$ probability corresponding to $p$ is

$$P_{0i}\{L_i(x_i) > t_{i,b}\} + c_{i,b} P_{0i}\{L_i(x_i) = t_{i,b}\}$$

and the $H_0$ probability corresponding to $\rho$ is

$$P_{0i}\{t_{i,a+1} \leq L_i(x_i) < t_{i,b}\} + (1 - c_{i,b}) P_{0i}\{L_i(x_i) = t_{i,b}\} - (1 - c_{i,a+1}) P_{0i}\{L_i(x_i) = t_{i,a+1}\}$$
Hence, by the Neyman-Pearson lemma for randomized tests [19], we know that $p^* \geq p$. Then a necessary condition for Eq. (13) not to hold is $\rho^* < \rho$. In that case, the left side of Eq. (13) is decreasing in $c$ and

$$p^* - p - c(\rho - \rho^*) \geq p^* + \rho^* - (p + \rho) \quad (14)$$
But

$$p^* + \rho^* = P_{1i}\{L_i(x_i) > t_{i,a+1}\} + c_{i,a+1} P_{1i}\{L_i(x_i) = t_{i,a+1}\}$$
and, by Eq. (9),

$$\int_{\Omega_i} \sum_{j > a} \phi_{i,j}\, dP_{0i}(x_i) = P_{0i}\{L_i(x_i) > t_{i,a+1}\} + c_{i,a+1} P_{0i}\{L_i(x_i) = t_{i,a+1}\}$$

so by the Neyman-Pearson lemma, the right side of Eq. (14) is non-negative and Eq. (13) holds. □
3.2. Decentralized Bayes detection
We now turn to the optimal decentralized Bayes test. Let $\pi_0$ and $\pi_1$ be the prior probabilities of $H_0$ and $H_1$, respectively. Also let $C_{mn}$ be the cost assigned to choosing hypothesis $H_m$ when $H_n$ is the true hypothesis. The average or Bayes cost of the detection system is then

$$C = \sum_{n=0}^{1} \pi_n C_{0n} + \sum_{n=0}^{1} \pi_n (C_{1n} - C_{0n}) P_n(D_1)$$

where $D_1$ is the fusion center decision region for $H_1$. The objective of Bayesian detection is to minimize this cost. If, given a true hypothesis, the cost of making an error is greater than the cost of making the correct decision (as it logically should be), then the optimal fusion rule for the Bayes test is given by the following:

$$\phi(u) = \begin{cases} 1 & \text{if } L(u) > q \\ 0 & \text{if } L(u) < q \end{cases} \quad (15)$$

where $\phi(u)$ is the probability of choosing $H_1$ given that $U = u$. The optimal threshold is

$$q = \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})} \quad (16)$$
Note the similarity between Eqs (2) and (15). Having specified the fusion rule, we now turn to the sensor quantizers.
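Eq. (16) is simple to evaluate; with the illustrative uniform error costs below (our numbers, not the paper's), it reduces to the prior ratio $\pi_0/\pi_1$:

```python
def bayes_threshold(pi0, pi1, C):
    """Optimal fusion threshold of Eq. (16); C[m][n] = cost of choosing H_m when H_n holds."""
    return (pi0 * (C[1][0] - C[0][0])) / (pi1 * (C[0][1] - C[1][1]))

# Uniform-cost example: errors cost 1, correct decisions cost 0.
C = [[0.0, 1.0],
     [1.0, 0.0]]
print(bayes_threshold(0.5, 0.5, C))  # equal priors give q = 1 (min-error test)
print(bayes_threshold(0.9, 0.1, C))  # a rare-target prior raises q to about 9
```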
Theorem 2. The sensor decision rules for an optimal Bayes detector are likelihood ratio partitions of the sensor observation spaces.
Proof. The Bayes test uses, for each set of priors $(\pi_0, \pi_1)$, a particular level and the maximum power that can be achieved for a test at that level. Eq. (2) gives the maximum power test for the level achieved by the threshold-randomization pair $c$ and $q$. For Bayes tests, we specify that $c = 0$ and $q$ is as given by Eq. (16). Hence, by application of Theorem 1, the theorem is proved. □
We have therefore shown that the optimal decentralized detector under the Neyman-Pearson and Bayes criteria uses a quantized version of the local likelihood ratio as the sensor decision rule. Hence, in the design of such a system, it is necessary only to partition the likelihood ratio rather than the raw data. This is intuitively satisfying since the likelihood ratio is a sufficient statistic for the detection problem.
4. Mutual information
Mutual information can be thought of as the amount by which observation of a random variable, $Y$, reduces the uncertainty about the value of a related (dependent) random variable, $X$. The concept is relatively simple: the uncertainty about $X$ before $Y$ is observed is measured by its entropy $H(X)$, whereas the uncertainty after observation is the conditional entropy $H(X \mid Y)$. Since mutual information is the difference $H(X) - H(X \mid Y)$, it is a measure of the amount of uncertainty resolved by observing $Y$. In the detection problem, a heuristic goal may be stated as the elimination of as much entropy as is possible; that is, if $X$ is the hypothesis and $Y$ is the decision, we would like the value of $Y$ to limit the uncertainty about $X$ to the greatest possible extent. In the detection problem, this perspective is intuitively appealing (although only applicable when the a priori probabilities of $H_0$ and $H_1$ are known); the detector designed to maximize mutual information will produce decisions that minimize the remaining uncertainty about the true hypothesis. Since the purpose of detection is to do precisely this, mutual information is a reasonable criterion for evaluating detector performance.
In [15], the relationship between test power maximization and the mutual information between hypothesis and decision was studied. It was found that for any value of the false alarm rate, the mutual information increases as the power of the test increases. We summarize this result in the following lemma.
Lemma 4. For a given pair of prior probabilities for hypotheses $H_0$ and $H_1$, the unbiased test that maximizes the mutual information between the random state of nature and the detector decision is a likelihood ratio test.
Proof. In [15], the lemma was proven by differentiating $I(U_0; H)$ (the mutual information between the hypothesis $H$ and the detector decision $U_0$) with respect to test power. It was shown that if the test level and the prior probabilities of the hypotheses were held constant, then this derivative was positive. Because of its later usefulness, we use a different technique here to prove the lemma. Define a transition matrix $Q$ with elements $Q_{i+1,j+1} = \Pr(U_0 = j \mid H_i)$, $i = 0, 1$, $j = 0, 1$. Suppose that we initially make decisions purely by chance, so that the probability that $U_0 = 1$ is $a$ regardless of the actual hypothesis. The transition matrix for this test is

$$Q_r = \begin{bmatrix} 1 - a & a \\ 1 - a & a \end{bmatrix}$$

Let $\bar{\pi} = (\pi_0, \pi_1)$ be the vector of priors ($\pi_i = \Pr(H_i)$) and, for notational convenience, let $I(\bar{\pi}; Q) = I(U_0; H)$. Then it is easily shown that $I(\bar{\pi}; Q_r) = 0$. This should be readily apparent since the test itself has added nothing to our knowledge of $H$; that is, we knew in advance that the chance of deciding for $H_1$ did not depend on whether or not $H_1$ was true. Now suppose that the maximum possible power for a test of level $a$ is $\beta^*$. A test with this power is by definition a likelihood ratio test and
its transition matrix is

$$Q^* = \begin{bmatrix} 1 - a & a \\ 1 - \beta^* & \beta^* \end{bmatrix}$$

For an unbiased test with power $\beta$ and level $a$ we have $\beta \geq a$. Furthermore, we must have $\beta \leq \beta^*$. For this test the transition matrix can be written as

$$Q = \lambda Q_r + (1 - \lambda) Q^*$$

for some $\lambda \in [0, 1]$. By the convexity of mutual information [16, Theorem 5.2.5], we have

$$I(\bar{\pi}; Q) \leq \lambda I(\bar{\pi}; Q_r) + (1 - \lambda) I(\bar{\pi}; Q^*)$$

or, since $I(\bar{\pi}; Q_r) = 0$,

$$I(\bar{\pi}; Q) \leq (1 - \lambda) I(\bar{\pi}; Q^*)$$

Hence, the test that maximizes mutual information is the maximum power test for some false alarm rate. □
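The two facts used in the proof of Lemma 4, namely that $I(\bar\pi; Q_r) = 0$ and that mixing toward $Q_r$ can only lower the mutual information, can be checked numerically. The priors, level, power, and mixing weight below are illustrative values of our own choosing:

```python
import math

def mutual_info(pi, Q):
    """I(U0; H) for priors pi and 2x2 transition matrix Q[i][j] = Pr(U0 = j | H_i)."""
    I = 0.0
    for j in range(2):
        pj = pi[0] * Q[0][j] + pi[1] * Q[1][j]  # Pr(U0 = j)
        for i in range(2):
            if pi[i] * Q[i][j] > 0:
                I += pi[i] * Q[i][j] * math.log(Q[i][j] / pj)
    return I

pi = (0.4, 0.6)
a, beta_star, lam = 0.1, 0.8, 0.3
Qr = [[1 - a, a], [1 - a, a]]                  # pure-chance test: I = 0
Qs = [[1 - a, a], [1 - beta_star, beta_star]]  # maximum-power test at level a
Q = [[lam * Qr[i][j] + (1 - lam) * Qs[i][j] for j in range(2)] for i in range(2)]

print(mutual_info(pi, Qr))                     # essentially zero
print(mutual_info(pi, Q), (1 - lam) * mutual_info(pi, Qs))  # convexity bound
```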
As we have seen, the mutual information for a test operating by pure chance is zero. It is well known that the mutual information between any two random variables is at least zero and must be strictly positive if the variables are dependent. For the transition matrix $Q_r$ the decision $U_0$ is made without reference to any observable characteristic of $H$; that is, the test is completely randomized. With respect to the tests specified by Eqs (1) and (2), the question then arises as to whether partial randomization (at the threshold only) reduces the mutual information. The following lemma addresses this question.
Lemma 5. The maximum mutual information test is a likelihood ratio test with no randomization at the threshold.
Proof. Suppose that $U_0$ is randomized with probability $q_i$ under hypothesis $H_i$. Then $q_i$ is the probability that the likelihood ratio equals the threshold when $H_i$ is the true hypothesis. Further assume that $U_0$ is chosen by a likelihood ratio test with randomization parameter $c$, and that the probabilities of exceeding the threshold are $a$ and $b$ under $H_0$ and $H_1$, respectively. Since the test will choose $H_1$ with probability $c$ when randomization occurs, the transition matrix $Q_c$ may be written as

$$Q_c = \begin{bmatrix} 1 - a - cq_0 & a + cq_0 \\ 1 - b - cq_1 & b + cq_1 \end{bmatrix}$$

Then for $c = 0$ we have

$$Q_0 = \begin{bmatrix} 1 - a & a \\ 1 - b & b \end{bmatrix}$$
and for $c = 1$ we have

$$Q_1 = \begin{bmatrix} 1 - a - q_0 & a + q_0 \\ 1 - b - q_1 & b + q_1 \end{bmatrix}$$

We can then write

$$Q_c = c Q_1 + (1 - c) Q_0$$

As in the proof of Lemma 4, we can make use of the fact that mutual information is convex in the transition matrix and conclude

$$I(\bar{\pi}; Q_c) \leq c\, I(\bar{\pi}; Q_1) + (1 - c)\, I(\bar{\pi}; Q_0)$$

Hence, mutual information is maximized for either $c = 0$ or $c = 1$. For these values of $c$ no randomization occurs. □
From Lemma 4 it is apparent that the optimal fusion rule will be as specified by Eq. (2); that is, the maximum mutual information fusion rule will be a likelihood ratio test. Lemma 5 specifies that this test use no randomization in the fusion rule. The intuitive interpretation of the latter result is that more information can be gleaned from the observed data if the decision is based solely on the data. The use of randomization in the fusion rule assigns a portion of the decision process to chance; hence, we find that mutual information is reduced. This result may be contrasted to the use of randomization in optimal Neyman–Pearson tests. The Neyman–Pearson lemma specifies a false alarm rate for the test, and probability is added to the decision region for $H_1$ by ranking points in the observation space by their likelihood ratio. Randomization occurs when the addition of all of the points with the same likelihood ratio would increase the false alarm rate beyond the specified value. Our result on mutual information restricts the false alarm rate for the maximum mutual information test to those values that require no randomization; that is, if there are points assigned to $H_0$ and $H_1$ that have the same likelihood ratio, the test cannot be optimal under the mutual information criterion. Since the statistic of interest is the likelihood ratio, it seems reasonable that a test that makes different decisions for two points with the same likelihood ratio will be suboptimal.
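The convexity argument behind Lemmas 4 and 5 is easy to check numerically. The sketch below is our own illustration, not from the paper; the prior and the two transition matrices are made-up values. It evaluates $I(\bar\pi; Q)$ for a binary channel and confirms that along the segment $Q_\gamma = \gamma Q_1 + (1-\gamma)Q_0$ the information is maximized at an endpoint, exactly as the proof of Lemma 5 requires.

```python
import math

def mutual_info(pi, Q):
    """I(pi; Q) for a prior pi over two hypotheses and a row-stochastic
    2x2 transition matrix Q[h][u] = Pr(U0 = u | H = h)."""
    m = [sum(pi[h] * Q[h][u] for h in range(2)) for u in range(2)]
    return sum(pi[h] * Q[h][u] * math.log(Q[h][u] / m[u])
               for h in range(2) for u in range(2) if Q[h][u] > 0)

def Q_gamma(Q0, Q1, g):
    """Convex combination of two transition matrices (Lemma 5's Q_gamma)."""
    return [[(1 - g) * Q0[h][u] + g * Q1[h][u] for u in range(2)]
            for h in range(2)]

pi = (0.4, 0.6)                      # hypothetical prior
Q0 = [[0.9, 0.1], [0.3, 0.7]]        # gamma = 0 matrix (made up)
Q1 = [[0.85, 0.15], [0.2, 0.8]]      # gamma = 1 matrix (made up)

vals = [mutual_info(pi, Q_gamma(Q0, Q1, g / 100)) for g in range(101)]

# Convexity in Q implies the maximum over the segment sits at an endpoint,
# i.e. at gamma = 0 or gamma = 1 (no partial randomization).
assert max(vals) <= max(vals[0], vals[-1]) + 1e-12
# Midpoint convexity along the same segment:
assert vals[50] <= 0.5 * (vals[0] + vals[100]) + 1e-12
```
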
Theorem 3. The maximum mutual information test uses likelihood ratio partitions for the sensor decision rules.

Proof. From Lemma 4, the maximum mutual information test is the maximum power test at some unspecified level. From Theorem 4, we know that the maximum power test for a given false alarm rate uses likelihood ratio partitions at the sensors. Hence, the maximum mutual information test uses likelihood ratio partitioning for the sensor decision rules. $\blacksquare$
Summarizing the foregoing results, we have: (1) the fusion rule for a maximum mutual information test is a non-randomized likelihood ratio test on the sensor output vector $U$, and (2) the sensor decision rules for a maximum mutual information test are likelihood ratio partitions. The false alarm rate for such a test will vary according to the $H_0$ and $H_1$ probability distributions of the $L_i(X_i)$ and the prior probabilities of the hypotheses. Finally, we note that since $U_0$ is a deterministic function of the vector $U$, we can also consider maximization of $I(U; H)$. A discussion of this quantity and its relation to $I(U_0; H)$ may be found in Section 6.
5. Ali–Silvey distances
It is often difficult to calculate the error probabilities for optimal systems; as a result, the literature includes numerous examples of metrics that have been used as detector performance measures. These metrics include asymptotic relative efficiency, Chernoff bounds, rate distortion, mutual information, signal-to-noise ratio, and distributional distance. As a result of either convexity or asymptotic normality, performance evaluation and system design may be significantly easier using these suboptimal measures than if error performance is used. Among the measures of distributional difference that are frequently cited are the members of the class of Ali–Silvey distances [20, 21]. This class includes the J-divergence, Bhattacharyya distance (closely related to Chernoff bounds), and Matsushita distance, among others. An Ali–Silvey distance that is frequently encountered in the detection literature (because of its relationship to signal-to-noise ratio) is the likelihood ratio variance [22]. This "distance" is the signal-to-noise ratio for a likelihood ratio detector. Applications of Ali–Silvey distances to signal selection are discussed in [23, 24].
In detection, Ali–Silvey distances are used in the following manner. The distance between the $H_0$ and $H_1$ probability distributions is maximized at the input to the detector. This may involve the selection of signals to be used in the system (in digital communications and active radar and sonar) or processing performed at the receiver prior to the application of the signal to the decision device. The detector then uses a likelihood ratio test to make the decision. It is assumed that the relationship between error probability criteria and Ali–Silvey distance is monotonic. See [23, 24] for a more detailed discussion of this relationship.
In the context of decentralized detection, we are concerned with maximization of the distance between the respective distributions of $U$, the vector of inputs to the fusion center. Hence, in keeping with the previously mentioned procedure, the fusion rule is a likelihood ratio test on $U$. The distributional distance is maximized through the selection of appropriate sensor decision rules. In [25], Poor and Thomas discuss the quantization of data for centralized detectors. The quantizers they discuss are of the form

$$Q(x) = k, \quad \text{if } x \in (t_k, t_{k+1}], \qquad k = 0, 1, \ldots, M-1 \tag{17}$$

where $Q(x)$ and $x$ are the output of and input to the quantizer, respectively. It is shown that the quantizer that maximizes Ali–Silvey distance is found by choosing the breakpoints for the level intervals $(t_0, t_1, \ldots, t_M)$ through use of the likelihood ratio.
A slight (and easily corrected) problem with this work is that more levels than necessary must be used if the likelihood ratio is not monotonic in $x$. To avoid this problem in our discussion, we use a slightly different form for the quantizer. In notation analogous to that of Eq. (17), we use quantizers of the form

$$Q(x) = k, \quad \text{if } x \in S_k, \qquad k \in I_{M-1} \tag{18}$$

This less constrained quantizer format allows for more efficient use of the limited bits available.
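To make the distinction concrete: with a non-monotone likelihood ratio, the interval quantizer of Eq. (17) must spend separate levels on data regions that carry the same likelihood ratio, while the set-based form of Eq. (18), realized as a partition of the likelihood ratio axis, can pool them. The sketch below is our own illustration; the likelihood ratio function and all breakpoints are made up for the purpose.

```python
import bisect

def quantize_interval(x, t):
    """Eq. (17): Q(x) = k if x lies in (t_k, t_{k+1}], levels 0..M-1."""
    return min(bisect.bisect_left(t, x), len(t) - 1) - 1

def quantize_sets(x, lr, s):
    """Eq. (18) realized as a likelihood ratio partition: the level is set
    by which cell of an LR-axis partition L(x) falls into, so each S_k
    may be a union of disjoint intervals on the data axis."""
    return min(bisect.bisect_left(s, lr(x)), len(s) - 1) - 1

lr = lambda x: (x - 1.0) ** 2      # hypothetical non-monotone likelihood ratio

t = [0.0, 1.0, 2.0]                # data-axis breakpoints for Eq. (17)
s = [0.0, 0.1, 1.1]                # LR-axis breakpoints for Eq. (18)

# x = 0.5 and x = 1.5 carry identical likelihood ratio (0.25), yet the
# interval quantizer is forced to separate them; the set-based one is not.
assert quantize_interval(0.5, t) != quantize_interval(1.5, t)
assert quantize_sets(0.5, lr, s) == quantize_sets(1.5, lr, s)
```
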
The general form of an arbitrary Ali–Silvey distance between the distributions $P_0$ and $P_1$ is given by

$$d_0(P_0, P_1) = f\!\left( \int C[L(x)] \, dP_0(x) \right) \tag{19}$$

where $f$ is an increasing function, $C$ is convex, and $L = dP_1/dP_0$ is the likelihood ratio.
For a decentralized detection system with $N$ sensors, Eq. (19) can be rewritten as

$$d_0 = f\!\left( \sum_{u_1=0}^{M_1-1} \sum_{u_2=0}^{M_2-1} \cdots \sum_{u_N=0}^{M_N-1} C[L(u)]\, P(u \mid H_0) \right) \tag{20}$$

where $L(u)$ is given by Eq. (3) and $P(u \mid H_0) = \prod_{n=1}^{N} a_{n u_n}$ is the $H_0$ probability of the vector $u$. Using the notation of Sections 2 and 3, Eq. (20) becomes
$$d_0 = f\!\left( \sum_{u^i} \sum_{j=0}^{M_i-1} C[L(u_i = j)\, L(u^i)]\, a_{ij}\, P(u^i \mid H_0) \right) \tag{21}$$
In the sequel, we will omit $f$ since it is irrelevant to the discussion; that is, maximizing $d = E_0\, C[L(U)]$ is equivalent to maximizing $d_0$. Furthermore, we confine the class to those distances for which $C$ is differentiable. We now state and prove the following theorem.
Theorem 4. The sensor decision rules that maximize the Ali–Silvey distance between the distributions of $U$ under $H_0$ and $H_1$ are likelihood ratio partitions.
Proof. Without loss of generality, we assume that the levels of an arbitrary decision rule at the $i$th sensor are ordered as in Eq. (8). Suppose that under this arbitrary decision rule $\Pr(U_i = n \mid X_i = x_i) = \phi_{i,n}(x_i)$. If $\{\phi_{i,n},\, n \in I_{M_i-1}\}$ is not equivalent to a likelihood ratio partition, there must be sets $S_j$ and $S_k$, $k < j$, of non-zero $P_{0i}$ measure such that for each $x_1 \in S_j$ and each $x_2 \in S_k$, $\phi_{i,j}(x_1) > 0$, $\phi_{i,k}(x_2) > 0$, and $L_i(x_2) > L_i(x_1)$. Suppose further, again without loss of generality, that

$$\int_{S_k} \phi_{i,k}(x_i)\, dP_{0i}(x_i) = \int_{S_j} \phi_{i,j}(x_i)\, dP_{0i}(x_i) = \varepsilon$$

and

$$\int_{S_k} \phi_{i,k}(x_i)\, dP_{1i}(x_i) - \int_{S_j} \phi_{i,j}(x_i)\, dP_{1i}(x_i) = \delta$$
From the construction thus far, both $\varepsilon$ and $\delta$ will be greater than zero. We now change the partition in the following manner. Let

$$\phi_{i,j}^{(1)}(x_i) = \begin{cases} \phi_{i,j}(x_i) + \phi_{i,k}(x_i) & \text{if } x_i \in S_k \\ 0 & \text{if } x_i \in S_j \\ \phi_{i,j}(x_i) & \text{otherwise} \end{cases}$$

$$\phi_{i,k}^{(1)}(x_i) = \begin{cases} 0 & \text{if } x_i \in S_k \\ \phi_{i,j}(x_i) + \phi_{i,k}(x_i) & \text{if } x_i \in S_j \\ \phi_{i,k}(x_i) & \text{otherwise} \end{cases}$$

and

$$\phi_{i,n}^{(1)}(x_i) = \phi_{i,n}(x_i) \quad \text{for } n \ne j,\; n \ne k$$
For the new partition all $a_{in}$ are unchanged, $b_{ij}$ increases by $\delta$, $b_{ik}$ decreases by $\delta$, and the remaining $b_{in}$ are unchanged. Then $\Delta d$, the resulting change in the distance $d$, is

$$\Delta d = \sum_{u^i} P_0(u^i) \left[ C\!\left( \frac{b_{ij}+\delta}{a_{ij}}\, L(u^i) \right) a_{ij} + C\!\left( \frac{b_{ik}-\delta}{a_{ik}}\, L(u^i) \right) a_{ik} \right] - \sum_{u^i} P_0(u^i) \left[ C\!\left( \frac{b_{ij}}{a_{ij}}\, L(u^i) \right) a_{ij} + C\!\left( \frac{b_{ik}}{a_{ik}}\, L(u^i) \right) a_{ik} \right] \tag{22}$$
where we have adopted the notation $P_0(u^i) = \Pr(u^i \mid H_0)$. By the mean value theorem and the differentiability of $C$ we have

$$C\!\left( \frac{b_{ij}+\delta}{a_{ij}}\, L(u^i) \right) = C\!\left( \frac{b_{ij}}{a_{ij}}\, L(u^i) \right) + \delta\, C'\!\left( \frac{b_{ij}+\delta_j}{a_{ij}}\, L(u^i) \right) \frac{L(u^i)}{a_{ij}} \tag{23}$$

and

$$C\!\left( \frac{b_{ik}-\delta}{a_{ik}}\, L(u^i) \right) = C\!\left( \frac{b_{ik}}{a_{ik}}\, L(u^i) \right) - \delta\, C'\!\left( \frac{b_{ik}-\delta_k}{a_{ik}}\, L(u^i) \right) \frac{L(u^i)}{a_{ik}} \tag{24}$$
where $0 \le \delta_n(u^i) \le \delta$, $n = j, k$. The difference in $d$ due to the arbitrary vector $u^i$, after substituting Eqs. (23) and (24) into Eq. (22), is

$$\Delta d(u^i) = \delta\, P_1(u^i) \left[ C'\!\left( \frac{b_{ij}+\delta_j(u^i)}{a_{ij}}\, L(u^i) \right) - C'\!\left( \frac{b_{ik}-\delta_k(u^i)}{a_{ik}}\, L(u^i) \right) \right] \tag{25}$$
Since $C$ is convex, $\Delta d(u^i)$ is non-negative if

$$\frac{b_{ij}+\delta_j(u^i)}{a_{ij}} \ge \frac{b_{ik}-\delta_k(u^i)}{a_{ik}}. \tag{26}$$
Recall that, by construction, $b_{in}/a_{in} \ge b_{im}/a_{im}$ for $n > m$. Since $j > k$, $\delta_j(u^i) \ge 0$, and $\delta_k(u^i) \ge 0$, Eq. (26) holds. The partition can be further modified until, for any non-trivial sets $R_j$ and $R_k$ with $\phi_{i,j}(x_1) > 0$ for $x_1 \in R_j$ and $\phi_{i,k}(x_2) > 0$ for $x_2 \in R_k$, we have $P(R_j \mid H_1)/P(R_j \mid H_0) \ge P(R_k \mid H_1)/P(R_k \mid H_0)$. The levels can then be reordered to conform with Eq. (8) and the process repeated for some other pair $j_1$ and $k_1$. Since $j$ and $k$ in the foregoing were chosen arbitrarily, the distance $d$ will be maximized when the sensor decision rules are given by Eq. (7). $\blacksquare$
For the J-divergence, $C[L(u)] = [L(u)-1]\log L(u)$. From the conditional independence of the sensor data, the J-divergence can be written as $J = \sum_{i=1}^{N} [E_1 \log L(u_i) - E_0 \log L(u_i)]$, or $J = \sum_{i=1}^{N} J(u_i)$, where $J(u_i)$ is the J-divergence of the $H_1$ and $H_0$ distributions at the output of the $i$th sensor. In this case (and in the case of the Bhattacharyya distance $B$, where $B = -\log E_0 \sqrt{L(u)}$), the distance at the output of the sensors can be individually maximized in order to maximize the overall distance. In general, if the sensor inputs are conditionally independent and $C[L(u)]$ can be written as either $\sum_{i=1}^{N} C[L(u_i)]$ or $\prod_{i=1}^{N} C[L(u_i)]$, then the optimal system can be found by maximizing the distances at the individual sensor outputs. Furthermore, it is apparent that if the sensor inputs are also identically distributed, then all of the optimal sensor decision rules are the same. This fact can significantly reduce the amount of work required to determine the thresholds for the $N$ quantizers. It is shown in [25, Eq. (16)] that if $C$ has a continuous second derivative, then $t_{i,j}$ will depend on $L(u_i = j)$, $L(u_i = j-1)$, $C$, and $C'$. In our notation this threshold is given by

$$t_{i,j} = \frac{L_{i,j}\, C'(L_{i,j}) - C(L_{i,j}) - L_{i,j-1}\, C'(L_{i,j-1}) + C(L_{i,j-1})}{C'(L_{i,j}) - C'(L_{i,j-1})}$$

where $L_{i,n} = L(u_i = n)$.
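As a quick sanity check of this breakpoint expression (our own check, not from the paper): for the J-divergence kernel $C(L) = (L-1)\log L$, the threshold computed from two adjacent cell likelihood ratios should land between them, and by symmetry the pair $(1/a,\, a)$ should give $t = 1$.

```python
import math

def breakpoint(L_lo, L_hi, C, Cp):
    """Threshold t_{i,j} between adjacent cells with likelihood ratios
    L_lo < L_hi, for a differentiable convex kernel C with derivative Cp."""
    num = (L_hi * Cp(L_hi) - C(L_hi)) - (L_lo * Cp(L_lo) - C(L_lo))
    return num / (Cp(L_hi) - Cp(L_lo))

C = lambda L: (L - 1.0) * math.log(L)          # J-divergence kernel
Cp = lambda L: math.log(L) + 1.0 - 1.0 / L     # its derivative

t = breakpoint(0.5, 3.0, C, Cp)
assert 0.5 < t < 3.0                           # threshold separates the cells
assert abs(breakpoint(0.5, 2.0, C, Cp) - 1.0) < 1e-9   # symmetric pair -> t = 1
```
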
6. Discussion and examples
It is intuitively appealing that an optimal sensor mapping strategy should be a quantization of the minimal sufficient statistic for detection, the likelihood ratio. Although the result appears to have been widely assumed, up to now it has been proven only for a number of special cases. We mention several here, the first two of which have been cited previously:
- For the Bayes criterion with single-bit channels ($M_i = 2$), Reibman and Nolte have shown that sensors perform local likelihood ratio tests [3].
- Poor and Thomas [25] have derived Ali–Silvey distance optimal thresholds for the case in which the local likelihood ratios are monotonically increasing functions of the data.
- For the "weak-signal" case, Kassam [26] has demonstrated that the sensor mapping function that maximizes the efficacy (or asymptotic signal-to-noise ratio) is a quantization of the differential log-likelihood ratio.
- For the discrimination $E_{H_0}[\log(P_1(u)/P_0(u))]$ (an Ali–Silvey distance measure closely related to Shannon information) and discrete-valued sensor data, it can be inferred from the refinement lemma [16, Theorem 4.2.3] that the optimal sensor mapping is a quantization of the local likelihood ratio.
- For the Chernoff function $-\log\{E_{H_0}[(P_1(u)/P_0(u))^s]\}$, $0 < s < 1$, it has been shown in [27] that the optimal mapping is a quantization of the local likelihood ratio. Note that the Bhattacharyya distance is a special case ($s = \tfrac{1}{2}$) of the Chernoff function.
Which measure is most appropriate is largely a function of the application. If prior probabilities for the hypotheses are known, then Shannon information may be most appealing; if, in addition, error costs can be assumed, then a Bayes criterion should be used. Without prior probabilities the Neyman–Pearson criterion is most natural; however, the resulting scheme is in general optimal only at a particular false alarm rate, and its derivation can require extensive computation. The Ali–Silvey distance criteria, which utilize the sensor outputs $U$ rather than the final decision $U_0$, generally work well for a broad range of false alarm rates; unfortunately, there is no guarantee of any meaningful optimality. We present the following examples for clarification.
Example 1. Suppose we are to design a constant false-alarm rate (CFAR) system for Swerling II targets. The system is constrained to have three sensors ($N = 3$); each sensor can send one of three possible messages ($M_i = 3$, $i = 1, 2, 3$) to the fusion center. A cell-averaging (CA) scheme on 20 homogeneous reference cells $\{\{Y_{ij}\}_{j=1}^{20}\}_{i=1}^{3}$ is used to estimate the ambient noise level. For each $i$ and $j$, $Y_{ij}$ is exponentially distributed with parameter $\lambda$; that is, the density function of the reference value is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{else} \end{cases}$$

The output of the test cell $Z_i$ is exponentially distributed with parameter $\lambda/2$ under $H_1$, and with parameter $\lambda$ under $H_0$. All random variables are assumed to be independent. The sensors quantize the normalized data $\{X_i\}_{i=1}^{3}$, where

$$X_i = \frac{Z_i}{(1/20)\sum_{j=1}^{20} Y_{ij}}$$

It can be shown that the $X_i$ are independent and identically distributed (i.i.d.) with density

$$f_X(x) = \frac{h}{(1 + hx/20)^{21}}$$

where $h = 1$ under $H_0$ and $h = 0.5$ under $H_1$. Since the local likelihood ratios

$$L_i(X_i) = \frac{1}{2} \left( \frac{1 + X_i/20}{1 + X_i/40} \right)^{21}$$

are monotone increasing functions of $X_i$, quantizing the data directly is equivalent to a likelihood ratio partition.

As we discuss later, it is not always true that the optimal sensor mappings are identical likelihood ratio partitions. Obviously, this will not in general be the case if the channel capacity constraints differ between sensors. It may fail to hold even if all of the sensors observe i.i.d. random processes and face identical communication constraints (i.e. $M_i = M$). In the case of Example 1, however, the partitions are in fact identical. The Neyman–Pearson optimal thresholds are plotted against $\alpha$ in Fig. 1. Note the discontinuity at $\alpha \approx 0.7\%$. This can be seen more clearly in Fig. 2, where we have expanded the $\alpha$ axis for $\alpha < 0.01$. The discontinuity is due to a transition in the optimal fusion rule: for $\alpha > 0.007$, $D_1 = \{(2,3,3),\, (3,2,3),\, (3,3,2),\, (3,3,3)\}$; for $\alpha < 0.007$, $D_1$ is any permutation of $\{(1,3,3),\, (2,2,3),\, (2,3,3),\, (3,3,3)\}$. Here $(u_1, u_2, u_3)$ represents the outputs of the three sensors, and each output is assumed to be ordered as in Eq. (8). As is true for many quantized detection systems, there is no fusion rule that is optimal for all $\alpha$.

Fig. 1. Plot of Neyman–Pearson optimal thresholds versus $\alpha$ for Example 1 (CA-CFAR).
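The distributional claims in Example 1 are easy to verify by simulation. The sketch below is our own check, not from the paper: it draws normalized CA-CFAR statistics under $H_0$ and compares the empirical CDF with $F_X(x) = 1 - (1 + hx/20)^{-20}$, the CDF implied by the stated density (the ambient rate $\lambda = 2.7$ is an arbitrary choice, since the normalization makes the result $\lambda$-free); it also confirms that $L_i$ is monotone increasing.

```python
import random

random.seed(1)

def X_sample(h):
    """One normalized CA-CFAR statistic: test cell over the mean of 20
    reference cells. h = 1 under H0 (rate lam); h = 0.5 under H1 (rate lam/2)."""
    lam = 2.7                       # arbitrary: the ratio is lambda-free
    z = random.expovariate(lam * h)
    y = sum(random.expovariate(lam) for _ in range(20)) / 20.0
    return z / y

def F_X(x, h):
    """CDF implied by f_X(x) = h (1 + h x / 20)^(-21)."""
    return 1.0 - (1.0 + h * x / 20.0) ** -20

samples = [X_sample(1.0) for _ in range(20000)]
for x in (0.5, 1.0, 3.0):
    emp = sum(s <= x for s in samples) / len(samples)
    assert abs(emp - F_X(x, 1.0)) < 0.02    # empirical CDF matches the claim

L = lambda x: 0.5 * ((1 + x / 20.0) / (1 + x / 40.0)) ** 21
grid = [0.1 * i for i in range(200)]
assert all(L(a) < L(b) for a, b in zip(grid, grid[1:]))   # L is increasing
```
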
Optimal thresholds under a number of other criteria are shown in Table 1. In all cases the thresholds are the same at each of the sensors. Receiver operating characteristics (ROCs) for detectors optimized for the Neyman–Pearson, efficacy, and Bhattacharyya distance criteria are shown in Fig. 3. Because optimization for the latter two criteria results in only one set of thresholds for each sensor (i.e. the thresholds do not change as a function of false alarm rate), the resulting fusion rule will use randomization and the detector ROC curves are piecewise linear. In Fig. 4 we show the ROC curve for a quantizer optimized for a fixed false alarm rate (10%). Note that here again fixed thresholds result in randomization and a piecewise linear ROC.
Figure 5 shows the values of $\alpha$ and $\beta$, as a function of $\pi_0$ ($= \Pr(H_0)$), achieved with the quantizer that maximizes the information measure $I(U_0; H)$. By comparing this to
Fig. 2. Plot of Neyman–Pearson optimal thresholds versus $\alpha$ for Example 1. This is an expanded version of the low-$\alpha$ behavior in Fig. 1.
Table 1. Optimal thresholds (Example 1)

| Criterion     | $t_1$ | $t_2$ |
|---------------|-------|-------|
| Bhattacharyya | 1.33  | 3.65  |
| J-divergence  | 1.46  | 4.05  |
| Efficacy      | 0.58  | 1.60  |
Figs. 3-5, we see that any information-optimal system (for $0 < \pi_0 < 1$) is identical to a Neyman–Pearson optimal scheme with $0.1 < \alpha < 0.26$ (approximately). This is consistent with Theorem 3; however, we also note that maximum mutual information is achieved for some test level in the interval (0.1, 0.26) for all values of $\pi_0$. Thus, the mapping between $\pi_0$ and the information-maximizing false alarm rate need not be one-to-one.

The thresholds used by the information-optimal system are plotted in Fig. 6. Here "global" refers to $I(U_0; H)$, the mutual information between the hypothesis and the
Fig. 3. Performance comparison of thresholds optimized under the Neyman–Pearson, efficacy, and Bhattacharyya measures, in the situation of Example 1. Note that the NP-optimal thresholds vary with $\alpha$, but those under the other two criteria cannot; hence the piecewise-linear ROCs. The vertical lines denote the region of this ROC obtainable by maximizing mutual information; see Fig. 4.
ultimate decision. This measure was discussed in Section 4. A system optimized with respect to

$$I(U; H) = \sum_{i=1}^{N} I(U_i; H) \tag{27}$$

is referred to as "local"; this is the information between the vector of sensor outputs and the hypothesis random variable. In Fig. 7, we show the maximum local and global information that can be achieved. Since $U_0$ is the result of combining the elements of $U$, it is not surprising that the local information is greater than the global information. This can also be deduced from the following equation:

$$I(U_0; H) = I(U; H) - I(U; H \mid U_0)$$

The last term can be interpreted as "additional information about $H$ contained in $U$ when $U_0$ is already known", and can be considered as a measure of the efficiency with which information in $U$ is absorbed into $U_0$; in a Neyman–Pearson framework this information is useless. It should be noted that for local information, as in Eq. (27),
Fig. 4. Performance comparison of thresholds optimized under the Neyman–Pearson criterion in the situation of Example 1. The "optimal" thresholds vary with $\alpha$, but those optimized for a fixed $\alpha = 10\%$ naturally do not.
if the sensor observations are i.i.d. and the communication channel constraints are identical, the sensor mapping rules will all be the same. As we demonstrate in Example 4 below, this is not always the case for global mutual information. We also note in passing that we can write

$$J = \left( \frac{d}{d\pi_0} \big[ I(U; H) \big] \bigg|_{\pi_0 = 0} \right) - \left( \frac{d}{d\pi_0} \big[ I(U; H) \big] \bigg|_{\pi_0 = 1} \right)$$

where $J$ is the J-divergence. Hence, the thresholds chosen to maximize the J-divergence will also maximize the difference between the slopes of the mutual information versus prior probability curve at its endpoints.
Example 2. Suppose we have ten sensors ($N = 10$) that take one observation each. The observations are i.i.d. realizations of a unit Cauchy random variable with density

$$f(x) = \frac{1}{\pi \left( 1 + (x - \theta)^2 \right)}$$

Under the noise-only hypothesis $H_0$, $\theta = 0$; under $H_1$, $\theta = 2$. Each of the sensors uses a five-level quantizer.
Fig. 5. Information optimal $\alpha$ and $\beta$ for Example 1.
In Fig. 8, we show the log-likelihood ratio, the quantizer that is optimal for the J-divergence, and the Neyman–Pearson optimal quantizer for a false alarm rate of 1%. The resulting probabilities of detection (for $\alpha = 0.01$) are $\beta = 0.907$ and $\beta = 0.925$, respectively. Note that in both cases the sensor mappings are likelihood ratio quantizations.
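For this Cauchy problem the likelihood ratio $L(x) = (1+x^2)/(1+(x-2)^2)$ is not monotone in $x$: it rises to a peak and then decays back toward 1 as $x \to \infty$. This is exactly the situation where the set-valued quantizer of Eq. (18) matters, since points on either side of the peak share a likelihood ratio value and belong in the same cell. A quick check (our own):

```python
def L(x):
    """Likelihood ratio for unit Cauchy data, theta = 0 versus theta = 2."""
    return (1.0 + x * x) / (1.0 + (x - 2.0) ** 2)

# Two data points well apart on the x-axis with identical likelihood ratio:
assert abs(L(2.0) - L(3.0)) < 1e-12      # both equal 5
assert L(2.5) > L(2.0)                   # L peaks between them: non-monotone
# An LR-cell containing values near 5 therefore contains disjoint
# x-intervals, one around x = 2 and one around x = 3.
```
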
Example 3. Suppose we have ten sensors ($N = 10$) that are capable of transmitting one of five possible messages ($M_i = 5$) each. With $Y_i$ representing the observation at the $i$th sensor, the detection problem can be written as

$$H_0: Y_i = (1 - Z) N_{1i} + Z N_{2i}$$

$$H_1: Y_i = S_i + (1 - Z) N_{1i} + Z N_{2i}$$

Here $\{S_i\}_{i=1}^{10}$, $\{N_{1i}\}_{i=1}^{10}$, and $\{N_{2i}\}_{i=1}^{10}$ are all independent, zero-mean and Gaussian, with $E(S_i^2) = E(N_{1i}^2) = 1$ and $E(N_{2i}^2) = 10$. The binary random variable $Z$ takes on values 0 and 1 with respective probabilities 90% and 10%, and has the same value at all of the sensors.
This example is used to demonstrate the effect of dependence among the sensors. Without the high-power noise process, the model conforms to the unknown (Gaussian) signal in Gaussian noise problem. When the $N_{2i}$ process is included we have added the possibility of jamming: there is a 10% probability that all of the sensors will be jammed, and a 90% probability of no jamming.

Fig. 6. Information optimal thresholds for Example 1. This should be compared to the region of Fig. 1 for the values of $\alpha$ available in Fig. 5. As indicated in the text, "local" refers to optimization of the information between the hypothesis and the outputs of the quantizers, while "global" is between the hypothesis and the decision.
For each sensor the statistic $X_i = Y_i^2$ is sufficient. Since $Y_i$ is always Gaussian, $X_i$, conditioned on the hypothesis and the presence or absence of jamming, will have a Rayleigh density given by

$$f_R(x; \sigma^2) = \begin{cases} \dfrac{x}{\sigma^2}\, e^{-x^2/2\sigma^2} & x \ge 0 \\ 0 & \text{else} \end{cases}$$

Here, $\sigma^2$ will depend on both the hypothesis and the value of $Z$. Hence, the univariate density of $X_i$ is

$$f(x_i) = (1 - \varepsilon)\, f_R(x_i; 1 + h) + \varepsilon\, f_R(x_i; 11 + h) \tag{28}$$

and the joint density of the observations is

$$f(x_1, x_2, \ldots, x_{10}) = (1 - \varepsilon) \prod_{i=1}^{10} f_R(x_i; 1 + h) + \varepsilon \prod_{i=1}^{10} f_R(x_i; 11 + h)$$

Here $\varepsilon = 0.1$, while $h = 0$ under $H_0$ and $h = 1$ under $H_1$.
Fig. 7. Maximized information for Example 1, where the optimized thresholds are in Fig. 6, and where "local" and "global" are as defined there.
We consider two cases of quantizers optimal under the Bhattacharyya criterion. First, we maximize the Bhattacharyya distance of $U_i$, the (quantized) output of each sensor; this is equivalent to the assumption that the sensor observations are independent with the density given by Eq. (28). Second, we maximize the Bhattacharyya distance for $U$; that is, for the ensemble input to the fusion center. In Fig. 9, we plot the logarithm of the local likelihood ratio and the quantizer that results from the first maximization, and we indicate the thresholds that are used to quantize the data for the latter maximization. As expected, the independence assumption results in a quantization of the local likelihood ratio as specified by Eq. (28). The fact that inclusion of the dependence results in a quantization of the data $X_i$ is not surprising; a large value of $X_i$ is indicative of jamming at the $i$th sensor, which in turn implies jamming at all sensors. Note that due to dependence, the quantizer levels in the second case are meaningless and are not shown. The ROC curves under both assumptions are shown in Fig. 10. Since the dependence assumption more accurately models the problem, the resulting ROC dominates that found when independence is assumed.
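The per-sensor (first) optimization works with the univariate mixture of Eq. (28). The Bhattacharyya distance between the two marginal densities can be computed by direct numerical integration; the sketch below is our own check (trapezoidal integration on a truncated range, with loosely chosen tolerances), confirming that a single sensor separates the hypotheses only weakly here.

```python
import math

def f_R(x, s2):
    """Rayleigh density with parameter s2, as in Example 3."""
    return x / s2 * math.exp(-x * x / (2.0 * s2)) if x >= 0 else 0.0

def f_mix(x, h, eps=0.1):
    """Eq. (28): sensor jammed with probability eps = 0.1."""
    return (1 - eps) * f_R(x, 1.0 + h) + eps * f_R(x, 11.0 + h)

def trapz(g, a, b, n):
    """Simple trapezoidal rule on [a, b] with n panels."""
    step = (b - a) / n
    return step * (0.5 * g(a) + 0.5 * g(b) +
                   sum(g(a + i * step) for i in range(1, n)))

# Both mixtures integrate to (essentially) 1 on the truncated range...
assert abs(trapz(lambda x: f_mix(x, 0.0), 0.0, 40.0, 20000) - 1.0) < 1e-6
# ...and B = -log ∫ sqrt(f0 f1) dx is positive but small: the marginal
# densities overlap heavily, so one sensor alone is a weak discriminator.
rho = trapz(lambda x: math.sqrt(f_mix(x, 0.0) * f_mix(x, 1.0)), 0.0, 40.0, 20000)
B = -math.log(rho)
assert 0.0 < B < 0.5
```
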
Fig. 8. Quantizers optimized under the Neyman–Pearson ($\alpha = 0.01$) and J-divergence criteria, for the additive signal in Cauchy noise situation of Example 2.
Example 4. Suppose that three sensors ($N = 3$) observe mutually independent data and that the likelihood ratio of the data has density

$$f_{H_0}(x) = \begin{cases} \dfrac{c}{1 + [k(x - \frac{1}{2})]^2} + \dfrac{c}{1 + [k(x - \frac{3}{2})]^2} & 0 \le x \le 2 \\ 0 & \text{else} \end{cases}$$

under $H_0$, and $x f_{H_0}(x)$ under $H_1$. The constant $c$ is chosen to normalize the density, and $k = 100$. Each sensor uses binary quantization ($M_i = 2$).
This example is discussed in [28, 29]. It can be shown that if identical thresholds are used at all of the sensors, the optimal test (given the identical threshold constraint) will require a randomized fusion rule. Such tests were posited to be sub-optimal in [29], although an excellent recent article [30] indicates that this is true over a more restricted class than was previously believed. In any case, when the constraint is removed, randomization is no longer necessary. The optimal thresholds for the unconstrained case are shown in Fig. 11 as a function of $\alpha$. In the figure, the partitions labeled AND, OR, and MAJORITY refer to the optimal fusion rule logic for the test levels within the partitions. Note that the three thresholds are not in general equal.
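The density of Example 4 can be examined directly. Since $X$ is itself the likelihood ratio, its $H_0$ density must satisfy $E_{H_0}[X] = 1$ (so that $x f_{H_0}(x)$ is a valid $H_1$ density); by symmetry about $x = 1$ that holds here automatically. A numerical check of this, our own sketch (note the sharp peaks at $1/2$ and $3/2$ require a fine grid):

```python
def g(x, k=100.0):
    """Unnormalized bimodal H0 density of Example 4."""
    return (1.0 / (1.0 + (k * (x - 0.5)) ** 2)
            + 1.0 / (1.0 + (k * (x - 1.5)) ** 2))

n = 200000
step = 2.0 / n
xs = [i * step for i in range(n + 1)]
w = [step * (0.5 if i in (0, n) else 1.0) for i in range(n + 1)]  # trapezoid

mass = sum(wi * g(x) for wi, x in zip(w, xs))
c = 1.0 / mass                                 # normalizing constant
mean = sum(wi * x * c * g(x) for wi, x in zip(w, xs))

assert abs(mean - 1.0) < 1e-6      # E_H0[X] = 1, so x f_H0(x) is a density
assert g(0.5) > 100 * g(1.0)       # strongly bimodal ("peaky") shape
```
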
Fig. 9. Quantizers optimized under the Bhattacharyya criterion, for the dependent situation of Example 3. Under an (incorrect) assumption of independence the result is a likelihood ratio quantization; correctly assuming dependence results in a direct data quantization.
Intuition suggests that if the sensors are quantizing i.i.d. random variables, then the quantizers used should be identical; this, however, is not always true. It should also be noted that for any prior probabilities, the operating points for a system optimized with respect to either global mutual information or Bayes cost must lie on the ROC curve for Neyman–Pearson optimal detectors. Hence, for i.i.d. sensor observations and identical channel constraints, the optimal detectors for the Bayes and mutual information criteria may use different sensor quantizers. From this somewhat pathological example we draw some conclusions in Section 7.
7. Identical quantization for identical sensors
In Example 4 above, independent and identically distributed sensor data with local likelihood ratio densities differentiable to arbitrary order give rise, optimally, to different sensor mapping functions [29]. When does this happen, and when can we reasonably ignore the possibility? Certainly it is known to occur when sensor likelihood ratios contain point masses of probability; yet Example 4 does not contain point masses.
Fig. 10. Performance of quantizers for the dependent situation of Example 3. The thresholding scheme and an explanation of terms are in Fig. 9.
Here we provide some answer. It will turn out that there is a threshold in terms of the point-mass behavior of the likelihood ratio densities; once these densities are sufficiently "peaky", non-identical sensor quantization functions ought to be used.
7.1. The necessary conditions
With the local decision thresholds equal it is clear that the fusion rule must be of the "k-out-of-n" variety; that is,

$$D = \begin{cases} H_1 & \displaystyle\sum_{i=1}^{n} U_i \ge k \\ H_0 & \displaystyle\sum_{i=1}^{n} U_i < k \end{cases} \tag{29}$$
This is a familiar form, particularly for $k = 1$ (the OR rule), for $k = n$ (the AND rule), and for $k = (n+1)/2$ (majority logic). If an optimized system is to use identical local decision thresholds, then certainly optimizing under the constraint of a k-out-of-n fusion rule must again result in identical thresholds. Let us check this.
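For identical local thresholds with per-sensor exceedance probabilities $a$ (under $H_0$) and $b$ (under $H_1$), the system-level rates under the rule of Eq. (29) are binomial tails. A small helper, our own sketch with arbitrary example numbers, which reduces to the familiar OR and AND rules at $k = 1$ and $k = n$:

```python
from math import comb

def tail(p, n, k):
    """Pr(at least k of n independent local decisions fire),
    when each fires with probability p."""
    return sum(comb(n, l) * p**l * (1 - p) ** (n - l) for l in range(k, n + 1))

a_local, b_local, n = 0.1, 0.6, 5     # hypothetical per-sensor rates

# OR rule (k = 1) and AND rule (k = n) as special cases:
assert abs(tail(a_local, n, 1) - (1 - (1 - a_local) ** n)) < 1e-12
assert abs(tail(a_local, n, n) - a_local ** n) < 1e-12

# System-level operating point for majority logic, k = 3:
alpha, beta = tail(a_local, n, 3), tail(b_local, n, 3)
assert beta > alpha        # detection beats false alarm when b > a
```
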
Fig. 11. Thresholds for the "pathological" case of Example 4. Here the sensors are independent (conditioned on the hypothesis) and identical, yet the three sensors employ different thresholds. The fusion rules which happen to be optimal are also given.
Following standard optimization theory, we shall minimize the function

$$C(t_1, \ldots, t_n) \triangleq 1 - \beta + \lambda (\alpha - \alpha_d) \tag{30}$$

subject to the constraint $\alpha = \alpha_d$. We shall assume for now that $1 < k < n$. Writing $\beta$ in terms of $t_i$ we get
$$\beta = F_1(t_i)\, \Pr\!\left( \sum_{j=1,\, j \ne i}^{n} U_j \ge k \,\Big|\, H_1 \right) + (1 - F_1(t_i))\, \Pr\!\left( \sum_{j=1,\, j \ne i}^{n} U_j \ge k - 1 \,\Big|\, H_1 \right) \tag{31}$$

or

$$\beta = \Pr\!\left( \sum_{j=1,\, j \ne i}^{n} U_j \ge k \,\Big|\, H_1 \right) + (1 - F_1(t_i))\, \Pr\!\left( \sum_{j=1,\, j \ne i}^{n} U_j = k - 1 \,\Big|\, H_1 \right) \tag{32}$$
with a similar expression for $\alpha$, and in which we have defined

$$F_l(t) \triangleq \Pr(L_i \le t \mid H_l) \tag{33}$$

for $l = 0, 1$. The first-order necessary conditions for a minimum are that the gradient of $C$ with respect to the thresholds is zero, or that
f1(ti)PrA
n+
j/1,jEi
ºj"k!1DH
1B"j f0(ti)PrA
n+
j/1,jEi
ºj"k!1DH
0B (34)
for all $i \in \{1,\ldots,n\}$, and where $f_l(t) \triangleq (d/dt)F_l(t)$ are the likelihood-ratio densities. It is clear that Eq. (34) is satisfied for any set of identical thresholds; hence that value $t^*$ which satisfies the false-alarm constraint is at least a critical point of the cost function. Evaluating the probabilities explicitly we have
$$
\lambda^* = \frac{f_1(t^*)}{f_0(t^*)} \left(\frac{1-F_1(t^*)}{1-F_0(t^*)}\right)^{k-1} \left(\frac{F_1(t^*)}{F_0(t^*)}\right)^{n-k} \tag{35}
$$
corresponding to $t_1 = \cdots = t_n = t^*$.
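The conditioning step behind Eq. (31), which writes the system detection probability by conditioning on sensor $i$'s decision, can be spot-checked by exact enumeration. This is a hypothetical numerical sketch; the per-sensor detection probabilities below are illustrative values, not taken from the paper:

```python
from itertools import product

def pmf_count(ps):
    """Exact pmf of the number of 1s among independent Bernoulli(p_j)
    local decisions (Poisson-binomial distribution, by enumeration)."""
    n = len(ps)
    pmf = [0.0] * (n + 1)
    for bits in product([0, 1], repeat=n):
        pr = 1.0
        for b, p in zip(bits, ps):
            pr *= p if b else 1.0 - p
        pmf[sum(bits)] += pr
    return pmf

def tail(ps, k):
    """Pr(sum of local decisions >= k)."""
    return sum(pmf_count(ps)[max(k, 0):])

# p_j = Pr(U_j = 1 | H1) = 1 - F_1(t_j); arbitrary illustrative values.
ps, k, i = [0.7, 0.55, 0.9, 0.6], 3, 0
others = ps[:i] + ps[i + 1:]

beta_direct = tail(ps, k)  # Pr(k-out-of-n rule declares H1 | H1)
# Eq. (31): condition on U_i, with F_1(t_i) = 1 - ps[i] = Pr(U_i = 0 | H1).
beta_cond = (1 - ps[i]) * tail(others, k) + ps[i] * tail(others, k - 1)

print(abs(beta_direct - beta_cond) < 1e-12)  # True: the two forms agree
```
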
Now let us express $\beta$ in terms of two thresholds $t_i$ and $t_j$. We have
$$
\begin{aligned}
\beta ={}& (1-F_1(t_i))(1-F_1(t_j)) \sum_{l=k-2}^{n-2} \binom{n-2}{l} (1-F_1(t^*))^l F_1(t^*)^{n-l-2} \\
&+ (1-F_1(t_i))\,F_1(t_j) \sum_{l=k-1}^{n-2} \binom{n-2}{l} (1-F_1(t^*))^l F_1(t^*)^{n-l-2} \\
&+ F_1(t_i)\,(1-F_1(t_j)) \sum_{l=k-1}^{n-2} \binom{n-2}{l} (1-F_1(t^*))^l F_1(t^*)^{n-l-2} \\
&+ F_1(t_i)\,F_1(t_j) \sum_{l=k}^{n-2} \binom{n-2}{l} (1-F_1(t^*))^l F_1(t^*)^{n-l-2}
\end{aligned} \tag{36}
$$
or
$$
\begin{aligned}
\beta ={}& F_1(t_i)\,F_1(t_j) \binom{n-2}{k-2} (1-F_1(t^*))^{k-2} F_1(t^*)^{n-k} \\
&- (F_1(t_i)+F_1(t_j)) \binom{n-2}{k-2} (1-F_1(t^*))^{k-2} F_1(t^*)^{n-k} \\
&- F_1(t_i)\,F_1(t_j) \binom{n-2}{k-1} (1-F_1(t^*))^{k-1} F_1(t^*)^{n-k-1} \\
&+ \sum_{l=k-2}^{n-2} \binom{n-2}{l} (1-F_1(t^*))^l F_1(t^*)^{n-l-2}
\end{aligned} \tag{37}
$$
Taking the derivative with respect to $t_i$ we get
$$
\begin{aligned}
\frac{\partial \beta}{\partial t_i} ={}& -f_1(t_i)\,(1-F_1(t_j)) \binom{n-2}{k-2} (1-F_1(t^*))^{k-2} F_1(t^*)^{n-k} \\
&- f_1(t_i)\,F_1(t_j) \binom{n-2}{k-1} (1-F_1(t^*))^{k-1} F_1(t^*)^{n-k-1}
\end{aligned} \tag{38}
$$
from which we get the second derivatives
$$
\frac{\partial^2 \beta}{\partial t_i^2} = -\dot f_1(t^*)\,(1-F_1(t^*))^{k-1} F_1(t^*)^{n-k} \left[\binom{n-2}{k-2} + \binom{n-2}{k-1}\right] \tag{39}
$$
and
$$
\begin{aligned}
\frac{\partial^2 \beta}{\partial t_i\,\partial t_j} ={}& f_1(t^*)^2 \binom{n-2}{k-2} (1-F_1(t^*))^{k-2} F_1(t^*)^{n-k} \\
&- f_1(t^*)^2 \binom{n-2}{k-1} (1-F_1(t^*))^{k-1} F_1(t^*)^{n-k-1}
\end{aligned} \tag{40}
$$
The first and second derivatives of $\alpha$ are similar.

A necessary (and sufficient) condition for a set of thresholds to be a local minimum of the cost function $C$ is that for any vector $y$ such that $(\nabla\alpha)^{\mathrm T} y = 0$ we have $y^{\mathrm T} L y > 0$, where $L$ is the Hessian matrix of $C$ [31]. Since with all thresholds equal we have $\nabla\alpha = (1,1,\ldots,1)^{\mathrm T}$, and since all off-diagonal terms of $L$ are the same, this is equivalent to requiring
$$
0 < -\frac{\partial^2 \beta}{\partial t_i^2} + \lambda\frac{\partial^2 \alpha}{\partial t_i^2} + \frac{\partial^2 \beta}{\partial t_i\,\partial t_j} - \lambda\frac{\partial^2 \alpha}{\partial t_i\,\partial t_j} \tag{41}
$$
or
$$
\begin{aligned}
0 < {}&\left[(n-k)\frac{d}{dt^*}\ln\!\left(\frac{f_1(t^*)/f_0(t^*)}{F_1(t^*)/F_0(t^*)}\right) + (k-1)\frac{d}{dt^*}\ln\!\left(\frac{f_1(t^*)/f_0(t^*)}{(1-F_1(t^*))/(1-F_0(t^*))}\right)\right] \\
&\times \frac{(n-2)!}{(n-k)!\,(k-1)!}\, f_1(t^*)\,(1-F_1(t^*))^{k-1} F_1(t^*)^{n-k}
\end{aligned} \tag{42}
$$
Thus a necessary condition for $t_1 = \cdots = t_n = t^*$ is that
$$
(n-k)\frac{d}{dt^*}\ln\!\left(\frac{f_1(t^*)/f_0(t^*)}{F_1(t^*)/F_0(t^*)}\right) + (k-1)\frac{d}{dt^*}\ln\!\left(\frac{f_1(t^*)/f_0(t^*)}{(1-F_1(t^*))/(1-F_0(t^*))}\right) > 0 \tag{43}
$$
While this has been proven only for $1<k<n$, it can be shown via a parallel development that Eq. (43) is also a necessary condition for the and ($k=n$) and or ($k=1$) fusion rules. It is interesting that similar ROC slope conditions have been used as predictors of pathology [32], in which it is shown that under certain such conditions the probability of error does not go to zero as the number of sensors grows without bound.
7.2. Interpretation
Let us recapitulate the results of the previous section. We have noted that if an identical threshold is to be used at each sensor, then a k-out-of-n fusion rule must be
employed. With such thresholds and fusion rule the first-order necessary conditions for optimality are indeed satisfied; in addition we have presented (second-order) conditions necessary for optimality, and sufficient for the cost function $C$ to be at least a local minimum.
Consider the functions
$$
g_k(x) = \sum_{l=k}^{n} \binom{n}{l} x^l (1-x)^{n-l} \tag{44}
$$
for which $g_k(0)=0$, $g_k(1)=1$, and $\dot g_k(x) > 0$ for $0 \le x \le 1$. With $\beta_0(\alpha_0)$ representing the receiver operating characteristic (ROC) of an individual sensor, a system optimal under the assumption of identical thresholds uses
$$
k^* = \arg\max_k \left\{ g_k\!\left(\beta_0\!\left(g_k^{-1}(\alpha_d)\right)\right) \right\} \tag{45}
$$
for its fusion rule. If the operating point on the individual-sensor ROC corresponding to the design false-alarm rate $\alpha_d$ does not satisfy Eq. (43), then an optimal design does not use identical thresholds. If on the other hand Eq. (43) is satisfied, then the design produces at least a local minimum of the cost function $C$; unfortunately there do not appear to be any general statements one can make about convexity, and there may still be a better design using non-identical thresholds, or even with a different (not k-out-of-n) fusion rule.
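The selection of $k^*$ in Eq. (45) can be sketched numerically. The code below is a hypothetical illustration: it assumes the known-signal-in-Gaussian-noise model with mean shift `d` (so that the single-sensor ROC is $\beta_0(\alpha) = Q(Q^{-1}(\alpha)-d)$, with $Q$ the Gaussian upper-tail probability), and the helper names `Q`, `Qinv`, `g`, `g_inv`, and `best_k` are ours, not the paper's:

```python
import math

def Q(x):
    """Gaussian upper-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Invert Q by bisection (Q is strictly decreasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g(k, n, x):
    """Binomial tail g_k(x) of Eq. (44): system 'hit' probability of a
    k-out-of-n rule when each sensor fires independently w.p. x."""
    return sum(math.comb(n, l) * x**l * (1 - x)**(n - l)
               for l in range(k, n + 1))

def g_inv(k, n, y):
    """Invert g_k by bisection (g_k is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(k, n, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta0(alpha, d):
    """Assumed single-sensor ROC, Gaussian shift d: Q(Q^{-1}(alpha) - d)."""
    return Q(Qinv(alpha) - d)

def best_k(n, alpha_d, d):
    """Eq. (45): k* = argmax_k g_k(beta_0(g_k^{-1}(alpha_d)))."""
    return max(range(1, n + 1),
               key=lambda k: g(k, n, beta0(g_inv(k, n, alpha_d), d)))

print(best_k(n=5, alpha_d=0.01, d=1.0))
```

Each candidate $k$ is evaluated at the per-sensor false-alarm rate $g_k^{-1}(\alpha_d)$ that meets the system constraint, exactly as Eq. (45) prescribes.
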
The and and or fusion rules are exceptional. In these cases Eq. (34) may be written as
$$
\frac{f_1(t_i)}{1-F_1(t_i)} \prod_{j=1}^{n} (1-F_1(t_j)) = \lambda\, \frac{f_0(t_i)}{1-F_0(t_i)} \prod_{j=1}^{n} (1-F_0(t_j)) \tag{46}
$$
and
$$
\frac{f_1(t_i)}{F_1(t_i)} \prod_{j=1}^{n} F_1(t_j) = \lambda\, \frac{f_0(t_i)}{F_0(t_i)} \prod_{j=1}^{n} F_0(t_j) \tag{47}
$$
respectively, for which it is clear that Eq. (43) guarantees unique solutions. In fact the and and or rules, since they represent inequality (43) in its purest forms, offer the most straightforward interpretation. The necessary conditions for the and rule may be written in terms of the individual-sensor ROC as
$$
\frac{d}{d\alpha}\ln\!\left(\frac{d\beta}{d\alpha}\right) < \frac{d}{d\alpha}\ln\!\left(\frac{\beta}{\alpha}\right) \tag{48}
$$
while for the or rule we have
$$
\frac{d}{d\alpha}\ln\!\left(\frac{d\beta}{d\alpha}\right) < \frac{d}{d\alpha}\ln\!\left(\frac{1-\beta}{1-\alpha}\right) \tag{49}
$$
Geometrically the left sides of Eqs. (48) and (49) represent the relative rate of change of the slope of the individual-sensor ROC at a given operating point. The right sides represent the relative rates of change of the slopes of secants drawn between the operating point and (0, 0) and (1, 1), respectively; this is shown in Fig. 12. Here the
Fig. 12. Interpretation of second-order necessary conditions.
slope of a is $d\beta/d\alpha$, that of b is $\beta/\alpha$, and that of c is $(1-\beta)/(1-\alpha)$. Since all quantities are negative, we see that the second-order necessary conditions will be violated for a sufficiently flat (in the sense of constant slope) portion of the ROC; this is indicative of a large (but not necessarily a point) mass of probability.
Example 5. For the known-signal-in-Gaussian-noise hypothesis-testing problem, inequalities (48) and (49) are always satisfied.
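This claim can be spot-checked numerically. The sketch below (our own illustration, with helper names `Q`, `Qinv`, `beta`, `deriv`, and `check` and a mean shift `d = 1.0` all assumed) evaluates both sides of Eqs. (48) and (49) by finite differences along the Gaussian ROC $\beta(\alpha) = Q(Q^{-1}(\alpha)-1)$:

```python
import math

def Q(x):
    """Gaussian upper-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Invert Q by bisection (Q is strictly decreasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta(a, d=1.0):
    """Gaussian single-sensor ROC with assumed mean shift d."""
    return Q(Qinv(a) - d)

def deriv(f, x, h=1e-5):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def check(a, d=1.0):
    """Do inequalities (48) (and rule) and (49) (or rule) hold at alpha=a?"""
    lhs = deriv(lambda x: math.log(deriv(lambda y: beta(y, d), x)), a)
    rhs_and = deriv(lambda x: math.log(beta(x, d) / x), a)
    rhs_or = deriv(lambda x: math.log((1 - beta(x, d)) / (1 - x)), a)
    return lhs < rhs_and and lhs < rhs_or

print(all(check(a) for a in [0.01, 0.05, 0.1, 0.3, 0.5, 0.8]))  # True
```
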
Example 6. For the single-sensor ROC described by $\beta_0 = \alpha_0^{\,s}$ ($0<s<1$), which may arise from an exponential-density hypothesis-testing problem, the left and right sides of Eq. (48) are equal. In fact, the performance of such a test using an n-sensor and rule is completely insensitive to the thresholds used, and is actually equal to the single-sensor performance. Not surprisingly the and rule is never optimal, and Eq. (43) is satisfied for any $k>1$.
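The threshold insensitivity is easy to see in code: for independent sensors under an and rule, the system false alarm is the product of the per-sensor $\alpha_j$ and, on the ROC $\beta_0=\alpha^s$, the system detection is the product of $\alpha_j^s$, which depends only on the product of the $\alpha_j$. A small sketch (our illustration; the values of `s`, `a_sys`, and the threshold splits are arbitrary):

```python
def and_rule_perf(alphas, s):
    """System (alpha, beta) of an n-sensor 'and' rule with independent
    sensors, each on the assumed ROC beta_0 = alpha^s."""
    a = b = 1.0
    for aj in alphas:
        a *= aj          # system false alarm: product of per-sensor alphas
        b *= aj ** s     # system detection: product of per-sensor betas
    return a, b

s = 0.4
a_sys = 1e-3
cfg1 = [a_sys, 1.0, 1.0]        # one sensor carries the whole constraint
cfg2 = [0.2, a_sys / 0.2, 1.0]  # constraint spread over two sensors
print(and_rule_perf(cfg1, s))
print(and_rule_perf(cfg2, s))
# Both configurations give system alpha = 1e-3 and beta = (1e-3)**s:
# the and rule matches single-sensor performance at the same alpha,
# regardless of how the thresholds are assigned.
```
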
Example 7 (Same as Example 4). The null-hypothesis density $f_0$ is shown in Fig. 13 for $k=10$ and $k=100$. It was observed that for $k=10$ the thresholds optimally should be identical, while they may differ for $k=100$, the latter case more closely resembling point masses. The actual thresholds for $k=100$ and three sensors are shown in
Fig. 13. Likelihood-ratio density functions for the "pathological" Example 4, for different k parameters.
Fig. 11. It turns out that Eq. (43) begins to fail for some $t$ when $k>11$. For example, if an and rule is to be used, only the second term in Eq. (43) must be checked, since $k=n$. This logarithm (that is, without the derivative) is plotted in Fig. 14. It is clear that with $k=10$ the slope is always positive, and there is no doubt that sensor quantizations should be identical; yet with $k=100$ this is not true, and the behavior of Fig. 11 results. The same conclusion can be drawn directly from the ROC of Fig. 15 using Eq. (48) instead of Eq. (43).
In summary, there is, for the binary quantization case, an easy-to-check necessary condition, given in terms of the local ROC, for the sensor mapping functions to be identical. It turns out that the condition is violated when the local likelihood-ratio densities become "point-mass like" in that their probability is sharply concentrated; they need not lose their continuity and differentiability, however. The condition was derived in the context of Neyman–Pearson detection systems, but the proof is valid for Bayesian schemes as well. The necessary condition is known to be sufficient only for and and or fusion rules. Excellent further results on the relative performance of identical versus non-identical quantization are available in [12]; generally the difference is not great.
Fig. 14. The version of Eq. (43) appropriate for Example 7 (same as Example 4) and an and rule. Note that with $k=10$ this curve is monotone (the derivative is always positive, and hence Eq. (43) is always satisfied), while for $k=100$ this is not so.
8. Summary
In this paper, we have shown that under many performance criteria the optimal decentralized detection system uses a likelihood-ratio partition of the sensor observations. The criteria studied were Neyman–Pearson, Bayes, mutual information, and the Ali–Silvey distances. Unlike past work on the subject, we did not assume that the bandlimited channels between the sensors and the fusion center restrict communication to one bit per decision, nor did we assume that the sensor decision rules were identical. In all cases, we have shown that the sensors optimally compute and quantize the local likelihood ratio and transmit the quantizer level to the fusion center. We also show that the optimal decentralized detector for global mutual information uses no randomization at the fusion center.
Conditions were given for the use of identical sensor decision rules for certain Ali–Silvey distances and for the local mutual information. It was also shown that even with i.i.d. sensor observations and equal communication-channel constraints, the sensor decision rules for the Neyman–Pearson, Bayes, and global mutual information criteria may vary between sensors. Examples were given illustrating the effects of jamming (and hence dependence), the relationship between local and global mutual
Fig. 15. The single-sensor ROC for Example 7 (same as Example 4), for use with Eq. (48) to determine whether the sensors use identical quantizations. The same conclusions can be drawn as in the alternative approach of Fig. 14: when $k=100$, this ROC is sufficiently flat to violate Eq. (48).
information, and situations in which identical sensor decision rules will and will not be optimal.

As this last example is, we hope, intriguing, we devote some effort to its explication. Specifically, we are able to derive necessary (and sometimes sufficient) conditions for identical sensors optimally to use identical quantizers. These conditions amount to a check on the nearness of a likelihood-ratio density to exhibition of point-mass behavior.
References
[1] R.R. Tenney, N.R. Sandell Jr., Detection with distributed sensors, IEEE Trans. Aerospace Electron. Systems AES-17 (1981) 501–510.
[2] Z. Chair, P.K. Varshney, Optimal data fusion in multiple sensor detection systems, IEEE Trans. Aerospace Electron. Systems AES-22 (1986) 98–101.
[3] A.R. Reibman, L.W. Nolte, Optimal detection and performance of distributed sensor systems, IEEE Trans. Aerospace Electron. Systems AES-23 (1987) 24–30.
[4] I.Y. Hoballah, P.K. Varshney, Distributed Bayesian signal detection, IEEE Trans. Inform. Theory IT-35 (1989) 995–1000.
[5] I.Y. Hoballah, P.K. Varshney, An information theoretic approach to the distributed detection problem, IEEE Trans. Inform. Theory IT-35 (1989) 988–994.
[6] D.J. Warren, P.K. Willett, Optimal decentralized detection for conditionally independent sensors, Proc. 1989 Amer. Control Conf., vol. 2, Pittsburgh, June 1989, pp. 1326–1329.
[7] J.N. Tsitsiklis, Decentralized detection, in: H.V. Poor, J.B. Thomas (Eds.), Advances in Statistical Signal Processing, vol. 2: Signal Detection, JAI Press, Greenwich, CT, 1990.
[8] J.N. Tsitsiklis, Extremal properties of likelihood ratio quantizers, Tech. Report LIDS-P-1923, Laboratory for Information and Decision Sciences, Massachusetts Institute of Technology, Cambridge, MA, November 1989.
[9] S. Thomopoulos, R. Viswanathan, D. Bougoulias, Optimal distributed decision fusion, IEEE Trans. Aerospace Electron. Systems 25 (5) (1989) 761–765.
[10] R. Viswanathan, A. Ansari, S. Thomopoulos, Optimal partitioning of observations in distributed detection, Proc. Int. Symp. on Information Theory, 1988, p. 197.
[11] R. Blum, Quantization in multi-sensor random signal detection, IEEE Trans. Inform. Theory 41 (1) (1995) 204–215.
[12] P.-N. Chen, A. Papamarcou, New asymptotic results in parallel distributed detection, IEEE Trans. Inform. Theory 39 (6) (1993) 1847–1863.
[13] R. Blum, Necessary conditions for optimum distributed detectors under the Neyman–Pearson criterion, IEEE Trans. Inform. Theory 42 (3) (1996) 990–994.
[14] H.L. Van Trees, Detection, Estimation, and Modulation Theory, vol. 1, Wiley, New York, 1968.
[15] A. Martinez, Detector optimization using Shannon's information, Proc. 20th Annual Conf. Information Sciences and Systems, Princeton University, Princeton, NJ, March 1986.
[16] R. Blahut, Principles and Practice of Information Theory, Addison-Wesley, Reading, MA, 1987.
[17] S.D. Silvey, Statistical Inference, Chapman & Hall, London, 1974.
[18] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer, New York, 1988.
[19] E.L. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1986.
[20] S.M. Ali, S.D. Silvey, A general class of coefficients of divergence of one distribution from another, J. Roy. Stat. Soc., Ser. B 28 (1966) 131–142.
[21] H. Kobayashi, J.B. Thomas, Distance measures and related criteria, Proc. 5th Annual Allerton Conf. Circuit and System Theory, Monticello, IL, October 1967, pp. 491–500.
[22] H.V. Poor, Robust decision design using a distance criterion, IEEE Trans. Inform. Theory IT-26 (1980) 575–587.
[23] T.L. Grettenberg, Signal selection in communication and radar systems, IEEE Trans. Inform. Theory IT-9 (1963) 265–275.
[24] T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol. COM-15 (1967) 52–60.
[25] H.V. Poor, J.B. Thomas, Applications of Ali–Silvey distance measures in the design of generalized quantizers for binary decision systems, IEEE Trans. Commun. COM-25 (1977) 893–900.
[26] S. Kassam, Signal Detection in Non-Gaussian Noise, Springer, Berlin, 1987.
[27] D. Kazakos, V. Vannicola, M. Wicks, On optimal quantization for detection, Proc. 1989 Conf. on Information Sciences and Systems, March 1989.
[28] P. Willett, D. Warren, Randomization and scheduling in decentralized detection, Proc. 27th Annual Allerton Conf. on Communications, Control, and Computing, September 1989.
[29] P. Willett, D. Warren, The suboptimality of randomized tests in quantized detection systems, IEEE Trans. Inform. Theory 39 (2) (1992) 355–361.
[30] Y. Han, T. Kim, Randomized fusion rules can be optimal in distributed Neyman–Pearson detectors, IEEE Trans. Inform. Theory 43 (4) (1997) 1281–1288.
[31] D. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, MA, 1984.
[32] R. Viswanathan, V. Aalo, On counting rules in distributed detection, IEEE Trans. Acoust. Speech Signal Process. 37 (5) (1989) 772–775.