ABSTRACT: We examine how to determine the number of states of a hidden variable when learning probabilistic models. This problem is crucial for learning compact models and complements our earlier work on discovering hidden variables. We describe an approach that utilizes score-based agglomerative state clustering, which allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We extend our procedure to handle several interacting hidden variables. We demonstrate the effectiveness of this approach on several synthetic and real-life data sets, and show that it learns models with hidden variables that generalize better and have better structure than previous approaches.
Learning the Dimensionality of Hidden Variables
Why is dimensionality important?
• Representation: the minimal I-map (the minimal structure that implies only independencies holding in the marginal distribution) is typically complex, so modeling the hidden variable explicitly gives a compact representation without introducing new independencies
• Improved learning: models with fewer parameters can be learned faster and more robustly
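As a rough illustration of the compactness argument, the sketch below counts free CPT parameters for the poster's X1-X3 → H → Y1-Y3 structure with binary observables; the function names and the root priors are assumptions, not part of the poster.

```python
# Hypothetical parameter counts for the example structure
# X1..X3 -> H -> Y1..Y3, all observables binary, H with k states.

def params_with_hidden(k, n_parents=3, n_children=3):
    """Free CPT parameters when H (k states) mediates parents and children."""
    roots = n_parents                          # each binary root: 1 free parameter
    h_cpt = (2 ** n_parents) * (k - 1)         # P(H | X1..X3)
    child_cpts = n_children * k                # P(Yj | H): 1 free parameter per state
    return roots + h_cpt + child_cpts

def params_marginalized(n_parents=3, n_children=3):
    """Lower bound after marginalizing H out: each child depends on all the
    parents (the minimal I-map also couples the children; ignored here)."""
    return n_parents + n_children * (2 ** n_parents)

print(params_with_hidden(2))    # 17 parameters with a binary hidden variable
print(params_marginalized())    # at least 27 parameters without it
```

Even this lower bound on the marginalized model exceeds the hidden-variable model, and the gap widens as more children are added.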
Learning: Structural EM
[Diagram: training data combined with a current candidate network over X1, X2, X3, H, Y1, Y2, Y3]
• E-Step: computation of expected counts N(X1), N(X2), N(X3), N(H, X1, X2, X3), ...
• M-Step: score and parameterize candidate networks
• Re-iterate with the best candidate
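The E-step above can be sketched as follows; this is a minimal sketch in which the data layout and the `posterior` callback are assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Accumulate expected counts for a hidden variable H by weighting each
# training instance with the posterior P(H = h | instance) under the
# current model.
def expected_counts(data, posterior, n_states):
    """data: list of dicts of observed values;
    posterior(h, row) -> P(H = h | row) under the current model."""
    counts = defaultdict(float)
    for row in data:
        for h in range(n_states):
            w = posterior(h, row)
            # expected count N(H, X1, X2, X3), as in the slide
            counts[(h, row["X1"], row["X2"], row["X3"])] += w
    return counts
```

The M-step then scores and parameterizes candidates from these expected counts exactly as it would from complete-data counts.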
What is a Bayesian Network?
A Bayesian network represents a joint probability distribution over a set of random variables using a DAG:
[Diagram: the Asia network: Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis, Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest, Bronchitis → Dyspnea]
P(D | A, B) = 0.8
P(D | ¬A, B) = 0.1
P(D | A, ¬B) = 0.1
P(D | ¬A, ¬B) = 0.01
P(X1, …, Xn) = P(V) P(S) P(T | V) … P(X | A) P(D | A, B)
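For a small fragment, the factorization can be made concrete. The sketch below covers only (A = Abnormality in Chest, B = Bronchitis, D = Dyspnea), treating A and B as roots for brevity; only the Dyspnea CPT values come from the poster, and the priors on A and B are illustrative placeholders.

```python
# Joint over (A, B, D) factored as P(A) P(B) P(D | A, B).
P_A = {True: 0.1, False: 0.9}    # placeholder prior
P_B = {True: 0.3, False: 0.7}    # placeholder prior
P_D_given = {(True, True): 0.8, (False, True): 0.1,
             (True, False): 0.1, (False, False): 0.01}   # from the poster

def joint(a, b, d):
    p_d = P_D_given[(a, b)] if d else 1.0 - P_D_given[(a, b)]
    return P_A[a] * P_B[b] * p_d

# The factorization defines a proper distribution: all 8 entries sum to 1.
total = sum(joint(a, b, d) for a in (True, False)
            for b in (True, False) for d in (True, False))
```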
Bayesian scoring metric:
Score(G : D) = log P(D | G) + log P(G)
log P(D | G) = Σ_i FamScore(X_i, Pa(X_i) : D)
FamScore(X, U : D) = Σ_u [ log ( Γ(α[u]) / Γ(α[u] + N[u]) ) + Σ_x log ( Γ(α[x,u] + N[x,u]) / Γ(α[x,u]) ) ]
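The family-score term can be sketched as follows, assuming BDe-style uniform hyperparameters (`alpha` split evenly over the table); the function name and count layout are assumptions.

```python
from math import lgamma

def fam_score(counts, n_x, n_u, alpha=1.0):
    """FamScore(X, U : D) with counts[(x, u)] = N[x, u], n_x states of X,
    n_u parent configurations, and uniform prior alpha / (n_x * n_u)."""
    a_xu = alpha / (n_x * n_u)
    score = 0.0
    for u in range(n_u):
        n_u_total = sum(counts.get((x, u), 0) for x in range(n_x))
        # log Gamma(alpha_u) / Gamma(alpha_u + N[u])
        score += lgamma(n_x * a_xu) - lgamma(n_x * a_xu + n_u_total)
        for x in range(n_x):
            # log Gamma(alpha_xu + N[x, u]) / Gamma(alpha_xu)
            score += lgamma(a_xu + counts.get((x, u), 0)) - lgamma(a_xu)
    return score
```

With no data the score is 0, and a single observation of a binary family with no parents scores log 1/2, matching the marginal likelihood of that observation.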
Single Hidden Variable
h ∈ {1, 2, …, n}  →  h ∈ {1, 2, …, n-1}
[Diagram: merging two states of H; H has parents X1, X2, X3 and children Y1, Y2, Y3]
Choosing the dimensionality
• Start with a unique state for each Markov blanket assignment of the hidden variable
• Greedily merge the pair of states that yields the maximal score improvement
• Choose the number of states that corresponds to the maximal score
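The three steps above can be sketched generically; in this sketch, `score_of_merge` and `merge` are hypothetical stand-ins for the Bayesian score computation and the actual fusing of two states.

```python
def agglomerate(states, score_of_merge, merge):
    """Greedy agglomeration: repeatedly merge the pair of states with the
    best score improvement, recording every intermediate state set so the
    cardinality with maximal total score can be chosen afterwards."""
    states = list(states)
    trajectory = [list(states)]
    while len(states) > 1:
        # pick the pair (i, j) whose merge improves the score the most
        _, i, j = max((score_of_merge(a, b), i, j)
                      for i, a in enumerate(states)
                      for j, b in enumerate(states) if i < j)
        merged = merge(states[i], states[j])
        states = [s for k, s in enumerate(states) if k not in (i, j)]
        states.append(merged)
        trajectory.append(list(states))
    return trajectory
```

A single pass thus evaluates every cardinality from the initial number of states down to one, and the best-scoring point on the trajectory is kept.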
The FindHidden Algorithm
A hidden variable discovery algorithm (Elidan et al., 2000) that uses structural signatures (approximate cliques) to detect hidden variables.
Semi-clique S with N nodes: every node in S has at least N/2 neighbors within S.
Propose a candidate network:
(1) Introduce H as a parent of all nodes in S
(2) Replace all incoming edges to S by edges to H
(3) Remove all inter-S edges
(4) Make all children of S children of H, if the result is acyclic
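Steps (1)-(4) of the proposal can be sketched on a parent-map representation of the DAG; the dict-of-parent-sets encoding and the function names below are assumptions.

```python
def propose_candidate(parents, S, H="H"):
    """parents: {node: set of parent nodes} (every node appears as a key).
    Returns the candidate DAG with hidden variable H, or None if cyclic."""
    S = set(S)
    g = {v: set(ps) for v, ps in parents.items()}
    g[H] = set()
    for v in S:
        g[H] |= g[v] - S          # (2) incoming edges to S now point to H
        g[v] = {H}                # (1) + (3) H parents each node of S; drop inter-S edges
    for v in list(g):
        if v != H and v not in S and g[v] & S:
            g[v] = (g[v] - S) | {H}   # (4) children of S become children of H
    return g if is_acyclic(g) else None

def is_acyclic(parents):
    # Kahn-style topological check on the parent map
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    stack = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while stack:
        v = stack.pop()
        seen += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                stack.append(c)
    return seen == len(parents)
```

Note the sketch applies step (4) unconditionally and then rejects the whole candidate if a cycle results, which is one way to read the "if acyclic" condition above.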
Behavior of the score
• Efficient computation: N[h_i, Pa_H] + N[h_j, Pa_H] = N[h_ij, Pa_H], and the merged counts do not depend on the other states
• The reduction in complexity increases the score
• The likelihood of Family_H increases when |H| is smaller
• The likelihood of Family_child(H) decreases, and plunges significantly as H approaches a single state
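The additivity of the sufficient statistics (first bullet) is what lets each candidate merge be scored locally; a minimal sketch:

```python
from collections import Counter

def merged_counts(counts, hi, hj):
    """counts: Counter over (h, pa_H) pairs.
    Returns the counts with states hi and hj fused into hi:
    N[h_ij, Pa_H] = N[h_i, Pa_H] + N[h_j, Pa_H]."""
    out = Counter()
    for (h, pa), n in counts.items():
        out[(hi if h in (hi, hj) else h, pa)] += n
    return out
```

Because every other state's counts are untouched, only the score terms involving h_i and h_j need to be recomputed after a merge.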
Several interacting variables
• A round-robin approach iterates between the hidden variables from the bottom up
• Initialize each hidden variable with a single state, so the initial model relies only on the observable nodes
• Each iteration improves the complete score, which guarantees convergence of the method
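The round-robin loop can be sketched as follows; the `fit_cardinality` and `complete_score` callbacks are hypothetical stand-ins for the agglomeration step and the Bayesian score.

```python
def round_robin(hidden_vars, fit_cardinality, complete_score, model, tol=1e-9):
    """hidden_vars: hidden variables in bottom-up order.
    Each pass re-fits every variable's cardinality; since each step can only
    improve the complete score, the loop terminates."""
    best = complete_score(model)
    while True:
        for h in hidden_vars:                 # bottom-up pass
            model = fit_cardinality(model, h)
        score = complete_score(model)
        if score <= best + tol:               # no improvement: converged
            return model
        best = score
```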
Gal Elidan, Nir Friedman
Hebrew University
{galel, nir}@huji.ac.il
Summary and Future Work
We introduced the importance of setting the correct dimensionality for hidden variables and implemented a computationally effective agglomerative method to determine the number of states. The algorithm performs well and improves the quality and performance of the learned models when combined with the hidden variable discovery algorithm FindHidden.
Future work:
Use additional measures to discover hidden variables, such as edge confidence, information measures computed directly from the data, etc.
Handle hidden variables when the data is sparse
Explore hidden variables in Probabilistic Relational Models
Integration with FindHidden
Log-loss performance of FindHidden with and without agglomeration on test data from synthetic and real-life data sets. The baseline is the performance of the original input network.
The TB network after FindHidden
The TB network after FindHidden with agglomeration
24 variables in the Alarm network were hidden and the agglomeration method was applied:
• Perfect recovery: 15 variables; single missing state: 2 variables
• Extra state: 2 variables. These variables' children have stochastic CPDs, and the algorithm tries to explain dependencies that arise in the specific training set.
• Collapsed to a single state: 5 variables. These were redundant (confirmed by aggressive EM).
[Figure: log-loss (bits/instance) of the original network vs. FindHidden with agglomeration on the HR, LVFAILURE, VENTLUNG, INTUBATION, TB, STOCK, and NEWS data sets]
The Alarm network: Cardinality deviations for 24 variables
[Figure: number of variables (out of 24) with correct cardinality, missing a single state, or collapsed to a single state, as a function of the number of training instances (250 to 10,000)]
[Figure: two-layer networks over x0-x7 with hidden variables h0-h3: the true model (h0-h3 have 3, 2, 4, 3 states), the model learned with agglomeration, and the model learned with binary states]
[Figures: the learned TB network structures over the variables x-ray, smpros, hivpos, age, Hidden, hivres, clustered, ethnic, homeless, pob, gender, disease_site]
The Alarm network: STROKEVOLUME score progress
[Figure: score vs. number of states (1-15), comparing the agglomeration results, EM with agglomeration as a starting point, and multiple starting point EM]
Agglomeration tree of the HYPOVOLEMIA node in the Alarm network. Leaves show assignments to the parents; each internal node is numbered by agglomeration order and shows the change in score.
[Figure: agglomeration tree; leaves are parent assignments (N,T,L; N,F,L; N,F,H; N,F,N; H,F,H; L,F,L; H,F,L; H,F,N; L,F,N; L,F,H; L,T,N; H,T,L; L,T,L), internal nodes numbered (1)-(12) in agglomeration order with score changes ranging from +610.6 to -185.5]