ABSTRACT: We examine how to determine the number of states of a hidden variable when learning probabilistic models. This problem is crucial for learning compact models and complements our earlier work on discovering hidden variables. We describe an approach that utilizes score-based agglomerative state clustering, which allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We extend our procedure to handle several interacting hidden variables. We demonstrate the effectiveness of this approach on several synthetic and real-life data sets, and show that it learns models with hidden variables that generalize better and have better structure than previous approaches.
Learning the Dimensionality of Hidden Variables
Why is dimensionality important?
• Representation: the minimal I-map (the minimal structure that implies only independencies holding in the marginal distribution) is typically complex, so modeling the hidden variable explicitly gives a compact representation without introducing new independencies
• Improved learning: models with fewer parameters can be learned faster and more robustly
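As a rough illustration of the compactness argument, the sketch below counts free CPT parameters for the poster's X1-X3 → H → Y1-Y3 structure with binary observables; the function names and the root priors are assumptions, not part of the poster.

```python
# Hypothetical parameter counts for the example structure
# X1..X3 -> H -> Y1..Y3, all observables binary, H with k states.

def params_with_hidden(k, n_parents=3, n_children=3):
    """Free CPT parameters when H (k states) mediates parents and children."""
    roots = n_parents                          # each binary root: 1 free parameter
    h_cpt = (2 ** n_parents) * (k - 1)         # P(H | X1..X3)
    child_cpts = n_children * k                # P(Yj | H): 1 free parameter per state
    return roots + h_cpt + child_cpts

def params_marginalized(n_parents=3, n_children=3):
    """Lower bound after marginalizing H out: each child depends on all the
    parents (the minimal I-map also couples the children; ignored here)."""
    return n_parents + n_children * (2 ** n_parents)

print(params_with_hidden(2))    # 17 parameters with a binary hidden variable
print(params_marginalized())    # at least 27 parameters without it
```

Even this lower bound on the marginalized model exceeds the hidden-variable model, and the gap widens as more children are added.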
Learning: Structural EM
[Diagram: training data combined with a current candidate network over X1, X2, X3, H, Y1, Y2, Y3]
• E-Step: computation of expected counts N(X1), N(X2), N(X3), N(H, X1, X2, X3), ...
• M-Step: score and parameterize candidate networks
• Re-iterate with the best candidate
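The E-step above can be sketched as follows; this is a minimal sketch in which the data layout and the `posterior` callback are assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Accumulate expected counts for a hidden variable H by weighting each
# training instance with the posterior P(H = h | instance) under the
# current model.
def expected_counts(data, posterior, n_states):
    """data: list of dicts of observed values;
    posterior(h, row) -> P(H = h | row) under the current model."""
    counts = defaultdict(float)
    for row in data:
        for h in range(n_states):
            w = posterior(h, row)
            # expected count N(H, X1, X2, X3), as in the slide
            counts[(h, row["X1"], row["X2"], row["X3"])] += w
    return counts
```

The M-step then scores and parameterizes candidates from these expected counts exactly as it would from complete-data counts.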
What is a Bayesian Network?
A Bayesian network represents a joint probability distribution over a set of random variables using a DAG:
[Diagram: the Asia network: Visit to Asia → Tuberculosis; Smoking → Lung Cancer, Bronchitis; Tuberculosis, Lung Cancer → Abnormality in Chest; Abnormality in Chest → X-Ray; Abnormality in Chest, Bronchitis → Dyspnea]
P(D | A, B) = 0.8
P(D | ¬A, B) = 0.1
P(D | A, ¬B) = 0.1
P(D | ¬A, ¬B) = 0.01
P(X1, …, Xn) = P(V) P(S) P(T | V) … P(X | A) P(D | A, B)
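For a small fragment, the factorization can be made concrete. The sketch below covers only (A = Abnormality in Chest, B = Bronchitis, D = Dyspnea), treating A and B as roots for brevity; only the Dyspnea CPT values come from the poster, and the priors on A and B are illustrative placeholders.

```python
# Joint over (A, B, D) factored as P(A) P(B) P(D | A, B).
P_A = {True: 0.1, False: 0.9}    # placeholder prior
P_B = {True: 0.3, False: 0.7}    # placeholder prior
P_D_given = {(True, True): 0.8, (False, True): 0.1,
             (True, False): 0.1, (False, False): 0.01}   # from the poster

def joint(a, b, d):
    p_d = P_D_given[(a, b)] if d else 1.0 - P_D_given[(a, b)]
    return P_A[a] * P_B[b] * p_d

# The factorization defines a proper distribution: all 8 entries sum to 1.
total = sum(joint(a, b, d) for a in (True, False)
            for b in (True, False) for d in (True, False))
```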
Bayesian scoring metric:
Score(G : D) = log P(D | G) + log P(G)
log P(D | G) = Σ_i FamScore(X_i, Pa(X_i) : D)
FamScore(X, U : D) = Σ_u [ log ( Γ(α[u]) / Γ(α[u] + N[u]) ) + Σ_x log ( Γ(α[x,u] + N[x,u]) / Γ(α[x,u]) ) ]
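The family-score term can be sketched as follows, assuming BDe-style uniform hyperparameters (`alpha` split evenly over the table); the function name and count layout are assumptions.

```python
from math import lgamma

def fam_score(counts, n_x, n_u, alpha=1.0):
    """FamScore(X, U : D) with counts[(x, u)] = N[x, u], n_x states of X,
    n_u parent configurations, and uniform prior alpha / (n_x * n_u)."""
    a_xu = alpha / (n_x * n_u)
    score = 0.0
    for u in range(n_u):
        n_u_total = sum(counts.get((x, u), 0) for x in range(n_x))
        # log Gamma(alpha_u) / Gamma(alpha_u + N[u])
        score += lgamma(n_x * a_xu) - lgamma(n_x * a_xu + n_u_total)
        for x in range(n_x):
            # log Gamma(alpha_xu + N[x, u]) / Gamma(alpha_xu)
            score += lgamma(a_xu + counts.get((x, u), 0)) - lgamma(a_xu)
    return score
```

With no data the score is 0, and a single observation of a binary family with no parents scores log 1/2, matching the marginal likelihood of that observation.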
Single Hidden Variable
h ∈ {1, 2, …, n}  →  h ∈ {1, 2, …, n-1}
[Diagram: merging two states of H; H has parents X1, X2, X3 and children Y1, Y2, Y3]
Choosing the dimensionality
• Start with a unique state for each Markov blanket assignment of the hidden variable
• Greedily merge the pair of states that yields the maximal score improvement
• Choose the number of states that corresponds to the maximal score
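The three steps above can be sketched generically; in this sketch, `score_of_merge` and `merge` are hypothetical stand-ins for the Bayesian score computation and the actual fusing of two states.

```python
def agglomerate(states, score_of_merge, merge):
    """Greedy agglomeration: repeatedly merge the pair of states with the
    best score improvement, recording every intermediate state set so the
    cardinality with maximal total score can be chosen afterwards."""
    states = list(states)
    trajectory = [list(states)]
    while len(states) > 1:
        # pick the pair (i, j) whose merge improves the score the most
        _, i, j = max((score_of_merge(a, b), i, j)
                      for i, a in enumerate(states)
                      for j, b in enumerate(states) if i < j)
        merged = merge(states[i], states[j])
        states = [s for k, s in enumerate(states) if k not in (i, j)]
        states.append(merged)
        trajectory.append(list(states))
    return trajectory
```

A single pass thus evaluates every cardinality from the initial number of states down to one, and the best-scoring point on the trajectory is kept.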
The FindHidden Algorithm
A hidden variable discovery algorithm (Elidan et al., 2000) that uses structural signatures (approximate cliques) to detect hidden variables.
Semi-clique S with N nodes: every node in S has at least N/2 neighbors within S.
Propose a candidate network:
(1) Introduce H as a parent of all nodes in S
(2) Replace all incoming edges to S by edges to H
(3) Remove all inter-S edges
(4) Make all children of S children of H, if the result is acyclic
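Steps (1)-(4) of the proposal can be sketched on a parent-map representation of the DAG; the dict-of-parent-sets encoding and the function names below are assumptions.

```python
def propose_candidate(parents, S, H="H"):
    """parents: {node: set of parent nodes} (every node appears as a key).
    Returns the candidate DAG with hidden variable H, or None if cyclic."""
    S = set(S)
    g = {v: set(ps) for v, ps in parents.items()}
    g[H] = set()
    for v in S:
        g[H] |= g[v] - S          # (2) incoming edges to S now point to H
        g[v] = {H}                # (1) + (3) H parents each node of S; drop inter-S edges
    for v in list(g):
        if v != H and v not in S and g[v] & S:
            g[v] = (g[v] - S) | {H}   # (4) children of S become children of H
    return g if is_acyclic(g) else None

def is_acyclic(parents):
    # Kahn-style topological check on the parent map
    indeg = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    stack = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while stack:
        v = stack.pop()
        seen += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                stack.append(c)
    return seen == len(parents)
```

Note the sketch applies step (4) unconditionally and then rejects the whole candidate if a cycle results, which is one way to read the "if acyclic" condition above.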
Behavior of the score
• Efficient computation: N[h_i, Pa_H] + N[h_j, Pa_H] = N[h_ij, Pa_H], and the merged counts do not depend on the other states
• The reduction in complexity increases the score
• The likelihood of Family_H increases when |H| is smaller
• The likelihood of Family_child(H) decreases, and plunges significantly as H approaches a single state
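The additivity of the sufficient statistics (first bullet) is what lets each candidate merge be scored locally; a minimal sketch:

```python
from collections import Counter

def merged_counts(counts, hi, hj):
    """counts: Counter over (h, pa_H) pairs.
    Returns the counts with states hi and hj fused into hi:
    N[h_ij, Pa_H] = N[h_i, Pa_H] + N[h_j, Pa_H]."""
    out = Counter()
    for (h, pa), n in counts.items():
        out[(hi if h in (hi, hj) else h, pa)] += n
    return out
```

Because every other state's counts are untouched, only the score terms involving h_i and h_j need to be recomputed after a merge.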
Several interacting variables
• A round-robin approach iterates between the hidden variables from the bottom up
• Initialize each hidden variable with a single state, so the initial model relies only on the observable nodes
• Each iteration improves the complete score, which guarantees convergence of the method
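The round-robin loop can be sketched as follows; the `fit_cardinality` and `complete_score` callbacks are hypothetical stand-ins for the agglomeration step and the Bayesian score.

```python
def round_robin(hidden_vars, fit_cardinality, complete_score, model, tol=1e-9):
    """hidden_vars: hidden variables in bottom-up order.
    Each pass re-fits every variable's cardinality; since each step can only
    improve the complete score, the loop terminates."""
    best = complete_score(model)
    while True:
        for h in hidden_vars:                 # bottom-up pass
            model = fit_cardinality(model, h)
        score = complete_score(model)
        if score <= best + tol:               # no improvement: converged
            return model
        best = score
```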
Gal Elidan, Nir Friedman
Hebrew University
{galel, nir}@huji.ac.il
Summary and Future Work
We introduced the importance of setting the correct dimensionality for hidden variables and implemented a computationally effective agglomerative method to determine the number of states. The algorithm performs well and improves the quality and performance of the learned models when combined with the hidden variable discovery algorithm FindHidden.
Future work:
Use additional measures to discover hidden variables, such as edge confidence, information measures computed directly from the data, etc.
Handle hidden variables when the data is sparse
Explore hidden variables in Probabilistic Relational Models
Integration with FindHidden
Log-loss performance of FindHidden with and without agglomeration on test data from synthetic and real-life data sets. The baseline is the performance of the original input network.
The TB network after FindHidden
The TB network after FindHidden with agglomeration
24 variables in the Alarm network were hidden and the agglomeration method was applied:
• Perfect recovery: 15 variables; single missing state: 2 variables
• Extra state: 2 variables. These variables' children have stochastic CPDs, and the algorithm tries to explain dependencies that arise in the specific training set.
• Collapsed to a single state: 5 variables. These were redundant (confirmed by aggressive EM).
[Figure: log-loss (bits/instance) of the original network vs. FindHidden with agglomeration on the HR, LVFAILURE, VENTLUNG, INTUBATION, TB, STOCK, and NEWS data sets]
The Alarm network: Cardinality deviations for 24 variables
[Figure: number of variables (out of 24) with correct cardinality, missing a single state, or collapsed to a single state, as a function of the number of training instances (250 to 10,000)]
[Figure: two-layer networks over x0-x7 with hidden variables h0-h3: the true model (h0-h3 have 3, 2, 4, 3 states), the model learned with agglomeration, and the model learned with binary states]
[Figures: the learned TB network structures over the variables x-ray, smpros, hivpos, age, Hidden, hivres, clustered, ethnic, homeless, pob, gender, disease_site]
The Alarm network: STROKEVOLUME score progress
[Figure: score vs. number of states (1-15), comparing the agglomeration results, EM with agglomeration as a starting point, and multiple starting point EM]
Agglomeration tree of the HYPOVOLEMIA node in the Alarm network. Leaves show assignments to the parents; each internal node is numbered by agglomeration order and shows the change in score.
[Figure: agglomeration tree; leaves are parent assignments (N,T,L; N,F,L; N,F,H; N,F,N; H,F,H; L,F,L; H,F,L; H,F,N; L,F,N; L,F,H; L,T,N; H,T,L; L,T,L), internal nodes numbered (1)-(12) in agglomeration order with score changes ranging from +610.6 to -185.5]