Inferring Data Inter-Relationships Via Fast Hierarchical Models

Lawrence Carin, Duke University, www.ece.duke.edu/~lcarin

Page 1: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Inferring Data Inter-RelationshipsVia Fast Hierarchical Models

Lawrence CarinDuke University

www.ece.duke.edu/~lcarin

Page 2: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Sensor Deployed Previously Across Globe

Deploy to New Location. Can Algorithm Infer Which Data from Past is Most Relevant for New Sensing Task?

[Figure: world map marking previous deployments and the new deployment]

Page 3: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Semi-Supervised & Active Learning

• Enormous quantity of unlabeled data → exploit context via semi-supervised learning

• Focus the analyst on the most-informative data → active learning

Page 4: Inferring Data Inter-Relationships Via Fast Hierarchical Models

• Appropriately exploit related data from previous experience over sensor “lifetime”

- Transfer learning

• Place learning with labeled data in the context of unlabeled data, thereby exploiting manifold information

- Semi-supervised learning

• Reduce load on analyst: only request labeled data on subset of data for which label acquisition would be most informative

- Active learning

Technology Employed & Motivation

Page 5: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Bayesian Hierarchical Models:Dirichlet Processes

• Principled setting for transfer learning

• Avoids problems with model selection

- Number of mixture components

- Number of HMM states

[iGMM: Rasmussen, 00], [iHMM: Teh et al., 04, 06], [Escobar & West, 95]

Page 6: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Data Sharing: Stick-Breaking View of DP – 1/2

• The Dirichlet process (DP) is a prior on a density function, i.e., G(Θ) ~ DP[α, G_o(Θ)]

• One draw of G(Θ) from DP[α, G_o(Θ)]:

$$G(\Theta) = \sum_{k=1}^{\infty} \pi_k\,\delta(\Theta - \Theta_k^*), \qquad \Theta_k^* \sim G_o$$

$$\pi_k = \nu_k \prod_{i=1}^{k-1}(1 - \nu_i), \qquad \nu_k \sim \mathrm{Beta}(1, \alpha), \qquad \sum_{k=1}^{\infty} \pi_k = 1$$

[Sethuraman, 94]
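A minimal Python sketch (not from the slides) of one truncated stick-breaking draw; NumPy, the truncation level K, and the N(0, 1) base measure standing in for G_o are all illustrative assumptions:

```python
import numpy as np

def stick_breaking_draw(alpha, K=1000, rng=None):
    """Truncated draw of G ~ DP[alpha, Go]: returns (pi_k, Theta_k*) pairs.
    Go is taken to be N(0, 1) purely for illustration."""
    rng = np.random.default_rng(rng)
    nu = rng.beta(1.0, alpha, size=K)                  # nu_k ~ Beta(1, alpha)
    pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))  # stick weights
    atoms = rng.normal(size=K)                         # Theta_k* ~ Go
    return pi, atoms

pi, atoms = stick_breaking_draw(alpha=2.0, rng=0)
print("probability mass captured by 1000 sticks:", pi.sum())
```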

Page 7: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Data Sharing: Stick-Breaking View of DP – 2/2

• As α → 0, Beta(1, α) is more likely to yield large ν_k, implying more sharing: a few large "sticks", with corresponding likely parameters Θ_k^*

• As α → ∞, the sticks become very small and roughly equal in size, so a draw G(Θ) reduces to G_o

$$G(\Theta) = \sum_{k=1}^{\infty} \pi_k\,\delta(\Theta - \Theta_k^*), \qquad \pi_k = \nu_k \prod_{i=1}^{k-1}(1 - \nu_i), \qquad \nu_k \sim \mathrm{Beta}(1, \alpha), \qquad \Theta_k^* \sim G_o$$
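To see the role of α concretely, the following sketch (an editor's illustration, assuming NumPy; the α values are arbitrary) draws 1000 sticks for a small and a large α and prints the largest weights:

```python
import numpy as np

rng = np.random.default_rng(0)
for alpha in (0.5, 50.0):
    nu = rng.beta(1.0, alpha, size=1000)
    pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))
    top = np.sort(pi)[::-1][:5]
    print(f"alpha={alpha}: five largest sticks = {np.round(top, 3)}")
# Small alpha: mass concentrates on a few sticks (more sharing).
# Large alpha: many tiny, roughly equal sticks, so G approaches Go.
```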

Page 8: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Non-Parametric Mixture Models

- Data sample d_i drawn from a Gaussian/HMM with associated parameters Θ_i

- Posterior on model parameters indicates which parameters are shared, yielding a Gaussian/HMM mixture model; no model selection on number of mixture components

$$\Theta_i \sim G(\Theta) = \sum_{k=1}^{\infty} \pi_k\,\delta(\Theta - \Theta_k^*), \qquad G \sim \mathrm{DP}[\alpha, G_o(\Theta)], \qquad d_i \sim F(\Theta_i)$$

where F is a Gaussian or HMM. Equivalently, with indicator variables:

$$z_i \sim \mathrm{Mult}(\pi), \qquad d_i \mid z_i, \{\Theta_k^*\}_{k=1,\infty} \sim F(\Theta_{z_i}^*), \qquad \{\Theta_k^*\}_{k=1,\infty} \sim G_o, \qquad \nu_k \sim \mathrm{Beta}(1, \alpha)$$

[Graphical model: plate over i = 1, …, n, with α → π → z_i → d_i and G_0 → Θ_k^*]
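A generative sketch (not from the slides) of the truncated DP Gaussian mixture just described; the base measure N(0, 10), unit noise variance, and truncation K are illustrative choices:

```python
import numpy as np

def dp_gmm_sample(n, alpha, K=500, rng=None):
    """Truncated DP Gaussian mixture: z_i ~ Mult(pi), d_i ~ N(Theta*_{z_i}, 1),
    with atoms Theta_k* ~ Go = N(0, 10) (illustrative choices)."""
    rng = np.random.default_rng(rng)
    nu = rng.beta(1.0, alpha, size=K)
    pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))
    pi /= pi.sum()                                   # renormalize the truncation
    theta = rng.normal(0.0, np.sqrt(10.0), size=K)   # Theta_k* ~ Go
    z = rng.choice(K, size=n, p=pi)                  # z_i ~ Mult(pi)
    d = rng.normal(theta[z], 1.0)                    # d_i ~ F(Theta_{z_i})
    return d, z

d, z = dp_gmm_sample(n=200, alpha=1.0, rng=0)
print("occupied mixture components:", len(np.unique(z)))
```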

Page 9: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Dirichlet Process as a Shared Prior

• Cumulative set of data D = {d_1, d_2, …, d_n}, with associated parameters {Θ_1, Θ_2, …, Θ_n}

• When parameters are shared then the associated data are also shared; data sharing implies learning from previous/other experiences → Life-long learning

• Posterior reflects a balance between the DP-based desire for sharing, constituted by the prior p(Θ_1, …, Θ_n | α, G_o), and the likelihood function p(D | Θ_1, …, Θ_n), which rewards parameters that match the data well

DP desire for sharing parameters ↔ likelihood's desire to fit data; the posterior balances these objectives:

$$p(\Theta_1,\ldots,\Theta_n \mid D, \alpha, G_o) = \frac{p(D \mid \Theta_1,\ldots,\Theta_n)\; p(\Theta_1,\ldots,\Theta_n \mid \alpha, G_o)}{\int\!\cdots\!\int d\Theta_1 \cdots d\Theta_n\; p(D \mid \Theta_1,\ldots,\Theta_n)\; p(\Theta_1,\ldots,\Theta_n \mid \alpha, G_o)}$$
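As a toy illustration of this balance (an editor's addition, not from the slides): for just two observations, the Chinese-restaurant view of the DP gives prior sharing probability 1/(1+α), and with a conjugate Gaussian base measure the marginal likelihoods are available in closed form, so the posterior probability of sharing is explicit. All distributions and values below are illustrative assumptions:

```python
import numpy as np

def log_mvn(x, cov):
    """Log density of N(0, cov) at x."""
    k = len(x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def posterior_share_prob(d1, d2, alpha=1.0, tau2=4.0):
    """Posterior probability that d1, d2 share one parameter Theta ~ N(0, tau2),
    with likelihood d | Theta ~ N(Theta, 1), under the DP (CRP) prior."""
    x = np.array([d1, d2])
    log_prior_share = np.log(1.0 / (1.0 + alpha))   # CRP: share w.p. 1/(1+alpha)
    log_prior_sep = np.log(alpha / (1.0 + alpha))
    shared_cov = tau2 * np.ones((2, 2)) + np.eye(2)  # common Theta integrated out
    sep_cov = (tau2 + 1.0) * np.eye(2)               # independent Thetas
    ls = log_prior_share + log_mvn(x, shared_cov)
    ln = log_prior_sep + log_mvn(x, sep_cov)
    m = max(ls, ln)
    return np.exp(ls - m) / (np.exp(ls - m) + np.exp(ln - m))

print(posterior_share_prob(0.1, 0.3))   # close data: sharing favored
print(posterior_share_prob(-3.0, 3.0))  # distant data: likelihood overrides prior
```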

Page 10: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Hierarchical Dirichlet Process – 1/2

• A DP prior on the parameters of a Gaussian model yields a GMM in which the number of mixture components need not be set a priori (non-parametric)

• Assume we wish to build N GMMs, each designed using a DP prior

• We link the N GMMs via an overarching DP “hyper prior”

$$G \sim \mathrm{DP}(\gamma, G_o) \;\Rightarrow\; \text{we draw } G = \sum_{k=1}^{\infty} \pi_k\,\delta(\Theta - \Theta_k^*), \qquad \Theta_k^* \sim G_o$$

Each of the N GMMs then uses G as its base measure, so all tasks share the atoms {Θ_k^*}; for task n = 1, …, N:

$$z_{n,i} \sim \mathrm{Mult}(\pi_n), \qquad d_{n,i} \mid z_{n,i}, \{\Theta_k^*\}_{k=1,\infty} \sim F(\Theta_{z_{n,i}}^*), \qquad \nu_{n,k} \sim \mathrm{Beta}(1, \alpha_n)$$

[Teh et al., 06]
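A sketch of a finite (truncated) approximation to this HDP construction, assuming NumPy; drawing each task's weights as Dirichlet(α·β) over the shared atoms is the standard finite-dimensional projection, and all parameter values are illustrative:

```python
import numpy as np

def truncated_sticks(alpha, K, rng):
    nu = rng.beta(1.0, alpha, size=K)
    pi = nu * np.cumprod(np.concatenate(([1.0], 1.0 - nu[:-1])))
    return pi / pi.sum()

def hdp_gmms(N=3, gamma=2.0, alpha=5.0, K=100, rng=None):
    """Global draw G ~ DP(gamma, Go) supplies shared atoms Theta_k* and
    weights beta; each task re-weights the SAME atoms: pi_n ~ Dir(alpha*beta)."""
    rng = np.random.default_rng(rng)
    beta = truncated_sticks(gamma, K, rng)        # global stick weights
    atoms = rng.normal(0.0, 3.0, size=K)          # shared atoms Theta_k* ~ Go
    task_weights = rng.dirichlet(alpha * beta + 1e-9, size=N)
    return atoms, beta, task_weights

atoms, beta, task_weights = hdp_gmms(rng=0)
print("top global components:", np.argsort(beta)[::-1][:5])
print("task 0's top components:", np.argsort(task_weights[0])[::-1][:5])
```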

Page 11: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Hierarchical Dirichlet Process – 2/2

• HDP yields a set of GMMs, each of which shares the same parameters Θ_k^*, corresponding to Gaussian mean and covariance, with distinct probabilities of observation

• Coefficients a_{n,k} represent the probability of transitioning from state n to state k

• Naturally yields the structure of an HMM; the number of large-amplitude coefficients a_{n,k} implicitly determines the most-probable number of states

$$p(o_{t+1} \mid s_t = n) = \sum_{k=1}^{\infty} a_{n,k}\, F(o_{t+1}; \Theta_k^*), \qquad n = 1, 2, \ldots$$
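A small sketch (an editor's illustration, assuming NumPy and truncated Gaussian emissions) of evaluating this predictive density for a truncated iHMM; the truncation K and all parameter draws are arbitrary:

```python
import numpy as np

def ihmm_predictive(o_next, n, A, means, var=1.0):
    """p(o_{t+1} | s_t = n) = sum_k a_{n,k} F(o_{t+1}; Theta_k*),
    with Gaussian emissions F(o; Theta_k*) = N(o; means[k], var)."""
    emis = np.exp(-0.5 * (o_next - means) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return A[n] @ emis

K = 10                                    # truncation level for the sketch
rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(K), size=K)     # rows a_{n,.}: transition probabilities
means = rng.normal(0.0, 3.0, size=K)      # emission parameters Theta_k*
print(ihmm_predictive(o_next=1.2, n=0, A=A, means=means))
```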

Page 12: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Computational Challenges in Performing Inference

• We have the general challenge of estimating the posterior

• The denominator is typically of high dimension (number of parameters in model), and cannot be computed exactly in reasonable time

• Approximations required

[Figure: qualitative plot of accuracy vs. computational complexity for the approximations MCMC, Laplace, and Variational Bayes (VB)]

$$p(\Theta \mid D, M) = \frac{p(D \mid \Theta, M)\, p(\Theta \mid M)}{p(D \mid M)}, \qquad p(D \mid M) = \int d\Theta\; p(D \mid \Theta, M)\, p(\Theta \mid M)$$

[Blei & Jordan, 05]

Page 13: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Graphical Model of the nDP-iHMM

[Ni, Dunson, Carin; ICML 07]

Page 14: Inferring Data Inter-Relationships Via Fast Hierarchical Models

How Do You Convince Navy Data Search Works?

Validation Not as “Simple” as Text Search

Consider Special Kind of Acoustic Data: Music

Page 15: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Multi-Task HMM Learning

• Assume we have N sequential data sets

• Wish to learn HMM for each of the data sets

• Believe that data can be shared between the learning tasks; the tasks are not independent

• All N HMMs learned jointly, with appropriate data sharing

• Use of iHMM avoids the problem of selecting number of states in HMM

• Validation on large music database; VB yields fast inference

Page 16: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Demonstration Music Database

525 Jazz, 975 Classical, and 997 Rock songs

[Figure: representative Jazz and Rock examples]

Page 17: Inferring Data Inter-Relationships Via Fast Hierarchical Models

[Figure: representative Classical examples]

Page 18: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Inter-Task Similarity Matrix

[Figure: inter-task similarity matrix over all ~2500 songs (color scale 0-5)]

Page 19: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Typical Recommendations from Three Genres

Classical Jazz Rock

Page 20: Inferring Data Inter-Relationships Via Fast Hierarchical Models
Page 21: Inferring Data Inter-Relationships Via Fast Hierarchical Models
Page 22: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Applications of Interest to Navy

• Music search provides a fairly good & objective demonstration of the technology

• Other than the use of acoustic/speech features (MFCCs), nothing in the previous analysis is specifically tied to music – it is simply data search

• Use similar technology for underwater acoustic sensing (MCM) - generative

• Use related technology for synthetic aperture radar and EO/IR detection and classification – discriminative

• Technology delivered to NSWC Panama City, and demonstrated independently on mission-relevant MCM data

Page 23: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Underwater Mine Counter Measures (MCM)

Page 24: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Generative Model - iHMM

[Ni & Carin, 07]

Page 25: Inferring Data Inter-Relationships Via Fast Hierarchical Models
Page 26: Inferring Data Inter-Relationships Via Fast Hierarchical Models
Page 27: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Full Posterior on Number of HMM States

Page 28: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Anti-Submarine Warfare (ASW)

Page 29: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Design HMM for all Targets of Interest Over Sensor Lifetime

Page 30: Inferring Data Inter-Relationships Via Fast Hierarchical Models

State Sharing Between ASW Targets

Page 31: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Semi-Supervised Multi-Task Learning

Page 32: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Semi-Supervised Discriminative Multi-Task Learning

• Semi-supervised learning implemented via graphical techniques

• Multi-task learning implemented via DP

• Exploits all available data-driven context

- Data available from previous collections, labeled & unlabeled

- Labeled and unlabeled data from current data set

Page 33: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Graph representation of partially labeled data manifolds (1/2)

Construct the graph G = (X, W) with affinity matrix W, where the (i, j)-th element of W is defined by a Gaussian kernel:

$$w_{ij} = \exp\!\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$$

Define a Markov random walk on the graph by the transition matrix A, where the (i, j)-th element

$$a_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N} w_{ik}}$$

gives the probability of walking from x_i to x_j in a single step of the Markov random walk.

The one-step Markov random walk provides a local similarity measure between data points.

[Lu, Liao, Carin; 07] [Szummer & Jaakkola, 02]
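A direct NumPy sketch of these two constructions (the data and σ are illustrative; this is not code from the authors):

```python
import numpy as np

def graph_walk(X, sigma=1.0):
    """Affinity w_ij = exp(-||xi - xj||^2 / (2 sigma^2)) and one-step
    transition matrix a_ij = w_ij / sum_k w_ik (rows of A sum to 1)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise ||xi - xj||^2
    W = np.exp(-sq / (2.0 * sigma ** 2))
    A = W / W.sum(axis=1, keepdims=True)
    return W, A

X = np.random.default_rng(0).normal(size=(100, 2))
W, A = graph_walk(X, sigma=0.5)
assert np.allclose(A.sum(axis=1), 1.0)
```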

Page 34: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Graph representation (2/2)

To account for global similarity between data points, we consider a t-step random walk, whose transition matrix is given by A raised to the power of t:

$$a_{ij}^{(t)} = [\mathbf{A}^t]_{ij}$$

It was demonstrated [1] that the t-step Markov random walk yields a volume of paths connecting the data points, instead of the single shortest path, which is susceptible to noise; thus it permits us to incorporate the global manifold structure of the training data set.

The t-step neighborhood of x_i is defined as the set of data points x_j with $a_{ij}^{(t)} > 0$, and is denoted $N_t(x_i)$.

[1] Tishby and Slonim, "Data clustering by Markovian relaxation and the information bottleneck method," NIPS 13, 2000.
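A sketch of the t-step walk via matrix powers; with a dense Gaussian kernel every entry of A^t is strictly positive, so a numerical tolerance stands in for the a_ij^(t) > 0 condition (an illustrative choice):

```python
import numpy as np

def t_step_neighborhood(A, i, t=3, tol=1e-12):
    """A**t gives t-step walk probabilities a_ij^(t); N_t(x_i) is the set of
    j with a_ij^(t) > 0 (here > tol, since a dense kernel is never exactly 0)."""
    At = np.linalg.matrix_power(A, t)
    return np.flatnonzero(At[i] > tol), At

X = np.random.default_rng(1).normal(size=(50, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / 0.5)
A /= A.sum(axis=1, keepdims=True)
nbrs, At = t_step_neighborhood(A, i=0, t=3)
print(f"{nbrs.size} points in the 3-step neighborhood of x_0")
```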

Page 35: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Semi-Supervised Learning Algorithm (1/2)

• Neighborhood-based classifier: define the probability of label y_i given the t-step neighborhood of x_i as

$$p(y_i \mid N_t(x_i), \theta) = \sum_{j=1}^{N} a_{ij}^{(t)}\, p(y_i \mid x_j, \theta)$$

where $p(y_i \mid x_j, \theta)$ is the probability of label y_i given a single data point x_j, represented by a standard probabilistic classifier parameterized by θ.

• The label y_i implicitly propagates over the neighborhood. Thus it is possible to learn a classifier with only a few labels present.
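A sketch of the neighborhood-based classifier as a weighted average of single-point probabilities; the single-point classifier below is a hypothetical logistic model with an arbitrary θ, purely for illustration:

```python
import numpy as np

def neighborhood_label_prob(At, point_probs, i):
    """p(y_i | N_t(x_i), theta) = sum_j a_ij^(t) p(y_i | x_j, theta)."""
    return At[i] @ point_probs

X = np.random.default_rng(2).normal(size=(50, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / 0.5)
A /= A.sum(axis=1, keepdims=True)
At = np.linalg.matrix_power(A, 3)                 # 3-step walk
theta = np.array([1.0, -1.0])                     # hypothetical classifier params
point_probs = 1.0 / (1.0 + np.exp(-(X @ theta)))  # p(y = 1 | x_j, theta)
print(neighborhood_label_prob(At, point_probs, i=0))
```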

Page 36: Inferring Data Inter-Relationships Via Fast Hierarchical Models

The Algorithm (2/2)

• For binary classification problems, we choose the form of $p(y \mid x, \theta)$ to be a logistic regression classifier:

$$p(y_i \mid x_j, \theta) = \frac{1}{1 + \exp(-y_i\, \theta^T x_j)}$$

• To enforce sparseness, we impose a normal prior with zero mean and diagonal precision matrix $\mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ on θ, and each hyper-parameter λ_i has an independent Gamma prior.

• Important for transfer learning: the semi-supervised algorithm is inductive and parametric

• Place a DP prior on parameters, shared among all tasks
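A sketch of the resulting MAP objective under the logistic likelihood and the zero-mean normal prior with diagonal precision; the Gamma hyper-prior updates on λ are omitted (they would sit in an outer loop), and the data here are synthetic:

```python
import numpy as np

def neg_log_posterior(theta, X, y, lam):
    """-log [likelihood * prior] up to constants, for labels y in {-1, +1}:
    sum_j log(1 + exp(-y_j theta^T x_j)) + 0.5 * sum_i lam_i theta_i^2."""
    nll = np.sum(np.log1p(np.exp(-y * (X @ theta))))  # logistic likelihood
    nlp = 0.5 * np.sum(lam * theta ** 2)              # N(0, diag(lam)^{-1}) prior
    return nll + nlp

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))
y = rng.choice([-1, 1], size=20)
lam = np.ones(5)            # precisions; Gamma hyper-priors omitted in this sketch
print(neg_log_posterior(np.zeros(5), X, y, lam))
```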

Page 37: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Toy Data for Tasks 1-6

[Figure: six scatter plots, "Data for task 1" through "Data for task 6", each showing Class 1 and Class 2 samples in the (x1, x2) plane]

Page 38: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Sharing Data

[Figure: data pooled across tasks 1-3 (left) and across tasks 1-6 (right), plotted in the (x1, x2) plane]

Page 39: Inferring Data Inter-Relationships Via Fast Hierarchical Models

[Figure: average AUC on the 6 tasks (range ≈ 0.84-0.92) vs. number of labeled data from each task (0-35), comparing Supervised STL, Semi-supervised STL, Supervised MTL, and Semi-supervised MTL]

Page 40: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Task similarity for MTL tasks 1-6

[Figure: 6 × 6 task-similarity matrix for MTL tasks 1-6]

Page 41: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Navy-Relevant Data

Synthetic Aperture Radar (SAR) Data Collected at 19 Different Locations Across the USA

Page 42: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Real Radar Sensor Data

• Data from 19 “tasks” or geographical regions

• 10 of these regions are relatively highly foliated

• 9 regions are bare earth or desert

• Algorithm adaptively and autonomously clusters the task-dependent classifier weights into two basic pools, which agree with truth

• Active learning used to define labels of interest for the site under test

• Other sites used as auxiliary data, in a “life-long-learning” setting

Page 43: Inferring Data Inter-Relationships Via Fast Hierarchical Models

[Figure: average AUC on the 19 tasks (range ≈ 0.58-0.78) vs. number of labeled data in each task (40-120), comparing Supervised SMTL-2, Supervised SMTL-1, Supervised STL, Supervised Pooling, Semi-Supervised STL, Semi-Supervised MTL (Order 1), and Semi-Supervised MTL (Order 2)]

Supervised MTL: JMLR 07

Page 44: Inferring Data Inter-Relationships Via Fast Hierarchical Models
Page 45: Inferring Data Inter-Relationships Via Fast Hierarchical Models

• Classifier at new site placed appropriately within context of all available previous data

• Both labeled and unlabeled data employed

• Found that the algorithm is relatively insensitive to the particular labeled data selected

• Validation with relatively large music database

[Figure: world map marking previous deployments and the new deployment]

Page 46: Inferring Data Inter-Relationships Via Fast Hierarchical Models

Reconstruction of Random-Bars with hybrid CS. Example (a) is from [3], and (b)-(c) are images modified from (a) by us to represent similar tasks for simultaneous CS inversion. The intensities of all the rectangles are randomly permuted, and the positions of all the rectangles are shifted by distances randomly sampled from a uniform distribution on [-10, 10].

Page 47: Inferring Data Inter-Relationships Via Fast Hierarchical Models