
Page 1:

A Generalized Linear Model for Principal Component Analysis of Binary Data

January 6, 2003

Andrew I. Schein, Lyle H. Ungar and Lawrence K. Saul. Department of Computer and Information Science, The University of Pennsylvania, Philadelphia, PA.

Ninth International Workshop on AI and Statistics

Page 2:

PCA and LPCA

Principal Component Analysis is commonly called PCA.

PCA is a widely-used dimensionality reduction technique.

PCA handles real-valued data through a Gaussian assumption.

Today we will explore a different assumption for binary data.

Logistic PCA is to (linear) PCA as logistic regression is to linear regression.

Schein et al. AI & Statistics 2003, p.1

Page 3:

Talk Outline

1. Linear PCA of Real-Valued Data (review)

2. Logistic PCA of Binary Data

Our Contributions Consist of:

3. Model Fitting by Alternating Least Squares

4. Experimental Results on 4 Natural Data Sets


Page 4:

Multivariate Binary Data

[Figure: Person/Movie Co-Occurrence — Who Rated What. A binary matrix with people on one axis (ticks 200–800) and movies on the other (ticks 200–1600).]


Page 5:

Visualizing PCA

[Figure: 2-D scatter of data with the fitted principal axis; X and Y axes from −1 to 1.]


Page 6:

Applications of PCA

• Noise Removal

• Dimensionality Reduction

• Data Compression

• Visualization

• Exploratory Data Analysis

• Feature Extraction


Page 7:

PCA as Least-Squares Decomposition

Error(U, V) = Σ_{n=1}^{N} ||Xn − Un V||²

X = The Data: N × D

U = Latent Coordinates: N × L

V = Orthogonal Latent Axes: L × D

L ≪ D, where L is the dimensionality of the latent space.
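The least-squares objective above is minimized exactly by a truncated singular value decomposition; a minimal numpy sketch (the shapes and synthetic data are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))      # data: N x D
Xc = X - X.mean(axis=0)                 # PCA assumes centered data

# The rank-L truncated SVD gives the least-squares factorization Xc ~ U V.
L = 2
u, s, vt = np.linalg.svd(Xc, full_matrices=False)
U = u[:, :L] * s[:L]                    # latent coordinates: N x L
V = vt[:L]                              # orthonormal latent axes: L x D

# The residual equals the sum of the discarded squared singular values
# (Eckart-Young theorem).
error = np.sum((Xc - U @ V) ** 2)
```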


Page 8:

Gaussian Interpretation of PCA

When σ is known there is an equivalent model:

Xnd ∼ N((UV)nd, σ²)

Maximum Likelihood Objective = Least Squares Loss

So PCA assumes a Gaussian distribution on X.


Page 9:

Generalized Principal Component Analysis (GPCA)

Collins et al. (2001) propose a generalized scheme for PCA.

Define a constrained decomposition of the natural parameter

Θnd = (UV)nd, dim(U) = N × L, dim(V) = L × D.

N = Number of Observations, D = Dimensionality of Data, L = Dimensionality of Latent Space

L(U, V) = −Σn Σd log P(Xnd | Θnd)

Insert your favorite exponential family distribution to instantiate P.
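As an illustration of the plug-in idea (the two instantiations below are the standard textbook choices, not spelled out on this slide): the Gaussian member recovers squared-error loss, and the Bernoulli member recovers the logistic loss. A sketch of the per-element negative log-likelihoods, dropping Θ-independent constants:

```python
import numpy as np

def gaussian_nll(x, theta):
    # Gaussian with unit variance: squared-error loss (up to constants).
    return 0.5 * (x - theta) ** 2

def bernoulli_nll(x, theta):
    # Bernoulli with natural parameter theta: the logistic loss,
    # -[x log s(theta) + (1 - x) log s(-theta)] = log(1 + e^theta) - x*theta.
    return np.logaddexp(0.0, theta) - x * theta
```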


Page 10:

The Logistic Function

σ(θ) = 1 / (1 + exp(−θ))

[Figure: plot of σ(θ) for θ from −4 to 4; σ(θ) rises from 0 toward 1, crossing 0.5 at θ = 0.]
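The logistic function can be evaluated in a numerically stable way via the identity σ(θ) = (1 + tanh(θ/2))/2, the same tanh/cosh connection the bound later in the talk exploits; a minimal sketch:

```python
import numpy as np

def sigmoid(theta):
    # Stable logistic function 1 / (1 + exp(-theta)),
    # written as (1 + tanh(theta/2)) / 2 to avoid overflow.
    return 0.5 * (1.0 + np.tanh(theta / 2.0))
```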


Page 11:

Logistic PCA Model

Inserting Bernoulli Distribution We Get Log-Likelihood:

L = Σn,d [ Xnd log σ(Θnd) + (1 − Xnd) log σ(−Θnd) ]

subject to the constraint

Θnd = Σl Unl Vld

N = Number of Observations, D = Dimensionality of Data, L = Dimensionality of Latent Space
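The log-likelihood above translates directly into code; a minimal sketch, using the stable identity log σ(t) = −logaddexp(0, −t):

```python
import numpy as np

def lpca_log_likelihood(X, U, V):
    # L = sum_{n,d} [ X log s(Theta) + (1 - X) log s(-Theta) ], Theta = U V,
    # with log s(t) = -log(1 + exp(-t)) computed via logaddexp for stability.
    Theta = U @ V
    return np.sum(-X * np.logaddexp(0.0, -Theta)
                  - (1.0 - X) * np.logaddexp(0.0, Theta))
```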


Page 12:

How to Fit LPCA?

Collins et al. (2001) propose a general strategy for fitting GPCA.

Applying this framework to the LPCA case looks hard, if not intractable.

We take the approach of fitting LPCA through specialized strategies.

Our methods exploit bounds on the logistic function.


Page 13:

Defining the Auxiliary Function for LPCA

A useful fact:

log σ(θ) = − log 2 + θ/2 − log cosh(θ/2)

We exploit a bound:

log cosh(θ/2) ≤ log cosh(θ0/2) + (θ² − θ0²) [ tanh(θ0/2) / (4θ0) ]

[Jaakkola and Jordan, 1997. Tipping, 1999.]

The resulting lower bound on log σ(θ) is concave and quadratic in the parameter θ.
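The bound is easy to check numerically; a small sketch (the sample points are chosen arbitrarily):

```python
import numpy as np

def logcosh_bound(theta, theta0):
    # Quadratic upper bound on log cosh(theta/2), tight at theta = +/- theta0
    # [Jaakkola and Jordan, 1997; Tipping, 1999].
    return (np.log(np.cosh(theta0 / 2.0))
            + (theta ** 2 - theta0 ** 2) * np.tanh(theta0 / 2.0) / (4.0 * theta0))
```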


Page 14:

Visualizing the Approximation of σ

[Figure: σ(θ) overlaid with its approximation at θ0 = 2, for θ from −4 to 4.]


Page 15:

Model Fitting by Alternating Least Squares

We develop a model fitting strategy that alternates between two steps:

• Fix V , find the least squares solution for U rows.

• Fix U , find the least squares solution for V columns.

Each iteration guarantees an improvement in log-likelihood.

The structure of the likelihood is unknown: whether local maxima exist besides the global maximum remains an open question.
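The slides do not spell out the update equations, but completing the square in the quadratic bound gives per-element minorizer terms −b(Θ − z)² + const, with b = tanh(θ0/2)/(4θ0) (limit 1/8 at θ0 = 0) and z = (2X − 1)/(4b), where θ0 is the current Θ. Each half-step then reduces to a weighted least-squares solve. A minimal sketch under those assumptions (shapes, data, and initialization are illustrative):

```python
import numpy as np

def lpca_ll(X, Theta):
    # Bernoulli log-likelihood, via the stable log s(t) = -logaddexp(0, -t).
    return np.sum(-X * np.logaddexp(0.0, -Theta)
                  - (1.0 - X) * np.logaddexp(0.0, Theta))

def bound_coeffs(X, Theta):
    # Minorizer coefficients at theta0 = current Theta; tanh(t/2)/t is an
    # even function, so clamping tiny |Theta| to +1e-8 is safe.
    t = np.where(np.abs(Theta) < 1e-8, 1e-8, Theta)
    b = np.tanh(t / 2.0) / (4.0 * t)
    z = (2.0 * X - 1.0) / (4.0 * b)
    return b, z

def als_sweep(X, U, V):
    # One alternating sweep; each half-step maximizes the quadratic
    # minorizer exactly, so the log-likelihood cannot decrease.
    b, z = bound_coeffs(X, U @ V)
    for n in range(X.shape[0]):       # update rows of U with V fixed
        G = V * b[n]                  # columns of V weighted by b_n
        U[n] = np.linalg.solve(G @ V.T, G @ z[n])
    b, z = bound_coeffs(X, U @ V)     # re-tighten the bound at the new Theta
    for d in range(X.shape[1]):       # update columns of V with U fixed
        G = U.T * b[:, d]             # rows of U weighted by b_{:,d}
        V[:, d] = np.linalg.solve(G @ U, G @ z[:, d])
    return U, V
```

Because each half-step maximizes a function that lower-bounds the log-likelihood and touches it at the current parameters, the sweep is a minorize-maximize step, which is why every iteration guarantees improvement.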


Page 16:

A Related Use of the Bound

Normal Linear Factor Analysis (NLFA) is a generative cousin of PCA.

[Tipping, 1999] uses the bound to fit a logit/normit factor analysis.

Logit/normit factor analysis is a type of factor analysis for binary data.

LPCA is binary PCA, while logit/normit factor analysis is binary factor analysis.

We follow Tipping’s factor analysis strategy in fitting LPCA.


Page 17:

Logit/Normit Factor Analysis

Maximize:

L = Σn,d [ Xnd log σ(Θnd) + (1 − Xnd) log σ(−Θnd) ]

subject to the constraint

Θnd = Σl Unl Vld, where l indexes the latent space dimensions,

and the prior Un ∼ N(0, I)


Page 18:

Logit/Normit Factor Analysis

Un ∼ N (0, I)

Fitting the Un in logit/normit factor analysis is harder than in LPCA.

It requires an additional variational approximation and iterative process.

Model fitting improves the lower bound on the log-likelihood.

...not necessarily the log-likelihood itself.

In contrast, ALS for LPCA guarantees an increase in the log-likelihood.


Page 19:

Example in 3 Dimensions

[Figure: 3-D example; X, Y, and Z axes from 0 to 1.]


Page 20:

Example in 3 Dimensions

[Figure: 3-D example, continued; X, Y, and Z axes from 0 to 1.]


Page 21:

Example in 3 Dimensions

[Figure: 3-D example, continued; X, Y, and Z axes from 0 to 1.]


Page 22:

Empirical Evaluation: Data Reconstruction

[Diagram: Original Data → Compressed Data → Reconstructed Data]

X = Original Data, R = Reconstructed Data; N = Number of Observations, D = Dimensionality of the Data

Error = ( Σn Σd |Xnd − Rnd| ) / (N · D)
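The error metric translates to a one-liner; a minimal sketch (how the reconstruction R is obtained from each model is left to the paper):

```python
import numpy as np

def reconstruction_error(X, R):
    # Mean absolute deviation between data X and reconstruction R:
    # Error = sum_{n,d} |X_nd - R_nd| / (N * D).
    return np.abs(X - R).mean()
```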


Page 23:

Microsoft Web Log Reconstruction Results

Web log shows URL visitation by anonymized users.

Data Set: a matrix of users and the URLs they clicked on

N = 32711 (each observation is a session)
D = 285 URLs clicked
Density = 0.011

Data Set Task: Build a recommender system of URLS.

Our Task: Data Reconstruction


Page 24:

Microsoft Web Log Reconstruction Results

Error Rates (%)

L    Linear PCA    Logistic PCA
1    1.52          1.28
2    1.41          1.15
4    1.36          0.760
8    1.11          0.355

1 LPCA dimension ≈ 6 PCA dimensions


Page 25:

Advertising Data Reconstruction Results

A UC Irvine data set of web linked images and surrounding features.

N = 3279 image links
D = 1555 context features
Density = 0.072

Data Set Task: Predict whether an image is an advertisement

Our Task: Data Reconstruction

Features include phrases in the anchor text and around the image:

microsoft.com, toyotaofroswell.com, home+page


Page 26:

Advertising Data Reconstruction Results

Error Rates (%)

L    Linear PCA    Logistic PCA
1    2.68          1.97
2    2.39          1.20
4    2.17          0.626
8    1.76          0.268

1 LPCA dimension ≈ 7 PCA dimensions


Page 27:

Other Data Sets (in paper)

• Microarray Gene Expression Data

– Observations are genes
– Attributes are environmental conditions
– Binary values indicate whether genes are expressed or not

• MovieLens Movie Ratings Data

– Observations are users
– Attributes are movies
– Binary values indicate whether a user rated a movie or not


Page 28:

Related Models

Both of these models share a decomposition:

Θnd = (UV)nd

• Factor Analysis: A generative relative of PCA

• Multinomial PCA (MPCA): A multinomial, generative variant of PCA

MPCA is represented in the proceedings: [Buntine and Perttu, 2003].


Page 29:

Summary

• We derive and implement the ALS algorithm for fitting LPCA.

• In data reconstruction experiments, LPCA outperforms PCA.

• LPCA is well suited for smoothed probability models of binary data:

– People and URLs they click
– Phrase features surrounding image links

• Future work will explore LPCA in other traditional PCA tasks.

– Feature extraction
– Machine learning
