
Transfer Learning

Pei-Hao (Eddy) Su¹ and Yingzhen Li²

¹Dialogue Systems Group and ²Machine Learning Group

January 29, 2015


Outline

1. Motivation
2. Historical points
3. Definition
4. Case studies



Standard Supervised Learning Task

Most ML tasks assume the training/test data are drawn from the same data space and the same distribution



NLP tasks: POS, NER, Category labelling

Modified from Gao et al.’s presentation in KDD ’08


Combining them gives better results

Modified from Gao et al.’s presentation in KDD ’08


Motivation

Traditional ML tasks assume the training/test data are drawn from the same data space and the same distribution

Insufficient labelled data results in poor prediction performance

Lots of (un-)related existing data are available from various sources

Starting from scratch is time-consuming

Transferring knowledge from other sources may help!


Motivation (Taylor et al., JMLR '09)


Psychology and Education

In 1901, Thorndike and Woodworth explored how individuals transfer similar characteristics shared by different contexts

In 1992, Perkins and Salomon published "Transfer of Learning", which defined different types of transfer

Examples:

Skill learning: C/C++ → Python
Language acquisition: German → English


Machine Learning


Machine Learning

Explanation-Based Neural Network Learning: A Lifelong Learning Approach [Thrun PhD '95, NIPS '96]

Multitask Learning [Caruana ICML '93 & '96, PhD '97]

Workshops:
Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems [NIPS '95]
Inductive Transfer: 10 Years Later [NIPS '05]
Structural Knowledge Transfer for Machine Learning [ICML '06]
Transfer Learning for Complex Tasks [AAAI '08]
Lifelong Learning [AAAI '11]
Theoretically Grounded Transfer Learning [ICML '13]
Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice [NIPS '14]
...


Definition

Notations

Domain D:
1. Data space X
2. Marginal distribution P(X), where X ∈ X

Task T (given D = {X, P(X)}):
1. Label space Y
2. Learn f : X → Y to approximate the underlying P(Y|X), where X ∈ X and Y ∈ Y


Definition

Assume we have only one source S and one target T:

Definition

Transfer Learning (TL): Given a source domain DS with learning task TS and a target domain DT with learning task TT, transfer learning aims to improve the learning of the target predictive function fT(·) in DT using the knowledge in DS and TS, where

DS ≠ DT (either XS ≠ XT or PS(X) ≠ PT(X))

or TS ≠ TT (either YS ≠ YT or P(YS|XS) ≠ P(YT|XT))
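A compact way to keep the definition straight is to encode domains and tasks as records and check the two TL conditions; the `Domain`/`Task` records and the string identifiers below are hypothetical, purely for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    feature_space: str   # identifier standing in for the data space X
    marginal: str        # identifier standing in for P(X)

@dataclass(frozen=True)
class Task:
    label_space: str     # identifier standing in for Y
    conditional: str     # identifier standing in for P(Y|X)

def transfer_learning_applies(d_s, t_s, d_t, t_t):
    """TL is the relevant setting exactly when DS != DT or TS != TT."""
    return d_s != d_t or t_s != t_t

# same feature space, different marginal distributions: DS != DT
d_s = Domain("bag-of-words", "P_dvd_reviews")
d_t = Domain("bag-of-words", "P_book_reviews")
t = Task("{+1,-1}", "sentiment")
print(transfer_learning_applies(d_s, t, d_t, t))  # True
```

When both the domains and the tasks coincide, the check returns False and we are back in the standard supervised setting.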


Example: Category labelling


ML vs. TL (Langley '06, Yang et al. '13)


Transfer in practice

The rest of the talk will give you an intuition, with examples, on:

when to transfer

what to transfer

and how to transfer


When to transfer: Domain relatedness

Transfer learning is applicable when there exists relatedness between the domains

Standard machine learning assumes source = target

Transferring knowledge from an unrelated domain can be harmful: negative transfer [Rosenstein et al., NIPS '05 Workshop]

Ben-David et al. proposed a bound on the target domain error

Reference:
Ben-David et al. Analysis of Representations for Domain Adaptation. NIPS '06


When to transfer (Ben-David et al.)

In the standard supervised binary classification task:

Given X, Y = {0, 1} and samples from P(x, y), we aim to learn f : X → [0, 1] which captures P(y|x)

Often we decompose the problem into two steps:

1. determine a feature mapping Φ : X → Z
2. learn a hypothesis h : Z → {0, 1} on the dataset {(Φ(x), y)}

In the transfer learning scenario:

Theorem (simplified version of Thms. 1 & 2)

Given X = XS = XT, and PS(x), PT(x) the distributions of the source and target domains, let Φ : X → Z be a fixed mapping function and H a hypothesis space. For any hypothesis h ∈ H trained on the source domain:

εT(h) ≤ εS(h) + dH(P̃S, P̃T) + εS(h∗) + εT(h∗)

where P̃S, P̃T are the induced distributions on Z w.r.t. PS and PT, and h∗ = arg min h∈H (εS(h) + εT(h)) is the best hypothesis obtainable by joint training.
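The divergence term dH is what the "proxy A-distance" numbers quoted later estimate: train a classifier to separate source samples from target samples and rescale its test error as 2(1 − 2ε). A rough numpy-only sketch, with a nearest-class-mean rule standing in for the linear classifiers used in practice (`proxy_a_distance` and its internals are illustrative, not from the paper):

```python
import numpy as np

def proxy_a_distance(xs, xt, seed=0):
    """Crude domain-divergence estimate: learn to tell source from target,
    then return 2 * (1 - 2 * err). Near 0: indistinguishable domains;
    near 2: perfectly separable domains."""
    rng = np.random.default_rng(seed)
    x = np.vstack([xs, xt])
    y = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])
    idx = rng.permutation(len(x))
    tr, te = idx[: len(x) // 2], idx[len(x) // 2:]
    # nearest-class-mean "classifier" on the train half
    mu0 = x[tr][y[tr] == 0].mean(axis=0)
    mu1 = x[tr][y[tr] == 1].mean(axis=0)
    pred = (np.linalg.norm(x[te] - mu1, axis=1)
            < np.linalg.norm(x[te] - mu0, axis=1)).astype(float)
    err = np.mean(pred != y[te])
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(1)
near = proxy_a_distance(rng.normal(0, 1, (500, 5)),
                        rng.normal(0, 1, (500, 5)))  # same distribution
far = proxy_a_distance(rng.normal(0, 1, (500, 5)),
                       rng.normal(4, 1, (500, 5)))   # shifted target
print(near < far)  # divergent domains score higher
```

Identical distributions give an error near 0.5, hence a proxy A-distance near 0; easily separable domains drive the error toward 0 and the score toward 2.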


Domain adaptation

Approach 1: mixture of general & specific component

Can we learn hypotheses for both the general and specific components?

Reference:
Daume III. Frustratingly easy domain adaptation. ACL '07
Daume III et al. Co-regularization Based Semi-supervised Domain Adaptation. NIPS '10


EasyAdapt (Daume III)

Binary classification problem:

XS = XT ⊂ R^d, YS = YT = {−1, +1}

Goal: obtain a classifier fT : XT → YT

in the SVM context: learn a hypothesis hT ∈ R^d

However:

too little training data is available on (XT, YT) for robust training

also P(xS) ≠ P(xT) and P(xS, yS) ≠ P(xT, yT)

...so directly applying a hypothesis hS trained on the source gives bad results

How can we use xS, yS ∼ P(xS, yS) to improve the learning of hT?


EasyAdapt (Daume III)

EasyAdapt algorithm

define two mappings ΦS, ΦT : R^d → R^{3d}:

ΦS(xS) = (xS, xS, 0),  ΦT(xT) = (xT, 0, xT)

training: learn a hypothesis h = (wg, ws, wt) ∈ R^{3d} on the transformed dataset {(ΦS(xS), yS)} ∪ {(ΦT(xT), yT)}

test: apply hT = wg + wt on xT (likewise hS = wg + ws on xS)
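The two mappings can be written directly in numpy; this is only a shape-level sketch (in EasyAdapt the hypothesis h would come from an SVM trained on the augmented data, here it is a random vector used to check the identities):

```python
import numpy as np

def phi_s(x):
    # source mapping: (general copy, source-specific copy, zeros)
    return np.concatenate([x, x, np.zeros_like(x)])

def phi_t(x):
    # target mapping: (general copy, zeros, target-specific copy)
    return np.concatenate([x, np.zeros_like(x), x])

d = 4
rng = np.random.default_rng(0)
x = rng.normal(size=d)
h = rng.normal(size=3 * d)               # h = (w_g, w_s, w_t)
w_g, w_s, w_t = h[:d], h[d:2 * d], h[2 * d:]

# scoring an augmented point with h equals scoring the raw point
# with the corresponding domain-specific hypothesis:
print(np.allclose(h @ phi_s(x), (w_g + w_s) @ x))  # True
print(np.allclose(h @ phi_t(x), (w_g + w_t) @ x))  # True
```

So a single learner on the 3d-dimensional data implicitly fits a shared component w_g plus per-domain corrections, which is the whole trick.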


EA++ (Daume III et al.)

Use unlabelled data to improve training:

want hS and hT to agree on unlabelled data xU:

hS · xU = hT · xU ⇔ ws · xU = wt · xU ⇔ h · (0, xU, −xU) = 0

so we define a mapping ΦU : R^d → R^{3d} for unlabelled data:

ΦU(xU) = (0, xU, −xU)

and train the hypothesis h on the augmented, transformed dataset {(ΦS(xS), yS)} ∪ {(ΦT(xT), yT)} ∪ {(ΦU(xU), 0)}
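The chain of equivalences above is easy to verify numerically; again a toy numpy sketch with a random h rather than a trained one:

```python
import numpy as np

def phi_u(x):
    # EA++ mapping for unlabelled points: (zeros, x, -x)
    return np.concatenate([np.zeros_like(x), x, -x])

d = 4
rng = np.random.default_rng(0)
x_u = rng.normal(size=d)
h = rng.normal(size=3 * d)               # h = (w_g, w_s, w_t)
w_g, w_s, w_t = h[:d], h[d:2 * d], h[2 * d:]

# h . phi_u(x) measures the source/target disagreement on x:
lhs = h @ phi_u(x_u)
rhs = (w_g + w_s) @ x_u - (w_g + w_t) @ x_u   # h_S . x - h_T . x
print(np.allclose(lhs, rhs))  # True
```

Giving these augmented points the pseudo-label 0 therefore penalises hypotheses whose source and target classifiers disagree on the unlabelled target data.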


EA++ (Daume III et al.)

(a) DVD → BOOKS (proxy A-distance = 0.7616)
(b) KITCHEN → APPAREL (proxy A-distance = 0.0459)

SOURCEONLY / TARGETONLY(-FULL): trained on source / target (full) labelled samples

ALL: trained on the combined labelled samples

EA/EA++: trained in the augmented feature space (plus unlabelled target data)


Feature transfer

Approach 2: shared lower-level features

the first layer of a DNN learns Gabor filters or colour blobs when trained on images

do instances in the source and target domains share the same lower-level features?

Reference:
Yosinski et al. How transferable are features in deep neural networks? NIPS '14


Feature transfer

Lee et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML '09

(adapted from Ruslan Salakhutdinov's tutorial at MLSS '14, Beijing)


Feature transfer (Yosinski et al.)


Feature transfer (Yosinski et al.)

Test 1 (similar datasets): random A/B splits of the ImageNet dataset

(similar source and target domain training/testing instances)
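The experimental pattern behind these tests (keep the lower layers from the source network frozen, retrain the upper layers on the target task) can be caricatured in a few lines of numpy. The random "source-trained" layer and the toy target task below are stand-ins, not the paper's AlexNet setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for a first layer learned on the source task (kept frozen)
w1_source = rng.normal(size=(5, 16))

def features(x):
    # frozen lower layer: transferred weights + ReLU nonlinearity
    return np.maximum(x @ w1_source, 0.0)

# toy target task
x_t = rng.normal(size=(200, 5))
y_t = (x_t[:, 0] + x_t[:, 1] > 0).astype(float)

# retrain only the top layer (logistic regression) on frozen features
z = features(x_t)
w2 = np.zeros(16)
for _ in range(2000):                    # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-np.clip(z @ w2, -30, 30)))
    w2 -= 0.2 * z.T @ (p - y_t) / len(y_t)

acc = np.mean(((z @ w2) > 0) == (y_t > 0.5))
print(acc)  # the retrained top layer adapts the frozen features
```

Fine-tuning, as in the paper, would additionally update w1_source with a small learning rate instead of leaving it frozen.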


Feature transfer (Yosinski et al.)

Test 2 (very different datasets): man-made/natural object split

(dissimilar source and target domain training/testing instances)


Joint representation

Approach 3: joint feature representation

data has many domain-specific characteristics

however, the domains might be related at a high level

our brain might work like this as well

Reference:
Srivastava and Salakhutdinov. Multimodal Learning with Deep Boltzmann Machines. NIPS '12, JMLR 15 (2014).


Joint representation (Srivastava et al.)

MIR Flickr Dataset http://press.liacs.nl/mirflickr/

For images:

1M datapoints; 25K labelled instances in 38 classes (10K for training, 5K for validation, 10K for testing)

inputs are the concatenation of PHOW and MPEG-7 features

For texts:

word count vectors over the 2K most frequently used tags (very sparse)

18% of the training images have missing text
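Building such a sparse tag-count vector is straightforward with a Counter; the five-tag vocabulary below is made up for illustration (the dataset uses the 2K most frequent tags):

```python
from collections import Counter

import numpy as np

# hypothetical tag vocabulary standing in for the 2K frequent tags
vocab = ["sky", "portrait", "night", "beach", "dog"]
index = {tag: i for i, tag in enumerate(vocab)}

def tag_count_vector(tags):
    """Count vector over the fixed tag vocabulary; unknown tags dropped."""
    counts = Counter(t for t in tags if t in index)
    v = np.zeros(len(vocab))
    for tag, c in counts.items():
        v[index[tag]] = c
    return v

v = tag_count_vector(["sky", "sky", "night", "tripod"])  # "tripod" unknown
print(v)  # [2. 0. 1. 0. 0.]
```

An image with no tags simply gets the all-zero vector, which is how the missing-text cases enter the model.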


Joint representation (Srivastava et al.)

for images: a 2-layer deep Boltzmann machine (DBM) with Gaussian input units (v_mi ∈ R; W_m^(k)(i,j) abbreviated as W_ij^(k)):

P(v_m, h_m^(1), h_m^(2)) ∝ exp( −∑_i (v_mi − b_i)² / (2σ_i²) + ∑_{i,j} (v_mi/σ_i) W_ij^(1) h_mj^(1) + ∑_{j,l} h_mj^(1) W_jl^(2) h_ml^(2) )


Joint representation (Srivastava et al.)

for texts: a 2-layer DBM with a replicated softmax input model (v_ti counts the occurrences of word i; W_t^(k)(i,j) abbreviated as W_ij^(k)):

P(v_t, h_t^(1), h_t^(2)) ∝ exp( ∑_i v_ti b_i + ∑_{i,j} v_ti W_ij^(1) h_tj^(1) + ∑_{j,l} h_tj^(1) W_jl^(2) h_tl^(2) )


Joint representation (Srivastava et al.)

combining the domain-specific models into a multimodal DBM:

P(v_m, v_t, h; θ) ∝ exp( −E(h_m^(2), h_t^(2), h^(3)) − E(v_m, h_m^(1), h_m^(2)) − E(v_t, h_t^(1), h_t^(2)) )


Joint representation (Srivastava et al.)

first pre-train the domain-specific DBMs with CD, then co-train the joint model with PCD

use a mean-field variational approximation when computing the data-driven hidden unit moments
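The CD pre-training step can be sketched for a single binary RBM layer. Toy dimensions, biases omitted for brevity, so this illustrates the CD-1 update rule rather than the Gaussian/softmax models above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, w, lr=0.01):
    """One CD-1 step for a binary RBM (weight matrix only)."""
    # positive phase: hidden activations driven by the data
    ph0 = sigmoid(v0 @ w)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step down to the visibles and back up
    pv1 = sigmoid(h0 @ w.T)
    ph1 = sigmoid(pv1 @ w)
    # CD-1 approximation to the log-likelihood gradient
    grad = (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    return w + lr * grad

v = (rng.random((50, 6)) < 0.5).astype(float)   # toy binary data batch
w = 0.01 * rng.normal(size=(6, 4))
for _ in range(10):
    w = cd1_update(v, w)
print(w.shape)  # (6, 4)
```

PCD differs only in keeping a persistent chain for the negative phase instead of restarting from the data each step.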


Joint representation (Srivastava et al.)

Results:

Figure: Classification with data from both image and text domain

Figure: Classification with data from image domain only


Joint representation (Srivastava et al.)

Results:

Figure: Retrieval results for multi/image domain queries


Conclusions

In this talk, we showed that

transfer learning adapts knowledge from other sources to improve targettask performance

domains can be related to each other in different ways

In the future:

manage large-scale data that does not lack size but may lack quality

manage data which may continuously change over time


Open Questions

what are the limits of existing multi-task learning methods when the number of tasks grows while each task is described by only a small number of samples ("big T, small n")?

what is the right way to leverage noisy data gathered from the Internet as a reference for a new task?

how can an automatic system process a continuous stream of information in time and progressively adapt for life-long learning?

can deep learning help to learn the right representation (e.g., a task similarity matrix) in kernel-based transfer and multi-task learning?

how can similarities across languages help us adapt to different domains in natural language processing tasks?

...

(from nips.cc/Conferences/2014/Program/event.php?ID=4282)


Thank you


References

1. Pan and Yang. A Survey on Transfer Learning. IEEE TKDE 2010
2. Pan and Yang. Transfer Learning. MLSS 2011
3. Taylor et al. Transfer Learning for Reinforcement Learning Domains: A Survey. JMLR 2010
4. Langley. Transfer of Learning in Cognitive Systems. ICML 2006
5. Perkins et al. Transfer of Learning. IEE 1992
6. Thrun. Explanation-Based Neural Network Learning: A Lifelong Learning Approach. PhD thesis 1995
7. Caruana. Multitask Learning. PhD thesis 1997
8. Ben-David et al. Analysis of Representations for Domain Adaptation. NIPS 2006
9. Daume III. Frustratingly easy domain adaptation. ACL 2007
10. Daume III et al. Co-regularization Based Semi-supervised Domain Adaptation. NIPS 2010
11. Yosinski et al. How transferable are features in deep neural networks? NIPS 2014
12. Lee et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML 2009
13. Srivastava and Salakhutdinov. Multimodal Learning with Deep Boltzmann Machines. NIPS 2012, JMLR 2014
