
Page 1:

Semi-supervised learning and self-training

LING 572

Fei Xia

02/14/06

Page 2:

Outline

• Overview of Semi-supervised learning (SSL)

• Self-training (a.k.a. Bootstrapping)

Page 3:

Additional References

• Xiaojin Zhu (2006): Semi-supervised learning literature survey.

• Olivier Chapelle et al. (2005): Semi-supervised Learning. The MIT Press.

Page 4:

Overview of SSL

Page 5:

What is SSL?

• Labeled data:
  – Ex: POS tagging: tagged sentences
  – Creating labeled data is difficult, expensive, and/or time-consuming.

• Unlabeled data:
  – Ex: POS tagging: untagged sentences
  – Obtaining unlabeled data is easier.

• Goal: use both labeled and unlabeled data to improve performance.

Page 6:

• Learning:
  – Supervised (labeled data only)
  – Semi-supervised (both labeled and unlabeled data)
  – Unsupervised (unlabeled data only)

• Problems:
  – Classification
  – Regression
  – Clustering
  – …

Focus on the semi-supervised classification problem.

Page 7:

A brief history of SSL

• The idea of self-training appeared in the 1960s.

• SSL took off in the 1970s.

• Interest in SSL increased in the 1990s, mostly due to applications in NLP.

Page 8:

Does SSL work?

• Yes, under certain conditions:
  – Problem: the knowledge of p(x) carries information that is useful for inferring p(y | x).
  – Algorithm: the modeling assumptions fit the problem structure well.

• SSL will be most useful when there are far more unlabeled data than labeled data.

• SSL can degrade performance when mistakes reinforce themselves.

Page 9:

Illustration (Zhu, 2006)

Page 10:

Illustration (cont)

Page 11:

Assumptions

• Smoothness (continuity) assumption: if two points x1 and x2 in a high-density region are close, then so should be the corresponding outputs y1 and y2.

• Cluster assumption: If points are in the same cluster, they are likely to be of the same class.

• Low-density separation: the decision boundary should lie in a low-density region.

• ….

Page 12:

SSL algorithms

• Self-training

• Co-training

• Generative models:
  – Ex: EM with generative mixture models

• Low-density separation:
  – Ex: Transductive SVM

• Graph-based models

Page 13:

Which SSL method should we use?

• It depends.

• Semi-supervised methods make strong model assumptions. Choose the ones whose assumptions fit the problem structure.

Page 14:

Semi-supervised and active learning

• They address the same issue: labeled data are hard to get.

• Semi-supervised learning: the learner chooses which unlabeled data (with its own predicted labels) to add to the labeled data.

• Active learning: the learner chooses which unlabeled data to have annotated by a human.

Page 15:

Self-training

Page 16:

Basics of self-training

• Probably the earliest SSL idea.

• Also called self-teaching or bootstrapping.

• Appeared in the 1960s and 1970s.

• First well-known NLP paper: (Yarowsky, 1995)

Page 17:

Self-training algorithm

• Let L be the set of labeled data, U be the set of unlabeled data.

• Repeat:
  – Train a classifier h with training data L
  – Classify data in U with h
  – Find a subset U' of U with the most confident scores
  – L + U' → L
  – U – U' → U
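A minimal sketch of this loop in Python, assuming generic `train` and `predict` callables; the fixed confidence threshold and the iteration cap are illustrative choices, not part of the slide:

```python
def self_train(L, U, train, predict, threshold=0.95, max_iter=10):
    """Generic self-training loop (sketch).

    L: list of (example, label) pairs          -- the labeled data
    U: list of examples                        -- the unlabeled data
    train(L) -> classifier h
    predict(h, x) -> (label, confidence)
    """
    U = list(U)
    for _ in range(max_iter):
        h = train(L)
        # Classify all unlabeled data with the current classifier.
        scored = [(i, *predict(h, x)) for i, x in enumerate(U)]
        # U': the most confidently labeled examples (here: above a fixed threshold).
        confident = [(i, y) for i, y, conf in scored if conf >= threshold]
        if not confident:
            break                                    # nothing confident left to add
        L = L + [(U[i], y) for i, y in confident]    # L + U' -> L
        chosen = {i for i, _ in confident}
        U = [x for i, x in enumerate(U) if i not in chosen]  # U - U' -> U
    return train(L)
```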

Page 18:

Case study: (Yarowsky, 1995)

Page 19:

Setting

• Task: WSD (word sense disambiguation)
  – Ex: "plant": living vs. factory

• "Unsupervised": just need a few seed collocations for each sense.

• Learner: decision list

Page 20:

Assumption #1: One sense per collocation

• Nearby words provide strong and consistent clues to the sense of a target word.

• The effect varies depending on the type of collocation:
  – It is strongest for immediately adjacent collocations.

• Assumption #1 → use collocations in the decision rules.

Page 21:

Assumption #2: One sense per discourse

• The sense of a target word is highly consistent within any given document.

• The assumption holds most of the time (99.8% in their experiment)

• Assumption #2 → filter and augment the addition of unlabeled data.

Page 22:

Step 1: identify all examples of the given word: “plant”

Our sample set S

Page 23:

Step 2: Create initial labeled data using a small number of seed collocations

Sense A: “life”

Sense B: "manufacturing"

Our L(0)

U(0) = S – L(0): residual data set.
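A small sketch of this seeding step, under the assumption that each example in S is a context window (a list of tokens) around an occurrence of "plant"; the representation and the tie-breaking rule are illustrative, only the seed words come from the slide:

```python
SEEDS = {"A": {"life"}, "B": {"manufacturing"}}   # seed collocations per sense

def seed_label(S):
    """Split the sample set S into the initial labeled set L0 and the residual U0."""
    L0, U0 = [], []
    for context in S:                    # context: list of tokens around "plant"
        matches = {sense for sense, words in SEEDS.items() if words & set(context)}
        if len(matches) == 1:            # exactly one sense's seed collocation occurs
            L0.append((context, matches.pop()))
        else:                            # no seed (or conflicting seeds): leave unlabeled
            U0.append(context)
    return L0, U0
```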

Page 24:

Initial “labeled” data

Page 25:

Step 3a: Train a DL (decision list) classifier
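A sketch of a two-sense decision-list learner in the spirit of (Yarowsky, 1995): collocation features are ranked by a smoothed log-likelihood ratio, and classification uses the first matching rule. The bag-of-context-words features and the smoothing constant are simplifying assumptions:

```python
import math
from collections import Counter

def train_decision_list(L, alpha=0.1):
    """L: list of (context_tokens, sense) pairs with sense in {"A", "B"}."""
    counts = {"A": Counter(), "B": Counter()}
    for context, sense in L:
        counts[sense].update(set(context))           # count each feature once per example
    rules = []
    for f in set(counts["A"]) | set(counts["B"]):
        pa = counts["A"][f] + alpha                  # smoothed evidence for sense A
        pb = counts["B"][f] + alpha                  # smoothed evidence for sense B
        score = abs(math.log(pa / pb))               # log-likelihood ratio
        rules.append((score, f, "A" if pa > pb else "B"))
    rules.sort(reverse=True)                         # strongest rules first
    return rules

def classify(rules, context, default="A"):
    """Return (sense, score) of the highest-ranked rule that fires on this context."""
    words = set(context)
    for score, f, sense in rules:
        if f in words:
            return sense, score
    return default, 0.0
```

These two functions could plug into the earlier `self_train` sketch as its `train` and `predict` arguments, with the rule score (suitably thresholded) standing in for a confidence.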

Page 26:

Step 3b: Apply the classifier to the entire set

• Add to L the members of U that are tagged with probability above a threshold.

Let V(i) be the members of U tagged with probability above the threshold at iteration i. The update at each iteration is either

  L(i) + V(i) → L(i+1)
  U(i) – V(i) → U(i+1)

or

  L(0) + V(i) → L(i+1)
  U(0) – V(i) → U(i+1)
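A tiny sketch of these two update variants in Python; the function name, the `rebuild_from_seeds` flag, and the assumption that examples are hashable are illustrative choices:

```python
def update(L0, Li, U0, Ui, Vi, rebuild_from_seeds=False):
    """Vi: list of (example, label) pairs tagged above the threshold at iteration i.
    Examples are assumed hashable (e.g., tuples of tokens) so set difference is easy."""
    tagged = {x for x, _ in Vi}
    if rebuild_from_seeds:
        # L(0) + V(i) -> L(i+1),   U(0) - V(i) -> U(i+1)
        return L0 + Vi, [x for x in U0 if x not in tagged]
    # L(i) + V(i) -> L(i+1),   U(i) - V(i) -> U(i+1)
    return Li + Vi, [x for x in Ui if x not in tagged]
```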

Page 27:

Step 3c: Filter and augment this addition using Assumption #2 (one sense per discourse)
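A sketch of the one-sense-per-discourse filter/augment step, assuming a `doc_of` function that maps an example to its document id; the majority-vote rule and the `min_ratio` cutoff are illustrative assumptions, not details from the paper:

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(labeled, unlabeled, doc_of, min_ratio=0.75):
    """Relabel tagged examples to each document's majority sense (filter) and
    extend that sense to untagged examples in the same document (augment)."""
    senses_by_doc = defaultdict(list)
    for x, sense in labeled:
        senses_by_doc[doc_of(x)].append(sense)

    majority = {}
    for doc, senses in senses_by_doc.items():
        sense, n = Counter(senses).most_common(1)[0]
        if n / len(senses) >= min_ratio:     # only act on a clear majority
            majority[doc] = sense

    filtered = [(x, majority.get(doc_of(x), s)) for x, s in labeled]
    augmented = [(x, majority[doc_of(x)]) for x in unlabeled if doc_of(x) in majority]
    remaining = [x for x in unlabeled if doc_of(x) not in majority]
    return filtered + augmented, remaining
```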

Page 28:

Repeat step 3 until convergence.

Page 29:

The final DL classifier

Page 30:

The original algorithm

Keep initial labeling unchanged.

Page 31:

Options for obtaining the seeds

• Use words in dictionary definitions.

• Use a single defining collocate for each sense:
  – Ex: "bird" and "machine" for the word "crane"

• Label salient corpus collocates:
  – Collect frequent collocates.
  – Manually check the collocates.

Getting the seeds is not hard.

Page 32:

Performance

• Baseline: 63.9%
• Supervised learning: 96.1%
• Unsupervised learning: 90.6% – 96.5%

Page 33:

Why does it work?

• (Steven Abney, 2004): “Understanding the Yarowsky Algorithm”

Page 34:

Summary of self-training

• The algorithm is straightforward and intuitive.

• It produces outstanding results.

• A drawback: the automatically labeled data that are added can pollute the original labeled data, so mistakes may reinforce themselves.

Page 35:

Additional slides

Page 36:

Papers on self-training

• Yarowsky (1995): WSD

• Riloff et al. (2003): identify subjective nouns

• Maeireizo et al. (2004): classify dialogues as “emotional” or “non-emotional”.