Introduction LING 572 Fei Xia Week 1: 1/4/06


Page 1: Introduction LING 572 Fei Xia Week 1: 1/4/06. Outline Course overview Mathematical foundation: (Prereq) –Probability theory –Information theory Basic

Introduction

LING 572

Fei Xia

Week 1: 1/4/06


Outline

• Course overview

• Mathematical foundation: (Prereq)
– Probability theory
– Information theory

• Basic concepts in the classification task


Course overview


General info

• Course url: http://courses.washington.edu/ling572

– Syllabus (incl. slides, assignments, and papers): updated every week.

– Message board
– ESubmit

• Slides:
– I will try to put the slides online before class.
– “Additional slides” are not required and not covered in class.


Office hour

• Fei:
– Email:
• Email address: fxia@u
• Subject line should include “ling572”
• The 48-hour rule
– Office hour:
• Time: Fr 10-11:20am
• Location: Padelford A-210G


Lab session

• Bill McNeil
– Email: billmcn@u
– Lab session: what time is good for you?
• Explaining homework and solutions
• Mallet-related questions
• Reviewing class material

I highly recommend that you attend the lab sessions, especially the first few.


Time for Lab Session

• Time:
– Monday: 10:00am - 12:20pm, or
– Tuesday: 10:30am - 11:30am, or
– ??

• Location: ??

Thursday 3-4pm, MGH 271?


Misc

• Ling572 Mailing list: ling572a_wi07@u

• EPost

• Mallet developer mailing list:

[email protected]


Prerequisites

• Ling570
– Some basic algorithms: FSA, HMM, …
– NLP tasks: tokenization, POS tagging, …

• Programming: Java (Mallet). If you don’t know Java well, talk to me.

• Basic concepts in probability and statistics
– Ex: random variables, chain rule, Gaussian distribution, …

• Basic concepts in information theory
– Ex: entropy, relative entropy, …


Expectations

• Reading:
– Papers are online.
– Reference book: Manning & Schutze (M&S)
– Finish reading papers before class.

I will ask you questions.


Grades

• Assignments (9 parts): 90%
– Programming language: Java

• Class participation: 10%

• No quizzes, no final exams

• No “incomplete” unless you can prove your case.


Course objectives

• Covering basic statistical methods that produce state-of-the-art results

• Focusing on classification algorithms

• Touching on unsupervised and semi-supervised algorithms

• Some material is not easy. We will focus on applications, not theoretical proofs.


Course layout

• Supervised methods
– Classification algorithms:
• Individual classifiers:
– Naïve Bayes
– kNN and Rocchio
– Decision tree
– Decision list: ??
– Maximum Entropy (MaxEnt)
• Classifier ensembles:
– Bagging
– Boosting
– System combination


Course layout (cont)

• Supervised algorithms (cont)
– Sequence labeling algorithms:
• Transformation-based learning (TBL)
• FST, HMM, …

• Semi-supervised methods
– Self-training
– Co-training


Course layout (cont)

• Unsupervised methods
– EM algorithm:
• Forward-backward algorithm
• Inside-outside algorithm
• …


Questions for each method

• Modeling:
– What is the model?
– How does the decomposition work?
– What kinds of assumptions are made?
– How many types of model parameters are there?
– How many “internal” (or non-model) parameters are there?
– How is the multi-class problem handled?
– How are non-binary features handled?
– …


Questions for each method (cont)

• Training: how to estimate parameters?
• Decoding: how to find the “best” solution?
• Weaknesses and strengths?
– Is the algorithm
• robust? (e.g., handling outliers)
• scalable?
• prone to overfitting?
• efficient in training time? In test time?
– How much data is needed?
• Labeled data
• Unlabeled data


Relation between 570/571 and 572

• 570/571 are organized by tasks; 572 is organized by learning methods.

• 572 focuses on statistical methods.


NLP tasks covered in Ling570

• Tokenization

• Morphological analysis

• POS tagging

• Shallow parsing

• WSD

• NE tagging


NLP tasks covered in Ling571

• Parsing

• Semantics

• Discourse

• Dialogue

• Natural language generation (NLG)

• …


An ML method for multiple NLP tasks

• Task (570/571):
– Tokenization
– POS tagging
– Parsing
– Reference resolution
– …

• Method (572):– MaxEnt


Multiple methods for one NLP task

• Task (570/571): POS tagging

• Method (572):
– Decision tree
– MaxEnt
– Boosting
– Bagging
– …


Projects: Task 1

• Text classification task: 20 groups
– P1: First look at the Mallet package
– P2: Your first tui class; Naïve Bayes
– P3: Feature selection; Decision tree
– P4: Bagging; Boosting

• Individual project


Projects: Task 2

• Sequence labeling task: IGT detection
– P5: MaxEnt
– P6: Beam search
– P7: TBA
– P8: Presentation: final class
– P9: Final report

• Group project (?)


Both projects

• Use Mallet, a Java package

• Two types of work:
– Reading code to understand ML methods
– Writing code to solve problems


Feedback on assignments

• “Misc” section in each assignment:
– How long did it take to finish the homework?
– Which part was difficult?
– …


Mallet overview

• It is a Java package that includes many
– classifiers,
– sequence labeling algorithms,
– optimization algorithms,
– useful data classes,
– …

• You should
– read the “Mallet Guides”,
– attend the Mallet tutorial: next Tuesday 10:30-11:30am, LLC109,
– start on Hw1.

• I will use Mallet class/method names where possible.


Questions for “course overview”?


Outline

• Course overview

• Mathematical foundation
– Probability theory
– Information theory

• Basic concepts in the classification task


Probability Theory


Basic concepts

• Sample space, event, event space

• Random variable and random vector

• Conditional probability, joint probability, marginal probability (prior)


Sample space, event, event space

• Sample space (Ω): a collection of basic outcomes.
– Ex: toss a coin twice: {HH, HT, TH, TT}

• Event: an event is a subset of Ω.
– Ex: {HT, TH}

• Event space (2^Ω): the set of all possible events.
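These definitions can be made concrete with a short Python sketch of the two-coin-toss example above:

```python
from itertools import combinations

# Sample space for tossing a coin twice.
omega = {"HH", "HT", "TH", "TT"}

# An event is a subset of omega, e.g. "exactly one head":
event = {"HT", "TH"}
assert event <= omega

# The event space 2^omega is the set of all subsets of omega.
event_space = [set(c) for r in range(len(omega) + 1)
               for c in combinations(sorted(omega), r)]
assert len(event_space) == 2 ** len(omega)  # 16 possible events
```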


Random variable

• The outcome of an experiment need not be a number.

• We often want to represent outcomes as numbers.

• A random variable X is a function X: Ω → R.
– Ex: toss a coin twice: X(HH)=0, X(HT)=1, …


Two types of random variables

• Discrete: X takes on only a countable number of possible values.
– Ex: toss a coin 10 times; X is the number of tails that are noted.

• Continuous: X takes on an uncountable number of possible values.
– Ex: X is the lifetime (in hours) of a light bulb.


Probability function

• The probability function of a discrete random variable X gives the probability that X equals $x_i$: $p(x_i) = P(X = x_i)$.

• It satisfies:

$0 \le p(x_i) \le 1$

$\sum_i p(x_i) = 1$
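A quick check of the two constraints, using a made-up distribution over three POS tags:

```python
# A hypothetical pmf over three POS tags, purely for illustration.
pmf = {"NN": 0.5, "VB": 0.3, "JJ": 0.2}

# 0 <= p(x_i) <= 1 for every outcome ...
assert all(0.0 <= p <= 1.0 for p in pmf.values())
# ... and the probabilities sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-9
```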


Random vector

• A random vector is a finite-dimensional vector of random variables: X = [X1, …, Xn].

• P(x) = P(x1, x2, …, xn) = P(X1=x1, …, Xn=xn)

• Ex: P(w1, …, wn, t1, …, tn)


Three types of probability

• Joint prob: P(x,y)= prob of x and y happening together

• Conditional prob: P(x|y) = prob of x given a specific value of y

• Marginal prob: P(x) = prob of x, summed over all possible values of y


Common tricks (I): Marginal prob from joint prob

$P(A) = \sum_B P(A, B)$

$P(A_1) = \sum_{A_2, \ldots, A_n} P(A_1, A_2, \ldots, A_n)$


Common tricks (II): Chain rule

$P(A,B) = P(A)\,P(B|A) = P(B)\,P(A|B)$

$P(A_1, \ldots, A_n) = \prod_{i=1}^{n} P(A_i \mid A_1, \ldots, A_{i-1})$
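The chain rule can be verified numerically on a tiny hypothetical joint distribution:

```python
# Hypothetical joint distribution over two variables A and B.
joint = {("a0", "b0"): 0.1, ("a0", "b1"): 0.3,
         ("a1", "b0"): 0.2, ("a1", "b1"): 0.4}

def p_a(a):
    # Marginal: P(A=a) = sum_B P(A=a, B)
    return sum(v for (x, _), v in joint.items() if x == a)

def p_b_given_a(b, a):
    # Conditional: P(B=b | A=a) = P(a, b) / P(a)
    return joint[(a, b)] / p_a(a)

# Chain rule: P(A, B) = P(A) * P(B | A), for every cell.
for (a, b), v in joint.items():
    assert abs(v - p_a(a) * p_b_given_a(b, a)) < 1e-12
```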


Common tricks (III): Bayes rule

$P(B|A) = \frac{P(A,B)}{P(A)} = \frac{P(A|B)\,P(B)}{P(A)}$

$y^* = \arg\max_y P(y|x) = \arg\max_y \frac{P(x|y)\,P(y)}{P(x)} = \arg\max_y P(x|y)\,P(y)$
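A sketch of the argmax trick with made-up spam/ham numbers: since P(x) does not depend on y, it can be dropped from the argmax.

```python
# Hypothetical spam/ham numbers, purely for illustration.
prior = {"spam": 0.4, "ham": 0.6}                 # P(y)
likelihood = {("offer", "spam"): 0.7,             # P(x | y)
              ("offer", "ham"): 0.1}

x = "offer"
p_x = sum(likelihood[(x, y)] * prior[y] for y in prior)    # P(x)

# Bayes rule: P(y | x) = P(x | y) P(y) / P(x)
posterior = {y: likelihood[(x, y)] * prior[y] / p_x for y in prior}
assert abs(sum(posterior.values()) - 1.0) < 1e-12

# P(x) is constant in y, so maximizing P(x|y) P(y) gives the same answer.
best = max(prior, key=lambda y: likelihood[(x, y)] * prior[y])
assert best == max(posterior, key=posterior.get)
```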


Common tricks (IV): Independence assumption

$P(A_1, \ldots, A_n) = \prod_{i=1}^{n} P(A_i \mid A_1, \ldots, A_{i-1}) \approx \prod_{i=1}^{n} P(A_i \mid A_{i-1})$


Prior and Posterior distribution

• Prior distribution: P(θ), a distribution over parameter values θ, set prior to observing any data.

• Posterior distribution: P(θ | data). It represents our belief that θ is true after observing the data.

• Likelihood of the model θ: P(data | θ)

• Relation among the three, by Bayes rule: P(θ | data) = P(data | θ) P(θ) / P(data)


Two ways of estimating θ

• Maximum likelihood (ML):

θ* = arg max_θ P(data | θ)

• Maximum a posteriori (MAP):

θ* = arg max_θ P(θ | data)
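A worked coin-flip example (the counts are hypothetical; the Beta prior and its closed-form mode are standard conjugate-prior results, not from the slides):

```python
# Estimating a coin's heads probability theta from hypothetical counts.
heads, tails = 8, 2

# ML: theta* = argmax_theta P(data | theta) = heads / (heads + tails)
theta_ml = heads / (heads + tails)

# MAP with a Beta(a, b) prior on theta:
# theta* = argmax_theta P(theta | data) = (heads + a - 1) / (n + a + b - 2)
a, b = 2, 2   # pseudo-counts; this prior pulls the estimate toward 0.5
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)

assert theta_ml == 0.8
assert abs(theta_map - 0.75) < 1e-12   # between the ML estimate and 0.5
```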


Information Theory


Information theory

• It is the use of probability theory to quantify and measure “information”.

• Basic concepts:
– Entropy
– Joint entropy and conditional entropy
– Cross entropy and relative entropy
– Mutual information and perplexity


Entropy

• Entropy is a measure of the uncertainty associated with a distribution.

• It gives a lower bound on the average number of bits needed to transmit messages.

• An example: – Display the results of horse races. – Goal: minimize the number of bits to encode the results.

$H(X) = -\sum_x p(x) \log p(x)$


An example

• Uniform distribution: pi=1/8.

• Non-uniform distribution: (1/2,1/4,1/8, 1/16, 1/64, 1/64, 1/64, 1/64)

$H(X) = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}$

$H(X) = -(\frac{1}{2}\log\frac{1}{2} + \frac{1}{4}\log\frac{1}{4} + \frac{1}{8}\log\frac{1}{8} + \frac{1}{16}\log\frac{1}{16} + 4 \times \frac{1}{64}\log\frac{1}{64}) = 2 \text{ bits}$

A corresponding code for the non-uniform case: (0, 10, 110, 1110, 111100, 111101, 111110, 111111)

The uniform distribution has the higher entropy. MaxEnt: make the distribution as “uniform” as possible.
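The two entropy values can be checked in Python:

```python
import math

def entropy(ps):
    # H(X) = -sum_x p(x) log2 p(x)
    return -sum(p * math.log2(p) for p in ps if p > 0)

uniform = [1/8] * 8
skewed = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

assert abs(entropy(uniform) - 3.0) < 1e-9
assert abs(entropy(skewed) - 2.0) < 1e-9

# The code word lengths (1,2,3,4,6,6,6,6) of the code above average
# out to exactly H(X) = 2 bits under the skewed distribution.
lengths = [1, 2, 3, 4, 6, 6, 6, 6]
assert abs(sum(p * l for p, l in zip(skewed, lengths)) - 2.0) < 1e-9
```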


Joint and conditional entropy

• Joint entropy:

$H(X,Y) = -\sum_x \sum_y p(x,y) \log p(x,y)$

• Conditional entropy:

$H(Y|X) = -\sum_x \sum_y p(x,y) \log p(y|x) = H(X,Y) - H(X)$
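A numeric check of the identity H(Y|X) = H(X,Y) − H(X), with a made-up joint distribution:

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Made-up joint distribution P(x, y).
joint = {("x0", "y0"): 0.25, ("x0", "y1"): 0.25,
         ("x1", "y0"): 0.4,  ("x1", "y1"): 0.1}

h_xy = H(joint.values())                       # H(X,Y)
px = {}
for (x, _), v in joint.items():                # marginal p(x)
    px[x] = px.get(x, 0.0) + v
h_x = H(px.values())                           # H(X)

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x)
h_y_given_x = -sum(v * math.log2(v / px[x]) for (x, _), v in joint.items())

assert abs(h_y_given_x - (h_xy - h_x)) < 1e-9
```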


Cross Entropy

• Entropy: $H(X) = -\sum_x p(x) \log p(x)$

• Cross entropy: $H_c(X) = -\sum_x p(x) \log q(x)$

• Cross entropy is a distance measure between p(x) and q(x): p(x) is the true probability; q(x) is our estimate of p(x).

• $H_c(X) \ge H(X)$


Relative Entropy

• Also called Kullback-Leibler divergence:

• Another “distance” measure between prob functions p and q.

• KL divergence is asymmetric (not a true distance):

$KL(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = H_c(X) - H(X)$

$KL(p||q) \ne KL(q||p)$
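A small sketch showing non-negativity and asymmetry on made-up distributions:

```python
import math

def kl(p, q):
    # KL(p||q) = sum_x p(x) log2 (p(x) / q(x))
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]    # "true" distribution (made up)
q = [0.5, 0.5]    # our estimate of it

assert kl(p, q) >= 0 and kl(q, p) >= 0      # always non-negative
assert abs(kl(p, q) - kl(q, p)) > 1e-6      # but not symmetric
```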


Mutual information

• It measures how much is in common between X and Y:

• I(X;Y)=KL(p(x,y)||p(x)p(y))

$I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) + H(Y) - H(X,Y) = I(Y;X)$
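A numeric check of the identity I(X;Y) = H(X) + H(Y) − H(X,Y), again with made-up numbers:

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Made-up joint distribution and its marginals.
joint = {("x0", "y0"): 0.3, ("x0", "y1"): 0.2,
         ("x1", "y0"): 0.1, ("x1", "y1"): 0.4}
px = {"x0": 0.5, "x1": 0.5}
py = {"y0": 0.4, "y1": 0.6}

# I(X;Y) = sum_{x,y} p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
mi = sum(v * math.log2(v / (px[x] * py[y])) for (x, y), v in joint.items())

# Identity: I(X;Y) = H(X) + H(Y) - H(X,Y)
assert abs(mi - (H(px.values()) + H(py.values()) - H(joint.values()))) < 1e-9
assert mi > 0   # X and Y are dependent here
```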


Perplexity

• Perplexity is $2^H$.

• Perplexity is the weighted average number of choices a random variable has to make.
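For the horse-race distributions above, perplexity works out as follows:

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Uniform over 8 outcomes: H = 3 bits, so perplexity 2^H = 8 -- the
# variable behaves like a fair 8-way choice on average.
perp_uniform = 2 ** entropy([1/8] * 8)

# The skewed distribution has H = 2 bits, so perplexity 4:
# fewer "effective" choices.
perp_skewed = 2 ** entropy([1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64])

assert abs(perp_uniform - 8.0) < 1e-9
assert abs(perp_skewed - 4.0) < 1e-9
```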


Questions for “Mathematical foundation”?


Outline

• Course overview

• Mathematical foundation
– Probability theory
– Information theory

• Basic concepts in the classification task


Types of ML problems

• Classification problem
• Estimation problem
• Clustering
• Discovery
• …

A learning method can be applied to one or more types of ML problems.

We will focus on the classification problem.


Definition of classification problem

• Task:
– C = {c1, c2, …, cm} is a set of pre-defined classes (a.k.a. labels, categories).
– D = {d1, d2, …} is a set of inputs to be classified.
– A classifier is a function: D × C → {0, 1}.

• Multi-label vs. single-label:
– Single-label: only one class is assigned to each di.

• Multi-class vs. binary classification:
– Binary: |C| = 2.


Conversion to single-label binary problem

• Multi-label → single-label
– We will focus on the single-label problem.
– A classifier D × C → {0, 1} becomes D → C.
– More general definition: D × C → [0, 1].

• Multi-class → binary problem
– Positive examples vs. negative examples
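A minimal sketch of the multi-class → binary conversion (one-vs-rest style; the instances and labels are made up):

```python
# One-vs-rest: each class c gets a binary task "is the label c or not?".
labeled = [("d1", "c1"), ("d2", "c2"), ("d3", "c1"), ("d4", "c3")]
classes = ["c1", "c2", "c3"]

binary_tasks = {
    c: [(d, 1 if y == c else 0) for d, y in labeled]  # positives vs. negatives
    for c in classes
}

assert binary_tasks["c1"] == [("d1", 1), ("d2", 0), ("d3", 1), ("d4", 0)]
```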


Examples of classification problems

• Text classification

• Document filtering

• Language/Author/Speaker id

• WSD

• PP attachment

• Automatic essay grading

• …


Problems that can be treated as a classification problem

• Tokenization / Word segmentation

• POS tagging

• NE detection

• NP chunking

• Parsing

• Reference resolution

• …


Labeled vs. unlabeled data

• Labeled data:
– {(xi, yi)} is a set of labeled instances.
– xi ∈ D: data/input, often represented as a feature vector.
– yi ∈ C: target/label

• Unlabeled data:
– {xi} without yi


Instance, training and test data

• xi with or without yi is called an instance.

• Training data: a set of (labeled) instances.

• Test data: a set of unlabeled instances.

• The training data is stored in an InstanceList in Mallet, so is test data.


Attribute-value table

• Each row corresponds to an instance.
• Each column corresponds to a feature.

• A feature type (a.k.a. a feature template): w-1
• A feature: w-1=book
• Binary features vs. non-binary features


Attribute-value table

     f1    f2   …    fK      Target
d1   yes   1    no   -1000   c2
d2
d3
…
dn


Feature sequence vs. Feature vector

• Feature sequence: a (featName, featValue) list for features that are present.

• Feature Vector: a (featName, featValue) list for all the features.

• Representing data x as a feature vector.


Data/Input a feature vector

• Example:
– Task: text classification
– Original x: a document
– Feature vector: bag-of-words approach

• In Mallet, the process is handled by a sequence of pipes:
– Tokenization
– Lowercasing
– Merging the counts
– …
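A rough Python analogue of such a pipe sequence (the real Mallet pipes are Java classes; the pipe functions here are made up for illustration):

```python
from collections import Counter

# Each "pipe" transforms the data one step further.
def tokenize(doc):
    return doc.split()

def lowercase(tokens):
    return [t.lower() for t in tokens]

def merge_counts(tokens):
    return Counter(tokens)

def to_feature_vector(doc, pipes=(tokenize, lowercase, merge_counts)):
    x = doc
    for pipe in pipes:     # run the document through the pipe sequence
        x = pipe(x)
    return dict(x)         # bag-of-words feature vector

fv = to_feature_vector("The book was the best book")
assert fv == {"the": 2, "book": 2, "was": 1, "best": 1}
```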


Classifier and decision matrix

• A classifier is a function f: f(x) = {(ci, scorei)}. It fills out a decision matrix.

• {(ci, scorei)} is called a Classification in Mallet.

     d1    d2    d3    …
c1   0.1   0.4   0     …
c2   0.9   0.1   0     …
c3


Trainer (a.k.a Learner)

• A trainer is a function that takes an InstanceList as input, and outputs a classifier.

• Training stage: – Classifier train (instanceList);

• Test stage:– Classification classify (instance);


Important concepts (summary)

• Instance, InstanceList
• Labeled data, unlabeled data
• Training data, test data

• Feature, feature template
• Feature vector
• Attribute-value table

• Trainer, classifier
• Training stage, test stage


Steps for solving an NLP task with classifiers

• Convert the task into a classification problem (optional)

• Split data into training/test/validation

• Convert the data into attribute-value table

• Training

• Decoding

• Evaluation


Important subtasks (for you)

• Converting the data into an attribute-value table
– Define feature types
– Feature selection
– Convert an instance into a feature vector

• Understanding the training/decoding algorithms of the various methods.


Notation

                Classification in general    Text categorization
Input/data      xi                           di
Target/label    yi                           ci
Features        fk                           tk (term)
…               …                            …


Questions for “Concepts in a classification task”?


Summary

• Course overview

• Mathematical foundation
– Probability theory
– Information theory (M&S Ch 2)

• Basic concepts in the classification task


Downloading

• Hw1

• Mallet Guide

• Homework Guide


Coming up

• Next Tuesday:
– Mallet tutorial on 1/8 (Tues): 10:30-11:30am at LLC 109.
– Classification algorithm overview and Naïve Bayes: read the paper beforehand.

• Next Thursday:
– kNN and Rocchio: read the other paper.

• Hw1 is due at 11pm on 1/13


Additional slides


An example

• 570/571:
– POS tagging: HMM
– Parsing: PCFG
– MT: Model 1-4 training

• 572:
– HMM: forward-backward algorithm
– PCFG: inside-outside algorithm
– MT: EM algorithm

All are special cases of the EM algorithm, one method of unsupervised learning.


Proof: Relative entropy is always non-negative

Since $\log z \le z - 1$ for all $z > 0$:

$-KL(p||q) = -\sum_x p(x) \log \frac{p(x)}{q(x)} = \sum_x p(x) \log \frac{q(x)}{p(x)} \le \sum_x p(x) \left(\frac{q(x)}{p(x)} - 1\right) = \sum_x q(x) - \sum_x p(x) = 1 - 1 = 0$

Hence $KL(p||q) \ge 0$.


Entropy of a language

• The entropy of a language L:

$H(L) = \lim_{n\to\infty} -\frac{1}{n} \sum_{x_{1n}} p(x_{1n}) \log p(x_{1n})$

• If we make certain assumptions that the language is “nice”, then the entropy can be calculated as:

$H(L) = \lim_{n\to\infty} -\frac{1}{n} \log p(x_{1n})$


Cross entropy of a language

• The cross entropy of a language L:

$H(L, q) = \lim_{n\to\infty} -\frac{1}{n} \sum_{x_{1n}} p(x_{1n}) \log q(x_{1n})$

• If we make certain assumptions that the language is “nice”, then the cross entropy can be calculated as:

$H(L, q) = \lim_{n\to\infty} -\frac{1}{n} \log q(x_{1n})$


Conditional Entropy