1 Introduction LING 572 Fei Xia, Dan Jinguji Week 1: 1/08/08



Introduction

LING 572

Fei Xia, Dan Jinguji

Week 1: 1/08/08


Outline

• General course information

• Course contents

• Reading assignment #1: due 1/10

• Take-home exam #1: due 1/10


General info

• Course url: http://courses.washington.edu/ling572

– Syllabus (incl. slides, assignments, and papers): updated every week.

– GoPost:

– Collect it:

• Please check your emails and GoPost at least once per day.


Slides

• The slides will be online before class.

• The final version will be uploaded a few hours after class.

• “Additional slides” are not required and not covered in class.


Prerequisites

• CS 326 (Data Structures) or equivalent:

– Ex: hash table, array, tree, …

• Stat 391 (Prob. and Stats for CS) or equivalent: Basic concepts in probability and statistics

– Ex: random variables, chain rule, Bayes’ rule

• Programming in Perl, C, C++, Java, or Python

• Basic unix/linux commands (e.g., ls, cd, ln, sort, head):  tutorials on unix

• LING570:

• If you don’t meet all the prerequisites above, you need to get permission from Fei before taking LING572.


LING570 prerequisites

• If you did not take LING570 last quarter, you need to understand all the material covered in that class.

• Especially,

– the material in Weeks #9-#10

– hw #10, Quiz #4

– the Mallet package

– http://courses.washington.edu/ling570


Topics covered in Ling570

• Unit #1:

– Formal languages and formal grammars

– FSA, FST

– Morphological analysis

• Unit #2: LM, ngram, and smoothing

• Unit #3: HMM and POS tagging

• Unit #4: Classification and sequence labeling tasks.


Grades for LING572

No midterm or final exams.

Graded:

• Assignments (9): 65-75%

• Take-home exams (3-5): 15-25%

Not graded:

• Reading assignments (5-9): 5-10%

• Class participation: 5%


Office hour

• Fei:

– Email:
  • Email address: [email protected]
  • Subject line should include “ling572”
  • The 48-hour rule: it works both ways

– Office hour:
  • Time: Fr: 10:30-11:30am
  • Location: Padelford A-210G


TA hour

• Dan Jinguji

– Email: [email protected]

– Time: T: ??

– Location: Art 337


Assignments / Exams

• Assignments: the same as in ling570

• Take-home exams: to replace in-class quizzes

• Reading assignments: some papers should be read before class.

• When a take-home exam or reading assignment falls in the same period as an assignment, the assignment workload will be reduced accordingly.

• The total amount of time spent on Ling572 will be about 15-30 hours per week.


Assignments

• Nine assignments

• Programming languages: C, C++, Java, Perl, or Python.

• Please follow the instructions in the assignments, including

– command line format

– file format

– the probability model

– …


The Mallet package

• Several assignments will use the Mallet package.

• If you don’t know how to use Mallet, you should get familiar with the package ASAP. You can start with the hw10 from LING570.

• The Mallet slides are at http://courses.washington.edu/ling570/fei_fall07/11_28_mallet.pdf

• The LING570 hw10 is at http://courses.washington.edu/ling570/fei_fall07/hw10.pdf


Assignment submission

• Use “Collect it”: submit the tar file.

– E.g., tar -cvf hw1.tar hw1_dir

• Due date: every Thurs at 1pm unless specified otherwise

• The submission area is closed 4 days after the due date.

• There is a 1% penalty for every hour after the due date.
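For students working in Python, the tar command above can also be reproduced with the standard `tarfile` module. The following is only a sketch: the function name `make_submission_tar` is hypothetical, and the directory and file names (`hw1_dir`, `hw1.tar`) are taken from the example on the slide.

```python
import tarfile
from pathlib import Path
from typing import List


def make_submission_tar(src_dir: str, tar_path: str) -> List[str]:
    """Pack src_dir into an uncompressed tar file and return the member names.

    Hypothetical helper, roughly equivalent to `tar -cvf tar_path src_dir`.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src_dir} is not a directory")
    # Mode "w" writes a plain (uncompressed) tar archive.
    with tarfile.open(tar_path, "w") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(str(src), arcname=src.name)
    # Re-open the archive to list what was actually stored.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

Calling `make_submission_tar("hw1_dir", "hw1.tar")` returns the list of archived paths, which is a quick way to confirm that the note file, scripts, and source code are all included before uploading.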


Homework Submission (cont)

• Each submission includes

– a note file: hw1.(txt|doc|pdf) for hw1.
  • If your code does not work, explain in the note file what you have implemented so far.

– a set of shell scripts: e.g., kNN.sh

– source code: e.g., kNN.C

– binary code (for C/C++/Java): kNN.out

– data files if any.

– The TA will NOT compile or debug your code.

• Time spent on an assignment: 10-20 hours/week

I would appreciate it if you could tell me the time you spent on the homework.


Take-home exams

• Normally, an exam is handed out on Tuesday and is due on Thursday.

• Bring the hardcopy of your solution to class.

• The dates on the tentative schedule are subject to change.

• Extensions will be granted for the exams ONLY under extremely unusual circumstances.

• There are no makeup exams. If you know that you will miss a class in which an exam could be given, you need to inform me at least two hours before the class starts.


Take-home exams (cont)

• You should complete the exams on your own.

• No discussion among students is allowed.

• You can refer to anything that is available on the course url and on patas, but please don’t search the Web for answers.

• If you have any questions about the exams, please email Fei.


Reading assignments

• You will answer some questions about the papers that will be discussed in the next class.

• The questions are on the teaching slides; there are no separate documents for them.

• Your answers should be concise and no more than a few lines.

• Your answers are due before the next class. Bring the hardcopy of your answers to class.


Summary of assignments and exams

                  | Assignments (hw)             | Take-home exams      | Reading assignments
Num               | 9                            | 3-5                  | 5-9
Distribution      | Download from the course url | Distributed in class | Download from the course url
Discussion        | Allowed                      | Disallowed           | Allowed
Submission        | Collect It                   | Bring to class       | Bring to class (not graded)
Due date          | 1pm every Thurs              | Before next class    | Before next class
Extension         | 1% penalty per hour          | Normally disallowed  | Disallowed
Estimate of hours | 10-20 hours                  | 1-5 hours            | 2-5 hours
Solution files    | On Patas                     | On Patas             | Discussed in class


Patas

• If you need a patas account, email [email protected] right away to request one.

• The directory for LING572:

~/dropbox/07-08/572/

– hw1/, hw2/, ….: Assignments and solutions

– misc_slides/: Solutions to exams and misc slides that are not on the course url.

• For jobs that run more than 5 minutes, use the cluster submission commands: see Hw1.


Course plan


Types of ML problems

• Classification problem

• Estimation problem

• Clustering

• Discovery

• …

A learning method can be applied to one or more types of ML problems.

We will focus on the classification problem.


Course objectives

• Covering basic statistical methods that produce state-of-the-art results

• Focusing on classification and sequence labeling problems

• Some ML algorithms are complex. We will focus on basic ideas, not theoretical proofs.


Main units

• Unit #1 (2 weeks): simple classification algorithms

– kNN

– Decision tree

– Naïve Bayes

• Unit #2 (3 weeks): advanced classification algorithms

– MaxEnt*

– SVM**


Main units (cont)

• Unit #3 (2 weeks): sequence labeling algorithms

– TBL* (if time permits)

– CRF**

• Unit #4 (1 week): system combination

• Unit #5 (if time permits)

– Introduction to semi-supervised learning**

– Introduction to EM**


Other topics

• Information theory

• Feature selection

• Converting a multi-class task to binary classification tasks
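One common way to turn a multi-class task into binary tasks is the one-vs-rest scheme: train one binary classifier per class, then pick the class whose classifier is most confident. The slides do not say which conversion the course uses, so the sketch below is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def one_vs_rest_labels(labels: Sequence[str]) -> Dict[str, List[int]]:
    """For each class c, build a binary label vector: 1 if the instance is c, else 0."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def combine_binary_scores(scores: Dict[str, float]) -> str:
    """Return the class whose binary classifier produced the highest score."""
    return max(scores, key=scores.get)
```

Each binary label vector would be used to train a separate binary classifier; at test time, the per-class scores for an instance are passed to `combine_binary_scores` to obtain a single multi-class prediction.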


Three levels of discussion

The default (i.e., unmarked): We will discuss the model, training, decoding, etc.

– kNN, Decision tree, Naïve Bayes

*: We will discuss the model, but not the training and other implementation issues:

– MaxEnt, TBL

**: We will only go over the main intuition about the algorithms:

– SVM, CRF, semi-supervised learning, EM


Questions for each ML method

• Modeling:

– What is the model?

– What kind of assumption is made by the model?

– How many types of model parameters?

– How many “internal” (or non-model) parameters?

– …


Questions for each method (cont)

• Training: how to estimate parameters?

• Decoding: how to find the “best” solution?

• Weaknesses and strengths?

– Is the algorithm
  • robust? (e.g., handling outliers)
  • scalable?
  • prone to overfitting?
  • efficient in training time? Test time?

– How much data is needed?
  • Labeled data
  • Unlabeled data
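To make the modeling, training, and decoding questions concrete, here is a minimal kNN sketch (kNN is the first algorithm in Unit #1). It is not the course's reference implementation: the class name, the use of Euclidean distance, and majority voting are all assumptions made for illustration.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


class KNNClassifier:
    """Illustrative k-nearest-neighbor classifier (hypothetical, for discussion only)."""

    def __init__(self, k: int = 3) -> None:
        # k is an "internal" (non-model) parameter: it is chosen, not estimated.
        self.k = k
        self.examples: List[Example] = []

    def train(self, data: List[Example]) -> None:
        # Training: kNN estimates no parameters; it simply stores the labeled data.
        self.examples = list(data)

    def decode(self, x: Sequence[float]) -> str:
        # Decoding: find the k stored examples closest to x (Euclidean distance)
        # and return the most frequent label among them.
        nearest = sorted(self.examples, key=lambda ex: dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

In this sketch, training is trivial while decoding has to scan every stored example, which is one way to approach the "efficient in training time? Test time?" question for kNN.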


Reading assignment #1


• Read M&S 2.2: Essential Information Theory

• Questions: For a random variable X, let p(x) and q(x) be two distributions, and assume that p is the true distribution.

– p(X=a)=p(X=b)=1/8, p(X=c)=1/4, p(X=d)=1/2

– q(X=a)=q(X=b)=q(X=c)=q(X=d)=1/4

(a) What is H(X)?

(b) What is cross entropy H(X,q)?

(c) What is KL divergence D(p||q)?

(d) What is D(q||p)?
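The quantities in questions (a) to (d) can be checked numerically once you have worked them out by hand. The sketch below uses the standard definitions in bits (log base 2): H(p) = -Σ p(x) log p(x), cross entropy H(p,q) = -Σ p(x) log q(x), and D(p||q) = Σ p(x) log (p(x)/q(x)). The function names are hypothetical; the distributions are the ones given in the question.

```python
from math import log2
from typing import Dict

Dist = Dict[str, float]  # maps each outcome to its probability


def entropy(p: Dist) -> float:
    """H(p) = -sum_x p(x) log2 p(x); zero-probability outcomes contribute 0."""
    return -sum(px * log2(px) for px in p.values() if px > 0)


def cross_entropy(p: Dist, q: Dist) -> float:
    """H(p, q) = -sum_x p(x) log2 q(x), with p as the true distribution."""
    return -sum(px * log2(q[x]) for x, px in p.items() if px > 0)


def kl_divergence(p: Dist, q: Dist) -> float:
    """D(p||q) = sum_x p(x) log2 (p(x) / q(x))."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)


# Distributions from the reading assignment.
p = {"a": 1 / 8, "b": 1 / 8, "c": 1 / 4, "d": 1 / 2}
q = {"a": 1 / 4, "b": 1 / 4, "c": 1 / 4, "d": 1 / 4}

if __name__ == "__main__":
    print("H(X)     =", entropy(p))
    print("H(X, q)  =", cross_entropy(p, q))
    print("D(p||q)  =", kl_divergence(p, q))
    print("D(q||p)  =", kl_divergence(q, p))
```

A useful sanity check on any hand-computed answer is the identity H(p,q) = H(p) + D(p||q), which follows directly from the definitions above.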


Next time

• Both Reading #1 and Exam #1 are due before class.

• Bring the hardcopy to class.

• Topics:

– Information theory and hw #1

– Solution to Exam #1 (if time permits)

– Recap on the classification problem (if time permits)
