Midterm Review CS4705 Natural Language Processing

Upload: shad-hansen

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Midterm Review

Midterm Review

CS4705

Natural Language Processing

Page 2: Midterm Review

• Statistical v. Symbolic Processing
  – 80/20 Rule
• Regular Expressions
• Finite State Automata
  – Determinism v. non-determinism
  – (Weighted) Finite State Transducers
• Morphology
  – Word Classes
  – Inflectional v. Derivational
  – Affixation, infixation, concatenation
  – Morphotactics
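As a refresher on how regular expressions and deterministic finite state automata relate, here is a minimal sketch. The language /baa+!/ and the names TRANSITIONS, ACCEPTING, dfa_accepts and regex_accepts are illustrative assumptions, not taken from the slides.

```python
import re

# Transition table for a deterministic FSA that accepts /baa+!/.
# Keys are (state, input symbol); a missing key means "no transition".
# The language and state numbering are assumptions made for illustration.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def dfa_accepts(text):
    """Run the automaton over text; accept only if it ends in an accepting state."""
    state = 0
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no legal move: reject immediately
            return False
    return state in ACCEPTING


def regex_accepts(text):
    """The same language expressed as a regular expression."""
    return re.fullmatch(r"baa+!", text) is not None


if __name__ == "__main__":
    for candidate in ["baa!", "baaaa!", "ba!", "baa!x"]:
        print(candidate, dfa_accepts(candidate), regex_accepts(candidate))
```

Because the table holds at most one next state per (state, symbol) pair, the machine is deterministic. A non-deterministic variant would map each pair to a set of states and track every state it could be in.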

Page 3: Midterm Review

• Morphological parsing
  – Koskenniemi’s two-level morphology
  – Porter stemmer
• Minimum Edit Distance (Levenshtein)
• N-grams
  – Markov assumption
  – Chain Rule
  – Language Modeling
    • Simple, Adaptive, Class-based (syntax-based), bursty
  – Smoothing
    • Add-one, Witten-Bell, Good-Turing
  – Back-off
  – Perplexity, Entropy
• Maximum Likelihood Estimation
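Two items on this slide lend themselves to a compact sketch: minimum edit distance by dynamic programming, and bigram estimation by maximum likelihood versus add-one smoothing. The function names, the sub_cost parameter and the toy sentence are assumptions for illustration. Some textbook presentations charge 2 for a substitution, which is why that cost is configurable here.

```python
from collections import Counter


def min_edit_distance(source, target, sub_cost=1):
    """Levenshtein distance with insert/delete cost 1 and a configurable
    substitution cost (sub_cost is an illustrative parameter)."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete everything
    for j in range(1, m + 1):
        dist[0][j] = j  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                           # deletion
                dist[i][j - 1] + 1,                           # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution
            )
    return dist[n][m]


def bigram_probability(tokens, prev, word, add_one=False):
    """Estimate P(word | prev) from a token list.

    Without smoothing this is the maximum likelihood estimate
    count(prev, word) / count(prev as a history). With add_one=True,
    1 is added to every bigram count and the vocabulary size to the
    denominator.
    """
    histories = Counter(tokens[:-1])            # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    if add_one:
        return (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
    if histories[prev] == 0:
        return 0.0  # unseen history: the MLE is undefined, report 0
    return bigrams[(prev, word)] / histories[prev]


if __name__ == "__main__":
    print(min_edit_distance("kitten", "sitting"))
    corpus = "the cat sat on the mat".split()
    print(bigram_probability(corpus, "the", "cat"))
    print(bigram_probability(corpus, "the", "sat", add_one=True))
```

The unsmoothed estimate assigns probability zero to any bigram missing from the training data. Smoothing methods such as add-one, Witten-Bell and Good-Turing exist to address that.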

Page 4: Midterm Review

• Syntax
  – Chomsky’s view: Syntax is cognitive reality
  – Parse Trees
    • Dependency Structure
  – Part-of-Speech Tagging
    • Hand Written Rules v. Statistical v. Hybrid
    • Brill Tagging
  – Types of Ambiguity
• Context Free Grammars
  – Top-down v. Bottom-up Derivations
    • Left Corners
  – Grammar Equivalence
  – Normal Forms (CNF)
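One reason Chomsky Normal Form matters is that it allows a simple bottom-up chart recognizer in the CYK style. The sketch below assumes a toy grammar. The rule tables LEXICAL and BINARY, the category names and the function cyk_recognize are all hypothetical, chosen only to illustrate the technique.

```python
# Toy CNF grammar (an assumption for illustration):
#   S -> NP VP,  VP -> V NP,  NP -> "she" | "fish",  V -> "eats"
LEXICAL = {
    "she": {"NP"},
    "fish": {"NP"},
    "eats": {"V"},
}
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}


def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if the CNF grammar derives the word sequence from start."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(lexical.get(word, ()))
    for span in range(2, n + 1):              # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point
                for left in table[i][k]:
                    for right in table[k][j]:
                        table[i][j] |= binary.get((left, right), set())
    return start in table[0][n]


if __name__ == "__main__":
    print(cyk_recognize("she eats fish".split(), LEXICAL, BINARY))
    print(cyk_recognize("eats she fish".split(), LEXICAL, BINARY))
```

This version only answers yes or no. A probabilistic variant would store the best probability for each nonterminal in each cell instead of a plain set.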

Page 5: Midterm Review

• Probabilistic Parsing
  – (p)CYK, Earley Parsing
  – Derivational Probability
  – Lexicalization
  – Classification
  – Supertagging
• Machine Learning
  – Dependent v. Independent variables
  – Training v. Development Test v. Test sets
  – Feature Vectors
  – Metrics
    • Accuracy
    • Precision, Recall, F-Measure
  – Gold Standards
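The metrics listed above can be computed by comparing system output against a gold standard. This is a minimal sketch assuming one label per item and a single class treated as "positive". The function name evaluate and the toy tag sequences are illustrative assumptions.

```python
def evaluate(gold, predicted, positive):
    """Compare predicted labels with gold-standard labels.

    Returns accuracy over all items, plus precision, recall and
    balanced F-measure (F1) for the class named by positive.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == positive and p == positive for g, p in pairs)
    false_pos = sum(g != positive and p == positive for g, p in pairs)
    false_neg = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold_tags = ["N", "V", "N", "N", "V"]       # hypothetical gold standard
    system_tags = ["N", "N", "N", "V", "V"]     # hypothetical system output
    print(evaluate(gold_tags, system_tags, positive="N"))
```

Accuracy counts every item equally, while precision and recall focus on one class. The two views can diverge sharply when that class is rare.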