Midterm Review
CS4705
Natural Language Processing
• Statistical v. Symbolic Processing
  – 80/20 Rule
• Regular Expressions
• Finite State Automata
  – Determinism v. non-determinism
  – (Weighted) Finite State Transducers
• Morphology
  – Word Classes
  – Inflectional v. Derivational
  – Affixation, infixation, concatenation
  – Morphotactics
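As a review aid for the regular expression and finite state automaton items, here is a minimal sketch of one deterministic FSA and an equivalent regular expression. The "sheep language" pattern `baa+!`, the transition table, and the names `dfa_accepts` and `regex_accepts` are illustrative assumptions, not taken from the slides.

```python
import re

# Hypothetical deterministic FSA for the language b a a+ !
# States are integers; state 0 is the start state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def dfa_accepts(text):
    """Run the table-driven FSA; reject on any missing transition."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# The same language written as a regular expression.
SHEEP = re.compile(r"baa+!")


def regex_accepts(text):
    """Accept only if the whole string matches the pattern."""
    return SHEEP.fullmatch(text) is not None


for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, dfa_accepts(candidate), regex_accepts(candidate))
```

Because each (state, symbol) pair has at most one successor, the machine is deterministic; a non-deterministic version would allow several successors or empty transitions and would need search or a set of active states.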
• Morphological parsing
  – Koskenniemi’s two-level morphology
  – Porter stemmer
• Minimum Edit Distance (Levenshtein)
• N-grams
  – Markov assumption
  – Chain Rule
  – Language Modeling
    • Simple, Adaptive, Class-based (syntax-based), bursty
  – Smoothing
    • Add-one, Witten-Bell, Good-Turing
  – Back-off
  – Perplexity, Entropy
• Maximum Likelihood Estimation
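Two of the items above, minimum edit distance and bigram estimation with add-one smoothing, can be sketched in a few lines. This is a rough study sketch under stated assumptions: every edit operation costs 1 (some presentations charge 2 for a substitution), the toy corpus is invented, and the function names `edit_distance` and `bigram_prob` are hypothetical.

```python
from collections import Counter


def edit_distance(source, target):
    """Levenshtein distance, assuming unit cost for insert, delete, substitute."""
    prev = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            sub_cost = 0 if s_char == t_char else 1
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + sub_cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigram_prob(tokens, prev_word, word, add_one=False):
    """Estimate P(word | prev_word) from a token list.

    Without smoothing this is the maximum likelihood estimate
    count(prev_word, word) / count(prev_word), which fails for an
    unseen prev_word. With add_one=True, one is added to every bigram
    count and the vocabulary size to the denominator.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)
    if add_one:
        return (bigrams[(prev_word, word)] + 1) / (unigrams[prev_word] + vocab_size)
    return bigrams[(prev_word, word)] / unigrams[prev_word]


tokens = "the cat sat on the mat".split()  # invented toy corpus
print(edit_distance("kitten", "sitting"))
print(bigram_prob(tokens, "the", "cat"))
print(bigram_prob(tokens, "the", "sat", add_one=True))
```

Note how add-one smoothing gives the unseen bigram "the sat" a non-zero probability while lowering the estimate for the seen bigram "the cat"; Witten-Bell and Good-Turing redistribute probability mass less bluntly.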
• Syntax
  – Chomsky’s view: Syntax is cognitive reality
  – Parse Trees
    • Dependency Structure
  – Part-of-Speech Tagging
    • Hand Written Rules v. Statistical v. Hybrid
    • Brill Tagging
  – Types of Ambiguity
• Context Free Grammars
  – Top-down v. Bottom-up Derivations
    • Left Corners
  – Grammar Equivalence
  – Normal Forms (CNF)
• Probabilistic Parsing
  – (p)CYK, Earley Parsing
  – Derivational Probability
  – Lexicalization
  – Classification
  – Supertagging
• Machine Learning
  – Dependent v. Independent variables
  – Training v. Development Test v. Test sets
  – Feature Vectors
  – Metrics
    • Accuracy
    • Precision, Recall, F-Measure
  – Gold Standards
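The metrics listed above can be computed by comparing system output against a gold standard. The sketch below assumes a simple per-token labeling task and the balanced F-measure (F1); the tag sequences and the names `accuracy` and `precision_recall_f1` are made up for illustration.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and balanced F-measure for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented gold-standard and system tag sequences.
gold = ["NN", "VB", "NN", "NN", "VB"]
predicted = ["NN", "NN", "NN", "VB", "VB"]

print(accuracy(gold, predicted))
print(precision_recall_f1(gold, predicted, positive="NN"))
```

Precision asks how many of the items the system labeled "NN" were right; recall asks how many of the gold "NN" items the system found. Accuracy alone can hide a poor result on a rare class, which is why both are usually reported.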