A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union. Thomas Muller, Hinrich Schutze and Helmut Schmid. ACL, June 3-8, 2012. Reporter: Sitong Yang.
![Page 1: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/1.jpg)
1
A Comparative Investigation of Morphological Language Modeling
for the Languages of the European Union
Thomas Muller, Hinrich Schutze and Helmut Schmid
ACL June 3-8, 2012 Reporter: Sitong Yang
ICT
![Page 2: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/2.jpg)
2
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 3: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/3.jpg)
3
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 4: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/4.jpg)
4
Introduction
• Motivation
• Main idea
![Page 5: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/5.jpg)
5
Motivation
Language model?
• Frequent history: "potentially" followed by large, dangerous, serious
• Rare history: "hypothetically" followed by large, dangerous, serious
• How to transfer what is known about the frequent history to the rare one?
• Morphology
![Page 6: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/6.jpg)
6
Main idea
• Goal
  • Perplexity reduction (PD) for a large number of languages
![Page 7: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/7.jpg)
7
Main idea
• Goal
  • Perplexity reduction (PD) for a large number of languages
• Features
  • Morphology
  • Shape features
![Page 8: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/8.jpg)
8
Main idea
• Goal
  • Perplexity reduction (PD) for a large number of languages
• Features
  • Morphology
  • Shape features
• Parameters
  • Frequency threshold θ
  • Number of suffixes used φ
  • Morphological segmentation algorithms
![Page 9: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/9.jpg)
9
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 10: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/10.jpg)
10
Modeling of morphology and shape
• Morphology
• Shape features
• Similarity measure
![Page 11: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/11.jpg)
11
Morphology
• Automatic suffix identification algorithms: Reports, Morfessor and Frequency
• Parameter: only the φ most frequent suffixes are used
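As a rough illustration of the Frequency idea, the sketch below counts word-final letter sequences over a vocabulary and keeps the φ most frequent ones. It assumes that Frequency simply ranks word-final sequences by count; the function names `top_suffixes` and `suffix_of`, and the `max_len` and `min_stem` limits, are hypothetical and not taken from the slides.

```python
from collections import Counter

def top_suffixes(vocabulary, phi, max_len=4, min_stem=2):
    """Return the phi most frequent word-final letter sequences (assumed baseline)."""
    counts = Counter()
    for word in vocabulary:
        word = word.lower()
        # Only consider suffix lengths that leave a stem of at least min_stem letters.
        for k in range(1, min(max_len, len(word) - min_stem) + 1):
            counts[word[-k:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]

def suffix_of(word, suffixes):
    """Longest listed suffix the word ends with, or '' if none matches."""
    matches = [s for s in suffixes if word.lower().endswith(s)]
    return max(matches, key=len) if matches else ""

vocab = ["potentially", "hypothetically", "finally", "walking", "talking", "dangerous"]
suffixes = top_suffixes(vocab, phi=5)
```

With such a list, "potentially" and "hypothetically" end up sharing a suffix feature, which is the kind of link the motivation slide asks for.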
![Page 12: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/12.jpg)
12
Shape features
• Capitalization
• Special characters
• Word length
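A minimal sketch of how the three kinds of shape features could be read off a word. The exact feature set is defined in the prior work cited on the next slide, so the categories, the digit flag and the length cap of 10 used here are assumptions for illustration only, and `shape_features` is a hypothetical name.

```python
def shape_features(word):
    """Map a word to a small tuple of surface ("shape") features."""
    if word.isupper():
        cap = "all_caps"
    elif word[:1].isupper():
        cap = "initial_cap"
    else:
        cap = "lower"
    has_digit = any(ch.isdigit() for ch in word)
    # Anything that is neither a letter nor a digit counts as a special character.
    has_special = any(not ch.isalnum() for ch in word)
    # Cap the length so that very long words share one feature value.
    length_bucket = min(len(word), 10)
    return (cap, has_digit, has_special, length_bucket)
```

Words with the same suffix and the same shape tuple could then be placed in the same class.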
![Page 13: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/13.jpg)
13
Similarity measure
• The similarity measure and the details of the shape features are described in prior work (Müller and Schütze, 2011).
![Page 14: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/14.jpg)
14
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 15: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/15.jpg)
15
Experimental Setup
• Baseline
• Morphological class language model
• Distributional class language model
• Corpus
![Page 16: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/16.jpg)
16
Experimental Setup
• Experiments
  • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters
• Baseline
  • Modified KN model
![Page 17: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/17.jpg)
17
Morphological class language model
Class-based language model:
Word emission probability:
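For orientation, a common way to write a class-based trigram model with a maximum-likelihood emission term is given below. This is the textbook form and is offered as an assumption, not as the exact formulas shown on the slide; g(w) denotes the class of word w and P_C matches the name used on the following slides.

```latex
P_C(w_i \mid w_{i-2} w_{i-1})
  = P\bigl(g(w_i) \mid g(w_{i-2})\, g(w_{i-1})\bigr) \cdot P\bigl(w_i \mid g(w_i)\bigr)

P(w \mid c) = \frac{\mathrm{count}(w)}{\sum_{w' \in c} \mathrm{count}(w')}
  \qquad \text{for } w \in c
```

The first factor predicts the class of the next word from the classes of the history, and the second factor picks the word within that class.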
![Page 18: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/18.jpg)
18
Morphological class language model
Final model P_M interpolates P_C with a modified KN model:
Unknown word estimation:
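The interpolation step can be sketched as a linear mixture of the class model and the KN model, with the weight tuned on the validation set. The single global weight `lam` and the grid search are simplifying assumptions; the slides only state that optimal interpolation parameters are used, and both function names are hypothetical.

```python
import math

def interpolate(p_class, p_kn, lam):
    """Linear interpolation: lam * P_C + (1 - lam) * P_KN."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_class + (1.0 - lam) * p_kn

def best_lambda(pairs, grid=None):
    """Pick the weight that maximises validation log-likelihood.

    pairs holds one (p_class, p_kn) tuple per validation token.
    """
    grid = grid or [i / 20 for i in range(21)]

    def log_likelihood(lam):
        total = 0.0
        for p_class, p_kn in pairs:
            p = interpolate(p_class, p_kn, lam)
            if p <= 0.0:
                return float("-inf")
            total += math.log(p)
        return total

    return max(grid, key=log_likelihood)
```

A weight close to 1 means the class model dominates, while a weight close to 0 falls back to the KN baseline.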
![Page 19: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/19.jpg)
19
Morphological class language model
Modified class model P_C'
![Page 20: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/20.jpg)
20
Distributional class language model
• P_D has the same form as P_M
• The difference is that the classes are morphological for P_M and distributional for P_D
• Whole-context distributional vector space model
![Page 21: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/21.jpg)
21
Corpus
• Training set (80%)
• Validation set (10%)
• Test set (10%)
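A minimal sketch of an 80/10/10 split over a list of sentences. Splitting contiguously in corpus order is an assumption, since the slide does not say how the partitions were drawn, and `split_corpus` is a hypothetical helper.

```python
def split_corpus(sentences, train=0.8, valid=0.1):
    """Split a list of sentences into training, validation and test parts."""
    n = len(sentences)
    n_train = int(n * train)
    n_valid = int(n * valid)
    train_set = sentences[:n_train]
    valid_set = sentences[n_train:n_train + n_valid]
    # Everything that is left over goes to the test set.
    test_set = sentences[n_train + n_valid:]
    return train_set, valid_set, test_set
```

The validation part is where parameters such as the interpolation weights would be tuned, and the test part is used only for the final perplexity figures.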
![Page 22: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/22.jpg)
22
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 23: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/23.jpg)
23
Results and Discussion
• Morphological model vs. Distributional model
• Sensitivity analysis of parameters
![Page 24: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/24.jpg)
24
Morphological model vs. Distributional model
• MM: the more morphologically complex the language, the greater the perplexity reduction and the larger φ.
• MM: considerable perplexity reductions of 3%-11%.
• Frequency performs surprisingly well.
• In only 4 cases is DM better than MM.
• DM: clustering is restricted to less frequent words.
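To make the 3%-11% figures concrete, the sketch below computes perplexity from per-token probabilities and the relative reduction against a baseline. Treating perplexity reduction as relative to the baseline perplexity is an assumption about how the percentages are defined, and both function names are hypothetical.

```python
import math

def perplexity(token_probs):
    """Perplexity of a test sequence given the model probability of each token."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))

def relative_reduction(pp_baseline, pp_model):
    """Fraction by which the model lowers the baseline perplexity."""
    return (pp_baseline - pp_model) / pp_baseline
```

For example, a baseline perplexity of 100 and a model perplexity of 90 correspond to a reduction of 0.1, that is 10%.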
![Page 25: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/25.jpg)
25
Morphological model vs. Distributional model
![Page 26: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/26.jpg)
26
Sensitivity analysis of parameters
• Best and worst values of each parameter, and the difference in perplexity improvement between the two.
• θ
  • Strong influence on PD
  • Positively correlated with morphological complexity
• φ and segmentation algorithms
  • Negligible effect
  • Frequency performs best.
![Page 27: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/27.jpg)
27
Sensitivity analysis of parameters
![Page 28: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/28.jpg)
28
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
![Page 29: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/29.jpg)
29
Conclusion
• Features: morphology and shape features
• Result: perplexity reductions of 3%-11%
• Parameters:
  • θ: considerable influence
  • φ and segmentation algorithms: small effect
![Page 30: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/30.jpg)
30
Future Work
• A model that interpolates KN, the morphological class model and the distributional class model.
![Page 31: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/31.jpg)
31
My thoughts
• Minority language model
![Page 32: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/32.jpg)
32
Q&A?
ICT
![Page 33: Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang](https://reader035.vdocument.in/reader035/viewer/2022081603/56814882550346895db59468/html5/thumbnails/33.jpg)
33
Thank you!
ICT