![Page 1: Rogelio Nazar & Maarten Janssen IULA, Universitat Pompeu Fabra, Barcelona](https://reader036.vdocument.in/reader036/viewer/2022081503/5665b4391a28abb57c902619/html5/thumbnails/1.jpg)
Rogelio Nazar & Maarten Janssen
IULA, Universitat Pompeu Fabra, Barcelona
Dictionaries are a good source of information
Long tradition of taxonomy extraction
Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
Exploiting Machine Readable Dictionaries
Parsing definitional phrases
Pattern extraction, shallow parsing
Full treatment of a single dictionary
There is a lot of information available
Hand-crafted, high-quality resources
Combining yields new data
Taxonomy from multiple dictionaries
Language-independent shallow method
Combining definitions of the same word
Various dictionaries, online versions
DRAE, DGLE, Clave, DEM
Frequency-based
Dictionaries differ
◦ Different lexicon and definitions
◦ Even if only for legal reasons
Hyperonym should be the same
◦ A cat is an animal
◦ Unless there is uncertainty in the hyperonym
Most dictionaries should use the same genus
◦ Statistically relevant
Word counts for "ablandabrevas":
3x: ablandabrevas, persona
2x: com., inútil
1x: substantivo, común, fig.
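The counts above come from pooling the entries for one headword and counting how often each word occurs. A minimal sketch of that idea follows. The tokenizer, the function names and the toy entries are all assumptions made for illustration. The entries are invented and are not taken from DRAE, DGLE, Clave or DEM.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def pooled_counts(entries):
    """Count every token across all entries harvested for one headword."""
    counts = Counter()
    for entry in entries:
        counts.update(tokenize(entry))
    return counts


# Invented toy entries for the headword "cat", one per hypothetical dictionary.
toy_entries = [
    "cat n. small domestic animal with soft fur",
    "cat n. a feline animal kept as a pet",
    "cat n. carnivorous mammal, a domestic animal",
]

counts = pooled_counts(toy_entries)
for word, n in counts.most_common(4):
    print(f"{n}x {word}")
```

The headword itself and abbreviations such as "n." rank as high as the hyperonym in such a raw count, which is why a filtering step is still needed.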
Directly from harvested text
◦ With begin/end tags
No textual analysis
More than definitions
◦ Examples, multiple senses, etc.
Sense matching impossible
◦ Entries unsystematic
◦ Dictionaries do not match in senses
Minimum number of dictionaries
Raw frequency count
◦ Hyperonym tends to be repeated
Candidates have to be words
◦ Of the same word-class
Use of a stop-list
◦ Dictionary generated
◦ Words that occur in more than 10% of entries
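The filtering steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The function names, the `min_dictionaries=3` default, the tokenizer and the toy data are hypothetical. Only the 10% stop-list threshold comes from the slide. The word-class check is left out because it would need part-of-speech information that the slide does not describe.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_stoplist(all_entries, max_share=0.10):
    """Collect words that occur in more than max_share of all entries."""
    doc_freq = Counter()
    for entry in all_entries:
        doc_freq.update(set(tokenize(entry)))  # count each word once per entry
    limit = max_share * len(all_entries)
    return {word for word, df in doc_freq.items() if df > limit}


def rank_candidates(headword, entries, stoplist, min_dictionaries=3, top_n=5):
    """Rank hyperonym candidates for one headword by raw frequency."""
    if len(entries) < min_dictionaries:  # assumed minimum, not from the source
        return []
    counts = Counter(tok for entry in entries for tok in tokenize(entry))
    banned = stoplist | {headword.lower()}
    ranked = [(w, c) for w, c in counts.most_common() if w not in banned]
    return ranked[:top_n]


# Invented toy data: three entries for "cat" plus filler entries that make
# function words and abbreviations frequent enough to enter the stop-list.
cat_entries = [
    "cat n. small domestic animal with soft fur",
    "cat n. a feline animal kept as a pet",
    "cat n. carnivorous mammal, a domestic animal",
]
filler = [f"w{i} n. a thing of the kind k{i}" for i in range(37)]

stoplist = build_stoplist(cat_entries + filler)
print(rank_candidates("cat", cat_entries, stoplist, top_n=2))
```

Deriving the stop-list from the dictionaries themselves keeps the method language-independent, since no hand-made list of function words is required.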
# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;
WordNet (still) best available taxonomy
◦ Not the best resource for evaluation
Automatic verification
◦ 100 random nouns
◦ Best 5 hyperonymy candidates
◦ Match when candidate in chain
Only about 50% accuracy
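The verification step described above, where a word counts as a match when one of its best five candidates appears in its hyperonym chain, might look like the sketch below. Exact string comparison against chain terms is an assumption, and the slides do not say how multiword terms such as "danza popular" were handled. The function names are hypothetical. The two chains are abridged from the EWN output shown earlier, and the toy accuracy they yield says nothing about the reported result.

```python
def chain_level(candidate, chain):
    """Return the level at which the candidate occurs in the chain, else None."""
    for level, term in chain:
        if term == candidate:  # assumed: exact match on the whole term
            return level
    return None


def is_match(candidates, chain, top_n=5):
    """True if any of the best top_n candidates occurs in the chain."""
    return any(chain_level(c, chain) is not None for c in candidates[:top_n])


def accuracy(candidates_by_word, chain_by_word, top_n=5):
    """Share of words with at least one candidate found in their chain."""
    hits = sum(
        is_match(candidates_by_word[word], chain_by_word[word], top_n)
        for word in candidates_by_word
    )
    return hits / len(candidates_by_word)


# Abridged chains, as (level, term) pairs.
chain_by_word = {
    "descubrimiento": [(0, "descubrimiento"), (1, "logro"), (1, "revelación"),
                       (2, "realización"), (3, "acción"), (3, "hecho"), (4, "acto")],
    "cumbia": [(0, "cumbia"), (1, "baile regional"), (1, "danza popular"),
               (2, "baile social"), (3, "baile")],
}
candidates_by_word = {
    "descubrimiento": ["acción", "cosa", "efecto"],
    "cumbia": ["danza"],
}

print(chain_level("acción", chain_by_word["descubrimiento"]))
print(accuracy(candidates_by_word, chain_by_word))
```

Under exact matching, "danza" is not found in the cumbia chain even though "danza popular" is there, which illustrates one way a plausible candidate can be scored as a miss.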
WordNet
◦ Many intermediate/artificial levels
◦ Compulsory hyperonym
◦ Contains proper names
Dictionaries
◦ More word-senses
◦ Alternative definitions (synonymy, paraphrasis, …)
Differences
◦ Different choice of hyperonym
◦ Different lexicon
Questions?