Semi-Automated Extension of a Specialized Medical Lexicon for French
Bruno Cartoni & Pierre Zweigenbaum
LIMSI-CNRS, France
Outline
Context: UMLF for French
  The desired coverage
  The target lexical information
  The organisation of a specialised lexicon
Acquiring lexical information
  Initial coverage
  Obtaining lexical entries from general lexicon
  Guessing technique
Results
  Consensus guessing
  Acquisition of the full paradigm
  General improvement
Conclusion and further work
Context: the InterSTIS project
InterSTIS: development of a terminology server for French medical terminologies
Sub-project: improving the lexical coverage of a French medical lexicon (UMLF: Unified Medical Lexicon for French)
Use: support the indexing of medical texts
Issues:
  What is the desired lexical knowledge?
  How to acquire it?
The desired coverage
Reference: “Term-Union”
  Union of 10 terminologies (CIM-10, SNOMED, MeSH, CISMeF, …) of French medical domains, organised around concept identifiers (CUI) of the UMLS
  311,518 terms
  203,300 unique concepts (CUI)
  94,964 word-forms
Term-Union: example
C0000936 MSHFRE … Accommodation de l'oeil
C0000936 MSHFRE … Accommodation des yeux
C0000936 MSHFRE … Accommodation oculaire
C0000936 SNMIGIPFRE … accommodation visuelle
...
C00001558 MSHF … Voie cutanée
C00001558 MSHF … Voie intradermique
C00001558 MSHF … Voie percutanée
C00001558 MSHF … Voie transcutanée
Observation of term variation
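Because Term-Union is organised around concept identifiers, term variation can be observed by grouping terms that share a CUI. The sketch below illustrates this on a few rows taken from the example. The tuple layout and the function name `group_by_cui` are assumptions made for illustration, and the elided middle column ("…") is simply left out.

```python
from collections import defaultdict

# A few rows from the Term-Union example: (CUI, source terminology, term).
# The elided middle column of the original listing is not represented.
ROWS = [
    ("C0000936", "MSHFRE", "Accommodation des yeux"),
    ("C0000936", "MSHFRE", "Accommodation oculaire"),
    ("C0000936", "SNMIGIPFRE", "accommodation visuelle"),
    ("C00001558", "MSHF", "Voie cutanée"),
    ("C00001558", "MSHF", "Voie intradermique"),
    ("C00001558", "MSHF", "Voie percutanée"),
    ("C00001558", "MSHF", "Voie transcutanée"),
]


def group_by_cui(rows):
    """Map each concept identifier to the list of terms attached to it."""
    variants = defaultdict(list)
    for cui, _source, term in rows:
        variants[cui].append(term)
    return dict(variants)


variants_by_cui = group_by_cui(ROWS)
for cui, terms in variants_by_cui.items():
    print(cui, len(terms), terms)
```

Each group is a set of candidate variants of one concept, which is the raw material for the variation types listed on the next slide.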
Target lexical information
Term variation within Term-Union
  Graphemic: équilibre acido-basique – équilibre acidobasique [EN: acid-base balance]
  Morphosyntactic: adaptation de l'oeil – adaptation des yeux [EN: eye adaptation]
  Morphosemantic: intoxication à l’alcool – intoxication alcoolique [EN: alcohol intoxication]
  Others ...
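The graphemic case is the easiest to illustrate. The sketch below pairs terms that become identical once case and hyphens are ignored. This normalisation is an assumption chosen for the example; the slides do not say how graphemic variants were actually detected, and other graphemic differences are not covered.

```python
from collections import defaultdict


def graphemic_key(term):
    """Lower-case the term and drop hyphens so that spelling variants collide."""
    return term.lower().replace("-", "")


def graphemic_variants(terms):
    """Return the groups of distinct terms that share a normalised key."""
    groups = defaultdict(set)
    for term in terms:
        groups[graphemic_key(term)].add(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]


TERMS = [
    "équilibre acido-basique",
    "équilibre acidobasique",
    "adaptation de l'oeil",
    "adaptation des yeux",
]
pairs = graphemic_variants(TERMS)
print(pairs)
```

The morphosyntactic pair (de l'oeil / des yeux) is not caught by this key, since it needs inflectional knowledge rather than a spelling normalisation.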
Organisation of the specialised lexicon
3 types of relational tables for the 3 levels of representation (graphemic, inflection, derivation)
A full-entry lexicon (LMF compliant) that gathers all lexical information
Graphemic:
…
inter-maxillaire | intermaxillaire
insulino-sécrétantes | insulinosécrétantes
scléro-cornéenne | sclérocornéenne
…
Derivation:
...
abdominal | abdomen
aplasique | aplasie
arachnoïdien | arachnoïde
argentique | argent
…
Inflection:
…
sérofibrineux | sérofibrineux | Afpms
sérofibrineuse | sérofibrineux | Afpfs
sérofibrineux | sérofibrineux | Afpmp
sérofibrineuses | sérofibrineux | Afpfp
…
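One possible in-memory view of the three tables, and of a full entry assembled from them, is sketched below. The dictionary layout, the field names and the function `full_entry` are hypothetical; the reading of each table's columns (variant and alternative spelling, derived adjective and base noun, form, lemma and tag) is inferred from the excerpts, and the result is not an LMF serialisation.

```python
# Graphemic table: one spelling -> its alternative spelling
GRAPHEMIC = {
    "inter-maxillaire": "intermaxillaire",
    "insulino-sécrétantes": "insulinosécrétantes",
    "scléro-cornéenne": "sclérocornéenne",
}

# Derivation table: derived adjective -> base noun
DERIVATION = {
    "abdominal": "abdomen",
    "aplasique": "aplasie",
    "arachnoïdien": "arachnoïde",
    "argentique": "argent",
}

# Inflection table: (word form, lemma, morphosyntactic tag)
INFLECTION = [
    ("sérofibrineux", "sérofibrineux", "Afpms"),
    ("sérofibrineuse", "sérofibrineux", "Afpfs"),
    ("sérofibrineux", "sérofibrineux", "Afpmp"),
    ("sérofibrineuses", "sérofibrineux", "Afpfp"),
]


def full_entry(lemma):
    """Gather everything the three tables say about one lemma."""
    forms = [(form, tag) for form, lem, tag in INFLECTION if lem == lemma]
    known_spellings = {lemma} | {form for form, _tag in forms}
    variants = sorted(v for v, alt in GRAPHEMIC.items() if alt in known_spellings)
    return {
        "lemma": lemma,
        "forms": forms,
        "derived_from": DERIVATION.get(lemma),
        "spelling_variants": variants,
    }


entry = full_entry("sérofibrineux")
print(entry)
print(full_entry("abdominal"))
```

Keeping the three levels in separate tables and joining them on demand mirrors the slide's split between relational tables and a full-entry lexicon.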
Acquiring the lexical information
Initial coverage of UMLF (previous project, UMLF, based on Baud et al. 1998)
  17,192 lexical units
    5,353 adjectives
    11,799 nouns
  36,211 word forms
Acquiring the lexical information
From a general lexicon: existing French general lexicon (Morphalou)
With a guessing technique
Acquiring the lexical information
With a guessing technique (Tanguy & Hathout 2007)
3 steps:
  Learning phase: computing the most frequent tag for each ending string in 2 existing lexicons
  Guessing phase: assigning the possible tag(s) to each unknown word
  Cross-validation: comparing the 2 guesses, one based on each lexicon
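The three steps can be sketched as follows. This is a simplified illustration, not the Tanguy & Hathout implementation: the maximum ending length, the longest-ending-first lookup, the coarse tags "A" and "N", and the two toy lexicons are all assumptions.

```python
from collections import Counter, defaultdict


def learn_endings(lexicon, max_len=5):
    """Learning phase: keep the most frequent tag for each ending string."""
    counts = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(max_len, len(form)) + 1):
            counts[form[-n:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}


def guess(form, endings, max_len=5):
    """Guessing phase: return the tag of the longest ending seen in learning."""
    for n in range(min(max_len, len(form)), 0, -1):
        tag = endings.get(form[-n:])
        if tag is not None:
            return tag
    return None


def consensus(form, endings_a, endings_b):
    """Cross-validation: keep a guess only when both guessers agree."""
    tag_a, tag_b = guess(form, endings_a), guess(form, endings_b)
    return tag_a if tag_a is not None and tag_a == tag_b else None


# Toy stand-ins for the two lexicons (hypothetical content and tags).
GENERAL = [("national", "A"), ("final", "A"), ("maison", "N"),
           ("raison", "N"), ("table", "N")]
MEDICAL = [("abdominal", "A"), ("rénal", "A"),
           ("aplasie", "N"), ("ectasie", "N")]

general_endings = learn_endings(GENERAL)
medical_endings = learn_endings(MEDICAL)

for word in ["duodénal", "dysplasie", "xyz"]:
    print(word, consensus(word, general_endings, medical_endings))
```

Requiring agreement between the two guessers trades recall for precision: a word is left unlabelled whenever either lexicon has no matching ending or the two disagree.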
Acquiring the lexical information
Acquiring the full paradigm
  All the inflectional forms
  Lemma
Based on “productive” inflectional paradigms
  9 for adjectives
  3 for nouns
Algorithm based on lexical tries to cluster forms of the same paradigm
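A minimal sketch of trie-based clustering is given below. The slides do not list the 9 adjective and 3 noun paradigms, so the two paradigms in `PARADIGMS` are invented for illustration, as are the "at least two endings" threshold and the greedy preference for larger clusters.

```python
END = "$"  # marks the end of a complete word form in the trie

# Hypothetical paradigms: a tuple of endings, the first one giving the lemma.
PARADIGMS = {
    "adj_e_s": ("", "e", "s", "es"),
    "adj_eux": ("eux", "euse", "euses"),
}


def build_trie(forms):
    """Store every word form in a nested-dict trie."""
    root = {}
    for form in forms:
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def has_suffix(node, suffix):
    """True if following `suffix` from `node` reaches a complete word form."""
    for ch in suffix:
        node = node.get(ch)
        if node is None:
            return False
    return END in node


def candidates(node, stem=""):
    """Yield (stem, paradigm name, endings found) for every trie node."""
    for name, endings in PARADIGMS.items():
        found = [e for e in endings if has_suffix(node, e)]
        if len(found) >= 2:
            yield stem, name, found
    for ch, child in node.items():
        if ch != END:
            yield from candidates(child, stem + ch)


def cluster(forms):
    """Group forms into paradigms, preferring the largest clusters."""
    used, clusters = set(), []
    for stem, name, found in sorted(candidates(build_trie(forms)),
                                    key=lambda c: -len(c[2])):
        members = [stem + e for e in found]
        if used.isdisjoint(members):
            used.update(members)
            clusters.append({
                "lemma": stem + PARADIGMS[name][0],
                "paradigm": name,
                "forms": members,
                "complete": len(found) == len(PARADIGMS[name]),
            })
    return clusters


FORMS = ["cutané", "cutanée", "cutanés", "cutanées",
         "sérofibrineux", "sérofibrineuse", "voie"]
clusters = cluster(FORMS)
for c in clusters:
    print(c)
```

Clusters flagged as complete could be extended automatically, while incomplete ones would go to manual checking, in line with the split described in the results.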
Acquisition from general lexicon: results
Source | Known word entries | Remaining words to describe
Term-Union | | 94,964
Initial UMLF | 19,599 | 81,595
Morphalou | 6,617 | 74,978
Acquisition with guessing techniques: results
74,978 unknown forms
  44,515 analyses from the Morphalou-based program
  35,438 analyses from the UMLF-based program
  Cross-validation: 30,137 in common
Acquisition with guessing techniques: evaluation
Errors: 82 out of 1,000 (8.2%)
Wrong label | 12
Proper names | 49
Latin words | 5
English words | 1
Spelling/segmentation | 10
Other | 5
Total | 82
Acquisition of the full paradigm: Results
4,453 paradigms captured (incomplete or not, grouping 9,352 word forms)
  3,308 adjectives
  514 nouns
Automatic extension for the full paradigms (with canonical forms only)
Manually checked for the others
General improvement
Source | Forms added | Still unknown in Term-Union | Coverage
UMLF-v1 | 36,211 | 81,595 | 14.1%
Morphalou | 17,828 | 74,978 | 21.0%
Acquisition | 8,088 | 70,602 | 25.7%
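The coverage figures appear to be the share of the 94,964 Term-Union word forms that are no longer unknown after each step. The short check below reproduces them under that reading; the formula is an inference from the numbers, not something stated on the slide.

```python
TERM_UNION_FORMS = 94_964  # word forms in Term-Union

# Word forms still unknown after each acquisition step
STILL_UNKNOWN = {
    "UMLF-v1": 81_595,
    "Morphalou": 74_978,
    "Acquisition": 70_602,
}


def coverage(still_unknown, total=TERM_UNION_FORMS):
    """Percentage of Term-Union word forms that are described."""
    return 100 * (total - still_unknown) / total


for source, unknown in STILL_UNKNOWN.items():
    print(f"{source}: {coverage(unknown):.1f}%")
```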
Discussion and conclusion
The acquisition and evaluation of specialised lexical resources require a specific reference: Term-Union
  Extract (full) lexical information
  Assess lexical needs and target
Other acquisition techniques (CRF for inflectional information, rule-based techniques for derivational information)
Acknowledgment
This work was partially funded by project InterSTIS (ANR-07-TECSAN-010)
InterSTIS project: www.interstis.org