AMIA-2006 1
A Comparative Study of Supervised Learning
as Applied to Acronym Expansion in
Clinical Reports
Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Christopher G. Chute
University of Minnesota, Duluth
Mayo College of Medicine, Rochester
Overview
• Acronyms are ambiguous
  – in general, and in more specialized domains
• Acronyms can be disambiguated by expansion
  – expansions act as senses or definitions
• Acronym expansion can be viewed as word sense disambiguation
  – supervised learning from annotated examples
• Features trump learning algorithms
  – unigrams dominant
AMIA - Top Google Results
• American Medical Informatics Association
• Association of Moving Image Archivists
• Anglican Mission in America
• Asociación Mutual Israelita Argentina
RN in Wikipedia
• Registered Nurse
• Royal Navy
• Radio National
• Radio Nederland
• Richard Nixon
• Registered Identification Number
• Renovación Nacional
Acronym Ambiguity not just a problem for General English…
• 33% of acronyms in UMLS are ambiguous
  – Liu et al., AMIA-2001
• 81% of acronyms in MEDLINE abstracts are ambiguous, with an average of 16 expansions
  – Liu et al., AMIA-2002
We view AE as WSD
• AE
  – sense 1: American Eagle
  – sense 2: Arab Emirates
  – sense 3: acronym expansion
• WSD
  – sense 1: Washington School for the Deaf
  – sense 2: web server director
  – sense 3: word sense disambiguation
Methodology
• Identify 16 ambiguous acronyms
  – 9 from Pakhomov et al., AMIA-2005
  – 7 newly annotated for this study
• Manually annotate in clinical notes
  – 7,738 total instances from Mayo Clinic database of clinical notes
• Use as training data for supervised learning
Acronyms (majority < 50%)
• AC
  – Acromioclavicular
  – Antitussive with Codeine
  – Acid Controller
  – 10 more expansions
• APC
  – Argon Plasma Coagulation
  – Adenomatous Polyposis Coli
  – Atrial Premature Contraction
  – 10 more expansions
• LE
  – Limited Exam
  – Lower Extremity
  – Initials
  – 5 more expansions
• PE
  – Pulmonary Embolism
  – Pressure Equalizing
  – Patient Education
  – 12 more expansions
Acronyms (50% < majority < 80%)
• CP
  – Chest Pain
  – Cerebral Palsy
  – Cerebellopontine
  – 19 more expansions
• HD
  – Huntington's Disease
  – Hemodialysis
  – Hospital Day
  – 9 more expansions
• CF
  – Cystic Fibrosis
  – Cold Formula
  – Complement Fixation
  – 6 more expansions
• MCI
  – Mild Cognitive Impairment
  – Methylchloroisothiazolinone
  – Microwave Communications, Inc.
  – 5 more expansions
• ID
  – Infectious Disease
  – Identification
  – Idaho
  – Identified
  – 4 more expansions
• LA
  – Long Acting
  – Person
  – Left Atrium
  – 5 more expansions
Acronyms (majority > 80%)
• MI
  – Myocardial Infarction
  – Michigan
  – Unknown
  – 2 more expansions
• ACA
  – Adenocarcinoma
  – Anterior Cerebral Artery
  – Anterior Communication Artery
  – 3 more expansions
• GE
  – Gastroesophageal
  – General Exam
  – Generose
  – General Electric
• HA
  – Headache
  – Hearing Aid
  – Hydroxyapatite
  – 2 more expansions
• FEN
  – Fluids, Electrolytes and Nutrition
  – Drug Fen Phen
  – Unknown
• NSR
  – Normal Sinus Rhythm
  – Nasoseptal Reconstruction
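The three acronym groups above are defined by how often the most frequent expansion occurs. A minimal sketch of that grouping, assuming annotations are available as a plain list of expansion labels per acronym; the function names and the toy counts are hypothetical, and the handling of exactly 50% or 80% is an assumption because the slides do not specify it:

```python
from collections import Counter

def majority_baseline(labels):
    """Return (majority expansion, its share in percent) for one acronym."""
    counts = Counter(labels)
    sense, count = counts.most_common(1)[0]
    return sense, 100.0 * count / len(labels)

def majority_group(share):
    """Bucket an acronym into the three ranges used on the slides.

    Assumption: boundary values (exactly 50 or 80) fall into the higher group.
    """
    if share < 50:
        return "majority < 50%"
    if share < 80:
        return "50% < majority < 80%"
    return "majority > 80%"

# Made-up annotations for illustration only, not Mayo Clinic data.
toy_labels = ["Myocardial Infarction"] * 9 + ["Michigan"]
sense, share = majority_baseline(toy_labels)
print(sense, share, majority_group(share))
```

Always predicting the majority expansion gives an accuracy equal to that share, which is the reference point any learned classifier has to beat.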
Experimental Objectives
• Compare performance of ML methods
  – Naïve Bayesian classifier
  – J48/C4.5 decision tree learner
  – Support vector machine (SMO)
• Compare four different feature sets
  – POS tags from Brill-Hepple Tagger
  – Unigrams that occur 5 or more times
    • Flexible window of size 5 around target
  – Bigrams that occur 5 or more times
    • Flexible window of size 5 around target
  – Unigrams + Bigrams + POS tags
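To make the setup above concrete, here is a small sketch of the frequency cutoff (unigrams that occur 5 or more times) feeding a Naïve Bayes classifier. It is a stand-in written in plain Python, not the WEKA implementation used in the study; the function names, the add-one smoothing, and the toy contexts are all assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

def select_features(contexts, min_count=5):
    """Keep unigrams that occur at least min_count times in the training contexts."""
    counts = Counter(word for ctx in contexts for word in ctx)
    return {w for w, c in counts.items() if c >= min_count}

def train_nb(contexts, labels, vocab):
    """Collect per-expansion counts for a multinomial Naive Bayes model."""
    sense_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for ctx, sense in zip(contexts, labels):
        word_counts[sense].update(w for w in ctx if w in vocab)
    return sense_counts, word_counts

def predict_nb(ctx, model, vocab):
    """Return the expansion with the highest add-one-smoothed log posterior."""
    sense_counts, word_counts = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in ctx:
            if w in vocab:  # words below the cutoff are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Made-up training contexts for the acronym CF (illustration only).
train_contexts = [["carrier", "testing"]] * 5 + [["cough", "cold"]] * 5
train_labels = ["Cystic Fibrosis"] * 5 + ["Cold Formula"] * 5
vocab = select_features(train_contexts, min_count=5)
model = train_nb(train_contexts, train_labels, vocab)
print(predict_nb(["carrier", "husband"], model, vocab))
```

Swapping the feature set while holding the classifier fixed, or the reverse, is the comparison the slide describes; only the feature selection step changes between the unigram and bigram runs.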
Feature Extraction
• Horizon: up to 5 content words to left and right of target
• Boundaries: cross sentences, but not clinical notes
• Skip stop words
• Bigrams are pairs of contiguous content words
• Example (CF is target):
  – Unigrams: “if she is found to be a carrier, then they will follow with CF carrier testing in her husband.”
  – Bigrams: “if she is found to be a carrier, then they will follow with CF carrier testing in her husband.”
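The extraction rules above can be sketched as follows. This is a minimal illustration, assuming a tiny hand-picked stop list (the list actually used is not given on the slides), a simple letters-only tokenizer, and bigrams formed separately on each side of the target; all names are hypothetical:

```python
import re

# Tiny illustrative stop list; an assumption, not the list used in the study.
STOP_WORDS = {"if", "she", "is", "to", "be", "a", "then", "they", "will",
              "with", "in", "her", "the", "of", "and"}

def tokenize(text):
    """Split text into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def flexible_window(tokens, target_index, size=5):
    """Collect up to `size` content words on each side, skipping stop words."""
    def collect(indices):
        words = []
        for i in indices:
            word = tokens[i].lower()
            if word not in STOP_WORDS:
                words.append(word)
                if len(words) == size:
                    break
        return words
    left = collect(range(target_index - 1, -1, -1))[::-1]  # restore text order
    right = collect(range(target_index + 1, len(tokens)))
    return left, right

def bigrams(words):
    """Pair each content word with the next content word."""
    return list(zip(words, words[1:]))

sentence = ("if she is found to be a carrier, then they will follow "
            "with CF carrier testing in her husband.")
tokens = tokenize(sentence)
left, right = flexible_window(tokens, tokens.index("CF"))
print("unigrams:", left + right)
print("bigrams:", bigrams(left) + bigrams(right))
```

In the study the window may cross sentence boundaries but not note boundaries, so `tokens` here would hold one whole clinical note rather than a single sentence.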
Results (majority < 50%)
[Figure: Feature Comparison (AC, APC, LE, PE). Accuracy (%) by classifier (Decision Trees, Naïve Bayes, SVM) for the POS, bigrams, unigrams, ALL, and Majority series.]
Results (50% < majority < 80%)
[Figure: Feature Comparison (CP, HD, CF, MCI, ID, LA). Accuracy (%) by classifier (Decision Trees, Naïve Bayes, SVM) for the POS, bigrams, unigrams, ALL, and Majority series.]
Results (majority > 80%)
[Figure: Feature Comparison (MI, ACA, GE, HA, FEN, NSR). Accuracy (%) by classifier (Decision Trees, Naïve Bayes, SVM) for the POS, bigrams, unigrams, ALL, and Majority series.]
Results (flexible window)
[Figure: Fixed vs. Flexible Window Performance. Accuracy (%) by window size (1 to 10) for fixed and flexible windows over bigrams, unigrams, and unigrams+bigrams.]
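The difference between the two window types can be illustrated with a short sketch. It assumes that a fixed window looks at a set number of token positions on each side and keeps whatever content words fall inside, while a flexible window keeps reaching outward until it has that many content words; this reading, the stop list, and the function names are assumptions, not definitions taken from the slides:

```python
# Illustrative stop list (an assumption, not the one used in the study).
STOP_WORDS = {"if", "she", "is", "to", "be", "a", "then", "they", "will",
              "with", "in", "her"}

def fixed_window(tokens, target_index, size):
    """Look at `size` token positions per side; keep only the content words."""
    lo = max(0, target_index - size)
    left = tokens[lo:target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return [t for t in left + right if t not in STOP_WORDS]

def flexible_window(tokens, target_index, size):
    """Skip stop words and keep reaching until `size` content words per side."""
    def collect(indices):
        found = []
        for i in indices:
            if tokens[i] not in STOP_WORDS:
                found.append(tokens[i])
                if len(found) == size:
                    break
        return found
    left = collect(range(target_index - 1, -1, -1))[::-1]
    right = collect(range(target_index + 1, len(tokens)))
    return left + right

tokens = ("if she is found to be a carrier then they will follow "
          "with cf carrier testing in her husband").split()
target = tokens.index("cf")
fixed = fixed_window(tokens, target, 3)
flexible = flexible_window(tokens, target, 3)
print("fixed:   ", fixed)
print("flexible:", flexible)
```

Under these assumptions the flexible window of the same nominal size captures more content words, because stop words do not use up window positions.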
Conclusions
• Overall expansion accuracy at or above 90% regardless of distribution
• Differences in accuracy are largely due to features, not ML algorithms
• Addition of bigrams and POS tags helps performance, but unigrams dominant
• Flexible window improves upon fixed window feature selection
Future Work
• Expand all acronyms in a text, not just a select few
  – expand based on prior expansions
  – utilize one sense per discourse constraint
• Integrate supervised methods with knowledge based approaches and clustering methods to reduce need for annotated examples
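One way the one sense per discourse constraint could be applied is as a post-processing step over classifier output. The sketch below is speculative, since the slides list this only as future work: it assumes predictions arrive as (note id, acronym, expansion) tuples and overwrites each prediction with the most frequent expansion of that acronym within the same note. The function name and the toy predictions are hypothetical:

```python
from collections import Counter

def one_sense_per_discourse(predictions):
    """Replace each prediction with the majority expansion in its note.

    predictions: list of (note_id, acronym, expansion) tuples.
    Ties go to the expansion seen first in the note.
    """
    votes = {}
    for note_id, acronym, expansion in predictions:
        votes.setdefault((note_id, acronym), Counter())[expansion] += 1
    return [(note_id, acronym, votes[(note_id, acronym)].most_common(1)[0][0])
            for note_id, acronym, _ in predictions]

# Made-up classifier output for illustration only.
toy = [
    ("note1", "CP", "Chest Pain"),
    ("note1", "CP", "Chest Pain"),
    ("note1", "CP", "Cerebral Palsy"),
    ("note2", "CP", "Cerebral Palsy"),
]
smoothed = one_sense_per_discourse(toy)
print(smoothed)
```

Whether the constraint actually holds for acronyms in clinical notes is an empirical question that the slides leave open.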
Acknowledgments
• We would like to thank our annotators Barbara Abbott, Debra Albrecht and Pauline Funk.
• This work was supported in part by the NLM Training Grant (T15 LM07041-19) and the NIH Roadmap Multidisciplinary Clinical Research Career Development Award (K12/NICHD)-HD49078.
• Dr. Pedersen has been partially supported by a National Science Foundation Faculty Early CAREER Development Award (#0092784).
Software Resources
• NSPGate (from Duluth/Mayo)
  – http://nspgate.sourceforge.net/
• Ngram Statistics Package (from Duluth)
  – http://ngram.sourceforge.net/
• WSDGate (from Duluth/Mayo)
  – http://wsdgate.sourceforge.net/
• WEKA (from Waikato)
  – http://www.cs.waikato.ac.nz/ml/weka/
• GATE (from Sheffield)
  – http://gate.ac.uk/