Biomedical Informatics and Clinical NLP in Translational Science Research
Piet C. de Groen, M.D.
TRANSCRIPT
Overview - Examples
• Patient-specific research – N=1 study
• Understanding a disease
• Finding the right MD, diagnosis and treatment
Renal Transplant Patient, May 2005
• Hepatobiliary Clinic Consultation
– Abnormal liver tests – using Lipitor™
– Diarrhea and weight loss
• Challenge
– Very complex medical history
– Nobody understands the case
– HUGE history with hundreds of notes
Questions
• What exactly is the patient’s problem?
– Are liver tests and weight loss due to Lipitor?
– When did she use Lipitor?
– What was the weight on what date?
• Impossible to review all notes!
– Which notes are relevant to current symptoms?
– Which notes have weights and drug information?
What I need
• I need to see trends over time
– Weight
– Lipitor use
– Effects of Lipitor on lipids and liver tests
• But I cannot see trends over time
– EMR does not have structured data for weight or Lipitor use
– EMR only allows for display of laboratory test results in very large tables or simple graphs
Data Warehouse to the Rescue!
• Demographics
– MC # = xx-xxx-xxx
• Clinical Notes
– Patient Vitals: Weight exists
• Result
– 243 notes; 43 had weight
[Figure: Weight in kg (y-axis, 40 to 70) by year (x-axis, 1996 to 2005), annotated with Start Dialysis, Transplant, and New Problem]
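The weight query can be sketched in a few lines. This is a minimal illustration only, assuming each clinical note is a record with a date and a free-text vitals section; the field names (`date`, `vitals`), the sample notes, and the "Weight: NN kg" pattern are hypothetical and do not reflect the warehouse's actual schema.

```python
import re
from datetime import date

# Hypothetical pattern for a vitals entry such as "Weight: 52.3 kg".
WEIGHT_RE = re.compile(r"weight\s*[:=]?\s*(\d+(?:\.\d+)?)\s*kg", re.IGNORECASE)

def weight_trend(notes):
    """Return (date, weight_kg) pairs, oldest first, for notes that record a weight."""
    trend = []
    for note in notes:
        match = WEIGHT_RE.search(note["vitals"])
        if match:
            trend.append((note["date"], float(match.group(1))))
    return sorted(trend)

# Invented sample notes, for illustration only.
notes = [
    {"date": date(2005, 5, 2), "vitals": "BP 120/80. Weight: 48.5 kg."},
    {"date": date(1998, 3, 9), "vitals": "Weight 61 kg, pulse 72"},
    {"date": date(2001, 7, 1), "vitals": "BP 130/85"},  # no weight recorded
]
trend = weight_trend(notes)
```

Sorting the matches by date is what turns scattered note entries into a series that can be plotted as a trend.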
What happened to Cholesterol?
• She was on Lipitor, but:
– When was it discontinued?
– Did it do anything to her lipid levels?
NLP to the rescue!
• Sort 33 identified Clinical Notes by date
• First note is from 1997
– Lipitor is highlighted in the note
– … Dr. X recommended discontinuation of Pravachol and initiation of Lipitor … have written a prescription for Lipitor …
• Last note is from 2005
– … Lipitor was discontinued in 2004 …
– March 2004 note confirms discontinuation
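The search-sort-highlight step described here can be approximated with a plain keyword match. The sketch below is an assumption-laden simplification: the note structure (`date`, `text`), the sample sentences, and the `[[...]]` highlight markers are all hypothetical, and a real clinical NLP system would do far more than case-insensitive matching.

```python
import re
from datetime import date

def find_mentions(notes, term):
    """Return notes mentioning `term`, oldest first, with each match wrapped in [[...]]."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for note in sorted(notes, key=lambda n: n["date"]):
        if pattern.search(note["text"]):
            highlighted = pattern.sub(lambda m: f"[[{m.group(0)}]]", note["text"])
            hits.append((note["date"], highlighted))
    return hits

# Invented sample notes; only the years 1997 and 2005 come from the slides.
notes = [
    {"date": date(2005, 5, 2), "text": "Lipitor was discontinued in 2004."},
    {"date": date(1997, 6, 3), "text": "Have written a prescription for Lipitor."},
    {"date": date(2000, 1, 15), "text": "Creatinine stable."},
]
hits = find_mentions(notes, "Lipitor")
```

Reading the first and last hits in date order is enough to bracket when the drug was started and stopped.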
Warehouse to the Rescue!
• Demographics
– MC # = xx-xxx-xxx
• Tests
– Cholesterol exists
• Clinical Notes
– “Lipitor”
• Result
– 22 cholesterol levels
– 243 notes: 33 mentioned “Lipitor”
[Figure: Cholesterol in mg/dL (y-axis, 0 to 350) by year (x-axis, 1993 to 2006), with Lipitor annotated]
Recommendations
• 72 hour stool fat on 100 gram fat diet
– 689 gram, 23 gram fat/day (2-7 Normal)
• EGD/EUS with biopsies and aspirate
– Esophagitis - ? Candida – biopsy negative
– Duodenal diverticula, normal pancreas
– Duodenal biopsy normal
– Aerobes > 100,000 Gram negative bacillus cfu/mL
– Anaerobes > 10,000 Bacteroides fragilis cfu/mL
– Yeast 1,000-10,000 cfu/mL
• Small Bowel X-ray
– Numerous diverticula
Spring 2006
Based on simple queries of MCLSS
• For NASH the ICD-9 code 571.8 was used; this code may include other diagnoses, but the vast majority are NASH
• For Primary Liver Cancer the ICD-9 codes 155.0 and 155.1 were used
• For Obesity ICD-9 code 278.0 was used, or Diagnosis section Clinical Notes
• BMI was retrieved from Clinical Notes; maximum value during life time was used
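A query of this kind, selecting patients by diagnosis code and lifetime maximum BMI, might look like the sketch below. It is illustrative only: the tuple layouts, patient ids, and BMI values are invented, the code "250.0" is an arbitrary non-matching example, and the BMI cut-off of 30 is taken from the accompanying chart title rather than from any stated query.

```python
from collections import defaultdict

LIVER_CANCER_CODES = {"155.0", "155.1"}  # ICD-9 codes named on the slide

def obese_cases(diagnoses, bmi_readings, codes, bmi_cutoff=30.0):
    """Return patient ids with a matching code and a lifetime maximum BMI above the cutoff."""
    max_bmi = defaultdict(float)
    for patient_id, bmi in bmi_readings:
        max_bmi[patient_id] = max(max_bmi[patient_id], bmi)
    coded = {patient_id for patient_id, code in diagnoses if code in codes}
    return sorted(p for p in coded if max_bmi.get(p, 0.0) > bmi_cutoff)

# Invented sample rows: (patient id, ICD-9 code) and (patient id, BMI from a note).
diagnoses = [("A", "155.0"), ("B", "155.1"), ("C", "250.0")]
bmi_readings = [("A", 27.5), ("A", 33.1), ("B", 24.0), ("C", 35.0)]
cases = obese_cases(diagnoses, bmi_readings, LIVER_CANCER_CODES)
```

Taking the maximum over all readings mirrors the rule that the highest BMI during the patient's lifetime is the one used.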
[Figure: Primary Liver Cancer / NASH: Cases with BMI>30; number of cases (y-axis, 0 to 40) by year (x-axis, 1992 to 2006), shown for Males and Females]
Cancers with Increasing Incidence
2012 report US: 1999 through 2008
CA: A Cancer Journal for Clinicians, Volume 62, Issue 2, pages 118-128, 4 JAN 2012. DOI: 10.3322/caac.20141
http://onlinelibrary.wiley.com/doi/10.3322/caac.20141/full#fig2
Time Line
Example of Interval Colorectal Cancer
[Figure: time line over years 1 to 5 with rows for Pathology, Endoscopy, and Diagnoses; legend: Benign Colon, Colon Cancer, Non-Colon Disease; interval marked < 3 years]
Colon Cancer 1993-2006 (Pathology data)
– 4,203,857 specimens
– 238,177 specimens
– 19,259 specimens
– 13,477 specimens (10,136 patients)
Colonoscopy 1992-2004 (Endoscopy data)
– 325,370 procedures
– 2,692 patients
– 4,743 procedures (date, other features)
– Missed Lesions (anatomic location, tumor size, other characteristics)
Selection steps
– Part description = “COL/RECT” AND Valid MCN
– Diagnosis_code = One of 50 identified cancer diagnosis codes
– Unique? (One specimen may have multiple diagnosis codes)
– Patients with CC diagnosis and C procedure
– Extract all C procedures, the date and other features
– Compare the CC diagnosis and C dates
– Remove Patients with Research Authorization = ‘No’
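The pathology selection steps can be expressed as a chain of filters. The following is a rough sketch under stated assumptions: the record fields (`specimen_id`, `mcn`, `part_description`, `diagnosis_code`, `research_authorization`), the placeholder codes "CA1" and "CA2", and the order in which the filters are applied are all hypothetical; the slides list the steps but not a schema or the 50 actual codes.

```python
def select_cancer_specimens(specimens, cancer_codes):
    """Apply the listed pathology filters; return unique specimen ids and patient ids."""
    specimen_ids, patient_ids = set(), set()
    for s in specimens:
        if s["part_description"] != "COL/RECT" or not s["mcn"]:
            continue  # keep colorectal parts with a valid MCN
        if s["diagnosis_code"] not in cancer_codes:
            continue  # keep only the identified cancer diagnosis codes
        if s["research_authorization"] == "No":
            continue  # drop patients without research authorization
        specimen_ids.add(s["specimen_id"])  # sets absorb repeated codes per specimen
        patient_ids.add(s["mcn"])
    return specimen_ids, patient_ids

def row(specimen_id, mcn, part, code, auth):
    """Build one invented specimen record for the example."""
    return {"specimen_id": specimen_id, "mcn": mcn, "part_description": part,
            "diagnosis_code": code, "research_authorization": auth}

specimens = [
    row("S1", "1", "COL/RECT", "CA1", "Yes"),
    row("S1", "1", "COL/RECT", "CA2", "Yes"),     # same specimen, second code
    row("S2", "2", "COL/RECT", "BENIGN", "Yes"),  # not a cancer code
    row("S3", "3", "COL/RECT", "CA1", "No"),      # no research authorization
    row("S4", "", "COL/RECT", "CA1", "Yes"),      # invalid MCN
    row("S5", "1", "SKIN", "CA1", "Yes"),         # not colorectal
]
specimen_ids, patient_ids = select_cancer_specimens(specimens, {"CA1", "CA2"})
```

Collecting ids into sets handles the "Unique?" step, since one specimen with several diagnosis codes is counted once.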
Methods
Pathology = Colorectal Cancer
• Negative History
– No lesions at colonoscopy: Truly Missed; Probably Missed
– Lesions at colonoscopy: Seen, removed; Seen, not removed
• Colorectal Cancer History
– Recurrent, 2nd, 3rd cancer not prevented
[Figure: each category is drawn on a time line of years 1 to 5]
Results Summary
• Truly missed case
– 90 days to 3 years
• Probably missed case
– 3 to 5 years
• A lesion was seen
– removed <5 years
– not removed <5 years
• Local recurrence or 2nd, 3rd cancer
[Counts shown on the slide image, in extraction order: 82, 95, 8, >44, 54, >283. Image ©Ralph A. Clevenger]
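The interval cut-offs for cancers found after a colonoscopy with no lesions can be written as a small classifier. This is a sketch, not the study's actual code: the function name, the use of 365.25 days per year, the treatment of the exact 3-year boundary as "truly missed", and the "outside window" label are all assumptions.

```python
from datetime import date

def classify_interval(colonoscopy_date, cancer_date):
    """Label a cancer found after a lesion-free colonoscopy by the elapsed interval."""
    days = (cancer_date - colonoscopy_date).days
    years = days / 365.25
    if days < 90 or years > 5:
        return "outside window"  # assumed label for intervals the summary does not cover
    return "truly missed" if years <= 3 else "probably missed"

# Invented dates, for illustration only.
one_year = classify_interval(date(2000, 1, 1), date(2001, 1, 1))
four_years = classify_interval(date(2000, 1, 1), date(2004, 1, 1))
one_month = classify_interval(date(2000, 1, 1), date(2000, 2, 1))
```

Cases where a lesion was seen, or where the patient had a prior colorectal cancer, fall into the other categories and would need the colonoscopy findings and history as additional inputs.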
Tumor Growth Curves
[Figure: Tumor Size (mm) (y-axis, 0 to 100) against Time Interval (days) (x-axis, 0 to 1800) for Truly Missed, Probably Missed, Seen & Removed, and Recurrent, 2nd, 3rd cases; a 3 Months Doubling Time curve and t = 3 yrs are marked]
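A doubling-time curve implies a simple back-extrapolation from the size at diagnosis to the size at the earlier colonoscopy. The sketch below assumes exponential growth, assumes the doubling applies to the plotted size in mm, and approximates 3 months as 90 days; none of these choices is stated on the slide, and the 40 mm example value is invented.

```python
def size_at_earlier_time(size_mm, days_earlier, doubling_days=90.0):
    """Back-extrapolate tumor size assuming exponential growth with a fixed doubling time."""
    return size_mm / 2 ** (days_earlier / doubling_days)

# A 40 mm tumor, traced back 360 days (four assumed doublings).
earlier = size_at_earlier_time(40.0, 360.0)
```

Under these assumptions, four doublings separate the two time points, so the tumor would have been one sixteenth of its diagnosed size at the earlier examination.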
Numbers for each Endoscopist
[Figure: Number Not Detected (0 to 20) and Number Seen (0 to 100) per Endoscopist (2 to 46); legend: Truly Missed, Probably Missed, Seen & Removed, Recurrent, 2nd, 3rd]
Miss Rate for each Endoscopist
[Figure: % Not Detected (0 to 25) per Endoscopist (2 to 46); legend: Truly Missed, Probably Missed, Seen & Removed]
[Figure: Detection of cancers in previously seen patients (self) and Detection of cancers in patients seen by colleagues (others), each on a 0 to 8 scale per Endoscopist (2 to 46)]