Data Mining in Radiology Reports
Saeed Mehrabi
Spring 2010, INFO-I535
Dr. Patrick W. Jamieson
Dr. Josette Jones
Outline
• Introduction to data and text mining
• Our data set
• Structuring free text
• Results
• Similar works
• Discussion
What is Data Mining?
• Data mining is the extraction of useful patterns from data sources such as databases, text, and the web.
• There is a big gap between stored data and knowledge, and the transition won’t occur automatically.
• Many interesting questions cannot be answered with database queries alone:
“Find me people likely to buy my products.”
“Who is likely to respond to my promotion?”
Why data mining now?
• The data is abundant.
• The data is being warehoused.
• The computing power is affordable.
• The competitive pressure is strong.
• Data mining tools have become available.
Text Mining
Text mining applies and adapts data mining techniques to the text domain.
Structured vs. Free Text
• Structured text can be stored in a relational database.
• Representing the information contained in free text in a structured format makes information exchange, data mining, and information retrieval more feasible.
Data Set
• Our corpus consists of:
594,000 de-identified radiology reports
36 million words
4.3 million sentences
• The reports were dictated by the Indiana University Radiology faculty, a group of 40 radiologists, from 1993 to 1998.
Structuring Free text
• Regular expressions were used to detect sentences in the reports.
• A regular expression is a concise and flexible way of matching strings of text, such as particular characters or words.
• Sentences were annotated to propositions, which are simply sentences expressing the same concept for similar findings within reports.
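The slides do not show the regular expression that was actually used. As a minimal sketch, assuming that a sentence ends at a period, question mark, or exclamation mark followed by whitespace and a capital letter, sentence detection could look like this in Python. The pattern, the function name `split_sentences`, and the sample report are illustrative assumptions.

```python
import re

# Assumed boundary rule: sentence-ending punctuation, then whitespace,
# then an uppercase letter. This is an illustration, not the pattern
# used in the original work.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(report: str) -> list:
    """Split a report into sentences at the assumed boundaries."""
    parts = SENTENCE_BOUNDARY.split(report.strip())
    return [p.strip() for p in parts if p.strip()]

# Hypothetical report text, invented for this example.
report = "The heart size is normal. The lungs are clear. No pleural effusion is seen."
sentences = split_sentences(report)
for sentence in sentences:
    print(sentence)
```

A rule this simple leaves decimals such as "2.5 cm" intact but would split after abbreviations such as "Dr." when a capitalized word follows, so a production pattern would likely need extra cases.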
Structuring Free text (Cont.)
• A proposition is a declarative sentence that is either true or false, but not both.
Today is a beautiful sunny day. (A proposition)
x + 2 = 4 (Not a proposition, because its truth depends on the value of x)
• Users can select propositions and map sentences to them.
Corpus Annotation
• To annotate each new sentence from the radiology reports, the software first proposes candidate propositions.
• Experts review the propositions suggested by the software and correct them as needed before validation.
• If the ontology contains no suitable proposition, the expert can create a new one.
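The slides do not say how the software ranks candidate propositions. Purely as an illustration of the suggest-then-review workflow, the sketch below assumes a token-overlap (Jaccard) similarity between a new sentence and each proposition. The miniature ontology, the threshold `min_score`, and the function names are all hypothetical.

```python
import re

# Hypothetical miniature ontology: proposition text -> normal/abnormal flag.
ONTOLOGY = {
    "The lungs are clear.": "normal",
    "The heart size is normal.": "normal",
    "There is a pleural effusion.": "abnormal",
}

def tokens(text: str) -> set:
    """Lowercase word tokens of a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest(sentence: str, min_score: float = 0.3, top_k: int = 3) -> list:
    """Return up to top_k (proposition, score) pairs, best first."""
    sent_tokens = tokens(sentence)
    scored = [(prop, jaccard(sent_tokens, tokens(prop))) for prop in ONTOLOGY]
    scored = [pair for pair in scored if pair[1] >= min_score]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# A sentence close to an existing proposition yields a candidate for review.
candidates = suggest("Both lungs are clear.")
# A sentence with no close match yields nothing, so the expert would
# create a new proposition.
unmatched = suggest("Gallstones are present.")
print(candidates, unmatched)
```

In either case the expert has the final say: a suggested candidate is only a starting point for review, and an empty list corresponds to the case where a new proposition has to be added to the ontology.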
Results
• The ontology of propositions is built in parallel with the experts' annotation of sentences to existing propositions.
• So far, 427,433 unique sentences from the corpus have been annotated, representing a total of 2,561,330 sentences, or about 60% of all sentences in the corpus.
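The coverage figure can be checked with a few lines of arithmetic. The sketch below uses the counts reported on the slides; the corpus size is the rounded "4.3 million sentences" from the data set slide, so the result is approximate.

```python
unique_annotated = 427_433      # unique sentences annotated so far
total_represented = 2_561_330   # corpus sentences those unique sentences cover
corpus_sentences = 4_300_000    # "4.3 million sentences" (rounded on the slide)

# Share of all corpus sentences covered by the annotated unique sentences.
coverage = total_represented / corpus_sentences
# Average number of times each annotated unique sentence occurs in the corpus.
mean_occurrences = total_represented / unique_annotated

print(f"coverage: {coverage:.1%}")
print(f"mean occurrences per unique sentence: {mean_occurrences:.1f}")
```

The coverage comes out just under 60%, and each annotated unique sentence occurs roughly six times on average, which is why annotating unique sentences rather than every sentence saves so much expert effort.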
Results (Cont.)
• The propositions are categorized by main finding area, such as brain and skull or general radiology.
• All propositions are stored in a relational database, together with information such as whether each is a normal or abnormal finding and the number of sentences mapped to it.
• We can find the most frequent (highest-ranked) propositions by sorting them by the number of sentences mapped to them. We can also count how many of them are normal or abnormal, and the number of normal and abnormal propositions and sentences in each category.
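The queries described above can be sketched with an in-memory SQLite database. The slides do not give the actual schema, so the table `propositions`, its column names, and every row below are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE propositions (
           id INTEGER PRIMARY KEY,
           text TEXT NOT NULL,
           category TEXT NOT NULL,
           is_normal INTEGER NOT NULL,      -- 1 = normal, 0 = abnormal
           sentence_count INTEGER NOT NULL  -- sentences mapped to it
       )"""
)
# Invented rows for illustration only; not data from the study.
conn.executemany(
    "INSERT INTO propositions (text, category, is_normal, sentence_count) "
    "VALUES (?, ?, ?, ?)",
    [
        ("The lungs are clear.", "Lung, Mediastinum, and Pleura", 1, 900),
        ("There is a pleural effusion.", "Lung, Mediastinum, and Pleura", 0, 120),
        ("The ventricles are normal in size.", "Brain and Skull", 1, 300),
        ("There is a skull fracture.", "Brain and Skull", 0, 15),
    ],
)

# Rank propositions by the number of sentences mapped to them.
ranked = conn.execute(
    "SELECT text, sentence_count FROM propositions ORDER BY sentence_count DESC"
).fetchall()

# Per category: normal vs. abnormal proposition counts and sentence counts.
by_category = conn.execute(
    """SELECT category,
              is_normal,
              COUNT(*) AS n_propositions,
              SUM(sentence_count) AS n_sentences
       FROM propositions
       GROUP BY category, is_normal
       ORDER BY category, is_normal DESC"""
).fetchall()

for row in by_category:
    print(row)
```

The same `GROUP BY` pattern, applied to rank intervals instead of categories, would produce counts of normal and abnormal propositions per block of 500 ranks.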
[Figure: Number of normal and abnormal propositions within each 500-rank interval of the highest-ranked propositions. Series: Normal, Abnormal. X-axis: Rank of Propositions (1-500 through 13501-13581). Y-axis: Number of Propositions (0 to 350).]
[Figure: Number of normal and abnormal sentences mapped to the propositions. Series: Normal, Abnormal. X-axis: Rank of Propositions (1-500 through 2001-2500). Y-axis: Number of Sentences (0 to 1,800,000).]
[Figure: Number of normal and abnormal sentences mapped to the propositions. Series: Normal, Abnormal. X-axis: Rank of Propositions (2501-3000 through 13501-13581). Y-axis: Number of Sentences (0 to 18,000).]
[Figure: Number of normal and abnormal propositions based on report categories. Series: Normal, Abnormal. X-axis: Categories of Findings (Brain and Skull; Breast; Face, Mastoids, and Neck; Gastrointestinal; General Radiology; Genitourinary; Heart and Great Vessel; Lung, Mediastinum, and Pleura; Miscellaneous Observation; Skeletal and Soft Tissue; Spine and Contents; Vascular and Lymphatic). Y-axis: Number of Propositions (0 to 2,000).]
[Figure: Number of normal and abnormal sentences based on report categories. Series: Normal, Abnormal. X-axis: Categories of Findings (Brain and Skull; Breast; Face, Mastoids, and Neck; Gastrointestinal; General Radiology; Genitourinary; Heart and Great Vessel; Lung, Mediastinum, and Pleura; Miscellaneous Observation; Skeletal and Soft Tissue; Spine and Contents; Vascular and Lymphatic). Y-axis: Number of Sentences (0 to 450,000).]
Similar works
CLEF (Clinical E-Science Framework)
• The CLEF corpus consists of both structured records and free-text documents (clinical narratives, radiology reports, and histopathology reports).
• The clinical text is semantically annotated to assist in the development and evaluation of an information extraction system.
LEXIMER (LEXIcon Mediated Entropy Reduction)
LEXIMER (Cont.)
• Phrase isolation scans the report text and separates the content into phrases.
• Noise reduction decreases the amount of non-clinically relevant information contained within the report.
• Signal extraction pulls out the positive statements and recommendations from the clinically relevant phrases.
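To make the three stages concrete, here is a toy pipeline with one function per stage. It is not Leximer's actual algorithm, which the slides do not describe in detail: the keyword lists, the function names, and the sample report are all invented for illustration.

```python
import re

# Invented toy lexicons; the real system's lexicons are not given in the slides.
NOISE_MARKERS = ("thank you", "dictated by", "comparison:")
NEGATION_WORDS = {"no", "not", "without"}
RECOMMENDATION_CUES = ("recommend", "follow-up", "suggest")

def isolate_phrases(report: str) -> list:
    """Phrase isolation: split the report at sentence punctuation and semicolons."""
    return [p.strip() for p in re.split(r"[.;!?]+", report) if p.strip()]

def reduce_noise(phrases: list) -> list:
    """Noise reduction: drop phrases containing a non-clinical marker."""
    return [p for p in phrases if not any(m in p.lower() for m in NOISE_MARKERS)]

def extract_signal(phrases: list) -> dict:
    """Signal extraction: keep recommendations and non-negated statements."""
    result = {"positive": [], "recommendations": []}
    for phrase in phrases:
        low = phrase.lower()
        if any(cue in low for cue in RECOMMENDATION_CUES):
            result["recommendations"].append(phrase)
        elif not NEGATION_WORDS & set(low.split()):
            result["positive"].append(phrase)
    return result

# Hypothetical report text, invented for this example.
report = (
    "Dictated by resident. There is a 5 mm nodule in the right upper lobe. "
    "No pleural effusion. Recommend follow-up CT in 6 months."
)
signal = extract_signal(reduce_noise(isolate_phrases(report)))
print(signal)
```

Keyword matching like this is far cruder than a lexicon-driven system, and the punctuation split would break decimals such as "2.5 cm"; the point is only to show how the output of each stage feeds the next.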
NLP using OLAP for assessing Recommendations in radiology reports
• Database: 4,279,179 radiology reports from a single tertiary health care center
10-year period (1995-2004)
Reports from the most common imaging modalities, with patient demographics
• Leximer, in conjunction with online analytical processing (OLAP), was used to classify reports into those with and those without recommendations for imaging (IREC).
• IREC rates were determined for different patient age groups, genders, imaging modalities, indications, diseases, subspecialties, and referring physicians.
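An OLAP-style analysis of this kind amounts to aggregating one measure (whether a report contains a recommendation) along different dimensions. The sketch below computes IREC rates per dimension in plain Python. The records, the two dimensions shown, and the function name `irec_rate_by` are invented for illustration and are not data from the cited study.

```python
from collections import defaultdict

# Invented records: (age_group, modality, has_recommendation).
reports = [
    ("18-40", "CT", True),
    ("18-40", "CT", False),
    ("18-40", "Radiography", False),
    ("41-65", "CT", True),
    ("41-65", "Radiography", False),
    ("41-65", "Radiography", True),
]
# Position of each dimension within a record.
DIMENSIONS = {"age_group": 0, "modality": 1}

def irec_rate_by(dimension: str) -> dict:
    """Share of reports with a recommendation, per value of one dimension."""
    index = DIMENSIONS[dimension]
    totals = defaultdict(int)
    with_rec = defaultdict(int)
    for record in reports:
        key = record[index]
        totals[key] += 1
        with_rec[key] += record[2]  # True counts as 1, False as 0
    return {key: with_rec[key] / totals[key] for key in totals}

rate_by_modality = irec_rate_by("modality")
rate_by_age = irec_rate_by("age_group")
print(rate_by_modality)
print(rate_by_age)
```

Switching the dimension argument corresponds to slicing the same data cube along a different axis, which is what makes it cheap to compare rates across age groups, modalities, or referring physicians.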
Discussion
• The CLEF work covers a very limited number of reports.
• In Leximer, the classification method has not been validated, and phrases cannot convey the meaning of a whole sentence.
• What distinguishes our work from the others is the large amount of data mined and the consistent expert validation.
References
• Friedlin, J., Mahoui, M., Jones, J., Kashyap, V., & Jamieson, P. (2010). Knowledge Discovery and Data Mining of Free Text Radiology. Submitted to the Journal of Biomedical Informatics.
• Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y., Setzer, A., et al. (2008). Semantic Annotation of Clinical Text: The CLEF Corpus. Retrieved April 20, 2010, from ftp://ftp.dcs.shef.ac.uk/home/robertg/papers/lrec08-clefcorpus.pdf
• Dang, P. A., Kalra, M. K., Blake, M. A., Schultz, T. J., Stout, M., Lemay, P. R., Freshman, D. J., Halpern, E. F., & Dreyer, K. J. (2008). Natural language processing using online analytic processing for assessing recommendations in radiology reports. J Am Coll Radiol, 5(3), 197-204.
• http://www.nuance.com/healthcare/products/radcube-for-radiology.asp