Post on 15-Jan-2016
An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC
Grace Dasovich
Robert Kim
Midterm Presentation
August 21, 2009
Outline
• Related Work
• Data
• Modeling Approach and Results
  – Similarity Measures
  – Artificial Neural Network
  – Multivariate Linear Regression
• Conclusions
• Future Work
• Computer-Aided Diagnosis (CADx) based on low-level image features
  – Armato et al. developed a linear discriminant classifier using features of lung nodules
  – Need to find the relationship between the image features and radiologists’ ratings
Related Work
• Image features and the semantic ratings
  – Lung Interpretations
    • Barb et al. developed Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)
    • Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings
    • Samala et al. used several combinations of image features and the radiologists’ ratings to classify nodules
Related Work
  – Similarity
    • Li et al. investigated four different methods to compute similarity measures for lung nodules
      – Feature-based
      – Pixel-value-difference
      – Cross correlation
      – ANN
Related Work
Materials
• LIDC Dataset
• 149 Unique Nodules
  – One slice per nodule, largest nodule area
• 9 Semantic Characteristics
  – Calcification and Internal Structure had little variation, thus were not used
• 64 Content Features
  – Shape, size, intensity, and texture
Data
• Cosine Similarity
• Jeffrey Divergence
• Euclidean Distance
Similarity Measures
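The three measures listed above can be sketched in plain Python. The slides give no formulas, so this is a minimal sketch under assumptions: the function names are illustrative, and the Jeffrey divergence uses the symmetric form computed against the mean vector m = (p + q) / 2, applied to non-negative vectors. How the ratings were turned into such vectors is not stated in the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jeffrey_divergence(p, q):
    """Symmetric divergence against the mean vector m = (p + q) / 2.

    Assumes non-negative entries; zero entries contribute nothing.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2.0
        if pi > 0:
            total += pi * math.log(pi / m)
        if qi > 0:
            total += qi * math.log(qi / m)
    return total

# Hypothetical rating vectors for two nodules (seven characteristics each).
ratings_a = [3, 4, 2, 5, 1, 4, 3]
ratings_b = [3, 5, 2, 4, 1, 3, 3]
print(cosine_similarity(ratings_a, ratings_b))
print(euclidean_distance(ratings_a, ratings_b))
print(jeffrey_divergence(ratings_a, ratings_b))
```

Cosine similarity grows with agreement while the other two shrink, so any comparison between them has to account for the opposite direction.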
Similarity Measures
[Figure: two scatter plots with Euclidean Distance on the horizontal axis (0 to 1); the vertical axes are Cosine Similarity and Jeffrey Divergence]
Similarity Measures
• Computed feature distance measures
Similarity Measures
• Two three-layer ANNs
  – Input (64 neurons), hidden layer (5 neurons), output (1)
  – Input (64 neurons), hidden layer (5 neurons), output (7)
• Input = 64 feature distances
• Output = Semantic similarity or difference in semantic ratings
• Hyperbolic tangent function, backpropagation algorithm, 200 iterations
Methods
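A network of the shape described above (64 inputs, 5 hidden neurons, 1 or 7 outputs, hyperbolic tangent activation, backpropagation, 200 iterations) might look like the following sketch. Several details are assumptions because the slides do not state them: tanh on both layers, squared-error loss, per-sample updates, the learning rate, the weight initialization range, and all function names.

```python
import math
import random

def init_network(n_in=64, n_hidden=5, n_out=1, seed=0):
    """Small random weights; the last weight in each row is the bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w1, w2

def forward(net, x):
    """Return hidden activations and outputs for one input vector."""
    w1, w2 = net
    xb = list(x) + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    y = [math.tanh(sum(w * v for w, v in zip(row, hb))) for row in w2]
    return h, y

def train(net, samples, targets, epochs=200, lr=0.05):
    """Backpropagation with per-sample gradient steps on squared error."""
    w1, w2 = net
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            h, y = forward(net, x)
            xb = list(x) + [1.0]
            hb = h + [1.0]
            # Output deltas: error times tanh derivative (1 - y^2).
            d_out = [(yk - tk) * (1 - yk * yk) for yk, tk in zip(y, t)]
            # Hidden deltas, computed before the output weights change.
            d_hid = [(1 - h[j] ** 2) * sum(d_out[k] * w2[k][j] for k in range(len(w2)))
                     for j in range(len(h))]
            for k, row in enumerate(w2):
                for j in range(len(row)):
                    row[j] -= lr * d_out[k] * hb[j]
            for j, row in enumerate(w1):
                for i in range(len(row)):
                    row[i] -= lr * d_hid[j] * xb[i]
    return net

def mean_squared_error(net, samples, targets):
    """Average squared error over all samples and outputs."""
    total, count = 0.0, 0
    for x, t in zip(samples, targets):
        y = forward(net, x)[1]
        total += sum((yk - tk) ** 2 for yk, tk in zip(y, t))
        count += len(t)
    return total / count

single_output_net = init_network(n_out=1)  # semantic similarity
seven_output_net = init_network(n_out=7)   # differences in seven ratings
```

Because tanh bounds the output to (-1, 1), targets would need to be scaled into that range before training; the slides do not say how this was handled.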
• ANN with a single output
  – 640 random pairs from all 109 nodules
  – 231 pairs from nodules with malignancy > 3
  – 496 pairs from nodules with area > 122 mm²
Methods
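Pair sets like the ones above can be built by enumerating unordered pairs and sampling from them. This is a sketch with hypothetical helper names and an arbitrary seed. One observation, not stated in the slides: 231 and 496 equal the number of unordered pairs among 22 and 32 items, which would be consistent with using every pair from the two filtered subsets.

```python
import itertools
import math
import random

def all_pairs(nodule_ids):
    """Every unordered pair of distinct nodules."""
    return list(itertools.combinations(nodule_ids, 2))

def random_pairs(nodule_ids, n_pairs, seed=0):
    """Draw n_pairs distinct unordered pairs without replacement."""
    rng = random.Random(seed)  # seed is an assumption, used for repeatability
    return rng.sample(all_pairs(nodule_ids), n_pairs)

pairs_640 = random_pairs(range(109), 640)
print(len(pairs_640), "of", math.comb(109, 2), "possible pairs")  # 640 of 5886
print(math.comb(22, 2), math.comb(32, 2))                         # 231 496
```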
Methods
• ANN with seven outputs
  – 640 random pairs from all 109 nodules
• Leave-one-out method
  – Cosine similarity or Jeffrey divergence or difference in semantic ratings used as teaching data
  – An ANN trained with entire dataset minus one image pair
  – The pair left out used for testing
  – Correlation between calculated radiologists’ similarity and ANN output calculated
Methods
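The leave-one-out procedure above can be sketched independently of the model being evaluated. In this sketch the `fit` and `predict` callables are placeholders, and a one-variable least-squares line stands in for the ANN purely to keep the example short; it is not the authors' model.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def leave_one_out(inputs, targets, fit, predict):
    """Train on all pairs but one, test on the held-out pair, repeat for each pair."""
    outputs = []
    for i in range(len(inputs)):
        train_x = inputs[:i] + inputs[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        model = fit(train_x, train_y)
        outputs.append(predict(model, inputs[i]))
    # Correlation between held-out predictions and the teaching data.
    return pearson_r(outputs, targets), outputs

def fit_line(xs, ys):
    """Stand-in model (assumption): least-squares line through 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

# Hypothetical content distances and semantic similarities for five pairs.
content = [0.1, 0.4, 0.5, 0.7, 0.9]
semantic = [0.9, 0.7, 0.6, 0.35, 0.2]
r, held_out_predictions = leave_one_out(content, semantic, fit_line, predict_line)
print(r)
```

With 640 pairs this means 640 separate training runs, so the cost of one training run dominates the total time.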
• ANN using 640 random pairs
Results
• ANN using 231 pairs with malignancy rating > 3
Results
• ANN using 496 pairs with area > 122 mm²
Results
• ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)
Results
[Figure: scatter plot of ANN output against target values; axes labeled Output and Target]
• ANN using random 640 pairs and the Jeffrey divergence with seven semantic ratings
Results
Methods
• Normalization of Features
  – Min-Max Technique
  – Z-Score Technique
• Pair Selection
  – Looked for matches between the k most similar images based on semantic similarity and on content similarity
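Both normalization techniques and the pair selection step can be sketched as follows. The matching rule is an interpretation, not something the slides spell out: a pair is kept when one nodule is among the k nearest neighbours of the other under both the semantic distance and the content distance. Function names are illustrative, and the z-score uses the population standard deviation as an assumption.

```python
import math

def min_max_normalize(column):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def z_score_normalize(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in column]

def k_nearest(dist_row, i, k):
    """Indices of the k items closest to item i, excluding i itself."""
    others = [j for j in range(len(dist_row)) if j != i]
    return set(sorted(others, key=lambda j: dist_row[j])[:k])

def matched_pairs(semantic_dist, content_dist, k):
    """Pairs that are k-nearest neighbours under both distance matrices."""
    pairs = set()
    for i in range(len(semantic_dist)):
        common = k_nearest(semantic_dist[i], i, k) & k_nearest(content_dist[i], i, k)
        for j in common:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)

# Hypothetical 3-nodule distance matrices.
semantic = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
content = [[0, 2, 8], [2, 0, 9], [8, 9, 0]]
print(matched_pairs(semantic, content, k=1))  # [(0, 1)]
```

Raising k admits more pairs but with weaker agreement between the two rankings.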
Methods
Methods
• Multivariate Regression Analysis
  – Select features with highest correlation coefficients
  – Feature distance measures
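The two steps above, ranking features by correlation and fitting a multivariate linear model, can be sketched without external libraries. This is an illustrative sketch under assumptions: features are ranked by absolute Pearson correlation with the target, the model is ordinary least squares solved through the normal equations, and all names and the toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def top_features(X, y, k):
    """Indices of the k columns with the largest absolute correlation to y."""
    scores = [abs(pearson_r([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(X, y):
    """Least-squares coefficients; the last one is the intercept."""
    Xb = [list(row) + [1.0] for row in X]
    p = len(Xb[0])
    xtx = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    return sum(c * v for c, v in zip(coef, list(row) + [1.0]))

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical feature distances (two features) and semantic similarity per pair.
X = [[0.1, 0.9], [0.4, 0.2], [0.5, 0.7], [0.8, 0.3], [0.9, 0.6]]
y = [0.85, 0.6, 0.5, 0.25, 0.1]
coef = fit_linear(X, y)
print(top_features(X, y, 1), r_squared(y, [predict(coef, row) for row in X]))
```

Solving the normal equations directly is adequate for a handful of selected features; with many correlated features a more numerically stable solver would be the safer choice.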
Methods
• Nodule Analysis
  – Determine differences between selected and non-selected nodules
  – Define requirements for our model
Methods
Results
Results
[Figure: two plots against Threshold (0 to 20): Correlation (0 to 1) and Number of Pairs (0 to 2000)]
Results
Measure    d(i, j)    d²(i, j)    exp(d(i, j))
Cosine     0.871      0.849       0.866
Jeffrey    0.647      0.633       0.608
Results
Correlation Coefficient    Feature
0.1175                     Equivalent Diameter
0.1085                     Energy (Haralick)
0.0823                     Gabor Mean 135_05
0.0647                     Convex Area
0.0467                     Gabor STD 135_04
0.0322                     Min Intensity BG
0.0295                     Markov 4
0.0280                     Variance (Haralick)
0.0265                     Gabor STD 45_05
0.0238                     SD Intensity
R² = 0.871
Results
Results
[Figure: scatter plot with axes labeled Content and Semantic]
Results
Results
[Figure: histograms of ratings (1 to 5) for Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, and Texture, comparing a group of 79 nodules with a group of 70 nodules]
Results
Results
[Figure: histograms of Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov 4, Variance, Gabor SD 45 5, and SD Intensity, comparing the group of 79 nodules with the group of 70 nodules]
Results
Results
[Figure: four histogram panels labeled A to D, each comparing the group of 79 nodules with the group of 70 nodules]
Results
A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety
Preliminary Issues
• The ANN is also not yet sufficient to predict semantic similarity from content
  – Best correlation 0.438
  – Malignancy correlation 0.521
  – Jeffrey performed better, unlike in the linear model
• A semantic gap still exists
Conclusions
Conclusions
• Our linear model applies to a specific type of nodule
  – Characteristics: High malignancy, high texture, low lobulation, and low spiculation
  – Features: Larger diameter, greater intensity
• Linear models are not sufficient for determination of similarities
  – R² of 0.871 with chosen nodules
Conclusions
Future Work
• Reduce variability among radiologists
  – Use only nodules with radiologists’ agreement
• Find best combination of content features
  – 64 may be too many
  – Currently only using 2D
Future Work
• Different semantic distance measures
  – Some ratings are ordinal, Jeffrey is for categorical
• Different methods of machine learning
  – Incorporate radiologists’ feedback into training
  – Ensemble of classifiers
Future Work
Thanks for Listening
Any Questions?