Standardised Visual Assessment
Visual Rating of White Matter Changes
Lena Bronge
MRI-Dept
Aleris röntgen, Sabbatsberg
White matter lesions (WML) / Age-Related White Matter Changes (ARWMC)
Degeneration of the brain white matter
Unknown cause
Age related
Hypertension, cerebrovascular risk factors
Probably due to chronic ischemia
Relation to cognitive decline
Relation to dementia?
White matter changes
A spectrum of degeneration/destruction
of tissue elements
Myelin pallor
Demyelination
Increased distance between myelin fibres
Loss of axons
Gliosis
Cavitation/Necrosis
[Figure: MRI and CT examples of white matter changes, labelled: periventricular lesions (bands), periventricular lesions (caps), deep white matter lesions]
Many studies have examined relations between white matter changes (WMC) and different clinical parameters
Disparate results, poor agreement
Methods for assessing WMC
Visual rating according to different scales (MRI or CT)
Semi-automatic computer-based analysis: segmentation (MRI) with manual settings
Automatic computer-based analysis: segmentation (MRI)
Visual rating scales
A great number of scales exist
Fazekas (1987)
Wahlund (1991)
van Swieten (1992)
Scheltens (1993)
ARWMC scale (European Task Force, 2001)
Etc.
White matter changes
0 No
1 Yes
Simplest possible scale
White matter changes
0 No
1 Minor changes
2 Extensive changes
Or 0-3, 0-7, 0-9…
Fazekas scale for MRI
Periventricular lesions
0 No lesions
1 Caps or thin line
2 Smooth halo
3 Extension into the white matter
White matter lesions
0 No lesions
1 Punctate foci
2 Beginning confluence of foci
3 Large confluent areas
ARWMC scale for both CT and MRI
Applied to several different regions: frontal, parieto-occipital, temporal, infratentorial, basal ganglia
White matter lesions (PVL and WML)
0 No lesions
1 Focal lesions
2 Beginning confluence of lesions
3 Diffuse involvement of the entire region
Basal ganglia lesions
0 No lesions
1 One focal lesion
2 More than one focal lesion
3 Confluent lesions
Scheltens scale
Periventricular hyperintensities (1)  0-6
  Caps: frontal  0-2
  Caps: occipital  0-2
  Bands: lateral ventricles  0-2
Deep white matter hyperintensities (2)  0-24
  Frontal  0-6
  Parietal  0-6
  Occipital  0-6
  Temporal  0-6
Basal ganglia hyperintensities (2)  0-30
  Caudate nucleus  0-6
  Putamen  0-6
  Globus pallidus  0-6
  Thalamus  0-6
  Internal capsule  0-6
Infratentorial hyperintensities  0-24
(1) 0 = absent; 1 = ≤ 5 mm; 2 = 6-10 mm
(2) 0 = no abnormalities; 1 = < 3 mm, n ≤ 5; 2 = < 3 mm, n > 5; 3 = 4-10 mm, n ≤ 5; 4 = 4-10 mm, n > 5; 5 = > 10 mm, n ≥ 1; 6 = confluent (n = number of lesions)
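As an illustration of how the 0-6 item grading in the second footnote, and the 0-2 periventricular grading in the first, could be applied to measured lesions, here is a minimal Python sketch. The function names are hypothetical, and the sketch rests on assumptions the slide does not state: the grade is set by the largest lesion size class present in the region, n counts the lesions in that size class, lesions up to 3 mm count as small, and any periventricular extent above 5 mm (including above 10 mm) is graded 2. It only mimics the score definitions; it does not replace the visual judgement the scale is built on.

```python
# Hypothetical helpers illustrating the Scheltens item definitions.
# ASSUMPTIONS (not on the slide): the largest size class present decides the grade,
# n counts lesions in that class, lesions <= 3 mm are "small", > 10 mm are "large".

def scheltens_item_grade(diameters_mm, confluent=False):
    """Grade one deep white matter or basal ganglia region on the 0-6 item scale."""
    if confluent:
        return 6  # confluent
    if not diameters_mm:
        return 0  # no abnormalities
    large = [d for d in diameters_mm if d > 10]
    medium = [d for d in diameters_mm if 3 < d <= 10]
    if large:
        return 5  # > 10 mm, n >= 1
    if medium:
        return 3 if len(medium) <= 5 else 4  # 4-10 mm, n <= 5 or n > 5
    # only small lesions are left at this point
    return 1 if len(diameters_mm) <= 5 else 2  # small, n <= 5 or n > 5


def scheltens_pvh_grade(max_extent_mm):
    """Grade one periventricular item (frontal caps, occipital caps or bands) on 0-2.

    ASSUMPTION: every extent above 5 mm, including above 10 mm, is graded 2.
    """
    if max_extent_mm <= 0:
        return 0  # absent
    return 1 if max_extent_mm <= 5 else 2


print(scheltens_item_grade([2, 2, 3]))   # 1
print(scheltens_item_grade([5, 2]))      # 3
print(scheltens_item_grade([12, 2]))     # 5
print(scheltens_pvh_grade(8))            # 2
```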
Rating scales
Give numbers, but not “measures”
Give data that are qualitative rather than quantitative
Give ordinal data, at best
Require non-parametric statistics
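The slide does not name a particular non-parametric test. As one common example, the Mann-Whitney U statistic compares two groups of ordinal ratings using only their order. A minimal Python sketch that counts pairwise comparisons, with ties counted as one half; the function name is hypothetical and the ratings are made up for illustration:

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two groups of ordinal ratings.

    U_a counts the pairs (a, b) where a > b, with ties counted as 0.5;
    U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical ordinal ratings (0-3) from two patient groups
hypertensive = [1, 2, 3]
normotensive = [0, 1]
print(mann_whitney_u(hypertensive, normotensive))  # (5.5, 0.5)
```

Only comparisons of order are used, never differences between scores, which is why the test suits ordinal data. For p-values, an established routine such as `scipy.stats.mannwhitneyu` is the safer choice.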
Scheltens scale
Is claimed to be semiquantitative
Considers both the number and the volume of lesions
The total score ranges from 0 to 84
Modified variant (rating left and right sides separately): 0-108
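The 0-84 range follows from adding the maximum item scores on the Scheltens slide: 6 (periventricular) + 24 (deep white matter) + 30 (basal ganglia) + 24 (infratentorial) = 84. A minimal Python sketch of summing item scores with range checks; the dictionary layout and function names are hypothetical, and splitting the infratentorial subtotal into four named 0-6 items is an assumption, since the slide gives only the 0-24 subtotal. The modified 0-108 variant is not modelled, and the resulting sum is still an ordinal value.

```python
# Maximum score per item, as listed on the Scheltens scale slide.
# ASSUMPTION: the infratentorial 0-24 subtotal is split into four 0-6 items.
SCHELTENS_MAX = {
    "periventricular": {"caps frontal": 2, "caps occipital": 2,
                        "bands lateral ventricles": 2},
    "deep white matter": {"frontal": 6, "parietal": 6, "occipital": 6, "temporal": 6},
    "basal ganglia": {"caudate nucleus": 6, "putamen": 6, "globus pallidus": 6,
                      "thalamus": 6, "internal capsule": 6},
    "infratentorial": {"cerebellum": 6, "mesencephalon": 6, "pons": 6, "medulla": 6},
}


def max_total(items=SCHELTENS_MAX):
    """Highest possible total: the sum of all item maxima."""
    return sum(m for group in items.values() for m in group.values())


def total_score(ratings, items=SCHELTENS_MAX):
    """Sum item ratings after checking each against its allowed range.

    `ratings` maps (group, item) -> integer score; items not listed count as 0.
    An unknown group or item raises KeyError.
    """
    total = 0
    for (group, item), score in ratings.items():
        maximum = items[group][item]
        if not 0 <= score <= maximum:
            raise ValueError(f"{group}/{item}: score {score} outside 0-{maximum}")
        total += score
    return total


example = {
    ("periventricular", "caps frontal"): 1,
    ("deep white matter", "frontal"): 3,
    ("basal ganglia", "putamen"): 1,
}
print(max_total())            # 84
print(total_score(example))   # 5
```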
It is an obvious disadvantage that there are a number of different scales measuring the same thing
Are the results even comparable?
Poor agreement between different scales
Mäntylä et al. (Stroke, 1997)
- Compared 13 different scales in the same patient group (400 post-stroke patients)
- Poor agreement overall
- The relation between WMC and, e.g., hypertension differed between scales
“Part of the inconsistencies in previous studies are due to the different properties of the scales”
Scale properties
Ceiling effect / Floor effect (truncation)
Different and sometimes vague definitions of the scores
Varying number of points (dichotomous, 0-3, 0-6…)
Different types and locations of lesions
Sums of scores from different areas or lesion types, sometimes measuring the same thing twice
Validation (how well does the scale match a “gold standard”, i.e. the true phenomenon?)
What are the advantages of rating scales?
When and why do we use them?
In the absence of other techniques
Measuring a phenomenon that is not easily quantified by automatic methods (e.g. segmentation, which depends on contrast, threshold effects and manual settings)
Often quicker than automatic methods
Sometimes more appropriate (results are
more reproducible)
What are the advantages of rating scales?
When and why do we use them?
Measurements from non-standardised images (multicenter studies)
In case of poor image quality
Possible even with different imaging modalities (i.e. CT and MRI)
Not only area/volume but also other characteristics, such as appearance, number or location
What type of scale is best?
The simplest one, or the one with the most detailed rating?
It depends on the purpose: what kind of information is required?
Ratings on a simple scale are often easier to reproduce, but not always…
The Fazekas scale has shown low reproducibility in several studies
The more detailed Scheltens scale has even shown somewhat higher reproducibility
What type of scale is best?
The simplest one, or the one with the most detailed rating?
Setting:
Many different raters: simple scale
Varying image quality: simple scale
Very large material: consider a simple scale
Few raters and standardised images: a more complex scale could be used; practise first and harmonise your ratings
Experienced rater: more complex scale
Reliability of ratings
Many, but not all, scales have published reliability measures
It is advisable to also do your own reliability testing
Inter- and/or intra-observer agreement
Inter-observer: more than one rater
Intra-observer: the same rater more than once (some time apart)
Kappa statistic
Weighted kappa if the scale is ordinal
Ranges from -1 to +1
1 = perfect agreement
0 = no agreement beyond chance
< 0.40: poor agreement
0.40-0.60: fair agreement
0.60-0.80: good agreement
> 0.80: excellent agreement
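A minimal Python sketch of Cohen's kappa for two raters (or one rater on two occasions), with optional linear or quadratic weights for ordinal scales, plus the verbal labels from the slide. The function names are hypothetical; how values exactly at 0.40, 0.60 and 0.80 are labelled is an assumption, since the slide's bands share their edges.

```python
def cohen_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa; weights=None (unweighted), "linear" or "quadratic".

    r1, r2: ratings of the same cases by the two raters (or two sessions).
    categories: all possible scale values, listed in scale order.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    if len(r1) != len(r2) or not r1 or len(categories) < 2:
        raise ValueError("need two equally long, non-empty lists and >= 2 categories")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed cross-tabulation of rater 1 (rows) against rater 2 (columns)
    obs = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        """Disagreement weight: 0 on the diagonal, growing with distance."""
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1.0 - observed / expected


def describe(kappa_value):
    """Verbal label using the slide's thresholds (edge handling assumed)."""
    if kappa_value < 0.40:
        return "poor"
    if kappa_value < 0.60:
        return "fair"
    if kappa_value <= 0.80:
        return "good"
    return "excellent"


rater1 = [0, 1, 2]
rater2 = [0, 2, 1]
k_lin = cohen_kappa(rater1, rater2, [0, 1, 2], weights="linear")
print(round(k_lin, 2), describe(k_lin))  # 0.25 poor
```

With weights, a disagreement of one step counts less than a disagreement of two steps, which is why weighted kappa suits ordinal scales. For real studies, an established implementation such as scikit-learn's `cohen_kappa_score` (which accepts `weights="linear"` or `"quadratic"`) is a sensible cross-check.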
Reliability of ratings…
Rating scales never give perfect agreement
Reliability is no proof of validity
Different kinds of data
With ratings you can get
Dichotomous data (Yes or No)
Nominal data (categories)
Ordered nominal data (categories with an order)
Ordinal data (values with order but no fixed intervals)
Different kinds of data…
With ratings you can NEVER get
Interval scale data (the same interval between each point)
Ratio scale data (interval scale data that include an absolute zero, so that ratios are meaningful)
Ordinal data
There is no real measurement, just an arbitrary value based on identification and comparison
The steps or intervals on the scale are defined by text descriptions
Ordinal data
A step from one score to the next can mean different things depending on where on the scale you are and who the rater is
The scale is often truncated
Ordinal data
Sums of scores are often used
But a great number of different combinations of scores can give the same sum score
The same sum score in two persons does not necessarily mean the same thing
The sum score often includes rating the same thing twice
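A small enumeration makes the point concrete. Taking the four deep white matter regions of the Scheltens scale (0-6 each) as an example, the Python sketch below lists every regional profile whose sum is 6; the function name is hypothetical. One region with confluent lesions, (0, 0, 0, 6), and scattered small foci in all four regions, e.g. (1, 2, 1, 2), end up with the same sum score.

```python
from itertools import product


def profiles_with_sum(n_regions, max_score, target):
    """All tuples of regional scores (0..max_score) whose sum equals target."""
    return [p for p in product(range(max_score + 1), repeat=n_regions)
            if sum(p) == target]


profiles = profiles_with_sum(4, 6, 6)  # four regions, each scored 0-6
print(len(profiles))                   # 84
print(profiles[0], profiles[-1])       # (0, 0, 0, 6) (6, 0, 0, 0)
print((1, 2, 1, 2) in profiles)        # True
```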
Ordinal data
You cannot meaningfully calculate means or standard deviations on ordinal data
Use medians, quartiles and ranges instead
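A minimal Python sketch of these summary statistics using the standard `statistics` module (`statistics.quantiles` requires Python 3.8 or later); the ratings are made up for illustration. The default "exclusive" quantile method interpolates, so a quartile can fall between two scale steps, as the upper quartile does here.

```python
import statistics

# Hypothetical item scores on a 0-6 ordinal scale
ratings = [0, 1, 1, 2, 2, 2, 3, 5, 6]

median = statistics.median(ratings)              # middle value of the sorted ratings
quartiles = statistics.quantiles(ratings, n=4)   # [Q1, median, Q3]
value_range = (min(ratings), max(ratings))

print(median, quartiles, value_range)  # 2 [1.0, 2.0, 4.0] (0, 6)
```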
To summarize
Choose an appropriate rating scale
Consider what type of data you need: what questions do you want to answer?
How does your material look?
Who will do the rating? When?
Are there previous studies you want to compare with?
Choose appropriate statistical tests
Do your own reliability testing