visual rating wm changes

Standardised Visual assessment

Visual rating of White Matter Changes

Lena Bronge

MRI-Dept

Aleris röntgen, Sabbatsberg

White matter lesions (WML) / Age Related White Matter Changes (ARWMC)

Degeneration of the brain white matter

Unknown cause

Age related

Hypertension, cerebrovascular risk factors

Probably due to chronic ischemia

Relation to cognitive decline

Relation to dementia?

White matter changes

A spectrum of degeneration/destruction of tissue elements

Myelin pallor

Demyelination

Increased distance between myelin fibres

Loss of axons

Gliosis

Cavitation/Necrosis

Peri-ventricular lesions (bands)

Deep white matter lesions

Peri-ventricular lesions (caps)

Many have studied relations between WMC and different clinical parameters

Disparate results, poor agreement

Methods for assessing WMC

Visual rating according to different scales (MRI or CT)

Semiautomatic computer-based analysis: segmentation (MRI) with manual settings

Automatic computer analysis: segmentation (MRI)

Visual rating scales

A great number of scales exist

Fazekas - 87

Wahlund - 91

van Swieten - 92

Scheltens - 93

ARWMC scale (European task force) – 01

Etc.


0 No

1 Yes

Simplest possible scale


0 No

1 Minor changes

2 Extensive changes

Or 0-3, 0-7, 0-9…

Fazekas scale for MRI

Periventricular Lesions

0 No lesions

1 Caps or thin line

2 Smooth halo

3 Extension into the white matter

White matter lesions

0 No lesions

1 Punctate foci

2 Beginning confluence of foci

3 Large confluent areas

ARWMC scale for both CT and MRI

Applied for several different regions: Frontal, Parieto-occipital, Temporal, Infratentorial, Basal ganglia

White Matter Lesions (PVL and WML)

0 No lesions

1 Focal lesions

2 Beginning confluence of lesions

3 Diffuse involvement of entire region

Basal Ganglia Lesions

0 No lesions

1 One focal lesion

2 More than one focal lesion

3 Confluent lesions

Scheltens scale

Periventricular hyperintensities (1): 0-6
    Caps, frontal: 0-2
    Caps, occipital: 0-2
    Bands, lateral ventricles: 0-2

Deep white matter hyperintensities (2): 0-24
    Frontal: 0-6
    Parietal: 0-6
    Occipital: 0-6
    Temporal: 0-6

Basal ganglia hyperintensities (2): 0-30
    Caudate nucleus: 0-6
    Putamen: 0-6
    Globus pallidus: 0-6
    Thalamus: 0-6
    Internal capsule: 0-6

Infratentorial hyperintensities: 0-24

(1) 0 = absent; 1 = <= 5 mm; 2 = 6-10 mm

(2) 0 = no abnormalities; 1 = < 3 mm, n <= 5; 2 = < 3 mm, n > 5; 3 = 4-10 mm, n <= 5; 4 = 4-10 mm, n > 5; 5 = > 10 mm, n >= 1; 6 = confluent.

Rating scales

Give numbers but not “measures”

Give data that are qualitative, not quantitative

Give ordinal data, at best

Non-parametric statistics
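As an illustration of what a non-parametric comparison looks like for rating data, here is a minimal sketch of the Mann-Whitney U statistic, which uses only the ordering of the scores. The function name and the example ratings are made up for this illustration; a real analysis would normally use a statistics package that also reports a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a versus group_b.

    Counts the pairs (a, b) in which the rating in group_a is higher;
    tied pairs count as one half. Only the order of the scores is used,
    never the distance between them.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical 0-3 ratings in two small groups
patients = [3, 3, 2]
controls = [0, 1, 2]
u_patients = mann_whitney_u(patients, controls)
u_controls = mann_whitney_u(controls, patients)
print(u_patients, u_controls)
```

The two U values always add up to the number of pairs (here 3 x 3 = 9), so a U close to the maximum means that one group is rated consistently higher.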

Scheltens scale

Is claimed to be semiquantitative

Considers both number and volume of lesions

The total score ranges from 0 to 84

Modified variant (rating left and right sides separately): 0 to 108
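The total of 0 to 84 follows from adding the four sub-scale maxima (6 + 24 + 30 + 24). A minimal sketch of that arithmetic, with a range check on each sub-score; the dictionary keys and the function name are chosen for this example only and are not part of the scale:

```python
# Maximum score per sub-scale, as listed for the Scheltens scale
SCHELTENS_MAX = {
    "periventricular": 6,
    "deep_white_matter": 24,
    "basal_ganglia": 30,
    "infratentorial": 24,
}


def scheltens_total(subscores):
    """Sum the four sub-scores after checking that each is within range."""
    total = 0
    for name, maximum in SCHELTENS_MAX.items():
        value = subscores[name]
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
        total += value
    return total


print(sum(SCHELTENS_MAX.values()))  # highest possible total
print(scheltens_total({
    "periventricular": 2,
    "deep_white_matter": 7,
    "basal_ganglia": 3,
    "infratentorial": 0,
}))
```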

It is an obvious disadvantage that there are a number of different scales measuring the same thing

Are the results even comparable?

Poor agreement between different scales

Mäntylä et al. (Stroke 1997)

- Compared 13 different scales in the same patient group (400 post-stroke patients)

- Poor agreement overall

- Different relation between WMC and e.g. hypertension

"Part of the inconsistencies in previous studies are due to the different properties of the scales"

Scale properties

Ceiling effect / Floor effect (truncation)

Different and sometimes vague definitions of the scores

Varying number of points (dichotomous, 0-3, 0-6…)

Different types and location of lesions

Sum of scores from different areas or lesion types, sometimes measuring the same thing twice

Validation (how well does the scale match a "gold standard", i.e. the true phenomenon?)

What are the advantages of rating scales?

When and why do we use them?

What are the advantages of rating scales?


In the absence of other techniques

Measuring a phenomenon that is not easily quantified by automatic methods (e.g. segmentation: contrast, threshold effects, manual settings)

Often quicker than automatic methods

Sometimes more appropriate (results are more reproducible)

What are the advantages of rating scales?

Measurements from non-standardised images (multicenter)

In case of poor image quality

Possible even with different imaging modalities (i.e. CT and MRI)

Not only area / volume but other characteristics like appearance, number or location

What type of scale is best?

The simplest one or the one with the most detailed rating?




Depends on the purpose: what kind of information is required?

Ratings on a simple scale are often easier to reproduce, but not always…

The Fazekas scale has had low reproducibility in several studies

The Scheltens scale has even had somewhat higher reproducibility




Setting:

Many different raters – simple scale

Varying image quality – simple scale

Very large material – consider a simple scale

Few raters and standardised images – you could use a more complex scale. Practice first and harmonise your ratings

Experienced rater? – more complex scale

Reliability of ratings

Many, but not all, scales have previously published reliability measures

It is advisable to also do your own reliability testing

Inter- and/or intra-observer agreement

Inter-: more than one rater

Intra-: the same rater more than once (but some time apart)

Kappa statistic

Weighted kappa if the scale is ordinal

From -1 to +1

0 = agreement no better than chance

<0.40: poor agreement

0.40-0.60: fair agreement

0.60-0.80: good agreement

>0.80: excellent agreement
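To make the kappa values above concrete, here is a minimal sketch of a linearly weighted Cohen's kappa for two raters, plus a helper that applies the agreement bands listed above. Linear weighting is only one common choice (quadratic weights are also used), the function names are invented for this example, and the handling of the shared boundary values 0.60 and 0.80 is an assumption because the bands do not say which side they belong to.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for scores coded 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    span = n_categories - 1
    pairs = Counter(zip(ratings_a, ratings_b))
    marg_a = Counter(ratings_a)
    marg_b = Counter(ratings_b)
    # Disagreement weight grows with the distance between the two scores
    observed = sum(abs(i - j) / span * count / n
                   for (i, j), count in pairs.items())
    expected = sum(abs(i - j) / span * marg_a[i] * marg_b[j] / n ** 2
                   for i in range(n_categories)
                   for j in range(n_categories))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one score only")
    return 1 - observed / expected


def interpret_kappa(kappa):
    """Agreement bands from the list above (boundary handling is assumed)."""
    if kappa < 0.40:
        return "poor"
    if kappa < 0.60:
        return "fair"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical 0-3 ratings of the same eight scans by two raters
rater_1 = [0, 1, 2, 3, 0, 1, 2, 3]
rater_2 = [0, 1, 2, 3, 1, 1, 2, 2]
kappa = weighted_kappa(rater_1, rater_2, n_categories=4)
print(round(kappa, 2), interpret_kappa(kappa))
```

The same function serves for intra-observer agreement: pass the first and second reading by the same rater as the two lists.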

Reliability of ratings…

Rating scales never give perfect agreement

Reliability is no proof of validity


Different kinds of data

With ratings you can get

Dichotomous data (Yes or No)

Nominal data (categories)

Ordered nominal data (categories with an order)

Ordinal data (values with order but no fixed intervals)

Different kinds of data…

With ratings you can NEVER get

Interval scale data (the same interval between each point)

Ratio scale data (an interval scale that includes an absolute zero: the ratio has a meaning)

Ordinal data

There is no real measure

just an arbitrary value based on identification and comparison

The steps or intervals on the scale are defined by a text

Ordinal data

A step from one score to the next means different things depending on where on the scale you are and who the rater is

The scale is often truncated

Ordinal data

Sums of scores are often used

But a great number of different combinations of scores can give the same sum score

The same sum score in two persons does not mean the same thing

The sum score often includes rating the same thing twice
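How many combinations hide behind one sum score can be counted directly. The sketch below assumes four regions each rated 0 to 6 (as in the deep white matter part of the Scheltens scale) and lists every combination with a given total; the function name and the chosen total of 12 are illustrative only.

```python
from itertools import product


def combinations_with_sum(total, n_regions=4, max_score=6):
    """All ways of scoring n_regions (each 0..max_score) that add up to total."""
    return [scores
            for scores in product(range(max_score + 1), repeat=n_regions)
            if sum(scores) == total]


matches = combinations_with_sum(12)
print(len(matches))
print((6, 6, 0, 0) in matches, (3, 3, 3, 3) in matches)
```

Confluent lesions in two regions with two regions spared, and moderate lesions in all four regions, both give a sum of 12 although the underlying patterns are quite different.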

Ordinal data

You cannot calculate means or standard deviations on ordinal data

Use medians, quartiles and ranges
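A minimal sketch of such a summary using only the Python standard library. The function name and the example ratings are invented for illustration. `median_low` is used so that the median is always a score that was actually given; the quartiles from `statistics.quantiles` may be interpolated between two scores and should be read with that in mind.

```python
from statistics import median_low, quantiles


def summarise_ordinal(scores):
    """Median, quartiles and range of a list of ordinal ratings."""
    ordered = sorted(scores)
    # The "inclusive" method treats the data as covering the full range observed
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median_low(ordered),
        "q3": q3,
        "max": ordered[-1],
    }


# Hypothetical 0-3 ratings for nine subjects
summary = summarise_ordinal([3, 0, 1, 2, 3, 1, 3, 2, 3])
print(summary)
```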

To summarize

Choose an appropriate rating scale

Consider what type of data you need: what questions do you want to answer?

How does your material look?

Who will do the rating? When?

Previous studies that you want to compare with?

Choose appropriate statistical tests

Do your own reliability testing
