nick cercone vlado keselj calvin thomas computer science dalhousie university

1

Analysis of Spontaneous Speech in Dementia of Alzheimer Type: Experiments with Morphological and Lexical Analysis

Nick Cercone

Vlado Keselj

Calvin Thomas

Computer Science

Dalhousie University

PUL Workshop, Dalhousie University, Halifax, 23 Apr 2004

Kenneth Rockwood

Medicine, Dalhousie University

Elissa Asp

English Department

Saint Mary’s University

2

Overview

• Introduction
• Related work: Bucks et al., authorship attribution
• CNG discrimination: patient (Pt) vs. other speakers
• Rating dementia levels:
  • use of attribute sets: MA-A, MA-B
  • CNG and Ordinal CNG
• Conclusion

3

Introduction
• Effects of Alzheimer’s disease (AD):
  • reduced communicative ability
  • deterioration of linguistic performance
• Can we detect it? Current methods rely on structured interviews:
  • confrontation naming
  • single-word production
  • word generation given a context
  • word generation given a first letter
  • picture description

4

Analysis of spontaneous speech
• Drawbacks of structured interviews:
  • sometimes insensitive to early signs of dementia observed by family members
  • low scores are not reliable unless difficulty is also observed in natural conversation
  • they break “natural speech” into components that are subjective, i.e., designed by a researcher
• Alternative solution: objective, automatic analysis of spontaneous (i.e., natural) speech

5

Speech characteristics in dementia of Alzheimer type (DAT)
• frequent use of function words (closed class)
• less rich vocabulary
• difficulty constructing longer coherent phrases
• more difficulties at the lexical and morphological levels than at the phonetic and syntactic levels

6

Related work: Bucks et al. (BSCW)
Bucks, Singh, Cuerden, Wilcock 2000, 2001: Analysis of spontaneous conversational speech in dementia of Alzheimer type (DAT)
• use eight linguistic measures to analyze transcribed spontaneous speech:
  1) noun rate
  2) pronoun rate
  3) verb rate
  4) adjective rate
  5) clause-like semantic unit rate (CSU)
  6) Brunet’s index (W)
  7) type-token ratio (TTR)
  8) Honoré’s statistic (R)
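Three of the eight measures are standard lexical-richness formulas over N tokens, V distinct types, and V1 types that occur exactly once. A minimal sketch, assuming the usual definitions TTR = V/N, Brunet’s W = N^(V^-0.165) and Honoré’s R = 100 ln N / (1 - V1/V) with a natural logarithm; the POS rates are assumed to be shares of all tokens (not, say, per 100 words), and the function names and tag strings are hypothetical, not taken from BSCW.

```python
import math
from collections import Counter


def lexical_measures(tokens):
    """Type-token ratio, Brunet's index W and Honore's statistic R.

    N = number of tokens, V = number of distinct word types,
    V1 = number of types that occur exactly once (hapax legomena).
    """
    n = len(tokens)
    counts = Counter(tokens)
    v = len(counts)
    v1 = sum(1 for c in counts.values() if c == 1)
    ttr = v / n                      # type-token ratio V / N
    brunet_w = n ** (v ** -0.165)    # Brunet's index W = N ** (V ** -0.165)
    # Honore's R = 100 * ln(N) / (1 - V1 / V); treated as infinite when V1 == V.
    honore_r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else math.inf
    return {"TTR": ttr, "W": brunet_w, "R": honore_r}


def pos_rates(tags, wanted=("NOUN", "PRON", "VERB", "ADJ")):
    """Share of tokens carrying each wanted POS tag (tag names are hypothetical)."""
    counts = Counter(tags)
    return {tag: counts[tag] / len(tags) for tag in wanted}


if __name__ == "__main__":
    print(lexical_measures("the cat saw the dog".split()))
    print(pos_rates(["DET", "NOUN", "VERB", "DET", "NOUN"]))
```

The POS tags themselves would come from a tagger; the slides on MA-A mention Hepple and Connexor for that step.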

7

Bucks et al.: Experiment design
• experiment with 24 participants:
  • 8 patients and 16 healthy individuals
• discriminating between demented and healthy individuals:
  • 100% on training data
  • 87.5% with cross-validation

8

Related work: Automated authorship attribution
• Problem of identifying the author of an anonymous text
• One of the text categorization problems:
  • Spam detection
  • Language and encoding identification
  • Authorship attribution and plagiarism detection
  • Text genre classification
  • Topic detection
  • Sentiment classification

9

Related work (authorship attribution)

1. Style analysis using style markers (features), relying on non-trivial NL analysis (Stamatatos et al. 2000-02)

2. Language modeling (Peng et al. 2003, EACL’03; Khmelev and Teahan 2003, SIGIR’03)

3. N-gram-based text categorization (Cavnar and Trenkle 1994)

10

Shortcomings of style analysis

• difficult to automatically extract some features
• feature selection is critical
• language dependent
• task dependent, i.e., does not generalize well to other types of classification

11

Character N-gram-based Methods
Text can be considered as a concatenated sequence of characters instead of words.

Advantages

1. small vocabulary

2. language independence

3. no word segmentation problems in many Asian languages such as Chinese and Thai

12

How do character n-grams work?
Marley was dead: to begin with. There is no doubt whatever about that. …
(from A Christmas Carol by Charles Dickens)

n = 3: Mar arl rle ley ey_ y_w _wa was …

sort by frequency:
_th 0.015, ___ 0.013, the 0.013, he_ 0.011, and 0.007, _an 0.007, nd_ 0.007, ed_ 0.006, …

profile length L = 5 (keep the 5 most frequent n-grams)
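The profile construction can be sketched as follows. This is a minimal sketch, not the authors’ tool: it assumes every non-letter character (space, punctuation, newline) is rewritten as "_" and that case is preserved, which reproduces n-grams such as "Mar", "y_w" and "___"; the function name `char_ngram_profile` is hypothetical, and the parameter `L` follows the slide’s notation.

```python
from collections import Counter


def char_ngram_profile(text, n=3, L=5):
    """Return the L most frequent character n-grams of `text`, mapped to
    their relative frequencies (count / total number of n-grams).

    Assumption: every non-letter character is rewritten as '_'.
    """
    normalized = "".join(ch if ch.isalpha() else "_" for ch in text)
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    # Sort by frequency (descending); break ties alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {gram: count / total for gram, count in ranked[:L]}


if __name__ == "__main__":
    opening = "Marley was dead: to begin with. There is no doubt whatever about that."
    for gram, freq in char_ngram_profile(opening, n=3, L=5).items():
        print(f"{gram} {freq:.3f}")
```

On a single sentence the frequencies are of course far from the whole-book values shown above; the profile only stabilizes over longer texts.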

13

How do we compare two profiles?

Dickens, Christmas Carol: _th 0.015, ___ 0.013, the 0.013, he_ 0.011, and 0.007

Dickens, A Tale of Two Cities: _th 0.016, the 0.014, he_ 0.012, and 0.007, nd_ 0.007

Carroll, Alice’s Adventures in Wonderland: _th 0.017, ___ 0.017, the 0.014, he_ 0.014, ing 0.007

Which profiles are closer to each other?

14

N-gram distribution

[Figure: relative frequencies of 6-grams (from Dickens: Christmas Carol), plotted by rank 1 to 34; y-axis from 0 to 5.0E-03]

15

CNG profile similarity measure

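• a profile = the set of the L most frequent n-grams, each with its relative frequency

• profile dissimilarity measure: sum over all n-grams n in profile_1 ∪ profile_2, where f_1(n) and f_2(n) are the relative frequencies of n in the two profiles (0 if n is absent from a profile); each difference is weighted by the average frequency of n:

$$\sum_{n \in \mathrm{profile}_1 \cup \mathrm{profile}_2} \left( \frac{f_1(n) - f_2(n)}{\frac{f_1(n) + f_2(n)}{2}} \right)^2 = \sum_{n \in \mathrm{profile}_1 \cup \mathrm{profile}_2} \left( \frac{2\,\big(f_1(n) - f_2(n)\big)}{f_1(n) + f_2(n)} \right)^2$$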
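The dissimilarity measure (squared frequency difference divided by the average frequency, summed over every n-gram in either profile) can be sketched as below, assuming profiles are dicts from n-gram to relative frequency and that a missing n-gram has frequency 0. The function name `cng_dissimilarity` is hypothetical; the three example profiles are the five-entry trigram profiles from the profile-comparison slide.

```python
def cng_dissimilarity(profile1, profile2):
    """CNG dissimilarity between two {ngram: relative frequency} profiles.

    Sums (2 * (f1 - f2) / (f1 + f2)) ** 2 over the union of both profiles;
    an n-gram missing from a profile counts as frequency 0.
    """
    total = 0.0
    for gram in set(profile1) | set(profile2):
        f1 = profile1.get(gram, 0.0)
        f2 = profile2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


carol = {"_th": 0.015, "___": 0.013, "the": 0.013, "he_": 0.011, "and": 0.007}
tale = {"_th": 0.016, "the": 0.014, "he_": 0.012, "and": 0.007, "nd_": 0.007}
alice = {"_th": 0.017, "___": 0.017, "the": 0.014, "he_": 0.014, "ing": 0.007}

if __name__ == "__main__":
    print(round(cng_dissimilarity(carol, tale), 3))
    print(round(cng_dissimilarity(carol, alice), 3))
```

With these profiles Christmas Carol comes out closer to A Tale of Two Cities (about 8.02) than to Alice (about 8.15). Every n-gram present in only one profile contributes exactly 4, so with very short profiles the shared n-grams largely decide the comparison.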

16

Authorship Attribution Evaluation

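[Figure: authorship attribution accuracy (%, 0 to 100) on the English, Greek A, Greek B, Greek B+ and Chinese data sets, comparing style analysis (Style), language modeling (Lang. M) and CNG]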

17

ACADIE Data Set
• 189 GAS (Goal Attainment Scaling) interviews
• 95 patients (2 interviews per patient, except for 1 patient)
• 6 sites; 17 MB of data (3.2 million words)
• interview participants:
  • FR – field researcher
  • Pt – patient
  • Cg – caregiver
  • other people

18

Experiment set-up
• preprocessing
• patients divided into two groups:
  • training group: 85 patients (169 interviews)
  • testing group: 10 patients (20 interviews)
• patient speech in the training group is used to build the Alzheimer profile
• non-patient speech in the training group is used to build the non-Alzheimer profile
• two experiments:
  • classification
  • improvement detection

19

Classification

• from each test interview, the patient speech and the non-patient speech are extracted

• this produces 40 speech extracts (20 interviews × 2)

• each speech extract is labelled by the classifier as Alzheimer or non-Alzheimer

• accuracy is reported
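The slides do not spell out the decision rule, so the following is a minimal sketch assuming the usual CNG nearest-profile rule: an extract gets the label of whichever profile is less dissimilar, using the sum over both profiles’ n-grams of (2(f1 - f2)/(f1 + f2))², with missing n-grams at frequency 0. All function names are hypothetical, and ties are assumed to go to non-Alzheimer.

```python
def cng_distance(p1, p2):
    """CNG dissimilarity: sum over n-grams in either profile of
    (2 * (f1 - f2) / (f1 + f2)) ** 2, with missing n-grams counted as 0."""
    total = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def classify(extract, alz_profile, non_alz_profile):
    """Assumed nearest-profile rule; ties go to non-Alzheimer."""
    if cng_distance(extract, alz_profile) < cng_distance(extract, non_alz_profile):
        return "Alzheimer"
    return "non-Alzheimer"


def accuracy(labelled_extracts, alz_profile, non_alz_profile):
    """labelled_extracts: list of (profile, true_label) pairs, e.g. the
    patient and non-patient extracts from each test interview."""
    correct = sum(
        classify(profile, alz_profile, non_alz_profile) == label
        for profile, label in labelled_extracts
    )
    return correct / len(labelled_extracts)
```

With 40 extracts, each misclassified extract costs 2.5 percentage points of accuracy.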

20

Experiment 1.1

• training and testing parts (90:10)
• use all speakers to generate profiles
• use both interviews

21

ACADIE: Classification accuracy

Rows: profile length L; columns: n-gram size n.

L \ n    1    2    3    4    5    6    7    8    9   10
   20  88%  85%  83%  88%  98%  93%  95%  80%  85%  85%
   50  73%  80%  78%  95%  95%  85%  93%  93%  95% 100%
  100  73%  78%  95%  95%  98%  98% 100%  98%  98% 100%
  200  73%  93%  98% 100%  98% 100% 100% 100% 100% 100%
  500  73%  80%  95% 100% 100%  98%  98% 100% 100% 100%
 1000  73%  95% 100% 100% 100% 100% 100% 100% 100% 100%
 1500  73%  98%  98% 100% 100% 100% 100% 100% 100% 100%
 2000  73%  98%  93% 100% 100% 100% 100% 100% 100% 100%
 3000  73%  98%  93% 100% 100% 100% 100% 100% 100% 100%
 4000  73%  98%  98% 100% 100% 100% 100% 100% 100% 100%
 5000  73%  98%  98%  98% 100% 100% 100% 100% 100% 100%

22

Improvement detection

S_a = similarity with the Alzheimer profile

S_b = similarity with the non-Alzheimer profile

$$S = \frac{S_a}{S_a + S_b}$$

S is the normalized similarity with the Alzheimer profile (threshold 0.5).

improvement is detected by observing an increase in S value between the first and second interview
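A sketch of the improvement check, with hypothetical function names. It assumes, as the Ordinal CNG slides suggest (where 0 corresponds to severe and 1 to normal), that the S values are CNG dissimilarity scores, so a larger S means the speech has moved away from the Alzheimer profile. Under that reading, S below the 0.5 threshold marks speech that is closer to the Alzheimer profile.

```python
def normalized_score(s_a, s_b):
    """S = S_a / (S_a + S_b); S_a, S_b are the scores against the Alzheimer
    and non-Alzheimer profiles (assumed here to be dissimilarities)."""
    return s_a / (s_a + s_b)


def looks_alzheimer(s_a, s_b, threshold=0.5):
    """Under the dissimilarity assumption, S < 0.5 means closer to the Alzheimer profile."""
    return normalized_score(s_a, s_b) < threshold


def improvement_detected(first, second):
    """first, second: (S_a, S_b) pairs for a patient's first and second
    interview; improvement is flagged when S increases."""
    return normalized_score(*second) > normalized_score(*first)


def detection_rate(patients):
    """Fraction of patients, given as (first, second) pairs, flagged as improved."""
    flagged = sum(improvement_detected(first, second) for first, second in patients)
    return flagged / len(patients)
```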

23

ACADIE: Detected improvement

Rows: profile length L; columns: n-gram size n.

L \ n    1    2    3    4    5    6    7    8    9   10
   20  50%  60%  70%  80%  70%  50%  50%  40%  60%  50%
   50  50%  70%  60%  30%  60%  30%  30%  60%  50%  70%
  100  40%  60%  40%  40%  40%  40%  80%  60%  70%  60%
  200  40%  30%  30%  40%  50%  70%  40%  70%  50%  60%
  500  40%  80%  60%  80%  60%  50%  40%  60%  80%  70%
 1000  40%  50%  90%  60%  70%  70%  70%  90%  60%  60%
 1500  40%  70%  80%  70%  80%  60%  80%  80%  60%  50%
 2000  40%  60%  90%  70%  70%  70%  70%  70%  60%  60%
 3000  40%  60%  70%  70%  70%  60%  60%  70%  60%  70%
 4000  40%  60%  70%  90%  80%  80%  70%  60%  70%  70%
 5000  40%  60%  70%  80%  80%  70%  60%  70%  70%  70%

24

Experiment 1.2

use only the first interviews to create the Alzheimer and non-Alzheimer profiles

25

Exp. 1.2: Classification accuracy

Rows: profile length L; columns: n-gram size n.

L \ n    1    2    3    4    5    6    7    8    9   10
   20  85%  85%  83%  88%  93%  90%  95%  80%  80%  83%
   50  70%  90%  83%  98%  95%  85%  93%  95%  95%  90%
  100  73%  98%  98%  98%  90%  98%  98%  98%  95%  98%
  200  73%  88%  98% 100% 100%  98% 100%  95% 100% 100%
  500  73%  83%  98% 100%  95%  98%  95% 100%  98% 100%
 1000  73%  95% 100% 100% 100% 100% 100% 100% 100% 100%
 1500  73%  95%  95% 100% 100% 100% 100% 100% 100% 100%
 2000  73%  95%  93% 100% 100% 100% 100% 100% 100% 100%
 3000  73%  95%  95% 100% 100% 100% 100% 100% 100% 100%
 4000  73%  95%  98% 100% 100% 100% 100% 100% 100% 100%
 5000  73%  95% 100% 100% 100% 100% 100% 100% 100% 100%

Improvement detection: 0.6-0.9

26

Experiment 1.3

• use only the first interviews
• use only speech produced by patients, caregivers, and other participants (not field researchers)

27

Exp. 1.3: Classification accuracy

Rows: profile length L; columns: n-gram size n.

L \ n    1    2    3    4    5    6    7    8    9   10
   20  75%  90%  85%  80%  65%  75%  80%  70%  75%  70%
   50  73%  88%  68%  80%  90%  75%  75%  75%  80%  83%
  100  73%  85%  88%  85%  88%  80%  83%  88%  88%  88%
  200  73%  83%  90%  90%  95%  88%  93%  88%  98%  93%
  500  73%  65%  95%  95%  95%  95%  98%  90%  93%  95%
 1000  73%  83%  93%  93%  98%  93%  98%  98%  98%  95%
 1500  73%  78%  80%  95%  95% 100%  98%  98%  98%  95%
 2000  73%  80%  75%  95% 100%  98%  98%  98%  98%  95%
 3000  73%  83%  83%  88%  95%  98%  95%  98%  95%  93%
 4000  73%  83%  90%  95%  95%  95%  98%  98%  95%  95%
 5000  73%  83%  93%  98%  95%  98%  98%  98%  98%  93%

Improvement detection: 0.6-0.8

28

Some experiment observations
• the Alzheimer n-gram profile captures many indefinite and negated terms (e.g., sometimes, don’t know, can not, …)
• the profiles capture reduced lexical richness

[Figure: n-gram frequency versus n-gram rank for the Alzheimer and non-Alzheimer profiles]

29

Second set of experiments: rating dementia levels
• implementation of the BSCW method (Bucks et al.), with analysis and extension
• comparison with CNG
• application of a wider set of machine learning algorithms

30

MMSE – Mini-Mental State Exam
• a standard test for identifying cognitive impairment in a clinical setting
• 17 questions, 5-10 minutes
• introduced in 1975 by Folstein et al.
• scores range from 0 to 30
• a variety of cut points have been suggested over the years: 17.5, 21.5, 23.5, 25.5

31

MMSE Score Gradation

We use the following gradation (MMSE boundaries 0, 14.5, 20.5, 24.5, 30):
• four classes: severe (0 to 14.5), moderate (14.5 to 20.5), mild (20.5 to 24.5), normal (24.5 to 30)
• two classes: low, high
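The four-class gradation amounts to a simple lookup on the boundaries 14.5, 20.5 and 24.5. A minimal sketch, assuming integer MMSE scores (so no score falls exactly on a boundary); the function name is hypothetical, and the two-class low/high split is left out because its cut point is not given here.

```python
def mmse_class(score):
    """Map an MMSE score (0 to 30) to the four-class gradation:
    severe < 14.5 <= moderate < 20.5 <= mild < 24.5 <= normal."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE scores range from 0 to 30")
    if score < 14.5:
        return "severe"
    if score < 20.5:
        return "moderate"
    if score < 24.5:
        return "mild"
    return "normal"
```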

32

MMSE score distribution in the data set

[Figure: distribution of MMSE scores in the data set over the severe, moderate, mild and normal classes]

34

Part-of-speech tagging, MA-A
• following the BSCW method
• applied two taggers: Hepple (from GATE) and Connexor
  • Hepple is based on Brill’s tagger
  • Connexor performed better
• set of attributes MA-A: attributes similar to BSCW, excluding the CSU rate, because it:
  1. requires manual annotation
  2. was reported by BSCW to have a non-significant impact

35

Morphological Attribute Set: MA-B
• start with all POS attributes
• regression-based attribute selection: 7 POS attributes selected (conjunctions included)
• add TTR and Honoré statistics; the Brunet statistic was shown to be non-significant
• use several machine learning algorithms with cross-validation, using the software tool WEKA

37

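Ordinal CNG Method

• use the two extreme groups to build profiles:
  • severe dementia level: profile_severe
  • normal level: profile_normal

• build a profile of the test speech and compute its CNG similarity with both profiles: S_severe, S_normal

• classify according to

$$\frac{S_{\mathrm{severe}}}{S_{\mathrm{severe}} + S_{\mathrm{normal}}}$$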
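The ordinal score can be sketched as follows, with hypothetical function names. Because the scale runs from 0 (severe) to 1 (normal), the sketch assumes the S values are CNG dissimilarities: a test profile far from the severe profile gets a large S_severe and a score near 1. The dissimilarity is the sum over both profiles’ n-grams of (2(f1 - f2)/(f1 + f2))², with missing n-grams at frequency 0.

```python
def cng_distance(p1, p2):
    """CNG dissimilarity over the union of both {ngram: frequency} profiles."""
    total = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def ordinal_cng_score(test_profile, severe_profile, normal_profile):
    """S_severe / (S_severe + S_normal): 0 = closest to severe, 1 = closest to normal."""
    s_severe = cng_distance(test_profile, severe_profile)
    s_normal = cng_distance(test_profile, normal_profile)
    if s_severe + s_normal == 0:
        return 0.5  # only possible when the two reference profiles are identical
    return s_severe / (s_severe + s_normal)


def two_class_label(score, threshold=0.5):
    """Severe/normal decision on the ordinal score."""
    return "normal" if score >= threshold else "severe"
```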

38

Ordinal CNG: Thresholds

• range of values: [0, 1]; 0 corresponds to severe, 1 to normal

• what are good thresholds?

• interesting observation: the optimal threshold is very close to the “natural threshold” of 0.5 (it varies from 0.5 to 0.512)
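One way to search for such an optimal threshold on training scores is sketched below. This is an assumption, not necessarily the authors’ procedure: candidates are the midpoints between consecutive distinct scores plus the two extremes, the criterion is two-class training accuracy (predict normal when score ≥ threshold), and ties are broken toward 0.5. The function name is hypothetical.

```python
import math


def best_threshold(scores, labels):
    """Cut point on Ordinal CNG scores maximizing two-class training accuracy.

    labels are "severe" or "normal"; a score >= threshold is predicted normal.
    Ties in accuracy are broken toward the natural threshold 0.5.
    """
    points = sorted(set(scores))
    candidates = [0.0] + [(a + b) / 2 for a, b in zip(points, points[1:])] + [math.inf]

    def accuracy(t):
        hits = sum((s >= t) == (label == "normal") for s, label in zip(scores, labels))
        return hits / len(scores)

    return max(candidates, key=lambda t: (accuracy(t), -abs(t - 0.5)))
```

Preferring 0.5 among equally accurate cut points mirrors the observation that the optimum stays close to the natural threshold.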

40

Conclusions
• extensive experiments on morphological and lexical analysis of spontaneous speech for detecting dementia of Alzheimer type
• methods: CNG and Ordinal CNG; an extension of the BSCW method, using POS tags as suggested by BSCW
• positive results in classification and in detecting dementia level:
  • 100%: discrimination accuracy (Pt vs. other speakers)
  • 93%: severe vs. normal
  • 70%: two-class accuracy
  • 46%: four-class accuracy

41

Future work

• improvement detection
• use of the word CNG method
• stop-word frequency-based classifier
• syntactic analysis
• semantic analysis
