Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls Kai Hong, Christian G. Kohler, Mary E. March, Amber A. Parker, Ani Nenkova University of Pennsylvania

Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls

Kai Hong, Christian G. Kohler, Mary E. March,

Amber A. Parker, Ani Nenkova

University of Pennsylvania

Our Task

Identify significant differences in lexical use between narratives by Patients and Controls

Perform automatic classification

Identify a small subset of highly distinguishing features

Examine how prediction accuracy varies with emotion type

Observations on lexical use

# occurrences in narratives

Word            Patient   Control
dog/dogs        28        1
money           41        4
sorry           0         7
relationship    0         9

Self-reference: "I" (Patients vs Controls)
- Total occurrences: 1291 vs 626
- Ratio after normalization by # words: 5.5% vs 4.3%

Dataset

201 stories from 39 subjects
- Patients: 120 stories, 23 patients
- Controls: 81 stories, 16 controls

Five emotions: Anger, Sad, Happy, Disgust, Fear

Subjects talk about past experiences in their lives (mildly, moderately, extremely)

30–90 seconds to finish each story

Length of Stories

• No big difference between Patients and Controls
• Some difference between emotions

            Average # words
Patients    192
Controls    181

P-value: 0.4254

Workflow

Narratives (Training) -> Lexical Feature Extraction -> Features

Features

Basic Features
- Few, easy-to-compute general features

Lexical Features and Repetitions
- Sparse and many

LIWC, Diction
- Dictionary-based, more general

Two-tailed t-test for significant features
- 169 out of 6057 significant

Basic Features

• Patients have more: sentences/document, words/document
• Controls have more: letters/word, words/sentence, tokens/vocabulary

Control > SCH        P-value
letters/word         0.003
words/sentence       0.001
tokens/vocabulary    0.153

SCH > Control        P-value
sentences/doc        0.038
words/doc            0.460
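
The five basic features listed on this slide can be computed from raw text roughly as follows. This is a minimal sketch: the function name `basic_features`, the regular-expression tokenizer, and splitting sentences on `.`, `!`, and `?` are all assumptions, since the slides do not say how the narratives were tokenized.

```python
import re

def basic_features(text):
    """Compute the five basic features for one narrative (illustrative sketch)."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    vocabulary = {w.lower() for w in words}
    n_words = len(words)
    n_letters = sum(ch.isalpha() for w in words for ch in w)
    return {
        "sentences/doc": len(sentences),
        "words/doc": n_words,
        "letters/word": n_letters / n_words if n_words else 0.0,
        "words/sentence": n_words / len(sentences) if sentences else 0.0,
        "tokens/vocabulary": n_words / len(vocabulary) if vocabulary else 0.0,
    }
```

Calling `basic_features("I lost my dog. I was very, very sad.")` counts two sentences and nine words over a vocabulary of seven distinct words.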

Repetitions

• Example: One day um , my um , my sister had brought my , her niece , her daughter , my sister had brought her daughter uh to watch my dog right .

• Repetition: the frequency with which a word reappears within a fixed window (size 5).

• Computed for both words and punctuation marks
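
One way to read the repetition definition above is sketched here. The exact counting rule is not given in the slides, so this assumes a token counts as repeated when the same token appears among the preceding `window - 1` tokens, and that the rate is the share of repeated tokens, computed separately for words and for punctuation. The name `repetition_rates` is hypothetical.

```python
import string

def repetition_rates(tokens, window=5):
    """Return (word_rate, punct_rate) for one tokenized narrative.

    Assumption: a token is a repetition if an identical token occurs
    among the previous `window - 1` tokens.
    """
    lowered = [t.lower() for t in tokens]
    word_total = word_repeated = 0
    punct_total = punct_repeated = 0
    for i, tok in enumerate(lowered):
        repeated = tok in lowered[max(0, i - window + 1):i]
        if all(ch in string.punctuation for ch in tok):
            punct_total += 1
            punct_repeated += repeated
        else:
            word_total += 1
            word_repeated += repeated
    word_rate = word_repeated / word_total if word_total else 0.0
    punct_rate = punct_repeated / punct_total if punct_total else 0.0
    return word_rate, punct_rate
```

The input is expected to be pre-tokenized with punctuation as separate tokens, as in the transcribed example on this slide.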

Repetitions: Significance?

[Figure: two bar charts comparing SCH and NC, one for Rep-Word and one for Rep-Punc]

Significant:
- Rep-word (P-value < 0.001)
- Rep-punctuation (P-value < 0.001)

Lexical Features

• Words
- Frequency in narratives

• Repetition of specific words
- Whether a given word is repeated (0/1)

• Example: She was , she was a huge , she was very , very wonderful.
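
The two kinds of lexical features on this slide could be extracted as in the sketch below. It assumes relative frequency (count divided by narrative length) for the word features and reuses a window of 5 for the binary repetition flags; neither choice is confirmed by the slides, and the `freq:` / `rep:` key names are purely illustrative.

```python
from collections import Counter

def lexical_features(tokens, window=5):
    """Sparse per-word features for one narrative (illustrative sketch)."""
    lowered = [t.lower() for t in tokens]
    total = len(lowered)
    features = {}
    # Word features: relative frequency of each token (assumed normalization).
    for word, count in Counter(lowered).items():
        features["freq:" + word] = count / total
    # Repetition features: 1 if the token recurs within the window.
    for i, tok in enumerate(lowered):
        if tok in lowered[max(0, i - window + 1):i]:
            features["rep:" + tok] = 1
    return features
```

Tokens that are never repeated simply have no `rep:` entry, which keeps the representation sparse.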

More common in Schizophrenia

P-value         Features
< 1e-3          I, couldn’t, extremely, mildly, money
0.001 – 0.01    extreme, feeling, moderately, my, took, way, ?
0.01 – 0.05     ain’t, alone, at, aw, before, behind, became, care, chance, confused

• First-person pronouns: I, my
• money
• Feelings
• Some adverbs: mildly, moderately, extremely
• ?

More common in Schizophrenia

• Focus on family (grandfather, sister, son)
• dog/dogs

P-value         Features
0.01 – 0.05     December, dog, dogs, forty, friends,
                god, got, grandfather, guess, guy,
                hand, hanging, hearing, hundred, increased,
                looking, loved, mental, met, mild,
                moderate, myself, outside, paper, passed,
                piece, remember, sister, son, stand,
                stand, stop, story, take, taken,
                throwing, trouble, use, wake, wanna

More common in Control

P-value         Features
< 1e-3          comma
0.001 – 0.01    really, sorry, very
0.01 – 0.05     able, actually, are, basically, be,
                being, get’s, in, late, not,
                relationship, result, she’s, sleep, tell,
                their, there’s, weeks

• Third-person plural: their
• sorry
• Some adjectives and adverbs: actually, basically, really, very

Significant Rep+Lexical Features

• Patients: more repetition of and, um, I, a, was.
• Controls: more repetition of comma, very.

Schizophrenia   Status   P-value
Rep-and         SCH      < 0.001
Rep-um          SCH      0.008
Rep-I           SCH      < 0.001
Rep-a           SCH      0.011
Rep-was         SCH      0.018

Control         Status   P-value
Rep-,           NC       0.001
Rep-very        NC       0.007

LIWC

Method:
- Degree of usage of different categories of words
- Dictionary-based approach
- 69 dictionaries

Example: "cried" belongs to sadness, negative emotion, overall affect, verb, past-tense verb

Previous use:
- Writing styles, physical and emotional pain (Tausczik and Pennebaker, 2010)
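
The dictionary-based idea can be illustrated with a toy lookup. The real LIWC dictionaries are not reproduced here: `TOY_CATEGORIES` is a hypothetical stand-in that holds only the example words quoted in these slides, and `category_rates` is an illustrative name, not part of LIWC.

```python
# Hypothetical mini-dictionaries built only from example words in the slides.
TOY_CATEGORIES = {
    "I": {"i", "me", "mine"},
    "insight": {"think", "know", "consider"},
    "adverb": {"very", "really", "quickly"},
}

def category_rates(tokens, categories=TOY_CATEGORIES):
    """Share of tokens in a narrative that fall into each category."""
    lowered = [t.lower() for t in tokens]
    total = len(lowered)
    rates = {}
    for name, words in categories.items():
        hits = sum(1 for t in lowered if t in words)
        rates[name] = hits / total if total else 0.0
    return rates
```

A word may belong to several categories at once, as the "cried" example shows, so the rates are computed per category and need not sum to one.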

LIWC – Significant Features

Category            # words   Examples                   Status   P-value
I                   12        I, me, mine                SCH      < 0.001
personal pronoun    70        I, them, itself, you       SCH      0.029
insight             195       think, know, consider      SCH      0.026
adverb              69        very, really, quickly      NC       0.001
exclusive words     17        but, without, exclusive    NC       0.005
inhibition          111       block, constrain, stop     NC       0.019

Status: more common for Patients (SCH) or Controls (NC)

Diction

Method:
- Also a dictionary-based approach
- 28 small categories, 5 master variables

Master variables (major categories):
- Realism, Optimism, Certainty, Activity, Commonality

Example: Certainty = [Tenacity + Leveling + Collectives + Insistence] - [Numerical Terms + Ambivalence + Self Reference + Variety]

Diction – Significant Features

Category            Status   P-value
self reference      SCH      < 0.001
word-mean-length    NC       < 0.001
realism             NC       < 0.001
diversity           NC       0.005
familiarity         NC       0.019
cognitive terms     SCH      0.014
cooperation         NC       0.027
past                SCH      0.036
insistence          SCH      0.046
satisfaction        SCH      0.047

Workflow

Narratives (Training) -> Lexical Feature Extraction -> Features -> Feature Selection -> Selected Features

Feature Selection

Two-tailed t-test for real-valued features
- Thresholds: 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.15

Signal-to-noise
- Using Challenge Learning Object Package (CLOP)
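
Signal-to-noise ranking can be sketched in plain Python as below. This does not reproduce CLOP itself: it assumes the commonly used criterion |mean_SCH - mean_NC| / (std_SCH + std_NC) with sample standard deviations, and the names `s2n_scores` and `top_k` are hypothetical.

```python
from statistics import mean, stdev

def s2n_scores(rows, labels):
    """Signal-to-noise score per feature column.

    rows: list of equal-length feature vectors, one per story.
    labels: 1 for patient (SCH), 0 for control (NC).
    Assumes at least two stories per class (needed for stdev).
    """
    sch = [r for r, y in zip(rows, labels) if y == 1]
    nc = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r in sch]
        b = [r[j] for r in nc]
        spread = stdev(a) + stdev(b)
        # A constant feature carries no usable signal; score it as 0.
        scores.append(abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0)
    return scores

def top_k(rows, labels, k):
    """Indices of the k highest-scoring features, best first."""
    scores = s2n_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

In a leave-one-subject-out setting the scores would be computed on the training stories only, so that the held-out subject never influences which features are kept.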

Experimental Setup

Leave-one-subject-out (39 times)

Subject Status = Story Status

Experimental Setup

Voting: stories -> subjects

Evaluation metrics: Accuracy and F-measure
- by stories
- by subjects
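
The leave-one-subject-out protocol and the story-to-subject vote might look like the following sketch. The slides do not say how ties between story-level predictions are broken, so the `tie_break` default is an assumption, and both function names are hypothetical.

```python
from collections import Counter

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    subject_ids[i] is the subject who told story i, so all stories of the
    held-out subject are removed from training together.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def vote(story_predictions, tie_break="SCH"):
    """Majority vote over one subject's story-level predictions."""
    counts = Counter(story_predictions)
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    # Assumption: ties go to `tie_break`; the slides leave this open.
    return winners[0] if len(winners) == 1 else tie_break
```

With 39 subjects the generator yields 39 folds, and each fold's story predictions are passed to `vote` to obtain the subject-level label.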

Workflow

[Workflow diagram. Labels: Narratives (Training), Lexical Feature Extraction, Features, Feature Selection, Selected Features, Narratives (Testing), SVM-light, Voting, Patients / Control]

Performance by T-test

Much higher than random

P-value   By Story   By Subject   # Features
0.05      62.7       64.1         169

Performance by T-test

More noise when relaxing the threshold

P-value   By Story   By Subject   # Features
0.15      59.0       58.9         450
0.1       61.7       64.1         341
0.05      62.7       64.1         169

Performance by T-test

- Better performance when tightening the threshold
- Best performance when threshold = 0.001

P-value   By Story   By Subject   # Features
0.15      59.0       58.9         450
0.1       61.7       64.1         341
0.05      62.7       64.1         169
0.01      57.7       65.4         44
0.005     64.2       71.6         32
0.001     65.7       75.6         18
0.0005    61.7       66.7         14

Performance changing with feature size

- Best performance achieved when # features = 25
- Signal-to-noise selection

Best Performance by Signal-to-noise

Achieved when # Features = 25
- Accuracy by story: 64.7%, accuracy by subject: 76.9%
- Patient recall: 91.3%

Schizophrenia            Control                  General
P(%)   R(%)   F(%)       P(%)   R(%)   F(%)       Accuracy   Macro-F
75.0   91.3   82.4       81.8   56.3   66.7       76.9       74.6
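
The relation between these metrics can be checked with a short calculation. The slides do not give raw counts; the usage note below assumes 21 of 23 patients and 9 of 16 controls classified correctly, which are the integer counts consistent with the reported percentages. The function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def summarize(sch_correct, sch_total, nc_correct, nc_total):
    """Per-class (P, R, F), accuracy and macro-F for a two-class task."""
    sch_missed = sch_total - sch_correct  # patients predicted as controls
    nc_missed = nc_total - nc_correct     # controls predicted as patients
    sch = prf(sch_correct, nc_missed, sch_missed)
    nc = prf(nc_correct, sch_missed, nc_missed)
    accuracy = (sch_correct + nc_correct) / (sch_total + nc_total)
    macro_f = (sch[2] + nc[2]) / 2
    return sch, nc, accuracy, macro_f
```

Under that assumption, `summarize(21, 23, 9, 16)` agrees with the subject-level precision, recall, F and accuracy in the table to within rounding. The unrounded macro-F comes to about 74.5, while averaging the two rounded F values gives the 74.6 shown on the slide.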

Status Prediction by Emotion

Accuracy (%)   Signal-to-noise (25)   T-test (0.05)   T-test (0.001)
Happy          66.7                   59.0            71.8
Disgust        63.4                   61.0            51.2
Anger          61.0                   70.7            70.7
Fear           60.0                   55.0            67.5
Sad            72.5                   60.0            67.5
Story          64.7                   62.9            65.7
Patient        76.9                   64.1            74.4

- Same training data
- Predict on different emotions
- Different approaches and settings

Number of features on different thresholds

[Figure: # features (y-axis, 0 to 150) selected by the t-test at p-value thresholds 0.1, 0.05, 0.01 (x-axis), one series per emotion: Anger, Sad, Happy, Disgust, Fear]

More features from one emotion -> more distinguishing

Emotion related analysis

Emotion    Schizophrenia    Control
Happy      ambivalent       do
Disgust    dogs, health     communication
Anger      argued           praise
Fear       money            accident
Sad        satisfaction     working

Higher value in each emotion

Conclusion

Analyze distinguishing power of different features
- Basic features
- Lexical features, repetitions
- LIWC
- Diction

25 features: top performance (65%, 77%)
- p-value feature selection
- signal-to-noise feature selection

Different emotions have different distinguishing power
- anger, sad > happy > fear, disgust

Thank you !!!

Backup Slides

Related work

• LMs to detect language dominance and language impairment (Gabani et al., 2009)
• Speech-related features for autism patients (Heeman et al., 2010)
• Syntax features for mild cognitive impairment (Roark et al., 2011)
• Syntactic complexity features for autism (Prud’hommeaux et al., 2011)
• Lexical features to recognize different personalities (Gill et al., 2009; Mairesse et al., 2006)
• Predict adherence to treatment and syndrome scale in schizophrenia through conversations (Howes et al., 2012)

Language Model

Using unigram, bigram, and trigram models

Over POS tags and over words (lexical)

Simple Laplace smoothing
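
A bigram model with Laplace (add-one) smoothing, one per group, could be set up as in this sketch. The class name `BigramLM`, the sentence padding markers, and the rule of assigning a narrative to the group whose model gives the higher log-probability are assumptions; the slides only name the model orders and the smoothing method.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with Laplace (add-one) smoothing."""

    def __init__(self, sentences, vocab):
        # Possible next tokens: every vocabulary word plus the end marker.
        self.n_outcomes = len(vocab) + 1
        self.history = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            padded = ["<s>"] + list(sent) + ["</s>"]
            for a, b in zip(padded, padded[1:]):
                self.history[a] += 1
                self.bigrams[(a, b)] += 1

    def prob(self, a, b):
        """Smoothed P(b | a)."""
        return (self.bigrams[(a, b)] + 1) / (self.history[a] + self.n_outcomes)

    def log_prob(self, sent):
        padded = ["<s>"] + list(sent) + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(padded, padded[1:]))

def classify(sent, lm_sch, lm_nc):
    """Pick the group whose model assigns the higher log-probability."""
    return "SCH" if lm_sch.log_prob(sent) >= lm_nc.log_prob(sent) else "NC"
```

Passing the same shared vocabulary to both models keeps their smoothed probabilities comparable. The same class works over POS tags if each sentence is given as a tag sequence instead of a word sequence.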

LMs Performance

By Story (%)      Schizophrenia-F   Control-F   Accuracy
Random            54.4              44.6        50.0
2-gram            62.5              44.4        55.2
2-gram-pos        62.2              53.3        58.2

By Subject (%)    Schizophrenia-F   Control-F   Accuracy
Random            54.1              45.0        50.0
2-gram            62.5              50.0        58.9
2-gram-pos        62.2              54.5        61.5

Feature Normalization

Approach 1:
- Get the average from training data.

Approach 2:
- Get the maximum and minimum from training data.
- Projection into [0,1].
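
Approach 2 corresponds to min-max scaling fitted on the training data only, sketched below. Clipping held-out values that fall outside the training range, and mapping constant features to 0, are assumptions added so that the result always stays in [0,1]; the function names are hypothetical. Approach 1 is not sketched because the slide does not say how the average is applied.

```python
def fit_min_max(train_rows):
    """Per-feature minimum and maximum over the training rows."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def project(row, mins, maxs):
    """Scale one feature vector into [0, 1] using training statistics."""
    scaled = []
    for x, lo, hi in zip(row, mins, maxs):
        if hi == lo:
            # Constant feature in training: no range to scale by.
            scaled.append(0.0)
        else:
            # Assumption: clip test values outside the training range.
            scaled.append(min(1.0, max(0.0, (x - lo) / (hi - lo))))
    return scaled
```

For example, fitting on `[[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]` and projecting `[2.0, 10.0]` gives `[0.5, 0.0]`.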

Motivating Applications

Track patient status between visits

Early automatic diagnosis and screening

Best Performance by Signal-to-noise

Achieved when # Features = 25
- Accuracy by story: 64.7%, accuracy by subject: 76.9%
- Patient recall: 91.3%

                         Schizophrenia          Control                General
Measurement              P(%)   R(%)   F(%)     P(%)   R(%)   F(%)     Accuracy   Macro-F
Story     Majority       59.7   100    74.8     0      0      0        59.7       37.4
Story     25-Features    68.7   75.0   71.7     57.1   49.4   52.9     64.7       62.3
Subject   Majority       59.0   100    74.2     0      0      0        59.0       37.1
Subject   25-Features    75.0   91.3   82.4     81.8   56.3   66.7     76.9       74.6
(All)     Average        59.7   50     54.4     40.5   50     44.6     50.0       49.5

Diction Definitions

• Cognitive terms: Modes of discovery, Mental challenges, Institutional learning practices, Intellection: intuitional, rationalistic, calculative.

• Realism: [Familiarity + Spatial Awareness + Temporal Awareness + Present Concern + Human Interest + Concreteness] - [Past Concern + Complexity]

• Diversity: Neutral: inconsistent, contrasting; Positive: exceptional, unique; Negative: Extremist

• Cooperation: work relations, interactions, associations, job-related tasks, personal involvement, etc. (sisterhood, friendship, teamwork, consolidate, relationship)

• Familiarity: consisting of a selected number of C.K. Ogden’s (1968) operation words which he calculates to be the most common words in the English language