Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls
Kai Hong, Christian G. Kohler, Mary E. March,
Amber A. Parker, Ani Nenkova
University of Pennsylvania
Our Task
Identifying significant differences in lexical use from narratives by Patients vs Controls
Perform automatic classification
Identify a small subset of highly distinguishing features
How prediction accuracy varies with emotion type
Observations on lexical use
# occurrences in narratives
Subjects       Patient   Control
dog/dogs       28        1
money          41        4
sorry          0         7
relationship   0         9
Self reference – “I”
Total occurrences: 1291 vs 626
Ratio after normalization by #words: 5.5% vs 4.3%
Dataset
201 stories from 39 subjects
- Patients: 120 stories, 23 patients
- Controls: 81 stories, 16 controls
Five emotions: Anger Sad Happy Disgust Fear
Talk about past experience (moderately, mildly, extremely) in their lives
30 – 90 seconds to finish the story
Length of Stories
• No big difference between Patients and Controls
• Some difference between emotions

            Average # Words
Patients    192
Controls    181
P-value: 0.4254
Workflow
Narratives (Training) -> Lexical Feature Extraction -> Features
Features
Basic Features
- Few, easy to compute general features
Lexical Features and Repetitions
- Sparse and many
LIWC, Diction
- Based on dictionary, more general
Two-tailed T-test for significant features
- 169 out of 6057 significant
Basic Features
• Patients have more: sentences/document, words/document
• Controls have more: letters/word, words/sentence, tokens/vocabulary

Control > SCH        P-value
letters/word         0.003
words/sentence       0.001
tokens/vocabulary    0.153

SCH > Control        P-value
sentences/doc        0.038
words/doc            0.460
Repetitions
• Example: One day um , my um , my sister had brought my , her niece , her daughter , my sister had brought her daughter uh to watch my dog right .
• Repetition: how often a token reappears within a fixed window (window size 5).
• Computed separately for words and for punctuation
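The repetition measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a token counts as a repetition when the same token occurs among the preceding `window` tokens, and that the rate is the number of such tokens divided by the narrative length. The function name `repetition_rates` is hypothetical.

```python
import string

def repetition_rates(tokens, window=5):
    """Return (word_rate, punct_rate) for one tokenized narrative.

    Assumed definition: a token is a repetition if the same token
    appears among the previous `window` tokens.
    """
    word_reps = punct_reps = 0
    for i, tok in enumerate(tokens):
        recent = tokens[max(0, i - window):i]
        if tok in recent:
            # Tokens made up only of punctuation characters count as punctuation.
            if all(ch in string.punctuation for ch in tok):
                punct_reps += 1
            else:
                word_reps += 1
    n = len(tokens)
    return (word_reps / n, punct_reps / n) if n else (0.0, 0.0)

# Example sentence taken from the Lexical Features slide.
tokens = "She was , she was a huge , she was very , very wonderful .".lower().split()
word_rate, punct_rate = repetition_rates(tokens)
```

Lowercasing before counting is also an assumption; without it, "She" and "she" would not match.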
Repetitions: Significance?
(Figure: bar charts of Rep-Word and Rep-Punc rates, SCH vs NC)
Significant:
- Rep-word (P-value < 0.001)
- Rep-punctuation (P-value < 0.001)
Lexical Features
• Words - Frequency in narratives
• Repetition of specific words - Whether a given word is repeated (0/1)
• Example: She was , she was a huge , she was very , very wonderful.
More common in Schizophrenia
P-value        Features
< 1e-3         I, couldn’t, extremely, mildly, money
0.001 – 0.01   extreme, feeling, moderately, my, took, way, ?
0.01 – 0.05    ain’t, alone, at, aw, before, behind, became, care, chance, confused

• First-person pronouns: I, my
• money
• Feelings
• Some adverbs: mildly, moderately, extremely
• ?
More common in Schizophrenia
• Focus on family (grandfather, sister, son)
• dog/dogs

P-value        Features
0.01 – 0.05    December, dog, dogs, forty, friends, god, got, grandfather, guess, guy, hand, hanging, hearing, hundred, increased, looking, loved, mental, met, mild, moderate, myself, outside, paper, passed, piece, remember, sister, son, stand, stand, stop, story, take, taken, throwing, trouble, use, wake, wanna
More common in Control
P-value        Features
< 1e-3         comma
0.001 – 0.01   really, sorry, very
0.01 – 0.05    able, actually, are, basically, be, being, get’s, in, late, not, relationship, result, she’s, sleep, tell, their, there’s, weeks

• Third person plural: their
• sorry
• Some adjectives and adverbs: actually, basically, really, very
Significant Rep+Lexical Features
• Patients: more repetition of and, um, I, a, was.
• Control: more repetition of comma, very.

Schizophrenia   Status   P-value
Rep-and         SCH      < 0.001
Rep-um          SCH      0.008
Rep-I           SCH      < 0.001
Rep-a           SCH      0.011
Rep-was         SCH      0.018

Control         Status   P-value
Rep-,           NC       0.001
Rep-very        NC       0.007
LIWC
Method:
- Measures how heavily different categories of words are used
- Dictionary based approach
- 69 dictionaries
Example:
Cried - sadness, negative emotion, overall affect, verb, past-tense verb
Previous Use
- writing styles, physical and emotional pain (Tausczik and Pennebaker, 2010)
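A dictionary-based score of this kind can be sketched as below. The entries in `TOY_DICTIONARY` are hypothetical stand-ins (only the "cried" entry mirrors the slide's example); the real LIWC lexicon is not reproduced here, and normalizing by the number of tokens is an assumption about how the category scores are computed.

```python
from collections import Counter

# Toy stand-in for a LIWC-style dictionary: word -> categories it belongs to.
# Hypothetical entries for illustration, not the real LIWC lexicon.
TOY_DICTIONARY = {
    "cried": ["sadness", "negative emotion", "overall affect", "verb", "past-tense verb"],
    "i": ["I", "personal pronoun"],
    "very": ["adverb"],
}

def category_rates(tokens, dictionary=TOY_DICTIONARY):
    """Share of tokens in a narrative that fall into each category."""
    counts = Counter()
    for tok in tokens:
        for category in dictionary.get(tok.lower(), []):
            counts[category] += 1
    total = len(tokens)
    return {cat: c / total for cat, c in counts.items()} if total else {}

rates = category_rates("I cried and I was very sad".split())
```

Each narrative then yields one real-valued feature per category, which is what the T-test on the following slide compares between the two groups.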
LIWC – Significant Features

Category           #words   Example                   Status   P-value
I                  12       I, me, mine               SCH      < 0.001
personal pronoun   70       I, them, itself, you      SCH      0.029
insight            195      think, know, consider     SCH      0.026
adverb             69       very, really, quickly     NC       0.001
exclusive words    17       but, without, exclusive   NC       0.005
inhibition         111      block, constrain, stop    NC       0.019

More common for Patients & Control
Diction
Method:
- Also dictionary-based approach
- 28 small categories, 5 master variables
Master variables (major categories)
- Realism, Optimism, Certainty, Activity, Commonality.
Example:
Certainty = [Tenacity + Leveling + Collectives + Insistence] - [Numerical Terms + Ambivalence + Self Reference + Variety]
Diction – Significant Features

Category           Status   P-value
self reference     SCH      < 0.001
word-mean-length   NC       < 0.001
realism            NC       < 0.001
diversity          NC       0.005
familiarity        NC       0.019
cognitive terms    SCH      0.014
cooperation        NC       0.027
past               SCH      0.036
insistence         SCH      0.046
satisfaction       SCH      0.047
Workflow
Narratives (Training) -> Lexical Feature Extraction -> Features -> Feature Selection -> Selected Features
Feature Selection
Two-tailed T-test for real valued features
- Thresholds: 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.15
Signal to noise
- Using Challenge Learning Object Package (CLOP)
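As an illustration of signal-to-noise ranking, the sketch below scores each feature by the gap between class means divided by the sum of class standard deviations. That formula is the commonly used definition of the criterion and is an assumption here; the slides only name CLOP and do not spell out the formula. The function names are hypothetical, and the code does not call CLOP.

```python
from statistics import mean, pstdev

def s2n_scores(X, y):
    """Signal-to-noise score per feature column.

    X: list of feature rows; y: labels (1 = patient, 0 = control).
    Assumed formula: |mean_pos - mean_neg| / (std_pos + std_neg).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gap = abs(mean(pos) - mean(neg))
        spread = pstdev(pos) + pstdev(neg)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero spread: perfectly separating if the means differ, useless otherwise.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores

def top_k_features(X, y, k=25):
    """Indices of the k highest-scoring features."""
    scores = s2n_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [0.1, 4.0], [0.0, 2.0]]
y = [1, 1, 0, 0]
selected = top_k_features(X, y, k=1)
```

To stay consistent with the leave-one-subject-out setup, the scores would be computed on the training narratives only within each fold.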
Experimental Setup
Leave-one-subject-out (39 times)
Subject Status = Story Status
Voting: stories -> subjects
Evaluation metrics: Accuracy and F-measure
- by stories
- by subjects
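The evaluation loop can be sketched as follows. The classifier is passed in as `train_fn`, so any learner can be plugged in; `train_midpoint` below is a toy one-feature stand-in, not SVM-light, and the data are invented for illustration. How ties in the vote are broken is not stated in the slides; here `Counter.most_common` picks the label predicted first.

```python
from collections import Counter

def leave_one_subject_out(stories, train_fn):
    """stories: list of (subject_id, features, label).
    train_fn(train_pairs) must return a predict(features) function.
    Returns (accuracy by story, accuracy by subject)."""
    subjects = sorted({sid for sid, _, _ in stories})
    story_hits = subject_hits = 0
    for held_out in subjects:
        train = [(x, y) for sid, x, y in stories if sid != held_out]
        test = [(x, y) for sid, x, y in stories if sid == held_out]
        predict = train_fn(train)
        preds = [predict(x) for x, _ in test]
        true_label = test[0][1]  # subject status = story status
        story_hits += sum(p == true_label for p in preds)
        vote = Counter(preds).most_common(1)[0][0]  # majority vote over stories
        subject_hits += int(vote == true_label)
    return story_hits / len(stories), subject_hits / len(subjects)

def train_midpoint(train):
    """Toy stand-in classifier: threshold halfway between the class means."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    mid = (mean_pos + mean_neg) / 2
    return lambda x: int((x[0] >= mid) == (mean_pos >= mean_neg))

# Invented toy data: one feature (e.g. a repetition rate) per story.
stories = [
    ("p1", [0.06], 1), ("p1", [0.05], 1), ("p1", [0.03], 1),
    ("p2", [0.07], 1), ("p2", [0.06], 1),
    ("c1", [0.03], 0), ("c1", [0.035], 0),
    ("c2", [0.02], 0), ("c2", [0.03], 0),
]
story_acc, subject_acc = leave_one_subject_out(stories, train_midpoint)
```

In the toy run, one of p1's stories is misclassified but the vote still labels the subject correctly, which is the effect that lets subject-level accuracy exceed story-level accuracy.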
Workflow
Training: Narratives (Training) -> Lexical Feature Extraction -> Features -> Feature Selection -> Selected Features -> SVM-light
Testing: Narratives (Testing) -> SVM-light + Voting -> Patients or Control?
Performance by T-test
- Much higher than Random
- More noise when relaxing the threshold
- Better performance when tightening the threshold
- Best performance when threshold = 0.001

P-value   by Story   by Subject   # Features
0.15      59.0       58.9         450
0.1       61.7       64.1         341
0.05      62.7       64.1         169
0.01      57.7       65.4         44
0.005     64.2       71.6         32
0.001     65.7       75.6         18
0.0005    61.7       66.7         14
Signal to noise selection
Performance changes with feature size
Best performance achieved when #features = 25
Best Performance by Signal-to-noise
Achieved when #Features = 25
- Accuracy by story: 64.7%; accuracy by subject: 76.9%
- Patient Recall: 91.3%

Schizophrenia            Control                  General
P(%)    R(%)    F(%)     P(%)    R(%)    F(%)     Accuracy   Macro-F
75.0    91.3    82.4     81.8    56.3    66.7     76.9       74.6
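The subject-level figures above can be checked with a short calculation. The confusion counts below are not given in the slides; they are inferred from the reported percentages and the dataset size (23 patients, 16 controls), so treat them as an assumption.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

# Inferred subject-level confusion counts (assumption, not stated in the slides):
# 21 of 23 patients and 9 of 16 controls labelled correctly.
sch_correct, sch_missed = 21, 2
nc_correct, nc_missed = 9, 7

sch_p, sch_r, sch_f = prf(tp=sch_correct, fp=nc_missed, fn=sch_missed)
nc_p, nc_r, nc_f = prf(tp=nc_correct, fp=sch_missed, fn=nc_missed)
accuracy = (sch_correct + nc_correct) / 39
macro_f = (sch_f + nc_f) / 2
```

With these counts the per-class precision, recall, F-measure, and accuracy agree with the table. Macro-F computed from the unrounded F-scores comes out near 74.5, while averaging the rounded values 82.4 and 66.7 gives the 74.6 shown.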
Status Prediction by Emotion
Accuracy (%) Signal-to-noise (25) T-test (0.05) T-test (0.001)
Happy 66.7 59.0 71.8
Disgust 63.4 61.0 51.2
Anger 61.0 70.7 70.7
Fear 60.0 55.0 67.5
Sad 72.5 60.0 67.5
Story 64.7 62.9 65.7
Patient 76.9 64.1 74.4
- Same training data
- Predict on different emotions
- Different approaches and settings
Number of features on different thresholds
(Figure: # features selected by T-test from one emotion, plotted against p-value thresholds 0.1, 0.05, 0.01; one series each for Anger, Sad, Happy, Disgust, Fear)
More features -> more distinguishing
Emotion related analysis

Emotion   Schizophrenia   Control
Happy     ambivalent      do
Disgust   dogs, health    communication
Anger     argued          praise
Fear      money           accident
Sad       satisfaction    working

Higher value in each emotion
Conclusion
Analyze distinguishing power of different features
- Basic features
- Lexical features, repetitions
- LIWC
- Diction
25 features: top performance (65%, 77%)
- p-value feature selection
- signal-to-noise feature selection
Different emotions have different distinguishing power
- anger, sad > happy > fear, disgust
Thank you !!!
Backup Slides
Related work
- LMs to detect language dominance and language impairment (Gabani et al., 2009)
- Speech related features for autism patients (Heeman et al., 2010)
- Syntax features for mild cognitive impairment (Roark et al., 2011)
- Syntactic complexity features for autism (Prud’hommeaux et al., 2011)
- Lexical features to recognize different personalities (Gill et al., 2009; Mairesse et al., 2006)
- Predict adherence to treatment and syndrome scale in Schizophrenia through conversations (Howes et al., 2012)
Language Model
- Using Unigram, Bigram, Trigram
- Using POS-tag and lexical sequences
- Simply using Laplace smoothing
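A bigram model with Laplace (add-one) smoothing, used as a classifier, might look like the sketch below. It assumes one model is trained per group and a narrative is assigned to the group whose model gives it the higher log-probability; the slides do not describe the decision rule, and the class name, shared-vocabulary handling, and toy sentences are all illustrative.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with Laplace (add-one) smoothing."""

    def __init__(self, sentences, vocab_size):
        self.history = Counter()   # counts of the first token of each bigram
        self.bigrams = Counter()
        self.vocab_size = vocab_size
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            self.history.update(toks[:-1])
            self.bigrams.update(zip(toks[:-1], toks[1:]))

    def logprob(self, sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in zip(toks[:-1], toks[1:])
        )

def classify(sent, lm_sch, lm_nc):
    """Pick the group whose model assigns the higher log-probability."""
    return "SCH" if lm_sch.logprob(sent) > lm_nc.logprob(sent) else "NC"

# Invented toy training data, for illustration only.
sch_train = [["i", "um", "i", "was", "mad"], ["my", "dog", "um", "my", "dog"]]
nc_train = [["she", "was", "really", "sorry"], ["it", "was", "very", "late"]]

# Shared vocabulary size: all training words, sentence markers, plus one unknown slot.
words = {w for s in sch_train + nc_train for w in s}
vocab_size = len(words) + 3

lm_sch = BigramLM(sch_train, vocab_size)
lm_nc = BigramLM(nc_train, vocab_size)
label_a = classify(["my", "dog", "was", "mad"], lm_sch, lm_nc)
label_b = classify(["she", "was", "very", "sorry"], lm_sch, lm_nc)
```

The same class works over POS-tag sequences by passing tag lists instead of word lists.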
LMs Performance

By Story (%)     Schizophrenia-F   Control-F   Accuracy
Random           54.4              44.6        50.0
2-gram           62.5              44.4        55.2
2-gram-pos       62.2              53.3        58.2

By Subject (%)   Schizophrenia-F   Control-F   Accuracy
Random           54.1              45.0        50.0
2-gram           62.5              50.0        58.9
2-gram-pos       62.2              54.5        61.5
Feature Normalization
Approach 1:
- Get the average from training data.
Approach 2:
- Get the maximum and minimum from training data.
- Projection into [0,1].
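Approach 2 can be sketched as a min-max scaler fitted on the training narratives only. Clipping test values that fall outside the training range is an assumption, since the slide only says values are projected into [0,1]; the function names are hypothetical. Approach 1 is not sketched because the slide does not say how the training average is applied.

```python
def fit_min_max(train_rows):
    """Per-feature minimum and maximum, taken from training data only."""
    columns = list(zip(*train_rows))
    return [min(col) for col in columns], [max(col) for col in columns]

def apply_min_max(row, mins, maxs):
    """Project one feature row into [0, 1] using the training range."""
    scaled = []
    for value, lo, hi in zip(row, mins, maxs):
        x = (value - lo) / (hi - lo) if hi > lo else 0.0
        # Assumption: clip test values that fall outside the training range.
        scaled.append(min(1.0, max(0.0, x)))
    return scaled

train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
mins, maxs = fit_min_max(train)
scaled_test = apply_min_max([4.0, 40.0], mins, maxs)
```

Fitting the range on the training fold alone keeps the held-out subject's narratives from influencing the scaling.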
Motivating Applications
Track patient status between visits
Early automatic diagnosis and screening
Best Performance by Signal-to-noise
Achieved when #Features = 25
- Accuracy by story: 64.7%; accuracy by subject: 76.9%
- Patient Recall: 91.3%

                         Schizophrenia            Control                  General
Measurement              P(%)    R(%)    F(%)     P(%)    R(%)    F(%)     Accuracy   Macro-F
Story     Majority       59.7    100     74.8     0       0       0        59.7       37.4
Story     25-Features    68.7    75.0    71.7     57.1    49.4    52.9     64.7       62.3
Subject   Majority       59.0    100     74.2     0       0       0        59.0       37.1
Subject   25-Features    75.0    91.3    82.4     81.8    56.3    66.7     76.9       74.6
(All)     Average        59.7    50      54.4     40.5    50      44.6     50.0       49.5
Diction Definitions
• Cognitive terms: Modes of discovery, Mental challenges, Institutional learning practices, Intellection: intuitional, rationalistic, calculative.
• Realism: [Familiarity + Spatial Awareness + Temporal Awareness + Present Concern + Human Interest + Concreteness] - [Past Concern + Complexity]
• Diversity: Neutral: inconsistent, contrasting; Positive: exceptional, unique; Negative: Extremist
• Cooperation: work relations, interactions, associations, job-related tasks, personal involvement, etc. (sisterhood, friendship, teamwork, consolidate, relationship)
• Familiarity: consisting of a selected number of C.K. Ogden’s (1968) operation words which he calculates to be the most common words in the English language