
www.trendminer-project.eu

21st November 2014, Budapest

Social Psychological Analysis of Public Political Comments on Facebook

Márton Miháltz

TrendMiner Overview

• What social and political trends appear in Hungarian comments on political Facebook posts?
  – Facebook in Hungary: 4.27M registered users = 59.2% of internet users, 43% of the total population

• Download all public comments from the Facebook pages of Hungarian politicians and parties

• Analysis of comments:
  – Basic NLP (tokenization, PoS tagging, stemming), domain-adapted
  – Entities: political actors (people, organizations)
  – Sentiment
  – Social psychology dimensions: agency/communion, individualism/collectivism, optimism/pessimism, primordial/conceptual thinking

• In cooperation with the Narrative Psychology Research Group, Hungarian Academy of Sciences


Data Acquisition

• Comments downloaded via the Facebook Graph API
  – 1.9M comments on 141K Facebook posts (2013.10.01 – 2014.09.02)
  – from 1,344 Facebook pages:
    • Organizations: parties, regional and associated branches
    • People: candidates and elected representatives (MPs), members of government, party officials
    • Official and fan pages
  – in 3 categories:
    • Hungarian Parliament 2010–2014
    • Hungarian parliamentary elections 2014 (6 April)
    • EU parliamentary elections 2014 (25 May)
  – Sources: valasztas.hu, wikipedia.hu

• Everything stored in a MySQL database
  – for arbitrary queries (by political group, time, etc.)

Data model

• Fb_pages
  – Id, URL, Page title
  – Type: person or organization
  – Affiliated party (3 campaigns)

• Fb_posts, Fb_comments
  – Id, Created_timestamp
  – Message text, Author_user_id

• Comments_annotations
  – Sentence_id, Start_token and End_token indices
  – Annotated text, Lemmatized_annotated_text, Annotation_tag

• Fb_comments_scores
  – 16 scores and counts (sentiment, RID (Regressive Imagery Dictionary), agency, communion, optimism, …)
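To illustrate the kind of arbitrary query this data model supports, here is a minimal sketch that recreates a simplified subset of the tables and counts comments per affiliated party within a time window. It swaps in SQLite for the MySQL database so the example is self-contained; the column types, the single `party` column (the model above stores an affiliation per campaign), the sample rows and the function name `comments_per_party` are all assumptions for illustration.

```python
import sqlite3

# Simplified, hypothetical subset of the data model; column types are assumptions.
SCHEMA = """
CREATE TABLE fb_pages (
    id TEXT PRIMARY KEY,
    url TEXT,
    page_title TEXT,
    type TEXT CHECK (type IN ('person', 'organization')),
    party TEXT  -- simplification: one affiliation instead of one per campaign
);
CREATE TABLE fb_posts (
    id TEXT PRIMARY KEY,
    page_id TEXT REFERENCES fb_pages(id),
    created_timestamp TEXT,
    message TEXT,
    author_user_id TEXT
);
CREATE TABLE fb_comments (
    id TEXT PRIMARY KEY,
    post_id TEXT REFERENCES fb_posts(id),
    created_timestamp TEXT,
    message TEXT,
    author_user_id TEXT
);
"""


def comments_per_party(conn: sqlite3.Connection, start: str, end: str) -> dict:
    """Count comments per affiliated party with start <= timestamp < end.

    Timestamps are ISO 8601 strings, so string comparison is chronological.
    """
    rows = conn.execute(
        """
        SELECT pg.party, COUNT(*)
        FROM fb_comments AS c
        JOIN fb_posts AS p ON c.post_id = p.id
        JOIN fb_pages AS pg ON p.page_id = pg.id
        WHERE c.created_timestamp >= ? AND c.created_timestamp < ?
        GROUP BY pg.party
        """,
        (start, end),
    )
    return dict(rows.fetchall())


# Usage with a few hypothetical rows
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fb_pages VALUES (?, ?, ?, ?, ?)", [
    ("pg1", "https://example.org/page1", "Example page 1", "organization", "LMP"),
    ("pg2", "https://example.org/page2", "Example page 2", "person", "EGYÜTT-PM"),
])
conn.executemany("INSERT INTO fb_posts VALUES (?, ?, ?, ?, ?)", [
    ("p1", "pg1", "2014-03-01T09:00:00", "post text", "u1"),
    ("p2", "pg2", "2014-03-15T09:00:00", "post text", "u2"),
])
conn.executemany("INSERT INTO fb_comments VALUES (?, ?, ?, ?, ?)", [
    ("c1", "p1", "2014-03-02T10:00:00", "comment", "u3"),
    ("c2", "p1", "2014-05-01T12:00:00", "comment", "u4"),
    ("c3", "p2", "2014-04-01T08:00:00", "comment", "u5"),
])
# Comments up to the 2014 Hungarian election day (6 April)
print(comments_per_party(conn, "2014-03-01", "2014-04-06"))
```

Keeping the party on the page row and joining through posts mirrors the Fb_pages → Fb_posts → Fb_comments hierarchy above, so grouping by political affiliation needs no duplicated data.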

Hungarian Political Ontology

• Extends the TrendMiner (TM) multilingual political ontology
  – 8 new classes, 3 new object properties and 3 new data properties, 1,579 new instances (1 Country, 18 Party, 661 Politician, 899 Nomination)
  – Nominated and elected MPs (2010 Hungarian Parliament, 2014 Hungarian Parliament, 2014 European Parliament) and their nominating parties
  – Names, abbreviated names, nicknames, Facebook page URLs, etc.

• Example: Benedek Jávor was a member of the Hungarian Parliament in 2010–2014 (nominated by LMP) and has been a member of the European Parliament since 2014 (nominated by EGYÜTT-PM).
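The Jávor example shows why Nomination is a class of its own: the same politician is linked to different parties for different legislatures, so the politician–party–legislature relation needs its own instances. A minimal sketch of that idea with Python dataclasses; the attribute names (`body`, `start_year`, `end_year`) and the lookup function `nominating_party` are hypothetical and do not reproduce the ontology's actual RDF classes and properties.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Party:
    name: str


@dataclass(frozen=True)
class Politician:
    name: str


@dataclass(frozen=True)
class Nomination:
    """Reifies the politician-party-legislature relation as its own instance."""
    politician: Politician
    party: Party
    body: str                       # hypothetical: e.g. "Hungarian Parliament"
    start_year: int
    end_year: Optional[int] = None  # None: term still ongoing


def nominating_party(nominations: Iterable[Nomination], politician: Politician,
                     body: str, year: int) -> Optional[Party]:
    """Return the party that nominated `politician` to `body` in `year`, if any."""
    for nom in nominations:
        if (nom.politician == politician and nom.body == body
                and nom.start_year <= year
                and (nom.end_year is None or year <= nom.end_year)):
            return nom.party
    return None


javor = Politician("Jávor Benedek")
NOMINATIONS = [
    Nomination(javor, Party("LMP"), "Hungarian Parliament", 2010, 2014),
    Nomination(javor, Party("EGYÜTT-PM"), "European Parliament", 2014),
]
print(nominating_party(NOMINATIONS, javor, "European Parliament", 2015).name)  # EGYÜTT-PM
```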

Processing Pipeline

• Downloading (Facebook Graph API, Python script)
• Tokenization (huntoken tool)
• PoS tagging (hunpos tool)
• Morphological analysis (hunmorph tool)
• Stem + analysis disambiguation (Python script)
• Content analysis (Java NooJ)
• Scoring and storage in the database
• Uploading in RDF to the TrendMiner Integration Server
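The stem + analysis disambiguation step has to reconcile two outputs: the morphological analyzer may return several analyses for one word form, while the PoS tagger commits to a single tag. A minimal sketch of one way to combine them, assuming a simplified (lemma, category) representation of analyses and plain category labels such as VERB/NOUN; the real hunmorph and hunpos output formats are richer, and the fallback policy here is an assumption, not the project's script.

```python
from typing import List, Tuple

# Simplified, hypothetical representation of one analyzer output: (lemma, category)
Analysis = Tuple[str, str]


def disambiguate(word: str, pos_tag: str, analyses: List[Analysis]) -> Analysis:
    """Pick the first analysis whose category agrees with the tagger's PoS tag."""
    for lemma, category in analyses:
        if category == pos_tag:
            return lemma, category
    # No agreeing analysis (e.g. an unknown word): trust the tagger and use
    # the lowercased word form as the lemma.
    return word.lower(), pos_tag


# "vár" can be a noun ("castle") or a verb ("waits"); the tag decides.
print(disambiguate("vár", "VERB", [("vár", "NOUN"), ("vár", "VERB")]))  # ('vár', 'VERB')
# Unknown slang form without analyses: falls back to the word form itself.
print(disambiguate("Asszem", "VERB", []))  # ('asszem', 'VERB')
```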

Domain Adaptation

• Problem: existing NLP tools were developed for a different domain and perform poorly on social media language (Facebook comments)

• Survey corpus:
  – 1.25M Facebook comments (29M tokens)
  – 2.25M unknown tokens (694K types)
  – Frequency list: items with f > 15 revised manually
  – Identify common problems
  – Lists of frequent, relevant unknown and new words, etc.
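The survey boils down to counting the tokens the analyzer does not recognise and keeping the frequent ones for manual revision. A minimal sketch; `is_known` stands in for a lookup in the analyzer's lexicon and is a hypothetical interface, and the default threshold mirrors the f > 15 cut-off above.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def unknown_frequency_list(
    tokens: Iterable[str],
    is_known: Callable[[str], bool],
    threshold: int = 15,
) -> List[Tuple[str, int]]:
    """Return unknown token types with frequency > threshold, most frequent first."""
    counts = Counter(tok for tok in tokens if not is_known(tok))
    return [(tok, freq) for tok, freq in counts.most_common() if freq > threshold]


# Toy data: a hypothetical lexicon that only knows "jó" ("good")
lexicon = {"jó"}
tokens = ["asszem"] * 20 + ["pfffff"] * 3 + ["jó"] * 50
print(unknown_frequency_list(tokens, lexicon.__contains__))  # [('asszem', 20)]
```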

Domain Adaptation: Tokenization

• Huntoken tool
• Frequent problems:
  – Missing spaces around punctuation, e.g. "... end of sentence.Beginning of another ..."
  – Repeated punctuation, e.g. "first part……. Second part"
  – Contracted (slang) words, e.g. asszem = azt hiszem ("I think")
  – Consonant multiplication (interjections, onomatopoeic words, etc.), e.g. pfffffffff, uffffff, ejjjjjjjj (patterns: pff(f*), uff(f*), ej(j*))
  – Large numbers split into digit groups, e.g. 125 000
  – Split URLs
  – Split emoticons, e.g. ": D"
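Several of these problems can be repaired with regular-expression rewrites around tokenization. The sketch below is an illustration under assumptions, not huntoken's actual rules: the slang lexicon has a single hypothetical entry, the interjection patterns follow the pff(f*), uff(f*), ej(j*) notation above, URLs are not handled, and heuristics such as joining "125 000" can misfire (e.g. on two adjacent separate numbers).

```python
import re

# Hypothetical one-entry slang lexicon: contracted form -> expanded form
SLANG = {"asszem": "azt hiszem"}

# Interjections following the patterns pff(f*), uff(f*), ej(j*):
# any number of repeated final letters is reduced to the base form.
INTERJECTIONS = [
    (re.compile(r"\bpff+\b", re.IGNORECASE), "pff"),
    (re.compile(r"\buff+\b", re.IGNORECASE), "uff"),
    (re.compile(r"\bej+\b", re.IGNORECASE), "ej"),
]

UPPER = "A-ZÁÉÍÓÖŐÚÜŰ"  # Hungarian uppercase letters for a regex character class


def normalize(text: str) -> str:
    # Runs of periods / ellipsis characters -> a single "..."
    text = re.sub(r"[.…]{2,}", "...", text)
    # Repeated "!" or "?" -> one character
    text = re.sub(r"([!?])\1+", r"\1", text)
    # Missing space after sentence-final punctuation before a capital letter
    text = re.sub(rf"([.!?])([{UPPER}])", r"\1 \2", text)
    # Digit groups separated by spaces ("125 000") -> one number token
    text = re.sub(r"(?<=\d) (?=\d{3}\b)", "", text)
    # Emoticons split by whitespace (": D") -> ":D"
    text = re.sub(r"([:;])\s+([DP()])(?!\w)", r"\1\2", text)
    for pattern, base in INTERJECTIONS:
        text = pattern.sub(base, text)
    for short, full in SLANG.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text, flags=re.IGNORECASE)
    return text


print(normalize("end of sentence.Beginning of another"))  # end of sentence. Beginning of another
print(normalize("first part……. Second part"))  # first part... Second part
```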

Domain Adaptation: PoS/stemming

• Hunpos tagger + hunmorph analyzer + stemming script
• Frequent problems:
  – Unknown words (no lemma/PoS)
    • added to the hunmorph analyzer's lexicon
    • using analogous words (morphological paradigms)
    • compounds, abbreviations, acronyms, slang words, etc.
  – Frequently misspelled word forms
    • replaced with the correct forms
  – Wrong capitalization, e.g. SENTENCES IN ALL CAPS
  – Missing accented characters: a disambiguation model is needed, e.g. kor (age), kór (disease), kör (circle)
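The missing-accent problem is a lexical disambiguation task: an unaccented form such as kor maps to several accented candidates. A minimal baseline sketch, not the project's model (the slide only says one is needed): candidates are collected from an accented reference corpus, and the one seen most often after the previous word wins, falling back to overall frequency. The toy corpus and the `AccentRestorer` class are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import List, Optional


def strip_accents(word: str) -> str:
    """Remove diacritics, e.g. 'kör' -> 'kor', 'őrület' -> 'orulet'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class AccentRestorer:
    """Bigram-then-unigram baseline for restoring missing accents."""

    def __init__(self, corpus_tokens: List[str]):
        self.unigrams = Counter(corpus_tokens)
        self.bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        self.candidates = defaultdict(set)
        for word in self.unigrams:
            self.candidates[strip_accents(word)].add(word)

    def restore(self, word: str, prev: Optional[str] = None) -> str:
        cands = self.candidates.get(strip_accents(word))
        if not cands:
            return word  # unseen form: leave unchanged

        def score(cand: str):
            # Prefer the bigram count with the previous word, then overall frequency
            return (self.bigrams[(prev, cand)] if prev else 0, self.unigrams[cand])

        # sorted() makes tie-breaking deterministic
        return max(sorted(cands), key=score)


# Hypothetical accented reference corpus
corpus = "a kör közepén a kör szélén egy kör súlyos kór új kor".split()
restorer = AccentRestorer(corpus)
print(restorer.restore("kor"))                 # kör (most frequent overall)
print(restorer.restore("kor", prev="súlyos"))  # kór ("súlyos kór" = serious disease)
```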

NooJ, Java NooJ, NooJ-Cmd

• Java NooJ
  – Open-source version of NooJ: define and run finite-state machines for querying, annotation, etc. (morphology, syntax)
  – NooJ-Cmd extension: all NooJ GUI features available as command-line options
  – Open source: https://github.com/tkb-/nooj-cmd

• NooJ grammars (FSMs) for annotation:
  – Actors (entities)
  – Emotional valence (sentiment polarity)
  – Regressive Imagery Dictionary
  – Agency/communion
  – Optimism/pessimism
  – Individualism/collectivism

Development of NooJ Grammars

• In collaboration with social psychology researchers:
  – Social Psychology Department, Eötvös Loránd University, Budapest
  – Narrative Psychology Research Group, Hungarian Academy of Sciences
• Development corpus:
  – 176K sample Facebook comments from 570 Facebook pages (4.9M tokens)
  – NLP annotation
  – Frequency lists (lemmas, lemmas + PoS, lemmas + morphological information, etc.)
• Development:
  – Content words with f > 100 in the development corpus (3,500 types)
  – 7 independent annotators
  – Items on which at least 4 annotators agree: manual revision
  – Compiled into NooJ grammars with polarity shifters, items to be excluded, etc.
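The selection rule (7 annotators, at least 4 in agreement) is a majority-vote filter. A minimal sketch, assuming each candidate word receives one label per annotator from a hypothetical label set such as pos/neg/neu; the function name `select_for_revision` is illustrative, and items below the threshold are simply dropped here.

```python
from collections import Counter
from typing import Dict, List


def select_for_revision(annotations: Dict[str, List[str]], min_agree: int = 4) -> Dict[str, str]:
    """Keep items whose most frequent label was chosen by >= min_agree annotators."""
    selected = {}
    for item, labels in annotations.items():
        if not labels:
            continue  # no annotations at all
        label, votes = Counter(labels).most_common(1)[0]
        if votes >= min_agree:
            selected[item] = label
    return selected


# Hypothetical labels from 7 annotators
votes = {
    "jó": ["pos"] * 6 + ["neu"],                      # "good": clear agreement
    "vita": ["neg"] * 3 + ["neu"] * 3 + ["pos"],      # "debate": no 4-way agreement
}
print(select_for_revision(votes))  # {'jó': 'pos'}
```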

1. Political Actors (NEs)

• Maxent NE tool (huntag): low performance on this domain
  – Trained on standard-language news texts
  – Miscategorization, false-positive NEs, entity boundary recognition problems
• NooJ grammar/lexicon for TrendMiner
  – Person names: family_name (given_name_lemmatized)? | frequent_nicknames …
  – Organization names: Standard_form | abbreviated_forms… | nicknames…
  – Created automatically (names from the DB) and manually (nicknames from frequency lists)
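The name patterns above can be generated from the database records. A minimal sketch that compiles them into Python regular expressions instead of NooJ grammars; it assumes matching runs on lemmatized text (in Hungarian the case suffix attaches to the last word of a name, i.e. the given name, which is presumably why the pattern uses the lemmatized given name). The nickname "Beni" and the function names are hypothetical.

```python
import re
from typing import Iterable


def person_pattern(family: str, given: str, nicknames: Iterable[str] = ()) -> re.Pattern:
    """family_name (given_name)? | nickname | ... as one regex over lemmatized text."""
    alternatives = [rf"{re.escape(family)}(?:\s+{re.escape(given)})?"]
    alternatives += [re.escape(nick) for nick in nicknames]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


def org_pattern(standard: str, abbreviations: Iterable[str] = (),
                nicknames: Iterable[str] = ()) -> re.Pattern:
    """Standard_form | abbreviated_forms | nicknames, longest alternatives first."""
    alternatives = sorted((standard, *abbreviations, *nicknames), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")


javor = person_pattern("Jávor", "Benedek", ["Beni"])  # "Beni": hypothetical nickname
text = "Jávor Benedek szerint Jávor és Beni egyetért"
print([m.group(0) for m in javor.finditer(text)])  # ['Jávor Benedek', 'Jávor', 'Beni']

lmp = org_pattern("Lehet Más a Politika", ["LMP"])
print(lmp.findall("az LMP , azaz a Lehet Más a Politika"))  # ['LMP', 'Lehet Más a Politika']
```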

2. Emotional Valence

• Emotions with positive or negative polarity
• Polarity in context: negation recognized using simple rules
• Nouns, adjectives, verbs, adverbs, emoticons, multi-word expressions
• 500 positive and 420 negative entries
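A minimal sketch of lexicon lookup with a one-token negation rule, in the spirit of the simple rules described above; the project's actual rules live in NooJ grammars. The five lexicon entries are toy examples, the negator set (nem, sem, ne) is an assumption, and multi-word expressions are not handled.

```python
from typing import List, Tuple

# Toy lexicon (hypothetical entries): +1 positive, -1 negative
LEXICON = {"jó": 1, "szép": 1, ":)": 1, "rossz": -1, "hazug": -1}
# Assumed negation words: "nem" (not), "sem" (neither), "ne" (don't)
NEGATORS = {"nem", "sem", "ne"}


def valence_counts(tokens: List[str]) -> Tuple[int, int]:
    """Count positive and negative lexicon hits, flipping polarity after a negator."""
    positive = negative = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token.lower())
        if polarity is None:
            continue
        if i > 0 and tokens[i - 1].lower() in NEGATORS:
            polarity = -polarity  # "nem jó" (not good) counts as negative
        if polarity > 0:
            positive += 1
        else:
            negative += 1
    return positive, negative


print(valence_counts("ez nem jó".split()))  # (0, 1)
```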

3. Regressive Imagery Dictionary

• Martindale (1975, 1990): uncovers the psychological processes reflected in a text

• 2 basic categories of thinking:
  – Primordial (primary): associative, concrete, takes little account of reality (fantasy, dreams)
  – Conceptual (secondary): abstract, logical, reality-oriented, aimed at problem solving

• 7 conceptual + 29 primordial subcategories (social behavior, cognition, perception, sensation, etc.)

• Hungarian version by Pólya and Szász
• 3,000+ terms
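RID-style dictionaries list word stems with a trailing wildcard, and a text is characterised by how many of its words fall into primordial versus conceptual categories. A minimal sketch computing the primordial share of matched words; the four stems are hypothetical stand-ins for the Hungarian dictionary's 3,000+ terms, and the share is one possible summary score, not necessarily the one used in the project.

```python
from typing import Iterable, List, Optional

# Hypothetical toy entries; a trailing "*" matches any continuation
PRIMORDIAL = ["álom*", "repül*"]  # dream, fly
CONCEPTUAL = ["gondol*", "tud*"]  # think, know


def matches(token: str, entry: str) -> bool:
    """Exact match, or prefix match for entries ending in '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def count_hits(tokens: List[str], entries: Iterable[str]) -> int:
    entries = list(entries)
    return sum(1 for tok in tokens if any(matches(tok.lower(), e) for e in entries))


def primordial_share(tokens: List[str]) -> Optional[float]:
    """Primordial hits / (primordial + conceptual hits); None if nothing matched."""
    prim = count_hits(tokens, PRIMORDIAL)
    conc = count_hits(tokens, CONCEPTUAL)
    total = prim + conc
    return prim / total if total else None


# 2 primordial hits and 1 conceptual hit
print(primordial_share(["álom", "repülni", "gondolom"]))
```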

4. Agency/Communion

• 2 fundamental dimensions of social values:
  – Communion: moral and emotional aspects of an individual's relations to others (affection, expressiveness, cooperation, social benefit, etc.)
  – Agency: efficiency of an individual's goal-oriented behavior (motivation, competence, control)

• Positive or negative along both dimensions
  – Context-dependent (e.g. negation)

• 640 expressions

5. Optimism/Pessimism

• Based on PoS and morphology annotations + time expressions

• 2 measures:
  1. |future_tense_verbs| / (|present_tense_verbs| + |past_tense_verbs|)
  2. |present_tense_verbs| / |past_tense_verbs|

• Both correlate with the degree of optimism
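Given tense labels for the verbs of a comment (from the morphological annotation), both measures are simple ratios. A minimal sketch; the label strings are hypothetical, and returning None when a denominator is zero is a design choice, since short comments often contain no past-tense verbs.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def optimism_measures(verb_tenses: Iterable[str]) -> Tuple[Optional[float], Optional[float]]:
    """Return (future / (present + past), present / past) for one text."""
    counts = Counter(verb_tenses)
    future, present, past = counts["future"], counts["present"], counts["past"]
    m1 = future / (present + past) if present + past else None
    m2 = present / past if past else None
    return m1, m2


# 2 future, 4 present and 2 past tense verbs: measure 1 = 2/6, measure 2 = 4/2
print(optimism_measures(["future"] * 2 + ["present"] * 4 + ["past"] * 2))
```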

6. Individualism/Collectivism

• Based on PoS and morphology annotations
• 1 measure:
  |personal pronouns| / (|verbs with personal inflection| + |nouns with possessive inflection|)

• Higher score: higher degree of individualism
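The individualism score can be computed from token-level morphological features. A minimal sketch; the `Token` fields and the PERS_PRON tag are hypothetical stand-ins for the project's annotation. The example is the Hungarian sentence Én szeretem a hazámat ("I love my homeland"), with one personal pronoun, one inflected verb and one possessive noun.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str
    pos: str                  # hypothetical tags: "PERS_PRON", "VERB", "NOUN", ...
    personal: bool = False    # verb inflected for person
    possessive: bool = False  # noun carrying a possessive suffix


def individualism_score(tokens: List[Token]) -> Optional[float]:
    """|personal pronouns| / (|personally inflected verbs| + |possessive nouns|)."""
    pronouns = sum(t.pos == "PERS_PRON" for t in tokens)
    verbs = sum(t.pos == "VERB" and t.personal for t in tokens)
    nouns = sum(t.pos == "NOUN" and t.possessive for t in tokens)
    denominator = verbs + nouns
    return pronouns / denominator if denominator else None


sentence = [
    Token("Én", "PERS_PRON"),
    Token("szeretem", "VERB", personal=True),
    Token("a", "DET"),
    Token("hazámat", "NOUN", possessive=True),
]
print(individualism_score(sentence))  # 0.5
```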

Visualisation


Dissemination and Exploitation

• Presentations
  – Hungarian NLP Meetup, 25 Sept 2014, Budapest
  – conText, 20 Nov 2014, Budapest

• Conference papers and presentations
  – 2 papers at the 11th Conference on Hungarian Computational Linguistics (15–16 January 2015, Szeged)

• Source code
  – https://github.com/mmihaltz/trendminer-hunlp
  – https://github.com/mmihaltz/trendminer-hutools
  – https://github.com/tkb-/nooj-cmd

• Project website (http://corpus.nytud.hu/trendminer)
  – Download the political ontology
  – Download the 1.9M Facebook comments corpus (with annotations)
  – Project info, papers, presentation slides


Thank You!