exploring the political agenda of the greek parliament ... · 1.063.546 unique speeches output •...
TRANSCRIPT
www.athenarc.gr
Exploring the Political Agenda of the Greek
Parliament Plenary SessionsDimitris Gkoumas, Maria Pontiki, Konstantina Papanikolaou, and Haris Papageorgiou
ATHENA Research & Innovation Centre/Institute for Language & Speech Processing(ATHENA RIC/ILSP), Greece.
ParlaCLARIN@LREC2018, 7 May 2018, Miyazaki, Japan
“APOLLONIS, Greek Infrastructure for Digital Arts and Humanities, Language Research and Innovation” (MIS 5002738)
“Computational Science and Technologies: Data, Content and Interaction” (MIS 5002437)
www.athenarc.gr
•Capturing the way political and policyagendas are shaped and evolve overtime.
•Reflecting the stance and the politicalbehavior under which individualParliament Members and politicalgroups react and act over time.
Parliament plenary sessions: Valuable Data Sources
2/20
www.athenarc.gr
Insights about:•the way in which the European Parliament reacts to exogenous events like the Eurocrisis (Greene and Gross,2017)
•the political attention of individual U.S. Senators (Quinn et al., 2010)
Parliament plenary sessions & Topic Modelling
3/20
Image source: https://theintelligenceofinformation.wordpress.com/2016/12/06/topic-modeling-latent-dirichlet-allocation-vs-correlation-explanation-alternative/
www.athenarc.gr
Exploring the Political Agenda of the Greek Parliament Plenary Sessions
Methodology: Overview
Data Collection Data Preparation Preprocessing Organization Topic Modelling Visualization
4/20
www.athenarc.gr
Data Collection
Data Collection
Data Preparation Preprocessing Organization Topic
Modelling Visualization
1989 to 2015 proceedingsà4356 plenary sessions
à1.063.546 unique speeches
5/20
www.athenarc.gr
Data Preparation
Different formats (txt/doc/docx/pdf/jpeg)
à structured and machine-readable xmlversion
Data Collection
Data Preparation
Preprocessing Organization Topic Modelling Visualization
6/20
www.athenarc.gr
Preprocessing
Data Collection
Data Preparation
Preprocessing
Organization Topic Modelling Visualization
ü Segment annotation: the start/end point of each speech
ü Temporal annotation: the date of each speech
ü Speaker annotation: the name of each speaker and his/her politicalaffiliation
ü POS tagging7/20
www.athenarc.gr
Data Organization
ü time windows of fixed durationà 1 month
ü 283 windows for the time period from 1989 to 2015
àdiscover topics within the monthly proceedings (e.g. segments)& monitor their dynamics over time
Data Collection
Data Preparation Preprocessing
Organization
Topic Modelling Visualization
8/20
www.athenarc.gr
Topic Modelling
Data Collection
Data Preparation Preprocessing Organization
Topic Modelling
Visualization
q Dynamic Non-negative Matrix Factorization (NMF) approach (Greene and Cross, 2017).
q After identifying a certain number of topics in each 1-month-window, the ones which are semantically similar are groupedtogether as “dynamic topics”, rather than identifying topics from the beginning.
9/20
www.athenarc.gr
Dynamic NMF
Input • Processed and organized data• 283 monthly windows,
1.063.546 unique speeches
Output • 283 window-based topics
• 90 dynamic topics
• Document term-matrix TF-IDF normalized• -stop words, +content words• -rare terms & units with length less than three
lines, +nouns & adjectives
• NMF model generates topics for each windowPhase 1
• NMF algorithm is applied again, with the number of k topics decided after applying the Topic Coherence via Word2Vec (TC-W2V)
• GrParl speeches used as a training corpus for the extraction of the word embeddings
Phase 2
10/20
www.athenarc.gr
Visualization
Data Collection Data Preparation Preprocessing Organization Topic Modelling
Visualization
ü By topic
ü By time
ü Search function
11/20
www.athenarc.gr
Greek Parliament Plenary Sessions presented by topic
12/20
www.athenarc.gr
20 most prominent words for the dynamic topic D79
13/20
www.athenarc.gr
Involved Parliament Members
14/20
www.athenarc.gr
Contribution of each political party to a dynamic topic
15/20
www.athenarc.gr
Contribution of each region to a dynamic topic
16/20
www.athenarc.gr
Most important window-topics related to a dynamic topic
17/20
www.athenarc.gr
Greek Parliament Plenary Sessions presented by time
18/20
www.athenarc.gr
DiscussionC
ontri
butio
nTransformation of the Greek Parliament plenary sessions minutes into a structured and machine readable corpus
√ For the first time available forNatural Language Processing andText Mining approaches
Insightful navigation across the Greek Parliament corpus
√ Topic based presentation√ Time-window based presentation √ Search function
Futu
re W
ork Qualitative analysis of the extracted dynamic topics
& assignment of descriptive labels
Exploration of the evolution of selected topics (e.g. refugee crisis, Eurocrisis) & comparison with the European Parliament and/or other European countries
Enrichment with more insights e.g. Quotations, Sentiment
19/20
www.athenarc.gr
Dimitris Gkoumas, Maria Pontiki, Konstantina Papanikolaou, and Haris PapageorgiouATHENA-Research & Innovation Centre in Information, Communication and Knowledge Technologies (ATHENA RIC), Greece.
Exploring the Political Agenda of the Greek Parliament Plenary Sessions
ParlaCLARIN@LREC2018, 7 May 2018, Miyazaki, Japan
“Computational Science and Technologies: Data, Content and Interaction” (MIS 5002437)“APOLLONIS, Greek Infrastructure for Digital Arts and Humanities, Language Research
and Innovation” (MIS 5002738)