modeling activist discourse on state repression through...

18
Modeling Activist Discourse on State Repression Through Text Analytics Evan Brown and Hailey Reeves

Upload: others

Post on 06-Apr-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Modeling Activist Discourse on State Repression Through

Text Analytics

Evan Brown and Hailey Reeves

Page 2: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Overview of Project● Analyzed trends in activist discourse on state repression in recent decades, using a data set of

94 documents related to state repression. ●● We define state repression as the actual or threatened use of penalization against a person or

an organization committing nonviolent dissent within state jurisdiction.●● Perception has a direct connection to why and how people engage with causes they deem

important. Social movements heavily influence societal ethics and law. ●● Our analysis will determine underlying themes that influence activist engagement as well as

common perceptions held by activists within this sample.

Page 3: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

MethodProcessed documents using OCR software

Created frequency tables of all terms; removed stop words

Used frequency tables to develop networks using NetworkX

Input text files into R and performed LSA using R packages “tm" and “lsa”

Used high degree terms from network, found most similar LSA terms

Created high degree LSA network

Created additional networks with input from Reynolds-Stenson

Page 4: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Text Mining and Latent Semantic Analysis● Text Mining - general term for extracting useful information from text.● Latent Semantic Analysis (LSA) - Manipulating a term-document matrix using SVD and a

low-rank approximation of the resultant matrices in order to extract semantic information and to better see similarities between terms and documents.

● Singular Value Decomposition (SVD) - The process of splitting a matrix into the product of three matrices, according to the equation where the middle matrix is diagonal, containing only the matrix M’s singular values as entries.

● Query Matching (Cosine method) - For a certain document vector aj , and a query q, the angle θ between them can be found with:

Page 5: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Generating the LSA Space● Create term-document matrix from original/OCR texts● Perform SVD to split the matrix into U, Σ , V*, where Σ is diagonal● Determine the optimal number of dimensions, k, to truncate

○ A “least-squares” problem○ Determined using “dimcalc_share”○ This method calculates all singular values and finds the first position where their sum is

greater than or equal to a certain percentage of the sum of all singular values (for this case, 50% is chosen as the default measure).

○ For our LSA space, k=12

● Truncate each matrix, and multiply them together to return to term-document format but with semantic information better highlighted

Page 6: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

LSA Conclusions/Results● Resultant term-document matrix is assumed to better incorporate

semantic similarities between terms and documents. Our results show this is the case

● Can be seen in the example below, as well as in the document similarity matrices.

Page 7: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Document Similarity Matrix (Pre-LSA)

Page 8: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Document Similarity Matrix (Post-LSA)

Page 9: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Network Analysis, Informal Overview● Assortativity: The measure of how correlated nodes are by degree.●● Centrality: A way to indicate hubs. ●● Density: the ratio of the edges in the network over all possible edges

between any pair of vertices. ●● Clustering Coefficient: A percentage that measures how well a

group of nodes are cliqued. ●● Weight: Strength of links/edge.

Closeness (top)Betweenness (bottom)

Assortativity

Page 10: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Frequency Network

● Used degree centrality (nodes with highest number of incident edges) to determine most significant terms

● Terms included: action, court, federal, government, information, law, people, police, prison, security, and support

Page 11: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

High Degree LSA NetworkNumber of nodes: 235Number of edges: 252Average degree: 2.1447Graph Density:

0.009165302782Average clustering: 0.03177123113293326Transitivity

0.01265822785

Assortativity: -0.916834

Page 12: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Centrality (>0.10)Betweenness Centrality Closeness Centrality

< 0.15 “Provide”, “security”, “success”, “system”, “war”

< 0.13 “Arrested”, “deadly”, “force”

< 0.23 “court”, “federal”, “law”, “national”, “serve”

< 0.15 “FBI”, “asserted”, “conduct”, “convictions”, “crime”, “criminal”, “custody”, “detained”, “detentions”, “disobedience”, “espionage”, “imminent”, “intent”, “law”, “preventive”, “rights”

< 0.30 “government” < 0.17 “Court”

Potential theme:Prosecution, security, civil disobedience

Page 13: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

High Degree Network - Line GraphInterpretation:Nodes with high degree centrality (>0.07) revolve around security and police:

('data', 'security'): 0.07171314741035856,('information', 'police'): 0.13147410358565736,('measures', 'security'): 0.07171314741035856,('messages', 'security'): 0.07171314741035856,('personal', 'police'): 0.07171314741035856,('police', 'public'): 0.07171314741035856,

Page 14: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

“People-Police” LSA Network● Created because nodes “people” and

“police” had highest degree rank in original network

●● Network is the union of the two terms

and their respective latent semantic info

●● Shared vertices are few: only “versus”

and “street”, depicting an explicit conflict

Page 15: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

Conclusion●

● The most connected and central terms illustrate a narrative of conflict that parallel Reynold-Stenson’s findings in her interviews.

● This is supported by how consequential terms such as “arrested”, “convictions”, “court”, “force”, “law”, and “rights” had high centrality measures relative to the rest of the network (>0.1).

● Trends across these texts could indicate trends in activist perceptions in the original sample.

● Further research with a larger data set and a more generalized demographic may allow for a more generalized conclusion and actual hypothesis testing, but this work provides potential hypotheses to test.

Page 16: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

AcknowledgementsSincere thanks is given to Heidi Reynolds-Stenson for providing our data and offering her conclusions, as well as to William Lippitt and Dr. Ildar Gabitov for facilitating this project.

Page 17: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

References1. Padraig Mac Carron and Ralph Kenna, Universal properties of mythological networks, EPL, 99 (2012) 28002, doi: 10.1209/0295-5075/99/28002

2. M.E.J. Newman, Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002)

3. David Easley and Jon Kleinberg, "Positive and negative relationships," in Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010. Print.

4. Anand, Ashish. "Complex Network Theory: An Introductory Tutorial." Department of Computer Science and Engineering. Indian Institute of Technology, Guwahati. 12 Sept. 2013. Lecture.

Page 18: Modeling Activist Discourse on State Repression Through ...math.arizona.edu/~gabitov/teaching/181/math_485/... · NetworkX Input text files into R ... For a certain document vector

References5. Fariss, C. J. & Schnakenberg, K. F. Measuring Mutual Dependence between State Repressive Actions. Journal of Conflict Resolution 58, (2014).

6. Noldus, R. & Mieghem, P. V. Assortativity in Complex Networks. Assortativity Survey (2015).

7. Turner, S. Success in Social Movements: Looking at Constitutional-Based Demands to Determine the Potential Success of Social Movements. (2013).

8. Saito, N. MAT 167: Applied Linear Algebra, Lecture 22: Text Mining. (2012).

9. Kim, K. (2018, March 12). Mathematical approach for Text Mining 1. Lecture presented in Ulsan National Institute of Science and Technology, Ulsan.