
Page 1: New Software and Tools for Analyzing …vanatteveldt.com/p/ica2017_tools.pdfCollecting and Analyzing Social Media Data Using SocialMediaLab Timothy John Graham, The Australian National

New Software and Tools for Analyzing Communication

http://ica-cm.org/ica2017-tools/


Collecting and Analyzing Social Media Data Using SocialMediaLab

Timothy John Graham, The Australian National U
Robert Ackland, Australian National U


SocialMediaLab R Package

Aims to be the “Swiss army knife” for collecting social media data via free APIs and constructing datasets for network and text analysis

• Tim Graham (ANU, @TimothyJGraham)

• Rob Ackland (ANU, @RobAckland)

• Chung-hong Chan (Univ. of Hong Kong, @chainsawriot); new UI using magrittr

Download: https://cran.r-project.org/web/packages/SocialMediaLab/index.html

Tutorials / help: http://vosonlab.net/SocialMediaLab


SocialMediaLab data typology


Workflow: code to data to network


• Collect the 500 latest tweets from #ica17 and construct an “actor” network showing replies + mentions + retweets between users

Try it yourself! https://goo.gl/8Rb1sA
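
The workflow above (tweets in, actor network out) can be sketched in plain Python. This is only an illustration of the idea, not SocialMediaLab's R interface: the tweet records and their field names (`user`, `reply_to`, `mentions`, `retweet_of`) are hypothetical stand-ins for whatever a Twitter collection returns.

```python
from collections import Counter

def build_actor_network(tweets):
    """Return a Counter mapping (source, target, tie_type) to edge weight."""
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"]
        # A reply creates a tie from the author to the user replied to.
        if tweet.get("reply_to"):
            edges[(source, tweet["reply_to"], "reply")] += 1
        # A retweet creates a tie from the retweeter to the original author.
        if tweet.get("retweet_of"):
            edges[(source, tweet["retweet_of"], "retweet")] += 1
        # Each @mention creates a tie from the author to the mentioned user.
        for target in tweet.get("mentions", []):
            edges[(source, target, "mention")] += 1
    return edges

# Hypothetical sample records, not real #ica17 data.
sample_tweets = [
    {"user": "alice", "mentions": ["bob"], "reply_to": None, "retweet_of": None},
    {"user": "bob", "mentions": [], "reply_to": "alice", "retweet_of": None},
    {"user": "carol", "mentions": ["alice", "bob"], "reply_to": None, "retweet_of": "alice"},
]

network = build_actor_network(sample_tweets)
for (source, target, tie_type), weight in sorted(network.items()):
    print(f"{source} -> {target} [{tie_type}] x{weight}")
```

Keeping the tie type in the edge key makes it possible to analyze replies, mentions, and retweets separately or to collapse them into one weighted network.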


Virtual Observatory for the Study of Online Networks (VOSON)

Robert Ackland, Australian National U


Virtual Observatory for the Study of Online Networks (VOSON) software

Web-based tool – originally for hyperlink network construction and analysis – from June 2017 VOSON includes Twitter collection

Created at the Australian National University - VOSON Lab http://vosonlab.net

Since 2010 VOSON has been commercially hosted and developed by Uberlink http://www.uberlink.com

– used by academics, students, analysts worldwide

– over 2500 user accounts issued


Uberlink VOSON development team:

– Rob Ackland (ANU, Uberlink Founder & CEO)

– Jamsheed Shorish (Uberlink CTO)

– Francisca Borquez (Uberlink Communication Officer & Research Assistant)

VOSON 2.5 will be released 6 June 2017

– Improved user interface/workflow

– More flexibility with database naming (e.g. special characters)

– Collect Twitter data from the real-time stream of tweets matching your search criteria (e.g. hashtag use) over a scheduled time period.


1. Scheduling a 1 hour collection on hashtag #ica17

2. While the collection runs, the number of nodes (Twitter users) collected is updated.

3. @mention tie network and key SNA metrics
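
To make step 3 concrete, here is a small sketch of metrics that are commonly reported for a directed @mention network. The slides do not say which metrics VOSON computes, so the choice of in-degree, out-degree, and density is an assumption, and the edge list is hypothetical.

```python
from collections import defaultdict

def sna_metrics(edges):
    """Compute in-degree, out-degree and density for a directed edge list."""
    nodes = set()
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    unique_edges = set(edges)  # repeated ties between the same pair count once
    for source, target in unique_edges:
        nodes.update((source, target))
        out_degree[source] += 1
        in_degree[target] += 1
    n = len(nodes)
    # Density of a directed graph, assuming no self-mentions: ties / (n * (n - 1)).
    density = len(unique_edges) / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": len(unique_edges),
        "density": density,
        "in_degree": dict(in_degree),
        "out_degree": dict(out_degree),
    }

# Hypothetical @mention ties as (mentioning user, mentioned user).
mention_edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),  # repeated tie
]

metrics = sna_metrics(mention_edges)
print(metrics)
```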


Same, Same? Ensuring Comparative Equivalence in the Semantic Analysis of Heterogeneous, Multilingual Corpora

Christian Baden, Hebrew U of Jerusalem


Christian Baden | Noah Mozes Department of Communication & Journalism | Ensuring Comparative Equivalence

67th ICA Annual Conference | San Diego, CA, USA | 26-05 11 2017

PUNCHING THE BAG OF WORDS

…and some related approaches

BAG: Assumptions of relation homogeneity

WORDS: Assumptions about meaning uniqueness

JAMCODE


IN THE BEGINNING, THERE WAS THE WORD…

Lexical units are different from unique meanings:

Languages, jargons, etc.

Synonymy, Nicknames, Acronyms, etc.

Polysemy

Roots & Inflections

Partial meaning: Metonymy, Metaphor, etc.

Nested meanings

Polygrams: named entities, standing expressions

Anaphora, Coreference & Exophora

Need to map words as pointers onto meanings

DICTIONARY

‘Trump’ or ‘The Donald’ or ‘US President’ or ‘Agent Orange’ or …

…but not ‘Trump card’, ‘Ivanka Trump’, …

…only after 20 January 2017

Trump’s Valueless Foreign Policy
www.nytimes.com | Roger Cohen | 2 May 2017

So the threats were no more than bluster, and all is well. That is one view of President Trump’s foreign policy at the 100-day-or-so mark.

Wrong.

Yes, there’s no sign of the Wall, and NATO is no longer “obsolete,” and the Iran nuclear deal is still in place, […]

Defense Secretary Jim Mattis and H.R. McMaster, the national security adviser, have ring-fenced Trump’s recklessness and bellicosity. They have neutralized his ignorance even if nobody can help the president grasp its extent. Some of the loonier members of the president’s entourage have been fired or marginalized. Adults have taken charge. There is still a lot of noise, but “America First” has not upended the world.

The Wall: Music album by Pink Floyd

wall (n.): Continuous vertical brick or stone structure

fired (1): job contract ended

fired (2): launched projectile

The Wall: Trump’s policy proposal to protect border with Mexico
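
The dictionary example above (alternative surface forms, exclusions, and a date condition) can be sketched as follows. This is a rough illustration rather than the tool shown on the slide. The slide does not say which terms the date condition governs, so attaching it to ‘US President’ is an assumption, as are all names in the code.

```python
import re
from datetime import date

# Surface forms pointing to one concept, each with an optional
# "only valid after" date (assumed here to apply to 'US President').
TERMS = [
    ("Trump", None),
    ("The Donald", None),
    ("Agent Orange", None),
    ("US President", date(2017, 1, 20)),
]
# Phrases that contain a surface form but do not refer to the concept.
EXCLUSIONS = ["Trump card", "Ivanka Trump"]

def find_concept(text, published):
    """Return the matched surface forms in order of appearance."""
    blocked = [
        m.span()
        for phrase in EXCLUSIONS
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text)
    ]
    hits = []
    for term, valid_after in TERMS:
        if valid_after is not None and published <= valid_after:
            continue  # term does not point to this concept yet
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            # Skip matches that overlap an excluded phrase.
            if not any(m.start() < end and start < m.end() for start, end in blocked):
                hits.append((m.start(), m.group()))
    return [surface for _, surface in sorted(hits)]

sample_text = "Trump played his Trump card while Ivanka Trump watched. The US President spoke."
hits_2017 = find_concept(sample_text, date(2017, 5, 2))
hits_2016 = find_concept(sample_text, date(2016, 12, 1))
print(hits_2017, hits_2016)
```

Polysemy (for example, the two senses of ‘fired’) is not handled here; that would need context beyond a simple dictionary lookup.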


Content Analysis Tool


LETTING THE CAT OUT OF THE BAG

Co-presence is related to relatedness,

but the relationship is complicated.

Macrosyntax: Headlines, Turns, Lists, etc.

Syntax: Clauses, Parentheses, Sentences, etc.

Conjunctions, Grammatic Roles & POS

Anaphora, Coreference & Exophora

Sequence & Proximity

Stylistic devices: Rhymes, Puns, Alliterations

Register & Contextual Knowledge

JAMCODE

Shallow parsing of Syntax and Macrosyntax

Scoring based on Proximity & Sequence

Imputation based on Contextual Discourse

Multi-feature Probabilistic Relatedness

Micro & Macro Order: clause, sentence, paragraph, title, …

Local Coherence: proximity probability

Intertextual Inference: Context Model
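
The idea of scoring relatedness by proximity rather than by mere co-presence can be illustrated with a toy example. JAMCODE's actual scoring is not specified on the slides. The sketch below is an assumed stand-in that treats sentences as hard boundaries and lets the weight of a co-occurrence decay with token distance; the function name and the `decay` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def proximity_scores(sentences, concepts, decay=0.5):
    """Score concept pairs: closer co-occurrences within a sentence weigh more."""
    scores = defaultdict(float)
    for sentence in sentences:
        # Naive whitespace tokenization; punctuation handling is omitted.
        tokens = sentence.lower().split()
        positions = [(i, tok) for i, tok in enumerate(tokens) if tok in concepts]
        for (i, a), (j, b) in combinations(positions, 2):
            if a == b:
                continue
            distance = j - i  # at least 1, since positions are in order
            pair = tuple(sorted((a, b)))
            # Adjacent tokens get weight 1; each extra step multiplies by decay.
            scores[pair] += decay ** (distance - 1)
    return dict(scores)

# Hypothetical pre-cleaned sentences.
sentences = [
    "nato is no longer obsolete",
    "the wall and the iran deal",
    "trump praised nato",
]
concepts = {"nato", "obsolete", "wall", "iran", "trump"}

scores = proximity_scores(sentences, concepts)
print(scores)
```

A plain bag-of-words model would treat every pair in the same document as equally related; here ‘trump’ and ‘wall’ get no score because they never share a sentence.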


Automatic Text Analysis Made Easy: Using AmCAT, NLPipe, and R

For Corpus Management, Linguistic Processing, and Automatic Text Analysis

Wouter van Atteveldt, VU Amsterdam
Kasper Welbers, U Leuven
et al.


AmCAT - Easy document management and querying

● Manage large text collections
  ○ Rights management for multiple users
  ○ Upload plain text, CSV, PDF, LexisNexis, …
● Complex keyword queries
● Quantitative manual coding
● Available for use!
  ○ Free and open source
  ○ Use amcat.nl, set up your own server, or use the Docker image


An API for power users

● All functionality available through API
● Use python/R to manage and analyse data
● Scrape and upload articles
● Conduct automatic queries
● Download text, metadata
● Upload new article sets
● Create projects, users, etc.
● Workflow:
  ○ Corpus/project management and explorative analysis using website
  ○ Reproducible queries using API
  ○ Download text or results and connect to other tools (topicmodels etc)


NLPipe - easy NLP processing

● Setting up NLP tools can be challenging
  ○ Lemmatizing, POS tagging, parsing
  ○ Need to download tools, fix prerequisites, install
● NLPipe provides a simple interface to multiple tools
  ○ CoreNLP, Alpino, Frog
  ○ Connect from R, python
  ○ Works on local computer or distributed (server/worker/clients)
● Can be installed as docker image for server/workers
● Easy to connect to quanteda, corpustools, tm, ...


Facebook Page Data Extraction for Nonprogrammers: Introducing the Netvizz and Facepager Tools

Michael Che Ming Chan, Chinese U of Hong Kong


NETVIZZ (Rieder, 2013)

Online access through https://apps.facebook.com/netvizz/. Must have Facebook account.

Use

Facebook Page ID

What posts to extract

Data to output


FACEPAGER (Keyling & Jünger, 2016)

Download program from https://github.com/strohne/Facepager. PC or Mac version. Must have Facebook account.

Customize field output to database file

User Facebook credentials

Data request commands


SOME ANALYTICAL POSSIBILITIES


Corpustools: An R Package for Text Analysis Beyond Bags of Words

Kasper Welbers, U of Leuven
Wouter van Atteveldt, VU Amsterdam


Kasper Welbers, KU Leuven & Wouter van Atteveldt, VU

corpustools: An R package for text analysis beyond bag-of-words

Why another R corpus package?

- Focus on maintaining token data

Full text tokens bag-of-words

doc_id     token       token_index  lemma       pos
111541965  It          1            it          O
111541965  is          2            be          V
111541965  our         3            we          O
111541965  unfinished  4            unfinished  A
111541965  task        5            task        N
111541965  to          6            to          ?
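
To show why keeping token data matters, the sketch below contrasts the token table from the slide with a bag-of-words derived from it. It is plain Python for illustration, not the corpustools API, and reading the POS codes `A` and `N` as adjective and noun is an assumption.

```python
from collections import Counter

# Token rows as on the slide: (doc_id, token, token_index, lemma, pos).
tokens = [
    (111541965, "It", 1, "it", "O"),
    (111541965, "is", 2, "be", "V"),
    (111541965, "our", 3, "we", "O"),
    (111541965, "unfinished", 4, "unfinished", "A"),
    (111541965, "task", 5, "task", "N"),
    (111541965, "to", 6, "to", "?"),
]

# Bag-of-words: lemma counts per document; order and position are lost.
bag_of_words = Counter((doc_id, lemma) for doc_id, _, _, lemma, _ in tokens)

# Token data keeps positions, so the original text can be rebuilt.
ordered = sorted(tokens, key=lambda row: (row[0], row[2]))
text = " ".join(tok for _, tok, _, _, _ in ordered)

# Positional questions remain answerable, e.g. adjective directly before noun.
adj_noun_pairs = [
    (a[1], b[1])
    for a, b in zip(ordered, ordered[1:])
    if a[0] == b[0] and b[2] == a[2] + 1 and a[4] == "A" and b[4] == "N"
]

print(bag_of_words[(111541965, "be")])
print(text)
print(adj_noun_pairs)
```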


The tCorpus class

An R6 class
– Reference class, to prevent unnecessary copies
– Intuitive syntax for methods
– Clear distinction public/private

Token and meta data use the data.table package
– Memory efficient and fast

Some cool features:

- Basic preprocessing
- Word co-occurrence
- Document similarity and fuzzy deduplication
- Complex Boolean queries
- Keyword + condition queries
- KWIC that also supports co-occurrence
- Vocabulary comparison
- Annotating the token data based on various analyses, e.g., LDA
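
As an example of a feature that depends on token order, here is a minimal keyword-in-context (KWIC) sketch in plain Python. It is not the corpustools implementation; the function name and the `window` parameter are hypothetical, and the sample reuses the six tokens shown in the token table.

```python
def kwic(tokens, keyword, window=2):
    """Yield (left context, keyword, right context) for each match."""
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

sample = "It is our unfinished task to".split()
results = list(kwic(sample, "task", window=2))
for left, hit, right in results:
    print(f"{left} [{hit}] {right}")
```

A bag-of-words representation cannot support this kind of lookup, because the surrounding tokens are no longer available once positions are discarded.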

In progress:

- dealing with data that doesn't fit in memory
- making fancy text browsers with annotations (being developed as the tokenbrowser package)


MPPA

Code: github.com/kasperwelbers/corpustools

CRAN: planned for this summer

Related packages: RNewsflow, tokenbrowser (formerly topicbrowser), semnet



Interactive Tool demos

http://ica-cm.org/ica2017-tools/