On the Coverage of Science in the Media - A Big Data Study on the Impact of the Fukushima Disaster
Thomas Lansdall-Welfare and Nello Cristianini
Department of Computer Science, University of Bristol
BigData '14
2015/8/3 (Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting
Keywords: Data analysis; Text mining; Knowledge discovery; Computational linguistics



Introduction
•  This work analyzes online news to explore the impact of the Fukushima disaster on media representations of nuclear power.

Introduction
•  A corpus of news articles is used to detect changes in media coverage before and after the event:
–  the evolution of attention and sentiment towards nuclear power
–  the networks of actors and actions linked to nuclear power
–  the network of topics

Introduction
•  The key finding is that media attitude towards nuclear power changed significantly in the wake of the Fukushima disaster.

Data Description
•  This analysis focuses only on science news articles, in order to monitor how the reporting of science has changed.
–  number: over 5 million science articles
–  period: about five years, from 1st May 2008 to 31st December 2013

NOAM: News Outlets Analysis and Monitoring System, SIGMOD 2011.

Data Description
•  News articles are labeled as science articles in one of two ways:
–  all news articles coming from an online news feed that was explicitly hand-annotated
–  news articles automatically classified into different generic news categories

Methodology
•  This work focuses on analyzing the context in which different scientific concepts and their associated actors are covered in the news.
–  the Apache Hadoop framework
–  MongoDB
–  ElasticSearch

Methodology
•  Extracting References
–  References are extracted from the corpus of science articles by matching against lists of the items we wish to detect.
–  scientific topics: Wikipedia
–  universities: QS World University Rankings
–  diseases: Wikipedia
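The list-based extraction step can be illustrated with a small sketch. This is not the authors' implementation: the item lists, the article texts, and the function names `compile_patterns` and `extract_references` are all assumptions made for illustration, and simple whole-phrase matching stands in for whatever matching the study actually used.

```python
import re
from collections import defaultdict

# Hypothetical item lists; the slides only name the sources of the real lists
# (Wikipedia, QS World University Rankings).
ITEM_LISTS = {
    "scientific topic": ["nuclear power", "solar energy"],
    "university": ["University of Bristol"],
    "disease": ["influenza"],
}

def compile_patterns(item_lists):
    """Build one case-insensitive, whole-phrase regex per listed item."""
    patterns = {}
    for category, items in item_lists.items():
        for item in items:
            patterns[(category, item)] = re.compile(
                r"\b" + re.escape(item) + r"\b", re.IGNORECASE
            )
    return patterns

def extract_references(articles, patterns):
    """Map each article id to the set of (category, item) pairs it mentions."""
    references = defaultdict(set)
    for article_id, text in articles.items():
        for key, pattern in patterns.items():
            if pattern.search(text):
                references[article_id].add(key)
    return dict(references)

# Toy articles, made up for this example.
articles = {
    "a1": "Nuclear power output fell after the reactor shutdown.",
    "a2": "Researchers at the University of Bristol study influenza.",
    "a3": "Local football results.",
}
references = extract_references(articles, compile_patterns(ITEM_LISTS))
```

Articles that mention no listed item, such as `a3` here, simply do not appear in the result.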

Methodology
•  Generating Time Series
–  two types of time series:
–  1. the amount of attention a given item receives
–  2. the sentiment surrounding a given item over time
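A minimal sketch of the two series, under stated assumptions: attention is taken here as the share of a month's articles that mention the item, and sentiment as the mean score from a tiny word lexicon. The slides do not say how either quantity was computed, so the lexicon, the monthly granularity, and the function name `monthly_series` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon (word -> polarity), for illustration only.
LEXICON = {"safe": 1, "clean": 1, "disaster": -1, "danger": -1}

def monthly_series(articles, item):
    """Return {month: (attention, sentiment)} for one item.

    attention = share of that month's articles mentioning the item
    sentiment = mean lexicon score over the articles mentioning the item,
                or None when no article mentions it
    """
    total = defaultdict(int)
    scores = defaultdict(list)
    for month, text in articles:
        total[month] += 1
        lowered = text.lower()
        if item in lowered:
            words = (w.strip(".,") for w in lowered.split())
            scores[month].append(sum(LEXICON.get(w, 0) for w in words))
    series = {}
    for month in sorted(total):
        hits = scores.get(month, [])
        attention = len(hits) / total[month]
        sentiment = sum(hits) / len(hits) if hits else None
        series[month] = (attention, sentiment)
    return series

# Toy data, made up for this example; not results from the study.
articles = [
    ("2011-02", "Nuclear power is a clean and safe option."),
    ("2011-02", "New telescope launched."),
    ("2011-03", "Nuclear power plant disaster raises danger of leaks."),
    ("2011-03", "Nuclear power faces review after disaster."),
]
series = monthly_series(articles, "nuclear power")
```

Normalizing attention by the number of articles per month keeps the series comparable when the size of the corpus varies over time.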

Methodology
•  Mining Associations
–  Associations between the items were obtained by performing association rule mining using the FP-Growth algorithm.
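To make the FP-Growth step concrete, here is a compact pure-Python sketch of the frequent-itemset part of the algorithm. It assumes each article is reduced to the set of items it mentions; the names `build_tree`, `fp_growth` and `frequent_itemsets`, the toy baskets, and the support threshold are illustrative choices, not details from the study.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header, frequent)."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in transactions:
        # Keep frequent items, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Recursively mine frequent itemsets; returns {frozenset: support}."""
    header, frequent = build_tree(transactions, min_support)
    results = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        results[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        results.update(fp_growth(conditional, min_support, itemset))
    return results

def frequent_itemsets(baskets, min_support):
    """Entry point: each basket is the set of items found in one article."""
    return fp_growth([(basket, 1) for basket in baskets], min_support)

# Toy baskets, made up for this example.
baskets = [
    {"nuclear power", "radiation", "Fukushima"},
    {"nuclear power", "radiation"},
    {"nuclear power", "solar energy"},
    {"solar energy", "wind power"},
]
itemsets = frequent_itemsets(baskets, min_support=2)
```

Association rules then follow from the supports: the confidence of a rule "radiation implies nuclear power" is the support of the pair divided by the support of "radiation".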

Methodology
•  Extracting Triplets and Action Clouds
–  Triplets matching the form Subject-Verb-Object are extracted from the text.
–  An action cloud for an item is built by aggregating all the verbs from the triplets in which that item is the subject or the object.
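The aggregation into an action cloud can be sketched as a verb count. The sketch assumes the Subject-Verb-Object triplets have already been extracted by a parser, which is not shown; the function name `action_cloud` and the example triplets are made up for illustration.

```python
from collections import Counter

def action_cloud(triplets, item):
    """Count the verbs of all triplets where `item` is the subject or the object."""
    item = item.lower()
    return Counter(
        verb.lower()
        for subject, verb, obj in triplets
        if item in (subject.lower(), obj.lower())
    )

# Toy (subject, verb, object) triplets, made up for this example.
triplets = [
    ("Japan", "shut", "nuclear power"),
    ("nuclear power", "supply", "electricity"),
    ("Germany", "abandon", "nuclear power"),
    ("Japan", "shut", "nuclear power"),
    ("scientists", "study", "influenza"),
]
cloud = action_cloud(triplets, "nuclear power")
```

The counts in `cloud` can serve as weights when drawing the cloud, and comparing clouds built from articles before and after an event shows which actions gained or lost prominence.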

Result
•  Attention is focused on the topic of “Nuclear Power”,
–  showing how a big data approach to corpus analysis can reveal information about its media coverage.

Result
•  Evolution of Attention

Result
•  Evolution of Sentiment

•  Associations
[Figure: associations, panels “Before” and “After”]

[Figure: panels “Before” and “After”]

Conclusion
•  The findings give insight into the change in global media reporting
–  on nuclear power following the nuclear disaster in Fukushima.

Conclusion
•  The methodology presents a comprehensive way to monitor critical events and their media coverage.
•  The innovative character of these techniques opens up new possibilities in social scientific research.