On the Coverage of Science in the Media: A Big Data Study on the Impact of the Fukushima Disaster
Thomas Lansdall-Welfare and Nello Cristianini
Department of Computer Science, University of Bristol
BigData '14

2015/8/3 (Mon.) Chang Wei-Yuan @ MakeLab Lab Meeting

Keywords: Data analysis; Text mining; Knowledge discovery; Computational linguistics
Introduction
• This work analyzes online news to explore the impact of the Fukushima disaster on media representations of nuclear power.
Introduction
• A corpus of news articles is used to detect the impact on the media before and after the event:
– the evolution of attention to, and sentiment about, nuclear power
– the networks of actors and actions linked to nuclear power
– the network of topics
Introduction
• The key finding is that the media attitude towards nuclear power changed significantly in the wake of the Fukushima disaster.
Data Description
• This analysis focuses only on science news articles, in order to monitor how the reporting of science has changed.
– number: over 5 million science articles
– period: about five years, from 1st May 2008 to 31st December 2013
NOAM: News Outlets Analysis and Monitoring System, SIGMOD 2011.
Data Description
• News articles are labeled as science articles in one of two ways:
– by source: all articles coming from an online news feed that was explicitly hand-annotated
– by classifier: articles are automatically classified into different generic news categories
Methodology
• This work focuses on analyzing the context in which different scientific concepts and their associated actors are reported.
– the Apache Hadoop framework
– MongoDB
– ElasticSearch
Methodology
• Extracting References
– References are extracted from the corpus of science articles by matching against lists of the items we wish to detect.
– scientific topics: Wikipedia
– universities: QS World University Rankings
– diseases: Wikipedia
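The list-based extraction step can be sketched as follows. This is a minimal illustration, assuming simple case-insensitive phrase matching; the slides do not say how matching is actually done, and the function names, topic list, and sample sentence are hypothetical.

```python
import re

def build_matcher(items):
    """Compile one case-insensitive regex matching any listed item as a whole phrase."""
    # Try longer names first so a longer item wins over a shorter one it contains.
    ordered = sorted(items, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)

def extract_references(text, matcher):
    """Return the set of listed items (lower-cased) that the text mentions."""
    return {m.group(0).lower() for m in matcher.finditer(text)}

# Hypothetical item list; the slides draw scientific topics from Wikipedia.
topics = ["nuclear power", "climate change", "stem cell"]
matcher = build_matcher(topics)

# Made-up sentence, for illustration only.
article = "After Fukushima, debate over nuclear power overshadowed climate change."
refs = extract_references(article, matcher)
print(refs)
```

Compiling the whole list into one pattern keeps the scan to a single pass per article, which matters for a corpus of several million articles.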
Methodology
• Generating Time Series
– two types of time series:
– 1. the amount of attention a given item receives
– 2. the sentiment surrounding a given item over time
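The two kinds of series can be sketched as below. The slides do not define how attention or sentiment is measured, so this sketch assumes attention is the monthly share of articles mentioning the item and sentiment is a toy word-list score; the lexicon, function names, and sample articles are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sentiment lexicon, for illustration only.
POSITIVE = {"safe", "clean", "benefit"}
NEGATIVE = {"disaster", "risk", "fear"}

def sentiment_score(text):
    """Positive minus negative word count, divided by the total word count."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def monthly_series(articles, item):
    """Return {(year, month): (attention, mean_sentiment)} for one item.

    attention is the share of that month's articles mentioning the item;
    mean_sentiment averages sentiment_score over the mentioning articles
    (None when no article in that month mentions the item).
    """
    totals = defaultdict(int)
    scores = defaultdict(list)
    for day, text in articles:
        key = (day.year, day.month)
        totals[key] += 1
        if item in text.lower():
            scores[key].append(sentiment_score(text))
    series = {}
    for key in sorted(totals):
        hits = scores[key]
        attention = len(hits) / totals[key]
        mean = sum(hits) / len(hits) if hits else None
        series[key] = (attention, mean)
    return series

# Made-up articles as (publication date, text) pairs.
articles = [
    (date(2011, 2, 10), "nuclear power is clean"),
    (date(2011, 2, 20), "new stem cell study"),
    (date(2011, 3, 15), "nuclear power disaster"),
    (date(2011, 3, 16), "fear of nuclear power risk"),
]
series = monthly_series(articles, "nuclear power")
```

Normalizing attention by the monthly article count, rather than using raw counts, keeps the series comparable when the volume of collected news changes over time.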
Methodology
• Mining Associations
– Associations between the items were obtained by performing association rule mining using the FP-Growth algorithm.
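To make the notions of support and confidence concrete, here is a small sketch that mines rules between pairs of items by brute-force counting. It is not FP-Growth: FP-Growth reaches the same frequent itemsets far more efficiently by compressing the transactions into a prefix tree, which is what makes it practical on a large corpus. The function name, thresholds, and transactions are hypothetical; each transaction stands for the set of items referenced in one article.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of articles containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Share of articles containing lhs that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Made-up transactions, for illustration only.
transactions = [
    {"nuclear power", "fukushima", "radiation"},
    {"nuclear power", "fukushima"},
    {"nuclear power", "energy policy"},
    {"stem cell", "cancer"},
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
```

With these sample transactions the only surviving rule is "fukushima" implies "nuclear power": the pair appears in half of the articles, and every article mentioning Fukushima also mentions nuclear power. The reverse rule is dropped because only two of the three nuclear power articles mention Fukushima.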
Methodology
• Extracting Triplets and Action Clouds
– Triplets of the form Subject-Verb-Object are extracted from the text.
– An action cloud for an item is built by aggregating all the verbs from the triplets in which that item is the subject or the object.
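The aggregation behind an action cloud can be sketched as follows, assuming the Subject-Verb-Object triplets have already been extracted by a parser (that step is not shown). The function name and the triplets are made up for illustration.

```python
from collections import Counter

def action_cloud(triplets, item):
    """Count the verbs of triplets in which `item` is the subject or the object."""
    as_subject = Counter()
    as_object = Counter()
    for subject, verb, obj in triplets:
        if subject == item:
            as_subject[verb] += 1
        if obj == item:
            as_object[verb] += 1
    return as_subject, as_object

# Made-up (subject, verb, object) triplets, for illustration only.
triplets = [
    ("germany", "abandon", "nuclear power"),
    ("japan", "review", "nuclear power"),
    ("nuclear power", "generate", "electricity"),
    ("protesters", "oppose", "nuclear power"),
    ("france", "review", "nuclear power"),
]
subject_verbs, object_verbs = action_cloud(triplets, "nuclear power")
```

The two counters are kept separate here so that actions performed by the item can be told apart from actions performed on it; adding them together (`subject_verbs + object_verbs`) gives a single combined cloud, with the counts serving as word sizes.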
Result
• The attention on the topic of “Nuclear Power”
– shows how a big data approach to corpus analysis can reveal information.
Conclusion
• The findings give insight into how global media reporting changed:
– the reporting on nuclear power following the nuclear disaster in Fukushima