
Page 1: Readability and Linguistic Subjectivity of News

Readability and Linguistic Subjectivity of News

Ilias Flaounas

University of Bristol

February 22, 2011

I. Flaounas (University of Bristol) February 22, 2011 1 / 21

Page 2: Readability and Linguistic Subjectivity of News

Our research area


Page 3: Readability and Linguistic Subjectivity of News

Traditional Research on Media

few outlets per study (< 10)

limited numbers of news items (a few hundred in the best cases)

restricted time periods (a few days)

news items from a single country’s media

manual annotation (‘coding’)

commercial databases and their constraints

hypothesis driven research

Page 8: Readability and Linguistic Subjectivity of News

Research Focus

In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques.

‘Mainstream news-media’ since we do not focus on modern online-only means of spreading news, such as blogs or Twitter.

‘Textual’ since we use only the textual information of news rather than analysing e.g. images, videos, or speech.

‘Large-scale’ since we analyse hundreds of outlets, typically for extended periods of time, involving millions of news items.

‘Automated’ in the sense that the analysis is performed by applying Artificial Intelligence techniques rather than using human ‘coders’.

In our research, data management is also a challenge!


Page 9: Readability and Linguistic Subjectivity of News

NOAM: News Outlets Analysis & Monitoring system

I. Flaounas, O. Ali, M. Turchi, T. Snowsill, F. Nicart, T. De Bie, N. Cristianini: “NOAM: News Outlets Analysis and Monitoring System”, SIGMOD, accepted for publication, 2011.


Page 10: Readability and Linguistic Subjectivity of News

Current Status

Our corpus in numbers:

> 1300 multilingual news sources

> 3000 news feeds

133 countries

22 languages

> 3 years of continuous monitoring

40K news items per day

> 30M news items in total


Page 11: Readability and Linguistic Subjectivity of News

Support Vector Machines as Topic Taggers

We trained 15 topic taggers on 5 years of data from:

◮ Reuters

◮ NY Times

Typical text preprocessing: Stemming, stop-words removal, TF-IDF

Two-class SVMs

Cosine similarity

Empirical tuning of C parameter per tagger

We set the decision threshold to maximise the F0.5-Score on the test set
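The threshold selection in the last bullet can be sketched as follows. This is a minimal illustration, not the taggers' actual code: it assumes the classifier produces a real-valued decision score per article, and the function names, scores, and labels are hypothetical. With beta = 0.5, the F-score weights precision more heavily than recall.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scores, labels, beta=0.5):
    """Return (threshold, f_beta) maximising F-beta over candidate thresholds.

    scores: real-valued classifier outputs (e.g. SVM decision values)
    labels: True for articles that belong to the topic, False otherwise
    An article is tagged when its score is >= threshold.
    """
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(precision, recall, beta)
        if f > best[1]:
            best = (t, f)
    return best


# Hypothetical decision values and gold labels, for illustration only.
scores = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
labels = [False, False, True, False, True, True]
threshold, score = best_threshold(scores, labels)
```

In this toy example the chosen threshold gives up one true positive to avoid a false positive, which is the trade-off that F0.5 rewards.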


Page 12: Readability and Linguistic Subjectivity of News

SVM Taggers

Trained on Reuters & NY Times corpora

Topic             F0.5-Score  F0.5 Std.Dev.  Precision  Recall
CRIME                  78.92           1.51      82.93   66.59
DISASTERS              83.4            3.7       87.69   70.34
ELECTIONS              70.32           8.74      78.99   49.32
FASHION                83.88          18.61      94.61   71.27
INFLATION-PRICES       77.01           3.19      81.45   63.38
MARKETS                92.02           0.32      94.09   84.63
PETROLEUM              70.67           2.78      75.14   58.73
SCIENCE                73.63           5.17      83.72   50.62
SPORTS                 97.78           0.5       98.31   95.75
WEATHER                71.43           3.68      82.91   46.84
ART                    81.67           1.34      84.9    71.38
BUSINESS               81.16           1.19      86.23   65.87
ENVIRONMENT            64.29           4.26      73.48   43.7
POLITICS               73.81           2.29      76.65   64.81
RELIGION               74.95           4.21      83.57   53.59


Page 13: Readability and Linguistic Subjectivity of News

The experiment

The goal is to measure two writing style properties of news:

Linguistic Subjectivity

Readability

over different topics and outlets.

Corpus for the Experiment

10 months of monitoring (Jan. 1st, 2010 – Oct. 31st, 2010)

498 English-language media

99 different countries

2.5M articles that appeared in the ‘Main’ feed


Page 14: Readability and Linguistic Subjectivity of News

Articles Annotation

We annotated 926,411 articles with 1,037,359 tags, an average of 1.12 tags per article.

Topic             Articles   Topic       Articles
ART                  42896   MARKETS        24319
BUSINESS            126494   PETROLEUM      21236
CRIME               277626   POLITICS      201776
DISASTERS            83828   RELIGION       34441
ELECTIONS            28656   SCIENCE        10076
ENVIRONMENT          16103   SPORTS        141665
FASHION               1284   WEATHER         8505
INFLATION-PRICES      2331   Total        1037359


Page 15: Readability and Linguistic Subjectivity of News

Linguistic Subjectivity

We measure the number of sentimental adjectives over the total number of adjectives per article.

We detect adjectives using the Stanford POS tagger.

We measure sentiment using SentiWordNet.

We characterize an adjective as sentimental if either its positive or negative sentiment score is > 0.25.

10K items per topic randomly selected
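The ratio described on this slide can be sketched as below. This is a hedged illustration only: it assumes the adjectives have already been extracted by a POS tagger and that sentiment scores are available as a word-to-(positive, negative) mapping. The function name, the toy lexicon and its scores, and the treatment of unknown words are all assumptions rather than details taken from the talk.

```python
SENTIMENT_THRESHOLD = 0.25  # from the slide: a score > 0.25 marks an adjective as sentimental


def linguistic_subjectivity(adjectives, lexicon, threshold=SENTIMENT_THRESHOLD):
    """Fraction of an article's adjectives that are sentimental.

    adjectives: adjective tokens found in one article (e.g. from a POS tagger)
    lexicon: dict mapping a word to (positive_score, negative_score)
    Adjectives missing from the lexicon count as non-sentimental (an assumption).
    Returns None when the article has no adjectives.
    """
    if not adjectives:
        return None
    sentimental = 0
    for word in adjectives:
        pos, neg = lexicon.get(word.lower(), (0.0, 0.0))
        if pos > threshold or neg > threshold:
            sentimental += 1
    return sentimental / len(adjectives)


# Toy lexicon with made-up scores; real scores would come from SentiWordNet.
toy_lexicon = {
    "terrible": (0.0, 0.75),
    "great": (0.75, 0.0),
    "annual": (0.0, 0.0),
    "local": (0.0, 0.125),
}
article_adjectives = ["Terrible", "annual", "local", "great", "financial"]
ls = linguistic_subjectivity(article_adjectives, toy_lexicon)  # 2 of 5 adjectives
```

A per-topic figure would then be the mean of this value over the sampled articles of that topic.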

Page 19: Readability and Linguistic Subjectivity of News

Validation of Linguistic Subjectivity?

This is a challenge due to the lack of a gold standard. But we found that:

Editorials and Opinion articles are more linguistically subjective compared to average.

◮ 5766 Ed/Op articles from 57 different sources.

◮ LS mean value 26.15% (std. dev. of 0.29%)

◮ Articles in Main-feed have mean LS of 19.45% (std. dev. 0.22%).

Popular articles are more linguistically subjective compared to average (based on 108,516 popular articles in the period of study).

UK tabloids are more linguistically subjective compared to broadsheets.


Page 20: Readability and Linguistic Subjectivity of News

Linguistic Subjectivity of Topics

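[Figure: bar chart of Linguistic Subjectivity per topic, axis from 0 to 25. Labels in the order they appear on the chart: POLITICS, ELECTIONS, BUSINESS, SCIENCE, ENVIRONMENT, RELIGION, PETROLEUM, RANDOM, SPORTS, PRICES, MOST POP., WEATHER, MARKETS, ART, CRIME, DISASTERS, FASHION]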


Page 21: Readability and Linguistic Subjectivity of News

Readability

We measure readability based on the Flesch Reading Ease Test

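FRET(article) = 206.835 − (1.015 · ASL) − (84.6 · ASW)

where ASL is the average sentence length (words per sentence) and ASW is the average number of syllables per word.

Scores typically range from 0 to 100.

The higher the FRET, the easier the text is to read.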

10K items per topic randomly selected

As validation we checked the readability of CBBC Newsround. It was the most readable set of articles, with a mean score of 62.50.
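The FRET formula on this slide can be computed as in the sketch below. It is an illustration under stated assumptions: the sentence splitter, the word tokenizer, and especially the vowel-group syllable counter are naive stand-ins chosen here, not the tools used in the study, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(asl, asw):
    """FRET = 206.835 - 1.015 * ASL - 84.6 * ASW."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def fret_of_text(text):
    """Compute FRET for raw text using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return flesch_reading_ease(asl, asw)


# Made-up sample text, for illustration only.
sample = "The cat sat on the mat. It was a sunny day."
score = fret_of_text(sample)
```

Very short sentences of short words, as in the sample, can score above 100 with this formula, so the 0 to 100 range is best read as typical rather than strict.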


Page 22: Readability and Linguistic Subjectivity of News

Readability of Topics

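[Figure: bar chart of Readability per topic, axis from 0 to 50. Labels in the order they appear on the chart: POLITICS, ENVIRONMENT, PRICES, SCIENCE, BUSINESS, ELECTIONS, RELIGION, PETROLEUM, CRIME, RANDOM, MARKETS, MOST POP., DISASTERS, WEATHER, FASHION, ART, SPORTS]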


Page 23: Readability and Linguistic Subjectivity of News

Readability vs. Linguistic Subjectivity on Topics

[Figure: scatter plot of Readability (vertical axis, 36 to 50) against Linguistic Subjectivity (horizontal axis, 14 to 28), one point per topic: ART, BUSINESS, ENVIRONMENT, POLITICS, RELIGION, CRIME, DISASTERS, ELECTIONS, FASHION, MARKETS, PETROLEUM, PRICES, SCIENCE, SPORTS, WEATHER]


Page 24: Readability and Linguistic Subjectivity of News

Outlets

We compare the Readability and Linguistic Subjectivity of:

8 US newspapers

8 UK newspapers (4 Tabloids / 4 Broadsheets)

Newspaper                 Articles   Newspaper        Articles
Chicago Tribune               5477   Daily Mail          24326
Daily News                    2212   Daily Mirror         7731
Los Angeles Times             6696   Daily Star           8946
New York Post                32033   Daily Telegraph     22682
NY Times                     11508   Independent         43557
The Wall Street Journal      12300   The Guardian        15393
The Washington Post           7228   The Sun              9048
USA Today                     6208   The Times            2957


Page 25: Readability and Linguistic Subjectivity of News

Linguistic Subjectivity of Outlets

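[Figure: bar chart of Linguistic Subjectivity per outlet, axis from 0 to 30. Labels in the order they appear on the chart: The Wall Street Journal, The Washington Post, USA Today, The Times, Los Angeles Times, NY Times, Daily Telegraph, The Guardian, Chicago Tribune, Daily Star, New York Post, Independent, Daily Mail, Daily News, Daily Mirror, The Sun]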


Page 26: Readability and Linguistic Subjectivity of News

Readability of Outlets

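[Figure: bar chart of Readability per outlet, axis from 0 to 60. Labels in the order they appear on the chart: The Guardian, USA Today, Daily Mail, Daily Star, The Washington Post, Los Angeles Times, The Wall Street Journal, Daily News, Daily Telegraph, New York Post, NY Times, The Times, Chicago Tribune, Independent, Daily Mirror, The Sun]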


Page 27: Readability and Linguistic Subjectivity of News

Readability vs. Linguistic Subjectivity on Outlets

[Figure: scatter plot of Readability (vertical axis, 30 to 60) against Linguistic Subjectivity (horizontal axis, 15 to 30), one point per outlet: Chicago Tribune, Daily Mail, Daily Mirror, Daily News, Daily Star, Daily Telegraph, Independent, Los Angeles Times, New York Post, NY Times, The Guardian, The Sun, The Times, The Wall Street Journal, The Washington Post, USA Today]


Page 28: Readability and Linguistic Subjectivity of News

More info and results at: http://mediapatterns.enm.bris.ac.uk

Thank you!
