Version Controlling the News. How We Can Archive. SXSW Presentation on NewsDiffs

Post on 15-May-2015

DESCRIPTION

Version Controlling the News. How We Can Archive. SXSW Presentation on NewsDiffs. A look at how version control software can track the changes of online articles in The New York Times, Politico, BBC and CNN. Eric Price and Margaret Sullivan, Public Editor of The New York Times, presented at SXSW 2013. http://schedule.sxsw.com/2013/events/event_FP990508

TRANSCRIPT

NewsDiffs: Version Controlling the News

Eric Price (MIT)

Margaret Sullivan (The New York Times)

2013-03-11

http://newsdiffs.org/

Eric Price, Margaret Sullivan (MIT, NYT) NewsDiffs: Version Controlling the News 2013-03-11 1 / 30

NewsDiffs

Online news is different from print.
- Print: hard to change, daily deadlines.
- Online: easy to change, deadline now.

Online news articles have a lifecycle:
- Reporter writes a rushed story.
- Editor makes a pass or two.
- (Another) reporter rewrites the story.
- Editor makes another pass or two.

Libraries archive print version, not what people actually read.

NewsDiffs tracks stories as they evolve.

Outline of Talk

1 Motivation and Creation

2 Case Studies

3 Future

4 Q & A

Occupy Wall Street arrests

After allowing them onto the bridge, police cut off and arrested dozens of Occupy Wall Street demonstrators.

Lede rewritten to remove first bit.

Lucky someone must have kept the old tab open!

Reporter’s defense: body of article consistent.

Hard to judge without access to old version.

N’kisi the telepathic parrot

Found via Language Log

N’kisi’s remarkable abilities, which are said to include telepathy, feature in the latest BBC Wildlife Magazine.

2004: BBC Science article appears

2006: “Telepathy” removed; no correction

2007 (May): Article completely replaced

2007 (August): “Correction” appears:

Note: This story about animal communication has replaced an earlier one on this page which contained factual inaccuracies we were unable to correct. As a result, the original story is no longer in our archive. It is still visible elsewhere, via [link to WayBack Machine].

The public editor, a year before NewsDiffs

Right now, tracking changes is not a priority at The Times. As [the new executive editor Jill Abramson] told me, it’s unrealistic to preserve an “immutable, permanent record of everything we have done.”

NewsDiffs team

Jennifer 8. Lee, Greg Price, Eric Price

Knight-Mozilla Open News Hackathon

27 hours of furious coding

A permanent record is feasible

Recall The Times’s statement:

[I]t’s unrealistic to preserve an “immutable, permanent record of everything we have done.”

Wikipedia does it.

Version control is a solved problem.

We did it in one* weekend, from the outside.

Technical overview

[Diagram: NewsDiffs components]
- www.nytimes.com
- Scraper: BeautifulSoup parser
- MySQL database of article URLs (nytimes.com/2013/...ating.html, nytimes.com/2013/...-jail.html)
- Git repository of text of all articles
- Website: Django, Google diff-match-patch
- You
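The flow suggested by the components above (scrape, keep each distinct version, show a diff) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the NewsDiffs code: `VersionStore` is a hypothetical in-memory stand-in for the Git repository and the MySQL URL table, `difflib` is swapped in for Google diff-match-patch, and the URL is made up.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """Hypothetical in-memory stand-in for the Git repository and URL table."""

    versions: dict = field(default_factory=dict)  # url -> list of article texts

    def record(self, url: str, text: str) -> bool:
        """Store text as a new version of url; return True if one was added."""
        history = self.versions.setdefault(url, [])
        if history and history[-1] == text:
            return False  # unchanged since the last scrape, nothing to store
        history.append(text)
        return True

    def diff(self, url: str, old: int, new: int) -> str:
        """Return a unified diff between two stored versions of url."""
        history = self.versions[url]
        lines = difflib.unified_diff(
            history[old].splitlines(),
            history[new].splitlines(),
            fromfile=f"version {old}",
            tofile=f"version {new}",
            lineterm="",
        )
        return "\n".join(lines)


# Usage with a made-up URL and made-up article text.
store = VersionStore()
url = "https://example.com/story.html"
store.record(url, "Police arrested dozens.\nBody text.")
store.record(url, "Police arrested dozens.\nBody text.")  # same text, ignored
store.record(url, "Police arrested hundreds.\nBody text.")
print(store.diff(url, 0, 1))
```

Storing a version only when the text differs keeps the history small when a page is scraped far more often than it is edited.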

*Not quite one weekend

Another day of work after each of 3, 10, 22 weeks.

Scaling issues
- Running on AFS, a networked file system
- Moved version metadata from git to MySQL.
- Optimized queries to both backends

UI improvements.

Press

[A] more comprehensive archive that retains all significant versions of an article (and all corrections) would send readers a strong message that The Times is committed to full transparency and accountability. [...]

As NewsDiffs demonstrates, if you don’t make yourself accountable nowadays, someone else will do it for you.

Easy to extend

We’re tracking the New York Times, CNN, BBC, Politico.

To track another site, need to write code to extract plain text from webpage.

30-40 lines of code; takes maybe one hour.
- But resource constraints: running on free MIT servers out of my account.
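A per-site extractor of the kind described above might look roughly like this. It is a sketch, not one of the NewsDiffs parsers: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages, and it assumes the article body is simply the text of the page's `<p>` elements, which real sites rarely make that easy. The names `ParagraphExtractor` and `extract_plain_text` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._current = None   # parts of the <p> being read, None outside one
        self._skip_depth = 0   # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()      # an unclosed <p> ends where the next one starts
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._current.append(data)

    def _flush(self):
        """Finish the current paragraph, normalising runs of whitespace."""
        if self._current is not None:
            text = " ".join("".join(self._current).split())
            if text:
                self.paragraphs.append(text)
            self._current = None


def extract_plain_text(html: str) -> str:
    """Return the page's paragraph text, one blank line between paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.paragraphs)
```

Extracting plain text before storing a version keeps layout and advertising changes out of the diffs.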

NewsDiffs is Free Software

Forks:
- http://redactado.com.ar/
- http://newsdiffs.es/

Patches:
- Received (and merged) patch to parse tagesschau.de.

Statistics

Tracking 28000 NYT articles (62000 over all sources).

44% of articles changed at least once.
- 20-30% in opinion, books, fashion sections
- 55-60% in sports, NY region, world sections

15% of articles changed at least three times.

9% have official corrections.

4% have byline changes.
- 11% in world section.
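Percentages like "changed at least once" or "at least three times" can be derived from the number of stored versions per article. The sketch below assumes an article with n stored versions has changed n - 1 times; the function name and the sample counts are hypothetical and are not NewsDiffs data.

```python
def share_changed(version_counts, min_changes):
    """Fraction of articles whose text changed at least min_changes times.

    An article with n stored versions is counted as having changed n - 1 times.
    """
    if not version_counts:
        return 0.0
    changed = sum(1 for n in version_counts if n - 1 >= min_changes)
    return changed / len(version_counts)


# Made-up version counts for ten articles.
counts = [1, 1, 1, 2, 1, 4, 1, 3, 1, 5]
print(f"changed at least once: {share_changed(counts, 1):.0%}")
print(f"changed at least three times: {share_changed(counts, 3):.0%}")
```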

Examples: Nuclear Talks with Iran

Examples: Sandy Hook Shooting

Examples: Edward Koch Obituary

Examples: Romney and Benghazi

In first version:

For a country looking to understand how Mr. Romney, a Republican candidate with no foreign policy experience, would respond to a major crisis, this was a first glimpse.

And as an adviser to the campaign who worked in the George W. Bush administration said on Wednesday, Mr. Romney’s accusation [...] looked like “he had forgotten the first rule in a crisis: don’t start talking before you understand what’s happening.”

Goals

Goals for NewsDiffs
1 Reference for known interesting changes.
2 Unearth interesting changes.
3 Study the process of journalistic editing.

Currently only satisfying (1) well.

To satisfy the others, need
- Automated tools to sift through the changes for interesting ones.
- Someone to use our data for research
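One very rough way to start sifting is to rank changes by how much of the wording was replaced. The sketch below is only a size heuristic built on the standard-library `difflib`; the talk does not define what makes a change "interesting", and the function names and sample sentences are hypothetical.

```python
import difflib


def change_score(old: str, new: str) -> float:
    """Return 0.0 for identical texts, rising toward 1.0 as fewer words match."""
    matcher = difflib.SequenceMatcher(None, old.split(), new.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def rank_changes(pairs):
    """Sort (label, old, new) triples from most to least changed."""
    return sorted(pairs, key=lambda p: change_score(p[1], p[2]), reverse=True)


# Made-up examples: a one-word fix and a complete rewrite of a sentence.
pairs = [
    ("one-word fix",
     "The cat sat on the mat today",
     "The cat sat on the mat yesterday"),
    ("rewrite",
     "Police allowed marchers onto the bridge",
     "Hundreds were arrested after blocking traffic"),
]
for label, old, new in rank_changes(pairs):
    print(f"{label}: {change_score(old, new):.2f}")
```

A score like this would surface rewritten ledes ahead of typo fixes, but it says nothing about meaning, so a one-word change that reverses a claim would still rank low.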

Simple example of study

That vs. Who

The council could choose a pope that all factions would recognize.

The council could choose a pope whom all factions would recognize.

Rule: “who” refers to people, “that” to non-people.

When do reporters make mistakes that editors catch?

- Mitik, the baby walrus who was orphaned → Mitik, the baby walrus that was orphaned
- The giant panda cub who died → The giant panda cub that died
- The subatomic analogue of cats who are alive and dead at the same time → The subatomic analogue of cats that are alive and dead at the same time
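A study like this needs a way to pull such substitutions out of consecutive versions. A minimal sketch, assuming a word-level comparison with the standard-library `difflib` is good enough: it reports only one-for-one swaps among "who", "whom" and "that", and the function name is hypothetical.

```python
import difflib

RELATIVE_PRONOUNS = {"who", "whom", "that"}


def pronoun_swaps(old: str, new: str):
    """Return (old_word, new_word) pairs where one relative pronoun replaced another."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    swaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only a one-for-one word replacement can be a simple pronoun swap.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            before, after = a[i1].lower(), b[j1].lower()
            if before in RELATIVE_PRONOUNS and after in RELATIVE_PRONOUNS:
                swaps.append((before, after))
    return swaps


print(pronoun_swaps(
    "The giant panda cub who died",
    "The giant panda cub that died",
))
```

Run over every pair of consecutive versions, the collected pairs would show how often each substitution happens; deciding whether the antecedent is a person still needs a human or a further tool.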

Conclusions

News websites should keep a public record of what they publish.

In the meantime, NewsDiffs fills the role.

We have lots of data, ready to be mined for useful information.
