frontiers of computational journalism - columbia journalism school fall 2012 - week 1
Post on 27-Oct-2014
Frontiers of Computational Journalism
Columbia Journalism School
Week 1: Basics September 10, 2012
Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure
Computational Journalism: Definitions
“Broadly defined, it can involve changing how stories are discovered, presented, aggregated, monetized, and archived. Computation can advance journalism by drawing on innovations in topic detection, video analysis, personalization, aggregation, visualization, and sensemaking.” - Cohen, Hamilton, Turner, Computational Journalism
Computational Journalism: Definitions
“Stories will emerge from stacks of financial disclosure forms, court records, legislative hearings, officials' calendars or meeting notes, and regulators' email messages that no one today has time or money to mine. With a suite of reporting tools, a journalist will be able to scan, transcribe, analyze, and visualize the patterns in these documents.” - Cohen, Hamilton, Turner, Computational Journalism
Cohen et al. model
[Diagram labels: Data, Reporting, User, Computer Science]
CS for presentation / interaction
[Diagram labels: Data, Reporting, User, CS]
Filter many stories for user
[Diagram labels: User; several Data / Reporting / CS streams; Filtering]
Examples of filters
• What an editor puts on the front page
• Google News
• Reddit’s comment system
• Twitter
• Facebook news feed
• Techmeme
• …
Memetracker by Leskovec, Backstrom, Kleinberg
Kony 2012 early network, by Gilad Lotan / Socialflow
Track effects
[Diagram labels: User; several Data / Reporting / CS streams; Filtering; Effects]
Computational journalism process
Reporting
Presentation Filtering Tracking
Computational Journalism: Definitions
“the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government.” - Jonathan Stray, A Computational Journalism Reading List
Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure
Definition of data
a collection of similar pieces of information
structured data
unstructured data
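As a rough illustration of the structured / unstructured distinction, here is a minimal Python sketch. The records, field names, and email texts are hypothetical placeholders invented for this example; they are not course data.

```python
from collections import Counter

# Structured data: every record has the same named fields,
# so counting and grouping are one-liners.
# (Hypothetical records, for illustration only.)
filings = [
    {"company": "Acme", "form": "10-K", "year": 2011},
    {"company": "Acme", "form": "8-K", "year": 2012},
    {"company": "Globex", "form": "8-K", "year": 2012},
]
forms_per_type = Counter(record["form"] for record in filings)

# Unstructured data: free text with no fields. Even a simple question
# ("which documents mention a merger?") needs some text processing.
# (Hypothetical texts, for illustration only.)
emails = [
    "Let's discuss the merger on Friday.",
    "Lunch?",
    "Merger paperwork attached.",
]
mentions_merger = [text for text in emails if "merger" in text.lower()]

print(forms_per_type)
print(mentions_merger)
```

Both collections fit the slide's definition (similar pieces of information); the difference is whether the structure is explicit in the records or has to be recovered from the text.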
Why use data in journalism?
1. Data is where the information is
More video on YouTube than produced by TV networks during entire 20th century
10,000 legally required reports filed by U.S. public companies every day
400,000,000 tweets per day
AP moves ~15,000 stories per day
390,000 Wikileaks cables
500,000 Enron emails
…how many gov’t and corporate docs?
There’s a lot out there
Human data generated in 2010 = 1,000,000,000 terabytes
Library of Congress digital archive = 160 terabytes (only 20 TB for all books!)
All New York Times articles ever = 0.06 terabytes (13 million stories, assuming 5k per story)
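The New York Times figure can be checked with a quick back-of-envelope calculation. This sketch assumes "5k" means 5 kilobytes of text per story and uses binary units (1 KB = 1024 bytes, 1 TB = 1024^4 bytes); both are assumptions, since the slide does not state them.

```python
# Back-of-envelope check of the figures quoted on the slide.
# Assumption: "5k per story" = 5 kilobytes, with binary units.
stories = 13_000_000
bytes_per_story = 5 * 1024
nyt_tb = stories * bytes_per_story / 1024**4

# Other quantities from the slide, in terabytes.
human_data_2010_tb = 1_000_000_000
loc_archive_tb = 160

print(round(nyt_tb, 2))                            # NYT archive in TB
print(round(loc_archive_tb / nyt_tb))              # LoC archive vs. NYT
print(round(human_data_2010_tb / loc_archive_tb))  # 2010 data vs. LoC
```

The ratios, not the absolute numbers, are what matter: under these assumptions the data generated in 2010 is millions of times larger than the Library of Congress digital archive.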
Transparency means nothing if no one is watching.
Why use data in journalism?
1. Data is where the information is
2. Data can give a more complete picture
Phil Meyer, Detroit Riots, 1967
“A reporter, talking to people on the street corner, draws comparisons intuitively, almost unconsciously. When dealing with large numbers of people—437 were interviewed in the Detroit survey—intuition is not enough. It takes a computer to count and sort and analyze the thoughts of that many people, and the input must be consistently structured.”
Phil Meyer, Detroit Riots, 1967
“Education and income were not good predictors of whether a person would riot.”
Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure
Design
“[Designers] are guided by the ambition to imagine a desirable state of the world, playing through alternative ways in which it might be accomplished, carefully tracing the consequences of contemplated actions.”
- Horst Rittel, The Reasoning of Designers
Design is not objective
“During the industrial age, the idea of planning, in common with the idea of professionalism, was dominated by the pervasive idea of efficiency. We have come to think about the planning task in very different ways in recent years. We have been learning to ask whether what we are doing is the right thing to do. That is to say, we have been learning to ask questions about the outputs of actions and to pose problem statements in valuative frameworks.”
- Horst Rittel, Dilemmas in a General Theory of Planning
Design is political
“No plan has ever been beneficial to everybody. Therefore, many persons with varying, often contradictory interests and ideas are or want to be involved in plan-making. The resulting plans are usually compromises resulting from negotiation and the application of power. The designer is party in these processes; he takes sides.”
- Horst Rittel, The Reasoning of Designers
Different kinds of knowledge
Normative: “what should be”
(political philosophy, sociology, ethics, critical theory…)
Instrumental: “how to get there”
(in our case: journalism and computer science)
This course is about both.
Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure
Theory
We will learn important guiding principles about
• Filter design
• Visualization
• Social network analysis
• Drawing conclusions from data
• Security modeling
Techniques
We will discuss a handful of techniques in great depth.
• Distance functions and clustering
• Vector space document model
• Recommender systems
• Proposition extraction
• Knowledge representation as linked data
• Community detection
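To give a flavor of the first two techniques, here is a minimal Python sketch of a vector space document model combined with a cosine distance function. It is an illustration under simplifying assumptions, not the course's implementation: the example documents are invented, tokenization is plain whitespace splitting, and the term weighting is raw counts (real systems typically use TF-IDF or similar weighting).

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical headlines, for illustration only.
docs = [
    "city council votes on budget",
    "council approves city budget",
    "team wins championship game",
]
vectors = [term_vector(d) for d in docs]

d01 = cosine_distance(vectors[0], vectors[1])  # two stories that share words
d02 = cosine_distance(vectors[0], vectors[2])  # no words in common
print(d01, d02)
```

Once documents are vectors and a distance function is defined, clustering amounts to grouping documents whose pairwise distances are small.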
Any requests?
Course structure
• Classes: we’ll review the readings (so please read them)
• By next week: form groups of 2-3.
• Assignments every other week, due in two weeks
• Some will involve coding; all will involve critical analysis.
Your data
• You are encouraged to pick a data set and stick with it.
• If you want, you can do all assignments, the final research report, etc. with this data
• This is a research course… let’s learn something new.
What data?
SEC reports, municipal open gov data, Wikileaks, your favorite archive, social media…
Two criteria:
Journalistically interesting
Requires advanced techniques
Final Report
For 3-point students
• A theoretical discussion (10 pages)
For 6-point students, one of:
• A theoretical discussion (25 pages)
• An implementation of a technique and discussion of results
• Analysis of your chosen data
• A completed story, plus methodology