
Data Journalism

David Donald
Data Editor
Data Journalist in Residence

Precision Journalism

Computer-Assisted Reporting

• Analysts versus developers

• Hacks and hackers

Data Journalism

Examples

California Watch: Broken Shield

• FINDINGS: Exposed flaws in the way a special state police force handles crimes against the developmentally disabled.

• Abuse cases rose 43 percent while patient population decreased 12 percent.

• DATA: Inspection, salary

The Tampa Bay Times: Stand your ground

• FINDINGS: Florida's "stand your ground" law is being used to free gang members involved in shootouts, drug dealers beefing with clients, and people who shot their victims in the back.

• DATA: Built an interactive database using newspaper reports, court records and documents obtained from prosecutors and defense attorneys to compile a partial list of self-defense cases in Florida since 2005.

ProPublica and the Washington Post: Presidential pardons heavily favor whites

• FINDINGS: White criminals seeking presidential pardons over the past decade have been nearly four times as likely to succeed as minorities.

• Blacks have had the poorest chance of receiving the president's ultimate act of mercy.

• DATA: Denied and approved pardons from the Office of the Pardon Attorney in the Justice Department; various public records for demographics.
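A finding like "nearly four times as likely to succeed" compares approval rates between groups, not raw counts of pardons. The sketch below shows only that arithmetic, assuming each group's outcomes can be tallied as approved and denied applications. The function names are hypothetical, and this raw ratio does not control for differences between applicants such as offense type.

```python
def approval_rate(approved: int, denied: int) -> float:
    """Share of decided applications that were approved."""
    decided = approved + denied
    if decided == 0:
        raise ValueError("no decided applications")
    return approved / decided


def rate_ratio(a_approved: int, a_denied: int,
               b_approved: int, b_denied: int) -> float:
    """How many times as likely group A was to succeed as group B.

    This is a plain ratio of approval rates with no statistical controls.
    """
    return approval_rate(a_approved, a_denied) / approval_rate(b_approved, b_denied)
```

A ratio near 4 means group A's approval rate was about four times group B's, whatever the absolute number of applications in each group.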

The New York Times: For horse and jockey, risks vary by state

• FINDINGS: The industry is still mired in a culture of drugs and lax regulation, and its fatal breakdown rate remains far worse than in most of the world. More than 3,000 horses died during racing or training from 2009 to 2011, according to a New York Times survey of 29 racing states.

• DATA: The Times purchased data on more than 150,000 races, along with injury reports and drug test results, and surveyed 29 racing states.

Reuters: Income Inequality

• FINDINGS: Income inequality has increased in 49 of 50 states since 1989

• The poverty rate increased in 43 states, most sharply in Nevada

• DATA: Census and Current Population Survey data from the U.S. Census Bureau

Atlanta Journal-Constitution: Cheating our children

• FINDINGS: About 200 school districts around the country had high concentrations of suspect test scores that follow a pattern of both unusually high and unusually low scores similar to Atlanta's. For these school systems, the odds of so many suspicious score changes occurring in a single district by chance alone were extraordinarily low, ranging from 1 in 1,000 to worse than 1 in 1 trillion.

• DATA: Reading and math test results from all 50 states and DC for all years for grades 3 through 8
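The "by chance alone" odds can be illustrated with a binomial tail probability. This is a minimal sketch, not necessarily the Atlanta Journal-Constitution's method: it assumes each of n classes is independently flagged as suspicious with the same probability p, and asks how likely k or more flags in one district would be if chance were the only cause. The function names are hypothetical. The calculation works in log space because exact binomial coefficients for large n overflow a float.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space.

    Requires 0 < p < 1 and 0 <= i <= n.
    """
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k): chance that k or more of n independent units are flagged
    when each is flagged with probability p by chance alone."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def one_in(prob: float) -> float:
    """Express a probability as odds of '1 in N'."""
    return 1.0 / prob
```

For example, `one_in(prob_at_least(15, 100, 0.05))` gives the "1 in N" odds of 15 or more flags among 100 hypothetical classes when each has a 5 percent chance of being flagged. The independence assumption matters: if suspicious changes cluster for innocent reasons, these odds overstate how unusual a district is.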

WisconsinWatch: Walker’s official work time declines as national fame grows

• FINDINGS: As national fame grew, Walker’s official time declined

• Walker received contributions from employees or political action committees at more than half of the 130-plus companies that appear in his official calendars

• DATA: Created a database of the more than 4,400 entries in Walker’s calendars from his first 13 months in office; campaign contribution records
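The contribution finding rests on matching two lists of names: the companies in the calendar database and the employers or PACs in the contribution records. A minimal sketch of that match, assuming both lists are plain company-name strings; the suffix list, normalization rules and function names are hypothetical. Real name matching still needs every hit checked by hand, since "Acme" in one file may not be the same firm as "Acme" in another.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces and drop corporate suffixes,
    so 'Acme Corp.' and 'ACME' compare equal."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def overlap_share(calendar_companies: Iterable[str],
                  contributor_employers: Iterable[str]) -> Tuple[Set[str], float]:
    """Return the normalized names found in both lists and the share of
    distinct calendar companies that also appear among contributors."""
    calendar = {normalize(c) for c in calendar_companies} - {""}
    donors = {normalize(e) for e in contributor_employers} - {""}
    matched = calendar & donors
    share = len(matched) / len(calendar) if calendar else 0.0
    return matched, share
```

A share above 0.5 across the calendar companies corresponds to the "more than half" framing; the matched set is the list a reporter would then verify one by one.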

The Seattle Times: Methadone and the politics of pain

• FINDINGS: Methadone overdoses were concentrated in poor areas. The state was steering Medicaid patients to methadone, which is cheap.

• DATA: death certificates, hospitalization records and poverty data

“Social Security is Grenada. Medicare is Vietnam.” -- Douglas Holtz-Eakin, director of the Congressional Budget Office, 2003-2005

Initial FOIA letter

• Took two months to get a response from CMS.

• CMS claimed we couldn’t FOIA the data because it was available on the Web.

• Refused to even put that in writing.

http://www.cms.gov/LimitedDataSets/

Lawsuit against CMS

• Initial response from the Department of Justice: “Can we settle?”

• Negotiated from $97,000 to $12,000.

• Claims it lost money at this price.

Data Use Agreement

• Standard form that researchers fill out.

• Most of its limiting factors relate to patient privacy.

• But it places other restrictions on naming individual doctors from this database.

• Added an addendum to establish our right to store the database in our newsrooms.

CMS claim form

• Allows multiple procedures on the same claim.

• Most typical procedures are made up of multiple codes.

• Carriers allow more than one event on the claim.

• Each record in the database is a claim.

From CMS

• Carrier Part B provider (going to see the doc)

• Outpatient (in and out of the hospital in one day; also group practice)

• Inpatient (a night or more in the hospital)

• Skilled nursing (nursing home)

• Hospice

• Durable medical equipment (late-night infomercials)

• MEDPAR (aggregated hospital Part A)

• Denominator (beneficiary info)

Others

• CPT procedure codes and descriptors (AMA)

• ICD-9 diagnostic and procedure codes

• DRG codes and procedures

• NPI and UPIN lookup (provider IDs)

The datasets

Record layouts / data dictionary

• Everything is a code.

• Some coding can spread across 45 columns.

• Total dictionary just under 1,000 pages.

Importing the data

• Each table had between 600 and 1,700 variables, far too many for SQL Server.

• Used SPSS to break the tables into thirds, retaining linking variables to create three relational tables.

• Exported each table from SPSS, year by year, into SQL Server.

• Some years/tables took several days and much fretting.
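SQL Server caps an ordinary table at 1,024 columns, which is why tables of 600 to 1,700 variables had to be split. The team did the split in SPSS; the Python sketch below only illustrates the idea of vertical partitioning, assuming each record is a dict and that every record has the same columns. The function and column names are hypothetical. Each narrower table keeps the linking variables, so the pieces can be joined back together on them.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def split_wide_table(rows: Sequence[Row], key_cols: Sequence[str],
                     parts: int = 3) -> List[List[Row]]:
    """Vertically partition wide records into `parts` narrower tables.

    Every output table repeats the key (linking) columns; the remaining
    columns are divided into roughly equal consecutive groups. Assumes all
    rows share the column layout of the first row.
    """
    if not rows:
        return [[] for _ in range(parts)]
    other = [c for c in rows[0] if c not in key_cols]
    size = -(-len(other) // parts)  # ceiling division
    groups = [other[i * size:(i + 1) * size] for i in range(parts)]
    return [[{c: r[c] for c in [*key_cols, *g]} for r in rows] for g in groups]
```

Merging the three pieces on the key columns restores each original record, which is what makes the split safe for a relational database.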

Typical table

• Carrier file 2008 (individual docs’ billing).

• 122 gigs.

• 417 million claims.

• A 5 percent sample.

Standardizing the data

We called it “exploding.” In Part B, a record has up to 13 procedure-related variables and 30 diagnosis-related variables.

Procedure table

Diagnosis table

41 million records became 1.2 billion records (about 700 gigabytes)
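Exploding is a wide-to-long reshape: the numbered procedure or diagnosis columns on each claim become one row per code in a separate table, keyed back to the claim. A minimal sketch, assuming claims arrive as dicts with hypothetical column names such as `dx1` through `dx30`. It also assumes empty code slots are dropped, which the slides do not specify.

```python
from typing import Dict, Iterable, Iterator, Tuple


def explode(claims: Iterable[Dict[str, object]], prefix: str, n_slots: int,
            key: str = "claim_id") -> Iterator[Tuple[object, int, object]]:
    """Turn numbered code columns (prefix1 .. prefixN) on each claim into
    one (claim key, slot number, code) row per non-empty code."""
    for claim in claims:
        for slot in range(1, n_slots + 1):
            code = claim.get(f"{prefix}{slot}")
            if code not in (None, ""):  # skip missing or blank slots
                yield claim[key], slot, code
```

For the Part B layout described above, a 30-slot call on the diagnosis columns would feed the diagnosis table and a 13-slot call on the procedure columns would feed the procedure table. Keeping the slot number preserves the order in which codes appeared on the claim.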

Encryption

• In the data, almost all provider IDs are encrypted.

• Got a crosswalk but could not publish names with numbers from the data.

• Any names in stories were gathered through independent reporting.

• No data on tax ID.

Permanent injunction

• 1979 court case in which Florida doctors and the AMA sued the old HEW (Department of Health, Education, and Welfare) to prevent naming doctors from the database.

• Still in effect.

• Must sue in Jacksonville, Fla., court.

http://www.publicintegrity.org/articles/entry/2571/

from SPSS

http://www.dowjones.com/pressroom/presskits/secrets/secretsofsystem.asp

Back to court

• CMS refused access to some 100 percent files.

• Time to hold docs accountable.

Venn diagrams

Precision Journalism

Computer-Assisted Reporting

Data Journalism

Symbolic Logic (1881)

CAR Conference 2013, Louisville, KY

Questions?

David [email protected]