Data Journalism 101 - Day 1 by Michael J. Berens

Donald W. Reynolds National Center for Business Journalism at ASU Michael J. Berens The Seattle Times Data Journalism 101


Post on 10-Jul-2015



Page 1: Data Journalism 101 - Day 1 by Michael J. Berens

Donald W. Reynolds National Center for Business Journalism at ASU

Michael J. Berens – The Seattle Times

Data Journalism 101

Page 2: Data Journalism 101 - Day 1 by Michael J. Berens

Skills – rooted in past

Page 3: Data Journalism 101 - Day 1 by Michael J. Berens

Skills – lost in space

Page 4: Data Journalism 101 - Day 1 by Michael J. Berens

He said. She said. Now I’m going to tell you who’s telling the truth.

Page 5: Data Journalism 101 - Day 1 by Michael J. Berens

Poll Question: Have you ever been denied public data?

1) Yes 2) No

Page 6: Data Journalism 101 - Day 1 by Michael J. Berens

Finding a serial killer

Page 7: Data Journalism 101 - Day 1 by Michael J. Berens

Finding deadly germs and dirty hospitals

Page 8: Data Journalism 101 - Day 1 by Michael J. Berens

Tracking elephant deaths inside America’s zoos

Page 9: Data Journalism 101 - Day 1 by Michael J. Berens

Tracking fraudulent medical devices and profiteers

Page 10: Data Journalism 101 - Day 1 by Michael J. Berens

Tracking the exploitation of vulnerable seniors

Page 11: Data Journalism 101 - Day 1 by Michael J. Berens

Cops who own crack houses

Secret release of fugitives

Sexual misconduct in health care

Jailing the poor

Nursing errors

Unsanitary hospitals

Page 12: Data Journalism 101 - Day 1 by Michael J. Berens

Most dangerous highway

Most dangerous intersection

Number of deadly police chases

Most dangerous area for crime

Most unsanitary restaurants

“Quantitative”

Page 13: Data Journalism 101 - Day 1 by Michael J. Berens

Poll Question: Why were you denied data?

• Too expensive

• Agency claimed info was not a public record.

• Agency claimed the request was a burden.

Page 14: Data Journalism 101 - Day 1 by Michael J. Berens

Negotiating for data

• Delay – we’re working on it.

• Deny – it’s proprietary software.

• Divert – yours for just $12,000.

Page 15: Data Journalism 101 - Day 1 by Michael J. Berens

“If you don’t know who I am, then maybe your best course of action would be to tread lightly.”

- Walter White in "Breaking Bad"

Page 16: Data Journalism 101 - Day 1 by Michael J. Berens

Step One: File layout

(secret weapon to finding stories)

Page 17: Data Journalism 101 - Day 1 by Michael J. Berens

Fields, position, type, length

Field  Variable  Type  Format  Label
-----  --------  ----  ------  -----
1      SEQ_NO    Char  $10.    Sequence Number
       Comment: Unique sequence number assigned to each record within a year. First four digits are the year of discharge.

2      REC_KEY   Num   11.     Record Key
       Comment: Unique number assigned to each CHARS record. Added in 2003.

3      STAYTYPE  Char  $1      Type of Stay
       Comment:
         1 = Inpatient
         2 = Observation patient

4      HOSPITAL  Char  $4      Hospital Number
       Comment: DOH assigned hospital number.
       Fourth character describes the Medicare certified unit type with:
         blank = acute care
         R = Rehabilitation unit
         P = Psychiatric unit
         S = Swing bed unit
         - - - - - - - - - - - - - - - - - - - - - - - - - - - -
         A = Alcohol (discontinued after 1992)
         B = Bone marrow transplants (discontinued after 2000)
         E = Extended care (discontinued after 2001)
         H = Tacoma General & Group Health combined (discontinued after 1992)
         I = Group Health only at Tacoma General (discontinued after 1992)

5      LINENO    Num   3.      Number of Reported Revenue Items Codes

6      ZIPCODE   Char  $5      Patient's Zip Code
       Comment:
         99999 indicates the zip code is unknown.
         99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location).
         Blanks indicate non-U.S. residence.

7      STATERES  Char  $2      State of Residence
       Comment: State abbreviation used by U.S. Postal Service.
         This is assigned from the zip code.
         Residents with zip code 99998 are assigned to Washington.
         XX = invalid zip code or a non-U.S. residence.
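
The rules in a file layout like the one above translate almost directly into lookup code. What follows is only a minimal Python sketch, assuming a record has already been read into a dictionary keyed by the layout's variable names. The sample record and the helper names `describe_unit` and `describe_zip` are hypothetical and are not part of the original data.

```python
# Lookup rules copied from the layout above; helper names are illustrative.
UNIT_TYPES = {
    " ": "acute care",
    "R": "Rehabilitation unit",
    "P": "Psychiatric unit",
    "S": "Swing bed unit",
}


def describe_unit(hospital: str) -> str:
    """Decode the fourth character of the 4-character HOSPITAL field."""
    padded = hospital.ljust(4)  # a blank fourth character means acute care
    return UNIT_TYPES.get(padded[3], "discontinued or unknown code")


def describe_zip(zipcode: str) -> str:
    """Apply the special-value rules listed for the ZIPCODE field."""
    if zipcode.strip() == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return "regular zip code"


# Hypothetical record, for illustration only.
record = {"SEQ_NO": "2003000001", "HOSPITAL": "001R", "ZIPCODE": "99998"}

# The layout says the first four digits of SEQ_NO are the year of discharge.
year_of_discharge = record["SEQ_NO"][:4]

print(year_of_discharge,
      describe_unit(record["HOSPITAL"]),
      describe_zip(record["ZIPCODE"]))
```

Small rules like these (a special zip code, a one-letter suffix) are often where a story idea starts, because they show what the agency chose to track.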

Page 18: Data Journalism 101 - Day 1 by Michael J. Berens
Page 19: Data Journalism 101 - Day 1 by Michael J. Berens
Page 20: Data Journalism 101 - Day 1 by Michael J. Berens

Code keys

Page 21: Data Journalism 101 - Day 1 by Michael J. Berens

Finding stories that lurk in code keys

Page 22: Data Journalism 101 - Day 1 by Michael J. Berens
Page 23: Data Journalism 101 - Day 1 by Michael J. Berens

Stories that hide in plain sight

E9220 HANDGUN ACCIDENT

E9221 SHOTGUN ACCIDENT

E9222 HUNTING RIFLE ACCIDENT

E9223 MILITARY FIREARM ACCID

E9224 ACCIDENT - AIR GUN

E9225 ACCIDENT-PAINTBALL GUN

E9228 FIREARM ACCIDENT NEC

E9229 FIREARM ACCIDENT NOS

E9230 FIREWORKS ACCIDENT

E9231 BLASTING MATERIALS ACCID

E9232 EXPLOSIVE GASES ACCIDENT

E9238 EXPLOSIVES ACCIDENT NEC

E9239 EXPLOSIVES ACCIDENT NOS

E9240 ACC-HOT LIQUID & STEAM

E9241 ACCID-CAUSTIC SUBSTANCE
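
A code key becomes useful once it is applied to the coded column of a data file. The sketch below is one possible way to do that in Python. The dictionary holds a few entries copied from the key above, while the list of injury codes is made up for illustration and stands in for a column from a real file.

```python
from collections import Counter

# A few entries copied from the code key above.
CODE_KEY = {
    "E9220": "HANDGUN ACCIDENT",
    "E9221": "SHOTGUN ACCIDENT",
    "E9222": "HUNTING RIFLE ACCIDENT",
    "E9225": "ACCIDENT-PAINTBALL GUN",
    "E9230": "FIREWORKS ACCIDENT",
    "E9240": "ACC-HOT LIQUID & STEAM",
}

# Hypothetical injury codes, as they might appear in one column of a file.
injury_codes = ["E9220", "E9230", "E9225", "E9220", "E9240", "E9221"]

# Tally each code, then translate it with the code key.
counts = Counter(injury_codes)
for code, n in counts.most_common():
    print(code, CODE_KEY.get(code, "not in key"), n)

# In the key above, codes starting with E922 are gun accidents
# (including air guns and paintball guns).
gun_total = sum(n for code, n in counts.items() if code.startswith("E922"))
print("Gun-related accidents:", gun_total)
```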

Page 24: Data Journalism 101 - Day 1 by Michael J. Berens

Secret release of fugitives – code in court data

Rising tide of innocent people killed in police chases – code in NHTSA data

How many people contracted a hospital-acquired infection during heart surgery – code in hospital data

----------------------

Power of two – combining data

Death certificates – list of adult family homes
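
Combining two data sets usually means matching them on a shared field. The Python sketch below shows the idea under loud assumptions: the slide does not say which field links death certificates to adult family homes, so an address match is assumed here, and every row, field name, and the `normalize` helper are hypothetical.

```python
def normalize(address: str) -> str:
    """Uppercase and collapse whitespace so formatting differences still match."""
    return " ".join(address.upper().split())


# Hypothetical rows; real files would be read from disk and have more fields.
death_certificates = [
    {"name": "Person A", "place_of_death": "12 Oak St"},
    {"name": "Person B", "place_of_death": "400  main ave"},
    {"name": "Person C", "place_of_death": "9 Pine Rd"},
]
adult_family_homes = [
    {"home": "Home One", "address": "400 Main Ave"},
    {"home": "Home Two", "address": "77 Elm St"},
]

# Index one data set by the shared field, then look up each row of the other.
homes_by_address = {normalize(h["address"]): h["home"] for h in adult_family_homes}

matches = []
for cert in death_certificates:
    key = normalize(cert["place_of_death"])
    if key in homes_by_address:
        matches.append((cert["name"], homes_by_address[key]))

print(matches)
```

Any match produced this way is a lead to verify by hand, not a finding, since addresses can be shared or mistyped.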

Page 25: Data Journalism 101 - Day 1 by Michael J. Berens

Tip: Know the rules of the data. No detail is too small.

Page 26: Data Journalism 101 - Day 1 by Michael J. Berens

Step Two: File format

Page 27: Data Journalism 101 - Day 1 by Michael J. Berens

Every computer file has an extension:

.txt   Text file
.csv   Comma-separated values
.dbf   Database file (dBASE format)
.html  HyperText Markup Language
.mdb   Microsoft database (Access file)
.pdf   Portable Document Format

Rule of thumb: Always request comma-delimited text if Excel format is unavailable.

Page 28: Data Journalism 101 - Day 1 by Michael J. Berens

Two database structures: 1) Fixed length 2) Delimited

Page 29: Data Journalism 101 - Day 1 by Michael J. Berens

Fixed-length file

Berens    2312     Columbus  blue
Anderson  4563625  Seattle   violet
Becker    45453    New York  light brown
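
In a fixed-length file, each field is found by its position and length, which is why the file layout matters so much. The following Python sketch parses the three sample records by slicing. The column positions and field names in `LAYOUT` are assumptions made for this example; a real layout comes from the agency's documentation.

```python
# Hypothetical layout: (field name, start position, length).
LAYOUT = [("name", 0, 10), ("id", 10, 9), ("city", 19, 10), ("color", 29, 12)]

# Sample records padded with spaces so every field starts at a fixed column.
lines = [
    "Berens    2312     Columbus  blue",
    "Anderson  4563625  Seattle   violet",
    "Becker    45453    New York  light brown",
]


def parse_fixed(line: str) -> dict:
    """Slice each field out by position and strip the padding."""
    return {name: line[start:start + length].strip()
            for name, start, length in LAYOUT}


records = [parse_fixed(line) for line in lines]
for rec in records:
    print(rec)
```

Because fields are located by position, values that contain spaces, such as "New York" or "light brown", stay intact.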

Page 30: Data Journalism 101 - Day 1 by Michael J. Berens

Delimited file

berens,272464,Seattle,blue
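
A delimited file can be read with Python's standard `csv` module, as in the short sketch below. The first row is the one from the slide. The second row and the column names are hypothetical, added to show why a real CSV parser is safer than splitting on commas: a quoted value may itself contain a comma.

```python
import csv
import io

# First row from the slide; second row is a made-up example with a quoted comma.
raw = 'berens,272464,Seattle,blue\nbecker,45453,"New York, NY",light brown\n'

# Field names are assumptions; the slide does not name the columns.
fieldnames = ["name", "id", "city", "color"]

rows = list(csv.DictReader(io.StringIO(raw), fieldnames=fieldnames))
for row in rows:
    print(row["name"], "|", row["city"])
```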

Page 31: Data Journalism 101 - Day 1 by Michael J. Berens
Page 32: Data Journalism 101 - Day 1 by Michael J. Berens
Page 33: Data Journalism 101 - Day 1 by Michael J. Berens

Poll Question: In general, how long do you wait for public data?

1) Quickly – within a few weeks at most
2) Slowly – often takes a month or more
3) Never – there’s always some issue

Page 34: Data Journalism 101 - Day 1 by Michael J. Berens

Tip: Talk first. File a request last.

Page 35: Data Journalism 101 - Day 1 by Michael J. Berens

Blank canvas - importing

Page 36: Data Journalism 101 - Day 1 by Michael J. Berens

Go to the “Data” tab, then look for the “Text” icon.

Page 37: Data Journalism 101 - Day 1 by Michael J. Berens
Page 38: Data Journalism 101 - Day 1 by Michael J. Berens
Page 39: Data Journalism 101 - Day 1 by Michael J. Berens
Page 40: Data Journalism 101 - Day 1 by Michael J. Berens
Page 41: Data Journalism 101 - Day 1 by Michael J. Berens
Page 42: Data Journalism 101 - Day 1 by Michael J. Berens

CASE  DATE      TIME  COUNTY       AREA    WOUND   INJURY  TYPE  CAUSE
1     11/21/87  645   Sauk         south   neck    minor         victim in car-stray bullet
2     11/21/87  730   Marathon     centrl  arm     major   sp    loaded firearm in vehicle
3     11/21/87  930   Oneida       north   chest   fatal   si    careless handling-tree involvd
4     11/21/87  945   Juneau       south   chest   major         victim in line of fire
5     11/21/87  950   Buffalo      centrl  leg     major   sp    victim out of sight of shooter
6     11/21/87  1000  Portage      centrl  foot    major   si    careless handling-tree involvd
7     11/21/87  1000  Portage      centrl  chest   major   sp    careless handling-tree invovld
8     11/21/87  1135  Rock         south   head    fatal         victim in line of fire
9     11/21/87  1235  Columbia     south   head    major   si    careless handling-tree involvd
10    11/21/87  1300  Columbia     south   abdomn  fatal   si    victim fell from tree
11    11/21/87  1440  Shawano      centrl  chest   fatal         victim out of sight of shooter
12    11/21/87  1445  Trempealeau  centrl  neck    major         ricochet-off gun
13    11/21/87  1445  Columbia     south   leg     major   sp    gun hammer struck an object
14    11/21/87  1630  Langlade     north   arm     minor         victim out of sight of shooter
15    11/22/87  815   Trempealeau  centrl  head    major         ricochet-bullet thru deer
16    11/22/87  900   Oconto       centrl  toe     major   si    careless handling-tree involvd
17    11/22/87  900   Trempealeau  centrl  leg     major   sp    victim in line of fire
18    11/22/87  1130  Buffalo      centrl  head    minor   sp    victim out of sight of shooter
19    11/22/87  1143  Door         north   hand    major   si    unloading firearm-defective
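
Once a table like this is imported, the first questions are usually counts: how many fatal cases, which county appears most often. The Python sketch below shows one way to ask them, using five rows copied from the table above and keeping only some of the columns. It illustrates the idea and is not the workflow shown in the session.

```python
from collections import Counter

# Five rows copied from the table above: (case, county, wound, injury, cause).
rows = [
    (1, "Sauk", "neck", "minor", "victim in car-stray bullet"),
    (3, "Oneida", "chest", "fatal", "careless handling-tree involvd"),
    (8, "Rock", "head", "fatal", "victim in line of fire"),
    (10, "Columbia", "abdomn", "fatal", "victim fell from tree"),
    (13, "Columbia", "leg", "major", "gun hammer struck an object"),
]

# Count records by injury severity and by county.
by_injury = Counter(r[3] for r in rows)
by_county = Counter(r[1] for r in rows)

# Pull out the case numbers of fatal incidents for closer reading.
fatal_cases = [r[0] for r in rows if r[3] == "fatal"]

print(by_injury)
print(by_county.most_common(1))
print(fatal_cases)
```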

Page 43: Data Journalism 101 - Day 1 by Michael J. Berens

Tip: Make a copy of the database. Call it “master file” and never touch it. Always work from a copy. Hint: Keep a log of everything.
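
The copy-and-log habit can also be scripted. Below is a minimal Python sketch, assuming a small stand-in data file created in a temporary folder so the example is self-contained. All file names and the `log` helper are hypothetical.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

# A temporary folder and a tiny stand-in file keep this sketch self-contained.
workdir = Path(tempfile.mkdtemp())
master = workdir / "master_file.csv"
master.write_text("berens,272464,Seattle,blue\n")

# Work on a copy; the master file is not written to again.
working = workdir / "working_copy.csv"
shutil.copy2(master, working)


def log(message: str) -> None:
    """Append a timestamped note to a plain-text log next to the data."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(workdir / "data_log.txt", "a", encoding="utf-8") as f:
        f.write(f"{stamp}  {message}\n")


log(f"Copied {master.name} to {working.name}")
log("Working copy created; master file left untouched")

print((workdir / "data_log.txt").read_text())
```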

Page 44: Data Journalism 101 - Day 1 by Michael J. Berens

Importing a fixed-length file

Page 45: Data Journalism 101 - Day 1 by Michael J. Berens
Page 46: Data Journalism 101 - Day 1 by Michael J. Berens
Page 47: Data Journalism 101 - Day 1 by Michael J. Berens
Page 48: Data Journalism 101 - Day 1 by Michael J. Berens
Page 49: Data Journalism 101 - Day 1 by Michael J. Berens
Page 50: Data Journalism 101 - Day 1 by Michael J. Berens

Tip: Always show your results to the sources in your story. Remember: You’re one keystroke away from a career-ending error.

Page 51: Data Journalism 101 - Day 1 by Michael J. Berens

What (and where) is your favorite source of Web-based data?

Answer in the chat box

Page 52: Data Journalism 101 - Day 1 by Michael J. Berens

https://www.fpds.gov/

Page 53: Data Journalism 101 - Day 1 by Michael J. Berens

Searching for Microsoft

Page 54: Data Journalism 101 - Day 1 by Michael J. Berens
Page 55: Data Journalism 101 - Day 1 by Michael J. Berens

Instant database – 17,583 records

Page 56: Data Journalism 101 - Day 1 by Michael J. Berens

http://www.fda.gov/

Page 57: Data Journalism 101 - Day 1 by Michael J. Berens
Page 58: Data Journalism 101 - Day 1 by Michael J. Berens
Page 59: Data Journalism 101 - Day 1 by Michael J. Berens
Page 60: Data Journalism 101 - Day 1 by Michael J. Berens

Look for the entire download

Page 62: Data Journalism 101 - Day 1 by Michael J. Berens

Code key

Page 63: Data Journalism 101 - Day 1 by Michael J. Berens

http://ire.org/nicar

Page 64: Data Journalism 101 - Day 1 by Michael J. Berens
Page 65: Data Journalism 101 - Day 1 by Michael J. Berens

Don’t be obsolete.

Page 66: Data Journalism 101 - Day 1 by Michael J. Berens

Unleash your inner watchdog