Keeping Governments Accountable with Open Data Science


Upload: odsc

Post on 15-Aug-2015


TRANSCRIPT

Page 1: Keeping Governments Accountable with Open Data Science

KEEPING GOVERNMENTS ACCOUNTABLE WITH OPEN DATA SCIENCE

Cezary Podkul

OPEN DATA SCIENCE CONFERENCE

BOSTON 2015

@opendatasci

Page 2: Keeping Governments Accountable with Open Data Science

ODSC 2015 | Boston 2

KEEPING GOVERNMENTS ACCOUNTABLE WITH OPEN DATA SCIENCE

Cezary Podkul, ProPublica | @Cezary

5/31/2015

Open Data Science Conference Boston 2015

Page 3: Keeping Governments Accountable with Open Data Science

Quick Word About ProPublica

• We are a non-profit investigative newsroom focused on accountability journalism

• We publish stories, build news apps and tools, and open-source much of our code at github.com/propublica

Page 5: Keeping Governments Accountable with Open Data Science

The Good News

• A lot of data already exists on the finances of state and local governments:
– Governments that borrow money from investors provide bond offering documents and other disclosures on EMMA
– They must also produce annual filings called “Comprehensive Annual Financial Reports,” which detail all of their financials

Page 6: Keeping Governments Accountable with Open Data Science

NICAR 2015 | Atlanta 6

The Good News: EMMA

• What is EMMA?
– Electronic Municipal Market Access

• Since 2009, the official repository for muni bond offering documents and continuing disclosures

• Run by the Municipal Securities Rulemaking Board (MSRB)

3/7/2015

Page 7: Keeping Governments Accountable with Open Data Science

The Good News: EMMA

• What’s in EMMA?
– Data on more than 1.2 million muni bonds:
  • Official statements; ongoing financial disclosures; advance refunding documents; event notices, voluntary disclosures, and more
– Real-time trade data for nearly every municipal bond bought and sold
– Political contribution disclosures
– Documents, documents, more documents

Page 8: Keeping Governments Accountable with Open Data Science

The Bad News

• EMMA is a great repository of info, but little of it is easily accessible:
– PDFs, PDFs and more PDFs
  • Sell a bond? Submit a PDF
  • Material event happened? Tell us via PDF
  • File financials? File a PDF
– No standardized reporting templates
  • Important info scattered in different places
– No machine-readable bulk download
  • XBRL? You wish

Page 9: Keeping Governments Accountable with Open Data Science

Things Could Be Better

• The SEC’s EDGAR database makes a wealth of info available about corporations:
– Bulk download of filings available via FTP:
  • http://datahub.io/dataset/edgar
  • ftp://ftp.sec.gov/
– The agency is also moving away from text-based submissions to XBRL filings:
  • http://www.sec.gov/info/edgar/edgartaxonomies.shtml
– No PDFs … seriously:
  • “Only documents submitted to the EDGAR system in either plain text or HTML are official filings. PDF documents are unofficial copies of filings. Filers may not use the unofficial PDF copies instead of plain text or HTML documents to meet filing requirements.”

Page 10: Keeping Governments Accountable with Open Data Science

The Result

• When IBM files its annual Form 10-K, you get this:
– XBRL:
  • http://www.sec.gov/Archives/edgar/data/51143/000104746915001106/ibm-20141231_pre.xml
– Text:
  • http://www.sec.gov/Archives/edgar/data/51143/0001047469-15-001106.txt
– Even an interactive data explorer, with Excel download
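To see why machine-readable filings matter, here is a minimal sketch of pulling a reported figure out of XBRL-style XML using only Python's standard library. The sample document, its `ex` namespace, the element names, and the values are all made up for illustration. Real SEC filings use the official taxonomies and are considerably more involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XBRL-style instance document.
# The namespace, element names, and numbers are illustrative only.
SAMPLE = """<xbrl xmlns:ex="http://example.com/taxonomy">
  <ex:Revenues contextRef="FY2014" unitRef="USD">1000</ex:Revenues>
  <ex:Revenues contextRef="FY2013" unitRef="USD">900</ex:Revenues>
  <ex:NetIncome contextRef="FY2014" unitRef="USD">120</ex:NetIncome>
</xbrl>"""

NS = {"ex": "http://example.com/taxonomy"}


def facts_by_context(xml_text, tag):
    """Return {contextRef: numeric value} for every top-level element named `tag`."""
    root = ET.fromstring(xml_text)
    return {
        el.get("contextRef"): float(el.text)
        for el in root.findall(f"ex:{tag}", NS)
    }


revenues = facts_by_context(SAMPLE, "Revenues")
print(revenues)
```

A few lines of code recover every tagged value. Nothing comparable is possible when the same numbers arrive as a scanned PDF.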

Page 11: Keeping Governments Accountable with Open Data Science

The Result

• When Detroit files its Comprehensive Annual Financial Report with EMMA, you get this:
– http://emma.msrb.org/ER789294-ER614016-ER1015978.pdf

Page 12: Keeping Governments Accountable with Open Data Science

Happy Hunting

• So how do you spot anomalies like these and write about them in a systematic way?

[Chart: Ohio Series 2007B Tobacco Settlement Bonds. Amount owed over time, split into principal and accreted interest; y-axis $0 to $3,500,000,000; x-axis 10/2007 to 10/2045]

$191.3m borrowed, with $3.2bn due at maturity in 2047.

Interest accrues at a 7.25% rate, compounded.

No option to redeem until 2017
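The arithmetic behind the Ohio figures is plain compound growth: no interest is paid along the way, so it is added to the balance and itself earns interest. The sketch below uses the $191.3m principal and 7.25% rate from the slide. The semiannual compounding and the round 40-year term are assumptions, not terms taken from the bond documents.

```python
def accreted_value(principal, annual_rate, years, periods_per_year=2):
    """Amount owed after `years` if interest compounds and nothing is paid."""
    periods = years * periods_per_year
    return principal * (1 + annual_rate / periods_per_year) ** periods


# Figures from the slide: $191.3m borrowed at 7.25%.
# Assumptions: semiannual compounding and a round 40-year term (2007 to 2047).
PRINCIPAL = 191_300_000
RATE = 0.0725

for years in (10, 20, 30, 40):
    owed = accreted_value(PRINCIPAL, RATE, years)
    print(f"after {years} years: ${owed / 1e9:.2f}bn")
```

Under these assumptions the 40-year balance comes out a little above $3 billion, in the same range as the $3.2bn on the slide. The exact figure depends on the actual issue and maturity dates and on the compounding convention.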

Page 13: Keeping Governments Accountable with Open Data Science

Example: Tobacco Bonds

• That’s what I wanted to do for my series on tobacco bonds – state and local debts backed by payments from the 1998 legal settlement with Big Tobacco

Page 14: Keeping Governments Accountable with Open Data Science

Example: Tobacco Bonds

• Problem: How do you define the sample universe?
– How many bonds are there, and which ones are the anomalies?
– Searching on EMMA wasn’t much help; just links to PDFs

• Solution: Asked a data vendor, Thomson Reuters SDC, for their list

Source: Thomson Reuters SDC

Page 15: Keeping Governments Accountable with Open Data Science

Example: Tobacco Bonds

• Problem: How do you vet the data?
– Need to ensure completeness and accuracy

• Solution: Lots and lots of reading
– Re-created the Thomson Reuters database from paper filings, zeroing in on 38 deals that included the anomalous bonds
– Logged all the terms and conditions we needed to calculate the amounts owed on the debt
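One way to check completeness and accuracy once both lists exist is to reconcile them record by record. This is a minimal sketch, not the process actually used for the series. The `deal_id` key, the field names, and the sample rows are hypothetical.

```python
def reconcile(vendor_rows, filing_rows, key="deal_id"):
    """Compare two lists of dicts by `key`; report gaps and field mismatches."""
    vendor = {row[key]: row for row in vendor_rows}
    filings = {row[key]: row for row in filing_rows}
    missing_from_filings = sorted(vendor.keys() - filings.keys())
    missing_from_vendor = sorted(filings.keys() - vendor.keys())
    mismatches = []
    for deal in sorted(vendor.keys() & filings.keys()):
        for field, vendor_value in vendor[deal].items():
            if field in filings[deal] and filings[deal][field] != vendor_value:
                mismatches.append((deal, field, vendor_value, filings[deal][field]))
    return missing_from_filings, missing_from_vendor, mismatches


# Hypothetical sample rows; identifiers and amounts are illustrative only.
vendor_list = [
    {"deal_id": "A", "par_amount": 100},
    {"deal_id": "B", "par_amount": 250},
]
hand_logged = [
    {"deal_id": "A", "par_amount": 100},
    {"deal_id": "B", "par_amount": 255},
    {"deal_id": "C", "par_amount": 75},
]

gaps_in_filings, gaps_in_vendor, diffs = reconcile(vendor_list, hand_logged)
print(gaps_in_filings, gaps_in_vendor, diffs)
```

Every deal that appears in only one list, and every field that disagrees, becomes a specific item to check against the source document.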

Page 16: Keeping Governments Accountable with Open Data Science

Example: Tobacco Bonds

• Why not do it programmatically? Wish we could have, but:
– Data often buried in scanned PDFs
– Even if you OCR, data do not appear in the same place across documents
– Different labels, different conventions for reporting
– Sometimes, repayment amounts not reported at all

Page 17: Keeping Governments Accountable with Open Data Science

Example: Tobacco Bonds

• Results:
– Calculated that, in aggregate, state and local governments promised to repay $64 billion on $3 billion they raised by borrowing using these bonds
– Money from the tobacco settlement was supposed to go to healthcare; instead it turned into multi-generational debt
– The bonds are now heading for default, prompting some state and local governments to bail out bondholders
– Focused attention on this issue and spurred additional local, state and national media coverage

Source: GoComics

Page 18: Keeping Governments Accountable with Open Data Science

Next Steps

• The Financial Transparency Act of 2015 has some helpful provisions in it:

• But for now it’s up to us to liberate the data

Source: Data Transparency Coalition

Page 19: Keeping Governments Accountable with Open Data Science

Example: Treasury.io

• API for daily spending, revenue and debt operations data from the U.S. Treasury

Developed by csv soundsystem with a Knight-Mozilla Open News Code Sprint grant

Page 20: Keeping Governments Accountable with Open Data Science

Example: Treasury.io

• Turns text:

• Into structured csv:

• Parser code available at: https://github.com/csvsoundsystem/federal-treasury-api
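The general idea of turning a fixed-width text report into CSV can be sketched in a few lines. This is not the project's parser (that lives in the repository above). The sample text is made up and does not reproduce the real file layout, and splitting on runs of two or more spaces is an assumption that only works when labels contain single spaces.

```python
import csv
import io
import re

# Made-up sample in the style of a fixed-width government text report.
# It is NOT real Treasury data and not the real file layout.
SAMPLE_TEXT = """\
Item                     Today    This month
Example deposits A       1,234        5,678
Example deposits B          56          789
"""


def text_table_to_csv(text):
    """Split each line on runs of 2+ spaces and write the cells as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = re.split(r"\s{2,}", line.strip())
        # Strip thousands separators so numeric cells are machine-readable.
        cells = [c.replace(",", "") if re.fullmatch(r"[\d,]+", c) else c for c in cells]
        writer.writerow(cells)
    return out.getvalue()


csv_text = text_table_to_csv(SAMPLE_TEXT)
print(csv_text)
```

Real reports need more care: wrapped labels, footnote markers, and tables whose columns shift between issues all break a naive splitter like this one.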

Page 21: Keeping Governments Accountable with Open Data Science

Next Challenge

• The U.S. Treasury publishes even more useful data in its monthly statement:
– http://www.fiscal.treasury.gov/fsreports/rpt/mthTreasStmt/backissues.htm

• I am looking for developers interested in helping liberate the data
– Is that you? Code repo available here: https://github.com/csvsoundsystem/monthly-treasury-statements

Page 22: Keeping Governments Accountable with Open Data Science

Questions?

[email protected]

@Cezary