ddod - the lean startup approach to open data
TRANSCRIPT
Presented at
Executive Office of the President
Office of Science and Technology Policy
US Gov Data Cabinet meeting
June 21, 2016
The Lean Startup Approach to Open Data
How Demand-Driven Open Data (DDOD) Improves Relevance, Discoverability and Usability
David Portnoy
Entrepreneur-in-Residence
U.S. Department of Health & Human Services
Twitter: @dportnoy
http://ddod.healthdata.gov
Piloted as “Lean Startup for Open Data”
at HHS IDEA Lab across Department of Health & Human Services agencies
Demonstrated capabilities in improving data quality
at White House Open Data Roundtable
Optimizing DDOD to be scalable and applicable across government
with OSTP/OMB, Data Cabinet and Center for Open Data Enterprise
Discuss data maturity model application beyond open data
at MIT Chief Data Officer conference
The Background
CIOs and agency heads must: Maintain an EDI (Enterprise Data Inventory); Implement a “process to evaluate and improve timeliness, completeness, accuracy, usefulness, and availability” of open data; Implement a method for understanding data asset usage, responding to quality issues, usability, recommendations for improvements, and adherence complaints; Ensure conformance with open data best practices; Produce an Open Data Compliance Report.
Agencies must: Analyze data asset usage, including responding to quality issues, usability, recommendations for improvements, and complaints about adherence; Monitor public satisfaction and performance improvement needs; Engage the public in using open data and encourage collaborative approaches to improving data use; Provide information for the GAO report on the value of information made available to the public and additional data assets that should be made available publicly.
Looking Ahead... OPEN Gov Data Act
Focus on measuring value of data
Engage the public in using open data and encourage collaborative approaches to improving data use
Analyze data asset usage. “Monitor public satisfaction and performance improvement needs”
Institute a process to continuously improve on “quality issues, usability, recommendations, complaints...”
What happens when we don’t measure value?
Data owners focus on datasets that are: easiest to generate and least risky to release
Unusable and low-value datasets
Difficult to find useful data
The Reality
The Result
Take community engagement (on steroids, of course)
The Solution
And pair it with lean startup principles
The Shift
What’s a Use Case?
All metrics in DDOD are in terms of Use Cases.
A Use Case is simply a well-defined application of a dataset for a specific purpose in industry, research or media.
It always includes a statement of value -- both to the requester and the general public
Each use case has core sections…
Description
Value
Specifications
Solution
See them at: http://ddod.healthdata.gov/
Anatomy of a Use Case
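The four core sections above could be modeled as a small record. This is only an illustrative sketch, not DDOD's actual schema: the class name UseCase, its field names and the missing_sections helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    """Hypothetical record mirroring the four core sections of a use case."""
    title: str
    description: str          # what the requester is trying to accomplish
    value: str                # statement of value to requester and public
    specifications: str = ""  # data requirements, filled in as they are refined
    solution: str = ""        # empty until the use case is resolved

    def missing_sections(self) -> List[str]:
        """Return the names of core sections that are still empty."""
        sections = {
            "Description": self.description,
            "Value": self.value,
            "Specifications": self.specifications,
            "Solution": self.solution,
        }
        return [name for name, text in sections.items() if not text.strip()]
```

A record like this makes it easy to see which use cases still lack a documented solution.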
Processes for administration of use cases, such as
• Encouraging responsiveness, transparency and documentation
• Ensuring use cases and resulting datasets are indexed in HealthData.gov
Specialized tools for administering use cases
• Workflow engine, communications method, knowledge base
• Data processing, storage, hosting, versioning
Proactive outreach to industry and academia for a thriving community
DDOD provides 3 core services to Data Owners
The Framework
Identify missing technical capability
Manually improve data catalog and data assets
Contribute to Use Case knowledge base
External DDOD Activity
• Outreach & collaboration
• Use case administration
DDOD drives 3 types of deliverables: cataloging of use cases, improvements of data assets and development of technical capabilities
Internal DDOD Activity
• Systems development
• Program evaluation
Ongoing Systems development specification
Increase & measure value Improve capability
Knowledge Base Data Assets Technical Capability
The Process
DDOD’s workflow for a Use Case is enabled by 3 types of participants:
Data User, DDOD Admin, Data Owner
Communications Platform (Github Issues)
Data Catalog (HealthData.gov)
Knowledge Base (MediaWiki)
The Tools
Middleware (Python)
Tied together with middleware that monitors changes and
tracks progress
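One way middleware might monitor catalog changes is to compare two snapshots of a data.json catalog. The sketch below is an assumption, not the DDOD implementation: it relies on the Project Open Data layout (a top-level "dataset" list whose entries carry "identifier" and "modified"), and the function names are hypothetical.

```python
def index_by_identifier(catalog: dict) -> dict:
    """Map each dataset's identifier to its metadata record."""
    return {d["identifier"]: d for d in catalog.get("dataset", [])}


def diff_catalogs(old: dict, new: dict) -> dict:
    """Report identifiers added, removed, or modified between two snapshots."""
    old_idx = index_by_identifier(old)
    new_idx = index_by_identifier(new)
    shared = old_idx.keys() & new_idx.keys()
    return {
        "added": sorted(new_idx.keys() - old_idx.keys()),
        "removed": sorted(old_idx.keys() - new_idx.keys()),
        # A dataset counts as modified when its "modified" value differs.
        "modified": sorted(
            i for i in shared
            if old_idx[i].get("modified") != new_idx[i].get("modified")
        ),
    }


# Made-up snapshots for illustration only.
yesterday = {"dataset": [
    {"identifier": "a", "modified": "2016-05-01"},
    {"identifier": "b", "modified": "2016-05-01"},
]}
today = {"dataset": [
    {"identifier": "b", "modified": "2016-05-02"},
    {"identifier": "c", "modified": "2016-05-02"},
]}
report = diff_catalogs(yesterday, today)
```

Run once a day against the latest snapshot, a comparison like this would be enough to feed a simple change report.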
The Architecture
Data.json
Hosted charts
(Flask, Google Charts, Bokeh)
Embed
Middleware (Python, Flask, math libraries)
HealthData.gov
Drupal
(CMS, workflow)
Semantic MediaWiki
Drupal DKAN theme
SMW API
GitHub issues
GH API
DKAN Drupal Extension
(DKAN data catalog)
Requests Library
...but it’s always changing
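To illustrate how middleware could track use case progress from GitHub Issues, the sketch below reduces one issue object to a few fields. It assumes the issue has already been fetched as JSON (for example with the Requests library from the GitHub REST API) and uses the fields "number", "title", "state", "labels" and "html_url" from that payload. The function name summarize_issue and the sample values are hypothetical.

```python
def summarize_issue(issue: dict) -> dict:
    """Reduce a GitHub issue payload to the fields needed for progress tracking."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        "open": issue["state"] == "open",
        # Each label in the payload is an object; keep only its name.
        "labels": [label["name"] for label in issue.get("labels", [])],
        "url": issue.get("html_url", ""),
    }


# Made-up payload for illustration; a real one has many more fields.
sample_issue = {
    "number": 1,
    "title": "Example use case",
    "state": "open",
    "labels": [{"name": "in progress"}],
    "html_url": "https://example.org/issues/1",
}
summary = summarize_issue(sample_issue)
```

Keeping the mapping in a pure function separates it from the network call, so it can be tested without access to GitHub.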
DDOD use cases deliver value in 6 ways...
The Metrics
✤ As of May 2016
The Deliverables
Knowledge Base Data Catalog & Assets Technical Capability
◼ 44 use cases documenting specific applications of open data assets added to DDOD knowledge base
◼ 8 agencies covered: CMS, FDA, CDC, HRSA, ONC, ACF, ACL, ASPE
◼ 47 users served by DDOD, including companies, data scientists, researchers, journalists and nonprofits
◼ 20 use cases driving additional datasets indexed
◼ 180 previously uncataloged URLs identified
◼ 9 use cases driving new or improved datasets released
◼ 2 standards for open data resulting from 8 use cases
◼ Automated calculation and visualization of value metrics
◼ Dataset count fluctuation monitoring
◼ Daily catalog change reports
◼ Data asset federation report & harvest flow visualization
◼ DDOD/HealthData.gov integration roadmap
• Single source of truth monitoring
• Data quality notifications
• Auto sync between platforms
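Dataset count fluctuation monitoring, listed above, can be approximated by flagging large day-over-day swings in the catalog's dataset count. This is a rough sketch under stated assumptions: the 10% threshold, the function name and the sample counts are illustrative, not taken from DDOD.

```python
def flag_fluctuations(daily_counts, threshold=0.10):
    """Flag day-over-day changes in dataset count that exceed a threshold.

    daily_counts is a list of (date, count) pairs in chronological order.
    Returns a list of (date, previous, current, relative_change) tuples.
    """
    flagged = []
    for (_, prev), (day, curr) in zip(daily_counts, daily_counts[1:]):
        if prev == 0:
            # Avoid division by zero; any growth from zero counts as infinite.
            change = float("inf") if curr else 0.0
        else:
            change = (curr - prev) / prev
        if abs(change) > threshold:
            flagged.append((day, prev, curr, change))
    return flagged


# Made-up counts for illustration only.
counts = [("2016-05-01", 100), ("2016-05-02", 101), ("2016-05-03", 80)]
alerts = flag_fluctuations(counts)
```

A sudden drop like the last one in the sample often signals a failed harvest rather than a real removal of datasets, which is why it is worth a notification.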
It started with frustration about data quality
Can’t reconcile multiple sources
Missing unique identifiers
Refreshes change history
And ended with a release of new data (including an API!)
Example Use Case
Quality improvements using machine readability and consolidation
Medicaid enrollment data reports have been published only as PDFs
...with different files by years and state!
Lots of overhead and transcription errors
If only they could all be that easy!
Example Use Case
Data quality improves by eliminating manual entry
Federal poverty guidelines are tables published annually
Lots of organizations enter these by hand
But community already solved the problem
The best kind of problems solve themselves!
Example Use Case
Insights for regulation
Stimulate adoption
Sometimes, the biggest gains come when you observe trends.
Observe 7 use cases with common challenge
Need standardized provider dimension
Work with regulators and industry
DDOD was able to contribute to a new standard
7 use cases impacted
Industry work group
Example Use Case
Your insights please!
Help fine-tune DDOD to be most applicable to your agency’s needs
1. Channels used to reach public
2. Prioritization of releases / improvements
3. Measuring value of data assets
4. Incentives for program owner
The Request