Harnessing Health.Data.gov Data to Address Diabetes in the US

1

Harnessing Health.Data.gov Data to Address Diabetes in the US

Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/

AOL Government Blogger
http://gov.aol.com/bloggers/brand-niemann/

April 17, 2013
http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov

2

Background

• HealthData.gov and Health Datapalooza III Knowledge Base and Data Ecosystem:
– Two Published Stories, Two Spreadsheets, and Two Spotfire Dashboards. My Note: HealthData.gov had 194 data sets in 2012 and has 399 now in 2013.
• Health Datapalooza IV Technology Development Track:
– Knowledge Graph, Metadata, RPI Watson, Bootcamp, and Linked Data. See Next Slide.
• My Process:
– Harness Data for Diabetes Knowledge Base
– Data Ecosystem Spreadsheet
– Data Ecosystem Spotfire
• My Results:
– Story
– Slides
– Spotfire Dashboard
– Research Notes

3

HealthData.gov and Health Datapalooza III Knowledge Base

http://semanticommunity.info/HealthData.gov

4

HealthData.gov and Health Datapalooza III Spotfire Data Ecosystem

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?HealthData.gov-Spotfire

5

Health Datapalooza IV Technology Development Track

• Open Health Knowledge Graphs:
– This session will describe healthdata.gov platform components, including new functionality that programmatically exposes tabular and graph-oriented data.
• Lifting Schemes:
– We will describe the ‘bottom up’ automation tools and techniques employed in the winning submission for the healthdata.gov Metadata Domain Challenge.
• Open Government Data:
– We will present emerging solution standards and transitioning academic technologies, including innovative work conducted by the ‘Watson’ research group at Rensselaer Polytechnic Institute on using Watson as a ‘data advisor’.
• Health Industry Bootcamp - A Real-World Crash Course:
– An interactive, games-based bootcamp designed to get participants up and running the same day with their own real-world portfolio covering how to use public data to create market value, how to navigate perverse incentives in the industry, and how to deliver public and social good.
• Cooperation Without Coordination: Managing Distributed Clinical Trial Data:
– TBA. See http://health.data.gov/cqld/ and http://reference.data.gov/cqld/about.html
• Linked Data – Structured Data on the Web:
– TBA. See http://sw.appliedinformaticsinc.com/fct/facet_doc.html

http://healthdatapalooza.org/agenda/tech-development-track/

6

Vocab.Data.gov: Government Data Vocabulary

http://vocab.data.gov/gd

7

Health Data Platform Metadata Challenge

http://www.health2con.com/devchallenge/health-data-platform-metadata-challenge/
http://www.healthdata.gov/blog/domain-challenge-1-metadata

Mirrored http://hub.healthdata.gov to improve the CKAN metadata and RDF.
Created three levels of metadata for http://healthdata.gov datasets.
Created a set of ontologies to link several datasets from HealthData.gov.

8

IBM Watson at RPI

• What is Watson?:
– The underlying “DeepQA” architecture is designed to find the meaning behind a question posed in natural language and deliver a single, precise answer.
• IBM’s Watson goes to school: A Q&A with RPI’s Jim Hendler:
– A version of the system similar to the one used on “Jeopardy!” will be housed at RPI for three years as part of a Shared University Research Award from IBM Research. The system at RPI will have 15 terabytes of hard disk storage and give 20 users access to the system simultaneously, making it, according to a release, “an innovation hub” for the campus.
– One thing we want to explore is how Watson can interact with social media, especially things such as “tweets” where the language is not as carefully constructed as it is in the documents Watson has used in the Jeopardy game.
– I run a group that does a lot of work with Open Government Data systems (like the US data.gov) and we’re excited about the possibility of using Watson to help researchers around the world find relevant government data and documents for their work.
– Our goal for the next few years is to gain an understanding of what having the new ways of bringing unstructured data and documents into our computational lives will be.

http://watson.rpi.edu/

My Note: See Our Semantic Medline Work with New Cray Graph Computer.

9

Health.Data.gov

http://www.healthdata.gov/

My Note: Promotes the Diabetes Challenge, But Does Not Provide Much Data For It!

10

Health.Data.gov: Search for Diabetes

http://www.healthdata.gov/dataset/search/diabetes
http://statesnapshots.ahrq.gov/snaps09/allStatesallMeasures.jsp?menuId=63&state=

My Note: Found One Data Set and Downloaded Two Excel Files and Added Them to the Diabetes Ecosystem Spreadsheet. See Slide 18.

11

HealthData.gov Catalog Hub

http://hub.healthdata.gov/

My Note: 402 datasets instead of 399.

My Note: Found Same State AHRQ Snapshots and CDC WONDER Births. See Next Slide.

12

HealthData.gov Catalog Hub: CDC WONDER Births

http://hub.healthdata.gov/dataset/wonder-births

13

HealthData.tw.rpi.edu Catalog Hub: CDC WONDER Births

http://healthdata.tw.rpi.edu/hub/dataset/wonder-births-1

“We mirrored the http://hub.healthdata.gov CKAN instance using its API to our own instance at http://healthdata.tw.rpi.edu/hub. This allowed us to both improve the CKAN-based metadata, including adding Data Dictionaries and Technical Documentation as Resources, and to improve the RDF generated by CKAN.”

Source: Health Data Platform Metadata Challenge

Source: See Next Slide
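As a rough illustration of what reading a CKAN catalog through its API can look like, the sketch below builds a keyword-search URL and pulls dataset names out of the JSON reply. It assumes the version 3 Action API (`package_search`); the hub in 2013 may have exposed a different API version, and the helper names `package_search_url` and `dataset_names` are hypothetical. The sample response is hand-written in the Action API shape, not real catalog output, and no network call is made.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL for a keyword query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Return (count, [dataset names]) from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    return result["count"], [pkg["name"] for pkg in result["results"]]


# Hand-written sample in the Action API response shape (assumption, not real output).
sample = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "wonder-births"}]},
})

url = package_search_url("http://hub.healthdata.gov/", "diabetes")
count, names = dataset_names(sample)
```

Fetching `url` with any HTTP client and passing the body to `dataset_names` would, under these assumptions, list the matching catalog entries that a mirror would then copy.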

14

CDC WONDER: Natality Information Live Births

http://wonder.cdc.gov/natality.html

My Note: Data Description contains Maternal Risk Factors: Diabetes - Yes, No, Not Stated, Not Reported.

My Note: A Data Access Agreement is Required.

15

CDC WONDER: Natality Data Live Births - Diabetes

http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C

16

CDC WONDER: Natality Data Live Births - Diabetes

http://wonder.cdc.gov/controller/datarequest/D66;jsessionid=A7C4A365FB2F877955A61D7BF9C5EC5C

My Note: Export to Text File, Remove Metadata, and Import to Spreadsheet.
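The export, clean, and import steps in this note could be scripted roughly as follows. This is a sketch under stated assumptions: it presumes the exported text file is tab-delimited with quoted fields and that the trailing metadata begins at a line containing only `"---"`. Check the actual export before relying on that marker. The function names and the sample text are hypothetical, and the numbers are placeholders, not CDC figures.

```python
import csv
import io


def strip_export_metadata(export_text):
    """Return the data rows that appear before the '---' footer marker."""
    data_lines = []
    for line in export_text.splitlines():
        if line.strip().strip('"') == "---":
            break  # everything after the marker is treated as metadata
        if line.strip():
            data_lines.append(line)
    return list(csv.reader(io.StringIO("\n".join(data_lines)), delimiter="\t"))


def rows_to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Placeholder export text; values are illustrative only.
sample_export = (
    '"Notes"\t"Diabetes"\t"Births"\n'
    '\t"Yes"\t100\n'
    '\t"No"\t900\n'
    '"---"\n'
    '"Dataset: example"\n'
)

rows = strip_export_metadata(sample_export)
csv_text = rows_to_csv(rows)
```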

17

Harness Health.Data.gov Data to Address Diabetes in the US Knowledge Base

http://semanticommunity.info/Health_Datapalooza_IV#Health.Data.gov

My Note: Did not find CAHMI!

My Note: Only found one!

18

Diabetes Data Ecosystem Spreadsheet

http://semanticommunity.info/@api/deki/files/23811/Diabetes.xlsx

19

NHQR State Snapshots 2009

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

20

AHRQ State Snapshots Conclusion

• Getting started on quality improvement is not an easy task. One strategy a State may find helpful is to identify other States with populations similar to those targeted for a quality improvement effort. For example, a State seeking to improve rates of pneumonia vaccination for people discharged from hospitals may want to model its efforts on those of a State that has previously implemented an improvement program in this area and demonstrated success.
• In many cases, the greatest value in comparison may lie in identifying States that have started from relatively low performance and made incremental improvements. The State with the greatest improvements may have the most to contribute in demonstrating to other States how to encourage delivery system change that improves quality of care.

http://statesnapshots.ahrq.gov/snaps09/interpretation.jsp?menuId=67&state=AL#conclusion

21

AHRQ Quality of Care for Diabetes by Region and State for 2005-2006 by Conditions

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

22

CDC WONDER Births Natality Diabetes

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

23

Diabetes Data Ecosystem Spotfire

https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AHRQFocusonDiabetes-Spotfire.dxp

My Note: Can See All the Data Sets and Their Data Elements To Do Joins, Mappings, and Rule-Driven Visualizations.
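Outside Spotfire, the kind of join this note mentions can be sketched in a few lines. The example assumes the two data sets share a state column; the column names, the helper `join_on_key`, and the placeholder values are all hypothetical and do not come from the AHRQ or CDC WONDER files.

```python
def join_on_key(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})  # merge columns from both rows
    return joined


# Placeholder rows standing in for two data sets keyed by state.
quality = [
    {"state": "AL", "quality_measure": "placeholder-a"},
    {"state": "AK", "quality_measure": "placeholder-b"},
]
births = [{"state": "AL", "births_with_diabetes": "placeholder-c"}]

joined = join_on_key(quality, births, "state")
```

Only states present in both inputs survive an inner join, so a row count that drops after joining is a quick signal that the key values need mapping first.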

24

Conclusions and Recommendations

• A Health.Data.gov search for “diabetes” returns only one data set. A search of the HealthData.gov Catalog Hub returns two data sets.

• This work demonstrates the objectives of the Health Datapalooza IV Technology Development Track.

• I prefer having both human-readable and machine-readable metadata rather than just the latter, which is all I find at the HealthData.gov Catalog Hub.

• Next: First Lady Michelle Obama on Exercise and Dr. Amen on Natural Supplements Data in Preventing and Treating Diabetes.