1
A TEDMED Data Reveal: Big and Little
Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/
AOL Government Blogger
http://gov.aol.com/bloggers/brand-niemann/
April 22, 2013
http://semanticommunity.info/A_TEDMED_Data_Reveal
2
Background
• I did a story about TEDMED 2012 for the 2012 Health Datapalooza III and was invited to go to TEDMED 2013 as a Journalist!
• Session 2: “How Can Big Data Become Real Wisdom?” and Session 6: “Going Farther while Staying Closer” were the most interesting and motivating to me. See next slide.
• I heard about Big and Little Data and saw an opportunity to help TEDMED with a taxonomy that is a semantic index to a knowledge base for improved search, and to help TEDMED with examples of big and little data science.
• And the best data source for my work was Professor Christopher Murray’s (IHME/GBD) presentation and demonstration on “What does a $100 million public health data revolution look like?”, funded by the Bill and Melinda Gates Foundation to prioritize global health research and help.
• It made me think of Monica Rogati’s Tweet @ Strata 2012: More data beats clever algorithms, but better data beats more data.
• I Tweeted: @TEDMED @Storify Yes, and working on IHME/GDB (Global Burden of Disease) Visualizations like: http://semanticommunity.info/Census_Data_Visualization
• But I wanted to volunteer to help TEDMED 2013 and 2014 as a data scientist/data journalist and saw on their Web site: If you are a talented designer and/or illustrator with experience in bringing presentations to life, you could help with our speaker presentation materials.
• I attended the First Great Challenges Day, participated in the Inventing Wellness Programs Breakout Session, and learned the importance of scientists storifying with “and, but, and therefore”.
• Therefore my story is a TEDMED Data Reveal: Big (IHME/GBD) and Little (TEDMED Web Site) with “and, but, and therefore.”
3
My TEDMED 2013 Highlights
• SESSION 2: How Can Big Data Become Real Wisdom?
  – Jay Walker: Introduction.
    • Need a macro-scope to gather, network, store, and access data and to go from data to wisdom by finding patterns in the data.
  – Larry Smarr: Can you coordinate the dance of your body's 100 trillion microorganisms?
    • How to quantify self movement with medical detail in real time by an astrophysicist turned computer scientist.
• SESSION 6: Going Farther while Staying Closer
  – Christopher Murray: What does a $100 million public health data revolution look like?
    • Talk and live demo of Global Burden of Disease Treemap, Map, Time Plot, Age Plot, and Stacked Bar Chart by Age and Sex.
See: http://blog.tedmed.com/
4
TEDMED 2013
http://www.tedmed.com/
My Note: I decided to make this a Searchable Knowledge Base.
5
TEDMED Knowledge Base
http://semanticommunity.info/A_TEDMED_Data_Reveal
Google Chrome: Find
6
TEDMED Speakers
http://www.tedmed.com/speakers
My Note: I decided to make this a little data set for faceted search.
7
TEDMED Speakers Spreadsheet
http://semanticommunity.info/@api/deki/files/23881/TEDMED.xlsx
My Note: The facets are Year, Keywords, and Tags.
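My Note (illustrative): Faceted search over a little data set like this one could be sketched as below. This is a minimal sketch only: the rows, the field names (`year`, `keywords`, `tags`), and the functions `facet_filter` and `facet_counts` are assumptions made for illustration and are not taken from the actual TEDMED.xlsx file.

```python
from collections import Counter

# Hypothetical rows standing in for the speakers spreadsheet.
# The real column names and values in TEDMED.xlsx may differ.
SPEAKERS = [
    {"name": "Speaker A", "year": 2013,
     "keywords": {"big data"}, "tags": {"session 2"}},
    {"name": "Speaker B", "year": 2013,
     "keywords": {"global health", "big data"}, "tags": {"session 6"}},
    {"name": "Speaker C", "year": 2012,
     "keywords": {"wellness"}, "tags": {"session 1"}},
]


def facet_filter(rows, year=None, keyword=None, tag=None):
    """Return the rows that match every facet value that was supplied."""
    result = []
    for row in rows:
        if year is not None and row["year"] != year:
            continue
        if keyword is not None and keyword not in row["keywords"]:
            continue
        if tag is not None and tag not in row["tags"]:
            continue
        result.append(row)
    return result


def facet_counts(rows, facet):
    """Count how many rows carry each value of a multi-valued facet."""
    counts = Counter()
    for row in rows:
        counts.update(row[facet])
    return counts


# Narrow by two facets, then show what remains under a third facet.
matches = facet_filter(SPEAKERS, year=2013, keyword="big data")
print([row["name"] for row in matches])
print(facet_counts(matches, "tags"))
```

Each facet narrows the result independently, and the counts show which values remain available after filtering, which is what makes a small spreadsheet browsable without a search engine.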
8
Institutions hosting TEDMEDLive 2013
http://www.tedmed.com/event/tedmedlive?ref=participating
My Note: I decided to make this a little data set for mapping, but it was difficult to get the geo-referenced data set.
9
TEDMEDLive 2013 Institutions Spreadsheet
http://semanticommunity.info/@api/deki/files/23881/TEDMED.xlsx
My Note: Simple Geo-referencing of Institutions.
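My Note (illustrative): Simple geo-referencing can be thought of as a lookup of each institution's city against a table of coordinates. The sketch below assumes a hypothetical gazetteer keyed by city and state, with made-up institution names and approximate city-center coordinates; it is not the method or the data actually used for the spreadsheet.

```python
# Hypothetical gazetteer: (city, state) -> approximate (latitude, longitude).
GAZETTEER = {
    ("Seattle", "WA"): (47.61, -122.33),
    ("Boston", "MA"): (42.36, -71.06),
}

# Hypothetical institution rows; the names are placeholders.
INSTITUTIONS = [
    {"institution": "Example University", "city": "Seattle", "state": "WA"},
    {"institution": "Example Hospital", "city": "Boston", "state": "MA"},
    {"institution": "Example Clinic", "city": "Springfield", "state": "ZZ"},
]


def georeference(rows, gazetteer):
    """Attach coordinates to rows found in the gazetteer.

    Returns (located, unmatched) so that rows without a match can be
    reviewed by hand instead of being silently dropped.
    """
    located, unmatched = [], []
    for row in rows:
        coords = gazetteer.get((row["city"], row["state"]))
        if coords is None:
            unmatched.append(row)
        else:
            latitude, longitude = coords
            located.append({**row, "latitude": latitude, "longitude": longitude})
    return located, unmatched


located, unmatched = georeference(INSTITUTIONS, GAZETTEER)
print(len(located), "located;", len(unmatched), "need manual lookup")
```

Keeping the unmatched rows separate reflects the difficulty noted earlier: some institutions will not match a lookup table and have to be located manually.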
10
TEDMED 2013: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
11
TEDMED 2009-2012: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
12
Institute for Health Metrics and Evaluation (IHME)
http://www.healthmetricsandevaluation.org/
My Note: I heard this talk and decided to work with this big data.
My Note: There are three Web sites
13
Press Release
• The Global Burden of Disease (GBD) is a first-of-its-kind study of health around the world.
• The GBD findings present a new way to look at health, allowing countries to track progress against diseases ranging from malaria to cancer to diabetes, identify risks including smoking and poor diet, see how people in 187 countries are faring in terms of health, and gauge emerging health challenges. The GBD is a collaboration of nearly 500 researchers in 50 countries, and is led by IHME, part of the University of Washington.
• Some of the countries included in the GBD, such as the UK and Indonesia, already have started to produce their own policy recommendations as a result of the study. Australia and China are also planning to produce studies that use GBD to drill down and develop local-level health data.
• IHME is working with three localities in the US to produce GBD-type data at the community level as well.
• Efforts are underway to provide continuous updates to the GBD and expand the range of health issues included in the study.
• The GBD measures health issues around the world through more than 1 billion pieces of data that can also be explored through interactive visualization tools online.
http://semanticommunity.info/@api/deki/files/23885/TEDMED-media-advisory-Chris-Murray.docx
14
GHDx Catalog of Demographic and Health Data by IHME
http://ghdx.healthmetricsandevaluation.org/
My Note: Download Data
15
Global Burden of Disease Study 2010 Data Downloads
http://ghdx.healthmetricsandevaluation.org/global-burden-disease-study-2010-gbd-2010-data-downloads
My Note: I downloaded 17 files totaling 1.13 GB. Two Codebook files were damaged and I repaired them.
16
GBD Compare
http://viz.healthmetricsandevaluation.org/gbd-compare/
My Note: Treemap and Map.
17
GBD Cause Patterns
http://www.healthmetricsandevaluation.org/gbd/visualizations/gbd-cause-patterns
My Note: Stacked Bar Chart.
18
GBD Cause Patterns: Reports
http://www.healthmetricsandevaluation.org/gbd/visualizations/gbd-cause-patterns#/publications-presentations/reports
19
IHME-GBD Causes of Death: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
20
IHME-GBD Life Expectancy: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
21
IHME-GBD Mortality: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
22
IHME-GBD Risk Factors: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
23
IHME-GBD Breast and Cervical Cancer: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
Navigation and Metadata
Data Set
World Map
Bar Chart
My Note: Data Visualizations are Linked.
24
Data Ecosystem: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?TEDMED2013-Spotfire
25
IHME-GBD Life Expectancy by Country: Spotfire
Navigation and Metadata
Code Book
Filters
Details-on-Demand
My Note: The Visualizations Are Linked to One Another.
Data Set
My Note: 19 files totaling 1.13 GB of data in a Spotfire file of only 0.5 GB!
Life Expectancy by Region
Life Expectancy (LE) Versus Health-Adjusted Life Expectancy (HALE)
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD-Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD1-Spotfire
26
Conclusions and Recommendations
• My story is a TEDMED Data Reveal: Big (IHME/GBD) and Little (TEDMED) with “and, but, and therefore.”
  – I have done as Jay Walker suggested: We need a macro-scope to gather, network, store, and access data and to go from data to wisdom by finding patterns in the data.
    • But to do that, TEDMED needs a taxonomy that is a semantic index to a knowledge base for improved search and help with examples of big and little data science.
  – I found the best big data source for my work was Professor Christopher Murray’s IHME/GBD funded by the Bill and Melinda Gates Foundation to prioritize global health research and help.
    • But I found I could improve the access to and simplify the visualizations of the IHME/GBD data.
  – Therefore, I did both of the above and volunteered to help TEDMED 2013 and 2014 as a data scientist/data journalist.
27
Data Visualizations
http://www.healthmetricsandevaluation.org/tools/data-visualizations?page=3
28
GBD Data Visualizations Spreadsheet
http://semanticommunity.info/@api/deki/files/23881/TEDMED.xlsx
My Note: See All 13 Tabs.
29
GBD Data Visualizations Inventory
My Note: Downloaded 36 files totaling 19 MB and selected a few for visualizations.
30
Diabetes Prevalence by County (US) Maps
http://www.healthmetricsandevaluation.org/tools/data-visualization/diabetes-prevalence-county-us-maps#/overview/explore
My Note: I used this in my 2013 Health Datapalooza IV Submission and the Sanofi US 2013 Data Design Diabetes Innovation Challenge – Prove It!
31
Research Articles
http://www.healthmetricsandevaluation.org/tools/data-visualization/diabetes-prevalence-county-us-maps#/publications-presentations/publications
My Note: Research Article.
32
Research Articles
http://www.pophealthmetrics.com/content/8/1/26
My Note: Included this in the Knowledge Base.
33
Datasets
http://www.healthmetricsandevaluation.org/publications/summaries/novel-framework-validating-and-applying-standardized-small-area-measurement-s#/data-methods
My Note: Downloaded this dataset.
34
Diabetes prevalence rates by age, sex, and county, 2008 (21KB* xls)
http://www.healthmetricsandevaluation.org/sites/default/files/datasets/diabetes_prevalence_by_county_rank_age_and_sex_2008_US_IHME_1010.xls
http://ghdx.healthmetricsandevaluation.org/sites/ghdx/files/record-attached-files/IHME_USA_DIABETES_BY_COUNTY_2008.xls
*My Note: Actual size is 556KB.
My Note: Needed to be separated into county and state.
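My Note (illustrative): Separating a mixed file into county and state rows could look like the sketch below. It assumes, hypothetically, that state-level rows leave the county field blank; the field names and the prevalence numbers are placeholders and are not IHME figures or the real layout of the Excel file.

```python
# Hypothetical mixed rows: state totals have an empty "county" field.
# The prevalence values are placeholders, not IHME data.
ROWS = [
    {"state": "State A", "county": "", "prevalence": 1.0},
    {"state": "State A", "county": "County 1", "prevalence": 2.0},
    {"state": "State A", "county": "County 2", "prevalence": 3.0},
    {"state": "State B", "county": "  ", "prevalence": 4.0},
    {"state": "State B", "county": "County 3", "prevalence": 5.0},
]


def split_levels(rows):
    """Split rows into (state_rows, county_rows) by whether county is blank."""
    state_rows, county_rows = [], []
    for row in rows:
        if row["county"].strip():
            county_rows.append(row)
        else:
            state_rows.append(row)
    return state_rows, county_rows


state_rows, county_rows = split_levels(ROWS)
print(len(state_rows), "state rows;", len(county_rows), "county rows")
```

Keeping the two levels in separate tables avoids double counting when county values are aggregated and lets each table drive its own map layer.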
35
Metadata
http://ghdx.healthmetricsandevaluation.org/record/united-states-diabetes-prevalence-county-2008
My Note: Another Excel file name, but same file.
36
IHME Diabetes County 2009: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Diabetes-Spotfire
Navigation and Metadata
Data Set
Map
Top 10 Counties With High Prevalence of Diabetes
Higher Female Than Male Diabetes Prevalence
37
IHME-GBD Mortality by Country: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD1-Spotfire
38
IHME-GBD Disability Factors by Health State: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD1-Spotfire
39
IHME-GBD Risk Factors by Region: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD-Spotfire
40
IHME-GBD Cause of Death by Region: Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IHME-GBD1-Spotfire