![Page 1: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/1.jpg)
Using DCO Data (Infrastructure, Management, Analysis, Visualization, …)
Peter Fox @taswegian, [email protected] (Marshall Ma) and the Data Science Team
Tetherless World Constellation, Rensselaer Polytechnic Institute
DCO Summer School, July 14, 2014. Big Sky, MT
https://deepcarbon.net/group/dco-summer-school-2014
![Page 2: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/2.jpg)
Deep Carbon Observatory
Global community of ‘Carbon’ scientists (~1000 from ~40 countries) contributing to a Deep Earth Computer (data legacy) comprising:
• Global Earth Mineral Laboratory
• Global Census of Deep Fluids
• Global Volcano Gas Emissions
• Global Census of Deep Microbial Life
• Global State of High Pressure and Temperature Carbon and Related Materials
• Global Inventory of Diamonds with Inclusions
• …
![Page 3: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/3.jpg)
Data Science is …
• Doing science with someone else’s data …
  – across datasets
  – with models
  – multi-dimensional, multi-scale, multi-mode
  – complex data-types
  – needing new analytic and visual approaches
• Especially in multiple “dimensions” (functional)
  – E.g. Detection/attribution methods/algorithms
  – Visual exploration
![Page 4: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/4.jpg)
You may see many diagrams like
![Page 5: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/5.jpg)
Physical quantity versus measured as quantity
Value and units?
Reference frame?
Reference units?
Value and units?
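One way to keep these questions answered is to store the units and reference frame next to the number itself. The following is a minimal sketch only: the class name `Measurement`, its fields, and the example reference label are all hypothetical illustrations, not a DCO schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A measured value that carries its units and reference frame with it."""
    quantity: str   # the physical quantity, e.g. "temperature"
    value: float    # the number as measured
    units: str      # units of the measured value, e.g. "degC"
    reference: str  # reference frame or scale (illustrative label only)

    def to_kelvin(self) -> "Measurement":
        """Convert a Celsius temperature to kelvin; refuse anything else."""
        if self.quantity != "temperature" or self.units != "degC":
            raise ValueError(f"cannot convert {self.quantity} in {self.units}")
        return Measurement(self.quantity, self.value + 273.15, "K", self.reference)


# Invented example value, for illustration only.
sample = Measurement("temperature", 25.0, "degC", "ITS-90")
converted = sample.to_kelvin()
print(converted.value, converted.units)
```

Because the units travel with the value, a conversion applied to the wrong kind of quantity fails loudly instead of silently producing a wrong number.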
![Page 6: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/6.jpg)
Use case: How DCO Finds Out About Data
[Figure (Fleischer, 2011). Labels: Data; a scientist bringing new data (spreadsheet, diagram, digital map, report); a data manager transforming data; transformed data ready for import; repository staff/data librarian; importing tool; a data repository; Internet]
![Page 7: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/7.jpg)
Data-Information-Knowledge “Ecosystem”
[Figure. Labels: Data, Information, Knowledge; Producers, Consumers; Context; Presentation, Organization; Integration, Conversation; Creation, Gathering; Experience]
![Page 8: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/8.jpg)
[Figure. Labels: Producers, Consumers; Quality Control; Fitness for Purpose, Fitness for Use; Quality Assessment; Trustee, Trustor]
![Page 9: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/9.jpg)
Spreadsheets
• E.g. Excel – import data
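Importing spreadsheet data can also be scripted so that the handling of empty or non-numeric cells is explicit. This is a minimal sketch using only the Python standard library, assuming the sheet has been exported to CSV; the column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical export of a spreadsheet to CSV text; the column names are invented.
CSV_TEXT = """sample_id,depth_m,co2_ppm
S1,120.5,410
S2,,395
S3,98.0,n/a
"""


def load_rows(text):
    """Parse CSV text into dicts, converting numeric cells and keeping gaps as None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"sample_id": record["sample_id"]}
        for column in ("depth_m", "co2_ppm"):
            try:
                row[column] = float(record[column])
            except ValueError:
                row[column] = None  # empty or non-numeric cell
        rows.append(row)
    return rows


rows = load_rows(CSV_TEXT)
for row in rows:
    print(row)
```

Recording gaps as `None` rather than as zero or an empty string keeps missing values visible in any later analysis.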
![Page 10: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/10.jpg)
Documentation?
![Page 11: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/11.jpg)
Census of Deep Life
• Substantial metadata – how to visualize THIS?
![Page 12: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/12.jpg)
Bias
• To incline to one side; to give a particular direction to; to influence; to prejudice; to prepossess. [1913 Webster]
• A partiality that prevents objective consideration of an issue or situation [syn: prejudice, preconception]
• For acquisition – sampling bias is your enemy
• Cognitive bias is (due to) YOU!
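The effect of sampling bias during acquisition can be shown with a small simulation. This is a sketch under invented assumptions: the synthetic "population" below is a made-up quantity that increases with depth, not DCO data, and the 200 m cutoff is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population: a value that increases with depth (invented numbers).
population = [(depth, 2.0 * depth + random.gauss(0, 5)) for depth in range(0, 1000, 10)]

true_mean = statistics.mean(value for _, value in population)

# Biased acquisition: only the easy-to-reach shallow sites are sampled.
shallow_only = [value for depth, value in population if depth < 200]
biased_mean = statistics.mean(shallow_only)

# Random acquisition: the same number of sites drawn from the whole population.
random_sites = random.sample(population, k=len(shallow_only))
random_mean = statistics.mean(value for _, value in random_sites)

print(f"population mean:    {true_mean:.1f}")
print(f"shallow-only mean:  {biased_mean:.1f}")
print(f"random-sample mean: {random_mean:.1f}")
```

Both samples have the same size; only the way the sites were chosen differs, and that alone is enough to pull the shallow-only estimate far from the population mean.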
![Page 13: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/13.jpg)
Provenance*
• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
  – Internal
  – External
![Page 14: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/14.jpg)
How you find DCO data…?
• http://deepcarbon.net/dco_datasets
  – Will soon be a window into community-based sources
• http://metpetdb.rpi.edu
• http://earthchem.org/
• http://www.earthchem.org/petdb
• http://vamps.mbl.edu/portals/deep_carbon/cdl.php
• …
![Page 15: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/15.jpg)
Browser
![Page 16: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/16.jpg)
All information is linked and traceable!
![Page 17: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/17.jpg)
![Page 18: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/18.jpg)
E.g. Deep Life (CoDL)
New tools: R (statistics, visualization, modeling), D3.js (visualization)
NOT just of the data, but of all types of information, knowledge!
IPython Notebooks?
![Page 19: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/19.jpg)
When You Use Data – Science 2.0
• Version/subsetting and converting to a format you are familiar with is very common but mysterious
  – Take notes – document – provenance
• Software – what did you use and how?
• Derived products – what did you create, how, why, etc.
• Use the metadata every chance you get, e.g. filenames!
• Place them in a Web-accessible folder, consider getting an identifier
• Use social media, blogs, etc. to discuss it.
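Taking notes on a subsetting or conversion step can be automated. The sketch below writes a small provenance record for one such step; the record fields, the file name `samples.csv`, and the example data are hypothetical choices for illustration, not a DCO or community standard.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(source_name, source_bytes, derived_bytes, operation, notes):
    """Describe one processing step: what went in, what came out, and how."""
    return {
        "source": source_name,
        "source_sha256": hashlib.sha256(source_bytes).hexdigest(),
        "derived_sha256": hashlib.sha256(derived_bytes).hexdigest(),
        "operation": operation,
        "notes": notes,
        "software": f"Python {platform.python_version()}",
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


# Invented example data: subset a small CSV to rows deeper than 100 m.
source = b"sample_id,depth_m\nS1,120.5\nS2,80.0\n"
lines = source.decode().splitlines()
subset = [lines[0]] + [ln for ln in lines[1:] if float(ln.split(",")[1]) > 100]
derived = ("\n".join(subset) + "\n").encode()

record = provenance_record("samples.csv", source, derived,
                           "subset: depth_m > 100", "kept 1 of 2 rows")
print(json.dumps(record, indent=2))
```

Saving the JSON next to the derived file answers "what did you use and how?" later, and the checksums let anyone confirm they hold the same bytes you worked with.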
![Page 20: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/20.jpg)
4 R’s … Goble and others
![Page 21: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/21.jpg)
![Page 22: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/22.jpg)
Exercise 1
• Search for and access a dataset that you are not familiar with:
• Can you read it?
• Can you make sense of it?
• Can you assess quality, uncertainty?
• Any sources of bias?
• What would you need to do to make it useful?
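A first pass at the quality questions can be a simple per-column completeness report. This is a minimal sketch, assuming the dataset has already been read into a list of dicts; the function name `profile` and the records are invented for illustration.

```python
def profile(rows):
    """Report, per column, how many values are present and their numeric range."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [v for v in present if isinstance(v, (int, float))]
        report[column] = {
            "present": len(present),
            "missing": len(values) - len(present),
            "min": min(numeric) if numeric else None,
            "max": max(numeric) if numeric else None,
        }
    return report


# Invented records standing in for a dataset you have just downloaded.
records = [
    {"sample_id": "S1", "depth_m": 120.5, "co2_ppm": 410.0},
    {"sample_id": "S2", "depth_m": None, "co2_ppm": 395.0},
    {"sample_id": "S3", "depth_m": 98.0, "co2_ppm": None},
]

summary = profile(records)
for column, stats in summary.items():
    print(column, stats)
```

A report like this does not measure uncertainty or bias, but it quickly shows where values are missing and whether the ranges look physically plausible.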
![Page 23: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/23.jpg)
When You Generate Data – Science 2.0
• How the data was generated, why, for what, when and in what format
  – Take notes – document – provenance
• Software – what did you use and how?
• Derived products – what did you create, how, why, etc.
• Use the metadata every chance you get, e.g. filenames!
• Place them in a Web-accessible folder, consider getting an identifier
• Use social media, blogs, etc. to discuss it.
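Putting metadata into filenames is easy to do consistently with a small helper. The naming pattern below (project, instrument, collection date, version) is a hypothetical convention chosen for illustration, not a DCO requirement.

```python
import re
from datetime import date


def slug(text):
    """Lower-case a label and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def data_filename(project, instrument, collected, version, extension):
    """Build a filename that carries its own basic metadata."""
    parts = [slug(project), slug(instrument), collected.isoformat(), f"v{version}"]
    return "_".join(parts) + "." + extension


name = data_filename("Deep Life", "XRD scan", date(2014, 7, 14), 2, "csv")
print(name)  # deep-life_xrd-scan_2014-07-14_v2.csv
```

Using ISO dates and a fixed field order means the files sort sensibly in a folder listing and can be parsed back into metadata later.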
![Page 24: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/24.jpg)
Make it visible to DCO (can be private)
https://deepcarbon.net/dco/dco-open-access-and-data-policies
https://deepcarbon.net/page/submit-community-data
You get an identifier! DCO-ID, can be cited, rewarded and much more…
Share…
![Page 25: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/25.jpg)
DCO checklist: what people have to do (courtesy UC3)
Checklist: Your data management plan; Funding agency requirements; Creating your data; Organizing your data; Managing your data; Sharing your data
Roles: Domain Scientist; Data manager; Repository staff; Data Scientist
Curation Services & Tools
Domain scientists often also take up these two roles, which is neither efficient nor effective (i.e., the 80-20 rule).
![Page 26: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/26.jpg)
DCO checklist: a service & tool perspective
Checklist: Your data management plan; AP Sloan requirements+ (e.g., NSF New Proposal and Award Policies and Procedures Guide, effective January 14, 2013); Creating your data; Organizing your data; Managing your data; Sharing your data
Services: Object Modeling; Identity Services; Storage Services; Ingest Services; Discovery Service; Characterization Services; Access Services
Tools and approaches: CKAN, community; CKAN, community; Faceted search and Drupal etc.; DCO-ID (Handle+DOI) + Linked Data, community; Schema.org, etc.; Use cases, info. model
![Page 27: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/27.jpg)
Exercise 2
• Begin with a recent dataset that you generated or were involved in generating
• Can someone else read it?
• Can someone make sense of it?
• Have you asserted quality, uncertainty?
• Have you described known sources of bias?
• What else would you now do to make it more useful?
![Page 28: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/28.jpg)
Further reading
• Data Science course at RPI: http://tw.rpi.edu/web/Courses/DataScience/2013
• Fourth Paradigm: http://research.microsoft.com/en-us/collaboration/fourthparadigm/
• Data Management Planning tools:
  – http://tw.rpi.edu/web/project/DCO-DS/WorkingGroups/DMP
  – http://www.iedadata.org/compliance/plan
  – https://dmp.cdlib.org/
![Page 29: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/29.jpg)
Breakout Session Today
• Exercises 1 and 2
• Discussion
![Page 30: Using DCO Data ( Infrastructure , Management , Analysis, Visualization, …)](https://reader036.vdocument.in/reader036/viewer/2022062323/568161b6550346895dd17fa9/html5/thumbnails/30.jpg)
Friday
• Marshall (Xiaogang) Ma will round out the data discussion
• DCO goal for data: in the interim,
  – help you become data scientists (as well as your specialty)
• Then, in time…
  – you can drop “data” because you will handle data as easily as you do field work, use instruments, etc…