![Page 1: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/1.jpg)
Sailing on the Ocean of 1's and 0's
![Page 2: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/2.jpg)
Chris Woodruff
Chris [email protected] – http://chriswoodruff.com
Technical Architect, Perficient
• Coordinator, Grand Rapids DevDay
• INETA Director
• Co-host of Deep Fried Bytes Tech Podcast – http://deepfriedbytes.com
![Page 3: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/3.jpg)
Where are we sailing today?
• Let's look at Data
• Go on to making Data valuable
• Look at ways to share Data
• Finally, let's talk about making Data look good
![Page 4: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/4.jpg)
Science Paradigms
• Thousands of Years Ago
– Science was empirical
– Describing
• Hundreds of Years Ago
– Theoretical
– Using Models
• Last Few Decades
– Computational
– Simulations
• Today (eScience)
– Data Exploration
– Unified Theory
– Data Generated by Instruments or Simulations
– Scientists analyze data after it is curated
from The Fourth Paradigm: Data-Intensive Scientific Discovery
![Page 5: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/5.jpg)
Before we get into the water, let's talk about the Digital Ocean
The Internet
![Page 6: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/6.jpg)
Why the Internet Won
• Simple architecture - HTML, URI, HTTP
• Networked - value grows with data, services, users
• Extensible - from Web of documents to ...
• Tolerant - even w/ imperfect mark-up, data, links, software
• Universal - independent of systems and people
• Free / cheap - browsers, information, services
• Simple / powerful / productive for users - text, graphics, links
• Open standards
![Page 7: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/7.jpg)
What is Data?
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data (plural of "datum") are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which information and then knowledge are derived. Raw data, i.e. unprocessed data, refers to a collection of numbers, characters, images or other outputs from devices that collect information to convert physical quantities into symbols.
![Page 8: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/8.jpg)
What really is Data?
Information that has no meaning or understanding.
![Page 9: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/9.jpg)
What is Data Really?
![Page 10: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/10.jpg)
Where is Data produced?
![Page 11: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/11.jpg)
HOW MUCH DATA IS GENERATED ON THE INTERNET EVERY YEAR/MONTH/DAY?
![Page 12: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/12.jpg)
HOW MUCH DATA IS MOVED ON THE INTERNET EVERY MONTH/DAY?
• 21 exabytes per month
• Around 675 petabytes per day
The amount of data produced each year would fill 37,000 libraries the size of the Library of Congress. (2003)
![Page 13: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/13.jpg)
Exabyte == a quintillion (or a million trillion) bytes or units of computer data. One exabyte is equivalent to 50,000 years’ worth of DVD-quality data.
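The monthly and daily traffic figures quoted earlier are roughly consistent with each other, as a quick unit conversion shows. This is a minimal sketch, assuming decimal (SI) units to match the "quintillion bytes" definition and a 31-day month; neither assumption is stated on the slides, and the function name is made up for illustration.

```python
# Assumption: decimal (SI) units, matching "a quintillion bytes" per exabyte.
BYTES_PER_PETABYTE = 10**15
BYTES_PER_EXABYTE = 10**18


def petabytes_per_day(exabytes_per_month: float, days_in_month: int = 31) -> float:
    """Convert a monthly volume in exabytes to a daily volume in petabytes."""
    total_bytes = exabytes_per_month * BYTES_PER_EXABYTE
    return total_bytes / days_in_month / BYTES_PER_PETABYTE


daily = petabytes_per_day(21)
print(f"{daily:.0f} PB per day")  # about 677, close to the slide's "around 675"
```

With a 30-day month the same conversion gives 700 PB per day, so the quoted daily figure is best read as a rounded estimate.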
![Page 14: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/14.jpg)
HOW MUCH DATA DOES TWITTER PRODUCE?
TWITTER USERS ARE AVERAGING 27.3 MILLION TWEETS PER DAY WITH AN ANNUAL RUN RATE OF 10 BILLION TWEETS
According to data from Pingdom
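The annual run rate on this slide follows directly from the daily average. A small sketch of the arithmetic, assuming a 365-day year (the variable names are illustrative only):

```python
# Daily average quoted on the slide (attributed to Pingdom).
TWEETS_PER_DAY = 27_300_000
DAYS_PER_YEAR = 365  # assumption: a non-leap year

annual_run_rate = TWEETS_PER_DAY * DAYS_PER_YEAR
print(f"{annual_run_rate:,} tweets per year")  # 9,964,500,000, i.e. roughly 10 billion
```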
![Page 15: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/15.jpg)
How much Data is Facebook generating?
More than 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month.
Average user creates 90 pieces of content each month
![Page 16: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/16.jpg)
INTERNET USERS ARE GENERATING PETABYTES OF DATA EVERY DAY
![Page 17: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/17.jpg)
How much Data does your organization produce?
![Page 18: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/18.jpg)
Curating Data
![Page 19: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/19.jpg)
Definition
“Data curation is the selection, preservation, maintenance, collection and archiving of digital assets.”
![Page 20: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/20.jpg)
What is involved in Data Curation?
• Collecting verifiable digital assets
• Providing digital asset search and retrieval
• Certification of the trustworthiness and integrity of the collection content
• Semantic and ontological continuity and comparability of the collection content
![Page 21: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/21.jpg)
Challenges of Data Curation
• Storage format evolution and obsolescence
• Rate of creation of new data and data sets
• Broad access and searching flexibility and variety
• Comparability of semantic and ontological definitions of data sets
![Page 22: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/22.jpg)
Setting up a Curation Process
• Identify what data you need to curate
• Identify who will curate the data
• Define the curation workflow
• Identify the most appropriate data-in and data-out formats
• Identify the artifacts, tools, and processes needed to support the curation process
![Page 23: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/23.jpg)
Tools to Curate Data
Physical
• SQL Databases
• Wikis
• SharePoint
• Data Warehouses
Collaborative
• DBPedia
• Azure Datamarket
Semantics!!
![Page 24: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/24.jpg)
“Open” Data
![Page 25: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/25.jpg)
Semantic Web
• XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF is a simple language for expressing data models, which refer to objects ("resources") and their relationships.
• RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such properties and classes.
• OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL is a protocol and query language for semantic web data sources.
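The RDF idea of resources and their relationships can be illustrated with plain subject-predicate-object triples. The sketch below is a toy in-memory model, not RDF tooling and not SPARQL: the `ex:` names, the `TRIPLES` list, and the `match` helper are all hypothetical, and `None` simply stands in for a query variable.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical triples; the "ex:" identifiers are made up for illustration.
TRIPLES: list[Triple] = [
    ("ex:ChrisWoodruff", "rdf:type", "ex:Person"),
    ("ex:ChrisWoodruff", "ex:cohosts", "ex:DeepFriedBytes"),
    ("ex:DeepFriedBytes", "rdf:type", "ex:Podcast"),
]


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            yield (s, p, o)


# "Which resources are podcasts?" expressed as a triple pattern.
podcasts = [s for s, _, _ in match(TRIPLES, predicate="rdf:type", obj="ex:Podcast")]
print(podcasts)  # ['ex:DeepFriedBytes']
```

A real SPARQL query expresses the same kind of pattern declaratively and runs it against a triple store; RDF Schema and OWL then add the vocabulary that gives terms such as `ex:Podcast` shared meaning.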
![Page 26: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/26.jpg)
Open Data Protocol (OData)
The Open Data Protocol (OData) enables the creation of HTTP-based data services, which allow resources identified using Uniform Resource Identifiers (URIs) and defined in an abstract data model, to be published and edited by Web clients using simple HTTP messages.
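Because OData resources are addressed by URI, a query is just a URL that any HTTP client can request. The sketch below builds such a URL using the `$filter`, `$orderby`, and `$top` system query options. The service root, the `Products` entity set, and the `build_query_url` helper are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote

# Hypothetical service root, used only for illustration.
SERVICE_ROOT = "https://example.org/odata"


def build_query_url(entity_set: str, **options: str) -> str:
    """Build an OData-style query URL.

    Option names are passed without the leading "$" (filter, orderby, top).
    """
    query = "&".join(
        f"${name}={quote(value, safe='')}" for name, value in options.items()
    )
    url = f"{SERVICE_ROOT}/{entity_set}"
    return f"{url}?{query}" if query else url


url = build_query_url("Products", filter="Price gt 20", orderby="Name", top="5")
print(url)
# https://example.org/odata/Products?$filter=Price%20gt%2020&$orderby=Name&$top=5
```

Editing works the same way in principle: a client sends POST, PUT, PATCH, or DELETE messages to the resource URIs instead of GET.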
![Page 27: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/27.jpg)
The Key to “Open Data”?
• Shared, Agreed-upon Protocols
• Metadata
• Shared Vocabularies
![Page 28: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/28.jpg)
Visualization of Data
![Page 29: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/29.jpg)
Think about your Data
![Page 30: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/30.jpg)
Produce Great Graphical Information
![Page 31: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/31.jpg)
Minard's Diagram of Napoleon's March on Moscow
![Page 32: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/32.jpg)
Have Integrity in your Graphical Information
Edward Tufte’s The Lie Factor
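Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data; a value near 1 means the graphic is honest. A minimal sketch of that ratio follows. The function names are illustrative, and the sample numbers are recalled from the fuel-economy chart discussed in The Visual Display of Quantitative Information, so they should be checked against the book.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(
    graphic_first: float,
    graphic_second: float,
    data_first: float,
    data_second: float,
) -> float:
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(
        data_first, data_second
    )


# Assumed example: a line drawn 0.6 in, then 5.3 in, to represent
# a change from 18 to 27.5 miles per gallon.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))  # 14.8
```

A factor this far above 1 means the drawing exaggerates the change in the underlying numbers by more than an order of magnitude.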
![Page 33: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/33.jpg)
Have Context with your Graphical Information
![Page 34: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/34.jpg)
Use less “Ink”
![Page 35: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/35.jpg)
Get Rid of the Junk
![Page 36: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/36.jpg)
Thanks Dave Giard!!!
![Page 37: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/37.jpg)
Examples of Great Visual Data
![Page 38: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/38.jpg)
![Page 39: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/39.jpg)
![Page 40: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/40.jpg)
Data Experience (DX)
![Page 41: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/41.jpg)
Wrap Up
• Think about your data
• Learn more about how your users work with the data you curate
• Learn about better ways to share your data
• Visualize and show the information in your data in the way that is best for your users
• Be a Data Experience Expert
![Page 42: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/42.jpg)
Required Reading
The Fourth Paradigm: Data-Intensive Scientific Discovery
![Page 43: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/43.jpg)
Required Reading
The Visual Display of Quantitative Information
![Page 44: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/44.jpg)
Required Reading
Beautiful Visualization: Looking at Data through the Eyes of Experts
![Page 45: Sailing on the ocean of 1s and 0s](https://reader035.vdocument.in/reader035/viewer/2022070317/5564d7f4d8b42ad3488b4744/html5/thumbnails/45.jpg)
Discussions