![Page 1: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/1.jpg)
Towards a Big Data Taxonomy
Bill Mandrick, PhD
Data Tactics
Version 26_August_2013
![Page 2: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/2.jpg)
Scientific Taxonomies Represent
• Types of Processes
• Types of Objects
  – Physical Objects
  – Information Artifacts
• Types of Characteristics
  – Qualities
  – Roles
• Relationships
  – Between Processes
  – Between Objects
  – Between Characteristics
![Page 3: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/3.jpg)
Big Data Taxonomy
• Big Data Related Processes
• Big Data Characteristics
• Big Data Information Artifacts
• Big Data Information Bearers
• Relationships between Big Data Elements
• Mapping Instances to the Taxonomy
• Creating Situational Awareness
![Page 4: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/4.jpg)
Relations Between Processes
• Process A <relation> Process B
– Complex Process <has part> Sub-Process
– Sub-Process <part of> Complex Process
– Process A <precedes> Process B
– Process A <follows> Process B
Examples:
– Data Curation Process <has part> Data Selection Process
– Data Curation Process <has part> Data Collection Process
– Data Curation Process <has part> Data Archiving Process
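The relations on this slide can be sketched as plain (subject, relation, object) triples. The snippet below is a minimal illustration, not the author's implementation. It assumes that <has part>/<part of> and <precedes>/<follows> are treated as inverse pairs, which the slide suggests but does not state. The names `INVERSE` and `with_inverses` are hypothetical.

```python
# Hypothetical sketch: process relations stored as (subject, relation, object) triples.
# Assumption: each relation on the slide has exactly one inverse, as listed here.
INVERSE = {
    "has part": "part of",
    "part of": "has part",
    "precedes": "follows",
    "follows": "precedes",
}


def with_inverses(triples):
    """Return the input triples plus the inverse statement of each one."""
    closed = set(triples)
    for subject, relation, obj in triples:
        closed.add((obj, INVERSE[relation], subject))
    return closed


# The three examples given on the slide.
curation = [
    ("Data Curation Process", "has part", "Data Selection Process"),
    ("Data Curation Process", "has part", "Data Collection Process"),
    ("Data Curation Process", "has part", "Data Archiving Process"),
]

all_triples = with_inverses(curation)
for triple in sorted(all_triples):
    print(triple)
```

Storing only one direction and deriving the other keeps the two statements from drifting apart when the taxonomy is edited.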
![Page 5: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/5.jpg)
Information Artifact Lifecycle Processes
Common Labels
• Collecting
• Curating
• Representing
• Storing
  – Cluster Storing
• Managing
  – Processing
    • Distributed Processing
      – Map Reduce
• Analyzing
  – Data Mining
  – Causal Analysis
  – Probabilistic Analysis
  – Correlation Analysis

Taxonomy Labels
• Data Collection Process
• Data Curation Process
• Data Representation Process
• Data Storing Process
  – Cluster Storing Process
• Data Management Process
  – Processing
    • Distributed Data Process
      – Map Reduce Process
• Data Analytics Process
  – Data Mining Process
  – Causal Analysis Process
  – Probabilistic Analysis Process
  – Correlation Analysis Process
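The pairing of common labels with taxonomy labels can be sketched as a simple lookup table. This is an illustrative sketch only. It assumes the two columns on the slide correspond position by position, and the names `COMMON_TO_TAXONOMY` and `to_taxonomy_label` are hypothetical.

```python
# Hypothetical sketch: map everyday process names to the slide's taxonomy labels.
# Assumption: the two columns on the slide pair up in order, item by item.
COMMON_TO_TAXONOMY = {
    "Collecting": "Data Collection Process",
    "Curating": "Data Curation Process",
    "Representing": "Data Representation Process",
    "Storing": "Data Storing Process",
    "Cluster Storing": "Cluster Storing Process",
    "Managing": "Data Management Process",
    "Processing": "Processing",
    "Distributed Processing": "Distributed Data Process",
    "Map Reduce": "Map Reduce Process",
    "Analyzing": "Data Analytics Process",
    "Data Mining": "Data Mining Process",
    "Causal Analysis": "Causal Analysis Process",
    "Probabilistic Analysis": "Probabilistic Analysis Process",
    "Correlation Analysis": "Correlation Analysis Process",
}

# Lower-cased index so lookups ignore capitalisation.
_LOOKUP = {key.lower(): value for key, value in COMMON_TO_TAXONOMY.items()}


def to_taxonomy_label(common_label):
    """Return the taxonomy label for a common label, or None if it is unknown."""
    key = " ".join(common_label.split()).lower()  # collapse stray whitespace
    return _LOOKUP.get(key)


print(to_taxonomy_label("map reduce"))
print(to_taxonomy_label("Streaming"))
```

Returning `None` for unknown labels makes terms that still need a place in the taxonomy easy to spot.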
![Page 6: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/6.jpg)
Big Data Processes
Big Data Processes can be decomposed and related to other (sub)processes, as well as to their outputs (Information Artifacts).
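Decomposition and process-to-output links can be sketched with two small maps. This is a minimal sketch under stated assumptions. The relation name `has output`, the artifact names "Collected Data Set" and "Archived Data Set", and the rule that a complex process inherits the outputs of its sub-processes are all hypothetical and are not given in the slides.

```python
# Sub-process structure taken from the Data Curation example in the slides.
HAS_PART = {
    "Data Curation Process": [
        "Data Selection Process",
        "Data Collection Process",
        "Data Archiving Process",
    ],
}

# Hypothetical outputs (Information Artifacts); these names are illustrative only.
HAS_OUTPUT = {
    "Data Collection Process": ["Collected Data Set"],
    "Data Archiving Process": ["Archived Data Set"],
}


def sub_processes(process):
    """Return all direct and indirect sub-processes, depth first."""
    found = []
    for part in HAS_PART.get(process, []):
        found.append(part)
        found.extend(sub_processes(part))
    return found


def outputs(process):
    """Return outputs of a process and, by assumption, of all its sub-processes."""
    result = list(HAS_OUTPUT.get(process, []))
    for part in sub_processes(process):
        result.extend(HAS_OUTPUT.get(part, []))
    return result


print(sub_processes("Data Curation Process"))
print(outputs("Data Curation Process"))
```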
![Page 7: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/7.jpg)
Relating Processes to Products
![Page 8: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/8.jpg)
Big Data Information Artifacts
![Page 9: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/9.jpg)
![Page 10: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/10.jpg)
![Page 11: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/11.jpg)
Information Content Entities
Use Case
![Page 12: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/12.jpg)
Data Characteristics
![Page 13: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/13.jpg)
Information Bearers
![Page 14: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/14.jpg)
Partial Taxonomy
![Page 15: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/15.jpg)
Human Genome Data
![Page 16: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/16.jpg)
Terms from Human Genome Data Use Case
| Use Case Term | Taxonomical Term |
| --- | --- |
| Genomic Measurements | Genomic Measurement Result (Measurement Result) |
| Reference Materials | Reference Material Role |
| Reference Data | Reference Data Role |
| Reference Methods | Reference Method |
| Assess Performance | Performance Assessment Process |
| Genome Sequencing | Genome Sequencing Process |
| Integrate Data | Data Integration Process |
| Sequencing Technologies | Data Sequencing Technology (Tool) |
| Sequencing Methods | Sequencing Method (Process) |
| Characterization | Characterization (Data Characterization, IA or ICE) |
| Whole Human Genomes | Whole Human Genome Characterization (IA or ICE?) |
| Assess Performance | Performance Assessment Process |
| Genome Sequencing Run | Genome Sequencing Run |
| Computer System | Computer System |
| Storage | Data Storage Process |
| Networking | Computer Networking Process |
| Processing | Data Processing Process |
| Software | Software (IAO placement?) |
| Open Source Sequencing Bioinformatics Software | Bioinformatics Sequencing Software |
| Data Source | Data Source Role |
| Sequencer | Sequencer |
| Volume | Data Volume (Characteristic) |
| Variety | Data Variety (Characteristic) |
| Variability | Data Variability (Characteristic) |
| Veracity | Data Veracity (Characteristic) |
| Visualization | Data Visualization Process |
| Data Quality | Data Quality (Characteristic) |
| Data Types | Data Type |
| Data Analytics | Data Analytics Process |
![Page 17: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/17.jpg)
Terms from Human Genome Data Use Case

Information Artifacts:
• Human Genome Data Measurement Result
• Characterization (Data Characterization, IA or ICE)
• Whole Human Genome Characterization (IA or ICE?)
• Performance Assessment
• Genome Sequence
• Software (IAO placement?)
• Data Visualization

Processes:
• Human Genome Data Measurement Process
• Reference Method
• Performance Assessment Process
• Genome Sequencing Process
• Data Integration Process
• Sequencing Method (Process)
• Data Characterization Process
• Performance Assessment Process
• Genome Sequencing Run
• Data Storage Process
• Computer Networking Process
• Data Processing Process
• Data Visualization Process
• Data Analytics Process

Roles and Characteristics:
• Reference Material Role
• Reference Data Role
• Data Source Role
• Data Volume (Characteristic)
• Data Variety (Characteristic)
• Data Variability (Characteristic)
• Data Veracity (Characteristic)
• Data Visualization Process
• Data Quality (Characteristic)

Artifacts/Tools:
• Data Sequencing Technology (Tool)
• Computer System
• Computer Network
• Software (IAO placement?)
• Bioinformatics Sequencing Software
• Sequencer
![Page 18: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/18.jpg)
Genomic Research Organizations
Instances
![Page 19: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/19.jpg)
DNA Data Sets
Instances
![Page 20: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/20.jpg)
DNA Organizational Roles
Instances
![Page 21: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/21.jpg)
Agent Roles
![Page 22: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/22.jpg)
DNA Visualization
Instances
![Page 23: Big Data Taxonomy 8/26/2013](https://reader034.vdocument.in/reader034/viewer/2022051412/54c6432d4a7959b07d8b46b0/html5/thumbnails/23.jpg)
Conclusion
• This method can be applied to any part of the Big Data Taxonomy
• Need subject-matter expert (SME) input for various areas/domains
• Need to add definitions in OWL
• Need to expand the set of standardized relations
• Link instances to the taxonomy (e.g. actual data sets, organizations, etc.)
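Linking instances to the taxonomy can be sketched as an instance-to-type map plus a walk up the type hierarchy. This is a rough sketch, not a definitive design. It assumes the nesting on the lifecycle slide can be read as a subtype hierarchy, which the slides do not confirm. The instance names `example-mapreduce-job` and `example-cluster-store` are invented placeholders.

```python
# Assumption: nesting on the lifecycle slide is read as "is a subtype of".
PARENT = {
    "Map Reduce Process": "Distributed Data Process",
    "Distributed Data Process": "Processing",
    "Processing": "Data Management Process",
    "Cluster Storing Process": "Data Storing Process",
}

# Hypothetical instances; real entries would be actual data sets, organizations, etc.
INSTANCE_OF = {
    "example-mapreduce-job": "Map Reduce Process",
    "example-cluster-store": "Cluster Storing Process",
}


def ancestors(term):
    """Return the chain of parent terms, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def types_of(instance):
    """Return the direct type of an instance followed by all inherited types."""
    direct = INSTANCE_OF[instance]
    return [direct] + ancestors(direct)


print(types_of("example-mapreduce-job"))
```

With this shape, a query such as "all instances of Data Management Process" only needs to check whether that term appears in `types_of(instance)`.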