Post on 26-Sep-2020
V.2.2
Heiner Oberkampf, PhD
November 7-8, 2017
EMMC Workshop on Interoperability in Materials Modelling, Cambridge
From Big Data to Big Analysis
The Convergence of Formal Semantics & Data Science in Life Sciences
Slide 2
Understanding the 4V’s of Big Data
Normally the focus of Big Data Solutions
Performance is Critical to Success
Data Complexity is Increasing
Handling Uncertainty Requires Statistics
Majority of Big Data analytics approaches treat these two V’s
Semantic technologies provide clear advantages
Mathematical clustering techniques provide clear advantages
Slide 3
AT OSTHUS LAB DATA SCIENCE IS
B I G A N A L Y S I S
STATISTICAL
SEMANTICS
MACHINE LEARNING
REASONING
Slide 4
Laboratory Analytical Process
sample, data, analytical process
Slide 5
Typical Laboratory Data
Slide 6
Allotrope Structure 2017
• Astrix Technology Group
• BSSN Software
• Elemental Machines
• Erasmus MC
• Fraunhofer IPA
• The HDF Group
• LabAnswer
• LabWare
• Mettler Toledo
• NIST
• SciBite
• Stanford University
• University of Illinois at Chicago
• University of Southampton
More information: https://www.allotrope.org/
Slide 7
Allotrope Data Format (ADF)
HDF5: Platform Independent File Format
Specifically designed to store and organize large amounts of scientific data.
Data Description: Semantic Graph Model
Descriptive metadata about
• Method, instrument, sample, process, result, etc.
• Provenance, audit trail
• Data Cube, Data Package
Data Cubes: Universal Data Container
Analytical data represented by one- or multidimensional arrays of homogeneous data structures.
Data Package: Virtual File System
Analytical data represented by arbitrary formats, incl. native instrument formats, images, PDF, video, etc.
APIs (Java & .NET class libraries)
Chromatogram 2D HDF
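The three ADF layers can be pictured with a minimal sketch. This is not the Allotrope API (which the slide lists as Java and .NET class libraries); the names `DataCube` and `AdfLikeContainer`, the `ex:` identifiers, the dimension names and all values are hypothetical and only illustrate how a semantic description, a homogeneous array and arbitrary files sit side by side in one container.

```python
from dataclasses import dataclass, field


@dataclass
class DataCube:
    """Homogeneous n-dimensional array plus the names of its dimensions."""
    dimensions: tuple
    values: list  # nested lists, one nesting level per dimension

    def shape(self):
        # Walk down the first element of each nesting level to read the sizes.
        sizes, level = [], self.values
        while isinstance(level, list):
            sizes.append(len(level))
            if not level:
                break
            level = level[0]
        return tuple(sizes)


@dataclass
class AdfLikeContainer:
    """Hypothetical stand-in for one file holding all three layers."""
    description: set = field(default_factory=set)  # (subject, predicate, object) triples
    cubes: dict = field(default_factory=dict)      # name -> DataCube
    package: dict = field(default_factory=dict)    # virtual path -> raw bytes


container = AdfLikeContainer()

# Data Description: metadata as a small graph of triples (made-up identifiers).
container.description.update({
    ("ex:run1", "ex:usesInstrument", "ex:hplc7"),
    ("ex:run1", "ex:hasSample", "ex:sample42"),
    ("ex:run1", "ex:hasResult", "ex:chromatogram1"),
})

# Data Cube: a 2D chromatogram, 3 time points by 2 wavelengths (made-up numbers).
container.cubes["chromatogram1"] = DataCube(
    dimensions=("retention time", "wavelength"),
    values=[[0.01, 0.02], [0.35, 0.40], [0.02, 0.03]],
)

# Data Package: any native file stored under a virtual path.
container.package["raw/run1.bin"] = b"\x00\x01"
```

Keeping the description as triples rather than fixed fields is what lets the metadata grow without changing the container layout.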
Slide 8
Ontology for HPLC Example
result
device
material
process
Slide 9
Allotrope Example: Semantics Provides Common Meaning
Allotrope Data Format (ADF): Instance Data
Allotrope Data Models (ADM): Constraints
Allotrope Foundation Ontologies (AFO): Classes and Properties (aligned with Basic Formal Ontology)
is structured by
is classified by
provide standardized vocabulary
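The split between instance data, constraints and ontology classes can be sketched as follows. This is a toy illustration under stated assumptions, not Allotrope's actual models or tooling: the class names reuse the HPLC example labels (result, device, material, process), while `ONTOLOGY`, `MODEL`, the property names and the instance identifiers are all hypothetical.

```python
# Ontology role: class -> parent class (None marks a root). Hypothetical content.
ONTOLOGY = {
    "process": None,
    "device": None,
    "material": None,
    "result": None,
    "hplc run": "process",
    "hplc system": "device",
    "sample": "material",
    "chromatogram": "result",
}

# Data-model role: for each class, required property -> class the target must belong to.
MODEL = {
    "hplc run": {
        "uses device": "device",
        "has input": "material",
        "has output": "result",
    },
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ONTOLOGY.get(cls)
    return False


def violations(instances, node_id):
    """Check one instance against the constraints defined for its class."""
    node = instances[node_id]
    problems = []
    for prop, expected in MODEL.get(node["type"], {}).items():
        target = node.get(prop)
        if target not in instances:
            problems.append(f"{node_id}: missing '{prop}'")
        elif not is_a(instances[target]["type"], expected):
            problems.append(f"{node_id}: '{prop}' must point to a {expected}")
    return problems


# Instance-data role: every node is classified by an ontology class.
instances = {
    "run1": {"type": "hplc run", "uses device": "hplc7",
             "has input": "s42", "has output": "chrom1"},
    "run2": {"type": "hplc run", "uses device": "s42", "has input": "s42"},
    "hplc7": {"type": "hplc system"},
    "s42": {"type": "sample"},
    "chrom1": {"type": "chromatogram"},
}

print(violations(instances, "run1"))
print(violations(instances, "run2"))
```

The constraint only names the general class ("device"); the subclass check is what allows any more specific device class to satisfy it.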
Slide 10
Semantic Spectrum of Knowledge Organization Systems
Sources
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors, Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI: http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst. "The Ontology Spectrum". Book section in Roberto Poli, Michael Healy, Achilles Kameas, "Theory and Applications of Ontology: Computer Applications". Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.
Slide 11
Application and Reference Ontologies
Application Ontology
• Includes: Information/Data Model, Schema, Domain Ontology
• Role: Defines the important entities and their relationships for a specific application scenario.
• Realization: ontology, ER Model, UML etc.
Reference Ontology
• Also called: Canonical Reference Ontology, Reference Terminology, Domain Ontology, Foundational Ontology
• Role: Standard (structured) vocabulary to be used for placeholder classes of the data model
• Realization: list, thesaurus, taxonomy or ontology
• Domain models reusable in many different application scenarios
• Modules: Public ref. ontologies plus extension
• Mappings between ref. ontologies
Terminology Binding
• Interface between data model and ref. terminologies
Figure labels: Materials Models; simulation, observation; subject, value, unit; Units; Qualities; physical property; Upper-Level Ontology
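Terminology binding can be sketched as a check that the placeholder slots of a data model only hold terms from the bound reference vocabulary. The slot names follow the figure labels (subject, value, unit); the vocabularies, the `Statement` class and the example terms are hypothetical and stand in for real reference ontologies of units and qualities.

```python
from dataclasses import dataclass

# Reference side: small stand-ins for reusable vocabularies (illustrative terms only).
REFERENCE_TERMS = {
    "units": {"kelvin", "pascal", "kilogram per cubic metre"},
    "qualities": {"density", "melting point", "elastic modulus"},
}

# The binding itself: which data-model slot draws its values from which vocabulary.
BINDING = {"quality": "qualities", "unit": "units"}


@dataclass(frozen=True)
class Statement:
    """Application-side data model with placeholder slots for reference terms."""
    subject: str
    quality: str
    value: float
    unit: str
    source: str  # "simulation" or "observation"


def unbound_slots(stmt):
    """Return the slots whose value is not a term of the bound vocabulary."""
    return [
        slot for slot, vocab in BINDING.items()
        if getattr(stmt, slot) not in REFERENCE_TERMS[vocab]
    ]


ok = Statement("specimen A", "melting point", 1811.0, "kelvin", "observation")
bad = Statement("specimen A", "melting point", 1811.0, "K", "simulation")

print(unbound_slots(ok))
print(unbound_slots(bad))
```

Because the data model refers to vocabularies only through `BINDING`, a reference terminology can be extended or swapped without touching the application model.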
Slide 12
Linked Materials Modelling Data
Lightweight Semantic Integration Layer
Make data Findable, Accessible, Interoperable, Reusable
(APIs, semantic indexing, data annotation, catalogs, metadata and linking)
Linked Open Data & Open APIs
Semantic Graph DB (Knowledge Graph)
Simulation and Material Models
Repository
…Unstructured Documents
Analytics: simulations, learning, reasoning
Visualization: dashboards, exploration, search …
The FAIR Guiding Principles for scientific data management and stewardship https://www.nature.com/articles/sdata201618
Slide 13
Towards Big Analysis
1. Think from the end and put use-cases first.
2. Reduce the pain of data sharing and integration by using semantics and FAIR principles.
3. Combine logical and statistical approaches.
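One way to read "combine logical and statistical approaches" is sketched below: a logical step (subclass reasoning) decides which records belong to a group, and a statistical step summarizes them. This is an illustration only, not a method described in the talk; the class hierarchy and the density values are made up.

```python
from statistics import mean, stdev

# Logical side: a hypothetical subclass hierarchy (child -> parent).
SUBCLASS_OF = {
    "stainless steel": "steel",
    "carbon steel": "steel",
    "steel": "alloy",
    "brass": "alloy",
}

# Data side: (material class, density) pairs with illustrative values.
MEASUREMENTS = [
    ("stainless steel", 7.9),
    ("stainless steel", 8.0),
    ("carbon steel", 7.8),
    ("brass", 8.5),
]


def ancestors(cls):
    """The class itself followed by all of its superclasses."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def summarize(target):
    """Count, mean and sample standard deviation over every record whose
    class is the target or one of its subclasses (needs at least 2 records)."""
    values = [v for cls, v in MEASUREMENTS if target in ancestors(cls)]
    return len(values), mean(values), stdev(values)


n, avg, spread = summarize("steel")
print(n, round(avg, 3), round(spread, 3))
```

No record is labelled "steel" directly; the reasoning step is what makes the three steel measurements available to the statistics.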
Slide 14
Heiner Oberkampf
Consultant at OSTHUS GmbH
+49 (0) 24194314-490
heiner.oberkampf@osthus.com
www.osthus.com
CONNECTING DATA, PEOPLE AND ORGANIZATIONS