The Preparation of Information in Data Science

Page 1: The Preparation of Information in Data Sciencehiperc.buffalostate.edu/daad4aabc/presentations/Rudnicki-Ron.pdf · The Role of Ontologies in Unlocking Big Data • Big Data holds the

The Preparation of Information in Data Science

The Role of Ontologies in Unlocking Big Data

•  Big Data holds the potential of revealing great insights from large diverse data sets if properly exploited with the right analytics

•  To better realize this potential a shift needs to occur from representations of individual data sets to representations that enable interoperability across all data sets

The Common Core Ontology Development Method

•  Rule-governed development of an extensible set of ontologies to which data from sub-domains can be aligned and linked together

•  Combines principles from the Linked Open Data Initiative, Open Biological and Biomedical Ontologies (OBO) Foundry, and object-oriented programming

Linked Open Data Initiative

•  Began as a means for integrating data on the World Wide Web

•  Based on a simple set of guiding principles*
    – Use Uniform Resource Identifiers (URIs) as names of things
    – Use HTTP URIs so that people can look up those names
    – When someone looks up a URI, provide useful information
    – Include links to other URIs so they can discover other things

*Tim Berners-Lee, “Linked Open Data”, https://www.w3.org/DesignIssues/LinkedData
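
The four principles can be illustrated with a minimal sketch. It is not taken from the slides: the `example.org` URIs, the `located_in` property, and the function names `look_up` and `linked_uris` are all hypothetical, and a plain Python list stands in for a real web server or triple store. Only the `rdfs:label` URI is a real, published identifier.

```python
# Hypothetical HTTP URIs under example.org; none of them come from the slides.
BASE = "http://example.org/resource/"

# Principles 1 and 2: each thing is named by an HTTP URI.
BUFFALO = BASE + "Buffalo"
NEW_YORK = BASE + "New_York"
LOCATED_IN = BASE + "located_in"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # real RDFS property

# The data itself, as (subject, predicate, object) triples.
TRIPLES = [
    (BUFFALO, LABEL, "Buffalo"),
    (BUFFALO, LOCATED_IN, NEW_YORK),
    (NEW_YORK, LABEL, "New York"),
]


def look_up(uri, triples=TRIPLES):
    """Principle 3: return the information known about a URI."""
    return [(p, o) for s, p, o in triples if s == uri]


def linked_uris(uri, triples=TRIPLES):
    """Principle 4: return the other URIs that this description links to."""
    return [o for _, o in look_up(uri, triples) if o.startswith("http")]


if __name__ == "__main__":
    print(look_up(BUFFALO))
    print(linked_uris(BUFFALO))
```

Following the URI returned by `linked_uris` and calling `look_up` on it again is the "discover other things" step: the description of one resource leads to the description of the next.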

A Linked Open Data Success Story

DBpedia

•  Pages accessed from web browsers that link data from Wikipedia

Linked Open Data Issue - A Profusion of Ontologies

Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Effects of Profusion

•  Costs increase
    – relative to the amount of duplicative effort
    – relative to the number of mappings
    – relative to the number of vernaculars

•  Effectiveness decreases
    – Searches have low recall and precision
    – Re-use creates ambiguities
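
One way to read "relative to the number of mappings" is as a counting argument. The slide gives no formula, so the sketch below rests on an assumption: that each pair of independently built ontologies needs its own mapping, whereas aligning every ontology to one shared ontology needs only one mapping per ontology. The function names are hypothetical.

```python
def pairwise_mappings(n):
    """Mappings needed when each of n ontologies maps directly to every other."""
    return n * (n - 1) // 2


def shared_ontology_mappings(n):
    """Mappings needed when each of n ontologies maps once to a shared ontology."""
    return n


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, pairwise_mappings(n), shared_ontology_mappings(n))
```

Under this assumption the pairwise count grows quadratically (4950 mappings for 100 ontologies) while the shared-ontology count grows linearly (100 mappings).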

OBO Foundry

•  The Open Biological and Biomedical Ontologies (OBO) Foundry is a collaborative group of organizations devoted to establishing best practices in ontology development
    – Leverages the lessons learned from over $300M of investment in ontology development

An OBO Foundry Best Practice – Use a Common Upper Ontology

•  Produces common patterns within ontologies
    – Reuse of mappings from the sources
        •  Easier to include new sources of data
    – Enables reuse of queries and analytics
        •  Structure of data stays constant
        •  Easier to transition to new domains of interest

[Figure: diagram relating the classes Entity, Object, Organization, Physical Artifact, Quality, Quality of Organization, and Quality of Physical Artifact through the relations has_quality and bearer_of.]
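
The claim that a common pattern enables reuse of queries can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: the two data sources, their identifiers (`a:acme`, `b:truck7`, and so on), and the function `qualities_by_bearer` are hypothetical, and plain tuples stand in for a triple store. Only the relation name `has_quality` and the class names come from the slide.

```python
# Hypothetical data; only has_quality and the class names appear on the slide.
TYPE = "rdf:type"
HAS_QUALITY = "has_quality"

# Source A describes an organization.
source_a = [
    ("a:acme", TYPE, "Organization"),
    ("a:acme", HAS_QUALITY, "a:acme_size"),
]

# Source B describes a physical artifact, using the same pattern.
source_b = [
    ("b:truck7", TYPE, "PhysicalArtifact"),
    ("b:truck7", HAS_QUALITY, "b:truck7_color"),
    ("b:truck7", HAS_QUALITY, "b:truck7_mass"),
]


def qualities_by_bearer(triples):
    """Group quality identifiers by the entity that bears them."""
    result = {}
    for subject, predicate, obj in triples:
        if predicate == HAS_QUALITY:
            result.setdefault(subject, []).append(obj)
    return result


if __name__ == "__main__":
    # The same query runs unchanged on either source, or on both together.
    print(qualities_by_bearer(source_a))
    print(qualities_by_bearer(source_b))
    print(qualities_by_bearer(source_a + source_b))
```

Because both sources attach qualities to their bearers in the same way, adding a third source requires no change to the query, which is the sense in which the structure of the data stays constant.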

Basic Formal Ontology

•  An upper ontology with no more than 40 class terms and 20 relationships

•  Provides an extensible structure for the interrelationships between basic entities

•  Used as the upper ontology in hundreds of ontologies, primarily in the biomedical domain

•  Used by at least one hundred different projects

An OBO Foundry Best Practice - Truth as a Development Guideline

Strive towards creating a digital copy of the world

Reduces the perspective built into the ontology, enabling links to many sources

Provides an objective means for settling disputes over terminology

Adds the constraint that every assertion within an ontology must be true

OBO Foundry Issue - Ontologies with Too Wide a Scope

Good practice of reusing existing terminology

•  But the Ontology of Biomedical Investigations (OBI) is not a logical choice for where the term “Organization” is maintained

Object Oriented Programming - Modularity as a Development Guideline

One axis of modularity in the CCO is level of generality

Content and structure are inherited from higher levels

Upper Ontologies Describe the Structure of the World

Mid-Level Ontologies Add General Content to the Structure

Domain Level Ontologies Add Content Relevant to a Community

Upper and mid-level ontologies are stable and of manageable scale
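
Since the slide borrows this idea from object-oriented programming, the three levels can be sketched as a class hierarchy. This is an analogy only: ontology subclassing is not the same thing as class inheritance in a programming language, and the class names, attributes, and values below are hypothetical rather than taken from BFO or the CCO.

```python
class Entity:
    """Upper level: fixes the structure that every term shares."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.qualities = {}

    def add_quality(self, name, value):
        self.qualities[name] = value


class Artifact(Entity):
    """Mid level: adds general content to the inherited structure."""

    def __init__(self, identifier, designed_function):
        super().__init__(identifier)
        self.designed_function = designed_function


class Watercraft(Artifact):
    """Domain level: adds content relevant to one community."""

    def __init__(self, identifier, hull_type):
        super().__init__(identifier, designed_function="transport on water")
        self.hull_type = hull_type


if __name__ == "__main__":
    boat = Watercraft("boat-1", hull_type="monohull")
    boat.add_quality("length_m", 12.0)  # structure inherited from Entity
    print(boat.identifier, boat.designed_function, boat.hull_type, boat.qualities)
```

The domain-level class defines only what is specific to it; everything else arrives from the two levels above, which is why those levels can stay small and stable while domain content grows.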

Object Oriented Programming - Modularity as a Development Guideline

The second axis of modularity in the CCO is content

[Figure: diagram relating Attribute, Process, Site, Temporal Region, and Physical Object through the relations has, participates in, occurs at, occurs on, and contained in.]

The Common Core Ontologies in Practice

•  The Common Core Ontologies (CCO) are intended to serve as a vocabulary that can describe objects and processes that are common to many domains of interest

•  The remaining objects and processes that are unique to particular domains of interest are described by ontologies that extend the CCO through a repeatable, rule-governed process

The Common Core and Domain Ontologies

[Figure: diagram of ontologies grouped into three levels: Upper Ontology, Common Core Ontology, and Domain Ontology. Ontologies shown: Basic Formal Ontology (BFO), Extended Relation Ontology, Time Ontology, Quality Ontology, Information Entity Ontology, Geospatial Ontology, Event Ontology, Artifact Ontology, Agent Ontology, Affective State Ontology, Ethnicity Ontology, Occupation Ontology, Hydrographic Feature Ontology, Physiographic Feature Ontology, Currency Unit Ontology, Units of Measure Ontology, Curriculum Ontology, Citizenship Ontology, Watercraft Ontology, Sensor Ontology, Agent Information Ontology, Undersea Warfare Ontology, Space Object Ontology.]

The Benefits of the Common Core Ontology Development Process