Smart Data for Smart Labs: Utilizing Semantic Technologies for Improved Integration and Sharing of Laboratory Data Eric Little, PhD VP Data Science [email protected]

Upload: osthus

Post on 12-Feb-2017

TRANSCRIPT

Page 1: Smart Data for Smart Labs

Smart Data for Smart Labs: Utilizing Semantic Technologies for Improved Integration and Sharing of Laboratory Data

Eric Little, PhD

VP Data Science

[email protected]

Page 2: Smart Data for Smart Labs

Slide 2

Outline

The Current Laboratory Data Situation

The Growing Importance of Data As A Corporate Asset

What is Semantic Technology and How Can It Help?

Moving Beyond Semantics – Big Data & Analytics

Smart Labs for the 21st Century

Page 3: Smart Data for Smart Labs

Slide 3

The Current Lab Situation

Many challenges exist for data to be captured, integrated and shared

Data Silos

Incompatible instruments and software systems

Legacy architectures are brittle and rigid

SME knowledge resides in people’s heads

Data schemas are not explicitly understood

Lack of common vision between business units and scientists

Page 4: Smart Data for Smart Labs

Slide 4

Pharma Example in Action

[Diagram: workflow with the labels Documentation, Initial Step, Local Regulatory Affiliate, Calibration, SME, Instrumentation, Marketing, R&D Data, R&D Tech, R&D Data Stores, Production Data, External Regulatory Affiliate, and Manual Data Verification Process; a "Verify OK?" decision has NO and YES branches and ends in a Finalized Report]

• This process can take weeks to complete because it often has to be repeated several times due to errors.

• Relations must be built by hand on the user side from flat files or spreadsheets. The relations therefore cannot be retained over time or automatically generated later.

• The DBs are not built for retrieval of different information types – the joins are not always there.

Page 5: Smart Data for Smart Labs

Slide 5

Why Data Matters

Enterprise systems are increasingly “hybrid” in their design and architectures

Legacy Data Sources combined with new tech

Integrating data is becoming more complex

The size of data sources continues to grow

Different user groups within organizations

Answers need to reflect increasingly complex patterns

Finding and utilizing key data within an organization is of increasing importance

Data is a valuable corporate asset

The fundamentals of data management have changed. Basic storage & retrieval has given way to analytics and responsiveness.

Page 6: Smart Data for Smart Labs

Slide 6

Analytics and Data Science for the 21st Century

The rate of change in digital information is growing exponentially

Cloud Computing is now critical for scaling an enterprise

New data types are being created and hold significant value

Data is becoming more personalized and context-based

The effect of data is changing the business landscape

90% of the world’s data was produced in the last 2 years – how well can you mine/leverage this data? What is this worth to a company?

$900 Billion/year: cost of lowered employee productivity and reduced innovation from information overload – how can we avoid these costs?

“Increasing volume and detail of enterprise information, multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.”

“The use of big data will become a key basis of competition and growth for individual firms.”

McKinsey: “Big data: The next frontier for innovation, competition, and productivity”, May 2011

Page 7: Smart Data for Smart Labs

Semantic Technologies: What Are They & How Are They Used?

Page 8: Smart Data for Smart Labs

Slide 8

The Value of Semantics

Semantics has its origins in philosophy and is generally understood as the abstract study of meaning

It is distinguished from syntax, which is the rule-based grammar of a language

“Washington”

Page 9: Smart Data for Smart Labs

Slide 9

Semantic Web and IT Evolution: Evolving from Code-Centric to Data-Centric IT

Semantic technologies: IT evolution from code to data centricity

In the Code-Centric years, data was often stored in flat files

The creation of databases, specifically network databases and RDBMS, was one of the first steps leading to Data-Centric evolution

The last decade has seen standards such as XML, RDF, Web Services, and now OWL, that further evolve IT to a Data-Centric environment

Page 10: Smart Data for Smart Labs

Slide 10

Utilizing Taxonomies for Reference Data Management

Taxonomies provide important structure to data, as acyclic tree graphs

2 Types of Applications:

• Captures sub-class and super-class relationships

• Captures broad/narrow relationships between terms
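As a rough illustration of the two uses listed on this slide, a taxonomy can be sketched as a child-to-parent map, which is a tree as long as every term has at most one parent and no cycles. The terms below are hypothetical examples, not taken from any actual reference-data set, and the function names are made up for this sketch.

```python
# Hypothetical taxonomy: each term points to its single broader (parent) term.
# The root has no parent. These terms are illustrative assumptions only.
PARENT = {
    "instrument": None,
    "chromatograph": "instrument",
    "liquid chromatograph": "chromatograph",
    "gas chromatograph": "chromatograph",
    "spectrometer": "instrument",
    "mass spectrometer": "spectrometer",
}


def broader_terms(term):
    """Return every ancestor of `term`, nearest first, failing on a cycle."""
    chain = []
    seen = {term}
    parent = PARENT[term]
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
        parent = PARENT[parent]
    return chain


def narrower_terms(term):
    """Return every descendant of `term`, sorted alphabetically."""
    return sorted(t for t in PARENT if term in broader_terms(t))


def is_subclass_of(term, ancestor):
    """True when `ancestor` is a (possibly indirect) super-class of `term`."""
    return ancestor in broader_terms(term)


print(broader_terms("mass spectrometer"))
print(narrower_terms("chromatograph"))
```

The same structure answers both kinds of question: `is_subclass_of` covers the sub-class/super-class reading, while `broader_terms` and `narrower_terms` cover the broad/narrow reading.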

Page 11: Smart Data for Smart Labs

Slide 11

Allotrope Foundation Taxonomies (AFT)

[Figure: taxonomy excerpt showing the term "mass intensity" with the identifiers af-m:AFM_0000350 and af-r:AFR_0000495]

Page 12: Smart Data for Smart Labs

Slide 12

Utilizing the Semantic Spectrum (Moving Beyond Taxonomies)

Code (Lists) Terms (Soil, Plant, etc.)

Controlled Vocabulary (Agreed Upon Terms)

Taxonomy (Hierarchy)

Thesaurus (Preferred Labels, Synonyms, etc.)

RDF Models (Triples as Graphs)

OWL Ontologies (RDF + Axioms)

Reasoning (Rule-based Logics: Discover New Patterns)

Ontologies and Reasoning add Axioms and Advanced Logic
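The upper end of the spectrum (triples as graphs, plus rule-based reasoning) can be sketched in a few lines of plain Python. This is a minimal illustration, not an RDF library: triples are tuples of strings, the `ex:` names are hypothetical, and only two RDFS-style rules are applied (transitivity of `rdfs:subClassOf`, and inheritance of `rdf:type` along `rdfs:subClassOf`).

```python
# Asserted facts as (subject, predicate, object) triples.
# The "ex:" identifiers are made-up examples for this sketch.
triples = {
    ("ex:MassSpectrometer", "rdfs:subClassOf", "ex:Spectrometer"),
    ("ex:Spectrometer", "rdfs:subClassOf", "ex:Instrument"),
    ("ex:device42", "rdf:type", "ex:MassSpectrometer"),
}


def infer(asserted):
    """Forward-chain two RDFS-style rules until no new triples appear."""
    closed = set(asserted)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != "rdfs:subClassOf":
                    continue
                if p1 == "rdfs:subClassOf":
                    # A subClassOf B, B subClassOf D  =>  A subClassOf D
                    new.add((a, "rdfs:subClassOf", d))
                elif p1 == "rdf:type":
                    # x type B, B subClassOf D  =>  x type D
                    new.add((a, "rdf:type", d))
        if new <= closed:
            return closed
        closed |= new


inferred = infer(triples)
for triple in sorted(inferred - triples):
    print(triple)
```

None of the derived triples were stated explicitly; they follow from the rules, which is the sense in which reasoning can surface patterns that are only implicit in the data.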

Page 13: Smart Data for Smart Labs

Slide 13

Levels of Semantic Expressivity

Semantics can be modeled at many levels

Finding the right level is a tradeoff of expressivity, performance, decidability, and other factors

The weakest representation is basic syntax matching

The strongest representation is higher order logic

Semantic representation in RDF and ontologies is roughly in the middle

Using knowledge representation one can separate schema level from data level

Data becomes much more flexible and reusable

Allows easier transformation of data to knowledge creation

Raises computational value (now data can be more easily extracted from legacy systems, shared, and used across an enterprise).
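One way to picture the separation of schema level from data level is sketched below: the knowledge about what each legacy field means lives in a mapping that is maintained apart from the records themselves. The system names (`lims`, `eln`), field names, and shared terms are all hypothetical.

```python
# Schema level: hypothetical mappings from each legacy system's field names
# to a shared vocabulary. Changing a mapping requires no change to the data.
SCHEMA_MAP = {
    "lims": {"smp_id": "sample_id", "wt_mg": "mass_mg"},
    "eln": {"SampleID": "sample_id", "Mass (mg)": "mass_mg"},
}


def to_shared(source, record):
    """Translate one legacy record into shared terms, dropping unmapped fields."""
    mapping = SCHEMA_MAP[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Data level: records exactly as the (hypothetical) legacy systems export them.
lims_row = {"smp_id": "S-001", "wt_mg": 12.5, "operator": "jd"}
eln_row = {"SampleID": "S-002", "Mass (mg)": 9.8}

shared = [to_shared("lims", lims_row), to_shared("eln", eln_row)]
print(shared)
```

Adding a third source under this sketch means adding one entry to `SCHEMA_MAP`; code that consumes `shared` does not need to change.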

Page 14: Smart Data for Smart Labs

Slide 14

Benefits of Semantic Technology

Interoperability

Searching/Browsing

Reuse

Architectural Intent

Automated Reasoning

Development Lifecycle

Page 15: Smart Data for Smart Labs

Moving From Semantics to Big Data Analytics

Page 16: Smart Data for Smart Labs

Slide 16

The Rise of Analytics is Changing the Game

The power of analytics is now just beginning to be felt

Moore’s Law pertaining to processing is not the problem

Focus on the growth of Analysis:

From 1988-2003 computer processing speed grew by 1000x

In the same period algorithm development grew by 43,000x

What does this tell you about the direction in which we are headed?

As data grows, so too will the need to utilize it more effectively

[Figure label: ANALYTICS]
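To put the two growth figures from this slide side by side, the short calculation below treats both as multiplicative speedup factors over the 15-year span. That reading is an assumption (the slide does not say how either number was measured), as is the idea that the two factors combine by simple multiplication.

```python
YEARS = 2003 - 1988  # 15-year span quoted on the slide

hardware_factor = 1_000    # processing-speed growth quoted on the slide
algorithm_factor = 43_000  # algorithm growth quoted on the slide

# Assumption: the two improvements are independent and multiply.
combined_factor = hardware_factor * algorithm_factor

# Equivalent constant year-over-year growth for each factor.
hardware_per_year = hardware_factor ** (1 / YEARS)
algorithm_per_year = algorithm_factor ** (1 / YEARS)

print(f"combined: {combined_factor:,}x")
print(f"hardware per year: {hardware_per_year:.2f}x")
print(f"algorithms per year: {algorithm_per_year:.2f}x")
```

Under these assumptions, hardware improves by a little under 1.6x per year while algorithms roughly double each year, which is the contrast the slide's question points at.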

Page 17: Smart Data for Smart Labs

Slide 17

Understanding the 4V’s of Big Data

Normally the focus – Big Data Analysis is more than just size

Performance is Critical to Success

Data complexity is increasing – Model complexity

Uncertainty abounds – requires statistics and probabilities

Majority of Big Data analytics approaches treat these two V’s

Semantic technologies provide clear advantages

Mathematical Clustering Techniques provide clear advantages

Page 18: Smart Data for Smart Labs

Slide 18

Why Semantics Matters for Data Analytics

Big Data approaches require proper metadata and terminologies to integrate information well

Relationships matter in the data

Understanding perspective (context) is crucial for success in today’s world

Semantics provides better data models/schemas

Page 19: Smart Data for Smart Labs

Slide 19

Smart Labs for the 21st Century

Smart labs in the future will provide customers with:

Integrated Data – common reference data structures (vocabularies)

Sharable Data – easier interaction across teams and business units

Scalability – Big data applications that can be highly elastic

Conceptual Representations – context and perspective are captured

Advanced Analytics – complex & automated problem-solving capabilities