Data Science London meetup, 28/05/15

Semantic web warmed up: Ontologies for the IoT

Dr. Boris Adryan, @BorisAdryan, @thingslearn

Currently getting divorced from logic.sysbiol.cam.ac.uk


Posted on 28-Jul-2015



‣ Everything is connected
‣ Big, noisy, often unstructured data
‣ We have been learning how biological entities depend on each other: DNA > RNA > proteins

www.thingslearn.com

Analytics, context integration, machine learning and predictive modelling for the IoT.

0 clean shirts left
+ washing machine estimates 97% of your last pack of powder used
+ it’s Wednesday, 23:55
+ the last four Thursdays had a morning business meeting
+ the car is parked 20 m from a shop
+ last retail activity: 8 sec ago

Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse

“need identified” AND “notification appropriate” => Actionable insight.

From everything.
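The rule on this slide fuses several independent signals into two conditions joined by AND. A minimal Python sketch of that logic follows; the field names, the 95% “powder nearly gone” threshold, the 50 m and 60 s windows, and the `decide` function are all hypothetical choices for illustration, not the thingslearn implementation.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of the signals listed on the slide."""
    clean_shirts: int
    powder_used_fraction: float          # washing machine's estimate, 0.0 to 1.0
    weekday: str                         # e.g. "Wednesday"
    hour: int                            # 0 to 23
    thursday_meetings_of_last_four: int
    car_distance_to_shop_m: float
    seconds_since_retail_activity: float


def need_identified(ctx: Context) -> bool:
    # Assumed thresholds: no clean shirts, powder nearly used up, and a
    # Thursday-morning meeting looks likely because it is Wednesday evening
    # and all of the last four Thursdays had one.
    meeting_likely = (
        ctx.weekday == "Wednesday"
        and ctx.hour >= 18
        and ctx.thursday_meetings_of_last_four == 4
    )
    return ctx.clean_shirts == 0 and ctx.powder_used_fraction >= 0.95 and meeting_likely


def notification_appropriate(ctx: Context) -> bool:
    # Assumed: a reminder is only useful while the user is near a shop
    # and evidently shopping right now.
    return ctx.car_distance_to_shop_m <= 50 and ctx.seconds_since_retail_activity <= 60


def decide(ctx: Context) -> list[str]:
    # Actionable insight only when both conditions hold.
    if need_identified(ctx) and notification_appropriate(ctx):
        return ["text reminder: pick up washing powder", "tweet from @BorisHouse"]
    return []


# The situation described on the slide.
tonight = Context(
    clean_shirts=0,
    powder_used_fraction=0.97,
    weekday="Wednesday",
    hour=23,
    thursday_meetings_of_last_four=4,
    car_distance_to_shop_m=20,
    seconds_since_retail_activity=8,
)
print(decide(tonight))
```

Keeping the two conditions as separate functions mirrors the slide's split between recognising a need and judging whether interrupting the user is appropriate at this moment.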

NO ANALYTICAL FLEXIBILITY IN M2M/IOT

Matt Hatton, Machina Research, The BLN IoT ‘14

M2M: Internet replaces wire; defined I-P-O (input, process, output) like it’s 1975

Consumer IoT: It’s all about the context

[Figure: the word “context” repeated all around the consumer IoT side]

Is it hot?
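“Is it hot?” has no context-free answer. The sketch below contrasts an I-P-O style fixed threshold with a context-aware check; the 25 °C threshold, the per-location ranges, and both function names are made-up illustrative assumptions.

```python
# Hypothetical "normal" temperature ranges in °C per location.
NORMAL_RANGE_C = {
    "living room": (18.0, 24.0),
    "fridge": (1.0, 5.0),
    "oven": (150.0, 250.0),
}


def is_hot_fixed(reading_c: float, threshold_c: float = 25.0) -> bool:
    # I-P-O style: the same rule for every device, whatever its context.
    return reading_c > threshold_c


def is_hot_in_context(reading_c: float, location: str) -> bool:
    # Context-aware: "hot" means above what is normal for this location.
    _low, high = NORMAL_RANGE_C[location]
    return reading_c > high


reading = 20.0
for place in NORMAL_RANGE_C:
    print(f"{place}: fixed={is_hot_fixed(reading)}, "
          f"in context={is_hot_in_context(reading, place)}")
```

With the same 20 °C reading, the fixed rule never raises a flag, while the context-aware check flags the fridge and correctly treats a 20 °C oven as simply switched off.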

LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT

- There are no commonly accepted:
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.

- Most businesses are driven by revenue, not long-term strategic vision

- Service providers have no need to publish

- Data can be highly personal (cheap excuse)

unless they’re

META DATA, SHARING AND DATA REPOS

founded in Nov. 1999

But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.

Nature, Feb. 2000

Nov. 2000 Oct. 2002

Wide adoption: as a requirement for publication in scientific journals

THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM

FORMALISING KNOWLEDGE

FORMALISING KNOWLEDGE WITH GENE ONTOLOGY

CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY

NIH alone spent $44,616,906 on the ontology structure since 2001 (I don’t have data for UK/EU spending)

~100 full-time salaries for experts with domain-specific knowledge

~40,000 terms
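For a rough sense of scale, dividing the NIH figure by the approximate number of terms gives a cost per term. This is only a back-of-the-envelope ratio: it assumes the NIH spending and the ~40,000 terms cover the same scope and period, which the slide does not state, and it ignores any UK/EU spending.

```latex
\frac{\$44{,}616{,}906}{\sim 40{,}000\ \text{terms}} \approx \$1{,}115\ \text{per term}
```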

PUBLISH OR PERISH

[Diagram of the life-science knowledge cycle. Labels: story; measurements + meta data; open, public repositories; human curators; ontology terms; community; journal; “ok?”; informal exchange: no credit!; funders; assessment]

The majority of this infrastructure is paid for by governments and charities

industry!

PUBLISH OR YOU’RE NOT DOING IOT

[Diagram of the same cycle for the IoT. Labels: measurements + meta data; storage & provenance; human curators; ontology terms; user; “ok?”]

Maybe the majority of this infrastructure should be paid for by governments?

[Diagram labels: company cloud; device registration; privileges; data; added value]

WHAT IS AN ONTOLOGY?

ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT?

ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES

Gene Ontology annotation

15 years of research, 47 publications, 100+ authors, 50+ PhDs

15 direct annotations

~150 inferred annotations

THE THREE BRANCHES OF GO

Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352

Localization: Where is an entity acting?

Function: What does the entity do?

Process: When is the entity needed?

Inferences on “is a”, “part of”, “regulates”, “has part”

from geneontology.org

from Ashburner et al., Nat Genet. 2000, 25(1):25-9.
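Inferred annotations, such as the ~150 mentioned earlier, presumably arise from following relations up the ontology graph: in GO, an annotation to a term also implies annotation to that term's “is a” and “part of” ancestors. A minimal Python sketch of that propagation follows. The toy terms reuse IoT vocabulary from this talk and are hypothetical, and the sketch deliberately ignores “regulates” and “has part”, which need more careful handling than simple upward propagation.

```python
from collections import deque

# Toy ontology (hypothetical): term -> list of (relation, parent term).
EDGES = {
    "living room": [("part_of", "home")],
    "home": [("is_a", "indoor location")],
    "measurement of ambient temperature": [("is_a", "measurement of temperature")],
    "measurement of temperature": [("part_of", "regulation of temperature")],
    "regulation of temperature": [("regulates", "temperature homeostasis")],
}

# Only these relations pass an annotation on to the parent term.
PROPAGATING = {"is_a", "part_of"}


def infer(direct: set[str]) -> set[str]:
    """Return the direct annotations plus every term they imply."""
    seen = set(direct)
    queue = deque(direct)
    while queue:  # breadth-first walk up the graph
        term = queue.popleft()
        for relation, parent in EDGES.get(term, []):
            if relation in PROPAGATING and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


direct = {"living room", "measurement of ambient temperature"}
print(sorted(infer(direct) - direct))
```

Here two direct annotations yield four inferred ones, and the “regulates” edge is not followed; on a deeper ontology each direct annotation can imply many more terms.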

GO AND CONTEXT

THE BRANCHES OF GO AND THE IOT

Localization: inside, (my?) home, living room

Function: measures temperature; regulates temperature; interacts with user directly; interacts with user via app

Process: regulation of temperature; measurement of ambient temperature; ‘is proxy / is avatar’ for presence? fire? ice age? winter?
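One way to hold such annotations in software is a record with one field per branch. The sketch below uses a hypothetical schema, not an existing IoT ontology: the thermostat entries come from this slide, while the smoke detector and its “kitchen” location are made up to give the query something to contrast with.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    """Hypothetical GO-style annotation of a connected device."""
    localization: list[str] = field(default_factory=list)
    function: list[str] = field(default_factory=list)
    process: list[str] = field(default_factory=list)
    proxy_for: list[str] = field(default_factory=list)  # "is proxy / is avatar" for


THINGS = {
    "thermostat": ThingAnnotation(
        localization=["inside", "home", "living room"],
        function=["measures temperature", "regulates temperature",
                  "interacts with user via app"],
        process=["regulation of temperature", "measurement of ambient temperature"],
        proxy_for=["presence", "fire", "winter"],
    ),
    "smoke detector": ThingAnnotation(
        localization=["inside", "home", "kitchen"],
        function=["detects smoke"],
        proxy_for=["fire"],
    ),
}


def things_with(branch: str, term: str) -> list[str]:
    # Which things carry `term` in the given branch?
    return [name for name, ann in THINGS.items() if term in getattr(ann, branch)]


print(things_with("function", "regulates temperature"))
print(things_with("proxy_for", "fire"))
```

The second query shows the point of the “proxy” annotations: a thermostat can contribute to a question it was never designed to answer.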

A LAST WORD ON PRAGMATISM

“perfect” ontology

The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish, and the effort to annotate things outweighs the utility.

“crude” ontology

A simplified structure allows for quick annotation even by non-specialists.

The lack of detail can lead to clashes in the ontology => more smartness has to go into the software, which means more coding effort.
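The trade-off can be made concrete with a crude, flat tag scheme. In the hypothetical sketch below, a single “temperature” tag cannot distinguish a device that measures temperature from one that only regulates it, so the query code has to consult an extra, hand-maintained fact (`PUBLISHES_READINGS`, an assumed table) to get the right answer.

```python
# Crude ontology: a flat set of tags per device (hypothetical data).
CRUDE_TAGS = {
    "thermostat": {"temperature"},
    "heater": {"temperature"},
    "weather station": {"temperature", "humidity"},
}

# Knowledge the crude ontology cannot express, so the software must carry it.
PUBLISHES_READINGS = {"thermostat": True, "heater": False, "weather station": True}


def tagged(tag: str) -> list[str]:
    return sorted(name for name, tags in CRUDE_TAGS.items() if tag in tags)


def temperature_sources() -> list[str]:
    # "temperature" is ambiguous (measures vs. regulates), so filter with
    # the extra fact; every such rule is more code to write and maintain.
    return [name for name in tagged("temperature")
            if PUBLISHES_READINGS.get(name, False)]


print(tagged("temperature"))
print(temperature_sources())
```

With the richer annotation style used for the GO branches, the same question would be a plain lookup on a function such as “measures temperature” rather than extra code.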

1 billion different things

1 million use cases


‣ “indicator of esteem”
‣ 3% left and not pressed
‣ “not home”
‣ credit card: “buying”
‣ “highly personal device” ~ alive and awake

Dr. Boris Adryan

@BorisAdryan @thingslearn

@SoftwareSaved

Open software, open source, open data

Fellow of the