data science london - meetup, 28/05/15
Semantic web warmed up:
Ontologies for the IoT
Dr. Boris Adryan
@BorisAdryan @thingslearn
Currently getting divorced from logic.sysbiol.cam.ac.uk
‣ Everything is connected
‣ Big, noisy, often unstructured data
‣ We are learning how biological entities depend on each other
DNA > RNA > proteins
have been
www.thingslearn.com
Analytics, context integration, machine learning and predictive modelling for the IoT.
0 clean shirt left +
washing machine estimates 97% of your last pack of
powder used +
it’s Wednesday, 23:55 +
the last four Thursdays had a morning business
meeting +
the car is parked 20 m from a shop
+ last retail activity: 8 sec ago
Send immediate text reminder to pick up
washing powder + send tweet from @BorisHouse
“need identified” AND “notification appropriate”
Actionable insight.
From everything.
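The washing powder scenario above can be sketched as two checks joined by AND. This is only an illustration: the `HomeContext` fields, the function names and every threshold (95% powder used, 50 m, 60 s) are assumptions made for the example, not part of any thingslearn product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class HomeContext:
    """Hypothetical snapshot of the signals listed on the slide."""
    clean_shirts: int
    powder_used_fraction: float        # washing machine estimate, 0.0 to 1.0
    now: datetime
    recent_thursday_meetings: int      # how many of the last four Thursdays
    car_distance_to_shop_m: float
    seconds_since_retail_activity: float


def need_identified(ctx: HomeContext) -> bool:
    # No clean shirt, powder nearly gone, and a meeting is likely tomorrow.
    tomorrow_is_thursday = (ctx.now.weekday() + 1) % 7 == 3  # Monday == 0
    return (
        ctx.clean_shirts == 0
        and ctx.powder_used_fraction >= 0.95   # assumed threshold
        and tomorrow_is_thursday
        and ctx.recent_thursday_meetings == 4
    )


def notification_appropriate(ctx: HomeContext) -> bool:
    # Near a shop and active seconds ago, so a reminder can be acted on.
    return (
        ctx.car_distance_to_shop_m <= 50           # assumed threshold
        and ctx.seconds_since_retail_activity <= 60  # assumed threshold
    )


def actions(ctx: HomeContext) -> List[str]:
    if need_identified(ctx) and notification_appropriate(ctx):
        return ["text: pick up washing powder", "tweet from @BorisHouse"]
    return []


slide_example = HomeContext(
    clean_shirts=0,
    powder_used_fraction=0.97,
    now=datetime(2015, 5, 27, 23, 55),   # a Wednesday, 23:55
    recent_thursday_meetings=4,
    car_distance_to_shop_m=20.0,
    seconds_since_retail_activity=8.0,
)
print(actions(slide_example))
```

Keeping the "need" and the "appropriateness" checks separate means each one can draw on different devices without the other having to know about them.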
NO ANALYTICAL FLEXIBILITY IN M2M/IOT
Matt Hatton, Machina Research, The BLN IoT ‘14
Internet replaces wire
It’s all about the context
M2M
consumer
IoT
defined I-P-O like it’s 1975
context
Is it hot?
LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT
- There is no commonly accepted
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.
- Most businesses are driven by revenue, not long-term strategic vision
- Service providers have no need to publish
- Data can be highly personal (cheap excuse)
unless they’re
META DATA, SHARING AND DATA REPOS
founded in Nov. 1999
“But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.”
Nature, Feb. 2000
Nov. 2000 Oct. 2002
Wide adoption: as requirement for publication in scientific journals
CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY
NIH alone spent $44,616,906 on the ontology structure since 2001
(I don’t have data for UK/EU spending)
~100 full-time salaries for experts with domain-specific knowledge
~40,000 terms
story
measurements + meta data
open, public repositories
human curators
ontology terms
community
PUBLISH OR PERISH
ok?
journal
informal exchange - no credit!
funders
assessment
The majority of this infrastructure is paid for by governments and charities
industry!
measurements + meta data
storage & provenance
human curators
ontology terms
user
PUBLISH OR YOU’RE NOT DOING IOT
ok?
Maybe the majority of this infrastructure should be paid for by governments?
company cloud
device registration
“ “
privileges
data
added value
ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES
Gene Ontology annotation
15 years of research
47 publications
100+ authors
50+ PhDs
15 direct annotations
~150 inferred annotations
THE THREE BRANCHES OF
Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352
Localization: Where is an entity acting?
Function: What does the entity do?
Process: When is the entity needed?
inferences on “is a”
“part of”
“regulates”
“has part”
from geneontology.org
from Ashburner et al., Nat Genet. 2000, 25(1):25-9.
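How a handful of direct annotations turns into many inferred ones can be sketched with a toy graph. The term names and edges below are illustrative and not taken from the real Gene Ontology, and the sketch assumes that only “is a” and “part of” edges carry an annotation up to the parent term.

```python
# Toy ontology: each term maps to (relation, parent term) pairs.
# Names and edges are made up for illustration.
PARENTS = {
    "mitochondrial membrane": [("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}

# Assumption: annotations propagate only along these relations.
PROPAGATING = {"is_a", "part_of"}


def inferred_terms(direct_terms):
    """Return the ancestor terms implied by a set of direct annotations."""
    inferred = set()
    stack = list(direct_terms)
    while stack:
        term = stack.pop()
        for relation, parent in PARENTS.get(term, []):
            if relation in PROPAGATING and parent not in inferred:
                inferred.add(parent)
                stack.append(parent)
    # Terms that were annotated directly do not count as inferred.
    return inferred - set(direct_terms)


print(sorted(inferred_terms(["mitochondrial membrane"])))
```

One direct annotation to a specific term yields an inferred annotation for every ancestor, which is why the inferred count grows so much faster than the curated one.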
GO AND CONTEXT
THE BRANCHES OF GO AND THE IOT
Localization: inside, (my?) home, living room
Function: measures temperature, regulates temperature, interacts with user directly, interacts with user via app
Process: regulation of temperature, measurement of ambient temperature
‘is proxy / is avatar’ for presence? fire? ice age? winter?
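A minimal sketch of what such three-branch annotation could look like for things. The `Thing` class, the two example devices and the `find_things` helper are hypothetical; only the annotation terms come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Thing:
    """A device annotated along the three GO-style branches."""
    name: str
    localization: Set[str] = field(default_factory=set)  # where it acts
    function: Set[str] = field(default_factory=set)      # what it does
    process: Set[str] = field(default_factory=set)       # what it takes part in


# Hypothetical devices, annotated with the terms from the slide.
THINGS = [
    Thing(
        name="thermostat",
        localization={"inside", "home", "living room"},
        function={"measures temperature", "regulates temperature",
                  "interacts with user via app"},
        process={"regulation of temperature",
                 "measurement of ambient temperature"},
    ),
    Thing(
        name="wall thermometer",
        localization={"inside", "home"},
        function={"measures temperature", "interacts with user directly"},
        process={"measurement of ambient temperature"},
    ),
]


def find_things(branch: str, term: str) -> List[str]:
    """Names of all things carrying `term` in the given branch."""
    return [t.name for t in THINGS if term in getattr(t, branch)]


print(find_things("function", "measures temperature"))
print(find_things("process", "regulation of temperature"))
```

With shared terms, a question like “Is it hot?” can be routed to anything annotated with “measures temperature”, whatever the vendor or data format.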
A LAST WORD ON PRAGMATISM
“perfect” ontology
The SSN Ontology allows for inference entirely on the basis of its structure and annotation.
In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility.
“crude” ontology
A simplified structure allows for quick annotation even by non-specialists.
The lack of details can lead to clashes in the ontology => more smartness has to go into software; more coding effort.
1 billion
different things
1 million
use cases
0 clean shirt left +
washing machine estimates 97% of your last pack of
powder used +
it’s Wednesday, 23:55 +
the last four Thursdays had a morning business
meeting +
the car is parked 20 m from a shop
+ last retail activity: 8 sec ago
Send immediate text reminder to pick up
washing powder + send tweet from @BorisHouse
“need identified” AND “notification appropriate”
Actionable insight.
From everything.
“indicator of esteem”
3% left and
not pressed
“not home”
“buying”
credit card:
“highly personal device” ~ alive and awake