Slide 1

Slide 1 text

Semantic web warmed up: Ontologies for the IoT Dr. Boris Adryan @BorisAdryan @thingslearn Currently getting divorced from logic.sysbiol.cam.ac.uk

Slide 2

Slide 2 text

‣ Everything is connected
‣ Big, noisy, often unstructured data
‣ We are learning how biological entities depend on each other
DNA > RNA > proteins have been

Slide 3

Slide 3 text

‣ Everything is connected
‣ Big, noisy, often unstructured data
www.thingslearn.com
Analytics, context integration, machine learning and predictive modelling for the IoT.

Slide 4

Slide 4 text

0 clean shirts left
+ washing machine estimates 97% of your last pack of powder used
+ it’s Wednesday, 23:55
+ the last four Thursdays had a morning business meeting
+ the car is parked 20 m from a shop
+ last retail activity: 8 sec ago
“need identified” AND “notification appropriate”
Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse
Actionable insight. From everything.
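The rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `Context` fields, the function names and all thresholds (95% powder used, 50 m, 60 s) are assumptions made for the example, not values from the talk. The day, time and meeting history are collapsed into a single boolean.

```python
from dataclasses import dataclass


@dataclass
class Context:
    clean_shirts: int
    powder_used_fraction: float        # washing machine's own estimate, 0.0 to 1.0
    meeting_tomorrow_morning: bool     # e.g. guessed from the last four Thursdays
    metres_to_shop: float
    seconds_since_last_purchase: float


def need_identified(ctx: Context) -> bool:
    # Assumed rule: no shirts, powder nearly gone, and a meeting coming up.
    return (
        ctx.clean_shirts == 0
        and ctx.powder_used_fraction >= 0.95
        and ctx.meeting_tomorrow_morning
    )


def notification_appropriate(ctx: Context) -> bool:
    # Assumed rule: the user is near a shop and is shopping right now.
    return ctx.metres_to_shop <= 50 and ctx.seconds_since_last_purchase <= 60


def decide(ctx: Context) -> list:
    # Both conditions must hold before anything is sent.
    if need_identified(ctx) and notification_appropriate(ctx):
        return ["text reminder: pick up washing powder", "tweet from @BorisHouse"]
    return []


slide_example = Context(
    clean_shirts=0,
    powder_used_fraction=0.97,
    meeting_tomorrow_morning=True,
    metres_to_shop=20,
    seconds_since_last_purchase=8,
)
actions = decide(slide_example)
```

The hard part is not the rule itself but getting each input from a different device and vendor in a form that can be combined, which is where shared meta data and ontologies come in.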

Slide 5

Slide 5 text

NO ANALYTICAL FLEXIBILITY IN M2M/IOT Matt Hatton, Machina Research The BLN IoT ‘14 Internet replaces wire It’s all about the context M2M consumer IoT defined I-P-O like it’s 1975 context context context context context context context Is it hot?

Slide 6

Slide 6 text

LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT
- There is no commonly accepted
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.
- Most businesses are driven by revenue, not long-term strategic vision
- Service providers have no need to publish
- Data can be highly personal (cheap excuse)
unless they’re

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

META DATA, SHARING AND DATA REPOS
founded in Nov. 1999
“But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.” Nature, Feb. 2000
Nov. 2000
Oct. 2002
Wide adoption: as requirement for publication in scientific journals

Slide 9

Slide 9 text

THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM

Slide 10

Slide 10 text

FORMALISING KNOWLEDGE

Slide 11

Slide 11 text

FORMALISING KNOWLEDGE WITH GENE ONTOLOGY

Slide 12

Slide 12 text

CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY
NIH alone spent $44,616,906 on the ontology structure since 2001 (I don’t have data for UK/EU spending)
~100 full-time salaries for experts with domain-specific knowledge
~40,000 terms

Slide 13

Slide 13 text

story measurements + meta data open, public repositories human curators ontology terms community PUBLISH OR PERISH ok? journal informal exchange - no credit! funders assessment The majority of this infrastructure is paid for by governments and charities industry!

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

measurements + meta data storage & provenance human curators ontology terms user PUBLISH OR YOU’RE NOT DOING IOT ok? Maybe the majority of this infrastructure should be paid for by governments? company cloud device registration “ “ privileges data added value

Slide 16

Slide 16 text

WHAT IS AN ONTOLOGY?

Slide 17

Slide 17 text

ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT?

Slide 18

Slide 18 text

ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES
Gene Ontology annotation
15 years of research
47 publications
100+ authors
50+ PhDs
15 direct annotations
~150 inferred annotations

Slide 19

Slide 19 text

THE THREE BRANCHES OF
Localization: Where is an entity acting?
Function: What does the entity do?
Process: When is the entity needed?
Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352

Slide 20

Slide 20 text

GO AND CONTEXT
inferences on “is a” “part of” “regulates” “has part”
from geneontology.org
from Ashburner et al., Nat Genet. 2000, 25(1):25-9.
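A minimal sketch of what inference over such relations means, restricted to “is a” and “part of”. The term labels are illustrative only and are not real GO identifiers. The composition rule (any chain that contains a “part of” step yields “part of”) follows how these two relations are usually described for GO. “regulates” and “has part” are left out because their rules differ.

```python
# Toy edges: (child, relation, parent). Labels are illustrative, not GO IDs.
EDGES = [
    ("mitochondrion", "is_a", "intracellular organelle"),
    ("intracellular organelle", "part_of", "cell"),
    ("mitochondrial membrane", "part_of", "mitochondrion"),
]


def compose(first: str, second: str) -> str:
    # is_a followed by is_a stays is_a; any chain with part_of gives part_of.
    if first == "is_a" and second == "is_a":
        return "is_a"
    return "part_of"


def infer(edges):
    """Return the input edges plus every edge implied by chaining them."""
    known = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (c, r2, d) in list(known):
                if b == c:
                    new_edge = (a, compose(r1, r2), d)
                    if new_edge not in known:
                        known.add(new_edge)
                        changed = True
    return known


all_edges = infer(EDGES)
# ("mitochondrion", "part_of", "cell") is now in all_edges,
# although nobody annotated it directly.
```

This is why a handful of direct annotations can yield many more inferred ones: the structure of the ontology does the work.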

Slide 21

Slide 21 text

THE BRANCHES OF GO AND THE IOT
Localization: inside, (my?) home, living room
Function: measures temperature; regulates temperature; interacts with user directly; interacts with user via app
Process: regulation of temperature; measurement of ambient temperature
‘is proxy / is avatar’ for presence? fire? ice age? winter?
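As a rough sketch, a thing annotated along these three branches could be stored like this. The class name, field names and the `find` helper are hypothetical. The terms are taken from the slide; a real ontology would use controlled identifiers instead of free text.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    name: str
    localization: set = field(default_factory=set)  # where is it acting?
    function: set = field(default_factory=set)      # what does it do?
    process: set = field(default_factory=set)       # what is it needed for?


thermostat = ThingAnnotation(
    name="living room thermostat",
    localization={"inside", "home", "living room"},
    function={
        "measures temperature",
        "regulates temperature",
        "interacts with user via app",
    },
    process={
        "regulation of temperature",
        "measurement of ambient temperature",
    },
)


def find(things, branch: str, term: str) -> list:
    # Return the names of all things annotated with `term` in the given branch.
    return [t.name for t in things if term in getattr(t, branch)]


matches = find([thermostat], "function", "measures temperature")
```

With annotations like these, software can ask for "anything that measures temperature in the living room" without knowing the vendor or the device model.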

Slide 22

Slide 22 text

A LAST WORD ON PRAGMATISM
“perfect” ontology: The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility.
“crude” ontology: A simplified structure allows for quick annotation even by non-specialists. The lack of detail can lead to clashes in the ontology => more smartness has to go into software; more coding effort.
1 billion different things
1 million use cases

Slide 23

Slide 23 text

0 clean shirts left
+ washing machine estimates 97% of your last pack of powder used
+ it’s Wednesday, 23:55
+ the last four Thursdays had a morning business meeting
+ the car is parked 20 m from a shop
+ last retail activity: 8 sec ago
“need identified” AND “notification appropriate”
Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse
Actionable insight. From everything.
“indicator of esteem” 3% left and not pressed “not home” “buying” credit card: “highly personal device” ~ alive and awake

Slide 24

Slide 24 text

Dr. Boris Adryan @BorisAdryan @thingslearn @SoftwareSaved Open software Open source Open data Fellow of the