Slide 1

Slide 1 text

New deals on data – Generating open knowledge based on closed data Konrad U. Förstner ZB MED – Information Center for Life Sciences, Cologne, Germany & TH Köln, Cologne, Germany November 5th, 2018, Blockchain for Science Con

Slide 2

Slide 2 text

Disclaimer: I have no connection to any of the companies that will be mentioned here. I present my perspective as a bioinformatician and open science enthusiast. https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar

Slide 3

Slide 3 text

Open [data|source|*] should be the default in science. This is simply good scientific practice. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 4

Slide 4 text

... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 5

Slide 5 text

There are cases where privacy might be a higher good than openness. Certain data should not be linked to individuals. https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic

Slide 6

Slide 6 text

Behavioral data https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme

Slide 7

Slide 7 text

Socioeconomic status https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg

Slide 8

Slide 8 text

Medical records https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg

Slide 9

Slide 9 text

Genome / Exome / SNPs https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Microbiome information https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

Slide 12

Slide 12 text

https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

Slide 13

Slide 13 text

Having access to such data from a large population would significantly help research and extend our medical knowledge. https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

Slide 14

Slide 14 text

On the other hand, the data can be misused for systematic discrimination driven by political, ideological and commercial interests. https://www.flickr.com/photos/22394551@N03/2226095398 CC-BY by flickr user viZZZual.com

Slide 15

Slide 15 text

We have a moral dilemma: protect individual rights or push scientific progress? https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

Slide 16

Slide 16 text

Similar dilemmata from other research domains • Financial data of organisations • Energy consumption recording of devices • Location data of vehicles https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

Slide 17

Slide 17 text

How can this dilemma be solved? https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

Slide 18

Slide 18 text

Can we use closed data to generate (open) knowledge? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

Slide 19

Slide 19 text

Can we do research on black-boxed data that is at least reproducible? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

Slide 20

Slide 20 text

Can we train machine learning models on closed data? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

Slide 21

Slide 21 text

Or can we at least use the data to generate hypotheses that can then be tested with complementary methods? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

Slide 22

Slide 22 text

Genomics England • Aims to hold 100,000 full genomes • Data processing in closed data centers • Only results leave the center via an "airlock" https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

Slide 23

Slide 23 text

Personal Health Train (PHT) • Data stations ("FAIRports") • Trains – Workflows that can work on the data provided to them https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
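The station/train idea can be illustrated with a minimal Python sketch. This is not the actual PHT software or its API: the `DataStation` class, the `age_summary_train` workflow, and the `pooled_mean_age` helper are hypothetical names chosen for illustration, and the records are made up. The point it shows is that the workflow travels to the data and only an aggregate travels back.

```python
class DataStation:
    """Hypothetical data station: holds records locally and runs visiting workflows."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw records stay inside the station

    def run(self, train):
        # The "train" (a workflow) is executed next to the data;
        # only its return value leaves the station.
        return train(self._records)


def age_summary_train(records):
    """Hypothetical train: returns only a count and a sum, never single records."""
    ages = [record["age"] for record in records]
    return {"n": len(ages), "sum_age": sum(ages)}


def pooled_mean_age(stations):
    """Send the train to every station and combine the aggregates it brings back."""
    results = [station.run(age_summary_train) for station in stations]
    total_n = sum(result["n"] for result in results)
    total_age = sum(result["sum_age"] for result in results)
    return total_age / total_n


if __name__ == "__main__":
    stations = [
        DataStation("hospital_a", [{"age": 34}, {"age": 51}, {"age": 47}]),
        DataStation("hospital_b", [{"age": 29}, {"age": 62}]),
    ]
    print(pooled_mean_age(stations))
```

Note that this sketch trusts the train to return only aggregates; a real station would have to inspect or restrict what a workflow is allowed to send back.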

Slide 24

Slide 24 text

• Locked system • Trust in the platform required https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

Slide 25

Slide 25 text

(This slide was modified for online deposition. Simply click on the link below; it is a news article that describes how 23andMe and others are selling genomic data to the pharmaceutical industry.) https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7

Slide 26

Slide 26 text

Promises of blockchain-based, decentralized data marketplaces • owners have control over their data and can stay anonymous • standardisation of data • people can be incentivized to share the data • traceability (especially interesting for pharmaceutical companies) https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha

Slide 27

Slide 27 text

Diagram labels: Data marketplace; Data consumer; Access to data; Token; Data owners

Slide 28

Slide 28 text

Concepts of underlying solutions • Fully Homomorphic Encryption (FHE) • Multi-party Computation (MPC) • Trusted Execution Environment (TEE) like Intel SGX https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
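Of the three concepts, multi-party computation is the easiest to show in a few lines. The following is a minimal Python sketch of additive secret sharing, one basic MPC building block, used to compute a sum without any party seeing another's input. It is not taken from any of the platforms discussed in this talk; the function names, the modulus, and the example values are assumptions made for illustration, and the sketch ignores networking and malicious participants.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice; inputs and their
# sum must be non-negative and smaller than this value).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split a value into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Compute the sum of all inputs while only ever combining shares."""
    n = len(private_values)
    # Each data owner splits its value and hands one share to every party.
    all_shares = [share(value, n) for value in private_values]
    # Party j adds up the j-th share from every owner; a single share
    # is a uniformly random number and reveals nothing on its own.
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Only the partial sums are published; together they give the total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Hypothetical private measurements held by three different owners.
    print(secure_sum([120, 135, 110]))
```

The design choice here is that privacy comes from randomness rather than from encryption keys: as long as at least one party keeps its shares to itself, the individual inputs cannot be reconstructed from what the others see.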

Slide 29

Slide 29 text

Diagram labels: Blockchain; Data consumer; Secure compute node; Private data storage; Data owner

Slide 30

Slide 30 text

General purpose blockchain-based solutions • Ocean protocol • Enigma (secret contracts) • Ekiden protocol (Oasis Labs) • OpenMind https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

Slide 31

Slide 31 text

Blockchain-based solutions for healthcare data • Nebula (by George Church) • Longenesis • Luna DNA • phrOS (Personal Health Record Operating System) • EncrypGen https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

Slide 32

Slide 32 text

Will these data marketplaces improve science? https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 33

Slide 33 text

This has huge potential! https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 34

Slide 34 text

... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 35

Slide 35 text

Currently a lot of white papers are available, but nothing is openly testable. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 36

Slide 36 text

The discussion is unfortunately driven by companies, not academics. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 37

Slide 37 text

High risk – you won’t get your genome back once it has leaked. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 38

Slide 38 text

https://doi.org/10.1126/science.1229566

Slide 39

Slide 39 text

The suggested systems have very high complexity. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 40

Slide 40 text

Problem of different legal systems. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 41

Slide 41 text

Implications for the data owner/seller might not be clear – education needed. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 42

Slide 42 text

Data stored off-chain = outsourcing of one important problem (suggestions like Dropbox were mentioned – IMO quite a bad idea) https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 43

Slide 43 text

How can we prevent false statements in surveys that are made to appear more interesting to data consumers? https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 44

Slide 44 text

Bottom line: Very promising, but a long and hard way to go. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

Slide 45

Slide 45 text

What are your questions? konrad.foerstner.org / @konradfoerstner zbmed.de / @ZB_MED th-koeln.de / @th_koeln https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone