New deals on data –
Generating open knowledge
based on closed data
Konrad U. Förstner
ZB MED – Information Center for Life Sciences, Cologne, Germany &
TH Köln, Cologne, Germany
November 5th, 2018, Blockchain for Science Con
Slide 2
Disclaimer
I have no connection to any of the companies
that will be mentioned here.
I present my perspective as a bioinformatician
and open science enthusiast.
https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar
Slide 3
Open [data|source|*] should be the default in science.
This is simply good scientific practice.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 4
... but ...
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 5
There are cases where privacy might be
a higher good than openness.
Certain data should not be linked to individuals.
https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic
Slide 6
Behavioral data
https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme
Slide 7
Socioeconomic status
https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg
Slide 8
Medical records
https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg
Having access to such data from a large population
would significantly help research and
extend our medical knowledge.
https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
Slide 14
On the other hand, the data can be misused for
systematic discrimination due to
political, ideological and commercial interests.
https://www.flickr.com/photos/22394551@N03/2226095398 CC-BY by flickr user viZZZual.com
Slide 15
We have a moral dilemma.
Protect individual rights
or
push scientific progress.
https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
Slide 16
Similar dilemmata from other research domains
• Financial data of organisations
• Energy consumption records of devices
• Location data of vehicles
https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
Slide 17
How can this dilemma be solved?
https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
Slide 18
Can we use closed data to generate (open) knowledge?
https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
Slide 19
Can we do research on black-boxed data
that is at least reproducible?
https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
Slide 20
Can we train machine learning models on closed data?
https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
Slide 21
Or can we at least use the data to generate hypotheses
that can then be tested with complementary methods?
https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
Slide 22
Genomics England
• Aims to hold 100,000 full genomes
• Data processing in closed data centers
• Only results leave the center via an “airlock”
https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
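The "airlock" idea can be illustrated with a minimal Python sketch. This is not the actual Genomics England implementation; the function names, the record layout and the minimum group size of 5 are assumptions made purely for illustration. The analysis runs inside the closed center, and only aggregate counts that cover enough individuals are allowed to leave.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # hypothetical disclosure threshold, not from the talk


def count_variant_carriers(records):
    """Runs inside the closed center: count carriers per variant."""
    counts = Counter()
    for record in records:
        for variant in set(record["variants"]):
            counts[variant] += 1
    return counts


def airlock(counts, min_group_size=MIN_GROUP_SIZE):
    """Release only aggregates that cover at least min_group_size people."""
    return {v: n for v, n in counts.items() if n >= min_group_size}


# Made-up placeholder records; in this sketch they never leave the center.
records = [{"id": i, "variants": ["rsA"]} for i in range(6)]
records.append({"id": 6, "variants": ["rsA", "rsB"]})

released = airlock(count_variant_carriers(records))
print(released)
```

In this toy run the count for the variant carried by a single person is withheld, because releasing it could point to an individual.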
Slide 23
Personal Health Train (PHT)
• Data stations (“FAIRports”)
• Trains – Workflows that can work on the data
provided to them
https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
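A rough sketch of the station/train pattern, assuming a very simplified model: the class `DataStation`, the function `summary_train` and the blood pressure values are hypothetical and not part of any PHT specification. The workflow (the train) travels to each station, runs on the local data, and brings back only summary values.

```python
class DataStation:
    """Holds data locally and runs visiting trains against it."""

    def __init__(self, name, blood_pressures):
        self.name = name
        self._blood_pressures = blood_pressures  # never handed out directly

    def run(self, train):
        # The train is executed where the data lives.
        return train(self._blood_pressures)


def summary_train(values):
    """A 'train': returns only a count and a sum, no individual values."""
    return {"n": len(values), "total": sum(values)}


# Made-up placeholder data at two stations.
stations = [
    DataStation("station_a", [120, 130, 110]),
    DataStation("station_b", [140, 125]),
]

partials = [station.run(summary_train) for station in stations]
n = sum(p["n"] for p in partials)
overall_mean = sum(p["total"] for p in partials) / n
print(n, overall_mean)
```

The combined mean is computed from the per-station summaries alone, so the individual measurements never have to be pooled in one place.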
Slide 24
• Locked system
• Trust in the platform required
https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
Slide 25
(This slide was modified for online deposition; simply click on
the link below. It is a news article that describes how 23andMe
and others are selling genomic data to the pharma industry.)
https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7
Slide 26
Promises of blockchain-based,
decentralized data marketplaces
• owners have control over their data and can stay
anonymous
• standardisation of data
• people can be incentivized to share the data
• traceability (especially interesting for
pharmaceutical companies)
https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha
Slide 27
[Diagram: a data marketplace connecting data consumers and data owners; tokens are exchanged for access to data]
Blockchain-based solutions for healthcare data
• Nebula (by George Church)
• Longenesis
• Luna DNA
• phrOS (Personal Health Record Operating System)
• EncrypGen
https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
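The token-for-access exchange of such a data marketplace can be sketched as a toy in-memory ledger. This is only an illustration under strong assumptions: `ToyMarketplace` and all account names are hypothetical, no real blockchain is involved, and it does not describe how any of the listed projects work.

```python
class ToyMarketplace:
    """In-memory stand-in for a ledger; no real blockchain involved."""

    def __init__(self):
        self.balances = {}   # account -> token balance
        self.grants = set()  # (consumer, owner) pairs with access
        self.log = []        # append-only record of transactions

    def deposit(self, account, tokens):
        self.balances[account] = self.balances.get(account, 0) + tokens

    def buy_access(self, consumer, owner, price):
        """Move tokens from consumer to owner and record the access grant."""
        if self.balances.get(consumer, 0) < price:
            raise ValueError("insufficient tokens")
        self.balances[consumer] -= price
        self.balances[owner] = self.balances.get(owner, 0) + price
        self.grants.add((consumer, owner))
        self.log.append((consumer, owner, price))

    def has_access(self, consumer, owner):
        return (consumer, owner) in self.grants


market = ToyMarketplace()
market.deposit("consumer_1", 10)
# The owner appears only under a pseudonym.
market.buy_access("consumer_1", "owner_pseudonym_42", 3)
print(market.balances, market.log)
```

The append-only log stands in for the traceability that a ledger is supposed to provide: every purchase of access leaves a record.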
Slide 32
Will these data marketplaces improve science?
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 33
This has huge potential!
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 34
... but ...
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 35
Currently a lot of white papers are available –
nothing openly testable.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 36
Discussion unfortunately driven
by companies, not academics.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 37
High risk –
you won’t get your genome back once it has leaked.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 38
https://doi.org/10.1126/science.1229566
Slide 39
The suggested systems have very high complexity.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 40
Problem of different legal systems.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 41
Implications for data owners/sellers might not be clear
– education needed.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 42
Data stored off-chain
=
outsourcing of one important problem
(suggestions such as Dropbox have been mentioned –
IMO quite a bad idea)
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
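Why off-chain storage outsources the problem can be seen in a small sketch, assuming the common pattern that only a hash of the data is recorded on the ledger. The dictionaries `on_chain` and `off_chain` and the record name are hypothetical stand-ins, not taken from any of the white papers.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()


on_chain = {}   # stand-in for a ledger: record id -> hash only
off_chain = {}  # stand-in for external storage holding the actual data


def store(record_id, data):
    off_chain[record_id] = data
    on_chain[record_id] = fingerprint(data)


def verify(record_id):
    """Check that the off-chain data still matches the recorded hash."""
    return fingerprint(off_chain[record_id]) == on_chain[record_id]


store("genome_1", b"ACGT" * 4)
ok_before = verify("genome_1")

off_chain["genome_1"] = b"ACGA" * 4  # data modified in external storage
ok_after = verify("genome_1")
print(ok_before, ok_after)
```

The hash lets anyone detect that the stored data was changed, but it does nothing to keep the data confidential or available. Those guarantees depend entirely on the external storage provider.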
Slide 43
How to prevent people from making false statements in surveys
to become more interesting for data consumers?
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 44
Bottom line:
Very promising,
but a long and hard way to go.
https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
Slide 45
What are your questions?
konrad.foerstner.org / @konradfoerstner
zbmed.de / @ZB_MED
th-koeln.de / @th_koeln
https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone