New deals on data - Generating open knowledge based on closed data

Talk held at the "Blockchain for Science 2018" (Berlin).

Konrad Förstner

November 05, 2018

Transcript

  1. New deals on data –
    Generating open knowledge
    based on closed data
    Konrad U. Förstner
    ZB MED – Information Center for Life Sciences, Cologne, Germany &
    TH Köln, Cologne, Germany
    November 5th, 2018, Blockchain for Science Con

  2. Disclaimer
    I have no connection to any of the companies
    that I will mention here.
    I present my perspective as a bioinformatician
    and open science enthusiast.
    https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar

  3. Open [data|source|*] should be the default in science.
    This is simply good scientific practice.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  4. ... but ...
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  5. There are cases where privacy might be
    a higher good than openness.
    Certain data should not be linked to individuals.
    https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic

  6. Behavioral data
    https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme

  7. Socio-economic status
    https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg

  8. Medical records
    https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg

  9. Genome / Exome / SNPs
    https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  10. Microbiome information
    https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  11. https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  12. Having access to such data from a large population
    would significantly help research and
    extend our medical knowledge.
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons user Ibirapuera

  13. On the other hand, the data can be misused for
    systematic discrimination due to
    political, ideological and commercial interests.
    https://www.flickr.com/photos/22394551@N03/2226095398 CC-BY by flickr user viZZZual.com

  14. We have a moral dilemma.
    Protect individual rights
    or
    push scientific progress.
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  15. Similar dilemmata from other research domains
    • Financial data of organisations
    • Energy consumption records of devices
    • Location data of vehicles
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  16. How can this dilemma be solved?
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  17. Can we use closed data to generate (open) knowledge?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  18. Can we do research based on black-boxed data
    that is at least reproducible?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  19. Can we train machine learning models on closed data?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  20. Or can we at least use the data to generate hypotheses
    that can then be tested with complementary methods?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  21. Genomics England
    • Aims to hold 100,000 full genomes
    • Data processing in closed data centers
    • Only results leave the center via an “airlock”
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons user Ibirapuera
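The airlock idea (computation stays inside, only results leave) can be sketched as follows. This is a minimal illustration under assumed rules: the minimum group size `MIN_CELL_SIZE`, the record format, and the function names are hypothetical and are not Genomics England's actual release policy.

```python
from collections import Counter

# Hypothetical release rule (an assumption): every reported group must contain
# at least this many people, so that no result points to a few individuals.
MIN_CELL_SIZE = 5


def run_inside_center(records, variant):
    """Runs inside the closed data center: count carriers of `variant` per diagnosis."""
    return dict(Counter(r["diagnosis"] for r in records if variant in r["variants"]))


def airlock(result):
    """Let an aggregate result leave the center only if no group is too small."""
    if any(count < MIN_CELL_SIZE for count in result.values()):
        raise PermissionError("result contains groups below the minimum cell size")
    return result


# Usage with made-up records: 6 carriers with diagnosis "A", 2 with diagnosis "B".
records = ([{"diagnosis": "A", "variants": {"rs123"}}] * 6
           + [{"diagnosis": "B", "variants": {"rs123"}}] * 2)
counts = run_inside_center(records, "rs123")  # {"A": 6, "B": 2}
```

Here `airlock(counts)` would be refused because group "B" has only two members, while `airlock({"A": 6})` would pass. The raw records never leave `run_inside_center`; only the counts reach the airlock.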

  22. Personal Health Train (PHT)
    • Data stations (“FAIRports”)
    • Trains: workflows that operate on the data provided to them
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons user Ibirapuera
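The station/train split can be illustrated with a small sketch. It assumes that a "train" is simply a function that a station runs on its local records and that only the function's return value travels back; the class and function names, the record format, and the mean-age example are hypothetical, not part of any PHT specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# In this sketch a "train" is a plain function from local records to a small summary.
Train = Callable[[List[dict]], Dict[str, float]]


@dataclass
class DataStation:
    """Hypothetical data station ("FAIRport") that keeps its records local."""
    name: str
    records: List[dict]  # stays at the station; callers only see what a train returns

    def run(self, train: Train) -> Dict[str, float]:
        # The workflow executes next to the data; only its return value leaves the station.
        return train(self.records)


def mean_age_train(records: List[dict]) -> Dict[str, float]:
    """Computes a local partial result (sum and count of ages), not a copy of the data."""
    ages = [r["age"] for r in records]
    return {"sum": sum(ages), "n": len(ages)}


def ride(train: Train, stations: List[DataStation]) -> float:
    """Send the train to every station and combine the partial results into a global mean."""
    partials = [station.run(train) for station in stations]
    return sum(p["sum"] for p in partials) / sum(p["n"] for p in partials)
```

For example, two stations holding ages [30, 50] and [40] yield a global mean of 40.0 without either station revealing its individual records. A real station would additionally have to decide which trains it admits, which this sketch does not model.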

  23. • Locked system
    • Trust of the platform required
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons user Ibirapuera

  24. (This slide was modified for online deposition. Simply click on
    the link below; it is a news article that describes how 23andMe
    and others are selling genomic data to the pharma industry.)
    https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7

  25. Promises of blockchain-based,
    decentralized data marketplaces
    • owners have control over their data and can stay anonymous
    • standardisation of data
    • people can be incentivized to share their data
    • traceability (especially interesting for pharmaceutical companies)
    https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha

  26. Data marketplace
    [Diagram: data consumers pay tokens to data owners and in return get access to their data]
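The token-for-access exchange and the traceability promise from slide 25 can be illustrated with a toy hash-chained log. This is only a sketch under strong simplifications: it has no consensus, no distribution, and no real token; the class, method, and field names are hypothetical and do not reflect the design of any of the projects named in the talk. It shows only why chaining entries by hash makes later tampering detectable.

```python
import hashlib
import json


class AccessLedger:
    """Toy append-only log of data-access purchases (not a real blockchain).

    Each entry stores the hash of the previous entry, so editing an old entry
    breaks the chain and is caught by verify().
    """

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body: dict) -> str:
        # Canonical JSON (sorted keys) so the same content always gives the same hash.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record_purchase(self, consumer: str, owner: str, dataset: str, tokens: int) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"consumer": consumer, "owner": owner, "dataset": dataset,
                "tokens": tokens, "prev": prev}
        entry = {**body, "hash": self._hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or self._hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

After two calls to `record_purchase`, `verify()` returns `True`. Changing the `tokens` field of the first entry afterwards makes it return `False`. In the systems discussed, this log would additionally be replicated and agreed upon by many nodes, which is exactly the part this sketch leaves out.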

  27. Concepts of underlying solutions
    • Fully Homomorphic Encryption (FHE)
    • Multi-party Computation (MPC)
    • Trusted Execution Environment (TEE) such as Intel SGX
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
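Of the three concepts, MPC is the easiest to illustrate in a few lines. The sketch below uses additive secret sharing, one basic MPC building block, and assumes honest-but-curious compute parties. The modulus, party count, and function names are illustrative choices and are not taken from any of the projects mentioned in the talk.

```python
import secrets
from typing import List

# Shares live in the integers modulo a large prime (2**61 - 1); the true total
# must stay below this modulus for the result to be exact.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    """Split `value` into n_parties random shares that add up to `value` modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int], n_parties: int = 3) -> int:
    """Sum private values so that no single compute party sees an individual value."""
    # Each data owner splits its value and sends share p to compute party p.
    all_shares = [share(v, n_parties) for v in private_values]
    # Each party only adds up the shares it received; a single share looks uniformly random.
    partial_sums = [sum(s[p] for s in all_shares) % MODULUS for p in range(n_parties)]
    # Combining all partial sums reveals only the total.
    return sum(partial_sums) % MODULUS
```

For instance, `secure_sum([120, 135, 128])` returns 383, although each party only ever handled random-looking numbers. FHE and TEEs pursue the same goal by different means: computing on encrypted data, or inside hardware-isolated enclaves.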

  28. [Diagram with the components: blockchain, data consumer, secure compute node, private data storage, data owner]

  29. General purpose blockchain-based solutions
    • Ocean protocol
    • Enigma (secret contracts)
    • Ekiden protocol (Oasis Labs)
    • OpenMind
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

  30. Blockchain-based solutions for healthcare data
    • Nebula (by George Church)
    • Longenesis
    • Luna DNA
    • phrOS (Personal Health Record Operating System)
    • EncrypGen
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

  31. Will these data marketplaces improve science?
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  32. This has huge potential!
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  33. ... but ...
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  34. Currently there are lots of white papers,
    but nothing is openly testable.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  35. The discussion is unfortunately driven
    by companies, not academics.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  36. High risk –
    you won’t get your genome back once it has leaked.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  37. https://doi.org/10.1126/science.1229566

  38. The suggested systems have very high complexity.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  39. Problem of different legal systems.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  40. Implications for data owners/sellers might not be clear;
    education is needed.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  41. Data stored off-chain
    =
    outsourcing of one important problem
    (suggestions such as Dropbox were mentioned;
    IMO quite a bad idea)
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  42. How can we prevent people from making false statements in surveys
    to appear more interesting to data consumers?
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  43. Bottom line:
    Very promising,
    but a long and hard way to go.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  44. What are your questions?
    konrad.foerstner.org / @konradfoerstner
    zbmed.de / @ZB_MED
    th-koeln.de / @th_koeln
    https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone
