New deals on data - Generating open knowledge based on closed data

Talk held at "Blockchain for Science 2018" (Berlin).

Konrad Förstner

November 05, 2018

Transcript

  1. New deals on data –
    Generating open knowledge
    based on closed data
    Konrad U. Förstner
    ZB MED – Information Center for Life Sciences, Cologne, Germany &
    TH Köln, Cologne, Germany
    November 5th, 2018, Blockchain for Science Con

  2. Disclaimer
    I have no connection to any of the companies
    that will be mentioned here.
    I present my perspective as a bioinformatician
    and open science enthusiast.
    https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar

  3. Open [data|source|*] should be the default in science.
    This is simply good scientific practice.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  4. ... but ...
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  5. There are cases where privacy might be
    a higher good than openness.
    Certain data should not be linked to individuals.
    https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic

  6. Behavioral data
    https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme

  7. Socio-economic status
    https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg

  8. Medical records
    https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg

  9. Genome / Exome / SNPs
    https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  11. Microbiome information
    https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  12. https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  13. Having access to such data from a large population
    would significantly help research and
    extend our medical knowledge.
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

  14. On the other hand the data can be misused for
    systematic discrimination due to
    political, ideological and commercial interests.
    https://www.flickr.com/photos/[email protected]/2226095398 CC-BY by flickr user viZZZual.com

  15. We have a moral dilemma.
    Protect individual rights
    or
    push scientific progress.
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  16. Similar dilemmata from other research domains
    • Financial data of organisations
    • Energy consumption recording of devices
    • Location data of vehicles
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  17. How can this dilemma be solved?
    https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ

  18. Can we use closed data to generate (open) knowledge?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  19. Can we do research on black-boxed data
    that is at least reproducible?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  20. Can we train machine learning models on closed data?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  21. Or can we at least use the data to generate hypotheses
    that can then be tested with complementary methods?
    https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD

  22. Genomics England
    • Aims to hold 100,000 full genomes
    • Data processing in closed data centers
    • Only results leave the center via an "airlock"
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

  23. Personal Health Train (PHT)
    • Data stations ("FAIRports")
    • Trains – Workflows that can work on the data
    provided to them
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
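
The station-and-train idea above can be illustrated with a toy sketch. Everything named here (`DataStation`, `mean_age_train`, `combine`, the hospital names and ages) is hypothetical and not taken from any Personal Health Train implementation. It only shows the shape of the idea: the workflow travels to the data, and only its result leaves the station.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]
Train = Callable[[List[Record]], Dict[str, float]]


@dataclass
class DataStation:
    """Hypothetical data station: keeps its records and runs visiting workflows."""
    name: str
    _records: List[Record]

    def run(self, train: Train) -> Dict[str, float]:
        # The workflow executes inside the station; only its return value leaves.
        return train(self._records)


def mean_age_train(records: List[Record]) -> Dict[str, float]:
    """Hypothetical 'train': returns a count and a sum, never the raw rows."""
    return {"n": float(len(records)), "age_sum": sum(r["age"] for r in records)}


def combine(partials: List[Dict[str, float]]) -> float:
    """Pool the per-station aggregates into one overall mean."""
    n = sum(p["n"] for p in partials)
    total = sum(p["age_sum"] for p in partials)
    return total / n


# Made-up example data for two stations.
stations = [
    DataStation("hospital_a", [{"age": 30.0}, {"age": 50.0}]),
    DataStation("hospital_b", [{"age": 40.0}]),
]
partials = [station.run(mean_age_train) for station in stations]
pooled_mean = combine(partials)
print(pooled_mean)
```

Note that nothing in this sketch stops a malicious workflow from returning raw records. A real station would need to vet workflows and check what leaves, which is where the trust question comes in.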

  24. • Locked system
    • Trust in the platform required
    https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera

  25. (This slide was modified for online deposition. Simply click on
    the link below; it is a news article that describes how 23andMe
    and others are selling genomic data to the pharma industry.)
    https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7

  26. Promises of blockchain-based,
    decentralized data marketplaces
    • owners have control over their data and can stay
    anonymous
    • standardisation of data
    • people can be incentivized to share the data
    • traceability (especially interesting for
    pharmaceutical companies)
    https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha

  27. Data marketplace
    [Diagram labels: data consumer, access to data, token, data owners]

  28. Concepts of underlying solutions
    • Fully Homomorphic Encryption (FHE)
    • Multi-party Computation (MPC)
    • Trusted Execution Environment (TEE) like Intel SGX
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
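
As an illustration of one of these concepts, here is a minimal sketch of additive secret sharing, a basic building block of multi-party computation. It is not taken from the talk or from any of the projects mentioned; the modulus, the function names and the example values are assumptions chosen for illustration. Each data owner splits a private value into random shares, each compute party adds up the shares it receives, and only the total is reconstructed.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus; all arithmetic is done modulo this value


def share(value: int, n_parties: int) -> list:
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list) -> int:
    """Recombine shares; any strict subset of them reveals nothing about the value."""
    return sum(shares) % MODULUS


# Three hypothetical data owners with private values (made-up numbers).
private_values = [12, 30, 7]
n_parties = 3

# Each owner sends one share to each compute party.
all_shares = [share(v, n_parties) for v in private_values]

# Party j adds up the j-th share from every owner, seeing only random-looking numbers.
party_sums = [
    sum(owner_shares[j] for owner_shares in all_shares) % MODULUS
    for j in range(n_parties)
]

# Only the combined result is revealed.
total = reconstruct(party_sums)
print(total)
```

This sketch assumes honest parties that do not collude, and it covers only addition. Practical MPC protocols for multiplication or against malicious parties are considerably more involved.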

  29. [Diagram labels: blockchain, data consumer, secure compute node, private data storage, data owner]

  30. General purpose blockchain-based solutions
    • Ocean protocol
    • Enigma (secret contracts)
    • Ekiden protocol (Oasis Labs)
    • OpenMind
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

  31. Blockchain-based solutions for healthcare data
    • Nebula (by George Church)
    • Longenesis
    • Luna DNA
    • phrOS (Personal Health Record Operating System)
    • EncrypGen
    https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD

  32. Will these data marketplaces improve science?
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  33. This has huge potential!
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  34. ... but ...
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  35. Currently a lot of white papers are available –
    nothing openly testable.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  36. Discussion unfortunately driven
    by companies, not academics.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  37. High risk –
    you won’t get your genome back once it has leaked.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  38. https://doi.org/10.1126/science.1229566

  39. The suggested systems have very high complexity.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  40. Problem of different legal systems.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  41. Implications for data owner/seller might not be clear
    – education needed.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  42. Data stored off-chain
    =
    outsourcing of one important problem
    (suggestions like Dropbox were mentioned –
    IMO quite a bad idea)
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
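
To illustrate why off-chain storage only moves the problem, here is a toy sketch of a common pattern that the slide does not describe: keeping only a hash of the data on the ledger. The two dictionaries `ledger` and `off_chain_store`, the record ID and the data are hypothetical stand-ins, not a real blockchain or storage API. The hash lets anyone detect tampering, but it does nothing to keep the data confidential or available.

```python
import hashlib

ledger = {}           # stand-in for on-chain storage: record ID -> SHA-256 digest
off_chain_store = {}  # stand-in for an external storage provider: record ID -> bytes


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def register(record_id: str, data: bytes) -> None:
    """Store the data off-chain and only its digest on the ledger."""
    off_chain_store[record_id] = data
    ledger[record_id] = digest(data)


def verify(record_id: str) -> bool:
    """Check that the off-chain data still exists and matches the ledger digest."""
    data = off_chain_store.get(record_id)
    return data is not None and digest(data) == ledger[record_id]


register("genome-001", b"ACGT")  # made-up record
ok_before = verify("genome-001")

off_chain_store["genome-001"] = b"ACGA"  # the storage provider alters the data
ok_after_tamper = verify("genome-001")

del off_chain_store["genome-001"]        # the storage provider loses the data
ok_after_loss = verify("genome-001")

print(ok_before, ok_after_tamper, ok_after_loss)
```

In this sketch the ledger can prove that something went wrong, but access control, confidentiality and availability still depend entirely on whoever runs the off-chain store.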

  43. How can we prevent people from making false statements
    in surveys to become more interesting to data consumers?
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  44. Bottom line:
    Very promising,
    but a long and hard way to go.
    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  45. What are your questions?
    konrad.foerstner.org / @konradfoerstner
    zbmed.de / @ZB_MED
    th-koeln.de / @th_koeln
    https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone
