New deals on data - Generating open knowledge based on closed data

Talk held at the "Blockchain for Science 2018" conference (Berlin).

Konrad Förstner

November 05, 2018

Transcript

  1. New deals on data – Generating open knowledge based on closed data. Konrad U. Förstner, ZB MED – Information Center for Life Sciences, Cologne, Germany & TH Köln, Cologne, Germany. November 5th, 2018, Blockchain for Science Con
  2. Disclaimer: I have no connection to any of the companies that will be mentioned here. I present my perspective as a bioinformatician and open science enthusiast. https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar
  3. Open [data|source|*] should be the default in science. This is simply good scientific practice. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  4. ... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  5. There are cases where privacy might be a higher good than openness. Certain data should not be linked to individuals. https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic
  6. Behavioral data https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme

  7. Socio-economic status https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg

  8. Medical records https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg

  9. Genome / Exome / SNPs https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  10. None
  11. Microbiome information https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  12. https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  13. Having access to such data from a large population would significantly help research and extend our medical knowledge. https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
  14. On the other hand, the data can be misused for systematic discrimination driven by political, ideological and commercial interests. https://www.flickr.com/photos/22394551@N03/2226095398 CC-BY by flickr user viZZZual.com
  15. We have a moral dilemma: protect individual rights or push scientific progress. https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
  16. Similar dilemmata from other research domains • Financial data of organisations • Energy consumption recording of devices • Location data of vehicles https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
  17. How can this dilemma be solved? https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
  18. Can we use closed data to generate (open) knowledge? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  19. Can we do research on black-boxed data that is at least reproducible? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  20. Can we train machine learning models on closed data? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  21. Or can we at least use the data to generate hypotheses that can then be tested with complementary methods? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  22. Genomics England • Aims to hold 100,000 full genomes • Data processing in closed data centers • Only results leave the center via an "airlock" https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
  23. Personal Health Train (PHT) • Data stations ("FAIRports") • Trains: workflows that can work on the data provided to them https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
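The station-and-train idea on slide 23 can be illustrated with a minimal Python sketch. It is not the Personal Health Train implementation: `DataStation`, `blood_pressure_train` and `pooled_mean` are hypothetical names, and the records are made up. The only point shown is that the workflow travels to the data and just a summary travels back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]
Train = Callable[[List[Record]], Dict[str, float]]


@dataclass
class DataStation:
    """Hypothetical data station; the raw records stay inside this object."""
    name: str
    _records: List[Record]

    def run(self, train: Train) -> Dict[str, float]:
        # The train executes where the data lives; only its summary is returned.
        return train(self._records)


def blood_pressure_train(records: List[Record]) -> Dict[str, float]:
    """Hypothetical train: reports a count and a sum, never single records."""
    values = [r["systolic"] for r in records]
    return {"n": float(len(values)), "total": float(sum(values))}


def pooled_mean(stations: List[DataStation], train: Train) -> float:
    """Send the same train to every station and combine the summaries."""
    summaries = [station.run(train) for station in stations]
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / n


# Made-up example data for two stations.
stations = [
    DataStation("hospital_a", [{"systolic": 120.0}, {"systolic": 140.0}]),
    DataStation("hospital_b", [{"systolic": 110.0}]),
]
mean_systolic = pooled_mean(stations, blood_pressure_train)
```

A real station would also have to check that a train's output is not disclosive (for example, summaries over very few people); this sketch leaves that out.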
  24. • Locked system • Trust in the platform required https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
  25. (This slide was modified for online deposition. Simply click on the link below; it is a news article that describes how 23andMe and others are selling genomic data to the pharmaceutical industry.) https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7
  26. Promises of blockchain-based, decentralized data marketplaces • Owners have control over their data and can stay anonymous • Standardisation of data • People can be incentivized to share their data • Traceability (especially interesting for pharmaceutical companies) https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha
  27. [Diagram labels: Data marketplace • Data consumer • Access to data • Token • Data owners]
  28. Concepts of underlying solutions • Fully Homomorphic Encryption (FHE) • Multi-party Computation (MPC) • Trusted Execution Environment (TEE) like Intel SGX https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
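To give a feel for the multi-party computation idea on slide 28, here is a minimal Python sketch of additive secret sharing, one common building block of MPC. It is an illustration under assumptions, not the scheme used by any project named in this talk: the modulus `PRIME`, the helper names `share` and `reconstruct`, and the input values are all chosen for the example.

```python
import secrets

# Modulus for all arithmetic; 2**61 - 1 is a Mersenne prime (assumed choice).
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list:
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list) -> int:
    """Recombine shares; fewer than all of them reveal nothing useful."""
    return sum(shares) % PRIME


# Three data owners each hold one private number (made-up values).
inputs = [3, 5, 11]
n_parties = 3

# Each owner sends one share to every compute party.
all_shares = [share(v, n_parties) for v in inputs]

# Party j adds up the j-th share of every owner, seeing only random-looking numbers.
partial_sums = [
    sum(owner_shares[j] for owner_shares in all_shares) % PRIME
    for j in range(n_parties)
]

# Only the combined result (the sum of all inputs) is revealed.
total = reconstruct(partial_sums)
```

The parties learn the sum of the inputs but not the individual values, provided they do not all collude. FHE and TEEs aim at the same goal by different means and are not shown here.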
  29. [Diagram labels: Blockchain • Data consumer • Secure compute node • Private data storage • Data owner]
  30. General-purpose blockchain-based solutions • Ocean Protocol • Enigma (secret contracts) • Ekiden protocol (Oasis Labs) • OpenMind https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
  31. Blockchain-based solutions for healthcare data • Nebula (by George Church) • Longenesis • Luna DNA • phrOS (Personal Health Record Operating System) • EncrypGen https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
  32. Will these data marketplaces improve science? https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  33. This has huge potential! https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  34. ... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  35. Currently a lot of white papers are available, but nothing is openly testable. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  36. The discussion is unfortunately driven by companies, not academics. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  37. High risk – you won’t get your genome back once it has leaked. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  38. https://doi.org/10.1126/science.1229566

  39. The suggested systems have very high complexity. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  40. Problem of different legal systems. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  41. Implications for the data owner/seller might not be clear – education needed. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  42. Data stored off-chain = outsourcing of one important problem (suggestions like Dropbox were mentioned – IMO quite a bad idea) https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  43. How can we avoid false statements in surveys that are made to appear more interesting to data consumers? https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  44. Bottom line: Very promising, but a long and hard way to go. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  45. What are your questions? konrad.foerstner.org / @konradfoerstner • zbmed.de / @ZB_MED • th-koeln.de / @th_koeln https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone