New deals on data - Generating open knowledge based on closed data

Talk held at the "Blockchain for Science 2018" (Berlin).

Konrad Förstner

November 05, 2018

Transcript

  1. New deals on data – Generating open knowledge based on

    closed data Konrad U. Förstner ZB MED – Information Center for Life Sciences, Cologne, Germany & TH Köln, Cologne, Germany November 5th, 2018, Blockchain for Science Con
  2. Disclaimer I have no connection to any of the

    companies that will be mentioned here. I present my perspective as a bioinformatician and open science enthusiast. https://www.flickr.com/photos/redjar/113823307/ – CC-BY by flickr user redjar
  3. Open [data|source|*] should be the default in science. This is

    simply good scientific practice. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  4. ... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  5. There are cases where privacy might be a higher good

    than openness. Certain data should not be linked to individuals. https://commons.wikimedia.org/wiki/File:Masks_in_Venice.jpg CC-BY by Wikipedia user Rasevic
  6. Behavioral data https://www.flickr.com/photos/andrikoolme/27729630943 – CC-BY by flickr user andrikoolme

  7. Socioeconomic status https://commons.wikimedia.org/wiki/File:Assorted_United_States_coins.jpg

  8. Medical records https://de.wikipedia.org/wiki/Datei:Blood_pressure_measurement.jpg

  9. Genome / Exome / SNPs https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  10. (Slide without text)
  11. Microbiome information https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  12. https://de.wikipedia.org/wiki/Datei:DNA_human_male_chromosomes.gif

  13. Having access to such data from a large population

    would significantly help research and extend our medical knowledge. https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
  14. On the other hand the data can be misused for

    systematic discrimination due to political, ideological and commercial interests. https://www.flickr.com/photos/22394551@N03/2226095398 CC-BY by flickr user viZZZual.com
  15. We have a moral dilemma. Protect individual rights or push

    scientific progress. https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
  16. Similar dilemmata from other research domains • Financial data of

    organisations • Energy consumption recording of devices • Location data of vehicles https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia Commons user Fæ
  17. How can this dilemma be solved? https://commons.wikimedia.org/wiki/File:Apothecary%27s_balance_with... CC-BY by Wikimedia

    Commons user Fæ
  18. Can we use closed data to generate (open) knowledge? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg

    PD
  19. Can we do research on black-boxed data that is

    at least reproducible? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  20. Can we train machine learning models on closed data? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg

    PD
  21. Or can we at least use the data to generate

    hypotheses that can then be tested with complementary methods? https://commons.wikimedia.org/wiki/File:Eiserne_Truhe_Museum_Senftenberg.jpg PD
  22. Genomics England • Aims to hold 100,000 full genomes •

    Data processing in closed data centers • Only results leave the center via an ”airlock” https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
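The "airlock" idea on slide 22 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not a description of how Genomics England actually works: the record layout, the function name `airlock_release`, and the minimum group size of 5 are all hypothetical. The point is only that analysis runs next to the data and that nothing but a checked aggregate is released.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # hypothetical disclosure threshold, not taken from the talk


def airlock_release(records, variant, min_group_size=MIN_GROUP_SIZE):
    """Return an aggregate for carriers of `variant`, or None if the group is too small."""
    carriers = [r for r in records if variant in r["variants"]]
    if len(carriers) < min_group_size:
        return None  # too few people: the result stays inside the data center
    return {
        "variant": variant,
        "n_carriers": len(carriers),
        "mean_systolic_bp": round(mean(r["systolic_bp"] for r in carriers), 1),
    }


# Toy records that never leave the (simulated) closed data center.
records = [
    {"id": i, "variants": {"rsA"} if i % 2 == 0 else {"rsB"}, "systolic_bp": 120 + i}
    for i in range(12)
]
records[1]["variants"] = {"rsC"}  # a single carrier of a rare variant

common = airlock_release(records, "rsA")  # large enough group: aggregate is released
rare = airlock_release(records, "rsC")    # single carrier: nothing is released
print(common)
print(rare)
```

The released dictionary contains no identifiers and no per-person values, and the query on the rare variant is blocked because it would describe a single individual.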
  23. Personal Health Train (PHT) • Data stations – (”FAIRports”) •

    Trains – Workflows that can work on the data provided to them https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg CC-BY by Wikimedia Commons Ibirapuera
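The station/train split on slide 23 can be sketched as follows. This is a toy model under stated assumptions, not the PHT reference implementation: the class `Station`, the workflow `count_and_sum`, and the helper `dispatch` are hypothetical names. Each station runs the visiting workflow on its own data and hands back only the workflow's output.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """A data station holding private values that never leave it (hypothetical model)."""
    name: str
    _values: list = field(repr=False)

    def run(self, train):
        # The station executes the visiting workflow locally and returns only its output.
        return train(self._values)


def count_and_sum(values):
    """The 'train': a workflow that returns aggregates only."""
    return {"n": len(values), "total": sum(values)}


def dispatch(train, stations):
    """Send the train to every station and combine the partial results."""
    partials = [s.run(train) for s in stations]
    n = sum(p["n"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"n": n, "mean": total / n}


stations = [
    Station("hospital_a", [5.1, 6.0, 5.5]),
    Station("hospital_b", [7.2, 6.8]),
    Station("hospital_c", [5.9]),
]
result = dispatch(count_and_sum, stations)
print(result)
```

In this sketch the researcher sees a pooled mean over six measurements without ever receiving the raw values. A real station would additionally have to vet what a train is allowed to return.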
  24. • Locked system • Trust of the platform required https://de.wikipedia.org/wiki/Datei:Crowd_at_Knebworth_House_-_Rolling_Stones_1976.jpg

    CC-BY by Wikimedia Commons Ibirapuera
  25. (This slide was modified for online deposition - simply click

    on the link below; it is a news article that describes how 23andMe and others are selling genomic data to the pharma industry.) https://www.businessinsider.de/dna-testing-delete-your-data-23andme-ancestry-2018-7
  26. Promises of blockchain-based, decentralized data marketplaces • owners have control

    over their data and can stay anonymous • standardisation of data • people can be incentivized to share the data • traceability (especially interesting for pharmaceutical companies) https://www.flickr.com/photos/katerha/4592429363 – CC-BY by flickr user katerha
  27. (Diagram) Data market place • Data consumer • Access to data • Token •

    Data owners
  28. Concepts of underlying solutions • Fully Homomorphic Encryption (FHE) •

    Multi-party Computation (MPC) • Trusted Execution Environment (TEE) like Intel SGX https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
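Of the three concepts on slide 28, multi-party computation is the easiest to show in a few lines. The sketch below uses additive secret sharing to compute a sum, which is one simple MPC building block and is swapped in here for illustration. The talk does not say which scheme any of the listed projects uses, and the modulus, the function names, and the toy inputs are assumptions.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an illustrative choice


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Combine shares (or partial sums of shares) back into a value."""
    return sum(shares) % PRIME


# Three data owners each hold a private count.
private_inputs = [12, 7, 30]
n = len(private_inputs)

# Owner i splits its input and sends the j-th share to party j.
distributed = [share(v, n) for v in private_inputs]

# Each party adds up the shares it received, one from every owner.
partial_sums = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]

# Only the partial sums are published; together they reveal the total and nothing else.
total = reconstruct(partial_sums)
print(total)  # 49
```

Each individual share is a uniformly random number, so no single party learns another owner's input. FHE and TEEs aim at the same goal by different means (computing on ciphertexts and computing inside isolated hardware, respectively) and are not shown here.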
  29. (Diagram) Blockchain • Data consumer • Secure compute node • Private data storage •

    Data owner
  30. General purpose blockchain-based solutions • Ocean protocol • Enigma (secret

    contracts) • Ekiden protocol (Oasis Labs) • OpenMind https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
  31. Blockchain-based solutions for healthcare data • Nebula (by George Church)

    • Longenesis • Luna DNA • phrOS (Personal Health Record Operating System) • EncrypGen https://unsplash.com/@toddquackenbush?photo=IClZBVw5W5A - PD
  32. Will these data market places improve science? https://www.flickr.com/photos/subcircle/500995147 – CC-BY

    by flickr user subcircle
  33. This has huge potential! https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user

    subcircle
  34. ... but ... https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle

  35. Currently a lot of white papers are available – nothing openly testable.

    https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  36. Discussion unfortunately driven by companies, not academics. https://www.flickr.com/photos/subcircle/500995147 – CC-BY

    by flickr user subcircle
  37. High risk – you won’t get your genome back once

    it has leaked. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  38. https://doi.org/10.1126/science.1229566

  39. The suggested systems have very high complexity. https://www.flickr.com/photos/subcircle/500995147 – CC-BY

    by flickr user subcircle
  40. Problem of different legal systems. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr

    user subcircle
  41. Implications for data owner/seller might not be clear – education

    needed. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  42. Data stored off-chain = outsourcing of one important problem (suggestions

    like Dropbox were mentioned – IMO quite a bad idea) https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  43. How to prevent people from making false statements in surveys just to become interesting

    for data consumers? https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  44. Bottom line: Very promising, but a long and hard way

    to go. https://www.flickr.com/photos/subcircle/500995147 – CC-BY by flickr user subcircle
  45. What are your questions? konrad.foerstner.org / @konradfoerstner zbmed.de / @ZB_MED

    th-koeln.de / @th_koeln https://www.flickr.com/photos/nateone/3768979925/ – CC-BY by flickr user nateone