Opening up Research and Data Day 2 - FORCE11 Scholarly Communication Institute (FSCI)

The FORCE11 Scholarly Communication Institute (FSCI) at the University of California, San Diego is a week-long summer training course on improving research and communication.

Gaurav Godhwani

August 01, 2017

Transcript

  1. Opening up Research and Data

    FORCE11 FSCI | University of California, San Diego
    Slides Link: http://tiny.cc/fsci-mt6-2
    Gaurav Godhwani | Handle: @gggodhwani
    Technical Lead - Open Budgets India - CBGA | Chapter Lead - DataKind Bangalore
  2. Session 3: Discuss Key Aspects related to Open Data

  3. Session Outline

    - Open Data Licences
    - Indexing, Searching and Reusing Open Data
    - Open Data Ethics and Privacy
    - Open Data Visualization
  4. Open Data Licences

  5. Open Data Licences

    An open licence allows users to do things like:
    • Republish the content or data on their own website
    • Derive new content or data from yours
    • Make money by selling products that use your content or data
    • Republish the content or data while charging a fee for access
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
  6. Open Data Licences

    According to the Open Definition, there are only two kinds of restrictions that an open licence can place:
    • that reusers must give attribution to the source of the content or data
    • that reusers must publish any derived content or data under the same licence (this is called share-alike)
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
  7. Open Data Licences

    We can choose to make data open under one of three levels of licence:
    1. a public domain licence has no restrictions at all (technically, these indicate that you waive your rights to the content or data)
    2. an attribution licence just says that reusers must give attribution to you
    3. an attribution & share-alike licence says that reusers must give attribution and share any derived content or data under the same licence
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
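The three licence levels can be read as a small lookup from level to reuser obligations. The following is a minimal sketch under that reading; the names `LicenceLevel` and `reuser_obligations` are hypothetical and not part of any licensing tool.

```python
from enum import Enum


class LicenceLevel(Enum):
    """The three licence levels described on the slide."""
    PUBLIC_DOMAIN = "public domain"
    ATTRIBUTION = "attribution"
    ATTRIBUTION_SHARE_ALIKE = "attribution & share-alike"


def reuser_obligations(level):
    """Return the obligations a reuser takes on under the given level."""
    obligations = []
    if level in (LicenceLevel.ATTRIBUTION, LicenceLevel.ATTRIBUTION_SHARE_ALIKE):
        obligations.append("give attribution to the source")
    if level is LicenceLevel.ATTRIBUTION_SHARE_ALIKE:
        obligations.append("publish derived content or data under the same licence")
    return obligations


# Print the obligations for each level; an empty list means no restrictions.
for level in LicenceLevel:
    print(level.value, "->", reuser_obligations(level) or "no restrictions")
```

Each level only adds to the obligations of the one before it, which is why the function builds the list cumulatively.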
  8. Image Source: https://creativecommons.org/about/downloads/

  9. Creative Commons Data Licences

    CC0 enables users to freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.
    Image Source: https://en.wikipedia.org/wiki/File:CC0_button.svg
  10. Creative Commons Data Licences

    Europeana puts more than 20 million records into the public domain using CC0. The Europeana dataset consists of descriptive information from a huge trove of digitized cultural and artistic works.
    Source: https://creativecommons.org/2012/09/12/europeana-releases-20-million-records-into-the-public-domain-using-cc0/
  11. Creative Commons Data Licences

    Figshare has adopted CC0 as the default tool for researchers to share their datasets. It is a way to remove any legal doubt about whether researchers can use the data in their projects.
    Image Source: https://www.digital-science.com/products/figshare/
  12. Creative Commons Data Licences

    You are free to:
    • Share: copy and redistribute the material in any medium or format
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
  13. Creative Commons Data Licences

    Under the following terms:
    • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
  14. Creative Commons Data Licences

    Under the following terms:
    • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
  15. Creative Commons Data Licences

    Image Source: ABS https://www.mobileiron.com/sites/default/files/customers/lg-svg/ABS.png
    Image Source: Data.gov.au https://pbs.twimg.com/profile_images/587786515750621184/dHscsgY0.jpg
    Image Source: Geoscience Australia https://sentinel.ga.gov.au/img/geoscience_inline.png
  16. Creative Commons Data Licences

    You are free to:
    • Share: copy and redistribute the material in any medium or format
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  17. Creative Commons Data Licences

    Under the following terms:
    • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  18. Creative Commons Data Licences

    Under the following terms:
    • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  19. Creative Commons Data Licences

    Under the following terms:
    • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  20. Creative Commons Data Licences

    DBpedia is a Public Data Infrastructure for a Large, Multilingual, Semantic Knowledge Graph. The DBpedia 2016-10 release consists of 13 billion pieces of information (RDF triples), all under CC-BY-SA 3.0.
    Image Source: http://wiki.dbpedia.org/
  21. Open Data Commons

  22. ODC Public Domain Dedication and License (PDDL) Image Source: https://opendatacommons.org/licenses/pddl/

  23. Open Data Commons Attribution License Image Source: https://opendatacommons.org/licenses/by/summary/

  24. Open Data Commons Open Database License (ODbL) Image Source: https://opendatacommons.org/licenses/odbl/

  25. Open Data Commons Open Database License (ODbL) Image Source: https://commons.wikimedia.org/wiki/File:Openstreetmap_logo.svg

  26. Creative Commons Or Open Data Commons?

    • What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?
    • Why Not Use a Creative Commons (or Free/Open Source Software License) for Data(bases)?
  27. Indexing, Searching and Reusing Open Data

  28. Tim Berners-Lee’s 5-Stars of Open Data Image Source: http://ec.europa.eu/newsroom/itemdetail.cfm?item_id=27191&newsletter=126

  29. https://index.okfn.org/

  30. http://opendatabarometer.org

  31. https://www.opendatanetwork.com/

  32. http://docs.socratadiscovery.apiary.io

  33. http://dataportals.org/

  34. http://openprism.thomaslevine.com/

  35. But can we make our Open Data Searchable on Google?

  36. But can we make our Open Data searchable on Google?

    • Ensure your platform has a detailed sitemap
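As a minimal sketch of what a detailed sitemap could look like for a data portal, the snippet below builds a sitemaps.org-style XML document with one entry per dataset page. The `build_sitemap` helper and the `example.org` URLs are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(dataset_pages):
    """Return a sitemap XML string with one <url> entry per dataset page.

    dataset_pages is a list of (url, last_modified) pairs, where
    last_modified is a date string such as "2017-08-01".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in dataset_pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical dataset pages on a data portal.
pages = [
    ("https://example.org/dataset/union-budget", "2017-08-01"),
    ("https://example.org/dataset/state-budgets", "2017-07-15"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Listing every dataset page, not just the home page, is what makes the sitemap "detailed" in the sense of the slide.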
  37. But can we make our Open Data searchable on Google?

    • Have SEO-optimized metadata
    Image Source: https://adwords.googleblog.com/2013/05/introducing-keyword-planner-combining.html
  38. But can we make our Open Data searchable on Google?

    • Set up a Google Webmaster Console
    Image Source: https://webmasters.googleblog.com/2012/05/navigation-dashboard-and-home-page.html
  39. But can we make our Open Data searchable on Google?

    • Measure and iterate
    Image Source: http://mediashift.org/2017/07/social-and-digital-certifications-what-they-are-and-why-educators-should-get-them/
  40. But can we make our Open Data searchable on Google?

    • Set up alerts for your key datasets and categories
  41. Open Data Ethics and Privacy

  42. Defining Risks

    “The probability of something happening multiplied by the resulting cost or benefit if it does” (Oxford English Dictionary)
    Three parts:
    • Cost/benefit
    • Probability
    • Subject
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
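The definition is a simple product, so it can be worked through with numbers. The sketch below is illustrative only: the `Risk` class is hypothetical, and the probabilities and costs are made-up values in arbitrary units, not figures from the talk.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    subject: str        # who bears the risk
    probability: float  # chance of the event happening, between 0 and 1
    cost: float         # harm if it happens, in arbitrary illustrative units

    def score(self):
        """Probability multiplied by the resulting cost."""
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        return self.probability * self.cost


# Made-up numbers, purely to show the arithmetic.
risks = [
    Risk("data collectors", probability=0.25, cost=40),
    Risk("data subjects", probability=0.5, cost=80),
]

# Rank the risks from highest to lowest score.
ranked = sorted(risks, key=lambda r: r.score(), reverse=True)
for risk in ranked:
    print(risk.subject, risk.score())
```

Keeping the subject alongside the two numbers mirrors the three parts listed on the slide: a score only means something once it is tied to whoever bears the harm.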
  43. Risk of What?

    • Physical harm
    • Legal harm (e.g. jail, IP disputes)
    • Reputational harm
    • Privacy breach
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  44. Risk to Whom?

    • Data subjects (elections example)
    • Data collectors (conflict example)
    • Data processing team (military equipment example)
    • Person releasing the data (corruption example)
    • Person using the data
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  45. Personally Identifiable Information

    “Personally identifiable information (PII) is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII.”
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  46. Spotting Red Flags

    • Names, addresses, phone numbers
    • Locations: lat/long, GIS traces, locality (e.g. home + work as an identifier)
    • Members of small populations
    • Untranslated text
    • Codes (e.g. “41”)
    • Slang terms
    • Can be combined with other datasets to produce PII
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
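Some of these red flags can be caught with a first-pass scan of column names before a dataset is released. The following is a rough sketch, assuming column names are in English; the patterns and the `flag_columns` helper are hypothetical and far from exhaustive.

```python
import re

# Hypothetical patterns for the most mechanical red flags on the slide.
RED_FLAG_PATTERNS = {
    "name": re.compile(r"name", re.IGNORECASE),
    "address": re.compile(r"addr|street", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "location": re.compile(r"^(lat|lon|lng)|latitude|longitude", re.IGNORECASE),
}


def flag_columns(columns):
    """Map each suspicious column name to the red-flag labels it matched."""
    flags = {}
    for column in columns:
        hits = [
            label
            for label, pattern in RED_FLAG_PATTERNS.items()
            if pattern.search(column)
        ]
        if hits:
            flags[column] = hits
    return flags


columns = ["respondent_name", "phone_number", "lat", "long", "district", "answer_code"]
flagged = flag_columns(columns)
for column, labels in flagged.items():
    print(column, labels)
```

A scan like this cannot detect small populations, slang, untranslated text, or re-identification by combining datasets; those still need human review.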
  47. Consider Partial Release to only Some Groups

    • Academics
    • People in your organisation
    • Data subjects
    • Release at lower granularity
    • Town/district level, not street
    • Subset or sample of data ‘rows’
    • Subset of data ‘columns’
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
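The granularity options (coarser locations, a sample of rows, a subset of columns) can be combined in one preprocessing step. This is a minimal sketch under assumed field names (`lat`, `lon`); the `partial_release` function, its parameters, and the sample records are hypothetical.

```python
import random


def partial_release(rows, keep_columns, decimals=1, sample_fraction=0.5, seed=0):
    """Return a reduced copy of rows for release.

    - keeps only keep_columns (subset of columns)
    - keeps a random sample of the rows (subset of rows)
    - rounds 'lat'/'lon' so locations are coarser than street level
      (one decimal place is roughly 11 km of latitude)
    """
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_size = max(1, int(len(rows) * sample_fraction))
    released = []
    for row in rng.sample(rows, sample_size):
        out = {column: row[column] for column in keep_columns}
        for coord in ("lat", "lon"):
            if coord in out:
                out[coord] = round(out[coord], decimals)
        released.append(out)
    return released


# Made-up records for illustration only.
records = [
    {"name": "A", "lat": 12.9716, "lon": 77.5946, "answer": "yes"},
    {"name": "B", "lat": 28.6139, "lon": 77.2090, "answer": "no"},
    {"name": "C", "lat": 19.0760, "lon": 72.8777, "answer": "yes"},
    {"name": "D", "lat": 13.0827, "lon": 80.2707, "answer": "no"},
]
released = partial_release(records, keep_columns=["lat", "lon", "answer"])
for row in released:
    print(row)
```

Rounding coordinates is only one way to coarsen location; replacing them with a town or district name would match the slide more directly but needs a gazetteer lookup, which is out of scope for this sketch.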
  48. Include locals

    Locals can spot:
    • Local languages
    • Local slang
    • Innocent-looking phrases
    Locals might also choose the risk
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  49. Open Brain Consent

  50. De-identifying Data https://responsibledata.io/forums/discussion-mini-series-on-de-identifying-data/

  51. Open Data Visualizations

  52. Image Source: https://github.com/apache/incubator-superset

  53. Image Source: https://eng.uber.com/deck-gl-4-0/

  54. Image Source: http://uber.github.io/deck.gl/blog/2017/rendering-minecraft-with-deckgl

  55. Session 4: Open Discussions & Use Cases

  56. http://opendata.atlas.cern/

  57. URL: https://openbudgetsindia.org Slides URL: http://tiny.cc/obi-fifthel Code: https://github.com/cbgaindia Talk: https://youtu.be/tX7-Ega6OxA

  58. Thanks Everyone