
Opening up Research and Data Day2 - FORCE11 Scholarly Communication Institute (FSCI)

The FORCE11 Scholarly Communication Institute (FSCI) at the University of California, San Diego is a week-long summer training course on improving research and communication.

Gaurav Godhwani

August 01, 2017

Transcript

  1. Opening up Research and Data

    FORCE11 FSCI | University of California, San Diego
    Slides Link: http://tiny.cc/fsci-mt6-2
    Gaurav Godhwani | Handle: @gggodhwani
    Technical Lead - Open Budgets India - CBGA | Chapter Lead - DataKind Bangalore
  2. Session Outline

    - Open Data Licences
    - Indexing, Searching and Reusing Open Data
    - Open Data Ethics and Privacy
    - Open Data Visualization
  3. Open Data Licences

    An open licence allows users to do things like:
    • Republish the content or data on their own website
    • Derive new content or data from yours
    • Make money by selling products that use your content or data
    • Republish the content or data while charging a fee for access
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
  4. Open Data Licences

    According to the Open Definition, there are only two kinds of restrictions that an open licence can place:
    • that reusers must give attribution to the source of the content or data
    • that reusers must publish any derived content or data under the same licence (this is called share-alike)
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
  5. Open Data Licences

    We can choose to make data open under one of three levels of licence:
    1. a public domain licence has no restrictions at all (technically, these indicate that you waive your rights to the content or data)
    2. an attribution licence just says that reusers must give attribution to you
    3. an attribution & share-alike licence says that reusers must give attribution and share any derived content or data under the same licence
    Source: CC-BY-SA Open Data Institute https://theodi.org/guides/publishers-guide-open-data-licensing
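The three levels above, and the two restrictions they combine, can be captured in a small lookup table. A minimal sketch, assuming SPDX-style identifiers for a few Creative Commons and Open Data Commons licences; the identifiers, the `Level` enum and the `obligations` helper are illustrative additions, not part of the slides:

```python
from enum import Enum


class Level(Enum):
    """The three levels of open licence described on the slide."""
    PUBLIC_DOMAIN = 1            # no restrictions at all
    ATTRIBUTION = 2              # reusers must credit the source
    ATTRIBUTION_SHARE_ALIKE = 3  # credit + derived data under the same licence


# Illustrative mapping using SPDX identifiers; not an exhaustive list.
LICENCE_LEVELS = {
    "CC0-1.0": Level.PUBLIC_DOMAIN,
    "PDDL-1.0": Level.PUBLIC_DOMAIN,
    "CC-BY-4.0": Level.ATTRIBUTION,
    "ODC-By-1.0": Level.ATTRIBUTION,
    "CC-BY-SA-4.0": Level.ATTRIBUTION_SHARE_ALIKE,
    "ODbL-1.0": Level.ATTRIBUTION_SHARE_ALIKE,
}


def obligations(licence_id: str) -> dict:
    """Return which of the two permitted restrictions apply to a licence."""
    level = LICENCE_LEVELS[licence_id]  # raises KeyError for unlisted licences
    return {
        "attribution": level is not Level.PUBLIC_DOMAIN,
        "share_alike": level is Level.ATTRIBUTION_SHARE_ALIKE,
    }


print(obligations("ODbL-1.0"))  # {'attribution': True, 'share_alike': True}
```

Keeping the level as data rather than scattering licence names through code makes it easy to add another licence once its level is known.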
  6. Creative Commons Data Licences

    CC0 enables users to freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.
    Image Source: https://en.wikipedia.org/wiki/File:CC0_button.svg
  7. Creative Commons Data Licences

    Europeana puts more than 20 million records into the public domain using CC0. The Europeana dataset consists of descriptive information from a huge trove of digitized cultural and artistic works.
    Source: https://creativecommons.org/2012/09/12/europeana-releases-20-million-records-into-the-public-domain-using-cc0/
  8. Creative Commons Data Licences

    Figshare has adopted CC0 as the default tool for researchers to share their datasets; it is a way to remove any legal doubt about whether researchers can use the data in their projects.
    Image Source: https://www.digital-science.com/products/figshare/
  9. Creative Commons Data Licences (CC BY)

    You are free to:
    • Share: copy and redistribute the material in any medium or format
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
  10. Creative Commons Data Licences (CC BY)

    Under the following terms:
    • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
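The attribution term asks for three things: credit, a link to the licence, and a note of any changes. A minimal sketch of a credit-line builder covering those three, loosely following the Title, Author, Source, Licence pattern from Creative Commons' attribution guidance; the `Source` class, `credit_line` function and example dataset are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    title: str
    author: str
    url: str
    licence_name: str
    licence_url: str


def credit_line(src: Source, changes: Optional[str] = None) -> str:
    """Plain-text credit: who made it, where it came from, a link to the
    licence, and (if given) a note of what was changed."""
    line = (f'"{src.title}" by {src.author} ({src.url}), '
            f"licensed under {src.licence_name} ({src.licence_url})")
    if changes:
        line += f". Changes: {changes}"
    return line + "."


# Hypothetical dataset reused under CC BY 4.0
src = Source("District Budget 2016-17", "Example Org",
             "https://example.org/data/budget",
             "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/")
print(credit_line(src, changes="aggregated to district level"))
```

Generating the credit from stored metadata keeps it consistent across every chart or file derived from the same source.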
  11. Creative Commons Data Licences (CC BY)

    Under the following terms:
    • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY_icon.svg
  12. Creative Commons Data Licences

    Image Source: ABS https://www.mobileiron.com/sites/default/files/customers/lg-svg/ABS.png
    Image Source: data.gov.au https://pbs.twimg.com/profile_images/587786515750621184/dHscsgY0.jpg
    Image Source: Geoscience Australia https://sentinel.ga.gov.au/img/geoscience_inline.png
  13. Creative Commons Data Licences (CC BY-SA)

    You are free to:
    • Share: copy and redistribute the material in any medium or format
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  14. Creative Commons Data Licences (CC BY-SA)

    Under the following terms:
    • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  15. Creative Commons Data Licences (CC BY-SA)

    Under the following terms:
    • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
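ShareAlike matters most when a new dataset is built from several inputs. A deliberately simplified sketch of the rule: if any input is share-alike, the result must use that same licence; otherwise any licence works, with credit where an input requires attribution. The licence groupings and the `licence_for_derived` function are assumptions for illustration; real compatibility between different share-alike licences is not modelled and is reported as an error instead:

```python
from typing import Iterable

# Illustrative SPDX identifiers grouped by the three licence levels.
PUBLIC_DOMAIN = {"CC0-1.0", "PDDL-1.0"}
ATTRIBUTION_ONLY = {"CC-BY-4.0", "ODC-By-1.0"}
SHARE_ALIKE = {"CC-BY-SA-4.0", "ODbL-1.0"}


def licence_for_derived(inputs: Iterable[str]) -> str:
    """Simplified rule for the licence a dataset built from `inputs` may use."""
    inputs = set(inputs)
    unknown = inputs - PUBLIC_DOMAIN - ATTRIBUTION_ONLY - SHARE_ALIKE
    if unknown:
        raise ValueError(f"unknown licence(s): {sorted(unknown)}")
    share_alike = inputs & SHARE_ALIKE
    if len(share_alike) > 1:
        # Different share-alike licences may or may not be compatible;
        # this sketch does not try to decide that.
        raise ValueError(f"several share-alike inputs: {sorted(share_alike)}")
    if share_alike:
        # ShareAlike: contributions go out under the same licence as the original.
        return share_alike.pop()
    if inputs & ATTRIBUTION_ONLY:
        return "any licence, crediting each attribution-licensed source"
    return "any licence"


print(licence_for_derived(["CC0-1.0", "CC-BY-SA-4.0"]))  # CC-BY-SA-4.0
```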
  16. Creative Commons Data Licences (CC BY-SA)

    Under the following terms:
    • No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
    Image Source: https://commons.wikimedia.org/wiki/File:CC-BY-SA_icon.svg
  17. Creative Commons Data Licences

    DBpedia is a Public Data Infrastructure for a Large, Multilingual, Semantic Knowledge Graph. The DBpedia 2016-10 release consists of 13 billion pieces of information (RDF triples), all under CC-BY-SA 3.0.
    Image Source: http://wiki.dbpedia.org/
  18. Creative Commons or Open Data Commons?

    • What is the difference between the Open Data Commons licenses and the CC 4.0 licenses?
    • Why Not Use a Creative Commons (or Free/Open Source Software License) for Data(bases)?
  19. But can we make our Open Data searchable on Google?

    • Ensure your platform has a detailed sitemap
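A sitemap is an XML file listing the pages a crawler should visit. A minimal sketch that renders dataset pages in the sitemaps.org `<urlset>` format with the standard library; the portal URLs and dates are hypothetical:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Render (url, last-modified date) pairs as a sitemaps.org <urlset>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod  # W3C date, e.g. 2017-08-01
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical dataset pages on a hypothetical portal
pages = [
    ("https://data.example.org/dataset/budget-2017", "2017-07-31"),
    ("https://data.example.org/dataset/schools", "2017-06-15"),
]
print(build_sitemap(pages))
```

The protocol caps a single sitemap file at 50,000 URLs, so a large catalogue would split its pages across several files listed in a sitemap index.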
  20. But can we make our Open Data searchable on Google?

    • Have SEO-optimized metadata Image Source: https://adwords.googleblog.com/2013/05/introducing-keyword-planner-combining.html
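One common way to make dataset metadata machine-readable for search engines is to embed schema.org `Dataset` markup as JSON-LD in each dataset page. The slide does not name a format, so this choice is an assumption; the sketch uses only the `name`, `description`, `url`, `license`, `keywords` and `creator` properties, and the `dataset_jsonld` helper and example values are hypothetical:

```python
import json


def dataset_jsonld(name, description, url, licence_url, keywords, publisher):
    """Return a <script> tag with schema.org Dataset metadata for one page."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": licence_url,
        "keywords": keywords,
        "creator": {"@type": "Organization", "name": publisher},
    }
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(dataset_jsonld(
    name="District Budget 2016-17",
    description="Hypothetical example: budget allocations by district.",
    url="https://data.example.org/dataset/budget-2017",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["budget", "public finance", "district"],
    publisher="Example Org",
))
```

The output goes in the page's HTML, so the same metadata that feeds the portal's own search also describes the dataset to crawlers.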
  21. But can we make our Open Data searchable on Google?

    • Set up Google Webmaster Tools (now called Google Search Console) Image Source: https://webmasters.googleblog.com/2012/05/navigation-dashboard-and-home-page.html
  22. But can we make our Open Data searchable on Google?

    • Measure and iterate Image Source: http://mediashift.org/2017/07/social-and-digital-certifications-what-they-are-and-why-educators-should-get-them/
  23. But can we make our Open Data searchable on Google?

    • Set up alerts for your key datasets and categories
  24. Defining Risks

    “The probability of something happening multiplied by the resulting cost or benefit if it does” (Oxford English Dictionary)
    Three parts:
    • Cost/benefit
    • Probability
    • Subject
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
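The definition above is a product: risk = probability × cost (or benefit), attached to a subject. A minimal sketch that scores and ranks candidate risks for a release; the 0 to 1 probabilities, the cost scale and every example entry are hypothetical estimates, not figures from the talk:

```python
from dataclasses import dataclass


@dataclass
class Risk:
    harm: str           # risk of what, e.g. "privacy breach"
    subject: str        # risk to whom, e.g. "data subjects"
    probability: float  # estimated chance it happens, 0.0 to 1.0
    cost: float         # impact on an agreed scale; negative for a benefit

    @property
    def score(self) -> float:
        # Probability multiplied by the resulting cost or benefit
        return self.probability * self.cost


def rank(risks):
    """Highest expected cost first, so mitigation effort goes there first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical estimates for one planned release
risks = [
    Risk("legal harm", "person releasing the data", 0.05, 10),
    Risk("privacy breach", "data subjects", 0.2, 8),
    Risk("reputational harm", "data collectors", 0.5, 3),
]
for r in rank(risks):
    print(f"{r.score:.2f}  {r.harm} -> {r.subject}")
```

The numbers are only as good as the estimates behind them; the value of writing them down is that the team has to agree on both the probability and the cost for each subject.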
  25. Risk of What?

    • Physical harm
    • Legal harm (e.g. jail, IP disputes)
    • Reputational harm
    • Privacy breach
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  26. Risk to Whom?

    • Data subjects (elections example)
    • Data collectors (conflict example)
    • Data processing team (military equipment example)
    • Person releasing the data (corruption example)
    • Person using the data
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  27. Personally Identifiable Information

    “Personally identifiable information (PII) is any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another and can be used for de-anonymizing anonymous data can be considered PII.”
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
  28. Spotting Red Flags

    • Names, addresses, phone numbers
    • Locations: lat/long, GIS traces, locality (e.g. home + work as an identifier)
    • Members of small populations
    • Untranslated text
    • Codes (e.g. “41”)
    • Slang terms
    • Data that can be combined with other datasets to produce PII
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
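Some of these red flags (phone numbers, email addresses, coordinates) can be caught with a first automated pass before human review. A minimal sketch using regular expressions; the patterns and the example record are illustrative assumptions, and flags like codes, slang or untranslated text still need people who know the context:

```python
import re

# Illustrative patterns only: they miss many formats and flag some harmless
# values (a date like 2017-08-01 can match "phone"), so treat hits as prompts
# for human review, not as a verdict.
RED_FLAG_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
    "lat_long": re.compile(r"-?\d{1,2}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}"),
}


def scan_record(record: dict) -> dict:
    """Map each text field to the red-flag patterns it matches."""
    hits = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # numeric codes such as 41 need human context, not regexes
        flags = [name for name, pattern in RED_FLAG_PATTERNS.items()
                 if pattern.search(value)]
        if flags:
            hits[field] = flags
    return hits


# Hypothetical survey record
record = {
    "id": 41,
    "note": "call +91 98765 43210 after 6pm",
    "location": "12.9716, 77.5946",
    "comment": "contact a.b@example.org",
}
print(scan_record(record))
```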
  29. Consider Partial Release to only Some Groups

    • Academics
    • People in your organisation
    • Data subjects
    • Release at lower granularity
      • Town/district level, not street
    • Subset or sample of data ‘rows’
    • Subset of data ‘columns’
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data
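Dropping columns and sampling rows are easy to script, and one way to check whether the coarser release still singles people out is to measure the smallest group of records that share the same values on identifying columns (the "k" in k-anonymity, a technique not named on the slide). A minimal sketch; the function names and the records are hypothetical:

```python
import random
from collections import Counter


def select_columns(records, columns):
    """Release a subset of data 'columns' (e.g. district but not street)."""
    return [{c: r[c] for c in columns} for r in records]


def sample_rows(records, n, seed=0):
    """Release a random sample of data 'rows'; the seed makes it repeatable."""
    return random.Random(seed).sample(records, n)


def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same values on the
    given columns. A value of 1 means someone is unique on those columns."""
    groups = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical records
records = [
    {"street": "Street A", "district": "North", "age_band": "30-39"},
    {"street": "Street B", "district": "North", "age_band": "30-39"},
    {"street": "Street C", "district": "South", "age_band": "40-49"},
    {"street": "Street D", "district": "South", "age_band": "40-49"},
]
print(smallest_group(records, ["street", "district", "age_band"]))  # 1
released = select_columns(records, ["district", "age_band"])
print(smallest_group(released, ["district", "age_band"]))          # 2
```

A small minimum group size is exactly the "members of small populations" red flag: it tells you the release still needs coarser categories or fewer columns.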
  30. Include locals

    Locals can spot:
    • Local languages
    • Local slang
    • Innocent-looking phrases
    Locals might also choose the risk
    Source: Sara-Jayne Terp https://www.slideshare.net/bodacea/risks-and-mitigations-of-releasing-data