
Trusty URIs


Jonathan Wallace

January 07, 2015


  1. Trusty URIs Hi everyone, Thanks for coming to my talk

    on trusty URIs. My name is Jonathan Wallace and I’m going to share how we can bring trust back to the internet. This talk is based on a paper written by Tobias Kuhn and Michel Dumontier.
  2. Intro • Why does this matter • Trusty URI Requirements

    • How it works • Further explorations First, I’m going to present a problem that we’ve both experienced on the internet.
  3. Intro • Why does this matter • Trusty URI Requirements

    • How it works • Further explorations Then we’re going to talk about what is necessary for a trusty URI. What is a trusty URI and how does it work?
  4. Intro • Trusty URI Requirements • Why does this matter

    • How it works • Further explorations Then we’ll explore the high level details of implementing the trusty URI algorithm.
  5. Intro • Trusty URI Requirements • Why does this matter

    • How it works • Further explorations Finally, I’ll share where you can go to learn more and help contribute.
  6. Why does this matter So, why does this matter? The

    authors bring up the context of nanopublications in scientific publishing. I don’t know anything about this arena, but the scientific community cares about verifiability, immutability and permanence.
  7. Trust The question is “Can you trust what you read

    on the internet?”
  8. Dog This comic was published in 1993 in the New

  9. Dog How do you know what to trust?

  10. Snopes! http://www.snopes.com/info/whatsnew.asp How do you know that Snopes hasn’t changed

    or edited information that you’ve read recently?
  11. You don’t. Short answer.

  12. Why does this matter time t There’s another context. Did

    you know that the United States Supreme Court engages in retconning? If you’re not familiar with the term, here is how it plays out: the Supreme Court issues a decision at point t.
  13. Why does this matter time t + x At some

    future point in time t + x, when they issue another decision, they will go back and change the content of their decision at point t to ensure continuity and conceptual integrity with the newer decision at point t + x. This matters.
  14. “The only way the public can identify most changes is

    by painstaking comparison of early versions of decisions to ones published years later.” http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0 Wouldn’t it be great if you didn’t have to do painstaking comparisons by hand?
  15. Why does this matter https://twitter.com/scotus_servo Luckily someone has already done

    this for us w/r/t the Supreme Court. But how are you supposed to know when changes have occurred? Wouldn’t it be great to know that the law has changed by examining tiny little hash outputs? Or that a web page has changed by examining tiny little hash outputs?
  16. Requirements Let’s talk about the requirements for a URI to

    be considered ‘trusty’.
  17. What is a URI? e.g. “http://example.org/wiki/Main_Page” (URL) and “ISBN 0-486-27557-4”

    (URN) A URI is a string of characters used to identify a resource. A URL is a URI that also specifies the protocol and location used to reach that resource.
  18. Requirements • Verifiable • Immutable • Permanent First we’re going

    to talk about what is necessary for a trusty URI.
  19. Verifiable We’re going to use a hash algorithm. Simply put

    a hash algorithm is some code that takes a bunch of input and converts it into a small piece of output. If you change one tiny piece of the input, the small output will change greatly.
  20. require 'digest'
      file_name = "test_file.html"
      dig = Digest::SHA256.file(file_name).hexdigest

  21. <h1> here's my html page </h1> d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

  22. <h1> Here's my html page </h1> 8bf9f3fccf34b02019fefbbe524a2a2a8607598ae3264f15d3e10dced1f3cad9
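
    The effect on the two slides above can be reproduced with a short Ruby sketch. It hashes two in-memory strings that differ only in the case of one letter and counts how many hex digits of the digests differ. The strings are assumptions standing in for the contents of test_file.html; the exact bytes of the original file (for example a trailing newline) are not known, so the digests printed here need not match the ones on the slides.

```ruby
require 'digest'

# Assumed stand-ins for the two versions of the page; the real file's
# exact bytes are unknown, so these digests may differ from the slides.
lower = "<h1> here's my html page </h1>"
upper = "<h1> Here's my html page </h1>"

lower_hex = Digest::SHA256.hexdigest(lower)
upper_hex = Digest::SHA256.hexdigest(upper)

# Count the hex positions at which the two digests disagree.
differing = lower_hex.chars.zip(upper_hex.chars).count { |a, b| a != b }

puts lower_hex
puts upper_hex
puts "#{differing} of #{lower_hex.length} hex digits differ"
```

    A one-character change in the input typically changes most of the digits in the output, which is what makes the hash useful as a change detector.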

  23. Verifiable Most importantly, if I give you the output, it

    is hard, if not impossible, to determine the input. That is what makes a good hash algorithm. So to say a URI is verifiable means that you can compute the hash output for the content of the URI and check it against the hash you were given.
  24. d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

  25. <h1> here's my html page </h1> d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee
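
    The verification step can be sketched in a few lines of Ruby: recompute the hash of whatever content you retrieved and compare it with the hash you were given. The helper name content_matches? and the sample page string are hypothetical, chosen only for illustration.

```ruby
require 'digest'

# Hypothetical helper: true when the content hashes to the expected
# SHA-256 hex digest, false otherwise.
def content_matches?(content, expected_hex)
  Digest::SHA256.hexdigest(content) == expected_hex.downcase
end

page = "<h1> here's my html page </h1>"

# Stands in for the hash that was handed to you along with the URI.
expected = Digest::SHA256.hexdigest(page)

puts content_matches?(page, expected)                      # unchanged content
puts content_matches?(page.sub('here', 'Here'), expected)  # edited content
```

    The first call prints true and the second prints false, since any edit to the content produces a different digest.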

  26. Immutable By virtue of using a hash algorithm, if you

    change the input, i.e., the content of the URI, the hash output will change. So we have immutability.
  27. Permanent Here we’re going to cheat a little. We all

    know that search engines crawl the web and cache content. By examining the cached URI, we’ll have “permanence.” In other words, if the original location is no longer available, we’ll have other places to retrieve the content.
  28. How does this work Let’s talk about how trusty URIs

  29. How does this work • Module ID • Artifact Code

    We’re going to focus on the byte content of files, though the authors go into detail about RDF, something with which I don’t have a ton of experience. There are two parts to the trusty URI that are relevant.
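
    As a rough Ruby sketch of how the two parts might fit together for plain byte content: the artifact code is built from a short module ID followed by an encoded hash of the bytes. Everything specific here is an assumption rather than something stated on the slides: the module ID "FA", the use of SHA-256, the URL-safe Base64 encoding without padding, and appending the code to the URI after a dot. Check the paper and the reference implementations before relying on any of it.

```ruby
require 'digest'

# Assumed module ID for plain byte content (not confirmed by the slides).
MODULE_ID = 'FA'

# Hypothetical helper: module ID followed by the URL-safe Base64 form
# of the SHA-256 digest of the bytes, with padding removed.
def artifact_code(bytes)
  digest = Digest::SHA256.digest(bytes)   # 32 raw bytes
  b64 = [digest].pack('m0')               # standard Base64, no line breaks
  MODULE_ID + b64.tr('+/', '-_').delete('=')
end

# Hypothetical helper: append the artifact code to a base URI.
def trusty_uri(base, bytes)
  "#{base}.#{artifact_code(bytes)}"
end

puts trusty_uri('http://example.org/page', "<h1> here's my html page </h1>")
```

    Under these assumptions the artifact code is always 45 characters long (2 for the module ID, 43 for the hash), so it stays short enough to live inside a URI.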
  30. None
  31. None
  32. dis The paper goes into detail concerning how the content

    can contain self-references. Essentially, they use placeholders when computing the hash value and then replace those placeholders with the computed value.
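
    The placeholder idea can be sketched in Ruby on a plain string. This is a simplified illustration, not the paper's procedure (which works on RDF): the PLACEHOLDER token, the seal and sealed_ok? helpers, and the use of a hex SHA-256 digest are all assumptions made for the example.

```ruby
require 'digest'

# Hypothetical placeholder token; the paper's actual marker may differ.
PLACEHOLDER = 'HASH_PLACEHOLDER'

# Hash the content while it still contains the placeholder, then swap
# the placeholder for the computed hash. Returns [sealed_content, hash].
def seal(template)
  code = Digest::SHA256.hexdigest(template)
  [template.gsub(PLACEHOLDER, code), code]
end

# To verify, put the placeholder back in place of the hash and rehash.
def sealed_ok?(content, code)
  Digest::SHA256.hexdigest(content.gsub(code, PLACEHOLDER)) == code
end

template = "This page lives at http://example.org/page.#{PLACEHOLDER}"
sealed, code = seal(template)

puts sealed
puts sealed_ok?(sealed, code)                        # untouched content
puts sealed_ok?(sealed.sub('page', 'site'), code)    # tampered content
```

    The last two lines print true and false: the content can mention its own hash and still be checked, because verification undoes the substitution before hashing.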
  33. Further explorations

  34. Links
    • https://github.com/trustyuri
    • https://twitter.com/scotus_servo
    • http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0
    • https://gigaom.com/2014/06/12/clever-piece-of-code-exposes-hidden-changes-to-supreme-court-opinions/
    • http://2014.eswc-conferences.org/sites/default/files/papers/paper_106.pdf
    • http://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog

    Check out the GitHub organization. They have Perl, Python and Java implementations. I was hoping to have a Ruby version completed by this talk but I’m not quite there yet.