Trusty URIs

Jonathan Wallace

January 07, 2015

Transcript

  1. Trusty URIs

    Hi everyone, thanks for coming to my talk on trusty URIs. My name is Jonathan Wallace and I’m going to share how we can bring trust back to the internet. This talk is based on a paper written by Tobias Kuhn and Michel Dumontier.
  2. Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations

    First, I’m going to present a problem that we’ve both experienced on the internet.
  3. Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations

    Then we’re going to talk about what is necessary for a trusty URI. What is a trusty URI and how does it work?
  4. Intro • Trusty URI Requirements • Why does this matter • How it works • Further explorations

    Then we’ll explore the high-level details of implementing the trusty URI algorithm.
  5. Intro • Trusty URI Requirements • Why does this matter • How it works • Further explorations

    Finally, I’ll share where you can go to learn more and help contribute.
  6. Why does this matter

    So, why does this matter? The authors bring up the context of nano-publications in scientific publishing. I don’t know anything about this arena, but the scientific community cares about verifiability, immutability and permanence.
  7. Why does this matter: time t

    There’s another context. Did you know that the United States Supreme Court engages in retconning? If you’re not familiar with the term, this means that the Supreme Court will issue a decision at point t.
  8. Why does this matter: time t + x

    At some future point in time t + x, when they issue another decision, they will go back and change the content of their decision at point t to ensure continuity and conceptual integrity with the newer decision. This matters.
  9. “The only way the public can identify most changes is by painstaking comparison of early versions of decisions to ones published years later.” http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0

    Wouldn’t it be great if you didn’t have to do painstaking comparisons by hand?
  10. Why does this matter: https://twitter.com/scotus_servo

    Luckily, someone has already done this for us with respect to the Supreme Court. But how are you supposed to know when changes have occurred? Wouldn’t it be great to know that the law has changed by examining tiny little hash outputs? Or that a web page has changed by examining tiny little hash outputs?
  11. What is a URI? e.g. “http://example.org/wiki/Main_Page” (URL) and “ISBN 0-486-27557-4” (URN)

    A URI is a string of characters used to identify a resource. A URL is a URI that specifies the protocol and location.
  12. Requirements • Verifiable • Immutable • Permanent

    First we’re going to talk about what is necessary for a trusty URI.
  13. Verifiable

    We’re going to use a hash algorithm. Simply put, a hash algorithm is some code that takes a bunch of input and converts it into a small piece of output. If you change one tiny piece of the input, the small output will change greatly.
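
To make that concrete, here is a minimal sketch using Python's standard `hashlib`. SHA-256 and the sample sentences are assumptions chosen for illustration; the talk does not name a specific algorithm at this point.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-256 digest of text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two inputs that differ only in the final character (made-up sample text).
original = digest("The court holds for the plaintiff.")
edited = digest("The court holds for the plaintiff!")

# Count how many hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(original, edited))

print(original)
print(edited)
print(f"{differing} of {len(original)} hex characters differ")
```

Running it should show two digests that look unrelated, even though the inputs differ by a single character.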
  14. Verifiable

    Most importantly, if I give you the output, it is hard, if not impossible, to determine the input. That makes a good hash algorithm. So to say a URI is verifiable means that you can compute the hash output for the content of the URI and compare it with the hash you were given.
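
A rough sketch of that check in Python follows. The function names `content_hash` and `verify` are hypothetical, and SHA-256 is again an assumption for illustration.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Hash the raw bytes of some retrieved content (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_hash: str) -> bool:
    """Return True only if the content still hashes to the expected value."""
    return content_hash(content) == expected_hash


# Usage: record the hash once, then check later copies against it.
published = b"Opinion of the Court, version 1"
recorded = content_hash(published)

print(verify(published, recorded))                           # unchanged copy
print(verify(b"Opinion of the Court, version 2", recorded))  # edited copy
```

Anyone holding the recorded hash can run the same check on any copy of the content, wherever that copy came from.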
  15. Immutable

    By virtue of using a hash algorithm, if you change the input, i.e., the content of the URI, the hash output will change. So we have immutability.
  16. Permanent

    Here we’re going to cheat a little. We all know that search engines crawl the web and cache content. By examining the cached URI, we’ll have “permanence.” In other words, if the original location is no longer available, we’ll have other places to retrieve the content.
  17. How does this work • Module ID • Artifact Code

    We’re going to focus on the byte content of files, though the authors go into detail about RDF, something with which I don’t have a ton of experience. There are two parts to the trusty URI that are relevant.
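
As a hedged sketch of how those two parts might fit together for plain file bytes, the snippet below treats the module ID as a short prefix of the artifact code and the rest as an encoded hash. The value `"FA"`, the use of SHA-256 with URL-safe Base64, the dot separator, and the function names are all assumptions for illustration; check the paper and the reference implementations for the exact rules.

```python
import base64
import hashlib

MODULE_ID = "FA"  # assumed identifier for a "file bytes" module


def artifact_code(content: bytes, module_id: str = MODULE_ID) -> str:
    """Build module ID + URL-safe Base64 of the SHA-256 digest (no padding)."""
    raw = hashlib.sha256(content).digest()
    encoded = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return module_id + encoded


def trusty_uri(base_uri: str, content: bytes) -> str:
    """Append the artifact code to a base URI (dot separator is assumed)."""
    return f"{base_uri}.{artifact_code(content)}"


# Usage with made-up content and a made-up base URI.
data = b"hello trusty world\n"
print(trusty_uri("http://example.org/notes", data))
```

Because the code is derived only from the bytes, two people hashing the same file arrive at the same URI suffix without coordinating.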
  18. dis

    The paper goes into detail concerning how the content can contain self-references. Essentially, they use placeholders when computing the hash value and then replace those placeholders with the computed value.
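
The placeholder idea can be sketched as below. This is a simplified illustration on plain text, not the paper's RDF procedure; the `PLACEHOLDER` marker, the function names, and the hash encoding are hypothetical.

```python
import base64
import hashlib
from typing import Tuple

PLACEHOLDER = "{{ARTIFACT_CODE}}"  # hypothetical marker for a self-reference


def _code(data: bytes) -> str:
    """URL-safe Base64 of the SHA-256 digest, without padding (assumed form)."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def seal(template: str) -> Tuple[str, str]:
    """Hash the text with placeholders in place, then substitute the code."""
    code = _code(template.encode("utf-8"))
    return template.replace(PLACEHOLDER, code), code


def check(document: str, code: str) -> bool:
    """Put the placeholders back where the code appears and re-hash."""
    restored = document.replace(code, PLACEHOLDER)
    return _code(restored.encode("utf-8")) == code


# Usage: a document that mentions its own (not yet known) URI.
template = "This note lives at http://example.org/note.{{ARTIFACT_CODE}}"
sealed, code = seal(template)
print(sealed)
print(check(sealed, code))
```

The hash is computed before the real value exists, so a document can reference its own URI without the hash depending on itself.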
  19. Links

    • https://github.com/trustyuri
    • https://twitter.com/scotus_servo
    • http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0
    • https://gigaom.com/2014/06/12/clever-piece-of-code-exposes-hidden-changes-to-supreme-court-opinions/
    • http://2014.eswc-conferences.org/sites/default/files/papers/paper_106.pdf
    • http://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog

    Check out the GitHub organization. They have Perl, Python and Java implementations. I was hoping to have a Ruby version completed by this talk but I’m not quite there yet.