Slide 1

Slide 1 text

Trusty URIs Hi everyone, thanks for coming to my talk on trusty URIs. My name is Jonathan Wallace and I’m going to share how we can bring trust back to the internet. This talk is based on a paper written by Tobias Kuhn and Michel Dumontier.

Slide 2

Slide 2 text

Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations First, I’m going to present a problem that we’ve all experienced on the internet.

Slide 3

Slide 3 text

Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations Then we’re going to talk about what is necessary for a trusty URI. What is a trusty URI and how does it work?

Slide 4

Slide 4 text

Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations Then we’ll explore the high-level details of implementing the trusty URI algorithm.

Slide 5

Slide 5 text

Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations Finally, I’ll share where you can go to learn more and help contribute.

Slide 6

Slide 6 text

Why does this matter So, why does this matter? The authors bring up the context of nanopublications in scientific publishing. I don’t know anything about this arena, but the scientific community cares about verifiability, immutability, and permanence.

Slide 7

Slide 7 text

Trust The question is “Can you trust what you read on the internet?”

Slide 8

Slide 8 text

Dog This comic was published in 1993 in the New Yorker.

Slide 9

Slide 9 text

Dog How do you know what to trust?

Slide 10

Slide 10 text

Snopes! http://www.snopes.com/info/whatsnew.asp How do you know that Snopes hasn’t changed or edited information that you’ve read recently?

Slide 11

Slide 11 text

You don’t. Short answer.

Slide 12

Slide 12 text

Why does this matter time t There’s another context. Did you know that the United States Supreme Court engages in retconning? If you’re not familiar with the term, here is how it plays out: the Supreme Court issues a decision at time t.

Slide 13

Slide 13 text

Why does this matter time t + x At some future time t + x, when they issue another decision, they go back and change the content of their decision at time t to ensure continuity and conceptual integrity with the new decision at time t + x. This matters.

Slide 14

Slide 14 text

“The only way the public can identify most changes is by painstaking comparison of early versions of decisions to ones published years later.” http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0 Wouldn’t it be great if you didn’t have to do painstaking comparisons by hand?

Slide 15

Slide 15 text

Why does this matter https://twitter.com/scotus_servo Luckily, someone has already done this for us with respect to the Supreme Court. But in general, how are you supposed to know when changes have occurred? Wouldn’t it be great to know that the law has changed by examining tiny little hash outputs? Or that a web page has changed by examining tiny little hash outputs?

Slide 16

Slide 16 text

Requirements Let’s talk about the requirements for a URI to be considered ‘trusty’.

Slide 17

Slide 17 text

What is a URI? e.g. “http://example.org/wiki/Main_Page” (URL) and “ISBN 0-486-27557-4” (URN) A URI is a string of characters that identifies a resource. A URL is a URI that also specifies the protocol and location used to reach that resource.

Slide 18

Slide 18 text

Requirements • Verifiable • Immutable • Permanent First we’re going to talk about what is necessary for a trusty URI.

Slide 19

Slide 19 text

Verifiable We’re going to use a hash algorithm. Simply put, a hash algorithm is some code that takes a bunch of input and converts it into a small, fixed-size piece of output. If you change one tiny piece of the input, the small output will change greatly.

Slide 20

Slide 20 text

require 'digest'
file_name = "test_file.html"
dig = Digest::SHA256.file(file_name).hexdigest

Slide 21

Slide 21 text

here's my html page

d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

Slide 22

Slide 22 text

Here's my html page

8bf9f3fccf34b02019fefbbe524a2a2a8607598ae3264f15d3e10dced1f3cad9
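
The comparison on the two slides above can be reproduced with a short Ruby sketch. It hashes in-memory strings rather than the `test_file.html` file from the talk, so the digests it prints are not guaranteed to match the ones on the slides; the point is only that a one-character change produces a very different digest.

```ruby
require 'digest'

# Hash two inputs that differ only in the case of the first letter.
lower = Digest::SHA256.hexdigest("here's my html page")
upper = Digest::SHA256.hexdigest("Here's my html page")

# Count how many of the hex characters differ between the two digests.
differing = lower.chars.zip(upper.chars).count { |a, b| a != b }

puts lower
puts upper
puts "#{differing} of #{lower.length} hex characters differ"
```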

Slide 23

Slide 23 text

Verifiable Most importantly, if I give you the output, it is hard, if not impossible, to determine the input. That makes a good hash algorithm. So to say a URI is verifiable means that you can compute the hash output for the content of the URI and check it against the hash value you were given.
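
As a minimal sketch of that check, assuming the expected digest is handed over as a hex string: the helper name `content_matches?` is made up for illustration and is not part of any trusty URI library.

```ruby
require 'digest'

# Hypothetical helper: true when the SHA-256 hex digest of `content`
# equals the digest we were told to expect.
def content_matches?(content, expected_hex)
  Digest::SHA256.hexdigest(content) == expected_hex.downcase
end

original = "here's my html page"
published_digest = Digest::SHA256.hexdigest(original)

puts content_matches?(original, published_digest)              # unchanged content => true
puts content_matches?("Here's my html page", published_digest) # edited content => false
```

Anyone holding the digest can rerun the check later, without trusting whoever serves the content.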

Slide 24

Slide 24 text

d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

Slide 25

Slide 25 text

here's my html page

d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

Slide 26

Slide 26 text

Immutable By virtue of using a hash algorithm, if you change the input, i.e., the content of the URI, the hash output will change. So we have immutability.

Slide 27

Slide 27 text

Permanent Here we’re going to cheat a little. We all know that search engines crawl the web and cache content. By examining the cached URI, we’ll have “permanence.” In other words, if the original location is no longer available, we’ll have other places to retrieve the content.

Slide 28

Slide 28 text

How does this work Let’s talk about how trusty URIs work.

Slide 29

Slide 29 text

How does this work • Module ID • Artifact Code We’re going to focus on the byte content of files, though the authors go into detail about RDF, something with which I don’t have a ton of experience. There are two parts to the trusty URI that are relevant.
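
To make the two parts concrete, here is a rough Ruby sketch for plain byte content. The `FA` module identifier, the URL-safe Base64 encoding of the SHA-256 digest, and the example URI layout are assumptions for illustration; the paper and the reference implementations define the exact encoding, and this sketch is not checked against them.

```ruby
require 'digest'

# Illustrative only: module ID followed by the SHA-256 digest of the bytes
# in URL-safe Base64 with the padding stripped.
# The 'FA' default is an assumed identifier for the file-content module.
def artifact_code(bytes, module_id = 'FA')
  digest = Digest::SHA256.digest(bytes)          # 32 raw bytes
  encoded = [digest].pack('m0')                  # standard Base64, no newlines
  module_id + encoded.tr('+/', '-_').delete('=') # URL-safe alphabet, no padding
end

code = artifact_code("here's my html page")
puts code
puts "http://example.org/test_file.#{code}.html" # hypothetical URI layout
```

Because the code travels inside the URI itself, anyone who follows the link has what they need to re-hash the content they receive.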

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

dis The paper goes into detail concerning how the content can contain self-references. Essentially, they use placeholders when computing the hash value and then replace those placeholders with the computed value.
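
The placeholder idea can be sketched in a few lines of Ruby. This is a simplified illustration, not the paper's algorithm: the `{{HASH}}` marker, the `seal` and `verify` helpers, and the use of a plain hex digest are all assumptions made for the example.

```ruby
require 'digest'

# Hash the content while the (hypothetical) placeholder is still in place,
# then substitute the computed digest for every placeholder.
def seal(template, placeholder = '{{HASH}}')
  digest = Digest::SHA256.hexdigest(template)
  [template.gsub(placeholder, digest), digest]
end

# Put the placeholder back where the digest appears and re-hash.
def verify(sealed, digest, placeholder = '{{HASH}}')
  Digest::SHA256.hexdigest(sealed.gsub(digest, placeholder)) == digest
end

template = 'This page lives at http://example.org/page.{{HASH}}.html'
sealed, digest = seal(template)

puts sealed
puts verify(sealed, digest)                     # untouched content => true
puts verify(sealed.sub('page', 'Page'), digest) # tampered content => false
```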

Slide 33

Slide 33 text

Further explorations

Slide 34

Slide 34 text

Links • https://github.com/trustyuri • https://twitter.com/scotus_servo • http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0 • https://gigaom.com/2014/06/12/clever-piece-of-code-exposes-hidden-changes-to-supreme-court-opinions/ • http://2014.eswc-conferences.org/sites/default/files/papers/paper_106.pdf • http://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog Check out the GitHub organization. They have Perl, Python, and Java implementations. I was hoping to have a Ruby version completed by this talk, but I’m not quite there yet.