Trusty URIs

Hi everyone, thanks for coming to my talk on trusty URIs. My name is Jonathan Wallace and I’m going to share how we can bring trust back to the internet. This talk is based on a paper written by Tobias Kuhn and Michel Dumontier.

Intro • Why does this matter • Trusty URI Requirements • How it works • Further explorations
First, I’m going to present a problem that we’ve all experienced on the internet. Then we’re going to talk about what is necessary for a trusty URI: what it is and how it works. Then we’ll explore the high-level details of implementing the trusty URI algorithm. Finally, I’ll share where you can go to learn more and help contribute.

Why does this matter
So, why does this matter? The authors bring up the context of nanopublications in scientific publishing. I don’t know anything about this arena, but the scientific community cares about verifiability, immutability, and permanence.

Trust
The question is “Can you trust what you read on the internet?”

Dog
This comic was published in 1993 in The New Yorker. How do you know what to trust?

Snopes! http://www.snopes.com/info/whatsnew.asp
How do you know that Snopes hasn’t changed or edited information that you’ve read recently?

Short answer: you don’t.

Why does this matter: time t
There’s another context. Did you know that the United States Supreme Court engages in retconning? If you’re not familiar with the term, here is what it means. The Supreme Court issues a decision at time t.

Why does this matter: time t + x
At some future time t + x, when they issue another decision, they go back and change the content of their decision at time t to ensure continuity and conceptual integrity with the new decision. This matters.

“The only way the public can identify most changes is by painstaking comparison of early versions of decisions to ones published years later.”
http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0
Wouldn’t it be great if you didn’t have to do painstaking comparisons by hand?

Why does this matter https://twitter.com/scotus_servo
Luckily, someone has already done this for us with respect to the Supreme Court. But how are you supposed to know when changes have occurred? Wouldn’t it be great to know that the law, or a web page, has changed just by examining tiny little hash outputs?

Requirements
Let’s talk about the requirements for a URI to be considered “trusty”.

What is a URI? e.g. “http://example.org/wiki/Main_Page” (URL) and “ISBN 0-486-27557-4” (URN)
A URI is a string of characters that identifies a resource. A URL is a URI that also specifies the protocol used to reach the resource and its location.

Requirements • Verifiable • Immutable • Permanent
First, we’re going to talk about what is necessary for a trusty URI.

Verifiable
We’re going to use a hash algorithm. Simply put, a hash algorithm is some code that takes a bunch of input and converts it into a small, fixed-size piece of output. If you change one tiny piece of the input, the output changes greatly.

    require 'digest'

    # Compute the SHA-256 digest of the file's bytes as a hex string
    file_name = "test_file.html"
    dig = Digest::SHA256.file(file_name).hexdigest

<h1> here's my html page </h1>
d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

<h1> Here's my html page </h1>
8bf9f3fccf34b02019fefbbe524a2a2a8607598ae3264f15d3e10dced1f3cad9

Verifiable
Most importantly, if I give you only the output, it is computationally infeasible to determine the input. That is what makes a good hash algorithm. So to say a URI is verifiable means that you can compute the hash output for the content behind the URI and check it against the hash you were given.

d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee

<h1> here's my html page </h1>
d7f8dab3800e904bc2b70287d23a764fba07952824d23eb0566a0d2eb57f4bee
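To make the verification step concrete, here is a minimal sketch in Ruby. The method name `verified?` and the sample string are mine, not from the paper, and the expected digest is computed in the example itself rather than copied from the slide, so treat it as an illustration of the idea only.

```ruby
require 'digest'

# Returns true when the SHA-256 digest of `content` matches `expected_hex`.
# `verified?` is a hypothetical helper name used only for this sketch.
def verified?(content, expected_hex)
  Digest::SHA256.hexdigest(content) == expected_hex.downcase
end

page = "<h1> here's my html page </h1>\n"

# Stands in for a digest that was published alongside the URI.
expected = Digest::SHA256.hexdigest(page)

puts verified?(page, expected)                      # unchanged content
puts verified?(page.sub('here', 'Here'), expected)  # one letter edited
```

The first check passes and the second fails, because changing a single character produces a completely different digest.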

Immutable
By virtue of using a hash algorithm, if you change the input (that is, the content behind the URI), the hash output will change. Any edit is therefore detectable, so we have immutability.

Permanent
Here we’re going to cheat a little. We all know that search engines crawl the web and cache content. By examining the cached copy of a URI, we’ll have “permanence.” In other words, if the original location is no longer available, we’ll have other places to retrieve the content.
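One way to picture why cached copies are good enough: because the content is verifiable, it doesn’t matter which cache a copy comes from, as long as its hash matches. The sketch below assumes the candidate copies have already been fetched and are plain strings. `first_verified_copy` and the sample data are hypothetical, not part of any trusty URI library.

```ruby
require 'digest'

# Returns the first copy whose SHA-256 digest matches `expected_hex`,
# or nil when no copy matches. Hypothetical helper for illustration.
def first_verified_copy(copies, expected_hex)
  copies.find { |body| Digest::SHA256.hexdigest(body) == expected_hex }
end

original = "<h1> here's my html page </h1>"
expected = Digest::SHA256.hexdigest(original)

# Made-up bodies standing in for what different caches returned.
copies = [
  "<h1> Here's my html page </h1>", # altered copy, rejected
  original.dup                      # faithful copy, accepted
]

trusted = first_verified_copy(copies, expected)
puts trusted == original
```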

How does this work
Let’s talk about how trusty URIs work.

How does this work • Module ID • Artifact Code
We’re going to focus on the byte content of files, though the authors also go into detail about RDF, something with which I don’t have a ton of experience. There are two parts to the trusty URI that are relevant.
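Here is a rough sketch of how those two parts could fit together for plain file content. Everything specific in it is my assumption rather than something from the slides: the module ID value `FA`, the URL-safe Base64 encoding of a SHA-256 digest, and the `.` separator before the code. Check the paper and the reference implementations for the exact rules.

```ruby
require 'digest'

# Assumed module ID for the plain byte-content module.
MODULE_ID = 'FA'

# Builds a module ID plus a URL-safe Base64 SHA-256 digest without padding.
def artifact_code(bytes)
  digest = Digest::SHA256.digest(bytes) # 32 raw bytes
  encoded = [digest].pack('m0')         # strict Base64, no newlines
  MODULE_ID + encoded.tr('+/', '-_').delete('=')
end

# Appends the code to a base URI. The "." separator is an assumption.
def trusty_uri(base, bytes)
  "#{base}.#{artifact_code(bytes)}"
end

content = "<h1> here's my html page </h1>\n"
code = artifact_code(content)
puts code.length
puts trusty_uri('http://example.org/test_file', content)
```

With a 2-character module ID and a 43-character encoded digest, the code in this sketch is always 45 characters long.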

The paper goes into detail concerning how the content can contain self-references. Essentially, they use placeholders when computing the hash value and then replace those placeholders with the computed value.
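The placeholder idea can be sketched in a few lines of Ruby. This is a simplified illustration, not the paper’s actual procedure: the `@@HASH@@` marker, the hex digest, and the helper names `seal` and `sealed_valid?` are all made up for the example.

```ruby
require 'digest'

# Hypothetical marker; the paper defines its own placeholder rules.
PLACEHOLDER = '@@HASH@@'

# Hashes the template with placeholders in place, then fills them in.
def seal(template)
  digest = Digest::SHA256.hexdigest(template)
  template.gsub(PLACEHOLDER, digest)
end

# Puts the placeholders back and checks that the digest still matches.
def sealed_valid?(content, digest)
  restored = content.gsub(digest, PLACEHOLDER)
  Digest::SHA256.hexdigest(restored) == digest
end

template = "<a href=\"http://example.org/page.#{PLACEHOLDER}\">this page</a>"
digest = Digest::SHA256.hexdigest(template)
sealed = seal(template)

puts sealed_valid?(sealed, digest)                     # untouched
puts sealed_valid?(sealed.sub('this', 'that'), digest) # edited
```

Verification works by reversing the substitution: swap the hash back out for the placeholder, rehash, and compare.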

Further explorations

Links
• https://github.com/trustyuri
• https://twitter.com/scotus_servo
• http://www.nytimes.com/2014/05/25/us/final-word-on-us-law-isnt-supreme-court-keeps-editing.html?_r=0
• https://gigaom.com/2014/06/12/clever-piece-of-code-exposes-hidden-changes-to-supreme-court-opinions/
• http://2014.eswc-conferences.org/sites/default/files/papers/paper_106.pdf
• http://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog

Check out the GitHub organization. They have Perl, Python, and Java implementations. I was hoping to have a Ruby version completed by this talk, but I’m not quite there yet.
