Slide 1

Slide 1 text

URLs: In Plain View Mahmoud Hashemi May, 2017

Slide 2

Slide 2 text

I love URLs The most advanced technology to reach the masses. Ever.

Slide 3

Slide 3 text

A piece of the web for everyone ◉ Kids ◉ Santa ◉ Dads

Slide 4

Slide 4 text

Locating the URL Frontend Backend Non-web $ git clone [email protected]:mahmoud/boltons.git

Slide 5

Slide 5 text

URLs are everywhere The Internet is very leaky.

Slide 6

Slide 6 text

“Some long ass link you are somehow suppose to fit into the address bar.” fnaffoxy2916 Defined Feb 17, 2017 urbandictionary.com

Slide 7

Slide 7 text

Those Three Words Every Browser Understands
Uniform: Uniformity means the mechanism stays the same, even if the types of resources differ.
Resource: A resource can be anything, even dynamic content, representing a consistent concept.
Locator: Locators are more than just identifiers; they have directions for network lookup.
URLs are like a treasure map every browser can read.

Slide 8

Slide 8 text

The history is long.com
◉ 1992 - W3 hypertext names
◉ 1994 - RFC 1630, 1736, 1737, 1738
◉ 1995 - RFC 1808
◉ 1997 - RFC 2141
◉ 1998 - RFC 2396, 2368
◉ 1999 - RFC 2732
◉ 2002 - RFC 3305
◉ 2005 - RFC 3986 (the gold standard)
◉ 2013 - RFC 6874
◉ 2014 - RFC 7320
◉ 2017 - WHATWG document (the browser bubble)

Slide 9

Slide 9 text

>67,000 Words spent explicitly defining URLs in the RFCs #

Slide 10

Slide 10 text

The overambitious URL 10 years later, even the W3C had to admit it made some mistakes.

Slide 11

Slide 11 text

Design intent ◉ Simple ◉ Transcribable ◉ No barrier to entry Usable by humans and computers

Slide 12

Slide 12 text

The knowable URL The right amount of URL engineering know-how.

Slide 13

Slide 13 text

The Scheme 1 https://mahmoud:[email protected]/anatomy/scheme?lang=en&rfc=3986#subtitle-2017 ◉ Short, case-insensitive ◉ Letters, numbers, +, -, . ◉ Registered with IANA ◉ Determines URL semantics http, https, ssh, gopher, rsync, mailto, tel, … ~60 in common use
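
The scheme rules above can be checked with Python's standard library. This is a minimal sketch, assuming Python 3's urllib.parse; the helper name scheme_of and the example.com URLs are placeholders chosen for illustration, not part of the talk.

```python
from urllib.parse import urlsplit


def scheme_of(url_text):
    """Return the scheme of a URL, as normalized by urlsplit."""
    return urlsplit(url_text).scheme


# example.com and the address below are placeholders chosen for this sketch.
for text in ("HTTPS://example.com/anatomy/scheme",
             "git+ssh://example.com/repo.git",
             "mailto:someone@example.com"):
    # urlsplit lowercases the scheme, reflecting its case-insensitivity.
    print(scheme_of(text))
```

Note that the scheme is found the same way whether or not the URL has the double slashes of a network location.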

Slide 14

Slide 14 text

The Userinfo 2 https://mahmoud:[email protected]/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017 ◉ Comes after the scheme ◉ ...

Slide 15

Slide 15 text

The Netloc Slashes! 1.5 mailto:[email protected] vs. http://blog.hatnote.com https://mahmoud:[email protected]/anatomy/netloc?lang=en&rfc=3986#subtitle-2017

Slide 16

Slide 16 text

The Userinfo 2 https://mahmoud:[email protected]/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017 ◉ username:password@ ◉ In HTTP, username:password is base64-encoded into the Authorization header (Basic auth) ◉ Our first percent-encoded field!
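
To make the userinfo handling concrete, here is a rough sketch of how a client might turn the userinfo into an HTTP Basic auth header. It assumes Python 3's urllib.parse and UTF-8 credentials; the function name basic_auth_header and the sample URL are hypothetical, and real HTTP clients may differ in details.

```python
import base64
from urllib.parse import urlsplit, unquote


def basic_auth_header(url_text):
    """Build an Authorization header value from a URL's userinfo."""
    parts = urlsplit(url_text)
    # urlsplit leaves userinfo percent-encoded, so decode it first.
    user = unquote(parts.username or "")
    password = unquote(parts.password or "")
    raw = f"{user}:{password}".encode("utf-8")  # UTF-8 is an assumption here
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder URL for this sketch; %20 in the password decodes to a space.
print(basic_auth_header("https://Aladdin:open%20sesame@example.com/"))
```

Base64 is an encoding, not encryption, so the password is only as private as the connection carrying it.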

Slide 17

Slide 17 text

Percent encoding aka quoting
% %20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
◉ URLs are built to support non-ASCII
◉ Byte values are replaced with %XX
◉ No standard encoding underneath
  ○ UTF-8 conventional now
  ○ Latin-1 one of many before
  ○ Binary-capable
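
The point about there being no standard encoding underneath is easy to see in code. A small sketch, assuming Python 3's urllib.parse (where quoting lives today); the sample text is chosen for illustration.

```python
from urllib.parse import quote, unquote

text = "café"

# The same character becomes different bytes, and so different %XX runs,
# depending on the text encoding chosen before quoting.
utf8_quoted = quote(text)                        # UTF-8 is the default
latin1_quoted = quote(text, encoding="latin-1")
print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9

# Decoding needs the matching encoding to get the original text back.
print(unquote(latin1_quoted, encoding="latin-1"))  # café

# Arbitrary bytes can be quoted too, which is what "binary-capable" means.
print(quote(b"\x00\xff"))  # %00%FF
```

Nothing in the quoted string says which encoding was used, so the reader has to know or guess.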

Slide 18

Slide 18 text

The Host 3 https://mahmoud:[email protected]/anatomy/host?lang=en&rfc=3986#subtitle-2017

Slide 19

Slide 19 text

The Host 3 https://mahmoud:[email protected]/anatomy/host?lang=en&rfc=3986#subtitle-2017
◉ IPv4, [IPv6], or string resolved with DNS
◉ Supports Unicode via Punycode
  u'https://bücher.ch'
  'https://xn--bcher-kva.ch'
  'https://xn--ggbla1c4e.xn--ngbc5azd/'
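
The Punycode conversion for the bücher.ch host can be reproduced with Python's built-in idna codec. This is a sketch only: the built-in codec implements the older IDNA 2003 rules, and the helper names below are made up for illustration.

```python
def to_ascii_host(host):
    """Encode a Unicode hostname to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host):
    """Decode an ASCII (Punycode) hostname back to Unicode."""
    return ascii_host.encode("ascii").decode("idna")


print(to_ascii_host("bücher.ch"))           # xn--bcher-kva.ch
print(to_unicode_host("xn--bcher-kva.ch"))  # bücher.ch
```

Only the host is Punycode-encoded; the rest of the URL uses percent encoding instead.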

Slide 20

Slide 20 text

The Port 4 https://mahmoud:[email protected]:8080/anatomy/port?lang=en&rfc=3986#subtitle-2017 ◉ Positive integers only ◉ Usually registered with IANA ◉ Not emitted if equal to scheme default
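
The "not emitted if equal to scheme default" rule can be sketched as a small normalization step. This assumes Python 3's urllib.parse; the DEFAULT_PORTS table and the function name drop_default_port are illustrative, with only a few well-known defaults listed.

```python
from urllib.parse import urlsplit

# A deliberately tiny table for illustration; real registries are much larger.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}


def drop_default_port(url_text):
    """Remove the port from a URL when it matches the scheme's default."""
    parts = urlsplit(url_text)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        return url_text
    # The port is the text after the last colon of the network location.
    netloc = parts.netloc.rsplit(":", 1)[0]
    return parts._replace(netloc=netloc).geturl()


print(drop_default_port("https://example.com:443/anatomy/port"))
print(drop_default_port("https://example.com:8080/anatomy/port"))
```

The first URL loses its port and the second keeps it, since 8080 is not the https default.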

Slide 21

Slide 21 text

The Path 5 https://mahmoud:[email protected]:8080/anatomy/path?lang=en&rfc=3986#subtitle-2017
◉ Host-local hierarchy
◉ Also percent-encoded
◉ Absolute vs. relative
◉ Almost anything is a path (and a URL)
  ○ mailto:[email protected]
  ○ this|is|not|a|url
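
The absolute vs. relative distinction shows up when a reference is resolved against a base URL. A brief sketch, assuming Python 3's urllib.parse.urljoin; the base URL uses example.com as a placeholder host.

```python
from urllib.parse import urljoin

base = "http://example.com/anatomy/path"  # placeholder base URL

# A relative path replaces the last segment of the base path.
sibling = urljoin(base, "query")
# A path starting with "/" is absolute: it replaces the whole base path.
rooted = urljoin(base, "/root")
# ".." climbs one level in the host-local hierarchy.
parent = urljoin(base, "../up")

print(sibling)  # http://example.com/anatomy/query
print(rooted)   # http://example.com/root
print(parent)   # http://example.com/up
```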

Slide 22

Slide 22 text

The Query String 6 https://mahmoud:[email protected]:8080/anatomy/query?lang=en&rfc=3986#subtitle-2017 ◉ My favorite part ◉ Order is preserved ◉ Duplicate keys combine ◉ An ordered multidict!
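
The "ordered multidict" behavior can be seen by parsing a query string with a repeated key. A minimal sketch, assuming Python 3's urllib.parse; the extra lang=fr pair is added here for illustration and is not in the slide's URL.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "lang=en&rfc=3986&lang=fr"  # duplicate key added for illustration

# parse_qsl keeps every pair in its original order.
pairs = parse_qsl(query)
print(pairs)    # [('lang', 'en'), ('rfc', '3986'), ('lang', 'fr')]

# parse_qs groups duplicate keys into lists of values.
grouped = parse_qs(query)
print(grouped)  # {'lang': ['en', 'fr'], 'rfc': ['3986']}

# The ordered pairs round-trip back to the same query string.
print(urlencode(pairs))
```

A plain dict would lose either the ordering between different keys or the duplicates, which is why a list of pairs is the safer representation.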

Slide 23

Slide 23 text

The Fragment 7 https://mahmoud:[email protected]:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017 ◉ The frontend developers’ favorite part ◉ Not sent to the server ◉ Based on apartment numbers
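
Because the fragment is not sent to the server, clients strip it before making a request. A short sketch of that split, assuming Python 3's urllib.parse.urldefrag and a placeholder example.com URL.

```python
from urllib.parse import urldefrag

full = "https://example.com/anatomy/fragment?lang=en#subtitle-2017"

# urldefrag separates what would be requested from what stays client-side.
requested, fragment = urldefrag(full)
print(requested)  # https://example.com/anatomy/fragment?lang=en
print(fragment)   # subtitle-2017
```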

Slide 24

Slide 24 text

A Pythonic Example Let’s look at some Python

Slide 25

Slide 25 text

Python is pretty powerful

core.py
    def func(a1, a2, kw1=None):
        pass

caller.py
    from pkg.mod import func  # powerful
    func(arg1, arg2, kw1='val1')

Slide 26

Slide 26 text

But it seems URLs can keep up!

py://func.module.pkg/arg1/arg2?kw1=val1#awesome
\_/  \_____________/\________/ \______/ \_____/
 |          |           |         |        |
scheme  authority      path     query   fragment

OK, back to reality.
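
For the curious, the made-up py:// URL really does split into those five parts. This is a playful sketch, assuming Python 3's urllib.parse; the py scheme is the slide's invention, and mapping path segments to positional arguments is only one possible reading.

```python
from urllib.parse import parse_qsl, urlsplit

parts = urlsplit("py://func.module.pkg/arg1/arg2?kw1=val1#awesome")

# Path segments stand in for positional arguments, query pairs for keywords.
args = [segment for segment in parts.path.split("/") if segment]
kwargs = dict(parse_qsl(parts.query))

print(parts.scheme)    # py
print(parts.netloc)    # func.module.pkg
print(args)            # ['arg1', 'arg2']
print(kwargs)          # {'kw1': 'val1'}
print(parts.fragment)  # awesome
```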

Slide 27

Slide 27 text

What about urlparse? No standard library is perfect...

Slide 28

Slide 28 text

urlparse design gaps
◉ Mostly RFC1738 (1994) and RFC2396 (1998)
◉ URLs are “just” tuples of strings
◉ Hardcoded schemes (~25)
◉ Crufty APIs
  ○ urlparse vs. urlsplit
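
Two of these gaps are visible in a few lines. The sketch below assumes Python 3, where the urlparse module lives on as urllib.parse; the URL with a ";p" path parameter is made up to show where the two functions disagree.

```python
from urllib.parse import urlparse, urlsplit

text = "http://example.com/a;p?q=1#f"  # placeholder URL with a path parameter

parsed = urlparse(text)
split = urlsplit(text)

# Both results are tuples of plain strings.
print(isinstance(parsed, tuple), isinstance(split, tuple))

# urlparse returns six fields and pulls ";p" out as "params";
# urlsplit returns five fields and leaves it in the path.
print(len(parsed), parsed.path, parsed.params)
print(len(split), split.path)
```

Which function to use depends on whether the caller wants the rarely used params field split out, a choice that is easy to get wrong.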

Slide 29

Slide 29 text

What do we do? pip install hyperlink

Slide 30

Slide 30 text

pip install hyperlink
◉ RFC3986+
◉ Full-fledged URL type
◉ 58 schemes and counting
◉ Smart conventions
  ○ Plus schemes (git+ssh, etc.)
  ○ IPv6 validation
  ○ Normalization
◉ Python 2.6 - 3.6 tested
◉ github.com/mahmoud/hyperlink
◉ hyperlink.readthedocs.io

Slide 31

Slide 31 text

Hyperlink API highlights
◉ Immutable URL type
◉ URIs for computers, IRIs for humans

>>> from hyperlink import URL
>>> url = URL.from_text(u'http://example.com/caf%C3%A9/láit')
>>> print(url.to_iri().to_text())
http://example.com/café/láit
>>> print(url.to_uri().to_text())
http://example.com/caf%C3%A9/l%C3%A1it

Slide 32

Slide 32 text

Want corner cases? Check hyperlink/test

Slide 33

Slide 33 text

Hyperlink History and Future
My idea of fun over time:
◉ 2013
  ○ Build an IO-agnostic HTTP library and spend way too much time reading URL RFCs
◉ 2017
  ○ Work with the Twisted project to merge my URL (boltons.urlutils) with twisted.python.url
◉ Future
  ○ Work on the Hyper project to bring more sans-IO web libraries to Python
  ○ https://github.com/python-hyper/

Slide 34

Slide 34 text

URLs in short ◉ Flexible ◉ Powerful ◉ Becoming even more useful URLs are what you make of them.

Slide 35

Slide 35 text

Any questions? ◉ github.com/mahmoud ◉ twitter.com/mhashemi ◉ sedimental.org Thanks!