URLs: In Plain View

URLs are all around us. But just because we see and use them every day doesn't mean that we, as engineers, fully understand the features and flaws inherent in one of the most powerful pieces of technology ever taken for granted.

Mahmoud Hashemi

May 27, 2017

Transcript

  1. URLs: In Plain View Mahmoud Hashemi May, 2017

  2. I love URLs The most advanced technology to reach the masses. Ever.
  3. A piece of the web for everyone ◉ Kids ◉ Santa ◉ Dads
  4. Locating the URL Frontend Backend Non-web $ git clone git@github.com:mahmoud/boltons.git

  5. URLs are everywhere The Internet is very leaky.

  6. “Some long ass link you are somehow suppose to fit into the address bar.” fnaffoxy2916 Defined Feb 17, 2017 urbandictionary.com
  7. Those Three Words Every Browser Understands ◉ Uniform: Uniformity means the mechanism stays the same, even if the types of resources differ. ◉ Resource: A resource can be anything, even dynamic content, representing a consistent concept. ◉ Locator: Locators are more than just identifiers; they have directions for network lookup. URLs are like a treasure map every browser can read.
  8. The history is long.com ◉ 1992 - W3 hypertext names ◉ 1994 - RFC 1630, 1736, 1737, 1738 ◉ 1995 - RFC 1808 ◉ 1997 - RFC 2141 ◉ 1998 - RFC 2396, 2368 ◉ 1999 - RFC 2732 ◉ 2002 - RFC 3305 ◉ 2005 - RFC 3986 (the gold standard) ◉ 2013 - RFC 6874 ◉ 2014 - RFC 7320 ◉ 2017 - WHATWG document (the browser bubble)
  9. >67,000 words spent explicitly defining URLs in the RFCs

  10. The overambitious URL 10 years later, even the W3C had to admit it made some mistakes.
  11. Design intent ◉ Simple ◉ Transcribable ◉ No barrier to entry Usable by humans and computers
  12. The knowable URL The right amount of URL engineering know-how.

  13. The Scheme 1 https://mahmoud:urls@pyconweb.com/anatomy/scheme?lang=en&rfc=3986#subtitle-2017 ◉ Short, case-insensitive ◉ Letters, numbers, +, -, . ◉ Registered with IANA ◉ Determines URL semantics http, https, ssh, gopher, rsync, mailto, tel, … ~60 in common use
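The scheme rules above can be sketched with the standard library. This is only an illustration: `SCHEME_RE` and `is_valid_scheme` are hypothetical names, and the pattern follows the RFC 3986 grammar (a letter, then letters, digits, `+`, `-`, or `.`), not any IANA registry lookup.

```python
import re
from urllib.parse import urlsplit

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# SCHEME_RE and is_valid_scheme are illustrative names, not a library API.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Return True if the text is syntactically a URL scheme."""
    return SCHEME_RE.fullmatch(scheme) is not None


# Schemes are case-insensitive; urlsplit normalizes them to lowercase.
scheme = urlsplit("HTTPS://pyconweb.com/anatomy/scheme").scheme
```

Note that this checks syntax only; whether a scheme is registered, and what it means, is a separate question.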
  14. The Userinfo 2 https://mahmoud:urls@pyconweb.com/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017 ◉ Comes after the scheme ◉ ...
  15. The Netloc Slashes! 1.5 mailto:mahmoud@hatnote.com vs. http://blog.hatnote.com https://mahmoud:urls@pyconweb.com/anatomy/netloc?lang=en&rfc=3986#subtitle-2017

  16. The Userinfo 2 https://mahmoud:urls@pyconweb.com/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017 ◉ username:password@ ◉ Password is base64-encoded (together with the username) into the Authorization header in HTTP ◉ Our first percent-encoded field!
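As a rough sketch of how userinfo turns into HTTP Basic credentials, the following uses only the standard library. The variable names are illustrative, and real clients usually do this step for you.

```python
import base64
from urllib.parse import unquote, urlsplit

parts = urlsplit("https://mahmoud:urls@pyconweb.com/anatomy/userinfo")

# urlsplit does not percent-decode userinfo, so decode it explicitly.
username = unquote(parts.username)
password = unquote(parts.password)

# HTTP Basic auth base64-encodes "username:password" as one token.
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
headers = {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so credentials carried this way are only as private as the connection.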
  17. Percent encoding aka quoting % %20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 ◉ URLs are built to support non-ASCII ◉ Byte values are replaced with %XX ◉ No standard encoding underneath ◦ UTF-8 conventional now ◦ Latin-1 one of many before ◦ Binary-capable
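The point about encodings can be seen with `urllib.parse.quote` and `unquote`: the same character yields different `%XX` bytes depending on the text encoding chosen. A small sketch, with variable names chosen only for illustration:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# UTF-8 is the default (and conventional) encoding underneath.
utf8_quoted = quote("café")

# The same text under Latin-1 produces a different byte, hence a different %XX.
latin1_quoted = quote("café", encoding="latin-1")

# Decoding only works if both sides agree on the encoding.
utf8_text = unquote(utf8_quoted)
latin1_text = unquote(latin1_quoted, encoding="latin-1")

# Percent-encoding is binary-capable: any byte value can be carried.
raw_bytes = unquote_to_bytes("%00%FF")
```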
  18. The Host 3 https://mahmoud:urls@pyconweb.com/anatomy/host?lang=en&rfc=3986#subtitle-2017

  19. The Host 3 https://mahmoud:urls@pyconweb.com/anatomy/host?lang=en&rfc=3986#subtitle-2017 ◉ IPv4, [IPv6], or string resolved with DNS ◉ Supports Unicode via Punycode u'https://bücher.ch' 'https://xn--bcher-kva.ch' 'https://xn--ggbla1c4e.xn--ngbc5azd/'
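A short sketch of the host forms above, using Python's built-in `idna` codec and the `ipaddress` module. The IPv6 address is a documentation-range example chosen for illustration, not something from the talk.

```python
import ipaddress
from urllib.parse import urlsplit

# Unicode hostnames travel as ASCII "xn--" labels (Punycode via IDNA).
unicode_host = "bücher.ch"
ascii_host = unicode_host.encode("idna").decode("ascii")
round_trip = ascii_host.encode("ascii").decode("idna")

# IPv6 literals are bracketed in the URL; .hostname strips the brackets.
ipv6_host = urlsplit("https://[2001:db8::1]:8080/anatomy/host").hostname
ipv6_addr = ipaddress.ip_address(ipv6_host)
```

The built-in codec implements the older IDNA 2003 rules, so some newer domain names may need a third-party library instead.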
  20. The Port 4 https://mahmoud:urls@pyconweb.com:8080/anatomy/port?lang=en&rfc=3986#subtitle-2017 ◉ Positive integers only ◉ Usually registered with IANA ◉ Not emitted if equal to scheme default
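The "not emitted if equal to scheme default" rule can be sketched as follows. `DEFAULT_PORTS` and `host_with_port` are hypothetical names, and the table is a tiny illustrative subset rather than the IANA registry.

```python
from urllib.parse import urlsplit

# Illustrative subset of scheme defaults; a real table would be much larger.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_with_port(url):
    """Return host[:port], dropping the port when it is the scheme default.

    Simplification: IPv6 hosts come back without their brackets.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"
```

For example, `host_with_port("https://pyconweb.com:443/")` drops the port, while a URL on port 8080 keeps it.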
  21. The Path 5 https://mahmoud:urls@pyconweb.com:8080/anatomy/path?lang=en&rfc=3986#subtitle-2017 ◉ Host-local hierarchy ◉ Also percent-encoded ◉ Absolute vs. relative ◉ Almost anything is a path (and a URL) ◦ mailto:mahmoud@hatnote.com ◦ this|is|not|a|url
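To illustrate absolute versus relative paths, and how permissive path parsing is, here is a small standard-library sketch. The relative references (`host`, `/talks`, `../about`) are made up for the example.

```python
from urllib.parse import urljoin, urlsplit

base = "https://pyconweb.com/anatomy/path"

# Relative references resolve against the base URL's path hierarchy.
sibling = urljoin(base, "host")
from_root = urljoin(base, "/talks")
parent = urljoin(base, "../about")

# Almost anything parses as a path: with no "//", there is no host at all.
mailto_path = urlsplit("mailto:mahmoud@hatnote.com").path
odd_path = urlsplit("this|is|not|a|url").path
```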
  22. The Query String 6 https://mahmoud:urls@pyconweb.com:8080/anatomy/query?lang=en&rfc=3986#subtitle-2017 ◉ My favorite part ◉ Order is preserved ◉ Duplicate keys combine ◉ An ordered multidict!
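The "ordered multidict" behavior shows up in the two standard-library parsers: `parse_qsl` keeps order and duplicates as pairs, while `parse_qs` groups values per key. The extra `lang=de` parameter is added here only to demonstrate a duplicate key.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "lang=en&rfc=3986&lang=de"

# A list of pairs preserves both ordering and duplicate keys.
pairs = parse_qsl(query)

# A dict of lists combines duplicates under one key.
grouped = parse_qs(query)

# Encoding the pairs again round-trips the original string.
rebuilt = urlencode(pairs)
```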
  23. The Fragment 7 https://mahmoud:urls@pyconweb.com:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017 ◉ The frontend developers’ favorite part ◉ Not sent to the server ◉ Based on apartment numbers
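Since the fragment stays on the client, a program making a request typically separates it first. A minimal sketch with `urllib.parse.urldefrag`:

```python
from urllib.parse import urldefrag

full_url = "https://pyconweb.com/anatomy/fragment?lang=en#subtitle-2017"

# urldefrag splits off the fragment; only request_url would go to the server.
request_url, fragment = urldefrag(full_url)
```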
  24. A Pythonic Example Let’s look at some Python

  25. Python is pretty powerful
    core.py
    def func(a1, a2, kw1=None): pass
    caller.py
    from pkg.mod import func  # powerful
    func(arg1, arg2, kw1='val1')
  26. But it seems URLs can keep up!

      py://func.module.pkg/arg1/arg2?kw1=val1#awesome
      \_/  \_____________/\________/ \______/ \_____/
       |          |           |         |        |
    scheme    authority      path     query   fragment

    OK, back to reality.
  27. What about urlparse? No standard library is perfect...

  28. urlparse design gaps ◉ Mostly RFC1738 (1994) and RFC2396 (1998) ◉ URLs are “just” tuples of strings ◉ Hardcoded schemes (~25) ◉ Crufty APIs ◦ urlparse vs. urlsplit
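The tuple-of-strings design and the `urlparse` vs. `urlsplit` split can be seen directly. This sketch uses the Python 3 location of the module, `urllib.parse` (the standalone `urlparse` module is its Python 2 name), and the sample URL is invented for illustration.

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/a;b?x=1#top"

# urlparse returns a 6-tuple and carves ";params" out of the path.
parsed = urlparse(url)

# urlsplit returns a 5-tuple and leaves the path intact.
split = urlsplit(url)

# Both results are plain tuples of strings underneath.
both_tuples = isinstance(parsed, tuple) and isinstance(split, tuple)
```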
  29. What do we do? pip install hyperlink

  30. pip install hyperlink ◉ RFC3986+ ◉ Full-fledged URL type ◉ 58 schemes and counting ◉ Smart conventions ◦ Plus schemes (git+ssh, etc.) ◦ IPv6 validation ◦ normalization ◉ Python 2.6 - 3.6 tested ◉ github.com/mahmoud/hyperlink ◉ hyperlink.readthedocs.io
  31. Hyperlink API highlights ◉ Immutable URL type ◉ URIs for computers, IRIs for humans
    >>> from hyperlink import URL
    >>> url = URL.from_text('http://example.com/caf%C3%A9/láit')
    >>> print(url.to_iri().to_text())
    http://example.com/café/láit
    >>> print(url.to_uri().to_text())
    http://example.com/caf%C3%A9/l%C3%A1it
  32. Want corner cases? Check hyperlink/test

  33. Hyperlink History and Future My idea of fun over time: ◉ 2013 ◦ Build an IO-agnostic HTTP library and spend way too much time reading URL RFCs ◉ 2017 ◦ Work with the Twisted project to merge my URL (boltons.urlutils) with twisted.python.url ◉ Future ◦ Work on the Hyper project to bring more sans-IO web libraries to Python ◦ https://github.com/python-hyper/
  34. URLs in short ◉ Flexible ◉ Powerful ◉ Becoming even more useful URLs are what you make of them.
  35. Any questions? ◉ github.com/mahmoud ◉ twitter.com/mhashemi ◉ sedimental.org Thanks!