URLs: In Plain View

URLs are all around us. But just because we see and use them every day doesn't mean that we, as engineers, fully understand the features and flaws inherent in one of the most powerful technologies ever taken for granted.

Mahmoud Hashemi

May 27, 2017

Transcript

  1. 6.

    “Some long ass link you are somehow suppose to fit into the address bar.”
    fnaffoxy2916, defined Feb 17, 2017, urbandictionary.com
  2. 7.

    Those Three Words Every Browser Understands
    ◉ Uniform: Uniformity means the mechanism stays the same, even if the types of resources differ.
    ◉ Resource: A resource can be anything, even dynamic content, representing a consistent concept.
    ◉ Locator: Locators are more than just identifiers; they have directions for network lookup.
    URLs are like a treasure map every browser can read.
  3. 8.

    The history is long.com
    ◉ 1992 - W3 hypertext names
    ◉ 1994 - RFC 1630, 1736, 1737, 1738
    ◉ 1995 - RFC 1808
    ◉ 1997 - RFC 2141
    ◉ 1998 - RFC 2396, 2368
    ◉ 1999 - RFC 2732
    ◉ 2002 - RFC 3305
    ◉ 2005 - RFC 3986 (the gold standard)
    ◉ 2013 - RFC 6874
    ◉ 2014 - RFC 7320
    ◉ 2017 - WHATWG document (the browser bubble)
  6. 13.

    The Scheme (part 1)
    https://mahmoud:urls@pyconweb.com/anatomy/scheme?lang=en&rfc=3986#subtitle-2017
    ◉ Short, case-insensitive
    ◉ Letters, digits, +, -, and . (must start with a letter)
    ◉ Registered with IANA
    ◉ Determines URL semantics
    http, https, ssh, gopher, rsync, mailto, tel, … (~60 in common use)
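The character rules above come from a one-line grammar in RFC 3986, section 3.1: `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. Here is a minimal Python sketch of checking that grammar and applying the case-insensitivity rule. `parse_scheme` is a hypothetical helper, not a library API, and it only checks syntax; it does not consult the IANA registry.

```python
import re

# RFC 3986, section 3.1: one letter, then letters, digits, "+", "-", or "."
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def parse_scheme(url):
    """Hypothetical helper: return the lowercased scheme of url, or None.

    Only the syntax is checked; whether the scheme is registered with
    IANA (or means anything to a given client) is a separate question.
    """
    scheme, sep, _rest = url.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        return None
    # Schemes are case-insensitive; lowercase is the canonical form.
    return scheme.lower()


print(parse_scheme("HTTPS://pyconweb.com/"))        # https
print(parse_scheme("git+ssh://example.com/repo"))   # git+ssh
print(parse_scheme("1http://example.com/"))         # None: must start with a letter
print(parse_scheme("no-colon-here"))                # None: no scheme at all
```

Note that `git+ssh` passes: the `+` rule is what makes "plus schemes" possible in the first place.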
  7. 17.

    Percent encoding, aka quoting (%)
    ◉ URLs are built to support non-ASCII
    ◉ Byte values are replaced with %XX
    ◉ No standard encoding underneath
      ◦ UTF-8 conventional now
      ◦ Latin-1 one of many before
      ◦ Binary-capable
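Python's standard `urllib.parse` makes the "no standard encoding underneath" point concrete: `quote` and `unquote` take an `encoding` argument (UTF-8 by default), and `quote` also accepts raw bytes. A small demonstration, assuming Python 3:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# The same text percent-encodes differently depending on the byte
# encoding chosen underneath; UTF-8 is the modern default.
print(quote("café"))                          # caf%C3%A9
print(quote("café", encoding="latin-1"))      # caf%E9
print(quote("a b"))                           # a%20b

# Decoding only works if you guess the same encoding that was used.
print(unquote("caf%C3%A9"))                   # café
print(unquote("caf%E9", encoding="latin-1"))  # café
print(repr(unquote("caf%E9")))                # 'caf\ufffd': not valid UTF-8, replaced

# Percent encoding is byte-oriented, so arbitrary binary data round-trips.
blob = bytes([0x00, 0xFF, 0x10])
encoded = quote(blob)
print(encoded)                                # %00%FF%10
print(unquote_to_bytes(encoded) == blob)      # True
```

Nothing in `caf%E9` says which encoding produced it; the reader has to know, which is why UTF-8 as a convention matters so much.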
  8. 19.

    The Host (part 3)
    https://mahmoud:urls@pyconweb.com/anatomy/host?lang=en&rfc=3986#subtitle-2017
    ◉ IPv4, [IPv6], or a string resolved with DNS
    ◉ Supports Unicode via Punycode
      ◦ u'https://bücher.ch' → 'https://xn--bcher-kva.ch'
      ◦ 'https://xn--ggbla1c4e.xn--ngbc5azd/'
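The three kinds of host can be told apart with the standard library alone. This is a sketch under a few assumptions: `describe_host` is a hypothetical helper, IPv6 zone IDs (RFC 6874) are ignored, and Punycode conversion uses Python's built-in `idna` codec, which implements IDNA 2003 (the third-party `idna` package covers the newer IDNA 2008 rules).

```python
import ipaddress
from urllib.parse import urlsplit


def describe_host(url):
    """Hypothetical helper: classify a URL's host and return its ASCII form."""
    # .hostname is lowercased and has IPv6 brackets stripped.
    host = urlsplit(url).hostname
    if host is None:
        return None
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a name to resolve with DNS.
        # Non-ASCII labels go over the wire as Punycode ("xn--" prefix).
        return ("dns", host.encode("idna").decode("ascii"))
    return ("ipv%d" % ip.version, str(ip))


print(describe_host("https://bücher.ch"))        # ('dns', 'xn--bcher-kva.ch')
print(describe_host("http://[::1]:8080/"))       # ('ipv6', '::1')
print(describe_host("http://192.168.0.1/"))      # ('ipv4', '192.168.0.1')
print(b"xn--bcher-kva.ch".decode("idna"))        # bücher.ch
```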
  9. 21.

    The Path (part 5)
    https://mahmoud:urls@pyconweb.com:8080/anatomy/path?lang=en&rfc=3986#subtitle-2017
    ◉ Host-local hierarchy
    ◉ Also percent-encoded
    ◉ Absolute vs. relative
    ◉ Almost anything is a path (and a URL)
      ◦ mailto:mahmoud@hatnote.com
      ◦ this|is|not|a|url
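A few lines with Python's `urllib.parse` show each path property from the slide: relative references resolving against the base path's hierarchy, percent-encoded segments, and how little it takes for a string to parse as "just a path". The base URL reuses the slide's example; everything else is illustrative.

```python
from urllib.parse import unquote, urljoin, urlsplit

base = "https://pyconweb.com:8080/anatomy/path?lang=en"

# Relative references resolve against the base URL's path hierarchy.
print(urljoin(base, "scheme"))         # https://pyconweb.com:8080/anatomy/scheme
print(urljoin(base, "../about"))       # https://pyconweb.com:8080/about
print(urljoin(base, "/anatomy/host"))  # https://pyconweb.com:8080/anatomy/host

# Path segments are percent-encoded too.
print(unquote(urlsplit("https://example.com/au%20l%C3%A1it").path))  # /au láit

# Almost anything parses as a path.
print(urlsplit("mailto:mahmoud@hatnote.com").path)  # mahmoud@hatnote.com
print(urlsplit("this|is|not|a|url").path)           # this|is|not|a|url
```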
  10. 25.

    Python is pretty powerful
    core.py
        def func(a1, a2, kw1=None):
            pass
    caller.py
        from pkg.mod import func  # powerful
        func(arg1, arg2, kw1='val1')
    ?
  11. 26.

    But it seems URLs can keep up!
    py://func.module.pkg/arg1/arg2?kw1=val1#awesome
    \_/  \_____________/\________/ \______/ \_____/
     |          |           |         |        |
    scheme  authority     path      query  fragment
    OK, back to reality.
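The `py://` scheme is made up for the talk, but the mapping is mechanical enough to sketch. Below, `url_to_call` is a hypothetical decoder (not a real API) that turns each URL part back into the pieces of the call, using only `urllib.parse`. It returns the pieces rather than importing anything, and it ignores the fragment.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def url_to_call(url):
    """Hypothetical decoder for the made-up py:// scheme.

    Returns (module, function, args, kwargs) without importing anything.
    """
    parts = urlsplit(url)
    if parts.scheme != "py":
        raise ValueError("not a py:// URL: %r" % url)
    # The authority lists names innermost-first, like DNS labels:
    # "func.module.pkg" -> function "func" in module "pkg.module".
    # Caveat: .hostname is lowercased, so mixed-case names would not survive.
    labels = parts.hostname.split(".")
    func_name, module = labels[0], ".".join(reversed(labels[1:]))
    # Path segments become positional arguments, query pairs keyword arguments.
    args = [unquote(seg) for seg in parts.path.split("/") if seg]
    kwargs = dict(parse_qsl(parts.query))
    return module, func_name, args, kwargs


print(url_to_call("py://func.module.pkg/arg1/arg2?kw1=val1#awesome"))
# ('pkg.module', 'func', ['arg1', 'arg2'], {'kw1': 'val1'})
```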
  12. 28.

    urlparse design gaps
    ◉ Mostly RFC 1738 (1994) and RFC 2396 (1998)
    ◉ URLs are “just” tuples of strings
    ◉ Hardcoded schemes (~25)
    ◉ Crufty APIs
      ◦ urlparse vs. urlsplit
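Each gap listed above is easy to reproduce with the standard library. This reflects the behavior of current CPython 3 releases; `myscheme` is a made-up scheme name chosen precisely because it is not on any of `urllib.parse`'s internal lists.

```python
from urllib.parse import urljoin, urlparse, urlsplit

# Results are plain (named) tuples of strings, not a richer URL type.
res = urlsplit("http://example.com/a;x/b;y?q=1")
print(isinstance(res, tuple), res[2])  # True /a;x/b;y

# urlparse also splits RFC 1738-era ";params" off the last path segment...
print(urlparse("http://example.com/a;x/b;y").params)            # y
# ...but only for schemes on a hardcoded list.
print(repr(urlparse("myscheme://example.com/a;x/b;y").params))  # ''

# urljoin has its own hardcoded list: unknown schemes are not resolved at all.
print(urljoin("http://example.com/a/b", "c"))      # http://example.com/a/c
print(urljoin("myscheme://example.com/a/b", "c"))  # c
```

The same relative reference gives a different answer purely because of the scheme's name, which is the practical cost of hardcoded scheme tables.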
  13. 30.

    pip install hyperlink
    ◉ RFC 3986+
    ◉ Full-fledged URL type
    ◉ 58 schemes and counting
    ◉ Smart conventions
      ◦ Plus schemes (git+ssh, etc.)
      ◦ IPv6 validation
      ◦ Normalization
    ◉ Python 2.6 - 3.6 tested
    ◉ github.com/mahmoud/hyperlink
    ◉ hyperlink.readthedocs.io
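To give a rough idea of what URL normalization involves, here is a standard-library sketch of RFC 3986's syntax-based normalization (section 6.2.2): lowercase the scheme and host, uppercase percent-escapes, decode escapes of unreserved characters, and remove dot segments. `normalize` and its helpers are hypothetical names; this is not hyperlink's implementation, and it skips IPv6 literals, default ports, and relative paths.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 "unreserved" characters never need percent-encoding.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(text):
    """Uppercase every %xx escape; decode the ones encoding unreserved chars."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return PCT_RE.sub(fix, text)


def remove_dot_segments(path):
    """Resolve "." and ".." in an absolute path (simplified RFC 3986, 5.2.4)."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # keep the leading "" that makes the path absolute
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # "/a/b/.." becomes "/a/", keeping the trailing slash
    return "/".join(out)


def normalize(url):
    """Hypothetical helper: syntax-based normalization of an absolute URL."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # The host is case-insensitive; userinfo is not, so only lowercase the host part.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = normalize_pct(parts.path)
    if path.startswith("/"):
        path = remove_dot_segments(path)
    return urlunsplit((parts.scheme, netloc, path,
                       normalize_pct(parts.query), normalize_pct(parts.fragment)))


print(normalize("HTTP://Example.COM/a/./b/../%7Euser/caf%c3%a9?q=%41"))
# http://example.com/a/~user/caf%C3%A9?q=A
```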
  14. 31.

    Hyperlink API highlights
    ◉ Immutable URL type
    ◉ URIs for computers, IRIs for humans
    >>> from hyperlink import URL
    >>> url = URL.from_text('http://example.com/caf%C3%A9/láit')
    >>> print(url.to_iri().to_text())
    http://example.com/café/láit
    >>> print(url.to_uri().to_text())
    http://example.com/caf%C3%A9/l%C3%A1it
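For readers without hyperlink installed, the URI/IRI distinction can be approximated with the standard library. This is a rough sketch, not hyperlink's behavior: `iri_to_uri` and `uri_to_iri` are hypothetical helpers that only touch the path, query, and fragment (hosts need Punycode instead), and the decoding direction is deliberately naive. It also shows that the standard library's `SplitResult` is immutable in the same spirit: `_replace` returns a new value.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Reserved delimiters and "%" stay as-is, so existing escapes aren't double-encoded.
SAFE = "/:@!$&'()*+,;=?%"


def iri_to_uri(iri):
    """Hypothetical helper: percent-encode non-ASCII text as UTF-8 (hosts untouched)."""
    parts = urlsplit(iri)
    # SplitResult is an immutable namedtuple: _replace returns a new value.
    return urlunsplit(parts._replace(
        path=quote(parts.path, safe=SAFE),
        query=quote(parts.query, safe=SAFE),
        fragment=quote(parts.fragment, safe=SAFE),
    ))


def uri_to_iri(uri):
    """Hypothetical helper: decode escapes for display.

    Naive on purpose: it also decodes reserved escapes such as %2F,
    which a careful implementation would leave encoded.
    """
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(
        path=unquote(parts.path),
        query=unquote(parts.query),
        fragment=unquote(parts.fragment),
    ))


uri = iri_to_uri("http://example.com/caf%C3%A9/láit")
print(uri)              # http://example.com/caf%C3%A9/l%C3%A1it
print(uri_to_iri(uri))  # http://example.com/café/láit
```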
  15. 33.

    Hyperlink History and Future
    My idea of fun over time:
    ◉ 2013
      ◦ Build an IO-agnostic HTTP library and spend way too much time reading URL RFCs
    ◉ 2017
      ◦ Work with the Twisted project to merge my URL (boltons.urlutils) with twisted.python.url
    ◉ Future
      ◦ Work on the Hyper project to bring more sans-IO web libraries to Python
      ◦ https://github.com/python-hyper/
  16. 34.

    URLs in short
    ◉ Flexible
    ◉ Powerful
    ◉ Becoming even more useful
    URLs are what you make of them.