URLs are all around us. But just because we see and use them every day doesn't mean that we, as engineers, fully understand the features and flaws inherent in one of the most powerful pieces of technology ever taken for granted.

Those Three Words Every Browser Understands
◉ Uniform: Uniformity means the mechanism stays the same, even if the types of resources differ.
◉ Resource: A resource can be anything, even dynamic content, representing a consistent concept.
◉ Locator: Locators are more than just identifiers; they have directions for network lookup.
URLs are like a treasure map every browser can read.

1.5. The Netloc Slashes!
mailto:[email protected] vs. http://blog.hatnote.com
https://mahmoud:[email protected]/anatomy/netloc?lang=en&rfc=3986#subtitle-2017
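
The contrast on this slide can be checked with the standard library. This is a minimal sketch using `urllib.parse.urlsplit`; the mailto address is a made-up placeholder, not the one from the slide.

```python
from urllib.parse import urlsplit

# Hypothetical address, used only for illustration.
mail = urlsplit("mailto:someone@example.com")
web = urlsplit("http://blog.hatnote.com")

# No "//" after the scheme: there is no network location, only a path.
print(mail.scheme, repr(mail.netloc), repr(mail.path))

# With "//": what follows is the network location (netloc).
print(web.scheme, repr(web.netloc), repr(web.path))
```

The two slashes are what tell the parser that an authority section follows the scheme.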

2. The Userinfo
https://mahmoud:[email protected]/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017
◉ username:[email protected]
◉ In HTTP Basic auth, username:password is base64-encoded into the Authorization header
◉ Our first percent-encoded field!
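
A rough sketch of how userinfo could end up in an HTTP Basic `Authorization` header. The credentials and host below are invented for illustration, and real clients may handle edge cases (empty passwords, other encodings) differently.

```python
import base64
from urllib.parse import urlsplit, unquote

# Hypothetical credentials; "%40" is a percent-encoded "@" inside the password.
url = urlsplit("https://alice:p%40ss@example.com/anatomy/userinfo")

# urlsplit leaves userinfo percent-encoded, so decode it explicitly.
username = unquote(url.username)
password = unquote(url.password)

# Basic auth: base64 of "username:password" in the Authorization header.
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
auth_header = f"Basic {token}"
print("Authorization:", auth_header)
```

Note that base64 is an encoding, not encryption, which is one reason credentials in URLs deserve caution.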

Percent encoding, aka quoting
%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
◉ URLs are built to support non-ASCII
◉ Byte values are replaced with %XX
◉ No standard encoding underneath
  ○ UTF-8 conventional now
  ○ Latin-1 one of many before
  ○ Binary-capable
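
The bullets above can be illustrated with `urllib.parse`. This is a small sketch; the sample strings are arbitrary and chosen only to show that the same text yields different %XX bytes under different encodings.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# The same text, percent-encoded under two different byte encodings.
utf8_quoted = quote("café")                        # UTF-8 is the default
latin1_quoted = quote("café", encoding="latin-1")

print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9

# Decoding only round-trips if you know which encoding was used.
print(unquote(latin1_quoted, encoding="latin-1"))  # café

# Percent encoding is byte-level, so arbitrary binary data works too.
raw = unquote_to_bytes("%FF%00")
print(raw)
```

Since the URL itself does not say which encoding was used, assuming UTF-8 is a convention rather than a guarantee.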

3. The Host
https://mahmoud:[email protected]/anatomy/host?lang=en&rfc=3986#subtitle-2017
◉ IPv4, [IPv6], or string resolved with DNS
◉ Supports Unicode via Punycode
  ○ u'https://bücher.ch'
  ○ 'https://xn--bcher-kva.ch'
  ○ 'https://xn--ggbla1c4e.xn--ngbc5azd/'
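
A quick sketch of the Punycode round trip using Python's built-in `idna` codec (which implements the older IDNA 2003 rules; third-party libraries may behave differently for some names). The IPv6 URL is a made-up example.

```python
from urllib.parse import urlsplit

# Unicode hostname -> ASCII-compatible (Punycode) form, and back.
ascii_host = "bücher.ch".encode("idna").decode("ascii")
unicode_host = b"xn--bcher-kva.ch".decode("idna")
print(ascii_host)
print(unicode_host)

# IPv6 literals are bracketed in the URL; .hostname strips the brackets.
v6 = urlsplit("https://[::1]:8080/anatomy/host")  # hypothetical URL
print(v6.hostname, v6.port)
```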

4. The Port
https://mahmoud:[email protected]:8080/anatomy/port?lang=en&rfc=3986#subtitle-2017
◉ Positive integers only
◉ Usually registered with IANA
◉ Not emitted if equal to scheme default
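
The standard library does not drop default ports for you, so here is one way it might be done. `drop_default_port` and the `DEFAULT_PORTS` table are hypothetical helpers written for this sketch, and the table covers only a few schemes.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical, deliberately tiny table of scheme defaults.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def drop_default_port(url):
    """Return url without its port when the port equals the scheme default."""
    parts = urlsplit(url)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        return url
    host = parts.hostname  # note: .hostname is lowercased by urlsplit
    if ":" in host:        # re-bracket IPv6 literals
        host = f"[{host}]"
    # Keep any userinfo that precedes the host.
    userinfo, at, _ = parts.netloc.rpartition("@")
    return urlunsplit(parts._replace(netloc=f"{userinfo}{at}{host}"))

print(drop_default_port("https://example.com:443/anatomy/port?lang=en"))
print(drop_default_port("https://example.com:8080/anatomy/port?lang=en"))
```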

5. The Path
https://mahmoud:[email protected]:8080/anatomy/path?lang=en&rfc=3986#subtitle-2017
◉ Host-local hierarchy
◉ Also percent-encoded
◉ Absolute vs. relative
◉ Almost anything is a path (and a URL)
  ○ mailto:[email protected]
  ○ this|is|not|a|url
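
To make "absolute vs. relative" concrete, here is a short sketch of relative reference resolution with `urllib.parse.urljoin`. The base URL is a simplified stand-in for the one on the slide.

```python
from urllib.parse import urljoin

# Hypothetical base URL for illustration.
base = "https://example.com:8080/anatomy/path?lang=en"

# A relative path replaces the last segment of the base path.
sibling = urljoin(base, "query")

# A path starting with "/" is absolute within the host.
absolute = urljoin(base, "/other")

# ".." walks up the host-local hierarchy.
parent = urljoin(base, "../top")

print(sibling)
print(absolute)
print(parent)
```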

6. The Query String
https://mahmoud:[email protected]:8080/anatomy/query?lang=en&rfc=3986#subtitle-2017
◉ My favorite part
◉ Order is preserved
◉ Duplicate keys combine
◉ An ordered multidict!
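
The "ordered multidict" idea can be approximated with the standard library. In this sketch the query string is invented, with a duplicated `lang` key added to show both behaviors.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Hypothetical query with a repeated key.
query = "lang=en&rfc=3986&lang=fr"

# parse_qsl keeps every pair, in order: the "ordered" part.
pairs = parse_qsl(query)
print(pairs)

# parse_qs combines duplicate keys into lists: the "multi" part.
combined = parse_qs(query)
print(combined)

# The ordered pairs round-trip back to the original string.
print(urlencode(pairs))
```

A plain `dict` would silently keep only one `lang` value, which is why a list of pairs (or a real multidict type) is the safer representation.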

7. The Fragment
https://mahmoud:[email protected]:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017
◉ The frontend developers’ favorite part
◉ Not sent to the server
◉ Based on apartment numbers
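
A small sketch of what "not sent to the server" means in practice: the client strips the fragment before building the request. The URL is a simplified stand-in, and `request_target` is just an illustrative variable name.

```python
from urllib.parse import urldefrag, urlsplit

# Hypothetical URL for illustration.
url = "https://example.com:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017"

# urldefrag separates the fragment from the rest of the URL.
without_fragment, fragment = urldefrag(url)
print(without_fragment)
print(fragment)

# An HTTP request line carries only the path and query; the fragment stays client-side.
parts = urlsplit(url)
request_target = parts.path + ("?" + parts.query if parts.query else "")
print("GET", request_target, "HTTP/1.1")
```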

But it seems URLs can keep up!

    py://func.module.pkg/arg1/arg2?kw1=val1#awesome
    \_/  \_____________/\________/ \______/ \_____/
     |          |           |         |        |
    scheme  authority     path      query   fragment

OK, back to reality.
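
For fun, the made-up `py://` URL parses cleanly with `urllib.parse`. Reading path segments as positional arguments and query pairs as keyword arguments is only this sketch's interpretation of the joke, not an established scheme.

```python
from urllib.parse import parse_qsl, urlsplit

parts = urlsplit("py://func.module.pkg/arg1/arg2?kw1=val1#awesome")

# Path segments as positional arguments (an assumption of this sketch).
args = [segment for segment in parts.path.split("/") if segment]

# Query pairs as keyword arguments (also an assumption).
kwargs = dict(parse_qsl(parts.query))

print(parts.scheme)    # scheme
print(parts.netloc)    # authority
print(args)            # path
print(kwargs)          # query
print(parts.fragment)  # fragment
```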

Hyperlink History and Future
My idea of fun over time:
◉ 2013
  ○ Build an IO-agnostic HTTP library and spend way too much time reading URL RFCs
◉ 2017
  ○ Work with the Twisted project to merge my URL (boltons.urlutils) with twisted.python.url
◉ Future
  ○ Work on the Hyper project to bring more sans-IO web libraries to Python
  ○ https://github.com/python-hyper/