Python 3 removes a lot of the confusion around Unicode handling, but that by no means fixes everything. Different locales and writing systems have unique behaviours that can trip you up. Here are some of the worst ones and how to handle them correctly.
Presented at PyCon US 2018: https://us.pycon.org/2018/schedule/presentation/106/
Turkish i and ı:
- http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
- http://www.i18nguy.com/unicode/turkish-i18n.html
- https://pypi.python.org/pypi/PyICU
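Python's built-in `str.upper()` and `str.lower()` are locale-independent, so they apply the default Unicode case mappings rather than the Turkish ones. The sketch below shows the mismatch and a minimal workaround. `turkish_upper` and `turkish_lower` are hypothetical helpers written for illustration, and they assume precomposed input; a locale-aware library such as PyICU (linked above) is likely the more robust route.

```python
# Default case mappings ignore the Turkish dotted/dotless distinction.
print("i".upper())            # 'I', but Turkish expects 'İ' (U+0130)
print("I".lower())            # 'i', but Turkish expects 'ı' (U+0131)
print(len("İ".lower()))       # 2: 'i' followed by U+0307 COMBINING DOT ABOVE


def turkish_upper(text):
    """Hypothetical helper: uppercase using Turkish rules for i and ı."""
    # Map the two special letters first, then let Python handle the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text):
    """Hypothetical helper: lowercase using Turkish rules for İ and I."""
    return text.replace("İ", "i").replace("I", "ı").lower()


print(turkish_upper("istanbul"))  # 'İSTANBUL'
print(turkish_lower("ISI"))       # 'ısı'
```

Replacing the special letters before calling the built-in method keeps every other character on the standard Unicode mapping.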
Arabic:
- https://github.com/mpcabd/python-arabic-reshaper
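Arabic letters change shape depending on their position in a word, and Unicode carries separate "presentation form" code points for those shapes alongside the base letters. The sketch below uses only the standard library's `unicodedata` module (it does not use the reshaper linked above) to show that the four positional forms of the letter beh all fold back to one base code point under NFKC normalization. The choice of beh is purely illustrative.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the base letter

# The four positional presentation forms of beh.
forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for form in forms:
    folded = unicodedata.normalize("NFKC", form)
    # Each presentation form has its own name but normalizes to the base letter.
    print(f"U+{ord(form):04X}", unicodedata.name(form), folded == BEH)
```

Comparing or searching Arabic text is therefore usually safer after normalization, while picking the right shape for display is a separate rendering step.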
Korean:
- http://www.gernot-katzers-spice-pages.com/var/korean_hangul_unicode.html
- https://docs.python.org/2/library/unicodedata.html
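A Hangul syllable can be stored either as one precomposed code point or as a sequence of jamo, and the two look identical on screen. As a minimal sketch using the `unicodedata` module linked above, the example below decomposes the syllable 한 and shows why comparing un-normalized strings can fail. The sample syllable is an arbitrary choice.

```python
import unicodedata

syllable = "\uD55C"  # 한, HANGUL SYLLABLE HAN
decomposed = unicodedata.normalize("NFD", syllable)

print(len(syllable), len(decomposed))  # 1 3
for jamo in decomposed:
    print(f"U+{ord(jamo):04X}", unicodedata.name(jamo))

# The two spellings are different strings until they are normalized.
print(syllable == decomposed)                                 # False
print(syllable == unicodedata.normalize("NFC", decomposed))   # True
```

Normalizing both sides to the same form (NFC or NFD) before comparing, hashing, or measuring length avoids this mismatch.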
Security:
- http://howto.hackallthethings.com/2016/06/using-multi-byte-characters-to-nullify.html
- http://www.rafayhackingarticles.net/2016/08/google-chrome-firefox-address-bar.html
- https://www.xudongz.com/blog/2017/idn-phishing/
- https://www.theguardian.com/technology/2017/apr/19/phishing-url-trick-hackers
- https://it.slashdot.org/story/02/05/28/0142248/spoofing-urls-with-unicode
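The phishing tricks in the links above rely on characters from different scripts that look the same. The sketch below flags labels that mix scripts, using a rough heuristic based on Unicode character names. `scripts_used` is a hypothetical helper, not a standard API, and it would not catch a spoof written entirely in a single lookalike script.

```python
import unicodedata


def scripts_used(label):
    """Hypothetical helper: guess scripts from the first word of each character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


spoof = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A
real = "apple"

print(spoof == real)                  # False, although they render alike
print(sorted(scripts_used(spoof)))    # ['CYRILLIC', 'LATIN']
print(sorted(scripts_used(real)))     # ['LATIN']

# The IDNA encoding exposes the non-ASCII label as an "xn--" punycode form.
print((spoof + ".com").encode("idna"))
```

Showing users the punycode form, or rejecting mixed-script labels, are two common mitigations; which one fits depends on the application.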
Further links:
- http://www.joelonsoftware.com/articles/Unicode.html
- http://lukas-prokop.at/talks/pydays18-unicode/#1 (a great exploration of Unicode's nooks and crannies)
- https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/
- https://modelviewculture.com/pieces/i-can-text-you-a-pile-of-poo-but-i-cant-write-my-name
- http://nopenotarabic.tumblr.com/
- http://sites.psu.edu/symbolcodes/