Python, Locales and Writing Systems - PyCon US, 12th May 2018


Python 3 removes a lot of the confusion around Unicode handling in Python, but that by no means fixes everything. Different locales and writing systems have unique behaviours that can trip you up. Here are some of the worst ones and how to handle them correctly.

Presented at PyCon US 2018: https://us.pycon.org/2018/schedule/presentation/106/

Turkish i and ı:
- http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
- http://www.i18nguy.com/unicode/turkish-i18n.html
- https://pypi.python.org/pypi/PyICU
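
As a rough illustration of the Turkish i problem: Python's built-in `str.upper()` and `str.lower()` ignore the locale, so they apply the English mapping between `i` and `I`. The sketch below shows that default behaviour and then a hand-rolled workaround. The helper names `turkish_upper` and `turkish_lower` are hypothetical and only cover the two i pairs; a locale-aware library such as ICU (which PyICU wraps) is the more robust choice for real text.

```python
# Python's built-in case mappings are locale-independent, so Turkish text
# needs special handling for the dotted/dotless pairs (i/İ and ı/I).

DOTLESS_I = "\u0131"         # ı  LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE


def turkish_upper(text):
    """Hypothetical helper: uppercase with Turkish rules (i -> İ, ı -> I)."""
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_I, "I").upper()


def turkish_lower(text):
    """Hypothetical helper: lowercase with Turkish rules (İ -> i, I -> ı)."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_I).lower()


# Default behaviour: uppercasing "i" drops the dot, and lowercasing "İ"
# yields two code points ("i" followed by U+0307 COMBINING DOT ABOVE).
print("i".upper() == "I")
print(len(DOTTED_CAPITAL_I.lower()))

# Turkish-aware behaviour using the helpers above.
print(turkish_upper("istanbul"))
print(turkish_lower("ISPARTA"))
```

Replacing the special characters before calling the built-in methods keeps the rest of the string on Python's normal case mapping.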

Arabic:
- https://github.com/mpcabd/python-arabic-reshaper

Korean:
- http://www.gernot-katzers-spice-pages.com/var/korean_hangul_unicode.html
- https://docs.python.org/2/library/unicodedata.html
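
A minimal sketch of why Hangul can trip up string comparison, using the standard library's `unicodedata` module: the same syllable can be stored as one precomposed code point or as a sequence of jamo, and the two forms do not compare equal until they are normalised. The helper name `same_text` is an assumption made for this example.

```python
import unicodedata

syllable = "\ud55c"  # 한, a single precomposed Hangul syllable

# NFD splits the syllable into its component jamo; NFC puts it back together.
decomposed = unicodedata.normalize("NFD", syllable)
recomposed = unicodedata.normalize("NFC", decomposed)

print(unicodedata.name(syllable))
print([f"U+{ord(ch):04X}" for ch in decomposed])
print(syllable == decomposed)   # different code point sequences
print(syllable == recomposed)   # identical again after NFC


def same_text(a, b):
    """Hypothetical helper: compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(syllable, decomposed))
```

Normalising both sides to the same form before comparing, hashing or storing is the usual way to avoid this mismatch.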

Security:
- http://howto.hackallthethings.com/2016/06/using-multi-byte-characters-to-nullify.html
- http://www.rafayhackingarticles.net/2016/08/google-chrome-firefox-address-bar.html
- https://www.xudongz.com/blog/2017/idn-phishing/
- https://www.theguardian.com/technology/2017/apr/19/phishing-url-trick-hackers
- https://it.slashdot.org/story/02/05/28/0142248/spoofing-urls-with-unicode
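
The IDN phishing links above rely on lookalike characters from different scripts. The sketch below shows that a Cyrillic "а" makes a string that looks like "apple" but compares unequal, and that Unicode normalisation does not fold the two together. The function `scripts_in` is a hypothetical, deliberately crude heuristic that reads the script from the first word of each character's Unicode name; it is an assumption for illustration, not a complete confusable-detection scheme.

```python
import unicodedata


def scripts_in(label):
    """Hypothetical heuristic: collect the first word of each letter's
    Unicode name (e.g. LATIN, CYRILLIC) as a rough script indicator."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


genuine = "apple"
spoofed = "\u0430pple"  # first letter is CYRILLIC SMALL LETTER A

print(genuine == spoofed)
print(unicodedata.normalize("NFKC", spoofed) == genuine)
print(sorted(scripts_in(genuine)))
print(sorted(scripts_in(spoofed)))
```

A label that mixes scripts is not always malicious, so a check like this is better treated as a signal for closer inspection than as a filter.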

Further links:
- http://www.joelonsoftware.com/articles/Unicode.html
- http://lukas-prokop.at/talks/pydays18-unicode/#1 (a great exploration of Unicode's nooks and crannies)
- https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/
- https://modelviewculture.com/pieces/i-can-text-you-a-pile-of-poo-but-i-cant-write-my-name
- http://nopenotarabic.tumblr.com/
- http://sites.psu.edu/symbolcodes/


Rae Knowler

May 12, 2018