Dan Kogai
August 26, 2023

Perl and the rest of the world - What have(n't) changed in two decades?

Transcript

  1. @dankogai Perl and the rest of the world - What have(n't) changed in two decades?
  2. Table of Contents
    • What have changed: 2003 -> 2023?
    • Perl
    • The rest of the world
    • Data matters more than language
    • Doing Unicode right
  3. What have changed: 2003 -> 2023? Not much for Perl!
    • 🦏 JavaScript: ES3 -> ES2023
    • 🐍 Python: 2.2 -> 3.11
    • 💎 Ruby: 1.8 (no Rails!) -> 3.2
    • 🐘 PHP: 4 (not even 5) -> 8
    • 🐪 Perl: 5.8 -> 5.36
    • Perl6? It is Raku now!
  4. What have changed: 2003 -> 2023? And the rest of the world…
    • 💻 -> 📱
    • 🦏 > 🐪, 🐘, 🐍, 💎, …
    • 32-bit -> 64-bit
    • SOAP, XML… -> JSON
    • Bunch of legacy encodings -> UTF-8
  5. Unicoding the World
    Perl 5.8 (released 2002)
    • One of the first computer languages to harness Unicode
    • use utf8;
    • use Encode;
    • \x{} notation (\u{} in other languages)
    • /./ matches a Unicode code point
    • /\X/ matches a Unicode grapheme cluster
    • /\p{Han}/ matches 漢字 (Han characters)
  6. Unicoding the World
    What is a character?
    • String is /.*/ but . =
      • [\x00-\xff] # legacy world of bytes
      • [\u0000-\uFFFF] # prematurely modern
      • [\u{0000}-\u{10FFFF}] # correctly modern
  7. Unicoding the World
    What is a character?
    • String is /.*/ but . =
      • [\x00-\xff] # Perl < 5.7
      • [\u0000-\uFFFF] # Java(Script)?, Python2, …
      • [\u{0000}-\u{10FFFF}] # Perl, Ruby, Python3, …
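The three readings of "." on this slide can be compared on a single string. The following is a minimal sketch in Python 3 (chosen here only for illustration; the variable names are not from the talk). It counts the same one-character string as UTF-8 bytes, as UTF-16 code units, and as code points.

```python
# Count one string three ways: UTF-8 bytes, UTF-16 code units, code points.
snake = "\U0001F40D"  # 🐍 SNAKE, a code point outside the BMP

utf8_bytes = len(snake.encode("utf-8"))            # the "legacy world of bytes" view
utf16_units = len(snake.encode("utf-16-le")) // 2  # the [\u0000-\uFFFF] view
code_points = len(snake)                           # the [\u{0000}-\u{10FFFF}] view

print(utf8_bytes, utf16_units, code_points)  # 4 2 1
```

Only the last count matches what a reader would call one character here, which is the point the following slides make with Python, Node and Perl.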
  8. Unicode Support?
    What will the following say?
    $ python2 -c 'print(len("🐍"))'
    2 # unless --enable-unicode=ucs4
    $ python3 -c 'print(len("🐍"))'
    1 # unconditionally. The way it is supposed to be
  9. Unicode Support?
    What will the following say?
    $ node -e 'console.log("🐍".length)'
    2 # 🤦
    $ node -e 'console.log([..."🐍"].length)'
    1 # 👍
  10. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
  11. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f98f}",
      "\x{1f42a}",
      "\x{1f418}",
      "\x{1f40d}",
      "\x{1f48e}",
      "\x{2699}"
    ];
  12. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
  13. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
    [ '', '', '', '', '', '', '', '', '', '', '⚙' ]
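The eleven matches above correspond to UTF-16 code units rather than code points: each of the five emoji is stored as a surrogate pair, and ⚙ fits in a single unit. A small Python 3 sketch of that arithmetic follows; `utf16_code_units` is a hypothetical helper written for this illustration, not something from the talk.

```python
text = "\U0001F98F\U0001F42A\U0001F418\U0001F40D\U0001F48E\u2699"  # 🦏🐪🐘🐍💎⚙


def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


units = utf16_code_units(text)
print(len(text), len(units))            # 6 11
print([f"{u:04x}" for u in units[:2]])  # ['d83e', 'dd8f']
```

Without the `u` flag, a JavaScript regular expression walks these eleven units one by one, so every emoji is split into two lone surrogates.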
  14. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/ug))'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
  15. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log([..."🦏🐪🐘🐍💎⚙"])'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
  16. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
  17. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}",
      "\x{1f1f5}",
      "\x{1f1fa}",
      "\x{1f1e6}"
    ];
  18. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J
      "\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P
      "\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U
      "\x{1f1e6}"  # REGIONAL INDICATOR SYMBOL LETTER A
    ];
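The character names in the comments above can be looked up rather than memorized. As a rough Python 3 sketch (using the standard `unicodedata` module; the variable names are only illustrative), the two flags decompose into four named code points:

```python
import unicodedata

flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"  # 🇯🇵🇺🇦

# Each flag is two regional indicator code points, so len() reports 4.
names = [unicodedata.name(ch) for ch in flags]
for ch, name in zip(flags, names):
    print(f"U+{ord(ch):04X} {name}")
```

So matching by code point is correct as far as it goes, but a flag is only recognizable at the grapheme-cluster level.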
  19. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(\X)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}\x{1f1f5}",
      "\x{1f1fa}\x{1f1e6}"
    ];
  20. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
  21. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
    [ '🇯', '🇵', '🇺', '🇦' ]
  22. Unicode Support?
    What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(\X)/ug))'
    🙅 [ '🇯🇵','🇺🇦' ]
    🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape
        at [eval]:1:24
        at Script.runInThisContext (node:vm:129:12)
        at Object.runInThisContext (node:vm:305:38)
        at node:internal/process/execution:75:19
        at [eval]-wrapper:6:22
        at evalScript (node:internal/process/execution:74:60)
        at node:internal/main/eval_string:27:3
  23. Unicode Support?
    FYI: A workaround for modern JS
    $ node -e 'segmenter = new Intl.Segmenter();
      segment = [...segmenter.segment("🇯🇵🇺🇦")].map(v=>v.segment);
      console.log(segment)'
    🙆 [ '🇯🇵','🇺🇦' ]
    // cf. https://developer.mozilla.org/ja/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter
    // unsupported by Firefox (as of Mar 2023)
  24. Unicode Support?
    Grapheme Cluster
    • Defined in:
      • https://unicode.org/reports/tr29/
    • \X is supported by:
      • 🐘 PHP (via preg_*())
      • 🐪 Perl
      • 💎 Ruby
    • Not yet supported by:
      • 🦏 JavaScript (Intl.Segmenter where available)
      • 🐍 Python (pip install regex?)
        • https://pypi.org/project/regex/
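To make concrete what \X does for the flag example, here is a deliberately narrow Python 3 sketch using only the standard library. It is not an implementation of UAX #29: the hypothetical helper `split_flags` pairs up consecutive regional indicators (U+1F1E6 to U+1F1FF) and treats every other code point as its own cluster, so combining marks, ZWJ sequences and Hangul are not handled.

```python
def is_regional_indicator(ch):
    """True for the 26 REGIONAL INDICATOR SYMBOL LETTER code points."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_flags(s):
    """Split s into clusters, pairing consecutive regional indicators only.

    Illustration only: every other code point becomes a cluster by itself,
    which is NOT what full grapheme segmentation would do.
    """
    clusters = []
    i = 0
    while i < len(s):
        paired = (
            is_regional_indicator(s[i])
            and i + 1 < len(s)
            and is_regional_indicator(s[i + 1])
        )
        step = 2 if paired else 1
        clusters.append(s[i:i + step])
        i += step
    return clusters


print(split_flags("\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"))  # ['🇯🇵', '🇺🇦']
```

For real text, a full segmenter such as the `regex` package mentioned above or `Intl.Segmenter` in JavaScript is the safer choice.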
  25. Unicode Support? -- gone too far?
    What will the following say?
    $ swift
    Welcome to Swift version 5.5.2-dev.
    Type :help for assistance.
      1> "\u{3060}\u{3093}" == "\u{305f}\u{3099}\u{3093}" // だん
    $R0: Bool = true
  26. Unicode Support? -- gone too far?
    What will the following say?
    infix operator ===: ComparisonPrecedence
    extension String {
      static func ===(_ lhs:Self, _ rhs:Self)->Bool {
        return lhs.utf8.elementsEqual(rhs.utf8)
      }
    }
    let dan0 = "\u{3060}\u{3093}"
    let dan1 = "\u{305f}\u{3099}\u{3093}"
    dan0 == dan1 // true
    dan0 === dan1 // false
    dan0 === dan0 // true
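For contrast, most languages compare strings the way the `===` operator above does and leave canonical equivalence to an explicit normalization step. A minimal Python 3 sketch with the same two strings, assuming NFC as the normalization form (the talk does not name one):

```python
import unicodedata

dan0 = "\u3060\u3093"        # precomposed だ followed by ん
dan1 = "\u305f\u3099\u3093"  # た + COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK, then ん

# Python's == compares code points, so the two spellings differ.
code_point_equal = dan0 == dan1

# After normalizing both sides to NFC they compare equal, like Swift's ==.
canonical_equal = unicodedata.normalize("NFC", dan0) == unicodedata.normalize("NFC", dan1)

print(code_point_equal, canonical_equal)  # False True
```

Swift applies the second comparison by default, which is the behavior the slide questions.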
  27. Wrap up
    • Perl hasn't changed much
      • Because it didn't have to
      • Doing Unicode right since 5.8
    • Other languages still have some catching up to do
      • PHP: ?
      • Ruby: well done!
      • Python: Kill 2! \X missing
      • JavaScript: use for-of; \X missing
      • Swift: gone too far?