Dan Kogai
August 26, 2023

Perl and the rest of the world - What have(n't) changed in two decades?

Transcript

  1. @dankogai Perl and the rest of the world - What have(n't) changed in two decades?
  2. Table of Contents
    • What has changed: 2003 -> 2023?
    • Perl
    • The rest of the world
    • Data matters more than language
    • Doing Unicode right
  3. What has changed: 2003 -> 2023? Not much for Perl!
    • 🦏 JavaScript: ES3 -> ES2023
    • 🐍 Python: 2.2 -> 3.11
    • 💎 Ruby: 1.8 (no Rails!) -> 3.2
    • 🐘 PHP: 3 (not even 4) -> 7
    • 🐪 Perl: 5.8 -> 5.36
    • Perl6? It is Raku now!
  4. What has changed: 2003 -> 2023? And the rest of the world…
    • 💻 -> 📱
    • 🦏 > 🐪, 🐘, 🐍, 💎, …
    • 32bit -> 64bit
    • SOAP, XML… -> JSON
    • Bunch of legacy encodings -> UTF-8
  5. Unicoding the World: Perl 5.8 (released 2002)
    • One of the first computer languages to harness Unicode
    • use utf8;
    • use Encode;
    • \x{} notation (\u{} in other languages)
    • /./ matches a Unicode code point
    • /\X/ matches a Unicode grapheme
    • /\p{Han}/ matches 漢字
  6. Unicoding the World: What is a character?
    • String is /.*/ but . =
    • [\x00-\xff] # legacy world of bytes
    • [\u0000-\uFFFF] # prematurely modern
    • [\u{0000}-\u{10FFFF}] # correctly modern
  7. Unicoding the World: What is a character?
    • String is /.*/ but . =
    • [\x00-\xff] # Perl < 5.7
    • [\u0000-\uFFFF] # Java(Script)?, Python2, …
    • [\u{0000}-\u{10FFFF}] # Perl, Ruby, Python3, …
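The three ranges on this slide roughly correspond to three ways of counting the same string: in bytes, in 16-bit code units, and in code points. The following Python 3 sketch illustrates the difference. The helper name `count_units` is hypothetical, and UTF-8 is assumed for the "bytes" view (the slide's legacy byte world could be any 8-bit encoding).

```python
def count_units(text):
    """Count one string three ways (hypothetical helper, for illustration only)."""
    return {
        # Byte view; UTF-8 is an assumption made for this sketch.
        "utf8_bytes": len(text.encode("utf-8")),
        # 16-bit view, as seen by languages whose strings are UTF-16 code units.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # Code point view; Python 3 strings are sequences of code points.
        "code_points": len(text),
    }


print(count_units("\U0001F40D"))  # U+1F40D SNAKE
# {'utf8_bytes': 4, 'utf16_code_units': 2, 'code_points': 1}
```

Only the last count agrees with the "correctly modern" range `[\u{0000}-\u{10FFFF}]`.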
  8. Unicode Support? What will the following say?
    $ python2 -c 'print(len("🐍"))'
    2 # unless --enable-unicode=ucs4
    $ python3 -c 'print(len("🐍"))'
    1 # unconditionally. The way it is supposed to be
  9. Unicode Support? What will the following say?
    $ node -e 'console.log("🐍".length)'
    2 # 🤦
    $ node -e 'console.log([..."🐍"].length)'
    1 # 👍
  10. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
  11. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f98f}",
      "\x{1f42a}",
      "\x{1f418}",
      "\x{1f40d}",
      "\x{1f48e}",
      "\x{2699}"
    ];
  12. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
  13. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
    [ '', '', '', '', '', '', '', '', '', '', '⚙' ]
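The eleven matches above come from UTF-16: each of the five emoji lies outside the Basic Multilingual Plane and is stored as two surrogate code units, so a regex without the `u` flag matches each half separately. The arithmetic can be sketched in Python as follows. The function name `to_surrogates` is hypothetical; the formula is the standard UTF-16 surrogate-pair encoding.

```python
def to_surrogates(ch):
    """Return the (high, low) UTF-16 surrogate pair for one supplementary code point."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("BMP code points need no surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogates("\U0001F98F")  # U+1F98F RHINOCEROS
print(f"{high:04X} {low:04X}")  # D83E DD8F
```

Five such pairs plus the single code unit for ⚙ (U+2699) account for the eleven elements.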
  14. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/ug))'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
  15. Unicode Support? What will the following say?
    $ node -e \
      'console.log([..."🦏🐪🐘🐍💎⚙"])'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]
  16. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
  17. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}",
      "\x{1f1f5}",
      "\x{1f1fa}",
      "\x{1f1e6}"
    ];
  18. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J
      "\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P
      "\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U
      "\x{1f1e6}"  # REGIONAL INDICATOR SYMBOL LETTER A
    ];
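The character names in the comments above can be looked up programmatically. As a small sketch, Python's standard `unicodedata` module returns the same names; the helper `describe` is a hypothetical name used only here.

```python
import unicodedata


def describe(text):
    """List each code point of text with its Unicode character name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# The two flags from the slide, written as escapes: J, P, U, A.
flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"
for line in describe(flags):
    print(line)
# U+1F1EF REGIONAL INDICATOR SYMBOL LETTER J
# U+1F1F5 REGIONAL INDICATOR SYMBOL LETTER P
# U+1F1FA REGIONAL INDICATOR SYMBOL LETTER U
# U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A
```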
  19. Unicode Support? What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
      'my@m=("🇯🇵🇺🇦" =~ /(\X)/g); say Dumper([@m])'
    $VAR1 = [
      "\x{1f1ef}\x{1f1f5}",
      "\x{1f1fa}\x{1f1e6}"
    ];
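Python's standard `re` module has no `\X`, so the pairing that Perl performs here can be approximated by hand for this one case. The sketch below implements only the regional-indicator pairing rule and is not a full grapheme cluster segmenter; the function name `flag_clusters` is hypothetical.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_regional_indicator(ch):
    return RI_FIRST <= ord(ch) <= RI_LAST


def flag_clusters(text):
    """Pair consecutive regional indicators; keep every other code point alone.

    Simplified on purpose: combining marks, ZWJ sequences, Hangul, and the
    other grapheme cluster rules are NOT handled.
    """
    clusters = []
    i = 0
    while i < len(text):
        pair = (
            is_regional_indicator(text[i])
            and i + 1 < len(text)
            and is_regional_indicator(text[i + 1])
        )
        step = 2 if pair else 1
        clusters.append(text[i:i + step])
        i += step
    return clusters


print(flag_clusters("\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"))
# ['🇯🇵', '🇺🇦']
```

For real text, a segmenter that follows the full rule set is the safer choice.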
  20. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
  21. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
    [ '🇯', '🇵', '🇺', '🇦' ]
  22. Unicode Support? What will the following say?
    $ node -e \
      'console.log("🇯🇵🇺🇦".match(/(\X)/ug))'
    🙅 [ '🇯🇵','🇺🇦' ]
    🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape
        at [eval]:1:24
        at Script.runInThisContext (node:vm:129:12)
        at Object.runInThisContext (node:vm:305:38)
        at node:internal/process/execution:75:19
        at [eval]-wrapper:6:22
        at evalScript (node:internal/process/execution:74:60)
        at node:internal/main/eval_string:27:3
  23. Unicode Support? FYI: A workaround for modern JS
    $ node -e 'segmenter = new Intl.Segmenter();
      segment = [...segmenter.segment("🇯🇵🇺🇦")].map(v=>v.segment);
      console.log(segment)'
    🙆 [ '🇯🇵','🇺🇦' ]
    // cf. https://developer.mozilla.org/ja/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter
    // unsupported by Firefox (as of Mar 2023)
  24. Unicode Support? Grapheme Cluster
    • Defined in:
      • https://unicode.org/reports/tr29/
    • \X is supported by:
      • 🐘 PHP (via preg_*())
      • 🐪 Perl
      • 💎 Ruby
    • Not yet supported by:
      • 🦏 JavaScript (Intl.Segmenter where available)
      • 🐍 Python (pip install regex?)
        • https://pypi.org/project/regex/
  25. Unicode Support? -- gone too far? What will the following say?
    $ swift
    Welcome to Swift version 5.5.2-dev.
    Type :help for assistance.
    1> "\u{3060}\u{3093}" == "\u{305f}\u{3099}\u{3093}" // だん
    $R0: Bool = true
  26. Unicode Support? -- gone too far? What will the following say?
    infix operator ===: ComparisonPrecedence
    extension String {
      static func ===(_ lhs:Self, _ rhs:Self)->Bool {
        return lhs.utf8.elementsEqual(rhs.utf8)
      }
    }
    let dan0 = "\u{3060}\u{3093}"
    let dan1 = "\u{305f}\u{3099}\u{3093}"
    dan0 == dan1 // true
    dan0 === dan1 // false
    dan0 === dan0 // true
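Swift's `==` treats the two spellings as equal because it compares strings by canonical equivalence, while the custom `===` compares raw UTF-8. For contrast, Python compares code points by default and needs explicit normalization to get the Swift-like answer. A minimal sketch, with `canonically_equal` as a hypothetical helper name:

```python
import unicodedata

dan0 = "\u3060\u3093"        # precomposed DA, then N
dan1 = "\u305f\u3099\u3093"  # TA + combining voiced sound mark, then N


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(dan0 == dan1)                   # False: code point sequences differ
print(canonically_equal(dan0, dan1))  # True: same text after normalization
```

Here plain `==` plays the role of the slide's `===`, and `canonically_equal` plays the role of Swift's built-in `==`.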
  27. Wrap↑
    • Perl hasn't changed much
      • Because it didn't have to
      • Doing Unicode right since 5.8
    • Other languages still have some catching up to do
      • PHP: ?
      • Ruby: well done!
      • Python: Kill 2! \X missing
      • JavaScript: use for-of; \X missing
      • Swift: gone too far?