Perl and the rest of the world - What have(n't) changed in two decades?

Dan Kogai
August 26, 2023

Transcript

  1. @dankogai
    Perl and the rest of the world -
    What have(n't) changed in two
    decades?

  2. Table of Contents
    • What has changed: 2003 -> 2023?
      • Perl
      • The rest of the world
    • Data matters more than language
    • Doing Unicode right

  3. What has changed: 2003 -> 2023?
    Not much for Perl!
    • 🦏 JavaScript: ES3 -> ES2023
    • 🐍 Python: 2.2 -> 3.11
    • 💎 Ruby: 1.8 (no Rails!) -> 3.2
    • 🐘 PHP: 4 (not even 5) -> 8
    • 🐪 Perl: 5.8 -> 5.36
    • Perl6? It is Raku now!

  4. What has changed: 2003 -> 2023?
    And the rest of the world…
    • 💻 -> 📱
    • 🦏 > 🐪, 🐘, 🐍, 💎, …
    • 32-bit -> 64-bit
    • SOAP, XML… -> JSON
    • A bunch of legacy encodings -> UTF-8

  5. Unicoding the World
    Perl 5.8 (released 2002)
    • One of the first programming languages to harness Unicode
    • use utf8;
    • use Encode;
    • \x{} notation (\u{} in other languages)
    • /./ matches a Unicode code point
    • /\X/ matches a Unicode grapheme cluster
    • /\p{Han}/ matches 漢字
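
As a rough Python analogue of what `use Encode;` makes possible (Python is used here only for illustration; the talk's own examples are in Perl), the sketch below moves the same two ideographs between a legacy encoding and UTF-8. Picking Shift_JIS as the legacy encoding is an assumption made for this example.

```python
# Text is a sequence of code points; bytes only appear at the boundaries.
text = "\u6f22\u5b57"  # the two ideographs for "kanji", written with escapes

legacy_bytes = text.encode("shift_jis")  # a legacy Japanese encoding
utf8_bytes = text.encode("utf-8")

# Decoding with the matching codec yields the same code points again.
assert legacy_bytes.decode("shift_jis") == text
assert utf8_bytes.decode("utf-8") == text

print(len(text))          # 2 code points
print(len(legacy_bytes))  # 4 bytes in Shift_JIS
print(len(utf8_bytes))    # 6 bytes in UTF-8
```

The point is the same one the slide makes: once decoding happens at the edges, the program itself only ever deals with code points.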

  6. Unicoding the World
    What is a character?
    • A string is /.*/, but . =
      • [\x00-\xff] # legacy world of bytes
      • [\u0000-\uFFFF] # prematurely modern
      • [\u{0000}-\u{10FFFF}] # correctly modern

  7. Unicoding the World
    What is a character?
    • A string is /.*/, but . =
      • [\x00-\xff] # Perl < 5.7
      • [\u0000-\uFFFF] # Java(Script)?, Python2, …
      • [\u{0000}-\u{10FFFF}] # Perl, Ruby, Python3, …
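
To make the three definitions concrete, here is a minimal Python 3 sketch that measures one emoji under each view. The variable names are illustrative only.

```python
snake = "\U0001F40D"  # U+1F40D SNAKE, a code point above U+FFFF

# Byte view: how many bytes the UTF-8 encoding needs.
utf8_bytes = len(snake.encode("utf-8"))

# 16-bit view: how many UTF-16 code units are needed (2 bytes per unit).
utf16_units = len(snake.encode("utf-16-le")) // 2

# Code point view: what a Python 3 str counts.
code_points = len(snake)

print(utf8_bytes, utf16_units, code_points)  # 4 2 1
```

Only the last view gives the answer of 1 for a single emoji, which is why the slide calls the full U+0000 to U+10FFFF range "correctly modern".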

  8. Unicode Support?
    What will the following say?
    $ python2 -c 'print(len("🐍"))'
    2 # unless --enable-unicode=ucs4
    $ python3 -c 'print(len("🐍"))'
    1 # unconditionally. The way it is supposed to be

  9. Unicode Support?
    What will the following say?
    $ node -e 'console.log("🐍".length)'
    2 # 🤦
    $ node -e 'console.log([..."🐍"].length)'
    1 # 👍

  10. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'

  11. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🦏🐪🐘🐍💎⚙" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
    "\x{1f98f}",
    "\x{1f42a}",
    "\x{1f418}",
    "\x{1f40d}",
    "\x{1f48e}",
    "\x{2699}"
    ];

  12. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'

  13. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/g))'
    [
    '', '', '', '',
    '', '', '', '',
    '', '', '⚙'
    ]
    # each '' is an unpaired surrogate half (unprintable); only ⚙ survives intact
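
The eleven entries follow from how UTF-16 represents code points above U+FFFF. Below is a minimal Python sketch of that arithmetic, assuming the standard surrogate-pair formula; `to_utf16_units` is a hypothetical helper name, not part of any library.

```python
def to_utf16_units(ch):
    """Return the UTF-16 code unit(s) for a single code point."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]  # fits in one 16-bit unit
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)    # high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)   # low (trail) surrogate
    return [high, low]

# The same six symbols as on the slide, written with escapes.
s = "\U0001F98F\U0001F42A\U0001F418\U0001F40D\U0001F48E\u2699"

units = [u for ch in s for u in to_utf16_units(ch)]
print(len(s), len(units))                # 6 11
print([f"{u:04X}" for u in units[:2]])   # ['D83E', 'DD8F']
```

Five of the six symbols need two units each and the gear needs one, hence 5 × 2 + 1 = 11 matches when `.` is applied to code units rather than code points.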

  14. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🦏🐪🐘🐍💎⚙".match(/(.)/ug))'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]

  15. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log([..."🦏🐪🐘🐍💎⚙"])'
    [ '🦏', '🐪', '🐘', '🐍', '💎', '⚙' ]

  16. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'

  17. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
    "\x{1f1ef}",
    "\x{1f1f5}",
    "\x{1f1fa}",
    "\x{1f1e6}"
    ];

  18. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])'
    $VAR1 = [
    "\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J
    "\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P
    "\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U
    "\x{1f1e6}" # REGIONAL INDICATOR SYMBOL LETTER A
    ];
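
The same lookup can be sketched in Python with the standard `unicodedata` module, which reports the official name of each code point. The variable name `flags` is illustrative.

```python
import unicodedata

# The two flags from the slide: four regional indicator code points.
flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"

for ch in flags:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Each flag is therefore two code points, and nothing at the code point level marks where one flag ends and the next begins.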

  19. Unicode Support?
    What will the following say?
    $ perl -Mutf8 -MData::Dumper -E \
    'my@m=("🇯🇵🇺🇦" =~ /(\X)/g); say Dumper([@m])'
    $VAR1 = [
    "\x{1f1ef}\x{1f1f5}",
    "\x{1f1fa}\x{1f1e6}"
    ];
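
For flags specifically, the grouping that \X performs can be approximated by pairing consecutive regional indicators. The Python sketch below implements only that one rule, not the full UAX #29 algorithm, and `split_flags` is a hypothetical helper written for this illustration.

```python
def is_regional_indicator(ch):
    # Regional indicator symbols occupy U+1F1E6 through U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def split_flags(text):
    """Split text into clusters, pairing adjacent regional indicators."""
    clusters = []
    i = 0
    while i < len(text):
        if (is_regional_indicator(text[i])
                and i + 1 < len(text)
                and is_regional_indicator(text[i + 1])):
            clusters.append(text[i:i + 2])  # one flag = two indicators
            i += 2
        else:
            clusters.append(text[i])        # everything else stays single
            i += 1
    return clusters

flags = "\U0001F1EF\U0001F1F5\U0001F1FA\U0001F1E6"
print(len(flags), len(split_flags(flags)))  # 4 2
```

A real grapheme segmenter also has to handle combining marks, emoji sequences, Hangul and more, which is why built-in support for \X matters.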

  20. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🇯🇵🇺🇦".match(/(.)/ug))'

  21. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🇯🇵🇺🇦".match(/(.)/ug))'
    [ '🇯', '🇵', '🇺', '🇦' ]

  22. Unicode Support?
    What will the following say?
    $ node -e \
    'console.log("🇯🇵🇺🇦".match(/(\X)/ug))'
    🙅 [ '🇯🇵','🇺🇦' ]
    🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape
    at [eval]:1:24
    at Script.runInThisContext (node:vm:129:12)
    at Object.runInThisContext (node:vm:305:38)
    at node:internal/process/execution:75:19
    at [eval]-wrapper:6:22
    at evalScript (node:internal/process/execution:74:60)
    at node:internal/main/eval_string:27:3

  23. 🤦

  24. Unicode Support?
    FYI: A workaround for modern JS
    $ node -e 'segmenter = new Intl.Segmenter(); segment =
    [...segmenter.segment("🇯🇵🇺🇦")].map(v=>v.segment);
    console.log(segment)'
    🙆 [ '🇯🇵','🇺🇦' ]
    // cf. https://developer.mozilla.org/ja/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter/Segmenter
    // unsupported by Firefox (as of Mar 2023)

  25. Unicode Support?
    Grapheme Cluster
    • Defined in:
      • https://unicode.org/reports/tr29/
    • \X is supported by:
      • 🐘 PHP (via preg_*())
      • 🐪 Perl
      • 💎 Ruby
    • Not yet supported by:
      • 🦏 JavaScript (Intl.Segmenter where available)
      • 🐍 Python (pip install regex?)
        • https://pypi.org/project/regex/

  26. Unicode Support? -- gone too far?
    What will the following say?
    $ swift
    Welcome to Swift version 5.5.2-dev.
    Type :help for assistance.
    1> "\u{3060}\u{3093}" == "\u{305f}\u{3099}\u{3093}" // だん
    $R0: Bool = true

  27. Unicode Support? -- gone too far?
    What will the following say?
    infix operator ===: ComparisonPrecedence
    extension String {
    static func ===(_ lhs:Self, _ rhs:Self)->Bool {
    return lhs.utf8.elementsEqual(rhs.utf8)
    }
    }
    let dan0 = "\u{3060}\u{3093}"
    let dan1 = "\u{305f}\u{3099}\u{3093}"
    dan0 == dan1 // true
    dan0 === dan1 // false
    dan0 === dan0 // true
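
For contrast, here is a Python sketch of the same comparison. Python's `==` compares code points, so canonical equivalence has to be requested explicitly through normalization. NFC is chosen here as an assumption; NFD would serve equally well for the comparison.

```python
import unicodedata

dan0 = "\u3060\u3093"         # precomposed DA followed by N
dan1 = "\u305f\u3099\u3093"   # TA + combining voiced sound mark, then N

# Plain equality looks at code points, so these differ.
print(dan0 == dan1)  # False

# After normalization both spellings become the same sequence.
nfc0 = unicodedata.normalize("NFC", dan0)
nfc1 = unicodedata.normalize("NFC", dan1)
print(nfc0 == nfc1)  # True
```

Swift applies canonical equivalence by default and needs a custom operator to compare the raw encoding, while Python does the reverse.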

  28. Wrap↑
    • Perl hasn't changed much
      • Because it didn't have to
      • Doing Unicode right since 5.8
    • Other languages still have some catching up to do
      • PHP: ?
      • Ruby: well done!
      • Python: Kill 2! \X missing
      • JavaScript: use for-of; \X missing
      • Swift: gone too far?

  29. Thank you
    🙇

  30. Questions and answers
    answer($_) foreach (/($questions)/sg);
