the first computer languages to harness Unicode • use utf8; • use Encode; • \x{} notation (\u{} in other languages) • /./ matches Unicode codepoint • /\X/ matches Unicode grapheme • /\p{Han}/ matches ࣈ
-MData::Dumper -E \ 'my@m=("🇯🇵🇺🇦" =~ /(.)/g); say Dumper([@m])' $VAR1 = [ "\x{1f1ef}", # REGIONAL INDICATOR SYMBOL LETTER J "\x{1f1f5}", # REGIONAL INDICATOR SYMBOL LETTER P "\x{1f1fa}", # REGIONAL INDICATOR SYMBOL LETTER U "\x{1f1e6}" # REGIONAL INDICATOR SYMBOL LETTER A ];
\ 'console.log("🇯🇵🇺🇦".match(/(\X)/ug))' 🙅 [ '🇯🇵','🇺🇦' ] 🙆 SyntaxError: Invalid regular expression: /(\X)/: Invalid escape at [eval]:1:24 at Script.runInThisContext (node:vm:129:12) at Object.runInThisContext (node:vm:305:38) at node:internal/process/execution:75:19 at [eval]-wrapper:6:22 at evalScript (node:internal/process/execution:74:60) at node:internal/main/eval_string:27:3
have to • Doing Unicode right since 5.8 • Other languages need some more catching up to do • PHP: ? • Ruby: well done! • Python: Kill 2! \X missing • JavaScript: use for-of; \X missing • Swift: gone too far?