intl me this, intl me that

Quick, what's the proper way to show a number as currency for India? 1.2 billion people would like to know. No pressure. These common localization issues come up more and more frequently. Learn to use the improved intl extension to order lists, format numbers, split text into pieces and show calendars just like a local would. It even helps with character sets and time zones - 2 out of 3 things that commonly break the Internet. ¡Muy bueno!

Andrei Zmievski

February 22, 2014

Transcript

  1. intl me this, intl me that • Andrei Zmievski, AppDynamics • PHP UK ~ February 22, 2014 ~ London
  2. me • Software architect at AppDynamics • PHP Core contributor

    (1999-2010) • Architect of the Unicode/i18n in PHP 6 • Twitter: @a • Beer lover (and brewer)
  3. unicode 7

  4. terms • Internationalization (i18n) • to design and develop an

    application without built-in cultural assumptions that is efficient to localize • Localization (l10n) • to tailor an application to meet the needs of a particular region, market, or culture
  5. no assumptions • English/French/Chinese is just another language • Your

    country is just another country • Earth is just another planet (eventually)
  7. why localize? • English speakers are now a minority on

    WWW • Nearly 3 out of 4 participants surveyed by Common Sense Advisory agreed that they were more likely to buy from sites in their own languages than in English • Global consumers will pay more for products with information in their language
  8. locale • identifier referring to linguistic and cultural preferences of a user community • language • script • country • variant • @keywords • examples: sr_Latn_YU_REVISED@currency=USD, en_GB
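The components named on this slide can be pulled out of an identifier with the `Locale` class. A minimal sketch, assuming the intl extension is loaded; the identifier is the one from the slide and the variable names are only illustrative.

```php
<?php
// Sketch only: assumes ext/intl is available. Variable names are illustrative.
$id = 'sr_Latn_YU_REVISED@currency=USD';

$language = Locale::getPrimaryLanguage($id); // language subtag
$script   = Locale::getScript($id);          // script subtag
$region   = Locale::getRegion($id);          // country/region subtag
$variants = Locale::getAllVariants($id);     // array of variant subtags
$keywords = Locale::getKeywords($id);        // array of keyword => value

printf("%s / %s / %s\n", $language, $script, $region);
print_r($variants);
print_r($keywords);
```

Each accessor reads one part of the identifier, so application code does not have to split the string by hand.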
  9. locale data • Common Locale Data Repository (CLDR) • 740

    locales: 238 languages and 259 territories • updated regularly
  10. intl • available since PHP 5.3 • bundled locale data

    • formatters/parsers • collation (sorting) • calendars and timezones • boundary iteration • transliteration • resource bundles • character set conversion • spoof checking
  11. API • OO and procedural API • Same underlying implementation

      procedural                 OO
      collator_create()          new Collator()
      collator_set_strength()    $collator->setStrength()
      numfmt_format()            NumberFormatter::format()
  12. UConverter

  13. purpose • robust conversion between character encodings • replacement for

    mb_convert_encoding()
  16. simple

      echo UConverter::transcode("Cura\xe7ao", "utf-8", "iso-8859-1");
      // Curaçao
      echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8");
      // Curaao
      echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8",
          array("to_subst" => "×"));
      // Cura×ao
  17. callbacks

      class MyConverter extends UConverter {
          public function fromUCallback($reason, $source, $codepoint, &$error) {
              if (($reason == UConverter::REASON_UNASSIGNED)
                  && ($codepoint == 0x221A)) {
                  // translate √ to sqrt
                  $error = U_ZERO_ERROR;
                  return 'square root of ';
              }
          }
      }

      $c = new MyConverter('ascii', 'utf-8');
      echo $c->convert("What is √2?");
      // What is square root of 2?
  18. Collator

  19. sorting • languages may sort more than one way •

    traditional vs. modern Spanish • Japanese stroke-radical vs. radical-stroke • German dictionary vs. phone book
  20. collation levels

      primary       base characters
      secondary     accents and language quirks
      tertiary      case and variants of base forms
      quaternary    you will never use this
      identical     tie-breaker
  21. collation levels • Each locale has default level setting •

    Differences in lower levels are ignored if higher levels are already different
  22. comparing strings

      $coll = new Collator("fr_FR");
      if ($coll->compare("côte", "coté") < 0) {
          echo "before";
      } else {
          echo "after";
      }
      // before (côte < coté)
  23. strength control

      $coll = new Collator("fr_FR");
      $coll->setStrength(Collator::PRIMARY);
      if ($coll->compare("côte", "coté") == 0) {
          echo "same";
      } else {
          echo "different";
      }
      // same (côte = coté)
  24. sorting strings

      $strings = array(
          "cote", "côte", "Côte", "coté", "Coté", "côté", "Côté", "coter");
      $coll = new Collator("fr_FR");
      $coll->sort($strings);
      // cote côte Côte coté Coté côté Côté coter
  25. other attributes

      $coll = new Collator("en_US");
      $coll->setAttribute(Collator::CASE_FIRST, Collator::UPPER_FIRST);
      if ($coll->compare("abc", "ABC") < 0) {
          echo "before";
      } else {
          echo "after";
      }
      // Prints "before" or "after"; the slide's notes read "ABC < abc" and "before".
  26. numeric collation

      $strings = array("10", "1", "2");
      $coll = new Collator(null);
      // numeric collation is an attribute, not a strength
      $coll->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
      $coll->sort($strings);
      // 1 < 2 < 10
  27. NumberFormatter

  28. purpose • formats numbers as strings according to the locale,

    given pattern or set of rules • parses strings into numbers according to these patterns • replacement for number_format()
  29. formatter styles (1234.567 in fr_FR)

      NumberFormatter::PATTERN_DECIMAL    1234,567 (with ##.##)
      NumberFormatter::DECIMAL            1 234,56
      NumberFormatter::CURRENCY           1 234,57 €
      NumberFormatter::PERCENT            123 457 %
  30. formatter styles (1234.567 in fr_FR)

      NumberFormatter::SCIENTIFIC    1,234567E3
      NumberFormatter::SPELLOUT      mille deux cent trente-quatre virgule cinq six sept
      NumberFormatter::ORDINAL       1 235e
      NumberFormatter::DURATION      1 235
  31. formatting

      $fmt = new NumberFormatter('en_GB', NumberFormatter::DECIMAL);
      echo $fmt->format(1234);
      // 1,234

      $fmt = new NumberFormatter('de_CH', NumberFormatter::CURRENCY);
      echo $fmt->formatCurrency(1234, 'CNY');
      // CN¥ 1'234.00
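The abstract asks how to show a number as currency for India. A minimal sketch of one way to do it with the same `formatCurrency()` call, assuming the intl extension is loaded; the amount is made up, and the exact currency symbol and spacing depend on the ICU data version installed.

```php
<?php
// Sketch only: assumes ext/intl is available; the amount is made up.
$amount = 1234567.89;

// en_IN uses Indian digit grouping (lakh/crore), unlike en_US.
$india = new NumberFormatter('en_IN', NumberFormatter::CURRENCY);
$inr = $india->formatCurrency($amount, 'INR');

$us = new NumberFormatter('en_US', NumberFormatter::CURRENCY);
$usd = $us->formatCurrency($amount, 'USD');

echo $inr, "\n"; // digits grouped as 12,34,567.89; symbol varies with ICU version
echo $usd, "\n"; // digits grouped as 1,234,567.89
```

Only the locale and currency code change between the two calls; the grouping rules come from the bundled locale data.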
  32. parsing

      $fmt = new NumberFormatter('in_IN', NumberFormatter::DECIMAL);
      var_dump($fmt->parse('7.005.944', NumberFormatter::TYPE_INT32));
      // int(7005944)

  33. MessageFormatter

  34. purpose • produces concatenated messages in a language- neutral way

    • operates on patterns, which contain sub formats • program does not need to know the order of fragments
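The point about fragment order can be seen by formatting the same arguments with two patterns. A minimal sketch, assuming the intl extension is loaded; both patterns and the names are illustrative and not taken from the talk.

```php
<?php
// Sketch only: assumes ext/intl is available. The patterns are illustrative,
// not taken from the talk.
$args = array('Ann', 'Bob');

// Same arguments, two patterns that place them in a different order.
$patternA = '{0} sent a message to {1}.';
$patternB = '{1} received a message from {0}.';

$a = MessageFormatter::formatMessage('en_US', $patternA, $args);
$b = MessageFormatter::formatMessage('en_US', $patternB, $args);

echo $a, "\n"; // Ann sent a message to Bob.
echo $b, "\n"; // Bob received a message from Ann.
```

The calling code passes the arguments once; a translator is free to reorder the placeholders inside the pattern.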
  35. messages

      old way:
          echo "Today is ", date("F d, Y");

      intl way:
          pattern    Today is {0,date}.
          args       array(time())

      // Today is February 22, 2014.
  36. formatting

      $pattern = "On {0,date} you have {1,number} meetings.";
      $args = array(time(), 2);
      $fmt = new MessageFormatter("en_US", $pattern);
      echo $fmt->format($args);
      // On February 22, 2014 you have 2 meetings.
  37. formatting

      $pattern = "On {0,date,short} your balance was {1,number,currency}.";
      $args = array(time(), 184.22);
      $fmt = new MessageFormatter("en_GB", $pattern);
      echo $fmt->format($args);
      // On 22/02/14 your balance was £184.22.
  38. formatting

      $fr_pattern = "Aujourd'hui, {2,date,dd MMMM}, il y a {0,number} personnes sur {1}.";
      $fr_args = array(7213518802, "la Terre", time());

      $msg = new MessageFormatter("fr_FR", $fr_pattern);
      echo $msg->format($fr_args);
      // Aujourd'hui, 22 février, il y a 7 213 518 802 personnes sur la Terre.
  39. parsing messages

      $pattern = "On {0,date} you have {1,number} meetings.";
      $text = "On February 22, 2014 you have 33 meetings.";
      $msg = new MessageFormatter("en_US", $pattern);
      var_dump($msg->parse($text));
      // array(2) { [0]=> int(1393056000) [1]=> int(33) }
  40. plural selection

      $pattern = "There {0,plural, =0{are no results} =1{is # result} other{are # results}} found.";
      $fmt = new MessageFormatter("en_GB", $pattern);
      echo $fmt->format(array(0));
      // There are no results found.
      echo $fmt->format(array(12));
      // There are 12 results found.
  41. Break Iterators

  42. purpose • locate linguistic boundaries • supported units • characters

    • words • lines • sentences • more complex ones are possible with custom rules
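Of the supported units, words are probably the most common one to need. A minimal sketch of word iteration, assuming the intl extension is loaded (the `IntlBreakIterator` class requires PHP 5.5 or later); the sample text is made up, and skipping segments by rule status is one possible way to drop spaces and punctuation.

```php
<?php
// Sketch only: assumes ext/intl is available (IntlBreakIterator needs PHP 5.5+).
$text = 'Hello, world!';

$bi = IntlBreakIterator::createWordInstance('en');
$bi->setText($text);

$words = array();
foreach ($bi->getPartsIterator() as $part) {
    // The rule status describes the segment that just ended; WORD_NONE marks
    // spaces and punctuation, which are skipped here.
    if ($bi->getRuleStatus() !== IntlBreakIterator::WORD_NONE) {
        $words[] = $part;
    }
}

print_r($words);
```

Unlike splitting on spaces, this leaves the boundary decisions to the locale rules, which matters for scripts that do not separate words with whitespace.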
  43. sentences

      $text = <<<END
      She asked, "Are you from U.K.?"
      John Smith Sr. nodded.
      END;

      $bi = IntlBreakIterator::createSentenceInstance("en");
      $bi->setText($text);
      foreach ($bi->getPartsIterator() as $part)
          echo "** ", $part, "\n";
      // ** She asked, "Are you from U.K.?"
      // ** John Smith Sr. nodded.
  44. lines

      $lineBI = IntlBreakIterator::createLineInstance("en");
      $lineBI->setText($text);
      foreach ($lineBI->getPartsIterator() as $part)
          echo $part, "\n";
      // one piece per line:
      // She / asked, / "Are / you / from / U.K.?" / John / Smith / Sr. / nodded.
  45. lines

      $offset = 39;
      $lineBI->first();
      echo substr($text, 0, $lineBI->next()), "*";
      // She *
      echo substr($text, 0, $lineBI->next()), "*";
      // She asked, *
      echo substr($text, 0, $lineBI->preceding($offset)), "*";
      // She asked, "Are you from U.K.?" John *
  46. Resource Bundles

  47. purpose • contain resources for localization • messages, labels, formatting

    patterns, etc • accessed via locale-independent interface • fallback mechanism is key
  48. data hierarchy

      root        root
      language    en, es, ja, zh
      script      Hans, Hant
      country     US, ES, MX, JP, CN, HK
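The hierarchy above is what the fallback mechanism walks: a lookup that misses at the most specific level retries one level up until it reaches root. A rough sketch of that order in plain PHP follows; `fallbackChain()` is a hypothetical helper written for illustration, not an intl function, and ICU's actual resolution may take additional steps.

```php
<?php
// Sketch only: a hypothetical helper, not part of intl. It approximates the
// lookup order by truncating the identifier one subtag at a time.
function fallbackChain($locale)
{
    $chain = array();
    $parts = explode('_', $locale);
    while (count($parts) > 0) {
        $chain[] = implode('_', $parts);
        array_pop($parts);
    }
    $chain[] = 'root'; // the root bundle is always the last resort
    return $chain;
}

print_r(fallbackChain('zh_Hant_HK')); // zh_Hant_HK, zh_Hant, zh, root
print_r(fallbackChain('en_GB'));      // en_GB, en, root
```

This is why a bundle only needs to override the entries that differ from its parent.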
  49. data format • simple resources • string, integer, binary data,

    integer array • complex resources • arrays and tables
  50. root.txt

      root {
          version:string { "1.0.0" }

          mainTitle { "Welcome to our store!" }
          errors:array {
              :string { "Website is experiencing difficulties" }
              :string { "Maximum of {0,number,integer} "
                        "products are allowed" }
          }
          sizes:intvector { 10, 100 }
      }
  51. en_GB.txt

      en_GB {
          version { "1.0.1" }

          mainTitle:string { "Welcome to our olde shoppe!" }
          sizes:intvector { 25, 250 }
      }
  52. compiling

      % mkdir myres
      % genrb -d myres root.txt en.txt en_GB.txt
      genrb number of files: 3
      % ls myres
      en.res en_GB.res root.res
  53. retrieval • root

      $bundle = dirname(__FILE__).'/myres';
      $r = ResourceBundle::create('root', $bundle);
      echo $r['mainTitle'];
      // Welcome to our store!
      echo $r['errors'][1];
      // Maximum of {0,number,integer} products are allowed
      print_r($r['sizes']);
      // Array ( [0] => 10 [1] => 100 )
  54. retrieval • en_GB

      $bundle = dirname(__FILE__).'/myres';
      $r = ResourceBundle::create('en_GB', $bundle);
      echo $r['mainTitle'];
      // Welcome to our olde shoppe!
      echo $r['errors'][1];
      // Maximum of {0,number,integer} products are allowed
      print_r($r['sizes']);
      // Array ( [0] => 25 [1] => 250 )
  55. retrieval • de

      $bundle = dirname(__FILE__).'/myres';
      $r = ResourceBundle::create('de', $bundle);
      echo $r['mainTitle'];
      // Welcome to our store!
      echo $r['errors'][1];
      // Maximum of {0,number,integer} products are allowed
      print_r($r['sizes']);
      // Array ( [0] => 10 [1] => 100 )
  56. Spoof Checking

  57. paypaI.com You received a large payment. Click here to receive:

  58. purpose • prevent certain classes of security attacks • check

    identifiers (typically URLs) for visual confusion • single script • mixed script • whole script
  60. single script

      $url1 = "google.com";
      $url2 = "goog1e.com";

      $spoof = new SpoofChecker();
      if ($spoof->areConfusable($url1, $url2))
          echo "$url1 and $url2 are confusable\n";
      // google.com and goog1e.com are confusable
  62. mixed script

      $url1 = "yahoo.com";
      $url2 = "yahоo.com";

      $spoof = new SpoofChecker();
      if ($spoof->areConfusable($url1, $url2))
          echo "$url1 and $url2 are confusable\n";
      // yahoo.com and yahоo.com are confusable
  63. suspicious

      $word = "Норе";
      $spoof->setAllowedLocales("en_US");
      if ($spoof->isSuspicious($word))
          echo "$word is suspicious in en_US";
      else
          echo "not suspicious";
      // Норе is suspicious in en_US
  64. suspicious

      $word = "Норе";
      $spoof->setAllowedLocales("en_US,ru_RU");
      if ($spoof->isSuspicious($word))
          echo "$word is suspicious in en_US,ru_RU";
      else
          echo "not suspicious";
      // not suspicious
  65. Transliterator

  66. purpose • originally used for script transliteration • much more

    general transform mechanism, including: • case • normalization • full/half-width • hex/character names
  67. transliteration IDs source-target/variant

  68. transliteration IDs Any-target/variant

  69. sample IDs • Katakana-Latin • Latin-ASCII • NFD • Any-Hex/XML

  70. script conversion

      $tr = Transliterator::create("Any-Latin");
      $sign = 'マックドナルド';
      echo $latin = $tr->transliterate($sign);
      // makkudonarudo
      $tr = Transliterator::create("Latin-Katakana");
      var_dump($tr->transliterate($latin) == $sign);
  72. script conversion

      $tr = Transliterator::create("Cyrillic-Latin");
      echo $tr->transliterate('я в избушке сижу опять');
      // â v izbuške sižu opâtʹ

      $tr = Transliterator::create("Russian-Latin/BGN");
      echo $tr->transliterate('я в избушке сижу опять');
      // ya v izbushke sizhu opyatʹ
  73. Any-Name

      $tr = Transliterator::create("Any-Name");
      echo $tr->transliterate('я$');
      // \N{CYRILLIC SMALL LETTER YA}\N{DOLLAR SIGN}
  74. Latin-ASCII

      $tr = Transliterator::create("Latin-ASCII");
      echo $tr->transliterate("© 1990 «PHP»");
      // (C) 1990 <<PHP>>
  76. compound IDs

      $tr = Transliterator::create("Greek-Latin");
      echo $tr->transliterate("Αλφαβητικός Κατάλογος");
      // Alphabētikós Katálogos

      $tr = Transliterator::create(
          "Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC");
      echo $tr->transliterate("Αλφαβητικός Κατάλογος");
      // Alphabetikos Katalogos
  77. rule-based transforms

      $rules = <<<'RULES'
      $space = ' ' ;
      $space {$space} > ; # collapse multiple spaces
      '--' <> — ; # convert fake dash into real one
      RULES;
      $tr = Transliterator::createFromRules($rules);
      echo $tr->transliterate("a very spacey -- and delimited -- remark");
      // a very spacey — and delimited — remark
  78. • pecl.php.net/intl • php.net/intl • cldr.unicode.org • userguide.icu-project.org

  79. спасибо thank you merci þakka þér ありがとう