intl me this, intl me that

Quick, what's the proper way to show a number as currency for India? 1.2 billion people would like to know. No pressure. These common localization issues come up more and more frequently. Learn to use the improved intl extension to order lists, format numbers, split text into pieces and show calendars just like a local would. It even helps with character sets and time zones - 2 out of 3 things that commonly break the Internet. ¡Muy bueno!

Andrei Zmievski

February 22, 2014
Transcript

  1. me • Software architect at AppDynamics • PHP Core contributor (1999-2010) • Architect of the Unicode/i18n in PHP 6 • Twitter: @a • Beer lover (and brewer)
  2. terms • Internationalization (i18n): to design and develop an application without built-in cultural assumptions that is efficient to localize • Localization (l10n): to tailor an application to meet the needs of a particular region, market, or culture
  3. no assumptions • English/French/Chinese is just another language • Your country is just another country • Earth is just another planet (eventually)
  4. why localize? • English speakers are now a minority on the WWW • Nearly 3 out of 4 participants surveyed by Common Sense Advisory agreed that they were more likely to buy from sites in their own languages than in English • Global consumers will pay more for products with information in their language
  5. locale • identifier referring to linguistic and cultural preferences of a user community • language • script • country • variant • @keywords • examples: sr_Latn_YU_REVISED@currency=USD, en_GB
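A minimal sketch of how the parts of a locale identifier can be read back with the `Locale` class from intl. The identifier `sr_Latn_RS@currency=USD` is a hypothetical example chosen for illustration, not one taken from the slides.

```php
<?php
// Hypothetical identifier: language "sr", script "Latn", region "RS",
// plus one keyword that overrides the currency.
$id = 'sr_Latn_RS@currency=USD';

$language = Locale::getPrimaryLanguage($id);
$script   = Locale::getScript($id);
$region   = Locale::getRegion($id);
$keywords = Locale::getKeywords($id); // associative array of keyword => value

echo "$language $script $region ", $keywords['currency'], "\n";
```

Each getter only inspects the identifier string, so this works even when no locale data exists for that combination.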
  6. locale data • Common Locale Data Repository (CLDR) • 740 locales: 238 languages and 259 territories • updated regularly
  7. intl • available since PHP 5.3 • bundled locale data • formatters/parsers • collation (sorting) • calendars and timezones • boundary iteration • transliteration • resource bundles • character set conversion • spoof checking
  8. API • OO and procedural API • Same underlying implementation • collator_create() / new Collator() • collator_set_strength() / $collator->setStrength() • numfmt_format() / NumberFormatter::format()
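To illustrate that the two API styles share one implementation, here is a small sketch that runs the same comparison through both. The sample words and the `en_US` locale are assumptions made for the example.

```php
<?php
$words = array('pear', 'apple');

// Object-oriented style.
$collator = new Collator('en_US');
$oo = $collator->compare($words[0], $words[1]);

// Procedural style: the collator handle is passed as the first argument.
$handle = collator_create('en_US');
$procedural = collator_compare($handle, $words[0], $words[1]);

// Both calls go through the same code, so the results should match.
var_dump($oo === $procedural);
```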
  11. simple

        echo UConverter::transcode("Cura\xe7ao", "utf-8", "iso-8859-1");
        echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8");
        echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8",
            array("to_subst" => "×"));

    Output: Curaçao, Curaao, Cura×ao
  12. callbacks

        class MyConverter extends UConverter {
            public function fromUCallback($reason, $source, $codepoint, &$error) {
                if (($reason == UConverter::REASON_UNASSIGNED) && ($codepoint == 0x221A)) {
                    // translate √ to sqrt
                    $error = U_ZERO_ERROR;
                    return 'square root of ';
                }
            }
        }
        $c = new MyConverter('ascii', 'utf-8');
        echo $c->convert("What is √2?");

    Output: What is square root of 2?
  13. sorting • languages may sort more than one way • traditional vs. modern Spanish • Japanese stroke-radical vs. radical-stroke • German dictionary vs. phone book
  14. collation levels • primary: base characters • secondary: accents and language quirks • tertiary: case and variants of base forms • quaternary: you will never use this • identical: tie-breaker
  15. collation levels • Each locale has a default level setting • Differences in lower levels are ignored if higher levels are already different
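The levels can be tried out by changing the collator strength. This is a sketch under the assumption of an `en_US` collator and a source file saved as UTF-8; the sample strings are illustrative only.

```php
<?php
$coll = new Collator('en_US');

// Primary: only base characters count, so accents and case are ignored.
$coll->setStrength(Collator::PRIMARY);
$primary = $coll->compare('cote', 'Côte');

// Secondary: accents now matter, case still does not.
$coll->setStrength(Collator::SECONDARY);
$secondaryAccent = $coll->compare('cote', 'côte');
$secondaryCase   = $coll->compare('cote', 'Cote');

// Tertiary: case differences are taken into account as well.
$coll->setStrength(Collator::TERTIARY);
$tertiaryCase = $coll->compare('cote', 'Cote');

var_dump($primary, $secondaryAccent, $secondaryCase, $tertiaryCase);
```

A result of 0 means the two strings are considered equal at that strength.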
  16. comparing strings • côte < coté

        $coll = new Collator("fr_FR");
        if ($coll->compare("côte", "coté") < 0) {
            echo "before";
        } else {
            echo "after";
        }

    Output: before
  17. sorting strings

        $strings = array(
            "cote", "côte", "Côte", "coté",
            "Coté", "côté", "Côté", "coter");
        $coll = new Collator("fr_FR");
        $coll->sort($strings);

    Result: cote côte Côte coté Coté côté Côté coter
  18. numeric collation • 1 < 2 < 10

        $strings = array("10", "1", "2");
        // create the collator first, then switch on numeric ordering
        $coll = new Collator(null);
        $coll->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
        $coll->sort($strings);
  19. purpose • formats numbers as strings according to the locale, given pattern or set of rules • parses strings into numbers according to these patterns • replacement for number_format()
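A short sketch of both directions, formatting and parsing, with the same formatter. The `de_DE` locale and the sample value are assumptions chosen for the example.

```php
<?php
$fmt = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);

// Number to string: German uses "." for grouping and "," for decimals.
$text = $fmt->format(1234.5);

// String back to number, using the same locale rules.
$number = $fmt->parse($text);

echo $text, "\n";
var_dump($number);
```

Unlike number_format(), the separators come from the locale data and are not passed in by the caller.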
  20. formatter styles (1234.567 in fr_FR) • NumberFormatter::PATTERN_DECIMAL: 1234,567 (with ##.##) • NumberFormatter::DECIMAL: 1 234,56 • NumberFormatter::CURRENCY: 1 234,57 € • NumberFormatter::PERCENT: 123 457 %
  21. formatter styles (1234.567 in fr_FR) • NumberFormatter::SCIENTIFIC: 1,234567E3 • NumberFormatter::SPELLOUT: mille deux cent trente-quatre virgule cinq six sept • NumberFormatter::ORDINAL: 1 235e • NumberFormatter::DURATION: 1 235
  22. formatting

        $fmt = new NumberFormatter('en_GB', NumberFormatter::DECIMAL);
        $fmt->format(1234);

        $fmt = new NumberFormatter('de_CH', NumberFormatter::CURRENCY);
        $fmt->formatCurrency(1234, 'CNY');

    Results: 1,234 and CN¥ 1'234.00
  23. purpose • produces concatenated messages in a language-neutral way • operates on patterns, which contain sub formats • program does not need to know the order of fragments
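The point about fragment order can be sketched as follows: the calling code passes the same arguments, and only the pattern decides where they land. The names and both patterns are made up for the example.

```php
<?php
$args = array('Ann', 'Bob');

// Same arguments, two patterns that place them in a different order.
$first  = MessageFormatter::formatMessage('en_US', '{0} wrote to {1}.', $args);
$second = MessageFormatter::formatMessage('en_US', '{1} got a letter from {0}.', $args);

echo $first, "\n", $second, "\n";
```

A translator can therefore reorder the placeholders in the pattern without any change to the program.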
  24. messages • output: Today is February 22, 2014. • old way: echo "Today is ", date("F d, Y"); • intl way: pattern "Today is {0,date}." with args array(time())
  25. formatting

        $pattern = "On {0,date} you have {1,number} meetings.";
        $args = array(time(), 2);
        $fmt = new MessageFormatter("en_US", $pattern);
        echo $fmt->format($args);

    Output: On February 22, 2014 you have 2 meetings.
  26. formatting

        $pattern = "On {0,date,short} your balance was {1,number,currency}.";
        $args = array(time(), 184.22);
        $fmt = new MessageFormatter("en_GB", $pattern);
        echo $fmt->format($args);

    Output: On 22/02/14 your balance was £184.22.
  27. formatting

        $fr_pattern = "Aujourd'hui, {2,date,dd MMMM}, il y a {0,number} personnes sur {1}.";
        $fr_args = array(7213518802, "la Terre", time());

        $msg = new MessageFormatter("fr_FR", $fr_pattern);
        echo $msg->format($fr_args);

    Output: Aujourd'hui, 22 février, il y a 7 213 518 802 personnes sur la Terre.
  28. parsing messages

        $pattern = "On {0,date} you have {1,number} meetings.";
        $text = "On February 22, 2014 you have 33 meetings.";
        $msg = new MessageFormatter("en_US", $pattern);
        var_dump($msg->parse($text));

    Output: array(2) { [0]=> int(1393056000) [1]=> int(33) }
  29. plural selection

        $pattern = "There {0,plural, =0{are no results} =1{is # result} other{are # results}} found.";
        $fmt = new MessageFormatter("en_GB", $pattern);
        echo $fmt->format(array(0));
        echo $fmt->format(array(12));

    Output: There are no results found. There are 12 results found.
  30. purpose • locate linguistic boundaries • supported units • characters • words • lines • sentences • more complex ones are possible with custom rules
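As a sketch of one of the supported units, the word iterator can split a string into segments. The sample text is an assumption, and the filtering step is one possible way to drop the punctuation and whitespace segments that the iterator also returns.

```php
<?php
$text = 'Hello, wide world!';

$bi = IntlBreakIterator::createWordInstance('en');
$bi->setText($text);

// Every segment between two boundaries, including spaces and punctuation.
$parts = iterator_to_array($bi->getPartsIterator(), false);

// Keep only the segments that contain at least one letter or digit.
$words = array_values(array_filter($parts, function ($part) {
    return preg_match('/[\p{L}\p{N}]/u', $part) === 1;
}));

print_r($words);
```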
  31. sentences

        $text = <<<END
        She asked, "Are you from U.K.?" John Smith Sr. nodded.
        END;

        $bi = IntlBreakIterator::createSentenceInstance("en");
        $bi->setText($text);
        foreach ($bi->getPartsIterator() as $part)
            echo "** ", $part, "\n";

    Output:
    ** She asked, "Are you from U.K.?"
    ** John Smith Sr. nodded.
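  32. lines

        // $lineBI is not defined on the slide; it is assumed to be a line
        // break iterator over the same $text as on the previous slide
        $offset = 39;
        $lineBI->first();
        echo substr($text, 0, $lineBI->next()),".";
        echo substr($text, 0, $lineBI->next()),".";
        echo substr($text, 0, $lineBI->preceding($offset)),"*";

    Output as shown on the slide:
    She *
    She asked, *
    She asked, "Are you from U.K.?" John *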
  33. purpose • contain resources for localization • messages, labels, formatting patterns, etc • accessed via locale-independent interface • fallback mechanism is key
  34. data hierarchy • root: root • language: en, es, ja, zh • script: Hans, Hant • country: US, ES, MX, JP, CN, HK
  35. data format • simple resources: string, integer, binary data, integer array • complex resources: arrays and tables
  36. root.txt

        root {
            version:string { "1.0.0" }

            mainTitle { "Welcome to our store!" }
            errors:array {
                :string { "Website is experiencing difficulties" }
                :string { "Maximum of {0,number,integer} "
                          "products are allowed" }
            }
            sizes:intvector { 10, 100 }
        }
  37. en_GB.txt

        en_GB {
            version { "1.0.1" }

            mainTitle:string { "Welcome to our olde shoppe!" }
            sizes:intvector { 25, 250 }
        }
  38. compiling

        % mkdir myres
        % genrb -d myres root.txt en.txt en_GB.txt
        genrb number of files: 3
        % ls myres
        en.res en_GB.res root.res
  39. retrieval • root

        $bundle = dirname(__FILE__).'/myres';
        $r = ResourceBundle::create('root', $bundle);
        echo $r['mainTitle'];
        echo $r['errors'][1];
        print_r($r['sizes']);

    Output: Welcome to our store! Maximum of {0,number,integer} products are allowed Array ( [0] => 10 [1] => 100 )
  40. retrieval • en_GB

        $bundle = dirname(__FILE__).'/myres';
        $r = ResourceBundle::create('en_GB', $bundle);
        echo $r['mainTitle'];
        echo $r['errors'][1];
        print_r($r['sizes']);

    Output: Welcome to our olde shoppe! Maximum of {0,number,integer} products are allowed Array ( [0] => 25 [1] => 250 )
  41. retrieval • de

        $bundle = dirname(__FILE__).'/myres';
        $r = ResourceBundle::create('de', $bundle);
        echo $r['mainTitle'];
        echo $r['errors'][1];
        print_r($r['sizes']);

    Output: Welcome to our store! Maximum of {0,number,integer} products are allowed Array ( [0] => 10 [1] => 100 )
  42. purpose • prevent certain classes of security attacks • check identifiers (typically URLs) for visual confusion • single script • mixed script • whole script
  44. single script

        $url1 = "google.com";
        $url2 = "goog1e.com";

        $spoof = new SpoofChecker();
        if ($spoof->areConfusable($url1, $url2))
            echo "$url1 and $url2 are confusable\n";

    Output: google.com and goog1e.com are confusable
  46. mixed script

        $url1 = "yahoo.com";
        $url2 = "yahоo.com"; // contains a Cyrillic "о"

        $spoof = new SpoofChecker();
        if ($spoof->areConfusable($url1, $url2))
            echo "$url1 and $url2 are confusable\n";

    Output: yahoo.com and yahоo.com are confusable
  47. suspicious

        $word = "Норе";
        $spoof->setAllowedLocales("en_US");
        if ($spoof->isSuspicious($word))
            echo "$word is suspicious in en_US";
        else
            echo "not suspicious";

    Output: Норе is suspicious in en_US
  48. purpose • originally used for script transliteration • much more general transform mechanism, including: • case • normalization • full/half-width • hex/character names
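A few of these general transforms can be sketched with transform IDs that ship with ICU. The sample strings are assumptions for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Case transform.
$upper = Transliterator::create('Any-Upper')->transliterate('intl');

// Full-width to half-width forms.
$half = Transliterator::create('Fullwidth-Halfwidth')->transliterate('ＰＨＰ');

// Normalization combined with a filter: decompose, drop accents, recompose.
$plain = Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC')
    ->transliterate('Curaçao');

echo $upper, ' ', $half, ' ', $plain, "\n";
```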
  49. script conversion

        $tr = Transliterator::create("Any-Latin");
        $sign = 'マックドナルド';
        echo $latin = $tr->transliterate($sign);
        $tr = Transliterator::create("Latin-Katakana");
        var_dump($tr->transliterate($latin) == $sign);

    Output: makkudonarudo
  51. script conversion

        $tr = Transliterator::create("Cyrillic-Latin");
        echo $tr->transliterate('я в избушке сижу опять');

        $tr = Transliterator::create("Russian-Latin/BGN");
        echo $tr->transliterate('я в избушке сижу опять');

    Output:
    â v izbuške sižu opâtʹ
    ya v izbushke sizhu opyatʹ
  52. compound IDs

        $tr = Transliterator::create("Greek-Latin");
        echo $tr->transliterate("Αλφαβητικός Κατάλογος");
        $tr = Transliterator::create(
            "Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC");
        echo $tr->transliterate("Αλφαβητικός Κατάλογος");

    Output:
    Alphabētikós Katálogos
    Alphabetikos Katalogos
  53. rule-based transforms

        $rules = <<<'RULES'
        $space = ' ' ;
        $space {$space} > ; # collapse multiple spaces
        '--' <> — ; # convert fake dash into real one
        RULES;
        $tr = Transliterator::createFromRules($rules);
        echo $tr->transliterate("a very spacey -- and delimited -- remark");

    Output: a very spacey — and delimited — remark