
intl me this, intl me that

Quick, what's the proper way to show a number as currency for India? 1.2 billion people would like to know. No pressure. These common localization issues come up more and more frequently. Learn to use the improved intl extension to order lists, format numbers, split text into pieces and show calendars just like a local would. It even helps with character sets and time zones - 2 out of 3 things that commonly break the Internet. ¡Muy bueno!

Andrei Zmievski

February 22, 2014

Transcript

  1. intl me this,
    intl me that
    Andrei Zmievski

    AppDynamics
    PHP UK ~ February 22, 2014 ~ London


  2. me
    • Software architect at AppDynamics
    • PHP Core contributor (1999-2010)
    • Architect of the Unicode/i18n in PHP 6
    • Twitter: @a
    • Beer lover (and brewer)


  3. unicode 7


  4. terms
    • Internationalization (i18n)
    • to design and develop an application without built-in
    cultural assumptions that is efficient to localize
    • Localization (l10n)
    • to tailor an application to meet the needs of a particular
    region, market, or culture


  5. no assumptions
    • English/French/Chinese is just another language
    • Your country is just another country
    • Earth is just another planet (eventually)


  6. (image-only slide, no transcribed text)

  7. why localize?
    • English speakers are now a minority on WWW
    • Nearly 3 out of 4 participants surveyed by Common
    Sense Advisory agreed that they were more likely to
    buy from sites in their own languages than in English
    • Global consumers will pay more for products with
    information in their language


  8. locale
    • identifier referring to linguistic and cultural preferences of a user
    community
    • language
    • script
    • country
    • variant
    • @keywords
    sr_Latn_YU_REVISED@currency=USD
    en_GB
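As a rough illustration of the parts listed on this slide, the sketch below splits the slide's own sample identifier into its components. It assumes the intl extension is loaded; the variable names are only for illustration.

```php
<?php
// Sketch: take a locale identifier apart with the Locale class (ext/intl).
$id = 'sr_Latn_YU_REVISED@currency=USD';

$language = Locale::getPrimaryLanguage($id); // language subtag
$script   = Locale::getScript($id);          // script subtag
$region   = Locale::getRegion($id);          // country/region subtag
$variants = Locale::getAllVariants($id);     // array of variant subtags
$keywords = Locale::getKeywords($id);        // array of keyword => value

echo "language: $language\n";
echo "script:   $script\n";
echo "region:   $region\n";
echo "variants: ", implode(', ', (array) $variants), "\n";
echo "currency: ", $keywords['currency'], "\n";
```

Keeping the identifier as a single string and asking the Locale class for its parts avoids hand-written parsing of underscores and `@` keywords.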


  9. locale data
    • Common Locale Data Repository (CLDR)
    • 740 locales: 238 languages and 259 territories
    • updated regularly


  10. intl
    • available since PHP 5.3
    • bundled locale data
    • formatters/parsers
    • collation (sorting)
    • calendars and timezones
    • boundary iteration
    • transliteration
    • resource bundles
    • character set conversion
    • spoof checking


  11. API
    • OO and procedural API
    • Same underlying implementation
    collator_create() new Collator()
    collator_set_strength() $collator->setStrength()
    numfmt_format() NumberFormatter::format()
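To make the pairing above concrete, here is a minimal sketch that sorts the same list through both API styles and checks that the results agree. The word list is made up for illustration, and the intl extension is assumed to be loaded.

```php
<?php
// Sketch: the OO and procedural APIs wrap the same implementation,
// so sorting the same data both ways should give identical results.
$words = array("banana", "apple", "cherry");

// Object-oriented style
$oo = $words;
$coll = new Collator("en_US");
$coll->sort($oo);

// Procedural style
$proc = $words;
$c = collator_create("en_US");
collator_sort($c, $proc);

var_dump($oo === $proc);
print_r($oo);
```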


  12. UConverter


  13. purpose
    • robust conversion between character encodings
    • replacement for mb_convert_encoding()


  14. simple
    echo UConverter::transcode(
    "Cura\xe7ao", "utf-8", "iso-8859-1");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8",
    array("to_subst" => "×"));
    Curaçao


  15. simple
    echo UConverter::transcode(
    "Cura\xe7ao", "utf-8", "iso-8859-1");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8",
    array("to_subst" => "×"));
    Curaçao
    Curaao


  16. simple
    echo UConverter::transcode(
    "Cura\xe7ao", "utf-8", "iso-8859-1");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8");
    echo UConverter::transcode(
    "Curaçao", "iso-8859-8", "utf-8",
    array("to_subst" => "×"));
    Curaçao
    Curaao
    Cura×ao


  17. callbacks
    class MyConverter extends UConverter {
        public function fromUCallback($reason, $source,
                                      $codepoint, &$error) {
            if (($reason == UConverter::REASON_UNASSIGNED)
                && ($codepoint == 0x221A)) {
                // translate √ to sqrt
                $error = U_ZERO_ERROR;
                return 'square root of ';
            }
        }
    }
    $c = new MyConverter('ascii', 'utf-8');
    echo $c->convert("What is √2?");
    What is square root of 2?

  18. Collator


  19. sorting
    • languages may sort more than one way
    • traditional vs. modern Spanish
    • Japanese stroke-radical vs. radical-stroke
    • German dictionary vs. phone book


  20. collation levels
    primary      base characters
    secondary    accents and language quirks
    tertiary     case and variants of base forms
    quaternary   you will never use this
    identical    tie-breaker

  21. collation levels
    • Each locale has default level setting
    • Differences in lower levels are ignored if higher
    levels are already different
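The effect of the levels can be sketched by comparing the same pairs at three strengths. This assumes the intl extension is loaded and the file is saved as UTF-8; the pairs and the `$results` array are only for illustration.

```php
<?php
// Sketch: the same comparisons at primary, secondary and tertiary strength.
$coll = new Collator("en_US");
$levels = array(
    "primary"   => Collator::PRIMARY,
    "secondary" => Collator::SECONDARY,
    "tertiary"  => Collator::TERTIARY,
);
// First pair differs by accent only, second pair by case only.
$pairs = array(array("cote", "côte"), array("cote", "Cote"));

$results = array();
foreach ($levels as $name => $strength) {
    $coll->setStrength($strength);
    foreach ($pairs as $pair) {
        $cmp = $coll->compare($pair[0], $pair[1]);
        $results[$name][] = $cmp;
        printf("%s: %s vs %s => %d\n", $name, $pair[0], $pair[1], $cmp);
    }
}
```

At primary strength both pairs compare as equal; the accent difference starts to matter at secondary strength and the case difference at tertiary strength.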


  22. comparing strings
    côte < coté
    $coll = new Collator("fr_FR");
    if ($coll->compare("côte", "coté") < 0) {
    echo "before";
    } else {
    echo "after";
    }
    before


  23. strength control
    $coll = new Collator("fr_FR");
    $coll->setStrength(Collator::PRIMARY);
    if ($coll->compare("côte", "coté") == 0) {
    echo "same";
    } else {
    echo "different";
    }
    côte = coté
    same


  24. sorting strings
    cote
    côte
    Côte
    coté
    Coté
    côté
    Côté
    coter
    $strings = array(
    "cote", "côte", "Côte", "coté",
    "Coté", "côté", "Côté", "coter");
    $coll = new Collator("fr_FR");
    $coll->sort($strings);


  25. other attributes
    $coll = new Collator("en_US");
    $coll->setAttribute(Collator::CASE_FIRST,
    Collator::UPPER_FIRST);
    if ($coll->compare("abc", "ABC") < 0) {
    echo "before";
    } else {
    echo "after";
    }
    ABC < abc
    before


  26. numeric collation
    1 < 2 < 10
    $strings = array("10", "1", "2");
    $coll = new Collator(null);
    // numeric collation is an attribute, not a strength
    $coll->setAttribute(Collator::NUMERIC_COLLATION,
    Collator::ON);
    $coll->sort($strings);

  27. NumberFormatter


  28. purpose
    • formats numbers as strings according to the locale,
    given pattern or set of rules
    • parses strings into numbers according to these
    patterns
    • replacement for number_format()
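As one possible answer to the "currency for India" question from the talk abstract, the sketch below formats and parses a number with the `en_IN` locale. It assumes the intl extension is loaded; the exact currency symbol and spacing depend on the bundled ICU/CLDR data, so no currency output is shown.

```php
<?php
// Sketch: Indian-style grouping (lakh/crore) with NumberFormatter.
$population = 1200000000;

$cur = new NumberFormatter('en_IN', NumberFormatter::CURRENCY);
$asCurrency = $cur->formatCurrency($population, 'INR');
echo $asCurrency, "\n";

$dec = new NumberFormatter('en_IN', NumberFormatter::DECIMAL);
$asDecimal = $dec->format($population);
echo $asDecimal, "\n"; // 1,20,00,00,000

// Parsing goes through the same locale rules in reverse.
$parsed = $dec->parse($asDecimal);
var_dump($parsed == $population);
```

number_format() cannot produce this grouping because it only knows a single fixed group size.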


  29. formatter styles
    • NumberFormatter::PATTERN_DECIMAL

    1234,567 (with ##.##)
    • NumberFormatter::DECIMAL

    1 234,56
    • NumberFormatter::CURRENCY

    1 234,57 €
    • NumberFormatter::PERCENT

    123 457 %
    1234.567 in fr_FR


  30. formatter styles
    • NumberFormatter::SCIENTIFIC

    1,234567E3
    • NumberFormatter::SPELLOUT

    mille deux cent trente-quatre virgule cinq six sept
    • NumberFormatter::ORDINAL

    1 235e
    • NumberFormatter::DURATION

    1 235
    1234.567 in fr_FR


  31. formatting
    $fmt = new NumberFormatter('en_GB',
    NumberFormatter::DECIMAL);
    $fmt->format(1234);
    $fmt = new NumberFormatter('de_CH',
    NumberFormatter::CURRENCY);
    $fmt->formatCurrency(1234, 'CNY');
    1,234
    CN¥ 1'234.00


  32. parsing
    $fmt = new NumberFormatter('in_IN',
    NumberFormatter::DECIMAL);
    var_dump($fmt->parse('7.005.944',
    NumberFormatter::TYPE_INT32));
    int(7005944)


  33. MessageFormatter


  34. purpose
    • produces concatenated messages in a language-
    neutral way
    • operates on patterns, which contain sub formats
    • program does not need to know the order of
    fragments
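The last point can be sketched with two patterns that place the same arguments in a different order. The patterns and argument values here are invented for illustration, and the intl extension is assumed to be loaded.

```php
<?php
// Sketch: the calling code passes the same arguments;
// each translated pattern decides where they appear.
$args = array("Anna", 3);

$en = new MessageFormatter("en_US", "{0} has {1,number} new messages.");
$de = new MessageFormatter("de_DE", "Es gibt {1,number} neue Nachrichten für {0}.");

$enText = $en->format($args);
$deText = $de->format($args);

echo $enText, "\n"; // Anna has 3 new messages.
echo $deText, "\n"; // Es gibt 3 neue Nachrichten für Anna.
```

Because the placeholders are numbered, translators can reorder fragments without any change to the PHP code.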


  35. messages
    Today is February 22, 2014.
    echo "Today is ", date("F d, Y");
    old way
    intl way
    pattern Today is {0,date}.
    args array(time())


  36. formatting
    $pattern = "On {0,date} you have {1,number} meetings.";
    $args = array(time(), 2);
    $fmt = new MessageFormatter("en_US", $pattern);
    echo $fmt->format($args);
    On February 22, 2014 you have 2 meetings.


  37. formatting
    $pattern = "On {0,date,short} your balance was "
             . "{1,number,currency}.";
    $args = array(time(), 184.22);
    $fmt = new MessageFormatter("en_GB", $pattern);
    echo $fmt->format($args);
    On 22/02/14 your balance was £184.22.


  38. formatting
    $fr_pattern = "Aujourd'hui, {2,date,dd MMMM}, "
                . "il y a {0,number} personnes sur {1}.";
    $fr_args = array(7213518802, "la Terre", time());
    $msg = new MessageFormatter("fr_FR", $fr_pattern);
    echo $msg->format($fr_args);
    Aujourd'hui, 22 février, il y a 7 213 518 802 personnes sur la Terre.

  39. parsing messages
    $pattern = "On {0,date} you have {1,number} meetings.";
    $text = "On February 22, 2014 you have 33 meetings.";
    $msg = new MessageFormatter("en_US", $pattern);
    var_dump($msg->parse($text));
    array(2) {
    [0]=>
    int(1393056000)
    [1]=>
    int(33)
    }


  40. plural selection
    $pattern = "There {0,plural,
    =0{are no results}
    =1{is # result}
    other{are # results}} found.";
    $fmt = new MessageFormatter("en_GB", $pattern);
    echo $fmt->format(array(0));
    echo $fmt->format(array(12));
    There are no results found.
    There are 12 results found.


  41. Break Iterators


  42. purpose
    • locate linguistic boundaries
    • supported units
    • characters
    • words
    • lines
    • sentences
    • more complex ones are possible with custom rules
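For the "words" unit in the list above, a minimal sketch might look like the following. The sample text and the letter/digit filter are assumptions for illustration; the intl extension is assumed to be loaded.

```php
<?php
// Sketch: split text on word boundaries and keep only the actual words.
$text = "Hello, world! Nice day.";

$bi = IntlBreakIterator::createWordInstance("en");
$bi->setText($text);

$words = array();
foreach ($bi->getPartsIterator() as $part) {
    // The iterator also yields spaces and punctuation as separate parts;
    // keep only parts that contain a letter or a digit.
    if (preg_match('/[\p{L}\p{N}]/u', $part)) {
        $words[] = $part;
    }
}
print_r($words);
```

Unlike explode(' ', ...), this does not assume that words are separated by spaces, which matters for languages such as Thai or Japanese.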


  43. sentences
    $text = <<<END
    She asked, "Are you from U.K.?" John Smith Sr. nodded.
    END;
    $bi = IntlBreakIterator::createSentenceInstance("en");
    $bi->setText($text);
    foreach ($bi->getPartsIterator() as $part)
    echo "** ", $part, "\n";
    ** She asked, "Are you from U.K.?"
    ** John Smith Sr. nodded.

  44. lines
    $lineBI = IntlBreakIterator::createLineInstance("en");
    $lineBI->setText($text);
    foreach ($lineBI->getPartsIterator() as $part)
    echo $part, "\n";
    She
    asked,
    "Are
    you
    from
    U.K.?"
    John
    Smith
    Sr.
    nodded.

  45. lines
    $offset = 39;
    $lineBI->first();
    echo substr($text, 0, $lineBI->next()), "*\n";
    echo substr($text, 0, $lineBI->next()), "*\n";
    echo substr($text, 0, $lineBI->preceding($offset)), "*\n";
    She *
    She asked, *
    She asked, "Are you from U.K.?" John *


  46. Resource Bundles


  47. purpose
    • contain resources for localization
    • messages, labels, formatting patterns, etc
    • accessed via locale-independent interface
    • fallback mechanism is key


  48. data hierarchy
    root root
    en es ja zh language
    Hans Hant script
    US ES MX JP CN HK country
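The hierarchy above implies a lookup order from the most specific bundle up to root. The helper below is a hypothetical, simplified sketch of that order in plain PHP; `fallbackChain()` is not part of intl, and ICU's real lookup may also consult the default locale before reaching root.

```php
<?php
// Hypothetical helper (not an intl API): list the bundles that would be
// consulted for a locale, from most specific to root.
function fallbackChain($locale)
{
    $chain = array();
    $parts = explode('_', $locale);
    while ($parts) {
        $chain[] = implode('_', $parts); // e.g. zh_Hant_HK, then zh_Hant, then zh
        array_pop($parts);               // drop the most specific subtag
    }
    $chain[] = 'root';                   // root is always the last resort
    return $chain;
}

print_r(fallbackChain('zh_Hant_HK'));
print_r(fallbackChain('en_GB'));
```

A key that is missing from `en_GB` is looked up in `en`, and then in `root`, which is why only the differences need to be stored in the more specific bundles.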


  49. data format
    • simple resources
    • string, integer, binary data, integer array
    • complex resources
    • arrays and tables


  50. root.txt
    root {
        version:string { "1.0.0" }
        mainTitle {
            "Welcome to our store!" }
        errors:array {
            :string { "Website is experiencing difficulties" }
            :string { "Maximum of {0,number,integer} "
                      "products are allowed" }
        }
        sizes:intvector { 10, 100 }
    }

  51. en_GB.txt
    en_GB {
        version { "1.0.1" }
        mainTitle:string { "Welcome to our olde shoppe!" }
        sizes:intvector { 25, 250 }
    }

  52. compiling
    % mkdir myres
    % genrb -d myres root.txt en.txt en_GB.txt
    genrb number of files: 3
    % ls myres
    en.res
    en_GB.res
    root.res

  53. retrieval • root
    $bundle = DIRNAME(__FILE__).'/myres';
    $r = ResourceBundle::create('root', $bundle);
    echo $r['mainTitle'];
    echo $r['errors'][1];
    print_r($r['sizes']);
    Welcome to our store!
    Maximum of {0,number,integer} products are allowed
    Array
    (
    [0] => 10
    [1] => 100
    )


  54. retrieval • en_GB
    $bundle = DIRNAME(__FILE__).'/myres';
    $r = ResourceBundle::create('en_GB', $bundle);
    echo $r['mainTitle'];
    echo $r['errors'][1];
    print_r($r['sizes']);
    Welcome to our olde shoppe!
    Maximum of {0,number,integer} products are allowed
    Array
    (
    [0] => 25
    [1] => 250
    )


  55. retrieval • de
    $bundle = DIRNAME(__FILE__).'/myres';
    $r = ResourceBundle::create('de', $bundle);
    echo $r['mainTitle'];
    echo $r['errors'][1];
    print_r($r['sizes']);
    Welcome to our store!
    Maximum of {0,number,integer} products are allowed
    Array
    (
    [0] => 10
    [1] => 100
    )


  56. Spoof Checking


  57. paypaI.com
    You received a large payment.
    Click here to receive:


  58. purpose
    • prevent certain classes of security attacks
    • check identifiers (typically URLs) for visual confusion
    • single script
    • mixed script
    • whole script


  59. single script
    $url1 = "google.com";
    $url2 = "goog1e.com";
    $spoof = new SpoofChecker();
    if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";

  60. single script
    $url1 = "google.com";
    $url2 = "goog1e.com";
    $spoof = new SpoofChecker();
    if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";
    google.com and goog1e.com are confusable

  61. mixed script
    $url1 = "yahoo.com";
    $url2 = "yahоo.com";
    $spoof = new SpoofChecker();
    if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";


  62. mixed script
    $url1 = "yahoo.com";
    $url2 = "yahоo.com";
    $spoof = new SpoofChecker();
    if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";
    yahoo.com and yahоo.com are confusable


  63. suspicious
    $word = "Норе";
    $spoof->setAllowedLocales("en_US");
    if ($spoof->isSuspicious($word))
    echo "$word is suspicious in en_US";
    else
    echo "not suspicious";
    Норе is suspicious in en_US

  64. suspicious
    $word = "Норе";
    $spoof->setAllowedLocales("en_US,ru_RU");
    if ($spoof->isSuspicious($word))
    echo "$word is suspicious in en_US,ru_RU";
    else
    echo "not suspicious";
    not suspicious

  65. Transliterator


  66. purpose
    • originally used for script transliteration
    • much more general transform mechanism, including:
    • case
    • normalization
    • full/half-width
    • hex/character names


  67. transliteration IDs
    source-target/variant


  68. transliteration IDs
    Any-target/variant


  69. sample IDs
    • Katakana-Latin
    • Latin-ASCII
    • NFD
    • Any-Hex/XML
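Two of the sample IDs above can be tried directly. This is a small sketch assuming the intl extension is loaded and the file is saved as UTF-8; the input strings are made up for illustration.

```php
<?php
// Sketch: pass a transliteration ID to Transliterator::create().
$ascii = Transliterator::create("Latin-ASCII");
$plain = $ascii->transliterate("café crème");
echo $plain, "\n"; // cafe creme

// Any-Hex/XML rewrites characters as XML-style numeric references.
$hex = Transliterator::create("Any-Hex/XML");
echo $hex->transliterate("é"), "\n";
```

The ID string is the whole configuration: swapping "Latin-ASCII" for another ID changes the transform without changing the calling code.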


  70. script conversion
    $tr = Transliterator::create("Any-Latin");
    $sign = 'マックドナルド';
    echo $latin = $tr->transliterate($sign);
    $tr = Transliterator::create("Latin-Katakana");
    var_dump($tr->transliterate($latin) == $sign);
    makkudonarudo


  71. script conversion
    $tr = Transliterator::create("Cyrillic-Latin");
    echo $tr->transliterate('я в избушке сижу опять');
    $tr = Transliterator::create("Russian-Latin/BGN");
    echo $tr->transliterate('я в избушке сижу опять');
    â v izbuške sižu opâtʹ


  72. script conversion
    â v izbuške sižu opâtʹ
    ya v izbushke sizhu opyatʹ
    $tr = Transliterator::create("Cyrillic-Latin");
    echo $tr->transliterate('я в избушке сижу опять');
    $tr = Transliterator::create("Russian-Latin/BGN");
    echo $tr->transliterate('я в избушке сижу опять');


  73. Any-Name
    $tr = Transliterator::create("Any-Name");
    echo $tr->transliterate('я$');
    \N{CYRILLIC SMALL LETTER YA}\N{DOLLAR SIGN}


  74. Latin-ASCII
    $tr = Transliterator::create("Latin-ASCII");
    echo $tr->transliterate("© 1990 «PHP»");
    (C) 1990 <<PHP>>

  75. compound IDs
    $tr = Transliterator::create("Greek-Latin");
    echo $tr->transliterate("Αλφαβητικός Κατάλογος");
    Alphabētikós Katálogos


  76. compound IDs
    $tr = Transliterator::create("Greek-Latin");
    echo $tr->transliterate("Αλφαβητικός Κατάλογος");
    $tr = Transliterator::create(
    "Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC");
    echo $tr->transliterate("Αλφαβητικός Κατάλογος");
    Alphabētikós Katálogos
    Alphabetikos Katalogos


  77. rule-based transforms
    $rules = <<<'RULES'
    $space = ' ' ;
    $space {$space} > ; # collapse multiple spaces
    '--' <> — ; # convert fake dash into real one
    RULES;
    $tr = Transliterator::createFromRules($rules);
    echo $tr->transliterate("a very spacey -- and delimited -- remark");
    a very spacey — and delimited — remark


  78. • pecl.php.net/intl
    • php.net/intl
    • cldr.unicode.org
    • userguide.icu-project.org


  79. спасибо
    thank you
    merci
    þakka þér
    ありがとう