Slide 1

Slide 1 text

intl me this, intl me that
Andrei Zmievski, AppDynamics
PHP UK ~ February 22, 2014 ~ London

Slide 2

Slide 2 text

me • Software architect at AppDynamics • PHP Core contributor (1999-2010) • Architect of the Unicode/i18n in PHP 6 • Twitter: @a • Beer lover (and brewer)

Slide 3

Slide 3 text

unicode 7

Slide 4

Slide 4 text

terms
• Internationalization (i18n): to design and develop an application that has no built-in cultural assumptions and is efficient to localize
• Localization (l10n): to tailor an application to meet the needs of a particular region, market, or culture

Slide 5

Slide 5 text

no assumptions • English/French/Chinese is just another language • Your country is just another country • Earth is just another planet (eventually)

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

why localize? • English speakers are now a minority on the web • Nearly 3 out of 4 participants surveyed by Common Sense Advisory agreed that they were more likely to buy from sites in their own language than in English • Global consumers will pay more for products with information in their language

Slide 8

Slide 8 text

locale
• identifier referring to linguistic and cultural preferences of a user community
• parts: language, script, country, variant, @keywords
• examples: sr_Latn_YU_REVISED@currency=USD, en_GB
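
As a small sketch of how such an identifier breaks down, intl's Locale class can pull out the individual parts. The identifier is the one from the slide; the variable names are only illustrative.

```php
<?php
// Take a locale identifier apart with intl's Locale class.
$id = 'sr_Latn_YU_REVISED@currency=USD';

$language = Locale::getPrimaryLanguage($id); // language part
$script   = Locale::getScript($id);          // script part
$keywords = Locale::getKeywords($id);        // array of keyword => value

echo "$language / $script\n";
print_r($keywords);
```

Locale::getRegion() and Locale::getAllVariants() return the remaining parts in the same way.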

Slide 9

Slide 9 text

locale data • Common Locale Data Repository (CLDR) • 740 locales: 238 languages and 259 territories • updated regularly

Slide 10

Slide 10 text

intl • available since PHP 5.3 • bundled locale data • formatters/parsers • collation (sorting) • calendars and timezones • boundary iteration • transliteration • resource bundles • character set conversion • spoof checking

Slide 11

Slide 11 text

API
• OO and procedural API
• Same underlying implementation

procedural                 OO
collator_create()          new Collator()
collator_set_strength()    $collator->setStrength()
numfmt_format()            NumberFormatter::format()
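
A minimal sketch of the two styles side by side, assuming an en_US collator and a sample word pair chosen for illustration. Both calls go through the same implementation, so the results should match.

```php
<?php
// The same comparison through the procedural and the object-oriented API.
$procedural = collator_create('en_US');
collator_set_strength($procedural, Collator::PRIMARY);
$viaFunctions = collator_compare($procedural, 'resume', 'résumé');

$object = new Collator('en_US');
$object->setStrength(Collator::PRIMARY);
$viaMethods = $object->compare('resume', 'résumé');

// At primary strength accents are ignored, so both report "equal" (0).
var_dump($viaFunctions, $viaMethods);
```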

Slide 12

Slide 12 text

UConverter

Slide 13

Slide 13 text

purpose • robust conversion between character encodings • replacement for mb_convert_encoding()
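
A minimal round-trip sketch, assuming UTF-8 input and ISO-8859-1 as the legacy encoding. UConverter::transcode() takes the string, the target encoding, and then the source encoding.

```php
<?php
// Round-trip a string between UTF-8 and ISO-8859-1.
$utf8   = "Cura\u{e7}ao"; // "Curaçao" encoded as UTF-8
$latin1 = UConverter::transcode($utf8, 'iso-8859-1', 'utf-8');
$back   = UConverter::transcode($latin1, 'utf-8', 'iso-8859-1');

echo bin2hex($latin1), "\n"; // the ç is now the single byte e7
var_dump($back === $utf8);   // the round trip is lossless here
```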

Slide 16

Slide 16 text

simple

echo UConverter::transcode("Cura\xe7ao", "utf-8", "iso-8859-1");
echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8");
echo UConverter::transcode("Curaçao", "iso-8859-8", "utf-8",
    array("to_subst" => "×"));

Curaçao
Curaao
Cura×ao

Slide 17

Slide 17 text

callbacks

class MyConverter extends UConverter {
    public function fromUCallback($reason, $source, $codepoint, &$error) {
        if (($reason == UConverter::REASON_UNASSIGNED) && ($codepoint == 0x221A)) {
            // translate √ to sqrt
            $error = U_ZERO_ERROR;
            return 'square root of ';
        }
    }
}

$c = new MyConverter('ascii', 'utf-8');
echo $c->convert("What is √2?");

What is square root of 2?

Slide 18

Slide 18 text

Collator

Slide 19

Slide 19 text

sorting • languages may sort more than one way • traditional vs. modern Spanish • Japanese stroke-radical vs. radical-stroke • German dictionary vs. phone book
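
The German case can be sketched with the collation keyword in the locale identifier. This assumes the ICU data in use ships the phonebook tailoring; the word pair is chosen for illustration.

```php
<?php
// Compare the same pair under German dictionary and phone book rules.
$dictionary = new Collator('de_DE');
$phonebook  = new Collator('de_DE@collation=phonebook');

// Dictionary order sorts "Ä" with "A"; phone book order sorts it as "AE".
$inDictionary = $dictionary->compare('Ärzte', 'Afrika');
$inPhonebook  = $phonebook->compare('Ärzte', 'Afrika');

echo 'dictionary: Ärzte sorts ', $inDictionary > 0 ? 'after' : 'before', " Afrika\n";
echo 'phone book: Ärzte sorts ', $inPhonebook > 0 ? 'after' : 'before', " Afrika\n";
```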

Slide 20

Slide 20 text

collation levels
primary       base characters
secondary     accents and language quirks
tertiary      case and variants of base forms
quaternary    you will never use this
identical     tie-breaker

Slide 21

Slide 21 text

collation levels • Each locale has a default level setting • Differences at lower levels are ignored if higher levels already differ
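
A short sketch of how the strength setting decides which differences count, assuming an en_US collator and sample words chosen for illustration.

```php
<?php
$coll = new Collator('en_US');

$coll->setStrength(Collator::PRIMARY);
$primaryAccent = $coll->compare('cote', 'côte'); // accent ignored
$primaryCase   = $coll->compare('cote', 'Cote'); // case ignored

$coll->setStrength(Collator::SECONDARY);
$secondaryAccent = $coll->compare('cote', 'côte'); // accent now counts
$secondaryCase   = $coll->compare('cote', 'Cote'); // case still ignored

$coll->setStrength(Collator::TERTIARY);
$tertiaryCase = $coll->compare('cote', 'Cote');    // case now counts

var_dump($primaryAccent, $primaryCase, $secondaryAccent, $secondaryCase, $tertiaryCase);
```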

Slide 22

Slide 22 text

comparing strings
côte < coté

$coll = new Collator("fr_FR");
if ($coll->compare("côte", "coté") < 0) {
    echo "before";
} else {
    echo "after";
}

before

Slide 23

Slide 23 text

strength control
côte = coté

$coll = new Collator("fr_FR");
$coll->setStrength(Collator::PRIMARY);
if ($coll->compare("côte", "coté") == 0) {
    echo "same";
} else {
    echo "different";
}

same

Slide 24

Slide 24 text

sorting strings
cote côte Côte coté Coté côté Côté coter

$strings = array(
    "cote", "côte", "Côte", "coté",
    "Coté", "côté", "Côté", "coter");
$coll = new Collator("fr_FR");
$coll->sort($strings);

Slide 25

Slide 25 text

other attributes
ABC < abc

$coll = new Collator("en_US");
$coll->setAttribute(Collator::CASE_FIRST, Collator::UPPER_FIRST);
if ($coll->compare("abc", "ABC") < 0) {
    echo "before";
} else {
    echo "after";
}

before

Slide 26

Slide 26 text

numeric collation
1 < 2 < 10

$strings = array("10", "1", "2");
$coll = new Collator(null);
$coll->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
$coll->sort($strings);

Slide 27

Slide 27 text

NumberFormatter

Slide 28

Slide 28 text

purpose • formats numbers as strings according to the locale, given pattern or set of rules • parses strings into numbers according to these patterns • replacement for number_format()
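
A minimal sketch of formatting and parsing with the same formatter, assuming en_US and de_DE as sample locales.

```php
<?php
$en = new NumberFormatter('en_US', NumberFormatter::DECIMAL);
$de = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);

$enText = $en->format(1234.5); // grouping with ",", decimals with "."
$deText = $de->format(1234.5); // grouping with ".", decimals with ","

// Parsing with the matching formatter turns the text back into a number.
$parsed = $de->parse($deText);

echo $enText, "\n", $deText, "\n";
var_dump($parsed);
```

Unlike number_format(), the separators come from the locale data rather than from arguments passed by the caller.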

Slide 29

Slide 29 text

formatter styles (1234.567 in fr_FR)
• NumberFormatter::PATTERN_DECIMAL: 1234,567 (with ##.##)
• NumberFormatter::DECIMAL: 1 234,56
• NumberFormatter::CURRENCY: 1 234,57 €
• NumberFormatter::PERCENT: 123 457 %

Slide 30

Slide 30 text

formatter styles (1234.567 in fr_FR)
• NumberFormatter::SCIENTIFIC: 1,234567E3
• NumberFormatter::SPELLOUT: mille deux cent trente-quatre virgule cinq six sept
• NumberFormatter::ORDINAL: 1 235e
• NumberFormatter::DURATION: 1 235

Slide 31

Slide 31 text

formatting

$fmt = new NumberFormatter('en_GB', NumberFormatter::DECIMAL);
echo $fmt->format(1234);

$fmt = new NumberFormatter('de_CH', NumberFormatter::CURRENCY);
echo $fmt->formatCurrency(1234, 'CNY');

1,234
CN¥ 1'234.00

Slide 32

Slide 32 text

parsing

$fmt = new NumberFormatter('in_IN', NumberFormatter::DECIMAL);
var_dump($fmt->parse('7.005.944', NumberFormatter::TYPE_INT32));

int(7005944)

Slide 33

Slide 33 text

MessageFormatter

Slide 34

Slide 34 text

purpose • produces concatenated messages in a language-neutral way • operates on patterns, which contain subformats • program does not need to know the order of fragments
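
A small sketch of the last point: the argument array stays the same while each pattern decides where the fragments go. The patterns and names are made up for illustration.

```php
<?php
// Same arguments, two patterns that place them in a different order.
$args = array('Ann', 'Bob');

$subjectFirst = MessageFormatter::formatMessage(
    'en_US', '{0} sent a file to {1}.', $args);
$objectFirst = MessageFormatter::formatMessage(
    'en_US', '{1} received a file from {0}.', $args);

echo $subjectFirst, "\n", $objectFirst, "\n";
```

A translator can reorder the placeholders in a localized pattern without any change to the calling code.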

Slide 35

Slide 35 text

messages
Today is February 22, 2014.

old way
echo "Today is ", date("F d, Y");

intl way
pattern: Today is {0,date}.
args: array(time())

Slide 36

Slide 36 text

formatting

$pattern = "On {0,date} you have {1,number} meetings.";
$args = array(time(), 2);
$fmt = new MessageFormatter("en_US", $pattern);
echo $fmt->format($args);

On February 22, 2014 you have 2 meetings.

Slide 37

Slide 37 text

formatting

$pattern = "On {0,date,short} your balance was {1,number,currency}.";
$args = array(time(), 184.22);
$fmt = new MessageFormatter("en_GB", $pattern);
echo $fmt->format($args);

On 22/02/14 your balance was £184.22.

Slide 38

Slide 38 text

formatting

$fr_pattern = "Aujourd'hui, {2,date,dd MMMM}, il y a {0,number} personnes sur {1}.";
$fr_args = array(7213518802, "la Terre", time());

$msg = new MessageFormatter("fr_FR", $fr_pattern);
echo $msg->format($fr_args);

Aujourd'hui, 22 février, il y a 7 213 518 802 personnes sur la Terre.

Slide 39

Slide 39 text

parsing messages

$pattern = "On {0,date} you have {1,number} meetings.";
$text = "On February 22, 2014 you have 33 meetings.";
$msg = new MessageFormatter("en_US", $pattern);
var_dump($msg->parse($text));

array(2) {
  [0]=> int(1393056000)
  [1]=> int(33)
}

Slide 40

Slide 40 text

plural selection

$pattern = "There {0,plural, =0{are no results} =1{is # result} other{are # results}} found.";
$fmt = new MessageFormatter("en_GB", $pattern);
echo $fmt->format(array(0));
echo $fmt->format(array(12));

There are no results found.
There are 12 results found.

Slide 41

Slide 41 text

Break Iterators

Slide 42

Slide 42 text

purpose • locate linguistic boundaries • supported units • characters • words • lines • sentences • more complex ones are possible with custom rules
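
A minimal sketch of the character unit, assuming a sample string that contains a combining accent. The break iterator reports user-perceived characters, which differ from both bytes and code points here.

```php
<?php
// "café noir" written with a combining acute accent (e + U+0301).
$text = "cafe\u{301} noir";

$bi = IntlBreakIterator::createCharacterInstance('en');
$bi->setText($text);

// Each part is one user-perceived character.
$characters = iterator_to_array($bi->getPartsIterator(), false);

echo count($characters), " characters, ", strlen($text), " bytes\n";
```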

Slide 43

Slide 43 text

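sentences

$text = <<<TEXT
She asked, “Are you from U.K.?” John Smith Sr. nodded.
TEXT;
$bi = IntlBreakIterator::createSentenceInstance("en");
$bi->setText($text);
foreach ($bi->getPartsIterator() as $part)
    echo "** ", $part, "\n";

** She asked, “Are you from U.K.?”
** John Smith Sr. nodded.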

Slide 44

Slide 44 text

lines

$bi = IntlBreakIterator::createLineInstance("en");
$bi->setText($text);
foreach ($bi->getPartsIterator() as $part)
    echo $part, "\n";

She
asked,
"Are
you
from
U.K.?"
John
Waxby
Sr.
nodded.

Slide 45

Slide 45 text

lines

$offset = 39;
$lineBI->first();
echo substr($text, 0, $lineBI->next()), "*\n";
echo substr($text, 0, $lineBI->next()), "*\n";
echo substr($text, 0, $lineBI->preceding($offset)), "*\n";

She *
She asked, *
She asked, "Are you from U.K.?" John *

Slide 46

Slide 46 text

Resource Bundles

Slide 47

Slide 47 text

purpose • contain resources for localization • messages, labels, formatting patterns, etc • accessed via locale-independent interface • fallback mechanism is key

Slide 48

Slide 48 text

data hierarchy
root:      root
language:  en, es, ja, zh
script:    Hans, Hant
country:   US, ES, MX, JP, CN, HK
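
The fallback idea behind this hierarchy can be sketched in plain PHP. fallbackChain() is a hypothetical helper written for illustration, not part of intl, and it simplifies what ICU actually does (ICU may also consult the default locale).

```php
<?php
// Hypothetical helper: list the bundles a lookup may consult,
// from the most specific locale down to root.
function fallbackChain(string $locale): array
{
    $chain = array();
    while ($locale !== '') {
        $chain[] = $locale;
        // Drop the last "_part" to move one level up the hierarchy.
        $cut = strrpos($locale, '_');
        $locale = $cut === false ? '' : substr($locale, 0, $cut);
    }
    $chain[] = 'root';
    return $chain;
}

$chain = fallbackChain('zh_Hant_HK');
print_r($chain);
```

A key missing from the most specific bundle is looked up in the next one along the chain, so only the differences need to be stored at each level.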

Slide 49

Slide 49 text

data format • simple resources • string, integer, binary data, integer array • complex resources • arrays and tables

Slide 50

Slide 50 text

root.txt

root {
    version:string { "1.0.0" }

    mainTitle { "Welcome to our store!" }
    errors:array {
        :string { "Website is experiencing difficulties" }
        :string { "Maximum of {0,number,integer} "
                  "products are allowed" }
    }
    sizes:intvector { 10, 100 }
}

Slide 51

Slide 51 text

en_GB.txt

en_GB {
    version { "1.0.1" }

    mainTitle:string { "Welcome to our olde shoppe!" }
    sizes:intvector { 25, 250 }
}

Slide 52

Slide 52 text

compiling

% mkdir myres
% genrb -d myres root.txt en.txt en_GB.txt
genrb number of files: 3
% ls myres
en.res en_GB.res root.res

Slide 53

Slide 53 text

retrieval • root

$bundle = dirname(__FILE__).'/myres';
$r = ResourceBundle::create('root', $bundle);
echo $r['mainTitle'];
echo $r['errors'][1];
print_r($r['sizes']);

Welcome to our store!
Maximum of {0,number,integer} products are allowed
Array
(
    [0] => 10
    [1] => 100
)

Slide 54

Slide 54 text

retrieval • en_GB

$bundle = dirname(__FILE__).'/myres';
$r = ResourceBundle::create('en_GB', $bundle);
echo $r['mainTitle'];
echo $r['errors'][1];
print_r($r['sizes']);

Welcome to our olde shoppe!
Maximum of {0,number,integer} products are allowed
Array
(
    [0] => 25
    [1] => 250
)

Slide 55

Slide 55 text

retrieval • de

$bundle = dirname(__FILE__).'/myres';
$r = ResourceBundle::create('de', $bundle);
echo $r['mainTitle'];
echo $r['errors'][1];
print_r($r['sizes']);

Welcome to our store!
Maximum of {0,number,integer} products are allowed
Array
(
    [0] => 10
    [1] => 100
)

Slide 56

Slide 56 text

Spoof Checking

Slide 57

Slide 57 text

paypaI.com You received a large payment. Click here to receive:

Slide 58

Slide 58 text

purpose • prevent certain classes of security attacks • check identifiers (typically URLs) for visual confusion • single script • mixed script • whole script
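
A minimal sketch of the two basic checks, using the paypaI.com example from the previous slide and a mixed-script string. The exact verdicts depend on the ICU version and the checks it enables by default, so no output is shown here.

```php
<?php
$checker = new Spoofchecker();

// Lowercase "l" versus uppercase "I": same script, similar shapes.
$confusable = $checker->areConfusable('paypal.com', 'paypaI.com');

// Cyrillic "о" (U+043E) in place of a Latin "o": mixed script.
$suspicious = $checker->isSuspicious("yah\u{43e}o.com");

var_dump($confusable, $suspicious);
```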

Slide 60

Slide 60 text

single script

$url1 = "google.com";
$url2 = "goog1e.com";

$spoof = new SpoofChecker();
if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";

google.com and goog1e.com are confusable

Slide 62

Slide 62 text

mixed script

$url1 = "yahoo.com";
$url2 = "yahоo.com";

$spoof = new SpoofChecker();
if ($spoof->areConfusable($url1, $url2))
    echo "$url1 and $url2 are confusable\n";

yahoo.com and yahоo.com are confusable

Slide 63

Slide 63 text

suspicious

$word = "Норе";
$spoof->setAllowedLocales("en_US");
if ($spoof->isSuspicious($word))
    echo "$word is suspicious in en_US";
else
    echo "not suspicious";

Норе is suspicious in en_US

Slide 64

Slide 64 text

suspicious

$word = "Норе";
$spoof->setAllowedLocales("en_US,ru_RU");
if ($spoof->isSuspicious($word))
    echo "$word is suspicious in en_US,ru_RU";
else
    echo "not suspicious";

not suspicious

Slide 65

Slide 65 text

Transliterator

Slide 66

Slide 66 text

purpose • originally used for script transliteration • much more general transform mechanism, including: • case • normalization • full/half-width • hex/character names

Slide 67

Slide 67 text

transliteration IDs source-target/variant

Slide 68

Slide 68 text

transliteration IDs Any-target/variant

Slide 69

Slide 69 text

sample IDs • Katakana-Latin • Latin-ASCII • NFD • Any-Hex/XML
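
Two of these IDs can be tried directly, as a small sketch with sample input chosen for illustration.

```php
<?php
// Latin-ASCII: replace non-ASCII Latin letters with ASCII equivalents.
$toAscii = Transliterator::create('Latin-ASCII');
$ascii = $toAscii->transliterate('Curaçao');

// NFD: decompose characters into a base letter plus combining marks.
$toNfd = Transliterator::create('NFD');
$decomposed = $toNfd->transliterate("\u{e9}"); // precomposed é

echo $ascii, "\n";
echo bin2hex($decomposed), "\n"; // "e" followed by U+0301 in UTF-8
```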

Slide 70

Slide 70 text

script conversion

$tr = Transliterator::create("Any-Latin");
$sign = 'マックドナルド';
echo $latin = $tr->transliterate($sign);
$tr = Transliterator::create("Latin-Katakana");
var_dump($tr->transliterate($latin) == $sign);

makkudonarudo

Slide 72

Slide 72 text

script conversion

$tr = Transliterator::create("Cyrillic-Latin");
echo $tr->transliterate('я в избушке сижу опять');

$tr = Transliterator::create("Russian-Latin/BGN");
echo $tr->transliterate('я в избушке сижу опять');

â v izbuške sižu opâtʹ
ya v izbushke sizhu opyatʹ

Slide 73

Slide 73 text

Any-Name

$tr = Transliterator::create("Any-Name");
echo $tr->transliterate('я$');

\N{CYRILLIC SMALL LETTER YA}\N{DOLLAR SIGN}

Slide 74

Slide 74 text

Latin-ASCII

$tr = Transliterator::create("Latin-ASCII");
echo $tr->transliterate("© 1990 «PHP»");

(C) 1990 <<PHP>>

Slide 76

Slide 76 text

compound IDs

$tr = Transliterator::create("Greek-Latin");
echo $tr->transliterate("Αλφαβητικός Κατάλογος");

$tr = Transliterator::create(
    "Greek-Latin; NFD; [:Nonspacing Mark:] Remove; NFC");
echo $tr->transliterate("Αλφαβητικός Κατάλογος");

Alphabētikós Katálogos
Alphabetikos Katalogos

Slide 77

Slide 77 text

rule-based transforms

$rules = <<<'RULES'
$space = ' ' ;
$space {$space} > ;   # collapse multiple spaces
'--' <> — ;           # convert fake dash into real one
RULES;
$tr = Transliterator::createFromRules($rules);
echo $tr->transliterate("a very spacey -- and delimited -- remark");

a very spacey — and delimited — remark

Slide 78

Slide 78 text

• pecl.php.net/intl • php.net/intl • cldr.unicode.org • userguide.icu-project.org

Slide 79

Slide 79 text

спасибо thank you merci þakka þér ありがとう