I18n and L10n for Go (Chinese)

How Go solves i18n and l10n problems
Talk from Gopher China 2016.

Marcel van Lohuizen

April 16, 2016
Transcript

  1. I18n and L10n for Go using golang.org/x/text

    Marcel van Lohuizen, Google, Go team
  2. Overview

    • golang.org/x/text subrepository
    • What is it for?
    • Current status
    • Examples
    • Conclusion
  3. I18n and L10n

    • Searching and sorting
    • Upper, lower, title case
    • Bi-directional text
    • Injecting translated text
    • Formatting of numbers, currency, date, time
    • Unit conversion
  4. golang.org/x/text: current status

    Language tags
    • language
    • display

    String equivalence
    • collate
    • search
    • secure/precis

    Text processing
    • cases
    • encoding/...
    • runes
    • segment
    • transform
    • unicode/bidi
    • unicode/cldr
    • unicode/norm
    • unicode/rangetable
    • width

    Formatting
    • currency
    • date
    • message
    • number
    • measure
    • area
    • length
    • ...
    • feature
    • gender
    • plural
  5. Go's Requirements

    • Streaming (io.Reader, io.Writer)
    • Statically-linked binaries
    • Multiple languages served simultaneously
    • Performance
    • Simple API
  6. Go Unicode Refresher

  7. Go uses UTF-8

    Go natively handles UTF-8:

        const beijing = "北京市"
        for index, runeValue := range beijing {
            fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
        }

    Output:

        U+5317 '北' starts at byte position 0
        U+4EAC '京' starts at byte position 3
        U+5E02 '市' starts at byte position 6
  8. String Model

    • Always UTF-8
    • Same model for source code as for text handling!
    • No random access
    • No metadata (except for byte length) or string "object"
    • Strings are not required to be in canonical form
  9. Sequential nature of text

        const flags = "🇲🇨🇳🇱" // country codes "mc" + "nl"
        fmt.Println(flags[4:]) // drops the first 4-byte rune, so the remaining runes pair up differently
  10. Sequential nature of text (continued)

    • Text processing is inherently sequential, even for UTF-32
    • Multi-rune characters: "e + ´ = é"
    • Segmentation
    • Casing
  11. Transforming Text

  12. The Transformer interface

        type Transformer interface {
            Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
            Reset()
        }
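
To make the contract concrete, here is a minimal sketch of a type satisfying this interface, driven through a deliberately small buffer. Everything in it is an assumption for illustration: the interface is redeclared locally so the sketch needs only the standard library, and `asciiUpper`, `errShortDst`, and `apply` are hypothetical names, not part of x/text.

```go
package main

import (
	"errors"
	"fmt"
)

// Transformer mirrors the interface shown on the slide. It is redeclared
// here only so that the sketch compiles without golang.org/x/text.
type Transformer interface {
	Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
	Reset()
}

// errShortDst is a hypothetical signal that dst was too small.
var errShortDst = errors.New("short destination buffer")

// asciiUpper is a hypothetical stateless Transformer that upper-cases
// ASCII letters and copies every other byte unchanged.
type asciiUpper struct{}

func (asciiUpper) Reset() {}

func (asciiUpper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	for nSrc < len(src) {
		if nDst == len(dst) {
			return nDst, nSrc, errShortDst
		}
		c := src[nSrc]
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		dst[nDst] = c
		nDst++
		nSrc++
	}
	return nDst, nSrc, nil
}

// apply feeds s through t in chunks, using a 4-byte buffer to force
// several Transform calls.
func apply(t Transformer, s string) (string, error) {
	t.Reset()
	src := []byte(s)
	buf := make([]byte, 4)
	var out []byte
	for {
		nDst, nSrc, err := t.Transform(buf, src, true)
		out = append(out, buf[:nDst]...)
		src = src[nSrc:]
		switch {
		case err == nil:
			return string(out), nil
		case errors.Is(err, errShortDst):
			continue // buffer was full; call again with the remaining input
		default:
			return string(out), err
		}
	}
}

func main() {
	s, err := apply(asciiUpper{}, "hello, world")
	fmt.Println(s, err)
}
```

The returned `nDst` and `nSrc` counts are what make streaming possible: the caller learns how much output was produced and how much input was consumed, and resumes from there.
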
  13. Using Transformers

    • A transform is typically used with one of the helper functions in package transform:

        encoder := simplifiedchinese.GBK.NewEncoder()
        s, _, _ := transform.String(encoder, "你好")

    • Most packages also provide convenience wrappers:

        s, _ := encoder.String("你好")

        wc := norm.NFC.Writer(w) // wraps w in a normalizing io.WriteCloser
  14. Normalization

    x/text/unicode/norm implements a stream-safe and secure O(n) normalization algorithm.

        norm.NFC.Writer(w) // writes the text stream to w in NFC form
  15. Package cases

    Title casing:

        toTitle := cases.Title(language.Dutch)
        fmt.Println(toTitle.String("'n ijsberg"))

    Output:

        'n IJsberg

    Languages may require different casing algorithms!
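
The same language dependence shows up in the standard library, in a narrower form. The sketch below is not the Dutch example from the slide: it uses Turkish, because package unicode ships a special-case table for it (`unicode.TurkishCase`). The helper `upperFor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// upperFor upper-cases s, applying Turkish rules when turkish is true.
// It is a hypothetical helper used only for this illustration.
func upperFor(s string, turkish bool) string {
	if turkish {
		return strings.ToUpperSpecial(unicode.TurkishCase, s)
	}
	return strings.ToUpper(s)
}

func main() {
	// The language-neutral mapping turns i into I.
	fmt.Printf("%+q\n", upperFor("istanbul", false))
	// The Turkish mapping turns i into dotted capital I (U+0130).
	fmt.Printf("%+q\n", upperFor("istanbul", true))
}
```
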
  16. Transformers

    x/text packages that implement the Transformer interface:

    • cases
    • encoding/...
    • runes
    • transform
    • width
    • secure/precis
    • unicode/norm
    • unicode/bidi
  17. Searching and Sorting

  18. Multilingual Search and Sort

    • Accented characters: e < é < f
    • Multi-letter characters: "ch" in Spanish
    • Equivalences: å ⇔ aa in Danish, ß ⇔ ss in German
    • Reordering: Z < Å in Danish
    • Compatibility equivalence: K (U+004B) ⇔ K (U+212A)
    • Reverse sorting of accents in Canadian French
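
A quick way to see why a collator is needed: the standard library's `sort.Strings` compares bytes, so it violates even the first rule above. In this sketch `byteSorted` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// byteSorted returns a copy of words sorted with sort.Strings, which
// orders strings by their UTF-8 bytes rather than by any language's rules.
func byteSorted(words []string) []string {
	out := append([]string(nil), words...)
	sort.Strings(out)
	return out
}

func main() {
	// é (U+00E9) is encoded as bytes 0xC3 0xA9, which compare greater than
	// the single byte of "f", so é lands after f instead of between e and f.
	fmt.Println(byteSorted([]string{"f", "\u00e9", "e"}))
}
```
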
  19. Search and Replace

    • Use bytes.Replace to replace "a cafe" with "many cafes":
        1. "We went to a cafe."
        2. "We went to a café."
        3. "We went to a cafe\u0301."
    • Result for the third sentence: "We went to many cafes\u0301." ⇒ NFC ⇒ "We went to many cafeś."

    Simple byte-oriented search and replace will not work!
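
The three cases can be reproduced with the standard library. This is a sketch of the failure only, not of the fix; `replaceCafe` is a hypothetical helper name, and the strings use escapes so that the two spellings of é are unambiguous.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceCafe performs the byte-oriented replacement described on the slide.
func replaceCafe(s string) string {
	return string(bytes.Replace([]byte(s), []byte("a cafe"), []byte("many cafes"), -1))
}

func main() {
	inputs := []string{
		"We went to a cafe.",       // plain ASCII: replaced as intended
		"We went to a caf\u00e9.",  // precomposed é: no match, left unchanged
		"We went to a cafe\u0301.", // e + combining acute: matches, accent moves onto the s
	}
	for _, in := range inputs {
		// %+q escapes non-ASCII runes so the combining mark stays visible.
		fmt.Printf("%+q\n", replaceCafe(in))
	}
}
```

In the third case the combining accent is left behind after the inserted "s", which is exactly the "cafeś" result once the string is normalized to NFC.
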
  20. search Example

        m := search.New(language.Danish, search.IgnoreCase, search.IgnoreDiacritics)
        start, end := m.IndexString(text, s)
        match := text[start:end] // the indices refer to text, not to the search term s

    SEARCH (s)       TEXT                  MATCH
    aarhus           Århus a\u0303\u031b   Århus
    a                Århus a\u0303\u031b   a\u0303\u031b
    a\u031b\u0303    Århus a\u0303\u031b   a\u0303\u031b
  21. collate Example

        import (
            "fmt"

            "golang.org/x/text/collate"
            "golang.org/x/text/language"
        )

        func main() {
            a := []string{"北京市", "上海市", "广州市"}
            for _, tag := range []string{"en", "zh", "zh-u-co-stroke"} {
                collate.New(language.Make(tag)).SortStrings(a)
                fmt.Println(a)
            }
        }

    Output:

        [上海市 北京市 广州市]
        [北京市 广州市 上海市]
        [上海市 广州市 北京市]
  22. Segmentation

  23. Segmentation Support

    • Planned:
        • API for segmentation
        • Supported by Unicode: word, line, sentence, paragraph
    • Not planned:
        • Language-specific segmentation
        • Community support welcome
  24. Language Tags

  25. Language Tag Examples

    zh                  Chinese (defaults to Simplified Chinese)
    zh-Hant             Traditional Chinese (Taiwan)
    zh-HK               Traditional Chinese (Hong Kong)
    zh-Latn-pinyin      Chinese in Pinyin
    zh-HK-u-co-pinyin   Chinese, Pinyin sort order

    <lang> [-<script>] [-<region>] [-<variant>]* [-<extension>]*
  26. Matching is Non-Trivial

    • Swiss German speakers usually understand German: gsw ⇒ de
    • The converse is not often true! de ⇏ gsw
    • cmn is Mandarin Chinese, but zh is more commonly used
    • hr matches sr-Latn

    The Matcher in x/text/language solves this problem.
  27. Language Matching in Go

        import (
            "net/http"

            "golang.org/x/text/language"
        )

        // Languages supported by your application.
        var matcher = language.NewMatcher([]language.Tag{
            language.SimplifiedChinese, // zh-Hans
            language.AmericanEnglish,   // en-US
        })

        func handle(w http.ResponseWriter, r *http.Request) {
            prefs, _, _ := language.ParseAcceptLanguage(r.Header.Get("Accept-Language"))
            tag, _, _ := matcher.Match(prefs...)
            // Use tag; it includes carried-over user preferences.
            w.Header().Set("Content-Language", tag.String())
        }
  28. Language Matching Recap

    • Find the best supported language for a list of user-preferred languages
    • Use the matched tag to select language-specific resources
        • translations
        • sort order
        • case operations
    • The resulting tag carries over the user's settings
  29. Translation Insertion

    Hello, world!
    Hallo Wereld!
    你好,世界!
    안녕하세요, 세계!

  30. Translating Text

    • Mark text within your code "To Be Translated"
    • Extract the text from your code
    • Send it to translators
    • Insert the translated messages back into your code
  31. Mark Text "To Be Translated"

    Before:

        import "fmt"

        // Report that person visited a city.
        fmt.Printf("%[1]s went to %[2]s.", person, city)

    After:

        import "golang.org/x/text/message"

        p := message.NewPrinter(userLang)

        // Report that person visited a city.
        p.Printf("%[1]s went to %[2]s.", person, city)
  32. Extract and send for translation

        {
            Description: "Report that person visited a city.",
            Original:    "{person} went to {city}.",
            Key:         "%s went to %s.",
        }
  33. Insert Translations in Code

        import (
            "golang.org/x/text/language"
            "golang.org/x/text/message"
        )

        message.SetString(language.Dutch, "%s went to %s", "%s is in %s geweest.")
        message.SetString(language.SimplifiedChinese, "%s went to %s", "%s去了%s。")
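
The idea behind the mark, extract, and insert steps can be sketched as a lookup keyed by the original format string. This is a toy model written against the standard library only and is not how x/text/message is implemented; `catalog`, `setString`, and `sprintf` are hypothetical names, and languages are plain strings rather than language tags.

```go
package main

import "fmt"

// catalog maps a language code to translations keyed by the original
// format string. It is a hypothetical stand-in for a message catalog.
type catalog map[string]map[string]string

// setString registers msg as the translation of key for lang.
func (c catalog) setString(lang, key, msg string) {
	if c[lang] == nil {
		c[lang] = map[string]string{}
	}
	c[lang][key] = msg
}

// sprintf formats args with the translation of key for lang, falling back
// to key itself when no translation has been registered.
func (c catalog) sprintf(lang, key string, args ...interface{}) string {
	if msg, ok := c[lang][key]; ok {
		return fmt.Sprintf(msg, args...)
	}
	return fmt.Sprintf(key, args...)
}

func main() {
	c := catalog{}
	c.setString("nl", "%s went to %s.", "%s is in %s geweest.")
	c.setString("zh", "%s went to %s.", "%s去了%s。")

	for _, lang := range []string{"en", "nl", "zh"} {
		fmt.Println(c.sprintf(lang, "%s went to %s.", "Marcel", "Beijing"))
	}
}
```

Because the lookup key is the exact format string, the string passed at the call site and the one registered in the catalog must match character for character, including trailing punctuation.
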
  34. Planned extensions

    • Go tooling: automate extraction and insertion
    • Planned:
        • number formatting
        • selection based on plurals, gender, etc.
    • golang.org/design/12750-localization
  35. Conclusion

    • Human languages are hard to deal with
    • Let x/text simplify it for you
  36. Community feedback

    • East-Asian Width
    • gofmt and East-Asian characters
    • Vertical support
  37. Q & A

    Thank you!
    Marcel van Lohuizen

    References:
    • godoc.org/golang.org/x/text
    • blog.golang.org/matchlang
    • blog.golang.org/normalization
    • blog.golang.org/strings
    • golang.org/issue/12750