I18n and L10n for Go (Chinese)

How Go solves i18n and l10n problems
Talk from Gopher China 2016.

Marcel van Lohuizen

April 16, 2016

Transcript

  1. I18n and L10n for Go using x/text. Marcel van Lohuizen, Google, Go team.
  2. Overview • The golang.org/x/text subrepository • What is it for? • Current status • Examples • Conclusion
  3. I18n and L10n • Searching and sorting • Upper, lower, and title case • Bi-directional text • Injecting translated text • Formatting of numbers, currency, dates, and times • Unit conversion
  4. golang.org/x/text: current status
    • Language tags: language, display
    • String equality: collate, search, secure (precis)
    • Text processing: cases, encoding, ..., runes, segment, transform, unicode (bidi, cldr, norm, rangetable), width
    • Formatting: currency, date, message, number, measure (area, length, ...), feature (gender, plural)
  5. Go's Requirements • Streaming (io.Reader, io.Writer) • Statically-linked binaries • Multiple languages served simultaneously • Performance • Simple API
  6. Unicode in Go: Refresher

  7. Go uses UTF-8 • Go natively handles UTF-8:
        const beijing = "北京市"
        for index, runeValue := range beijing {
            fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
        }
    Output:
        U+5317 '北' starts at byte position 0
        U+4EAC '京' starts at byte position 3
        U+5E02 '市' starts at byte position 6
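The byte positions printed above advance by three because each of these characters occupies three bytes in UTF-8. As a minimal sketch using only the standard library (the helper name `runeAt` is hypothetical and not part of x/text), finding the n-th code point means scanning from the start of the string:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt returns the n-th rune (0-based) of s and the byte offset where it
// starts. It scans from the beginning because UTF-8 is variable-width.
// It returns utf8.RuneError and -1 if s has fewer than n+1 runes.
func runeAt(s string, n int) (rune, int) {
	if n < 0 {
		return utf8.RuneError, -1
	}
	i := 0
	for offset, r := range s {
		if i == n {
			return r, offset
		}
		i++
	}
	return utf8.RuneError, -1
}

func main() {
	const beijing = "北京市"
	fmt.Println(len(beijing))                    // byte length: 9
	fmt.Println(utf8.RuneCountInString(beijing)) // rune count: 3

	r, off := runeAt(beijing, 1)
	fmt.Printf("%c at byte %d\n", r, off) // 京 at byte 3

	// Slicing at an arbitrary byte offset can cut a character in half.
	fmt.Println(utf8.ValidString(beijing[1:])) // false
}
```

Indexing with `s[i]` or slicing with `s[i:]` always works on bytes, so code that needs characters should iterate with `range` or use `unicode/utf8`.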
  8. String Model • Always UTF-8 • Same model for source code as for text handling! • No random access • No metadata (except for byte length) and no string "object" • Strings are not required to be in canonical (normalized) form
  9. Sequential nature of text
        const flags = "🇲🇨🇳🇱" // country codes "mc" + "nl"
        fmt.Println(flags[4:])
  10. Sequential nature of text (continued) • Text processing is inherently sequential, even for UTF-32 • Multi-rune characters: "e + ´ = é" • Segmentation • Casing
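The "e + ´ = é" point can be checked with the standard library alone. This is a minimal sketch (the helper `describe` is a hypothetical name): the precomposed and decomposed spellings look the same on screen but differ in both bytes and runes, so a plain `==` treats them as different strings.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length and rune count of s.
func describe(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	decomposed := "e\u0301" // e followed by COMBINING ACUTE ACCENT

	b1, r1 := describe(precomposed)
	b2, r2 := describe(decomposed)
	fmt.Println(b1, r1) // 2 1
	fmt.Println(b2, r2) // 3 2

	fmt.Println(precomposed == decomposed) // false
}
```

Even a fixed-width encoding such as UTF-32 would not help here: the decomposed form is still two code points for one user-visible character.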
  11. Transforming Text

  12. The Transformer interface
        type Transformer interface {
            Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
            Reset()
        }
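To make the contract concrete, here is a minimal sketch of a type that satisfies this interface. It redeclares the interface locally so that it runs with the standard library only; real code would import `golang.org/x/text/transform` instead. The names `asciiUpper`, `apply`, and `errShortDst` are assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Transformer mirrors the interface shown on the slide. Real code would
// import it from golang.org/x/text/transform instead of redeclaring it.
type Transformer interface {
	Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
	Reset()
}

// errShortDst is a local stand-in (an assumption of this sketch) for the
// error a transformer returns when dst cannot hold all of the output.
var errShortDst = errors.New("short destination buffer")

// asciiUpper upper-cases ASCII letters and copies all other bytes unchanged.
type asciiUpper struct{}

// Reset is a no-op because asciiUpper keeps no state between calls.
func (asciiUpper) Reset() {}

func (asciiUpper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	n := len(src)
	if len(dst) < n {
		n, err = len(dst), errShortDst
	}
	for i := 0; i < n; i++ {
		c := src[i]
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		dst[i] = c
	}
	return n, n, err
}

// apply drives t over s in small chunks, the way a streaming helper would.
func apply(t Transformer, s string) string {
	t.Reset()
	src, out := []byte(s), []byte{}
	buf := make([]byte, 4) // deliberately tiny to exercise the short-dst path
	for len(src) > 0 {
		nDst, nSrc, err := t.Transform(buf, src, true)
		out = append(out, buf[:nDst]...)
		src = src[nSrc:]
		if err != nil && err != errShortDst {
			break
		}
	}
	return string(out)
}

func main() {
	fmt.Println(apply(asciiUpper{}, "hello, gopher")) // HELLO, GOPHER
}
```

Returning how many bytes were consumed and produced lets the caller work with fixed-size buffers, which is what makes the interface suitable for streaming.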
  13. Using Transformers • A transform is typically used with one of the helper functions in package transform:
        encoder := simplifiedchinese.GBK.NewEncoder()
        s, _, _ := transform.String(encoder, "你好")
    • Most packages also provide convenience wrappers:
        s, _ := encoder.String("你好")
        w := norm.NFC.Writer(w)
  14. Normalization • "Modifiers" (rendered on the slide with a long run of stacked combining marks) • x/text/unicode/norm implements a stream-safe and secure O(n) normalization algorithm
        norm.NFC.Writer(w) // write the text stream to w in NFC form
  15. Package cases • Title case:
        toTitle := cases.Title(language.Dutch)
        fmt.Println(toTitle.String("'n ijsberg"))
    Output:
        'n IJsberg
    • Languages may require different casing algorithms!
  16. Transformers • x/text packages that implement the Transformer interface: cases • encoding/... • runes • transform • width • secure/precis • unicode/norm • unicode/bidi
  17. Searching and Sorting

  18. Multilingual Search and Sort • Accented characters: e < é < f • Multi-letter characters: "ch" in Spanish • Equivalences: å ⇔ aa in Danish, ß ⇔ ss in German • Reordering: Z < Å in Danish • Compatibility equivalence: K (U+004B) ⇔ K (U+212A) • Reverse sorting of accents in Canadian French
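A quick way to see why these rules need dedicated support is to sort with the standard library, which compares raw bytes. This is a minimal sketch (the helper name `byteOrder` is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// byteOrder returns a copy of words sorted with sort.Strings, which compares
// raw bytes and knows nothing about language-specific collation.
func byteOrder(words []string) []string {
	out := append([]string(nil), words...)
	sort.Strings(out)
	return out
}

func main() {
	// A collator would give e < é < f; byte order puts é after f.
	fmt.Println(byteOrder([]string{"f", "é", "e"})) // [e f é]

	// The Latin capital K and the Kelvin sign look alike but are not equal.
	fmt.Println("K" == "\u212a") // false
}
```

Byte order places é (bytes 0xC3 0xA9) after every ASCII letter, which is the kind of result a language-aware collator is meant to correct.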
  19. Search and Replace • Use bytes.Replace to replace "a cafe" with "many cafes":
    1. "We went to a cafe."
    2. "We went to a café."
    3. "We went to a cafe\u0301."
    • Result for the third example: "We went to many cafes\u0301." ⇒ (NFC) "We went to many cafeś."
    • Simple byte-oriented search and replace will not work!
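The three cases can be reproduced with the standard library. This is a minimal sketch (the helper name `pluralize` is hypothetical); it prints with `%+q` so that the combining accent stays visible as an escape:

```go
package main

import (
	"bytes"
	"fmt"
)

// pluralize performs the naive byte-oriented replacement from the slide.
func pluralize(sentence string) string {
	out := bytes.Replace([]byte(sentence), []byte("a cafe"), []byte("many cafes"), -1)
	return string(out)
}

func main() {
	fmt.Printf("%+q\n", pluralize("We went to a cafe."))       // "We went to many cafes."
	fmt.Printf("%+q\n", pluralize("We went to a caf\u00e9."))  // "We went to a caf\u00e9."
	fmt.Printf("%+q\n", pluralize("We went to a cafe\u0301.")) // "We went to many cafes\u0301."
}
```

The second sentence is left untouched because the precomposed é never matches the bytes of "e". The third is matched, but the accent that belonged to the "e" now follows the "s".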
  20. search Example
        m := search.New(language.Danish, search.IgnoreCase, search.IgnoreDiacritics)
        start, end := m.IndexString(text, s)
        match := text[start:end]
    Results:
        SEARCH           TEXT                   MATCH
        aarhus           Århus a\u0303\u031b    Århus
        a                Århus a\u0303\u031b    a\u0303\u031b
        a\u031b\u0303    Århus a\u0303\u031b    a\u0303\u031b
  21. collate Example
        import (
            "fmt"
            "golang.org/x/text/collate"
            "golang.org/x/text/language"
        )
        func main() {
            a := []string{"北京市", "上海市", "广州市"}
            for _, tag := range []string{"en", "zh", "zh-u-co-stroke"} {
                collate.New(language.Make(tag)).SortStrings(a)
                fmt.Println(a)
            }
        }
    Output:
        [上海市 北京市 广州市]
        [北京市 广州市 上海市]
        [上海市 广州市 北京市]
  22. Segmentation

  23. Segmentation Support • Planned: an API for segmentation, covering the boundaries defined by Unicode: word, line, sentence, paragraph • Not planned: language-specific segmentation • Community support welcome
  24. Language Tags

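  25. Language Tag Examples
        zh                 Chinese (defaults to Simplified Chinese)
        zh-Hant            Traditional Chinese (Taiwan)
        zh-HK              Traditional Chinese (Hong Kong)
        zh-Latn-pinyin     Chinese in pinyin
        zh-HK-u-co-pinyin  Chinese, pinyin sort order
    Syntax: <lang> [-<script>] [-<region>] [-<variant>]* [-<extension>]*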
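The tag syntax above can be illustrated by labeling subtags purely by their shape. The following is a rough sketch using only the standard library, not the parser in x/text/language; the names `classify` and `describeTag` are hypothetical. It assumes a well-formed tag, does no registry validation, and ignores extended-language, private-use, and grandfathered subtags.

```go
package main

import (
	"fmt"
	"strings"
)

// isAlpha reports whether s consists only of ASCII letters.
func isAlpha(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i] | 0x20 // fold ASCII upper case to lower case
		if c < 'a' || c > 'z' {
			return false
		}
	}
	return true
}

// isDigit reports whether s consists only of ASCII digits.
func isDigit(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] < '0' || s[i] > '9' {
			return false
		}
	}
	return true
}

// classify labels each subtag of a well-formed tag by its shape alone.
func classify(tag string) []string {
	parts := strings.Split(tag, "-")
	labels := make([]string, len(parts))
	inExtension := false
	for i, p := range parts {
		switch {
		case i == 0:
			labels[i] = "lang"
		case inExtension:
			labels[i] = "extension"
		case len(p) == 1: // a singleton such as "u" starts an extension
			inExtension = true
			labels[i] = "extension"
		case len(p) == 4 && isAlpha(p):
			labels[i] = "script"
		case len(p) == 2 || (len(p) == 3 && isDigit(p)):
			labels[i] = "region"
		default:
			labels[i] = "variant"
		}
	}
	return labels
}

// describeTag joins the labels for display.
func describeTag(tag string) string {
	return strings.Join(classify(tag), " ")
}

func main() {
	fmt.Println(describeTag("zh-Hant"))           // lang script
	fmt.Println(describeTag("zh-Latn-pinyin"))    // lang script variant
	fmt.Println(describeTag("zh-HK-u-co-pinyin")) // lang region extension extension extension
}
```

In real code the tag would be parsed with the language package, which also canonicalizes and validates the subtags.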
  26. Matching is Non-Trivial • Swiss German speakers usually understand German: gsw ⇒ de • The converse is not often true: de ⇏ gsw • cmn is Mandarin Chinese, but zh is more commonly used • hr matches sr-Latn • The Matcher in x/text/language solves this problem
  27. Language Matching in Go
        import (
            "net/http"
            "golang.org/x/text/language"
        )
        // Languages supported by your application
        var matcher = language.NewMatcher([]language.Tag{
            language.SimplifiedChinese, // zh-Hans
            language.AmericanEnglish,   // en-US
        })
        func handle(w http.ResponseWriter, r *http.Request) {
            prefs, _, _ := language.ParseAcceptLanguage(r.Header.Get("Accept-Language"))
            tag, _, _ := matcher.Match(prefs...)
            // use tag; it includes carried over user preference
        }
  28. Language Matching Recap • Find the best supported language for a list of user-preferred languages • Use the matched tag to select language-specific resources: translations, sort order, case operations • The resulting tag carries over the user's settings
  29. Translation Insertion • Hello, world! • Hallo Wereld! • 你好，世界！ • 안녕하세요, 세계!

  30. Translating Text • Mark text within your code as "To Be Translated" • Extract the text from your code • Send it to translators • Insert the translated messages back into your code
  31. Mark Text "To Be Translated"
    Before:
        import "fmt"
        // Report that person visited a city.
        fmt.Printf("%[1]s went to %[2]s.", person, city)
    After:
        import "golang.org/x/text/message"
        p := message.NewPrinter(userLang)
        // Report that person visited a city.
        p.Printf("%[1]s went to %[2]s.", person, city)
  32. Extract and send for translation
        {
            Description: "Report that person visited a city.",
            Original: "{person} went to {city}.",
            Key: "%s went to %s.",
        }
  33. Insert Translations in Code
        import "golang.org/x/text/message"
        message.SetString(language.Dutch, "%s went to %s", "%s is in %s geweest.")
        message.SetString(language.SimplifiedChinese, "%s went to %s", "%s去了%s。")
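The idea behind registering translations by key can be sketched with a plain map. This is an illustration under stated assumptions, not how x/text/message is implemented: the type `catalog` and its methods `set` and `sprintf` are hypothetical names, and languages are plain strings rather than language tags.

```go
package main

import "fmt"

// catalog maps a language to its translations, keyed by the original
// format string. All names in this sketch are hypothetical.
type catalog map[string]map[string]string

// set registers a translation for key in lang.
func (c catalog) set(lang, key, translation string) {
	if c[lang] == nil {
		c[lang] = map[string]string{}
	}
	c[lang][key] = translation
}

// sprintf formats args with the translation registered for key in lang,
// falling back to key itself when no translation exists.
func (c catalog) sprintf(lang, key string, args ...interface{}) string {
	if t, ok := c[lang][key]; ok {
		return fmt.Sprintf(t, args...)
	}
	return fmt.Sprintf(key, args...)
}

func main() {
	c := catalog{}
	c.set("nl", "%s went to %s.", "%s is in %s geweest.")
	c.set("zh-Hans", "%s went to %s.", "%s去了%s。")

	fmt.Println(c.sprintf("nl", "%s went to %s.", "Anna", "Utrecht")) // Anna is in Utrecht geweest.
	fmt.Println(c.sprintf("fr", "%s went to %s.", "Anna", "Utrecht")) // Anna went to Utrecht.
}
```

Using the original format string as the key means untranslated messages still print something sensible, which is why the fallback simply formats the key.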
  34. Planned extensions • Go tooling: automate extraction and insertion • Planned: number formatting; selection based on plurals, gender, etc. • golang.org/design/12750-localization
  35. Conclusion • Human languages are hard to deal with • Let x/text simplify it for you
  36. Community feedback • East-Asian Width • gofmt and East-Asian characters • Vertical support
  37. Q & A • Thank you • Marcel van Lohuizen • References: godoc.org/golang.org/x/text • blog.golang.org/matchlang • blog.golang.org/normalization • blog.golang.org/strings • golang.org/issue/12750