I18n and L10n for Go (Chinese)

How Go addresses i18n and l10n problems.
Talk from Gopher China 2016.

Marcel van Lohuizen

April 16, 2016

Transcript

  1. I18n and L10n for Go using x/text
    Internationalization and localization with golang.org/x/text
    Marcel van Lohuizen
    Google, Go team


  2. Overview
    • golang.org/x/text subrepository
    • What is it for?
    • Current status
    • Examples
    • Conclusion


  3. I18n and L10n
    • Searching and sorting
    • Upper, lower, title case
    • Bi-directional text
    • Injecting translated text
    • Formatting of numbers, currency, date, time
    • Unit conversion


  4. golang.org/x/text status
    Language tags
    • language
    • display
    String equivalence
    • collate
    • search
    • secure
      • precis
    Text processing
    • cases
    • encoding
      • ...
    • runes
    • segment
    • transform
    • unicode
      • bidi
      • cldr
      • norm
      • rangetable
    • width
    Formatting
    • currency
    • date
    • message
    • number
    • measure
      • area
      • length
      • ...
    • feature
      • gender
      • plural


  5. Go's Requirements
    • Streaming (io.Reader, io.Writer)
    • Statically linked binaries
    • Multiple languages served simultaneously
    • Performance
    • Simple API


  6. Unicode in Go
    Unicode Go Refresher


  7. Go uses UTF-8
    Go natively handles UTF-8:
    const beijing = "北京市"
    for index, runeValue := range beijing {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }
    Output:
    U+5317 '北' starts at byte position 0
    U+4EAC '京' starts at byte position 3
    U+5E02 '市' starts at byte position 6


  8. String Model
    • Always UTF-8
    • Same model for source code as for text handling!
    • No random access
    • No metadata (except for byte length) or string "object"
    • Strings are not required to be in canonical (normalized) form
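
The "no random access" and "byte length only" points can be illustrated with the standard library alone. This is a minimal sketch, not code from the talk; the helper name runeAt is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt (hypothetical helper) returns the n-th rune of s, counting from 0.
// There is no random access to runes, so it scans from the start: O(n).
func runeAt(s string, n int) (rune, bool) {
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false
}

func main() {
	const s = "北京市"
	fmt.Println(len(s))                    // 9: len counts bytes, not characters
	fmt.Println(utf8.RuneCountInString(s)) // 3: counting runes requires a scan
	fmt.Printf("%x\n", s[1])               // 8c: indexing yields a byte in the middle of 北
	r, _ := runeAt(s, 1)
	fmt.Printf("%c\n", r) // 京
}
```

Indexing with s[i] is constant time precisely because it addresses bytes; anything rune-oriented has to walk the string.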


  9. Sequential nature of text
    const flags = "🇲🇨🇳🇱" // country codes "mc" + "nl"

    fmt.Println(flags[4:]) // prints 🇨🇳🇱: the "cn" flag followed by a lone L indicator
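
A self-contained sketch of why the slice changes the flags: each flag is a pair of regional indicator symbols, and each symbol takes four bytes in UTF-8. The helper regionalIndicators is hypothetical and only exists to build the strings without relying on emoji literals.

```go
package main

import "fmt"

// regionalIndicators (hypothetical helper) spells lowercase ASCII letters as
// Unicode regional indicator symbols; U+1F1E6 is the indicator for "A".
func regionalIndicators(code string) string {
	rs := make([]rune, 0, len(code))
	for _, c := range code {
		rs = append(rs, 0x1F1E6+(c-'a'))
	}
	return string(rs)
}

func main() {
	flags := regionalIndicators("mc") + regionalIndicators("nl")
	fmt.Println(len(flags)) // 16: four symbols, four bytes each

	// Dropping the first four bytes removes only the "M" indicator, so the
	// remaining symbols regroup as "cn" followed by a lone "L".
	rest := flags[4:]
	fmt.Println(rest == regionalIndicators("cn")+regionalIndicators("l")) // true
}
```

How a flag renders depends on its neighbours, so a slice boundary that is valid UTF-8 can still change what the reader sees.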


  10. Sequential nature of text (continued)
    • Text processing is inherently sequential, even for UTF-32
    • Multi-rune characters: "e + ´ = é"
    • Segmentation
    • Casing


  11. Transforming Text


  12. The Transformer interface
    type Transformer interface {
        Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
        Reset()
    }
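
To make the contract concrete, here is a toy implementation of the interface shown on the slide. It is a sketch under assumptions: the interface is redeclared locally so the example needs only the standard library, and errShortDst, asciiUpper and apply are hypothetical names, not part of x/text.

```go
package main

import (
	"errors"
	"fmt"
)

// Transformer is a local copy of the interface on the slide, declared here
// so the sketch compiles without golang.org/x/text.
type Transformer interface {
	Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)
	Reset()
}

// errShortDst is a hypothetical stand-in for the error a transformer reports
// when dst is too small to hold all of the converted bytes.
var errShortDst = errors.New("short destination buffer")

// asciiUpper upper-cases ASCII letters and copies every other byte unchanged.
type asciiUpper struct{}

func (asciiUpper) Reset() {} // stateless, nothing to reset

func (asciiUpper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	n := len(src)
	if n > len(dst) {
		n, err = len(dst), errShortDst
	}
	for i := 0; i < n; i++ {
		c := src[i]
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		dst[i] = c
	}
	return n, n, err
}

// apply drives t over src through a small fixed buffer, the way a streaming
// helper would, and collects the output.
func apply(t Transformer, src []byte) []byte {
	var out []byte
	buf := make([]byte, 4)
	for len(src) > 0 {
		nDst, nSrc, err := t.Transform(buf, src, true)
		out = append(out, buf[:nDst]...)
		src = src[nSrc:]
		if err != nil && !errors.Is(err, errShortDst) {
			break
		}
	}
	return out
}

func main() {
	var t Transformer = asciiUpper{}
	fmt.Printf("%s\n", apply(t, []byte("go x/text"))) // GO X/TEXT
}
```

Returning both nDst and nSrc lets the caller resume with whatever input is left, so a transformer can work over io.Reader and io.Writer streams without holding the whole text in memory.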


  13. Using Transformers
    • A transform is typically used with one of the helper functions:

    encoder := simplifiedchinese.GBK.NewEncoder()
    s, _, _ := transform.String(encoder, "你好")

    • Most packages provide convenience wrappers:

    s, _ := encoder.String("你好")

    w := norm.NFC.Writer(w)


  14. Normalization
    Modifiers
    x/text/unicode/norm implements a stream-safe and secure O(n) normalization algorithm
    norm.NFC.Writer(w) // writes the text stream to w in NFC form


  15. Package cases
    Title case:
    toTitle := cases.Title(language.Dutch)

    fmt.Println(toTitle.String("'n ijsberg"))

    Output:
    'n IJsberg
    Languages may require different casing algorithms!


  16. Transformers
    • x/text packages that implement the Transformer interface:
      • cases
      • encoding/...
      • runes
      • transform
      • width
      • secure/precis
      • unicode/norm
      • unicode/bidi


  17. Searching and Sorting


  18. Multilingual Search and Sort
    • Accented characters: e < é < f
    • Multi-letter characters: "ch" in Spanish
    • Equivalences: å ⇔ aa in Danish, ß ⇔ ss in German
    • Reordering: Z < Å in Danish
    • Compatibility equivalence: K (U+004B) ⇔ K (U+212A)
    • Reverse sorting of accents in Canadian French


  19. Search and Replace
    • Use bytes.Replace to replace "a cafe" with "many cafes":
    1. "We went to a cafe."
    2. "We went to a café."
    3. "We went to a cafe\u0301."
    • Result for the third example:
    "We went to many cafes\u0301." ⇒ (NFC)
    "We went to many cafeś."
    Simple byte-oriented search and replace will not work!
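
The failure is easy to reproduce with the standard library. The sketch below runs the same byte-level replacement over the three sentences; naiveReplace is a hypothetical helper name, and the %+q verb is used so the combining accent stays visible as an escape.

```go
package main

import (
	"bytes"
	"fmt"
)

// naiveReplace (hypothetical helper) performs the byte-oriented replacement
// from the slide, with no awareness of combining characters.
func naiveReplace(s string) string {
	return string(bytes.Replace([]byte(s), []byte("a cafe"), []byte("many cafes"), -1))
}

func main() {
	inputs := []string{
		"We went to a cafe.",
		"We went to a caf\u00e9.",  // precomposed é: no match, left unchanged
		"We went to a cafe\u0301.", // e + combining acute: matches the bare "e"
	}
	for _, in := range inputs {
		fmt.Printf("%+q\n", naiveReplace(in))
	}
	// The third result is "We went to many cafes\u0301.": the accent that
	// belonged to the "e" now follows the "s".
}
```

A collation-aware matcher such as the one in x/text/search avoids this by comparing whole characters, including their combining marks, rather than raw bytes.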


  20. x/text/search example
    m := search.New(language.Danish, search.IgnoreCase, search.IgnoreDiacritics)

    // text is the string being searched, s is the pattern
    start, end := m.IndexString(text, s)

    match := text[start:end]
    SEARCH           TEXT             MATCH
    aarhus           Århus            Århus
    a                a\u0303\u031b    a\u0303\u031b
    a\u031b\u0303    a\u0303\u031b    a\u0303\u031b


  21. x/text/collate example
    package main

    import (
        "fmt"

        "golang.org/x/text/collate"
        "golang.org/x/text/language"
    )

    func main() {
        a := []string{"北京市", "上海市", "广州市"}
        // Sort the same slice under three different collation orders.
        for _, tag := range []string{"en", "zh", "zh-u-co-stroke"} {
            collate.New(language.Make(tag)).SortStrings(a)
            fmt.Println(a)
        }
    }
    Output:
    [上海市 北京市 广州市]
    [北京市 广州市 上海市]
    [上海市 广州市 北京市]


  22. Segmentation


  23. Segmentation Support
    • Planned:
      • API for segmentation
      • Supported by Unicode: word, line, sentence, paragraph
    • Not planned:
      • Language-specific segmentation
      • Community support welcome


  24. Language Tags


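  25. Language Tag Examples
    zh                   Chinese (defaults to Simplified Chinese)
    zh-Hant              Traditional Chinese (Taiwan)
    zh-HK                Traditional Chinese (Hong Kong)
    zh-Latn-pinyin       Chinese in Pinyin
    zh-HK-u-co-pinyin    Chinese, Pinyin sort order
    <language> [-<script>] [-<region>] [-<variant>]* [-<extension>]*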


  26. Matching is Non-Trivial
    • Swiss German speakers usually understand German: gsw ⇒ de
    • The converse is not often true! de ⇏ gsw
    • cmn is Mandarin Chinese, zh is more commonly used
    • hr matches sr-Latn
    The Matcher in x/text/language solves this problem


  27. Language Matching in Go
    import (
        "net/http"

        "golang.org/x/text/language"
    )

    // Languages supported by your application
    var matcher = language.NewMatcher([]language.Tag{
        language.SimplifiedChinese, // zh-Hans
        language.AmericanEnglish,   // en-US
    })

    func handle(w http.ResponseWriter, r *http.Request) {
        prefs, _, _ := language.ParseAcceptLanguage(r.Header.Get("Accept-Language"))
        tag, _, _ := matcher.Match(prefs...)
        // use tag; it includes carried over user preference
        w.Header().Set("Content-Language", tag.String())
    }


  28. Language Matching Recap
    • Find the best supported language for a list of user-preferred languages
    • Use the matched tag to select language-specific resources
      • translations
      • sort order
      • case operations
    • The resulting tag carries over user settings


  29. Translation Insertion
    Hello, world!
    Hallo Wereld!
    你好,世界!
    উ֞ೞࣁਃ, ࣁ҅!


  30. Translating Text
    • Mark text within your code "To Be Translated"
    • Extract the text from your code
    • Send to translators
    • Insert translated messages back into your code


  31. Mark Text "To Be Translated"
    Before:
    import "fmt"

    // Report that person visited a city.
    fmt.Printf("%[1]s went to %[2]s.", person, city)
    After:
    import "golang.org/x/text/message"

    p := message.NewPrinter(userLang)
    // Report that person visited a city.
    p.Printf("%[1]s went to %[2]s.", person, city)


  32. Extract and send for translation
    {
        Description: "Report that person visited a city.",
        Original: "{person} went to {city}.",
        Key: "%s went to %s.",
    }


  33. Insert Translations in Code
    import (
        "golang.org/x/text/language"
        "golang.org/x/text/message"
    )

    message.SetString(language.Dutch,
        "%s went to %s",
        "%s is in %s geweest.")
    message.SetString(language.SimplifiedChinese,
        "%s went to %s",
        "%s去了%s。")
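
The mark, extract, and insert workflow boils down to a lookup keyed by language and source format string. The sketch below is a toy stand-in written with the standard library only; the catalog type and its set and sprintf methods are hypothetical and do not reflect how x/text/message is implemented.

```go
package main

import "fmt"

// catalog (hypothetical) maps a language code and a source format string to
// its translation, playing the role of the registry behind SetString.
type catalog map[string]map[string]string

// set registers a translation for key in the given language.
func (c catalog) set(lang, key, translation string) {
	if c[lang] == nil {
		c[lang] = map[string]string{}
	}
	c[lang][key] = translation
}

// sprintf formats with the translated string when one exists for lang and
// falls back to the original key otherwise.
func (c catalog) sprintf(lang, key string, args ...interface{}) string {
	if t, ok := c[lang][key]; ok {
		key = t
	}
	return fmt.Sprintf(key, args...)
}

func main() {
	c := catalog{}
	c.set("nl", "%s went to %s.", "%s is in %s geweest.")

	fmt.Println(c.sprintf("nl", "%s went to %s.", "Marcel", "Beijing")) // Marcel is in Beijing geweest.
	fmt.Println(c.sprintf("fr", "%s went to %s.", "Marcel", "Beijing")) // Marcel went to Beijing.
}
```

Keying on the original format string keeps the call site readable in the source language, at the cost of requiring the key and the registered string to match exactly, punctuation included.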


  34. Planned extensions
    • Go tooling: automate extraction and insertion
    • Planned:
      • number formatting
      • selection based on plurals, gender, etc.
    • golang.org/design/12750-localization


  35. Conclusion
    • Human languages are hard to deal with
    • Let x/text simplify it for you


  36. Community feedback
    • East-Asian Width
    • gofmt and East-Asian characters
    • Vertical support


  37. Q & A
    Thank you
    Marcel van Lohuizen
    • References
    • godoc.org/golang.org/x/text
    • blog.golang.org/matchlang
    • blog.golang.org/normalization
    • blog.golang.org/strings
    • golang.org/issue/12750
