The Localization of Stack Overflow - QCon China 2014

Slides supporting the "Localization of Stack Overflow" talk presented at QCon China in April 2014.

Marco Cecconi

April 25, 2014
Transcript

  1. The Localization of Stack Overflow
    Marco Cecconi
    @sklivvz

  2. Who are we?

  3. View Slide

  4. View Slide

  5. 561,027,840 pageviews in the last 30 days*
    (~100% growth year over year)
    *source: Quantcast

  6. Why did we want to do this?

  7. Expert programmers + fluent English = EN Stack Overflow

  8. Expert programmers + fluent English = EN Stack Overflow
    Expert programmers + fluent Portuguese = PT Stack Overflow

  9. Different languages make different communities

  10. Expert programmers = Stack Overflow

  11. We want to make the internet a better place for all

  12. Architectural Requirements

  13. Easy.

  14. Easy.
    Fast.

  15. Performance is a feature

  16. View Slide

  17. View Slide

  18. 1. Allocates two objects per translated string
    2. Does a lot of lookups

  19. View Slide

  20. 1. No allocation (strings are interned)
    2. No lookups
    3. Not “easy”, not “usable”

  21. View Slide

  22. Simplest/easiest possible code

  23. Simplest/easiest possible code
    Implementation

  24. Source Code → Compile Time (aspnet_compiler) → Run Time (Look Up)

  25. Simplest/easiest possible code
    Compiled to the equivalent of this

  26. Source Code → Compile Time (extended aspnet_compiler using Roslyn) → Run Time (Look Up)

  27. JavaScript
    - No GC pressure, so we don’t care about interned strings
    - Can’t really precompile either
    - We simply create one set of JS files per language, e.g. “stub.en.js” and “stub.pt.js”
    - For all that follows, the same APIs are available to JavaScript
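
The "one set of JS files per language" idea can be sketched as a small build step. This is only an illustration, not the tooling used by Stack Overflow: the regex-based rewrite, the function names `localizeSource` and `buildAll`, and the sample translation table are all assumptions. Only `_s` and the "stub.en.js" / "stub.pt.js" naming come from the slides.

```javascript
// Hypothetical build step (not the actual Stack Overflow tooling).
// Matches _s("...") calls whose only argument is a simple string literal
// without escapes or embedded quotes. Calls with extra arguments are left alone.
const SIMPLE_CALL = /_s\("([^"\\]*)"\)/g;

// Replace each matched call with the translated string literal.
function localizeSource(source, translations) {
  return source.replace(SIMPLE_CALL, (match, key) => {
    const hasTranslation = Object.prototype.hasOwnProperty.call(translations, key);
    // Fall back to the original English string when no translation exists.
    return JSON.stringify(hasTranslation ? translations[key] : key);
  });
}

// Produce one output file body per language, named like "stub.pt.js".
function buildAll(name, source, languages) {
  const files = {};
  for (const [code, translations] of Object.entries(languages)) {
    files[`${name}.${code}.js`] = localizeSource(source, translations);
  }
  return files;
}
```

Because the strings are baked in at build time, the browser does no per-string lookup; the cost is one extra file set per supported language.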

  28. Our solution

  29. View Slide

  30. API
    _s(string value)
    Meaning “Substitute String”
    _m(string value)
    Meaning “Substitute Markdown”

  31. Languages are WEIRD (part 1)

  32. _s("Hello $name$", new { name = "Marco" })
    Hello Marco
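
A minimal sketch of how `$name$` tokens might be filled in, written in JavaScript since the slides say the same APIs are available there. The helper names `substitute` and `makeTranslator`, the lookup-table shape, and the fallback behavior are assumptions; the slides show only the `_s` call and its output.

```javascript
// Replace every $token$ with the matching property of `values`.
// Unknown tokens are left in place so missing data is easy to spot.
function substitute(template, values = {}) {
  return template.replace(/\$(\w+)\$/g, (token, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : token
  );
}

// Hypothetical factory: builds an _s function bound to one language's
// translation table, keyed by the English source string.
function makeTranslator(translations) {
  return function _s(value, values) {
    const known = Object.prototype.hasOwnProperty.call(translations, value);
    // Fall back to the English source string when nothing is translated.
    return substitute(known ? translations[value] : value, values);
  };
}

const english = makeTranslator({});
const portuguese = makeTranslator({ "Hello $name$": "Olá $name$" });

console.log(english("Hello $name$", { name: "Marco" }));    // Hello Marco
console.log(portuguese("Hello $name$", { name: "Marco" })); // Olá Marco
```

Keeping the token inside the translatable string lets translators move it wherever their language's word order requires.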

  33. 一隻雞
    兩隻雞

  34. 1 chicken
    2 chickens

  35. Language Name | Code | Category | Examples                | Rules
      Chinese       | zh   | other    | 0-999; 1.2...           | other → everything
      English       | en   | one      | 1                       | one → n is 1; other → everything else
                    |      | other    | 0, 2-999; 1.2, 2.07...  |

  36. Ukrainian!
      Category | Examples | Rules
      one      | 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, … | i % 10 = 1 and i % 100 != 11
      few      | 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, … | i % 10 = 2..4 and i % 100 != 12..14
      many     | 0, 5~19, 100, 1000, 10000, 100000, 1000000, … | i % 10 = 0 or i % 10 = 5..9 or i % 100 = 11..14
      other    | 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, … |
      other    | 0~15, 100, 1000, 10000, 100000, 1000000, … |
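
The rules on the last two slides translate almost directly into code. The sketch below covers only the three languages shown (Chinese, English, Ukrainian); the `pluralRules` table and the `inRange` helper are illustrative names, and treating every non-integer Ukrainian value as "other" is an assumption drawn from the decimal examples on the slide.

```javascript
// True when lo <= n <= hi.
const inRange = (n, lo, hi) => n >= lo && n <= hi;

// One function per language code, returning the plural category for n.
const pluralRules = {
  // Chinese: a single category for everything.
  zh: () => "other",

  // English: "one" when n is 1, "other" for everything else.
  en: (n) => (n === 1 ? "one" : "other"),

  // Ukrainian, following the modulo rules listed on the slide.
  uk: (n) => {
    if (!Number.isInteger(n)) return "other"; // decimals such as 1.5
    const i = Math.abs(n);
    const mod10 = i % 10;
    const mod100 = i % 100;
    if (mod10 === 1 && mod100 !== 11) return "one";
    if (inRange(mod10, 2, 4) && !inRange(mod100, 12, 14)) return "few";
    if (mod10 === 0 || inRange(mod10, 5, 9) || inRange(mod100, 11, 14)) return "many";
    return "other";
  },
};

console.log(pluralRules.uk(21)); // one
console.log(pluralRules.uk(11)); // many
console.log(pluralRules.uk(3));  // few
```

Modern JavaScript runtimes also ship `Intl.PluralRules`, which exposes plural categories without hand-written rules; the table above is spelled out only to mirror the slide.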

  37. Welsh has SIX modes…

  38. _s("#num# chickens", new { num = 3 })
    3 chickens

  39. Behind the scenes
    All combinations are generated for each language and sent to translators:
    - For Chinese: “$num:other$ chickens” will be sent
    - For a two-mode language: “$num:one$ chickens” and “$num:other$ chickens” will be sent
    Rules have to be evaluated at runtime to choose the correct translation.
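
The two halves of this process, generating variants for translators and picking one at runtime, might look like the following. This is a sketch under stated assumptions: the mapping from a `#num#` source token to `$num:category$` keys is inferred from the two slides, and `variantsFor`, `pluralize`, the `categories` table and the sample translations are hypothetical names and data.

```javascript
// Plural categories per language (only the two cases named on the slide).
const categories = { zh: ["other"], en: ["one", "other"] };

// Turn "#num# chickens" into the key for one plural category,
// e.g. "$num:one$ chickens".
function keyFor(source, category) {
  return source.replace(/#(\w+)#/g, (_, name) => "$" + name + ":" + category + "$");
}

// Extraction time: every string a translator for `language` must handle.
function variantsFor(source, language) {
  return categories[language].map((category) => keyFor(source, category));
}

// Run time: evaluate the plural rule, pick the matching translation,
// then substitute the number into it.
function pluralize(source, values, translations, categoryOf) {
  const token = /#(\w+)#/.exec(source);
  const category = token ? categoryOf(values[token[1]]) : "other";
  const key = keyFor(source, category);
  const template = translations[key] || key; // fall back to the key itself
  return template.replace(/\$(\w+):\w+\$/g, (_, name) => String(values[name]));
}

// Hypothetical English "translations" and rule.
const enTranslations = {
  "$num:one$ chickens": "$num:one$ chicken",
  "$num:other$ chickens": "$num:other$ chickens",
};
const enRule = (n) => (n === 1 ? "one" : "other");

console.log(variantsFor("#num# chickens", "en"));
console.log(pluralize("#num# chickens", { num: 3 }, enTranslations, enRule)); // 3 chickens
```

Even English needs a "translation" here, because the source string alone cannot produce "1 chicken".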

  40. Languages are WEIRD (part 2)

  41. View Slide

  42. 10 classes called Class I to Class X and containing all
    sorts of arbitrary groupings but often characterised as
    • people,
    • long objects,
    • animals,
    • miscellaneous objects,
    • large objects and liquids,
    • small objects,
    • languages,
    • pejoratives,
    • infinitives,
    • mass nouns
    Uganda

  43. _s("Active $~posts$")
    Attivi
    _s("Active $~questions$")
    Attive
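
One way to read the `$~posts$` example is that the adjective's form depends on the grammatical class of the noun it refers to. The slides do not explain how `$~…$` is resolved, so everything below is an assumption made for illustration: the `nounGender` table, the `$~noun$` key shape, and the function `translateWithGender`. Only the Italian outputs "Attivi" and "Attive" come from the slide.

```javascript
// Hypothetical per-language data: the grammatical gender of each noun in Italian.
const nounGender = { posts: "masculine", questions: "feminine" };

// Hypothetical translations: one form of the phrase per gender.
const translations = {
  "Active $~noun$": { masculine: "Attivi", feminine: "Attive" },
};

function translateWithGender(source) {
  const token = /\$~(\w+)\$/.exec(source);
  if (!token) return source; // nothing gender-dependent in this string

  const gender = nounGender[token[1]];
  // Normalize "Active $~posts$" to the shared key "Active $~noun$".
  const key = source.replace(token[0], () => "$~noun$");
  const forms = translations[key];

  // Fall back to the source string when the noun or phrase is unknown.
  return forms && forms[gender] ? forms[gender] : source;
}

console.log(translateWithGender("Active $~posts$"));     // Attivi
console.log(translateWithGender("Active $~questions$")); // Attive
```

A language with ten noun classes would simply carry more entries per phrase, which is why the class list has to be data rather than code.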

  44. Post Mortem

  45. Some numbers
    700 views localized
    100,000 lines of code
    A lot of JavaScript
    A LOT of refactoring/fixing/tech debt repaid
    Very little performance impact
    ~6 months of work (team of ~3)

  46. More numbers
    Portuguese released Dec. 12
    4k Questions
    7k Answers
    8k Users
    One of the best performing new communities ever

  47. View Slide

  48. Lessons learned

  49. Never put non-content text data in the DB
    It’s A Good Thing™ if all the text to be localized is in the views or JavaScript.

  50. Never compose sentences in code
    n==1? "1 unicorn": n.ToString() + " unicorns"

  51. View Slide

  52. Never assume anything about the language
    10 genders and 6 plurals? REALLY?

  53. Designing a global application is hard

  54. Conclusions

  55. 1. It’s possible to internationalize quickly and cheaply, without performance hits.
    2. Localization is a surprisingly rich problem. There are many gotchas that can be painful later, like pluralization “bugs”. Fun!
    3. Localization is a very healthy choice for Stack Overflow and we hope to provide more and more users with a native interface some day :-)

  56. Questions?
    Marco Cecconi
    @sklivvz
