Nuts and Bolts of Internationalization

Aria Stewart
November 14, 2015

An in-depth dive into building an internationalized application, focusing on Express, but with concepts applicable to many situations.

Presented at Nodevember 2015 in Nashville, TN

Transcript

  1. Nuts and bolts of
    internationalization

  2. What is i18n?
    Internationalization is the process of making your application
    able to handle multiple languages.
    I, 18 letters, n.
    i18n.

  3. How many of you can
    read five languages?

  4. How many of you can
    read two languages?

  5. Just one?

  6. Supporting multiple
    languages is hard

  7. Why do we do this?

  8. Internationalization is
    accessibility

  9. Empathy
    Imagine you’re planning a
    vacation

  10. Empathy
    Grab your iPad, sit down on the
    patio in your house in El Pedregal,
    Mexico City.

  11. Empathy
    Type vacaciones gran cañon into
    google.com.mx.

  12. Human languages have some irregular bits.
    console.log("There are " + items.length + " " + (
    items.length == 1 ? "item" : "items"
    ) + " in your cart")

  13. in Polish
    istnieją 0 produkty w koszyku.
    istnieje 1 produkt w koszyku.
    istnieją 2 produkty w koszyku.
    istnieją 3 produkty w koszyku.
    istnieją 4 produkty w koszyku.
    istnieje 5 produktów w koszyku

  14. Polish
    console.log((
      items.length == 0 ? "istnieją " + items.length + " produkty" :
      items.length == 1 ? "znajduje się " + items.length + " produkt" :
      items.length % 10 == 2 || items.length % 10 == 3 || items.length % 10 == 4 ?
        "istnieją " + items.length + " produkty" :
        "istnieje " + items.length + " produktów"
    ) + " w koszyku");
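
The branching above is a hand-written copy of a plural rule. As a hedged sketch, assuming a JavaScript runtime that ships `Intl.PluralRules` with Polish locale data (a built-in that postdates this talk), the category can be looked up instead. The `forms` table and the `cartMessage` name are illustrative only, and the wording reuses the Polish strings from the slide. Note that the CLDR rule puts 0 and 12 to 14 in the `many` category, which the chain above does not.

```javascript
// Sketch: let the runtime pick the Polish plural category.
var polishRules = new Intl.PluralRules('pl');

// Hypothetical table of forms, keyed by CLDR plural category.
var forms = {
  one: 'znajduje się # produkt',
  few: 'istnieją # produkty',
  many: 'istnieje # produktów'
};

function cartMessage(count) {
  var category = polishRules.select(count); // 'one', 'few', 'many' or 'other'
  var form = forms[category] || forms.many; // assumption: fall back to 'many'
  return form.replace('#', String(count)) + ' w koszyku';
}

console.log(cartMessage(1));
console.log(cartMessage(3));
console.log(cartMessage(5));
```

The code no longer contains any Polish-specific arithmetic; only the table of forms is language-specific.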

  15. We’ve created a monster
    console.log(
      lang == "pl" ? (
        (
          items.length == 0 ? "istnieją " + items.length + " produkty" :
          items.length == 1 ? "znajduje się " + items.length + " produkt" :
          items.length % 10 == 2 || items.length % 10 == 3 || items.length % 10 == 4 ?
            "istnieją " + items.length + " produkty" :
            "istnieje " + items.length + " produktów"
        ) + " w koszyku"
      ) :
      lang == "en" ? (
        "There are " + items.length + " " + (items.length == 1 ? "item" : "items") + " in your cart"
      ) : "unsupported language"
    );

  16. "dependencies": {
    "the-english-language": "^2015.0.0",
    "academie-francaise": "^2005.33.9"
    }
    Never make your code depend on English.

  17. Because we’ve integrated this into our code, we have to scatter
    i18n into all sorts of places, deep and high in the stack.

  18. Adding a new language means editing the entire codebase.
    Translations take time. This means several edits.
    Merge conflicts with every piece of the codebase that has user-
    visible text.

  19. Let’s find a better way.

  20. MessageFormat
    (or gettext, or ...)
    Push the list of cases out into each translation. Polish specifics
    go in the Polish language files. Programmers see only one
    string in the source code.

  21. Message formatters usually use a key in the source code, plus
    placeholder values to fill in numbers and dates.
    Essentially, a function call.
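
To make the "function call" idea concrete, here is a minimal sketch of a key-plus-placeholders formatter. It is not the MessageFormat implementation: the `catalog` object, the `format` helper and the bare `{name}` placeholder syntax are simplified assumptions for illustration.

```javascript
// Hypothetical per-language catalogs: translators edit these, not the code.
var catalog = {
  en: { shipping: 'Hello, {name}! Your order ships on {day}.' },
  es: { shipping: '¡Hola, {name}! Tu pedido sale el {day}.' }
};

// Look up the message by key, then fill in {placeholders} from values.
function format(lang, key, values) {
  var pattern = catalog[lang][key];
  return pattern.replace(/\{(\w+)\}/g, function (match, name) {
    // Leave unknown placeholders untouched so mistakes stay visible.
    return name in values ? String(values[name]) : match;
  });
}

console.log(format('en', 'shipping', { name: 'Ana', day: 'Friday' }));
console.log(format('es', 'shipping', { name: 'Ana', day: 'viernes' }));
```

The calling code names a key and passes values; everything language-specific lives in the catalog.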

  22. English
    {
      "cart": {
        "items": "There
          {items, plural, =1 {is # item} other {are # items}}
          in your cart"
      }
    }

  23. Polish (as line-wrapped JSON)
    {
      "cart": {
        "items": "{items, plural,
          one {znajduje się # produkt w koszyku.}
          few {istnieją # produkty w koszyku.}
          many {istnieje # produktów w koszyku.}}"
      }
    }

  24. And in our code:
    formatMessage(messages.cart.items, { items: 3 });

  25. That was the easy part.

  26. Workflows
    The hard part.

  27. The ongoing pain
    Applications change over time.

  28. git commit -m 'updated translations for user interface'
    Not that simple.

  29. Translation takes
    time.

  30. Updates should flow
    one way

  31. Decide on a definitive source translation.
    Update that, then retranslate the changed pieces in each
    language.
    Remember that you have to maintain any specialization.
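
One way to keep updates flowing in a single direction is to compare each translation against the source catalog and list what needs attention. A rough sketch, assuming flat key-to-string catalogs; the `findMissingKeys` helper and the sample catalogs are hypothetical, and the check only detects missing and orphaned keys, not source strings whose wording changed.

```javascript
// Compare a translation with the definitive source catalog.
function findMissingKeys(source, translation) {
  var missing = Object.keys(source).filter(function (key) {
    return !Object.prototype.hasOwnProperty.call(translation, key);
  });
  var orphaned = Object.keys(translation).filter(function (key) {
    return !Object.prototype.hasOwnProperty.call(source, key);
  });
  return { missing: missing, orphaned: orphaned };
}

// Made-up sample data: 'bag' is not translated yet, 'goodbye' was removed
// from the source but lingers in the translation.
var sourceCatalog = { hello: 'Hello, World!', bag: 'Your bag' };
var spanishCatalog = { hello: '¡Hola al mundo!', goodbye: 'Adiós' };

console.log(findMissingKeys(sourceCatalog, spanishCatalog));
```

Running a check like this in CI is one option for making untranslated strings visible before a release.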

  32. ¡Vamos a crearlo!
    So let’s do this!

  33. server.js
    var express = require('express');
    var path = require('path');
    var app = express();
    var hbs = require('hbs');
    var hbsIntl = require('handlebars-intl');
    var engine = hbs.create();
    hbsIntl.registerWith(engine);
    app.engine("hbs", engine.__express);
    app.listen(process.env.PORT || 8080);

  34. app.use(function selectLanguageForRequest(req, res, next) {
      var lang = req.query.lang || 'en';
      // Or use req.headers['accept-language'] (Node lowercases header names)
      // Or use the user's account settings.
      // Or use multiple strategies.
      // Note: validate lang against your supported locales before building a path.
      req.messages = require(path.resolve(__dirname, 'locales', lang + '.json'));
      next();
    });
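
The `Accept-Language` strategy mentioned in the comments could look roughly like the following sketch. The `parseAcceptLanguage` and `pickLanguage` helpers and the `supported` list are assumptions, the parsing is deliberately simplified (it matches only on the primary language subtag), and a production app would more likely use a maintained negotiation library.

```javascript
// Parse "es-MX,es;q=0.9,en;q=0.8" into tags sorted by preference.
function parseAcceptLanguage(header) {
  return (header || '').split(',').map(function (part) {
    var pieces = part.trim().split(';');
    var q = 1; // a missing q-value means full preference
    pieces.slice(1).forEach(function (param) {
      var m = /^\s*q=([0-9.]+)\s*$/.exec(param);
      if (m) { q = parseFloat(m[1]); }
    });
    return { tag: pieces[0].trim(), q: q };
  }).filter(function (entry) {
    return entry.tag && entry.q > 0;
  }).sort(function (a, b) {
    return b.q - a.q;
  });
}

// Return the first supported language, comparing primary subtags only.
function pickLanguage(header, supported, fallback) {
  var wanted = parseAcceptLanguage(header);
  for (var i = 0; i < wanted.length; i++) {
    var primary = wanted[i].tag.split('-')[0].toLowerCase();
    if (supported.indexOf(primary) !== -1) { return primary; }
  }
  return fallback;
}

console.log(pickLanguage('es-MX,es;q=0.9,en;q=0.8', ['en', 'es'], 'en'));
```

In the middleware this would stand in for the `req.query.lang || 'en'` expression, reading the header from `req.headers['accept-language']`.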

  35. views/hello.hbs

    {{formatMessage messages.hello}}
    Handler
    app.get('/', function (req, res) {
    res.render('hello.hbs', {
    messages: req.messages
    });
    });

  36. locales/es.json
    {
    "hello": "¡Hola al mundo!"
    }
    locales/en.json
    {
    "hello": "Hello, World!"
    }

  37. Let’s try it out.
    $ PORT=8080 npm start
    $ curl http://localhost:8080

    Hello, World!
    $ curl 'http://localhost:8080?lang=es'

    ¡Hola al mundo!

  38. Let’s add more

  39. English
    {
    "bag": "There {items, plural, one{is one item} other {are # items}} in your bag"
    }

  40. Spanish
    {
    "bag": "Hay {items, plural, one {# itema} other {# itemas}} en su bolso"
    }

  41. views/bag.hbs

    {{formatMessage messages.bag items=items}}
    Handler
    app.get('/bag', function (req, res) {
      res.render('bag.hbs', {
        messages: req.messages,
        // Query values are strings; plural selection needs a number
        items: Number(req.query.items)
      });
    });

  42. http://localhost:8080/bag?items=2

    There are 2 items in your bag
    http://localhost:8080/bag?items=1

    There is one item in your bag
    http://localhost:8080/bag?items=2&lang=es

    Hay 2 itemas en su bolso

  43. Now let’s do it in the
    browser

  44. var serveStatic = require('serve-static');
    app.use('/locales',
      serveStatic(path.resolve(__dirname, 'locales')));

  45. var MessageFormat = require('message-format');
    function Formatter(dict) {
      this.dict = dict;
    }
    Formatter.prototype.format = function format(message, args) {
      if (!this.dict[message]) {
        console.warn('no translation found for', message);
      }
      message = this.dict[message] || message;
      return new MessageFormat(message).format(args);
    };
    module.exports = Formatter;

  46. // A trivial ‘render’ function for my component^Wapplication
    module.exports = function render(formatter) {
    document.querySelector('p').innerText =
    formatter.format("bag", { items: 2});
    }

  47. // Polyfills are scratchy
    require('intl');
    require('intl/locale-data/jsonp/en.js');
    require('intl/locale-data/jsonp/es.js');
    var fetch = require('isomorphic-fetch');
    var Promise = require('bluebird');
    var Formatter = require('./formatter');
    var render = require('./render');

  48. var dramaticPause = 3000;
    var lang = document.documentElement.getAttribute('lang');
    // The path matches the '/locales' static mount on the server
    var messages = fetch('/locales/' + lang + '.json').then(function (res) {
      return res.json();
    });
    messages.then(function (dict) {
      alert('A dramatic pause...');
      return Promise.delay(dramaticPause).then(function () {
        var formatter = new Formatter(dict);
        render(formatter);
      });
    });

  49. Loading...


  50. Angular?
    Content service exposes key-value pairs. Supply some!

  51. Dust templating?
    formatjs.io has dust-intl. It’s great, and supplies message
    formatting. Just supply text strings.

  52. User Interface Concerns
    • Finding word boundaries isn’t always easy
    • Japanese sentences involving imported words can be very
    long
    • German words get very long and finding good wrapping
    gets tricky
    • Arabic and 8 other currently used scripts start on the right
    and go left.

  53. Culture Matters
    • Names don’t work everywhere the same way they do in your
    country.
    • Names don’t even work the way you think they do.
    • Not everyone writes numbers the same way.
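
For the last point, the built-in `Intl.NumberFormat` shows how the same value is written in different locales. A small sketch, assuming a runtime with locale data for the tags used; the `amount` value is made up.

```javascript
var amount = 1234567.89;

function formatNumber(locale) {
  return new Intl.NumberFormat(locale).format(amount);
}

console.log(formatNumber('en-US')); // 1,234,567.89
console.log(formatNumber('de-DE')); // 1.234.567,89
console.log(formatNumber('en-IN')); // 12,34,567.89
```

Both the separator characters and the digit grouping vary, so building number strings by hand is rarely safe.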

  54. Warnings
    Language != locale
    English is spoken in the US. English is spoken in the UK.
    But we spell colour differently and we write our dates inside
    out in the US. Same language, different specifics. You can call
    the language with the local details a "locale".
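
The date difference is easy to see with the built-in `Intl.DateTimeFormat`. A short sketch, assuming a runtime with both locales available; the `shortDate` helper is illustrative and the date is the day this talk was given.

```javascript
// Same language, different locale: the date order changes.
var talkDate = new Date(Date.UTC(2015, 10, 14)); // 14 November 2015

function shortDate(locale) {
  // Pin the time zone so the result does not depend on the machine.
  return new Intl.DateTimeFormat(locale, { timeZone: 'UTC' }).format(talkDate);
}

console.log(shortDate('en-US')); // 11/14/2015
console.log(shortDate('en-GB')); // 14/11/2015
```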

  55. BCP 47 from the IETF is a whole standard for language
    identifiers.
    en-US
    en-GB
    en
    i-navajo
    zh-Hans-CN

  56. Tips for language tags
    If you’re parsing a language preference from an external
    source, the language tag may have more or fewer parts than
    you expect.
    Use the bcp47 module to parse them. Use bcp47-serialize to
    get them as a string again.
    Canonicalize into a locale you support early on.
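
As a complement to those modules, newer runtimes can canonicalize and inspect tags with the built-in `Intl.getCanonicalLocales` and `Intl.Locale` (both postdate this talk). A brief sketch; the input strings are made-up examples and `isWellFormed` is a hypothetical helper.

```javascript
// Fix casing and check structure; malformed tags throw a RangeError.
console.log(Intl.getCanonicalLocales('EN-us'));      // [ 'en-US' ]
console.log(Intl.getCanonicalLocales('zh-hans-cn')); // [ 'zh-Hans-CN' ]

// Pull a tag apart without hand-written string splitting.
var locale = new Intl.Locale('zh-Hans-CN');
console.log(locale.language, locale.script, locale.region); // zh Hans CN

function isWellFormed(tag) {
  try {
    Intl.getCanonicalLocales(tag);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isWellFormed('en_US')); // false: underscore is not a valid separator
```

Canonicalizing at the edge of the application means the rest of the code only ever compares tags in one spelling.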

  57. Tips for language tags
    Plan to do matching and fallback when you get a request for a
    language that’s close to one you support but not quite right.
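
A simple form of matching and fallback drops subtags from the right until something supported is found, loosely modeled on the lookup scheme in RFC 4647. This is a sketch under stated assumptions: the `bestMatch` name and the `supportedLocales` list are hypothetical, and real matching may need extra rules.

```javascript
// Try the full tag first, then drop subtags from the right:
// 'es-MX' -> 'es' -> fallback.
function bestMatch(requested, supported, fallback) {
  var lowered = supported.map(function (tag) { return tag.toLowerCase(); });
  var parts = requested.toLowerCase().split('-');
  while (parts.length > 0) {
    var index = lowered.indexOf(parts.join('-'));
    if (index !== -1) { return supported[index]; }
    parts.pop();
  }
  return fallback;
}

var supportedLocales = ['en', 'en-GB', 'es'];
console.log(bestMatch('es-MX', supportedLocales, 'en')); // es
console.log(bestMatch('en-GB', supportedLocales, 'en')); // en-GB
console.log(bestMatch('fr-CA', supportedLocales, 'en')); // en
```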

  58. Tips for language tags
    Pass locale tags as opaque strings whenever possible -- it’s far
    easier to get right.
    "en-US" // Better
    {lang: 'en', region: 'US'} // You will make mistakes
    Especially once you add i-navajo and zh-Hans-CN.

  59. Tips for long form
    Handle long form content separately.
    Use one language per file.
    Keep it simple.

  60. Command-line apps
    Look at the LANG environment variable, or its finer-grained
    parts, the LC_* variables.
    Sorting, number display and so on can each be set separately.
    Consider gettext-format messages, the convention used by apps
    written in C.
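
Reading those variables from Node might look like the sketch below. The `localeFromEnv` helper is hypothetical, it only considers the message-related variables, and treating the `C` and `POSIX` locales as English is an assumption made for illustration.

```javascript
// POSIX precedence: LC_ALL overrides LC_MESSAGES, which overrides LANG.
function localeFromEnv(env) {
  var raw = env.LC_ALL || env.LC_MESSAGES || env.LANG || '';
  // Strip the encoding and modifier: 'es_MX.UTF-8' -> 'es_MX'
  var base = raw.split('.')[0].split('@')[0];
  // Assumption: fall back to English for unset, 'C' and 'POSIX' locales.
  if (!base || base === 'C' || base === 'POSIX') { return 'en'; }
  // Convert the POSIX underscore to the BCP 47 hyphen.
  return base.replace('_', '-');
}

console.log(localeFromEnv({ LANG: 'es_MX.UTF-8' }));                  // es-MX
console.log(localeFromEnv({ LANG: 'en_US.UTF-8', LC_ALL: 'pl_PL' })); // pl-PL
console.log(localeFromEnv(process.env));
```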

  61. And in closing
    Your native language isn’t the right way, it’s just a way.
