Nuts and Bolts of Internationalization

Aria Stewart
November 14, 2015

An in-depth dive into building an internationalized application, focusing on Express, but with concepts applicable to many situations.

Presented at Nodevember 2015 in Nashville, TN

Transcript

  1. Nuts and bolts of
    internationalization

  2. What is i18n?
    Internationalization is the process of making your application
    able to handle multiple languages.
    I, 18 letters, n.
    i18n.

  3. How many of you can
    read five languages?

  4. How many of you can
    read two languages?

  5. Supporting multiple
    languages is hard

  6. Why do we do this?

  7. Internationalization is
    accessibility

  8. Empathy
    Imagine you’re planning a
    vacation

  9. Empathy
    Grab your iPad, sit down on the
    patio in your house in El Pedregal,
    Mexico City.

  10. Empathy
    Type vacaciones gran cañon into
    google.com.mx.

  11. Human languages have some irregular bits.
    console.log("There are " + items.length + " " + (
    items.length == 1 ? "item" : "items"
    ) + " in your cart")

  12. in Polish
    istnieje 0 produktów w koszyku.
    znajduje się 1 produkt w koszyku.
    istnieją 2 produkty w koszyku.
    istnieją 3 produkty w koszyku.
    istnieją 4 produkty w koszyku.
    istnieje 5 produktów w koszyku.

  13. Polish
    console.log((
      items.length == 1 ? "znajduje się " + items.length + " produkt" :
      // 2-4, 22-24, ... but not the teens 12-14
      items.length % 10 >= 2 && items.length % 10 <= 4 &&
        (items.length % 100 < 12 || items.length % 100 > 14) ?
        "istnieją " + items.length + " produkty" :
        // everything else, including 0
        "istnieje " + items.length + " produktów"
    ) + " w koszyku");

  14. We’ve created a monster
    console.log(
      lang == "pl" ? (
        (
          items.length == 1 ? "znajduje się " + items.length + " produkt" :
          items.length % 10 >= 2 && items.length % 10 <= 4 &&
            (items.length % 100 < 12 || items.length % 100 > 14) ?
            "istnieją " + items.length + " produkty" :
            "istnieje " + items.length + " produktów"
        ) + " w koszyku"
      ) :
      lang == "en" ? (
        "There are " + items.length + " " + (items.length == 1 ? "item" : "items") + " in your cart"
      ) : "unsupported language"
    );

  15. "dependencies": {
    "the-english-language": "^2015.0.0",
    "academie-francaise": "^2005.33.9"
    }
    Never make your code depend on English.

  16. Because we’ve integrated this into our code, we have to scatter
    i18n into all sorts of places, deep and high in the stack.

  17. Adding a new language means editing the entire codebase.
    Translations take time. This means several edits.
    Merge conflicts with every piece of the codebase that has
    user-visible text.

  18. Let’s find a better way.

  19. MessageFormat
    (or gettext, or ...)
    Push the list of cases out into each translation. Polish specifics
    go in the Polish language files. Programmers see only one
    string in the source code.
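
As a rough sketch of what "pushing the cases out into each translation" can look like, the toy below keeps a plural rule and a list of cases per language, and leaves one language-neutral call site in the code. The `languages` table and the `cartMessage` helper are hypothetical names made up for illustration, not MessageFormat's API; the Polish rule follows the CLDR integer categories (one, few, many).

```javascript
// Hypothetical per-language tables: each language carries its own plural
// rule and its own list of cases. '#' stands for the number.
var languages = {
  en: {
    category: function (n) { return n === 1 ? 'one' : 'other'; },
    cart: {
      one: 'There is # item in your cart',
      other: 'There are # items in your cart'
    }
  },
  pl: {
    category: function (n) {
      var mod10 = n % 10;
      var mod100 = n % 100;
      if (n === 1) return 'one';
      // 2-4, 22-24, ... but not the teens 12-14
      if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 'few';
      return 'many';
    },
    cart: {
      one: 'znajduje się # produkt w koszyku',
      few: 'istnieją # produkty w koszyku',
      many: 'istnieje # produktów w koszyku'
    }
  }
};

// The single call site programmers see; no language-specific branches here.
function cartMessage(lang, count) {
  var language = languages[lang];
  var pattern = language.cart[language.category(count)];
  return pattern.replace('#', String(count));
}

console.log(cartMessage('en', 1));
console.log(cartMessage('pl', 3));
```

Adding a language then means adding one entry to the table, not editing every place that prints a count.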

  20. Message formatters usually use a key in the source code, plus
    placeholder values to fill in numbers and dates.
    Essentially, a function call.
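
To make the "key plus placeholder values" idea concrete, here is a deliberately tiny formatter. It is a sketch only: the `messages` table and this `formatMessage` are made up for illustration and handle nothing but simple `{name}` substitution, which real message formatters go well beyond.

```javascript
// Hypothetical message table for one language.
var messages = {
  greeting: 'Hello, {name}!',
  shipped: 'Order {id} shipped on {date}.'
};

function has(object, key) {
  return Object.prototype.hasOwnProperty.call(object, key);
}

// Look the pattern up by key, then fill each {placeholder} from values.
// Unknown keys fall back to the key itself; unknown placeholders are left as-is.
function formatMessage(key, values) {
  var pattern = has(messages, key) ? messages[key] : key;
  values = values || {};
  return pattern.replace(/\{(\w+)\}/g, function (match, name) {
    return has(values, name) ? String(values[name]) : match;
  });
}

console.log(formatMessage('greeting', { name: 'Aria' }));
```

From the caller's side it really is just a function call: a key and a bag of values go in, a string for the current language comes out.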

  21. English
    {
    "cart": {
    "items": "There
    {items, plural, one {is # item} other {are # items}}
    in your cart"
    }
    }

  22. Polish (as line-wrapped JSON)
    {
    "cart": {
    "items": "{items, plural,
    one {znajduje się # produkt w koszyku.}
    few {istnieją # produkty w koszyku.}
    many {istnieje # produktów w koszyku.}
    other {istnieje # produktu w koszyku.}}"
    }
    }

  23. And in our code:
    formatMessage(messages.cart.items, { items: 3 });

  24. That was the easy part.

  25. Workflows
    The hard part.

  26. The ongoing pain
    Applications change over time.

  27. git commit -m 'updated translations for user
    interface'
    Not that simple.

  28. Translation takes
    time.

  29. Updates should flow
    one way

  30. Decide on a definitive source translation.
    Update that, then retranslate the changed pieces in each
    language.
    Remember that you have to maintain any specialization.
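
One way to keep updates flowing in a single direction is to compute, for each language, which keys need a translator's attention after the source changes. The helper below is a minimal sketch under that assumption; `keysNeedingTranslation` and the flat key-to-string message shape are illustrative choices, not part of any tool mentioned in this talk.

```javascript
// Given the previous and current source-language messages plus one
// translation, list the keys that are new or whose source text changed.
function keysNeedingTranslation(oldSource, newSource, translation) {
  return Object.keys(newSource).filter(function (key) {
    var missing = !Object.prototype.hasOwnProperty.call(translation, key);
    var changed = oldSource[key] !== newSource[key];
    return missing || changed;
  }).sort();
}

var previousEnglish = { hello: 'Hello', bye: 'Bye' };
var currentEnglish = { hello: 'Hello, World!', bye: 'Bye', bag: 'Your bag' };
var spanish = { hello: 'Hola', bye: 'Adiós' };

console.log(keysNeedingTranslation(previousEnglish, currentEnglish, spanish));
```

Here `hello` is flagged because its source text changed and `bag` because it has no Spanish entry yet; `bye` is left alone.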

  31. ¡Vamos a crearlo!
    So let’s do this!

  32. server.js
    var express = require('express');
    var path = require('path');
    var app = express();
    var hbs = require('hbs');
    var hbsIntl = require('handlebars-intl');
    var engine = hbs.create();
    hbsIntl.registerWith(engine);
    app.engine("hbs", engine.__express);
    app.listen(process.env.PORT || 8080);

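  33. // Only load locale files we actually ship; never build a path from raw input.
    var supportedLanguages = ['en', 'es'];
    app.use(function selectLanguageForRequest(req, res, next) {
      var requested = req.query.lang;
      var lang = supportedLanguages.indexOf(requested) !== -1 ? requested : 'en';
      // Or use req.headers['accept-language'] (Node lowercases header names)
      // Or use the user's account settings.
      // Or use multiple strategies.
      req.messages = require(path.resolve(__dirname, 'locales', lang + '.json'));
      next();
    });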
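
The Accept-Language strategy mentioned in the comments could be sketched roughly as follows. `pickLanguage` is a hypothetical helper written for this example; it reads the q-values, sorts the ranges by preference, and accepts either an exact match or a match on the primary subtag. It is a simplification, and a production app would more likely lean on an existing negotiation module.

```javascript
// Pick a supported language from an Accept-Language header value such as
// "es-MX,es;q=0.8,en;q=0.5". Returns the fallback when nothing matches.
function pickLanguage(header, supported, fallback) {
  var ranges = String(header || '').split(',').map(function (part, index) {
    var pieces = part.trim().split(';');
    var q = 1;
    pieces.slice(1).forEach(function (param) {
      var match = /^\s*q=([0-9.]+)\s*$/.exec(param);
      if (match) q = parseFloat(match[1]);
    });
    return { tag: pieces[0].trim().toLowerCase(), q: q, index: index };
  }).filter(function (range) {
    return range.tag && range.q > 0;
  }).sort(function (a, b) {
    // Highest q first; ties keep the order they had in the header.
    return (b.q - a.q) || (a.index - b.index);
  });

  for (var i = 0; i < ranges.length; i++) {
    var primary = ranges[i].tag.split('-')[0];
    for (var j = 0; j < supported.length; j++) {
      var candidate = supported[j].toLowerCase();
      if (candidate === ranges[i].tag || candidate === primary) {
        return supported[j];
      }
    }
  }
  return fallback;
}

console.log(pickLanguage('es-MX,es;q=0.8,en;q=0.5', ['en', 'es'], 'en'));
```

Inside the middleware this would be called with `req.headers['accept-language']`, the list of shipped languages, and a default.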

  34. views/hello.hbs

    {{formatMessage messages.hello}}
    Handler
    app.get('/', function (req, res) {
    res.render('hello.hbs', {
    messages: req.messages
    });
    });

  35. locales/es.json
    {
    "hello": "¡Hola al mundo!"
    }
    locales/en.json
    {
    "hello": "Hello, World!"
    }

  36. Let’s try it out.
    $ PORT=8080 npm start
    $ curl http://localhost:8080/

    Hello, World!
    $ curl 'http://localhost:8080/?lang=es'

    ¡Hola al mundo!

  37. Let’s add more

  38. English
    {
    "bag": "There {items, plural, one{is one item} other {are # items}} in your bag"
    }

  39. Spanish
    {
    "bag": "Hay {items, plural, one {# itema} other {# itemas}} en su bolso"
    }

  40. views/bag.hbs

    {{formatMessage messages.bag items=items}}
    Handler
    app.get('/bag', function (req, res) {
    res.render('bag.hbs', {
    messages: req.messages,
    // Query values arrive as strings; plural selection wants a number.
    items: parseInt(req.query.items, 10) || 0
    });
    });

  41. http://localhost:8080/bag?items=2

    There are 2 items in your bag
    http://localhost:8080/bag?items=1

    There is one item in your bag
    http://localhost:8080/bag?items=2&lang=es

    Hay 2 itemas en su bolso

  42. Now let’s do it in the
    browser

  43. var serveStatic = require('serve-static');
    app.use('/locales',
    serveStatic(path.resolve(__dirname, 'locales')));

  44. var MessageFormat = require('message-format');
    function Formatter(dict) {
    this.dict = dict;
    }
    Formatter.prototype.format = function format(message, args) {
    if (!this.dict[message]) {
    console.warn('no translation found for', message);
    }
    message = this.dict[message] || message;
    return new MessageFormat(message).format(args);
    }
    module.exports = Formatter;

  45. // A trivial ‘render’ function for my component^Wapplication
    module.exports = function render(formatter) {
    document.querySelector('p').innerText =
    formatter.format("bag", { items: 2});
    }

  46. // Polyfills are scratchy
    require('intl');
    require('intl/locale-data/jsonp/en.js');
    require('intl/locale-data/jsonp/es.js');
    var fetch = require('isomorphic-fetch');
    var Promise = require('bluebird');
    var Formatter = require('./formatter');
    var render = require('./render');

  47. var dramaticPause = 3000;
    var lang = document.documentElement.getAttribute('lang');
    var messages = fetch('/locales/' + lang + '.json').then(function (res) {
      return res.json();
    });
    messages.then(function (dict) {
      alert('A dramatic pause...');
      return Promise.delay(dramaticPause).then(function () {
        var formatter = new Formatter(dict);
        render(formatter);
      });
    });

  48. Angular?
    Content service exposes key-value pairs. Supply some!

  49. Dust templating?
    formatjs.io has dust-intl. It’s great, and supplies message
    formatting. Just supply text strings.

  50. User Interface Concerns
    • Finding word boundaries isn’t always easy
    • Japanese sentences involving imported words can be very
    long
    • German words get very long and finding good wrapping
    gets tricky
    • Arabic and 8 other currently used scripts start on the right
    and go left.

  51. Culture Matters
    • Names don’t work everywhere the same way they do in your
    country.
    • Names don’t even work the way you think they do.
    • Not everyone writes numbers the same way.
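
For the point about numbers, the standard `Intl.NumberFormat` object (built into current JavaScript engines, or supplied by a polyfill) shows the differences quickly. This is a small sketch; `formatNumber` is a made-up wrapper, and what each locale prints depends on the locale data your runtime ships with.

```javascript
// Format the same value under different locales.
function formatNumber(locale, value) {
  return new Intl.NumberFormat(locale).format(value);
}

var amount = 1234567.891;
['en-US', 'de-DE', 'fr-FR', 'hi-IN'].forEach(function (locale) {
  console.log(locale, formatNumber(locale, amount));
});
```

With full locale data the grouping and decimal separators differ between these locales, and hi-IN groups digits in a different pattern than en-US.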

  52. Warnings
    Language != locale
    English is spoken in the US. English is spoken in the UK.
    But we spell colour differently and we write our dates inside
    out in the US. Same language, different specifics. You can call
    the language with the local details a "locale".
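
The date difference between the two English locales can be seen with the standard `Intl.DateTimeFormat` object. A minimal sketch: `shortDate` is a hypothetical helper, the time zone is pinned to UTC so the day does not shift, and the en-GB result assumes the runtime has en-GB locale data available.

```javascript
// Same language, different locale: format one date both ways.
function shortDate(locale, date) {
  return new Intl.DateTimeFormat(locale, { timeZone: 'UTC' }).format(date);
}

var talkDate = new Date(Date.UTC(2015, 10, 14)); // 14 November 2015

console.log(shortDate('en-US', talkDate)); // month first
console.log(shortDate('en-GB', talkDate)); // day first, given en-GB data
```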

  53. BCP 47 from the IETF is a whole standard for language
    identifiers.
    en-US
    en-GB
    en
    i-navajo
    zh-Hans-CN

  54. Tips for language tags
    If you’re parsing a language preference from an external
    source, the language tag may have more or fewer parts than
    you expect.
    Use the bcp47 module to parse them. Use bcp47-serialize to
    get them as a string again.
    Canonicalize into a locale you support early on.

  55. Tips for language tags
    Plan to do matching and fallback when you get a request for a
    language that’s close to one you support but not quite right.
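
A simple form of matching and fallback is the "lookup" scheme from RFC 4647: try the requested tag, then progressively shorter prefixes, until something you support matches. The function below is a sketch of that idea; `lookupLocale` is a hypothetical name and the tag handling is simplified to lowercase comparison.

```javascript
// Try the requested tag, then shorter and shorter prefixes, against the
// locales we support: "en-AU" -> "en". Returns the fallback if none match.
function lookupLocale(requested, supported, fallback) {
  var wanted = String(requested || '').toLowerCase().split('-');
  while (wanted.length > 0 && wanted[0] !== '') {
    var candidate = wanted.join('-');
    for (var i = 0; i < supported.length; i++) {
      if (supported[i].toLowerCase() === candidate) return supported[i];
    }
    wanted.pop();
    // Lookup also drops a single-letter subtag left dangling at the end.
    if (wanted.length > 0 && wanted[wanted.length - 1].length === 1) wanted.pop();
  }
  return fallback;
}

console.log(lookupLocale('en-AU', ['en-US', 'en', 'es'], 'en'));
console.log(lookupLocale('zh-Hant-TW', ['zh-Hant', 'en'], 'en'));
```

Note that this only falls back toward less specific tags: a request for `es` will not match a supported `es-MX`, so decide separately whether that direction should count as "close enough".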

  56. Tips for language tags
    Pass locale tags as opaque strings whenever possible -- it’s far
    easier to get right.
    "en-US" // Better
    {lang: 'en', region: 'US'} // You will make mistakes
    Especially once you add i-navajo and zh-Hans-CN.

  57. Tips for long form
    Handle long form content separately.
    Use one language per file.
    Keep it simple.

  58. Command-line apps
    Look at the LANG environment variable, or its more specific pieces,
    LC_*.
    Sorting, number display, and so on can each be specified separately.
    Consider gettext-format messages, to be consistent with apps written in C.
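
Reading those variables might look like the sketch below. `localeFromEnv` is a hypothetical helper; it follows the POSIX precedence (LC_ALL, then the specific LC_* category, then LANG), strips the codeset and modifier, and turns the underscore form into a hyphenated tag. Treat the result as a starting point to canonicalize further, not a guaranteed valid language tag.

```javascript
// Work out a locale tag from POSIX-style environment variables.
// category is one of the LC_* names; LC_MESSAGES is assumed by default.
function localeFromEnv(env, category) {
  var raw = env.LC_ALL || env[category || 'LC_MESSAGES'] || env.LANG || '';
  // "es_MX.UTF-8@euro" -> "es_MX"
  var name = raw.split('.')[0].split('@')[0];
  // "C" and "POSIX" mean "no particular locale".
  if (!name || name === 'C' || name === 'POSIX') return null;
  return name.replace(/_/g, '-');
}

console.log(localeFromEnv(process.env));
console.log(localeFromEnv(process.env, 'LC_NUMERIC'));
```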

  59. And in closing
    Your native language isn’t the right way, it’s just a way.
