Nuts and Bolts of Internationalization

Aria Stewart
November 14, 2015

An in-depth dive into building an internationalized application, focusing on Express, but with concepts applicable to many situations.

Presented at Nodevember 2015 in Nashville, TN

Transcript

  1. What is i18n?

    Internationalization is the process of making your application able to handle multiple languages. I, 18 letters, n: i18n.
  2. Empathy

    Grab your iPad, sit down on the patio in your house in El Pedregal, Mexico City.
  3. Human languages have some irregular bits.

        console.log("There are " + items.length + " " +
          (items.length == 1 ? "item" : "items") + " in your cart");
  4. In Polish

    istnieją 0 produkty w koszyku.
    istnieje 1 produkt w koszyku.
    istnieją 2 produkty w koszyku.
    istnieją 3 produkty w koszyku.
    istnieją 4 produkty w koszyku.
    istnieje 5 produktów w koszyku.
  5. Polish

        console.log((
          items.length == 0 ? "istnieją " + items.length + " produkty" :
          items.length == 1 ? "znajduje się " + items.length + " produkt" :
          items.length % 10 == 2 || items.length % 10 == 3 || items.length % 10 == 4 ?
            "istnieją " + items.length + " produkty" :
          "istnieje " + items.length + " produktów"
        ) + " w koszyku");
  6. We’ve created a monster

        console.log(
          lang == "pl" ? (
            (
              items.length == 0 ? "istnieją " + items.length + " produkty" :
              items.length == 1 ? "znajduje się " + items.length + " produkt" :
              items.length % 10 == 2 || items.length % 10 == 3 || items.length % 10 == 4 ?
                "istnieją " + items.length + " produkty" :
              "istnieje " + items.length + " produktów"
            ) + " w koszyku"
          ) : lang == "en" ? (
            "There are " + items.length + " " + (items.length == 1 ? "item" : "items") + " in your cart"
          ) : "unsupported language"
        );
  7. Because we’ve integrated this into our code, we have to scatter i18n into all sorts of places, deep and high in the stack.
  8. Adding a new language means editing the entire codebase.

    Translations take time, so this means several edits, and merge conflicts with every piece of the codebase that has user-visible text.
  9. MessageFormat (or gettext, or ...)

    Push the list of cases out into each translation. Polish specifics go in the Polish language files. Programmers see only one string in the source code.
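
The idea of pushing the cases out into each translation can be sketched without any library. This is a minimal illustration, not MessageFormat itself: it assumes a runtime with `Intl.PluralRules` and full locale data (recent Node.js releases), and the `messages` table and `cartMessage` function are hypothetical names. The Polish strings are the ones from the slides.

```javascript
// Hypothetical per-language tables: each language lists only the plural
// categories it needs. '#' stands in for the number.
const messages = {
  en: {
    one: 'There is # item in your cart',
    other: 'There are # items in your cart'
  },
  pl: {
    one: 'znajduje się # produkt w koszyku',
    few: 'istnieją # produkty w koszyku',
    many: 'istnieje # produktów w koszyku'
  }
};

// The calling code is the same for every language; the language-specific
// rules live in Intl.PluralRules and in the table above.
function cartMessage(lang, count) {
  const category = new Intl.PluralRules(lang).select(count);
  const template = messages[lang][category];
  if (!template) {
    throw new Error('no "' + category + '" form for language ' + lang);
  }
  return template.replace('#', String(count));
}
```

With this shape, `cartMessage('pl', 22)` picks the `few` form and `cartMessage('pl', 12)` picks `many`, a distinction the hand-written `% 10` checks do not make.
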
  10. Message formatters usually use a key in the source code, plus placeholder values to fill in numbers and dates. Essentially, a function call.
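
To make "essentially, a function call" concrete, here is a rough sketch of key lookup plus placeholder substitution. The `catalog` object, its keys, and the function name `t` are made up for illustration; real formatters also handle plurals, numbers, and dates.

```javascript
// Hypothetical message catalog, keyed the way the source code refers to it.
const catalog = {
  'greeting': 'Hello, {name}!',
  'cart.total': 'Your total is {total}'
};

// Look up a message by key and substitute {placeholder} values.
// Unknown keys fall back to the key itself so gaps stay visible.
function t(key, values) {
  const known = Object.prototype.hasOwnProperty.call(catalog, key);
  const template = known ? catalog[key] : key;
  return template.replace(/\{(\w+)\}/g, function (match, name) {
    return values && name in values ? String(values[name]) : match;
  });
}
```

A call such as `t('greeting', { name: 'Aria' })` returns `"Hello, Aria!"`, and the source code never contains the English sentence itself.
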
  11. English

        {
          "cart": {
            "items": "{items, plural, one {There is # item} other {There are # items}} in your cart"
          }
        }
  12. Polish (as line-wrapped JSON)

        {
          "cart": {
            "items": "{items, plural,
              one {znajduje się # produkt w koszyku.}
              few {istnieją # produkty w koszyku.}
              many {istnieje # produktów w koszyku.}}"
          }
        }
  13. Decide on a definitive source translation. Update that, then retranslate the changed pieces in each language. Remember that you have to maintain any specialization.
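
One way to find the pieces that need attention is to compare each translation's keys against the source catalog. The sketch below assumes nested JSON catalogs like the ones on the earlier slides; `flatten` and `compareCatalogs` are hypothetical helpers, and detecting changed (rather than missing) strings would need extra bookkeeping that is not shown.

```javascript
// Flatten nested message objects into dotted keys, so
// { cart: { items: '...' } } becomes { 'cart.items': '...' }.
function flatten(obj, prefix, out) {
  out = out || {};
  Object.keys(obj).forEach(function (key) {
    const path = prefix ? prefix + '.' + key : key;
    if (obj[key] !== null && typeof obj[key] === 'object') {
      flatten(obj[key], path, out);
    } else {
      out[path] = obj[key];
    }
  });
  return out;
}

// Report keys the translation lacks, and keys the source no longer has.
function compareCatalogs(source, translation) {
  const src = flatten(source);
  const tr = flatten(translation);
  return {
    missing: Object.keys(src).filter(function (k) { return !(k in tr); }),
    obsolete: Object.keys(tr).filter(function (k) { return !(k in src); })
  };
}
```

Running a check like this in a build step makes an untranslated string show up as a list entry instead of a surprise in production.
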
  14. server.js

        var express = require('express');
        var path = require('path');
        var app = express();
        var hbs = require('hbs');
        var hbsIntl = require('handlebars-intl');
        var engine = hbs.create();
        hbsIntl.registerWith(engine);
        app.engine("hbs", engine.__express);
        app.listen(process.env.PORT || 8080);
  15. app.use(function selectLanguageForRequest(req, res, next) {
        var lang = req.query.lang || 'en';
        // Or use req.headers['accept-language'] (Node lowercases header names).
        // Or use the user's account settings.
        // Or use multiple strategies.
        req.messages = require(path.resolve(__dirname, 'locales', lang + '.json'));
        next();
      });
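
The "multiple strategies" comment could look roughly like the following. This is a sketch under assumptions: the `supported` list is hypothetical, the header parsing handles only the common `tag;q=value` form, and the allow-list check is there because building a file path from an unchecked query parameter lets a caller ask for arbitrary files.

```javascript
const supported = ['en', 'es', 'pl']; // hypothetical list of shipped locales

// Parse an Accept-Language header into tags ordered by descending q-value.
function parseAcceptLanguage(header) {
  return String(header || '')
    .split(',')
    .map(function (part) {
      const pieces = part.trim().split(';');
      const qParam = pieces.slice(1)
        .map(function (p) { return p.trim(); })
        .find(function (p) { return p.indexOf('q=') === 0; });
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { tag: pieces[0].trim(), q: isNaN(q) ? 0 : q };
    })
    .filter(function (e) { return e.tag && e.tag !== '*' && e.q > 0; })
    .sort(function (a, b) { return b.q - a.q; })
    .map(function (e) { return e.tag; });
}

// Try the explicit query parameter first, then the header, and only ever
// return a language from the allow-list.
function selectLanguage(queryLang, acceptLanguage) {
  const candidates = (queryLang ? [queryLang] : [])
    .concat(parseAcceptLanguage(acceptLanguage));
  for (const candidate of candidates) {
    const primary = String(candidate).toLowerCase().split('-')[0];
    if (supported.indexOf(primary) !== -1) return primary;
  }
  return 'en';
}
```

Inside the middleware this would be called as `selectLanguage(req.query.lang, req.headers['accept-language'])`.
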
  16. Let’s try it out.

        $ PORT=8080 npm start
        $ curl http://localhost:8080
        <!doctype html>
        <p>Hello, World!</p>
        $ curl 'http://localhost:8080/?lang=es'
        <!doctype html>
        <p>¡Hola al mundo!</p>
  17. views/bag.hbs

        <!doctype html>
        <p>{{formatMessage messages.bag items=items}}</p>

    Handler

        app.get('/bag', function (req, res) {
          res.render('bag.hbs', { messages: req.messages, items: req.query.items });
        });
  18. http://localhost:8080/bag?items=2

        <!doctype html>
        <p>There are 2 items in your bag</p>

    http://localhost:8080/bag?items=1

        <!doctype html>
        <p>There is 1 item in your bag</p>

    http://localhost:8080/bag?items=2&lang=es

        <!doctype html>
        <p>Hay 2 itemas en su bolso</p>
  19. var MessageFormat = require('message-format');

      function Formatter(dict) {
        this.dict = dict;
      }

      Formatter.prototype.format = function format(message, args) {
        if (!this.dict[message]) {
          console.warn('no translation found for', message);
        }
        message = this.dict[message] || message;
        return new MessageFormat(message).format(args);
      };

      module.exports = Formatter;
  20. // A trivial ‘render’ function for my component^Wapplication
      module.exports = function render(formatter) {
        document.querySelector('p').innerText = formatter.format("bag", { items: 2 });
      };
  21. // Polyfills are scratchy
      require('intl');
      require('intl/locale-data/jsonp/en.js');
      require('intl/locale-data/jsonp/es.js');
      var fetch = require('isomorphic-fetch');
      var Promise = require('bluebird');
      var Formatter = require('./formatter');
      var render = require('./render');
  22. var dramaticPause = 3000;
      var lang = document.documentElement.getAttribute('lang');
      var messages = fetch('/locale/' + lang + '.json').then(function (res) {
        return res.json();
      });
      messages.then(function (dict) {
        alert('A dramatic pause...');
        return Promise.delay(dramaticPause).then(function () {
          var formatter = new Formatter(dict);
          render(formatter);
        });
      });
  23. User Interface Concerns

    • Finding word boundaries isn’t always easy
    • Japanese sentences involving imported words can be very long
    • German words get very long and finding good wrapping gets tricky
    • Arabic and 8 other currently used scripts start on the right and go left.
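
For the word-boundary point, newer JavaScript runtimes expose locale-aware segmentation. The sketch below assumes `Intl.Segmenter` is available (it postdates this talk; Node.js 16 or later, for example), and `words` is a hypothetical helper name.

```javascript
// Split text into word-like segments using the locale's boundary rules,
// instead of assuming that words are separated by spaces.
function words(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  return Array.from(segmenter.segment(text))
    .filter(function (s) { return s.isWordLike; })
    .map(function (s) { return s.segment; });
}
```

For English, `words('Hello, world!', 'en')` yields `['Hello', 'world']`. For a Japanese sentence with no spaces, splitting on `' '` returns the whole sentence as one "word", while the segmenter returns several.
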
  24. Culture Matters

    • Names don’t work everywhere the same way they do in your country.
    • Names don’t even work the way you think they do.
    • Not everyone writes numbers the same way.
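
The point about numbers is easy to see with the built-in `Intl.NumberFormat`. This is a small sketch assuming a runtime with locale data for both locales; `formatNumber` is a hypothetical wrapper.

```javascript
// Format the same value according to different locale conventions.
function formatNumber(value, locale) {
  return new Intl.NumberFormat(locale).format(value);
}

const usStyle = formatNumber(1234567.89, 'en-US'); // "1,234,567.89"
const deStyle = formatNumber(1234567.89, 'de-DE'); // "1.234.567,89"
```

The two outputs swap the roles of the comma and the period, so parsing or printing numbers with hard-coded separators breaks as soon as a second locale is involved.
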
  25. Warnings

    Language != locale

    English is spoken in the US. English is spoken in the UK. But we spell colour differently and we write our dates inside out in the US. Same language, different specifics. You can call the language with the local details a "locale".
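
The date difference between two English locales can be shown with the built-in `Intl.DateTimeFormat`. A small sketch, assuming locale data for `en-US` and `en-GB` is present; `formatDate` is a hypothetical helper, and the time zone is pinned to UTC so the result does not depend on the machine's settings.

```javascript
// Same language, different locale: only the region subtag changes.
function formatDate(date, locale) {
  return new Intl.DateTimeFormat(locale, { timeZone: 'UTC' }).format(date);
}

const talkDate = new Date(Date.UTC(2015, 10, 14)); // 14 November 2015
const us = formatDate(talkDate, 'en-US'); // "11/14/2015"
const gb = formatDate(talkDate, 'en-GB'); // "14/11/2015"
```
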
  26. BCP47

    BCP 47, a document from the IETF, is a whole standard for language identifiers.

        en-US en-GB en i-navajo zh-Hans-CN
  27. Tips for language tags

    If you’re parsing a language preference from an external source, the tag may carry more or fewer subtags than you expect. Use the bcp47 module to parse them. Use bcp47-serialize to get them as a string again. Canonicalize into a locale you support early on.
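
As an illustration of parsing and canonicalizing early, the sketch below uses the built-in `Intl.getCanonicalLocales` and `Intl.Locale` instead of the bcp47 module named on the slide, so it runs without dependencies. It assumes a runtime that provides both (recent Node.js releases), and `describeTag` is a hypothetical helper.

```javascript
// Canonicalize a tag from an external source and pull out the parts.
// Intl.getCanonicalLocales throws a RangeError for malformed tags.
function describeTag(tag) {
  const canonical = Intl.getCanonicalLocales(tag)[0];
  const locale = new Intl.Locale(canonical);
  return {
    tag: canonical,            // canonical casing, e.g. "zh-Hans-CN"
    language: locale.language, // "zh"
    script: locale.script,     // "Hans", or undefined when absent
    region: locale.region      // "CN", or undefined when absent
  };
}
```

Given `'EN-us'`, the result has `tag: 'en-US'` and no script. Given `'zh-hans-cn'`, it has all three parts, which is the "more than you expect" case.
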
  28. Tips for language tags

    Plan to do matching and fallback when you get a request for a language that’s close to one you support but not quite right.
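
A simple fallback strategy is the "lookup" scheme described in RFC 4647: drop subtags from the right until something you support matches. The following is a sketch of that idea, not a complete implementation; `lookupLocale` and its parameters are hypothetical names.

```javascript
// Progressively truncate the requested tag until it matches a supported one.
function lookupLocale(requested, supported, defaultLocale) {
  const byLower = {};
  supported.forEach(function (tag) { byLower[tag.toLowerCase()] = tag; });

  const parts = String(requested).toLowerCase().split('-');
  while (parts.length > 0) {
    const candidate = parts.join('-');
    if (Object.prototype.hasOwnProperty.call(byLower, candidate)) {
      return byLower[candidate];
    }
    parts.pop();
    // A single-character subtag (such as "x") only introduces what
    // follows it, so it is dropped along with the subtag just removed.
    if (parts.length > 0 && parts[parts.length - 1].length === 1) {
      parts.pop();
    }
  }
  return defaultLocale;
}
```

With `['en', 'es-MX']` supported, a request for `en-GB` falls back to `en`. A request for plain `es` does not widen to `es-MX` under this scheme and gets the default, so decide deliberately whether that is the behavior you want.
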
  29. Tips for language tags

    Pass locale tags as opaque strings whenever possible; it’s far easier to get right.

        "en-US" // Better
        {lang: 'en', region: 'US'} // You will make mistakes

    Especially once you add i-navajo and zh-Hans-CN.
  30. Tips for long form

    Handle long form content separately. Use one language per file. Keep it simple.
  31. Command-line apps

    Look at the LANG environment variable, or at its more specific pieces and parts, the LC_* variables: how you sort and how you display numbers can each be specified separately. Consider gettext-format messages, to be similar to apps in C.
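
For a command-line app, reading those variables might look like the sketch below. It follows the usual POSIX precedence (LC_ALL, then the specific LC_* category, then LANG). The helper names are hypothetical, and treating `C` and `POSIX` as "no preference" is a simplification made here.

```javascript
// Turn a POSIX locale name such as "pl_PL.UTF-8" into a BCP 47 style tag.
// Returns undefined for unset values and for the "C" / "POSIX" locales.
function posixToTag(value) {
  if (!value || value === 'C' || value === 'POSIX') return undefined;
  const base = value.split('.')[0].split('@')[0]; // drop codeset and modifier
  return base.replace(/_/g, '-');
}

// Resolve the locale for one category, e.g. 'LC_MESSAGES' or 'LC_NUMERIC'.
// Checks LC_ALL first, then the category variable, then LANG.
function localeFor(category, env) {
  return posixToTag(env.LC_ALL) ||
    posixToTag(env[category]) ||
    posixToTag(env.LANG) ||
    'en';
}
```

In a real program this would be called as `localeFor('LC_MESSAGES', process.env)` for message text and `localeFor('LC_NUMERIC', process.env)` for number formatting, which is how a user can end up with Polish messages and US-style numbers.
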