Localization (l10n) - The Process

Explanation of GNU Gettext with an internationalized 'hello world' program, plus a developer's checklist for writing i18n programs. Presented at MahaOnline Limited in Sion, Mumbai (MH), India.

Sundeep Anand

March 07, 2012


  1. www.cdac.in Agenda

    • Introduction
    • Localization Methods
    • Translations: Portable Object & Native Formats, Translation and Filtration
    • Localization Sphere
    • Localization Mechanics
    • Working with 'gettext'
    • Developers Checklist
  2. Why develop internationalized applications?

    • Localization (l10n) and Internationalization (i18n) are means of adapting computer software to the different languages, regional differences and technical requirements of a target market.
    • Internationalization combines developer tasks with localization: it enables a product to be used with multiple scripts and cultures by separating user interface resources into a localizable format.
    • This concept is also known as NLS (National Language Support or Native Language Support).
  3. Compile Time Localization / Link Time Localization / Run Time Localization

    • Compile Time Localization / Link Time Localization: create language-specific *.cpp files and convert them into objects; compile the objects to libraries; compile the parent CPP with the required library.
    • Run Time Localization: set the locale (LANG/LANGUAGE environment variable) and bind the text domain, triggering 'gettext' to fetch strings from message catalogs as per the set locale.
  4. Translations: Portable Objects & Native Formats

    • Portable Objects (PO): text files that include the original texts and the translations; language independent.
    • Machine Objects (MO): include exactly the same contents as the PO file, compiled to a binary format that programs read at run time.
    • Using Poedit: Translations, Filtrations
    • Native Formats: .sdf, .xml, .properties, .ini, .rc, .yml, .wordfast, .json, .sub
  5. Tools we need to keep in box!

    • Poedit: cross-platform editor for gettext catalogs (.po files); using Poedit we can also generate .mo files.
    • Translation Toolkit (http://translate.sourceforge.net/wiki/toolkit/index)
      • Converters: moz2po, oo2po, prop2po, php2po, txt2po, po2wordfast, pot2po, csv2po, html2po, ini2po, json2po, rc2po
      • Tools: poconflicts, pofilter, pogrep, pomerge, pocompile, poclean

    For common platforms: Windows / Linux / Mac

    | #  | Tool             | Free / non-free | Licensed under | Online / offline | Platform dependency |
    |----|------------------|-----------------|----------------|------------------|---------------------|
    | 1  | Pootle           | Free            | GNU GPL        | Online           | N/A                 |
    | 2  | Rosetta          | Non-free        |                | Online           | N/A                 |
    | 3  | Kartouche        | Free            | GNU GPL        | Online           | N/A                 |
    | 4  | KBabel           | Free            | GNU GPL        | Offline          | Windows/Linux       |
    | 5  | poEdit           | Free            |                | Offline          | Windows/Linux       |
    | 6  | Attesoro         | Free            | GNU GPL        | Offline          | Linux               |
    | 7  | passolo          | Free            | GNU GPL        | Offline          | Windows             |
    | 8  | IniTranslator    | Free            | GNU GPL        | Offline          | Windows             |
    | 9  | GTranslator      | Free            | GNU GPL        | Offline          | Linux               |
    | 10 | LocFactoryEditor | Free            | GNU GPL        | Offline          | Mac OS              |
  6. Localization Sphere: Desktop, Web, Mobile

    i18n support is available in every technology in the form of APIs, frameworks, libraries and so on, and they all work on a similar concept: run-time injection, fetching strings from a native format. Some examples:

    • GNU gettext for C, C++ and open source tools
    • Microsoft Localization Framework (resource.dll based)
    • For Java: Apache Tapestry and International Components for Unicode
    • BabelFx for Flash and Flex Rich Internet Applications
    • Rails Internationalization (I18n) API for Ruby on Rails

    http://www.endlesslycurious.com/2008/10/
  7. Localization Mechanics

    How things are actually linked up to provide dynamism in localization: a tool with an English UI quickly switches to Hindi, thus adding the Hindi-speaking territory to the list of its lovers!
  8. Locale – the program – basis of localization

    A way to handle localization levels easily…

    The locale program writes information about the current locale environment, or all locales, to standard output. Environment variables available to locale-aware programs:

    1. LC_CTYPE (Character classification and case conversion)
    2. LC_COLLATE (Collation order)
    3. LC_TIME (Date and time formats)
    4. LC_NUMERIC (Non-monetary numeric formats)
    5. LC_MONETARY (Monetary formats)
    6. LC_MESSAGES (Formats of informative, diagnostic messages and interactive responses)
    7. LC_PAPER (Paper size)
    8. LC_NAME (Name formats)
    9. LC_ADDRESS (Address formats and location information)
    10. LC_TELEPHONE (Telephone number formats)
    11. LC_MEASUREMENT (Measurement units)
    12. LC_IDENTIFICATION (Metadata about the locale information)

    LOCPATH: where locale data is stored. Default is /usr/lib/locale
  9. Working with GNU Gettext

    Things we need in place… Required programs for GNOME are:

    1. gcc (GNU C Compiler)
    2. gettext (GNU Internationalization utilities)
    3. gettext-base (GNU Internationalization utilities for the base system)
    4. libc6 (GNU C Shared Libraries)
    5. libc6-dev (GNU C Development Libraries)
    6. locales (Common files for locale support)
    7. libintl (Message translation system compatible i18n library)
    8. php-gettext (read gettext MO files directly through PHP)
    9. gtranslator (PO file editor for the GNOME desktop)
    10. poedit (gettext catalog editor)
  10. Working with GNU Gettext – Implementation thru 'C'

    Internationalized 'Hello World' Program

        #include <stdio.h>
        #include <locale.h>
        #include <libintl.h>

        int main(void)
        {
            /* initialize the entire current locale as per the environment
               variables set by the user */
            setlocale(LC_ALL, "");
            /* set the base directory for the message catalogs */
            bindtextdomain("hello", ".");
            /* set the domain for future gettext() calls */
            textdomain("hello");
            /* gettext() lets the translator work independently from the
               programmer; the looked-up string is passed as data, never as
               the printf format itself */
            printf("%s", gettext("Hello World\n"));
            return 0;
        }

    • man setlocale, textdomain
    • xgettext, msginit, translate
    • msgfmt, set lang and chmod
  11. Working with GNU Gettext – Implementation thru 'C': Next Steps…

    • xgettext
      • Extract strings from the source file
      • Create the template for translations
    • msginit
      • Create the files to translate using the template
      • Edit and translate the file
      • Set Project-Id-Version to {TextDomain}
    • msgfmt
      • Create target directories in the bound text domain location
      • Compile and install translations
  12. Developers Checklist: Externalize all translatable content

    Take the text out of the code and place it in resource files. Separating the translatable text from the code avoids code duplication, lets localizers and developers work on updates simultaneously, and removes the possibility of damaging code during translation.
  13. Developers Checklist: Allow input of international data and foreign scripts

    Input fields often do character validation, so make sure to attach the validation rule to the specific country, or have the validation rule update when the country selection changes.
  14. Developers Checklist: Avoid string concatenation

    Concatenation only works when the content is written for a specific language. Avoid constructing strings through concatenation, as this makes translation hard, and even impossible in certain cases.
  15. Developers Checklist: Avoid using a given string variable in more than one context

    A sentence built around a product-name parameter will not work for many languages, as the verb will differ depending on the product name. Further, do not use a noun as a parameter in a sentence, and avoid reusing strings. Translation tools let linguists recycle previously translated strings during the translation pass.
  16. Developers Checklist: Do all string handling with Unicode

    Make sure characters don't get corrupted along the input > database > output route. An internationalized application uses Unicode for all handling of strings and text. This applies to the static text as well as to the dynamic text that is communicated between the application and the database.
  17. Developers Checklist: Provide extra room for text expansion – User Interface

    Translated text expands 30% on average, with the exception of some languages where it may shrink. Leave enough room in the layout for expansion and avoid static sizing. If there are strings that should not exceed a certain size, always include comments in the resource file for those items.
  18. Developers Checklist: Add context information to strings using comments

    A string can be translated into an Indian language in many different ways. It is very important to provide context information in the resource file when necessary.
  19. Developers Checklist: Use system functions for date/time and numeric formatting

    Date/time and numeric formatting differ even between regions that speak the same language. Example: dd.mm.yyyy in Bengali; dd-mm-yyyy in Kannada, Gujarati, Hindi, Marathi, Punjabi, Tamil; d-m-yyyy in Telugu, no leading zeroes.
  20. Developers Checklist: Externalize all styles and formatting

    • Font face, size and style will be different for some languages. Inline styling prevents these modifications from being made, or requires code duplication. Always use external style sheets to define styles for a web application.
    • Avoid using styling tags such as "em", "strong" and "italic" text. Bold font faces cause problems, as bold strokes may result in a big blob of ink when the font size is small in printing.
    • If a string needs to be emphasized with a bold font face, we can do it by externalizing the style. This way, localizers can decide on the font size as needed.
  21. Developers Checklist: Use system functions for sorting and string comparison

    This example has been taken from Microsoft MSDN. An internationalized application does not use any manual sorting logic and relies on the underlying framework's API for string comparison. This applies to database data as well as to the strings that come from resource files, which may be used in form elements and others such as combo boxes.

    http://msdn.microsoft.com/en-us/goglobal/bb688122