Localization (l10n) - The Process

An explanation of GNU gettext using an internationalized 'hello world' program, followed by a developer's checklist for writing i18n programs. Presented at MahaOnline Limited in Sion, Mumbai (MH), India.

Sundeep Anand

March 07, 2012

Transcript

  1. www.cdac.in
    Graphics & Intelligence
    Based Script Technology
    Your Gateway to
    Indian Language Computing

  2. www.cdac.in
    By Sundeep Anand [at] MahaOnline, Govt. of Maharashtra

  3. www.cdac.in
    Agenda
    • Introduction
    • Localization Methods
    • Translations: Portable Object & Native Formats, Translation and Filtration
    • Localization Sphere
    • Localization Mechanics
    • Working with 'gettext'
    • Developers Checklist

  4. www.cdac.in
    Why develop internationalized applications?
    • Localization (l10n) and internationalization
    (i18n) are means of adapting computer
    software to the different languages, regional
    differences and technical requirements of a
    target market.
    • Internationalization combines the developer's
    tasks with localization: it enables a product
    to be used with multiple scripts and cultures
    by separating user interface resources into a
    localizable format.
    • This concept is also known as NLS (National
    Language Support or Native Language
    Support).

  5. www.cdac.in
    Compile Time Localization
    Link Time Localization
    • Create language-specific *.cpp files and convert them into objects
    • Compile objects to libraries
    • Compile parent CPP with the required library
    Run Time Localization
    • Set locale (LANG/LANGUAGE environment variable) and bind the text domain
    • Trigger 'gettext' to fetch strings from message catalogs as per the set locale

  6. www.cdac.in
    Translations: Portable Objects & Native Formats
    Portable Objects (.po)
    • Text file that includes the original texts and the translations.
    • Language independent.
    Machine Objects (.mo)
    • Include exactly the same contents as the PO file.
    • Are compiled to a binary format that programs read at run time.
    • Using Poedit
    • Translations
    • Filtrations
    Native Formats
    .sdf, .xml, .properties, .ini, .rc, .yml, .wordfast, .json, .sub

  7. www.cdac.in
    Tools we need to keep in our toolbox!
    • Poedit: cross-platform editor for gettext catalogs (.po files); Poedit can
    also generate .mo files.
    • Translation Toolkit (http://translate.sourceforge.net/wiki/toolkit/index)
      • Converters: moz2po, oo2po, prop2po, php2po, txt2po, po2wordfast,
        pot2po, csv2po, html2po, ini2po, json2po, rc2po
      • Tools: poconflicts, pofilter, pogrep, pomerge, pocompile, poclean
      For common platforms: Windows / Linux / Mac

    #    Tool              Free/Non-free  Licensed under  Online/Offline  Platform dependency
    1.   Pootle            Free           GNU GPL         Online          N/A
    2.   Rosetta           Non-free       -               Online          N/A
    3.   Kartouche         Free           GNU GPL         Online          N/A
    4.   KBabel            Free           GNU GPL         Offline         Windows/Linux
    5.   poEdit            Free           -               Offline         Windows/Linux
    6.   Attesoro          Free           GNU GPL         Offline         Linux
    7.   passolo           Free           GNU GPL         Offline         Windows
    8.   IniTranslator     Free           GNU GPL         Offline         Windows
    9.   GTranslator       Free           GNU GPL         Offline         Linux
    10.  LocFactoryEditor  Free           GNU GPL         Offline         Mac OS

  8. www.cdac.in
    Localization Sphere: Desktop, Web, Mobile
    i18n support is available in every technology in the form of
    APIs, frameworks, libraries etc., and they all work on a similar
    concept of run-time injection: fetching strings from a
    native format.
    Some examples:
    • GNU gettext for C, C++ and open source tools
    • Microsoft Localization Framework (resource.dll based)
    • For Java: Apache Tapestry and International Components for Unicode
    • BabelFx for Flash and Flex Rich Internet Applications
    • Rails Internationalization (I18n) API for Ruby on Rails
    http://www.endlesslycurious.com/2008/10/

  9. www.cdac.in
    Localization Mechanics
    How things are actually linked up to make localization
    dynamic: a tool with an English UI quickly switches to
    Hindi, and so adds Hindi-speaking territory to the list
    of its lovers!

  10. www.cdac.in
    Locale – the program – basis of
    localization
    The locale program writes information about the current locale environment,
    or about all locales, to standard output.
    Environment variables available to locale-aware programs:
    1. LC_CTYPE (Character classification and case conversion)
    2. LC_COLLATE (Collation order)
    3. LC_TIME (Date and time formats)
    4. LC_NUMERIC (Non-monetary numeric formats)
    5. LC_MONETARY (Monetary formats)
    6. LC_MESSAGES (Formats of informative and diagnostic messages and interactive responses)
    7. LC_PAPER (Paper size)
    8. LC_NAME (Name formats)
    9. LC_ADDRESS (Address formats and location information)
    10. LC_TELEPHONE (Telephone number formats)
    11. LC_MEASUREMENT (Measurement units)
    12. LC_IDENTIFICATION (Metadata about the locale information)
    LOCPATH: where locale data is stored. Default is /usr/lib/locale
    A way to handle localization levels easily…

  11. www.cdac.in
    Required programs for GNOME are:
    1. gcc (GNU C Compiler)
    2. gettext (GNU Internationalization utilities)
    3. gettext-base (GNU Internationalization utilities for the base system)
    4. libc6 (GNU C Shared Libraries)
    5. libc6-dev (GNU C Development Libraries)
    6. locales (Common files for locale support)
    7. libintl (Message translations system compatible i18n library)
    8. php-gettext (read gettext MO files directly through PHP)
    9. gtranslator (PO File editor for the GNOME desktop)
    10. poedit (gettext catalog editor)
    Things we need in place…
    Working with GNU Gettext

  12. www.cdac.in
    Working with GNU Gettext –
    Implementation through 'C'
    #include <stdio.h>
    #include <locale.h>
    #include <libintl.h>
    int main(void)
    {
        /* initialize the entire current locale from the environment variables
           set by the user */
        setlocale(LC_ALL, "");
        /* set the base directory for the message catalogs */
        bindtextdomain("hello", ".");
        textdomain("hello"); /* set domain for future gettext() calls */
        /* gettext() looks the string up at run time, which allows the
           translator to work independently from the programmer */
        printf("%s", gettext("Hello World\n"));
        return 0;
    }
    Internationalized 'Hello World' Program
    • man setlocale, textdomain
    • xgettext, msginit, translate
    • msgfmt, set lang and chmod

  13. www.cdac.in
    Working with GNU Gettext –
    Implementation through 'C'
    Next Steps…
    xgettext
    • Extract strings from the source file
    • Create the template for translations
    msginit
    • Create the files to translate using the template
    • Edit and translate the file
    • Set Project-Id-Version to {TextDomain}
    msgfmt
    • Create target directories in the bound text domain location
    • Compile and install translations

  14. www.cdac.in
    Developers Checklist
    Separating the translatable text from the code will avoid code
    duplication, will let localizers and developers work on updates
    simultaneously and remove the possibility of damaging code during
    translation.
    Externalize all translatable content – take the
    text out of the code and place it in resource files

  15. www.cdac.in
    Developers Checklist
    Input fields often do character validation, so make sure to attach
    the validation rule to the specific country or have the validation
    rule update when country selection changes.
    Allow input of international data and foreign scripts

  16. www.cdac.in
    Developers Checklist
    Concatenation only works when the content is written for a
    specific language. Avoid constructing strings through concatenation
    as this makes translation hard – even impossible in certain cases.
    Avoid string concatenation

  17. www.cdac.in
    Developers Checklist
    This form will not work for many languages as the verb
    will be different depending on the product name.
    Further, do not use a noun as a parameter in a sentence
    and avoid reusing strings. Translation tools let linguists
    recycle previously translated strings during the
    translation pass.
    Avoid using a given string variable in more
    than one context

  18. www.cdac.in
    Developers Checklist
    Make sure the characters don't get corrupted along the
    input > database > output route:
    Do all string handling with Unicode
    An internationalized application uses Unicode for all handling of
    strings and text. This applies to the static text as well as the
    dynamic text that is communicated between the application and
    the database.

  19. www.cdac.in
    Developers Checklist
    Translated text expands 30% on average with the exception of
    some languages where it may shrink. Leave enough room on
    the layout for expansion and avoid static sizing. If there are
    strings that should not exceed a certain size, always include
    comments in the resource file for those items.
    Provide extra room for text expansion – User Interface

  20. www.cdac.in
    Developers Checklist
    A string can be translated to an Indian language in many
    different ways. It is very important to provide context
    information in the resource file when necessary.
    Add context information to strings using comments

  21. www.cdac.in
    Developers Checklist
    Date/time and numeric formatting differ even between the regions
    that speak the same language. Example: dd.mm.yyyy in Bengali;
    dd-mm-yyyy in Kannada, Gujarati, Hindi, Marathi, Punjabi, Tamil;
    d-m-yyyy in Telugu, no leading zeroes.
    Use system functions for date/time and numeric formatting

  22. www.cdac.in
    Developers Checklist
    • Font face, size and style will be different for some languages. Inline
    styling prevents these modifications or requires code duplication.
    Always use external style sheets to define styles for a web application.
    • Avoid styling tags such as "em" and "strong", and italic text. Bold
    font faces cause problems, as bold strokes may result in a big blob of
    ink when the font size is small in print.
    • If a string needs to be emphasized with a bold font face, do it by
    externalizing the style. This way, localizers can decide on the font
    size as needed.
    Externalize all styles and formatting

  23. www.cdac.in
    Developers Checklist
    Use system functions for sorting and string comparison

  24. www.cdac.in
    Developers Checklist
    Use system functions for sorting and string comparison
    This example has been taken from Microsoft MSDN.
    An internationalized application does not use any manual
    sorting logic and relies on the underlying framework's API for
    string comparison. This applies to database data as well as the
    strings that come from resource files, which may be used in
    form elements and others such as combo boxes.
    http://msdn.microsoft.com/en-us/goglobal/bb688122

  25. www.cdac.in
    Discussions
    Thank You
