
Localization (l10n) - An Introduction

Sundeep Anand
February 03, 2012


This deck covers how l10n, i18n and g11n are related and which steps the localization process involves. Presented at the Department of IT, Govt. of Goa, India.



Transcript

  1. Relationship: i18n, l10n, g11n

    • Localization (l10n) is the process of adapting a product or service to a particular language and culture.
    • Internationalization (i18n) is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes.
    • Globalization (g11n) is the process of designing and developing applications that function for multiple cultures and regions.
  2. Localization at a glance

    Often thought of only as a synonym for translation of the user interface and documentation, localization is a substantially more complex issue. It can entail customization related to:
    • Numeric, date and time formats
    • Use of currency, symbols, icons and colours
    • Keyboard usage
    • Collation and sorting
    • Text and graphics containing references to objects, actions or ideas which, in a given culture, may be subject to misinterpretation or viewed as insensitive
    • Varying legal requirements
    • Translating text content, software source code, web sites or database content, and many more things
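As one illustration of the "numeric formats" item above, here is a minimal Python sketch of digit grouping. It assumes the Indian convention of grouping the last three digits and then pairs (lakh, crore) and compares it with Western thousands grouping. The function names `group_indian` and `group_western` are hypothetical and are not part of any framework mentioned in this deck.

```python
def group_indian(n: int) -> str:
    """Format an integer with Indian digit grouping: last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two digits at a time off the right-hand side of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


def group_western(n: int) -> str:
    """Format an integer with groups of three digits throughout."""
    return f"{n:,}"


if __name__ == "__main__":
    print(group_western(12500000))  # 12,500,000
    print(group_indian(12500000))   # 1,25,00,000
```

The same number is written differently for different audiences, which is why number formatting cannot be hard-coded in an internationalized application.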
  3. Internationalization and Localization are subsets of Globalization

    Example: A small inventory management system has been developed for Hindi-speaking regions. To market it in a different state, say Gujarat, we need to provide a mechanism by which the UI of the system can be switched to Gujarati (i18n), so that it can be easily localized.
  4. Localization Framework for Data

    Localization frameworks for the conversion of data have been developed. The data may be in a database such as ORACLE, MS-SQL or MS-Access (.mdb), or in Excel (.xls), MS-Word (.doc) or FoxPro files. The framework includes:
    • Database translator
    • Acronym handler (e.g. Dr., Mr., USD, SBI)
    • Number to word conversion (e.g. Rupees Fifty Nine only)
    • Date-time conversion (e.g. Quarter Past Five, 11 A.M.)
    • Address field conversion routines (e.g. Near J.J. Bridge)
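The "number to word conversion" item above can be sketched as follows. This is an illustrative Python version only, assuming English words and the Indian numbering system (thousand, lakh, crore). The names `number_to_words` and `rupees_in_words` are hypothetical and do not describe the framework's actual routines.

```python
ONES = ["Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
        "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
        "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
        "Eighty", "Ninety"]


def _below_hundred(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (" " + ONES[ones] if ones else "")


def _below_thousand(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " Hundred")
    if rest:
        parts.append(_below_hundred(rest))
    return " ".join(parts)


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 10**9 using thousand / lakh / crore."""
    if not 0 <= n < 10**9:
        raise ValueError("supported range is 0 to 99,99,99,999")
    if n == 0:
        return ONES[0]
    crore, n = divmod(n, 10_000_000)
    lakh, n = divmod(n, 100_000)
    thousand, n = divmod(n, 1_000)
    parts = []
    if crore:
        parts.append(_below_hundred(crore) + " Crore")
    if lakh:
        parts.append(_below_hundred(lakh) + " Lakh")
    if thousand:
        parts.append(_below_hundred(thousand) + " Thousand")
    if n:
        parts.append(_below_thousand(n))
    return " ".join(parts)


def rupees_in_words(amount: int) -> str:
    return f"Rupees {number_to_words(amount)} only"


if __name__ == "__main__":
    print(rupees_in_words(59))  # Rupees Fifty Nine only
```

A localized version would swap the word tables for the target language. That is the kind of resource that belongs in a translatable catalog rather than in code.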
  5. Database Conversion Utility

    • Lists all the databases installed on the local system (or on the LAN, in the case of SQL Server)
    • Facility to select the database, table and row to be transliterated
    • Separate column or table for the transliterated data
    • Supported databases: MS-SQL, ORACLE, DB2 (GOM), MS Access
  6. Localizing existing Data

    • Quick and handy tool for transliteration of data from ISFOC/ISCII format in Office documents into Unicode
    • Bulk conversion of data: a large number of files can be converted in a single execution
    • Retains all formatting information in the Word file after conversion
    • Choice of different font sizes for HTML file conversion
    Supported file types
    • Word files (*.doc, *.rtf)
    • Excel files (*.xls, *.xlsx)
    • HTML files (*.html, *.htm)
    • Text files (*.txt)
  7. Web Code Conversion

    • HTML / ASP / ASPX pages of an existing English application can also be converted to Indian languages using this framework.
    • It retains the physical layout of the web page and enables Indian languages without changing the layout or the look and feel of the page.
    • Presentation layer changes are done seamlessly using an XML-XSLT and CSS based approach.
    • Client-server based application.
    • Supports conversion of .xls and .txt / .aci files.
    • Data to be transliterated is sent to the server; the transliterated output in Hindi is sent back to the client.
    • Supports Hindi <-> English transliteration.
  8. File Transliteration (Over internet)

    • Client-server based application
    • Supports conversion of .xls and .txt / .aci files
    • Data to be transliterated is sent to the server
    • Transliterated output in Hindi is sent back to the client
    • Supports Hindi <-> English transliteration
  9. Other Tools

    HTML, .doc and .xls conversion from English
    Localisation of CORE BANKING applications using .Net and XML, and Universal Banking for SBI
    • Presentation layer changes
    • Minimal downloads
    • Aesthetics and layout of a web page must be maintained
    • XML-XSLT based approaches
  10. HTML Translator

    A utility has been developed for bulk conversion of English HTML pages to Hindi. The folder is scanned for all existing web pages, and each page is processed with rule-based logic for domain-specific terms and terminology, followed by transliteration. Acronyms are also handled by this utility. The font sizes of applications created for Hindi often need to be scaled by a ratio / factor so that ascenders, diacritic marks, etc. are clearly legible. The utility also handles the CSS (cascading style sheets) so that formatting is applied uniformly to the web page. It therefore not only converts the data but also adjusts the fonts according to the ratio set.
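The font adjustment described above can be illustrated with a short Python sketch. It assumes the ratio is applied to `font-size` declarations in CSS text and that sizes use `px`, `pt` or `em` units. The function name `scale_font_sizes` and the regular expression are hypothetical and say nothing about how the actual utility works.

```python
import re

# Matches declarations such as "font-size: 12px" or "font-size:1.5em".
FONT_SIZE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)(px|pt|em)")


def scale_font_sizes(css: str, ratio: float) -> str:
    """Return the CSS text with every matched font size multiplied by ratio."""
    def _scale(match: re.Match) -> str:
        scaled = float(match.group(2)) * ratio
        # ":g" drops a trailing ".0" so that 15.0 is written as 15.
        return f"{match.group(1)}{scaled:g}{match.group(3)}"
    return FONT_SIZE.sub(_scale, css)


if __name__ == "__main__":
    print(scale_font_sizes("p { font-size: 12px; }", 1.25))
```

Scaling the style sheet rather than individual pages keeps the change in one place, which matches the slide's point about applying format information uniformly.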
  11. Localization – In Depth

    • Localization Activities
    • Feasibility Analysis
    • Localization Methods
    • Packaging for public
    • Perspective of Testing
    • Localization Levels
    • Translations: Portable Object and Native Formats
    • Tools we need to keep in box!
    • Localization Sphere: Desktop, Web, Mobile
    • Localization Mechanics
    • Proliferation Mechanisms
  12. Localization Activities

    • Feasibility analysis of the tools we need to localize
    • Identification of a suitable localization method, considering environment, platform and architecture
    • Local functionality development and translation (with the help of the Translation Team)
    • Packaging of localized tools / applications for deployment after unit testing
    • Testing of the localized tool / application for context, content and functionality (with the help of the Testing Team)
  13. Feasibility analysis of tools we need to localize

    • Portability. Platform independence: there is no reason to develop platform-dependent tools for a platform-independent product.
    • Consistency. Needing different sets of tools for different components of the client indicates that the client's localizability solutions are neither consistent nor developer-friendly. For localizers, multiple file formats reduce productivity too. Therefore, we advocate a single localizability solution across all client modules and components wherever feasible.
    • Valid-ability. Localization work is normally performed at a remote site. The translators need to be able to validate the results in their local environment. Sending translations back and forth between us and the localizers is time-consuming and prone to misplacing or losing data.
    • Leveragability. Localization is costly. The ability to reuse existing translations is essential to bring down the cost. In addition, it keeps translations consistent across releases.
    • Flat file. Ideally, we want developers to put localizable resources in flat files; where that is not feasible, we convert them to flat files.
  14. Localization Methods

    • Compile Time Localization
    • Link Time Localization
        • Create per-language CPP files and convert them into objects
        • Build the objects into libraries
        • Compile the parent CPP with the required library
    • Run Time Localization (Gettext Framework)
        • GNU Gettext Machine Object (MO) file
        • Setting of the locale through the LANG/LANGUAGE environment variable of the tool concerned
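The run-time method above can be sketched with Python's standard `gettext` module, which reads compiled MO catalogs and, when no language list is passed, consults the `LANGUAGE`, `LC_ALL`, `LC_MESSAGES` and `LANG` environment variables. The text domain `inventory` and the `locale` directory are assumptions made for this example; with `fallback=True` the original English string is returned when no catalog is found.

```python
import gettext
import os


def load_translator(domain: str = "inventory", localedir: str = "locale"):
    """Return a translation function for the language selected by the environment.

    gettext looks for <localedir>/<lang>/LC_MESSAGES/<domain>.mo and, because
    fallback=True, returns the untranslated msgid if no catalog exists.
    """
    translation = gettext.translation(domain, localedir=localedir, fallback=True)
    return translation.gettext


if __name__ == "__main__":
    # Select Hindi at run time; no recompilation of the program is needed.
    os.environ["LANGUAGE"] = "hi_IN"
    _ = load_translator()
    print(_("Hello"))  # translated only if locale/hi_IN/LC_MESSAGES/inventory.mo exists
```

Unlike the compile-time and link-time methods, the binary stays the same for every language, and only the MO file and the environment variable change.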
  15. Packaging for Public

    • Nullsoft Scriptable Install System (NSIS)
    • Installer formats:
        • Windows (msi) through NSIS
        • Ubuntu [Debian flavors] (deb) through deb-build
        • Fedora [Red Hat flavors] (rpm) through rpm-build
    • Every tool has a base/core module which installs the en-US component; for an Indian language we need to install the 'Language Pack' of the desired locale.
  16. Perspective of Testing

    • Contextual meaning of translated strings in the welcoming GUIs.
    • Working of shortcuts compared with the English version.
    • Working of the installer and uninstaller in different environments.
    • Whether installers behave consistently across platforms (Windows/Linux).
    • Disturbance to software functionality caused by localization.
    • Testing of the fonts applied to the current UI.
  17. Localization Levels

    • Translation of strings (English to Hindi / Marathi)
    • Formats used for date & time, currency, fractions, zip code, phone number, units of measurement, etc.
    • Geographic concerns over signs / denotations, colours, patterns, etc.
    Example: meanings of the colour red
    • China: symbol of celebration and luck, used in many cultural ceremonies that range from funerals to weddings.
    • India: colour of purity (used in wedding outfits).
    • United States: Christmas colour when combined with green, Valentine's Day when combined with pink, indicates stop (danger) at traffic lights.
    • Eastern cultures: signifies joy when combined with white.
    Example: date formats
    • dd.mm.yyyy in Bengali
    • dd-mm-yyyy in Kannada, Gujarati, Hindi, Marathi, Punjabi, Tamil
    • d-m-yyyy in Telugu, no leading zeroes used
    Example: translated strings
    • msgid "Hello" (en-US)
    • msgstr "नमस्ते" (hi-IN)
    • msgstr "નમસ્તે" (gu-IN)
    • msgstr "ਨਮਸਕਾਰ" (pa-IN)
    • msgstr "வணக்கம்" (ta-IN)
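The date formats listed above can be expressed as a small lookup, sketched here in Python. The two-letter language codes (`bn`, `kn`, `gu`, `hi`, `mr`, `pa`, `ta`, `te`) and the function name `format_date` are choices made for this example; the patterns themselves are the ones given on the slide.

```python
from datetime import date

DOTTED_PADDED = {"bn"}                                  # dd.mm.yyyy
DASHED_PADDED = {"kn", "gu", "hi", "mr", "pa", "ta"}    # dd-mm-yyyy
DASHED_UNPADDED = {"te"}                                # d-m-yyyy, no leading zeroes


def format_date(d: date, lang: str) -> str:
    """Format a date using the numeric pattern listed for the given language."""
    if lang in DOTTED_PADDED:
        return f"{d.day:02d}.{d.month:02d}.{d.year:04d}"
    if lang in DASHED_PADDED:
        return f"{d.day:02d}-{d.month:02d}-{d.year:04d}"
    if lang in DASHED_UNPADDED:
        return f"{d.day}-{d.month}-{d.year}"
    raise ValueError(f"no date pattern known for language {lang!r}")


if __name__ == "__main__":
    presented = date(2012, 2, 3)
    for lang in ("bn", "hi", "te"):
        print(lang, format_date(presented, lang))
```

In production code these patterns would normally come from locale data rather than from a hand-written table.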
  18. Translations: Portable Objects & Native Formats

    Portable Objects (PO)
    • Text file that includes the original texts and the translations
    • Language independent
    Machine Objects (MO)
    • Include exactly the same contents as the PO file
    • Are compiled to a binary format that the program reads at run time
    Sample PO file, using Poedit
    Native Formats
    • .sdf, .xml, .properties, .ini, .rc, .yml, .wordfast, .json, .sub
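To make the PO format concrete, here is a minimal Python sketch that reads simple single-line `msgid` / `msgstr` pairs into a dictionary. It is deliberately incomplete: plural forms, contexts, multi-line strings and escape sequences are not handled, and the names `parse_po`, `translate` and `SAMPLE_PO` are hypothetical. Real projects would compile the PO file to an MO file and load it through a gettext library instead.

```python
SAMPLE_PO = '''
# Hindi translation (hi_IN)
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Hello"
msgstr "नमस्ते"
'''


def _unquote(token: str) -> str:
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    raise ValueError(f"expected a quoted string, got {token!r}")


def parse_po(text: str) -> dict:
    """Collect single-line msgid/msgstr pairs; the empty header entry is skipped."""
    catalog = {}
    msgid = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            msgid = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = _unquote(line[len("msgstr "):])
            if msgid and msgstr:
                catalog[msgid] = msgstr
            msgid = None
    return catalog


def translate(catalog: dict, msgid: str) -> str:
    """Return the translation, or the original text when none is available."""
    return catalog.get(msgid, msgid)


if __name__ == "__main__":
    hindi = parse_po(SAMPLE_PO)
    print(translate(hindi, "Hello"))
    print(translate(hindi, "Goodbye"))  # untranslated strings fall back to English
```

The fallback to the original string mirrors how gettext behaves when a message is missing from the catalog.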
  19. Tools we need to keep in box!

    • Poedit: cross-platform editor for gettext catalogs (.po files); Poedit can also generate .mo files.
    • Translation Toolkit (http://translate.sourceforge.net/wiki/toolkit/index)
        • Convertors: moz2po, oo2po, prop2po, php2po, txt2po, po2wordfast, pot2po, csv2po, html2po, ini2po, json2po, rc2po
        • Tools: poconflicts, pofilter, pogrep, pomerge, pocompile, poclean
    For common platforms: Windows / Linux / Mac

    Tool                   Free / non-free   Licensed under   Online / offline   Platform dependency
    1. Pootle              Free              GNU GPL          Online             N/A
    2. Rosetta             Non-free          -                Online             N/A
    3. Kartouche           Free              GNU GPL          Online             N/A
    4. KBabel              Free              GNU GPL          Offline            Windows/Linux
    5. poEdit              Free              -                Offline            Windows/Linux
    6. Attesoro            Free              GNU GPL          Offline            Linux
    7. passolo             Free              GNU GPL          Offline            Windows
    8. IniTranslator       Free              GNU GPL          Offline            Windows
    9. GTranslator         Free              GNU GPL          Offline            Linux
    10. LocFactoryEditor   Free              GNU GPL          Offline            Mac OS
  20. Localization Sphere: Desktop, Web, Mobile

    i18n support is available in every technology in the form of APIs, frameworks, libraries, etc., and they all work on a similar concept: strings are fetched from a native format and injected at run time.
    • GNU gettext for C, C++ and open source tools
    • Microsoft Localization Framework (Resource.dll based)
    • Apache Tapestry and ICU4J are favorable choices for Java
    • BabelFx for Flash and Flex RIAs
    • Rails Internationalization (I18n) API for Ruby on Rails
    http://www.endlesslycurious.com/2008/10/
  21. Localization Mechanics

    • Selecting the best localization method after analyzing the given tool / application on the specified platform
    • Machine Translation Localization Method
    • GNU Gettext
    How things are actually linked up to make localization dynamic: a tool with an English UI quickly switches to Hindi, adding the Hindi-speaking region to its list of admirers.
    http://books.zkoss.org/wiki/Small_Talks/2009/September/I18n_Java_and_ZUL_files_with_GNU_gettext
  22. Road Ahead

    Proliferation of Indian languages
    • will help bridge the digital divide and make information available to the vast majority of the population
    • will increase demand for localized applications and solutions
    Expectations of the masses from computers are over-hyped and very high
    • language translation, weather prediction, correct horoscope
    Need for speech-based solutions to cater to a large illiterate population
    Proliferation Mechanisms…
    • ILDC – Indian Language Data Center
    • CD distribution
    • Training programs
    • Conferences, exhibitions, seminars
    • Requests from end users
    • Multilingual Support Center
    • TDIL – Technology Development for Indian Languages