With more than 200 countries and over 6 billion people speaking different languages, using different currencies and applying country-specific business practices, being a global player is no easy matter. In fact, most global companies concentrate on a few poles around the world. Published in 2006, the LISA association's Global Business Practices Report revealed that the target languages chosen by organizations break down as follows: French 56%, Spanish 53%, German 50%, Japanese 29%, Simplified Chinese 27%, Italian 21%, and all other languages less than 7%. However, a comment by former German Chancellor Willy Brandt still holds true: "If I'm selling to you, I speak your language. If I'm buying, dann müssen Sie Deutsch sprechen [then you must speak German]."
To become a global player, product lines need to be designed in such a way that they can be sold anywhere in the world with minimum change. Let's take McDonald's as an icon of globalization; present in 100 countries, McDonald's has started to "localize" its menus! In Hong Kong, burgers are served with two patties of glutinous rice instead of buns. In Japan, you will find shrimp burgers and green tea-flavored milkshakes. In India, instead of beef you will find lamb, chicken or vegetarian burgers.
For IT products to be easily changed, internationalization guidelines need to be followed. Products are designed and/or coded to support national standards and conventions. Consequently, only internationalized products will be cost-effective to localize. Localizing a product means not only translating it, but also giving it a "local flavor" by making the content meaningful and acceptable on a local level. End users should be under the impression that the product was designed by a native of their country, and not have the feeling that it has been badly adapted to their local needs.
Localization challenges tie in with linguistics and cultural differences. Below, we will give a few examples of these challenges to illustrate the most common issues. Another challenge inherent in localization is the choice of tools used to manage the localization cycle effectively and efficiently. Whilst localization is possible without these tools, it would mean a great deal of wasted time, and would probably result in poor quality.
Technical challenges linked to linguistics
- Text string expansion: It is not uncommon for short texts, such as the titles of software commands, to be three times as long in German as they are in English, while the Chinese equivalent will be much shorter. For example, the English word "Redo" becomes "Wiederherstellen" in German, growing from 4 characters to 16, i.e. four times the original length (an expansion of 300%).
- Character sets and encodings: Character encoding schemes are limited in size, and do not always cover all the characters of a specific language. For instance, encodings belonging to the ISO-8859 family use one byte per character, and are thus limited to 256 characters. In this family, Latin 1 (ISO-8859-1) covers Western European languages with characters such as ©, «, ¿ and à, while Latin 2 (ISO-8859-2) covers Central and Eastern European languages with characters such as ś, ů, ť and č. ISO-8859-8 covers Hebrew characters. For Far East languages, other, more complicated character encoding schemes are defined, such as S-JIS (used by Microsoft for Japanese) and GB18030 (Simplified Chinese). The Unicode standard (version 5.0 at the time of writing) covers almost 100,000 characters and most scripts (writing systems) in use today. Multilanguage software should be adapted to Unicode to support as many languages as possible. The necessary adaptation process is by no means trivial: not only does the software need to be adapted, so do all related legacy systems that store data.
- Bi-directional text and vertical character display: In Arabic and Hebrew, the text is written from right to left. However, references in Latin characters and numbers remain from left to right. Third party applications necessary during the localization cycle may not support this, resulting in high conversion costs. Similar challenges are faced for languages that can be written vertically, such as Chinese.
- Keyboard character layout: Keyboard layouts vary from one country to the next, as they have been customized to the characters and symbols mostly used in a given language. For some languages, there are various ways in which to input the language (Chinese, Japanese and even English). Keyboard input may not be appropriate if the product is not internationalized and does not recognize other keyboards.
- Fonts: Fonts are generally limited to a set of scripts. Thus, the appropriate set of fonts should be chosen and used during runtime and DTP. This can cause visual problems, especially when multiple languages are represented in the same page.
- Keyboard shortcuts: These will also need to be localized to make sense for a particular language. [Ctrl] O makes sense in English for "Open", but not for its Portuguese equivalent "Abrir".
- Alphabetical sorting order: Sorting rules for extended characters differ from language to language. In Polish, extended characters are collated after their non-extended counterparts (A, Ą, B, C, Ć, D, E, Ę, ...), while in Swedish, they are placed at the end (..., X, Y, Z, Å, Ä, Ö). In Hungarian, the alphabet contains both extended characters and consonants written with two or three letters, each of which is treated as a single letter. The alphabetic order thus looks something like: A, Á, B, C, CS, D, DZ, DZS, E, É, F...
It goes without saying that when sorting lists of items in the Graphical User Interface (GUI) or in your documentation's indexes, you need to have all the rules at hand.
- Text and audio concatenation or placeholders: Concatenation is when a sentence is composed of different segments of text. For instance: "do not", "click on" and "print" could be composed as "click on print" or "do not click on print". Well-intentioned programmers use concatenation to save space. While it makes perfect sense in their native language, it may not work for other languages which are structured differently. German, for instance, orders the words of a sentence differently. If composed in the same way, a German sentence will thus be grammatically incorrect: "Nicht klicken auf Drucken" instead of "Klicken Sie nicht auf Drucken". These localization issues require re-engineering and are time-consuming.
Similar issues occur with placeholders. If we take "%d red", "flag" and "flags", where %d is the placeholder for the number, it will be impossible to obtain a correct sentence in Polish, as the word "red" changes based on both the number and the gender of the noun. For illustration, we selected the Polish words for flag (flaga, feminine), armchair (fotel, masculine) and lake (jezioro, neuter).
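One common way around the placeholder problem is to store a complete phrase for each plural category instead of assembling fragments. The sketch below assumes a hypothetical in-memory catalog (`POLISH_RED`) and helper names of our own; the plural rule is the one usually given for Polish (one / few / many), and the Polish phrases are illustrative and would need checking by a native reviewer.

```python
# Hypothetical message catalog: one complete phrase per noun and plural category,
# so that the adjective "red" can agree with the noun's gender and number.
POLISH_RED = {
    "flag": {
        "one": "{n} czerwona flaga",
        "few": "{n} czerwone flagi",
        "many": "{n} czerwonych flag",
    },
    "armchair": {
        "one": "{n} czerwony fotel",
        "few": "{n} czerwone fotele",
        "many": "{n} czerwonych foteli",
    },
    "lake": {
        "one": "{n} czerwone jezioro",
        "few": "{n} czerwone jeziora",
        "many": "{n} czerwonych jezior",
    },
}


def polish_plural_category(n: int) -> str:
    """Return the plural category commonly used for Polish integers."""
    if n == 1:
        return "one"
    # 2-4, 22-24, 32-34 ... but not 12-14
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


def red_items(noun_key: str, n: int) -> str:
    """Pick the whole phrase for this noun and number, then fill in the number."""
    template = POLISH_RED[noun_key][polish_plural_category(n)]
    return template.format(n=n)


for count in (1, 3, 5, 12, 22):
    print(red_items("flag", count))
```

The design choice is that translators deliver whole phrases, so no code ever has to guess how "red" and the noun fit together.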

- Linguistic Style: Writing styles invariably differ from one translator to another, even though all may be correct. Maintaining consistency is thus a major localization concern, especially when more than one translator is working on the translation. Using Linguistic Style Guides, Translation Memories and a single reviewer can help maximize consistency.
- Abbreviations: Abbreviations are often used to save space in English documentation, but they don't always translate into other languages, or worse, may sound like an offensive word!
- Terminology: Terminology is related to a specific domain (finance, IT, travel, etc.), but choices will depend on both branding and overall corporate strategies. Content created in different Business Units will have different formats: it can represent the actual software, but also documentation, online help, training material, the company website, newsletters, contracts and so on. Keeping terminology consistent in all these areas is not possible without a centralized system and strict processes. Defining terminology when creating the content is the best way to ensure good quality translation. The systems to use will be described later in this article.
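To come back to the alphabetical sorting issue in the list above, here is a minimal sketch of why a naive sort is not enough. It assumes simplified, hand-written alphabets for Swedish and Polish (the constants and the `collation_key` helper are our own, not a library API) and ignores multi-letter units such as the Hungarian CS or DZS; production code would normally rely on a full collation library instead.

```python
# Simplified alphabets, written out by hand for illustration only.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoóprsśtuwyzźż"


def collation_key(alphabet: str):
    """Build a sort key that ranks each character by its position in `alphabet`."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word: str):
        # Characters outside the alphabet sort after it, ordered by code point.
        return [(rank.get(ch, len(alphabet)), ch) for ch in word.lower()]

    return key


swedish_words = ["ö", "zebra", "äpple", "apa", "ål"]
polish_words = ["dom", "ćma", "cytryna", "cena"]

# Default sorting compares code points: "ä" lands before "å", "ć" after "d".
naive_swedish = sorted(swedish_words)
naive_polish = sorted(polish_words)

# Sorting with a language-specific key follows the local alphabet.
swedish_sorted = sorted(swedish_words, key=collation_key(SWEDISH_ALPHABET))
polish_sorted = sorted(polish_words, key=collation_key(POLISH_ALPHABET))

print(naive_swedish, swedish_sorted)
print(naive_polish, polish_sorted)
```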

Technical challenges linked to cultural differences
- Numeric formats: 10,000 means the number 10 for a French person, while for an American it means 10 thousand! 100,000.99 does not mean anything in France, where decimals are separated by a comma "," and not a dot "." as in the US. In France, this number should be written 100 000,99.
- Currency: In many European countries, the currency symbol is placed after the number and not before. For example, €25,000.00 is written 25 000,00 € in France. Amounts need to be expressed in the local currency to be meaningful for the end user.
- Payment methods: E-commerce solutions need to account for local payment preferences. In Germany, people tend to prefer wire transfer payments to online Visa payments. In Slovakia, payments by check do not exist! In other countries, people may simply not have credit cards.
- Local address and telephone formats: Local address input forms have to be adapted to the country's address format. The same goes for phone numbers, either for input or when published in the content. Locals should understand address forms straight away without having to guess if a country code or an area code has to be included for it to work!
- Name formats: Hungarians expect to see their family name first and their first name second. A Hungarian called Zoltan (first name) Kiss (family name) would thus expect to see his name displayed as: Kiss Zoltan. However, in Germany, the Netherlands or the US, the first name is displayed first: Christian Meier.
- Calendars: The Gregorian calendar is the most widely used calendar in the world, but references to the New Year will not have the same meaning if the person has the Chinese or Islamic calendars in mind.
- Date: 03/05/01 does not have the same meaning for all of us. It could be referring to the 3rd of May 2001, or the 5th of March 2001, or the 1st of May 2003, and does the 01 refer to 2001 in the first two cases? In the US, the date format is mm/dd/yy, whereas in Europe, dd/mm/yy tends to be used. However, in Sweden, the year is placed at the start!
- Time zone: Greenwich Mean Time (GMT) and Coordinated Universal Time (UTC) are international standards that are widely used. However, people in the US and Canada tend to refer to Eastern Time (ET), Eastern Standard Time (EST) etc. In France or Spain, people refer to Central European Time (CET), while countries further east, such as Finland or Greece, use Eastern European Time (EET).
- Time format: Use of a.m. and p.m. instead of a 24-hour clock needs to be localized for Europe. Only a minority of Europeans will understand 8 pm!
- Metric system: Only three countries have not officially adopted the metric system, and amongst them you find the US. Also, when speaking to British or Irish citizens, you will make far more sense if you talk about miles and pounds rather than kilometers and kilograms.
- Colors: Seemingly inconsequential color choices for a graphic or background can send different messages depending on the target region. If the effect produced in the target region does not correspond to the one intended in the original product, the colors will need to be localized.

As an illustration, consider how Yahoo localized their Finance website: the colors used for the numbers and/or arrows that show whether the market trend is up or down differ from one regional edition to another.

- Graphics or Icons: Graphics may indicate events or actions particular to a specific country, but will have no significance for others. Icons showing images of animals or body parts such as the eyes, hands or feet may be offensive in some countries.
- Geopolitical issues: When illustrating a map of China or India, should the disputed border with Tibet be included or not? Given that the Kuril Islands are under Russian administration, but that the Japanese refer to them as the Northern Territories, how can this be represented? Can Ireland be illustrated as the whole isle, or should it stop at the northern border line? Is Jerusalem to be represented as the capital of Israel? Not all countries recognize it as such.
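The numeric and currency conventions from the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the `CONVENTIONS` table and both helper functions are hypothetical, cover just two locales, and use a plain space as the French group separator (real typography often uses a non-breaking space). A real product would take these rules from a locale database rather than hard-code them.

```python
from decimal import Decimal

# Hypothetical convention table covering only the two locales discussed above.
CONVENTIONS = {
    "en_US": {"group": ",", "decimal": ".", "symbol_first": True},
    "fr_FR": {"group": " ", "decimal": ",", "symbol_first": False},
}


def format_number(value: Decimal, locale_key: str, places: int = 2) -> str:
    """Format a number with the grouping and decimal separators of a locale."""
    conv = CONVENTIONS[locale_key]
    # Format US-style first ("100,000.99"), then swap in the local separators.
    us_text = f"{value:,.{places}f}"
    integer_part, _, fraction = us_text.partition(".")
    integer_part = integer_part.replace(",", conv["group"])
    if fraction:
        return integer_part + conv["decimal"] + fraction
    return integer_part


def format_currency(value: Decimal, locale_key: str, symbol: str = "€") -> str:
    """Place the currency symbol before or after the number, as the locale expects."""
    number = format_number(value, locale_key)
    if CONVENTIONS[locale_key]["symbol_first"]:
        return symbol + number
    return number + " " + symbol


amount = Decimal("100000.99")
print(format_number(amount, "en_US"))
print(format_number(amount, "fr_FR"))
print(format_currency(Decimal("25000"), "fr_FR"))
```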
These illustrations show that the translation process, which entails communicating the meaning of words or sentences, only represents a subset of what we refer to as localization. Technical issues are to be solved; processes and procedures are to be put in place to attain the ultimate objective of localization, namely, having the same functionalities across different language editions of the same product.
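One of these technical issues, the ambiguous date 03/05/01 from the list above, is easy to reproduce. The sketch below uses Python's standard `datetime.strptime`; the `FORMATS` table and the `interpretations` helper are our own names, and the three patterns are just the readings mentioned in the text, not an exhaustive list of national formats.

```python
from datetime import datetime

# The three readings discussed in the text, as strptime patterns.
FORMATS = {
    "US (mm/dd/yy)": "%m/%d/%y",
    "Europe (dd/mm/yy)": "%d/%m/%y",
    "year first (yy/mm/dd)": "%y/%m/%d",
}


def interpretations(text: str) -> dict:
    """Parse the same string under each convention; None means 'not a valid date'."""
    result = {}
    for label, pattern in FORMATS.items():
        try:
            result[label] = datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            result[label] = None
    return result


for label, value in interpretations("03/05/01").items():
    print(label, "->", value)
```

Because the same string is valid under all three conventions, software cannot detect the mistake on its own; the format has to be chosen from the user's locale.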
The localization process is far from straightforward. Terminology needs to be correct and consistent. Source files have different formats, and translations should ideally be reused from one project to another and from one version to another. A great many tasks and people are necessary to accomplish all of this. To ease the localization cycle and enhance the quality, different tools are available on the market. These tools range from Terminology Management Systems to pure Translation Memory tools, localization tools or comprehensive Workflow Management Systems. Selecting the appropriate tools and using them properly is obviously all-important.
Choice of tools to manage the Localization Cycle
Terminology Management Systems
As mentioned earlier, managing terminology is crucial to ensure localization quality. Terminology Management Systems should be used to maximize the consistency and relevance of terms used both in the source authoring and localization stages. These systems help manage lists of terms, and give information such as context, explanations, definitions, classifications and graphics, where applicable. This information may be important for the translation of these terms and for their selection during the translation cycle.
Terminology Management Systems can be used as components of Translation Memory and Localization tools, or as standalone solutions that plug into them during the translation phase.
Building a Terminology Management database is no easy task. Firstly, all existing data and assets have to be analyzed to extract the existing terms. Although this task can be done using tools, it will inevitably entail a great deal of human work. Once the terms have been validated (preferably by the end customer), they are translated to the target languages, reviewed and ready to use.
Combined with Translation Memory tools, the Terminology Management System helps translators in their work by providing them with a term's translations whenever required.
Although there are many Term Management Systems available, most of them are complex and need to be customized. This alone explains why spreadsheets are often used instead.
Translation Memories
The basic concept inherent in Translation Memories (TM) is simple: once a text has been translated, it is "memorized" in a database for possible use in future translations, thereby avoiding redundant work. This is achieved by segmenting the source text into "translation units" (phrase, paragraph, etc.), and saving these units in the TM database along with their translations for future use. New segments for translation are subsequently matched against the existing segments in the TM database. If there is a match, the existing translations are used. The words in the matched segments are counted as "repeated words".
There are several types of matches:
- Exact match (also called 100% match): the current translation unit (source language) exactly matches the one stored in the TM.
- Fuzzy match: the match between the current translation unit and the one stored in the TM is not exact; it is said to be "fuzzy". Here, the existing translation can be used as a reference for the new translation.
- ICE match (In Context Exact match): not only does the current translation have an exact match, the surrounding translation units have a match too (location within the paragraph, for example).
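The match types above can be sketched as follows. This is a simplified illustration, not how any particular TM product works: the TM is a plain dictionary, similarity is measured with the standard library's `difflib.SequenceMatcher`, the 0.75 threshold is a hypothetical value, and "in context" is reduced to "the neighbouring segments also have exact matches".

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # hypothetical cut-off; real tools make this configurable


def best_fuzzy(segment: str, tm: dict):
    """Return the most similar stored source segment and its similarity score."""
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    return best_source, best_score


def classify(segments: list, index: int, tm: dict):
    """Classify one segment as ICE, exact, fuzzy or new, with a suggested translation."""
    segment = segments[index]
    if segment in tm:
        # Simplified context check: previous and next segments, where they exist.
        neighbours = segments[max(index - 1, 0):index] + segments[index + 1:index + 2]
        if all(n in tm for n in neighbours):
            return "ICE", tm[segment]
        return "exact", tm[segment]
    source, score = best_fuzzy(segment, tm)
    if source is not None and score >= FUZZY_THRESHOLD:
        return "fuzzy", tm[source]  # offered to the translator as a reference only
    return "new", None


tm = {
    "Save the file.": "Enregistrez le fichier.",
    "Click on Print.": "Cliquez sur Imprimer.",
    "Close the window.": "Fermez la fenêtre.",
}
document = [
    "Save the file.",
    "Click on Print.",
    "Close the window now.",
    "Restart the computer.",
]
for i, text in enumerate(document):
    print(text, "->", classify(document, i, tm))
```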
Figures show that in a given product line's documentation, the repetition rate of segments can reach 30%. For updates to the same product, this repetition rate can be as high as 80%. The benefits of such tools are considerable:
- Translation consistency and, therefore, quality: once a segment is translated, the same translation will be used—when possible—across the entire project (documentation, software, online help, etc.). Consistency is additionally ensured for future releases of the product.
- Speed: only what needs to be translated is translated.
- Cost: payment does not go beyond what is translated or reviewed.
TMs can also be used to analyze workloads involved in new projects, and give an accurate word count (total word, words to be translated, ICE match words, exact match words, fuzzy match words). Based on these figures, costs can be estimated in terms of the translation itself, the effort required and the time frame.
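As a rough sketch of such an analysis, the code below counts words per match category and derives a cost estimate. Everything numeric here is an assumption for illustration: the `RATE_PERCENT` table (the share of the full per-word rate charged for each match type) and the per-word rate are hypothetical, and the segments are assumed to have been classified already.

```python
from collections import Counter

# Hypothetical pricing grid: percentage of the full per-word rate per match type.
RATE_PERCENT = {"new": 100, "fuzzy": 60, "exact": 30, "ICE": 0}


def analyze(classified_segments) -> Counter:
    """Count words per match type from (text, match_type) pairs."""
    counts = Counter()
    for text, match_type in classified_segments:
        counts[match_type] += len(text.split())
    return counts


def estimate_cost_cents(counts: Counter, full_rate_cents: int) -> int:
    """Weight each category's word count by its rate; integer cents avoid rounding noise."""
    total = sum(
        words * RATE_PERCENT[match_type] * full_rate_cents
        for match_type, words in counts.items()
    )
    return total // 100


classified = [
    ("Save the file.", "ICE"),
    ("Click on Print.", "exact"),
    ("Close the window now.", "fuzzy"),
    ("Restart the computer.", "new"),
]
counts = analyze(classified)
print("total words:", sum(counts.values()))
print("by match type:", dict(counts))
print("estimated cost (cents):", estimate_cost_cents(counts, full_rate_cents=10))
```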
While TMs are very useful for localization work, considerable effort is needed to maintain them. Once a translation has been approved by reviewers, the TM needs to be updated with the latest translations, then cleaned up (an automatic process that removes bad translations, bad segmentations and other quality-related issues).
Note that it is not necessarily a good idea to use the same TM database for all translated segments, even if specific attributes are defined for each. It usually makes more sense to use a different TM per business unit or product line.
Localization tools
Localizing software is not the same as localizing documents. Often, the strings to translate are short (even single words), the context is unclear, and other components need to be localized such as:
- Text messages (strings table)
- Shortcut keys
- Accelerator keys
- Dialog structure (size, location and other properties)
- Menus
- Bitmaps and icons
The all too common solution of simply extracting the GUI strings into a file (text or spreadsheet) will lead to major problems and affect the overall quality. To overcome these problems, localization tools should be used. Localization tools use dedicated parsers to parse the components of the application to be localized. Relevant resources are loaded into a common database, and non-localizable resources are locked or hidden. A pseudo-translation can be done to confirm that all the extended characters are correctly displayed before preparing the translation kit.
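A pseudo-translation pass of the kind mentioned above can be sketched as follows. The approach shown (swapping plain letters for accented ones, padding the string and wrapping it in brackets) is one common convention rather than the behaviour of any specific localization tool; the 40% expansion factor, the character mapping and the function name are assumptions, and only the `%s` and `%d` placeholders are protected.

```python
import math
import re

# Map plain letters to accented look-alikes so that encoding problems become visible.
ACCENTED = str.maketrans("aeiouAEIOUcnyCNY", "àéîöûÀÉÎÖÛçñýÇÑÝ")

# Only these two printf-style placeholders are protected in this sketch.
PLACEHOLDER = re.compile(r"(%[sd])")


def pseudo_translate(text: str, expansion: float = 0.4) -> str:
    """Return a fake 'translation' that is longer and full of extended characters."""
    parts = PLACEHOLDER.split(text)
    # With a capturing group, odd positions in `parts` hold the placeholders.
    converted = "".join(
        part if i % 2 else part.translate(ACCENTED)
        for i, part in enumerate(parts)
    )
    # Padding simulates text expansion; brackets reveal truncated strings in the GUI.
    padding = "~" * math.ceil(len(text) * expansion)
    return "[" + converted + padding + "]"


print(pseudo_translate("Open file"))
print(pseudo_translate("%d files"))
```

If a bracket is missing on screen, the string was truncated; if an accented letter shows up as garbage, the encoding chain is broken somewhere.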
During the actual translation cycle, the information is presented to the user (localization project managers, translators and reviewers) in WYSIWYG format. Translation and review work can thus be done in context, and the impact of long or out-of-context translations can be seen and fixed immediately (by adapting the translation or the dimensions of the GUI component). Once localized, the target resources are reloaded in the localization tool, and the target files generated for delivery to the publisher.
The main drawback of localization tools is that they cannot successfully handle dynamic content. This makes them unsuited to localizing enterprise and web applications. A significant amount of research work is underway in this area.
Workflow Systems
The localization process involves numerous steps and functions, both for the publisher and the multilingual vendor. As illustrated in the figure below, the major steps in a conventional localization process are:
- Fetching content to be localized
- Preparing the content (text segmentation, resource extraction etc.)
- Leveraging the content from existing TMs
- Effort estimation, costing
- Management approval
- Work assignment
- Translation and localization
- Proofreading / Editing / Reviewing
- Testing, if applicable
- TMs update, maintenance of linguistic assets
- Delivery
- Billing
The number of roles involved is no less significant:
- Content providers (editors, technical writers, R&D teams etc.)
- Localization Project Managers (on the publisher and vendor sides)
- Localization Engineers (on the publisher or vendor side)
- Translators (in-house, freelance, Single Language Vendor, sub-contractors)
- Reviewers (in-house, freelance, Single Language Vendor, sub-contractors)
- Quality Assurance engineers (on the publisher and vendor sides)
- Finance personnel
- Product Managers

Automation Workflow Systems connect to the various systems used in the localization process to fetch the necessary data and push the work down the stream. For example:
- The tools used are connected to the CMS (Content Management System) to fetch the content to be translated and deliver the localized content.
- TM systems are used to automate the leveraging, word count and effort/cost estimates (using information from finance systems).
- Localization tools are used to automatically extract all the resources, and prepare localization kits to be sent to Localization Vendors.
- Localization tools are used to automatically build the application resource in the target languages to be shipped to the publisher.
- Work is dispatched to Localization Vendors, translators and reviewers, and the localized work is gathered.
- Reports systems are used to give full visibility over the project's status.
This type of tool is also referred to as a GMS (Globalization Management System). It exists either as a standalone solution or as part of a suite of localization solutions.
Now that you have gained greater insight into the issues involved in localization and the tools needed to successfully manage a multilingual localization project, you must bear in mind that the localization industry is a fast-changing one. Clearly, even more automation is required in the localization process, and a great deal of effort is being made in this respect. For example, Machine Translation Systems represent an exciting area of research for many around the world. Although considerable progress has been made in recent years, the resulting quality still lags behind that of a human translation. In order to be effective, this technology requires a very large corpus of multilingual content (up to 1 million words), but the future is open… In fact, machine translation is set to play a very important role if we are to meet the increasing desire to go global, and thus to localize content.
Janaina Wittner, Strategic Development Manager at WhP, and Daniel Goldschmidt, software engineer at Google.