Internationalization

From Omeka 1.5 on, site admins can pick the locale they want Omeka to use. When working on the core, plugins, or themes, the code must be internationalized to make display in different languages and locales possible.

Text

For most plugins and themes, making user-facing text translatable will be the lion’s share of the internationalization work. All text strings that are presented to the user and are not editable by the user should be translated. This includes obvious suspects like text in paragraphs and other visible HTML elements, but also less obvious places like the <title> element or title and alt attributes.

Omeka uses one function for enabling text translation, the __ (double-underscore) function. Anything that needs to be translated must be passed through the double-underscore function.

Bare text

Before internationalization, a great deal of the user-facing text may be written directly in HTML or plain text, with no PHP code. Omeka’s translation works through a PHP function, so you need to introduce a PHP block.

Untranslatable

<p>Some text.</p>

Translatable

<p><?php echo __('Some text.'); ?></p>

Literal PHP strings

PHP strings that will end up being shown to the user also need to get translated. These strings are already in PHP code blocks, so the process is easy. Just wrap the double-underscore function around the string that’s already there.

Untranslatable

<?php
echo head(array(
    'title' => 'Page Title'
));
?>

Translatable

<?php
echo head(array(
    'title' => __('Page Title')
));
?>

Strings with variables

A common pattern in PHP is to write strings that directly contain variables. These need a slightly different approach to be translatable. The goal is to make translators only have to translate your string once, no matter what the particular values of the variables inside are.

To do this, you replace your variables with placeholders, and pass your variables separately into the double-underscore function. (The placeholders used are from PHP’s sprintf function.)

Single variable

The basic placeholder is %s. It’s used when your original string simply contained one variable.

Untranslatable

<?php
echo "The site contains $numItems items.";
?>

Translatable

<?php
echo __('The site contains %s items.', $numItems);
?>

This will output the same way as the original, but translators will work with the single string 'The site contains %s items.' instead of many different ones for each possible number.

Multiple variables

The %s placeholder is fine for a string with only one variable. However, with two or more, you need to account for the possibility that some translations will need to reorder the variables, because their sentence structure differs from English. With multiple variables, you must instead use numbered placeholders like %1$s, %2$s, and so on.

Untranslatable

<?php
echo "Added $file to $item.";
?>

Translatable

<?php
echo __('Added %s$1 to %s$2.', $file, $item);
?>

By using numbered placeholders, translators can reorder where the variables will appear in the string, without modifying the code to do so.

Dates and times

The other major thing you will often want to display differently for different for different locales are dates and times. Omeka comes pre-packaged with date formats for various locales already.

Where translations run through one function, the double-underscore function, dates and times similarly work with one function: format_date. format_date automatically selects the right format based on the site’s configured locale.

format_date takes two parameters. The first is the time you want to display. The second, which is optional, is the format you want to use. If you don’t pick a format, the default is an appropriate format for displaying a date.

Time

There are two possible types for the time parameter for format_date: integer and string. If you pass an integer, the time is interpreted as a Unix timestamp. If you pass a string, the time/date is interpreted according to the ISO 8601 standard (this will, among many other formats, correctly parse the output from MySQL date and time columns).

Format

format_date uses Zend_Date internally, so the Zend documentation is the place to go for an exhaustive list of available formats.

Format constants starting with DATE are used for displaying dates without a specific time, ones starting with DATETIME are used for date/time combinations, and ones starting with TIME are for times alone. For each, there are FULL, LONG, MEDIUM, and SHORT variants. Each variant will automatically use a format specific to the current locale, including things like the proper order for dates and the correct names of months.

The default format is Zend_Date::DATE_MEDIUM. This will display the given date/time value as a date, with medium length. In the standard US English locale, this looks like “May 31, 2013.” In a Brazilian locale, it would instead look like “31/05/2013.”

Preparing Translation Files

Omeka reads translations from .mo files produced with GNU gettext. There are three steps to the process. After the basic work described above is complete, you will need to

  1. Create a template file that includes all of the strings to translate

  2. Create .po files that contain the actual translations

  3. Compile .mo files that Omeka will use

The guide for these tasks below follows the practices used by the Omeka dev team. There are other tools and approaches that can accomplish the same tasks. The tool we use are

Creating the template file

The simplest way to produce the template file is to follow the examples in Omeka. We begin with a template.base.pot file, which contains the basic format required to begin generating translations.

# Translation for the Simple Pages plugin for Omeka.
# Copyright (C) 2011 Roy Rosenzweig Center for History and New Media
# This file is distributed under the same license as the Omeka package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: SimplePages\n"
"Report-Msgid-Bugs-To: http://github.com/omeka/plugin-SimplePages/issues\n"
"POT-Creation-Date: 2012-01-09 21:49-0500\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

This file will be used to generate the template.pot file that is used as the template for translations. template.pot files will begin with exactly the content shown above and then include pairs of msgid s and empty msgstr. The msgid s contain the English string of text to translate. The msgstr s will eventually contain the actual translations.

The template.base.pot file is also helpful if your plugin uses strings of text that are not available for the __() function described above. For example, if your records include a flag for a permission such as allowed or required in the database, those strings need to be translated, but might not appear directly in your plugin’s display. In such cases, the strings should be added to template.base.pot below the last line:

msgid "allowed"
msgstr ""

msgid "required"
msgstr ""

If you have ant installed on your system, you can modify the following build.xml file.

<?xml version="1.0" encoding="UTF-8"?>
<project name="SimplePages" basedir=".">
    <property name="lang.dir" location="languages" />
    <property name="core.pot" location="../../application/languages/Omeka.pot" />
    <target name="update-pot" description="Update the translation template.">
        <property name="pot.file" location="${lang.dir}/template.pot"/>
        <property name="pot.base" location="${lang.dir}/template.base.pot"/>
        <tempfile property="pot.temp" suffix=".pot"/>
        <tempfile property="pot.duplicates" suffix="-duplicates.pot" />
        <copy file="${pot.base}" tofile="${pot.temp}"/>
        <apply executable="xgettext" relative="true" parallel="true" verbose="true">
            <arg value="--language=php"/>
            <arg value="--from-code=utf-8"/>
            <arg value="--keyword=__"/>
            <arg value="--flag=__:1:pass-php-format"/>
            <arg value="--add-comments=/"/>
            <arg value="--omit-header"/>
            <arg value="--join-existing"/>
            <arg value="-o"/>
            <arg file="${pot.temp}"/>
            <fileset dir="." includes="**/*.php **/*.phtml"
                excludes="tests/"/>
        </apply>
        <exec executable="msgcomm">
            <arg value="--omit-header" />
            <arg value="-o" />
            <arg file="${pot.duplicates}" />
            <arg file="${pot.temp}" />
            <arg file="${core.pot}" />
        </exec>
        <exec executable="msgcomm">
            <arg value="--unique" />
            <arg value="-o" />
            <arg file="${pot.temp}" />
            <arg file="${pot.temp}" />
            <arg file="${pot.duplicates}" />
        </exec>
        <move file="${pot.temp}" tofile="${pot.file}"/>
        <delete file="${pot.duplicates}" quiet="true" />
    </target>

    <target name="build-mo" description="Build the MO translation files.">
        <apply executable="msgfmt" dest="${lang.dir}" verbose="true">
            <arg value="-o"/>
            <targetfile />
            <srcfile />
            <fileset dir="${lang.dir}" includes="*.po"/>
            <mapper type="glob" from="*.po" to="*.mo"/>
        </apply>
    </target>
</project>

It creates two ant commands. The first one that is important to us here is ant update-pot . It will read the template.base.pot and generate the template.pot file from the strings that are wrapped in __(). template.pot will then contain all the msgid s to be translated.

You will want to double-check that you have found all of the strings that require localization. The podebug utility can be helpful with this. It automatically generates .po files that contain pseudo-translations that will help you spot any strings that are not being translated, but should be.

Creating .po files

The .po files contain the localizations, named according to the ISO 639-1 standard. For example, es.po will contain translations into Spanish, and es_CO.po will contain the more precise localization to Colombian Spanish.

Omeka uses the Transifex service to produce our translations. Other tools and services also exist to help you produce your translations, but we recommend using Transifex if possible, and setting up your plugin as child project to Omeka. This will widen the pool of translators and languages for your project.

Compiling .mo files

Once you have created the .po files for your localizations, the final step is to compile them into binary .mo files. The second command defined by the build.xml file used above, ant build-mo will perform this task for you.

All files, template.base.pot, template.pot, and all .po and .mo files should be in a languages directory at the top level of your plugin.