Transifex: Let a Thousand Languages Bloom!

For most open source projects, translations are pretty important. Projects that are used by desktop users, such as desktop environments, GUI applications, and distributions, most frequently ship localised user interfaces, documentation, websites and other types of resources.

Take Fedora, for example, one of the most popular Linux distributions out there. Around 60 per cent of its users use a localised desktop, and the percentage is probably higher for other major desktop environments. In Fedora’s case, this translates to something like 3-5 million users. With contributions reaching such a large audience, it’s no surprise that the open source translation community is very active, and that most major open source projects enjoy a community devoted to translating them into various languages.

Challenges in FOSS localisation

Typically, software developers use an internationalisation platform like gettext, which parses the source code and extracts the translatable strings from the code into special PO files. These files are handed to translators, who translate them into a target language using a variety of tools.

The challenge for most projects lies in getting those translation files back into their version control system (VCS). Giving VCS access to a few developers is usually fine, but administering accounts for hundreds of translators can be a challenge. To avoid that, some developers decide to accept translations only through bug reports or e-mail attachments. But in a product under active development, strings change often, and with each release translators send in a new batch of translations. That’s a lot of bug reports and e-mails.

Larger projects usually have the advantage of building their own translation community. Even so, some of their developers feel more productive with a different type of VCS, and others host their projects on external servers. The result is either low productivity, or a small number of translators and translation quality that suffers.

Finding a solution

Transifex has been developed as a solution to these issues, and to make translations dead-simple both for developers and translators. The goal with Transifex was to work as a translation proxy and handle the mechanical processes for both these groups of users, allowing them to work more efficiently and effectively.

Developers give Transifex access to their source repository. The Transifex “robot” can log in to a number of different versioning systems and grab the translation files for the translators. The latter log in to a unified, easy-to-use interface, independent of the upstream VCS type and location, and receive the translations they need. Upon translation, they can use the same interface to submit the files back to the VCS.

How it works

Richard Hughes is the developer of PackageKit. He hosts his project at packagekit.org, and needs a hassle-free way to receive quality translations. He points his browser at an existing Transifex server (such as the soon-to-be-launched transifex.net) and registers his project there. He then receives an SSH key and uses it to create a special user on his server, with write access to the translation directories. His project is now ready to receive translations.

At this point, Richard is asked whether he’d like Transifex to scan its translation memory, built from other projects, to bootstrap the translations of his own project. He’s delighted to see that his PO files have been translated to somewhere between 20 and 40 per cent with no human interaction.
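The bootstrapping step can be pictured with a small sketch. It assumes the simplest possible translation memory, a dictionary of exact source-string matches; the article does not say how Transifex actually matches strings, and the function names and sample strings below are hypothetical.

```python
def bootstrap_from_memory(source_strings, memory):
    """Pre-fill translations for source strings already translated elsewhere."""
    return {msgid: memory.get(msgid, "") for msgid in source_strings}


def percent_translated(catalogue):
    """Share of entries that have a non-empty translation."""
    if not catalogue:
        return 0.0
    done = sum(1 for msgstr in catalogue.values() if msgstr)
    return 100.0 * done / len(catalogue)


# Hypothetical Polish memory collected from other projects.
memory = {"Open": "Otwórz", "Save": "Zapisz", "Cancel": "Anuluj"}

# Hypothetical strings from a newly registered project.
new_project = ["Open", "Cancel", "Install package", "Remove package", "Refresh"]

catalogue = bootstrap_from_memory(new_project, memory)
print(percent_translated(catalogue))  # 2 of 5 strings matched: 40.0
```

Exact matching is the most conservative choice: anything it fills in was written by a human for the identical source string, and the remaining entries are left empty for translators.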

Piotr is a Polish translator who loves translating free software GUIs. He has registered with Transifex and asked to be notified when new projects that might interest him are registered. He receives an e-mail with a direct link to the Polish PackageKit translation and another link that he can use to submit the file back.

Once the file is submitted back, Richard is notified that language translation for Polish is now at 100 per cent.

Architecture details

Under the hood, Transifex abstracts all VCSs and runs a clone/checkout on the repository. It identifies the i18n method and the translation files. Depending on the i18n method, it compares the translation files with the template file (for example, the English one) and calculates translation statistics for each one.
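To make the statistics step concrete, here is a minimal sketch of comparing a translation file against its template. It is not Transifex’s code: the parser is deliberately simplified (it ignores plural forms, message contexts and fuzzy flags), and the function names and sample files are assumptions made for this example.

```python
import re

KEYWORD_LINE = re.compile(r'^(msgid|msgstr)\s+"(.*)"\s*$')
CONTINUATION_LINE = re.compile(r'^"(.*)"\s*$')


def parse_po(text):
    """Return (msgid, msgstr) pairs from simple PO text."""
    entries = []
    current = None  # entry being built
    field = None    # keyword that a continuation line extends
    for raw in text.splitlines():
        line = raw.strip()
        keyword = KEYWORD_LINE.match(line)
        continuation = CONTINUATION_LINE.match(line)
        if keyword:
            field, value = keyword.groups()
            if field == "msgid":
                current = {"msgid": value, "msgstr": ""}
                entries.append(current)
            elif current is not None:
                current["msgstr"] = value
        elif continuation and current is not None and field:
            current[field] += continuation.group(1)
    return [(entry["msgid"], entry["msgstr"]) for entry in entries]


def translation_stats(template_text, translation_text):
    """Return (translated, total, percent) measured against the template."""
    # The header entry has an empty msgid and is not a translatable string.
    source_ids = [msgid for msgid, _ in parse_po(template_text) if msgid]
    translated = {
        msgid for msgid, msgstr in parse_po(translation_text) if msgid and msgstr
    }
    done = sum(1 for msgid in source_ids if msgid in translated)
    percent = 100 * done // len(source_ids) if source_ids else 0
    return done, len(source_ids), percent


TEMPLATE = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Install"
msgstr ""

msgid "Remove"
msgstr ""

msgid "Update"
msgstr ""
'''

POLISH = '''
msgid "Install"
msgstr "Zainstaluj"

msgid "Remove"
msgstr "Usuń"

msgid "Update"
msgstr ""
'''

print(translation_stats(TEMPLATE, POLISH))  # (2, 3, 66)
```

Counting against the template rather than the translation file means strings added upstream since the last translation lower the percentage, which is what a translator tracking a project would want to see.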

The management burden is removed from developers, who can concentrate on what they do best, which is writing code. Translators can use their single Transifex login account to contribute to any project they like, as long as it’s registered on Transifex.

As a high-level Python application, the service includes hooks that can improve the workflow in a number of ways. Before a commit, the file’s syntax is validated, so that broken files don’t break the developer’s build process. Transifex also allows fine-grained permissions on the files translators need access to. After a commit, Transifex can notify language leaders and others about file submissions, provide RSS feeds for submissions, etc.
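A syntax check of this kind can be sketched in a few lines. The version below is an assumption-laden stand-in: it only checks that each line of a PO file is a comment, a keyword followed by a properly quoted string, or a quoted continuation line. A production setup would more likely run gettext’s own msgfmt --check, and the function name here is hypothetical.

```python
import re

# A keyword (optional) followed by one double-quoted string with valid escapes.
VALID_LINE = re.compile(
    r'^(?:(?:msgctxt|msgid|msgid_plural|msgstr(?:\[\d+\])?)\s+)?"(?:[^"\\]|\\.)*"$'
)


def po_syntax_errors(text):
    """Return (line_number, line) for every line that does not look like PO."""
    errors = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are always acceptable
        if not VALID_LINE.match(line):
            errors.append((number, raw))
    return errors


GOOD = 'msgid "Install"\nmsgstr "Zainstaluj"\n'
BROKEN = 'msgid "Install"\nmsgstr "Zainstaluj\n'  # closing quote is missing

print(po_syntax_errors(GOOD))    # []
print(po_syntax_errors(BROKEN))  # [(2, 'msgstr "Zainstaluj')]
```

A hook built on such a check would refuse the submission when the list is non-empty and show the translator the offending lines, so the broken file never reaches the developer’s repository.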

Transifex currently supports git, hg, cvs, svn and bzr, and adding more VCSs is a matter of writing a few lines of code. It currently serves POT-based projects, and its developers are looking forward to extending the i18n support to include intltool-based projects (GNOME), XLIFF, etc. The login mechanism also supports OpenID.
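To give a feel for why a new VCS can be a few lines of code, here is a hypothetical sketch of such an abstraction. The class names, the registry and the repository URL are invented for illustration and do not reflect Transifex’s actual internals; only the command-line invocations themselves (git clone, hg clone, svn checkout, bzr branch) are the tools’ real commands.

```python
class VCSBackend:
    """Hypothetical base class: one small subclass per version control system."""

    name = ""
    command = ()

    def checkout_command(self, url, destination):
        # Build the command line that fetches a working copy of the project.
        return [*self.command, url, destination]


class GitBackend(VCSBackend):
    name, command = "git", ("git", "clone")


class HgBackend(VCSBackend):
    name, command = "hg", ("hg", "clone")


class SvnBackend(VCSBackend):
    name, command = "svn", ("svn", "checkout")


class BzrBackend(VCSBackend):
    # Supporting another system means adding one more class like this.
    name, command = "bzr", ("bzr", "branch")


BACKENDS = {
    backend.name: backend
    for backend in (GitBackend(), HgBackend(), SvnBackend(), BzrBackend())
}


def checkout_command(vcs_type, url, destination):
    """Look up the backend for a project and build its checkout command."""
    try:
        backend = BACKENDS[vcs_type]
    except KeyError:
        raise ValueError("unsupported VCS type: %s" % vcs_type)
    return backend.checkout_command(url, destination)


# The URL is a placeholder, not a real repository.
print(checkout_command("git", "https://example.org/packagekit.git", "work"))
```

The rest of the application only ever calls checkout_command, so translators and the web interface never need to know which system a given project uses.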

Development of Transifex

The development of Transifex began as a 2007 Google Summer of Code project of mine (Oh! Hi! I’m Dimitris Glezos :-)). It was initially written in Python using the TurboGears framework, and right after the summer it was put into production at Fedora, where it was used by more than 100 projects and 500 translators.

The following year, Transifex was presented at more than 10 international conferences, including FOSS.in 2008. That summer, Transifex earned three more GSoC applications and was rewritten from scratch using the Django Python framework, now incorporating many of the suggestions from existing users. Development has since taken place on transifex.org and on the transifex-devel mailing list.

In the meantime, other projects liked the platform and joined in our efforts. GNOME’s Damned Lies and Vertimus tools migrated their code to Django, with the goal of being merged with Transifex at some point in the future.

Future features

With more contributors joining the developer team, Transifex is now moving towards a stabilised platform to serve independent and upstream software projects and then on to bigger ones.

One of the immediate features we’d like to add is per-VCS file monitoring, so that translators can ‘track’ a project and get notified when the translation percentage for their language changes. Adding commenting support for projects and submissions, as well as support for file uploads, will enable translators to collaborate better on QA.

Another often requested feature is the development of a command-line interface allowing translators to do something like the following:

$ tx set-language bn_IN
$ tx get-collection Fedora
Received anaconda/po/bn_IN.po
Received packagekit/po/bn_IN.po
$ # Translation...
$ tx send-collection Fedora
Sent ‘anaconda/po/bn_IN.po’ (100% translated)

The vision: Transifex.net

As mentioned earlier, Transifex allows downstream communities to send files directly to the VCS of upstream projects. One might wonder, then, which Transifex community an independent project should choose to receive translations from: Fedora, GNOME, or example.com?

Having a common place where open source translations take place is key to linking translation communities together and reaching new levels of collaboration between translation teams. Here’s the plan we’re developing with www.transifex.net: establish a healthy network where developers can get their applications translated and translators can contribute to their favourite projects. Project teams that would rather not go to the trouble of setting up their own Transifex instance should have a stable, feature-rich service through which to join their efforts with the rest of the open source community, under a common umbrella.

Becoming a contributor

Transifex is written in Python and utilises the awesome Django framework with its famously top-notch documentation. This makes it really easy for folks to join in and extend the platform with the features they’d like to see added. Development information can be found at transifex.org/wiki/Development. To set up a development environment of your own, check out the documentation at docs.transifex.org/intro/install.html.

An example of an easy task would be to add support for associating registered projects with their maintainers/developers. This will give translators a contact point for more information on the project and for conflict resolution. Creating a patch that adds simple support for project maintainers is a matter of a few lines of code: add a foreign key from the Project model to the User and probably edit the User Profile page to include a section listing the projects the user maintains.
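The shape of that relation can be sketched without a Django project at hand. In Django the change would be a ForeignKey field on the Project model; the stand-in below uses Python’s built-in sqlite3 module instead, with hypothetical table and column names, to show the same link and the query behind a “projects this user maintains” section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each project optionally points at one maintainer.
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        maintainer_id INTEGER REFERENCES users(id)
    );
""")

conn.execute("INSERT INTO users (id, username) VALUES (1, 'richard')")
conn.executemany(
    "INSERT INTO projects (name, maintainer_id) VALUES (?, ?)",
    [("PackageKit", 1), ("anaconda", None)],
)


def maintained_projects(connection, username):
    """List the names of the projects a given user maintains."""
    rows = connection.execute(
        "SELECT projects.name FROM projects "
        "JOIN users ON projects.maintainer_id = users.id "
        "WHERE users.username = ? ORDER BY projects.name",
        (username,),
    )
    return [name for (name,) in rows]


print(maintained_projects(conn, "richard"))  # ['PackageKit']
```

Making the maintainer column nullable keeps already registered projects valid until someone claims them, which matters when the feature is added to a running service.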

Adding support for more VCSs and i18n back-ends is also quite feasible because of the abstractions Transifex includes in those areas. For most needs, one just copies a Python file and changes it accordingly. We’ve marked quite a few tickets with the ‘easy_task’ keyword, so check out transifex.org/report/9 to start hacking.

Let a thousand languages bloom!

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.