Google Translates everything

Is Google Translate heading towards an endgame? Two posts on the products blog would have me believe it is closer than you would think. Just this week Google announced that email translation was moving from a Labs curio to all Gmail users:

We heard immediately from Google Apps for Business users that this was a killer feature for working with local teams across the world. Some people just wanted to easily read newsletters from abroad. Another person wrote in telling us how he set up his mom’s Gmail to translate everything into her native language, thus saving countless explanatory phone calls (he thanked us profusely).

Since message translation was one of the most popular labs, we decided it was time to graduate from Gmail Labs and move into the real world. Over the next few days, everyone who uses Gmail will be getting the convenience of translation added to their email. The next time you receive a message in a language other than your own, just click on Translate message in the header at the top of the message…

If you’re bi-lingual and don’t need translation for that language, just click on Turn off for: [language]. Or if you’d like to automatically have messages in that language translated into your language, click Always Translate. If you accidentally turned off the message translation feature for a particular language, or don’t see the Translate message header on a message, click on the down arrow next to Reply at the top-right of the message pane and select the Translate message option in the drop-down.

The second big hint is the article Breaking down the language barrier—six years in, written by one of the Google Translate researchers. It includes a short history and some context, and the stats are amazing:

Today we have more than 200 million monthly active users on translate.google.com (and even more in other places where you can use Translate, such as Chrome, mobile apps, YouTube, etc.). People also seem eager to access Google Translate on the go (the language barrier is never more acute than when you’re traveling)—we’ve seen our mobile traffic more than quadruple year over year. And our users are truly global: more than 92 percent of our traffic comes from outside the United States.

In a given day we translate roughly as much text as you’d find in 1 million books. To put it another way: what all the professional human translators in the world produce in a year, our system translates in roughly a single day. By this estimate, most of the translation on the planet is now done by Google Translate.

Of course, he repeats the mantra that all professional translators will want to hear:

Of course, for nuanced or mission-critical translations, nothing beats a human translator—and we believe that as machine translation encourages people to speak their own languages more and carry on more global conversations, translation experts will be more crucial than ever.

I think that these two posts are important – not only does Google have enough faith in its translations to roll them out across potentially the most used email system on the planet, but the statistic of 1 million books a day being translated goes to show how much off-the-cuff, non-mission-critical translation was just waiting to happen.

First post from Kiribati

We have arrived in Kiribati! It’s lovely – the weather has been rough and ready, but hot and wet. The people are lovely and the scenery is quite amazing. I pinch myself every day. The internet connection, on the other hand, is appalling. And when I say internet connection I mean Internet Connection – there are a few bottlenecks, but the most frustrating is the national telecom monopoly, whose uplink is the main one on the island. This blog post is being constructed in a text editor offline on the weekend from tabs I didn’t close on Friday afternoon, and it feels quite unnatural. Anyway, more on the Kiribati language is coming; in the meantime I thought I’d mention two articles I noticed during the week.

The first is from Fully (sic), Crikey’s language blog, about the localisation of comics in the daily papers here in Australia. In focus is the localisation of Zits, where mom has been changed to mum. The bulk of the article ruminates on how rarely anything is localised from American (or British) English into Australian English – we have internalised their spellings and language usage over the last 50 years by importing their culture:

The Zits case is different though. We’re quite used to our locally produced content (or British content, for that matter) being edited for US audiences. But changing mom for mum in the Zits cartoon goes the other way. And this is something we’re not used to. We in Australia are effectively bidialectal – we hear US English (and likely other dialects too) very frequently and can effortlessly translate phrases, lexical items and spellings without it even breaching our conscious mind. For this, I suppose we can thank fifty years or more of pervasive US culture dominating our media. Perhaps this is the reason that such substitutions irritate Alan – just like everyone else, he knows that Americans spell it mom, and has no problem understanding it, but critically he also knows that Zits is an American comic strip – the characters’ voices in his head would most probably have American accents. So when he reads mum where he expects mom, it’s clearly going to be quite jarring.

The second article is from the dependable dev/null. A German company have started “creating” t-shirts – or more accurately t-shirt slogans, in both English and German:

Some of the results are more presentable than others; one might believe that “Budapest Bicycle Flux” was a semi-obscure math-rock band whose gig the wearer happened to catch in some college-town bar back in the day, and there are situations where one might plausibly wear a T-shirt reading “I Reject Your Reality And Replace It With Cupcakes”, which, alas, cannot be said for some of the outputs, such as “your vagina is a wonderland”, or a grid of words including “Hitlerponys”, “Mörderpenis” and/or the decidedly euphemistic-sounding “wurstvuvuzela”. … Interestingly enough, after clicking through the site for a while, a reader with a limited grasp of German may find their German comprehension improving slightly; perhaps the flood of meaningful (if nonsequiturial) sentences exercises the language pattern-matching parts of the brain in some kind of process of combinatorial fuzzing, reinforcing plausible word sequences.

Google Translate now does Esperanto

The Google Translate blog has announced that they have added Esperanto to the list of available languages:

Esperanto and Google Translate share the goal of helping people understand each other, this connection has been made even in this blog post. Therefore, we are very excited that we can now offer translation for this language as well.

The Google Translate team was actually surprised about the high quality of machine translation for Esperanto. As we know from many experiments, more training data (which in our case means more existing translations) tends to yield better translations. For Esperanto, the number of existing translations is comparatively small. German or Spanish, for example, have more than 100 times the data; other languages on which we focus our research efforts have similar amounts of data as Esperanto but don’t achieve comparable quality yet. Esperanto was constructed such that it is easy to learn for humans, and this seems to help automatic translation as well.

Workflow of a subtitler

If there’s one thing I really love as a monolingual interested in language and translation, it’s a translator’s workflow. It’s exactly because I’m monolingual that I have no real idea how translators go about their jobs. I understand almost every other part of the system – the technology and how it all works together, how to troubleshoot an installation, the difference between UTF-8 and UTF-16 – but I don’t know the flow.

The blogspace at Witness has a great article on how they use a mixture of technologies and software to subtitle a video in their workflow. Recommended reading if you want to look at subtitling.

Updated LibreOffice

I discovered that the premier free office software, LibreOffice, was updated to version 3.5 recently. For those working with language, the new features and fixes include some localisation improvements that justify an upgrade. If you are paying for a competing office suite, I recommend you try LibreOffice before spending the money at your next upgrade opportunity.

Localization

  • Added Arabic, Aragonese, Belarusian, Bengali, Breton, Bulgarian, Scottish Gaelic, Greek, Gujarati, Hindi, Latvian, Brazilian Portuguese, European Portuguese, Sinhala, and Telugu spelling dictionaries. (Andras Timar)
  • Use of possessive genitive case and/or partitive month names if provided by a locale’s locale data (e.g., Russian, Polish, Finnish, Lithuanian, and others).
    If a day of month (D or DD) is present in a number formatter’s date format code, the month name for MMM or MMMM is displayed in possessive genitive case or partitive case.
    Else if no day of month is present, the month name is displayed as noun / nominative case.
    See blog for more details. (Eike Rathke)
  • Corrections to Polish [pl-PL], Portuguese [pt-PT and pt-BR], Slovenian [sl-SI], and Latin [la-VA] locale data, esp. date formats. (Eike Rathke, Martin Srebotnjak, Mateusz Zasuwik, Olivier Hallot, Roman Eisele, Sérgio Marques)
  • Initial support for two new UI languages, Luxembourgish (lb) and Tatar (tt) 
    LibreOffice 3.5 supports 107 UI languages.
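
The month-name rule in those notes is easier to see in code. Here is a minimal sketch in Python, and very much my own illustration rather than how LibreOffice implements it: the function names are hypothetical, the three Russian month names are sample data, and it assumes that a run of three or more Ds in a format code is a weekday name rather than a day of month. It also ignores quoted literal text inside format codes.

```python
import re

# Sample data only: three Russian month names in the two cases the rule chooses between.
MONTH_NAMES = {
    1: {"nominative": "январь", "genitive": "января"},
    2: {"nominative": "февраль", "genitive": "февраля"},
    3: {"nominative": "март", "genitive": "марта"},
}

# Matches D or DD standing alone; a run of three or more Ds is skipped
# (assumed here to mean a weekday name, not a day of month).
DAY_OF_MONTH = re.compile(r"(?<!D)D{1,2}(?!D)")


def has_day_of_month(format_code: str) -> bool:
    """Return True if the date format code contains a D or DD field."""
    return DAY_OF_MONTH.search(format_code) is not None


def month_case(format_code: str) -> str:
    """Pick the grammatical case for MMM/MMMM according to the rule above."""
    return "genitive" if has_day_of_month(format_code) else "nominative"


def month_name(month: int, format_code: str) -> str:
    """Look up the sample month name in the case the format code calls for."""
    return MONTH_NAMES[month][month_case(format_code)]


if __name__ == "__main__":
    print(month_name(1, "D MMMM YYYY"))  # day present, so genitive
    print(month_name(1, "MMMM YYYY"))    # no day, so nominative
```

With a day in the format code ("D MMMM YYYY") January comes out as января, and without one ("MMMM YYYY") it comes out as январь.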

Pineapple Donuts

Ok. So I’ve not announced anything here as yet, so I will now. In three weeks I am moving to the Pacific Island nation of Kiribati (pron: Kiri-bass), specifically the town of Bairiki on the island of Tarawa, for a year.

My whole family will be coming too – I am building a database for the Government, Amber is working in Marketing and Communications for the Kiribati Institute of Technology and our children are going to the local primary school. Both Amber and I are working with the AVID project.

This has been in the works since about October of last year, which hopefully explains my relative silence over the last two months, at least. Preparations are well underway, although the house is yet to be packed. Feel free to volunteer to help in this regard.

Further, given the nature of my work as a volunteer, but also the availability of the internet (or lack thereof), Pineapple Donut may well go on a small 12-month hiatus. I will potentially post a few things, but I certainly won’t be doing quite as much as I have previously.

I absolutely plan on coming back to this project at the end of the assignment and look forward to reporting on tech/translation soon. I will be starting a blog about our experiences while we are away – I’ll be reporting on that very soon.