YouTube adds a Translation Service

Late last year Google added a subtitle translation function to make it easier for video uploaders to transcribe their videos and to then have others translate them.

Of course, sometimes you want that Swahili subtitle translation but you don’t know anyone who will do it.

Google has announced an initial collaboration with two translation services so you can get a translation done for you:

When you request a translation for your captions in YouTube, we’ll display a list of vendors along with their estimated pricing and delivery date so you can easily compare. We’ve initially collaborated with two companies, Gengo and, to make their services available to you and to streamline the ordering process.

There are two aspects to note here: two weeks ago Amara (previously Universal Subtitles, mentioned here often) announced an update that automagically sync’d subtitles to your YouTube channel – the timing of this move by Google is cynical in the extreme.

Amara are still doing a better job of it – who else has a Closed Captioning (CC) request service:

These are videos that our deaf and hard-of-hearing users have asked the Amara community to caption. Join the team – via – and help us make these videos accessible to everyone. Are you deaf or hard of hearing? Feel free to submit a video to this team or send your request to our Deaf HoH email list:

Did you see that? A deaf/hard of hearing subtitle request list. Fantastic. This type of development gives me faith that while the Google Translation engine will impact upon translators’ incomes, there is still room for groups to make a living if they think outside the box.

More importantly and fascinatingly, Amara also offers a Music Captioning service:

The place where music is captioned to bridge the gap between hearing and deaf world.
Everyone is welcome join this team – via – and share and create a worldwide audience to enjoy music in every language of the world.
We also have a Google group where you can discuss the captioning/subtitling of each video:!forum/musiccaptioning
See also our “Guidelines about collaborative captioning / subtitling” there: .

My other concern about this development, or more correctly the obvious conclusion to draw from it, is that Google will be using these subtitles more and more to help it with its voice recognition and understanding service, Google Voice Search – one of the most important steps to integrating robots and AIs into our lives.

Workflow of a subtitler

If there’s one thing I really love as a monolingual interested in language and translation, it’s a translator’s workflow. It’s exactly because I’m monolingual that I have no real idea how translators go about their jobs. I understand almost every other part of the system – the technology and how it all works together, how to troubleshoot installation, the difference between UTF-8 and UTF-16 – but I don’t know the flow.

The blogspace at Witness has a great article on how they use a mixture of technologies and software to subtitle a video in their workflow. Recommended reading if you want to look at subtitling.

Subtitling for SE Asia and the Pacific

I used to work for EngageMedia and I’m pleased to see that they have just launched a subtitling project, headed up by Singaporean democratic activist Seelan Palay, in conjunction with our friends over at Universal Subtitles:

The 600 million people spread across Southeast Asia share a common set of challenges: climate change, human rights, freedom of expression, corruption and much more. With hundreds of regional languages, communication and collaboration can be difficult. Translation and subtitling could always help, but now it’s a whole lot easier.

Let’s hope the number of subtitles increases and these important videos are seen by more people as a result.

Popcorn.js hits 1.0

Mozilla has announced that Popcorn has hit version 1.0. Popcorn.js itself

is an HTML5 media framework written in JavaScript for filmmakers, web developers, and anyone who wants to create time-based interactive media on the web.

The best way to understand what that means is to watch the video on that page, but in short: imagine the idea of subtitles mashed with the internet. Instead of just text appearing on screen based on time in a video, any web content can appear in the video – including live web streams such as Twitter hashtags.
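To make the "subtitles mashed with the internet" idea concrete, here is a minimal sketch in Python. It is not Popcorn.js and does not use its API; the `Cue` class, its fields and the `active_cues` function are names I have made up for illustration. The only point is that a subtitle is one kind of timed event among many, and the player just needs to ask which events are live at the current playhead position.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """A timed event; all field names here are hypothetical."""
    start: float   # seconds into the video
    end: float     # seconds into the video (exclusive)
    kind: str      # e.g. "subtitle", "tweet", "wikipedia"
    payload: str   # text to show, or a reference to web content


def active_cues(cues, playhead):
    """Return the cues that should be on screen at `playhead` seconds."""
    return [cue for cue in cues if cue.start <= playhead < cue.end]


# Made-up example timeline: a plain subtitle plus two kinds of web content.
timeline = [
    Cue(0, 10, "subtitle", "Welcome to the show"),
    Cue(5, 20, "tweet", "#popcornjs"),
    Cue(15, 30, "wikipedia", "George Armstrong Custer"),
]
```

Calling `active_cues(timeline, 7)` returns the subtitle and the tweet cue together, which is the whole trick: text and web content share one timeline.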

It’s an exciting development – I blogged about version 0.3 of Popcorn.js in February of this year, and it was significantly less coherent and less far-reaching – it’s come a long way in a mere nine months.

This really is the next plane beyond subtitles – something that the Chinese subtitle hackers have been laying the groundwork for over the last couple of years. I can’t find the original article (from I believe), but it revolved around Chinese fansubs of American TV shows, like Lost, that included metadata from Wikipedia. For example, someone in the show mentions General Custer, a topic Chinese viewers might not be familiar with, and since the videos are largely consumed on laptops and can be paused, the entire Wikipedia entry can be added to one frame for cultural context. Obviously, Popcorn.js takes this left-field practice to the next level entirely.

Mozilla Popcorn is a slightly larger package that includes Popcorn Maker as well – a GUI for Popcorn.js that means you don’t need to be a coder to utilise the software. In fact, now that I’ve had a better look at Popcorn Maker, it’s very reminiscent of Aegisub or something similar – not as feature rich in what it can do to text, but much more so in its Twitter, Flickr, Wikipedia, image and website integration.

I’d be interested to know how much integration there is with another of Mozilla’s subtitling projects – the Universal Subtitles project.

YouTube’s Audio Transcription

Given that the subject of my last post was Google and voice recognition, I thought this video was timely. Titled Caption Fail 2 (the original Caption Fail is also available), it uses YouTube’s Auto Transcription mechanism to “play Telephone” (aka Chinese Whispers) – the game in which a message is passed from person to person in serial, and is transformed into something completely new – and often surreal – just like a bad machine translation.

Personally I believe that this game is always affected by the fact that the subjects’ knowledge of the study or outcome causes them to alter their behaviour – in this case, to deliberately change what they have heard (like the Hawthorne Effect, but not quite). While YouTube isn’t doing this to the performers, Rhett and Link, they have obviously chosen scripts that are a mouthful in order to trip the software up. I wonder how many takes and script rewrites it took to get a sufficiently entertaining result?

Google, voice recognition and search

The London Review of Books has an interesting look at Google titled It knows. The article discusses how much Google knows and what it’s doing with that information. Instead of offering all of it to you in search, it’s keeping some back:

The reason is that Google is learning. The more data it gathers, the more it knows, the better it gets at what it does. Of course, the better it gets at what it does the more money it makes, and the more money it makes the more data it gathers and the better it gets at what it does – an example of the kind of win-win feedback loop Google specialises in – but what’s surprising is that there is no obvious end to the process.

While I’m less eloquent, or have been less able to pinpoint it so effectively, it is a refrain you hear a lot on this blog. I’ve posted previously about how Google has learnt semantics – this article follows up that anecdote with a fascinating and insightful description of GOOG-411, the voice search service briefly offered by Google, making the now obvious reasons for its existence a lot clearer. Given yesterday’s launch of Siri by Apple, it is quite a timely reminder that there is more than just one player in the voice recognition field, and that Apple wasn’t even the first to it:

By 2007, Google knew enough about the structure of queries to be able to release a US-only directory inquiry service called GOOG-411. You dialled 1-800-4664-411 and spoke your question to the robot operator, which parsed it and spoke you back the top eight results, while offering to connect your call. It was free, nifty and widely used, especially because – unprecedentedly for a company that had never spent much on marketing – Google chose to promote it on billboards across California and New York State. People thought it was weird that Google was paying to advertise a product it couldn’t possibly make money from, but by then Google had become known for doing weird and pleasing things. In 2004, it launched Gmail with what was for the time an insanely large quota of free storage – 1GB, five hundred times more than its competitors. But in that case it was making money from the ads that appeared alongside your emails. What was it getting with GOOG-411? It soon became clear that what it was getting were demands for pizza spoken in every accent in the continental United States, along with questions about plumbers in Detroit and countless variations on the pronunciations of ‘Schenectady’, ‘Okefenokee’ and ‘Boca Raton’. GOOG-411, a Google researcher later wrote, was a phoneme-gathering operation, a way of improving voice recognition technology through massive data collection.

Three years later, the service was dropped, but by then Google had launched its Android operating system and had released into the wild an improved search-by-voice service that didn’t require a phone call. You tapped the little microphone icon on your phone’s screen – it was later extended to Blackberries and iPhones – and your speech was transmitted via the mobile internet to Google servers, where it was interpreted using the advanced techniques the GOOG-411 exercise had enabled. The baby had learned to talk.

But success wasn’t immediate. And failure is often the best way to learn – it forces us to adapt.

Before Google bought YouTube in 2006 for $1.65 billion, it had a fledgling video service of its own, predictably called Google Video, that in its initial incarnation offered the – it seemed – brilliant feature of answering a typed phrase with a video clip in which those words were spoken. The promise was that, for example, you’d be able to search for the phrase ‘in my beginning is my end’ and see T.S. Eliot, on film, reciting from the Four Quartets. But no such luck. Google Video’s search worked by a kind of trickery: it used the hidden subtitles that broadcasters provide for the hard of hearing, which Google had generally paid to use, and searched against the text. The service is just one of the many experiments that Google over the years has killed, but a presumably large reason for its death was that although it appeared to work it was really very limited. Not everything is tailored for the deaf, and subtitles are often wrong. If, however, Google is able to deploy its newly capable voice recognition system to transcribe the spoken words in the two days’ worth of video uploaded to YouTube every minute, there would be an explosion in the amount of searchable material. Since there’s no reason Google can’t do it, it will.

The final part of the article bemoans the size of Google:

Google is getting cleverer precisely because it is so big. If it’s cut down to size then what will happen to everything it knows? That’s the conundrum. It’s clearly wrong for all the information in all the world’s books to be in the sole possession of a single company. It’s clearly not ideal that only one company in the world can, with increasing accuracy, translate text between 506 different pairs of languages. On the other hand, if Google doesn’t do these things, who will?

Which is a legitimate concern, no doubt. Who needs a one-world government when society can just be taken over by a large corporation by stealth? Having said that, there’s no reason why we can’t live together in harmony, this society of ours and Google. I just think Google will have to give back in return for what it’s taken from us – make the maps free. Make the translations free. Keep the search free – and even open its heuristics. Am I asking too much? Am I not being cynical enough? My inner anarchist is squeamish at the thought of allowing it to happen, but my inner futurist is excited at its possibilities.

Wikipedia in Arabic?: ويكيبيديا: الموسوعة الحرة

@gr33ndata retweeted this tweet about Wikipedia in Arabic last night:

154,000 Arabic Wikipedia entries. 374m Arabs! How can we learn when we don’t share what we know? Yalla, wikiArabic!

I followed it through and found that there was a transcription available, but not as subtitles. To activate this function, click the transcription button (circled). I wish there was some easy Universal Subtitles plugin that would just transform the transcript to a subtitle!

Transcription button in YouTube
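The plugin I’m wishing for needn’t be complicated. Below is a rough Python sketch that turns a pasted transcript into SubRip (.srt) text. It assumes, and this is purely my assumption, that the transcript has been copied out as one `M:SS text` entry per line; the real YouTube transcript panel may not paste that cleanly. The function names and the four-second default for the final cue are hypothetical too. Since a transcript only records when each line starts, each cue is simply made to end when the next one begins.

```python
import re

# Assumed (hypothetical) transcript format: "M:SS some spoken text" per line.
LINE = re.compile(r"^(\d+):(\d{2})\s+(.*)$")


def to_srt_time(seconds):
    """Format whole seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},000"


def transcript_to_srt(transcript, last_cue_length=4):
    """Convert timestamped transcript lines into SubRip-formatted text."""
    entries = []
    for raw in transcript.splitlines():
        match = LINE.match(raw.strip())
        if match:  # silently skip lines that do not look like "M:SS text"
            minutes, secs, text = match.groups()
            entries.append((int(minutes) * 60 + int(secs), text))

    blocks = []
    for index, (start, text) in enumerate(entries):
        # A cue ends where the next one starts; the last gets a fixed length.
        if index + 1 < len(entries):
            end = entries[index + 1][0]
        else:
            end = start + last_cue_length
        blocks.append(
            f"{index + 1}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

Feeding it the two lines `0:00 Hello` and `0:05 World` gives two numbered cues, the first running from 00:00:00,000 to 00:00:05,000, and the result could be saved as a .srt file for upload.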


Tech community anger at crowd sourced translations

Steam, the internet’s most popular game distributor, is crowd sourcing its game translations. This has caused anger in the tech community:

Steam/Valve has decided to build a “community effort” to get its Steam platform and game files translated by the community into 26 languages (english, czech, danish, dutch, finnish, french, german, hungarian, italian, japanese, korean, norwegian, “pirate”, polish, portugese, romanian, russian, spanish, swedish, simplified and traditional chinese, thai, brazilian, bulgarian, greek & turkish).

but here is the catch:

Translators do not get paid. They do enjoy many perks however, like access to the game text to be translated (not the game itself, god forbid they could actually test their translation within the game and not have to pay for it), and… and… that’s about it.

Update: I did some math; the test text when you sign up for Steam Translation Server is 265 words; at the current rate of 0.09 USD per word this means 23.85 USD is how much a professional translator would charge to translate that text. Now if only storefront descriptions like this are to be translated for all games (using Steam’s claim of a catalogue of over 1100 games and growing) that would mean that Steam is saving roughly 26235 USD per language (and keep in mind thats only for short storefront descriptions of games).

Now there are 26 languages on the Translation Server at present; that means roughly 26235 x 26 = 682110 USD are being saved by Steam making the “community” work for free.

To that you have to add the costs for reviewing said translations; 0.03 USD per word, so easily enough 682110 divided by 3 = 227370 USD. (that is assuming only one version of the text has to be reviewed, which is not the case)

So, Steam has just saved 909480 USD by making the “community” work for free.
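The sums in that update are easy to re-run. Here is a small Python sketch of the same arithmetic, using only the figures quoted above (265 words, 0.09 USD per word for translation, 0.03 USD per word for review, 1100 games, 26 languages). The constant and function names are my own, and `Decimal` is used so the cents don’t drift through floating point rounding. Like the quoted post, it assumes every storefront description is the same length as the test text, which is a big simplification.

```python
from decimal import Decimal

# Figures taken from the quoted post; the names are hypothetical.
WORDS_PER_DESCRIPTION = 265
TRANSLATION_RATE = Decimal("0.09")  # USD per word
REVIEW_RATE = Decimal("0.03")       # USD per word
GAMES = 1100
LANGUAGES = 26


def steam_savings():
    """Reproduce the quoted estimate step by step."""
    per_description = WORDS_PER_DESCRIPTION * TRANSLATION_RATE
    per_language = per_description * GAMES
    translation = per_language * LANGUAGES
    # Review is priced per word as well, so it is a third of translation.
    review = WORDS_PER_DESCRIPTION * REVIEW_RATE * GAMES * LANGUAGES
    return {
        "per_description": per_description,
        "per_language": per_language,
        "translation": translation,
        "review": review,
        "total": translation + review,
    }


if __name__ == "__main__":
    for label, amount in steam_savings().items():
        print(f"{label}: {amount} USD")
```

The results agree with the quoted figures at every step: 23.85 USD per description, 26235 USD per language, 682110 USD for translation, 227370 USD for review and 909480 USD in total.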

I would love to hear from people who know more about translation costs in America regarding the pricing that has been listed. I think the main source of anger is directly related to Steam’s large profits and the fact that not even a free game is offered in compensation – especially when digital game distribution has a cost of almost zero – i.e., it would cost Steam nothing to provide a gratuity.

There are a number of issues that spring to mind – how does one become an accredited translator into pirate for instance? This is an example of a translation effort that can almost only happen by means of crowd sourcing since the language was created on and by the internet via crowd sourcing – starting with Talk like a Pirate Day (Wikipedia entry) and then somewhat legitimised by Facebook.

Then there is the obvious problem for Steam (apart from the million dollar translation costs if done “legitimately”) of to whom to give a gratuity – would a crowd member have to submit a certain number of strings to qualify? Would it be based on votes garnered for the strings submitted, or strings accepted for the official or final translation? There’s also a time factor – games age quickly and translations take time. Crowd sourcing does a fantastic job of parallelising translation production – I would suggest that this process will be complete for Steam within the year, if not sooner – probably a saving of at least 6-12 months.

Further, without copies of the games, surely the contextual information needed to do a correct or proper translation would be missing?

Thankfully the more thoughtful crowd at Slashdot have weighed in, making the obvious point that it’s hypocritical to promote open source software (created via crowd sourcing) but denigrate translation using the same methods.

Another commentator brings to attention – a crowd sourced Spanish (European, I presume) subtitle project, and a third throws in the obligatory “hovercraft full of eels” line (context).

While I understand that those translating should be afforded some recognition, given that it’s to the community’s benefit I don’t have a problem with Steam’s actions.