Google Translate API gets monetized

I spoke about the deprecation of the Google Translate API when it was first announced, and speculated that the real reason for changing the status quo was “another product on the horizon”. Turns out I was spot on, with Google announcing that the paid version of the Google Translate API is now open for business.

This API supports translations between 50+ languages (more than 2500 language pairs) and is made possible by Google’s cloud infrastructure and large scale machine learning algorithms.

The paid version of Translate API removes many of the usage restrictions of previous versions and can now be used in commercial products. Translation costs $20 per million (M) characters of text translated (or approximately $0.05/page, assuming 500 words/page). You can sign up online via the APIs console for usage up to 50 M chars/month. Developers who created projects in the APIs Console and started using the Translate API v2 prior to today will continue to receive a courtesy limit of 100K chars/day until December 1, 2011 or until they enable billing for their projects.

For academic users, we will continue to offer free access to the Google Translate Research API through our University Research Program for Google Translate. For website translations, we encourage you to use the Google Website Translator gadget which will continue to be free for use on all web sites. In addition, Google Translate, Translator Toolkit, the mobile translate apps for iPhone and Android, and translation features within Chrome, Gmail, etc. will continue to be available to all users at no charge.
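To put the pricing quoted above into perspective, here is a rough back-of-the-envelope sketch. The $20 per million characters, the 500 words/page and the 50 M chars/month ceiling come from the announcement; the figure of 5 characters per word is my own assumption (it is what makes the quoted $0.05/page work out), and the function names are purely illustrative.

```python
# Back-of-the-envelope cost estimate for the paid Translate API.
PRICE_PER_MILLION_CHARS = 20.00    # USD, from the announcement
WORDS_PER_PAGE = 500               # from the announcement
CHARS_PER_WORD = 5                 # ASSUMPTION: rough average, not from Google
MONTHLY_SIGNUP_LIMIT = 50_000_000  # chars/month via the APIs console


def translation_cost(chars: int) -> float:
    """Cost in USD to translate `chars` characters at the quoted rate."""
    return chars * PRICE_PER_MILLION_CHARS / 1_000_000


def cost_per_page(words_per_page: int = WORDS_PER_PAGE,
                  chars_per_word: int = CHARS_PER_WORD) -> float:
    """Approximate cost in USD of one page under the assumptions above."""
    return translation_cost(words_per_page * chars_per_word)


if __name__ == "__main__":
    print(f"One page:       ${cost_per_page():.2f}")
    print(f"A 300-page job: ${300 * cost_per_page():.2f}")
    print(f"Monthly limit:  ${translation_cost(MONTHLY_SIGNUP_LIMIT):.2f}")
```

Change `CHARS_PER_WORD` to suit your language pair; character counts per word vary a lot between languages, so treat the per-page figure as an estimate only.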

The glaringly obvious oversight is the question on all translators’ lips: Have you used, or are you using, TMs (translation memories) uploaded by users in good faith? I personally disagree with the decidedly closed-system thinking that dominates this issue – I think translators are overly precious about TM ownership – but the reality is that Google is abusing its power in this case. The Google Translator’s Toolkit’s Terms of Service is sufficiently vague, probably untested, and almost certainly open enough to interpretation to allow Google to use those uploaded TMs. Even proving that Google had infringed your TM copyright would be a nightmare.

Note that I’m not a translator, so it’s easier for me to hold this position. To be fair to Google, there’s no reason why translators couldn’t get together to take them on in this market, but it’s that preciousness surrounding TM ownership that’s preventing this from happening.

If, at any stage or in any way, Google has accidentally or otherwise let copyrighted TMs into its language learning systems or its translation systems, I believe that they have a responsibility to provide the API for free to everyone – and I think translators should be shouting this loud and clear from the mountain tops.

Transcendent Man talks on Translation Technology

I’ve let this one slip a little due to the length of the video and personal time constraints, but this huffpo article, Ray Kurzweil on Translation Technology, gets a lot of good answers to those questions that occupy the translation or interpreting professional’s mind so much these days. I recently saw Transcendent Man, a documentary about the inventor and futurist Ray Kurzweil. His name may not ring bells, but he counts OCR, flatbed scanners and ‘the first electronic musical instrument which produced sound derived from sampled sounds’, the Kurzweil K250, among his inventions. While obviously intelligent and thoughtful, he certainly lacks the human side, as seen in his failure to account for or acknowledge the human and social consequences of his predictions.

That said, he is nothing if not clinically rational, and he has amazing foresight.

According to Kurzweil, machines will reach human levels of translation quality by the year 2029. However, he was quick to highlight that even major technological advances in translation do not replace the need for language learning. “Even the best translators can’t fully translate literature,” he pointed out. “Some things just can’t be expressed in another language. Each language has its own personality, so reading literature in the original language is going to remain better than even the best human translators.”

However, Kurzweil does not believe that translation technologies will replace human translators and interpreters. “These technologies don’t replace whole fields, in general. What they do is replace a certain way of applying them.” He provided the example of music, a field he worked in extensively, and the negative reaction of musicians’ unions to synthesizers in the 1980s, driven by fears that these working professionals would lose the opportunity to make money. As Kurzweil pointed out, instead of losing the ability to make money, their profession simply evolved. “If you go to a music conference now, it’s like a computer conference with these very powerful musical tools where musicians can command a whole orchestra, and so forth, and actually do a lot more with the technology. In fact, music is more vibrant than ever and musicians are very much in demand.”

Because opportunities are changing, translation providers with inflexible business models that do not incorporate technology may indeed be at risk. However, Kurzweil sees a bright future for the language industry in general. “I think the demand for language is going to increase,” he pointed out. “These tools are going to increase humans’ ability, with the help of machines, to command greater ability to use language.”

Many practitioners believe that translation is an art, so the parallel Kurzweil draws between the music field and translation is one that even the most technophobic translators will appreciate. In fact, Kurzweil went so far as to characterize translation as “the most high-level type of work one can imagine.” He explained, “the epitome of human intelligence is our ability to command language. That is why Alan Turing based the Turing test, which is a test of whether or not a computer is operating at human levels, on a command of language.”

Of course, it’s precisely because of the complexity of translation that humans must harness the power of machines to improve it. “These tools are going to increase our ability to use, create, understand, manipulate and translate language,” Kurzweil explained. “The idea is not to resist the tools, but to use them to do more.”

The article is based on a 17-minute video on the huffpo site, but I can’t recommend it: it was produced by Common Sense Advisory, and the interviewer is incredibly robotic and disengaged – something for the business exec crowd. Having said that, watching Ray speak is fascinating (he has excellent tone and tempo), and his ideas are interesting.

The thorny question of the þ

Over on the BBC’s now archived H2G2 site (“Hitchhiker’s Guide to the Galaxy” – an early internet attempt to, well, create a H2G2), there’s an interesting post suggesting that the Y in “Ye Olde Shoppe” should be pronounced as “th” – removing the olde worlde feel surrounding the original, in a fascinating short history of a minor part of the English language:

A Medieval Scribe’s Dilemma

Medieval English thus contained a variety of signs for the sound ‘th’ – the digraph ‘TH’, the thorn (þ), and the eth (or thok, ð). Scribes ended up using a mixture of these, although some tried to make a distinction between those used for a voiced ‘th’ sound and the signs used for a voiceless ‘th’. As a result, reading medieval texts today can be enormously confusing. Is that a ‘y’? Is it a ‘p’? Or a ‘th’? The problem is compounded by the inclusion of yet another runic sign which made it into Medieval English – the wen (ƿ), a symbol that looks very like a thorn, except that the triangular portion sits even higher, giving it a strong look of an angular ‘p’.

Even readers at the time often found it difficult to know precisely what the text was saying, given the combination of Latin characters and the remnants from the runic alphabet. Heaven help the reader whose ability to transcribe the various letters and runes (and all their forms) was poor and couldn’t work out the meaning from the context! The problem was made worse by the occasional juxtaposition of Latin and Old English texts on the same page, and by the shorthand and unique methods employed by individual scribes in transcribing the letters.

The Font of Wisdom

The thorn was particularly popular as a sign for ‘th’ in Medieval English, but with the advent of printing came a problem. There was no thorn sign in the printing fonts, as they were usually cast outside of England. So, since the sign for thorn slightly resembled the lower-case ‘y’, that’s what was substituted.

The thorn continued to be used, but printing caused its eventual demise from the English alphabet. As mentioned earlier, lingering proof of its existence hangs on in the outmoded ‘Ye’.


How many X words for Y?

For most, X=Inuit, Y=snow is the easiest solution to that equation – you will find it everywhere and probably hear it once a year in conversation. As it happens, another solution available is X=Welsh, Y=rain:

Although there are words for “spotting”, “big spaced drops”, “short sharp showers”, it is for the more serious rain that the language comes into its own. So there are different single words that translate as “pouring very quickly,” “throwing it down” and “fierce rain.”

Moving up a gear at least in the quantity of water coming down there are additional single words that mean “sheets of rain”, “fountain rain”, “beating rain”, “bucketing rain” and “maximum intensity rain.” The Welsh also have descriptive phrases. The English “It is raining cats and dogs” has the equally baffling but perhaps more colourful Welsh equivalent “It’s raining old women and sticks.”

Of course, the very next page I stumble across blows the whole idea apart. It’s well argued, and I agree with and support its point, but I wonder if there isn’t just a little bit of hair-splitting over meaning:

First some facts. Eskimo, or more accurately the Yupik and Inuit-Inupiaq families of languages, have a handful of words for snow, ranging from estimates as low as two to a high of a dozen or so. That’s about the same number that can be found in English (snow, sleet, flurry, blizzard, slush, powder, etc.). So actually, Yupik and Inuit are not remarkable in the number of words they have for snow.

Part of the reason for the varying numbers is in defining exactly what we mean by word. (Is “snowdrift” one word or two?) Any two counts of the number of different words in anything will vary, sometimes wildly, depending on how the counting is done. Making this problem worse is the fact that Yupik and Inuit are agglutinative languages with many compounded forms, so a small number of roots can seem like a lot of different words. For example, the West Greenlandic word siku, or “sea ice,” is used as the root for sikursuit, “pack ice,” sikuliaq, “new ice,” sikuaq, “thin ice,” and sikurluk, “melting ice.” Note that for each of these examples, the English equivalent is expressed as a simple noun phrase instead of as a compound noun. English speakers can still easily express the concept, even if they don’t use a single word to do so.

But even if Yupik and Inuit had a large number of root words for snow, would this tell us anything interesting? The answer is no. For one thing, large vocabularies in specialized fields are not unusual. My dog Dexter is a beagle, but he could just as well be one of several hundred named breeds, from Affenpinscher to Zapadno-Sibirskaia Laika. Or think of the number of different types of saw: crosscut, band, circular, hack, rip, jig, saber, etc. Almost any specialized field develops long lists of such hyponyms, and if the Inuit did have a large number of words for snow it wouldn’t tell us anything interesting about them or their language.

The article goes on to talk about the more interesting notion of how language may actually influence ideas:

Guy Deutscher, in his recent book Through the Language Glass, describes the Guugu Yimithirr people of Australia, who have no words for right or left. Instead, they give directions using cardinal directions, north, south, east, and west. A Guugu Yimithirr speaker will not only tell you to drive five miles north, as one might in English, but will also ask you to scoot a few inches to the southwest, instead of “to the right,” so he has room to sit down. As a result the Guugu Yimithirr tend to be hyper-aware of their position relative to the points of the compass. They are continually and unconsciously updating their internal compasses. English speakers can learn to do this too; it’s just that we don’t practice the ability, so the ability is not as keenly developed in us. And the converse is true: Guugu Yimithirr people can quickly pick up the concepts of right and left. The language we use doesn’t prevent us from thinking certain thoughts, but it can make thinking in certain modes habitual and faster, and it can cause us to superficially associate ideas with related concepts that use similar words.

Of course, having found all this on twitter, as is so often the case (no doubt the filter bubble effect), this article on the preservation of endangered languages caught my attention soon after. I was at first angered by the opening line, which offers that “saving the world’s threatened languages may seem informed more by nostalgia than need”, but all of my personal criticisms were then addressed – rescuing languages is valuable because it

touches on fundamental questions about how the brain works, how people express ideas, how societies adapt and how human history has evolved. And of how researchers benefit.

“We’re talking about neuroscientists, we’re talking about computer scientists, we’re definitely talking about historians, anthropologists and biologists in some cases” working on nearly extinct language, Kerttula said.

The National Science Foundation actually has physical scientists working with Inuit people to identify different aspects of ice that aren’t captured in the English language but could inform our understanding of the changing Arctic ecosystem.

“If you don’t understand and don’t have the language for what ice is, what ice should be, you’re not going to understand how it’s changing,” Kerttula said. “Language is critical in recognizing change in your environment.”

Of course, when talking about Eskimos and ice, the more interesting fact that no one ever mentions is that the Arctic is home to so many languages, although I think it’s safe to presume that, as with Danish/Swedish or even Texan English/Scottish, speakers could understand each other on the whole despite speaking different languages:

A few of the researchers will be working with languages spoken by fewer than 30 elderly people. But the designation “endangered,” Kerttula says, isn’t necessarily a measurement of the small number of people still speaking a language. Rather, she said, languages become endangered when children no longer speak them.

Out of 92 languages known to have been used in the Arctic, for example, she says 72 still have some speakers. All but one (Greenlandic) are endangered, the result of the steady encroachment of other dominant languages like English into the domains of public schools and legal systems, television and now the Internet.

“Pretty soon, all of the domains of your life are in English, and the only place where you get to speak your native language is to your grandmother,” Kerttula said. “So how long is that language going to last? It’s basically not.”

And probably the biggest problem with reducing the world to English is that, of the more than a quarter of a million words that exist, we use only about 7,000 (roughly 3%) in 90% of our communication. Thankfully, Save The Words is trying to rectify this, although I agree with Brain Pickings that, regrettably, the site has gone for sexy over shareable – the all-Flash sites of yesteryear will hopefully die with the onrush of HTML5-capable sites in the near future.

Paul Celan, Poet and Translator

I’ve been listening to a lot more podcasts lately. Recently Biella Coleman sent me to Radio National’s 360 Documentaries for one about internet activists Anonymous that I really enjoyed. I ended up keeping my subscription to their feed.

The title of the most recent edition is A message in a bottle: encounters with Paul Celan and Martin Heidegger. As per my usual method, I just dived in without even looking at the title, but soon I was hooked:

Paul Celan is regarded by many critics as one of the greatest European poets of the 20th century, as important in the pantheon of German language poets as Goethe and Holderlin.

But the surprise, the discovery, was reading the poetry of Paul Celan. It was a shattering experience, its impact upon me difficult to encompass in a bland sentence or two. Celan’s vision is at once one of immense grief – the grief of exile, of bearing witness to the Holocaust, of facing history and personal loss in the one moment – and also a vision of what can only be called ‘a terrible beauty’. Reading his work I found myself frequently breathless, at other times in tears, or astounded by the beauty he conveyed in startling images, suffused through with arcane and complex allusions.

Beyond giving an account of Celan, and of his encounter with Heidegger, the radio feature picks up on and explores the primary theme identified by Felstiner – that of our being ‘at home’ only in language. It is an idea echoed and developed along different lines by Heidegger – that we can only fully exist in language – that in effect ‘being’ is language.

These are complex ideas, but the lives through which these ideas are explored here are rich and the events engaging. And discovering, by chance, this ‘message in a bottle’ has been one of the most exciting discoveries of my life.

Paul Celan’s take on language after Auschwitz is fascinating:

“Only one thing remained reachable, close and secure amid all losses: language. Yes, language. In spite of everything, it remained secure against loss. But it had to go through its own lack of answers, through terrifying silence, through the thousand darknesses of murderous speech. It went through. It gave me no words for what was happening, but went through it. Went through and could resurface, ‘enriched’ by it all.”

Elsewhere he notes:

There is nothing in the world for which a poet will give up writing, not even when he is a Jew and the language of his poems is German.

He also worked as a translator (from what I can gather, of Russian literature into Romanian and, later, English into German); the authors he translated included Shakespeare, Breton, Artaud, Kafka, Rimbaud, Picasso, Dickinson and Frost, amongst many others.

It is a fascinating production – highly recommended.

The King James Bible

The BBC’s All Things Considered podcast has a very interesting episode this week marking the 400th anniversary of the King James Bible.

I’m not a particularly religious man (unless you want to hear my “Anarchism is a faith” rant) or blogger – why would this interest me enough to bring it to your attention?

It turns out to be an almost perfect case study of the issues surrounding the art of translation. Paraphrasing vs metaphrasing? Tick. Politics of words/segments used? Tick. Power plays based on authority of translators, source text or patronage? Tick. Translations that have added to our historical knowledge base? Tick!

In fact, it turns out that translating the Bible has its own Wikipedia entry. This page is quite complex and deals with the subject in a way that is probably only understandable by those who study translation. The podcast, on the other hand, makes it all very easy to understand. Highly recommended listening.