Humanising Translation Technology

Recently, The Independent ran an article about Google Translate that turned out to be an extract from a book by David Bellos. I certainly don’t agree with everything he says, and it is a bit waffly, but I do appreciate the direction he takes the piece.

It is not based on the intellectual presuppositions of early machine translation efforts – it isn’t an algorithm designed only to extract the meaning of an expression from its syntax and vocabulary.

In fact, at bottom, it doesn’t deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.

As I’ve noted before, I think this is incorrect – but it would be an easy error to make for the non-tech-savvy, or for those without access to the source (as Levy did in my previous post). The reality is that search and translation are peas in a pod – both processes are looking for meaning. While I understand and accept that David is probably very close to the truth, Google themselves would be foolish if they weren’t letting their research in these two fields inform each other.

It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation.

The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.

Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what’s been submitted to it.
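The core idea of that statistical step can be sketched in a few lines. This is a toy illustration, not Google's actual system: the phrases, translations, and counts below are entirely made up, and a real engine works over millions of documents and falls back to smaller fragments when a whole phrase is unseen.

```python
from collections import Counter

# A toy parallel "corpus": (source phrase, human translation) pairs,
# standing in for the millions of paired documents GT draws on.
corpus = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
]

def most_probable_translation(phrase):
    """Pick the translation seen most often alongside `phrase`."""
    candidates = Counter(t for s, t in corpus if s == phrase)
    if not candidates:
        # Never seen before: a real system would back off to
        # smaller sub-phrases rather than give up.
        return None
    return candidates.most_common(1)[0][0]

print(most_probable_translation("good morning"))  # → bonjour (2 of 3 pairings)
```

The point of the sketch is only that no "decoding of meaning" happens anywhere: the most frequently co-occurring human translation simply wins.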

Much of the time, it works. It’s quite stunning. And it is largely responsible for the new mood of optimism about the prospects for “fully automated high-quality machine translation”.

Google Translate could not work without a very large pre-existing corpus of translations. It is built upon the millions of hours of labour of human translators who produced the texts that GT scours.

Google’s own promotional video doesn’t dwell on this at all. At present it offers two-way translation between 58 languages, that is 3,306 separate translation services, more than have ever existed in all human history to date.
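The 3,306 figure is simply the number of ordered pairs you can form from 58 languages, since each direction (say, French to German versus German to French) counts as a separate service:

```python
languages = 58
# Each ordered (source, target) pair is a distinct translation service.
pairs = languages * (languages - 1)
print(pairs)  # → 3306
```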

Here he makes an interesting point – and one I’ve been pushing since I started this blog – that translators should be recognised for their contributions, as coders are in the FLOSS ecosystem. When I think on it further, though, I wonder if it matters – does the family of the now-deceased translator from early last century care that Google has made all our lives better without attribution? Do the makers of the innumerable stone axe heads deserve attribution for their work in fine-tuning a useful tool? Will the 23rd-century users of C-3PO-like robots or BabelFish care, and even if they did – would it matter to me or David?

GT is also a splendidly cheeky response to one of the great myths of modern language studies. It was claimed, and for decades it was barely disputed, that what was so special about a natural language was that its underlying structure allowed an infinite number of different sentences to be generated by a finite set of words and rules.

A few wits pointed out that this was no different from a British motor car plant, capable of producing an infinite number of vehicles each one of which had something different wrong with it – but the objection didn’t make much impact outside Oxford.
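The "finite rules, infinite sentences" claim rests on recursion: a single rule that can re-apply to its own output generates sentences without bound. A minimal sketch, with a made-up two-rule grammar:

```python
# A finite grammar with one recursive rule generates unboundedly many
# sentences:  S -> "dogs bark"  |  "she said that " + S
def sentence(depth):
    """Return the sentence built by `depth` applications of the recursive rule."""
    s = "dogs bark"
    for _ in range(depth):
        s = "she said that " + s
    return s

print(sentence(0))  # → dogs bark
print(sentence(2))  # → she said that she said that dogs bark
```

Every value of `depth` yields a distinct, grammatical sentence – which is precisely the property that was held up as making natural language special.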

GT deals with translation on the basis not that every sentence is different, but that anything submitted to it has probably been said before. Whatever a language may be in principle, in practice it is used most commonly to say the same things over and over again. There is a good reason for that. In the great basement that is the foundation of all human activities, including language behaviour, we find not anything as abstract as “pure meaning”, but common human needs and desires.

All languages serve those same needs, and serve them equally well. If we do say the same things over and over again, it is because we encounter the same needs, feel the same fears, desires and sensations at every turn. The skills of translators and the basic design of GT are, in their different ways, parallel reflections of our common humanity.

And this is where I enjoyed this piece – apart from the always welcome English humour – the return to humanism, the bringing of all this technological talk to the poetic, the beautiful. Technology is a reflection of our humanity – as well as an amplifier of our desires and an expander of our horizons. And this is the great unspoken promise of a functional GT that is available to all for free.