Google, voice recognition and search

The London Review of Books has an interesting look at Google titled It knows. The article discusses how much Google knows and what it’s doing with that information. Instead of offering all of it to you in search, it’s keeping some back:

The reason is that Google is learning. The more data it gathers, the more it knows, the better it gets at what it does. Of course, the better it gets at what it does the more money it makes, and the more money it makes the more data it gathers and the better it gets at what it does – an example of the kind of win-win feedback loop Google specialises in – but what’s surprising is that there is no obvious end to the process.

While I’m less eloquent, and have been less able to pinpoint it so effectively, it is a refrain you hear a lot on this blog. I’ve posted previously about how Google has learnt semantics. This article follows up that anecdote with a fascinating and insightful description of GOOG-411, the voice search service briefly offered by Google, and makes the now-obvious reasons for its existence a lot clearer. Given yesterday’s launch of Siri by Apple, it is a timely reminder that there is more than one player in the voice recognition field, and that Apple wasn’t even the first to it:

By 2007, Google knew enough about the structure of queries to be able to release a US-only directory inquiry service called GOOG-411. You dialled 1-800-4664-411 and spoke your question to the robot operator, which parsed it and spoke you back the top eight results, while offering to connect your call. It was free, nifty and widely used, especially because – unprecedentedly for a company that had never spent much on marketing – Google chose to promote it on billboards across California and New York State. People thought it was weird that Google was paying to advertise a product it couldn’t possibly make money from, but by then Google had become known for doing weird and pleasing things. In 2004, it launched Gmail with what was for the time an insanely large quota of free storage – 1GB, five hundred times more than its competitors. But in that case it was making money from the ads that appeared alongside your emails. What was it getting with GOOG-411? It soon became clear that what it was getting were demands for pizza spoken in every accent in the continental United States, along with questions about plumbers in Detroit and countless variations on the pronunciations of ‘Schenectady’, ‘Okefenokee’ and ‘Boca Raton’. GOOG-411, a Google researcher later wrote, was a phoneme-gathering operation, a way of improving voice recognition technology through massive data collection.

Three years later, the service was dropped, but by then Google had launched its Android operating system and had released into the wild an improved search-by-voice service that didn’t require a phone call. You tapped the little microphone icon on your phone’s screen – it was later extended to Blackberries and iPhones – and your speech was transmitted via the mobile internet to Google servers, where it was interpreted using the advanced techniques the GOOG-411 exercise had enabled. The baby had learned to talk.

But success wasn’t immediate. And failure is often the best way to learn – it forces us to adapt.

Before Google bought YouTube in 2006 for $1.65 billion, it had a fledgling video service of its own, predictably called Google Video, that in its initial incarnation offered the – it seemed – brilliant feature of answering a typed phrase with a video clip in which those words were spoken. The promise was that, for example, you’d be able to search for the phrase ‘in my beginning is my end’ and see T.S. Eliot, on film, reciting from the Four Quartets. But no such luck. Google Video’s search worked by a kind of trickery: it used the hidden subtitles that broadcasters provide for the hard of hearing, which Google had generally paid to use, and searched against the text. The service is just one of the many experiments that Google over the years has killed, but a presumably large reason for its death was that although it appeared to work it was really very limited. Not everything is tailored for the deaf, and subtitles are often wrong. If, however, Google is able to deploy its newly capable voice recognition system to transcribe the spoken words in the two days’ worth of video uploaded to YouTube every minute, there would be an explosion in the amount of searchable material. Since there’s no reason Google can’t do it, it will.

The final part of the article bemoans the size of Google:

Google is getting cleverer precisely because it is so big. If it’s cut down to size then what will happen to everything it knows? That’s the conundrum. It’s clearly wrong for all the information in all the world’s books to be in the sole possession of a single company. It’s clearly not ideal that only one company in the world can, with increasing accuracy, translate text between 506 different pairs of languages. On the other hand, if Google doesn’t do these things, who will?

Which is a legitimate concern, no doubt. Who needs a one-world government when society can just be taken over by stealth by a large corporation? Having said that, there’s no reason why we can’t live together in harmony, this society of ours and Google. I just think Google will have to give back in return for what it’s taken from us – make the maps free. Make the translations free. Keep the search free – and even open its heuristics. Am I asking too much? Am I not being cynical enough? My inner anarchist is squeamish at the thought of allowing it to happen, but my inner futurist is excited at its possibilities.
