Google has released their Docs for the Android OS, which includes an OCR function, but its quality is pretty appalling. Compared with ABBYY FineReader, its output is embarrassing. I can’t say I’m that surprised. OCR is hard, and there are commercial options available at fairly reasonable prices considering the complexity of the functionality. The software used by Google, Tesseract, lay dormant from 1995 to 2006. I think it can, and probably will, be improved, and Google are really the only people who could do it: they have the resources and can shoehorn it into other projects like Android or Google Docs. There is little incentive for an independent developer to take it on, apart from the odd grad student.
To be honest, I’m quite surprised at the state of Google’s OCR: given that they are scanning a large proportion of the world’s books, you would like to think that they had already nailed this one.