A previous post showed how to OCR Chinese texts using Adobe Acrobat Pro (OCRing is the process of recognizing text in an uneditable file, like a scan, and making it editable). While it works quite well, Acrobat Pro is a fairly expensive commercial product. Google Docs, which is free, can also OCR uploaded pdfs and image files in a number of languages, including both traditional and simplified Chinese. To enable it, click the upload icon, choose Settings, then check “Convert text …”
Once you have it enabled, when you upload a pdf (or image with text), you can select the language of the source file.
Google Docs saves the editable text together with the source file in your account. A multi-page pdf is saved with the editable text interleaved between the pages. This makes it impossible to select all the editable text at once for pasting into a clean document; you have to select each page’s text separately.
In a couple of quick tests, the results from Google were on average nearly as good as those from Acrobat X: in some passages Adobe had more wrong characters, in others Google did. Neither app is great with punctuation, but Google frequently misreads the Chinese full stop (。) as either a lowercase o or a zero, which Adobe generally does not. Google does better with quotation marks, though, generally rendering the smart quotes from the source file correctly, while Adobe changes them to ASCII quotes.
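If you paste Google's output into a plain text file, the misread full stops can be partly cleaned up with a short script. What follows is only a rough sketch in Python, not something Google Docs or Acrobat provides: the function name restore_full_stops is made up for this post, and the rule it applies (an "o" or "0" that directly follows a Chinese character and is followed by another Chinese character, whitespace, a closing quote, or the end of the text) is my own guess at what a misread full stop usually looks like.

```python
import re

# Basic CJK Unified Ideographs block only (an assumption made for brevity;
# rarer characters in the extension blocks are not covered).
_CJK = r"\u4e00-\u9fff"

# Heuristic: a lone "o" or "0" right after a Chinese character, followed by
# another Chinese character, whitespace, a closing quote, or the end of text.
_MISREAD_STOP = re.compile(
    rf"(?<=[{_CJK}])[o0](?=[{_CJK}\s”’」』]|$)"
)

def restore_full_stops(text: str) -> str:
    """Replace likely misread full stops with the Chinese full stop."""
    return _MISREAD_STOP.sub("。", text)

if __name__ == "__main__":
    sample = "他来了o我走了0"
    print(restore_full_stops(sample))
```

Because this is a heuristic, it will also change a genuine zero that sits between two Chinese characters (as in 第0章), so the result still needs proofreading against the scan. Multi-digit numbers such as 共10人 and ordinary English words are left alone, since the character before the "0" or "o" is not Chinese.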
Stay tuned for fuller coverage of the OCR showdown.