OCR

Jul 15, 2012
 
iOS app to OCR Chinese: Pleco

This post fulfills my promise to discuss the additional features of Pleco Chinese Dictionary following up on its victory in the iOS OCR showdown. Pleco is the grand old man of apps for learning Chinese (see the Pleco website); I first used it on a Palm Tungsten C device in 2003. It took some time to be ported to iOS and, while the appearance is unfortunately too reminiscent of its earlier incarnations, Pleco is still a very powerful tool for students of Chinese. Once the app has OCRed text from an image, tapping “Capture” will open the app’s “Reader” (available as a paid add-on) and display the text reflowed (preserving paragraph breaks, but not the arbitrary line breaks of the printed page). The reflowing is great for continuous prose, but is undesirable for things like poetry or song lyrics. I have not found a way to disable the reflowing and … [read more]

May 20, 2012
 
Showdown: iOS apps for Chinese OCR

This blog has discussed several options for performing OCR on Chinese texts, but the options all required a desktop or laptop computer (Google Docs, Adobe Acrobat, Sciweavers i2OCR). In this post, we’ll look at several options for OCRing Chinese on iOS devices.

The contenders:

ABBYY TextGrabber + Translator is an OCR app that supports many languages and ties into Google Translate.
LRDict is a Chinese dictionary app that has an OCR feature. I used the lite version for testing.
Pleco Chinese Dictionary is an app with several tools for studying Chinese, primarily a free dictionary; OCR is a paid add-on.

All of these require starting from an image file, not a PDF. You can take a photo with your device’s camera or import one from the Photos app. My tests used two different photos taken with an iPhone 4S and iPad 3, so your results may vary.

Round one

A page … [read more]

Mar 29, 2012
 
OCR Chinese with Sciweavers i2OCR

Sciweavers i2OCR is another free online service that can OCR image files in a number of languages, including traditional and simplified Chinese (see the post on using Google Docs to OCR Chinese); no sign-up is required. A pdf has to be converted to an image file before it can be uploaded. Sciweavers provides another online tool to do this (follow the instructions on the i2OCR page), but I’ve also just opened a pdf in Preview (the default Mac pdf/image viewer) and exported it as a jpg. Once the OCR is complete, you’ll see a download icon plus a selectable text area containing the editable text next to an image of your scan. The download button saves a file with a .doc suffix. There’s something peculiar about the doc, though: neither Pages nor Google Docs could open it, although TextEdit could. Sciweavers OCR is noticeably slower than Google Docs; one page … [read more]

Mar 25, 2012
 
OCR Chinese with Google Docs

A previous post showed how to OCR Chinese texts using Adobe Acrobat Pro (OCRing is the process of recognizing text in an uneditable file, like a scan, and making it editable). While it works quite well, Acrobat Pro is a fairly expensive commercial product. The free Google Docs has the ability to OCR uploaded pdfs and image files in a number of languages, including both traditional and simplified Chinese. You enable it in the upload settings by clicking the upload icon, choosing Settings, then checking “Convert text …” Once you have it enabled, when you upload a pdf (or image with text), you can select the language of the source file. Google Docs saves the editable text together with the source file in your account. A multi-page pdf is saved with the editable text interleaved between the pages. This makes it impossible to select all the editable text at once for pasting into a clean … [read more]

Jun 14, 2011
 
OCR Chinese with Adobe Acrobat

Suppose you have a printed magazine and you want to have a digitized version of an article (for converting from traditional to simplified or vice versa, adding interlinear pinyin, annotating/highlighting, carrying around on your iPhone, or whatever). You can scan the page, but this will not allow you to make any changes or additions to the text. That’s where OCR (optical character recognition) comes in. OCR software makes the text of the scan selectable for copying and editing. To recognize characters, OCR software needs to be aware of the language (or at least the character set) that it is “reading.” Among its many features, Adobe Acrobat (not the free Adobe Reader) can perform OCR on Chinese texts in both traditional and simplified Chinese. Simply open a pdf scan in Acrobat and choose Document > OCR Text Recognition > Recognize Text Using OCR. In the dialogue box that appears, click the … [read more]