
Saturday, April 23, 2011

Free Online OCR: web app delivers editable text from scanned images or PDFs




Do you have PDF documents or images (e.g. JPG, PNG, TIFF) that were created using a scanner, and that you wish you could convert to editable text? Free Online OCR is a web service that performs optical character recognition (OCR) on your scanned images and/or image-based PDF documents, in order to generate “normal” text that can subsequently be edited or used in other applications.
This free service works as follows: you upload your PDF files to the website, choose the output format (e.g. Word or RTF), and then download the editable file once processing is complete. Note that unlike most commercial desktop-based OCR engines, Free Online OCR does not provide any post-processing editing tools once the OCR process is complete.

In most cases you should not expect quick, single-click conversions, though. Depending on your source, you may have to wait a long time for your document to process, and there’s no guarantee that it will do so successfully. Moreover, the service only accepts PDF files up to 20 megs in size, so you may need to split your source into several pieces. The OCR process is highly dependent on the quality of your source, so you may need to manually enhance the source images (e.g. sharpen them), as well as do a lot of manual post-processing editing and patching up.
Here’s a quick guide on how to use this service:
- File size constraints: the limit is 20 megs. If your source is larger (which is very likely for scanned PDF ebooks), you can split it into appropriately sized pieces with a program like PDFSam (which will let you define output sizes in megs).
- The quality of your source: has a lot to do with the quality of the output you get. If need be, you can export your PDF to individual JPEGs using the free PDF-XChange Viewer (portable version also available), then sharpen each image with an image editing app (e.g. Jpeg Enhancer and PhotoScape are two free tools that can do this). Note: this can be quite labour intensive, and is more appropriate for short documents.
- Processing times: can be quite long. My advice: start with a few pages (3-4) to see what the output is like, which will spare you a long and potentially fruitless wait, and will give you an idea of whether you should proceed or whether you should try to enhance the source as mentioned above.
- Output quality: once your file is OCR’d, you can download it via the provided link. You will almost certainly have to do a lot of manual fixing, though.
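If you want to work out where to split a large scan before reaching for a tool like PDFSam, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration, not anything the service or PDFSam provides: the function name `plan_pieces` is made up, the 20 megs limit is assumed to mean 20 × 1024 × 1024 bytes, and the per-page sizes are estimates you would have to supply yourself (the real size of a split PDF will differ somewhat because of file overhead).

```python
# Assumption: "20 megs" is read as 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def plan_pieces(page_sizes, limit=MAX_UPLOAD_BYTES):
    """Group consecutive pages into pieces whose estimated size fits the limit.

    page_sizes: estimated size of each page, in the same unit as `limit`.
    Returns a list of (first_page, last_page) tuples, 1-based and inclusive.
    """
    pieces = []
    start = 1      # first page of the piece currently being filled
    running = 0    # estimated size of the piece so far

    for page_number, size in enumerate(page_sizes, start=1):
        if size > limit:
            # A single page over the limit cannot be fixed by splitting.
            raise ValueError(f"page {page_number} alone exceeds the limit")
        if running > 0 and running + size > limit:
            # Close the current piece just before this page.
            pieces.append((start, page_number - 1))
            start = page_number
            running = 0
        running += size

    if running > 0:
        pieces.append((start, len(page_sizes)))
    return pieces
```

For example, five pages of roughly 6 units each with a limit of 12 come out as pages 1-2, 3-4 and 5. You can also pass a deliberately small first piece (say pages 1-4) through the service first, in line with the advice above to test a few pages before committing to a long wait.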
Here’s a list of PROs and CONs in relation to this service:
PROs:
  • Does a reasonable job: OCR quality is good enough, but not necessarily outstanding.
  • Many formats supported: PDFs or images (JPG, PNG, BMP, GIF, or TIFFs) as input, DOC, RTF, TXT, or searchable PDFs as output.
  • Preserves layout and some formatting: including font size, bold and italic, and bulleted lists.
  • Will process ’low quality’ images: i.e. the typical 300 dpi images of most documents.
  • Simple, easy to use interface: clean and straightforward.
  • Maximum upload size: is reasonable enough. At 20 megs you can upload most short documents in their entirety. For longer ebooks you will have to split your source as mentioned above. (And yes you will find ’Upload size’ in the CONs section below as well; I have no qualms contradicting myself!).
  • Confidential: their FAQ purports that your documents are automatically deleted once they are processed. (Note: I am merely reporting this and am not responsible for its veracity).
CONs: (or wish list, if you like)
  • Processing time can be long: I shudder to think about what the wait would be like if the service became very popular.
  • Will make you wait a long time before … failing: which makes me wonder why I can’t see whatever was processed up to that point, so that I can at least figure out what state my document is in.
  • Upload size is 20 megs: I know that it’s quite reasonable as far as uploading documents to web services goes. But scanned documents tend to be very large almost by definition. Consider this a wish list item: I wish the upload size limit were larger.
  • No post-processing editor: which is to be expected for a web service, of course. But compared with most professional (read: for-pay) desktop OCR apps this is a considerable disadvantage.
  • No option to email your document back to you: which means you cannot upload your document and move on to other things; you will have to keep your browser window open and wait.
The verdict: overall, an excellent OCR option. As a web service it is more suited to small documents than to full-length ebooks, which is probably what most people will want to use it for anyway.
As a web service it has three disadvantages: (1) some people may not feel comfortable uploading sensitive documents online, despite the promise of confidentiality, (2) the restriction on max upload size with respect to large documents, and (3) the fact that there are no post-processing tools.
That being said, good free OCR programs and services are few and far between; in fact, this service did a much better job than the previously mentioned FreeOCR.net in my tests. As such, this service is an excellent, welcome addition to the repertoire of available free OCR tools.
Compatibility: any OS running a modern browser.
Go to the Free Online OCR home page.
