Thursday, September 25, 2008

We Can Do OCR!

Yesterday afternoon I installed the new Abbyy FineReader software that just came in.

Installation was a breeze and the software is very intuitive to start using. Scanning the first document was simple. I didn't even have to look at the manual to scan or edit the text. Saving the document was a bit more complicated, because there are so many options available.

The end result I wanted was a .pdf document that looked like the original but had a text layer underneath, so that the user could search the document. After several tries and reading the manual (imagine that), here is the end result.

To get a document that looks like the original scan, select 'keep original image size', 'text under page image' and 'enable tagged pdf'.

Some things to note: FineReader automatically rotated the landscape photograph on page 8, but it couldn't read the text that was sideways in the table on page 16. I think making that section of text readable would take a lot of extra effort.

I was impressed with the accuracy of the OCR and the ease of use. So now we enter another phase of digitization.
