Converting splotches of ink on a piece of paper into letters and words in digital storage is much like deciphering a foreign language. So it's no surprise that Image Recognition Integrated Systems (IRIS) publishes language-translation products as well as the new optical character recognition (OCR) package Readiris - which can recognise text in about 50 languages, including those written in Cyrillic alphabets.
I tested Readiris 4.1 just before it shipped, but IRIS says the version I tested is final. I found Readiris's interface easy to navigate.
You can tell the program to automatically analyse the layout of a page and recognise its text. The program lets you select areas to recognise or ignore, and set the order in which it proceeds through the text blocks. You can create templates that tell Readiris to recognise the text in specific locations in a series of files - a handy feature if you're processing repetitive forms. Readiris lets you easily operate a scanner from within the program, and you can set the export format for recognised documents.
Good results overall, some stumbles
Readiris did a remarkable job of figuring out how my newspaper and magazine test pages were formatted. It can also figure out the structure of a table if the cells have borders. For borderless tables, I had to manually indicate the columns; the software figured out the rows.
Readiris returned reasonable results on some tough documents, though it tripped consistently over a few common problems.
It recognised a clean laser printout of a letter-format document almost perfectly - its only mistake was adding a space between two letters in one word. It even surprised me by correctly interpreting a few hand-written letters and numerals on faxes, though it didn't produce recognisable English from the faxes' degraded text.
On a page torn from a computer magazine, Readiris made about 30 errors - most of which were zeros instead of full stops. When I opened this document in Microsoft Word, I found that Readiris had preserved the page's three-column format, with separate articles on the top and bottom halves of the page. Unfortunately, the program inserted several large black characters at random, covering up some of the text.
No OCR software is perfect, and Readiris has its share of annoying weaknesses. It offers no way to save graphics from the original document with the recognised text. And though IRIS also publishes language-translation software, Readiris doesn't seem to perform any kind of semantic analysis on recognised documents. For example, in several instances it mistook a comma for a full stop, even though the following word wasn't capitalised. In training mode you can teach Readiris to interpret font shapes, but it can't spell-check the resulting words.
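To illustrate the kind of check that is missing, here is a minimal Python sketch of a clean-up pass you could run on the exported text yourself. It is not part of Readiris and says nothing about how the program works internally; the function name fix_suspect_full_stops and the rule itself are assumptions made for illustration. The rule: a full stop that follows a lowercase letter and is followed by a space and another lowercase letter is probably a misread comma.

```python
import re

# Hypothetical post-processing rule, not a Readiris feature.
# Matches a full stop that sits between a lowercase letter and
# " <lowercase letter>", i.e. a "sentence end" with no capital after it.
_SUSPECT_STOP = re.compile(r"(?<=[a-z])\.(?= [a-z])")


def fix_suspect_full_stops(text: str) -> str:
    """Replace full stops that are probably misread commas."""
    return _SUSPECT_STOP.sub(",", text)


if __name__ == "__main__":
    sample = "The scanner was fast. but noisy. It worked."
    print(fix_suspect_full_stops(sample))
```

The rule is deliberately crude: it would also rewrite abbreviations such as "e.g. the", so its output is best treated as a list of suggestions to review rather than as finished corrections.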
Could Readiris revolutionise your organisation's information-gathering opportunities? Nope - OCR software isn't good enough to do that yet. But if you set realistic goals and clean up after the software, Readiris can get those static printed documents into your electronic information systems as well as any other package on the market.