After years of widespread competition, the desktop optical character recognition (OCR) market has come to be dominated by two products: Caere's OmniPage and ScanSoft's TextBridge. Most recently, the options available to government agencies and departments have been winnowed by Caere's acquisition of two major competitors (Calera Recognition Systems, developer of WordScan, and Recognita, maker of Recognita Plus).
Fortunately for consumers, this wasn't simply a case of Caere buying off the competition. Instead of discarding the other products, Caere incorporated parts of their OCR technologies into OmniPage Pro 10. The result is a product with impressive recognition accuracy. Although TextBridge lags slightly behind in accuracy, it offers an extremely easy-to-use interface that is well suited to novice users.
To test OmniPage and TextBridge, we scanned the same set of 50 typewritten, magazine and spreadsheet pages. In each conversion, we looked not only at the accuracy with which the program translated text, but also at the fidelity with which it reproduced formatting. In addition, we scored the programs on their ease of use and their flexibility in handling different types of hard copy.
OmniPage emerged as the decided winner, but TextBridge was a strong contender that may be better suited to agencies or departments with novice users.
by Mike Heck
Caere's OmniPage Pro 10
Thanks in part to the recent incorporation of technologies acquired from Calera and Recognita, Caere's OmniPage Pro 10 has become the standard to beat in desktop OCR. The program's accuracy in our tests with Microsoft Word documents was almost perfect (it made, for example, only two mistakes on an 800-cell spreadsheet). Overall, OmniPage turned in an accuracy rate of more than 99 per cent on our tests. The program is especially strong at handling degraded pages, such as faxes that did not come through clearly, and it did a fine job of preserving elements of the original pages, including font characteristics, column layout and colour graphics.
OmniPage's new interface has a set of tabs that lets users easily choose among three processing modes: AutoOCR, manual or the OCR wizard. In automatic mode, we merely clicked the start button, and OmniPage scanned and recognised pages using preset options. Zoning (specifying the areas on a page that contain text to be recognised) and OCR now occur in a single step, which speeds the entire process. On an Intel 300MHz Pentium II PC, a page was scanned and recognised in an average of 20 seconds. The manual mode let us draw zones and choose options, such as whether to invoke the proofing step. We particularly liked the ability to change preferences at any point during the OCR process. For example, you can redraw zones on a page and recognise the text again without rescanning the document. We could also set the software to scan documents automatically from our flatbed scanner at regular intervals (such as every 30 seconds).
The improved OCR proofreader now includes five zoom levels, making it easier to compare recognised pages against the original document image. Moreover, a new voice read-back feature spoke the converted document as we read along from the original. For spreadsheets or other numeric material, that feature was indispensable for verifying recognition results.
OmniPage's identification of tables and other layout characteristics (such as fonts) was superb. For the few elements it did miss, errors are now easier to correct immediately. For example, the new Table Editing Window lets you revise the contents of cells before saving the document as a word processing or spreadsheet file. Finally, the OmniPage Pro package includes a personal edition of OmniPage Web, which converts multiple-page documents (up to 10 pages) into hyperlinked Web sites.
The strides Caere has made in improving OCR accuracy and OmniPage's usability, along with the low upgrade price, make the program an excellent choice.
Remarks. This latest upgrade offers a refreshed OCR engine for better accuracy, improved formatting commands and greater ease of use, with features such as a new interface and voice read-back. Fast, reliable conversion of paper documents, especially complex layouts and spreadsheets, means workers spend less time recreating documents electronically and are therefore more productive.
Price and availability. OmniPage Pro 10 is available over the Internet for $US499. It is also available through Caere's Australian distributor:
Performance Sales (02) 9450 0777
ScanSoft's TextBridge Pro 9.0 Business Edition
TextBridge Pro 9.0 Business Edition, introduced in June, runs neck and neck with the previous release of OmniPage (9.0) in features. TextBridge also held up well in our testing, matching the accuracy of OmniPage 10 on some pages.
TextBridge is extremely simple to use, a benefit to departments with novice users. We had no trouble understanding the cleanly laid out user interface, and scanning and recognition are simple. Configurable icons let you acquire pages, recognise them and send the finished text to another application.
Similarly, tools for managing difficult documents will not confound casual users, but they offer the control experienced workers demand.
For example, to get the best scan of our colour magazine pages, we simply clicked the Page Type button and picked the appropriate icon; there was no need to delve into complex dialogue boxes.
TextBridge Pro 9.0 captured and preserved colour and greyscale images in our converted documents. We also easily zoned oblong and L-shaped regions, then split and merged zones, which are essential tasks when converting precise areas of pages.
Overall, recognition results were very good, with the program turning in an accuracy rate of just under 99 per cent on our tests. The software detected text on a tinted background and correctly recognised reverse type, columns and drop caps. However, TextBridge Pro had some difficulty maintaining text attributes, sometimes substituting plain text for boldface or rendering text as italic when the original was normal.
TextBridge adeptly handled tables, which are often part of agency reports. The software analysed the scanned page and let us add, move or remove cell lines before recognition occurred.
As you proofread documents, part of the original scan appears within the toolbar for comparison. But you can't zoom in on the image or view the recognised text alongside the original page. Even so, TextBridge lists alternate spelling suggestions, which accelerates the proofing process.
Perhaps the biggest draw of this version, though, is its four Portable Document Format (PDF) output options. With one option, whenever TextBridge encountered a suspect word, it substituted the scanned image of that word in the finished Adobe Acrobat file to maintain the document's integrity.
Other PDF options enable you to save converted files with no word images or, conversely, render the entire page as an image. In fact, some agencies use this latter feature exclusively, bypassing the OCR step. Pages exported to HTML closely resembled originals, but TextBridge doesn't have the Web site building capability found in OmniPage.
In all, government users faced with converting paper documents to digital form will benefit from TextBridge's simple interface design, along with its formatted Web page output and PDF creation capabilities.
Remarks. Agencies tasked with converting numerous paper documents to Adobe's Portable Document Format or Hypertext Markup Language for the World Wide Web will benefit from TextBridge Pro's multiple output options. In addition, this version reliably saves electronic files in standard desktop formats, such as Microsoft Word. Recognition accuracy and page layout retention are very good, so converted documents should require minimal retyping and reformatting.
Price and availability. TextBridge Pro 9.0 Business Edition sells on the open market for $US499. It can be ordered online directly from ScanSoft.