These days most documents are created electronically, so it's fair to ask where Optical Character Recognition (OCR) software fits into your Web-creation workflow. After previewing the beta version of Caere's OmniPage Web 1.0, however, I think OCR could play a bigger role than you might imagine.
First shown publicly at the recent Demo '99 conference, OmniPage Web melds Caere's OCR scanning and recognition software with an ingenious neural network technology called Logical Structure Recognition (LSR).
OmniPage Web, which will ship on March 15, infers the structure and formatting of paper documents, such as the difference between headings and body text. As a result, the HTML it produces closely matches the original without the need for manual editing. Just as important, the software automatically adds interactive elements to pages, such as various kinds of hyperlinks and a table of contents.
After I scanned a 75-page paper policy manual, LSR took just a few seconds to create an outline of all its objects, including headings, tables, graphics, captions, headers and footers. It also linked cross-references (such as "See Section 1.2"), e-mail addresses and URLs to their destinations.
Once LSR had finished, OmniPage Web displayed three views of my manual: an outline view showing the document's hierarchical structure, an image view holding the original scan, and a Web view simulating how the document would appear in a browser.
All three windows were also dynamically linked. For example, I used the Outline window's toolbar to promote or demote section headings, such as changing an H2 heading to an H3, and the Web view immediately reflected the change in text size. Deleting objects from the outline likewise removed them from the converted page.
A helpful filter command let me hide all the body text in the outline, which made scrolling through the remaining content much faster.
As for accuracy, OmniPage Web did an almost perfect job of recognising elements on the paper pages, including graphics. My only complaint, and a minor one, is that tables without clearly defined borders did not always convert correctly. However, OmniPage Web's zoning and editing tools made these formatting errors easy to fix.
I was pleased with OmniPage Web's extensive options for controlling HTML output, all neatly organised across three tabbed dialogue pages.
New page generation
In the General settings, I could choose to generate plain HTML that worked in all browsers and decide where new pages should begin, such as at every H2 heading. The software could also link each HTML page to the original scanned image by placing a thumbnail graphic on the page; this option proved very helpful when the originals had text in the margins that did not reproduce perfectly.
In the Component dialogue, I specified how OmniPage Web built the navigation panel that appears at the top or bottom of each page. For instance, I supplied my own image map so the pages matched the look of an existing intranet site.
Finally, OmniPage Web let me use Cascading Style Sheets. I could change the style of each object, for example by specifying different fonts for individual headings and for body text. These formatting selections can also be saved as personal themes, and the software will ship with 20 predefined themes to help you start producing professional-looking Web pages.
Overall, OmniPage Web did a great job of recognising paper documents, matching the original format and dividing them into individual HTML pages. For employee handbooks, ISO 9000 specifications and other long documents that exist only on paper, it should prove particularly useful for getting them onto the Web.
The Bottom Line
OmniPage Web 1.0, beta
Designed for both departmental managers and webmasters, OmniPage Web is ideal for publishing lengthy, consistently formatted paper documents to intranets and the Internet. After a document is scanned, neural network technology analyses its formatting and automatically creates linked HTML pages.
Pros: Creates an outline of all objects; hyperlinks cross-references, URLs and e-mail addresses; easy editing, including promoting or demoting section headings; supports Cascading Style Sheets.
Cons: Tables without clearly defined borders may not convert correctly unless they are first zoned manually.
Platforms: Windows 95, Windows 98 or Windows NT 4.0.
Price: Not yet announced.
Ship date: March 15.
OmniPage Web distributor: Performance Sales, Tel (02) 9450 0777