For years, speech recognition has been a solution for special needs, such as those of disabled individuals and medical and legal transcribers. The rise of computer-related repetitive-stress injuries has created a new class of users who might see speech recognition as a godsend, but, as Ana Orubeondo found out, the technology still struggles for mainstream acceptance.
Although speech-recognition programs can handle just about any task that a keyboard and mouse can - including word processing, e-mail, and even Web browsing - the technology has thus far proven impractical for most users. Hardware requirements are steep, and even with enough horsepower, recognition accuracy can suffer from background noise. Variations in the human voice, including pitch, volume, and accent, also create challenges.
However, several advances are improving the odds of widespread adoption of speech-recognition products. Cheaper, faster microprocessors and memory will help. People are also becoming more accustomed to speech-recognition technology because of its increasing deployment in telephony applications, such as messaging and call-centre assistance. It's not uncommon today to talk your way out of voice-mail jail or to find a name in a company directory simply by asking for it.
Beyond telephony, voice-control features are being added to the fastest-growing computer-related activity - Web browsing. The circle is completed by makers of microphones and headsets, who, responding to increased interest in voice recognition, are making advances in the noise reduction, comfort, and mobility of their products. Eventually, as we enter the era of so-called pervasive computing, voice control will be a common way to interact with the microchips around us.
But that's tomorrow. Today, the challenges of using speech-recognition software are numerous, and they will be overcome only with time and effort. As mentioned, desktop speech-recognition software needs a beefy PC to perform well, because it must search extremely large databases to turn spoken input into commands or on-screen words.
The purpose of this feature is to find out how well the market-leading PC-based voice-recognition programs handle these issues. I examined Dragon Systems' Dragon NaturallySpeaking Preferred 4.0, IBM ViaVoice Pro Millennium Edition, and Lernout & Hauspie's L&H Voice Xpress Professional 4.0. I also took a more in-depth look at ViaVoice.
Frankly, speech-recognition programs are a marvel. They are tremendously complex pieces of software that work by disassembling spoken input into component sounds and then piecing these sounds together to form words.
When a user speaks into a microphone, the software goes to work, breaking the flow of sound into distinct, indivisible units called phonemes (pronounced "foe-neem"). The software then searches its databases, trying to match the pattern of phonemes with known words. Language-processing techniques, including context analysis and best-guess algorithms, further improve recognition by helping the software work out which word the speaker most likely meant. Of course, all of this must happen continuously, as fast as a person can speak.
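To make the matching step concrete, here is a toy sketch in Python. It assumes the audio has already been split into phoneme groups (the genuinely hard part in a real product), uses a tiny hypothetical lexicon, and stands in for the "context" step with simple word-pair counts. It illustrates the idea only; it is not how any of the reviewed products is known to work internally.

```python
# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-style symbols).
# Note that "to", "two", and "too" sound identical.
LEXICON = {
    "to": ("T", "UW"),
    "two": ("T", "UW"),
    "too": ("T", "UW"),
    "go": ("G", "OW"),
    "i": ("AY",),
    "want": ("W", "AA", "N", "T"),
}

# Hypothetical "context" data: how often word b follows word a.
BIGRAMS = {
    ("i", "want"): 40,
    ("want", "to"): 50,
    ("want", "two"): 2,
    ("want", "too"): 1,
    ("to", "go"): 30,
}


def candidates(phonemes):
    """Return every lexicon word whose pronunciation matches the phoneme tuple."""
    return [word for word, pron in LEXICON.items() if pron == phonemes]


def decode(segments):
    """Map each pre-segmented phoneme group to a word.

    When several words sound alike, make a best guess by preferring the
    candidate that most often follows the previously decoded word.
    """
    words = []
    for segment in segments:
        options = candidates(segment)
        if not options:
            words.append("<unknown>")
            continue
        prev = words[-1] if words else None
        best = max(options, key=lambda w: BIGRAMS.get((prev, w), 0))
        words.append(best)
    return words
```

With these toy counts, decode([("AY",), ("W", "AA", "N", "T"), ("T", "UW"), ("G", "OW")]) returns ["i", "want", "to", "go"]: the three identical-sounding candidates for the third segment are separated only by the word that precedes them.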
One major challenge is using a microphone in an open area. This problem is twofold. Background noise can interfere with recognition accuracy, even when noise-cancelling headsets are used. Also, the use of speech-recognition software can be disruptive to those working nearby. To his or her cube mates, the speech-recognition aficionado might as well be on the telephone all day. And a user sensitive to this issue might lower his or her voice, thus reducing the software's accuracy.
The biggest challenge, however, may be the learning curve that speech-recognition products impose. It takes motivation and patience to succeed with speech recognition. Although vendors strive to make their products work out of the box, the reality is that it takes time to establish a reliable voice profile. Even the introductory training, with its overview of proper dictation technique, program commands, and basic macros, can take hours. Furthermore, advanced training, covering macros and specific work situations, is necessary to get the most from the software.
Achieving excellent accuracy and speed requires constant attention and strict adherence to proper dictation technique. You might be able to decipher your wee-hours handwriting the following day, but speech-recognition software makes no allowances for your having been up half the night.
Learning any new software program takes work. But the bottom line is that making the most of speech-recognition software requires adapting to a new way of using a computer, not just memorising a new set of commands.
For testing, I used an IBM ThinkPad running Windows 95 with a 300MHz Pentium Pro, 64MB of RAM, and a 2GB hard drive. The recommended minimum requirements for these products are remarkably similar: for Windows 95 or 98, the vendors recommend a 233MHz processor and 48MB of RAM, and under Windows NT Workstation 4.0, the memory requirement rises to 64MB.
I checked each application's built-in text editor (also called a dictation pad), but the bulk of my testing was with Microsoft Word 97. I dictated text in Word and analysed how well the command-and-control features of each speech-recognition product worked to edit the documents that I created. These products work with most Windows applications.
Although comparisons between the products were inevitable, I did not score them or match them up feature for feature. Rather, I wanted to get a sense of the state of the speech-recognition market by asking a few key questions. How long and complex is the start-up process? How accurate is recognition? How natural are the dictation and command-and-control processes?
It's been three years since our last in-depth look at the speech-recognition market, and I was impressed with the improvements that speech-recognition products have made in that time, such as the Web-browsing capabilities. I also expect to see continued advances for speech recognition in the telephony market.
However, I still feel that speech recognition is best suited to workers suffering from repetitive-stress injuries, those with disabilities, and those performing transcription in technical disciplines such as medicine and law. Users willing to devote time to training and mastering the products will also benefit, but naturally this will vary case by case.
All three of the products I looked at provide two stages of training: a general training session and a recommended additional training session to increase recognition accuracy and dictation speed. In addition, all three products offer some type of vocabulary-builder utility. These utilities allow you to add technical or other special words to the products' databases by either speaking them or having the software scan a document for unique words.
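The document-scanning approach can be illustrated with a short Python sketch. It is a minimal illustration only, assuming the vocabulary is a plain set of lowercase words; the function name find_new_words, its min_count parameter, and the most-frequent-first ordering are hypothetical choices rather than the behaviour of any vendor's vocabulary builder.

```python
import re
from collections import Counter


def find_new_words(text, vocabulary, min_count=1):
    """Return words in `text` that are missing from `vocabulary`.

    Hypothetical helper: `vocabulary` is assumed to be a set of lowercase
    words. Results are ordered most frequent first, then alphabetically.
    """
    # Words start with a letter and may contain apostrophes or hyphens.
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    counts = Counter(token.lower() for token in tokens)
    missing = [
        (word, count)
        for word, count in counts.items()
        if word not in vocabulary and count >= min_count
    ]
    missing.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in missing]
```

Ranking by frequency is one plausible design choice: terms that recur in a user's own documents, such as a specialist medical word, are the ones most worth adding to the recognition vocabulary first.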
NaturallySpeaking uses a brief start-up tour to walk users through its standard operation, and I found the software very easy to set up and run. I had one complaint, however: if I didn't complete the general training session in one sitting, I had to start it again from the beginning.
ViaVoice uses an animated character named Woodrow to help get users started. Woodrow gave me information on how speech recognition works and tips on improving accuracy. As with the other products, I created my voice profile in ViaVoice by reading a supplied paragraph. All of these products include selections of text to use during training. The selections vary in difficulty, and reading more of them improves accuracy, although many users won't have the patience to wade through all of them.
I had no problem creating a profile in ViaVoice, the only product of the three for which I can make that claim. ViaVoice recognised everything I read, although it also took the longest of the three products to process my voice model at the end of training.
Unlike NaturallySpeaking, Voice Xpress allowed me to save a profile as I was creating it during the product's general training process and then return to it later. Unfortunately, Voice Xpress doesn't allow you to skip words during training, which hung me up once. I said a particular word over and over, but Voice Xpress would not recognise it. I saved and resumed the training session only to find myself at the beginning, and then I got stuck on the same word again. The only work-around was to train with another sample selection.
After training, I moved on to Microsoft Word. None of the products achieved 100 per cent accuracy, and in many cases they missed the very same words, perhaps because of my slight Spanish accent.
The products' correction tools work very similarly. None was any better or worse than the others; it was a matter of learning marginally different commands and techniques for each. I found it easier to correct mistakes as they occurred rather than going back and correcting them all later.
NaturallySpeaking's accuracy was very good, but not perfect. To make a correction as I dictated, I had to say, "Correct that." This opened a dialogue box that listed the last several entries NaturallySpeaking had "typed" in my document. These numbered entries included single words, phrases, and even full sentences, depending on how the software had recognised my utterances.
To correct text, I selected the relevant entry, either manually or by saying "choose" followed by its associated number. I then said the word or phrase again, and the software made the replacement.
ViaVoice, which provided the most accurate recognition during the training stage, performed best during testing, although it wasn't perfect. Fixing errors in ViaVoice was similar to fixing them in NaturallySpeaking. When I said "correct", a dialogue box popped up with a list of recently entered words. I could choose one by saying its associated number, and then I could make the necessary correction.
Voice Xpress' accuracy was similar to NaturallySpeaking's: very good, but not as good as ViaVoice's. Voice Xpress missed many of the same words that NaturallySpeaking did, again perhaps because of my accent. I activated the correction dialogue box by saying "correct", and chose from the word list by saying "take", followed by the corresponding number. One plus: I could add new words to Voice Xpress' vocabulary database via the correction box.
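All three correction dialogues follow the same pattern: number the recent entries, let the user pick one by voice, then replace it with the re-spoken text. A rough Python sketch of that flow follows. It assumes spoken picks arrive as text such as "choose 2" or "take 2" with digits, and the helper names and five-entry limit are hypothetical, not any product's actual behaviour.

```python
import re


def correction_menu(entries, limit=5):
    """Number the last `limit` recognised entries, newest first (1-based)."""
    recent = list(reversed(entries[-limit:]))
    return {i + 1: entry for i, entry in enumerate(recent)}


def parse_pick(utterance, keywords=("choose", "take")):
    """Extract the number from a pick such as 'choose 2' or 'take 3'.

    Returns None if the utterance is not a recognised pick command.
    """
    match = re.fullmatch(r"\s*(\w+)\s+(\d+)\s*", utterance.lower())
    if match and match.group(1) in keywords:
        return int(match.group(2))
    return None


def correct(entries, utterance, replacement, limit=5):
    """Return a copy of `entries` with the picked entry replaced."""
    number = parse_pick(utterance)
    menu = correction_menu(entries, limit)
    if number not in menu:
        # Unrecognised pick or out-of-range number: leave the text unchanged.
        return list(entries)
    # The menu is numbered newest first, so entry 1 is the last item.
    position = len(entries) - number
    updated = list(entries)
    updated[position] = replacement
    return updated
```

For example, with entries ["Dear Sir,", "thank you for", "your later"], the pick "choose 1" with the re-spoken "your letter" replaces only the most recent entry.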
Dictation was a breeze compared to editing and otherwise manipulating a document, and I learned a lot about patience, partly because of spotty recognition and partly because of the frustration of moving from a keyboard and mouse to voice-only commands. I spent a lot of time flailing around documents. For example, selecting one word in a paragraph is easy when you are word processing: you move the cursor to it and double-click. Doing something that simple quickly by voice is difficult. This is where determined users will persevere and ultimately make the most of speech-recognition software, while others will give up.
Command and control was challenging in all of the products, and recognition was much lower than it was during dictation. I found it especially important to pause before and after saying a command to distinguish it from text. It was also important to speak clearly without over-enunciating.
In the final analysis, learning the best way to say a command was just as important as training the system for a unique vocabulary or a tone of voice.
With NaturallySpeaking, I had to click in the window of the program to which I wanted to talk. Commands were simple; I added paragraphs and lines by saying "new paragraph" or "new line".
Like the others, ViaVoice often did not recognise my commands. Navigating around a document became frustrating. However, the accuracy of commonly used commands improved notably the more I worked with the product. Again, I learned it was important to pause briefly before and after speaking commands. Voice Xpress was much the same. The recognition of common commands, such as "create a new file" and "close file", improved the more I used them.
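The advice to pause around commands reflects a real ambiguity: "new paragraph" can be a command or simply words in a sentence. One plausible way to resolve it, sketched below in Python, is to execute a phrase as a command only when it matches a known command and is bracketed by silence. The command table, the pause timings, and the 0.3-second threshold are all hypothetical; this is not how any specific product is known to decide.

```python
# Hypothetical command table: spoken phrase -> text it inserts.
COMMANDS = {"new paragraph": "\n\n", "new line": "\n"}

# Assumed minimum silence (seconds) on each side for a phrase to be a command.
MIN_PAUSE = 0.3


def interpret(phrases):
    """Turn recognised phrases into document text.

    `phrases` is a list of (text, pause_before, pause_after) tuples in
    spoken order. A phrase runs as a command only if it matches COMMANDS
    and is bracketed by pauses of at least MIN_PAUSE; otherwise it is
    typed as ordinary dictated text.
    """
    out = []
    for text, before, after in phrases:
        key = text.lower().strip()
        if key in COMMANDS and before >= MIN_PAUSE and after >= MIN_PAUSE:
            out.append(COMMANDS[key])
        else:
            # Separate dictated phrases with a space, except at a line start.
            if out and not out[-1].endswith("\n"):
                out.append(" ")
            out.append(text)
    return "".join(out)
```

Spoken with clear pauses, "new paragraph" inserts a paragraph break; run together with the surrounding words, as in "I need a new line of credit", the same words are simply typed.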
Speech-recognition vendors have made terrific improvements to their products, adding features such as Web-navigation tools, mobile components, address books, and calculators. However, until the products become easier to use, medical and legal markets will continue to reap the most benefit from the software.
Speech recognition remains far from becoming a standard corporate application, but merits investigation, especially by people who will enthusiastically take on the challenge of training.
Summary: Speech recognition remains an alternative solution. Recent improvements in the software, such as Web navigation and mobile components, make implementation more attractive. But until the technology becomes transparent, only enthusiastic, dedicated users will have the wherewithal to stick with the inevitable adaptation difficulties.
Business Case: For disabled individuals, including those suffering from repetitive-stress injuries, speech-recognition software can be a cost-effective way to maintain productivity. Among typical users, however, only those with the determination to persevere through the training will make speech recognition a useful bottom-line choice.
Pros:
Enhancements include Web navigation, mobile components, and other accessories
Beneficial for disabled workers
Cons:
Patience and determination required for adaptation and training
Microphone use in open areas creates problems for user and neighbours