Developer talks up speech recognition

Richard Grant has been involved in voice technology ever since Texas Instruments introduced its first speech board in 1985. He was the founder of Voice Pilot Technologies in 1994, was a developmental partner for IBM's speech products, and this year founded Bay Area Science Centre (BASC) in the US. IDG's Ephraim Schwartz spoke with Grant to get a sense of where voice technology is headed.

Schwartz: There's a great deal of confusion among users when they hear the terms "speech recognition" and "natural language understanding". What is the difference?

Grant: The term that we use in the industry is natural language modelling. Speech recognition covers a wide aspect of the technology. Anything that responds to speech is put into this giant category and includes command and control speech macros added over a keyboard or mouse command. A natural language model would be the ability for a human to transfer his mere thoughts to the computer in his natural tongue and have the computer understand and process it.

What will the impact on business be when computers are able to understand natural language queries?

A person will sit down, knowing they need a file, and say: "I need the Jones file." You won't have to go through a step of open file, down, down, down, step, step, step. So the natural language model will be what the corporate world will use because you get more productivity quicker from users and an easier transition to new processes in a computer.

Will natural language understanding have an impact on an IS department?

It will lower the total cost of computing because if a user can just talk to a computer and have it provide a function without having to know a series of complicated commands, or even a series of step command and controls using speech, then most users will be able to work themselves around simple problems. Plus, you'll now run into natural language help systems or interactive help systems that will understand that a person can ask a computer how to do something, instead of asking the IS department personnel to come and show them how to do it.

On the flip side, you will probably have a higher cost in the interim of bringing your applications up to a natural language model.

And what is the dollar impact of that?

To use the full power of natural language speech will require powerful processors. The more powerful the processor, the faster the access, the more memory internally, the better and more accurate and easy to use the natural language application will be.

Is a Pentium II good enough?

The current level is a great starting point. Obviously, with the Digital Alpha series type chips, the Merced, the 400MHz processors, you're going to see an unbelievable increase in performance by speech technologies.

Can you lay out a road map for natural language modelling?

You will see this year, alone, almost every major company putting out some kind of speech application. The quality of that application will be determined by the consumers and their ability to understand the paradigm. And this is a major shift in the industry. A part of what we do is we also hear. So we're going to see an interactive product as well with the use of visual cues. You're going to have touch-screen pens, which would then complement your voice to just check off things on this visual definition. So you're looking at speech and some type of touch technology, either pen or actual touch-screen technology, becoming the only I/Os in the future.

Where does BASC fit on the road map?

BASC has a mission statement of providing technology for all people. See, we focus on what people can do, not what they can't do. We are going to intelligent agents, we are going to natural language models.

Give me an example of how that translates in day-to-day use.

You could come into your office in the future and just ask your computer -- what's up? It will come back to you and tell you that you've got these appointments, that it had to reschedule this appointment for you because you had a conflict. You need to get this data. And then you could go to the computer and say -- well, bring me research on this information, this element -- so I can be more informed of it.

You will find that instead of having to understand the processes of computers, you'll be able to just talk to your computer and have it do the processes for you.

