Microsoft has welcomed a new addition to its server family: the Speech Server.
Running on Windows Server 2003, the first public beta of Speech Server will ship with Beta 3 of Microsoft's Speech Application SDK (Software Development Kit) in what signals speech technology's return to the corporate agenda.
Due for manufacturing release before mid-2004, the product will include a text-to-speech engine from SpeechWorks International, Microsoft's own speech-recognition engine, and a telephony interface manager. The offering will also include middleware, being designed in partnership with Intel and Intervoice, to connect the Microsoft product to an enterprise telephony infrastructure.
But it is the server's SALT (Speech Application Language Tags) voice browser that sets Microsoft apart from the standards crowd.
Rather than adhering to VXML (VoiceXML) - the current W3C standard for developing speech-based telephony applications - Speech Server is compatible only with applications that use the specifications developed by the SALT Forum, of which Microsoft is a founding member.
The SALT Forum has submitted its specifications to a W3C working group, but they are far from becoming a standard.
"The process could take years," admitted James Mastan, director of marketing for the speech technologies group at Microsoft.
The SALT specification was originally targeted at the multimodal market for browsing the Web on handheld devices. The theory was that users required multiple ways to interface with smaller devices and that voice would be chief among them, but the market for multimodal handhelds has not materialised.
Microsoft executives consider the SALT-based Speech Server ideally suited to call centres, where the cost of using live operators is becoming prohibitive.
An InStat/MDR research report stated that live agents cost $US1 to $US5 per call as opposed to 20 cents for a speech-recognition system.
"This is not a desktop solution but an enterprise application," an analyst at Forrester Research, Elizabeth Herrell, said.
Bill Meisel, a principal at TMA Associates, a leading speech technology research company, said enterprise voice adoption would increase due to Microsoft's market influence. Yet because Speech Server would compete directly with established VXML applications, Microsoft's actions would also make speech technology adoption a more complex exercise for the enterprise, he said.
Competing speech technology vendor IBM is a case in point.
Big Blue supports VXML and the W3C standard, according to Gene Cox, director of mobile solutions at IBM.
Cox said significant VXML applications already existed in the enterprise at companies such as AT&T, General Motors' OnStar division, and Sprint.
"VXML conforms to all W3C royalty-free policies," Cox said. "But SALT is like Internet Explorer; it is free as long as you buy Windows."
The debate over which technology to use would not be fought out at the customer level, Herrell said, but rather by developers.
"Customers just want a solution that works," Herrell said. "Developers will decide which platform to use based on its quality, and for that, it is too early to tell."