Embedded Digital System Co., Ltd.

An Introduction to Speech Recognition

Kimberlee A. Kemble

Program Manager, Voice Systems Middleware Education

IBM Corporation

Have you ever talked to your computer? (And no, yelling at it when your Internet connection

goes down or making polite chit-chat with it as you wait for all 25 MB of that very important file to

download doesn't count.) We mean, have you really, really talked to your computer? Where it

actually recognized what you said and then did something as a result? If you have, then you've

used a technology known as speech recognition.

VoiceXML takes speech recognition even further. Instead of talking to your computer, you're

essentially talking to a web site, and you're doing this over the phone.

OK, you say, well, what exactly is speech recognition? Simply put, it is the process of converting

spoken input to text. Speech recognition is thus sometimes referred to as speech-to-text.

Speech recognition allows you to provide input to an application with your voice. Just like clicking

with your mouse, typing on your keyboard, or pressing a key on the phone keypad provides input

to an application, speech recognition allows you to provide input by talking. In the desktop world,

you need a microphone to be able to do this. In the VoiceXML world, all you need is a telephone.

For example, you might say something like "checking account balance," to which your bank's

VoiceXML application replies "one million, two hundred twenty-eight thousand, six hundred

ninety-eight dollars and thirty-seven cents." (We can dream, can't we?)

Or, in response to hearing "Please say coffee, tea, or milk," you say "coffee" and the VoiceXML

application you're calling tells you what the flavor of the day is and then asks if you'd like to place

an order.

Pretty cool, wouldn't you say?

A closer look...

The speech recognition process is performed by a software component known as the speech

recognition engine. The primary function of the speech recognition engine is to process spoken

input and translate it into text that an application understands. The application can then do one of

two things:

• The application can interpret the result of the recognition as a command. In this case,

the application is a command and control application. An example of a command and

control application is one in which the caller says “check balance,” and the application

returns the current balance of the caller’s account.

• If an application handles the recognized text simply as text, then it is considered a

dictation application. In a dictation application, if you said “check balance,” the

application would not interpret the result, but simply return the text “check balance.”

Note that VoiceXML 1.0 uses a command and control model for speech recognition.
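
To make the difference between the two kinds of application concrete, here is a minimal Python sketch. It is only an illustration, not how any real speech recognition engine or VoiceXML platform works: it assumes the engine has already turned the caller's speech into plain text, and the names COMMANDS, check_balance, command_and_control, and dictation, along with the sample account, are all made up for this example.

```python
# Illustrative sketch only. We assume the speech recognition engine has
# already converted the spoken input to text; every name below is hypothetical.

def check_balance(account):
    """Hypothetical command handler: report the account's current balance."""
    return f"Your balance is {account['balance']:.2f} dollars."


# Hypothetical command table: recognized phrase -> handler function.
COMMANDS = {"check balance": check_balance}


def command_and_control(recognized_text, account):
    """Interpret the recognized text as a command and act on it."""
    handler = COMMANDS.get(recognized_text.strip().lower())
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(account)


def dictation(recognized_text):
    """Treat the recognized text simply as text and hand it back unchanged."""
    return recognized_text


if __name__ == "__main__":
    account = {"balance": 42.5}  # made-up sample data
    print(command_and_control("check balance", account))
    print(dictation("check balance"))
```

Given the same recognized text, the command and control function looks the phrase up and does something about it, while the dictation function simply returns the words it was given.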

Embedded Digital System Co.,Ltd. CANADA 嵌入数码系统公司 Email: embedigital@yahoo.com