Embedded Digital System

Speech Design: When and How

At the beginning of the 21st century, the major stumbling block to the widespread acceptance of using speech in the user interface is no longer slow speed or poor accuracy.

Convenience and usability are the issues that must now be addressed before speech will penetrate larger markets.

The most successful users of speech recognition are currently those who must use it or are highly motivated to get it working for some other reason, like people with various physical disabilities. These users are willing to spend the extra time and work through convenience and usability issues because of their special needs. This is too much of a commitment for the average user, who just wants to sit down and start using a product to complete a task. If you are interested in marketing your solution to a wide range of customers, convenience and usability must be addressed at the beginning of your development.

Conducting a basic usability study is the best way to gain an understanding of convenience and usability for your targeted customers. A basic usability study should include:

Gathering feedback from potential customers and studying their work habits. This can provide ideas for new or simplified interfaces that use speech technologies.
Involving users in the development of your solution from the very beginning. Their participation is crucial to your success, so keep your focus on the objective of user acceptance and satisfaction.
Organizing a representative focus group of target users and analyzing their current approach to job tasks. Ask them what they feel is missing or inefficient in their workflow and how they think it could be fixed. Be specific: ask how they think speech could fill this need, and where they feel speech might be unnecessary.

Once the working environment and needs of your end-users are understood, the next step is to consider all existing technologies. In addition to speech, this should include anything that the customers already use or could use in their work. For example, if your customers work in a mobile environment, you could consider the use of PDAs and wireless connections. Then determine if the technology available to you will accommodate their needs. Many factors can influence this decision. For instance, what is the physical work environment? Noisy environments such as stock exchanges and airports may preclude the use of speech as a reliable interface to an application. Critical control applications, such as in jet planes, might not tolerate even a rare misrecognition.

In addition to environmental factors and critical needs, you need to consider social factors. For example, the users themselves may be very resistant to adopting a new method of working. If their resistance is high, it may be very difficult to introduce speech as a new way to accomplish tasks. You may need to make speech a secondary or alternative interface used by a subset of the users.
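The environmental, critical, and social factors above can be written down as a small checklist. The following is only a minimal sketch of that reasoning, not a prescribed method: the `WorkContext` fields, the function name, and the three result labels are hypothetical names chosen for illustration, and real projects will weigh these factors by degree rather than as yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class WorkContext:
    """Hypothetical summary of what a usability study found."""
    noisy_environment: bool        # e.g. a stock exchange or an airport
    misrecognition_critical: bool  # e.g. controls in a jet plane
    user_resistance_high: bool     # users reluctant to change how they work


def recommend_speech_role(ctx: WorkContext) -> str:
    """Return 'none', 'secondary', or 'candidate' (illustrative labels)."""
    # Noise may make recognition unreliable, and critical control
    # applications might not tolerate even a rare misrecognition.
    if ctx.noisy_environment or ctx.misrecognition_critical:
        return "none"
    # Strong resistance suggests offering speech only as an
    # alternative interface for the subset of users who want it.
    if ctx.user_resistance_high:
        return "secondary"
    # Nothing rules speech out; it is worth designing for.
    return "candidate"


if __name__ == "__main__":
    office = WorkContext(noisy_environment=False,
                         misrecognition_critical=False,
                         user_resistance_high=True)
    print(recommend_speech_role(office))
```

Treating noise and criticality as hard stops is a simplification made for this sketch; the text only says that they may preclude speech.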

If you have determined that speech would be an appropriate interface for your customers, you need to begin architecting your speech design. Speech designs can be categorized into three major groups along a continuum: from Speech as an Add-on, through Designed for Speech, to Speech Required.

Speech as an Add-on

If you have an existing application, this type of solution requires the least work because you don't need to make any coding changes. The GUI of your application remains unchanged, and speech features are usually provided by a third-party add-on. Your application remains totally unaware of the presence of speech. For example, you could install a commercially available speech application and dictate into your application's free-text fields without making any changes to your application.

Designed for Speech

This type of speech-enabled application requires minor changes to the GUI and may be a solution for upgrading an existing GUI. The target application is usually aware of the speech components, which are directly integrated into the program. Speech features are presented via an interface in which they supplement existing features. This is the preferred mode for a speech design, as it offers the most flexibility by allowing multimodal input. It can also shorten the learning curve for new users and increase productivity in most situations. For example, a kiosk that provided graphical as well as audio instructions would allow users to read the instructions as well as hear them. This would greatly increase the chances of users knowing what to do or say.

Speech Required

In this category, the entire application is designed with speech as the primary user interface. This usually entails a complete rewrite of your interface code and requires the most work. Telephones and mobile devices are prime targets for this mode of speech user interface because there is no other easy-to-use input mechanism available to the user. The application may or may not have a GUI, but many features will be accessible only by speech. Adding voice prompts and responses can allow for a program that is both hands-free and eyes-free. An example of this type of application would be in an operating room or lab, where users are wearing gloves and need to be looking at the patient or specimen and not at a computer screen.

After the scope of your project has been defined by studying usability factors and you have determined a place for it along the continuum of speech user interface categories, you are ready to start working on your design. A good rule of thumb is that any task that is not enhanced by a speech design should be left to other methods. Keyboards and mice, when they are available, are very robust input devices. Speech inputs, no matter how well they are implemented, are less robust because they depend on a number of more error-prone hardware components and software services. However, a well-designed speech user interface that takes the needs of the end-user into account may offer benefits that cannot be achieved in any other way. Speech user interfaces offer developers a new set of tools to enhance the productivity of people who use their applications.
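One way to picture the multimodal approach of the Designed for Speech category, together with the point that speech input is less robust than a keyboard or mouse, is a dispatcher in which GUI actions and recognized phrases reach the same command handlers. The sketch below is an illustration under stated assumptions, not a real speech API: `CommandDispatcher`, its methods, and the `CONFIDENCE_THRESHOLD` value are hypothetical, and it assumes the recognizer reports a confidence score between 0 and 1 with each phrase.

```python
from typing import Callable, Dict, Optional

# Assumed cutoff below which a recognition result is not acted on.
CONFIDENCE_THRESHOLD = 0.8


class CommandDispatcher:
    """Routes GUI and speech input to one shared set of handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def register(self, name: str, handler: Callable[[], str]) -> None:
        # Command names are stored in lower case for matching.
        self._handlers[name.lower()] = handler

    def from_gui(self, name: str) -> str:
        # A button or menu click is unambiguous, so run it directly.
        return self._handlers[name.lower()]()

    def from_speech(self, phrase: str, confidence: float) -> Optional[str]:
        name = phrase.strip().lower()
        # Ignore unknown or low-confidence phrases; the caller can then
        # ask the user to repeat or to use the GUI instead.
        if confidence < CONFIDENCE_THRESHOLD or name not in self._handlers:
            return None
        return self._handlers[name]()


if __name__ == "__main__":
    dispatcher = CommandDispatcher()
    dispatcher.register("save", lambda: "document saved")
    print(dispatcher.from_gui("save"))
    print(dispatcher.from_speech("Save", 0.95))
    print(dispatcher.from_speech("save", 0.40))
```

Because both input paths share the same handlers, speech supplements the existing features rather than replacing them, and a rejected phrase leaves the keyboard and mouse available as the fallback.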

Embedded Digital System Co.,Ltd. CANADA 嵌入数码系统公司 Email: embedigital@yahoo.com