THE VOICE OF TECHNOLOGY
The Java Speech API
by Steven Meloan
In the 1986 Star Trek movie, "The Voyage Home," the
crew of the Enterprise traveled back in time to 20th-century
Earth. Audiences laughed uproariously as Scotty naively picked
up the mouse from a nearby PC, and commanded -- "Compu-tor,
Compu-tor!"
Just a little over ten years later,
however, no one's laughing anymore. Speech technology is
quickly becoming a commonplace means of getting data into and
out of computational systems. Speech synthesis systems are
reaching ever new levels of clarity and sophistication, and
speech recognition systems -- from industrial
command-and-control technologies, to dictation services -- are
achieving impressive new degrees of speed and accuracy. But
the world of speech technology is still one of proprietary,
platform-specific systems, filled with arcane APIs that,
particularly from a development perspective, require a
significant background in speech technology.
[Java Speech technology] has a very strong foundation -- the object-oriented approach that the Java programming language is built upon. - John Earle, Chant Inc.
The Java Speech API ("JSAPI"),
officially released this past October, offers a cutting-edge
alternative to this status quo -- bringing the many benefits
of the Java platform to the burgeoning world of speech
technology. The JSAPI 1.0 specification provides a standard,
secure, and cross-platform means of incorporating
state-of-the-art speech technology into Java technology-based
applets and applications -- for use on the desktop, in
telephony servers, and in small portable and embedded devices.
In the Beginning
The JSAPI specification was developed by Sun Microsystems,
Inc. in collaboration with leading speech technology
companies: Apple Computer, Inc., AT&T, Dragon Systems,
Inc., IBM Corporation, Novell, Inc., Philips Speech
Processing, and Texas Instruments Inc.
"We laid out five basic goals in the early design phase of
JSAPI," says Andrew Hunt, principal investigator for Sun
Microsystems' speech technology group. Those goals were:
- To provide support for speech synthesizers, and for both
command-and-control and dictation speech recognizers.
- To provide a robust, cross-platform, cross-vendor
interface to speech synthesis and speech recognition.
- To enable access to state-of-the-art speech technology.
- To support integration with other capabilities of the
Java platform, including the suite of Java Media APIs.
- To be simple, compact, easy to learn, and
internationalized (able to process languages in addition to
English).
Extending Speech Technology
"In developing the API specification, Sun's focus was on
enhancing developer productivity by making JSAPI easier to
learn and use," says Hunt. "Now that the specification is
released, Sun is working with speech technology companies to
provide implementations of the API. Because Sun does not own,
and is not developing, speech recognition or speech synthesis
technology, we have no current plans to release an API
implementation, but this does facilitate working with the
speech companies since we are not in direct competition."
...we had customers come to us and ask that our speech engines be made accessible through JSAPI [the Java Speech API]. - Tom Morse, Lernout & Hauspie
The Java Speech interface
specification emerged from a lengthy open development
process -- with major contributions from the partner companies
and other speech companies, feedback from the application
developer community, and months of public review and comment.
"We worked closely with Sun on the definition of JSAPI,"
says Steven De Gennaro, manager of enterprise conversational
systems for IBM. "As the definition evolved over various
releases, we tried to have an implementation available through
our alphaWorks site -- where we put out early previews of
technology -- running on top of our existing speech engines.
That was helpful to developers, because they could then write
applications against it, and we could find out what was good
about the spec, what needed work, and where we needed to
address design issues."
"Sun was very smart in how they approached bringing speech
technology to the Java platform," says John Earle, president
of Chant Inc. Chant is a software services company primarily
focused on speech-enabled applications. They, too, provided
feedback during the development of the Java Speech API. "Sun
had the advantage of being able to look out on the horizon and
see historically how some of these other speech technologies
had evolved," says Earle. "So, even though this is only
release 1.0 of the JSAPI specification, it's actually based
upon very mature technology. And it also has a very strong
foundation -- the object-oriented approach that the Java
programming language is built upon."
"Simply put," says Andrew Hunt, "Sun wanted to produce an
API that allowed the speech companies to put their latest and
greatest technologies onto the Java platform." The Java Speech
API is currently an extension to the Java platform -- part of
the Java Media APIs. These are a suite of class libraries
providing rich media and communications capabilities --
including audio/video playback, capture, conferencing, 2D and
3D graphics, animation, advanced imaging, and more.
"The current JSAPI-enabled engines are based around
existing speech technologies that the companies have,"
explains Hunt. "They take their existing proprietary APIs, and
wrap them in Java-technology based code to implement the Java
Speech API. This reduces costs to speech companies, as well as
time to market. And as the speech technology companies update
their existing engines, that technology becomes available to
the Java Speech technology community."
To better support natural speech capabilities,
two companion specifications accompany the Java Speech
API -- the Java Speech API Markup Language (JSML), and the
Java Speech API Grammar Format (JSGF). JSML, an XML-based
markup language used with speech synthesizers, lets developers
specify explicit pronunciations for words, phrases, acronyms,
and abbreviations, and control pauses, boundaries, and
emphasis. JSGF, meanwhile, provides a facility for defining
the grammars used by speech-recognition systems. "Both of
these specifications can also be applied separately from
JSAPI," adds Hunt, "and, in fact, several companies are now
using them separately from the Java platform."
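To make these concrete, here are two short, hypothetical fragments. The first is JSML markup of the kind a synthesizer might accept; the element names (sayas, emp, break) follow the early JSML drafts, and exact tags and attributes vary between versions of the specification:

```xml
<jsml>
  <para>
    <!-- Substitute a spoken form for an abbreviation. -->
    The <sayas sub="Java Speech A P I">JSAPI</sayas> specification
    was released in <sayas class="date">10/98</sayas>.
    <break size="medium"/>
    It is <emp>not</emp> tied to any one speech engine.
  </para>
</jsml>
```

The second is a small command-and-control grammar in JSGF syntax; the grammar and rule names here are invented for illustration:

```
#JSGF V1.0;

grammar commands;

// The public rule is the entry point the recognizer listens for.
public <command> = <action> <object> [please];

// Vertical bars separate alternatives; brackets mark optional words.
<action> = open | close | delete;
<object> = the window | the file | the message;
```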
Going To Market
Already, IBM and Lernout & Hauspie -- both respected
developers of speech synthesis and recognition software
engines -- have provided interfaces to their technologies
through the Java Speech API. "We were already following and
participating in the definition of JSAPI," says Tom Morse,
director of engineering for Lernout & Hauspie. "We hadn't
officially started to implement the interface ourselves,"
Morse reports, "but then we had customers come to us and ask
that our speech engines be made accessible through JSAPI -- so
it was really the needs of our customers that brought that
about." The company now has its TTS3000 text-to-speech
product enabled through JSAPI, as well as its ASR 1600, a
grammar-based, continuous speech-recognition engine.
In the meantime, IBM's "ViaVoice SDK for the Java
Environment" product implements most of JSAPI 1.0 for the
recognition and synthesis engines used in the company's
well-known ViaVoice product. It supports continuous dictation,
command-and-control, and speech synthesis (all in multiple
human languages), and is currently available for Microsoft
Windows and Windows NT. "Speech engines today are very
computationally intensive," explains De Gennaro. "They're all
native code. So while JSAPI provides an interface for
programmers, it still ultimately maps into the native code
underneath it." New Java Speech releases, with additional
capabilities, will continue to be made available through IBM's
alphaWorks site.
Most recently, IBM announced a brand new implementation of
JSAPI for their ViaVoice product line. "We've added Java
Speech technology support for our ViaVoice for Linux toolkit
that we released a few weeks ago," says De Gennaro. "This puts
Java Speech technology on yet another platform, and can be
downloaded from our alphaWorks site."
Chant, meanwhile, has long been active in facilitating
speech-enabled software development. The company's
object-oriented SpeechKit middleware supplies high-level
speech-oriented components for use in Java, C++, Visual Basic,
and other programming languages. "Before JSAPI was released,"
says Chant's Earle, "we hooked through the Java Native
Interface in order to provide developers who use the Java
platform with a way of voice-enabling their UIs to the speech
engines that are commonly available."
Now, the company has begun to incorporate the Java Speech
API into their tools of the speech technology trade. "After
JSAPI was released," says Earle, "we began integrating it
underneath our class technology. When using our product, it
will be transparent to the person writing in the Java
programming language whether they're even using a
JSAPI-enabled engine or not. We'll isolate them from that, so
they can still just concentrate on the application."
Into the Future
The next step in solidifying JSAPI's place as an
industry-wide speech standard is providing more speech-engine
options across more platforms. "We expect
that more Java technology implementations of speech engines
will appear over time," says IBM's De Gennaro, "and that more
native-code engines will become available on Java
technology-enabled platforms. That will provide the runtimes
necessary for greater numbers of Java Speech
technology-enabled applications."
Speech technology is already increasingly found on the
computer desktop -- in speech synthesis (text-to-speech) and
dictation. And desktop dictation systems now offer speed and
accuracy rivaling that of even the fastest typists. But many
of the driving forces for the proliferation of the JSAPI
specification may yet prove to be outside of the traditional
desktop world -- in telephony systems, Web browsers, and small
portable and embedded devices.
"Applications are starting to pop up all over the place
where you can replace touch-tone systems with voice," says
Andrew Hunt. "If you've ever been through ten menus of
touch-tone, you know how painfully slow that can be. But with
voice, you can speak hundreds, or thousands of words. So this
is another area in which Java technology is starting to have a
presence, by being a standard interface with which you can
deploy telephony applications across multiple platforms."
The main focus for us in the NT environment, beyond what we currently support, will be Java and JSAPI technology-based initiatives. - Chris Czekaj, Lucent Speech Solutions
In January of this year, Lucent
Technologies formed Lucent Speech Solutions, an organization
chartered to deliver speech recognition, text to speech, and
speech verification technologies to both internal and external
customers. "Bell Labs pioneered the development of the initial
algorithms for speech recognition and text to speech,"
explains Chris Czekaj, business development manager within
Lucent Speech Solutions. "Now," he says, "we're scaling them
down to deliver these same technologies and algorithms outside
the AT&T network -- while improving on them, from both a
hardware and a software perspective."
Customer-Driven Advances
Java technology and the Java Speech API specification are
poised to play a major role in this endeavor. "The main focus
for us, beyond what we currently support, in the NT
environment, will be Java and JSAPI technology-based
initiatives," says Czekaj. As with Lernout & Hauspie, this
push has been, in part, customer driven. "We've had several
potential clients that have expressed interest in such
capabilities," he says. "As a result, we're training our
developers and researchers in Java and JSAPI technologies, all
the fundamentals, in order to start developing our core
technologies in that environment."
Another huge growth area for speech technology is the
small, portable, and embedded system market -- tailor-made for
the scalability and cross-platform nature of Java technology.
"We're seeing a lot of interest in the Java platform on small
devices, and also on the server side," says De Gennaro. "To
provide speech services in both of those environments, it
makes sense to have a standard API."
"If you've got a computer the size of your cell phone,"
explains Hunt, "then you can't do keyboard entry with typed
input, because there's simply not enough space to put in a
keyboard. And in many instances, there's also not enough room
for a full display. In such instances, a microphone and a
speaker are a much more powerful and flexible input and output
method."
Speech Technology on Consumer Devices
There's a great fit between JSAPI and where we see speech going in general... - Steven De Gennaro, IBM
As an extension to the Java
platform, the Java Speech API can also be provided as an
extension to the PersonalJava and EmbeddedJava platforms
-- Java technology-based application environments designed
specifically to operate on constrained devices with limited
computing power and memory. "There's a great fit between JSAPI
and where we see speech going in general," says De Gennaro,
"-- conversational interfaces that involve small devices,
smart phones, and cell phones."
Such embedded systems will undoubtedly find their way into
a whole range of business applications -- from the medical
field, where hands-free, eye-free work may be required during
surgery, to the warehouse/industrial space, to the automobile,
to remote email and calendar access, and even for voice-driven
Web access, such as form filling.
And getting back to the desktop, speech technology may
increasingly act as an adjunct to the current traditional GUI
-- to speed configuration settings in software applications
("give me 12-point, Helvetica, bold"), and to notify users of
background activities such as printer status.
Facilitating Accessibility
Finally, speech technology will increasingly facilitate an
endless array of accessibility features for people with
disabilities -- a potential user base of 43 million. The IBM
Special Needs group has released a toolkit called the Self
Voicing Kit (SVK) that works with Java technology Project
Swing components to provide audio output features for people
with disabilities. The SVK is built on top of the Java Speech
API and the Java Accessibility API, and is available from
IBM's alphaWorks site.
A Greater Pool of Developers
Having been developed with the input and insight of leading
speech technology experts, the Java Speech API brings with it
not only cross-platform standardization, but also
industry-leading ease of use. "We've tried to enable average
developers, with no special background in speech technology,
to be able to learn the API, and to use it effectively," says
Hunt. "At the moment, there may be only a few thousand
experienced speech developers around the world," he
conjectures, "but probably hundreds of thousands who can
program graphical environments. Our goal is to provide a
technology basis on which we can start to increase the number
of people who write speech-enabled applications -- from the
thousands, into the hundreds of thousands, and even beyond."
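As a sketch of that simplicity, the minimal program below speaks one sentence through whichever JSAPI synthesizer is installed. It uses only classes defined by the JSAPI 1.0 specification (javax.speech and javax.speech.synthesis), but since Sun ships no implementation, it needs a vendor engine -- such as the IBM or Lernout & Hauspie products described above -- on the classpath before it will actually run:

```java
import java.util.Locale;

import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class HelloSpeech {
    public static void main(String[] args) throws Exception {
        // Ask Central for any installed synthesizer handling US English.
        Synthesizer synth = Central.createSynthesizer(
                new SynthesizerModeDesc(Locale.US));

        synth.allocate();   // acquire the engine's resources
        synth.resume();     // leave the initial paused state

        // Queue plain text for speaking; JSML would go through speak().
        synth.speakPlainText("Hello from the Java Speech API.", null);

        // Block until the output queue drains, then release the engine.
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synth.deallocate();
    }
}
```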
See Also
Java Speech API (http://java.sun.com/products/java-media/speech/)
Chant Inc. (http://www.chant.net)
IBM's Speech For the Java Platform (http://www.alphaWorks.ibm.com/tech/speech)
IBM's ViaVoice (www.software.ibm.com/speech)
IBM's Beta ViaVoice SDK for Linux (www.software.ibm.com/is/voicetype/dev_home.html)
Lernout & Hauspie (http://www.lhs.com)
Lucent Speech Solutions (http://www.lucent.com/speech)