Software Dreams and Talking Machines

Speech and Speech Recognition

Resources on the World Wide Web

[The following materials were gathered from the web during May 1996. This article was updated periodically from 1993 to 1996.]

Author: Andrew Hunt

This is a REPORT. SUPERADAPTOID does not REVIEW products that have not been personally evaluated by DEMONSTRATION.

Report Presented By: SUPERADAPTOID

The following article outlines the scope of the Comp.Speech FAQ postings and the institutional, research, and business resources available on the web. The business and product listings are not complete; however, they represent the Better-Of-The-Best of the lists.

COMP.SPEECH FAQ POSTING - PART 3/3

Text By Andrew Hunt

FAQ SECTION 5 - SPEECH SYNTHESIS

SpeechLinks: Speech Synthesis
Q5.1: What is speech synthesis?
Q5.2: How can speech synthesis be performed?
Q5.3: References/Books on Synthesis
Q5.4: Speech Synthesis on the WWW
Q5.5: Speech Synthesis Software/Hardware

Q5.1: WHAT IS SPEECH SYNTHESIS?

Speech synthesis is the task of transforming written input to spoken output. The input can either be provided in a graphemic/orthographic or a phonemic script, depending on its source.

Could someone provide a more informative description?

Q5.2: PERFORMING SPEECH SYNTHESIS

There are several algorithms. The choice depends on the task they're used for. The easiest way is to just record the voice of a person speaking the desired phrases. This is useful if only a restricted volume of phrases and sentences is used, e.g. messages in a train station, or schedule information via phone. The quality depends on the way recording is done.
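
As a rough illustration of the recorded-phrase approach, the sketch below assembles a station announcement from a fixed inventory of pre-recorded phrases. The phrase texts, the file names and the `announce` helper are hypothetical, invented only for this example; a real system would play the listed recordings in order.

```python
# Hypothetical inventory: each allowed phrase maps to one pre-recorded file.
RECORDINGS = {
    "the train to": "train_to.wav",
    "london": "london.wav",
    "york": "york.wav",
    "departs from platform": "departs_platform.wav",
    "one": "one.wav",
    "two": "two.wav",
}


def announce(phrases):
    """Return the recordings to play, in order, for a sequence of phrases.

    Raises KeyError if a phrase was never recorded, which is the main
    limitation of this approach: only the recorded inventory can be spoken.
    """
    return [RECORDINGS[p.lower()] for p in phrases]


playlist = announce(["The train to", "York", "departs from platform", "two"])
print(playlist)
```

Anything outside the recorded inventory cannot be spoken at all, which is why this method only suits restricted message sets.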

Algorithms that split speech into smaller pieces are more sophisticated but give lower quality. The smaller the units are, the fewer of them are needed, but the quality also decreases. An often used unit is the phoneme, the smallest linguistic unit. Depending on the language, there are about 35-50 phonemes in western European languages, i.e. there are 35-50 single recordings. The problem is combining them, because fluent speech requires fluent transitions between the elements. The intelligibility is therefore lower, but the memory required is small.

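A solution to this dilemma is using diphones. Instead of splitting at the transitions, the cut is done at the center of the phonemes, leaving the transitions themselves intact. An inventory of N phonemes yields up to N*N diphones: about 400 elements (20*20) for 20 phonemes, and roughly 1,200-2,500 for the 35-50 phonemes mentioned above, although not every pair occurs in a given language. The quality increases.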
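
The diphone idea can be sketched in a few lines. This is only an illustration on phoneme symbols, not on audio: the phoneme labels for "hello", the silence marker `_` and both function names are assumptions made for the example.

```python
def to_diphones(phonemes, silence="_"):
    """Split a phoneme sequence into diphones (adjacent phoneme pairs).

    The sequence is padded with a silence symbol so that the first and last
    phonemes also get a transition into and out of silence.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_diphone_inventory(n_phonemes):
    """Upper bound on the number of diphones: every ordered pair of phonemes."""
    return n_phonemes * n_phonemes


# Hypothetical phoneme symbols for the word "hello".
print(to_diphones(["h", "e", "l", "ou"]))
print(max_diphone_inventory(20))  # the 20*20 case mentioned above
```

Each diphone spans one transition between two phonemes, so a synthesizer joining these units only has to splice in the comparatively steady middle of each sound.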

The longer the units become, the more elements there are, and the quality increases along with the memory required. Other units which are widely used are half-syllables, syllables, words, or combinations of them, e.g. word stems and inflectional endings.

Q5.3: REFERENCES/BOOKS ON SYNTHESIS

BOOKS AND PAPERS

* Douglas O'Shaughnessy, Speech Communication: Human and Machine, Addison-Wesley Series in Electrical Engineering: Digital Signal Processing, 1987.

* D. H. Klatt, "Review of Text-To-Speech Conversion for English", Jnl. of the Acoustical Society of America (JASA), Vol 82, pp 737-793, 1987.

* "Talking Machines, Theories, Models and Designs" Eds, G. Bailly & C. Benoit (Elsevier: North Holland)

* I. H. Witten. Principles of Computer Speech, London: Academic Press, Inc., 1982.

* W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis, Elsevier, Amsterdam, 1995.

* John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech: The MITalk System", Cambridge University Press, 1987.

* Survey of the State of the Art in Human Language Technology. Report edited by Ronald A. Cole et al., with a section on Text-to-Speech Technologies.

BIBLIOGRAPHIES AND REFERENCE LISTS

WWW searchable online bibliography for Phonetics and Speech Technology with more than 8000 entries.
Provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.

Computational Speech Processing
Speech Analysis, Recognition, Understanding, Compression, Transmission, Coding, Synthesis; Text to Speech Systems, Speech to Tactile Displays, Speaker Identification, Prosody Processing: BIBLIOGRAPHY, by Conrad F. Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

Q5.4: SPEECH SYNTHESIS ON THE WWW

Most of the following are links to WWW pages with demonstrations of speech synthesis. Plenty more links are included in the detailed list of speech synthesis software/hardware in Q5.5.

Speech Synthesis "Museum" URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html

Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the University of Birmingham. Information and speech samples for

YorkTalk
Loughborough Sound Images
University of Birmingham - FDFS
Eurovocs
DECtalk
AT&T Bell Labs Synthesiser
S.W.A.Ll.C. - Welsh Synthesis from CSTR
All-Prosodic Speech Synthesis - IPOX
Orator from Bellcore

Pavarobotti
WWW demo of the Pavarobotti synthesis technology developed at the National Center for Voice and Speech

Say...
WWW demo of the rsynth speech synthesis software. The WWW capability was implemented by Axel Belinfante.

Musee sonore de la synthese de la Parole en francais
Speech synthesis examples from a series of French language speech synthesisers plus links to other speech synthesis demo pages.

ICP-Grenoble
CNET-Lannion (with TD-PSOLA)
KTH-Stockholm
Universite-Mons - several versions
AT&T Bell Laboratories Voices
WWW interface to the demo of the Laureate speech synthesis system, not yet commercially available. (This link may be good, but it gives odd error messages.)

ORATOR from Bellcore
Online demo of the ORATOR system developed at Bellcore.

SVOX from TIK, ETH in Zurich
Demo of German speech synthesis from Institut fur Technische Informatik und Kommunikationsnetze.

Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
Synthesis in German, English or Japanese.

TMH: Institutionen for Taloverforing och Musikakustik, Kungliga Tekniska Hogskolan
Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish, British and American English, French, German, Italian, Spanish, LA Spanish and Greek.

Examples of several types of speech synthesis.
Articulatory Synthesis by HyperASY. SineWave Synthesis. Gestural Computational Model. Pattern Playback system of the 1940's!

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

Eurovocs Multilingual Speech Synthesis
Based on Lernout and Hauspie technology.

HADIFIX German Speech Synthesis
Provided by the Institut fur Kommunikationsforschung und Phonetik, Universitat Bonn.

Centigram's TruVoice Demo
Allows control of speech rate, pitch and other prosodic characteristics.

Institute of Phonetic Sciences
Links to lots of on-line speech synthesis demonstrations provided by the Institute of Phonetic Sciences of the Faculty of Arts of the University of Amsterdam.

Yahoo page on speech generation

Q5.5: SPEECH SYNTHESIS SOFTWARE/HARDWARE

Please email any updates, corrections or additions to the following list. The range of commercially available synthesis software is growing rapidly so any help in keeping up to date will be appreciated.

Other lists of speech synthesis software on the WWW include:

Kevin Lenzo's list of Macintosh Speech Resources and Apps

Speech Toys Speech Synthesis Information

IN THE FAQ...

The following speech synthesis software/hardware is described in the comp.speech FAQ.
AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth
Creative TextAssist and TextAssist API
CSRE: Computerized Speech Research Environment
DECtalk: Text-to-Speech from Digital
Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Eurovocs
HADIFIX
Infovox Product Range
IPOX: All Prosodic Speech Synthesis Architecture
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lernout and Hauspie Text-To-Speech (3 products)
Lernout and Hauspie Text-To-Speech Windows SDK
Macintosh Speech Output Applications
MacinTalk
Monologue for Windows from First Byte
Narrator Translator Library
Narrator
TextToSpeech Kit (NeXT)
Orator from Bellcore
PAM - A Text-To-Speech Application
ProVerbe Speech Engine for Windows
ProVoice Developer's Speech Toolkit from First Byte
RC Systems V8600/V8601 Text to Speech synthesizers
rsynth
SENSYN speech synthesizer
SGI Developers Toolbox Synthesiser
SIMTEL
Sound Bytes Developer's Kit
spchsyn.exe
Speak
Speech Manager and PlainTalk
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3
Tinytalk
TrueTalk
TruVoice from Centigram
WinSpeech

AsTeR

Platform: UNIX
Description:
TTS front-end program which encodes structural information about documents in speech synthesis. For more information check out:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html

Operation requirements: Lisp: Lucid, clisp
Contact: T. V. Raman
WWW page
Email: raman@adobe.com

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

Platform: ?
Description: BeSTspeech reads ASCII text with no vocabulary limits. Available for Dutch, English (male and female), French, German, Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean, Malay, Mandarin and Russian.

Price: ?
Contact: Berkeley Speech Technologies, Inc.
2246 Sixth Street, Berkeley, California 94710, USA
Ph: (510) 841-5083, Fax: (510) 841-5093
Email: webmaster@bst.com
WWW

TheBigMouth - a Text to Speech Program

Platform: NeXT
Description: Text to speech program based on concatenation of pre-recorded speech segments. NeXT equivalent of "Speak" for Suns.
Availability: try NeXT archive sites such as sonata.cc.purdue.edu.

Creative TextAssist

Platform: Windows
Description: Based on DECtalk speech synthesis. A detailed technical description of TextAssist is provided on the Creative WWW pages.

Availability: Creative TextAssist is bundled with most (all?) Creative Sound Blaster audio cards.
Contact: Creative Labs, Inc.
Address, phone, email etc unknown
WWW
Info

Creative TextAssist API

Platform: Windows
Description: The TextAssist API (TAAPI) is created for Microsoft Windows 3.1x and Windows 95 developers who intend to develop 16-bit Text-to-Speech software applications using Creative's TextAssist speech engine. It supports direct control of speech output characteristics, concurrent playback of text-to-speech and wave files, foreign language support, speech synchronization, and exception dictionaries. It also includes a voice editing tool for creating new custom voices, a Visual Basic Custom Control for high-level text-to-speech support in Visual Basic and other languages and some sample programs.

Availability: The TextAssist API is released to registered developers at no cost.
Contact: WWW

CSRE: Computerized Speech Research Environment

Platform: PC
Description: CSRE is a software system which includes an implementation of the Klatt speech synthesizer. See the CSRE entry in Q1.9 and the AVAAZ WWW pages for more detail.

Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944, Fax: +1-519-472-7814
Email: info@avaaz.com
WWW

DECtalk Speech Synthesis

Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
Description:
Converts ordinary text into natural-sounding, intelligible speech. Provides personalized voices and extensive user controls. DECtalk technology is available in the following packaging options.

DECtalk PC card option:
An industry-standard ISA/EISA bus card implementation that can be integrated with any Intel 486 processor-based system running DOS or Windows. Applications can be interfaced to the bus via a DOS Terminate and Stay Resident (TSR) driver or a Windows Dynamic Link Library (DLL). This option is available with an external speaker with volume control and headphone jack.

DECtalk Express external package:
An external, portable package that you can plug in to any PC or serial port. The external package includes a built-in speaker and headphone jack, plus combined on/off and volume controls and a rechargeable battery pack.

DECtalk Software solution:
Software-only text to speech for Alpha or Intel systems running Windows NT or Alpha systems running Digital UNIX. Provides complete speech synthesis capabilities so developers can enhance applications with DECtalk technology. DECtalk Software output can be directed to audio devices, into WAVE files, or into memory buffers.

Pricing:DECtalk-Speech-Synthesis
More Information:
Digital Equipment Corporation WWW pages:
Ph: 1-800-DIGITAL

DECtalk Software

Platform: Digital UNIX and Windows NT
Description:
DECtalk converts standard ASCII text into natural, intelligible speech. Speech output through any audio device is supported by Microsoft Video for Windows or Multimedia Services for Digital UNIX. An API gives developers direct access to text-to-speech functions. Provides nine voice personalities (4 female, 4 male, 1 child). Provides punctuation and tonal control, supports customized pronunciation of trade jargon and acronyms. Common programming interface works with both Alpha and Intel platforms.

More Information:
Digital Equipment Corporation WWW pages:
DECtalk Software page:
Ph: 1-800-DIGITAL

Eloquence

Platform: Windows, Solaris, SunOS, SGI, RS/6000
Description:
Software based text-to-speech package. Generates waveforms completely algorithmically instead of by concatenating waveforms, for maximum flexibility and naturalism. For instance, when the user requests a deeper voice, the software simulates a larger vocal tract, instead of simply pitch-shifting samples.

Uses high-level linguistic parsing, which obviates the need for a huge dictionary. Handles numbers, acronyms, currency, etc. Includes a set of annotation symbols, for placing stress on particular words, expressing excitement/boredom, etc. Also allows phonetic input. Support for Windows DDL.

Produces male and female voices for General American English. Dialects under development include Alabama, Brooklyn, and Boston.

Price:
Flexible license agreements on application.
Availability:
Eloquent Technology, Inc.
2389 North Triphammer Road
Ithaca, NY 14850
Ph: (607) 266-7020 Fax: (607) 266-7030
Email: eti@plab.dmll.cornell.edu

Emacspeak - A Speech Output Subsystem For Emacs

Platform: UNIX, Emacs
Description:
Emacspeak is a speech output system that will allow someone who cannot see to work directly on a UNIX system. Emacspeak is built on top of Emacs. With Emacspeak loaded, Emacs provides spoken feedback for everything you do. Emacspeak currently supports the new DECtalk Express speech synthesizer, as well as older versions of the DECtalk, e.g. the MultiVoice. See the Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak distribution for additional details.

Requirements:
Requires GNU FSF Emacs 19 (version 19.23 or later) and TCLX 7.3B (Extended TCL) to run Emacspeak.
Availability:
Not known at this time (web sites are gone)
Contact: T. V. Raman, raman@adobe.com

Eurovocs

Platform: Various - RS232 Connection
Description:
Eurovocs is a stand-alone text-to-speech synthesizer which uses the text-to-speech technology of Lernout and Hauspie Speech Products. Available for Dutch, French, German and American English with other languages planned for release soon. One Eurovocs device can support two different languages. Eurovocs can be connected to any computer via a standard serial interface (RS232). It supports personal dictionaries, generation of DTMF tones, and pronunciation of special character sequences such as digit strings, telephone-numbers, date and time indications, abbreviations, alphanumeric strings etc.

Contact:
Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be
WWW page

HADIFIX

Platform: Windows
Description:
German speech synthesis system developed at the Institute for Communications Research and Phonetics, University of Bonn. Provides conversion of input text to phonemes, automatic prediction of stress, phrasing and pitch, and speech generation by concatenation of small units of natural speech. Demisyllables and similar units are used; they comprise all consonants before the vowel and the beginning of the vowel (initial demisyllable) or the end of the vowel and the following consonants (final demisyllable). For example, the word 'Strolch' is formed by concatenating 'Stro' and 'olch'.
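
The 'Strolch' example can be reproduced with a toy split on spelling. This is only a sketch of the demisyllable idea and not how HADIFIX works internally: the real system operates on phonemes and recorded speech, while the `demisyllables` function below is a hypothetical helper that handles a single syllable with plain ASCII vowel letters.

```python
VOWELS = set("aeiouAEIOU")


def demisyllables(syllable):
    """Split a one-syllable word into initial and final demisyllables.

    The initial part runs up to and including the first vowel letter; the
    final part starts at that same vowel, so the two parts overlap on it.
    """
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            return syllable[: i + 1], syllable[i:]
    raise ValueError(f"no vowel found in {syllable!r}")


print(demisyllables("Strolch"))
```

The shared vowel is where the two units are joined, which keeps the consonant clusters on either side intact.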

Demo:
Windows demo software available. Limited to synthesis of one short text (text.txt) at a time. Speech format limitations too. 1.3MB file.
WWW page
On-line demo

Infovox Product Range

Description:
Multilingual Text-to-speech systems, languages available: American English, British English, German, French, Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and Finnish.

Product name: INFOVOX 500, PC BOARD
Product description: Half length expansion board for IBM PC, XT, AT, PS/2 model 30 or compatible personal computers. The board can also be connected via the serial port. Language and control program for downloading into RAM or mounted on EPROMs.

  • Platform: IBM PC, XT, AT, PS/2 model 30 or compatible
  • Delivered standard interface: MS DOS I/O driver

Product name: INFOVOX 600, OEM BOARD
Product description: OEM board built with CMOS IC's. Language and control program are stored in on-board fixed memory.

  • Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600 Baud
  • Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager

Product name: INFOVOX 700, DESKTOP UNIT
Product description: Desktop unit with built in Infovox 600 to be connected to any computer or terminal via an RS 232-C serial interface. Built in loudspeaker and rechargeable battery for 4 hours use, and control knobs for continuous control of speech volume and speed.

  • Platform: any
  • Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager

Product name: INFOVOX 650, OEM BOARD
Product description: OEM board built with CMOS IC's. Language and control program are stored in on-board memory.

  • Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600 Baud
  • Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager

Product name: INFOVOX 750, DESKTOP UNIT
Product description: Desktop unit with built in Infovox 650 to be connected to any computer or terminal via an RS 232-C serial interface. Built in loudspeaker and rechargeable battery for 5 hours use, and a control knob for continuous control of speech volume.

  • Platform: any
  • Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager

Product name: Infovox 210, software for Apple Macintosh
Product description: Software based text-to-speech conversion. Produces 16 bit and 8 bit sound. Delivered on 3.5" diskettes with user lexicon and complete documentation.

  • Platform: Apple Macintosh with minimum 68030, 33 MHz microprocessor
  • Delivered standard interfaces: Standard interface to Apple Speech manager

Product name: Infovox 220, software for Microsoft Windows
Product description: Software based text-to-speech conversion. Produces 16 bit sound and conforms to Microsoft Windows multimedia standard MCI. Delivered on 3.5" diskettes with user lexicon and complete documentation.

  • Platform: IBM compatible PC with minimum 486, 25 MHz microprocessor
  • Delivered standard interfaces: Standard interface to Microsoft Windows 3.1 and sound boards supporting Microsoft Windows multimedia driver for audio

Contact:
Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069
S-171 02 Solna, Sweden
Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
email: tts-sales@infovox.se

IPOX: All Prosodic Speech Synthesis Architecture

Description:
IPOX is an experimental, all-prosodic speech synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is freely available (after registration) for evaluation and non-profit research purposes.

Requirements:
PC (preferably a fast 486) running Windows 3.1 or higher. Sound output requires a 16-bit Windows-compatible sound card
Availability: By WWW

JSRU

Platform: UNIX and PC
Cost:
100 pounds sterling (from academic institutions and industry)
Description:
A "C" version of the JSRU system, Version 2.3 is available. It's written in Turbo C but runs on most Unix systems with very little modification. A Form of Agreement must be signed to say that the software is required for research and development only.

Contact:
Dr. E.Lewis eric.lewis@bristol.ac.uk

Klatt-style synthesiser

Platform: Unix
Cost: Free
Description:
Software posted to comp.speech in late 1992.
Availability:
By ftp from the comp.speech ftp site

KPE80 - A Klatt Synthesiser and Parameter Editor

Platform: Unix
Description:
The KPE80 program provides a graphical interface for the implementation of the Klatt 1980 formant synthesiser written by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece of code written by Rob Fletcher.

Technical Desc.:
It is comprised of an X-Window interface and version 3.03 of the synthesiser code. The interface allows users to display and edit Klatt parameters using a graphical display which includes the time-amplitude waveform of both the original speech and its synthetic copy, and some signal analysis facilities. Most of the work in choosing the parameter values to produce the synthetic copy has to be done by the user. KPE will estimate the fundamental frequency contour from an original token; this estimate will need to be amended where errors occur. It is possible to specify the formant trajectories with some precision by overlaying the appropriate formant frequency parameter tracks on the spectrogram of the target waveform. A number of facilities exist to help in the refinement of parameter values: original and synthetic waveforms can be compared aurally, spectrally, and spectrographically using built-in speech analysis facilities.

File formats:
KPE will read RIFF (.wav) files and SFS files. (SFS is a suite of speech-signal processing programs available free from Phonetics and Linguistics, UCL.)
Availability:
See also: Public domain Klatt-style speech synthesis code.
Contact: Andrew Simpson
Department of Phonetics and Linguistics, University College London
Wolfson House, 4 Stephenson Way, London NW1 2HE
Email: a.simpson@ucl.ac.uk
WWW page

"learph": Trainable text-to-phoneme software by Antonio Lucca

Platform: UNIX
Description: Experimental software which learns text to phoneme translation from examples using decision-tree-like data structures. It is based on the assumption that each letter can correspond to different phoneme strings depending on the context.
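
To make the idea of context-dependent letter-to-phoneme learning concrete, here is a minimal sketch. It is not learph's algorithm: it uses a simple count table with back-off from wider to narrower contexts in place of a decision tree, and the training words, the phoneme symbols and the one-phoneme-per-letter alignment are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def contexts(word, i):
    """Contexts for letter i, most specific first: (left, letter, right),
    then (letter, right), then the letter alone. '#' marks a word boundary."""
    left = word[i - 1] if i > 0 else "#"
    right = word[i + 1] if i + 1 < len(word) else "#"
    return [(left, word[i], right), (word[i], right), (word[i],)]


def train(examples):
    """examples: (word, phonemes) pairs with one phoneme string per letter.
    Counts how often each phoneme was seen in each context."""
    table = defaultdict(Counter)
    for word, phonemes in examples:
        for i, ph in enumerate(phonemes):
            for ctx in contexts(word, i):
                table[ctx][ph] += 1
    return table


def predict(table, word):
    """Pick, per letter, the most frequent phoneme of the most specific
    context seen in training; letters never seen map to '?'."""
    out = []
    for i in range(len(word)):
        for ctx in contexts(word, i):
            if ctx in table:
                out.append(table[ctx].most_common(1)[0][0])
                break
        else:
            out.append("?")
    return out


# Hypothetical aligned training data (letters and phonemes pair up 1:1).
EXAMPLES = [
    ("cat", ["k", "a", "t"]),
    ("cot", ["k", "o", "t"]),
    ("city", ["s", "i", "t", "i"]),
]
table = train(EXAMPLES)
print(predict(table, "cit"))  # 'c' before 'i' follows the "city" example
print(predict(table, "cab"))  # 'b' was never seen in training
```

The letter "c" comes out differently depending on its right-hand neighbour, which is the kind of context dependence the description refers to.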

Availability: Examples and source are available on the WWW
Contact: Antonio Lucca: lucca@ghost.dsi.unimi.it

Lernout & Hauspie Text-to-Speech (3 products)

Lernout & Hauspie have three TTS products. The functionality of the products is similar; however, they differ in hardware implementation and other details, as described below.
L&H tts2000/T: TTS for the Telephony and Telecommunications Market
L&H tts2000/M: TTS for the Computer and Multimedia Market
L&H tts3000/C: TTS for the Business and Consumer Electronics Market
Description:
Text to Speech (TTS) software based on parameterized segment concatenation (diphones, triphones and tetraphones) algorithms. Available for US English, German, Dutch, French, Spanish (Castilian), Italian and Korean.
General features include:
  • The control of volume, speech rate and speech pitch.
  • The use of control sequences to customize TTS output (adding pauses, using phonetic input, etc.).
  • Switching between languages at run time.
  • A personal vocabulary editor is available for building exception dictionaries.
  • Readout modes: letter by letter, word by word or sentence by sentence.
  • Input formats: orthographic input, phonetic input, phonetic input with prosodic information.
tts2000/T
  • Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
  • Sampling Frequency: 8kHz
  • Single channel platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210
  • Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
tts2000/M
  • Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
  • Sampling Frequency: 8/10/11.025 kHz
  • Single processor platform examples: ARM6/ARM7, Intel 386/486/Pentium, Motorola 68040
  • Two processor platform examples: {Intel 386/486/Pentium or Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/20C5X}
tts3000/C
  • Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
  • Sampling Frequency: 10kHz
  • Single processor platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210
  • Two processor platform examples: {SHARP SH7000 or ARM6/ARM7 or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
See also: L&H Windows TTS SDK
More Information: on the Lernout & Hauspie WWW pages
Price: Unknown
Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com
WWW

Lernout & Hauspie Text-to-Speech Windows SDK

Platform: IBM-Compatible
Description: The L&H Text-to-Speech software developer's kit lets you integrate text-to-speech technology into your own or existing PC applications under Microsoft Windows 3.1. The software converts written text into clear, human-sounding synthetic speech.

Requirements:
  • IBM-compatible PC 386 DX/33, 8Mb RAM
  • MS DOS 5.0 and MS Windows 3.1 (or higher)
  • SoundBlaster compatible sound board.
See also: L&H TTS Products
More Information: on the Lernout & Hauspie WWW pages
Price: Unknown
Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com, WWW page

Macintosh Speech Output Applications

A comprehensive list of Macintosh Speech Applications is provided by Kevin Lenzo at CMU
The Apple Speech WWW Site has some useful information

MacinTalk

Platform: Macintosh
Cost: Free
Description: Formant based speech synthesis. There is also a program called "tex-edit" which apparently can pronounce English sentences reasonably using MacinTalk.

Note: MacinTalk doesn't run reliably on Macintoshes with new sound hardware under the latest OS (System 7.1 w/HUD 2.0). More recent software is listed above.
Availability:
By anonymous ftp from many archive sites (have a look on archie if you can). tex-edit is on many of the same sites.
This article by my friend Denise Lance will give you some ideas on the more modern speech offerings of Apple/Macintosh. When you have finished reading the article (there are some appropriate notes to read) you can also download English_Text-to-Speech from there.

Monologue for Windows from First Byte

Description:
Monologue is a software program that reads text from the clipboard in Windows 16 or 32 bit applications. It can be found as a bundled product with many sound cards and multimedia general purpose computer systems. Monologue can add the element of speech to virtually any text oriented application. Any pronounceable combination of letters and numbers will be spoken clearly. It can be applied to tasks such as eyes-free proofreading, data verification (e.g. spreadsheets), reading E-mail and more. User-changeable parameters provide control over the sound quality by allowing for changes in pitch, and the speed of speech. An exception dictionary saves preferred pronunciation of words and abbreviations.

Monologue Win32 now includes support for the Microsoft SAPI. Monologue male "SpeechFonts" are available for US English, British English, German, French, Latin American Spanish, Italian. A US English Female SpeechFont is also available. For more detailed information and examples go to the First Byte WWW pages.

Availability: Currently bundled with many sound cards and multimedia general purpose computer systems. For pricing, licensing details, and release information see the First Byte WWW pages or email info@firstbyte.davd.com.

See also: ProVoice Developer's Speech Toolkit from First Byte
Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page

Narrator Translator Library

Platform: Amiga
Description:
A replacement for the Commodore-supplied "translator.library", which is a part of the Narrator speech synthesis package. It implements multi-lingual text-to-speech for an Amiga. The library allows the user to specify which language the text to be spoken should be treated as. This can be done by setting the default language or by including markup codes in the text, in a similar way to LaTeX or HTML, e.g. "\french{Bonjour}". There is currently support for American English, British English, Swedish, Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and Welsh.
Availability:
The library (but not source) is available by anonymous ftp from Aminet
More Information: is available on the WWW
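
As a sketch of how markup such as "\french{Bonjour}" could be separated into per-language segments before synthesis, consider the following. Only that one example of the syntax appears in the description, so the exact grammar, the `split_by_language` function and the default language name are assumptions; nested markup is not handled.

```python
import re

# Matches markup of the form \language{text}, e.g. \french{Bonjour}.
TAG = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def split_by_language(text, default="english"):
    """Return (language, text) segments for a string with \\lang{...} markup."""
    segments = []
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            # Unmarked text before this tag is spoken in the default language.
            segments.append((default, text[pos:m.start()]))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments


print(split_by_language(r"Hello. \french{Bonjour} \welsh{Bore da}"))
```

Each segment could then be handed to the text-to-phoneme rules of its own language.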

Narrator

Platform: Amiga
Description:
Formant based speech synthesis. Includes an English-to-phoneme translation library, and a SPEAK: pseudo-device for speech output.
Hardware: Standard Amiga hardware
Availability: Part of AmigaOS
See Also: The Narrator Translator Library

TextToSpeech Kit

Platform: NeXT Computers
Description:
The TextToSpeech Kit does unrestricted conversion of English text to synthesized speech in real-time. The user has control over speaking rate, median pitch, stereo balance, volume, and intonation type. Text of any length can be spoken, and messages can be queued up, from multiple applications if desired. Real-time controls such as pause, continue, and erase are included. Pronunciations are derived primarily by dictionary look-up. The Main Dictionary has nearly 100,000 hand-edited pronunciations which can be supplemented or overridden with the User and Application dictionaries. A number parser handles numbers in any form. A letter-to-sound knowledge base provides pronunciations for words not in the Main or customized dictionaries. Dictionary search order is under user control. Special modes of text input are available for spelling and emphasis of words or phrases. The actual conversion of text to speech is done by the TextToSpeech Server. The Server runs as an independent task in the background, and can handle up to 50 client connections.
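
The dictionary look-up with a letter-to-sound fallback described above can be sketched as follows. This is not the TextToSpeech Kit's API: the `pronounce` function, the dictionary contents, the phoneme notation and the crude `spell_out` fallback are all hypothetical, and the search order shown is just one possible user setting.

```python
def pronounce(word, dictionaries, letter_to_sound):
    """Look a word up in each dictionary in order; fall back to rules.

    dictionaries: list of (name, dict) pairs in the desired search order.
    letter_to_sound: function used for words found in no dictionary.
    Returns (pronunciation, source name).
    """
    key = word.lower()
    for name, entries in dictionaries:
        if key in entries:
            return entries[key], name
    return letter_to_sound(key), "letter-to-sound"


# Hypothetical entries and phoneme notation, for illustration only.
user = {"trillium": "t r ih l iy ah m"}
application = {"next": "n eh k s t"}
main = {"next": "n e k s t", "speech": "s p iy ch"}


def spell_out(word):
    """Crude stand-in for a letter-to-sound knowledge base."""
    return " ".join(word)


order = [("user", user), ("application", application), ("main", main)]
print(pronounce("NeXT", order, spell_out))
print(pronounce("speech", order, spell_out))
print(pronounce("kit", order, spell_out))
```

Because the first match wins, placing the User or Application dictionary ahead of the Main dictionary is what lets their entries override it.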

Misc:
The TextToSpeech Kit comes in two packages: the Developer Kit and the User Kit. The Developer Kit enables developers to build and test applications which incorporate text-to-speech. It includes the TextToSpeech Server, the TextToSpeech Object, the pronunciation editor PrEditor, several example applications, phonetic fonts, example source code, and developer documentation. The User Kit provides support for applications which incorporate text-to-speech. It is a subset of the Developer Kit.

Hardware:
Uses standard NeXT Computer hardware.
Cost:
  • TextToSpeech User Kit: $175 CDN ($145 US)
  • TextToSpeech Developer Kit: $350 CDN ($290 US)
  • Upgrade from User to Developer Kit: $175 CDN ($145 US)
Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca

Orator Text-to-Speech Synthesizer

Platform: SUN SPARC, Decstation 5000. Written in C, and therefore portable to other UNIX platforms. Some successful ports: HP, RS-6000, PC-Unix [Linux].
Description:
Sophisticated speech synthesis package. Has text preprocessing (for abbreviations, numbers), acronym rules, and human-like spelling routines. Natural-sounding synthesis based on demisyllable concatenation. Has high accuracy for pronunciation of names of people, places and businesses in America; good accuracy for English text; rules for stress and intonation marking; various methods of user control and customization at most stages of processing.

A new version of the ORATOR system is under development. Both ORATOR and this new "ORATOR II" system are capable of general text synthesis. The ORATOR II system has a more natural-sounding voice.

Hardware: Runs on common SPARC or Decstation workstations, using their internal audio output capability. Recommend at least 16M of memory.

More detailed information plus examples of ORATOR synthesis are available on the ORATOR WWW pages.

Misc 1: A free demo cassette is available.

Misc 2: Examples of Orator are also available on the University of Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).

Availability and Pricing: Contact Bellcore's Licensing Office
Tel: 1-800-521-CORE (521-2673)
Fax: 1-908-336-2559
Email to Anthony Lindsey: alin1@panix.com

PAM - A Text-To-Speech Application

Platform: Windows
Description:
PAM is a talking personal assistant and text reader application. It uses the ProVoice TTS package. PAM will verbally advise about appointments and reminder messages at specified times during the day. It can read text files, clipboard text, and text sent in DDE messages. Using the full verbal interface, PAM can be used by visually challenged individuals. Shareware - thirty day free trial.

Requirements: Any Windows sound card, speakers or headphones.
Min. memory - 4 megs, 8 megs recommended.
A more complete description is available on the JTS homepage.
Availability:
The shareware and associated files can be downloaded by ftp.
Price: $US40 for the registered version.
Contact: Tom Slemko: tslemko@islandnet.com
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4
Ladysmith, B.C., Canada, V0R 2E0

ProVerbe Speech Engine for Windows (95 and NT)

Description: The ProVerbe Speech Engine produces natural-sounding speech from written text. Naturalness is achieved by using the TD-PSOLA process from CNET (France Telecom's research lab), which is based on the concatenation of elementary speech units (including diphones). Supported languages are British English, German, French and Spanish. For multi-channel applications Elan Informatique also provides hardware platforms. Elan Informatique provides an SDK reference document (sdken.exe: WinWord6 format in a self-extracting compressed file).

Demo versions:
The directory includes the following demos.
  • PVBSEDP.zip: French male voice (4.3MB)
  • PVBFRF.zip: French female voice (4.6MB)
  • PVBSPA.zip: Spanish male voice (4.6MB)
  • PVBGER.zip: German male voice (14.0MB)
  • PVBENG.zip: English male voice (9.9MB)
The directory also includes synthesis samples for a French male voice, French female voice, English male voice, and a German male voice. The readme file in the directory describes the memory requirements for the demos.
A CD-ROM with all these demonstrations is available. To request it, please email Elan Informatique.

Contact: Elan Informatique
4 rue Jean Rodier, 31400 TOULOUSE FRANCE
Contact person: Pierre Delrat
Phone: +33-61-36-0777 Fax: +33-61-36-0770
BBS: +33-61-36-0788
E-mail: 101346.465@compuserve.com
Anonymous FTP

ProVoice Developer's Speech Toolkit from First Byte

Platform: ProVoice Developer's Toolkits are available for DOS, Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
Description:
ProVoice allows programmers to add synthesized speech to their applications. Your program passes text strings to the ProVoice speech engine, which translates the text into audible speech. Male and/or female "SpeechFonts" are available for many languages: English, French, German, British English, Italian, and Spanish.

ProVoice converts text to speech in two phases using a set of phonetic translation and pronunciation rules. First, the software analyzes and translates text into "sound descriptors", a phonetic language with pitch, duration, and amplitude codes that are needed to produce stress patterns in phrases and sentences. Rules are used to analyze words, numbers, and punctuation. The second phase converts the intermediate phonetic language into speech signals; algorithms turn the distinct speech signals into smooth-flowing, continuous, clear speech. Real-time synchronization of mouth movement and word boundaries allows animation of a graphical talking character, or highlighting of displayed text as it is spoken.
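
The two-phase structure can be pictured with a toy Python pipeline: a first stage that turns text into an intermediate list of descriptors carrying pitch, duration, and amplitude, and a second stage that renders those descriptors into samples while recording where each word starts (the information a caller would need to highlight text as it is spoken). Everything here is an assumption made for illustration: the `Descriptor` fields, the one-descriptor-per-letter analysis, the fixed pitch values, and the sine-tone rendering are stand-ins and say nothing about how ProVoice itself is implemented.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

SAMPLE_RATE = 8000  # Hz; arbitrary choice for this sketch


@dataclass
class Descriptor:
    symbol: str       # stand-in for a phonetic symbol (here simply a letter)
    pitch_hz: float
    duration_s: float
    amplitude: float  # 0.0 .. 1.0
    word_index: int   # which word this descriptor belongs to


def analyze(text: str) -> List[Descriptor]:
    """Phase 1 (toy): text -> descriptors. The final word gets a lower pitch."""
    words = ["".join(c for c in w if c.isalnum()) for w in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    out: List[Descriptor] = []
    for idx, word in enumerate(words):
        pitch = 100.0 if idx == len(words) - 1 else 120.0
        for ch in word:
            out.append(Descriptor(ch.lower(), pitch, 0.05, 0.5, idx))
    return out


def render(descs: List[Descriptor]) -> Tuple[List[float], List[int]]:
    """Phase 2 (toy): descriptors -> samples plus the start offset of each word."""
    samples: List[float] = []
    word_starts: List[int] = []
    for d in descs:
        if d.word_index == len(word_starts):
            word_starts.append(len(samples))  # first descriptor of a new word
        n = round(d.duration_s * SAMPLE_RATE)
        for i in range(n):
            phase = 2.0 * math.pi * d.pitch_hz * i / SAMPLE_RATE
            samples.append(d.amplitude * math.sin(phase))
    return samples, word_starts


if __name__ == "__main__":
    audio, starts = render(analyze("Hi there."))
    print(len(audio), starts)
```

Keeping the intermediate representation explicit is what lets the second stage report word boundaries without re-reading the text.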

Necessary tools and examples are provided for programmers to manipulate the ProVoice speech technology, including installation instructions, extensive sample programs, and complete documentation. In addition, sample code is provided on disk to illustrate speech programming techniques.

Note 1: First Byte will perform custom work for embedded systems.

Note 2: ProVoice Windows includes support for the Microsoft SAPI. It will speak through any Windows-supported wave audio device.

Note 3: Distribution of ProVoice for commercial use is subject to execution of a Commercial Product Distribution License Agreement.

For more detailed information and examples go to the First Byte WWW page.
See also: Monologue for Windows from First Byte

Price and Availability:
Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610, Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page.

RC Systems V8600/V8601 Text to Speech synthesizers

Platform 1: IBM PC: ISA card.
Platform 2: Interface to PC/104 standard microcontrollers.
Platform 3: Standalone (or embedded) thru RS232 or parallel printer port or processor bus.
Description: Converts plain ASCII text to speech. Programmable voices, pitch rate, volume, etc. Built-in DTMF and tone generators.
Price: $151-$299 US (qty 1)
Contact: RC Systems
1609 England Avenue, Everett, WA 98203, USA
Ph: (206) 355-3800 Fax: (206) 355-1098
Europe: +44 181 539-0285

rsynth

Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI Irix4.x, Linux)
Description: Public domain text-to-speech system assembled from a variety of sources. It supports CMU and BEEP format dictionaries (as described in Q1.10) and now utilises stress marks in the dictionary in synthesising intonation.
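
As a rough picture of what stress marks in a dictionary look like to a program, the Python sketch below parses a few lines laid out in the CMU dictionary style (a headword followed by phones, with a digit on each vowel giving its stress level) and finds the syllable carrying primary stress. It is not taken from rsynth; the function names are hypothetical, the sample lines are illustrative, and the BEEP format is not covered.

```python
from typing import Dict, List, Optional, Tuple

Phone = Tuple[str, Optional[int]]  # (symbol, stress digit or None for consonants)

# Illustrative lines in CMU-dictionary layout; 1 = primary stress, 0 = unstressed.
SAMPLE = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
SPEECH  S P IY1 CH
SYNTHESIS  S IH1 N TH AH0 S AH0 S
"""


def parse_cmu(text: str) -> Dict[str, List[Phone]]:
    """Parse CMU-style lines into {word: [(phone, stress), ...]}."""
    entries: Dict[str, List[Phone]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        parsed: List[Phone] = []
        for p in phones:
            if p[-1].isdigit():
                parsed.append((p[:-1], int(p[-1])))  # vowel with stress digit
            else:
                parsed.append((p, None))             # consonant, no stress
        entries[word.lower()] = parsed
    return entries


def primary_stress_syllable(phones: List[Phone]) -> Optional[int]:
    """Zero-based index of the syllable with primary stress, if any."""
    stresses = [s for _, s in phones if s is not None]  # one entry per vowel
    return stresses.index(1) if 1 in stresses else None


if __name__ == "__main__":
    lexicon = parse_cmu(SAMPLE)
    for w, ph in lexicon.items():
        print(w, ph, primary_stress_syllable(ph))
```

A synthesizer could use the stressed-syllable index to decide where to place a pitch accent, which is one plausible reading of using stress marks for intonation.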

Price: Free
Misc: Axel Belinfante has implemented a WWW rsynth demo
Availability: anonymous ftp #1 or anonymous ftp #2

SENSYN speech synthesizer

Platform: PC, Mac, Sun, and NeXT
Rough Cost: $300
Description:
This formant synthesizer produces speech waveform files based on the (Klatt) KLSYN88 synthesizer. It is intended for laboratory and research use. Note that this is NOT a text-to-speech synthesizer, but creates speech sounds based upon a large number of input variables (formant frequencies, bandwidths, glottal pulse characteristics, etc.) and would be used as part of a TTS system. Includes full source code.
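
To give a feel for what synthesizing from formant frequencies and bandwidths means, here is a minimal Python sketch of the second-order digital resonator commonly used in Klatt-style formant synthesizers, driven by a simple impulse train. It is not SENSYN's source code and omits nearly everything a full KLSYN88-class synthesizer models (glottal source shaping, nasal and fricative branches, time-varying parameters). The function names and the formant values in the example are illustrative assumptions.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def resonator_coefficients(freq_hz: float, bandwidth_hz: float,
                           sample_rate: float) -> Tuple[float, float, float]:
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with unity gain at DC."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal: Iterable[float], freq_hz: float, bandwidth_hz: float,
             sample_rate: float) -> List[float]:
    """Filter a signal through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out: List[float] = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz: float, sample_rate: float, n: int) -> List[float]:
    """Crude voicing source: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def vowel(formants: Sequence[Tuple[float, float]], f0_hz: float = 100.0,
          sample_rate: float = 10000.0, n: int = 2000) -> List[float]:
    """Cascade the resonators over the source to get a vowel-like waveform."""
    s = impulse_train(f0_hz, sample_rate, n)
    for freq, bw in formants:
        s = resonate(s, freq, bw, sample_rate)
    return s


if __name__ == "__main__":
    # Illustrative (frequency, bandwidth) pairs in Hz for an open vowel.
    wave = vowel([(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)])
    print(len(wave), max(abs(v) for v in wave))
```

Changing the formant list changes the vowel quality, which is the sense in which such a synthesizer is driven by input variables rather than by text.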

Availability: Sensimetrics Corporation
64 Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com

SGI Developers Toolbox Synthesiser

Platform: SGI
Description: The SGI Developer Toolbox 4.0 CDROM contains a basic public domain text-to-speech program in the publics/speak directory. The directory includes man pages and source.

Availability: on the SGI Developer Toolbox 4.0 CDROM

SIMTEL

A wide range of speech related software, sound-blaster software and signal processing software for PCs is available on SimTel and its mirror sites. It can be obtained by ftp from:

Note: Voicemaker - The archives include the program Voicemaker which synthesises speech.


GOOD HUNTING AND ENJOY!
