Software Dreams and Talking Machines
Speech and Speech Recognition
Resources on the World Wide Web
[The following materials were gathered from the web
during the month of May 1996. This article was
updated periodically from 1993 to 1996.]
Author: Andrew Hunt
This is a REPORT. SUPERADAPTOID does not REVIEW
products that have not been personally evaluated
by DEMONSTRATION.
Report Presented By: SUPERADAPTOID
The following article outlines the scope of the comp.speech FAQ postings and lists
institutional, research, and business resources available on the web.
These business and product listings are not complete. However, they represent the best of the best among such lists.
COMP.SPEECH FAQ POSTING - PART 3/3
Text By Andrew Hunt
FAQ SECTION 5 - SPEECH SYNTHESIS
- SpeechLinks: Speech Synthesis
- Q5.1: What is speech synthesis?
- Q5.2: How can speech synthesis be performed?
- Q5.3: References/Books on Synthesis
- Q5.4: Speech Synthesis on the WWW
- Q5.5: Speech Synthesis Software/Hardware
Q5.1: WHAT IS SPEECH SYNTHESIS?
Speech synthesis is the task of transforming written input to spoken
output. The input can either be provided in a graphemic/orthographic
or a phonemic script, depending on its source.
Could someone provide a more informative description?
Q5.2: PERFORMING SPEECH SYNTHESIS
There are several algorithms. The choice depends on the task they're
used for. The easiest way is to just record the voice of a person
speaking the desired phrases. This is useful if only a restricted
volume of phrases and sentences is used, e.g. messages in a train
station, or schedule information via phone. The quality depends on the
way recording is done.
More sophisticated, but lower in quality, are algorithms which split the
speech into smaller pieces. The smaller those units are, the fewer of
them are needed, but the quality also decreases. An often used unit is
the phoneme, the smallest linguistic unit. Depending on the language
used there are about 35-50 phonemes in western European languages,
i.e. there are 35-50 single recordings. The problem is combining them,
as fluent speech requires fluent transitions between the elements. The
intelligibility is therefore lower, but the memory required is small.
A solution to this dilemma is using diphones. Instead of splitting at
the transitions, the cut is done at the center of the phonemes,
leaving the transitions themselves intact. For an inventory of 20
phonemes this gives about 400 elements (20*20); for the 35-50 phonemes
mentioned above the upper bound is roughly 1200-2500. The quality
increases accordingly.
The longer the units become, the more elements there are, but the
quality increases along with the memory required. Other units which
are widely used are half-syllables, syllables, words, or combinations
of them, e.g. word stems and inflectional endings.
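As a rough illustration of the diphone idea, the following Python sketch turns a phoneme sequence into the diphone units that would have to be concatenated, and computes the 20*20 upper bound quoted above. The function names, the "_" silence marker and the sample phoneme symbols are assumptions made for this example; they are not taken from any system described in this FAQ.

```python
def to_diphones(phonemes, boundary="_"):
    """Convert a phoneme sequence into the diphone units that span it.

    Each diphone runs from the middle of one phoneme to the middle of the
    next, so the transition between the two stays inside a single unit.
    The boundary symbol stands for silence before and after the utterance.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def diphone_inventory_size(num_phonemes):
    """Upper bound on distinct diphones: every ordered pair of phonemes."""
    return num_phonemes * num_phonemes


# Hypothetical phoneme symbols for a short word.
units = to_diphones(["h", "e", "l", "o"])
print(units)
print(diphone_inventory_size(20))
```

A sequence of N phonemes needs N+1 diphones once the silence boundaries are counted, which is why the stored inventory grows roughly with the square of the phoneme count while the number of units per utterance stays about the same.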
Q5.3: REFERENCES/BOOKS ON SYNTHESIS
BOOKS AND PAPERS
- Douglas O'Shaughnessy, Speech Communication: Human and Machine,
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
- D. H. Klatt, "Review of Text-To-Speech Conversion for English",
Jnl. of the Acoustical Society of America (JASA), Vol 82, pp
737-793, 1987.
- "Talking Machines: Theories, Models and Designs", Eds. G. Bailly
& C. Benoit (Elsevier: North Holland).
- I. H. Witten, Principles of Computer Speech, London: Academic
Press, Inc., 1982.
- W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and
Synthesis, Elsevier, Amsterdam, 1995.
- John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
Speech: The MITalk System", Cambridge University Press, 1987.
- Survey of the State of the Art in Human Language Technology.
Report edited by Ronald A. Cole et al. with a section on
Text-to-Speech Technologies.
BIBLIOGRAPHIES AND REFERENCE LISTS
- WWW searchable online bibliography for Phonetics and Speech
Technology with more than 8000 entries.
- Provided by Institut fur
Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
- Computational Speech Processing
- Speech Analysis, Recognition, Understanding, Compression, Transmission, Coding, Synthesis;
Text to Speech Systems, Speech to Tactile Displays, Speaker Identification, Prosody Processing: Bibliography, by Conrad F. Sabourin, 1994, 2 volumes, 1187 p., ISBN 2-921173-21-2, INFOLINGUA
inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
- See also: http://gomer.mlink.net/infolingua.html
Q5.4: SPEECH SYNTHESIS ON THE WWW
Most of the following are links to WWW pages with demonstrations of
speech synthesis. Plenty more links are included in the detailed list
of speech synthesis software/hardware in Q5.5.
Speech Synthesis "Museum"
URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the
University of Birmingham.
Information and speech samples for
- YorkTalk
- Loughborough Sound Images
- University of Birmingham - FDFS
- Eurovocs
- DECtalk
- AT&T Bell Labs Synthesiser
- S.W.A.Ll.C. - Welsh Synthesis from CSTR
- All-Prosodic Speech Synthesis - IPOX
- Orator from Bellcore
Pavarobotti
WWW demo of the Pavarobotti synthesis technology developed at the
National Center for Voice and Speech
Say...
WWW demo of the rsynth speech synthesis software. The WWW capability was implemented by Axel Belinfante.
Musee sonore de la synthese de la Parole en francais
Speech synthesis examples from a series of French language speech synthesisers plus links to other speech synthesis demo pages.
ICP-Grenoble
CNET-Lannion (with TD-PSOLA)
KTH-Stockholm
Universite-Mons - several versions
AT&T Bell Laboratories Voices
WWW interface to the demo of the Laureate speech synthesis system, which is not yet commercially available.
ORATOR from Bellcore
Online demo of the ORATOR system developed at Bellcore.
SVOX from TIK, ETH in Zurich
Demo of German speech synthesis from Institut fur Technische Informatik und Kommunikationsnetze.
Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
Synthesis in German, English or Japanese.
TMH: Institutionen for Taloverforing och Musikakustik, Kungliga Tekniska Hogskolan
Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish, British and American English, French, German, Italian, Spanish, LA Spanish and Greek.
Examples of several types of speech synthesis.
Articulatory Synthesis by HyperASY. SineWave Synthesis. Gestural Computational Model. Pattern Playback system of the 1940's!
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
Eurovocs Multilingual Speech Synthesis
Based on Lernout and Hauspie technology.
HADIFIX German Speech Synthesis
Provided by the Institut fur Kommunikationsforschung und Phonetik, Universitat Bonn.
Centigram's TruVoice Demo
Allows control of speech rate, pitch and other prosodic characteristics.
Institute of Phonetic Sciences
Links to lots of on-line speech synthesis demonstrations provided by the Institute of Phonetic Sciences of the Faculty of Arts of the University of Amsterdam.
Yahoo page on speech generation
Q5.5: SPEECH SYNTHESIS SOFTWARE/HARDWARE
Please email any updates, corrections or additions to the following
list. The range of commercially
available synthesis software is growing rapidly so any help in keeping
up to date will be appreciated.
Other lists of speech synthesis software on the WWW include:
- Kevin Lenzo's list of Macintosh Speech Resources and Apps
- Speech Toys Speech Synthesis Information
IN THE FAQ...
The following speech synthesis software/hardware is described in the
comp.speech FAQ.
- AsTeR
- BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- TheBigMouth
- Creative TextAssist and TextAssist API
- CSRE: Computerized Speech Research Environment
- DECtalk: Text-to-Speech from Digital
- Eloquence
- Emacspeak - A Speech Output Subsystem For Emacs
- Eurovocs
- HADIFIX
- Infovox Product Range
- IPOX: All Prosodic Speech Synthesis Architecture
- JSRU
- Klatt-style synthesiser
- KPE80 - A Klatt Synthesiser and Parameter Editor
- "learph": Trainable text-to-phoneme software by Antonio Lucca
- Lernout and Hauspie Text-To-Speech (3 products)
- Lernout and Hauspie Text-To-Speech Windows SDK
- Macintosh Speech Output Applications
- MacinTalk
- Monologue for Windows from First Byte
- Narrator Translator Library
- Narrator
- TextToSpeech Kit (NeXT)
- Orator from Bellcore
- PAM - A Text-To-Speech Application
- ProVerbe Speech Engine for Windows
- ProVoice Developer's Speech Toolkit from First Byte
- RC Systems V8600/V8601 Text to Speech synthesizers
- rsynth
- SENSYN speech synthesizer
- SGI Developers Toolbox Synthesiser
- SIMTEL
- Sound Bytes Developer's Kit
- spchsyn.exe
- Speak
- Speech Manager and PlainTalk
- Text to Phoneme Program 1
- Text to phoneme program 2
- Text to phoneme program 3
- Tinytalk
- TrueTalk
- TruVoice from Centigram
- WinSpeech
AsTeR
- Platform: UNIX
- Description:
- TTS front-end program which encodes structural
information about documents in speech synthesis. For more information check out:
- http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
- Operation requirements: Lisp: Lucid, clisp
- Contact: T. V. Raman
- WWW page
- Email: raman@adobe.com
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- Platform: ?
- Description: BeSTspeech reads ASCII text with no vocabulary limits.
Available for Dutch, English (male and female), French, German,
Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean,
Malay, Mandarin and Russian.
- Price: ?
- Contact: Berkeley Speech Technologies, Inc.
2246 Sixth Street, Berkeley, California 94710, USA
Ph: (510) 841-5083, Fax: (510) 841-5093
- Email: webmaster@bst.com
- WWW
TheBigMouth - a Text to Speech Program
- Platform: NeXT
- Description: Text to speech program based on concatenation of
pre-recorded speech segments. NeXT equivalent of "Speak" for Suns.
- Availability: try NeXT archive sites such as
sonata.cc.purdue.edu.
Creative TextAssist
- Platform: Windows
- Description: Based on DECtalk speech synthesis. A detailed technical description of TextAssist is provided on the Creative WWW pages.
- Availability: Creative TextAssist is bundled with most (all?) Creative Sound Blaster audio cards.
- Contact: Creative Labs, Inc.
Address, phone, email etc unknown
WWW
Info
Creative TextAssist API
- Platform: Windows
- Description: The TextAssist API (TAAPI) is created for Microsoft
Windows 3.1x and Windows 95 developers who intend to develop
16-bit Text-to-Speech software applications using Creative's
TextAssist speech engine. It supports direct control of speech
output characteristics, concurrent playback of text-to-speech and
wave files, foreign language support, speech synchronization, and
exception dictionaries. It also includes a voice editing tool for
creating new custom voices, a Visual Basic Custom Control for
high-level text-to-speech support in Visual Basic and other
languages and some sample programs.
- Availability: The TextAssist API is released to registered
developers at no cost.
- Contact: WWW
CSRE: Computerized Speech Research Environment
- Platform: PC
- Description: CSRE is a software system which includes an
implementation of the Klatt speech synthesizer. See the CSRE entry
in Q1.9 and the AVAAZ WWW pages for more detail.
- Contact: AVAAZ Innovations Inc.
P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944 , Fax: +1-519-472-7814
Email: info@avaaz.com
WWW
DECtalk Speech Synthesis
- Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
- Description:
- Converts ordinary text into natural-sounding,
intelligible speech. Provides personalized voices, and extensive
user controls. DECtalk technology is available for the following
packaging options.
- DECtalk PC card option:
- An industry-standard ISA/EISA bus
card implementation that can be integrated with any Intel 486
processor-based system running DOS or Windows. Applications
can be interfaced to the bus via a DOS Terminate and Stay
Resident (TSR) driver or a Windows Dynamic Link Library
(DLL). This option is available with an external speaker with
volume control and headphone jack.
- DECtalk Express external package:
- An external, portable
package that you can plug in to any PC or serial port. The
external package includes a built-in speaker and headphone
jack, plus combined on/off and volume controls and a
rechargeable battery pack.
- DECtalk Software solution:
- Software-only text to speech for
Alpha or Intel systems running Windows NT or Alpha systems
running Digital UNIX. Provides complete speech synthesis
capabilities so developers can enhance applications with
DECtalk technology. DECtalk Software output can be directed
to audio devices, into WAVE files, or into memory buffers.
- Pricing:DECtalk-Speech-Synthesis
- More Information:
- Digital Equipment Corporation WWW pages:
Ph: 1-800-DIGITAL
DECtalk Software
- Platform: Digital UNIX and Windows NT
- Description:
- DECtalk converts standard ASCII text into natural,
intelligible speech. Speech output through any audio device is
supported by Microsoft Video for Windows or Multimedia Services
for Digital UNIX. An API gives developers direct access to
text-to-speech functions. Provides nine voice personalities (4
female, 4 male, 1 child). Provides punctuation and tonal control,
supports customized pronunciation of trade jargon and acronyms.
Common programming interface works with both Alpha and Intel
platforms.
- More Information:
- Digital Equipment Corporation WWW pages:
DECtalk Software page:
Ph: 1-800-DIGITAL
Eloquence
- Platform: Windows, Solaris, SunOS, SGI, RS/6000
- Description:
- Software based text-to-speech package. Generates
waveforms completely algorithmically instead of by concatenating
waveforms, for maximum flexibility and naturalism. For instance,
when the user requests a deeper voice, the software simulates a
larger vocal tract, instead of simply pitch-shifting samples.
Uses high-level linguistic parsing, which obviates the need for a
huge dictionary. Handles numbers, acronyms, currency, etc.
Includes a set of annotation symbols, for placing stress on
particular words, expressing excitement/boredom, etc. Also allows
phonetic input. Support for Windows DDL.
Produces male and female voices for General American English.
Dialects under development include Alabama, Brooklyn, and Boston.
- Price:
- Flexible license agreements on application.
- Availability:
- Eloquent Technology, Inc.
2389 North Triphammer Road
Ithaca, NY 14850
Ph: (607) 266-7020 Fax: (607) 266-7030
Email: eti@plab.dmll.cornell.edu
Emacspeak - A Speech Output Subsystem For Emacs
- Platform: UNIX, Emacs
- Description:
- Emacspeak is a speech output system that will allow
someone who cannot see to work directly on a UNIX system.
Emacspeak is built on top of Emacs. With emacspeak loaded, Emacs
provides spoken feedback for everything you do. Emacspeak
currently supports the new DECtalk Express speech synthesizer, as
well as older versions of the DECtalk, e.g. the MultiVoice. See the
Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
distribution for additional details.
- Requirements:
- Requires GNU FSF Emacs 19 (version 19.23 or later)
and TCLX 7.3B (Extended TCL) to run Emacspeak.
- Availability:
- Not known at this time (web sites are gone)
- Contact: T. V. Raman, raman@adobe.com
Eurovocs
- Platform: Various - RS232 Connection
- Description:
- Eurovocs is a stand-alone text-to-speech
synthesizer which uses the text-to-speech technology of Lernout
and Hauspie Speech Products. Available for Dutch, French, German
and American English with other languages planned for release
soon. One Eurovocs device can support two different languages.
Eurovocs can be connected to any computer via a standard serial
interface (RS232). It supports personal dictionaries, generation
of DTMF tones, and pronunciation of special character sequences
such as digit strings, telephone-numbers, date and time
indications, abbreviations, alphanumeric strings etc.
- Contact:
- Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be
WWW page:
HADIFIX
- Platform: Windows
- Description:
- German speech synthesis system developed at the
Institute for Communications Research and Phonetics, University
of Bonn. Provides conversion of input text to phonemes, automatic
prediction of stress, phrasing and pitch, and speech generation by
concatenation of small units of natural speech. Demisyllables and
similar units are used; they comprise all consonants before the
vowel and the beginning of the vowel (initial demisyllable) or the
end of the vowel and the following consonants (final
demisyllable). For example, the word 'Strolch' is formed by
concatenating 'Stro' and 'olch'.
- Demo:
- Windows demo software available. Limited to synthesis of one short text (text.txt) at a time. Speech format limitations too. 1.3MB file.
- WWW page
- On-line demo
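The demisyllable split described for HADIFIX can be sketched in a few lines of Python. This is only an illustration of the 'Strolch' -> 'Stro' + 'olch' example: it works on spelling rather than on phonemes, handles a single syllable, and the vowel set and function name are assumptions of this sketch, not part of HADIFIX itself.

```python
# Assumption: orthographic vowels stand in for the phonemic vowel nucleus.
VOWELS = set("aeiouäöü")


def split_demisyllables(syllable):
    """Split one syllable into (initial, final) demisyllables.

    The initial demisyllable keeps the leading consonants plus the vowel;
    the final demisyllable keeps the vowel plus the trailing consonants,
    so the two units overlap on the vowel.
    """
    lowered = syllable.lower()
    vowel_positions = [i for i, ch in enumerate(lowered) if ch in VOWELS]
    if not vowel_positions:
        raise ValueError(f"no vowel found in {syllable!r}")
    first, last = vowel_positions[0], vowel_positions[-1]
    return syllable[: last + 1], syllable[first:]


print(split_demisyllables("Strolch"))
```

Because both halves contain the vowel, the join falls inside the relatively stable vowel portion instead of at a consonant transition.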
Infovox Product Range
- Description:
- Multilingual Text-to-speech systems, languages
available: American English, British English, German, French,
Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
Finnish.
- Product name: INFOVOX 500, PC BOARD
- Product description: Half length expansion board for IBM PC, XT, AT, PS/2 model 30 or compatible personal computers. The board can also be connected via the serial port. Language and control program for downloading into RAM or mounted on EPROMs
- Platform: for IBM PC, XT, AT, PS/2 model 30 or compatible
- Delivered standard interface: MS DOS I/O driver
- Product name: INFOVOX 600, OEM BOARD
- Product description: OEM board built with CMOS IC's.
Language and control program are stored in on-board fixed
memory.
- Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600 Baud.
- Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager.
- Product name: INFOVOX 700, DESKTOP UNIT
- Product description: Desktop unit with built in Infovox 600 to be connected to any computer or terminal via an RS 232-C serial interface. Built in loudspeaker and rechargeable battery for 4 hours use, and control knobs for continuous control of speech volume and speed.
- Platform: any
- Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager
- Product name: INFOVOX 650, OEM BOARD
- Product description: OEM-board built with CMOS IC's. Language and control program are stored in on-board memory.
- Platform: any, Interface: 9 pole D-SUB (RS 232-C) 300-9600 Baud
- Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager
- Product name: INFOVOX 750, DESKTOP UNIT
- Product description: Desktop unit with built in Infovox 650 to be connected to any computer or terminal via an RS 232-C serial interface. Built in loudspeaker and rechargeable battery for 5 hours use, and a control knob for continuous control of speech volume.
- Platform: any
- Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech manager
- Product name: Infovox 210, software for Apple Macintosh
- Product description: Software based text-to-speech conversion. Produces 16 bit and 8 bit sound. Delivered on 3.5" diskettes with user lexicon and complete documentation.
- Platform: Apple Macintosh with minimum 68030, 33 MHz microprocessor.
- Delivered standard interfaces: Standard interface to Apple Speech manager
- Product name: Infovox 220, software for Microsoft Windows.
- Product description: Software based text-to-speech conversion. Produces 16 bit sound and conforms to Microsoft Windows multimedia standard MCI. Delivered on 3.5" diskettes with user lexicon and complete documentation.
- Platform: IBM compatible PC with minimum 486, 25 MHz microprocessor.
- Delivered standard interfaces: Standard interface to Microsoft Windows 3.1 and sound boards supporting Microsoft Windows multimedia driver for audio.
- Contact:
- Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069
S-171 02 Solna, Sweden
Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
email: tts-sales@infovox.se
IPOX: All Prosodic Speech Synthesis Architecture
- Description:
- IPOX is an experimental, all-prosodic speech synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is freely available (after registration) for evaluation and non-profit research purposes.
- Requirements:
- PC (preferably a fast 486) running Windows 3.1 or higher. Sound output requires a 16-bit Windows-compatible sound card
- Availability: By WWW
JSRU
- Platform: UNIX and PC
- Cost:
- 100 pounds sterling (for academic institutions and industry)
- Description:
- A "C" version of the JSRU system, Version 2.3 is available. It's written in Turbo C but runs on most Unix systems with very little modification. A Form of Agreement must be signed to say that the software is required for research and development only.
- Contact:
- Dr. E.Lewis eric.lewis@bristol.ac.uk
Klatt-style synthesiser
- Platform: Unix
- Cost: Free
- Description:
- Software posted to comp.speech in late 1992.
- Availability:
- By ftp from the comp.speech ftp site
KPE80 - A Klatt Synthesiser and Parameter Editor
- Platform: Unix
- Description:
- The KPE80 program provides a graphical interface
for the implementation of the Klatt 1980 formant synthesiser
written by Jon Iles and Nick Ing-Simmons. It was inspired by IGE,
a piece of code written by Rob Fletcher.
- Technical Desc.:
- It is comprised of an X-Window interface and
version 3.03 of the synthesiser code. The interface allows users
to display and edit Klatt parameters using a graphical display
which includes the time-amplitude waveform of both the original
speech and its synthetic copy, and some signal analysis
facilities. Most of the work in choosing the parameter values to
produce the synthetic copy has to be done by the user. KPE will
estimate the fundamental frequency contour from an original token;
this estimate will need to be amended where errors occur. It is
possible to specify the formant trajectories with some precision
by overlaying the appropriate formant frequency parameter tracks
on the spectrogram of the target waveform. A number of facilities
exist to help in the refinement of parameter values: original and
synthetic waveforms can be compared aurally, spectrally, and
spectrographically using built-in speech analysis facilities.
- File formats:
- KPE will read RIFF (.wav) files and SFS files. (SFS is a suite of speech-signal processing programs available
free from Phonetics and Linguistics, UCL.)
- Availability:
- See also: Public domain Klatt-style speech synthesis code.
- Contact: Andrew Simpson
Department of Phonetics and Linguistics, University College London
Wolfson House, 4 Stephenson Way, London NW1 2HE
Email: a.simpson@ucl.ac.uk
WWW page
"learph": Trainable text-to-phoneme software by Antonio Lucca
- Platform: UNIX
- Description: Experimental software which learns text to phoneme translation from examples using decision-tree-like data structures. It is based on the assumption that each letter can correspond to different phoneme strings depending on the context.
- Availability: Examples and source are available on the WWW
- Contact: Antonio Lucca: lucca@ghost.dsi.unimi.it
Lernout & Hauspie Text-to-Speech (3 products)
- Lernout & Hauspie have three TTS products. The functionality of the products is similar; however, they differ in hardware implementation and other details, as described below.
- L&H tts2000/T: TTS for the Telephony and Telecommunications Market
- L&H tts2000/M: TTS for the Computer and Multimedia Market
- L&H tts3000/C: TTS for the Business and Consumer Electronics Market
- Description:
- Text to Speech (TTS) software based on parameterized segment concatenation (diphones, triphones and
tetraphones) algorithms. Available for US English, German, Dutch, French, Spanish (Castilian), Italian and Korean.
- General features include:
- The control of volume, speech rate and speech pitch.
- The use of control sequences to customize TTS output (adding pauses, using phonetic input, etc.).
- Switching between languages at run time.
- A personal vocabulary editor is available for building exception dictionaries.
- Readout modes: letter by letter, word by word or sentence by sentence.
- Input formats: orthographic input, phonetic input, phonetic input with prosodic information.
- tts2000/T
- Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
- Sampling Frequency: 8kHz
- Single channel platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210
- Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
- tts2000/M
- Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
- Sampling Frequency: 8/10/11.025 kHz
- Single processor platform examples: ARM6/ARM7, Intel 386/486/Pentium, Motorola 68040
- Two processor platform examples: {Intel 386/486/Pentium or Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/20C5X}
- tts3000/C
- Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.
- Sampling Frequency: 10kHz
- Single processor platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210
- Two processors platform examples: { SHARP SH7000 or ARM6/ARM7 or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
- See also: L&H Windows TTS SDK
- More Information: on the Lernout & Hauspie WWW pages
- Price: Unknown
- Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com
WWW
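The L&H entries above list 8 bit mu-law PCM among their output formats. For readers unfamiliar with the term, the Python sketch below shows the idea behind mu-law companding: quiet samples are given more of the 8 bit code range than loud ones. It uses the continuous textbook formula with mu = 255 and a simple linear quantiser, which is an assumption of this sketch; it is not the bit-exact segmented encoding of telephony standards such as ITU-T G.711, and it says nothing about how the L&H products implement their output.

```python
import math

MU = 255  # value commonly used for 8 bit mu-law audio


def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1.0, 1.0] to a companded value in [-1.0, 1.0]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def to_8_bit(y):
    """Quantise a companded value in [-1.0, 1.0] to an integer code 0..255."""
    return int(round((y + 1.0) / 2.0 * 255))


for sample in (0.01, 0.1, 0.5, 1.0):
    print(sample, to_8_bit(mu_law_compress(sample)))
```

Running the loop shows that a sample at one hundredth of full scale already lands well away from the midpoint code, which is the reason 8 bits of mu-law audio sound better on speech than 8 bits of linear PCM.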
Lernout & Hauspie Text-to-Speech Windows SDK
- Platform: IBM-Compatible
- Description: The L&H Text-to-Speech software developer's kit lets you integrate text-to-speech technology with your own or existing PC applications under Microsoft Windows 3.1. The software converts written text into clear, human-sounding synthetic speech.
- Requirements:
- IBM-compatible PC 386 DX/33, 8Mb RAM
- MS DOS 5.0 and MS Windows 3.1 (or higher)
- SoundBlaster compatible sound board.
- See also: L&H TTS Products
- More Information: on the Lernout & Hauspie WWW pages
- Price: Unknown
- Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com, WWW page
Macintosh Speech Output Applications
- A comprehensive list of Macintosh Speech Applications is provided by
Kevin Lenzo at CMU
The Apple Speech WWW Site has some useful information
MacinTalk
Platform: Macintosh
Cost: Free
Description: Formant based speech synthesis. There is also a program called "tex-edit" which apparently can pronounce English sentences reasonably well using MacinTalk.
Note: MacinTalk doesn't run reliably on Macintoshes with new sound hardware under the latest OS (System 7.1 w/HUD 2.0). More recent software is listed above.
Availability:
By anonymous ftp from many archive sites (have a look on archie if you can). tex-edit is on many of the same sites.
This article by my friend Denise Lance will give you some ideas on the more modern speech offerings of Apple/Macintosh. When you have finished reading the article (there are some appropriate notes to read) you can also download English_Text-to-Speech from there.
Monologue for Windows from First Byte
- Description:
- Monologue is a software program that reads text
from the clipboard in Windows 16 or 32 bit applications. It can be
found as a bundled product with many sound cards and multimedia
general purpose computer systems. Monologue can add the element of
speech to virtually any text oriented application. Any
pronounceable combination of letters and numbers will be spoken
clearly. It can be applied to tasks such as eyes-free
proofreading, data verification (e.g. spreadsheets), reading
E-mail and more. User-changeable parameters provide control over
the sound quality by allowing for changes in pitch, and the speed
of speech. An exception dictionary saves preferred pronunciation
of words and abbreviations.
Monologue Win32 now includes support for the Microsoft SAPI.
Monologue male "SpeechFonts" are available for US English, British
English, German, French, Latin American Spanish, Italian. A US
English Female SpeechFont is also available.
For more detailed information and examples go to the First Byte
WWW pages.
- Availability: Currently bundled with many sound cards and
multimedia general purpose computer systems. For pricing,
licensing details, and release information see the First Byte WWW
pages or email info@firstbyte.davd.com.
- See also: ProVoice Developer's Speech Toolkit from First Byte
- Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page
Narrator Translator Library
- Platform: Amiga
- Description:
- A replacement for the Commodore-supplied
"translator.library" which is a part of the Narrator speech
synthesis package. It implements multi-lingual text-to-speech for
an Amiga. The library allows the user to specify the language in
which the text is to be spoken. This can be done by
setting the default language or by including markup codes in the
text in a similar way to LaTeX or HTML, e.g. "\french{Bonjour}".
There is currently support for American English, British English,
Swedish, Maori, Finnish, German, Icelandic, Klingon, Polish,
Italian, and Welsh.
- Availability:
- The library (but not source) is available by anonymous ftp from Aminet
- More Information: is available on the WWW
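To make the markup idea concrete, here is a small Python sketch that splits input text into (language, text) segments based on codes of the form "\french{Bonjour}". Only that one example comes from the description above; the regular expression, the function name, the default language and the lack of nesting support are assumptions of this sketch and may not match the library's real markup grammar.

```python
import re

# Assumption: a code is a backslash, a language name, and text in braces.
MARKUP = re.compile(r"\\(\w+)\{([^{}]*)\}")


def split_by_language(text, default_language="english"):
    """Return a list of (language, text) segments in reading order."""
    segments = []
    position = 0
    for match in MARKUP.finditer(text):
        if match.start() > position:
            # Plain text before the code is spoken in the default language.
            segments.append((default_language, text[position:match.start()]))
        segments.append((match.group(1), match.group(2)))
        position = match.end()
    if position < len(text):
        segments.append((default_language, text[position:]))
    return segments


print(split_by_language(r"He said \french{Bonjour} to me"))
```

Each segment could then be handed to the text-to-phoneme rules for its language, with the default language covering everything outside a code.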
Narrator
- Platform: Amiga
- Description:
- Formant based speech synthesis. Includes an English-to-phoneme translation library, and a SPEAK: pseudo-device for speech output.
- Hardware: Standard Amiga hardware
- Availability: Part of AmigaOS
- See Also: The Narrator Translator Library
TextToSpeech Kit
- Platform: NeXT Computers
- Description:
- The TextToSpeech Kit does unrestricted conversion
of English text to synthesized speech in real-time. The user has
control over speaking rate, median pitch, stereo balance, volume,
and intonation type. Text of any length can be spoken, and
messages can be queued up, from multiple applications if desired.
Real-time controls such as pause, continue, and erase are
included. Pronunciations are derived primarily by dictionary
look-up. The Main Dictionary has nearly 100,000 hand-edited
pronunciations which can be supplemented or overridden with the
User and Application dictionaries. A number parser handles numbers
in any form. A letter-to-sound knowledge base provides
pronunciations for words not in the Main or customized
dictionaries. Dictionary search order is under user control.
Special modes of text input are available for spelling and
emphasis of words or phrases. The actual conversion of text to
speech is done by the TextToSpeech Server. The Server runs as an
independent task in the background, and can handle up to 50 client
connections.
- Misc:
- The TextToSpeech Kit comes in two packages: the Developer
Kit and the User Kit. The Developer Kit enables developers to
build and test applications which incorporate text-to-speech. It
includes the TextToSpeech Server, the TextToSpeech Object, the
pronunciation editor PrEditor, several example applications,
phonetic fonts, example source code, and developer documentation.
The User Kit provides support for applications which incorporate
text-to-speech. It is a subset of the Developer Kit.
- Hardware:
- Uses standard NeXT Computer hardware.
- Cost:
- TextToSpeech User Kit: $175 CDN ($145 US)
- TextToSpeech Developer Kit: $350 CDN ($290 US)
- Upgrade from User to Developer Kit: $175 CDN ($145 US)
- Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca
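The pronunciation lookup described for the TextToSpeech Kit (dictionaries searched in a user-controlled order, with a letter-to-sound fallback) can be sketched in Python as follows. This is not the Kit's API: the function names, the dictionary names and the phoneme strings are all invented for illustration.

```python
def pronounce(word, dictionaries, search_order, letter_to_sound):
    """Return (pronunciation, source) for a word.

    dictionaries    -- mapping of dictionary name to {word: pronunciation}
    search_order    -- dictionary names in the order they should be tried
    letter_to_sound -- fallback used when no dictionary contains the word
    """
    key = word.lower()
    for name in search_order:
        entry = dictionaries.get(name, {}).get(key)
        if entry is not None:
            return entry, name
    return letter_to_sound(key), "letter-to-sound"


# Hypothetical data: these phoneme strings are made up for the example.
dictionaries = {
    "user": {"tomato": "t ah m aa t ow"},
    "application": {},
    "main": {"tomato": "t ah m ey t ow", "speech": "s p iy ch"},
}


def spell_out(word):
    """Toy stand-in for a letter-to-sound knowledge base."""
    return " ".join(word)


order = ["user", "application", "main"]
print(pronounce("Tomato", dictionaries, order, spell_out))
print(pronounce("speech", dictionaries, order, spell_out))
print(pronounce("xyz", dictionaries, order, spell_out))
```

Putting the user dictionary first lets a user entry override the main dictionary, and reordering the list changes which entry wins without touching the dictionaries themselves.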
Orator Text-to-Speech Synthesizer
- Platform: SUN SPARC, Decstation 5000. Written in C, and therefore portable to other UNIX platforms. Some successful ports: HP, RS-6000, PC-Unix [Linux].
- Description:
- Sophisticated speech synthesis package. Has text
preprocessing (for abbreviations, numbers), acronym rules, and
human-like spelling routines. Natural-sounding synthesis based on
demisyllable concatenation. Has high accuracy for pronunciation of
names of people, places and businesses in America; good accuracy
for English text; rules for stress and intonation marking; various
methods of user control and customization at most stages of
processing.
A new version of the ORATOR system is under development. Both
ORATOR and this new "ORATOR II" system are capable of general text
synthesis. The ORATOR II system has a more natural-sounding voice.
- Hardware: Runs on common SPARC or Decstation workstations, using
their internal audio output capability. At least 16 MB of memory
is recommended.
- More detailed information plus examples of ORATOR synthesis
are available on the ORATOR WWW pages
- Misc 1: A free demo cassette is available.
- Misc 2: Examples of Orator are also available on the University of Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
- Availability and Pricing: Contact Bellcore's Licensing Office
Tel: 1-800-521-CORE (521-2673)
Fax: 1-908-336-2559
Email to Anthony Lindsey: alin1@panix.com
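Illustration (text preprocessing): the ORATOR description above mentions preprocessing for abbreviations and numbers. The Python sketch below shows the general idea of such a normalization pass. It is not Bellcore's method: the abbreviation table, the function names, and the 0 to 999 number range are assumptions chosen to keep the example short.

```python
# Illustrative sketch only: the table and function names are hypothetical.

ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "ave.": "avenue"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text):
    """Expand known abbreviations and small whole numbers, token by token."""
    out = []
    for token in text.split():
        low = token.lower()
        if low in ABBREVIATIONS:
            out.append(ABBREVIATIONS[low])
        elif token.isdecimal() and int(token) < 1000:
            out.append(number_to_words(int(token)))
        else:
            out.append(token)  # leave anything unrecognized untouched
    return " ".join(out)
```

A production system would also need context to disambiguate abbreviations ("St." as "street" or "saint", for example), which this sketch does not attempt.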
PAM - A Text-To-Speech Application
- Platform: Windows
- Description:
- PAM is a talking personal assistant and text reader application. It uses the ProVoice TTS package. PAM will verbally advise about appointments and reminder messages at specified times during the day. It can read text files, clipboard text, and text sent in DDE messages. Using the full verbal interface, PAM can be
used by visually challenged individuals. Shareware - thirty day free trial.
- Requirements: Any Windows sound card, speakers or headphones.
- Min. memory - 4 megs, 8 megs recommended.
- A more complete description is available on the JTS homepage
- Availability:
- The shareware and associated files can be downloaded by ftp
- Price: $US40 for the registered version.
- Contact: Tom Slemko: tslemko@islandnet.com
- JTS Micro Consulting Ltd
10931 Lytton Road, RR#4
Ladysmith, B.C., Canada, V0R 2E0
ProVerbe Speech Engine for Windows (95 and NT)
- Description: The ProVerbe Speech Engine produces natural-sounding
speech from written text. Naturalness is achieved by using the
TD-PSOLA process from the CNET (France Telecom's research lab),
which is based on the concatenation of elementary speech units
(including diphones). Supported languages are British English,
German, French and Spanish. For multi-channel applications Elan
Informatique also provides hardware platforms. Elan Informatique
provides an SDK reference document (sdken.exe: WinWord6 format in
a self-extracting compressed format).
- Demo versions:
- The directory includes the following demos.
- PVBSEDP.zip: French male voice (4.3MB)
- PVBFRF.zip: French female voice (4.6MB)
- PVBSPA.zip: Spanish male voice (4.6MB)
- PVBGER.zip: German male voice (14.0MB)
- PVBENG.zip: English male voice (9.9MB)
- The directory also includes synthesis samples for a French male voice, French female voice, English male voice, and a German male voice. The readme file in the directory describes the memory requirements
for the demos.
- A CD-ROM with all these demonstrations is available. To request
it, please email Elan Informatique.
- Contact: Elan Informatique
4 rue Jean Rodier, 31400 TOULOUSE FRANCE
Contact person: Pierre Delrat
Phone: +33-61-36-0777 Fax: +33-61-36-0770
BBS: +33-61-36-0788
E-mail: 101346.465@compuserve.com
Anonymous FTP
ProVoice Developer's Speech Toolkit from First Byte
- Platform: ProVoice Developer's Toolkits are available for DOS,
Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
- Description:
- ProVoice allows programmers to add synthesized
speech to their applications. Your program passes text strings to
the ProVoice speech engine, which translates text into audible
speech. Male and/or female "SpeechFonts" are available for several
languages: English, French, German, British English, Italian,
and Spanish.
ProVoice converts text to speech in two phases using a set of
phonetic translation and pronunciation rules. First, the software
analyzes and translates text into "sound descriptors", a phonetic
language with pitch, duration, and amplitude codes which are
needed to produce stress patterns in phrases and sentences. Rules
are used to analyze words, numbers, and punctuation. The second
phase converts the intermediate phonetic language into speech
signals; algorithms turn distinct speech signals into smooth-flowing,
continuous, clear speech. Real-time synchronization of
mouth movement and word boundaries allows animation of a graphical
talking character, or highlighting of displayed text as it is
spoken.
Necessary tools and examples are provided for programmers to
manipulate the ProVoice speech technology, including installation
instructions, extensive sample programs, and complete
documentation. In addition, sample code is provided on disk to
illustrate speech programming techniques.
- Note 1: First Byte will perform custom work for embedded systems.
- Note 2: ProVoice Windows includes support for the Microsoft SAPI. It will speak through any Windows-supported wave audio device.
- Note 3: Distribution of ProVoice for commercial use is subject to execution of a Commercial Product Distribution License Agreement.
- For more detailed information and examples go to the First
Byte WWW page.
- See also: Monologue for Windows from First Byte
- Price and Availability:
- Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610, Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page.
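Illustration (two-phase conversion): the ProVoice description above outlines a first phase that turns text into "sound descriptors" carrying pitch, duration, and amplitude codes, and a second phase that turns those descriptors into a signal. The Python sketch below mirrors only that two-phase structure. It is not First Byte's format or algorithm: the SoundDescriptor fields, the toy stress rule, and the use of plain sine tones in place of real speech sounds are all assumptions.

```python
# Illustrative sketch only: the descriptor layout and rules are hypothetical,
# and phase 2 emits sine tones as a placeholder for real speech sounds.
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # Hz; an arbitrary choice for this sketch


@dataclass
class SoundDescriptor:
    symbol: str        # a letter standing in for a phonetic symbol
    pitch_hz: float
    duration_s: float
    amplitude: float   # 0.0 to 1.0


def analyze(text):
    """Phase 1: turn text into descriptors using a toy stress rule."""
    descriptors = []
    for word in text.split():
        # Toy rule: words ending in "!" or "?" get higher pitch and amplitude.
        stressed = word.endswith(("!", "?"))
        for ch in word:
            if ch.isalpha():
                descriptors.append(SoundDescriptor(
                    symbol=ch.lower(),
                    pitch_hz=140.0 if stressed else 110.0,
                    duration_s=0.08,
                    amplitude=0.9 if stressed else 0.6,
                ))
        # A short silence marks the word boundary.
        descriptors.append(SoundDescriptor("_", 0.0, 0.05, 0.0))
    return descriptors


def render(descriptors):
    """Phase 2: turn descriptors into a list of audio samples."""
    samples = []
    for d in descriptors:
        count = int(round(d.duration_s * SAMPLE_RATE))
        for i in range(count):
            phase = 2.0 * math.pi * d.pitch_hz * i / SAMPLE_RATE
            samples.append(d.amplitude * math.sin(phase))
    return samples
```

Because the descriptors carry explicit durations, a caller can compute when each word boundary will be reached in the output, which is the kind of information needed for the text highlighting or character animation mentioned in the description.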
RC Systems V8600/V8601 Text to Speech synthesizers
- Platform 1: IBM PC: ISA card.
- Platform 2: Interface to PC/104 standard microcontrollers.
- Platform 3: Standalone (or embedded) thru RS232 or parallel
printer port or processor bus.
- Description: Converts plain ASCII text to speech. Programmable
voices, pitch, rate, volume, etc. Built-in DTMF and tone
generators.
- Price: $151-$299 US (qty 1)
- Contact: RC Systems
1609 England Avenue, Everett, WA 98203, USA
Ph: (206) 355-3800 Fax: (206) 355-1098
Europe: +44 181 539-0285
rsynth
- Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
Irix4.x, Linux)
- Description: Public domain text-to-speech system assembled from a
variety of sources. It supports CMU and BEEP format dictionaries
(as described in Q1.10) and now utilises stress marks in the
dictionary in synthesising intonation.
- Price: Free
- Misc: Axel Belinfante has implemented a WWW rsynth demo
- Availability: anonymous ftp #1 or anonymous ftp #2
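Illustration (dictionary stress marks): the rsynth description above mentions using stress marks from the dictionary. In CMU-style entries, a vowel phoneme carries a trailing digit (0 for no stress, 1 for primary, 2 for secondary). The Python sketch below reads such a line and extracts the stress pattern. It is not taken from rsynth: the function names are hypothetical and the sample line is written in the CMU style for illustration.

```python
# Illustrative sketch only: function names are hypothetical, not rsynth's.

def parse_cmu_entry(line):
    """Split a CMU-style line into the word and (phoneme, stress) pairs.

    Stress is an int for vowels and None for consonants, which carry no digit.
    """
    word, *phones = line.split()
    parsed = []
    for phone in phones:
        if phone[-1].isdigit():
            parsed.append((phone[:-1], int(phone[-1])))
        else:
            parsed.append((phone, None))
    return word, parsed


def stress_pattern(parsed):
    """Return the stress digits of the vowels, in order."""
    return [stress for _, stress in parsed if stress is not None]


word, phones = parse_cmu_entry("SYNTHESIS  S IH1 N TH AH0 S AH0 S")
pattern = stress_pattern(phones)
```

A synthesizer could then raise pitch or lengthen duration on the syllable whose vowel is marked 1, which is one plausible way to use the marks when generating intonation.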
SENSYN speech synthesizer
- Platform: PC, Mac, Sun, and NeXT
- Rough Cost: $300
- Description:
- This formant synthesizer produces speech waveform
files based on the (Klatt) KLSYN88 synthesizer. It is intended for
laboratory and research use. Note that this is NOT a
text-to-speech synthesizer, but creates speech sounds based upon a
large number of input variables (formant frequencies, bandwidths,
glottal pulse characteristics, etc.) and would be used as part of
a TTS system. Includes full source code.
- Availability: Sensimetrics Corporation
64 Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com
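Illustration (formant synthesis): the SENSYN description above refers to creating speech sounds from formant frequencies and bandwidths. The Python sketch below shows the second-order digital resonator commonly used in Klatt-style formant synthesizers, applied in cascade to an impulse train. It is not SENSYN or KLSYN88 source code: the function names, the sample rate, and the formant values in the usage note are assumptions, and a real synthesizer has many more parameters (glottal source shape, parallel branches, noise sources, and so on).

```python
# Illustrative sketch only: a bare cascade of resonators, not KLSYN88 itself.
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Very crude voicing source: one unit impulse per pitch period."""
    period = int(sample_rate / f0_hz)
    count = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(count)]


def synthesize_vowel(formants, f0_hz=100.0, duration_s=0.2, sample_rate=10000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal
```

For example, synthesize_vowel([(700, 130), (1220, 70), (2600, 160)]) returns 2000 samples of a rough vowel-like buzz; the three formant pairs are illustrative values only. Changing the frequencies shifts the vowel quality, which is the sense in which such a synthesizer is driven by "a large number of input variables".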
SGI Developers Toolbox Synthesiser
- Platform: SGI
- Description: The SGI Developer Toolbox 4.0 CDROM contains a
basic public domain text-to-speech program in the publics/speak
directory. The directory includes man pages and source.
- Availability: on the SGI Developer Toolbox 4.0 CDROM
SIMTEL
A wide range of speech related software, sound-blaster software and
signal processing software for PCs is available on SimTel and its
mirror sites. It can be obtained by ftp from:
Note: Voicemaker - The archives include the program Voicemaker which synthesises speech.
GOOD HUNTING AND ENJOY!