espeak-ng Text to Speech

Tanmay
Sep 24, 2021

eSpeak is an open-source text-to-speech (TTS) engine: speech synthesis software that converts text to audio. It supports a large number of languages, and further languages can be developed with espeakedit, a GUI tool for preparing and compiling phoneme data. On Windows, eSpeak implements Microsoft SAPI (Speech API). Besides Windows, it runs on Mac and Linux.

eSpeak does text to speech synthesis for the following languages, some better than others: Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

Lazarus (Free Pascal) offers several ways to drive eSpeak. The simplest is a sysutils.ExecuteProcess() call that runs eSpeak with command-line parameters (discussed below), but this can leave a console window visible while the text is spoken. A better solution is to run the commands through a TProcess, with poUsePipes included in its Options and its ShowWindow property set to swoHide.
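The same pattern (launch eSpeak as a child process, capture its output through pipes, and keep any console window hidden) is not specific to Pascal. Below is a minimal sketch of it in Python, not a port of any Lazarus code. The helper names `espeak_argv` and `speak` are made up for illustration, and the sketch assumes the executable is called `espeak` and is on the PATH.

```python
import shutil
import subprocess


def espeak_argv(text, executable="espeak"):
    """Build the argument list for the simplest call: espeak "text"."""
    return [executable, text]


def speak(text, executable="espeak"):
    """Run eSpeak and wait for it to finish.

    Returns the process exit code, or None when the executable
    cannot be found on the PATH.
    """
    if shutil.which(executable) is None:
        return None
    # CREATE_NO_WINDOW exists only on Windows; 0 means "no flags" elsewhere.
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    result = subprocess.run(
        espeak_argv(text, executable),
        stdout=subprocess.PIPE,  # capture output instead of showing it
        stderr=subprocess.PIPE,
        creationflags=flags,
    )
    return result.returncode
```

Calling `speak("Hello World!")` should speak the text if eSpeak is installed. Passing the arguments as a list avoids shell quoting problems with text that contains spaces or quotes.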

FEATURES

  • Includes different voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • SSML (Speech Synthesis Markup Language) is partially supported, as is HTML.
  • Compact size. The program and its data, including many languages, total about 2 Mbytes.
  • Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
  • Development tools are available for producing and tuning phoneme data.
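Two of the features above, WAV output and translation to phoneme codes, map onto command-line flags. The sketch below only builds the argument lists. The flags `-w` (write to a WAV file), `-q` (no sound) and `-x` (print phoneme mnemonics) come from eSpeak's command-line help rather than from this article, and the helper names are hypothetical.

```python
def wav_command(text, wav_path, executable="espeak"):
    """Arguments that write the speech to wav_path instead of playing it."""
    return [executable, "-w", wav_path, text]


def phoneme_command(text, executable="espeak"):
    """Arguments that print phoneme mnemonics without producing sound."""
    return [executable, "-q", "-x", text]


# Print the commands rather than running them, so this works
# even on a machine without eSpeak installed.
print(" ".join(wav_command("Hello", "hello.wav")))
print(" ".join(phoneme_command("Hello")))
```

Either list can be handed to `subprocess.run` once eSpeak is installed.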

Using eSpeak command line

These are some simple examples of using the eSpeak command line. To try them, download and install eSpeak, open the command prompt, run cd C:\Program Files\eSpeak\command_line\ and then run the commands below:

espeak "Hello World!"

This is the simplest command. Speaks “Hello World!”

espeak -v +f2 "Hello World!"

Speaks the text in a female voice (hence +f2). There are 7 male voice variants (m1 to m7) and 4 female voice variants (f1 to f4).

espeak -v fr+f2 "Bonjour tout le monde"

Speaks the text with the French voice (fr) combined with a female voice variant (+f2).
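The -v argument in these examples is a language code, a variant prefixed with +, or both joined together. A small sketch of that composition follows; `voice_spec` and `voice_command` are hypothetical helper names, not part of eSpeak.

```python
def voice_spec(language="", variant=""):
    """Join a language code and a variant into a -v value such as 'fr+f2'."""
    return language + ("+" + variant if variant else "")


def voice_command(text, language="", variant="", executable="espeak"):
    """Build the argument list, adding -v only when a voice was requested."""
    argv = [executable]
    spec = voice_spec(language, variant)
    if spec:
        argv += ["-v", spec]
    argv.append(text)
    return argv
```

For instance, `voice_command("Bonjour tout le monde", "fr", "f2")` yields the same arguments as the French example above, and `voice_spec(variant="whisper")` gives `+whisper`.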

espeak -g 10 "I have something to say."

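Adds a pause between words. The -g value is given in units of 10 milliseconds at the default speed, so -g 10 is a gap of about 100 milliseconds. This makes the speech easier to understand.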

espeak -s 400 "I have something to say."

Speaks quickly! The -s (speed) parameter is given in words per minute and can range from 80 to 450. The default is 175.
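When these options are generated by a program, it helps to validate them before calling eSpeak. The sketch below checks the 80 to 450 speed range quoted above and converts a gap in milliseconds to the -g value. It assumes, following eSpeak's help text, that -g counts in units of 10 ms at the default speed. The function name `timing_options` is hypothetical.

```python
def timing_options(speed_wpm=175, gap_ms=0):
    """Return the -s / -g arguments for a given speed and word gap."""
    if not 80 <= speed_wpm <= 450:
        raise ValueError("speed must be between 80 and 450 words per minute")
    if gap_ms < 0:
        raise ValueError("gap must not be negative")
    options = ["-s", str(speed_wpm)]
    if gap_ms:
        # Assumption: -g is expressed in units of 10 ms.
        options += ["-g", str(round(gap_ms / 10))]
    return options
```

The returned list can be placed between the executable name and the text, for example `["espeak"] + timing_options(400) + ["I have something to say."]`.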

espeak -v +whisper "I have something secret to say!"

Speaks the text as if in your ear!! (Notice the +whisper part.)

The espeak-ng manual page lists all the options of this command, with descriptions and examples:

man espeak-ng
