text to speech software linux

eSpeak: Text To Speech Tool For Linux

eSpeak is a command line tool for Linux that converts text to speech. This compact speech synthesizer provides support for English and many other languages. It is written in C.

eSpeak reads the text from the standard input or input file. The voice generated, however, is nowhere close to a human voice. But it is still a compact and handy tool if you want to use it in your projects.

Some of the main features of eSpeak are:

Speaks text from a file or from stdin
Shared library version to be used by other programs
SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface
Ported to other platforms, including Android, Mac OSX etc.
Several voice characteristics to choose from
Speech output can be saved as .WAV file
SSML (Speech Synthesis Markup Language) is partially supported, along with HTML
Uses a “formant synthesis” method. This allows many languages to be provided in a small size.
Tiny in size: the complete program, including language support, is under 2 MB.
Can translate text into phoneme codes so that it could be adapted as a front end for another speech synthesis engine.
Development tools are available for producing and tuning phoneme data
Supports several languages; however, in many cases these are initial drafts and need more work
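eSpeak's -m option tells it to interpret SSML markup in its input. Below is a minimal Python sketch. build_ssml_command is a hypothetical helper, not part of eSpeak. It uses only the <speak> and <break> elements because eSpeak's SSML support is partial. The sketch escapes the text and wraps it in markup before passing it to espeak:

```python
from xml.sax.saxutils import escape


def build_ssml_command(sentences, pause_ms=500):
    """Return an espeak argument list that speaks sentences separated by pauses.

    Hypothetical helper: each sentence is XML-escaped (so "&" or "<" in the
    text cannot break the markup), joined with <break> elements, wrapped in
    <speak>, and handed to espeak with -m so the markup is interpreted.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    return ["espeak", "-m", f"<speak>{body}</speak>"]


if __name__ == "__main__":
    # Print the command instead of running it; pass it to subprocess.run to speak.
    print(build_ssml_command(["Hello & welcome.", "This is eSpeak."]))
```

If espeak is installed, passing the returned list to subprocess.run speaks the sentences with a half-second pause between them.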

Install eSpeak

To install eSpeak on an Ubuntu-based system, run this command in a terminal: sudo apt install espeak

eSpeak is an old tool and I presume that it should be available in the repositories of other Linux distributions such as Fedora. You can install eSpeak easily using the respective package manager.

In the case of Arch Linux, the repository has espeak-ng in its place, which is described in the next section.

To use eSpeak, enter espeak in the terminal. It waits for input. You can start typing your text. When you press enter (new line), you can hear the text you had entered.

You can continue adding lines of text to hear them read out. Use Ctrl+C to close the running program.
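Because espeak reads text from standard input, a program can feed it lines the same way you would type them at the prompt. Below is a minimal Python sketch, assuming espeak is on the PATH. speak_lines is a hypothetical name. The command is a parameter, so any other program that reads stdin can stand in for espeak:

```python
import subprocess


def speak_lines(lines, command=("espeak",)):
    """Send lines to the command's standard input, as if typed at the prompt.

    subprocess.run writes the text and then closes stdin; that end of input
    finishes the session, so no Ctrl+C is needed.
    """
    text = "".join(line.rstrip("\n") + "\n" for line in lines)
    return subprocess.run(
        list(command), input=text, text=True, capture_output=True, check=True
    )
```

For example, speak_lines(["First line.", "Second line."]) speaks both lines with a single espeak process.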

There are several other options available. You can browse through them in the program's help section, which espeak --help displays.


GUI Version: espeakedit

If you prefer the GUI version over the command line, you can install espeakedit which provides a GTK front end to eSpeak.

Use this command to install espeakedit: sudo apt install espeakedit

Once it is installed, you need to copy the data in /usr/lib/x86_64-linux-gnu/espeak-data/ to your home directory. To do this, open a terminal and run: cp -r /usr/lib/x86_64-linux-gnu/espeak-data ~/

Once done, you can open the espeakedit application.

You can enter the text in the field provided and press Speak to start. You can also save the speech as a .WAV file and listen to it later.

The interface is straightforward and easy to use. You can explore the submenus and functions all by yourself.

A New Tool: eSpeak NG

eSpeak NG is a compact open-source text-to-speech synthesizer based on the eSpeak engine created by Jonathan Duddington.

It offers the features of eSpeak and is in active development. The project also provides a separate espeak-ng-data package, to avoid conflict with the espeak-data package offered by eSpeak project.

To install it on Ubuntu, run: sudo apt install espeak-ng

The new eSpeak NG project is a significant departure from the eSpeak project, aiming to clean up the existing codebase, add new features, and add to and improve the supported languages.

Also, it is important to note that espeakedit GUI is not part of this new project.

Some of the notable features:

Uses the same command-line options as espeak with several additions.
Provides new functionality such as specifying the output audio device name to use.
Has been ported to other platforms, including Solaris and Mac OSX.
Includes different voices whose characteristics can be altered.
Available as a command-line program for Linux and Windows to speak text from a file or from stdin.
Available as a shared library version for use by other programs.
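Since eSpeak NG keeps espeak's command-line options, a script can prefer it when it is installed and fall back to classic espeak otherwise. Below is a minimal Python sketch. pick_engine and speak_command are hypothetical names. The -d device option is assumed from eSpeak NG's help text, so check espeak-ng --help on your system:

```python
import shutil


def pick_engine(which=shutil.which):
    """Return "espeak-ng" if installed, else "espeak", else None."""
    for name in ("espeak-ng", "espeak"):
        if which(name):
            return name
    return None


def speak_command(text, device=None, which=shutil.which):
    """Build a command for the preferred engine, optionally naming an audio device."""
    engine = pick_engine(which)
    if engine is None:
        raise FileNotFoundError("neither espeak-ng nor espeak is on the PATH")
    cmd = [engine]
    if device is not None:
        if engine != "espeak-ng":
            raise ValueError("selecting an output device requires espeak-ng")
        cmd += ["-d", device]  # assumed eSpeak NG option: -d <device>
    cmd.append(text)
    return cmd
```

The which parameter exists so the selection logic can be exercised without either engine installed.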

Wrapping Up

On It’s FOSS, we use Play.ht to provide audio formats of selected articles. The espeak tools are not as good as the professional AI tools.

However, if you want something basic and free to be used in your project, you can give it a try.

Abhishek Prakash

Created It's FOSS 11 years ago to share my Linux adventures. Have a Master's degree in Engineering and years of IT industry experience. Huge fan of Agatha Christie detective mysteries 🕵️‍♂️



An In-Depth Guide to Open Source Text-to-Speech Engines for Linux

This comprehensive guide explores the top open source text-to-speech (TTS) engines available for Linux. Converting text into lifelike speech is useful for accessibility, delivering information via voice interfaces, learning pronunciation, and more. We’ll cover the capabilities of leading Linux TTS tools, their installation, and plenty of usage examples.

Introduction to Text-to-Speech

Text-to-speech (TTS) is the artificial production of human speech from written text. TTS engines ingest text, process it through natural language pipelines, and output synthesized audio speech. The quality of TTS systems is determined by how natural and humanlike the generated voices sound.

TTS has many practical use cases:

Improving accessibility for vision-impaired users
Reading text aloud when eyes-free use is needed, such as while driving
Delivering information over voice interfaces or phone systems
Assisting with learning languages and proper pronunciation
Converting documents to audiobook format
Adding speech output to applications by leveraging TTS APIs

High-quality voices require sophisticated deep learning algorithms. Most modern TTS engines utilize machine learning trained on huge datasets of recorded human speech.

In this guide, we’ll focus on open source command line utilities for performing TTS on Linux. Let’s look at some of the best options.

eSpeak – Lightweight Open Source TTS

eSpeak is an open source text-to-speech engine by Jonathan Duddington. It grew out of a synthesizer he wrote for Acorn RISC OS starting in 1995, and the Linux version appeared in 2006. It supports over 70 languages and accents and is highly configurable for adjusting speech parameters.

eSpeak is lightweight and designed to be portable across many systems. It comes bundled with many Linux distributions due to being open source (GPLv3 license). The voices tend to sound robotic but the speech is clear and works well.

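To install on Debian/Ubuntu, run: sudo apt install espeak

On Arch Linux, the repositories provide eSpeak NG instead: sudo pacman -S espeak-ng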

Basic usage is simple. To speak a piece of text, run: espeak "Hello world"

To read a file aloud, run: espeak -f file.txt
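Both forms map directly onto command arguments: the text is passed as a positional argument, or a file is passed with -f. Below is a minimal Python sketch that builds either command for subprocess.run. read_aloud_command is a hypothetical helper:

```python
def read_aloud_command(text=None, path=None):
    """Build an espeak command that speaks either a string or a text file."""
    if (text is None) == (path is None):
        raise ValueError("pass exactly one of text or path")
    if path is not None:
        return ["espeak", "-f", str(path)]  # -f: read the text from a file
    return ["espeak", text]                 # text given as a positional argument
```

Building an argument list rather than a shell string avoids quoting problems when the text contains quotes or spaces.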

Let’s go through some ways to customize and control eSpeak’s voices.

To list all available voices, run: espeak --voices

This prints out a table summarizing each voice’s language, gender, name, and file identifier.
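The voice table can also be parsed so a script can pick a voice programmatically. Below is a minimal Python sketch. Voice, parse_voices, and list_voices are hypothetical names. The sketch assumes whitespace-separated columns in the order Pty, Language, Age/Gender, VoiceName, File. espeak and espeak-ng differ slightly here; for example, espeak-ng prints the gender as --/M:

```python
import subprocess
from typing import NamedTuple


class Voice(NamedTuple):
    priority: int
    language: str
    gender: str
    name: str
    file: str


def parse_voices(listing):
    """Parse the table printed by `espeak --voices` into Voice records."""
    voices = []
    for line in listing.splitlines()[1:]:  # the first line is the column header
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        priority, language, age_gender, name, file = parts[:5]
        gender = age_gender.split("/")[-1]  # "M" and "--/M" both become "M"
        voices.append(Voice(int(priority), language, gender, name, file))
    return voices


def list_voices(command="espeak"):
    """Run `<command> --voices` and parse its output."""
    result = subprocess.run([command, "--voices"], capture_output=True, text=True, check=True)
    return parse_voices(result.stdout)
```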

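For example, to set the voice to US English, run: espeak -v en-us "Hello"

Adjust the speech rate, in words per minute (the default is 175), with the -s flag: espeak -s 140 "Hello"

The pitch, from 0 to 99 (the default is 50), can be adjusted with -p: espeak -p 70 "Hello"

To save audio output to a file instead of playing it, use -w: espeak -w hello.wav "Hello"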
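The voice, rate, and pitch flags can be combined in one call, along with -g for the gap between words and -w for WAV output. Below is a minimal Python sketch. espeak_command is a hypothetical helper. It checks pitch against the 0 to 99 range eSpeak documents and otherwise passes values through unchanged:

```python
def espeak_command(text, voice=None, speed=None, pitch=None, word_gap=None, wav_path=None):
    """Build an espeak argument list from optional voice settings."""
    cmd = ["espeak"]
    if voice:
        cmd += ["-v", voice]  # voice name, e.g. "en-us"
    if speed is not None:
        if speed <= 0:
            raise ValueError("speed must be a positive number of words per minute")
        cmd += ["-s", str(speed)]  # words per minute (default 175)
    if pitch is not None:
        if not 0 <= pitch <= 99:
            raise ValueError("pitch must be between 0 and 99")
        cmd += ["-p", str(pitch)]  # default 50
    if word_gap is not None:
        cmd += ["-g", str(word_gap)]  # pause between words, in units of 10 ms
    if wav_path is not None:
        cmd += ["-w", str(wav_path)]  # write a WAV file instead of playing
    cmd.append(text)
    return cmd
```

For example, espeak_command("Hello", voice="en-us", speed=140, pitch=70, wav_path="hello.wav") combines all four settings from the examples above into one call.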

This saves a WAV audio file that can be played in media players. eSpeak itself writes only WAV. For formats such as .mp3 or .ogg, convert the WAV output with a separate encoder such as ffmpeg.
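To check what -w produced, Python's standard wave module can read the file's header. Below is a minimal sketch that reports the channel count, sample rate, and duration. wav_info is a hypothetical helper:

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) for a WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate
```

For example, wav_info("hello.wav") on a file written by espeak -w returns its channel count, sample rate, and length in seconds.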

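In addition to these common uses, eSpeak provides phoneme support for precise pronunciation. espeak -x prints the phoneme mnemonics for a text, and mnemonics enclosed in [[ ]] in the input are spoken as written.

It also offers a C API, through its shared library, for integrating TTS directly into applications. The API can be used from C and C++, and third-party bindings exist for Python and other languages.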
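eSpeak's -x option prints the phoneme mnemonics it would use for a text, and -q suppresses the audio. Mnemonics placed inside [[ ]] in the input are spoken as written. Below is a minimal Python sketch of that round trip. The helper names are hypothetical, and the mnemonic string in the usage note is only illustrative:

```python
import subprocess


def phoneme_command(text):
    """espeak -q -x: print phoneme mnemonics for text without speaking it."""
    return ["espeak", "-q", "-x", text]


def phonemes_of(text):
    """Run eSpeak and return the mnemonics it prints (requires espeak installed)."""
    result = subprocess.run(phoneme_command(text), capture_output=True, text=True, check=True)
    return result.stdout.strip()


def with_phonemes(before, mnemonics, after=""):
    """Embed mnemonics in input text; eSpeak speaks text inside [[ ]] as phonemes."""
    return f"{before} [[{mnemonics}]] {after}".strip()
```

A typical workflow is to call phonemes_of("hello"), adjust the printed mnemonics, and speak the result with something like with_phonemes("Say", "h@l'oU").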

Overall, eSpeak provides a capable open source text-to-speech system on Linux. The voices aren’t as human sounding as some commercial options, but it’s free, customizable, lightweight, and easy to use.

Festival – Framework for Building TTS Voices

Festival is another leading open source text-to-speech system originally developed at the University of Edinburgh and released in 1997.

Festival utilizes a modular framework for building synthetic voices. It comes packaged with several English voices and support for Spanish, Welsh, and other languages. Festival is well-suited for research and education purposes.

Install Festival using your Linux distribution’s package manager. On Debian/Ubuntu, run: sudo apt install festival

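Some example usage: echo "Hello world" | festival --tts speaks text read from stdin, and festival --tts file.txt reads a text file aloud.

Festival includes an interactive shell for experimenting with speech synthesis. Start it with festival and enter Scheme commands such as (SayText "Hello world") at the prompt. This allows modifying parameters on the fly.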
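Festival reads text from standard input with festival --tts, and its interactive shell speaks Scheme expressions such as (SayText "Hello"). Below is a minimal Python sketch of both. festival_say_expression and festival_tts are hypothetical names. The string escaping follows Scheme's rules for double quotes and backslashes:

```python
import subprocess


def festival_say_expression(text):
    """Return a (SayText "...") expression for Festival's interactive shell."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")'


def festival_tts(text, command=("festival", "--tts")):
    """Pipe text into `festival --tts`, which speaks what it reads from stdin."""
    return subprocess.run(list(command), input=text, text=True, check=True)
```

The escaping matters when the spoken text itself contains quotation marks. Without it, the expression would end early.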

Alongside Festival, the FestVox project from Carnegie Mellon University provides tools for building new synthetic voices and languages for it.

For basic usage, Festival has clear text-to-speech capabilities but sounds robotic. The option to build custom voices is useful for research. However, modern TTS technology has surpassed Festival’s voice quality.

Pico TTS – Optimized Small Footprint Engine

Pico TTS is an open source project to create a small footprint text-to-speech engine optimized for embedded Linux.

The engine itself is written in C and comes packaged in many Linux distributions. It was developed by SVOX and shipped as part of Android, and it is licensed under the Apache License 2.0.

On Debian/Ubuntu, the pico2wave command is in the libttspico-utils package: sudo apt install libttspico-utils

Pico TTS supports English, Spanish, French, German, and Italian voices. Since it’s designed for small systems, the quality is surprisingly good for the small resource requirements.

To synthesize text and save it as a WAV file, run: pico2wave -l en-US -w output.wav "Hello world"

Here -l specifies the language code like en-US for US English.

pico2wave cannot write audio to stdout; it only writes WAV files. But the WAV output works well for offline usage.
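Since pico2wave only writes WAV files and accepts a fixed set of language codes, a wrapper can check both before running it. Below is a minimal Python sketch. pico2wave_command is hypothetical. The language list assumes the six voices shipped with SVOX Pico (en-US, en-GB, de-DE, es-ES, fr-FR, it-IT):

```python
PICO_LANGUAGES = ("en-US", "en-GB", "de-DE", "es-ES", "fr-FR", "it-IT")


def pico2wave_command(text, wav_path, language="en-US"):
    """Build a pico2wave command, rejecting unsupported languages and non-WAV paths."""
    if language not in PICO_LANGUAGES:
        raise ValueError(f"unsupported language {language!r}; choose from {PICO_LANGUAGES}")
    if not str(wav_path).lower().endswith(".wav"):
        raise ValueError("pico2wave writes WAV audio, so use a .wav file name")
    return ["pico2wave", "-l", language, "-w", str(wav_path), text]
```

Failing early in Python gives a clearer error than an empty or misnamed output file.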

In summary, Pico TTS provides a capable text-to-speech engine optimized for embedded Linux applications like the Raspberry Pi. For desktop use, other options might be higher quality. But as a small footprint engine, Pico TTS works quite well.

gTTS – Leveraging Google’s TTS API

gTTS provides a command line interface and Python library for Google Translate’s text-to-speech API. It’s an easy way to access Google’s speech synthesis. Note that it needs an internet connection, because it is a client for an online service rather than an offline engine.

gTTS can be installed with pip: pip install gTTS

Some Linux distributions also package gTTS; check your distribution's package manager.

Basic usage: gtts-cli "Hello world" --output hello.mp3

This saves the synthesized audio to an MP3 file.

To read a text file aloud, pass it with -f and pipe the MP3 output to a player: gtts-cli -f file.txt | mpg123 -

gTTS supports dozens of languages, using natural sounding voices provided by Google. To print all the available language codes, run: gtts-cli --all

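For example, to get English with a US accent, combine the language code en with Google's .com domain: gtts-cli --lang en --tld com "Hello" --output hello.mp3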
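The same options are available from Python, either by building a gtts-cli command or through the library's gTTS class. Below is a minimal sketch. gtts_cli_command and save_with_library are hypothetical names. Both need the gTTS package and an internet connection when they actually run:

```python
def gtts_cli_command(text, output, lang="en", tld=None, slow=False):
    """Build a gtts-cli command that writes MP3 audio to `output`."""
    cmd = ["gtts-cli", text, "--lang", lang, "--output", str(output)]
    if tld:
        cmd += ["--tld", tld]  # Google host domain, which selects the accent (e.g. "com", "co.uk")
    if slow:
        cmd.append("--slow")   # slower speech
    return cmd


def save_with_library(text, output, lang="en", tld="com"):
    """Same job through the Python API; gTTS is imported lazily so this module loads without it."""
    from gtts import gTTS

    gTTS(text, lang=lang, tld=tld).save(str(output))
```

For example, save_with_library("Hello", "hello.mp3") writes the same US-accented MP3 as the gtts-cli command above.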

gTTS is an easy way to use Google’s text-to-speech engine from the Linux command line, as long as you are online. The audio quality is human sounding and highly intelligible.

Comparing Voice Quality Between TTS Engines

There are noticeable differences in audio quality between the open source text-to-speech solutions we covered. Let’s do a quick comparison.

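eSpeak and Festival sound robotic since they rely on older synthesis techniques instead of deep learning: formant synthesis in eSpeak’s case, and mainly concatenative (diphone) voices in Festival’s. eSpeak voices tend to be clearer than Festival.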

Pico TTS delivers good quality given its tiny resource footprint. The voices aren’t perfectly human sounding but quite intelligible.

gTTS provides the most natural sounding audio of these options, since it uses the voices behind Google Translate’s text-to-speech service. The quality difference is very noticeable.

For the best sounding voices, gTTS is recommended. But the open source engines like eSpeak work well enough for some use cases, especially considering they’re free.

Additional Tips and Tricks

Here are some additional tips for getting the most out of Linux text-to-speech engines:

Adjust speech rate, pitch, and volume to customize the voice
Use phoneme support for precise pronunciation of texts
Output audio to a file instead of directly to speakers
Pipe audio to media players like mplayer for enhanced controls
Chain multiple engines together for more options
Install alternative voices and languages
Use voices for other languages, such as Chinese or Russian
Integrate speech synthesis directly into your own apps with provided APIs
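Piping audio to a player can be scripted too. espeak --stdout writes the WAV data to standard output, and aplay plays audio read from standard input. Below is a minimal Python sketch of a two-command pipeline. pipe is a hypothetical helper. aplay is an assumption; any player that reads stdin works:

```python
import subprocess


def pipe(producer, consumer):
    """Run `producer | consumer` and return the consumer's CompletedProcess."""
    src = subprocess.Popen(producer, stdout=subprocess.PIPE)
    try:
        result = subprocess.run(consumer, stdin=src.stdout, capture_output=True)
    finally:
        src.stdout.close()  # release our copy of the pipe so the producer cannot block on it
        src.wait()
    if src.returncode != 0:
        raise subprocess.CalledProcessError(src.returncode, producer)
    result.check_returncode()  # raise if the consumer failed
    return result
```

For example, pipe(["espeak", "--stdout", "Hello"], ["aplay", "-q"]) plays the speech without creating a temporary file.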

And some troubleshooting advice:

If no audio, check speakers are not muted and volume is up
Install any required audio codec packs for your system
Try a different TTS engine if issues with a specific one
Check the error output to diagnose problems
Consult documentation and GitHub issues page
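When switching engines while troubleshooting, it helps to know which ones are installed. Below is a minimal Python sketch that checks the PATH for the executables used in this guide. available_engines is hypothetical:

```python
import shutil

# Executable name -> engine covered in this guide (gTTS also needs network access).
ENGINES = {
    "espeak-ng": "eSpeak NG",
    "espeak": "eSpeak",
    "festival": "Festival",
    "pico2wave": "Pico TTS",
    "gtts-cli": "gTTS",
}


def available_engines(which=shutil.which):
    """Return the names of the engines whose executables are on the PATH."""
    return [label for exe, label in ENGINES.items() if which(exe)]
```

Calling available_engines() with no arguments checks the real PATH. The which parameter only exists to make the function easy to test.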

With a bit of tweaking, the open source text-to-speech engines provide plenty of options for your Linux projects.

Leveraging TTS Engines in Shell Scripts

One useful application of text-to-speech on Linux is scripting batch text file conversions. Here is an example bash loop that synthesizes all text files in a directory using eSpeak: for f in *.txt; do espeak -f "$f" -w "${f%.txt}.wav"; done

This iterates through .txt files, converts each to audio with eSpeak using the -w flag, and saves the output as a .wav file.
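The same batch conversion can be written in Python, which makes it easier to extend with logging or error handling. Below is a minimal sketch, assuming espeak is on the PATH. batch_plan, batch_command, and convert_all are hypothetical names:

```python
import shutil
import subprocess
from pathlib import Path


def batch_plan(directory):
    """Pair every .txt file in `directory` with the .wav path to write it to."""
    return [(txt, txt.with_suffix(".wav")) for txt in sorted(Path(directory).glob("*.txt"))]


def batch_command(txt, wav):
    """espeak reads the text file with -f and writes WAV audio with -w."""
    return ["espeak", "-f", str(txt), "-w", str(wav)]


def convert_all(directory):
    """Convert each text file in turn; stops at the first espeak failure."""
    if shutil.which("espeak") is None:
        raise FileNotFoundError("espeak is not on the PATH")
    for txt, wav in batch_plan(directory):
        subprocess.run(batch_command(txt, wav), check=True)
```

Separating the plan from the execution lets you print or filter the list of conversions before running convert_all.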

Scripts like this provide an easy way to automate batch text-to-speech conversions and workflows.

Appendix: Quick Reference of Engines

This guide covered several excellent open source text-to-speech utilities for Linux. eSpeak and Festival are classic options that work reasonably well. Pico TTS is great for embedded devices. gTTS provides the best sounding human voices by leveraging Google’s technology.

The installation process, basic usage, and customization options were explained for each text-to-speech engine. TTS enables many exciting applications on the Linux command line and within scripts or apps.

To learn more about the capabilities of each text-to-speech engine, be sure to consult the official project documentation. Their GitHub repositories also contain useful code samples to get started.

With the power of text-to-speech, Linux can talk back to you! Converting text to natural sounding speech opens many possibilities.


Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis

Text-to-speech (TTS) technology on Linux allows users to convert written text into spoken words. This functionality is not only useful for the visually impaired but also benefits those who prefer auditory learning or require hands-free computing. Several TTS tools are available for Linux, each offering varying features to cater to diverse needs. Popular among them is eSpeak , a compact open-source software that provides a straightforward command-line interface for speech synthesis.

The landscape of text-to-speech for Linux encompasses a range of applications, from simple, lightweight programs to more complex systems with natural-sounding voices. Related open source projects tackle the reverse problem as well. CMUSphinx, for example, provides speech recognition (turning speech into text) using models trained on different languages. Accessibility and customization are focal points in the development of Linux TTS tools, as many of them are open source and can be modified to meet user-specific requirements.

While TTS technology continues to evolve, Linux users have access to a number of options for integrating speech into their computing experience. Implementations vary from simple command-line interfaces to more sophisticated GUI-based applications, ensuring there is a solution suitable for different skill levels and use cases. Through these applications, Linux upholds its commitment to inclusivity and adaptability in the realm of digital accessibility.

Text to Speech Basics

In the realm of Linux computing, text to speech tools are essential for converting written text into audible speech. These tools are widely used for their accessibility benefits and in various applications where speech output from text is preferable.

Understanding Speech Synthesis

Speech synthesis, commonly referred to as text to speech, involves the artificial production of human speech. The process begins with text analysis, during which the input text is converted into a linguistic structure. Then, during the synthesis phase, this structure is transformed into the audible waveform that we hear as speech. Each TTS system features unique algorithms and technologies to accomplish this complex task, ensuring the output is as natural-sounding as possible.

TTS Engines for Linux

Linux users have access to a variety of TTS engines. eSpeak is a compact, open-source TTS engine known for its simplicity and support for multiple languages. It operates via command line and can be easily integrated with different applications. Another example is Festival , which offers a framework for building speech synthesis systems and is known for its versatility in producing custom voices. Some Text-to-speech tools offer additional features like:

Adjusting pitch and speed
Controlling word gaps

For those seeking more advanced commercial solutions, engines like Cepstral provide a more natural voice quality for professional applications. It's important to select a TTS engine that balances functionality with system resource requirements, as some engines may be more resource-intensive than others.

Implementation and Usage

Adopting text-to-speech technology on Linux systems can be streamlined by understanding the appropriate tools and their implementation within applications. Users have access to various command line and GUI tools, ensuring versatility across different use cases.

Installing TTS Software

To get started, one must install Text-to-speech software. On many Linux distributions this involves package managers like apt for Ubuntu or pacman for Arch Linux. For instance, eSpeak , a compact and open-source TTS program, can be installed using the command sudo apt-get install espeak on Ubuntu-based distributions.
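Scripts that set up TTS on different distributions can choose the package manager from /etc/os-release. Below is a minimal Python sketch. The helper names are hypothetical. The package mapping (espeak via apt, espeak-ng via pacman, since Arch ships eSpeak NG) is an assumption to adjust for your systems:

```python
def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def install_command(os_release_text):
    """Return an install command for apt- or pacman-based systems, else None."""
    info = parse_os_release(os_release_text)
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    if any(i in ("debian", "ubuntu") for i in ids):
        return ["sudo", "apt-get", "install", "espeak"]
    if "arch" in ids:
        return ["sudo", "pacman", "-S", "espeak-ng"]
    return None
```

On a real system, pass it the file contents, for example install_command(Path("/etc/os-release").read_text()) after importing Path from pathlib.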

Command Line TTS Tools

Using the command line, eSpeak can convert text files to speech or live input from the standard input. It supports English among other languages and is invoked using commands like espeak "Your text goes here" . Advanced usage includes adjusting the pitch, speed, and saving the output to an audio file with flags like -p for pitch, -s for speed, and -w for writing to a file.

For a deep learning approach to Text-to-speech, coqui-ai/TTS offers a toolkit suitable for both research and production environments. This toolkit often requires additional steps for installation, such as working with Python virtual environments and installing dependencies.
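Coqui TTS is typically installed with pip install TTS inside a virtual environment and then used through its tts command. Below is a minimal Python sketch covering both steps. make_env and coqui_tts_command are hypothetical. The --text, --out_path, and --model_name options and the example model name follow the Coqui TTS documentation, so check tts --help for your installed version:

```python
import venv
from pathlib import Path


def make_env(env_dir):
    """Create a virtual environment with pip and return the path to its pip."""
    venv.create(env_dir, with_pip=True)
    return Path(env_dir) / "bin" / "pip"


def coqui_tts_command(text, out_path, model_name=None):
    """Build a command for Coqui's `tts` CLI; omitting model_name leaves the CLI's default."""
    cmd = ["tts", "--text", text, "--out_path", str(out_path)]
    if model_name:
        cmd += ["--model_name", model_name]
    return cmd
```

After make_env("tts-env") and running the returned pip with install TTS, the tts executable lives in tts-env/bin. A call such as coqui_tts_command("Hello", "hello.wav", "tts_models/en/ljspeech/tacotron2-DDC") then selects an explicit model.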

Text-to-speech in Applications

Integrating TTS into applications can enhance the accessibility and functionality of software. For example, gosling serves as a wrapper around Google's Cloud Text-to-Speech API, allowing for natural-sounding speech synthesis through simple terminal commands after installation and setup. It shows how modern TTS technology can be leveraged even within Linux terminal environments.

LinuxGUI.com

High Quality Text to Speech Software – Best TTS for Linux

The best text to speech software for Linux provides high quality output. For the most natural sound among TTS (Text to Speech) systems, try unit selection voices rather than HSMM (hidden semi-Markov model) voices, because they should sound less robotic.

What is the best text to speech program in Linux?

You may be looking for a good TTS solution on Linux to help you proofread nearly everything you write, since without one it is easy to miss mistakes. If so, one good option is Cepstral.

Cepstral is paid TTS software for Linux that can speak any text it is given in whichever voice you choose. Cepstral builds new synthetic voices for text-to-speech every day, and can find or build the right one for any application.

At the time this article was written, Cepstral had reached version 6. That version added natural prosody and smart pronunciation, enhanced audio (22 kHz), new voices (Alejandra and Charlie, plus 20% more source material for Allison), and superb OS integration.

Cepstral Supported Language:

US English, German, UK English, Americas Spanish, Canadian French, and Italian

How to Install Cepstral Text to Speech Program in Linux

To install the Cepstral TTS software on Linux, follow these steps:

Download the installer file from the official Cepstral site
Extract the file, e.g. tar -xvzf Cepstral_Allison_x86-64-linux_5.1.0.tar.gz
Change directory to the extracted directory, e.g. cd Cepstral_Allison_x86-64-linux_5.1.0
Run the install script with elevated privileges, sudo sh install.sh
Enter the activation key (Cepstral's site explains how to obtain and activate a key)


Another Best Free TTS for Linux

This option is available if you use Google Chrome as your browser, because the text to speech is provided by a Chrome extension. Here is what I did to get natural-sounding speech for PDF and text files for free (other solutions either don't sound natural or are paid services).

Install SpeakIt! extension on your chrome or chromium.
Drag and drop your pdf or text file (*.txt) to browser.
Now highlight some text, right-click, and select SpeakIt! from the context menu. Alternatively, click the extension's icon near the address bar to listen to natural text-to-speech.

There are also ways to open other files, such as .doc files, in Chrome and do the same. Other Chrome extensions can display PDF files, so check whether one of them suits you better. You can also upload all kinds of text to Google Drive and use SpeakIt! to read it for you. To read a document with this extension, convert it into a plain text file with the .txt suffix and drag and drop it into Google Chrome. You also need an internet connection.

Best text-to-speech software of 2024

Boosting accessibility and productivity


The best text-to-speech software makes it simple and easy to convert text to voice for accessibility or for productivity applications.


Finding the best text-to-speech software is key for anyone looking to transform written text into spoken words, whether for accessibility purposes, productivity enhancement, or creative applications like voice-overs in videos.

Text-to-speech (TTS) technology relies on sophisticated algorithms to model natural language to bring written words to life, making it easier to catch typos or nuances in written content when it's read aloud. So, unlike the best speech-to-text apps and best dictation software, which focus on converting spoken words into text, TTS software specializes in the reverse process: turning text documents into audio. This technology is not only efficient but also comes with a variety of tools and features. For those creating content for platforms like YouTube, the ability to download audio files is a particularly valuable feature of the best text-to-speech software.

While some standard office programs like Microsoft Word and Google Docs offer basic TTS tools, they often lack the comprehensive functionalities found in dedicated TTS software. These basic tools may provide decent accuracy and basic options like different accents and languages, but they fall short in delivering the full spectrum of capabilities available in specialized TTS software.

To help you find the best text-to-speech software for your specific needs, TechRadar Pro has rigorously tested various software options, evaluating them based on user experience, performance, output quality, and pricing. This includes examining the best free text-to-speech software as well, since many free options are perfect for most users. We've brought together our picks below to help you choose the most suitable tool for your specific needs, whether for personal use, professional projects, or accessibility requirements.


Below you'll find full write-ups for each of the entries on our best text-to-speech software list. We've tested each one extensively, so you can be sure that our recommendations can be trusted.

The best text-to-speech software overall

1. NaturalReader


If you’re looking for a cloud-based speech synthesis application, you should definitely check out NaturalReader. Aimed more at personal use, the solution allows you to convert written text such as Word and PDF documents, ebooks and web pages into human-like speech.

Because the software is underpinned by cloud technology, you’re able to access it from wherever you go via a smartphone, tablet or computer. And just like Capti Voice, you can upload documents from cloud storage lockers such as Google Drive, Dropbox and OneDrive.

Currently, you can access 56 natural-sounding voices in nine different languages, including American English, British English, French, Spanish, German, Swedish, Italian, Portuguese and Dutch. The software supports PDF, TXT, DOC(X), ODT, PNG, JPG, plus non-DRM EPUB files and much more, along with MP3 audio streams.

There are three different products: online, software, and commercial. Both the online and software products have a free tier.


The best text-to-speech software for realistic voices

2. Murf

Specializing in voice synthesis technology, Murf uses AI to generate realistic voiceovers for a range of uses, from e-learning to corporate presentations.

Murf comes with a comprehensive suite of AI tools that are easy to use and straightforward to locate and access. There's even a Voice Changer feature that lets you record something and then transform it into an AI-generated voice. This is perfect if you don't think you have the right tone or accent for a piece of audio content but would rather not enlist the help of a voice actor. Other features include Voice Editing, Time Syncing, and a Grammar Assistant.

The solution comes with three pricing plans to choose from: Basic, Pro and Enterprise. The last of these may be pricey, but it comes with added collaboration and account management features that larger companies may need. The Basic plan starts at around $19 / £17 / AU$28 per month, but if you set up a yearly plan that drops to around $13 / £12 / AU$20 per month. You can also try the service for free for up to 10 minutes, without downloads.

The best text-to-speech software for developers

3. Amazon Polly

Alexa isn’t the only artificial intelligence tool created by tech giant Amazon as it also offers an intelligent text-to-speech system called Amazon Polly. Employing advanced deep learning techniques, the software turns text into lifelike speech. Developers can use the software to create speech-enabled products and apps.

It sports an API that lets you easily integrate speech synthesis capabilities into ebooks, articles and other media. What’s great is that Polly is so easy to use. To get text converted into speech, you just have to send it through the API, and it’ll send an audio stream straight back to your application.

You can also store audio streams as MP3, Vorbis and PCM file formats, and there’s support for a range of international languages and dialects. These include British English, American English, Australian English, French, German, Italian, Spanish, Dutch, Danish and Russian.

Polly is available as an API on its own, as well as a feature of the AWS Management Console and command-line interface. In terms of pricing, you’re charged based on the number of text characters you convert into speech. This is charged at approximately $16 per 1 million characters, but there is a free tier for the first year.
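Using the rate quoted above, the cost of a job can be estimated from its character count, while the request itself goes through the AWS SDK. Below is a minimal Python sketch. estimate_polly_cost and synthesize_to_file are hypothetical names. The rate is a parameter because pricing differs by voice type and changes over time. The boto3 call assumes configured AWS credentials and the Joanna voice:

```python
def estimate_polly_cost(characters, usd_per_million=16.0):
    """Estimated charge in USD for converting `characters` characters."""
    return characters / 1_000_000 * usd_per_million


def synthesize_to_file(text, out_path, voice_id="Joanna", output_format="mp3"):
    """Send text to Amazon Polly and save the returned audio stream."""
    import boto3  # imported lazily; needs AWS credentials to be configured

    polly = boto3.client("polly")
    response = polly.synthesize_speech(Text=text, OutputFormat=output_format, VoiceId=voice_id)
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
```

At the quoted rate, a 250,000-character book would cost about $4 to convert, before any free-tier allowance.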

The best text-to-speech software for podcasting

4. Play.ht

In terms of its library of voice options, it's hard to beat Play.ht as one of the best text-to-speech software tools. With almost 600 AI-generated voices available in over 60 languages, it's likely you'll be able to find a voice to suit your needs.

Although the platform isn't the easiest to use, there is a detailed video tutorial to help users if they encounter any difficulties. All the usual features are available, including Voice Generation and Audio Analytics.

In terms of pricing, Play.ht comes with four plans: Personal, Professional, Growth, and Business. These vary widely in price, depending on whether you need things like commercial rights and on the number of words you can generate each month.

The best text-to-speech software for Mac and iOS

5. Voice Dream Reader

There are also plenty of great text-to-speech applications available for mobile devices, and Voice Dream Reader is an excellent example. It can convert documents, web articles and ebooks into natural-sounding speech.

The app comes with 186 built-in voices across 30 languages, including English, Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese and Korean.

You can get the software to read a list of articles while you drive, work or exercise, and there are auto-scrolling, full-screen and distraction-free modes to help you focus. Voice Dream Reader can be used with cloud solutions like Dropbox, Google Drive, iCloud Drive, Pocket, Instapaper and Evernote.

The best text-to-speech software: FAQs

What is the best text-to-speech software for YouTube?

If you're looking for the best text-to-speech software for YouTube videos or other social media platforms, you need a tool that lets you extract the audio file once your text document has been processed. Thankfully, that's most of them. So, the real trick is to select a TTS app that features a bountiful choice of natural-sounding voices that match the personality of your channel.

What’s the difference between web TTS services and TTS software?

Web TTS services are hosted on a company or developer website. You can only access the service for as long as the provider keeps it available and it isn’t facing an outage.

TTS software refers to downloadable desktop applications that typically won’t rely on connection to a server, meaning that so long as you preserve the installer, you should be able to use the software long after it stops being provided.

Do I need a text-to-speech subscription?

Subscriptions are by far the most common pricing model for top text-to-speech software. By offering subscription models, companies and developers benefit from a more sustainable revenue stream than they do from simply offering a one-time purchase model. Subscription models are also attractive to text-to-speech software providers as they tend to be more effective at defeating piracy.

Free software options are very rarely absolutely free. In some cases, individual voices may be priced and sold individually once the application has been installed or an account has been created on the web service.

How can I incorporate text-to-speech as part of my business tech stack?

Some of the text-to-speech software we've chosen comes with business plans, offering features such as additional usage allowances and a shared workspace for documents. Beyond that, services such as Amazon Polly are available as an API for more direct integration with business workflows.
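Here is a rough idea of what API integration looks like: a minimal Python sketch using Amazon's boto3 SDK. It assumes boto3 is installed and that AWS credentials and a region are configured. The voice ID "Joanna" is only an example, and this is an illustration rather than a production integration.

```python
# Minimal sketch: Amazon Polly via boto3 (assumes AWS credentials and a region are configured).


def synthesize(client, text: str, voice_id: str = "Joanna") -> bytes:
    """Ask Polly for MP3 audio of `text` and return the raw bytes.

    `client` is expected to be a boto3 Polly client, i.e. boto3.client("polly").
    "Joanna" is just one example of a Polly voice ID.
    """
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read() returns the complete audio as bytes.
    return response["AudioStream"].read()


# Example (requires boto3 and AWS credentials):
#   import boto3
#   polly = boto3.client("polly")
#   with open("hello.mp3", "wb") as f:
#       f.write(synthesize(polly, "Hello from Amazon Polly."))
```

The client is passed in rather than created inside the function. This keeps the function easy to test with a stand-in object, or to reuse with a client configured for a specific region.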

Small businesses may find consumer-level subscription plans for text-to-speech software adequate. However, usually only business plans grant the right to use the files or audio you create for commercial purposes.

How to choose the best text-to-speech software

Which text-to-speech software is best for you depends on a number of factors and preferences. These include whether you're happy to join the ecosystem of big companies like Amazon in exchange for quality assurance, whether you prefer realistic voices, and how much budget you're playing with. The paid services we recommend are reliable, but they are often subscription services hosted on websites rather than one-time-purchase desktop apps.

Also, remember that the latest versions of Microsoft Word and Google Docs, as well as most popular browsers, include basic text-to-speech as standard. If you have access to that software and all you need is a quick fix, it may suit your needs well enough.

How we test the best text-to-speech software

We test for various use cases, including suitability for people with accessibility needs, such as visual impairment, and for multitasking. Both require easy access and near-instantaneous processing. Where possible, we look for integration across the entire operating system, and for fair usage allowances across free and paid subscription models.

At a minimum, we expect an intuitive interface. We like bells and whistles such as realistic voices, but we also appreciate that there is a place for products that simply get the job done. Here, the question we ask can be as simple as: "Does this piece of software do what it's expected to do when asked?"

Read more on how we test, rate, and review products on TechRadar.


John (He/Him) is the Components Editor here at TechRadar and he is also a programmer, gamer, activist, and Brooklyn College alum currently living in Brooklyn, NY.

Named by the CTA as a CES 2020 Media Trailblazer for his science and technology reporting, John specializes in all areas of computer science, including industry news, hardware reviews, PC gaming, as well as general science writing and the social impact of the tech industry.


Luke Hughes Staff Writer
Steve Clark B2B Editor - Creative & Hardware


Mimic 3 – neural Text to Speech (TTS) engine

Mimic 3 is a neural text to speech engine that can run locally, even on low-end hardware like the Raspberry Pi 4. The software speaks over 25 languages with over 100 pre-trained voices. Mimic 3 uses VITS, a “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech”.

Mimic 3 is free and open source software.

Let’s take you through the installation steps first before demonstrating the software.

Installation

We tested the software on Ubuntu 22.10. We prefer installing software from source, although packages are available for Ubuntu/Debian.

We first install the python3.10-venv package. The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages.

$ sudo apt install python3.10-venv

Next, clone the GitHub repository with the command:

$ git clone https://github.com/MycroftAI/mimic3

Change into the newly created mimic3 directory.

$ cd mimic3

Run the install.sh script:

$ ./install.sh

This script downloads and installs all the necessary Python dependencies in a virtual environment.

There’s also a pre-built Docker image available for Intel/AMD CPUs and 32/64-bit ARM. The software can also be installed with pip, a cross-platform package manager.
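Once installed, Mimic 3 can be driven from a script. The Python sketch below makes several assumptions about the mimic3 command: that it is on your PATH, takes the text as an argument, writes WAV data to standard output when redirected, and accepts --voice and --length-scale. Those flags and the voice key en_UK/apope_low are assumptions based on the Mimic 3 documentation, so check mimic3 --help on your own install.

```python
import subprocess
from typing import List, Optional


def mimic3_command(text: str, voice: Optional[str] = None,
                   length_scale: Optional[float] = None) -> List[str]:
    """Build an argv list for the mimic3 CLI.

    The --voice and --length-scale flags are assumptions based on the Mimic 3
    documentation; confirm them with `mimic3 --help` on your install.
    """
    cmd = ["mimic3"]
    if voice is not None:
        cmd += ["--voice", voice]  # e.g. "en_UK/apope_low" (example voice key)
    if length_scale is not None:
        cmd += ["--length-scale", str(length_scale)]  # > 1.0 speaks more slowly
    cmd.append(text)
    return cmd


def synthesize_to_wav(text: str, path: str, **options) -> None:
    """Run mimic3 and save the WAV data it writes to standard output."""
    with open(path, "wb") as out:
        subprocess.run(mimic3_command(text, **options), stdout=out, check=True)


# Example: synthesize_to_wav("Hello from Mimic 3.", "hello.wav", voice="en_UK/apope_low")
```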


Top 10 Best Open Source Speech Recognition Tools for Linux

Speech has become a popular and convenient way to interact with electronic devices, and many open source speech recognition tools are available on different platforms. Since the technology's beginnings, its ability to understand the human voice has steadily improved. This is why it now attracts far more professionals than before, and technical progress is making it accessible to ordinary users as well.

Open Source Speech Recognition Tools

Open source voice recognition tools are not as widely available on Linux as the typical software we use every day. After a long stretch of research, we found some well-featured applications, each described briefly below.

1. Kaldi

Kaldi is a speech recognition toolkit that started as part of a project at Johns Hopkins University. It has an extensible design and is written in C++. It offers a flexible environment with many extensions that enhance its capabilities.

Noteworthy Features of Kaldi

A free and flexible open source voice recognition application, under the Apache license.
Runs on multiple platforms, including GNU/Linux , BSD, and Microsoft Windows.
Provides support for installing and configuring the application on your system.
Besides the speech recognition system, it also supports deep neural networks and linear transforms.

2. CMUSphinx

CMUSphinx is a group of feature-rich systems with several pre-built packages related to speech recognition. It is an open source program developed at Carnegie Mellon University. This speaker-independent recognition tool is available in several languages, including French, English, German, Dutch, and more.


Noteworthy Features of CMUSphinx

It is an easy-to-use and fast speech recognition system with a user-friendly interface.
Comes with a flexible design and an efficient system, even on low-resource platforms.
Provides acoustic model training tools through its Sphinxtrain package.
Helps to perform different types of tasks through its helpful packages, including keyword spotting, pronunciation evaluation, alignment, and more.
It is a cross-platform tool that supports both Windows and Linux systems.

Get CMUSphinx

3. DeepSpeech

DeepSpeech is a free, open source speech recognition engine from Mozilla that converts speech to text. To run DeepSpeech on your device, you need Python 3. It also needs the Git Large File Storage (Git LFS) extension, which Git uses to version large files.

Noteworthy Features of DeepSpeech

DeepSpeech is built on the TensorFlow framework.
It supports NVIDIA GPUs, which enable faster inference.
You can use DeepSpeech inference in three different ways: the Python package, the Node.js package, or the command-line client.
If you install it in a Python virtual environment, you need to activate that environment each time you want to use it.
It needs a Linux or Mac environment to run this application.
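If you take the Python package route, a minimal sketch might look like the following. It assumes the deepspeech 0.9.x Python API (Model, enableExternalScorer, sampleRate, stt) and NumPy. It also assumes a 16-bit mono WAV recording at the model's sample rate. The model and scorer file names in the example comment are hypothetical placeholders for the files you download.

```python
import wave
from typing import BinaryIO, Optional, Tuple, Union


def read_pcm16_mono(source: Union[str, BinaryIO]) -> Tuple[bytes, int]:
    """Return the raw 16-bit samples and the sample rate of a mono WAV file."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def transcribe(model_path: str, audio_path: str,
               scorer_path: Optional[str] = None) -> str:
    """Transcribe a WAV file with a DeepSpeech model (assumes the 0.9.x Python API)."""
    import numpy as np   # DeepSpeech's stt() takes an int16 NumPy array
    import deepspeech    # third-party: pip install deepspeech

    model = deepspeech.Model(model_path)
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)  # optional external language-model scorer
    frames, rate = read_pcm16_mono(audio_path)
    if rate != model.sampleRate():
        raise ValueError(f"model expects {model.sampleRate()} Hz audio, got {rate} Hz")
    return model.stt(np.frombuffer(frames, dtype=np.int16))


# Example (hypothetical file names):
#   print(transcribe("model.pbmm", "speech.wav", scorer_path="model.scorer"))
```

The third-party imports are inside transcribe(), so the WAV-reading helper can be used and tested without DeepSpeech installed.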

Get DeepSpeech

4. Wav2Letter++

Wav2Letter++ is a modern and popular speech recognition tool developed by the Facebook AI Research team. It is another open source program, released under the BSD license. This very fast speech recognition toolkit is written in C++ and comes with many features. It provides language modeling, machine translation, speech synthesis, and more in a flexible environment.

Noteworthy Features of Wav2Letter++

It has an active community on popular platforms like Facebook and Google Groups to assist its users worldwide.
Wav2Letter++ is a fast and flexible toolkit that uses the ArrayFire tensor library for maximum efficiency.
Its high-performance framework supports effective research and model tuning.
It also provides complete documentation through its tutorial sections.
In the recipes folder, you will find detailed recipes for WSJ, TIMIT, and LibriSpeech.

Get Wav2Letter++

5. Julius

Julius is a comparatively old open source speech recognition program developed by Lee Akinobu. It is written in C by the developers of the Kawahara Lab at Kyoto University. It is a high-performance, large-vocabulary speech recognition engine that you can use for both English and Japanese. It can be a great choice for academic and research purposes.

Noteworthy Features of Julius

Julius is a highly configurable application that can set different search parameters to tune its performance.
This tool uses a two-pass search strategy that provides real-time, high-quality performance.
It is a cross-platform project that runs on Linux, BSD, Windows, and Android Systems.
Integrated with Julian, a grammar-based recognition parser.
Besides supporting rule-based grammars, it also provides word graph output, confidence scoring, GMM-based input rejection, and many more facilities.

Get Julius

6. Simon

Simon is a modern, easy-to-use speech recognition program developed by Peter Grasch. It is another open source program, released under the GNU General Public License. You are free to use Simon on both Linux and Windows, and it can work with any language you want.

Noteworthy Features of Simon

Simon's voice-controlled calculator lets you perform various arithmetic operations.
Compatible with Skype and other popular VoIP programs, making it easy to communicate with friends and relatives.
It allows users to watch slide shows and videos, listen to music, and more with a few simple voice commands.
It is also a useful tool for reading newspapers and surfing the internet.

7. Mycroft

Mycroft is an easy-to-use open source voice assistant, written in Python, that converts speech to text. It is regarded as one of the most popular Linux speech recognition tools today. You can use it in a science project or an enterprise software application. It also works as a practical assistant that tells you the time, date, weather, and more.

Noteworthy Features of Mycroft

Integrated with popular social media and professional platforms, including Facebook, GitHub, LinkedIn, and more.
You can run this application on different software and hardware platforms, from a desktop to a Raspberry Pi.
Besides being a smart voice assistant, it provides audio recording, machine learning, a software library, and more.
It lets users convert the natural language to machine-readable data through Adapt, an intent parser of Mycroft.

Get Mycroft

8. OpenMindSpeech

Open Mind Speech is a free Linux speech recognition tool that aims to convert your speech to text. It is part of the Open Mind Initiative and is aimed especially at developers. The program went through several names (VoiceControl, SpeechInput and FreeSpeech) before getting its present one.

Noteworthy Features of OpenMindSpeech

It uses the Overflow environment for speech recognition, which makes complex applications flexible.
Open Mind Speech is mostly compatible with Linux and UNIX-based platforms.
Using the internet, it can collect speech data from e-citizens, who are the contributors of raw data.

Get OpenMindSpeech

9. SpeechControl

Speech Control is a free speech recognition application suitable for any Ubuntu distro. It comes with a graphical user interface based on Qt. It is still in an early stage of development, but you can use it for simple projects.


Noteworthy Features of SpeechControl

Speech Control is an open source program under the General Public License (GPL).
It aims to work as a virtual assistant that provides repetitive task guidance to execute the process smoothly.
It is mostly suitable for Linux-based platforms.
Also, provides easy-to-understand user documentation with project details.

Get SpeechControl

10. Deepspeech.pytorch

Deepspeech.pytorch is another notable open source speech recognition application: an implementation of DeepSpeech2 in PyTorch. It contains a set of networks based on the DeepSpeech2 architecture. With many helpful resources, it is one of the essential Linux speech recognition tools for research and project development.

Noteworthy Features of Deepspeech.pytorch

Supports noise augmentation that helps to increase robustness at the time of loading audio.
To send the post request to the server, it provides a basic server script.
Supports downloading several datasets, including TED-LIUM, AN4, VoxForge, and LibriSpeech.
Lets you add noise into the training data through noise injection.
Supports Visdom and TensorBoard for visualizing training during scientific experiments.

Get Deepspeech.pytorch

Finishing Thoughts

So, we have reached the end of our list of open source speech recognition tools for Linux. We hope you found comprehensive information on this topic. The applications mentioned above are free, easy to use, and ready to be part of your academic or personal project.

Which one do you prefer? If you have other choices, don't hesitate to let us know. Please share this article with your community if you found it helpful. Thanks!

I don't understand a lot of this GitHub stuff; I just need a .deb.

I just want to talk to my computer.

I frequently make live videos (usually streamed on Instagram or Facebook), and I would like to know if there is software that can automatically transcribe what I say in these videos, like YouTube does automatically for subtitles. Can anyone help? Thanks

I'm searching for a simple speech recognition tool to create a variable for selecting audio files to play for a blind person. This lady only wants to listen to a Bible version called The Message Bible. Unfortunately, it isn't available in a form that doesn't require the user to respond to visual selections. I envision a simple command-line script triggered by a variable created from her voice when she says something like "Goto the book of Psalms, chapter 23." (Since Psalms is indexed by psalm, the files would be inside folders marked as chapters.)
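For the last request, once speech recognition has done its job, the next step is turning the recognized text into a file to play. Here is a minimal Python sketch. It assumes a recognizer has already produced the text, and it uses a hypothetical folder layout of <library>/<Book>/<chapter>.mp3.

```python
import re
from pathlib import Path
from typing import Optional

# Matches phrases like "Goto the book of Psalms, chapter 23" (case-insensitive).
COMMAND = re.compile(
    r"go\s*to\s+(?:the\s+)?(?:book\s+of\s+)?(?P<book>[a-z0-9 ]+?)\s*,?\s+chapter\s+(?P<chapter>\d+)",
    re.IGNORECASE,
)


def audio_path_for(command: str, library: Path) -> Optional[Path]:
    """Map a transcribed voice command to an audio file, or None if it doesn't match.

    Assumes a hypothetical layout of <library>/<Book>/<chapter>.mp3.
    """
    match = COMMAND.search(command)
    if match is None:
        return None
    book = match.group("book").strip().title()  # "psalms" -> "Psalms"
    chapter = int(match.group("chapter"))        # "023" -> 23
    return library / book / f"{chapter}.mp3"
```

Multi-word book names such as "Song of Solomon" would need a lookup table rather than title(), because title() also capitalizes "of".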


How to text-to-speech output using command-line?

How to get speech output from entered text by using command-line?

I would also like to be able to change the speech rate, pitch, volume, etc. with a simple command.


Possible duplicate of How can I install and use text-to-speech software? – Organic Addict Dec 5, 2015 at 20:25
Update for 2023: these two are very natural sounding: Mimic (from MyCroft) and Coqui-ai TTS. See YouTube comparison of 7 TTS in my answer: askubuntu.com/a/1447599/795299 – alchemy Dec 28, 2022 at 1:34

15 Answers

In order of descending popularity:

say converts text to audible speech using the GNUstep speech engine.

festival General multi-lingual speech synthesis system.

spd-say sends text-to-speech output request to speech-dispatcher

espeak is a multi-lingual software speech synthesizer.
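If a script should use whichever of these engines happens to be installed, a small Python sketch can check the PATH in the order listed above. It assumes that say, spd-say and espeak accept the text as an argument and that festival --tts reads it from standard input.

```python
import shutil
from typing import Callable, List, Optional, Tuple

# Engines from the list above, in the order given there.
PREFERENCE = ("say", "festival", "spd-say", "espeak")


def tts_command(text: str,
                which: Callable[[str], Optional[str]] = shutil.which,
                preference: Tuple[str, ...] = PREFERENCE,
                ) -> Optional[Tuple[List[str], Optional[str]]]:
    """Return (argv, stdin_text) for the first installed engine, or None."""
    for name in preference:
        if which(name) is None:
            continue  # not installed, or not on PATH
        if name == "festival":
            # festival --tts reads the text to speak from standard input
            return ["festival", "--tts"], text
        # say, spd-say and espeak take the text as a command-line argument
        return [name, text], None
    return None
```

The returned pair can be passed straight to subprocess.run(argv, input=stdin_text, text=True, check=True). The which parameter exists so a stand-in lookup can replace shutil.which in tests.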

spd-say appears to be pre-installed in 14.04 and later: releases.ubuntu.com/trusty/… – Ciro Santilli OurBigBook.com Jul 28, 2016 at 11:52
Also sudo pip install gTTS (Google Text to Speech, github.com/pndurette/gTTS), then gtts-cli "hello" -o hello.mp3. You can pipe it to mpg123 - as well: gtts-cli "why, hello there" | mpg123 - – Elijah Lynn Apr 6, 2017 at 17:31
Unfortunately, spd-say does not seem to be able to play TTS simultaneously, only one at a time. – phil294 Jul 7, 2017 at 15:51
@ElijahLynn doesn't work – Dims Jan 19, 2018 at 12:49
@Wlad espeak.sourceforge.net/download.html is cross-platform (but the last release was in 2014) – Sylvain Pineau Dec 13, 2019 at 10:45

espeak is a nice little tool.

I just like playing around with it in a command line. You might find it conflicts with Pulseaudio so I'm using a long-winded version that negates having to set it up properly.

espeak --help will show you the options to calibrate reading speed, pitch, voice, etc.

When you're doing your notes, save them as a text file and have espeak read the file (-f) and write the speech to a WAV file (-w).

You can then play around with ffmpeg et al. to compress this down from PCM to something more manageable like MP3 or OGG. But that's a different story.
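Here is the same notes-to-audio workflow as a Python sketch. It assumes espeak and an ffmpeg build with MP3 support are installed. The flags are espeak's -v (voice), -s (speed), -p (pitch), -a (amplitude), -f (input file) and -w (WAV output), and the file names are placeholders.

```python
import subprocess
from typing import List


def espeak_to_wav_command(text_file: str, wav_file: str, voice: str = "en",
                          speed: int = 175, pitch: int = 50,
                          amplitude: int = 100) -> List[str]:
    """argv for espeak: read `text_file` (-f) and write the speech to `wav_file` (-w).

    The numeric defaults match espeak's own: 175 words per minute, pitch 50
    (range 0-99) and amplitude 100 (range 0-200).
    """
    return ["espeak", "-v", voice, "-s", str(speed), "-p", str(pitch),
            "-a", str(amplitude), "-f", text_file, "-w", wav_file]


def wav_to_mp3_command(wav_file: str, mp3_file: str) -> List[str]:
    """argv for ffmpeg; -y overwrites, and the .mp3 extension selects the output format."""
    return ["ffmpeg", "-y", "-i", wav_file, mp3_file]


def notes_to_mp3(text_file: str, mp3_file: str, wav_file: str = "notes.wav") -> None:
    """Render a text file to WAV with espeak, then compress it to MP3 with ffmpeg."""
    subprocess.run(espeak_to_wav_command(text_file, wav_file), check=True)
    subprocess.run(wav_to_mp3_command(wav_file, mp3_file), check=True)


# Example: notes_to_mp3("notes.txt", "notes.mp3")
```

For OGG output, changing the target extension to .ogg should work the same way, provided your ffmpeg build includes a Vorbis encoder.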

Very nice; one can also try the graphical user interface to espeak, espeak-gui. – Sabacon Jan 16, 2011 at 13:15
Pretty rubbish compared to the Mac's text-to-speech tool. – Snowcrash Nov 28, 2019 at 11:29
@Snowcrash Okay... You're free to use something else like Mary-TTS, but that's considerably more of a PITA to install: askubuntu.com/questions/981273/how-to-install-marytts-5-2/… – Oli ♦ Nov 28, 2019 at 11:55

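According to man spd-say, spd-say sends text-to-speech output requests to speech-dispatcher.

Hence you can get text-to-speech with a command such as spd-say "Hello world".

You can also set the speech rate, pitch, volume, etc.; see the man page.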
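A small Python helper can validate those settings before calling spd-say. This sketch assumes three things: the -r (rate), -p (pitch) and -i (volume) options accept values from -100 to 100; -t takes one of speech-dispatcher's generic voice types (male1 to male3, female1 to female3, child_male, child_female); and -w waits until speech finishes. Check man spd-say on your system.

```python
from typing import List, Optional

VOICE_TYPES = {"male1", "male2", "male3", "female1", "female2", "female3",
               "child_male", "child_female"}


def spd_say_command(text: str, rate: int = 0, pitch: int = 0, volume: int = 0,
                    voice_type: Optional[str] = None, wait: bool = True) -> List[str]:
    """argv for spd-say; rate, pitch and volume each range from -100 to 100 (0 = default)."""
    for name, value in (("rate", rate), ("pitch", pitch), ("volume", volume)):
        if not -100 <= value <= 100:
            raise ValueError(f"{name} must be between -100 and 100, got {value}")
    cmd = ["spd-say", "-r", str(rate), "-p", str(pitch), "-i", str(volume)]
    if voice_type is not None:
        if voice_type not in VOICE_TYPES:
            raise ValueError(f"unknown voice type: {voice_type}")
        cmd += ["-t", voice_type]
    if wait:
        cmd.append("-w")  # block until the message has been spoken
    cmd.append(text)
    return cmd
```

For example, subprocess.run(spd_say_command("text", voice_type="female2"), check=True) corresponds to the female2 tip in the comments.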

spd-say -t female2 "text" makes it bearable – scorpiodawg Jun 5, 2018 at 17:59
@scorpiodawg Barely, that's pretty primitive still... – Olle Härstedt Jan 26 at 11:10

Python Google Speech:

SVOX from Android:

SVOX nanotts:

Linked resource: Comparison of speech synthesizers. Post source: Linuxhacks.org. Disclosure: I am the owner of Linuxhacks.org.

To install and use google_speech on Ubuntu 18.04, I had to install python3-pip and libsox-fmt-mp3 and use pip3 install google_speech. – artm Jul 1, 2018 at 9:14
Any idea why google_speech has to reboot itself for larger chunks of text? Is there a buffer setting somewhere? – Olle Härstedt Jan 26 at 11:11

MBROLA hasn't worked since Ubuntu 11.10.

The SVOX (pico) tools are easy to install and easy to use, and they bring good-quality voices to Ubuntu. Install them with sudo apt install libttspico-utils.

Even easier, you can use LibreOffice with the SVOX (pico) tools by installing the "Read Text" extension, which gives you a "GUI" for this excellent TTS software.

Set up Read Text Extension's options with Tools - Add-ons - Read selection.... Use /usr/bin/python as the external program. Select a command line option that includes the token (PICO_READ_TEXT_PY).

SVOX pico2wave

That's what I use. It sounds natural, it's easy to understand, and it recognizes units (m, °C, kg, ...).

Here is my first post about pico2wave.

All you have to do is: Go to Ubuntu Software Center and search for "pico". You'll find 4 or 5 entries with "Small Footprint Ling...". Install them.

A possible use of pico2wave is described in my first posting (follow the link above).
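Here is a Python sketch of the pico2wave workflow: render the text to a temporary WAV file, then play it with aplay. It assumes pico2wave and aplay are installed and that the pico voices cover en-US, en-GB, de-DE, es-ES, fr-FR and it-IT.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List

PICO_LANGUAGES = {"en-US", "en-GB", "de-DE", "es-ES", "fr-FR", "it-IT"}


def pico2wave_command(text: str, wav_file: str, language: str = "en-US") -> List[str]:
    """argv for pico2wave: -l selects the language, -w names the WAV file to write."""
    if language not in PICO_LANGUAGES:
        raise ValueError(f"pico2wave has no voice for {language!r}")
    return ["pico2wave", "-l", language, "-w", wav_file, text]


def say(text: str, language: str = "en-US") -> None:
    """Render `text` to a temporary WAV file with pico2wave, then play it with aplay."""
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "speech.wav")
        subprocess.run(pico2wave_command(text, wav, language), check=True)
        subprocess.run(["aplay", wav], check=True)


# Example: say("Hallo Welt", language="de-DE")
```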

I have used your method. Can you please tell me how to get a natural, sweet female voice with it? – user49557 Jun 19, 2015 at 13:03

And yet another espeak GUI: gespeaker. It uses both the espeak and mbrola engines. It also has more options than espeak-gui.

The following is not a FLOSS solution, but you may find it worthwhile (it runs under Wine).

I'm personally very keen on TTS and use it quite often, e.g. for listening to a rambling discourse that I would never bother to stick with otherwise (because I need to get another cup of coffee... :)

A few things I've discovered along the way... or should I say, things I haven't discovered along the way. To put it bluntly: every piece of FOSS TTS voice software I've tried is below par and therefore unsuitable for any semi-protracted listening.

I currently use AT&T's NaturalVoices. It is only available for Windows (maybe the Mac), but it does run under Wine in Ubuntu. It has a minor glitch: I sometimes need to click on the panel when I move away from the reader. That is a minor issue compared with the advantage gained from the quality of NaturalVoices' speech.

Some other things I've found to be virtually essential for a half-sensible listening experience are:

1. These TTS programs are not intelligent (well, maybe as intelligent as a young baboon), so they need every bit of help they can get. There is one (and only one) reader program I've found that helps greatly with this. The app is called ReadPlease (2003 Pro). It allows you to specially modify words and groups of words so they are pronounced the way you want. It is by no means perfect, but for me, it made the difference between the entire process being usable and not usable.

The speech in NaturalVoices is "okay", but it is a bit boring. There are other good products too, but unfortunately they are all for Windows. It inflects surprisingly well sometimes, but OMG, initially it is a pain! So #2 is patience, and lots of updating of your "special words" list. By patience, I mean that I actually became accustomed to my particular baboon's speech patterns :) By the way, I currently have about 3000 words that now sound "human" enough that I no longer cringe when I hear them.

3. "Follow the bouncing ball." Again, because the voice is never as good as a real speaker, things sometimes need to be clarified. The reader program I use has one feature for which I even put up with its clunky-looking interface: a "select the word currently being read" option. Many readers have this, but ReadPlease keeps the current line bang in the center of the screen. This is invaluable for seeing ahead and behind, so you can quickly re-read what you just missed.

Well, that's my experience. I'm going to make a coffee now, and while I'm doing it, I'll be listening to this post to see how it "reads". TTS is surprisingly good for picking up typos (I make lots of typos)...

If something as good as AT&T NaturalVoices turns up in the Ubuntu repository, I'll jump at it.

Here is a link to some samples of NaturalVoices. I use "Mike".

For festival (the voice seems more natural to me):

Pitch and speed configuration:

create ~/.festivalrc with the following content:

See also http://www.solomonson.com/content/ubuntu-linux-text-speech
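One example of a ~/.festivalrc setting for speed is Festival's Duration_Stretch parameter, which multiplies segment durations, so values above 1.0 slow speech down. The Python sketch below generates that line and builds command lines for festival --tts and text2wave. Pitch settings depend on the voice and are left out, and Duration_Stretch may not have the same effect on every voice.

```python
from typing import List


def festivalrc_text(duration_stretch: float = 1.0) -> str:
    """Scheme line for ~/.festivalrc; values > 1.0 slow speech down, < 1.0 speed it up."""
    if duration_stretch <= 0:
        raise ValueError("duration_stretch must be positive")
    return f"(Parameter.set 'Duration_Stretch {duration_stretch})\n"


def festival_tts_command(text_file: str) -> List[str]:
    """argv that makes festival read a text file aloud."""
    return ["festival", "--tts", text_file]


def text2wave_command(text_file: str, wav_file: str) -> List[str]:
    """argv for text2wave, festival's helper script that writes speech to a WAV file."""
    return ["text2wave", text_file, "-o", wav_file]


# Example (overwrites any existing ~/.festivalrc):
#   from pathlib import Path
#   (Path.home() / ".festivalrc").write_text(festivalrc_text(1.3))
```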

Update: tried on another Ubuntu computer. Had to install English speech engine package to work with festival properly:

Also play is a cli command which comes with the sox package:

Even though you've already accepted an answer, I wanted to mention festival, which I like quite a lot too. This post on the Ubuntu forums has a lot of information on getting very nice voices set up for it.

Meet espeak-ng - A multi-lingual software speech synthesizer:

It uses a default English voice, but numerous other voices for other languages and even dialects are available. They can be listed with espeak-ng --voices (for all) or, e.g., espeak-ng --voices=en (for English). A voice is selected with -v followed by either the language abbreviation or the file name, e.g. -v en-gb-scotland for Scottish English or -v sw for Swahili.

There are many other options available, e.g. -s for the speed and -w to write the output to a wave file, see the manpage linked below.


Text-to-speech software gets Linux voice

NeoSpeech has announced Linux support for its VoiceText 2.0, advanced Text-to-Speech (TTS) software. The software generates natural-sounding voices from text input across handheld, desktop, and network/server applications.

Link: desktoplinux.com


Text to Speech synthesis software

In principle, there are many free software alternatives for converting text to speech on Linux. In practice there are just two, and they are rather poor compared to proprietary alternatives. They can make the computer read text aloud in very artificial-sounding voices.

1 The free software alternatives for converting Text to Speech on Linux
2 HOWTO use Mimic
3 HOWTO use espeak-ng
3.1 Adding MBROLA voices: don't bother
4 HOWTO use festival
5 HOWTO use flite
6 Proprietary alternatives

The free software alternatives for converting Text to Speech on Linux

The practically usable alternatives for converting text to speech using free software on GNU/Linux desktop and laptop machines are:

mimic from Mycroft, forked off an early version of the flite software, is the best choice if you are only interested in the English language.
festival is actively developed and works fine, but it is not great and does not sound as good as mimic. festival may be the better choice for non-English languages. Festival is developed at the University of Edinburgh. The project was dead for many years, which is why some GNU/Linux distributions still ship an ancient version from 2004, even though there have been several releases since the project was somewhat revived in 2017. If echo 'hello' | festival --tts results in a Segmentation fault (core dumped), it is likely because your distribution gave you an outdated version.

mimic and festival are not what you could call "natural-sounding". They do produce acceptable and, more importantly, understandable results, even though both sound very artificial.

There are several other alternatives, but they are not very good and, in most cases, not usable. Many web pages will recommend the following programs, notably older pages and pages made by people who just cut and paste from older pages:

The flite project is not dead; there was a release (2.5.1) in July 2020. You can get the source from github.com/festvox/flite and compile it yourself if you want to try a newer flite version. Why GNU/Linux distributions ship an ancient 2005 version is unclear.
There's a Java alternative called freetts which was last updated in 2009. Good luck getting that one working. We tried and gave up after wasting too much time on it.
espeak, last updated in 2014, is another widely recommended alternative that isn't usable on modern distributions. Its fork, espeak-ng, is actively developed and quite usable.
espeak-ng (espeak next generation) can be used but it doesn't sound very good. All the distributions have a working version available in their repositories, so there's that.
There is also a GNU project for voice synthesis called gnuspeech. It was last updated in 2015. You can view the code at git.savannah.gnu.org: gnuspeech. You may be able to get it to compile if you have a lot of patience and are willing to change the code so it compiles against modern libraries. Getting it to work is not easy, and it isn't very good.

GNU/Linux systems have a layer called speech-dispatcher that sits between applications with text-to-speech features and the programs that provide those features. speech-dispatcher can be configured to use any of the above-mentioned programs.

HOWTO use Mimic

A video explaining the four essential freedoms software must have to qualify as free software made in kdenlive using mimic -voice slt to create the audio.

mimic from Mycroft is available as a package called mimic on most GNU/Linux distributions. It is a pure command-line tool; there is no GUI. Using it is straightforward:

mimic -t "Hello world" makes it say "Hello world".

-f filename.txt makes it read a text file. Adding -o output.wav makes mimic write the voice output to a .wav formatted audio file.

This is what mimic -t 'Hello, this is a test of the emergency broadcasting system' -o mimic-test.wav ; oggenc mimic-test.wav sounds like:

The mimic package comes with several built-in voices. There is also support for voice files, and one voice file comes pre-installed in /usr/share/mimic/voices. The mimic website at mimic.mycroft.ai/ offers no additional voice files. However, the GitHub page at https://github.com/MycroftAI/mimic1 has some .flitevox files in a voices/ folder that are not included in the packages distributions ship.

The internal voices in mimic can be used by passing the -voice option. The available built-in voices can be listed with mimic -lv.

This will, when using mimic v1.3.0, output: Voices available: ap slt slt_hts kal awb kal16 rms awb_time

The slt and slt_hts voices are female voices. Here is a test of slt made using:

mimic -t 'Hello, this is a test of the emergency broadcasting system' -voice slt -o mimic-slt-test.wav

ap, awb, kal and rms are male voices. awb is probably British. kal is probably a drunk. rms does not sound anything like Richard Stallman.
slt and slt_hts are female voices.
awb_time and kal16 seem to be broken; using them does not produce any understandable output.

Run mimic --help to see all the available command-line options.
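If you drive mimic from a script, the options above translate directly into an argument list. This minimal Python sketch builds that list for -t, -f, -voice and -o, and parses the voice list printed by mimic -lv. The function names are hypothetical. The parser assumes the single "Voices available:" line format shown above for mimic v1.3.0.

```python
import subprocess


def mimic_command(text=None, infile=None, voice=None, outfile=None):
    """Build an argv list for mimic from the options described above.

    Give exactly one of text (-t) or infile (-f).
    voice   -- a built-in voice such as "slt" (-voice)
    outfile -- write a .wav file instead of playing the audio (-o)
    """
    if (text is None) == (infile is None):
        raise ValueError("give exactly one of text or infile")
    argv = ["mimic"]
    argv += ["-t", text] if text is not None else ["-f", infile]
    if voice:
        argv += ["-voice", voice]
    if outfile:
        argv += ["-o", outfile]
    return argv


def parse_voice_list(output):
    """Extract voice names from `mimic -lv` output.

    Assumes the v1.3.0 format: "Voices available: ap slt slt_hts ...".
    """
    prefix = "Voices available:"
    for line in output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].split()
    return []


def list_voices():
    """Ask the installed mimic for its built-in voices."""
    result = subprocess.run(["mimic", "-lv"], capture_output=True, text=True)
    # Check both streams, since which one mimic prints to is not documented here.
    return parse_voice_list(result.stdout + "\n" + result.stderr)
```

For example, subprocess.run(mimic_command(text="Hello world", voice="slt", outfile="slt.wav")) is equivalent to mimic -t "Hello world" -voice slt -o slt.wav.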

HOWTO use espeak-ng

espeak-ng is a command-line tool which, like most command-line tools, accepts piped input. It will happily take any piped input, whether it's a file you cat or text you echo, and turn it into spoken audio. Example:

echo 'Hello, this is a test of the emergency broadcasting system' | espeak-ng


espeak-ng does have quite a lot of options for "enhancing" the audio. You can set things like speed, pause between words and amplitude, and there are several different voices available for it. So you can play around with it, but don't expect "professional" results no matter what you do.

The most interesting options to try with espeak-ng are espeak-ng --voices and espeak-ng --voices=mb, which list all the available voices for the default and the MBROLA voice synthesizer respectively. The list printed by --voices is long, with one line per available voice.

These voices can then be used with the -v option. For example, to make it say something with the Norwegian voice you could do:

echo 'Nei takk ikke fiskeboller' | espeak-ng -v gmq/nb
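The same piping works from a program. This small Python sketch sends the text to espeak-ng on standard input, just like echo ... | espeak-ng. It maps the voice, speed, amplitude and word-gap settings mentioned above to the -v, -s, -a and -g flags. The helper names are hypothetical. The espeak-ng manual page is the authority on option meanings and value ranges.

```python
import subprocess


def espeak_ng_command(voice=None, speed=None, amplitude=None, word_gap=None):
    """Build an argv list for espeak-ng; the text itself goes on stdin.

    voice     -- a voice from `espeak-ng --voices`, e.g. "gmq/nb" (-v)
    speed     -- approximate words per minute (-s)
    amplitude -- volume, 0 to 200 (-a)
    word_gap  -- extra pause between words, in units of 10 ms (-g)
    """
    argv = ["espeak-ng"]
    if voice:
        argv += ["-v", voice]
    if speed is not None:
        argv += ["-s", str(speed)]
    if amplitude is not None:
        if not 0 <= amplitude <= 200:
            raise ValueError("amplitude must be between 0 and 200")
        argv += ["-a", str(amplitude)]
    if word_gap is not None:
        argv += ["-g", str(word_gap)]
    return argv


def say(text, **options):
    """Pipe text to espeak-ng, like `echo text | espeak-ng ...`."""
    subprocess.run(espeak_ng_command(**options), input=text, text=True, check=True)
```

For example, say("Nei takk ikke fiskeboller", voice="gmq/nb") should do the same as the Norwegian shell example above.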

espeak-ng is developed at github.com/espeak-ng/espeak-ng/.

Adding MBROLA voices: don't bother

espeak-ng supports using MBROLA as a back-end. The list of MBROLA-supported voices can be generated by espeak-ng --voices=mb and it will look similar to the regular voice list. However, using them will only work if you have the mbrola binary installed. It is non-free and not available in distributions. You can download and install it from http://tcts.fpms.ac.be/synthesis/mbrola.html if you want to. It is not worth the trouble. The voices available to it are different from espeak-ng's stock voices, but they are not better. If anything, they sound worse.

The espeak-ng manual page lists a lot more options. But as said, it won't sound great no matter what you do.

HOWTO use festival

festival will say whatever is piped to it if you have a working version and you add the --tts option, for example: echo 'Hello world' | festival --tts

You can pipe files to festival and have them read: cat filename.txt | festival --tts

Many GNU/Linux distributions ship wildly outdated versions of festival. You may find that the version your distribution includes segfaults and exits when you try to use it. If that's the case, you can get the source code from github.com/festvox/festival and compile it yourself.
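A broken festival build tends to crash rather than print an error. A script that calls festival --tts may therefore want to tell a crash apart from an ordinary failure. This Python sketch pipes text to festival --tts and decodes the exit status. It relies on Python's subprocess convention of reporting a child killed by a signal as a negative return code. The function names are hypothetical.

```python
import signal
import subprocess


def festival_tts(text):
    """Pipe text to `festival --tts` and describe how it exited."""
    result = subprocess.run(["festival", "--tts"], input=text, text=True)
    return describe_exit(result.returncode)


def describe_exit(returncode):
    """Explain a subprocess return code.

    On POSIX, subprocess reports a child killed by a signal as the
    negative signal number, so a segfaulting festival gives -SIGSEGV.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

If festival_tts("Hello world") returns "killed by SIGSEGV", you are most likely looking at the outdated-package problem described above.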

HOWTO use flite

All the GNU/Linux distributions ship flite 1.3 from 2005 for some reason we can't begin to imagine. There are several newer releases available, v2.5.1 was released in July 2020.

The text you want flite to say can be specified with -t .

flite 1.3 will not produce any audio, or anything else, if you tell it to say something with -t . It does support file output and that works.

Running it with both -t and -o, for example flite -t 'Hello world' -o flite-1.3-test.wav, will produce a flite-1.3-test.wav file you can play with aplay or mpv.

You will want to compile and install a recent version (source at github.com/festvox/flite ) if you want to use flite because the version Linux distributions ship is typically wildly outdated and outright horrible.
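File output is the part that works even in the old builds, so a script can write a .wav with -t and -o and then sanity-check the result. This Python sketch does that using the standard wave module. The flite parameter is an assumption meant for pointing at a self-compiled build; the path in its comment is hypothetical. The function names are also hypothetical.

```python
import subprocess
import wave


def flite_to_wav(text, path, flite="flite"):
    """Have flite write speech to a .wav file with -t and -o.

    flite -- binary to run; e.g. "/usr/local/bin/flite" (a hypothetical
             location) for a self-compiled build
    """
    subprocess.run([flite, "-t", text, "-o", path], check=True)
    return wav_duration(path)


def wav_duration(path):
    """Length of a PCM .wav file in seconds, read with the wave module."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A duration of zero, or an exception from wave.open, suggests the binary you ran did not produce usable audio.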

Proprietary alternatives

Amazon Polly is the best proprietary alternative if you want text-to-speech functionality in a non-free software project. It is a botnet text-to-speech cloud service operated by the very evil American Amazon corporation. Stallman would absolutely not approve. baby WOGUE uses it to make YouTube videos about free software; you can check that channel out to get an idea of how Amazon Polly sounds. It is better than mimic and espeak-ng for practical purposes and worth looking into if you think evil proprietary software tied to cloud services is acceptable when there is no superb free alternative. You could check out AWS: Getting Started with Amazon Polly if you are interested. Most of the Android "apps" for text-to-speech use the Amazon Polly API.

Read Aloud: A Text to Speech Voice Reader is a plug-in for the Mozilla Firefox web browser which lets you do text-to-speech in that web browser using server-side services. The "standard" voices available are all generated using Google services. A Google account is required to use some of the "premium" voices. There are also many other "premium" voices available that use other third party services. You need to buy a subscription in order to use those voices.

Natural Reader is a plug-in for the Chrome and Chromium web browsers which lets you do text-to-speech in those browsers using a server-side service.

Read Aloud and Natural Reader are both decent alternatives if you want something read aloud. The obvious downsides with those are that a) they are limited to in-browser text-to-speech only and b) they use proprietary cloud services to do the actual text to speech synthesis. Everything you ask them to read is sent to the cloud.

This page was last edited on 2 March 2021, at 19:54.


Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

petewarden/spchcat


spchcat is a command-line tool that reads in audio from .WAV files, a microphone, or system audio inputs and converts any speech found into text. It runs locally on your machine, with no web API calls or network activity, and is open source. It is built on top of Coqui's speech to text library, TensorFlow, KenLM, and data from Mozilla's Common Voice project.

It supports multiple languages thanks to Coqui's library of models. The accuracy of the recognized text will vary widely depending on the language, since some have only small amounts of training data. You can help improve future models by contributing your voice.

Installation

On Debian-based x86 Linux systems like Ubuntu you should be able to install the latest .deb package by downloading and double-clicking it. Other distributions are currently unsupported. The tool requires PulseAudio, which is already present on most desktop systems, but can be installed manually.

There's a notebook you can run in Colab at notebooks/install.ipynb that shows all installation steps.

Raspberry Pi

To install on a Raspberry Pi, download the latest .deb installer package and either double-click on it from the desktop, or run dpkg -i ~/Downloads/spchcat_0.0-2_armhf.deb from the terminal. It will take several minutes to unpack all the language files. This version has only been tested on the latest release of Raspbian, released October 30th 2021, and on a Raspberry Pi 4. It's expected to fail on Raspberry Pi 1's and 0's, due to their CPU architecture.

After installation, you should be able to run it with no arguments to start capturing audio from the default microphone source, with the results output to the terminal:

spchcat

After you've run the command, start speaking, and you should see the words you're saying appear. The speech recognition is still a work in progress, and the accuracy will depend a lot on the noise levels, your accent, and the complexity of the words, but hopefully you should see something close enough to be useful for simple note taking or other purposes.

System Audio

If you don't have a microphone attached, or want to transcribe audio coming from another program, you can set the --source argument to 'system'. This will attempt to listen to the audio that your machine is playing, including any videos or songs, and transcribe any speech found.

One of the most common audio file formats is WAV. If you don't have any to test with, you can download Coqui's test set to try this option out. If you need to convert files from another format like '.mp3', I recommend using FFmpeg. As with the other source options, spchcat will attempt to find any speech in the files and convert it into a transcript. You don't have to set the --source argument explicitly: as long as file names are present on the command line, file input will be the default.
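For the FFmpeg conversion, here is a Python sketch that builds the conversion command and checks a .wav file's format with the standard wave module. The 16 kHz, mono, 16-bit PCM target is an assumption about what the speech models expect; this README does not state it. The -i, -ar, -ac and -c:a flags are standard FFmpeg options. The helper names are hypothetical.

```python
import wave

# Assumed target format: mono, 16-bit samples (2 bytes), 16 kHz.
TARGET_FORMAT = (1, 2, 16000)


def ffmpeg_to_wav_command(src, dst, rate=16000):
    """argv converting e.g. an .mp3 to mono 16-bit PCM .wav at `rate` Hz."""
    return ["ffmpeg", "-i", src, "-ar", str(rate), "-ac", "1",
            "-c:a", "pcm_s16le", dst]


def wav_format(path):
    """Return (channels, sample width in bytes, sample rate) of a .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def matches_target(path):
    """True if the file already has the assumed target format."""
    return wav_format(path) == TARGET_FORMAT
```

Run the command with subprocess.run(ffmpeg_to_wav_command("talk.mp3", "talk.wav"), check=True), then pass talk.wav to spchcat.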

If you're using the audio file from the test set, you should see output like the following:

You can also specify a folder instead of a single filename, and all .wav files within that directory will be transcribed.
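The file-or-folder behaviour can be pictured with a short Python sketch that expands command-line arguments into a list of .wav paths. This is not spchcat's own code. Details such as sorting, case-insensitive matching of the extension and ignoring subfolders are assumptions, and the function name is hypothetical.

```python
from pathlib import Path


def collect_wav_files(args):
    """Expand file and folder arguments into a list of .wav paths.

    Files are kept as given. A folder contributes the .wav files directly
    inside it, sorted by name; subfolders are not searched (an assumption).
    """
    paths = []
    for arg in args:
        p = Path(arg)
        if p.is_dir():
            paths += sorted(
                child for child in p.iterdir()
                if child.is_file() and child.suffix.lower() == ".wav"
            )
        else:
            paths.append(p)
    return paths
```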

Language Support

So far this documentation has assumed you're using American English, but the tool will default to looking for the language your system has been configured to use. It first looks for the one specified in the LANG environment variable. If no model for that language is found, it will default back to 'en_US'. You can override this by setting the --language argument on the command line, for example:

This works independently of --source and other options, so you can transcribe microphone, system audio, or files in any of the supported languages. Note that some languages have very small amounts of data, so their quality may suffer. If you don't care about country-specific variants, you can also specify just the language part of the code, for example --language=en. This will pick any model that supports the language, regardless of country. The same thing happens if a particular language and country pair isn't found: it will log a warning and fall back to any country that supports the language. For example, if 'en_GB' is specified but only 'en_US' is present, 'en_US' will be used.
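The fallback order described above is: exact match, then any country for the same language, then en_US. It can be expressed as a small Python function. This illustrates the described rules and is not spchcat's implementation. The function name, and the way it strips an encoding suffix such as .UTF-8 from LANG, are assumptions.

```python
import logging
import os


def pick_model(available, requested=None, default="en_US"):
    """Choose a model code following the fallback rules described above.

    available -- model codes such as ["en_US", "de_DE"]
    requested -- e.g. "en_GB" or "en"; if None, derived from $LANG
    """
    if requested is None:
        # LANG usually looks like "de_DE.UTF-8"; keep only "de_DE".
        requested = os.environ.get("LANG", "").split(".")[0]
    if requested in available:
        return requested
    language = requested.split("_")[0]
    for code in available:
        if code.split("_")[0] == language:
            if "_" in requested:
                logging.warning("no model for %s, using %s", requested, code)
            return code  # same language, any country
    return default
```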

All of these models have been collected by Coqui, and contributed by organizations like Inclusive Technology for Marginalized Languages or individuals. All are using the conventions for Coqui's STT library, so custom models could potentially be used, but training and deployment of those is outside the scope of this document. The models themselves are provided under a variety of open source licenses, which can be inspected in their source folders (typically inside /etc/spchcat/models/ ).

Saving Output

By default spchcat writes any recognized text to the terminal, but it's designed to behave like a normal Unix command-line tool, so the output can also be written to a file using redirection like this:

If you then run cat /tmp/transcript.txt (or open it in an editor) you should see "your power is sufficient i said". You can also pipe the output to another command. Unfortunately you can't pipe audio into the tool from another executable, since pipes aren't designed for non-text data.

There is one subtle difference between writing to a file and to the terminal. The transcription itself can take some time to settle into a final form, especially when waiting for long words to finish, so when it's being run live in a terminal you'll often see the last couple of words change. This isn't useful when writing to a file, so instead the output is finalized before it's written. This can introduce a small delay when writing live microphone or system audio input.

Build from Source

It's possible to build all dependencies from source, but I recommend downloading binary versions of Coqui's STT, TensorFlow Lite, and KenLM libraries from github.com/coqui-ai/STT/releases/download/v1.1.0/native_client.tflite.Linux.tar.xz. Extract this to a folder, and then, from inside the folder containing this repo, run the following to build the spchcat tool itself:

You should replace ../STT_download with the path to the Coqui library folder. After this you should see a spchcat executable binary in the repo folder. Because it relies on shared libraries, you'll need to specify a path to these too using LD_LIBRARY_PATH unless you have copies in system folders.
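If you launch the locally built binary from a script, you can set LD_LIBRARY_PATH per process instead of in your shell. Below is a Python sketch, assuming the libraries were extracted to a folder such as ../STT_download. The helper names are hypothetical.

```python
import os
import subprocess


def env_with_library_path(lib_dir, base=None):
    """Copy of an environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    current = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not current else lib_dir + os.pathsep + current
    return env


def run_local_spchcat(lib_dir, *args):
    """Run ./spchcat from the repo folder against the downloaded libraries."""
    return subprocess.run(["./spchcat", *args],
                          env=env_with_library_path(lib_dir), check=True)
```

Combined with the next step, run_local_spchcat("../STT_download", "--languages_dir=build/models") would use the downloaded models. This assumes the option accepts the same --option=value form as --language=en.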

The previous step only built the executable binary itself, but for the complete tool you also need data files for each language. If you have the gh GitHub command line tool you can run the download_models.py script to fetch Coqui's releases into the build/models folder in your local repo. You can then run your locally-built tool against these models using the --languages_dir option:

After you have the tool built and the model data downloaded, create_deb_package.sh will attempt to package them into a Debian installer archive. It will take several minutes to run, and the result ends up in spchcat_0.0-2_amd64.deb .

Release Process

There's a notebook at notebooks/build.ipynb that runs through all the build steps needed to download dependencies and data, build the executable, and create the final package. These steps are run inside an Ubuntu 18.04 Docker image to create the binaries that are released.

Contributors

Tool code written by Pete Warden, heavily based on Coqui's STT example. It's a pretty thin wrapper on top of Coqui's speech to text library, so the Coqui team should get credit for their amazing work. Also relies on TensorFlow, KenLM, data from Mozilla's Common Voice project, and all the contributors to Coqui's model zoo.

Tool code is licensed under the Mozilla Public License Version 2.0, see LICENSE in this folder.

All other libraries and model data are released under their own licenses, see the relevant folders for more details.


Top 11 Open Source Speech Recognition/Speech-to-Text Systems

Last Updated on: May 15, 2024

A speech-to-text (STT) system, sometimes called automatic speech recognition (ASR), is just what its name implies: a way of transforming spoken words into textual data that can be used later for any purpose.

Speech recognition technology is extremely useful. It can be used for a lot of applications such as the automation of transcription, writing books/texts using sound only, enabling complicated analysis on information using the generated textual files and a lot of other things.

In the past, the speech-to-text technology was dominated by proprietary software and libraries. Open source speech recognition alternatives didn’t exist or existed with extreme limitations and no community around.

This is changing: today there are a lot of open source speech-to-text tools and libraries that you can use right now.

What is a Speech Recognition Library/System?

It is the software engine responsible for transforming voice to texts.

It is not meant to be used by end users. Developers will first have to adapt these libraries and use them to create computer programs that can enable speech recognition to users.

Some of them come with a preloaded and trained dataset to recognize voices in one language and generate the corresponding texts, while others just give you the engine without the dataset, and developers will have to build the training models themselves. This can be a complex task, as it requires a deep understanding of machine learning and data handling.

You can think of them as the underlying engines of speech recognition programs.

If you are an ordinary user looking for speech recognition, then none of these will be suitable for you, as they are meant for development use only.

What is an Open Source Speech Recognition Library?

The difference between proprietary speech recognition and open source speech recognition is that the library used to process the voices is licensed under one of the known open source licenses, such as GPL, MIT and others.

Microsoft and IBM, for example, have their own speech recognition toolkits that they offer to developers, but they are not open source, simply because they are not licensed under one of the open source licenses in the market.

What Are the Benefits of Using Open Source Speech Recognition?

Mainly, you get few or no restrictions on commercial usage of your application, as open source speech recognition libraries allow you to use them for whatever use case you may need.

Also, most – if not all – open source speech recognition toolkits in the market are also free of charge, saving you tons of money instead of using the proprietary ones.

The benefits of using open source speech recognition toolkits are indeed too many to be summarized in one article.

Top Open Source Speech Recognition Systems

In our article we’ll see a couple of them, what are their pros and cons and when they should be used.

1. Project DeepSpeech

This project is made by Mozilla, the organization behind the Firefox browser.

It’s a 100% free and open source speech-to-text library that uses machine learning, through the TensorFlow framework, to fulfill its mission. In other words, you can use it to train models yourself to enhance the underlying speech-to-text technology and get better results, or even to bring it to other languages if you want.

You can also easily integrate it into other machine learning projects you have on TensorFlow. Sadly, the project currently seems to support only English by default. It can also be used from several programming languages, such as Python (3.6).

However, after the recent Mozilla restructure, the future of the project is unknown, as it may or may not be shut down depending on what they decide.

You may visit its Project DeepSpeech homepage to learn more.

2. Kaldi

Kaldi is open source speech recognition software written in C++ and released under the Apache License 2.0.

It works on Windows, macOS and Linux. Its development started back in 2009. Kaldi’s main advantage over some other speech recognition software is that it’s extensible and modular: the community provides tons of third-party modules that you can use for your tasks.

Kaldi also supports deep neural networks and offers excellent documentation on its website. While the code is mainly written in C++, it’s “wrapped” by Bash and Python scripts.

So if you are looking just for the basic usage of converting speech to text, you’ll find it easy to accomplish that via either Python or Bash. You may also wish to check Kaldi Active Grammar, which is a pre-built Python engine with English trained models ready for use.

Learn more about Kaldi speech recognition from its official website.

3. Julius

Probably one of the oldest speech recognition programs ever: its development started in 1991 at the University of Kyoto, and in 2005 its ownership was transferred to an independent project. A lot of open source applications use it as their engine (think of KDE Simon).

Julius’ main features include real-time STT processing, low memory usage (less than 64 MB for 20,000 words), the ability to produce N-best/word-graph output, the ability to work as a server unit and a lot more.

This software was mainly built for academic and research purposes. It is written in C, and works on Linux, Windows, macOS and even Android (on smartphones). Currently it supports only English and Japanese.

The software is probably available to install easily using your Linux distribution’s repository; Just search for julius package in your package manager.

You can access Julius source code from GitHub.

4. Flashlight ASR (formerly wav2letter++)

If you are looking for something modern, this one is worth considering.

Flashlight ASR is open source speech recognition software released by Facebook’s AI Research team. It is written in C++ and released under the MIT license.

Facebook was describing its library as “the fastest state-of-the-art speech recognition system available” up to 2018.

The concepts on which this tool is built make it optimized for performance by default. Facebook’s machine learning library Flashlight is used as the underlying core of Flashlight ASR. The software requires that you first train a model for the language you want before you can run the speech recognition process.

No pre-built support of any language (including English) is available. It’s just a machine-learning-driven tool to convert speech to text.

You can learn more about it from the following link .

5. PaddleSpeech (formerly DeepSpeech2)

Researchers at the Chinese giant Baidu are also working on their own speech recognition toolkit, called PaddleSpeech.

The speech toolkit is built on the PaddlePaddle deep learning framework, and provides many features such as:

Speech-to-Text support.
Text-to-Speech support.
State-of-the-art performance in audio transcription; it even won the NAACL2022 Best Demo Award.
Support for many pre-trained models, mainly for English and Chinese.

The engine can be used to train models for any language you desire.

PaddleSpeech’s source code is written in Python, so it should be easy for you to get familiar with it if that’s the language you use.

6. OpenSeq2Seq

Developed by NVIDIA for training sequence-to-sequence models.

While it can be used for much more than just speech recognition, it is a good engine nonetheless for this use case. You can either train your own models for it or use the models shipped by default. It supports parallel processing using multiple GPUs/CPUs, besides heavy support for NVIDIA technologies like CUDA and NVIDIA’s powerful graphics cards.

As of 2021 the project is archived; it can still be used but looks like it is no longer under active development.

Check its speech recognition documentation page for more information, or you may visit its official source code page .

7. Vosk

One of the newest open source speech recognition systems; its development only started in 2020.

Unlike other systems in this list, Vosk is quite ready to use after installation, as it supports 10 languages (English, German, French, Turkish…) with portable 50MB-sized models already available for users (There are other larger models up to 1.4GB if you need).

It also works on Raspberry Pi, iOS and android devices, and provides a streaming API which allows you to connect to it to do your speech recognition tasks online. Vosk has bindings for Java, Python, JavaScript, C# and NodeJS.
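To give an idea of what using Vosk from Python looks like, here is a sketch based on the pattern in Vosk's Python examples. It loads a model directory, creates a KaldiRecognizer for the audio's sample rate, and feeds it chunks of a WAV file. The model path, the 4000-frame chunk size and the helper names are assumptions. The transcription loop accepts any object with Vosk's AcceptWaveform/Result/FinalResult methods, so it can be tried without a model.

```python
import json
import wave


def make_recognizer(model_path, sample_rate):
    """Load a Vosk model directory and create a recognizer.

    Requires `pip install vosk`; imported lazily so the loop below can be
    used on its own.
    """
    from vosk import Model, KaldiRecognizer
    return KaldiRecognizer(Model(model_path), sample_rate)


def transcribe_wav(recognizer, path, chunk_frames=4000):
    """Stream a mono 16-bit PCM .wav file through a recognizer in chunks."""
    texts = []
    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                texts.append(json.loads(recognizer.Result())["text"])
    # Flush whatever is left after the last chunk.
    texts.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(t for t in texts if t)
```

Typical use would be transcribe_wav(make_recognizer("model", 16000), "speech.wav"), where "model" is the unpacked model folder and 16000 matches the file's sample rate.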

Learn more about Vosk from its official website .

8. Athena

An end-to-end automatic speech recognition (ASR) engine.

Written in Python and licensed under the Apache 2.0 license. Supports unsupervised pre-training and multi-GPU training on a single machine or across multiple machines. Built on top of TensorFlow.

Has a large model available for both English and Chinese languages.

Visit Athena source code .

9. ESPnet

Written in Python on top of PyTorch.

It also supports end-to-end ASR. It follows the Kaldi style for data processing, so migrating from Kaldi to ESPnet should be easier. The main marketing point for ESPnet is the state-of-the-art performance it gives in many benchmarks, and its support for other speech processing tasks such as text-to-speech (TTS), machine translation (MT) and speech translation (ST).

Licensed under the Apache 2.0 license.

You can access ESPnet from the following link .

10. Whisper

One of the newest speech recognition toolkits on the list, developed by the famous OpenAI company (the same company behind ChatGPT).

The main marketing point for Whisper is that it is not specialized on training datasets for a few specific languages; instead, a single model can be used for many languages. It was trained on 680 thousand hours of audio, one third of which was non-English.

It supports speech-to-text transcription, language identification and translation of speech into English. The company claims that it makes 50% fewer errors than other toolkits in the market.

Learn more about Whisper from its official website .
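Here is a minimal sketch of transcription with the openai-whisper Python package. load_model picks a checkpoint size, and transcribe returns a dictionary whose "text" field holds the result. Passing task="translate" asks for an English translation instead. The "base" model size and the helper names are assumptions for illustration. The helper takes the model as an argument, so it can be tried with any object that has a compatible transcribe method.

```python
def load_whisper(size="base"):
    """Load a Whisper checkpoint (needs `pip install openai-whisper` and ffmpeg)."""
    import whisper  # imported lazily so transcribe_file works with any model object
    return whisper.load_model(size)


def transcribe_file(model, path, language=None, translate=False):
    """Transcribe an audio file, or translate its speech into English.

    language  -- e.g. "de"; None lets Whisper detect the spoken language
    translate -- use the "translate" task instead of "transcribe"
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language:
        options["language"] = language
    result = model.transcribe(path, **options)
    return result["text"].strip()
```

For example, transcribe_file(load_whisper(), "interview.mp3", translate=True) would return an English rendering of non-English speech.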

11. StyleTTS2

The newest entry on the list, released in the middle of November 2023. Strictly speaking, it is a text-to-speech model rather than a speech recognition one. It employs style diffusion and adversarial training with large speech language models (SLMs) in order to achieve more advanced results than other models.

The makers of the model published it along with a research paper, where they make the following claim about their work:

This work achieves the first human-level TTS synthesis on both single and multispeaker datasets, showcasing the potential of style diffusion and adversarial training with large SLMs.

It is written in Python, and has some Jupyter notebooks shipped with it to demonstrate how to use it. The model is licensed under the MIT license.

There is an online demo where you can see different benchmarks of the model: https://styletts2.github.io/

What is the Best Open Source Speech Recognition System?

If you are building a small application that you want to be portable everywhere, then Vosk is your best option: it can be used from Python and several other languages, works on iOS, Android and Raspberry Pi too, and supports up to 10 languages. It also provides large models if you need them, and smaller ones for portable applications.

If, however, you want to train and build your own models for more complex tasks, then any of PaddleSpeech, Whisper and Athena should be more than enough for your needs, as they are the most modern state-of-the-art toolkits.

As for Mozilla’s DeepSpeech, it lacks a lot of features compared to its competitors in this list, and isn’t cited much in speech recognition academic research, unlike the others. Its future is also uncertain after the recent Mozilla restructure, so you may want to stay away from it for now.

Traditionally, Julius and Kaldi are also very much cited in the academic literature.

Alternatively, you may try these open source speech recognition libraries to see how they work for you in your use case.

The speech recognition category is starting to become mainly driven by open source technologies, a situation that seemed to be very far-fetched a few years ago.

Current open source speech recognition software is very modern and bleeding-edge, and one can use it to fulfill any purpose instead of depending on Microsoft’s or IBM’s toolkits.


With a B.Sc and M.Sc in Computer Science & Engineering, Hanny brings more than a decade of experience with Linux and open-source software. He has developed Linux distributions, desktop programs, web applications and much more, all of which have attracted tens of thousands of users over many years. He also maintains other open-source related platforms to promote open source in his local communities.

Hanny is the founder of FOSS Post.


Originally published on August 23, 2020, Last Updated on May 15, 2024 by M.Hanny Sabbagh


OpenAI API: Speech


Instructor: Ronnie Sheer

The game-changing text-to-speech and speech-to-text APIs in OpenAI may set a new standard for voice activated software. Developers can increasingly benefit from proficiency with these APIs. Since its release, Whisper Model has completely disrupted the STT/TTS space. In this hands-on course, software developer and instructor Ronnie Sheer shows you how to leverage the voice APIs in OpenAI. Learn about voice models and the AI revolution. Get useful overviews of STT/TTS, then dive into actually building a small text reading web app and a pronunciation coach. After completing this course, you will be able to incorporate STT/TTS capabilities into your software in genuinely meaningful and useful ways.


10 Top Open Source AI Tools for Linux

Artificial Intelligence (AI) has rapidly evolved from a futuristic concept to an integral part of our daily lives. From recommendation systems to autonomous vehicles, AI technologies are transforming industries and revolutionizing how we interact with technology. One of the driving forces behind this transformation is the availability of open-source AI tools. These tools provide developers with the flexibility, transparency, and collaboration necessary to build cutting-edge AI solutions. In this article, we explore 10 top open-source AI tools specifically tailored for the Linux ecosystem, empowering developers to embark on their AI journey.

10 Top Open Source Artificial Intelligence Tools for Linux

TensorFlow:

TensorFlow, developed by Google, is one of the most popular open-source AI libraries used for machine learning and deep learning tasks. It offers comprehensive support for neural networks, including both CPU and GPU acceleration. TensorFlow’s flexibility allows developers to deploy models across various platforms, from cloud servers to mobile devices. Its extensive documentation and vibrant community make it an ideal choice for both beginners and experienced developers.

Use Cases:
Object Detection
Image Classification
Image Segmentation

PyTorch:

PyTorch, backed by Facebook’s AI Research lab (FAIR), has gained widespread adoption for its dynamic computational graph and intuitive interface. With its imperative programming style, PyTorch simplifies the process of building and training complex neural networks. It also provides seamless integration with Python libraries, making it a favorite among researchers and practitioners alike. PyTorch’s strong focus on usability and flexibility has propelled it to become a cornerstone of modern AI development.

Use Cases:
Sentiment Analysis
Language Translation
Text Generation

Keras:

Keras, an open-source neural network library written in Python, acts as an interface for TensorFlow and other deep learning frameworks. Known for its user-friendly API and high-level abstraction, Keras enables rapid prototyping and experimentation. It abstracts away the complexities of low-level implementation, allowing developers to focus on model design and experimentation. Keras’s modular architecture facilitates easy extension and customization, making it an excellent choice for building neural networks on Linux.

Use Cases:
Image Recognition
Natural Language Processing
Time Series Forecasting

Scikit-learn:

Scikit-learn is a versatile machine learning library that provides simple and efficient tools for data mining and analysis. Written in Python and built on NumPy, SciPy, and matplotlib, Scikit-learn offers a rich set of algorithms for classification, regression, clustering, dimensionality reduction, and more. Its clean and consistent API makes it easy to learn and use, making it suitable for both educational and production environments. Scikit-learn’s emphasis on code readability and ease of use has made it a go-to choice for machine learning tasks on Linux.

Use Cases:
Classification
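Much of Scikit-learn’s "clean and consistent API" comes from its estimator convention: a model is configured in its constructor, trained with `fit(X, y)` (which returns the estimator itself), and queried with `predict(X)`, with learned attributes ending in an underscore. The dependency-free sketch below is a hypothetical `TinyNearestCentroid` class written only to illustrate that convention; it is not Scikit-learn code (Scikit-learn ships its own `sklearn.neighbors.NearestCentroid`).

```python
import math


class TinyNearestCentroid:
    """Hypothetical classifier following the fit/predict convention.

    Each class is summarised by the mean (centroid) of its training points;
    a new point gets the label of the closest centroid.
    """

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], row)]
            counts[label] += 1
        # Learned state ends with an underscore, as in Scikit-learn.
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self  # returning self allows model.fit(X, y).predict(X)

    def predict(self, X):
        return [
            min(self.centroids_,
                key=lambda label: math.dist(row, self.centroids_[label]))
            for row in X
        ]


X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
y_train = ["small", "small", "large", "large"]
model = TinyNearestCentroid().fit(X_train, y_train)
print(model.predict([[0.9, 1.1], [6.0, 5.0]]))  # ['small', 'large']
```

Because every estimator exposes the same two methods, swapping one model for another usually means changing only the constructor line.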

Apache MXNet:

Apache MXNet is an open-source deep learning framework known for its scalability and efficiency. It supports both imperative and symbolic programming paradigms, allowing developers to choose the most suitable approach for their needs. MXNet’s distributed training capabilities make it well-suited for training large-scale models across multiple GPUs and machines. Its comprehensive documentation and active community make it an attractive option for building scalable AI applications on Linux.

Use Cases:
Distributed Deep Learning
Large-Scale Model Training
High-Performance Computing
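Distributed training of this kind is commonly data-parallel: each worker computes gradients on its own slice of the data, the gradients are aggregated (MXNet does this through its KVStore), and every worker applies the same update. The pure-Python sketch below simulates four workers sequentially on a one-feature linear regression. It shows only the arithmetic of gradient averaging, not MXNet’s API, networking, or GPU handling; the data, shard layout, and learning rate are made-up illustrations.

```python
def shard_gradient(w, b, shard):
    """Gradient of the mean squared error of y ~ w*x + b over one shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def data_parallel_step(w, b, shards, lr=0.01):
    # Each simulated worker computes a gradient on its own shard.
    grads = [shard_gradient(w, b, s) for s in shards]
    # Averaging stands in for the all-reduce / parameter-server step.
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    # Every worker applies the identical update, so parameters stay in sync.
    return w - lr * gw, b - lr * gb


data = [(float(x), 2.0 * x + 1.0) for x in range(8)]  # points on y = 2x + 1
shards = [data[i::4] for i in range(4)]               # 4 workers, 2 samples each
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, shards)
print(round(w, 2), round(b, 2))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why data-parallel training can reproduce single-machine results while spreading the work.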

Theano:

Theano is a Python library that allows developers to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is widely used for building and training deep learning models, particularly in academic and research settings. Theano’s symbolic expression approach enables automatic differentiation and GPU acceleration, leading to faster computation and training times. Despite being in maintenance mode, Theano remains a valuable tool for prototyping and experimenting with deep learning algorithms on Linux.

Use Cases:
Deep Learning Research
Academic Projects
Scientific Computing
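Theano’s central idea is that you first build a symbolic expression graph and then let the library derive gradients from that graph. The pure-Python sketch below illustrates the idea with a hypothetical mini expression graph (`Var`, `Const`, `Add`, `Mul`) that applies the sum and product rules to produce a new graph for the derivative. It is not Theano’s API (Theano exposes this through functions such as `theano.grad`) and it leaves out Theano’s graph optimizations and GPU code generation.

```python
from dataclasses import dataclass


class Expr:
    """Base class: operators build new graph nodes instead of computing values."""

    def __add__(self, other):
        return Add(self, wrap(other))

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        return Mul(self, wrap(other))

    __rmul__ = __mul__  # multiplication is commutative


@dataclass
class Const(Expr):
    value: float

    def evaluate(self, env):
        return self.value

    def grad(self, wrt):
        return Const(0.0)


@dataclass
class Var(Expr):
    name: str

    def evaluate(self, env):
        return env[self.name]

    def grad(self, wrt):
        return Const(1.0 if self.name == wrt.name else 0.0)


@dataclass
class Add(Expr):
    left: Expr
    right: Expr

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def grad(self, wrt):
        # Sum rule: (u + v)' = u' + v'
        return self.left.grad(wrt) + self.right.grad(wrt)


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)

    def grad(self, wrt):
        # Product rule: (u * v)' = u' * v + u * v'
        return self.left.grad(wrt) * self.right + self.left * self.right.grad(wrt)


def wrap(x):
    """Turn plain numbers into Const nodes so they can join the graph."""
    return x if isinstance(x, Expr) else Const(float(x))


x = Var("x")
y = x * x + 3 * x   # symbolic graph for y = x^2 + 3x
dy_dx = y.grad(x)   # a new symbolic graph equivalent to 2x + 3
print(y.evaluate({"x": 2.0}), dy_dx.evaluate({"x": 2.0}))  # 10.0 7.0
```

Because the derivative is itself a graph, a real library can simplify and compile it before evaluation, which is where Theano’s speed came from.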

Caffe:

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. It is known for its expressive architecture and efficiency in training and deploying convolutional neural networks (CNNs). Caffe’s model zoo provides pre-trained models for various tasks, allowing developers to leverage state-of-the-art architectures with ease. Its C++ and Python interfaces, along with support for CPU and GPU acceleration, make it a powerful tool for building AI applications on Linux.

Use Cases:
Object Detection
Image Segmentation
Facial Recognition

OpenCV:

OpenCV (Open Source Computer Vision Library) is a popular open-source computer vision and machine learning software library. It provides a wide range of functions for real-time image processing, object detection, feature extraction, and more. OpenCV’s extensive collection of algorithms and utilities, along with its support for multiple programming languages including C++, Python, and Java, make it a versatile tool for developing AI-driven applications on Linux.

Use Cases:
Real-Time Image Processing
Augmented Reality

H2O.ai:

H2O.ai is an open-source platform that provides scalable machine learning and AI solutions for enterprises. It offers a suite of machine learning algorithms and tools designed for large-scale data processing and model training. H2O.ai’s distributed architecture, coupled with its support for popular programming languages such as Python and R, makes it well-suited for building scalable AI applications on Linux clusters.

Use Cases:
Fraud Detection
Predictive Maintenance
Customer Segmentation

Fastai:

Fastai is a deep learning library built on top of PyTorch that aims to make deep learning more accessible to practitioners. It provides high-level abstractions and pre-configured models that enable rapid experimentation and prototyping. Fastai’s extensive documentation, along with its emphasis on best practices and state-of-the-art techniques, makes it an excellent choice for developers looking to dive into deep learning on Linux.

Use Cases:
Educational Projects
Personal Research
Prototyping Ideas

The open-source ecosystem offers a rich collection of AI tools tailored for the Linux environment, empowering developers to harness the power of artificial intelligence. From industry-leading frameworks like TensorFlow, PyTorch, and Keras to specialized libraries like Scikit-learn, OpenCV, and Fastai, this diverse array of tools caters to a wide range of AI applications and use cases. By leveraging the transparency, flexibility, and collaborative nature of open-source software, developers can build cutting-edge AI solutions, contribute to thriving communities, and drive innovation across various domains. As AI continues to reshape industries and transform the way we interact with technology, these open-source AI tools on Linux provide a solid foundation for developers to explore, experiment, and push the boundaries of what’s possible in the ever-evolving field of artificial intelligence.


App Service Environment version 1 and version 2 will be retired on 31 August 2024

Published date: May 17, 2024

In August 2021, Azure announced that Cloud Services (classic) would be retired on 31 August 2024. Because App Service Environment v1 and v2 run on Cloud Services (classic), we’ll retire App Service Environment v1 and v2 on the same date. Please complete your migration to App Service Environment v3 before then.

As of 29 January 2024, App Service Environment v1 and v2 no longer support the creation of new workloads by any of the available methods, including ARM/Bicep templates, the Azure portal, the Azure CLI, or the REST API.

After 31 August 2024, App Service Environment v1 and v2 and the applications running on them will be deleted, and any application data associated with them will be lost.

Please visit the product documentation for the latest information on the available resources and how to get started. You have access to several migration resources, including Azure FastTrack Architects, to provide more guidance as needed.

What are the main benefits of App Service Environment v3?

No Stamp Fee: App Service Environment v3 eliminates the Stamp Fee, which was a fixed cost per hour for each App Service Environment. This means you only pay for the compute and memory resources that you use for your apps, and you can save money by scaling your apps up or down as needed.
Discounted pricing offers: App Service Environment v3 supports reserved instances and savings plans, which are discounted pricing offers that allow you to save up to 55% on your App Service costs. You can commit to a certain amount of compute and memory resources for a period of 1 or 3 years and pay a lower rate than the pay-as-you-go option.
Potential cost saving: App Service Environment v3 allows you to potentially save money by using fewer compute and memory resources than you use today. App Service Environment v3 runs on more powerful hardware than the previous versions, which means you can run the same workload with fewer instances or smaller instance sizes. You can also take advantage of the faster scaling speeds and higher scale limits to optimize your resource utilization and performance.

Required action

To avoid service disruption, please follow the steps to complete your migration to App Service Environment v3 before 31 August 2024.

Help and support 

If you have questions, get answers from community experts in Microsoft Q&A. Review the free on-demand webinar with Azure FastTrack Architects. Please visit the product documentation for the latest resources.

If you have a support plan and you encounter any problem during your migration, please create a support request.

Note: This post is an update to the post that was published on 19 April 2024 as a reminder to complete your migrations.

App Service
Retirements


COMMENTS

13 Best Free Linux Speech Recognition Tools
TensorFlow implementation of Baidu's DeepSpeech architecture. Julius. Two-pass large vocabulary continuous speech recognition engine. OpenSeq2Seq. TensorFlow-based toolkit for sequence-to-sequence models. CMUSphinx. Speech recognition system for mobile and server applications. Eesen. End-to-End Speech Recognition.
7 Best Open Source Text-to-Speech (TTS) Engines
The 7 Best Open Source Text-to-Speech (TTS) Engines. Here are some well-known open-source TTS engines: 1. MaryTTS (Multimodal Interaction Architecture) A flexible, modular architecture for building TTS systems, including a voice-building tool for generating new voices from recorded audio data.
An In-Depth Guide to Open Source Text-to-Speech Engines for Linux
One useful application of text-to-speech on Linux is scripting batch text file conversions. Here is an example bash script to synthesize all text files in a directory using eSpeak: ... Video content creation is more accessible today than ever before thanks to affordable equipment and software. But proprietary video editors like Final Cut Pro or…
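The guide’s own batch script is elided in the snippet above. As a hedged sketch of the same idea (not that guide’s script), the Python below collects every `.txt` file in a directory and runs `espeak -f <file> -w <file>.wav` on each, using eSpeak’s `-f` (read text from a file) and `-w` (write WAV output) options. It assumes the `espeak` binary is on `PATH`; pass `engine="espeak-ng"` if your distribution ships eSpeak NG instead. The function names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_commands(directory, engine="espeak"):
    """One command per .txt file: <engine> -f input.txt -w input.wav"""
    commands = []
    for text_file in sorted(Path(directory).glob("*.txt")):
        wav_file = text_file.with_suffix(".wav")  # notes.txt -> notes.wav
        commands.append([engine, "-f", str(text_file), "-w", str(wav_file)])
    return commands


def convert_directory(directory, engine="espeak"):
    """Synthesize every .txt file in `directory`; return how many were converted."""
    if shutil.which(engine) is None:
        print(f"{engine} not found on PATH; nothing converted")
        return 0
    commands = build_commands(directory, engine)
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, check=True)  # stop on the first failure
    return len(commands)


if __name__ == "__main__" and len(sys.argv) > 1:
    convert_directory(sys.argv[1])
```

Saved as, say, `batch_tts.py`, it would be run as `python3 batch_tts.py ~/texts`. Building the argument lists separately from running them keeps the file-selection logic easy to check without producing any audio.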
Text to Speech Software for Linux
Using Create AI Voiceovers is super easy and straightforward. Simply paste text on the editor, choose a voice, and make necessary adjustments. Then, process and download your final MP3 audio file. That's it. CreateAIvoiceovers caters to diverse text to speech needs. It is best for: - Product and business promotions - Explainer videos - E ...
eSpeak NG Text-to-Speech
The eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington. eSpeak NG uses a "formant synthesis" method. This allows many languages to be provided in a small size.
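Formant synthesis builds speech from rules rather than from recorded speech: a periodic source standing in for the vocal-cord pulses is passed through resonant filters tuned to the formant frequencies of a sound. The stdlib-only sketch below shows that principle for a single sustained vowel and writes it to a WAV file with Python’s `wave` module. It is a toy, not eSpeak NG’s synthesizer: the sample rate, the plain impulse-train source, the Klatt-style two-pole resonators, and the formant frequencies and bandwidths (roughly an adult male "ah") are all illustrative assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz (assumed)


def resonator(signal, freq, bandwidth, rate=SAMPLE_RATE):
    """Two-pole resonator centred on `freq` Hz with the given bandwidth."""
    t = 1.0 / rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(duration=0.5, pitch=120.0,
                     formants=((730, 90), (1090, 110), (2440, 170))):
    """Impulse train at `pitch` Hz filtered through cascaded formant resonators."""
    n = int(duration * SAMPLE_RATE)
    period = int(SAMPLE_RATE / pitch)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:  # (frequency Hz, bandwidth Hz) pairs
        signal = resonator(signal, freq, bw)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to the range [-1, 1]


def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples to `path`."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```

Calling `write_wav("vowel_a.wav", synthesize_vowel())` produces half a second of a buzzy "ah". Changing only the formant table gives a different vowel, which is why this approach can cover many languages with very little data.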
Convert Text To Speech Using eSpeak NG In Linux
Type the words to speak and press ENTER. To exit, press CTRL+C. 4. To save the output to a WAV audio file rather than speaking it directly, use the -w flag: $ espeak-ng -w audio.wav "I use Arch, BTW". 5. eSpeak can also print the phonemes of a text.
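Printing phonemes is easy to script as well. The sketch below assumes `espeak-ng` is installed and relies on three of its options: `-q` (quiet, no audio), `-x` (write phoneme mnemonics to stdout) and `--ipa` (write phonemes in the International Phonetic Alphabet). Check `espeak-ng --help` if your build differs; the helper names are hypothetical.

```python
import shutil
import subprocess


def phoneme_command(text, ipa=False, engine="espeak-ng"):
    """Build a command that prints phonemes for `text` without playing audio.

    -q suppresses speech output; -x prints eSpeak's phoneme mnemonics,
    while --ipa prints International Phonetic Alphabet symbols instead.
    """
    return [engine, "-q", "--ipa" if ipa else "-x", text]


def phonemes(text, ipa=False, engine="espeak-ng"):
    """Return the engine's phoneme transcription, or None if it is not installed."""
    if shutil.which(engine) is None:
        return None
    result = subprocess.run(phoneme_command(text, ipa, engine),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(phoneme_command("I use Arch, BTW", ipa=True))
```

On a system with espeak-ng, `phonemes("hello world", ipa=True)` returns the IPA string; the exact transcription depends on the voice and the espeak-ng version, so none is shown here.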
Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis
The landscape of Text-to-speech for Linux encompasses a range of applications from simple, lightweight programs to more complex systems with natural-sounding voices. ... To get started, one must install Text-to-speech software. On many Linux distributions this involves package managers like apt for Ubuntu or pacman for Arch Linux. For instance, ...
How to Install eSpeak Text to Speech Software on Ubuntu 20.04
The espeak command converts text into speech. You can give it any text file as input or type the text in the terminal. Let's speak the line "Hi this is a sample" and record it to the sample.wav audio file (the -w option writes WAV data regardless of the file name, so a .wav extension is the appropriate choice). espeak "Hi this is a sample" -w sample.wav -g 60 -p 70 -s 100 -v en-us. Here, the -w parameter specifies the ...
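The command above combines several eSpeak voice options: `-v` (voice/language), `-s` (speed in words per minute, default 175), `-p` (pitch, 0 to 99, default 50) and `-g` (pause between words, in units of 10 ms at the default speed, so `-g 60` asks for roughly 600 ms). A hypothetical helper can check those values before building the command line. The accepted ranges follow the eSpeak manual as we understand it and should be confirmed with `man espeak` on your system.

```python
def espeak_command(text, wav_path=None, voice="en-us", speed=175, pitch=50,
                   word_gap=0, engine="espeak"):
    """Build an eSpeak command line after basic range checks.

    Assumed ranges (see `man espeak`): pitch 0-99, speed 80-450 wpm.
    word_gap is in units of 10 ms at the default speed.
    """
    if not 0 <= pitch <= 99:
        raise ValueError("pitch must be between 0 and 99")
    if not 80 <= speed <= 450:
        raise ValueError("speed must be between 80 and 450 words per minute")
    if word_gap < 0:
        raise ValueError("word_gap cannot be negative")
    command = [engine, "-v", voice, "-s", str(speed), "-p", str(pitch),
               "-g", str(word_gap)]
    if wav_path is not None:
        command += ["-w", wav_path]  # -w writes WAV data, so use a .wav name
    command.append(text)
    return command


print(espeak_command("Hi this is a sample", "sample.wav",
                     speed=100, pitch=70, word_gap=60))
```

The resulting list can be passed straight to `subprocess.run`, which avoids shell-quoting problems when the text contains quotes or other special characters.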
High Quality Text to Speech Software
Cepstral, a paid TTS program for Linux, can speak any text it is given in whatever voice you choose. Cepstral is building new synthetic voices for Text-to-Speech (TTS) every day, and can find or build the right one for any application. Note that Cepstral is non-free software for Linux, and you have to pay $40 for it.
Getting Speech Output From Entered Text From the Command Line
Many Linux tools convert text to speech or audio files from the command line, improving accessibility. Some of them also come with features like multiple voice options, multiple languages, pitch adjustment, and word gap control. In this tutorial, we'll discuss four commands for getting speech output from command line text. 2. espeak
eSpeak: speech synthesis download
eSpeak: speech synthesis. As of 2021-11-17, this project can be found here. Text to Speech engine for English and many other languages. Compact size with clear but artificial pronunciation. Available as a command-line program with many options, a shared library for Linux, and a Windows SAPI5 version.
Convert text to voice with eSpeak on Ubuntu
1. Use the following command to listen to the text specified in the inverted commas: $ espeak "enter the text that you want to listen to". Example: 2. Enter the following command and then hit Enter: $ espeak. On the prompt that appears, enter the text you want eSpeak to say and then hit Enter.
Best text-to-speech software of 2024
FAQs. How we test. The best text-to-speech software makes it simple and easy to convert text to voice for accessibility or for productivity applications. Best text-to-speech software: Quick menu ...
Mimic 3
Mimic 3 is a neural text to speech engine that can run locally, even on low-end hardware like the Raspberry Pi 4. The software speaks over 25 languages with over 100 pre-trained voices. Mimic 3 uses VITS, a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". Mimic 3 is free and open source software.
Top 10 Best Open Source Speech Recognition Tools for Linux
7. Mycroft. Mycroft is an easy-to-use open source voice assistant that converts voice to text. Written in Python, it is regarded as one of the most popular Linux speech recognition tools of recent times. It can be used in anything from a science project to an enterprise software application.
software recommendation
From man spd-say:. NAME spd-say - send text-to-speech output request to speech-dispatcher SYNOPSIS spd-say [options] "some text" DESCRIPTION spd-say sends text-to-speech output request to speech-dispatcher process which handles it and ideally outputs the result to the audio system.
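Because `spd-say` only forwards text to the speech-dispatcher process, a script that must also work on machines without speech-dispatcher can try several engines in order and use the first one found on `PATH`. The sketch below is one hedged way to do that; the engine order and function names are assumptions, and `spd-say -w` is used so the call waits until the message has been spoken.

```python
import shutil
import subprocess

# Preferred engines, in order (an assumption; adjust for your system).
# Each entry pairs a binary name with a function that builds its arguments.
ENGINES = [
    ("spd-say", lambda text: ["spd-say", "-w", text]),
    ("espeak-ng", lambda text: ["espeak-ng", text]),
    ("espeak", lambda text: ["espeak", text]),
]


def pick_command(text, which=shutil.which):
    """Return the command for the first installed engine, or None."""
    for binary, build in ENGINES:
        if which(binary):
            return build(text)
    return None


def say(text):
    """Speak `text` with the first available engine; return False if none exists."""
    command = pick_command(text)
    if command is None:
        print("no text-to-speech engine found")
        return False
    subprocess.run(command, check=True)
    return True
```

Passing `which` as a parameter lets the engine-selection logic be tested with a fake lookup, without any speech being produced.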
Text-to-speech software gets Linux voice
NeoSpeech has announced Linux support for its VoiceText 2.0, advanced Text-to-Speech (TTS) software. The software generates natural-sounding voices from text input across handheld, desktop, and network/server applications. Link: desktoplinux.com Category: Linux
eSpeak: Speech Synthesizer
The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings. eSpeak is available as: A command line program (Linux and Windows) to speak text from a file or from stdin.
Text to Speech synthesis software
The practically usable alternatives for converting text to speech using free software on GNU/Linux desktop and laptop machines are: mimic from Mycroft, forked off an early version of the flite software, is the best choice if you are only interested in the English language. festival is actively developed and it works fine but it is not great and ...
GitHub
spchcat is a command-line tool that reads in audio from .WAV files, a microphone, or system audio inputs and converts any speech found into text. It runs locally on your machine, with no web API calls or network activity, and is open source. It is built on top of Coqui's speech to text library, TensorFlow, KenLM, and data from Mozilla's Common Voice project.
Top 11 Open Source Speech Recognition/Speech-to-Text Systems
1. Project DeepSpeech. This project is made by Mozilla, the organization behind the Firefox browser. It's a 100% free and open source speech-to-text library that uses machine learning, implemented with the TensorFlow framework, to fulfill its mission.
OpenAI API: Speech
The game-changing text-to-speech and speech-to-text APIs in OpenAI may set a new standard for voice activated software. Developers can increasingly benefit from proficiency with these APIs. ... Managing the Linux operating system can be a complex task. In this course, instructor and Linux enthusiast Grant McWilliams dives…
10 Top Open Source AI Tools for Linux
Its C++ and Python interfaces, along with support for CPU and GPU acceleration, make it a powerful tool for building AI applications on Linux. Use Cases: Object Detection; Image Segmentation; Facial Recognition; OpenCV: OpenCV (Open Source Computer Vision Library) is a popular open-source computer vision and machine learning software library ...
App Service Environment version 1 and version 2 will be retired on 31
Provision Windows and Linux VMs in seconds. Azure Virtual Desktop ... software, and solutions. Azure Arc Secure, develop, and operate infrastructure, apps, and Azure services anywhere ... Unified speech services for speech-to-text, text-to-speech and speech translation.
New on Azure Marketplace: April 26-30, 2024
BitFractal Transcriber Speech to Text ... Squid Easy Proxy Server on Rocky Linux 8 is a software solution that simplifies proxy server administration. It offers caching, traffic monitoring, content adaptation, and reverse proxying features to optimize web performance and enhance user experience. Squid serves as a front-end for web servers and ...

eSpeak: Text To Speech Tool For Linux

Install eSpeak

GUI Version: espeakedit

A New Tool: eSpeak NG

Wrapping Up

Abhishek Prakash

An In-Depth Guide to Open Source Text-to-Speech Engines for Linux

Introduction to Text-to-Speech

eSpeak – Lightweight Open Source TTS

Festival – Framework for Building TTS Voices

Pico TTS – Optimized Small Footprint Engine

gTTS – Leveraging Google‘s TTS API

Comparing Voice Quality Between TTS Engines

Additional Tips and Tricks

Leveraging TTS Engines in Shell Scripts

Appendix: Quick Reference of Engines

Text to Speech for Linux: Unveiling Top Solutions for Voice Synthesis

Text to Speech Basics

Understanding Speech Synthesis

TTS Engines for Linux

Implementation and Usage

Installing TTS Software

Command Line TTS Tools

Text-to-speech in Applications

High Quality Text to Speech Software – Best TTS for Linux

What is the best text to speech program in Linux?

Cepstral Supported Language:

Another Best Free TTS for Linux

Best text-to-speech software of 2024

The best text-to-speech software of 2024 in full:

The best text-to-speech software overall

1. NaturalReader

Reasons to buy

The best text-to-speech software for realistic voices

The best text-to-speech software for developers

3. Amazon Polly

The best text-to-speech software for podcasting

The best text-to-speech software for Mac and iOS

5. Voice Dream Reader

The best text-to-speech software: FAQs

What’s the difference between web TTS services and TTS software?

Do I need a text-to-speech subscription?

How can I incorporate text-to-speech as part of my business tech stack?

How to choose the best text-to-speech software

How we test the best text-to-speech software

Mimic 3 – neural Text to Speech (TTS) engine

Installation

Top 10 Best Open Source Speech Recognition Tools for Linux

Open Source Speech Recognition Tools

2. CMUSphinx

3. DeepSpeech

4. Wav2Letter++

8. OpenMindSpeech

9. SpeechControl

10. Deepspeech.pytorch

Finishing Thoughts

How to text-to-speech output using command-line?

SVOX pico2wave

Further reading

Text-to-speech software gets Linux voice

Text to Speech synthesis software

The free software alternatives for converting Text to Speech on Linux
