• About AssemblyAI

The top free Speech-to-Text APIs, AI Models, and Open Source Engines

This post compares the best free Speech-to-Text APIs and AI models on the market today, including APIs that have a free tier. We’ll also look at several free open-source Speech-to-Text engines and explore why you might choose an API vs. an open-source library, or vice versa.

The top free Speech-to-Text APIs, AI Models, and Open Source Engines

Growth at AssemblyAI

Choosing the best Speech-to-Text API , AI model, or open-source engine to build with can be challenging. You need to compare accuracy, model design, features, support options, documentation, security, and more.

This post examines the best free Speech-to-Text APIs and AI models on the market today, including ones that have a free tier, to help you make an informed decision. We’ll also look at several free open-source Speech-to-Text engines and explore why you might choose an API or AI model vs. an open-source library, or vice versa.

Looking for a powerful speech-to-text API or AI model?

Learn why AssemblyAI is the leading Speech AI partner.

Free Speech-to-Text APIs and AI Models

APIs and AI models are more accurate, easier to integrate, and come with more out-of-the-box features than open-source options. However, large-scale use of APIs and AI models can come with a higher cost than open-source options.

If you’re looking to use an API or AI model for a small project or a trial run, many of today’s Speech-to-Text APIs and AI models have a free tier. This means that the API or model is free for anyone to use up to a certain volume per day, per month, or per year.

Let’s compare three of the most popular Speech-to-Text APIs and AI models with a free tier: AssemblyAI, Google, and AWS Transcribe.

AssemblyAI is an API platform that offers AI models that accurately transcribe and understand speech, and enable users to extract insights from voice data. AssemblyAI offers cutting-edge AI models such as Speaker Diarization , Topic Detection, Entity Detection , Automated Punctuation and Casing , Content Moderation , Sentiment Analysis , Text Summarization , and more. These AI models help users get more out of voice data, with continuous improvements being made to accuracy .

AssemblyAI also offers LeMUR , which enables users to leverage Large Language Models (LLMs) to pull valuable information from their voice data—including answering questions, generating summaries and action items, and more. 

The company offers up to 100 free transcription hours for audio files or video streams, with a concurrency limit of 5, before transitioning to an affordable paid tier.

Its high accuracy and diverse collection of AI models built by AI experts make AssemblyAI a sound option for developers looking for a free Speech-to-Text API. The API also supports virtually every audio and video file format out-of-the-box for easier transcription.

AssemblyAI has expanded the languages it supports to include English, Spanish, French, German, Japanese, Korean, and much more, with additional languages being released monthly. See the full list here .

AssemblyAI’s easy-to-use models also allow for quick set-up and transcription in any programming language. You can copy/paste code examples in your preferred language directly from the AssemblyAI Docs or use the AssemblyAI Python SDK or another one of its ready-to-use integrations .

  • Free to test in the AI playground , plus 100 free hours of asynchronous transcription with an API sign-up
  • Speech-to-Text – $0.37 per hour
  • Real-time Transcription – $0.47 per hour
  • Audio Intelligence – varies, $.01 to $.15 per hour
  • LeMUR – varies
  • Enterprise pricing is also available

See the full pricing list here .

  • High accuracy
  • Breadth of AI models available, built by AI experts
  • Continuous model iteration and improvement
  • Developer-friendly documentation and SDKs
  • Enterprise-grade support and security
  • Models are not open-source

Google Speech-to-Text is a well-known speech transcription API. Google gives users 60 minutes of free transcription, with $300 in free credits for Google Cloud hosting.

Google only supports transcribing files already in a Google Cloud Bucket, so the free credits won’t get you very far. Google also requires you to sign up for a GCP account and project — whether you're using the free tier or paid.

With good accuracy and 125+ languages supported, Google is a decent choice if you’re willing to put in some initial work.

  • 60 minutes of free transcription
  • $300 in free credits for Google Cloud hosting
  • Decent accuracy
  • Multi-language support
  • Only supports transcription of files in a Google Cloud Bucket
  • Difficult to get started
  • Lower accuracy than other similarly-priced APIs
  • AWS Transcribe

AWS Transcribe offers one hour free per month for the first 12 months of use.

Like Google, you must create an AWS account first if you don’t already have one. AWS also has lower accuracy compared to alternative APIs and only supports transcribing files already in an Amazon S3 bucket.

However, if you’re looking for a specific feature, like medical transcription, AWS has some options. Its Transcribe Medical API is a medical-focused ASR option that is available today.

  • One hour free per month for the first 12 months of use
  • Tiered pricing , based on usage, ranges from $0.02400 to $0.00780
  • Integrates into existing AWS ecosystem
  • Medical language transcription
  • Difficult to get started from scratch
  • Only supports transcribing files already in an Amazon S3 bucket

Open-Source Speech Transcription engines

An alternative to APIs and AI models, open-source Speech-to-Text libraries are completely free--with no limits on use. Some developers also see data security as a plus, since your data doesn’t have to be sent to a third party or the cloud.

There is work involved with open-source engines, so you must be comfortable putting in a lot of time and effort to get the results you want, especially if you are trying to use these libraries at scale. Open-source Speech-to-Text engines are typically less accurate than the APIs discussed above.

If you want to go the open-source route, here are some options worth exploring:

DeepSpeech is an open-source embedded Speech-to-Text engine designed to run in real-time on a range of devices, from high-powered GPUs to a Raspberry Pi 4. The DeepSpeech library uses end-to-end model architecture pioneered by Baidu.

DeepSpeech also has decent out-of-the-box accuracy for an open-source option and is easy to fine-tune and train on your own data.

  • Easy to customize
  • Can use it to train your own model
  • Can be used on a wide range of devices
  • Lack of support
  • No model improvement outside of individual custom training
  • Heavy lift to integrate into production-ready applications

Kaldi is a speech recognition toolkit that has been widely popular in the research community for many years.

Like DeepSpeech, Kaldi has good out-of-the-box accuracy and supports the ability to train your own models. It’s also been thoroughly tested—a lot of companies currently use Kaldi in production and have used it for a while—making more developers confident in its application.

  • Can use it to train your own models
  • Active user base
  • Can be complex and expensive to use
  • Uses a command-line interface

Flashlight ASR (formerly Wav2Letter)

Flashlight ASR, formerly Wav2Letter, is Facebook AI Research’s Automatic Speech Recognition (ASR) Toolkit. It is also written in C++ and usesthe ArrayFire tensor library.

Like DeepSpeech, Flashlight ASR is decently accurate for an open-source library and is easy to work with on a small project.

  • Customizable
  • Easier to modify than other open-source options
  • Processing speed
  • Very complex to use
  • No pre-trained libraries available
  • Need to continuously source datasets for training and model updates, which can be difficult and costly
  • SpeechBrain

SpeechBrain is a PyTorch-based transcription toolkit. The platform releases open implementations of popular research works and offers a tight integration with Hugging Face for easy access.

Overall, the platform is well-defined and constantly updated, making it a straightforward tool for training and finetuning.

  • Integration with Pytorch and Hugging Face
  • Pre-trained models are available
  • Supports a variety of tasks
  • Even its pre-trained models take a lot of customization to make them usable
  • Lack of extensive docs makes it not as user-friendly, except for those with extensive experience

Coqui is another deep learning toolkit for Speech-to-Text transcription. Coqui is used in over twenty languages for projects and also offers a variety of essential inference and productionization features.

The platform also releases custom-trained models and has bindings for various programming languages for easier deployment.

  • Generates confidence scores for transcripts
  • Large support comunity
  • No longer updated and maintained by Coqui

Whisper by OpenAI, released in September 2022, is comparable to other current state-of-the-art open-source options.

Whisper can be used either in Python or from the command line and can also be used for multilingual translation.

Whisper has five different models of varying sizes and capabilities, depending on the use case, including v3 released in November 2023 .

However, you’ll need a fairly large computing power and access to an in-house team to maintain, scale, update, and monitor the model to run Whisper at a large scale, making the total cost of ownership higher compared to other options. 

As of March 2023, Whisper is also now available via API . On-demand pricing starts at $0.006/minute.

  • Multilingual transcription
  • Can be used in Python
  • Five models are available, each with different sizes and capabilities
  • Need an in-house research team to maintain and update
  • Costly to run

Which free Speech-to-Text API, AI model, or Open Source engine is right for your project?

The best free Speech-to-Text API, AI model, or open-source engine will depend on our project. Do you want something that is easy-to-use, has high accuracy, and has additional out-of-the-box features? If so, one of these APIs might be right for you:

Alternatively, you might want a completely free option with no data limits—if you don’t mind the extra work it will take to tailor a toolkit to your needs. If so, you might choose one of these open-source libraries:

Whichever you choose, make sure you find a product that can continually meet the needs of your project now and what your project may develop into in the future.

Want to get started with an API?

Get a free API key for AssemblyAI.

Popular posts

AI trends in 2024: Graph Neural Networks

AI trends in 2024: Graph Neural Networks

Marco Ramponi's picture

Developer Educator at AssemblyAI

AI for Universal Audio Understanding: Qwen-Audio Explained

AI for Universal Audio Understanding: Qwen-Audio Explained

Combining Speech Recognition and Diarization in one model

Combining Speech Recognition and Diarization in one model

How DALL-E 2 Actually Works

How DALL-E 2 Actually Works

Ryan O'Connor's picture

The Best Speech-to-Text APIs in 2024

Josh Fox

, Jose Nicholas Francisco

speech-to-text gold trophy

If you've been shopping for a speech-to-text (STT) solution for your business, you're not alone. In our recent  State of Voice Technology  report, 82% of respondents confirmed their current utilization of voice-enabled technology, a 6% increase from last year.

The vast number of options for speech transcription can be overwhelming, especially if you're unfamiliar with the space. From Big Tech to open source options, there are many choices, each with different price points and feature sets. While this diversity is great, it can also be confusing when you're trying to compare options and pick the right solution.

This article breaks down the leading speech-to-text APIs available today, outlining their pros and cons and providing a ranking that accurately represents the current STT landscape. Before getting to the ranking, we explain exactly what an STT API is, and the core features you can expect an STT API to have, and some key use cases for speech-to-text APIs.

What is a speech-to-text API?

At its core, a speech-to-text (also known as automatic speech recognition, or ASR) application programming interface (API) is simply the ability to call a service to transcribe audio containing speech into written text. The STT service will take the provided audio data, process it using either machine learning or legacy techniques (e.g. Hidden Markov Models), and then provide a transcript of what it has inferred was said.

What are the most important things to consider when choosing a speech-to-text API?

What makes the best speech-to-text API? Is the fastest speech-to-text API the best? Is the most accurate speech-to-text API the best? Is the most affordable speech-to-text API the best? The answers to these questions depend on your specific project and are thus certainly different for everybody. There are a number of aspects to carefully consider in the evaluation and selection of a transcription service and the order of importance is dependent on your target use case and end user needs.

Accuracy - A speech-to-text API should produce highly accurate transcripts, even while dealing with varying levels of speaking conditions (e.g. background noise, dialects, accents, etc.). “Garbage in, garbage out,” as the saying goes. The vast majority of voice applications require highly accurate results from their transcription service to deliver value and a good customer experience to their users.

Speed - Many applications require quick turnaround times and high throughput. A responsive STT solution will deliver value with low latency and fast processing speeds.

Cost - Speech-to-text is a foundational capability in the application stack, and cost efficiency is essential. Solutions that fail to deliver adequate ROI and a good price-to-performance ratio will be a barrier to the overall utility of the end user application.

Modality - Important input modes include support for pre-recorded or real-time audio:

Batch or pre-recorded transcription capabilities - Batch transcription won't be needed by everyone, but for many use cases, you'll want a service that you can send batches of files to to be transcribed, rather than having to do it one-by-one on your end.

Real-time streaming - Again, not everyone will need real-time streaming. However, if you want to use STT to create, for example, truly conversational AI that can respond to customer inquiries in real time, you'll need to use a STT API that returns its results as quickly as possible.

Features & Capabilities - Developers and companies seeking speech processing solutions require more than a bare transcript. They also need rich features that help them build scalable products with their voice data, including sophisticated formatting and speech understanding capabilities to improve readability and utility by downstream tasks.

Scalability and Reliability - A good speech-to-text solution will accommodate varying throughput needs, adequately handling a range of audio data volumes from small startups to large enterprises. Similarly, ensuring reliable, operational integrity is a hard requirement for many applications where the effects from frequent or lengthy service interruption could result in revenue impacts and damage to brand reputation. 

Customization, Flexibility, and Adaptability - One size, fits few. The ability to customize STT models for specific vocabulary or jargon as well as flexible deployment options to meet project-specific privacy, security, and compliance needs are important, often overlooked considerations in the selection process.

Ease of Adoption and Use - A speech-to-text API only has value if it can be integrated into an application. Flexible pricing and packaging options are critical, including usage-based pricing with volume discounts. Some vendors do a better job than others to provide a good developer experience by offering frictionless self-onboarding and even including free tiers with an adequate volume of credits to help developers test the API and prototype their applications before choosing the best subscription option to choose.

Support and Subject Matter Expertise - Domain experts in AI, machine learning, and spoken language understanding are an invaluable resource when issues arise. Many solution providers outsource their model development or offer STT as a value-add to their core offering. Vendors for whom speech AI is their core focus are better equipped to diagnose and resolve challenge issues in a timely fashion. They are also more inclined to make continuous improvements to their STT service and avoid issues with stagnating performance over time.

What are the most important features of a speech-to-text API?

In this section, we'll survey some of the most common features that STT APIs offer. The key features that are offered by each API differ, and your use cases will dictate your priorities and needs in terms of which features to focus on.

Multi-language support - If you're planning to handle multiple languages or dialects, this should be a key concern. And even if you aren't planning on multilingual support now, if there's any chance that you would in the future, you're best off starting with a service that offers many languages and is always expanding to more.

Formatting - Formatting options like punctuation, numeral formatting, paragraphing, speaker labeling (or speaker diarization), word-level timestamping, profanity filtering, and more, all to improve readability and utility for data science

Automatic punctuation & capitalization - Depending on what you're planning to do with your transcripts, you might not care if they're formatted nicely. But if you're planning on surfacing them publicly, having this included in what the STT API provides can save you time.

Profanity filtering or redaction - If you're using STT as part of an effort for community moderation, you're going to want a tool that can automatically detect profanity in its output and censor it or flag it for review.

Understanding - A primary motivation for employing a speech-to-text API is to gain understanding of who said what and why they said it. Many applications employ natural language and spoken language understanding tasks to accurately identify, extract, and summarize conversational audio to deliver amazing customer experiences. 

Topic detection - Automatically identify the main topics and themes in your audio to improve categorization, organization, and understanding of large volumes of spoken language content..

Intent detection - Similarly, intent detection is used to determine the purpose or intention behind the interactions between speakers, enabling more efficient handling by downstream agents or tasks in a system in order to determine the next best action to take or response to provide.

Sentiment analysis - Understand the interactions, attitudes, views, and emotions in conversational audio by quantitatively scoring the overall and component sections as being positive, neutral, or negative. 

Summarization - Deliver a concise summary of the content in your audio, retaining the most relevant and important information and overall meaning, for responsive understanding, analysis, and efficient archival.

Keywords (a.k.a. Keyword Boosting) - Being able to include an extended, custom vocabulary is helpful if your audio has lots of specialized terminology, uncommon proper nouns, abbreviations, and acronyms that an off-the-shelf model wouldn't have been exposed to. This allows the model to incorporate these custom terms as possible predictions.

Custom models - While keywords provide inclusion of a small set of specialized, out-of-vocabulary words, a custom model trained on representative data will always give the best performance. Vendors that allow you to tailor a model for your specific needs, fine-tuned on your own data, give you the ability to boost accuracy beyond what an out-of-the-box solution alone provides.

Accepts multiple audio formats - Another concern that won't be present for everyone is whether or not the STT API can process audio in different formats. If you have audio coming from multiple sources that aren't encoded in the same format, having a STT API that removes the need for converting to different types of audio can save you time and money.

What are the top speech-to-text use cases?

As noted at the outset, voice technology that's built on the back of STT APIs is a critical part of the future of business. So what are some of the most common use cases for speech-to-text APIs? Let's take a look.

Smart assistants  - Smart assistants like Siri and Alexa are perhaps the most frequently encountered use case for speech-to-text, taking spoken commands, converting them to text, and then acting on them.

Conversational AI  - Voicebots let humans speak and, in real time, get answers from an AI. Converting speech to text is the first step in this process, and it has to happen quickly for the interaction to truly feel like a conversation.

Sales and support enablement  - Sales and support digital assistants that provide tips, hints, and solutions to agents by transcribing, analyzing and pulling up information in real time. It can also be used to gauge sales pitches or sales calls with a customer.

Contact centers  - Contact centers can use STT to create transcripts of their calls, providing more ways to evaluate their agents, understand what customers are asking about, and provide insight into different aspects of their business that are typically hard to assess.

Speech analytics  - Broadly speaking, speech analytics is any attempt to process spoken audio to extract insights. This might be done in a call center, as above, but it could also be done in other environments, like meetings or even speeches and talks.

Accessibility  - Providing transcriptions of spoken speech can be a huge win for accessibility, whether it's  providing captions for classroom lectures  or creating badges that transcribe speech on the fly.

How do you evaluate performance of a speech-to-text API?

All speech-to-text solutions aim to produce highly accurate transcripts in a user-friendly format. We advise performing side-by-side accuracy testing using files that resemble the audio you will be processing in production to determine the best speech solution for your needs. The best evaluation regimes employ a holistic approach that includes a mix of quantitative benchmarking and qualitative human preference evaluation across the most important dimensions of quality and performance, including accuracy and speed.

The generally accepted industry metric for measuring transcription quality is Word Error Rate (WER). Consider WER in relation to the following equation:

WER + Accuracy Rate = 100%

Thus, an 80% accurate transcript corresponds to a WER of 20%

WER is an industry standard focusing on error rate rather than accuracy as the error rate can be subdivided into distinct error categories. These categories provide valuable insights into the nature of errors present in a transcript. Consequently, WER can also be defined using the formula:

WER = (# of words inserted + # of words deleted + # of words substituted) / total # of words.

We suggest a degree of skepticism towards vendor claims about accuracy. This includes the qualitative claim that OpenAI’s model “approaches human level robustness on accuracy in English,” and the WER statistics published in Whisper’s documentation.

speech to text free api

Tife Sanusi

May 21, 2024

How Businesses are Adopting AI: Full Guide

Samuel Adebayo

May 20, 2024

How our inventions beat us at our own games: AI Game Strategies

May 15, 2024

Nova-2 Speech to Text Now Supports 36 Languages (and Counting)

May 16, 2024

Unlock language AI at scale with an API call.

Get conversational intelligence with transcription and understanding on the world's best speech AI platform.

  • Português – Brasil

Using the Speech-to-Text API with Node.js

1. overview.

Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants, by applying powerful neural network models in an easy to use API.

In this codelab, you will focus on using the Speech-to-Text API with Node.js. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.

What you'll learn

  • How to enable the Speech-to-Text API
  • How to Authenticate API requests
  • How to install the Google Cloud client library for Node.js
  • How to transcribe audio files in English
  • How to transcribe audio files with word timestamps
  • How to transcribe audio files in different languages

What you'll need

  • A Google Cloud Platform Project
  • A Browser, such Chrome or Firefox
  • Familiarity using Javascript/Node.js

How will you use this tutorial?

How would you rate your experience with node.js, how would you rate your experience with using google cloud platform services, 2. setup and requirements, self-paced environment setup.

  • Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one .)

dMbN6g9RawQj_VXCSYpdYncY-DbaRzr2GbnwoV7jFf1u3avxJtmGPmKpMYgiaMH-qu80a_NJ9p2IIXFppYk8x3wyymZXavjglNLJJhuXieCem56H30hwXtd8PvXGpXJO9gEUDu3cZw

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID .

  • Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost much, if anything at all. Be sure to to follow any instructions in the "Cleaning up" section which advises you how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell , a command line environment running in the Cloud.

Activate Cloud Shell

H7JlbhKGHITmsxhQIcLwoe5HXZMhDlYue4K-SPszMxUxDjIeWfOHBfxDHYpmLQTzUmQ7Xx8o6OJUlANnQF0iBuUyfp1RzVad_4nCa0Zz5LtwBlUZFXFCWFrmrWZLqg1MkZz2LdgUDQ

If you've never started Cloud Shell before, you'll be presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again). Here's what that one-time screen looks like:

kEPbNAo_w5C_pi9QvhFwWwky1cX8hr_xEMGWySNIoMCdi-Djx9AQRqWn-__DmEpC7vKgUtl-feTcv-wBxJ8NwzzAp7mY65-fi2LJo4twUoewT1SUjd6Y3h81RG3rKIkqhoVlFR-G7w

It should only take a few moments to provision and connect to Cloud Shell.

pTv5mEKzWMWp5VBrg2eGcuRPv9dLInPToS-mohlrqDASyYGWnZ_SwE-MzOWHe76ZdCSmw0kgWogSJv27lrQE8pvA5OD6P1I47nz8vrAdK7yR1NseZKJvcxAZrPb8wRxoqyTpD-gbhA

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  • Run the following command in Cloud Shell to confirm that you are authenticated:

Command output

If it is not, you can set it with this command:

3. Enable the Speech-to-Text API

Before you can begin using the Speech-to-Text API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:

4. Authenticate API requests

In order to make requests to the Speech-to-Text API, you need to use a Service Account . A Service Account belongs to your project and it is used by the Google Client Node.js library to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create credentials you will need to authenticate as the service account.

First, set an environment variable with your PROJECT_ID which you will use throughout this codelab, if you are using Cloud Shell this will be set for you:

Next, create a new service account to access the Speech-to-Text API by using:

Next, create credentials that your Node.js code will use to login as your new service account. Create these credentials and save it as a JSON file ~/key.json by using the following command:

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the Speech-to-Text API Node.js library, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created, by using:

You can read more about authenticating the Speech-to-Text API .

5. Install the Google Cloud Speech-to-Text API client library for Node.js

First, create a project that you will use to run this Speech-to-Text API lab, initialize a new Node.js package in a folder of your choice:

NPM asks several questions about the project configuration, such as name and version. For each question, press ENTER to accept the default values. The default entry point is a file named index.js .

Next, install the Google Cloud Speech library to the project:

For more instructions on how to set up a Node.js development for Google Cloud please see the Setup Guide .

Now, you're ready to use Speech-to-Text API!

6. Transcribe Audio Files

In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.

Navigate to the index.js file inside the and replace the code with the following:

Take a minute or two to study the code and see it is used to transcribe an audio file*.*

The Encoding parameter tells the API which type of audio encoding you're using for the audio file. Flac is the encoding type for .raw files (see the doc for encoding type for more details).

In the RecognitionAudio object, you can pass the API either the uri of our audio file in Cloud Storage or the local file path for the audio file. Here, we're using a Cloud Storage uri.

Run the program:

You should see the following output:

7. Transcribe with word timestamps

Speech-to-Text can detect time offset (timestamp) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.

Take a minute or two to study the code and see it is used to transcribe an audio file with word timestamps*.* The EnableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).

Run your program again:

8. Transcribe different languages

Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here .

In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.

Run your program again and you should see the following output:

This is a sentence from a popular French children's tale .

For the full list of supported languages and language codes, see the documentation here .

9. Congratulations!

You learned how to use the Speech-to-Text API using Node.js to perform different kinds of transcription on audio files!

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

  • Go to the Cloud Platform Console .
  • Select the project you want to shut down, then click ‘Delete' at the top: this schedules the project for deletion.
  • Google Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs
  • Node.js on Google Cloud Platform: https://cloud.google.com/nodejs/
  • Google Cloud Node.js client: https://googlecloudplatform.github.io/google-cloud-node/

This work is licensed under a Creative Commons Attribution 2.0 Generic License.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.

  • Skip to main content
  • Skip to search
  • Skip to select language
  • Sign up for free

Using the Web Speech API

Speech recognition.

Speech recognition involves receiving speech through a device's microphone, which is then checked by a speech recognition service against a list of grammar (basically, the vocabulary you want to have recognized in a particular app.) When a word or phrase is successfully recognized, it is returned as a result (or list of results) as a text string, and further actions can be initiated as a result.

The Web Speech API has a main controller interface for this — SpeechRecognition — plus a number of closely-related interfaces for representing grammar, results, etc. Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS, Siri on iOS, Cortana on Windows 10, Android Speech, etc.

Note: On some browsers, such as Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

To show simple usage of Web speech recognition, we've written a demo called Speech color changer . When the screen is tapped/clicked, you can say an HTML color keyword, and the app's background color will change to that color.

The UI of an app titled Speech Color changer. It invites the user to tap the screen and say a color, and then it turns the background of the app that color. In this case it has turned the background red.

To run the demo, navigate to the live demo URL in a supporting mobile browser (such as Chrome).

HTML and CSS

The HTML and CSS for the app is really trivial. We have a title, instructions paragraph, and a div into which we output diagnostic messages.

The CSS provides a very simple responsive styling so that it looks OK across devices.

Let's look at the JavaScript in a bit more detail.

Prefixed properties

Browsers currently support speech recognition with prefixed properties. Therefore at the start of our code we include these lines to allow for both prefixed properties and unprefixed versions that may be supported in future:

The grammar

The next part of our code defines the grammar we want our app to recognize. The following variable is defined to hold our grammar:

The grammar format used is JSpeech Grammar Format ( JSGF ) — you can find a lot more about it at the previous link to its spec. However, for now let's just run through it quickly:

  • The lines are separated by semicolons, just like in JavaScript.
  • The first line — #JSGF V1.0; — states the format and version used. This always needs to be included first.
  • The second line indicates a type of term that we want to recognize. public declares that it is a public rule, the string in angle brackets defines the recognized name for this term ( color ), and the list of items that follow the equals sign are the alternative values that will be recognized and accepted as appropriate values for the term. Note how each is separated by a pipe character.
  • You can have as many terms defined as you want on separate lines following the above structure, and include fairly complex grammar definitions. For this basic demo, we are just keeping things simple.

Plugging the grammar into our speech recognition

The next thing to do is define a speech recognition instance to control the recognition for our application. This is done using the SpeechRecognition() constructor. We also create a new speech grammar list to contain our grammar, using the SpeechGrammarList() constructor.

We add our grammar to the list using the SpeechGrammarList.addFromString() method. This accepts as parameters the string we want to add, plus optionally a weight value that specifies the importance of this grammar in relation of other grammars available in the list (can be from 0 to 1 inclusive.) The added grammar is available in the list as a SpeechGrammar object instance.

We then add the SpeechGrammarList to the speech recognition instance by setting it to the value of the SpeechRecognition.grammars property. We also set a few other properties of the recognition instance before we move on:

  • SpeechRecognition.continuous : Controls whether continuous results are captured ( true ), or just a single result each time recognition is started ( false ).
  • SpeechRecognition.lang : Sets the language of the recognition. Setting this is good practice, and therefore recommended.
  • SpeechRecognition.interimResults : Defines whether the speech recognition system should return interim results, or just final results. Final results are good enough for this simple demo.
  • SpeechRecognition.maxAlternatives : Sets the number of alternative potential matches that should be returned per result. This can sometimes be useful, say if a result is not completely clear and you want to display a list if alternatives for the user to choose the correct one from. But it is not needed for this simple demo, so we are just specifying one (which is actually the default anyway.)

Starting the speech recognition

After grabbing references to the output <div> and the HTML element (so we can output diagnostic messages and update the app background color later on), we implement an onclick handler so that when the screen is tapped/clicked, the speech recognition service will start. This is achieved by calling SpeechRecognition.start() . The forEach() method is used to output colored indicators showing what colors to try saying.

Receiving and handling results

Once the speech recognition is started, there are many event handlers that can be used to retrieve results, and other pieces of surrounding information (see the SpeechRecognition events .) The most common one you'll probably use is the result event, which is fired once a successful result is received:

The second line here is a bit complex-looking, so let's explain it step by step. The SpeechRecognitionEvent.results property returns a SpeechRecognitionResultList object containing SpeechRecognitionResult objects. It has a getter so it can be accessed like an array — so the first [0] returns the SpeechRecognitionResult at position 0. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that contain individual recognized words. These also have getters so they can be accessed like arrays — the second [0] therefore returns the SpeechRecognitionAlternative at position 0. We then return its transcript property to get a string containing the individual recognized result as a string, set the background color to that color, and report the color recognized as a diagnostic message in the UI.

We also use the speechend event to stop the speech recognition service from running (using SpeechRecognition.stop() ) once a single word has been recognized and it has finished being spoken:

Handling errors and unrecognized speech

The last two handlers are there to handle cases where speech was recognized that wasn't in the defined grammar, or an error occurred. The nomatch event seems to be supposed to handle the first case mentioned, although note that at the moment it doesn't seem to fire correctly; it just returns whatever was recognized anyway:

The error event handles cases where there is an actual error with the recognition successfully — the SpeechRecognitionErrorEvent.error property contains the actual error returned:

Speech synthesis

Speech synthesis (aka text-to-speech, or TTS) involves receiving synthesizing text contained within an app to speech, and playing it out of a device's speaker or audio output connection.

The Web Speech API has a main controller interface for this — SpeechSynthesis — plus a number of closely-related interfaces for representing text to be synthesized (known as utterances), voices to be used for the utterance, etc. Again, most OSes have some kind of speech synthesis system, which will be used by the API for this task as available.

To show simple usage of Web speech synthesis, we've provided a demo called Speak easy synthesis . This includes a set of form controls for entering text to be synthesized, and setting the pitch, rate, and voice to use when the text is uttered. After you have entered your text, you can press Enter / Return to hear it spoken.

UI of an app called speak easy synthesis. It has an input field in which to input text to be synthesized, slider controls to change the rate and pitch of the speech, and a drop down menu to choose between different voices.

To run the demo, navigate to the live demo URL in a supporting mobile browser.

The HTML and CSS are again pretty trivial, containing a title, some instructions for use, and a form with some simple controls. The <select> element is initially empty, but is populated with <option> s via JavaScript (see later on.)

Let's investigate the JavaScript that powers this app.

Setting variables

First of all, we capture references to all the DOM elements involved in the UI, but more interestingly, we capture a reference to Window.speechSynthesis . This is API's entry point — it returns an instance of SpeechSynthesis , the controller interface for web speech synthesis.

Populating the select element

To populate the <select> element with the different voice options the device has available, we've written a populateVoiceList() function. We first invoke SpeechSynthesis.getVoices() , which returns a list of all the available voices, represented by SpeechSynthesisVoice objects. We then loop through this list — for each voice we create an <option> element, set its text content to display the name of the voice (grabbed from SpeechSynthesisVoice.name ), the language of the voice (grabbed from SpeechSynthesisVoice.lang ), and -- DEFAULT if the voice is the default voice for the synthesis engine (checked by seeing if SpeechSynthesisVoice.default returns true .)

We also create data- attributes for each option, containing the name and language of the associated voice, so we can grab them easily later on, and then append the options as children of the select.

Older browser don't support the voiceschanged event, and just return a list of voices when SpeechSynthesis.getVoices() is fired. While on others, such as Chrome, you have to wait for the event to fire before populating the list. To allow for both cases, we run the function as shown below:

Speaking the entered text

Next, we create an event handler to start speaking the text entered into the text field. We are using an onsubmit handler on the form so that the action happens when Enter / Return is pressed. We first create a new SpeechSynthesisUtterance() instance using its constructor — this is passed the text input's value as a parameter.

Next, we need to figure out which voice to use. We use the HTMLSelectElement selectedOptions property to return the currently selected <option> element. We then use this element's data-name attribute, finding the SpeechSynthesisVoice object whose name matches this attribute's value. We set the matching voice object to be the value of the SpeechSynthesisUtterance.voice property.

Finally, we set the SpeechSynthesisUtterance.pitch and SpeechSynthesisUtterance.rate to the values of the relevant range form elements. Then, with all necessary preparations made, we start the utterance being spoken by invoking SpeechSynthesis.speak() , passing it the SpeechSynthesisUtterance instance as a parameter.

In the final part of the handler, we include an pause event to demonstrate how SpeechSynthesisEvent can be put to good use. When SpeechSynthesis.pause() is invoked, this returns a message reporting the character number and name that the speech was paused at.

Finally, we call blur() on the text input. This is mainly to hide the keyboard on Firefox OS.

Updating the displayed pitch and rate values

The last part of the code updates the pitch / rate values displayed in the UI, each time the slider positions are moved.

Nordic APIs

5 Best Speech-to-Text APIs

J.Simpson

Voice search is becoming increasingly prevalent as the years tick on, as increasing amounts of users access the Internet via mobile devices and with the help of voice assistants like Alexa. 41% of adults report using voice search on a daily basis.

Voice search is becoming an essential component of eCommerce, as well. 50% of consumers report making a purchase using voice search in the last year. Neglecting voice is like leaving money on the table, not to mention potentially alienating your audience.

Voice is also highly useful for segmenting your audience. Voice search is used most widely by affluent, highly-educated consumers . You could potentially integrate voice into a digital marketing campaign, as part of your marketing funnel, segmenting your audience in all manner of useful ways.

The fact that voice search could possibly alert you to members of your audience with money to burn and a willingness to spend is reason enough to investigate voice and integrate it into your existing workflow.

But how do you go about integrating voice recognition into your website or app? Isn’t that the domain of uber-rich companies with heavy investments in machine learning and virtual reality?

Not necessarily.

There are numerous speech-to-text web APIs you can use to power your app or website. We’re going to dig into some of our favorite, most useful APIs for voice search.

The 5 Best APIs For Speech-To-Text

Ranking tech solutions from best to worst is always going to be subjective. What constitutes the best API will largely depend on what you’re going to be using voice recognition for.

We’ll be segmenting our favorite speech-to-text APIs by application, as a way to help you figure out which API will best suit your particular needs.

Speech-To-Text APIs for Short Online Searches

The phrases people tend to use to look things up online tend to be short, sweet, and to the point. Voice search APIs for online applications won’t need to be as thorough or have as many technical considerations, like grammar or syntax, to consider. This means these APIs tend to be lighter, faster, and quicker to load.

1. Google Speech-To-Text

speech-api-lead

Google Speech-To-Text was unveiled in 2018 , just one week after their text-to-speech update. Google’s Speech-To-Text API makes some audacious claims, reducing word errors by 54% in test after test. In certain areas, the results are even more encouraging.

One of the reasons for the APIs impressive accuracy is the ability to select between different machine learning models , depending on what your application’s being used for. This also makes Google Speech-To-Text a suitable solution for applications other than short web searches. It can also be configured for audio from phone calls or videos. There’s a fourth setting, as well, which Google recommends using as default.

The Speech-To-Text API also features an impressive update for extended punctuation options. This is designed to make more useful transcriptions, with fewer run-on sentences or punctuation errors.

The newest update also allows developers to tag their transcribed audio or video with basic metadata . This is more for the company’s benefit than for the developers, however, as it will allow Google to decide which features are most useful for programmers.

The Google Speech-To-Text API isn’t free, however. It is free for speech recognition for audio less than 60 minutes. For audio transcriptions longer than that, it costs $0.006 per 15 seconds.

For video transcriptions, it costs $0.006 per 15 seconds for videos up to 60 minutes in length. For video longer than one hour, it costs $0.012 for every 15 seconds. Make sure you factor that into your pricing models when developing applications and web services.

  • Recognizes over 120 languages
  • Multiple machine learning models for increased accuracy
  • Automatic language recognition
  • Text transcription
  • Proper noun recognition
  • Data privacy
  • Noise cancellation for audio from phone calls and video
  • Costs money
  • Limited custom vocabulary builder

2. Microsoft Cognitive Services

Microsoft is also a major player in the world of voice recognition APIs. Microsoft Cognitive Services is more than just another speech recognition API, however. It’s also a part of the Microsoft Trust Services which offer unparalleled security options for developers looking for the most secure data for their applications.

The main thing that separates Microsoft Cognitive Services’ Speech to Text API  is the Speaker Recognition function. This is the auditory version of security software like face recognition . Think of it as a retina scan for the sound of the user’s voice. It makes it incredibly easy for different levels of users.

This same voice recognition capability allows software to adapt to specific user’s speech styles and patterns. It also offers more custom vocabulary options than Google, as an additional benefit.

Beyond that, Microsoft Cognitive Service’s speech recognition API has many of the same benefits of other voice APIs. It can perform real-time transcription , as well as converting text-into-speech. Thus, Microsoft Cognitive Services can cover most of your text and speech-based needs. It can also be used for call center log analysis, if you’ve got large amounts of audio that needs to be analyzed.

Considering the widespread popularity of Microsoft products and services, Microsoft Cognitive Services is growing faster than many of the other APIs on our list. If you’re looking to join in with a vibrant, active community of developers, Microsoft Cognitive Services could be a good fit.

  • Enhanced data security via voice-recognition algorithms
  • Real-time transcription
  • Real-time translation
  • Customizable vocabulary
  • Text-to-speech capabilities for natural speech patterns
  • Built-in constraints due to the API being created for general purposes
  • Uses microservices, which can be useful for solving individual problems but falls short for larger problems

3. Dialogflow (Formerly API.AI, Speaktoit)

Dialogflow is also owned by Google. The main advantage over other voice APIs is Dialogflow’s ability to take context into consideration when analyzing speech, which makes for more accurate transcriptions. It also allows developers to customize their voice-based commands for different devices, such as smart devices, phones, wearables, cars, and smart speakers.

Dialogflow’s earlier incarnation, Api.ai, was used to power the Assistant app, one of the earliest virtual voice-based assistants, way back in 2014. It’s since been discontinued but demonstrates that Dialogflow has been in the AI/machine learning/voice recognition game for longer than most.

The Dialogflow voice recognition API also has a number of analytics built into the platform. You can measure user engagement or session metrics, as well as usage patterns or latency issues. This is bound to be helpful when getting investors, sales and marketing teams, and developers on the same page.

Dialogflow currently only supports 14 languages, however. This makes it less useful for multilingual software than Google Speech-To-Text or Microsoft Cognitive Services.

  • Easy to use
  • Easy to set up
  • Integrates with a wide variety of software
  • Easily integrated with other web services
  • Can integrate with non-Google devices like Amazon’s Alexa
  • Cannot handle math functions
  • Cannot match intent with common phrases
  • Cannot create clickable links in the text box
  • Cannot search across intents
  • Can only provide one webhook

Voice Recognition APIs for Longform and Offline Processing

4. ibm watson.

It’s no secret we’re generating, processing, and analyzing larger quantities of data than any other time in history. Not all of that data is going to be clean and well-organized, especially if you’re designing or developing an API. As API developers, it’s our job to make sure that the data is organized and usable.

IBM Watson is perhaps one of the purest expressions of AI as a virtual assistant . IBM Watson is very adept at processing natural language patterns, which is one of the holy grails of AI and machine learning developers.

The I BM Watson Speech to Text API is particularly robust in understanding context, relying on hypothesis generation and evaluation in its response formulation. It’s also able to differentiate between multiple speakers, which makes it suitable for most transcription tasks. You can even set a number of filters, eliminating profanities, adding word confidence, and formatting options for speech-to-text applications.

IBM Watson offers three different interfaces for developers. There’s a WebSocket interface, an HTTP REST interface, and an asynchronous HTTP interface.

IBM Watson is simple to set up and implement, which makes it a wonderful option for those looking for a Speech-To-Text API but aren’t completely technically proficient. IBM provides  extensive documentation and one of the most thorough API reference manuals on the market. If you’re looking for a speech-to-text API that’s simple to set up and start using immediately, IBM Watson might be a good fit.

Of course, IBM Watson is more than just a speech-to-text API. It’s one of the most fully-developed machine learning libraries in existence. It continues to learn and evolve, the more you use it. This makes it suitable for preventing outages and disruptions as well as accelerating research and data . Most applications that would benefit from structuring unstructured data will benefit from using the IBM Watson API.

As one of the best-developed machine learning APIs out there, IBM Watson isn’t cheap. It is quick to get up and running, however, meaning you won’t waste money on downtime or having to hire multiple developers just to get started. The peace of mind of a nearly plug-and-play Speech-To-Text API may be worth the cost of admission alone.

  • Processes unstructured data
  • Assists humans instead of replacing them
  • Helps overcome human limitations
  • Improves productivity be delivering relevant data
  • Improves user experience
  • Can process large quantities of data
  • Easy to set up and get started with
  • Doesn’t directly support structured data
  • Expensive to switch to
  • Requires maintenance
  • Only supports a limited number of languages
  • Takes time to implement fully
  • Requires education and training to make full use of its resources

5. Speechmatics

Speechmatics offers an easy-to-use cloud-based API for automatic transcription services. Its main claim to fame is that it supports a wide range of file formats, meaning it can be used for offline file processing.

The Speechmatics API is also highly adept at speaker recognition . It processes an impressive array of different variables, from confidence values to timing and speaker indications. This makes Speechmatics useful for machine learning applications, as it gets to know a speaker more thoroughly with each iteration.

Speechmatics has been found to be one of the fastest and most reliable automatic transcription APIs available for developers. It also supports nine languages, including different variants on English, including British and Australian English.

There are a couple of drawbacks to the Speechmatics API, however, although none of them are major enough to be a dealbreaker. First and most notably, there’s no app interface. If you’ll be using the transcription services, you’ll need to upload the audio to the website.

Secondly, each query does cost money. It costs .06 GBP per 1 minute of processed audio. If you’re going to be using the Speechmatics API for any sort of commercial app or web service, make sure to consider that when setting your processing. They do offer a discount for over 1000 minutes of processed audio. Perhaps you can work out some sort of bulk rate if you’re going to be using the Speechmatics API extensively.

  • Supports multiple languages
  • Supports multiple English variants
  • Multi-speaker support
  • Multiple file formats supported
  • Does well with noisy audio
  • Easily integrated via REST API
  • Speaker recognition
  • Can be used for cloud-based transcription services and private usage, using the same API
  • No app interface
  • Costs money for each query

Final Thoughts

Not all Voice-To-Text APIs are created equal. In fact, think of a voice recognition API as a toolbox rather than a product you’d buy off the shelf. Each one has different strengths and weaknesses. Knowing which Speech-To-Text API is right for your product largely depends on what you’ll be using it for.

These five APIs certainly aren’t the only ones you can use for voice-related functions, either. Some other noteworthy voice recognition APIs are worthy of a look.

Other Noteworthy Voice Recognition APIs include:

  • UWP Speech Recognition by Microsoft
  • CMU Sphinx Speech Recognition Toolkit (open source)
  • Kaldi Speech Recognition Toolkit For Research (open source)

Each one of the speech-to-text APIs has its strengths. If you need transcription or to decode noisy audio, Google Speech-To-Text is an excellent contender. If you’re looking for real-time translation and transcription functionality, Microsoft Cognitive Services is probably going to be your best bet. If you’re looking for a plug-and-play voice recognition API that easily configures for numerous devices and software environments, Dialogflow might be right for you.

If you’re going to be dealing with large amounts of unstructured data, however, IBM Watson is going to be the best suited for your particular needs. If you’re going to be needing speaker separation or easy integration with additional software, Speechmatics will make your life as easy as possible, with its convenient REST API.

Considering the rise of mobile and hands-free devices, virtual assistants, and AI, it’s safe to say that voice integration isn’t going anywhere. It’s only going to get more prevalent, as technology continues to intertwine with the fabric of our daily lives.

The latest API insights straight to your inbox

J.

J. Simpson lives at the crossroads of logic and creativity. He writes and researches tech-related topics extensively for a wide variety of publications, including Forbes Finds. He is also a graphic designer, journalist, and academic writer, writing on the ways that technology is shaping our society while using the most cutting-edge tools and techniques to aid his path. He lives in Portland, Or.

  •  REST vs GraphQL: How...
  •  How Does Open Banking Apply... 

Latest Posts

How the api economy is changing in 2024.

Steve Rodda

Prototype-First API Design

Tom Akehurst

10 Search Engine Results Page (SERP) APIs

J. Simpson

Smarter Tech Decisions Using APIs

High impact blog posts and eBooks on API business models, and tech advice

Connect with market leading platform creators at our events

Join a helpful community of API practitioners

API Insights Straight to Your Inbox!

Can't make it to the event? Signup to the Nordic APIs newsletter for quality content. High impact blog posts on API business models and tech advice.

Join Our Thriving Community

Become a part of our global community of API practitioners and enthusiasts. Share your insights on the blog, speak at an event or exhibit at our conferences and create new business relationships with decision makers and top influencers responsible for API solutions.

Nordic APIs Community

Speech to Text - Voice Typing & Transcription

Take notes with your voice for free, or automatically transcribe audio & video recordings. secure, accurate & blazing fast..

~ Proudly serving millions of users since 2015 ~

I need to >

Dictate Notes

Start taking notes, on our online voice-enabled notepad right away, for free.

Transcribe Recordings

Automatically transcribe (as well as summarize & translate) audios & videos. Upload files from your device or link to an online resource (Drive, YouTube, TikTok or other). Export to text, docx, video subtitles & more.

Speechnotes is a reliable and secure web-based speech-to-text tool that enables you to quickly and accurately transcribe your audio and video recordings, as well as dictate your notes instead of typing, saving you time and effort. With features like voice commands for punctuation and formatting, automatic capitalization, and easy import/export options, Speechnotes provides an efficient and user-friendly dictation and transcription experience. Proudly serving millions of users since 2015, Speechnotes is the go-to tool for anyone who needs fast, accurate & private transcription. Our Portfolio of Complementary Speech-To-Text Tools Includes:

Voice typing - Chrome extension

Dictate instead of typing on any form & text-box across the web. Including on Gmail, and more.

Transcription API & webhooks

Speechnotes' API enables you to send us files via standard POST requests, and get the transcription results sent directly to your server.

Zapier integration

Combine the power of automatic transcriptions with Zapier's automatic processes. Serverless & codeless automation! Connect with your CRM, phone calls, Docs, email & more.

Android Speechnotes app

Speechnotes' notepad for Android, for notes taking on your mobile, battle tested with more than 5Million downloads. Rated 4.3+ ⭐

iOS TextHear app

TextHear for iOS, works great on iPhones, iPads & Macs. Designed specifically to help people with hearing impairment participate in conversations. Please note, this is a sister app - so it has its own pricing plan.

Audio & video converting tools

Tools developed for fast - batch conversions of audio files from one type to another and extracting audio only from videos for minimizing uploads.

Our Sister Apps for Text-To-Speech & Live Captioning

Complementary to Speechnotes

Reads out loud texts, files & web pages

Reads out loud texts, PDFs, e-books & websites for free

Speechlogger

Live Captioning & Translation

Live captions & translations for online meetings, webinars, and conferences.

Need Human Transcription? We Can Offer a 10% Discount Coupon

We do not provide human transcription services ourselves, but, we partnered with a UK company that does. Learn more on human transcription and the 10% discount .

Dictation Notepad

Start taking notes with your voice for free

Speech to Text online notepad. Professional, accurate & free speech recognizing text editor. Distraction-free, fast, easy to use web app for dictation & typing.

Speechnotes is a powerful speech-enabled online notepad, designed to empower your ideas by implementing a clean & efficient design, so you can focus on your thoughts. We strive to provide the best online dictation tool by engaging cutting-edge speech-recognition technology for the most accurate results technology can achieve today, together with incorporating built-in tools (automatic or manual) to increase users' efficiency, productivity and comfort. Works entirely online in your Chrome browser. No download, no install and even no registration needed, so you can start working right away.

Speechnotes is especially designed to provide you a distraction-free environment. Every note, starts with a new clear white paper, so to stimulate your mind with a clean fresh start. All other elements but the text itself are out of sight by fading out, so you can concentrate on the most important part - your own creativity. In addition to that, speaking instead of typing, enables you to think and speak it out fluently, uninterrupted, which again encourages creative, clear thinking. Fonts and colors all over the app were designed to be sharp and have excellent legibility characteristics.

Example use cases

  • Voice typing
  • Writing notes, thoughts
  • Medical forms - dictate
  • Transcribers (listen and dictate)

Transcription Service

Start transcribing

Fast turnaround - results within minutes. Includes timestamps, auto punctuation and subtitles at unbeatable price. Protects your privacy: no human in the loop, and (unlike many other vendors) we do NOT keep your audio. Pay per use, no recurring payments. Upload your files or transcribe directly from Google Drive, YouTube or any other online source. Simple. No download or install. Just send us the file and get the results in minutes.

  • Transcribe interviews
  • Captions for Youtubes & movies
  • Auto-transcribe phone calls or voice messages
  • Students - transcribe lectures
  • Podcasters - enlarge your audience by turning your podcasts into textual content
  • Text-index entire audio archives

Key Advantages

Speechnotes is powered by the leading most accurate speech recognition AI engines by Google & Microsoft. We always check - and make sure we still use the best. Accuracy in English is very good and can easily reach 95% accuracy for good quality dictation or recording.

Lightweight & fast

Both Speechnotes dictation & transcription are lightweight-online no install, work out of the box anywhere you are. Dictation works in real time. Transcription will get you results in a matter of minutes.

Super Private & Secure!

Super private - no human handles, sees or listens to your recordings! In addition, we take great measures to protect your privacy. For example, for transcribing your recordings - we pay Google's speech to text engines extra - just so they do not keep your audio for their own research purposes.

Health advantages

Typing may result in different types of Computer Related Repetitive Strain Injuries (RSI). Voice typing is one of the main recommended ways to minimize these risks, as it enables you to sit back comfortably, freeing your arms, hands, shoulders and back altogether.

Saves you time

Need to transcribe a recording? If it's an hour long, transcribing it yourself will take you about 6! hours of work. If you send it to a transcriber - you will get it back in days! Upload it to Speechnotes - it will take you less than a minute, and you will get the results in about 20 minutes to your email.

Saves you money

Speechnotes dictation notepad is completely free - with ads - or a small fee to get it ad-free. Speechnotes transcription is only $0.1/minute, which is X10 times cheaper than a human transcriber! We offer the best deal on the market - whether it's the free dictation notepad ot the pay-as-you-go transcription service.

Dictation - Free

  • Online dictation notepad
  • Voice typing Chrome extension

Dictation - Premium

  • Premium online dictation notepad
  • Premium voice typing Chrome extension
  • Support from the development team

Transcription

$0.1 /minute.

  • Pay as you go - no subscription
  • Audio & video recordings
  • Speaker diarization in English
  • Generate captions .srt files
  • REST API, webhooks & Zapier integration

Compare plans

Privacy policy.

We at Speechnotes, Speechlogger, TextHear, Speechkeys value your privacy, and that's why we do not store anything you say or type or in fact any other data about you - unless it is solely needed for the purpose of your operation. We don't share it with 3rd parties, other than Google / Microsoft for the speech-to-text engine.

Privacy - how are the recordings and results handled?

- transcription service.

Our transcription service is probably the most private and secure transcription service available.

  • HIPAA compliant.
  • No human in the loop. No passing your recording between PCs, emails, employees, etc.
  • Secure encrypted communications (https) with and between our servers.
  • Recordings are automatically deleted from our servers as soon as the transcription is done.
  • Our contract with Google / Microsoft (our speech engines providers) prohibits them from keeping any audio or results.
  • Transcription results are securely kept on our secure database. Only you have access to them - only if you sign in (or provide your secret credentials through the API)
  • You may choose to delete the transcription results - once you do - no copy remains on our servers.

- Dictation notepad & extension

For dictation, the recording & recognition - is delegated to and done by the browser (Chrome / Edge) or operating system (Android). So, we never even have access to the recorded audio, and Edge's / Chrome's / Android's (depending the one you use) privacy policy apply here.

The results of the dictation are saved locally on your machine - via the browser's / app's local storage. It never gets to our servers. So, as long as your device is private - your notes are private.

Payments method privacy

The whole payments process is delegated to PayPal / Stripe / Google Pay / Play Store / App Store and secured by these providers. We never receive any of your credit card information.

More generic notes regarding our site, cookies, analytics, ads, etc.

  • We may use Google Analytics on our site - which is a generic tool to track usage statistics.
  • We use cookies - which means we save data on your browser to send to our servers when needed. This is used for instance to sign you in, and then keep you signed in.
  • For the dictation tool - we use your browser's local storage to store your notes, so you can access them later.
  • Non premium dictation tool serves ads by Google. Users may opt out of personalized advertising by visiting Ads Settings . Alternatively, users can opt out of a third-party vendor's use of cookies for personalized advertising by visiting https://youradchoices.com/
  • In case you would like to upload files to Google Drive directly from Speechnotes - we'll ask for your permission to do so. We will use that permission for that purpose only - syncing your speech-notes to your Google Drive, per your request.

The world’s most accurate API for AI- and human-generated transcripts

Trained from the most diverse collection of voices in the world, Rev AI sets the accuracy standard for video and voice applications.

  • Submit audio or video files and get machine-generated transcripts in minutes
  • High accuracy
  • 58+ languages available
  • Generates transcription in real-time as audio or video is streamed
  • 9 languages available
  • Get the highest level of accuracy from human-created transcripts
  • ~24 hour turnaround time
  • English only
  • Predicts the dominant language used in an audio or video file
  • 22 languages available
  • Get positive, negative, and neutral statements from text
  • Identify key topics in text
  • Great for auto-tagging
  • Transform voice content into concise, actionable summaries
  • Communicate across languages with context-aware translations
  • 11 languages available
  • Precise timestamps enhance content searchability and analysis
  • English, Spanish, and French available

Transcend barriers of communication with Rev AI

Advanced Text to Speech API

  • ~400ms latency
  • High quality at speed

speech to text free api

Highest Quality Audio Output

Contextual awareness.

Understands text nuances for appropriate intonation and resonance.

Emotional Range

Adapt the emotional tone to suit any narrative required.

Multilingual Capability

Authentic speech across 29 languages, with each voice maintaining its original characteristics.

Voice Variety

Use voice design and a comprehensive library to discover voices for every use-case.

High Quality Output

Supreme audio quality at 128 kbps to elevate the listener's experience.

Audio Streaming

Quickly generate long-form content, at no loss to quality.

Low Latency Turbo Model

Build Faster Than Ever

Build Faster Than Ever

ElevenLabs Grants

3 Months Free

11m characters, api features, 1000s of hq voices.

Create custom voices by cloning your own voice, create a new one from scratch or explore our library.

Real-time Latency

Get the fastest response time in the industry with our real-time API. Achieve ~400ms audio generation times at 128kbps.

Contextual awareness

Our text to speech model understands the context of the text to deliver the most natural sounding voices.

Enterprise-ready Security

Trusted security and data controls, soc2 and gdpr.

Compliant with the highest security and data handling standards

Full Privacy Mode

Optional Full Privacy mode that enables zero content and data retention on ElevenLabs servers. Exclusively for Enterprise.

End-To-End Encryption

Content and data sent to and from our models are always protected

Explore our resources

Python library, react text to speech guide, gaming ai voice guide, multilingual text to speech api in 29 languages, developer api, enterprise scale, frequently asked questions, what makes elevenlabs api the best tts api.

It offers unparalleled quality, multilingual capabilities, and low latency (<500ms), ensuring optimal user experience. It also provides a comprehensive library of voices and a variety of voice settings to suit any use-case.

What is a text to speech & AI voice API?

It is an application programming interface that allows developers to integrate text-to-speech and voice cloning capabilities into their applications. It works by leveraging deep learning to convert text into speech, and speech into a different voice. The technology has had significant growth in recent months due to its ability to create a more immersive user experience. It is used to create audiobooks, podcasts, voice assistants, and more. It can also be used to create custom voices for gaming, movies, and other media.

How do I get started with the text to speech API?

You can get started by signing up for a free account. Once you have an account, find your xi-api-key in your profile settings after registration. This key is required for authentication in API requests. You can then generate audio from text in a variety of languages by sending a POST request to the API with the desired text and voice settings. The API returns an audio file in response. Use programming languages like Python for these requests, as demonstrated in the example above.

How does the API ensure high-quality output?

It delivers audio at 128 kbps, allowing for a premium listening experience. It also offers a variety of voice settings to suit any use-case, including emotional range, contextual awareness, and voice variety.

Can I get support during the integration process?

Yes, extensive resources, an active developer community, and a responsive support team are available to assist you.

How many languages does the API support?

Our text to speech API supports 29 languages including Hindi, Spanish, German, Arabic & Chinese. Each voice maintains its unique characteristics across all languages.

What is the latency of the text to speech API?

The API boasts ultra-low latency, achieving approximately 400ms audio generation times with its Turbo model. This ensures a quick turnaround from text input to audio output. Multiple latency optimization modes are available, enabling significant improvements and responsiveness.

What are the use cases for the ElevenLabs TTS API?

The API can be used to create audiobooks, podcasts, voice assistants, and more. It can also be used to create custom voices for gaming, movies, and other media.

What is an AI voice API and how does it work?

An AI voice API is an application programming interface that allows developers to integrate text-to-speech and voice cloning capabilities into their applications. It works by leveraging deep learning to convert text into speech, and speech into a different voice.

What is the best text to speech (TTS) API?

The best text to speech API is one that offers high-quality output, multilingual capabilities, and low latency. It should also provide a comprehensive library of voices and a variety of voice settings to suit any use-case. You can find all of these features and more with ElevenLabs.

chart, waterfall chart

AI + Machine Learning , Announcements , Azure AI Content Safety , Azure AI Studio , Azure OpenAI Service , Partners

Introducing GPT-4o: OpenAI’s new flagship multimodal model now in preview on Azure

By Eric Boyd Corporate Vice President, Azure AI Platform, Microsoft

Posted on May 13, 2024 2 min read

  • Tag: Copilot
  • Tag: Generative AI

Microsoft is thrilled to announce the launch of GPT-4o, OpenAI’s new flagship model on Azure AI. This groundbreaking multimodal model integrates text, vision, and audio capabilities, setting a new standard for generative and conversational AI experiences. GPT-4o is available now in Azure OpenAI Service, to try in preview , with support for text and image.

Azure OpenAI Service

A person sitting at a table looking at a laptop.

A step forward in generative AI for Azure OpenAI Service

GPT-4o offers a shift in how AI models interact with multimodal inputs. By seamlessly combining text, images, and audio, GPT-4o provides a richer, more engaging user experience.

Launch highlights: Immediate access and what you can expect

Azure OpenAI Service customers can explore GPT-4o’s extensive capabilities through a preview playground in Azure OpenAI Studio starting today in two regions in the US. This initial release focuses on text and vision inputs to provide a glimpse into the model’s potential, paving the way for further capabilities like audio and video.

Efficiency and cost-effectiveness

GPT-4o is engineered for speed and efficiency. Its advanced ability to handle complex queries with minimal resources can translate into cost savings and performance.

Potential use cases to explore with GPT-4o

The introduction of GPT-4o opens numerous possibilities for businesses in various sectors: 

  • Enhanced customer service : By integrating diverse data inputs, GPT-4o enables more dynamic and comprehensive customer support interactions.
  • Advanced analytics : Leverage GPT-4o’s capability to process and analyze different types of data to enhance decision-making and uncover deeper insights.
  • Content innovation : Use GPT-4o’s generative capabilities to create engaging and diverse content formats, catering to a broad range of consumer preferences.

Exciting future developments: GPT-4o at Microsoft Build 2024 

We are eager to share more about GPT-4o and other Azure AI updates at Microsoft Build 2024 , to help developers further unlock the power of generative AI.

Get started with Azure OpenAI Service

Begin your journey with GPT-4o and Azure OpenAI Service by taking the following steps:

  • Try out GPT-4o in Azure OpenAI Service Chat Playground (in preview).
  • If you are not a current Azure OpenAI Service customer, apply for access by completing this form .
  • Learn more about  Azure OpenAI Service  and the  latest enhancements.  
  • Understand responsible AI tooling available in Azure with Azure AI Content Safety .
  • Review the OpenAI blog on GPT-4o.

Let us know what you think of Azure and what you would like to see in the future.

Provide feedback

Build your cloud computing and Azure skills with free courses by Microsoft Learn.

Explore Azure learning

Related posts

AI + Machine Learning , Announcements , Azure AI , Azure AI Studio , Azure OpenAI Service , Events

New models added to the Phi-3 family, available on Microsoft Azure   chevron_right

AI + Machine Learning , Announcements , Azure AI , Azure AI Content Safety , Azure AI Services , Azure AI Studio , Azure Cosmos DB , Azure Database for PostgreSQL , Azure Kubernetes Service (AKS) , Azure OpenAI Service , Azure SQL Database , Events

From code to production: New ways Azure helps you build transformational AI experiences   chevron_right

AI + Machine Learning , Azure AI Studio , Customer stories

3 ways Microsoft Azure AI Studio helps accelerate the AI development journey     chevron_right

AI + Machine Learning , Analyst Reports , Azure AI , Azure AI Content Safety , Azure AI Search , Azure AI Services , Azure AI Studio , Azure OpenAI Service , Partners

Microsoft is a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud AI Developer Services   chevron_right

Join the conversation, leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

I understand by submitting this form Microsoft is collecting my name, email and comment as a means to track comments on this website. This information will also be processed by an outside service for Spam protection. For more information, please review our Privacy Policy and Terms of Use .

I agree to the above

  • Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer

Top 50 Best APIs

The Top 50 Most Popular APIs

By Team RapidAPI // March 16, 2023

Updated January 2023

If you’re in the API development or software development space and are searching for the most popular and widely used APIs ( application programming interfaces ), or are looking for the APIs to integrate one of the best APIs into your website, software application, or mobile app, you’ve come to the right place.

With over 14 thousand public APIs, and 400 billion API calls per month on the Rapid API Hub, we analyzed the data to see which were the most popular APIs by API requests in 2022. Rapid has got you covered with a validation list of the top 50 APIs.

What are the top API categories?

Due to the rise and continued emergence of social platforms, it’s no surprise that social came in at number 1 for 2022, with data following at number 2 and sports rounding out the top 3 (sports was a top 3 category in 2021 as well). For the full list, check out the most popular APIs by category .

Top 3 popular API categories or Rapids API Hub

What are the most popular and top used APIs? 

Out of the 14,000+ APIs on the Rapid API Marketplace, the following are the ones that have the highest number of API calls. Click on “Connect to API” to view the interactive API documentation; see the API description, test the API endpoints, learn about the API design, and view the API pricing.

Quick Links

  • Most Popular APIs by Category
  • API Collections
  • Previous Top 50

Here were the most popular API integrations for 2022:

1. tokapi – mobile version .

Most popular API of 2022

Category: Social

Connect to API

Starting with the top API, TokApi – mobile version .

With the explosive growth and popularity of TikTok, it’s no surprise TokApi – mobile version, a TikTok mobile API, is the most popular among developers.

What is TokApi – mobile version?

TokApi – mobile version is a TikTok mobile API that allows developers to search with ease.

  • Trending – view all trending categories and feed recommendations
  • Hashtag ID – search posts and videos by Hashtag IDs
  • Music ID – search music information and videos by music ID
  • Video –  get video feeds, search videos by web URL or video ID, and retrieve comments by video ID
  • Search – text query search capabilities for video, hashtags, users, and music
  • Effects – retrieve effects information
  • User – retrieve user playlists, videos, information, usernames, QR codes, followers, and following lists

Related: List of Tiktok APIs

2. Google Search

Second most popular API is Google Search

Category: Data

Google Search provides powerful search results in real time. The API retrieves data from both Google web and image search.

  • Web search – support for web searches
  • Image search – search support for images

Third most popular API is Deezer

Category: Music

The Deezer API gives developers access to Deezer’s massive music database of over 30 million tracks and playlists.

  • Infos – retrieve for the current country

Related: Collection of best music and audio APIs

4. API-FOOTBALL

API Football is a top 10 API

Category: Sports

API-Football is the most popular RESTful API for football (soccer) data. It covers over 960 football leagues and cups, and provides live scores, pre-match odds, events, line-ups, standings, stats, and much more.

Endpoints :

  • Timezone – get a list of available timezones to be used in the fixtures endpoint
  • Predictions – made using the poisson distribution, comparison of team statistics, last matches, players, etc.
  • Fixtures – retrieve a list of available fixtures according to the parameters
  • Teams – get a list of all available teams
  • Countries & Seasons – view a list of available countries and seasons
  • Leagues – retrieve a list of all available leagues and cups
  • Odds – taken from fixtures, leagues, or date
  • Events – retrieve the events from a fixture
  • Standings – see the standings for a league or a team
  • Statistics – view the statistics from one fixture
  • Players – get player statistics
  • Lineups – see the lineups for a fixture
  • Trophies – view all the trophies given to a player or a coach
  • Odds – returns in-play odds for fixtures in progress
  • Sidelined – see all sidelined players and coaches
  • Transfers – retrieve all available transfers of players and teams
  • Search – search by team, country, league, player, coach, venue, bet, or bookmaker
  • Top Scorers – see the 20 best players for a league or a cup
  • Venues – view a list of all available venues
  • Injuries – retrieve a list of all unavailable players due to illness, suspension, or injuries
  • Coaches – get information for coaches, including their career history

Related: Top Football (Soccer) APIs

5. ScrapTik

ScrapTik is a top 10 API for 2022

ScrapTik is a social API that allows developers to scrape data from the TikTok mobile app. The API is used as a gateway for fetching trending videos, users, music, and more with 100% uptime for a great user experience.

  • Scraping API – scrape data directly from the TikTok mobile app (IE; users, posts, hashtags, music, etc.)
  • Scraping API (Web API) – get user info and posts
  • Login API – login or send a code using SMS text, and check if unique ID exists
  • Stories API – view user stories
  • Notifications API – show all account notifications (IE; activity, followers, mentions, tags, likes, etc.)
  • Livestream API – start, create, get, and end a livestream
  • Services – additional tools for developers like x-argus, x-ladon, x-gorgon, and device registration

6. Tiktok video no watermark

Tiktok no watermark is the sixth most popular API

Another TikTok API coming in on the top 10; the Tiktok video no watermark API allows developers to download Tiktok videos without watermarks, so they can be posted to other social platforms.

  • User Related API
  • Music Related API
  • Feed Related API
  • Comment Related API
  • Challenge (Hashtag)
  • Service API – including Tiktok Mobile endpoint signing (X-Argus, X-Ladon, X-Gorgon, X-Khronos), device registration, TTencryption, XLog encryption, and decryption

7. Rapid Translate Multi Traduction

Rapid Multi Traduction is the seventh most popular API

Category: Text Analysis

The Rapid Translate Multi Traduction API translates html, text, words, phrases, and paragraphs in real time across more than 100 languages. This can improve response times and service quality by translating multiple texts in one fast query.

8. Youtube v3

Youtube v3 is the eighth popular API

Another hot social API, the Youtube v3 API, provides Youtube data without having a Youtube data api key.

  • Captions List
  • Video Comments
  • Video Details
  • Channel Details
  • Channel Videos
  • Suggested Videos
  • Playlist Videos
  • Playlist Details

Related: List of Youtube APIs

9. OpenAPI 1.2

OpenAPI 1.2 is the ninth popular API

Category: Location

The TransLoc OpenAPI is a public RESTful API that allows developers to access real time vehicle tracking information and incorporate this data into their website or mobile application.

  • Arrival Estimates

Related: Top Geocoding and Location APIs

10. TikTok All in One

TikTok All in One is the tenth popular API

To round out the top 10 most popular APIs, is TikTok All in One. The API provides info and allows developers to search TikTok videos, music, hashtags (challenges), users, video comments, and live feeds.

  • Feed (Trending)
  • Hashtag (Challenge)

11. BetsAPI

Bet365 is the eleventh popular API

BetsAPI provides both in-play and pre-match sports data feeds and sports betting scores and odds for soccer, football, volleyball, cricket, and more.

  • Bet365 Inplay Filter
  • Bet365 InPlay
  • Bet365 InPlay Event
  • Bet365 Upcoming Events
  • Bet365 PreMatch Odds
  • Bet365 Result

Related: Sports Odds and Sports Betting APIs

12. Tiktok Download Without Watermark

Tiktok Download Without Watermark is a popular API

If you’re repurposing your Tiktok content for other social media platforms, it’s best to remove the Tiktok watermark as these other platforms demote Tiktok reels. The Tiktok Download Without Watermark API provides the fastest way to download videos without watermark directly from tiktok.

13. SportScore

SportsScore is a popular API

The SportScore API provides detailed data on standings, players, teams, coaches, lineups, stadiums, odds and odds-history, match locations, video goals and highlights. The API also provides real time data for live-score and game incidents (substitutions, corners, cards, etc).

  • Tennis Rankings

Imgur is a popular is API

Category: Entertainment

Using the Imgur API, programmers can do just about anything you can do on imgur.com , while using your programming language of choice. The Imgur API exposes the entire Imgur infrastructure via a standardized programmatic interface.

  • Notification
  • Custom Gallery
  • Conversation

15. Youtube Search and Download

Youtube Search and Download is a popular API

The Youtube Search and Download API does exactly what it says. Search for information about channel, playlist, video, and trends in Youtube.

  • About channel
  • Video related
  • Video comments
  • Search videos/channels/playlists

16. Pinnacle Odds

Pinnacle Odds is a popular API

The Pinnacle Odds API is a RESTful service for getting pre-match and live odds. Sports include soccer, tennis, basketball, hockey, football, MMA, and baseball.

  • List of special markets
  • List of sports
  • List of leagues
  • Betting status
  • List of markets
  • List of archive events
  • List of periods
  • Event details

Telize is a popular API

Telize is a REST API that allows developers to query location information from any IP address and retrieve visitor’s IP addresses.

  • Legacy – GeoIP – get visitor IP address location in JSON format
  • JSON IP – provides the visitor IP address (IPv4 or IPv6) in a JSON object
  • IP – retrieves the visitor IP address (IPv4 or IPv6) in plain text, useful for shell scripts or to find the external Internet routable address
  • Location – get visitor IP address location also in JSON format
  • Location with specific IP – shows a specific IP address location in JSON format

18. Instagram

Instagram is a popular API

The Instagram API allows developers to get basic instagram user and media info.  

Related: List of Top Instagram APIs

19. webcams.travel

Webcams.travel is a popular API

Category: Travel

The webcams.travel API gathers webcams from around the world and makes them accessible to any developers via API. Using the API, developers can enrich their website or app with free webcam content from over 69,687 webcams around the world.

  • webcams/list
  • webcams/map

Related: Travel APIs

20. Twitter

Twitter is a popular API

The Twitter API allows developers to search users, followers, images, tweets, media, and more.

  • Explore – Auto Complete, Search
  • Tweet – Tweet Detail & Conversation, Tweet Favorites, Tweet Retweeters
  • User – by Screen Name, Rest ID, Rest IDs, User Tweets, User Tweets & Replies, User Media, User Likes, User Following, User Followers

Related: Top Twitter and Twitter-related APIs

21. COVID-19

COVID-19 is a popular API

Category: Health & Fitness

The Covid-19 API allows developers to follow the progress of the coronavirus around the world. This free API gives statistics for all countries on COVID-19.

Related: Coronavirus COVID-19 APIs

22. JoJ Unlimited Web Search

JoJ Unlimited Web Search is a popular API

Category: Search

This web search API searches the world’s information, including webpages, related keywords, and more.

AKA is popular API

Category: Other

24. Instagram Bulk Profile Scrapper

Instagram Bulk Profile Scrapper is a popular API

As its name suggests, this API is designed specifically to scrap Instagram profiles in bulk/high volume.

  • Post – Instagram feeds endpoints 
  • Reels – also known as Clips
  • Followers – fetched by username 
  • Following – Instagram following list 
  • Music – Instagram music API
  • Locations – get Instagram posts by location

25. Instrgram Scraper

Instagram Scraper is a popular API

Instagram Scraper API allows developers to get information they need from instagram. 

  • LocationMedias
  • HashTagMediasV2
  • SearchFollowing
  • SearchFollowers
  • HashTagMedias
  • MediaLikers
  • MediaComments
  • DetailUserInfo

26. MapTiles

MapTiles is a popular API

Category: Mapping

With the MapTiles API, developers can display a planet-covering map on 20 zoom levels. Choose between English, French or Spanish labeled raster map tiles to provide the best usability for your target audience.

  • getMapTilewithSpanishLabels
  • getMapTilewithEnglishLabels
  • getMapTilewithFrenchLabels
  • getStandardMapTile

Related: List of Mapping APIs

27. BraveNewCoin

BraveNewCoin is a popular API

Category: Finance

BraveNewCoin (BNC) has been providing data, indexes, analysis, and insight on the cryptographic asset marketplace since April 2014 through its website and API’s. The institutional grade V3 platform is the latest iteration of its systems.

  • AssetTicker

Related: The Brave New Coin API Collection

28. SendGrid

SendGrid is a popular API

Category: Email

The SendGrid API is RESTful, fully featured, and easy to integrate with. SendGrid delivers transactional and marketing emails through the world’s largest cloud-based email delivery platform. 

  • Update an alert
  • Retrieve all alerts
  • Invalid Emails
  • Spam Reports
  • Cancel Schedules Sends 
  • Unsubscribe Groups
  • Suppressions (Unsubscribe)
  • Settings – Tracking 
  • Settings – Mail 
  • Settings – Inbound Parse
  • Account Stats

29. IP Geolocation

IP Geolocation is a popular API

As the name suggests, the IP Geolocation API gives detailed information about the IP location of visitors. The API returns location data such as country, city, latitude, longitude, timezone, asn, currency, and security data for IPv4 and IPv6 addresses in JSON or XML formats. 

  • Visitor Lookup – returns the IP address with all the location data
  • IP Lookup – provides geo information for the given IP

Related: Top 10 Best IP Geolocation APIs (in 2022)

30. Instagram Scraper 2022

Instagram Scraper 2022 is a popular API

With an uptime of 97.85%, and a 100% response smart request filtering, Instagram Scraper 2022 allows developers to retrieve a lot of information from Instagram consistently and quickly. 

  • Media Info and Download
  • Not for use

31. Text Analysis

TextAnalysis is a popular text analysis API

The Text Analysis API streamlines the data mining process for developers and businesses so that they can quickly classify data from a variety of sources. The Text Analysis API performs various tasks such as summarization, language detection, sentiment analysis, article extraction, named entity recognition, and extract text from various documents, images, and audio files. 

  • Article-extraction – extract data from news sources like title, text, summary, keywords, authors, main image, all images, and links
  • Extract-text-from-files – Extract text from file formats such as csv, doc, docx, eml, epub, json, html, htm, msg, odt, pdf, pptx, ps, rtf, txt, xls, xlsx, gif, jpg, jpeg, png, tiff, tif, mp3, ogg, and wav
  • Language-detection – helps discover what languages are present in your text
  • Named-entity-recognition – locates and classifies named entities mentioned in unstructured text into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages
  • Sentiment-analysis – used to systematically identify and extract subjective information, it is widely used in movie reviews and survey responses
  • Summarize-text – used to summarize a long-form piece of written content or text 
  • Website-extraction – extracts content from generic websites

Related: Top Text Analysis APIs

32. Translo

Translo is a popular translation API

Category: Translation

The Translo API is like Google Translate on steroids and is 3x more accurate. 

  • Translate – translate text and keeps emoji and html safe and sound
  • Batch_translate – translate groups in a fixed order
  • Detect – detects the language of text

33. Shazam Core

Shazam Core is a popular music API

Shazam is a mobile app that recognizes the music and TV playing nearby by creating a unique digital fingerprint to match with one of the millions of songs in the Shazam database. Developers use the Shazam Core API to define the song by uploading the file. 

  • Search – multi search and search suggest 
  • Artists – artist details 
  • Tracks – similarities, details, recognize, totals, track YouTube video, and related 
  • Charts – world charts by genre, country, and cities
  • Events – events details and lists 

Related: Collection of Best Music and Audio APIs

34. NLP Translation

NLP Translation is a widely used API

NLP Translation is a high quality neural machine translation (NMT) that utilizes over 110 languages. 

  • Translate – translate Text or HTML
  • JSON Data Translate – translate values inside JSON formatted string with protected key and word capabilities

Related: Top 7 NLP (Natural Language Processing) APIs

35. Article Data Extraction and Text Mining

Article Data Extraction is a popular API

Category: Media

The Article Data Extraction and Text Mining API by Ujeebu allows developers to extract meaningful data such as clean text from HTML regardless of the language used.

  • /v1.1/extract – article extraction endpoint
  • /v1.1/card – extract a preview of an article (or article card). Faster than the Extract endpoint, it doesn’t do any in-depth analysis of the article content, instead, it mostly relies on its meta tags.

36. TikTok best experience

TikTok Best Experience is a popular API

The TikTok Best Experience API is a TikTok intelligent proxy that allows developers to get data with 99.99% uptime. 

  • User’s data – retrieve by user ID or username
  • Video – get by ID or URL
  • Music – retrieve music feed and data by ID
  • Hashtag (Challenge) – get Hashtag’s data and feed by ID and name 
  • Comments – retrieve comments by video ID
  • Trending – see trending videos by region and all trending videos 
  • Search – search hashtag and users by query 
  • Followers – retrieve following and followers by user ID

37. Instagram Downloaded

Instagram Downloader is a popular API

The Instagram Downloaded API allows developers to pretty much download any type of Instagram media. Reels, IGTV, videos, photos, stories, carousel, and profile pictures.

38. ScrapeNinja

ScrapeNinja is a popular API

ScrapeNinja is a high performance API for web scraping, the process of using bots to extract content and data from a website. The API emulates Chrome TLS fingerprint, backed by rotating proxies and smart retries. Developers use this when node.js/curl/python fails to load the website but they still need fast scraping and want to avoid using Puppeteer and JS evaluation (ScrapeNinja returns raw HTTP responses by default). Javascript rendering is available as well. 

  • /scrape – scrape via POST method
  • /scrape-js – launches a real Chrome browser engine, used only when /scrape endpoint features are not enough
  • /scrape (legacy) – scrape via GET method; not recommended for production

39. OpenSea

OpenSea is a highly used API

The OpenSea API easily pulls in data on your NFT items. Search NFTs, collections, users, and more. 

  • V1.0 (stable):
  • Collections
  • Single Asset
  • Asset Owners 
  • Listings for an Asset
  • Offers for an Asset
  • Single Contract
  • Single collection
  • Collection stats 
  • Seaport Listings
  • Seaport Offers

40. seo api

seo api is a fevorite API

Use the Serply seo api to scrape search engine results with unlimited web searches. Retrieve links, descriptions, and titles. Monitor news, job boards, social media, scholarly literature, competitors, and more. In addition, all results can also be filtered by country and language. 

  • Search engines – search, news, images, video, SERP, crawl, and scholar 
  • Products API – search for product rankings 
  • Status – “status” == true then API is up

41. Temp Mail

TempMail is a top API

Category: Tools

The Temp Mail API is a disposable t emporary email service that helps developers avoid spam by self-destructing after a certain period of time. Other names include tempmail, 10minutemail, throwaway email, fake-mail, or trash-mail. 

  • Delete message 
  • Domains list
  • Get emails 
  • Get message attachments 
  • Get one message 
  • Source message 

Related: How to create a disposable email address

42. YouTube MP3

YouTube MP3 is a popular API

The YouTube MP3 API does exactly what it says. It converts any YouTube video link into a downloadable MP3 link. The endpoints allow developers to do this for single or multiple videos.

The YouTube MP3 API has one endpoint,  Get MP3 – Convert Youtube Videos to MP3.

43. Social Media Data TT

Social Media Data TT is a popular API

The Social Media Data TT API allows developers to retrieve useful and unique information in real time. Extract user and hashtag metadata, video feed metadata from a user, hashtag, trending and music pages (video posts with statistics), and extract user followers.

44. Free Geo IP

Category: Commerce

FreeGeoIP.app provides a reliable and scalable IP geolocation API for developers. It uses a database of IP addresses associated with cities and other relevant information like time zone, latitude, and longitude.

45. ReCaptcha Solver

ReCaptcha Solver is a popular API

Solve reCAPTCHA v2 & reCAPTCHA v3 automatically and get the g_recaptcha_response for all scrapping and captcha-solving needs. 

  • ReCaptcha v2 create_task
  • ReCaptcha v3 create_task
  • Get ReCaptcha Result

46. YH Finance Complete

YH Finance Complete is a popular API

The YH Finance Complete API helps developers query stocks, quotes, movers, and other financial summaries.

  • Currency Convertor
  • YH Historical 
  • Stock Snapshot
  • Simple Stock Price

Related: Best Stock Market and Brokerage APIs

47. Yahoo Finance

Yahoo Finance is a popular API

The Yahoo Finance API provides financial news for stocks, options, ETFs, and mutual funds and data, as well as online tools for personal finance management. It’s one of the more popular media properties for financial stock data . 

  • Get Summary – get real time live market summaries in a specific region 
  • Get Movers – retrieves the day’s gainers, losers, and activations within a given region
  • Get Quotes – returns all relevant information for a specific stock quote or groups of stock quotes specified by symbol. 
  • Get Charts – returns data that allows you to visualize a chart for a specific symbol and comparisons, Data includes trading periods, timestamps, comparisons, and more. 

48. Lingvanex Translate

Lingvanex Translate is a popular API

The Lingvanex Translate API provides clear and fast translation. Developers can use the API for business or research. 

The Lingvanex Translate API has one endpoint,  Translate – /getLanguages and /translate.

49. Blockchain HTTP RPC

Blockchain HTTP RPC is a popular API

Blockchain HTTP RPC offers managed web3 backend and blockchain infrastructure as a service (IAAS) with best in class elasticity, scalability, and flexible subscription options. Supports JSON-RPC over HTTPS.

Currently supports the following blockchains networks (full node):

  • Ethereum/mainnet
  • Ethereum/goerli
  • Polygon/mainnet
  • Polygon/mumbai-testnet
  • Binance Smart Chain/mainnet
  • Binance Smart Chain/testnet
  • Solana/mainnet
  • Solana/devnet

50. Botometer Pro

Botometer Pro is a popular API

The Botometer Pro API examines Twitter accounts for possible automated activity. The API checks the activity of an account and gives it a score based on the extent to which it matches accounts that use automation—a higher score indicates activities that are more bot-like .

  • Check accounts in bulk
  • Check account v4

And that’s a wrap of Rapid’s top 50 most popular APIs for 2023! This post was updated in January 2023.

To find more lists or collections of APIs, make sure to browse the Rapid API Hub .

Do you have an API that you’d like to add to the Rapid API Marketplace? Add your API today!

Most Popular API Categories:

  • Entertainment
  • Translation
  • Text Analysis
  • Food & Restaurant
  • Real Estate
  • IP Geolocation
  • Video Games
  • Cryptocurrency
  • Anime & Manga
  • Machine Learning
  • Facial Recognition
  • Company Information
  • Text Summarization
  • IP & Domain

2021 Top 50

  • Skyscanner Flight Search
  • Open Weather Map
  • API-FOOTBALL
  • The Cocktail DB
  • REST Countries v1
  • Yahoo Finance
  • Love Calculator
  • URL Shortener Service
  • Translation and NLP
  • Chuck Norris
  • Hearthstone
  • Currency Exchange
  • Breaking News
  • Email Validator
  • Urban Dictionary
  • Recipe – Food – Nutrition
  • Investors Exchange (IEX) Trading
  • Movie Database (IMDB Alternative)
  • webcams.travel
  • City Geo-Location Lookup
  • GeoDB Cities
  • Chicken Coop
  • OpenAPI 1.2
  • Text-to-Speech
  • vin-decoder
  • Cricket Live Scores
  • Youtube To Mp3 Download
  • Nexmo SMS Messaging API
  • Currency Converter
  • BrainShop.AI
  • Flight Data
  • BraveNewCoin

2020 Top 50

  • TransLoc OpenAPI 1.2
  • BettingOddsApi
  • YahooWeatherAPI
  • ListenNotes
  • Movie Database (IMDb Alternative)
  • AirportsFinder
  • Free Football (Soccer) Videos
  • Indian Mobile info
  • Get Video and Audio URL
  • Random Famous Quotes
  • Nutritionix – Nutrition Database
  • Edamam Nutrition Analysis
  • CoinMarketCap
  • Meme Generator
  • Soccer – Sports Open Data

Rapid API Collections

  • Address Validation
  • Air Quality
  • Alternatives to the Google Maps
  • Alternatives to the Yahoo Music
  • Amazon Products
  • Amazon Product Details
  • APIs to build Omni-Channel Notifications
  • Bible and Religious
  • BIN and IIN
  • Cannabis and Drugs
  • CoinMarketCap Alternative
  • Country Data
  • Currency Converter and Exchange
  • Distance Calculator
  • Domain Name Checker
  • Email Validation and Verification
  • Essential eCommerce
  • Geocoding and Location
  • Google Play and iOS App Store
  • Google Translate API Alternatives
  • Grocery Store
  • Health and Fitness
  • Historical Data
  • Holiday and Bank Holiday
  • HTML to PDF
  • Image Processing
  • Image Search and Image Recognition
  • Infrastructure
  • List of Dictionary
  • March Madness
  • Mobile App Development
  • Music and Audio
  • News Article Extraction
  • Non-English
  • Nudity Detection Image Moderation
  • Open Source
  • Product Info
  • Product Managers
  • Quote Generator
  • Reverse Geocoding
  • Ridesharing
  • Snapchat Alternative
  • Social Media
  • Song Lyrics
  • Speech to Text
  • Sports Data
  • Sports Odds and Sports Betting
  • Sports Scores
  • Stock Market and Brokerage
  • Stock Photos
  • Text Generator
  • Text to Speech
  • Transportation
  • Trending News and Publication
  • URL Shortener
  • Venmo API Alternatives
  • Video Gaming and Game Database
  • Weather APIs for Android
  • Web Scraping
  • Whitepages API Alternatives
  • Zloadr API Collection

speech to text free api

Team RapidAPI

  • API Courses
  • API Glossary
  • API Testing
  • API Management
  • Most Popular APIs
  • Free APIs List
  • How to use an API
  • Learn REST API
  • Build API’s
  • Write for Us
  • API Directory
  • Privacy Policy
  • Terms of Use

Building an Enterprise API Program Learn More

  • Election 2024
  • Entertainment
  • Newsletters
  • Photography
  • Personal Finance
  • AP Investigations
  • AP Buyline Personal Finance
  • AP Buyline Shopping
  • Press Releases
  • Israel-Hamas War
  • Russia-Ukraine War
  • Global elections
  • Asia Pacific
  • Latin America
  • Middle East
  • Election Results
  • Delegate Tracker
  • AP & Elections
  • Auto Racing
  • 2024 Paris Olympic Games
  • Movie reviews
  • Book reviews
  • Personal finance
  • Financial Markets
  • Business Highlights
  • Financial wellness
  • Artificial Intelligence
  • Social Media

Chiefs kicker Butker congratulates women graduates and says most are more excited about motherhood

FILE - Kansas City Chiefs kicker Harrison Butker speaks to the media during NFL football Super Bowl 58 opening night Monday, Feb. 5, 2024, in Las Vegas. Butker railed against Pride month along with President Biden’s leadership during the COVID-19 pandemic and his stance on abortion during a commencement address at Benedictine College last weekend. (AP Photo/Charlie Riedel, File)

FILE - Kansas City Chiefs kicker Harrison Butker speaks to the media during NFL football Super Bowl 58 opening night Monday, Feb. 5, 2024, in Las Vegas. Butker railed against Pride month along with President Biden’s leadership during the COVID-19 pandemic and his stance on abortion during a commencement address at Benedictine College last weekend. (AP Photo/Charlie Riedel, File)

The Benedictine College sign is seen Wednesday, May 15, 2024, in Atchison, Kan., days after Kansas City Chiefs kicker Harrison Butker gave a commencement speech that has been gaining attention. Butker’s speech has raised some eyebrows with his proclamations of conservative politics and Catholicism, but he received a standing ovation from graduates and other attendees of the commencement ceremony on Saturday, May 11. (AP Photo/Nick Ingram)

  • Copy Link copied

KANSAS CITY, Mo. (AP) — The commencement speaker at Kansas’ Benedictine College , a private Catholic liberal arts school, congratulated the women receiving degrees — and said most of them were probably more excited about getting married and having children.

Harrison Butker, the kicker for the Super Bowl champion Kansas City Chiefs, is getting attention for those and other comments last weekend in which he said some Catholic leaders were “pushing dangerous gender ideologies onto the youth of America.”

Butker, who’s made his conservative Catholic beliefs well known, also assailed Pride month , a particularly important time for the LGBTQ+ rights movement, and President Joe Biden’s stance on abortion.

“I think it is you, the women, who have had the most diabolical lies told to you,” Butker said.

AP AUDIO: Chiefs kicker Butker congratulates women graduates and says most are more excited about motherhood

A Super Bowl champion kicker is in hot water after comments he made during a college commencement speech. Correspondent Gethin Coolbaugh reports.

“Some of you may go on to lead successful careers in the world, but I would venture to guess that the majority of you are most excited about your marriage and the children you will bring into this world. I can tell you that my beautiful wife Isabelle would be the first to say that her life truly started when she started living her vocation as a wife and as a mother,” he said.

Butker said that his wife embraced “one of the most important titles of all. Homemaker.“

“Harrison Butker gave a speech in his personal capacity,” NFL senior vice president and chief diversity and inclusion officer Jonathan Beane said in a statement. “His views are not those of the NFL as an organization. The NFL is steadfast in our commitment to inclusion, which only makes our league stronger.”

Butker also criticized as disparaging to the Catholic Church an article by The Associated Press highlighting a shift toward conservativism in some parts of the church.

The three-time Super Bowl champion delivered his roughly 20-minute address Saturday at the Catholic private liberal arts school in Atchison, Kansas, which is located about 60 miles (97 kilometers) miles north of Kansas City. He received a standing ovation from graduates and other attendees.

Butker, 28, referred to a “deadly sin sort of pride that has a month dedicated to it” in an oblique reference to Pride month. Butker also took aim at Biden’s policies, including his condemnation of the Supreme Court’s reversal of the 1973 Roe v. Wade decision and advocacy for freedom of choice — a key campaign issue in the 2024 presidential race.

Biden, who is Catholic, has a fraught history on the issue. He initially opposed the Roe v. Wade decision, saying it went too far . He also opposed federal funding for abortions and supported restrictions on abortions later in pregnancy.

Butker also tackled Biden’s response to COVID-19, which has killed nearly 1.2 million people in the U.S., according to the Centers for Disease Control and Prevention.

“While COVID might have played a large role throughout your formative years, it is not unique,” he said. “Bad policies and poor leadership have negatively impacted major life issues. Things like abortion, IVF, surrogacy, euthanasia, as well as a growing support for degenerate cultural values and media all stem from pervasiveness of disorder.”

Graduates had mixed views on the speech. ValerieAnne Volpe, 20, who graduated with an art degree, lauded Butker for saying things that “people are scared to say.”

“You can just hear that he loves his wife. You can hear that he loves his family,” she said.

Elle Wilbers, 22, who is heading to medical school, said she was shocked by Butker’s criticism of priests and bishops and his reference to the LGBTQ+ community, one that she described as “horrible.”

“We should have compassion for the people who have been told all their life that the person they love is like, it’s not OK to love that person,” Wilbers said.

Kassidy Neuner, 22, who will spend a gap year teaching before going to law school, said being a stay-at-home parent is “a wonderful decision.”

“And it’s also not for everybody,” Neuner added, saying, “I think that he should have addressed more that it’s not always an option. And, if it is your option in life, that’s amazing for you. But there’s also the option to be a mother and a career woman.”

The Chiefs declined to comment on Butker’s commencement address.

The 2017 seventh-round pick out of Georgia Tech has become of the NFL’s best kickers, breaking the Chiefs’ franchise record with a 62-yard field goal in 2022. Butker helped them win their first Super Bowl in 50 years in 2020, added a second Lombardi Trophy in 2023, and he kicked the field goal that forced overtime in a Super Bowl win over San Francisco in February.

It has been an embarrassing offseason for the Chiefs, though.

Last month, voters in Jackson County, Missouri, soundly rejected a ballot initiative that would have helped pay for an $800 million renovation to Arrowhead Stadium, home of the Chiefs. Many voters criticized the plan put forward by the Chiefs as catering primarily to VIPs and the wealthy.

The same week, wide receiver Rashee Rice turned himself in to Dallas police on multiple charges, including aggravated assault, after he was involved in a high-speed crash that left four people with injuries. Rice has acknowledged being the driver of one of the sports cars that was going in excess of 100 mph (160 kph).

Last week, law enforcement officials told The Dallas Morning News that Rice also was suspected of assaulting a person at a downtown nightclub. Dallas police did not name Rice as the suspect in detailing a report to The Associated Press.

Chiefs coach Andy Reid said he had spoken to the receiver and the team was letting the legal process play out.

Associated Press writer Heather Hollingsworth in Mission, Kansas, contributed to this story.

AP NFL: https://apnews.com/hub/nfl

speech to text free api

  • Español – América Latina
  • Português – Brasil
  • Cloud Speech-to-Text
  • Documentation

Transcribe speech to text by using the API

This page shows you how to send a speech recognition request to Speech-to-Text using the REST interface and the curl command.

Before you begin

Before you can send a request to the Speech-to-Text API, you must have completed the following actions. See the before you begin page for details.

  • Make sure billing is enabled for Speech-to-Text.

Install the Google Cloud CLI, then initialize it by running the following command:

  • (Optional) Create a new Google Cloud Storage bucket to store your audio data.

Make an audio transcription request

Now you can use Speech-to-Text to transcribe an audio file to text. Use the following code sample to send a recognize REST request to the Speech-to-Text API.

Create a JSON request file with the following text, and save it as a sync-request.json plain text file:

Use curl to make a speech:recognize request, passing it the filename of the JSON request you set up in step 1:

The sample curl command uses the gcloud auth print-access-token command to get an authentication token.

Note that to pass a filename to curl you use the -d option (for "data") and precede the filename with an @ sign. This file should be in the same directory in which you execute the curl command.

You should see a response similar to the following:

Congratulations! You've sent your first request to Speech-to-Text.

If you receive an error or an empty response from Speech-to-Text, take a look at the troubleshooting and error mitigation steps.

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  • Use the Google Cloud console to delete your project if you do not need it.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-05-13 UTC.

IMAGES

  1. Top 10 Free Speech-to-Text APIs that you can use in your next IoT Project

    speech to text free api

  2. The Top Free Speech-to-Text APIs, AI Models, and Open Source Engines

    speech to text free api

  3. Best Free Speech-To-Text APIs and Open Source Libraries

    speech to text free api

  4. Live Speech to Text with Watson Speech to Text and Python

    speech to text free api

  5. Text to speech in the browser with the Web Speech API

    speech to text free api

  6. Speech To Text using Google API in Android Studio ( To Do App Firebase Finale )

    speech to text free api

VIDEO

  1. Top 3 FREE API to use for your project

  2. GCP Cloud Speech API 3 Ways: Challenge Lab ARC132

  3. How to use Open AI Text to Speech API in 3 mins

  4. The Best Text to Speech Tool Powered by AI 2024 (Free Access Link Below)

  5. #10bestaitools #10freebestAItools 10 BEST FREE A I TOOLS FOE TEXT TO SPEECH, you mus know in 2024

  6. Text to Speech inside Next.js using OpenAI API

COMMENTS

  1. The Top Free Speech-to-Text APIs, AI Models, and Open ...

    Choosing the best Speech-to-Text API, AI model, or open-source engine to build with can be challenging.You need to compare accuracy, model design, features, support options, documentation, security, and more. This post examines the best free Speech-to-Text APIs and AI models on the market today, including ones that have a free tier, to help you make an informed decision.

  2. 13 Best Free Speech-to-Text Open Source Engines, APIs, and AI Models

    Transcription. Translation. Recording. Best 13 speech-to-text open-source engine · 1 Whisper · 2 Project DeepSpeech · 3 Kaldi · 4 SpeechBrain · 5 Coqui · 6 Julius · 7 Flashlight ASR (Formerly Wav2Letter++) · 8 PaddleSpeech (Formerly DeepSpeech2) · 9 OpenSeq2Seq · 10 Vosk · 11 Athena · 12 ESPnet · 13 Tensorflow ASR.

  3. Best Speech-to-Text APIs in 2024

    8. Amazon Transcribe. Amazon Transcribe is offered as a part of the overall Amazon Web Services (AWS) platform. With similar features as Google and Microsoft's speech-to-text solutions, Amazon Transcribe offers good accuracy for pre-recorded audio, but poor accuracy for real-time streaming use cases.

  4. Speech to text

    The Audio API provides two speech to text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model.They can be used to: Transcribe audio into whatever language the audio is in. Translate and transcribe the audio into english.

  5. Accurately convert speech into text using an API powered by Google's AI

    Support your global user base with Speech-to-Text service's extensive language support in over 125 languages and variants. Have full control over your infrastructure and protected speech data while leveraging Google's speech recognition technology on-premises, right in your own private data centers. Take the next step.

  6. Cloud Computing Services

    Cloud Computing Services | Google Cloud

  7. Speech-to-Text documentation

    Speech-to-Text documentation. View all product documentation. Speech-to-Text enables easy integration of Google speech recognition technologies into developer applications. Send audio and receive a text transcription from the Speech-to-Text API service. Learn more.

  8. Using the Speech-to-Text API with Node.js

    Overview. Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants, by applying powerful neural network models in an easy to use API. In this codelab, you will focus on using the Speech-to-Text API with Node.js. You will learn how to send an audio file in English and other languages to the Cloud ...

  9. Using the Web Speech API

    Using the Web Speech API. The Web Speech API provides two distinct areas of functionality — speech recognition, and speech synthesis (also known as text to speech, or tts) — which open up interesting new possibilities for accessibility, and control mechanisms. This article provides a simple introduction to both areas, along with demos.

  10. 5 Best Speech-to-Text APIs

    The Google Speech-To-Text API isn't free, however. It is free for speech recognition for audio less than 60 minutes. For audio transcriptions longer than that, it costs $0.006 per 15 seconds. For video transcriptions, it costs $0.006 per 15 seconds for videos up to 60 minutes in length. For video longer than one hour, it costs $0.012 for ...

  11. Speech-to-text APIs (Free Tutorials, SDK Documentation & Pricing

    A Speech to Text API (Application Programming Interface) is a software tool that allows developers to build applications that can transcribe spoken words into text. It uses machine learning algorithms to analyze and convert audio files, such as recorded voice memos or live speech, into written words. Speech to Text APIs are commonly used in various applications such as transcription software ...

  12. Free Speech to Text Online, Voice Typing & Transcription

    Speech to Text online notepad. Professional, accurate & free speech recognizing text editor. Distraction-free, fast, easy to use web app for dictation & typing. Speechnotes is a powerful speech-enabled online notepad, designed to empower your ideas by implementing a clean & efficient design, so you can focus on your thoughts.

  13. Speech to Text API

    Schedule a Call Now. Rev AI is the most accurate speech-to-text API on the market at only 2.0¢/min. Get your first transcript in minutes. Sign up for a free trial.

  14. Speech to text in the browser with the Web Speech API

    The Web Speech API has two functions, speech synthesis, otherwise known as text to speech, and speech recognition, or speech to text. We…

  15. Text to Speech API

    Advanced Text to Speech API. Elevate your projects with the fastest & most powerful text to speech & voice API. Quickly generate AI voices in multiple languages for your chatbots, agents, LLMs, websites, apps and more. ~400ms latency. High quality at speed. Get Started Free.

  16. APIs and references

    If you're new to Google Cloud, create an account to evaluate how Speech-to-Text performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads. Try Speech-to-Text free. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code ...

  17. Speech to text

    Introduction. The Audio API provides two speech to text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model. They can be used to: Transcribe audio into whatever language the audio is in. Translate and transcribe the audio into english.

  18. 7 Best Text to Speech APIs & Free Alternatives List

    Just select your preference from any API endpoints page. Sign up today for free on RapidAPI to begin using Text to Speech APIs! Browse 7+ Best Text to Speech APIs available on RapidAPI.com. Top Best Text to Speech APIs include Text-to-Speech, Rev.AI, RoboMatic.AI and more. Sign up today for free!

  19. Text to speech

    Introduction. The Audio API provides a speech endpoint based on our TTS (text-to-speech) model. It comes with 6 built-in voices and can be used to: Narrate a written blog post. Produce spoken audio in multiple languages. Give real time audio output using streaming. Here is an example of the alloy voice:

  20. Introducing GPT-4o: OpenAI's new flagship multimodal model now in

    Unified speech services for speech-to-text, text-to-speech and speech translation. Azure AI Language Add natural language capabilities with a single API call. Azure AI Translator Easily conduct machine translation with a simple REST API call ... Get free tools and guidance to build solutions, publish them to the marketplace, and reach millions ...

  21. Pricing

    The Assistants API and its tools make it easy for developers to build AI assistants in their applications. The tokens used for the Assistant API are billed at the chosen language model's per-token input / output rates. ... (1 GB free) GB refers to binary gigabytes (also known as gibibyte), where 1 GB is 2^30 bytes. ... Text-to-speech (TTS ...

  22. SpeechLab

    SpeechLab - Text to Speech TTS is the most advanced, simple and small app that revolutionizes the way people read! It is the best text reader that allows users to read aloud text with amazing voices. SpeechLab helps to convert text and text files into speech and save them as audio files. SpeechLab converts speech to text and text files into ...

  23. Top 50 Most Popular APIs (Updated for 2023)

    10. TikTok All in One. Category: Social. Connect to API. To round out the top 10 most popular APIs, is TikTok All in One. The API provides info and allows developers to search TikTok videos, music, hashtags (challenges), users, video comments, and live feeds.

  24. Cloud Speech-to-Text API

    A service endpoint is a base URL that specifies the network address of an API service. One service might have multiple service endpoints. This service has the following service endpoint and all URIs below are relative to this service endpoint: https://speech.googleapis.com.

  25. How do we use GPT 4o API for Vision, Text, Image, and more?

    GPT-4o breaks the mold by being truly multimodal. It can seamlessly process information from different formats, including: Text: This remains a core strength, allowing GPT-4o to converse, answer your questions, and generate creative text formats like poems or code. Audio: Imagine playing GPT-4o a song and having it analyze the music, describe ...

  26. Does Azure Speech Translation API expose the language detected key

    I am building a simple Python application using continuous speech translation through the Azure Cognitive Services Speech SDK. Translation and detection between languages works as far as I can tell...

  27. Chiefs' Harrison Butker says most graduating women are more excited

    Chiefs kicker Butker congratulates women graduates and says most are more excited about motherhood. FILE - Kansas City Chiefs kicker Harrison Butker speaks to the media during NFL football Super Bowl 58 opening night Monday, Feb. 5, 2024, in Las Vegas. Butker railed against Pride month along with President Biden's leadership during the COVID ...

  28. Transcribe speech to text by using the API

    You can send audio data to the Speech-to-Text API, which then returns a text transcription of that audio file. For more information about the service, see Speech-to-Text basics. Before you begin. Before you can send a request to the Speech-to-Text API, you must have completed the following actions. See the before you begin page for details.

  29. Hướng dẫn chuyển văn bản thành giọng đọc bằng FPT.AI Voicemaker

    4. Nghe thử và chọn giọng đọc. Nghe thử và chọn giọng đọc phù hợp ở cột bên phải giao diện. FPT.AI Text to Speech hiện đang sở hữu 8 giọng đọc chất lượng cao, đa dạng vùng miền (Bắc - Trung - Nam), giới tính (Nam/Nữ), đáp ứng nhiều nhu cầu và mục đích sử dụng ...