Types of voices

Text-to-Speech generates natural, human-like audio that sounds like a real person speaking. To start, specify a voice when you send a synthesis request.

Text-to-Speech offers a variety of voices based on language, gender, and accent. Some languages have multiple options. For a full list, check the Supported Voices page. To select a voice, use the VoiceSelectionParams field in your API request. Refer to the Quickstarts for instructions on making a synthesize request.
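
As a rough illustration of where the voice selection sits in a request, the sketch below builds the JSON body for a REST synthesize call using only the Python standard library. It assumes the v1 REST endpoint and the field names `input`, `voice`, and `audioConfig`; the helper `build_synthesize_request` and the voice name `en-US-Neural2-A` are illustrative choices, not taken from this page. No request is sent, and authentication is left out.

```python
import json

# Assumed REST endpoint for the v1 API; this sketch never sends a request.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code, voice_name, audio_encoding="MP3"):
    """Return a JSON-serializable body for a synthesize request (hypothetical helper)."""
    return {
        "input": {"text": text},
        # The "voice" object carries the VoiceSelectionParams fields.
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


# Example voice name chosen for illustration; check the Supported Voices page.
body = build_synthesize_request("Hello, world!", "en-US", "en-US-Neural2-A")
print(json.dumps(body, indent=2))
```

Swapping the voice is then a matter of changing `voice_name` (and `language_code` to match) while the rest of the body stays the same.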

Overview

Voice type              Intended for                         Launch stage   Controllability   Streaming
Journey                 Conversational agents                Preview        -                 Yes
Studio (two speakers)   Media: discussions and interviews    Experimental   -                 -
Studio (one speaker)    Media: narration                     GA             SSML              -
Neural2                 General purpose                      GA             SSML              -
Standard                Cost efficient                       GA             SSML              -
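
The overview table can also be encoded as a small lookup, which is handy when an application needs to decide whether it may send SSML to a given voice. The sketch below is an assumption-laden illustration: it treats a "-" in the table as "not supported", it assumes voice names follow the `<language>-<region>-<Tier>-<variant>` pattern seen in names such as en-US-Journey-F and en-US-Studio-O, and it covers only the single-speaker Studio row. The names `VOICE_TIERS`, `tier_of`, and `supports_ssml` are hypothetical.

```python
# Rows transcribed from the overview table; "-" is interpreted as "not supported".
VOICE_TIERS = {
    "Journey": {"launch_stage": "Preview", "ssml": False, "streaming": True},
    "Studio": {"launch_stage": "GA", "ssml": True, "streaming": False},  # one-speaker row only
    "Neural2": {"launch_stage": "GA", "ssml": True, "streaming": False},
    "Standard": {"launch_stage": "GA", "ssml": True, "streaming": False},
}


def tier_of(voice_name):
    """Extract the tier from a name shaped like 'en-US-Journey-F' (assumed pattern)."""
    parts = voice_name.split("-")
    if len(parts) < 4 or parts[2] not in VOICE_TIERS:
        raise ValueError(f"Unrecognized voice name: {voice_name!r}")
    return parts[2]


def supports_ssml(voice_name):
    """Return True if the table lists SSML under Controllability for this tier."""
    return VOICE_TIERS[tier_of(voice_name)]["ssml"]


print(tier_of("en-US-Journey-F"), supports_ssml("en-US-Journey-F"))
print(tier_of("en-US-Studio-O"), supports_ssml("en-US-Studio-O"))
```

Launch stages and feature support change over time, so a lookup like this should be refreshed from the current table instead of being treated as authoritative.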

Pricing Details

Journey voices

Journey voices, powered by the AudioLM engine, let you create more engaging and empathetic speech for conversational applications. With text streaming, Journey voices support low-latency, real-time communication. They are available in the languages listed in the table of supported voices.

Sample use cases and the voice used for each:

Chat experiences: en-US-Journey-F
Virtual assistants: en-US-Journey-D
Customer service chatbots: en-US-Journey-F
Interactive education applications: en-US-Journey-O
Sales and pitches: en-US-Journey-D
Storytime: en-US-Journey-F

Studio multispeaker voices

Create discussions and interviews with the new multispeaker studio voices, based on the same technology behind Journey voices.


Studio voices

Studio voices are designed for news reading and broadcast content.


Example 1. The en-US-Studio-O voice reading The Great Gatsby.

Neural2 voices

The Text-to-Speech API provides a voice tier called Neural2. Neural2 voices are based on the same technology used to create a Custom Voice, so anyone can use Custom Voice technology without training their own custom voice. Neural2 voices are available on both global and single-region endpoints.


Example 1. Neural2 voice

Standard voices

The voices offered by Text-to-Speech differ in how they are produced, that is, in the synthetic speech technology used to create the machine model of the voice. One common speech technology, parametric text-to-speech, typically generates audio data by passing outputs through signal processing algorithms known as vocoders. Many of the Standard voices available in Text-to-Speech use a variation of this technology.