Text-to-Speech (OpenAI)

Action ID: openai_text_to_speech

Description

Generates an audio file from text using OpenAI's text-to-speech API. The node offers six voices, four audio formats (mp3, opus, aac, flac), two models (tts-1 and the higher-quality tts-1-hd), and playback speeds from 0.25 to 4.0.

Provider

OpenAI

Connection

| Name | Description | Required |
| --- | --- | --- |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | dropdown | | tts-1 | The model which will generate the audio. Options: tts-1, tts-1-hd |
| text | string | ✓ | | The text you want to convert to speech. |
| voice | dropdown | | alloy | The voice to generate the audio in. Options: alloy, echo, fable, onyx, nova, shimmer |
| format | dropdown | | mp3 | The format you want the audio file in. Options: mp3, opus, aac, flac |
| speed | number | | 1.0 | The speed of the audio. Minimum is 0.25 and maximum is 4.00. |
| file_name | string | | audio | The name of the output audio file (without extension). |

JSON Schema

{
  "description": "Text-to-Speech node input.",
  "properties": {
    "model": {
      "default": "tts-1",
      "description": "The model which will generate the audio.",
      "enum": [
        "tts-1",
        "tts-1-hd"
      ],
      "title": "Model",
      "type": "string"
    },
    "text": {
      "description": "The text you want to convert to speech.",
      "title": "Text",
      "type": "string"
    },
    "voice": {
      "default": "alloy",
      "description": "The voice to generate the audio in.",
      "enum": [
        "alloy",
        "echo",
        "fable",
        "onyx",
        "nova",
        "shimmer"
      ],
      "title": "Voice",
      "type": "string"
    },
    "format": {
      "default": "mp3",
      "description": "The format you want the audio file in.",
      "enum": [
        "mp3",
        "opus",
        "aac",
        "flac"
      ],
      "title": "Output Format",
      "type": "string"
    },
    "speed": {
      "default": 1.0,
      "description": "The speed of the audio. Minimum is 0.25 and maximum is 4.00.",
      "maximum": 4.0,
      "minimum": 0.25,
      "title": "Speed",
      "type": "number"
    },
    "file_name": {
      "default": "audio",
      "description": "The name of the output audio file (without extension).",
      "title": "File Name",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechInput",
  "type": "object"
}
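
The constraints in this schema can be checked before a workflow run so that bad inputs fail early. The following is a minimal sketch, not the node's own validation logic: the function name `validate_tts_input` and the error messages are assumptions made for illustration, while the allowed values, defaults, and speed range are copied from the schema above.

```python
from typing import Any

# Constraints copied from the input schema above.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "opus", "aac", "flac")
DEFAULTS = {"model": "tts-1", "voice": "alloy", "format": "mp3",
            "speed": 1.0, "file_name": "audio"}


def validate_tts_input(params: dict[str, Any]) -> dict[str, Any]:
    """Return params with schema defaults applied, or raise ValueError.

    Hypothetical helper: it mirrors the schema, not the node's internals.
    """
    if not isinstance(params.get("text"), str):
        raise ValueError("'text' is required and must be a string")
    merged = {**DEFAULTS, **params}
    for key, allowed in (("model", MODELS), ("voice", VOICES), ("format", FORMATS)):
        if merged[key] not in allowed:
            raise ValueError(f"{key!r} must be one of: {', '.join(allowed)}")
    speed = merged["speed"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise ValueError("'speed' must be a number")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("'speed' must be between 0.25 and 4.0")
    if not isinstance(merged["file_name"], str):
        raise ValueError("'file_name' must be a string")
    return merged
```

Calling `validate_tts_input({"text": "Hello"})` returns the input with every default filled in, which makes the effective settings explicit before the request is sent.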

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| url | string | URL to the generated audio file. |
| format | string | The format of the generated audio file. |

JSON Schema

{
  "description": "Response from text-to-speech conversion.",
  "properties": {
    "url": {
      "title": "Url",
      "type": "string"
    },
    "format": {
      "title": "Format",
      "type": "string"
    }
  },
  "title": "TextToSpeechResponse",
  "type": "object"
}
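
A downstream step will often want to turn this response into a local file. The sketch below shows one way to do that with the Python standard library. It assumes the returned URL can be fetched with a plain unauthenticated GET, which this page does not confirm; `save_audio` and `fetch_bytes` are hypothetical helper names, not part of the node.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_bytes(url: str) -> bytes:
    """Download the body of `url` (assumes a plain GET is sufficient)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def save_audio(response: dict, file_name: str = "audio", dest_dir: str = ".",
               fetch: Callable[[str], bytes] = fetch_bytes) -> Path:
    """Write the audio behind response["url"] to <dest_dir>/<file_name>.<format>.

    `response` is a dict shaped like the output schema above.
    """
    target = Path(dest_dir) / f"{file_name}.{response['format']}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(response["url"]))
    return target
```

The `fetch` parameter is injectable so the function can be exercised without network access, for example by passing `fetch=lambda url: b"..."` in a test.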

How It Works

This node sends your text to OpenAI's text-to-speech API along with your selected voice and quality settings. The model converts the text into natural-sounding speech using the chosen voice profile. You can adjust the playback speed, select from six different voice options, and choose your preferred audio format. The generated audio file is returned as a URL that can be played, downloaded, or used in subsequent workflow steps.
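
To make the flow above concrete, the sketch below maps the node's input names onto the JSON body accepted by OpenAI's speech endpoint (`POST /v1/audio/speech`), which uses `input` and `response_format` where the node uses `text` and `format`. How the node actually builds its request is not documented here, so treat the mapping and the function name `build_speech_request` as assumptions. The sketch also assumes `file_name` is used only when the node stores the result and is not sent to OpenAI.

```python
import json


def build_speech_request(params: dict) -> dict:
    """Map node input names onto an OpenAI speech request body (illustrative)."""
    return {
        "model": params.get("model", "tts-1"),
        "input": params["text"],                         # node "text" -> API "input"
        "voice": params.get("voice", "alloy"),
        "response_format": params.get("format", "mp3"),  # node "format" -> API "response_format"
        "speed": params.get("speed", 1.0),
        # "file_name" is deliberately omitted: assumed to be node-side only.
    }


if __name__ == "__main__":
    body = build_speech_request({"text": "Hello", "voice": "nova"})
    print(json.dumps(body, indent=2))
```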

Usage Examples

Example 1: Standard Quality Marketing Voiceover

Input:

model: "tts-1"
text: "Welcome to our premium product line. Experience quality and innovation combined."
voice: "nova"
format: "mp3"
speed: 1.0
file_name: "marketing_voiceover"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "mp3"

Example 2: High-Quality Audiobook

Input:

model: "tts-1-hd"
text: "Chapter 1: The Beginning. It was a dark and stormy night..."
voice: "fable"
format: "aac"
speed: 0.9
file_name: "audiobook_chapter_1"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "aac"

Example 3: Fast-Paced Notification

Input:

model: "tts-1"
text: "Alert! System update available. Please restart your computer."
voice: "onyx"
format: "opus"
speed: 1.3
file_name: "system_alert"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "opus"

Common Use Cases

Audiobook Generation: Create audiobooks from written text content
Voiceovers: Generate professional voiceovers for videos and presentations
Accessibility: Convert written content to audio for accessibility purposes
Notifications: Create audio notifications and alerts
Interactive Voice Responses: Generate dynamic responses for voice applications
Language Learning: Create pronunciation audio for language learning materials
Marketing: Generate professional marketing voiceovers and promotional audio

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Text Too Long | Input text exceeds the maximum allowed length (4096 characters) | Split the text into smaller chunks and process each separately |
| Invalid Model | Model name doesn't exist | Use either tts-1 or tts-1-hd |
| Invalid Voice | Voice name doesn't exist or is misspelled | Select from: alloy, echo, fable, onyx, nova, shimmer |
| Invalid Format | Audio format not supported | Use: mp3, opus, aac, or flac |
| Invalid Speed | Speed is outside the range 0.25-4.0 | Ensure speed is between 0.25 and 4.0 |
| Authentication Error | Invalid or missing API key | Verify that the OpenAI connection is properly configured |
| Timeout Error | Request took too long to process | Try shorter text or the faster tts-1 model |
| Rate Limit Exceeded | Too many requests in a short time | Add delays between requests |
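
For the rate-limit case, one common way to add delays is exponential backoff with jitter. The sketch below is a generic pattern rather than anything the node provides: `RateLimitError` is a hypothetical stand-in for whatever error your workflow surfaces, and `call_with_backoff` wraps any zero-argument callable that triggers the node.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised on a rate limit."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait 1x, 2x, 4x, ... the base delay, plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The jitter spreads out retries from parallel workflow runs so they do not all hit the API again at the same moment.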

Notes

Model Selection: tts-1 is faster and cheaper but may produce lower quality audio. tts-1-hd produces higher quality but is slower and more expensive.
Voice Options: Try different voices (alloy, echo, fable, onyx, nova, shimmer) to match your brand personality or content tone.
Speed Control: Range is 0.25 (slowest) to 4.0 (fastest). Use 0.9-1.1 for natural-sounding speech.
Format Selection: MP3 is the most widely compatible. FLAC is lossless, so files are larger. Opus and AAC are efficient lossy formats; Opus is well suited to low-latency streaming.
Text Limitations: Maximum 4096 characters per request. Plan for multiple requests for longer content.
Audio Storage: URLs may expire. Download or persist audio if long-term storage is needed.
Cost Optimization: tts-1 is significantly cheaper. Only use tts-1-hd when high quality is critical.
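
For content longer than the 4096-character limit noted above, the text has to be split before it reaches the node. The sketch below is one possible approach: it prefers sentence boundaries and falls back to a hard cut for any single sentence that is itself too long. The function name `split_text` is illustrative, and the sketch assumes the limit is counted the same way Python's `len()` counts characters.

```python
import re

MAX_CHARS = 4096  # per-request limit stated in the notes above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after ., ! or ? followed by whitespace; that whitespace is dropped.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into limit-sized pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request, with a distinct `file_name` per chunk (for example a numeric suffix) so the resulting audio files can be played or concatenated in order.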


Last updated 3 months ago
