Text-to-Speech (OpenAI)

Action ID: openai_text_to_speech

Description

Generate an audio recording from text using OpenAI's text-to-speech API. This node converts written text into natural-sounding speech. You can choose from six voices, four audio formats, and two models (standard and HD quality), and you can adjust the playback speed.

Provider

OpenAI

Connection

Name | Description | Required | Category
OpenAI Connection | The OpenAI connection to use for text-to-speech conversion. | Yes | openai

Input Parameters

Name | Type | Required | Default | Description
model | dropdown | No | tts-1 | The model that generates the audio. Options: tts-1, tts-1-hd
text | string | Yes | - | The text to convert to speech.
voice | dropdown | No | alloy | The voice used for the audio. Options: alloy, echo, fable, onyx, nova, shimmer
format | dropdown | No | mp3 | The format of the output audio file. Options: mp3, opus, aac, flac
speed | number | No | 1.0 | The playback speed of the audio, from 0.25 (slowest) to 4.0 (fastest).
file_name | string | No | audio | The name of the output audio file, without the extension.

JSON Schema
{
  "description": "Text-to-Speech node input.",
  "properties": {
    "model": {
      "default": "tts-1",
      "description": "The model which will generate the audio.",
      "enum": [
        "tts-1",
        "tts-1-hd"
      ],
      "title": "Model",
      "type": "string"
    },
    "text": {
      "description": "The text you want to convert to speech.",
      "title": "Text",
      "type": "string"
    },
    "voice": {
      "default": "alloy",
      "description": "The voice to generate the audio in.",
      "enum": [
        "alloy",
        "echo",
        "fable",
        "onyx",
        "nova",
        "shimmer"
      ],
      "title": "Voice",
      "type": "string"
    },
    "format": {
      "default": "mp3",
      "description": "The format you want the audio file in.",
      "enum": [
        "mp3",
        "opus",
        "aac",
        "flac"
      ],
      "title": "Output Format",
      "type": "string"
    },
    "speed": {
      "default": 1.0,
      "description": "The speed of the audio. Minimum is 0.25 and maximum is 4.00.",
      "maximum": 4.0,
      "minimum": 0.25,
      "title": "Speed",
      "type": "number"
    },
    "file_name": {
      "default": "audio",
      "description": "The name of the output audio file (without extension).",
      "title": "File Name",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechInput",
  "type": "object"
}
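As an illustration, the following Python sketch mirrors the schema above. It applies the documented defaults and rejects invalid values before any request is sent, including the 4096-character text limit listed under Error Handling. The `TextToSpeechInput` dataclass and its checks are hypothetical, and the node's actual validation may differ.

```python
from dataclasses import dataclass

# Allowed values taken from the TextToSpeechInput schema.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "opus", "aac", "flac")
MAX_TEXT_CHARS = 4096  # limit listed under Error Handling


@dataclass
class TextToSpeechInput:
    text: str  # the only required field; everything below has a default
    model: str = "tts-1"
    voice: str = "alloy"
    format: str = "mp3"
    speed: float = 1.0
    file_name: str = "audio"

    def __post_init__(self) -> None:
        # Fail fast on the invalid inputs described in the Error Handling table.
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text is {len(self.text)} characters; max is {MAX_TEXT_CHARS}")
        if self.model not in MODELS:
            raise ValueError(f"invalid model {self.model!r}; use one of {MODELS}")
        if self.voice not in VOICES:
            raise ValueError(f"invalid voice {self.voice!r}; use one of {VOICES}")
        if self.format not in FORMATS:
            raise ValueError(f"invalid format {self.format!r}; use one of {FORMATS}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed {self.speed} is outside 0.25-4.0")
```

For example, `TextToSpeechInput(text="Welcome.", voice="nova")` fills in model "tts-1", format "mp3", speed 1.0, and file_name "audio", while `speed=4.5` raises a `ValueError`.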

Output Parameters

Name | Type | Description
url | string | URL of the generated audio file.
format | string | The format of the generated audio file.

JSON Schema
{
  "description": "Response from text-to-speech conversion.",
  "properties": {
    "url": {
      "title": "Url",
      "type": "string"
    },
    "format": {
      "title": "Format",
      "type": "string"
    }
  },
  "title": "TextToSpeechResponse",
  "type": "object"
}
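The returned URL may expire, so a workflow that needs the audio long term can download it. This Python sketch assumes that the output is a dict shaped like `TextToSpeechResponse` and that the URL can be fetched with a plain GET and no extra authentication. `save_audio` and `local_path` are hypothetical helpers, and naming the saved file `<file_name>.<format>` is an assumption based on the `file_name` input.

```python
import urllib.request
from pathlib import Path


def local_path(file_name: str, fmt: str, directory: str = ".") -> Path:
    """Build '<file_name>.<format>' inside `directory` (hypothetical naming rule)."""
    return Path(directory) / f"{file_name}.{fmt}"


def save_audio(output: dict, file_name: str = "audio", directory: str = ".") -> Path:
    """Download output['url'] and write it to disk, returning the saved path."""
    target = local_path(file_name, output["format"], directory)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumes the URL is reachable with a plain GET and no extra auth headers.
    with urllib.request.urlopen(output["url"], timeout=60) as resp:
        target.write_bytes(resp.read())
    return target
```

For Example 1 below, calling `save_audio(output, "marketing_voiceover")` would write `marketing_voiceover.mp3` to the current directory.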

How It Works

This node sends your text to OpenAI's text-to-speech API along with the selected model, voice, format, and speed. The model synthesizes the text as speech in the chosen voice. tts-1 favors speed, and tts-1-hd favors audio quality. The generated audio file is returned as a URL, together with its format, and can be played, downloaded, or passed to later workflow steps.
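This page does not document the exact request the node makes. Assuming it calls OpenAI's public speech endpoint (`POST https://api.openai.com/v1/audio/speech`), the mapping from node inputs to the API request body might look like the following standard-library Python sketch. The API field names differ from the node's: `text` becomes `input`, and `format` becomes `response_format`. The endpoint responds with raw audio bytes. How the node then stores those bytes and produces its `url` output is platform-specific and not shown. The function names are hypothetical.

```python
import json
import urllib.request

SPEECH_ENDPOINT = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str, model: str = "tts-1",
                         voice: str = "alloy", fmt: str = "mp3",
                         speed: float = 1.0) -> urllib.request.Request:
    """Map node inputs onto the JSON body of OpenAI's speech endpoint."""
    body = {
        "model": model,
        "input": text,            # node 'text' -> API 'input'
        "voice": voice,
        "response_format": fmt,   # node 'format' -> API 'response_format'
        "speed": speed,
    }
    return urllib.request.Request(
        SPEECH_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key from the OpenAI connection
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, **options) -> bytes:
    """Send the request; the endpoint responds with the raw audio bytes."""
    req = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.read()
```

Splitting request construction from sending keeps the input mapping testable without a network call or a real API key.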

Usage Examples

Example 1: Standard Quality Marketing Voiceover

Input:

model: "tts-1"
text: "Welcome to our premium product line. Experience quality and innovation combined."
voice: "nova"
format: "mp3"
speed: 1.0
file_name: "marketing_voiceover"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "mp3"

Example 2: High-Quality Audiobook

Input:

model: "tts-1-hd"
text: "Chapter 1: The Beginning. It was a dark and stormy night..."
voice: "fable"
format: "aac"
speed: 0.9
file_name: "audiobook_chapter_1"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "aac"

Example 3: Fast-Paced Notification

Input:

model: "tts-1"
text: "Alert! System update available. Please restart your computer."
voice: "onyx"
format: "opus"
speed: 1.3
file_name: "system_alert"

Output:

url: "https://api.openai.com/v1/audio/speech/..."
format: "opus"

Common Use Cases

  • Audiobook Generation: Create audiobooks from written text content

  • Voiceovers: Generate professional voiceovers for videos and presentations

  • Accessibility: Convert written content to audio for accessibility purposes

  • Notifications: Create audio notifications and alerts

  • Interactive Voice Responses: Generate dynamic responses for voice applications

  • Language Learning: Create pronunciation audio for language learning materials

  • Marketing: Generate professional marketing voiceovers and promotional audio

Error Handling

Error Type | Cause | Solution
Text Too Long | Input text exceeds the maximum length of 4096 characters | Split the text into smaller chunks and process each one separately
Invalid Model | Model name doesn't exist | Use either tts-1 or tts-1-hd
Invalid Voice | Voice name doesn't exist or is misspelled | Select one of: alloy, echo, fable, onyx, nova, shimmer
Invalid Format | Audio format not supported | Use mp3, opus, aac, or flac
Invalid Speed | Speed is outside the range 0.25-4.0 | Set speed to a value between 0.25 and 4.0
Authentication Error | Invalid or missing API key | Verify that the OpenAI connection is configured correctly
Timeout Error | Request took too long to process | Retry with shorter text, or use the faster tts-1 model
Rate Limit Exceeded | Too many requests in a short time | Add delays between requests, for example by retrying with exponential backoff
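For Rate Limit Exceeded and Timeout Error, a common pattern is to retry with exponential backoff. The following Python sketch is generic. `RetryableError` is a hypothetical exception that stands in for however your runtime reports HTTP 429 or timeout failures, and the delay values are illustrative, not figures published by OpenAI.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) or timeout failure."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on RetryableError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Base delay doubles per attempt (1s, 2s, 4s, ...) up to max_delay;
            # jitter scales it to 50-100% so parallel runs don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable, so you can test the retry logic without real waiting.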

Notes

  • Model Selection: tts-1 is faster and cheaper but may produce lower quality audio. tts-1-hd produces higher quality but is slower and more expensive.

  • Voice Options: Try different voices (alloy, echo, fable, onyx, nova, shimmer) to match your brand personality or content tone.

  • Speed Control: Range is 0.25 (slowest) to 4.0 (fastest). Use 0.9-1.1 for natural-sounding speech.

  • Format Selection: MP3 is the most widely compatible. FLAC is lossless and suits archiving. Opus is designed for low-latency streaming, and AAC is well supported on mobile platforms.

  • Text Limitations: Maximum 4096 characters per request. Plan for multiple requests for longer content.

  • Audio Storage: URLs may expire. Download or persist audio if long-term storage is needed.

  • Cost Optimization: tts-1 is significantly cheaper. Only use tts-1-hd when high quality is critical.
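To stay under the 4096-character limit, you can split long content before it reaches the node. The following Python sketch is a heuristic, not the node's own behavior. It splits at sentence boundaries, using a simple regex for `.`, `!`, or `?` followed by whitespace. It then packs sentences greedily into chunks and hard-splits any single sentence that is still too long. Abbreviations such as "Dr." create extra split points, but these are harmless because the pieces are re-packed up to the limit.

```python
import re

MAX_CHARS = 4096  # per-request limit stated in the notes


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Greedily add the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Send each chunk as a separate request, then play or combine the resulting audio files in order.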
