Text-to-Speech (OpenAI)
Action ID: openai_text_to_speech
Description
Generate an audio recording from text using OpenAI's text-to-speech API. This node converts written text into natural-sounding speech using various voice options and audio formats, supporting multiple quality levels and playback speeds.
Provider
OpenAI
Connection
OpenAI Connection (openai, required): The OpenAI connection to use for text-to-speech conversion.
Input Parameters
model (dropdown, optional, default: tts-1): The model which will generate the audio. Options: tts-1, tts-1-hd
text (string, required, no default): The text you want to convert to speech.
voice (dropdown, optional, default: alloy): The voice to generate the audio in. Options: alloy, echo, fable, onyx, nova, shimmer
format (dropdown, optional, default: mp3): The format you want the audio file in. Options: mp3, opus, aac, flac
speed (number, optional, default: 1.0): The speed of the audio. Minimum is 0.25 and maximum is 4.0.
file_name (string, optional, default: audio): The name of the output audio file (without extension).
Output Parameters
url (string): URL to the generated audio file.
format (string): The format of the generated audio file.
How It Works
This node sends your text to OpenAI's text-to-speech API along with your selected voice and quality settings. The model converts the text into natural-sounding speech using the chosen voice profile. You can adjust the playback speed, select from six different voice options, and choose your preferred audio format. The generated audio file is returned as a URL that can be played, downloaded, or used in subsequent workflow steps.
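As a rough illustration of the first step, the sketch below maps the node's input parameters onto a JSON body for OpenAI's speech endpoint. The helper name build_speech_request is hypothetical, and the node's actual implementation is not shown in this document. In particular, how the node applies file_name and produces the output url is an internal detail, so both are left out. No request is sent.

```python
import json

# Endpoint of OpenAI's text-to-speech API (shown for reference only;
# this sketch never calls it).
SPEECH_ENDPOINT = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, model="tts-1", voice="alloy",
                         audio_format="mp3", speed=1.0):
    """Map the node's inputs onto a request body (hypothetical helper)."""
    return {
        "model": model,
        "input": text,                    # the node's `text` parameter
        "voice": voice,
        "response_format": audio_format,  # the node's `format` parameter
        "speed": speed,
    }


body = build_speech_request(
    "Welcome to our premium product line.",
    voice="nova",
)
print(json.dumps(body, indent=2))
```

The defaults mirror those listed under Input Parameters, so passing only the text yields a tts-1 request with the alloy voice in mp3 at normal speed.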
Usage Examples
Example 1: Standard Quality Marketing Voiceover
Input:
model: "tts-1"
text: "Welcome to our premium product line. Experience quality and innovation combined."
voice: "nova"
format: "mp3"
speed: 1.0
file_name: "marketing_voiceover"
Output:
url: "https://api.openai.com/v1/audio/speech/..."
format: "mp3"
Example 2: High-Quality Audiobook
Input:
model: "tts-1-hd"
text: "Chapter 1: The Beginning. It was a dark and stormy night..."
voice: "fable"
format: "aac"
speed: 0.9
file_name: "audiobook_chapter_1"
Output:
url: "https://api.openai.com/v1/audio/speech/..."
format: "aac"
Example 3: Fast-Paced Notification
Input:
model: "tts-1"
text: "Alert! System update available. Please restart your computer."
voice: "onyx"
format: "opus"
speed: 1.3
file_name: "system_alert"
Output:
url: "https://api.openai.com/v1/audio/speech/..."
format: "opus"
Common Use Cases
Audiobook Generation: Create audiobooks from written text content
Voiceovers: Generate professional voiceovers for videos and presentations
Accessibility: Convert written content to audio for accessibility purposes
Notifications: Create audio notifications and alerts
Interactive Voice Responses: Generate dynamic responses for voice applications
Language Learning: Create pronunciation audio for language learning materials
Marketing: Generate professional marketing voiceovers and promotional audio
Error Handling
Text Too Long: Input text exceeds the maximum allowed length (4096 characters). Solution: Split the text into smaller chunks and process them separately.
Invalid Model: Model name doesn't exist. Solution: Use either tts-1 or tts-1-hd.
Invalid Voice: Voice name doesn't exist or is misspelled. Solution: Select from alloy, echo, fable, onyx, nova, shimmer.
Invalid Format: Audio format is not supported. Solution: Use mp3, opus, aac, or flac.
Invalid Speed: Speed is outside the range 0.25-4.0. Solution: Ensure speed is between 0.25 and 4.0.
Authentication Error: Invalid or missing API key. Solution: Verify that the OpenAI connection is properly configured.
Timeout Error: Request took too long to process. Solution: Try shorter text, or switch to the faster tts-1 model.
Rate Limit Exceeded: Too many requests in a short time. Solution: Add delays between requests.
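Several of the errors above can be caught before the node runs. The following is a minimal sketch of such a pre-flight check, assuming the limits and option lists documented on this page. The function validate_inputs is a hypothetical helper, not part of the node, and the strings it returns simply reuse the error names listed above.

```python
VALID_MODELS = {"tts-1", "tts-1-hd"}
VALID_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
VALID_FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_TEXT_LENGTH = 4096           # characters per request
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_inputs(text, model="tts-1", voice="alloy",
                    audio_format="mp3", speed=1.0):
    """Return the names of the documented errors these inputs would trigger."""
    errors = []
    if len(text) > MAX_TEXT_LENGTH:
        errors.append("Text Too Long")
    if model not in VALID_MODELS:
        errors.append("Invalid Model")
    if voice not in VALID_VOICES:
        errors.append("Invalid Voice")
    if audio_format not in VALID_FORMATS:
        errors.append("Invalid Format")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        errors.append("Invalid Speed")
    return errors


print(validate_inputs("Hello", voice="nova", speed=1.3))  # []
print(validate_inputs("Hello", voice="nove", speed=5.0))  # ['Invalid Voice', 'Invalid Speed']
```

Authentication, timeout, and rate-limit errors depend on the connection and the service, so they cannot be detected this way.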
Notes
Model Selection: tts-1 is faster and cheaper but may produce lower quality audio. tts-1-hd produces higher quality but is slower and more expensive.
Voice Options: Try different voices (alloy, echo, fable, onyx, nova, shimmer) to match your brand personality or content tone.
Speed Control: Range is 0.25 (slowest) to 4.0 (fastest). Use 0.9-1.1 for natural-sounding speech.
Format Selection: MP3 is widely compatible. FLAC provides lossless compression. OPUS and AAC are modern efficient formats.
Text Limitations: Maximum 4096 characters per request. Plan for multiple requests for longer content.
Audio Storage: URLs may expire. Download or persist audio if long-term storage is needed.
Cost Optimization: tts-1 is significantly cheaper. Only use tts-1-hd when high quality is critical.
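For content longer than the 4096-character limit noted above, one possible approach is to split the text at sentence boundaries and send each chunk as a separate request. The sketch below is an assumption about how that could be done, not a feature of the node. The helper split_text is hypothetical, and it collapses the whitespace between sentences to a single space.

```python
import re

MAX_CHARS = 4096  # per-request limit stated in the notes above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate       # sentence still fits in this chunk
        else:
            chunks.append(current)    # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "This is a sentence. " * 500  # about 10,000 characters
chunks = split_text(long_text)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk can then be passed as the text input of a separate run, with a distinct file_name per chunk so the resulting audio files can be ordered afterwards.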