Text to Speech
Action ID: text_to_speech
Description
Convert text to natural-sounding speech audio using advanced text-to-speech technology. This node supports multiple voice providers including Azure, OpenAI, Google, and AWS Polly, offering a wide selection of voices in different languages and styles.
Provider
PixelML
Connection
PixelML Connection (pixelml, required): The PixelML connection used to call the PixelML API.
Input Parameters
voice (string enum, required, no default): Voice to use for text to speech. Choose from a variety of voices across multiple providers (Azure, OpenAI, Google, AWS Polly).
text (string, required, no default): Text to convert to speech.
JSON Schema
{
"$defs": {
"TextToSpeechVoices": {
"description": "Text to speech voices.",
"enum": [
"Hoai_My_Neural",
"Nam_Minh_Neural",
"Ava_Neural",
"Andrew_Neural",
"Emma_Neural",
"Brian_Neural",
"Alloy",
"Echo",
"Fable",
"Onyx",
"Nova",
"Shimmer",
"en_US_Casual_K",
"en_US_Journey_D",
"en_US_Journey_F",
"en_US_Journey_O",
"en_US_Neural2_A",
"en_US_Neural2_C",
"en_US_Neural2_D",
"en_US_Neural2_E",
"en_US_Neural2_F",
"en_US_Neural2_G",
"en_US_Neural2_H",
"en_US_Neural2_I",
"en_US_Neural2_J",
"en_US_News_K",
"en_US_News_L",
"en_US_News_N",
"en_US_Polyglot_1",
"en_US_Standard_A",
"en_US_Standard_B",
"en_US_Standard_C",
"en_US_Standard_D",
"en_US_Standard_E",
"en_US_Standard_F",
"en_US_Standard_G",
"en_US_Standard_H",
"en_US_Standard_I",
"en_US_Standard_J",
"en_US_Studio_O",
"en_US_Studio_Q",
"en_US_Wavenet_A",
"en_US_Wavenet_B",
"en_US_Wavenet_C",
"en_US_Wavenet_D",
"en_US_Wavenet_E",
"en_US_Wavenet_F",
"en_US_Wavenet-G",
"en_US_Wavenet-H",
"en_US_Wavenet-I",
"en_US_Wavenet-J",
"vi_VN_Neural2_A",
"vi_VN_Neural2_D",
"vi_VN_Standard_A",
"vi_VN_Standard_B",
"vi_VN_Standard_C",
"vi_VN_Standard_D",
"vi_VN_Wavenet_A",
"vi_VN_Wavenet_B",
"vi_VN_Wavenet_C",
"vi_VN_Wavenet_D",
"Danielle",
"Gregory",
"Ivy",
"Joanna",
"Kendra",
"Kimberly",
"Salli",
"Joey",
"Justin",
"Kevin",
"Matthew",
"Ruth",
"Stephen"
],
"title": "TextToSpeechVoices",
"type": "string"
}
},
"description": "Text To speech node input.",
"properties": {
"voice": {
"$ref": "#/$defs/TextToSpeechVoices",
"description": "Voice to use for text to speech",
"enum_options": {
"Alloy": {
"name": "Alloy",
"provider": "OpenAI",
"voice_id": "alloy"
},
"Andrew_Neural": {
"gender": "male",
"languages": [
"en"
],
"name": "Andrew Neural",
"provider": "Azure",
"voice_id": "en-US-AndrewNeural"
},
"Ava_Neural": {
"gender": "female",
"languages": [
"en"
],
"name": "Ava Neural",
"provider": "Azure",
"voice_id": "en-US-AvaNeural"
},
"Brian_Neural": {
"gender": "male",
"languages": [
"en"
],
"name": "Brian Neural",
"provider": "Azure",
"voice_id": "en-US-BrianNeural"
},
"Danielle": {
"gender": "female",
"languages": [
"en"
],
"name": "Danielle",
"provider": "AWSPolly",
"voice_id": "Danielle"
},
"Echo": {
"name": "Echo",
"provider": "OpenAI",
"voice_id": "echo"
},
"Emma_Neural": {
"gender": "female",
"languages": [
"en"
],
"name": "Emma Neural",
"provider": "Azure",
"voice_id": "en-US-EmmaNeural"
},
"Fable": {
"name": "Fable",
"provider": "OpenAI",
"voice_id": "fable"
},
"Gregory": {
"gender": "male",
"languages": [
"en"
],
"name": "Gregory",
"provider": "AWSPolly",
"voice_id": "Gregory"
},
"Hoai_My_Neural": {
"gender": "female",
"languages": [
"vi"
],
"name": "Hoai My Neural",
"provider": "Azure",
"voice_id": "vi-VN-HoaiMyNeural"
},
"Ivy": {
"gender": "female",
"languages": [
"en"
],
"name": "Ivy",
"provider": "AWSPolly",
"voice_id": "Ivy"
},
"Joanna": {
"gender": "female",
"languages": [
"en"
],
"name": "Joanna",
"provider": "AWSPolly",
"voice_id": "Joanna"
},
"Joey": {
"gender": "male",
"languages": [
"en"
],
"name": "Joey",
"provider": "AWSPolly",
"voice_id": "Joey"
},
"Justin": {
"gender": "male",
"languages": [
"en"
],
"name": "Justin",
"provider": "AWSPolly",
"voice_id": "Justin"
},
"Kendra": {
"gender": "female",
"languages": [
"en"
],
"name": "Kendra",
"provider": "AWSPolly",
"voice_id": "Kendra"
},
"Kevin": {
"gender": "male",
"languages": [
"en"
],
"name": "Kevin",
"provider": "AWSPolly",
"voice_id": "Kevin"
},
"Kimberly": {
"gender": "female",
"languages": [
"en"
],
"name": "Kimberly",
"provider": "AWSPolly",
"voice_id": "Kimberly"
},
"Matthew": {
"gender": "male",
"languages": [
"en"
],
"name": "Matthew",
"provider": "AWSPolly",
"voice_id": "Matthew"
},
"Nam_Minh_Neural": {
"gender": "male",
"languages": [
"vi"
],
"name": "Nam Minh Neural",
"provider": "Azure",
"voice_id": "vi-VN-NamMinhNeural"
},
"Nova": {
"name": "Nova",
"provider": "OpenAI",
"voice_id": "nova"
},
"Onyx": {
"name": "Onyx",
"provider": "OpenAI",
"voice_id": "onyx"
},
"Ruth": {
"gender": "female",
"languages": [
"en"
],
"name": "Ruth",
"provider": "AWSPolly",
"voice_id": "Ruth"
},
"Salli": {
"gender": "female",
"languages": [
"en"
],
"name": "Salli",
"provider": "AWSPolly",
"voice_id": "Salli"
},
"Shimmer": {
"name": "Shimmer",
"provider": "OpenAI",
"voice_id": "shimmer"
},
"Stephen": {
"gender": "male",
"languages": [
"en"
],
"name": "Stephen",
"provider": "AWSPolly",
"voice_id": "Stephen"
},
"en_US_Casual_K": {
"languages": [
"en"
],
"name": "en-US-Casual-K",
"provider": "Google",
"voice_id": "en-US-Casual-K"
},
"en_US_Journey_D": {
"languages": [
"en"
],
"name": "en-US-Journey-D",
"provider": "Google",
"voice_id": "en-US-Journey-D"
},
"en_US_Journey_F": {
"languages": [
"en"
],
"name": "en-US-Journey-F",
"provider": "Google",
"voice_id": "en-US-Journey-F"
},
"en_US_Journey_O": {
"languages": [
"en"
],
"name": "en-US-Journey-O",
"provider": "Google",
"voice_id": "en-US-Journey-O"
},
"en_US_Neural2_A": {
"languages": [
"en"
],
"name": "en-US-Neural2-A",
"provider": "Google",
"voice_id": "en-US-Neural2-A"
},
"en_US_Neural2_C": {
"languages": [
"en"
],
"name": "en-US-Neural2-C",
"provider": "Google",
"voice_id": "en-US-Neural2-C"
},
"en_US_Neural2_D": {
"languages": [
"en"
],
"name": "en-US-Neural2-D",
"provider": "Google",
"voice_id": "en-US-Neural2-D"
},
"en_US_Neural2_E": {
"languages": [
"en"
],
"name": "en-US-Neural2-E",
"provider": "Google",
"voice_id": "en-US-Neural2-E"
},
"en_US_Neural2_F": {
"languages": [
"en"
],
"name": "en-US-Neural2-F",
"provider": "Google",
"voice_id": "en-US-Neural2-F"
},
"en_US_Neural2_G": {
"languages": [
"en"
],
"name": "en-US-Neural2-G",
"provider": "Google",
"voice_id": "en-US-Neural2-G"
},
"en_US_Neural2_H": {
"languages": [
"en"
],
"name": "en-US-Neural2-H",
"provider": "Google",
"voice_id": "en-US-Neural2-H"
},
"en_US_Neural2_I": {
"languages": [
"en"
],
"name": "en-US-Neural2-I",
"provider": "Google",
"voice_id": "en-US-Neural2-I"
},
"en_US_Neural2_J": {
"languages": [
"en"
],
"name": "en-US-Neural2-J",
"provider": "Google",
"voice_id": "en-US-Neural2-J"
},
"en_US_News_K": {
"languages": [
"en"
],
"name": "en-US-News-K",
"provider": "Google",
"voice_id": "en-US-News-K"
},
"en_US_News_L": {
"languages": [
"en"
],
"name": "en-US-News-L",
"provider": "Google",
"voice_id": "en-US-News-L"
},
"en_US_News_N": {
"languages": [
"en"
],
"name": "en-US-News-N",
"provider": "Google",
"voice_id": "en-US-News-N"
},
"en_US_Polyglot_1": {
"languages": [
"en"
],
"name": "en-US-Polyglot-1",
"provider": "Google",
"voice_id": "en-US-Polyglot-1"
},
"en_US_Standard_A": {
"languages": [
"en"
],
"name": "en-US-Standard-A",
"provider": "Google",
"voice_id": "en-US-Standard-A"
},
"en_US_Standard_B": {
"languages": [
"en"
],
"name": "en-US-Standard-B",
"provider": "Google",
"voice_id": "en-US-Standard-B"
},
"en_US_Standard_C": {
"languages": [
"en"
],
"name": "en-US-Standard-C",
"provider": "Google",
"voice_id": "en-US-Standard-C"
},
"en_US_Standard_D": {
"languages": [
"en"
],
"name": "en-US-Standard-D",
"provider": "Google",
"voice_id": "en-US-Standard-D"
},
"en_US_Standard_E": {
"languages": [
"en"
],
"name": "en-US-Standard-E",
"provider": "Google",
"voice_id": "en-US-Standard-E"
},
"en_US_Standard_F": {
"languages": [
"en"
],
"name": "en-US-Standard-F",
"provider": "Google",
"voice_id": "en-US-Standard-F"
},
"en_US_Standard_G": {
"languages": [
"en"
],
"name": "en-US-Standard-G",
"provider": "Google",
"voice_id": "en-US-Standard-G"
},
"en_US_Standard_H": {
"languages": [
"en"
],
"name": "en-US-Standard-H",
"provider": "Google",
"voice_id": "en-US-Standard-H"
},
"en_US_Standard_I": {
"languages": [
"en"
],
"name": "en-US-Standard-I",
"provider": "Google",
"voice_id": "en-US-Standard-I"
},
"en_US_Standard_J": {
"languages": [
"en"
],
"name": "en-US-Standard-J",
"provider": "Google",
"voice_id": "en-US-Standard-J"
},
"en_US_Studio_O": {
"languages": [
"en"
],
"name": "en-US-Studio-O",
"provider": "Google",
"voice_id": "en-US-Studio-O"
},
"en_US_Studio_Q": {
"languages": [
"en"
],
"name": "en-US-Studio-Q",
"provider": "Google",
"voice_id": "en-US-Studio-Q"
},
"en_US_Wavenet-G": {
"languages": [
"en"
],
"name": "en-US-Wavenet-G",
"provider": "Google",
"voice_id": "en-US-Wavenet-G"
},
"en_US_Wavenet-H": {
"languages": [
"en"
],
"name": "en-US-Wavenet-H",
"provider": "Google",
"voice_id": "en-US-Wavenet-H"
},
"en_US_Wavenet-I": {
"languages": [
"en"
],
"name": "en-US-Wavenet-I",
"provider": "Google",
"voice_id": "en-US-Wavenet-I"
},
"en_US_Wavenet-J": {
"languages": [
"en"
],
"name": "en-US-Wavenet-J",
"provider": "Google",
"voice_id": "en-US-Wavenet-J"
},
"en_US_Wavenet_A": {
"languages": [
"en"
],
"name": "en-US-Wavenet-A",
"provider": "Google",
"voice_id": "en-US-Wavenet-A"
},
"en_US_Wavenet_B": {
"languages": [
"en"
],
"name": "en-US-Wavenet-B",
"provider": "Google",
"voice_id": "en-US-Wavenet-B"
},
"en_US_Wavenet_C": {
"languages": [
"en"
],
"name": "en-US-Wavenet-C",
"provider": "Google",
"voice_id": "en-US-Wavenet-C"
},
"en_US_Wavenet_D": {
"languages": [
"en"
],
"name": "en-US-Wavenet-D",
"provider": "Google",
"voice_id": "en-US-Wavenet-D"
},
"en_US_Wavenet_E": {
"languages": [
"en"
],
"name": "en-US-Wavenet-E",
"provider": "Google",
"voice_id": "en-US-Wavenet-E"
},
"en_US_Wavenet_F": {
"languages": [
"en"
],
"name": "en-US-Wavenet-F",
"provider": "Google",
"voice_id": "en-US-Wavenet-F"
},
"vi_VN_Neural2_A": {
"languages": [
"vi"
],
"name": "vi-VN-Neural2-A",
"provider": "Google",
"voice_id": "vi-VN-Neural2-A"
},
"vi_VN_Neural2_D": {
"languages": [
"vi"
],
"name": "vi-VN-Neural2-D",
"provider": "Google",
"voice_id": "vi-VN-Neural2-D"
},
"vi_VN_Standard_A": {
"languages": [
"vi"
],
"name": "vi-VN-Standard-A",
"provider": "Google",
"voice_id": "vi-VN-Standard-A"
},
"vi_VN_Standard_B": {
"languages": [
"vi"
],
"name": "vi-VN-Standard-B",
"provider": "Google",
"voice_id": "vi-VN-Standard-B"
},
"vi_VN_Standard_C": {
"languages": [
"vi"
],
"name": "vi-VN-Standard-C",
"provider": "Google",
"voice_id": "vi-VN-Standard-C"
},
"vi_VN_Standard_D": {
"languages": [
"vi"
],
"name": "vi-VN-Standard-D",
"provider": "Google",
"voice_id": "vi-VN-Standard-D"
},
"vi_VN_Wavenet_A": {
"languages": [
"vi"
],
"name": "vi-VN-Wavenet-A",
"provider": "Google",
"voice_id": "vi-VN-Wavenet-A"
},
"vi_VN_Wavenet_B": {
"languages": [
"vi"
],
"name": "vi-VN-Wavenet-B",
"provider": "Google",
"voice_id": "vi-VN-Wavenet-B"
},
"vi_VN_Wavenet_C": {
"languages": [
"vi"
],
"name": "vi-VN-Wavenet-C",
"provider": "Google",
"voice_id": "vi-VN-Wavenet-C"
},
"vi_VN_Wavenet_D": {
"languages": [
"vi"
],
"name": "vi-VN-Wavenet-D",
"provider": "Google",
"voice_id": "vi-VN-Wavenet-D"
}
},
"title": "Voice"
},
"text": {
"description": "Text to convert to speech",
"title": "Text",
"type": "string"
}
},
"required": [
"voice",
"text"
],
"title": "TextToSpeechNodeInput",
"type": "object"
}
Output Parameters
voice_url (string): URL to the generated audio file.
caption_url (string): URL to the caption/transcript file.
duration (number): Duration of the generated audio in seconds.
How It Works
This node converts text into natural-sounding speech audio using PixelML's text-to-speech API, which integrates multiple voice providers including Azure, OpenAI, Google, and AWS Polly. You provide the text content and select a voice from the available options. The node then generates an audio file and returns its URL, together with a caption file URL and the audio duration in seconds for downstream processing.
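As a minimal sketch of how a downstream step might consume the three output parameters, the snippet below parses a raw output mapping into a typed record. The names `TextToSpeechResult` and `parse_result` are hypothetical and not part of PixelML; only the field names `voice_url`, `caption_url`, and `duration` come from this page.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TextToSpeechResult:
    """Typed view of the node's documented output parameters."""
    voice_url: str    # URL to the generated audio file
    caption_url: str  # URL to the caption/transcript file
    duration: float   # duration of the generated audio in seconds


def parse_result(payload: Mapping[str, Any]) -> TextToSpeechResult:
    """Check that all documented fields are present and convert them."""
    required = ("voice_url", "caption_url", "duration")
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"missing output fields: {', '.join(missing)}")
    duration = float(payload["duration"])
    if duration < 0:
        raise ValueError("duration must be non-negative")
    return TextToSpeechResult(
        voice_url=str(payload["voice_url"]),
        caption_url=str(payload["caption_url"]),
        duration=duration,
    )
```

Parsing the output once at the boundary means later steps, such as video synchronization, can rely on `duration` being a non-negative number.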
Usage Examples
Example 1: Convert Article to Audio with Female Voice
Input:
Output:
Example 2: Generate Vietnamese Narration
Input:
Output:
Example 3: Create News-Style Audio with Professional Voice
Input:
Output:
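As a rough sketch, inputs matching the three example titles might be built as follows. The voice choices (`Ava_Neural`, `Hoai_My_Neural`, `en_US_News_K`) are assumptions picked from the schema's voice enum to fit each title, the sample texts are invented, and `build_input` is a hypothetical helper. Only the `voice` and `text` field names and the "both required" rule come from the schema.

```python
# A small subset of the voice enum from the JSON schema. The language codes
# are copied from enum_options; extend this mapping as needed.
VOICE_LANGUAGES = {
    "Ava_Neural": ["en"],       # Azure, female
    "Hoai_My_Neural": ["vi"],   # Azure, female
    "en_US_News_K": ["en"],     # Google
}


def build_input(voice: str, text: str) -> dict:
    """Build a node input, enforcing the schema's two required fields."""
    if voice not in VOICE_LANGUAGES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not text:
        raise ValueError("text must contain at least one character")
    return {"voice": voice, "text": text}


# Hypothetical inputs for the three examples above.
example_inputs = [
    build_input("Ava_Neural", "Welcome to this week's article."),
    build_input("Hoai_My_Neural", "Xin chào các bạn."),
    build_input("en_US_News_K", "Here are today's top stories."),
]
```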
Common Use Cases
Podcast Generation: Convert written content, blog posts, or articles into audio format for podcast distribution
Accessibility: Create audio versions of written content to make it accessible to visually impaired users
E-Learning: Generate narration for educational videos, courses, and training materials
Voice Notifications: Create custom voice alerts and notifications for applications and systems
Audiobook Production: Convert written books or documents into audiobook format
Video Voiceovers: Generate professional voiceovers for marketing videos, explainer videos, and presentations
Multilingual Content: Produce audio content in multiple languages using native-sounding voices
Error Handling
Invalid Connection: The PixelML connection is not configured or has expired. Verify and update your PixelML connection credentials in the connection settings.
Text Too Long: The input text exceeds the maximum character limit. Split the text into smaller chunks, process each chunk separately, then concatenate the audio files.
Invalid Voice: The selected voice is unavailable or unsupported. Choose a valid voice from the options listed for the voice parameter.
API Rate Limit: Too many requests were sent in a short period. Add delays between requests, or upgrade your PixelML plan for higher limits.
Empty Text: No text was provided, or the text is empty. Ensure the text parameter contains at least one character.
Network Timeout: The request timed out due to network issues. Check network connectivity and retry the request.
Quota Exceeded: The PixelML account quota has been exceeded. Upgrade your PixelML plan or wait for the quota to reset.
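For the rate-limit and timeout cases, one common way to "implement delays between requests" is retry with exponential backoff. The sketch below is not PixelML's API: `RetryableError` is a hypothetical stand-in for however your integration surfaces those two failures, and `call` is any zero-argument function that runs the node.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for rate-limit and network-timeout failures."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RetryableError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RetryableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last failure
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors such as Invalid Voice or Empty Text should not be retried, since repeating the same request cannot succeed. That is why only one exception type is caught here.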
Notes
Voice Selection: Different voices are optimized for different use cases. News voices work well for formal content, Neural voices for natural conversation, and Wavenet for high-quality output.
Text Length: Longer texts will result in longer processing times. Consider splitting very long texts into manageable chunks.
Language Support: Ensure you select a voice that matches the language of your text for best pronunciation and natural sound.
Audio Format: The generated audio is typically in MP3 format, which is widely compatible with most platforms and devices.
Provider Differences: Each provider (Azure, OpenAI, Google, AWS Polly) has distinct voice characteristics. Test multiple voices to find the best fit for your content.
Duration Accuracy: The returned duration value is the length of the generated audio in seconds, which is useful for synchronizing audio with video or other media.
Caption Files: The caption_url provides a text transcript file that can be used for subtitles or accessibility purposes.
Caching: Audio files are temporarily stored and accessible via the returned URLs. Download and store files if long-term access is needed.
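The advice above to split very long texts into manageable chunks can be sketched as a sentence-aware splitter. The 1000-character default is an assumption for illustration only, since this page does not state the actual limit, and `split_text` is a hypothetical helper name.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate `text` input with the same `voice`, and the resulting audio files concatenated in order.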