Text to Speech

Action ID: text_to_speech

Description

Convert text to natural-sounding speech audio. This node supports multiple voice providers (Azure, OpenAI, Google, and AWS Polly) and offers a wide selection of voices in different languages and styles.

Provider

PixelML

Connection

| Name | Description | Required |
| --- | --- | --- |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| voice | string (enum) | ✓ | | Voice to use for text to speech. Choose from a variety of voices across multiple providers (Azure, OpenAI, Google, AWS Polly). |
| text | string | ✓ | | Text to convert to speech. |

View JSON Schema

{
  "$defs": {
    "TextToSpeechVoices": {
      "description": "Text to speech voices.",
      "enum": [
        "Hoai_My_Neural",
        "Nam_Minh_Neural",
        "Ava_Neural",
        "Andrew_Neural",
        "Emma_Neural",
        "Brian_Neural",
        "Alloy",
        "Echo",
        "Fable",
        "Onyx",
        "Nova",
        "Shimmer",
        "en_US_Casual_K",
        "en_US_Journey_D",
        "en_US_Journey_F",
        "en_US_Journey_O",
        "en_US_Neural2_A",
        "en_US_Neural2_C",
        "en_US_Neural2_D",
        "en_US_Neural2_E",
        "en_US_Neural2_F",
        "en_US_Neural2_G",
        "en_US_Neural2_H",
        "en_US_Neural2_I",
        "en_US_Neural2_J",
        "en_US_News_K",
        "en_US_News_L",
        "en_US_News_N",
        "en_US_Polyglot_1",
        "en_US_Standard_A",
        "en_US_Standard_B",
        "en_US_Standard_C",
        "en_US_Standard_D",
        "en_US_Standard_E",
        "en_US_Standard_F",
        "en_US_Standard_G",
        "en_US_Standard_H",
        "en_US_Standard_I",
        "en_US_Standard_J",
        "en_US_Studio_O",
        "en_US_Studio_Q",
        "en_US_Wavenet_A",
        "en_US_Wavenet_B",
        "en_US_Wavenet_C",
        "en_US_Wavenet_D",
        "en_US_Wavenet_E",
        "en_US_Wavenet_F",
        "en_US_Wavenet-G",
        "en_US_Wavenet-H",
        "en_US_Wavenet-I",
        "en_US_Wavenet-J",
        "vi_VN_Neural2_A",
        "vi_VN_Neural2_D",
        "vi_VN_Standard_A",
        "vi_VN_Standard_B",
        "vi_VN_Standard_C",
        "vi_VN_Standard_D",
        "vi_VN_Wavenet_A",
        "vi_VN_Wavenet_B",
        "vi_VN_Wavenet_C",
        "vi_VN_Wavenet_D",
        "Danielle",
        "Gregory",
        "Ivy",
        "Joanna",
        "Kendra",
        "Kimberly",
        "Salli",
        "Joey",
        "Justin",
        "Kevin",
        "Matthew",
        "Ruth",
        "Stephen"
      ],
      "title": "TextToSpeechVoices",
      "type": "string"
    }
  },
  "description": "Text To speech node input.",
  "properties": {
    "voice": {
      "$ref": "#/$defs/TextToSpeechVoices",
      "description": "Voice to use for text to speech",
      "enum_options": {
        "Alloy": {
          "name": "Alloy",
          "provider": "OpenAI",
          "voice_id": "alloy"
        },
        "Andrew_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Andrew Neural",
          "provider": "Azure",
          "voice_id": "en-US-AndrewNeural"
        },
        "Ava_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ava Neural",
          "provider": "Azure",
          "voice_id": "en-US-AvaNeural"
        },
        "Brian_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Brian Neural",
          "provider": "Azure",
          "voice_id": "en-US-BrianNeural"
        },
        "Danielle": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Danielle",
          "provider": "AWSPolly",
          "voice_id": "Danielle"
        },
        "Echo": {
          "name": "Echo",
          "provider": "OpenAI",
          "voice_id": "echo"
        },
        "Emma_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Emma Neural",
          "provider": "Azure",
          "voice_id": "en-US-EmmaNeural"
        },
        "Fable": {
          "name": "Fable",
          "provider": "OpenAI",
          "voice_id": "fable"
        },
        "Gregory": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Gregory",
          "provider": "AWSPolly",
          "voice_id": "Gregory"
        },
        "Hoai_My_Neural": {
          "gender": "female",
          "languages": [
            "vi"
          ],
          "name": "Hoai My Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-HoaiMyNeural"
        },
        "Ivy": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ivy",
          "provider": "AWSPolly",
          "voice_id": "Ivy"
        },
        "Joanna": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Joanna",
          "provider": "AWSPolly",
          "voice_id": "Joanna"
        },
        "Joey": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Joey",
          "provider": "AWSPolly",
          "voice_id": "Joey"
        },
        "Justin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Justin",
          "provider": "AWSPolly",
          "voice_id": "Justin"
        },
        "Kendra": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kendra",
          "provider": "AWSPolly",
          "voice_id": "Kendra"
        },
        "Kevin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Kevin",
          "provider": "AWSPolly",
          "voice_id": "Kevin"
        },
        "Kimberly": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kimberly",
          "provider": "AWSPolly",
          "voice_id": "Kimberly"
        },
        "Matthew": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Matthew",
          "provider": "AWSPolly",
          "voice_id": "Matthew"
        },
        "Nam_Minh_Neural": {
          "gender": "male",
          "languages": [
            "vi"
          ],
          "name": "Nam Minh Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-NamMinhNeural"
        },
        "Nova": {
          "name": "Nova",
          "provider": "OpenAI",
          "voice_id": "nova"
        },
        "Onyx": {
          "name": "Onyx",
          "provider": "OpenAI",
          "voice_id": "onyx"
        },
        "Ruth": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ruth",
          "provider": "AWSPolly",
          "voice_id": "Ruth"
        },
        "Salli": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Salli",
          "provider": "AWSPolly",
          "voice_id": "Salli"
        },
        "Shimmer": {
          "name": "Shimmer",
          "provider": "OpenAI",
          "voice_id": "shimmer"
        },
        "Stephen": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Stephen",
          "provider": "AWSPolly",
          "voice_id": "Stephen"
        },
        "en_US_Casual_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-Casual-K",
          "provider": "Google",
          "voice_id": "en-US-Casual-K"
        },
        "en_US_Journey_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-D",
          "provider": "Google",
          "voice_id": "en-US-Journey-D"
        },
        "en_US_Journey_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-F",
          "provider": "Google",
          "voice_id": "en-US-Journey-F"
        },
        "en_US_Journey_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-O",
          "provider": "Google",
          "voice_id": "en-US-Journey-O"
        },
        "en_US_Neural2_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-A",
          "provider": "Google",
          "voice_id": "en-US-Neural2-A"
        },
        "en_US_Neural2_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-C",
          "provider": "Google",
          "voice_id": "en-US-Neural2-C"
        },
        "en_US_Neural2_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-D",
          "provider": "Google",
          "voice_id": "en-US-Neural2-D"
        },
        "en_US_Neural2_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-E",
          "provider": "Google",
          "voice_id": "en-US-Neural2-E"
        },
        "en_US_Neural2_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-F",
          "provider": "Google",
          "voice_id": "en-US-Neural2-F"
        },
        "en_US_Neural2_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-G",
          "provider": "Google",
          "voice_id": "en-US-Neural2-G"
        },
        "en_US_Neural2_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-H",
          "provider": "Google",
          "voice_id": "en-US-Neural2-H"
        },
        "en_US_Neural2_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-I",
          "provider": "Google",
          "voice_id": "en-US-Neural2-I"
        },
        "en_US_Neural2_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-J",
          "provider": "Google",
          "voice_id": "en-US-Neural2-J"
        },
        "en_US_News_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-K",
          "provider": "Google",
          "voice_id": "en-US-News-K"
        },
        "en_US_News_L": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-L",
          "provider": "Google",
          "voice_id": "en-US-News-L"
        },
        "en_US_News_N": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-N",
          "provider": "Google",
          "voice_id": "en-US-News-N"
        },
        "en_US_Polyglot_1": {
          "languages": [
            "en"
          ],
          "name": "en-US-Polyglot-1",
          "provider": "Google",
          "voice_id": "en-US-Polyglot-1"
        },
        "en_US_Standard_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-A",
          "provider": "Google",
          "voice_id": "en-US-Standard-A"
        },
        "en_US_Standard_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-B",
          "provider": "Google",
          "voice_id": "en-US-Standard-B"
        },
        "en_US_Standard_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-C",
          "provider": "Google",
          "voice_id": "en-US-Standard-C"
        },
        "en_US_Standard_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-D",
          "provider": "Google",
          "voice_id": "en-US-Standard-D"
        },
        "en_US_Standard_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-E",
          "provider": "Google",
          "voice_id": "en-US-Standard-E"
        },
        "en_US_Standard_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-F",
          "provider": "Google",
          "voice_id": "en-US-Standard-F"
        },
        "en_US_Standard_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-G",
          "provider": "Google",
          "voice_id": "en-US-Standard-G"
        },
        "en_US_Standard_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-H",
          "provider": "Google",
          "voice_id": "en-US-Standard-H"
        },
        "en_US_Standard_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-I",
          "provider": "Google",
          "voice_id": "en-US-Standard-I"
        },
        "en_US_Standard_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-J",
          "provider": "Google",
          "voice_id": "en-US-Standard-J"
        },
        "en_US_Studio_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-O",
          "provider": "Google",
          "voice_id": "en-US-Studio-O"
        },
        "en_US_Studio_Q": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-Q",
          "provider": "Google",
          "voice_id": "en-US-Studio-Q"
        },
        "en_US_Wavenet-G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-G",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-G"
        },
        "en_US_Wavenet-H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-H",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-H"
        },
        "en_US_Wavenet-I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-I",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-I"
        },
        "en_US_Wavenet-J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-J",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-J"
        },
        "en_US_Wavenet_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-A",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-A"
        },
        "en_US_Wavenet_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-B",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-B"
        },
        "en_US_Wavenet_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-C",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-C"
        },
        "en_US_Wavenet_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-D",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-D"
        },
        "en_US_Wavenet_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-E",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-E"
        },
        "en_US_Wavenet_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-F",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-F"
        },
        "vi_VN_Neural2_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-A",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-A"
        },
        "vi_VN_Neural2_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-D",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-D"
        },
        "vi_VN_Standard_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-A",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-A"
        },
        "vi_VN_Standard_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-B",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-B"
        },
        "vi_VN_Standard_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-C",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-C"
        },
        "vi_VN_Standard_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-D",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-D"
        },
        "vi_VN_Wavenet_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-A",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-A"
        },
        "vi_VN_Wavenet_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-B",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-B"
        },
        "vi_VN_Wavenet_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-C",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-C"
        },
        "vi_VN_Wavenet_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-D",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-D"
        }
      },
      "title": "Voice"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    }
  },
  "required": [
    "voice",
    "text"
  ],
  "title": "TextToSpeechNodeInput",
  "type": "object"
}
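
The enum_options map in the schema pairs each voice key with provider, language, and (for some providers) gender metadata. The following is a minimal Python sketch of filtering on that metadata. The find_voices helper is hypothetical, and the dictionary is a four-entry excerpt copied from the schema, not the full list.

```python
# Excerpt of the "enum_options" map from the schema above (four of the entries).
ENUM_OPTIONS = {
    "Alloy": {"name": "Alloy", "provider": "OpenAI", "voice_id": "alloy"},
    "Hoai_My_Neural": {
        "gender": "female", "languages": ["vi"], "name": "Hoai My Neural",
        "provider": "Azure", "voice_id": "vi-VN-HoaiMyNeural",
    },
    "Joanna": {
        "gender": "female", "languages": ["en"], "name": "Joanna",
        "provider": "AWSPolly", "voice_id": "Joanna",
    },
    "vi_VN_Wavenet_A": {
        "languages": ["vi"], "name": "vi-VN-Wavenet-A",
        "provider": "Google", "voice_id": "vi-VN-Wavenet-A",
    },
}


def find_voices(options, language=None, provider=None):
    """Return the enum keys whose metadata matches the given filters.

    Hypothetical helper, not part of the node itself.
    """
    matches = []
    for key, meta in options.items():
        # OpenAI entries carry no "languages" list, so a language filter skips them.
        if language is not None and language not in meta.get("languages", []):
            continue
        if provider is not None and meta.get("provider") != provider:
            continue
        matches.append(key)
    return sorted(matches)


print(find_voices(ENUM_OPTIONS, language="vi"))
```

Note that the value passed to the node's voice parameter is the enum key (for example Hoai_My_Neural), not the provider-side voice_id.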

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| voice_url | string | URL to the generated audio file |
| caption_url | string | URL to the caption/transcript file |
| duration | number | Duration of the generated audio in seconds |

View JSON Schema

{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    },
    "caption_url": {
      "title": "Caption URL",
      "type": "string"
    },
    "duration": {
      "title": "Duration",
      "type": "number"
    }
  },
  "required": [
    "voice_url",
    "caption_url",
    "duration"
  ],
  "title": "TextToSpeechNodeOutput",
  "type": "object"
}
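
Downstream code can check the three required output fields before using them. The sketch below assumes the node output arrives as a plain Python dict shaped like the schema above. The TextToSpeechOutput class and parse_output function are illustrative names, not part of any PixelML SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToSpeechOutput:
    voice_url: str
    caption_url: str
    duration: float  # seconds, per the output parameter description


def parse_output(payload: dict) -> TextToSpeechOutput:
    """Check the required fields from the output schema and build a typed record."""
    required = ("voice_url", "caption_url", "duration")
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"missing required output fields: {missing}")
    return TextToSpeechOutput(
        voice_url=str(payload["voice_url"]),
        caption_url=str(payload["caption_url"]),
        duration=float(payload["duration"]),
    )
```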

How It Works

This node converts text into natural-sounding speech audio using PixelML's text-to-speech API, which integrates multiple voice providers: Azure, OpenAI, Google, and AWS Polly. You provide the text and select a voice from the available options. The node generates an audio file and returns its URL, together with a caption file URL and the audio duration in seconds, for downstream processing.

Usage Examples

Example 1: Convert Article to Audio with Female Voice

Input:

voice: "Joanna"
text: "Welcome to our podcast. Today we're discussing the future of artificial intelligence and its impact on everyday life."

Output:

voice_url: "https://storage.pixelml.com/audio/abc123.mp3"
caption_url: "https://storage.pixelml.com/captions/abc123.vtt"
duration: 12.5

Example 2: Generate Vietnamese Narration

Input:

voice: "Hoai_My_Neural"
text: "Chào mừng bạn đến với hướng dẫn sử dụng sản phẩm của chúng tôi."

Output:

voice_url: "https://storage.pixelml.com/audio/def456.mp3"
caption_url: "https://storage.pixelml.com/captions/def456.vtt"
duration: 8.3

Example 3: Create News-Style Audio with Professional Voice

Input:

voice: "en_US_News_K"
text: "Breaking news: Scientists have made a groundbreaking discovery in renewable energy technology that could revolutionize power generation worldwide."

Output:

voice_url: "https://storage.pixelml.com/audio/ghi789.mp3"
caption_url: "https://storage.pixelml.com/captions/ghi789.vtt"
duration: 15.7
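
The inputs in the examples above can be assembled and checked in code before they reach the node. This is a minimal sketch under stated assumptions: build_input is a hypothetical helper, KNOWN_VOICES holds only the three voices used in the examples (the full enum is in the input schema), and treating whitespace-only text as empty is a choice made here rather than documented behavior.

```python
import json

# Subset of the voice enum, for illustration only.
KNOWN_VOICES = {"Joanna", "Hoai_My_Neural", "en_US_News_K"}


def build_input(voice: str, text: str) -> dict:
    """Return an input dict with the two required fields, or raise ValueError."""
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not text.strip():
        # Assumption: whitespace-only text is rejected along with empty text.
        raise ValueError("text must contain at least one character")
    return {"voice": voice, "text": text}


payload = build_input(
    "Hoai_My_Neural",
    "Chào mừng bạn đến với hướng dẫn sử dụng sản phẩm của chúng tôi.",
)
# ensure_ascii=False keeps the Vietnamese characters readable in the JSON text.
print(json.dumps(payload, ensure_ascii=False))
```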

Common Use Cases

Podcast Generation: Convert written content, blog posts, or articles into audio format for podcast distribution
Accessibility: Create audio versions of written content to make it accessible to visually impaired users
E-Learning: Generate narration for educational videos, courses, and training materials
Voice Notifications: Create custom voice alerts and notifications for applications and systems
Audiobook Production: Convert written books or documents into audiobook format
Video Voiceovers: Generate professional voiceovers for marketing videos, explainer videos, and presentations
Multilingual Content: Produce audio content in multiple languages using native-sounding voices

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid Connection | PixelML connection is not configured or expired | Verify and update your PixelML connection credentials in the connection settings |
| Text Too Long | Input text exceeds maximum character limit | Split the text into smaller chunks and process separately, then concatenate audio files |
| Invalid Voice | Selected voice is not available or unsupported | Choose a valid voice from the available options in the voice parameter |
| API Rate Limit | Too many requests sent in a short time period | Implement delays between requests or upgrade your PixelML plan for higher limits |
| Empty Text | No text provided or text is empty | Ensure the text parameter contains at least one character |
| Network Timeout | Request timed out due to network issues | Retry the request or check network connectivity |
| Quota Exceeded | PixelML account quota has been exceeded | Upgrade your PixelML plan or wait for quota reset |
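
For the API Rate Limit and Network Timeout entries above, one common way to "implement delays between requests" is retrying with exponential backoff. The sketch below is generic and makes several assumptions: TransientError is a hypothetical stand-in for whatever exception your client raises on those failures, func is whatever callable runs the node, and the attempt count and delays are arbitrary starting values.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or timeout failure."""


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TransientError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the delay schedule can be tested without waiting. Errors such as Invalid Voice or Quota Exceeded are not worth retrying this way, since repeating the same request will not change the result.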

Notes

Voice Selection: Different voices are optimized for different use cases. News voices work well for formal content, Neural voices for natural conversation, and Wavenet for high-quality output.
Text Length: Longer texts will result in longer processing times. Consider splitting very long texts into manageable chunks.
Language Support: Ensure you select a voice that matches the language of your text for best pronunciation and natural sound.
Audio Format: The generated audio is typically in MP3 format, which is widely compatible with most platforms and devices.
Provider Differences: Each provider (Azure, OpenAI, Google, AWS Polly) has distinct voice characteristics. Test multiple voices to find the best fit for your content.
Duration Accuracy: The returned duration value is reported in seconds and can be used to synchronize audio with video or other media.
Caption Files: The caption_url provides a text transcript file that can be used for subtitles or accessibility purposes.
Caching: Audio files are temporarily stored and accessible via the returned URLs. Download and store files if long-term access is needed.
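
The text-length and duration notes above can be combined: split a long text at sentence boundaries, run the node once per chunk, and use the returned durations to work out where each chunk starts in the concatenated audio. This is a sketch only. The page does not state the character limit, so max_chars is a value you must supply, and both helper names are hypothetical.

```python
import re


def split_text(text: str, max_chars: int) -> list[str]:
    """Split text at sentence boundaries into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def start_offsets(durations: list[float]) -> list[float]:
    """Start time, in seconds, of each chunk when the audio files play back to back."""
    offsets, total = [], 0.0
    for duration in durations:
        offsets.append(total)
        total += duration
    return offsets
```

The offsets are useful for shifting caption timestamps or aligning video segments after the per-chunk audio files are joined.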


Last updated 3 months ago
