Text to Speech

Action ID: text_to_speech

Description

Converts text to natural-sounding speech audio. The node supports voices from multiple providers (Azure, OpenAI, Google, and AWS Polly); the voices listed in the input schema cover English and Vietnamese.

Provider

PixelML

Connection

| Name | Description | Required | Category |
| --- | --- | --- | --- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | | pixelml |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| voice | string (enum) | Yes | - | Voice to use for text to speech. Choose from voices across multiple providers (Azure, OpenAI, Google, AWS Polly). |
| text | string | Yes | - | Text to convert to speech. |

View JSON Schema
{
  "$defs": {
    "TextToSpeechVoices": {
      "description": "Text to speech voices.",
      "enum": [
        "Hoai_My_Neural",
        "Nam_Minh_Neural",
        "Ava_Neural",
        "Andrew_Neural",
        "Emma_Neural",
        "Brian_Neural",
        "Alloy",
        "Echo",
        "Fable",
        "Onyx",
        "Nova",
        "Shimmer",
        "en_US_Casual_K",
        "en_US_Journey_D",
        "en_US_Journey_F",
        "en_US_Journey_O",
        "en_US_Neural2_A",
        "en_US_Neural2_C",
        "en_US_Neural2_D",
        "en_US_Neural2_E",
        "en_US_Neural2_F",
        "en_US_Neural2_G",
        "en_US_Neural2_H",
        "en_US_Neural2_I",
        "en_US_Neural2_J",
        "en_US_News_K",
        "en_US_News_L",
        "en_US_News_N",
        "en_US_Polyglot_1",
        "en_US_Standard_A",
        "en_US_Standard_B",
        "en_US_Standard_C",
        "en_US_Standard_D",
        "en_US_Standard_E",
        "en_US_Standard_F",
        "en_US_Standard_G",
        "en_US_Standard_H",
        "en_US_Standard_I",
        "en_US_Standard_J",
        "en_US_Studio_O",
        "en_US_Studio_Q",
        "en_US_Wavenet_A",
        "en_US_Wavenet_B",
        "en_US_Wavenet_C",
        "en_US_Wavenet_D",
        "en_US_Wavenet_E",
        "en_US_Wavenet_F",
        "en_US_Wavenet-G",
        "en_US_Wavenet-H",
        "en_US_Wavenet-I",
        "en_US_Wavenet-J",
        "vi_VN_Neural2_A",
        "vi_VN_Neural2_D",
        "vi_VN_Standard_A",
        "vi_VN_Standard_B",
        "vi_VN_Standard_C",
        "vi_VN_Standard_D",
        "vi_VN_Wavenet_A",
        "vi_VN_Wavenet_B",
        "vi_VN_Wavenet_C",
        "vi_VN_Wavenet_D",
        "Danielle",
        "Gregory",
        "Ivy",
        "Joanna",
        "Kendra",
        "Kimberly",
        "Salli",
        "Joey",
        "Justin",
        "Kevin",
        "Matthew",
        "Ruth",
        "Stephen"
      ],
      "title": "TextToSpeechVoices",
      "type": "string"
    }
  },
  "description": "Text To speech node input.",
  "properties": {
    "voice": {
      "$ref": "#/$defs/TextToSpeechVoices",
      "description": "Voice to use for text to speech",
      "enum_options": {
        "Alloy": {
          "name": "Alloy",
          "provider": "OpenAI",
          "voice_id": "alloy"
        },
        "Andrew_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Andrew Neural",
          "provider": "Azure",
          "voice_id": "en-US-AndrewNeural"
        },
        "Ava_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ava Neural",
          "provider": "Azure",
          "voice_id": "en-US-AvaNeural"
        },
        "Brian_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Brian Neural",
          "provider": "Azure",
          "voice_id": "en-US-BrianNeural"
        },
        "Danielle": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Danielle",
          "provider": "AWSPolly",
          "voice_id": "Danielle"
        },
        "Echo": {
          "name": "Echo",
          "provider": "OpenAI",
          "voice_id": "echo"
        },
        "Emma_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Emma Neural",
          "provider": "Azure",
          "voice_id": "en-US-EmmaNeural"
        },
        "Fable": {
          "name": "Fable",
          "provider": "OpenAI",
          "voice_id": "fable"
        },
        "Gregory": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Gregory",
          "provider": "AWSPolly",
          "voice_id": "Gregory"
        },
        "Hoai_My_Neural": {
          "gender": "female",
          "languages": [
            "vi"
          ],
          "name": "Hoai My Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-HoaiMyNeural"
        },
        "Ivy": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ivy",
          "provider": "AWSPolly",
          "voice_id": "Ivy"
        },
        "Joanna": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Joanna",
          "provider": "AWSPolly",
          "voice_id": "Joanna"
        },
        "Joey": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Joey",
          "provider": "AWSPolly",
          "voice_id": "Joey"
        },
        "Justin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Justin",
          "provider": "AWSPolly",
          "voice_id": "Justin"
        },
        "Kendra": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kendra",
          "provider": "AWSPolly",
          "voice_id": "Kendra"
        },
        "Kevin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Kevin",
          "provider": "AWSPolly",
          "voice_id": "Kevin"
        },
        "Kimberly": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kimberly",
          "provider": "AWSPolly",
          "voice_id": "Kimberly"
        },
        "Matthew": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Matthew",
          "provider": "AWSPolly",
          "voice_id": "Matthew"
        },
        "Nam_Minh_Neural": {
          "gender": "male",
          "languages": [
            "vi"
          ],
          "name": "Nam Minh Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-NamMinhNeural"
        },
        "Nova": {
          "name": "Nova",
          "provider": "OpenAI",
          "voice_id": "nova"
        },
        "Onyx": {
          "name": "Onyx",
          "provider": "OpenAI",
          "voice_id": "onyx"
        },
        "Ruth": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ruth",
          "provider": "AWSPolly",
          "voice_id": "Ruth"
        },
        "Salli": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Salli",
          "provider": "AWSPolly",
          "voice_id": "Salli"
        },
        "Shimmer": {
          "name": "Shimmer",
          "provider": "OpenAI",
          "voice_id": "shimmer"
        },
        "Stephen": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Stephen",
          "provider": "AWSPolly",
          "voice_id": "Stephen"
        },
        "en_US_Casual_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-Casual-K",
          "provider": "Google",
          "voice_id": "en-US-Casual-K"
        },
        "en_US_Journey_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-D",
          "provider": "Google",
          "voice_id": "en-US-Journey-D"
        },
        "en_US_Journey_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-F",
          "provider": "Google",
          "voice_id": "en-US-Journey-F"
        },
        "en_US_Journey_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-O",
          "provider": "Google",
          "voice_id": "en-US-Journey-O"
        },
        "en_US_Neural2_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-A",
          "provider": "Google",
          "voice_id": "en-US-Neural2-A"
        },
        "en_US_Neural2_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-C",
          "provider": "Google",
          "voice_id": "en-US-Neural2-C"
        },
        "en_US_Neural2_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-D",
          "provider": "Google",
          "voice_id": "en-US-Neural2-D"
        },
        "en_US_Neural2_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-E",
          "provider": "Google",
          "voice_id": "en-US-Neural2-E"
        },
        "en_US_Neural2_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-F",
          "provider": "Google",
          "voice_id": "en-US-Neural2-F"
        },
        "en_US_Neural2_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-G",
          "provider": "Google",
          "voice_id": "en-US-Neural2-G"
        },
        "en_US_Neural2_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-H",
          "provider": "Google",
          "voice_id": "en-US-Neural2-H"
        },
        "en_US_Neural2_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-I",
          "provider": "Google",
          "voice_id": "en-US-Neural2-I"
        },
        "en_US_Neural2_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-J",
          "provider": "Google",
          "voice_id": "en-US-Neural2-J"
        },
        "en_US_News_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-K",
          "provider": "Google",
          "voice_id": "en-US-News-K"
        },
        "en_US_News_L": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-L",
          "provider": "Google",
          "voice_id": "en-US-News-L"
        },
        "en_US_News_N": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-N",
          "provider": "Google",
          "voice_id": "en-US-News-N"
        },
        "en_US_Polyglot_1": {
          "languages": [
            "en"
          ],
          "name": "en-US-Polyglot-1",
          "provider": "Google",
          "voice_id": "en-US-Polyglot-1"
        },
        "en_US_Standard_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-A",
          "provider": "Google",
          "voice_id": "en-US-Standard-A"
        },
        "en_US_Standard_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-B",
          "provider": "Google",
          "voice_id": "en-US-Standard-B"
        },
        "en_US_Standard_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-C",
          "provider": "Google",
          "voice_id": "en-US-Standard-C"
        },
        "en_US_Standard_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-D",
          "provider": "Google",
          "voice_id": "en-US-Standard-D"
        },
        "en_US_Standard_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-E",
          "provider": "Google",
          "voice_id": "en-US-Standard-E"
        },
        "en_US_Standard_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-F",
          "provider": "Google",
          "voice_id": "en-US-Standard-F"
        },
        "en_US_Standard_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-G",
          "provider": "Google",
          "voice_id": "en-US-Standard-G"
        },
        "en_US_Standard_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-H",
          "provider": "Google",
          "voice_id": "en-US-Standard-H"
        },
        "en_US_Standard_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-I",
          "provider": "Google",
          "voice_id": "en-US-Standard-I"
        },
        "en_US_Standard_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-J",
          "provider": "Google",
          "voice_id": "en-US-Standard-J"
        },
        "en_US_Studio_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-O",
          "provider": "Google",
          "voice_id": "en-US-Studio-O"
        },
        "en_US_Studio_Q": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-Q",
          "provider": "Google",
          "voice_id": "en-US-Studio-Q"
        },
        "en_US_Wavenet-G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-G",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-G"
        },
        "en_US_Wavenet-H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-H",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-H"
        },
        "en_US_Wavenet-I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-I",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-I"
        },
        "en_US_Wavenet-J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-J",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-J"
        },
        "en_US_Wavenet_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-A",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-A"
        },
        "en_US_Wavenet_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-B",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-B"
        },
        "en_US_Wavenet_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-C",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-C"
        },
        "en_US_Wavenet_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-D",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-D"
        },
        "en_US_Wavenet_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-E",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-E"
        },
        "en_US_Wavenet_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-F",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-F"
        },
        "vi_VN_Neural2_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-A",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-A"
        },
        "vi_VN_Neural2_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-D",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-D"
        },
        "vi_VN_Standard_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-A",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-A"
        },
        "vi_VN_Standard_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-B",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-B"
        },
        "vi_VN_Standard_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-C",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-C"
        },
        "vi_VN_Standard_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-D",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-D"
        },
        "vi_VN_Wavenet_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-A",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-A"
        },
        "vi_VN_Wavenet_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-B",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-B"
        },
        "vi_VN_Wavenet_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-C",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-C"
        },
        "vi_VN_Wavenet_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-D",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-D"
        }
      },
      "title": "Voice"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    }
  },
  "required": [
    "voice",
    "text"
  ],
  "title": "TextToSpeechNodeInput",
  "type": "object"
}
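
The `enum_options` metadata in the schema above can be used to narrow the voice list by language, provider, or gender. The following is a minimal sketch, not part of the PixelML API: `ENUM_OPTIONS` is a hand-copied excerpt of five entries from the schema, and `find_voices` is a hypothetical helper name.

```python
# Excerpt of "enum_options" copied from the schema above (five of the entries).
ENUM_OPTIONS = {
    "Alloy": {"name": "Alloy", "provider": "OpenAI", "voice_id": "alloy"},
    "Joanna": {"gender": "female", "languages": ["en"], "name": "Joanna",
               "provider": "AWSPolly", "voice_id": "Joanna"},
    "Hoai_My_Neural": {"gender": "female", "languages": ["vi"], "name": "Hoai My Neural",
                       "provider": "Azure", "voice_id": "vi-VN-HoaiMyNeural"},
    "Nam_Minh_Neural": {"gender": "male", "languages": ["vi"], "name": "Nam Minh Neural",
                        "provider": "Azure", "voice_id": "vi-VN-NamMinhNeural"},
    "vi_VN_Neural2_A": {"languages": ["vi"], "name": "vi-VN-Neural2-A",
                        "provider": "Google", "voice_id": "vi-VN-Neural2-A"},
}


def find_voices(options, language=None, provider=None, gender=None):
    """Return the sorted enum keys whose metadata matches every given filter.

    Hypothetical helper: entries that lack a field (for example, the OpenAI
    voices list no "languages") never match a filter on that field.
    """
    matches = []
    for key, meta in options.items():
        if language is not None and language not in meta.get("languages", []):
            continue
        if provider is not None and meta.get("provider") != provider:
            continue
        if gender is not None and meta.get("gender") != gender:
            continue
        matches.append(key)
    return sorted(matches)


vietnamese_female = find_voices(ENUM_OPTIONS, language="vi", gender="female")
```

The value to pass as `voice` is the enum key (such as `Hoai_My_Neural`), not the provider-side `voice_id`.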

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| voice_url | string | URL to the generated audio file |
| caption_url | string | URL to the caption/transcript file |
| duration | number | Duration of the generated audio in seconds |

View JSON Schema
{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    },
    "caption_url": {
      "title": "Caption URL",
      "type": "string"
    },
    "duration": {
      "title": "Duration",
      "type": "number"
    }
  },
  "required": [
    "voice_url",
    "caption_url",
    "duration"
  ],
  "title": "TextToSpeechNodeOutput",
  "type": "object"
}
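
For downstream code, the three required output fields can be read into a small typed record. This is an illustrative sketch only: `TextToSpeechOutput` and `parse_output` are hypothetical names, and the code assumes the node output has already been obtained as a plain dictionary shaped like the schema above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToSpeechOutput:
    """Typed view of the node output described by the schema above."""
    voice_url: str
    caption_url: str
    duration: float  # seconds


def parse_output(raw: dict) -> TextToSpeechOutput:
    """Check that all required fields are present and convert them to a record."""
    required = ("voice_url", "caption_url", "duration")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"missing output fields: {', '.join(missing)}")
    return TextToSpeechOutput(
        voice_url=str(raw["voice_url"]),
        caption_url=str(raw["caption_url"]),
        duration=float(raw["duration"]),
    )


result = parse_output({
    "voice_url": "https://storage.pixelml.com/audio/abc123.mp3",
    "caption_url": "https://storage.pixelml.com/captions/abc123.vtt",
    "duration": 12.5,
})
```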

How It Works

This node converts text into natural-sounding speech audio using PixelML's text-to-speech API, which integrates multiple voice providers including Azure, OpenAI, Google, and AWS Polly. You provide the text and select a voice from the available options. The node then generates an audio file and returns a URL to it, together with a URL to a caption file and the audio duration in seconds, for downstream processing.
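
Before the node runs, the two required inputs can be checked locally. The sketch below is an assumption-laden illustration rather than PixelML's own validation: `KNOWN_VOICES` holds only four of the enum values from the input schema, and `validate_tts_input` is a hypothetical helper.

```python
# A few values from the TextToSpeechVoices enum; extend with the full list as needed.
KNOWN_VOICES = {"Joanna", "Hoai_My_Neural", "en_US_News_K", "Alloy"}


def validate_tts_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    # The input schema lists both "voice" and "text" as required.
    for field in ("voice", "text"):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    voice = payload.get("voice")
    if voice is not None and voice not in KNOWN_VOICES:
        problems.append(f"unknown voice: {voice}")
    text = payload.get("text")
    if text is not None and not isinstance(text, str):
        problems.append("text must be a string")
    elif isinstance(text, str) and not text.strip():
        problems.append("text is empty")
    return problems


payload = {"voice": "Joanna", "text": "Welcome to our podcast."}
problems = validate_tts_input(payload)
```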

Usage Examples

Example 1: Convert Article to Audio with Female Voice

Input:

voice: "Joanna"
text: "Welcome to our podcast. Today we're discussing the future of artificial intelligence and its impact on everyday life."

Output:

voice_url: "https://storage.pixelml.com/audio/abc123.mp3"
caption_url: "https://storage.pixelml.com/captions/abc123.vtt"
duration: 12.5

Example 2: Generate Vietnamese Narration

Input:

voice: "Hoai_My_Neural"
text: "Chào mừng bạn đến với hướng dẫn sử dụng sản phẩm của chúng tôi."

Output:

voice_url: "https://storage.pixelml.com/audio/def456.mp3"
caption_url: "https://storage.pixelml.com/captions/def456.vtt"
duration: 8.3

Example 3: Create News-Style Audio with Professional Voice

Input:

voice: "en_US_News_K"
text: "Breaking news: Scientists have made a groundbreaking discovery in renewable energy technology that could revolutionize power generation worldwide."

Output:

voice_url: "https://storage.pixelml.com/audio/ghi789.mp3"
caption_url: "https://storage.pixelml.com/captions/ghi789.vtt"
duration: 15.7
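
If clips like the three above were played back to back, the `duration` field would give each clip's start time. The following sketch uses the example durations only as sample data; `start_offsets` is a hypothetical helper name.

```python
from itertools import accumulate


def start_offsets(durations):
    """Return the start time in seconds of each clip when played back to back."""
    # Prepend 0.0 for the first clip and drop the final running total,
    # which is the end of the last clip rather than a start time.
    return [0.0, *accumulate(durations)][:-1]


# Durations taken from Examples 1 to 3 above.
example_durations = [12.5, 8.3, 15.7]
offsets = start_offsets(example_durations)
total_seconds = sum(example_durations)
```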

Common Use Cases

  • Podcast Generation: Convert written content, blog posts, or articles into audio format for podcast distribution

  • Accessibility: Create audio versions of written content to make it accessible to visually impaired users

  • E-Learning: Generate narration for educational videos, courses, and training materials

  • Voice Notifications: Create custom voice alerts and notifications for applications and systems

  • Audiobook Production: Convert written books or documents into audiobook format

  • Video Voiceovers: Generate professional voiceovers for marketing videos, explainer videos, and presentations

  • Multilingual Content: Produce audio content in multiple languages using native-sounding voices

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid Connection | PixelML connection is not configured or has expired | Verify and update your PixelML connection credentials in the connection settings |
| Text Too Long | Input text exceeds the maximum character limit | Split the text into smaller chunks, process them separately, then concatenate the audio files |
| Invalid Voice | Selected voice is not available or is unsupported | Choose a valid voice from the available options in the voice parameter |
| API Rate Limit | Too many requests sent in a short time period | Add delays between requests or upgrade your PixelML plan for higher limits |
| Empty Text | No text provided or text is empty | Ensure the text parameter contains at least one character |
| Network Timeout | Request timed out due to network issues | Retry the request or check network connectivity |
| Quota Exceeded | PixelML account quota has been exceeded | Upgrade your PixelML plan or wait for the quota to reset |
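
Two of the remedies above, splitting long text and retrying with delays, can be sketched in a few lines. Everything here is an assumption for illustration: the character limit is not stated in this document, so `max_chars` is a caller-supplied value, and `run_node` in the usage note stands in for whatever function actually invokes the node.

```python
import time


def split_text(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def call_with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep,
                    retry_on=(TimeoutError, ConnectionError)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
```

A possible usage, with `run_node` as a placeholder for the real call: `[call_with_retry(lambda c=c: run_node(voice, c)) for c in split_text(long_text, 2000)]`. The value 2000 is arbitrary and should be replaced with the actual limit.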

Notes

  • Voice Selection: Different voices are optimized for different use cases. News voices work well for formal content, Neural voices for natural conversation, and Wavenet for high-quality output.

  • Text Length: Longer texts will result in longer processing times. Consider splitting very long texts into manageable chunks.

  • Language Support: Ensure you select a voice that matches the language of your text for best pronunciation and natural sound.

  • Audio Format: The generated audio is typically in MP3 format, which most platforms and devices support.

  • Provider Differences: Each provider (Azure, OpenAI, Google, AWS Polly) has distinct voice characteristics. Test multiple voices to find the best fit for your content.

  • Duration Accuracy: The returned duration value is reported in seconds and is useful for synchronizing audio with video or other media.

  • Caption Files: The caption_url provides a text transcript file that can be used for subtitles or accessibility purposes.

  • Caching: Audio files are temporarily stored and accessible via the returned URLs. Download and store files if long-term access is needed.
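
Because the returned URLs are temporary, a workflow that needs long-term access can copy the file to its own storage. The sketch below uses only the Python standard library; `save_audio` is a hypothetical helper, and it assumes the URL can be fetched without extra authentication, which this document does not confirm.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_audio(voice_url: str, dest_dir: str) -> Path:
    """Download the file at voice_url into dest_dir and return the local path."""
    # Reuse the file name from the URL path; fall back to a generic name.
    name = Path(urlparse(voice_url).path).name or "audio.mp3"
    target = Path(dest_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(voice_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return target
```

The same helper works for `caption_url`, since it only depends on the URL and a destination directory.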
