# Text to Speech

**Action ID:** `text_to_speech`

## Description

Convert text into natural-sounding speech audio. This node supports multiple voice providers, including Azure, OpenAI, Google, and AWS Polly, and offers a wide selection of English and Vietnamese voices in a range of styles.

## Provider

**PixelML**

## Connection

| Name               | Description                                 | Required | Category |
| ------------------ | ------------------------------------------- | :------: | -------- |
| PixelML Connection | The PixelML connection to call PixelML API. |     ✓    | pixelml  |

## Input Parameters

| Name  | Type          | Required | Default | Description                                                                                                                   |
| ----- | ------------- | :------: | ------- | ----------------------------------------------------------------------------------------------------------------------------- |
| voice | string (enum) |     ✓    | -       | Voice to use for text to speech. Choose from a variety of voices across multiple providers (Azure, OpenAI, Google, AWS Polly) |
| text  | string        |     ✓    | -       | Text to convert to speech                                                                                                     |

<details>

<summary>View JSON Schema</summary>

```json
{
  "$defs": {
    "TextToSpeechVoices": {
      "description": "Text to speech voices.",
      "enum": [
        "Hoai_My_Neural",
        "Nam_Minh_Neural",
        "Ava_Neural",
        "Andrew_Neural",
        "Emma_Neural",
        "Brian_Neural",
        "Alloy",
        "Echo",
        "Fable",
        "Onyx",
        "Nova",
        "Shimmer",
        "en_US_Casual_K",
        "en_US_Journey_D",
        "en_US_Journey_F",
        "en_US_Journey_O",
        "en_US_Neural2_A",
        "en_US_Neural2_C",
        "en_US_Neural2_D",
        "en_US_Neural2_E",
        "en_US_Neural2_F",
        "en_US_Neural2_G",
        "en_US_Neural2_H",
        "en_US_Neural2_I",
        "en_US_Neural2_J",
        "en_US_News_K",
        "en_US_News_L",
        "en_US_News_N",
        "en_US_Polyglot_1",
        "en_US_Standard_A",
        "en_US_Standard_B",
        "en_US_Standard_C",
        "en_US_Standard_D",
        "en_US_Standard_E",
        "en_US_Standard_F",
        "en_US_Standard_G",
        "en_US_Standard_H",
        "en_US_Standard_I",
        "en_US_Standard_J",
        "en_US_Studio_O",
        "en_US_Studio_Q",
        "en_US_Wavenet_A",
        "en_US_Wavenet_B",
        "en_US_Wavenet_C",
        "en_US_Wavenet_D",
        "en_US_Wavenet_E",
        "en_US_Wavenet_F",
        "en_US_Wavenet-G",
        "en_US_Wavenet-H",
        "en_US_Wavenet-I",
        "en_US_Wavenet-J",
        "vi_VN_Neural2_A",
        "vi_VN_Neural2_D",
        "vi_VN_Standard_A",
        "vi_VN_Standard_B",
        "vi_VN_Standard_C",
        "vi_VN_Standard_D",
        "vi_VN_Wavenet_A",
        "vi_VN_Wavenet_B",
        "vi_VN_Wavenet_C",
        "vi_VN_Wavenet_D",
        "Danielle",
        "Gregory",
        "Ivy",
        "Joanna",
        "Kendra",
        "Kimberly",
        "Salli",
        "Joey",
        "Justin",
        "Kevin",
        "Matthew",
        "Ruth",
        "Stephen"
      ],
      "title": "TextToSpeechVoices",
      "type": "string"
    }
  },
  "description": "Text To speech node input.",
  "properties": {
    "voice": {
      "$ref": "#/$defs/TextToSpeechVoices",
      "description": "Voice to use for text to speech",
      "enum_options": {
        "Alloy": {
          "name": "Alloy",
          "provider": "OpenAI",
          "voice_id": "alloy"
        },
        "Andrew_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Andrew Neural",
          "provider": "Azure",
          "voice_id": "en-US-AndrewNeural"
        },
        "Ava_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ava Neural",
          "provider": "Azure",
          "voice_id": "en-US-AvaNeural"
        },
        "Brian_Neural": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Brian Neural",
          "provider": "Azure",
          "voice_id": "en-US-BrianNeural"
        },
        "Danielle": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Danielle",
          "provider": "AWSPolly",
          "voice_id": "Danielle"
        },
        "Echo": {
          "name": "Echo",
          "provider": "OpenAI",
          "voice_id": "echo"
        },
        "Emma_Neural": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Emma Neural",
          "provider": "Azure",
          "voice_id": "en-US-EmmaNeural"
        },
        "Fable": {
          "name": "Fable",
          "provider": "OpenAI",
          "voice_id": "fable"
        },
        "Gregory": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Gregory",
          "provider": "AWSPolly",
          "voice_id": "Gregory"
        },
        "Hoai_My_Neural": {
          "gender": "female",
          "languages": [
            "vi"
          ],
          "name": "Hoai My Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-HoaiMyNeural"
        },
        "Ivy": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ivy",
          "provider": "AWSPolly",
          "voice_id": "Ivy"
        },
        "Joanna": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Joanna",
          "provider": "AWSPolly",
          "voice_id": "Joanna"
        },
        "Joey": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Joey",
          "provider": "AWSPolly",
          "voice_id": "Joey"
        },
        "Justin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Justin",
          "provider": "AWSPolly",
          "voice_id": "Justin"
        },
        "Kendra": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kendra",
          "provider": "AWSPolly",
          "voice_id": "Kendra"
        },
        "Kevin": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Kevin",
          "provider": "AWSPolly",
          "voice_id": "Kevin"
        },
        "Kimberly": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Kimberly",
          "provider": "AWSPolly",
          "voice_id": "Kimberly"
        },
        "Matthew": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Matthew",
          "provider": "AWSPolly",
          "voice_id": "Matthew"
        },
        "Nam_Minh_Neural": {
          "gender": "male",
          "languages": [
            "vi"
          ],
          "name": "Nam Minh Neural",
          "provider": "Azure",
          "voice_id": "vi-VN-NamMinhNeural"
        },
        "Nova": {
          "name": "Nova",
          "provider": "OpenAI",
          "voice_id": "nova"
        },
        "Onyx": {
          "name": "Onyx",
          "provider": "OpenAI",
          "voice_id": "onyx"
        },
        "Ruth": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Ruth",
          "provider": "AWSPolly",
          "voice_id": "Ruth"
        },
        "Salli": {
          "gender": "female",
          "languages": [
            "en"
          ],
          "name": "Salli",
          "provider": "AWSPolly",
          "voice_id": "Salli"
        },
        "Shimmer": {
          "name": "Shimmer",
          "provider": "OpenAI",
          "voice_id": "shimmer"
        },
        "Stephen": {
          "gender": "male",
          "languages": [
            "en"
          ],
          "name": "Stephen",
          "provider": "AWSPolly",
          "voice_id": "Stephen"
        },
        "en_US_Casual_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-Casual-K",
          "provider": "Google",
          "voice_id": "en-US-Casual-K"
        },
        "en_US_Journey_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-D",
          "provider": "Google",
          "voice_id": "en-US-Journey-D"
        },
        "en_US_Journey_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-F",
          "provider": "Google",
          "voice_id": "en-US-Journey-F"
        },
        "en_US_Journey_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Journey-O",
          "provider": "Google",
          "voice_id": "en-US-Journey-O"
        },
        "en_US_Neural2_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-A",
          "provider": "Google",
          "voice_id": "en-US-Neural2-A"
        },
        "en_US_Neural2_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-C",
          "provider": "Google",
          "voice_id": "en-US-Neural2-C"
        },
        "en_US_Neural2_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-D",
          "provider": "Google",
          "voice_id": "en-US-Neural2-D"
        },
        "en_US_Neural2_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-E",
          "provider": "Google",
          "voice_id": "en-US-Neural2-E"
        },
        "en_US_Neural2_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-F",
          "provider": "Google",
          "voice_id": "en-US-Neural2-F"
        },
        "en_US_Neural2_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-G",
          "provider": "Google",
          "voice_id": "en-US-Neural2-G"
        },
        "en_US_Neural2_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-H",
          "provider": "Google",
          "voice_id": "en-US-Neural2-H"
        },
        "en_US_Neural2_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-I",
          "provider": "Google",
          "voice_id": "en-US-Neural2-I"
        },
        "en_US_Neural2_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Neural2-J",
          "provider": "Google",
          "voice_id": "en-US-Neural2-J"
        },
        "en_US_News_K": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-K",
          "provider": "Google",
          "voice_id": "en-US-News-K"
        },
        "en_US_News_L": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-L",
          "provider": "Google",
          "voice_id": "en-US-News-L"
        },
        "en_US_News_N": {
          "languages": [
            "en"
          ],
          "name": "en-US-News-N",
          "provider": "Google",
          "voice_id": "en-US-News-N"
        },
        "en_US_Polyglot_1": {
          "languages": [
            "en"
          ],
          "name": "en-US-Polyglot-1",
          "provider": "Google",
          "voice_id": "en-US-Polyglot-1"
        },
        "en_US_Standard_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-A",
          "provider": "Google",
          "voice_id": "en-US-Standard-A"
        },
        "en_US_Standard_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-B",
          "provider": "Google",
          "voice_id": "en-US-Standard-B"
        },
        "en_US_Standard_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-C",
          "provider": "Google",
          "voice_id": "en-US-Standard-C"
        },
        "en_US_Standard_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-D",
          "provider": "Google",
          "voice_id": "en-US-Standard-D"
        },
        "en_US_Standard_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-E",
          "provider": "Google",
          "voice_id": "en-US-Standard-E"
        },
        "en_US_Standard_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-F",
          "provider": "Google",
          "voice_id": "en-US-Standard-F"
        },
        "en_US_Standard_G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-G",
          "provider": "Google",
          "voice_id": "en-US-Standard-G"
        },
        "en_US_Standard_H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-H",
          "provider": "Google",
          "voice_id": "en-US-Standard-H"
        },
        "en_US_Standard_I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-I",
          "provider": "Google",
          "voice_id": "en-US-Standard-I"
        },
        "en_US_Standard_J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Standard-J",
          "provider": "Google",
          "voice_id": "en-US-Standard-J"
        },
        "en_US_Studio_O": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-O",
          "provider": "Google",
          "voice_id": "en-US-Studio-O"
        },
        "en_US_Studio_Q": {
          "languages": [
            "en"
          ],
          "name": "en-US-Studio-Q",
          "provider": "Google",
          "voice_id": "en-US-Studio-Q"
        },
        "en_US_Wavenet-G": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-G",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-G"
        },
        "en_US_Wavenet-H": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-H",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-H"
        },
        "en_US_Wavenet-I": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-I",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-I"
        },
        "en_US_Wavenet-J": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-J",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-J"
        },
        "en_US_Wavenet_A": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-A",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-A"
        },
        "en_US_Wavenet_B": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-B",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-B"
        },
        "en_US_Wavenet_C": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-C",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-C"
        },
        "en_US_Wavenet_D": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-D",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-D"
        },
        "en_US_Wavenet_E": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-E",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-E"
        },
        "en_US_Wavenet_F": {
          "languages": [
            "en"
          ],
          "name": "en-US-Wavenet-F",
          "provider": "Google",
          "voice_id": "en-US-Wavenet-F"
        },
        "vi_VN_Neural2_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-A",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-A"
        },
        "vi_VN_Neural2_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Neural2-D",
          "provider": "Google",
          "voice_id": "vi-VN-Neural2-D"
        },
        "vi_VN_Standard_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-A",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-A"
        },
        "vi_VN_Standard_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-B",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-B"
        },
        "vi_VN_Standard_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-C",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-C"
        },
        "vi_VN_Standard_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Standard-D",
          "provider": "Google",
          "voice_id": "vi-VN-Standard-D"
        },
        "vi_VN_Wavenet_A": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-A",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-A"
        },
        "vi_VN_Wavenet_B": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-B",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-B"
        },
        "vi_VN_Wavenet_C": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-C",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-C"
        },
        "vi_VN_Wavenet_D": {
          "languages": [
            "vi"
          ],
          "name": "vi-VN-Wavenet-D",
          "provider": "Google",
          "voice_id": "vi-VN-Wavenet-D"
        }
      },
      "title": "Voice"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    }
  },
  "required": [
    "voice",
    "text"
  ],
  "title": "TextToSpeechNodeInput",
  "type": "object"
}
```

</details>
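
The `voice` field is a closed enum and both fields are required, so you may want to check a payload before it reaches the node. The sketch below is a minimal client-side check in Python. It assumes you build the input as a plain dictionary. `ALLOWED_VOICES` is an illustrative subset of the enum, and `validate_tts_input` is a hypothetical helper, not part of the PixelML API. Keys must match exactly. Wavenet voices G through J use a hyphen (`en_US_Wavenet-G`), while A through F use an underscore (`en_US_Wavenet_A`).

```python
from typing import Any

# Illustrative subset of the TextToSpeechVoices enum from the schema above;
# replace it with the full list if you validate against every voice.
ALLOWED_VOICES = {
    "Hoai_My_Neural", "Ava_Neural", "Alloy", "Nova",
    "en_US_News_K", "en_US_Wavenet-G", "Joanna", "Matthew",
}


def validate_tts_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors: list[str] = []
    # Both fields are listed under "required" in the schema.
    for field in ("voice", "text"):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    voice = payload.get("voice")
    if voice is not None and voice not in ALLOWED_VOICES:
        errors.append(f"unknown voice: {voice!r}")
    text = payload.get("text")
    if text is not None:
        if not isinstance(text, str):
            errors.append("text must be a string")
        elif not text.strip():
            # Assumption: whitespace-only text is treated like the "Empty Text" error.
            errors.append("text must not be empty")
    return errors
```

For example, `validate_tts_input({"voice": "Joanna", "text": "Hello"})` returns an empty list. A payload of `{"voice": "en_US_Wavenet_G", "text": " "}` reports two problems: an unknown voice and empty text.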

## Output Parameters

| Name         | Type   | Description                                |
| ------------ | ------ | ------------------------------------------ |
| voice\_url   | string | URL to the generated audio file            |
| caption\_url | string | URL to the caption/transcript file         |
| duration     | number | Duration of the generated audio in seconds |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    },
    "caption_url": {
      "title": "Caption URL",
      "type": "string"
    },
    "duration": {
      "title": "Duration",
      "type": "number"
    }
  },
  "required": [
    "voice_url",
    "caption_url",
    "duration"
  ],
  "title": "TextToSpeechNodeOutput",
  "type": "object"
}
```

</details>
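
If the node's result reaches your own code as JSON, a small typed wrapper makes the three required fields explicit. This sketch assumes the output arrives as a JSON string shaped like `TextToSpeechNodeOutput`. `TextToSpeechOutput` and `parse_tts_output` are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToSpeechOutput:
    voice_url: str
    caption_url: str
    duration: float  # length of the generated audio in seconds


def parse_tts_output(raw: str) -> TextToSpeechOutput:
    """Parse JSON shaped like TextToSpeechNodeOutput, checking required keys and types."""
    data = json.loads(raw)
    missing = [k for k in ("voice_url", "caption_url", "duration") if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    for key in ("voice_url", "caption_url"):
        if not isinstance(data[key], str):
            raise TypeError(f"{key} must be a string")
    duration = data["duration"]
    # JSON Schema "number" accepts integers and floats; Python bools are ints, so reject them.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise TypeError("duration must be a number")
    return TextToSpeechOutput(data["voice_url"], data["caption_url"], float(duration))
```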

## How It Works

This node converts text into natural-sounding speech audio using PixelML's text-to-speech API, which integrates multiple voice providers including Azure, OpenAI, Google, and AWS Polly. You provide the text and select a voice from the available options. The node then generates an audio file and returns its URL, along with a caption file URL and the audio duration in seconds for downstream processing.
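
In the input schema, the `enum_options` map pairs each voice key with a `provider` and a provider-specific `voice_id`. The Python sketch below shows that lookup for a few voices copied from the schema. It only illustrates the schema data and is not PixelML's actual routing code. `VOICE_OPTIONS` and `resolve_voice` are hypothetical names.

```python
from typing import TypedDict


class VoiceOption(TypedDict):
    name: str
    provider: str
    voice_id: str


# A few entries copied from "enum_options" in the input schema.
VOICE_OPTIONS: dict[str, VoiceOption] = {
    "Joanna": {"name": "Joanna", "provider": "AWSPolly", "voice_id": "Joanna"},
    "Hoai_My_Neural": {"name": "Hoai My Neural", "provider": "Azure", "voice_id": "vi-VN-HoaiMyNeural"},
    "en_US_News_K": {"name": "en-US-News-K", "provider": "Google", "voice_id": "en-US-News-K"},
    "Alloy": {"name": "Alloy", "provider": "OpenAI", "voice_id": "alloy"},
}


def resolve_voice(key: str) -> tuple[str, str]:
    """Map a node-level voice key to (provider, provider-specific voice_id)."""
    try:
        option = VOICE_OPTIONS[key]
    except KeyError:
        raise ValueError(f"unsupported voice: {key!r}") from None
    return option["provider"], option["voice_id"]
```

The node-level key does not always match the provider's identifier. For example, `Alloy` maps to OpenAI's lowercase `alloy`, and `Hoai_My_Neural` maps to Azure's `vi-VN-HoaiMyNeural`.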

## Usage Examples

### Example 1: Convert Article to Audio with Female Voice

**Input:**

```
voice: "Joanna"
text: "Welcome to our podcast. Today we're discussing the future of artificial intelligence and its impact on everyday life."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/audio/abc123.mp3"
caption_url: "https://storage.pixelml.com/captions/abc123.vtt"
duration: 12.5
```

### Example 2: Generate Vietnamese Narration

**Input:**

```
voice: "Hoai_My_Neural"
text: "Chào mừng bạn đến với hướng dẫn sử dụng sản phẩm của chúng tôi."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/audio/def456.mp3"
caption_url: "https://storage.pixelml.com/captions/def456.vtt"
duration: 8.3
```

### Example 3: Create News-Style Audio with Professional Voice

**Input:**

```
voice: "en_US_News_K"
text: "Breaking news: Scientists have made a groundbreaking discovery in renewable energy technology that could revolutionize power generation worldwide."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/audio/ghi789.mp3"
caption_url: "https://storage.pixelml.com/captions/ghi789.vtt"
duration: 15.7
```

## Common Use Cases

* **Podcast Generation**: Convert written content, blog posts, or articles into audio format for podcast distribution
* **Accessibility**: Create audio versions of written content to make it accessible to visually impaired users
* **E-Learning**: Generate narration for educational videos, courses, and training materials
* **Voice Notifications**: Create custom voice alerts and notifications for applications and systems
* **Audiobook Production**: Convert written books or documents into audiobook format
* **Video Voiceovers**: Generate professional voiceovers for marketing videos, explainer videos, and presentations
* **Multilingual Content**: Produce audio content in multiple languages using native-sounding voices

## Error Handling

| Error Type         | Cause                                           | Solution                                                                                |
| ------------------ | ----------------------------------------------- | --------------------------------------------------------------------------------------- |
| Invalid Connection | PixelML connection is not configured or expired | Verify and update your PixelML connection credentials in the connection settings        |
| Text Too Long      | Input text exceeds maximum character limit      | Split the text into smaller chunks and process separately, then concatenate audio files |
| Invalid Voice      | Selected voice is not available or unsupported  | Choose a valid voice from the available options in the voice parameter                  |
| API Rate Limit     | Too many requests sent in a short time period   | Implement delays between requests or upgrade your PixelML plan for higher limits        |
| Empty Text         | No text provided or text is empty               | Ensure the text parameter contains at least one character                               |
| Network Timeout    | Request timed out due to network issues         | Retry the request or check network connectivity                                         |
| Quota Exceeded     | PixelML account quota has been exceeded         | Upgrade your PixelML plan or wait for quota reset                                       |
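
The **API Rate Limit** and **Network Timeout** rows are commonly handled by retrying with exponential backoff and jitter. The following is a minimal Python sketch built on two assumptions. First, `fn` stands in for whatever code triggers the node. Second, `RetryableError` is a hypothetical exception you would raise when a call fails with a rate limit or timeout, because this page does not document the node's concrete error types. Errors such as **Quota Exceeded** or **Invalid Connection** will not clear on their own, so do not map them to `RetryableError`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures such as rate limits or timeouts."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call fn, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The upper bound doubles each attempt (1s, 2s, 4s, ... with the defaults),
            # capped at max_delay; full jitter sleeps a random time below it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Full jitter spreads out retries from many concurrent workflows. Without it, all the retries would hit the API again at the same moment.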

## Notes

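* **Voice Selection**: Different voices suit different use cases. For example, Google's News voices work well for formal content, Neural voices (Azure Neural and Google Neural2) sound natural in conversational content, and Google's Wavenet voices generally offer higher quality than its Standard voices.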
* **Text Length**: Longer texts will result in longer processing times. Consider splitting very long texts into manageable chunks.
* **Language Support**: Ensure you select a voice that matches the language of your text for best pronunciation and natural sound.
* **Audio Format**: The generated audio is typically in MP3 format, which is widely compatible with most platforms and devices.
* **Provider Differences**: Each provider (Azure, OpenAI, Google, AWS Polly) has distinct voice characteristics. Test several voices to find the best fit for your content.
* **Duration**: The returned `duration` value gives the length of the generated audio in seconds. Use it to synchronize the audio with video or other media.
* **Caption Files**: The `caption_url` points to a caption/transcript file that can be used for subtitles or accessibility purposes. The examples above show `.vtt` files.
* **File Retention**: Audio files are stored temporarily and are accessible via the returned URLs. Download and store the files if you need long-term access.
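
The notes on text length and duration above fit together. You can split long text into chunks, run the node once per chunk, and then use each returned `duration` to find where each clip starts once the clips are joined. This Python sketch makes three assumptions. It uses a hypothetical `max_chars` limit, because the node's actual character limit is not stated on this page. It splits on sentence-ending punctuation. It assumes the clips are concatenated back to back with no gaps. `chunk_text` and `start_offsets` are hypothetical helpers.

```python
import re
from itertools import accumulate


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: a sentence longer than max_chars is hard-cut, possibly mid-word.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def start_offsets(durations: list[float]) -> list[float]:
    """Start time in seconds of each clip when clips are played back to back."""
    if not durations:
        return []
    # Each clip starts where the previous ones end: 0, d0, d0 + d1, ...
    return list(accumulate(durations[:-1], initial=0.0))
```

Take the three example durations above (12.5, 8.3, and 15.7 seconds). For these, `start_offsets` gives start times of 0, 12.5, and roughly 20.8 seconds.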


