# Text to speech custom

**Action ID:** `text_to_speech_custom`

## Description

Generate custom speech from text with detailed control over voice characteristics and audio properties.

## Connection

| Name               | Description                                          | Required | Category |
| ------------------ | ---------------------------------------------------- | -------- | -------- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True     | pixelml  |

## Input Parameters

| Name        | Type     | Required | Default                                                                                                                                              | Description                                                                                                                    |
| ----------- | -------- | :------: | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| provider    | dropdown |     -    | replicate                                                                                                                                            | AI provider for speech generation. Available options: replicate, baseten                                                       |
| text        | string   |     ✓    | -                                                                                                                                                    | Text to convert to speech                                                                                                      |
| description | string   |     -    | A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone. | Detailed description of the desired output audio characteristics including gender, pitch, pace, environment, clarity, and tone |

<details>

<summary>View JSON Schema</summary>

**Input Schema**

```json
{
  "description": "Text To speech node input.",
  "properties": {
    "provider": {
      "default": "replicate",
      "description": "Provider",
      "enum": [
        "replicate",
        "baseten"
      ],
      "title": "Provider",
      "type": "string"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    },
    "description": {
      "default": "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone.",
      "description": "Provide description of the output audio",
      "title": "Provide description of the output audio",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechCustomNodeInput",
  "type": "object"
}
```

</details>
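As a rough illustration of the input schema, the following Python sketch builds a payload shaped like `TextToSpeechCustomNodeInput` and applies the schema's defaults and `enum` constraint. The helper name `build_tts_custom_input` is hypothetical and not part of any PixelML SDK. The non-empty check on `text` is an assumption drawn from the "Empty Text" row in Error Handling, since the schema itself only requires a string.

```python
from typing import Dict

# Values copied from the input schema above.
ALLOWED_PROVIDERS = ("replicate", "baseten")
DEFAULT_DESCRIPTION = (
    "A male speaker with a low-pitched voice delivering his words at a fast "
    "pace in a small, confined space with a very clear audio and an animated tone."
)


def build_tts_custom_input(
    text: str,
    provider: str = "replicate",
    description: str = DEFAULT_DESCRIPTION,
) -> Dict[str, str]:
    """Return a dict shaped like TextToSpeechCustomNodeInput (hypothetical helper)."""
    # Assumption: blank text is rejected, mirroring the "Empty Text" error row.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    # The schema restricts provider to the two enum values.
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"provider must be one of {ALLOWED_PROVIDERS}")
    if not isinstance(description, str):
        raise ValueError("description must be a string")
    return {"provider": provider, "text": text, "description": description}
```

Calling `build_tts_custom_input("Hello")` yields a payload with `provider` set to `replicate` and the default description filled in.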

## Output Parameters

| Name       | Type   | Description                     |
| ---------- | ------ | ------------------------------- |
| voice\_url | string | URL of the generated audio file |

<details>

<summary>View JSON Schema</summary>

**Output Schema**

```json
{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    }
  },
  "required": [
    "voice_url"
  ],
  "title": "TextToSpeechCustomNodeOutput",
  "type": "object"
}
```

</details>
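A downstream step may want to check the output before using it. The sketch below reads `voice_url` from a mapping shaped like `TextToSpeechCustomNodeOutput`. The function name `extract_voice_url` is hypothetical, and the expectation that the URL is an absolute `http(s)` URL is an assumption based on the usage examples, not something the schema states.

```python
from typing import Any, Mapping
from urllib.parse import urlparse


def extract_voice_url(output: Mapping[str, Any]) -> str:
    """Return voice_url from a node output mapping, or raise ValueError."""
    url = output.get("voice_url")
    # The output schema marks voice_url as a required string.
    if not isinstance(url, str) or not url:
        raise ValueError("output is missing the required string field 'voice_url'")
    parsed = urlparse(url)
    # Assumption: the URL is absolute http(s), as in the usage examples.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"voice_url does not look like an http(s) URL: {url!r}")
    return url
```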

## How It Works

This node generates speech from text and shapes the voice according to a natural-language description. Unlike standard text-to-speech or voice cloning, it lets you describe the voice properties you want: speaker gender, pitch, speaking pace, environment acoustics, audio clarity, and emotional tone. The model interprets the description and generates speech intended to match it.
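One way to keep descriptions complete and consistent is to assemble them from the six characteristics named above. The following is a minimal sketch, not a required format: the function `describe_voice` and its sentence template are hypothetical, and the node accepts any free-form description string.

```python
def _with_article(phrase: str) -> str:
    """Prefix a phrase with 'a' or 'an' based on its first letter."""
    article = "an" if phrase[0].lower() in {"a", "e", "i", "o", "u"} else "a"
    return f"{article} {phrase}"


def describe_voice(
    gender: str,
    pitch: str,
    pace: str,
    environment: str,
    clarity: str,
    tone: str,
) -> str:
    """Compose a description covering all six voice characteristics (hypothetical template)."""
    fields = {
        "gender": gender, "pitch": pitch, "pace": pace,
        "environment": environment, "clarity": clarity, "tone": tone,
    }
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    sentence = (
        f"{_with_article(gender)} speaker with {_with_article(pitch)}-pitched voice, "
        f"speaking at {_with_article(pace)} pace in {_with_article(environment)} "
        f"with {clarity} audio and {_with_article(tone)} tone."
    )
    # Capitalize only the first character so the rest of the wording is untouched.
    return sentence[0].upper() + sentence[1:]


print(describe_voice("female", "medium", "moderate",
                     "studio environment", "crystal clear", "confident, educational"))
```

The resulting string can be passed as the `description` input parameter.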

## Usage Examples

### Example 1: Professional Narrator

**Input:**

```
provider: "replicate"
text: "Welcome to our comprehensive guide on artificial intelligence and machine learning."
description: "A professional female narrator with a medium pitch, speaking at a moderate pace in a studio environment with crystal clear audio and a confident, educational tone."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/narrator-audio.mp3"
```

### Example 2: Energetic Advertisement

**Input:**

```
provider: "baseten"
text: "Don't miss out on our incredible summer sale! Up to 70% off on all items!"
description: "An enthusiastic male speaker with a high-pitched voice delivering his words at a very fast pace in a bright, open space with an energetic and exciting tone."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/ad-audio.mp3"
```

### Example 3: Calm Meditation Guide

**Input:**

```
provider: "replicate"
text: "Take a deep breath and let your body relax. Feel the tension leaving your muscles."
description: "A soothing female voice with a low pitch, speaking slowly and deliberately in a quiet, serene environment with a soft, calming, and peaceful tone."
```

**Output:**

```
voice_url: "https://storage.pixelml.com/meditation-audio.mp3"
```

## Common Use Cases

* **Dynamic Content Creation**: Generate varied voice styles for different content types without needing multiple voice clones
* **Character Voices**: Create unique character voices for games, animations, or audiobooks
* **Mood-Based Audio**: Adjust voice characteristics to match the emotional context of content
* **Brand Voice Creation**: Experiment with different voice styles to find the perfect brand voice
* **A/B Testing**: Generate multiple voice variations to test audience preferences
* **Accessibility Content**: Create audio with specific characteristics for different accessibility needs
* **Multilingual Projects**: Generate consistent voice styles across content in different languages

## Error Handling

| Error Type          | Cause                                                 | Solution                                                         |
| ------------------- | ----------------------------------------------------- | ---------------------------------------------------------------- |
| Provider Error      | Selected provider is unavailable                      | Try switching to the alternative provider (replicate or baseten) |
| Invalid Description | Voice description is too vague or unclear             | Provide more specific details about voice characteristics        |
| Text Too Long       | Input text exceeds maximum length                     | Split text into smaller segments and process separately          |
| Empty Text          | Text field is empty                                   | Provide valid text content to convert to speech                  |
| Generation Failed   | AI unable to interpret description or generate speech | Simplify the description or try different voice characteristics  |
| Connection Failed   | Unable to access PixelML API                          | Check PixelML connection credentials and API availability        |
| Processing Timeout  | Audio generation took too long                        | Try with shorter text or simpler description                     |
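The "Provider Error" remedy in the table above (switch to the alternative provider) can be sketched as a simple fallback loop. This is an illustration under stated assumptions: `run_node` is a hypothetical callable that you supply to execute the node and return its output mapping, and the broad `except` is used only because this page does not document the node's exception types.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: takes the node input dict, returns the node output dict.
NodeRunner = Callable[[Dict[str, str]], Dict[str, str]]


def generate_with_fallback(
    run_node: NodeRunner,
    text: str,
    description: str,
    providers: Sequence[str] = ("replicate", "baseten"),
) -> Dict[str, str]:
    """Try each provider in order and return the first successful output."""
    if not text.strip():
        raise ValueError("text must not be empty")
    errors: List[str] = []
    for provider in providers:
        payload = {"provider": provider, "text": text, "description": description}
        try:
            return run_node(payload)
        except Exception as exc:  # broad on purpose: error types are not documented
            errors.append(f"{provider}: {exc}")
    # Every provider failed; surface all collected messages together.
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Trying `replicate` first matches the documented default. Reordering `providers` changes the preference.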

## Notes

* **Description Quality**: More detailed and specific descriptions produce better results. Include details about gender, pitch, pace, environment, clarity, and emotional tone.
* **Provider Selection**: Different providers may produce slightly different results. Try both to find which works best for your needs.
* **Voice Characteristics**: You can control multiple aspects: speaker gender, voice pitch (low/medium/high), speaking pace (slow/moderate/fast), environment (studio/room/open space), audio clarity, and emotional tone.
* **Consistency**: Use similar descriptions across multiple generations to maintain voice consistency in a project.
* **Experimentation**: Don't hesitate to experiment with different descriptions to achieve your desired voice output.
* **Processing Time**: Generation typically takes 10-30 seconds depending on text length and description complexity.
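
Because processing time grows with text length, and overly long text is rejected, long inputs can be split into segments and generated separately with the same description. The sketch below prefers sentence boundaries. The function `split_text` is hypothetical, and the 500-character default is an arbitrary placeholder, since this page does not state the actual maximum length.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 500) -> List[str]:
    """Split text into segments of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped.
        # This can cut mid-word, so prefer a limit that fits your sentences.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can be sent as the `text` input of a separate run, reusing one description to keep the voice consistent.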

