# Text to speech with voice clone

**Action ID:** `text_to_speech_voice_clone`

## Description

Convert text to speech with voice cloning across 32 languages.

## Connection

| Name               | Description                                 | Required | Category |
| ------------------ | ------------------------------------------- | -------- | -------- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True     | pixelml  |

## Input Parameters

| Name       | Type   | Required | Default | Description                              |
| ---------- | ------ | :------: | ------- | ---------------------------------------- |
| voice\_id  | string |     ✓    | -       | Voice ID of the cloned voice to use      |
| text       | string |     ✓    | -       | Text to convert to speech                |
| file\_name | string |     ✓    | -       | Output file name for the generated audio (the schema also accepts `null`) |

<details>

<summary>View JSON Schema</summary>

**Input Schema**

```json
{
  "description": "Text To speech with voice clone node input.",
  "properties": {
    "voice_id": {
      "description": "Voice ID",
      "title": "Voice ID",
      "type": "string"
    },
    "text": {
      "description": "Text",
      "title": "Text",
      "type": "string"
    },
    "file_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "File name",
      "title": "File name"
    }
  },
  "required": [
    "voice_id",
    "text",
    "file_name"
  ],
  "title": "TextToSpeechVoiceCloneNodeInput",
  "type": "object"
}
```

</details>
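
The input schema can be checked on the client side before the node is invoked. The following is a minimal sketch, not part of the PixelML or AgenticFlow tooling: the function name `validate_input` is hypothetical, and it only mirrors the constraints visible in the schema above (three required keys, `voice_id` and `text` as strings, `file_name` as a string or null).

```python
from typing import Any


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the schema."""
    problems = []
    # All three keys are listed under "required" in the schema.
    for key in ("voice_id", "text", "file_name"):
        if key not in payload:
            problems.append(f"missing required field: {key}")
    # voice_id and text must be strings.
    for key in ("voice_id", "text"):
        if key in payload and not isinstance(payload[key], str):
            problems.append(f"{key} must be a string")
    # file_name may be a string or null (None in Python).
    if "file_name" in payload:
        value = payload["file_name"]
        if value is not None and not isinstance(value, str):
            problems.append("file_name must be a string or null")
    return problems


if __name__ == "__main__":
    print(validate_input({"voice_id": "voice_abc123xyz", "text": "Hello", "file_name": None}))
```

The sketch checks types only. Conditions such as empty text or an unknown voice ID are reported by the node itself, as listed under Error Handling.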

## Output Parameters

| Name       | Type   | Description                                       |
| ---------- | ------ | ------------------------------------------------- |
| voice\_url | string | URL of the generated audio file with cloned voice |

<details>

<summary>View JSON Schema</summary>

**Output Schema**

```json
{
  "description": "Text to speech voice clone node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    }
  },
  "required": [
    "voice_url"
  ],
  "title": "TextToSpeechVoiceCloneNodeOutput",
  "type": "object"
}
```

</details>
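
A downstream step may want to confirm that the output contains a usable `voice_url` before passing it on. The sketch below is an illustration under assumptions: `extract_voice_url` is a hypothetical helper, and the requirement that the URL be absolute HTTP(S) is an assumption, since the schema only states that `voice_url` is a string.

```python
from urllib.parse import urlparse


def extract_voice_url(output: dict) -> str:
    """Return voice_url from a node output dict, or raise ValueError."""
    url = output.get("voice_url")
    # The output schema lists voice_url as a required string.
    if not isinstance(url, str):
        raise ValueError("output is missing a string voice_url")
    parsed = urlparse(url)
    # Assumption: a usable URL is absolute HTTP(S); the schema does not say so.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"voice_url is not an absolute HTTP(S) URL: {url!r}")
    return url


if __name__ == "__main__":
    print(extract_voice_url({"voice_url": "https://storage.pixelml.com/welcome-message.mp3"}))
```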

## How It Works

This node converts text to natural-sounding speech using a cloned voice profile. You provide a voice\_id that references a previously cloned voice, the text to convert, and a file\_name for the generated audio. The node uses the voice characteristics stored in the cloned profile to generate speech that matches the tone, accent, and speaking style of the original voice in any of the 32 supported languages, and it returns a voice\_url pointing to the resulting audio file.

## Usage Examples

### Example 1: Basic Voice Cloning

**Input:**

```
voice_id: "voice_abc123xyz"
text: "Hello, welcome to our platform. We're excited to have you here."
file_name: "welcome-message.mp3"
```

**Output:**

```
voice_url: "https://storage.pixelml.com/welcome-message.mp3"
```

### Example 2: Multilingual Content

**Input:**

```
voice_id: "voice_def456uvw"
text: "Bonjour! Comment allez-vous aujourd'hui?"
file_name: "french-greeting.mp3"
```

**Output:**

```
voice_url: "https://storage.pixelml.com/french-greeting.mp3"
```

### Example 3: Long-Form Content

**Input:**

```
voice_id: "voice_ghi789rst"
text: "In today's episode, we'll explore the fascinating world of artificial intelligence and how it's transforming the way we live and work. From machine learning to natural language processing, AI is revolutionizing every industry."
file_name: "podcast-intro.mp3"
```

**Output:**

```
voice_url: "https://storage.pixelml.com/podcast-intro.mp3"
```

## Common Use Cases

* **Content Localization**: Create multilingual audio content using the same voice across 32 languages
* **Podcast Production**: Generate podcast episodes with consistent voice characteristics
* **Audiobook Creation**: Convert written content to audiobooks with a specific narrator's voice
* **Video Narration**: Create voiceovers for videos using cloned voice profiles
* **Virtual Assistants**: Build personalized voice assistants with custom voice characteristics
* **E-Learning**: Produce educational content with consistent instructor voices
* **Personalized Messages**: Generate custom audio messages for customers or users

## Error Handling

| Error Type             | Cause                                                 | Solution                                                               |
| ---------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------- |
| Invalid Voice ID       | Voice ID doesn't exist or is inaccessible             | Verify the voice\_id is correct and the voice has been properly cloned |
| Text Too Long          | Input text exceeds maximum length                     | Split text into smaller chunks and process separately                  |
| Empty Text             | Text field is empty or only whitespace                | Provide valid text content to convert to speech                        |
| Language Not Supported | Text language is not among the 32 supported languages | Use text in one of the supported languages                             |
| Voice Profile Error    | Voice clone profile is corrupted or incomplete        | Re-clone the voice or use a different voice\_id                        |
| Connection Failed      | Unable to access PixelML API                          | Check PixelML connection credentials and API availability              |
| Processing Timeout     | Audio generation took too long                        | Try with shorter text or retry the operation                           |
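
For the transient rows in the table above (Connection Failed, Processing Timeout), the suggested retry can be sketched as a small wrapper with exponential backoff. Everything here is hypothetical scaffolding: `TransientNodeError` stands in for whatever error your client raises, `call` is any zero-argument function that runs the node, and the attempt count and delays are illustrative defaults, not documented values.

```python
import time
from typing import Callable


class TransientNodeError(Exception):
    """Hypothetical stand-in for a connection failure or processing timeout."""


def run_with_retry(
    call: Callable[[], dict],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run call(), retrying on TransientNodeError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientNodeError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Errors that a retry cannot fix, such as Invalid Voice ID or Empty Text, should not be wrapped this way; correct the input instead.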

## Notes

* **Voice Quality**: The cloned voice quality depends on the quality and characteristics of the original voice sample used for cloning.
* **Language Support**: This node supports 32 languages, making it ideal for international content creation.
* **Text Length**: Longer text may take more time to process. Consider splitting very long content into smaller segments.
* **Voice Consistency**: Using the same voice\_id ensures consistent voice characteristics across multiple audio generations.
* **File Naming**: Use descriptive file names to easily identify and organize your generated audio files.
* **Processing Time**: Generation typically takes 5-15 seconds depending on text length and complexity.
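
The Text Length and File Naming notes can be combined into a small pre-processing step: split long text at sentence boundaries, then give each segment a numbered file name. This is a sketch under assumptions. The helper names are hypothetical, and `max_chars` is a value you choose, because this page does not state the node's maximum text length.

```python
import re


def split_text(text: str, max_chars: int) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def segment_file_names(file_name: str, count: int) -> list[str]:
    """Derive numbered names such as intro-01.mp3, intro-02.mp3."""
    stem, dot, ext = file_name.rpartition(".")
    if not dot:  # No extension present: number the whole name.
        stem, ext = file_name, ""
    return [f"{stem}-{i:02d}{dot}{ext}" for i in range(1, count + 1)]


if __name__ == "__main__":
    parts = split_text("First sentence. Second sentence. Third one!", max_chars=32)
    for name, part in zip(segment_file_names("podcast-intro.mp3", len(parts)), parts):
        print(name, "->", part)
```

Each chunk can then be sent as a separate node run with the same voice\_id, which keeps the voice consistent across segments.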

