Text to speech with voice clone

Action ID: text_to_speech_voice_clone

Description

Converts text to speech using a previously cloned voice, in any of 32 supported languages.

Connection

| Name | Description | Required | Category |
| --- | --- | --- | --- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True | pixelml |

Input Parameters

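| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| voice_id | string | Yes | - | Voice ID of the cloned voice to use |
| text | string | Yes | - | Text to convert to speech |
| file_name | string or null | Yes | - | Output file name for the generated audio |

All three fields are listed under "required" in the input schema. The schema allows file_name to be null, but the key must still be present.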

Input Schema

{
  "description": "Text To speech with voice clone node input.",
  "properties": {
    "voice_id": {
      "description": "Voice ID",
      "title": "Voice ID",
      "type": "string"
    },
    "text": {
      "description": "Text",
      "title": "Text",
      "type": "string"
    },
    "file_name": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "File name",
      "title": "File name"
    }
  },
  "required": [
    "voice_id",
    "text",
    "file_name"
  ],
  "title": "TextToSpeechVoiceCloneNodeInput",
  "type": "object"
}
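
The schema above can be checked locally before the node runs. The following is a minimal sketch in plain Python, not part of the PixelML API: the function name `validate_input` and the error message wording are assumptions made for illustration, and only the rules visible in the schema (three required keys, string types, nullable file_name) are enforced.

```python
from typing import Any

# Field names taken from the "required" list in the input schema.
REQUIRED_FIELDS = ("voice_id", "text", "file_name")


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the schema.

    Hypothetical helper: it mirrors the published schema, nothing more.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in payload:
            problems.append(f"missing required field: {field}")
    # voice_id and text must be strings.
    for field in ("voice_id", "text"):
        if field in payload and not isinstance(payload[field], str):
            problems.append(f"{field} must be a string")
    # file_name may be a string or null (None in Python).
    if "file_name" in payload and not isinstance(payload["file_name"], (str, type(None))):
        problems.append("file_name must be a string or null")
    return problems


payload = {
    "voice_id": "voice_abc123xyz",
    "text": "Hello, welcome to our platform.",
    "file_name": "welcome-message.mp3",
}
print(validate_input(payload))
```

A payload that omits file_name entirely is reported as invalid, while one that sets it to None passes, which matches the combination of "required" and "anyOf" in the schema.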

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| voice_url | string | URL of the generated audio file with cloned voice |

Output Schema

{
  "description": "Text to speech voice clone node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    }
  },
  "required": [
    "voice_url"
  ],
  "title": "TextToSpeechVoiceCloneNodeOutput",
  "type": "object"
}
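
A downstream step may want to confirm that the output has the expected shape before using it. The sketch below assumes the node output arrives as a Python dict. The helper name `extract_voice_url` is hypothetical, and the HTTP(S) check is an extra assumption: the schema itself only requires voice_url to be a string.

```python
from urllib.parse import urlparse


def extract_voice_url(output: dict) -> str:
    """Return voice_url from a node output dict, or raise ValueError.

    Hypothetical helper; the absolute-URL check goes beyond the schema.
    """
    url = output.get("voice_url")
    if not isinstance(url, str):
        raise ValueError("output is missing a string 'voice_url'")
    parsed = urlparse(url)
    # Assumption: the generated audio is served over HTTP or HTTPS.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"voice_url is not an absolute HTTP(S) URL: {url!r}")
    return url


print(extract_voice_url({"voice_url": "https://storage.pixelml.com/welcome-message.mp3"}))
```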

How It Works

This node converts text to natural-sounding speech using a cloned voice profile. You provide a voice_id that references a previously cloned voice, the text to convert, and a file_name for the generated audio. The model uses the voice characteristics stored in the cloned profile to generate speech that matches the tone, accent, and speaking style of the original voice in any of the 32 supported languages. The node returns a voice_url that points to the generated audio file.

Usage Examples

Example 1: Basic Voice Cloning

Input:

voice_id: "voice_abc123xyz"
text: "Hello, welcome to our platform. We're excited to have you here."
file_name: "welcome-message.mp3"

Output:

voice_url: "https://storage.pixelml.com/welcome-message.mp3"

Example 2: Multilingual Content

Input:

voice_id: "voice_def456uvw"
text: "Bonjour! Comment allez-vous aujourd'hui?"
file_name: "french-greeting.mp3"

Output:

voice_url: "https://storage.pixelml.com/french-greeting.mp3"

Example 3: Long-Form Content

Input:

voice_id: "voice_ghi789rst"
text: "In today's episode, we'll explore the fascinating world of artificial intelligence and how it's transforming the way we live and work. From machine learning to natural language processing, AI is revolutionizing every industry."
file_name: "podcast-intro.mp3"

Output:

voice_url: "https://storage.pixelml.com/podcast-intro.mp3"

Common Use Cases

  • Content Localization: Create multilingual audio content using the same voice across 32 languages

  • Podcast Production: Generate podcast episodes with consistent voice characteristics

  • Audiobook Creation: Convert written content to audiobooks with a specific narrator's voice

  • Video Narration: Create voiceovers for videos using cloned voice profiles

  • Virtual Assistants: Build personalized voice assistants with custom voice characteristics

  • E-Learning: Produce educational content with consistent instructor voices

  • Personalized Messages: Generate custom audio messages for customers or users

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid Voice ID | Voice ID doesn't exist or is inaccessible | Verify the voice_id is correct and the voice has been properly cloned |
| Text Too Long | Input text exceeds maximum length | Split text into smaller chunks and process separately |
| Empty Text | Text field is empty or only whitespace | Provide valid text content to convert to speech |
| Language Not Supported | Text language is not among the 32 supported languages | Use text in one of the supported languages |
| Voice Profile Error | Voice clone profile is corrupted or incomplete | Re-clone the voice or use a different voice_id |
| Connection Failed | Unable to access PixelML API | Check PixelML connection credentials and API availability |
| Processing Timeout | Audio generation took too long | Try with shorter text or retry the operation |
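
Two of the errors above lend themselves to client-side handling: Empty Text can be caught before the call, and Processing Timeout or Connection Failed can be retried. The sketch below shows one way to do that under stated assumptions. `run_node` stands in for whatever function actually executes the node in your environment, and `TransientNodeError` is a hypothetical exception class; neither is a documented PixelML name.

```python
import time
from typing import Callable


class TransientNodeError(Exception):
    """Hypothetical stand-in for Connection Failed / Processing Timeout."""


def call_with_retry(
    run_node: Callable[[dict], dict],
    payload: dict,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Reject empty text up front, then retry transient failures with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    text = payload.get("text")
    # Pre-flight check for the "Empty Text" case.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is empty or only whitespace")
    for attempt in range(1, attempts + 1):
        try:
            return run_node(payload)
        except TransientNodeError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Errors such as Invalid Voice ID or Language Not Supported are deliberately not retried here, since repeating the same request would not change the outcome.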

Notes

  • Voice Quality: The cloned voice quality depends on the quality and characteristics of the original voice sample used for cloning.

  • Language Support: This node supports 32 languages, making it ideal for international content creation.

  • Text Length: Longer text may take more time to process. Consider splitting very long content into smaller segments.

  • Voice Consistency: Using the same voice_id ensures consistent voice characteristics across multiple audio generations.

  • File Naming: Use descriptive file names to easily identify and organize your generated audio files.

  • Processing Time: Generation typically takes 5-15 seconds depending on text length and complexity.
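
The advice to split long text into smaller segments can be sketched as follows. This is an illustration only: the 500-character default is an arbitrary placeholder because the actual maximum length is not stated here, and the helper names `split_text` and `chunk_payloads`, as well as the numbered file name pattern, are assumptions.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def chunk_payloads(voice_id: str, text: str, stem: str, max_chars: int = 500) -> list[dict]:
    """Build one node input per chunk, reusing voice_id for a consistent voice."""
    return [
        {"voice_id": voice_id, "text": chunk, "file_name": f"{stem}-{i:03d}.mp3"}
        for i, chunk in enumerate(split_text(text, max_chars), start=1)
    ]


for payload in chunk_payloads("voice_ghi789rst", "One. Two. Three.", "podcast-intro", max_chars=9):
    print(payload["file_name"], "->", payload["text"])
```

Splitting on sentence boundaries rather than at a fixed character offset avoids cutting a word or phrase in half, which would otherwise produce an audible break in the middle of a sentence.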
