Text to speech custom

Action ID: text_to_speech_custom

Description

Generate custom speech from text with detailed control over voice characteristics and audio properties.

Connection

| Name | Description | Required | Category |
|---|---|---|---|
| PixelML Connection | The PixelML connection used to call the PixelML API. | True | pixelml |

Input Parameters

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | dropdown | No | replicate | AI provider for speech generation. Available options: replicate, baseten |
| text | string | Yes | - | Text to convert to speech |
| description | string | No | A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone. | Detailed description of the desired output audio characteristics, including gender, pitch, pace, environment, clarity, and tone |


Input Schema

```json
{
  "description": "Text To speech node input.",
  "properties": {
    "provider": {
      "default": "replicate",
      "description": "Provider",
      "enum": [
        "replicate",
        "baseten"
      ],
      "title": "Provider",
      "type": "string"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    },
    "description": {
      "default": "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone.",
      "description": "Provide description of the output audio",
      "title": "Provide description of the output audio",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechCustomNodeInput",
  "type": "object"
}
```
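
The schema above can be mirrored client-side to catch the Empty Text error and unsupported provider values before a run. The following is a minimal Python sketch, assuming you assemble the node input yourself as a plain dictionary; `build_input` and the constant names are hypothetical, and only the field names, enum values, and default description come from the schema.

```python
from typing import Optional

# Values copied from the input schema above.
PROVIDERS = ("replicate", "baseten")
DEFAULT_PROVIDER = "replicate"
DEFAULT_DESCRIPTION = (
    "A male speaker with a low-pitched voice delivering his words at a fast pace "
    "in a small, confined space with a very clear audio and an animated tone."
)


def build_input(text: str, description: Optional[str] = None,
                provider: Optional[str] = None) -> dict:
    """Assemble a payload shaped like TextToSpeechCustomNodeInput.

    Hypothetical client-side helper: it applies the schema defaults and
    rejects values the schema does not allow before the node is called.
    """
    # "text" is the only required field; a blank value maps to the Empty Text error.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be non-empty")
    provider = provider or DEFAULT_PROVIDER
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {PROVIDERS}, got {provider!r}")
    return {
        "provider": provider,
        "text": text,
        "description": description if description is not None else DEFAULT_DESCRIPTION,
    }
```

For example, `build_input("Welcome to our guide.", provider="baseten")` returns a payload that keeps the default description and switches the provider.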

Output Parameters

| Name | Type | Description |
|---|---|---|
| voice_url | string | URL of the generated audio file |


Output Schema

```json
{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    }
  },
  "required": [
    "voice_url"
  ],
  "title": "TextToSpeechCustomNodeOutput",
  "type": "object"
}
```
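
Downstream steps usually need only the `voice_url` field. Below is a small Python sketch of reading it defensively, assuming the node output arrives as a dictionary shaped like the schema above. `get_voice_url` is a hypothetical helper, and requiring an http(s) URL is an assumption based on the example outputs on this page, not a documented guarantee.

```python
from typing import Any, Mapping
from urllib.parse import urlparse


def get_voice_url(node_output: Mapping[str, Any]) -> str:
    """Return the voice_url field of a TextToSpeechCustomNodeOutput-shaped mapping."""
    url = node_output.get("voice_url")
    # The schema marks voice_url as a required string.
    if not isinstance(url, str):
        raise ValueError("voice_url is missing or not a string")
    # Assumption: the audio is served over http(s), as in the examples on this page.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unexpected audio URL: {url!r}")
    return url
```

Called on `{"voice_url": "https://storage.pixelml.com/narrator-audio.mp3"}`, it returns the URL unchanged. A missing, non-string, or non-http(s) value raises `ValueError`.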

How It Works

This node generates speech from text based on a detailed description of the voice you want. Unlike standard text-to-speech or voice cloning, you do not pick a preset voice or supply a reference recording. Instead, you describe the voice properties directly, including speaker gender, pitch, speaking pace, environment acoustics, audio clarity, and emotional tone. The selected provider's model interprets the description and generates speech that matches it as closely as it can.
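
Because the voice is controlled entirely by one free-text description, it can help to assemble that sentence from named characteristics so that variations stay systematic. The sketch below is hypothetical: `VoiceSpec`, `article`, and the field names are not part of the node, and the sentence pattern simply imitates the schema's default description. The node itself only ever receives the resulting string.

```python
from dataclasses import dataclass

VOWELS = {"a", "e", "i", "o", "u"}


def article(word: str) -> str:
    """Choose "a" or "an" from the first letter (a heuristic, not full English rules)."""
    return "an" if word[:1].lower() in VOWELS else "a"


@dataclass
class VoiceSpec:
    """Hypothetical container for the voice characteristics this page lists."""
    gender: str = "female"
    pitch: str = "medium"          # e.g. low / medium / high
    pace: str = "moderate"         # e.g. slow / moderate / fast
    environment: str = "a studio"  # e.g. a studio / a small room / an open space
    clarity: str = "very clear"
    tone: str = "confident"

    def to_description(self) -> str:
        """Render the spec as one sentence for the node's description field."""
        return (
            f"{article(self.gender).capitalize()} {self.gender} speaker with "
            f"{article(self.pitch)} {self.pitch}-pitched voice "
            f"delivering words at {article(self.pace)} {self.pace} pace in {self.environment} "
            f"with {self.clarity} audio and {article(self.tone)} {self.tone} tone."
        )
```

Keeping the spec as data, rather than hand-editing sentences, also makes it easier to reuse identical descriptions across a project.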

Usage Examples

Example 1: Professional Narrator

Input:

provider: "replicate"
text: "Welcome to our comprehensive guide on artificial intelligence and machine learning."
description: "A professional female narrator with a medium pitch, speaking at a moderate pace in a studio environment with crystal clear audio and a confident, educational tone."

Output:

voice_url: "https://storage.pixelml.com/narrator-audio.mp3"

Example 2: Energetic Advertisement

Input:

provider: "baseten"
text: "Don't miss out on our incredible summer sale! Up to 70% off on all items!"
description: "An enthusiastic male speaker with a high-pitched voice delivering his words at a very fast pace in a bright, open space with an energetic and exciting tone."

Output:

voice_url: "https://storage.pixelml.com/ad-audio.mp3"

Example 3: Calm Meditation Guide

Input:

provider: "replicate"
text: "Take a deep breath and let your body relax. Feel the tension leaving your muscles."
description: "A soothing female voice with a low pitch, speaking slowly and deliberately in a quiet, serene environment with a soft, calming, and peaceful tone."

Output:

voice_url: "https://storage.pixelml.com/meditation-audio.mp3"

Common Use Cases

  • Dynamic Content Creation: Generate varied voice styles for different content types without needing multiple voice clones

  • Character Voices: Create unique character voices for games, animations, or audiobooks

  • Mood-Based Audio: Adjust voice characteristics to match the emotional context of content

  • Brand Voice Creation: Experiment with different voice styles to find the perfect brand voice

  • A/B Testing: Generate multiple voice variations to test audience preferences

  • Accessibility Content: Produce audio suited to specific accessibility needs, such as a slower pace and very clear audio

  • Multilingual Projects: Reuse the same voice description to keep a consistent voice style across content in different languages
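
For the A/B Testing use case, one approach is to hold the text and provider constant and vary a single characteristic at a time. The Python sketch below assumes a description template containing `{pitch}` and `{pace}` placeholders; `voice_variants` is a hypothetical helper that only builds node inputs and does not call the node.

```python
from itertools import product
from typing import Dict, Iterator, Sequence, Tuple


def voice_variants(text: str, template: str, pitches: Sequence[str],
                   paces: Sequence[str], provider: str = "replicate"
                   ) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (label, node_input) pairs, one per pitch/pace combination.

    Only the description varies, so differences in audience response can be
    attributed to pitch and pace rather than to the script or provider.
    """
    for pitch, pace in product(pitches, paces):
        yield f"{pitch}/{pace}", {
            "provider": provider,
            "text": text,
            "description": template.format(pitch=pitch, pace=pace),
        }


if __name__ == "__main__":
    template = ("A professional female narrator with a {pitch} pitch, speaking at a "
                "{pace} pace in a studio environment with crystal clear audio and a "
                "confident, educational tone.")
    for label, node_input in voice_variants("Welcome to our guide.", template,
                                            ["low", "medium"], ["slow", "moderate"]):
        print(label, "->", node_input["description"])
```

Each label (for example `low/slow`) can be stored alongside the returned `voice_url` so listener feedback maps back to the description that produced it.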

Error Handling

| Error Type | Cause | Solution |
|---|---|---|
| Provider Error | The selected provider is unavailable | Switch to the alternative provider (replicate or baseten) |
| Invalid Description | The voice description is too vague or unclear | Provide more specific details about voice characteristics |
| Text Too Long | The input text exceeds the maximum length | Split the text into smaller segments and process them separately |
| Empty Text | The text field is empty | Provide valid text content to convert to speech |
| Generation Failed | The model could not interpret the description or generate speech | Simplify the description or try different voice characteristics |
| Connection Failed | The PixelML API could not be reached | Check the PixelML connection credentials and API availability |
| Processing Timeout | Audio generation took too long | Use shorter text or a simpler description |
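
The Text Too Long fix can be automated by splitting on sentence boundaries. This page does not state the node's maximum text length, so `MAX_CHARS = 500` below is a placeholder assumption to adjust; `split_text` is a hypothetical helper. Sending every chunk with the same provider and description keeps the voice as consistent as the node allows.

```python
import re
from typing import List

MAX_CHARS = 500  # assumption: the node's real limit is not documented here


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by characters.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Because sentences are joined with single spaces, `" ".join(split_text(text))` reproduces the input with its whitespace normalized, as long as no sentence had to be hard-split.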

Notes

  • Description Quality: More detailed and specific descriptions produce better results. Include details about gender, pitch, pace, environment, clarity, and emotional tone.

  • Provider Selection: Different providers may produce slightly different results. Try both to find which works best for your needs.

  • Voice Characteristics: You can control multiple aspects: speaker gender, voice pitch (low/medium/high), speaking pace (slow/moderate/fast), environment (studio/room/open space), audio clarity, and emotional tone.

  • Consistency: Use similar descriptions across multiple generations to maintain voice consistency in a project.

  • Experimentation: Don't hesitate to experiment with different descriptions to achieve your desired voice output.

  • Processing Time: Generation typically takes 10-30 seconds depending on text length and description complexity.
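
The advice to switch providers on a Provider Error and the typical 10-30 second generation time can be combined into a simple fallback loop. In this Python sketch, `generate` stands in for however your workflow invokes this node. It is a hypothetical callable that takes an input payload and returns the output dictionary or raises an exception. The real exception types are not documented, so the sketch catches broadly.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: generate(node_input) -> node output dict, raising on failure.
Generate = Callable[[Dict[str, str]], Dict[str, str]]


def generate_with_fallback(generate: Generate, payload: Dict[str, str],
                           providers: Sequence[str] = ("replicate", "baseten")
                           ) -> Tuple[str, str, float]:
    """Try each provider in order; return (provider, voice_url, seconds taken)."""
    errors: List[str] = []
    for provider in providers:
        started = time.monotonic()
        try:
            # Same text and description; only the provider changes between attempts.
            output = generate({**payload, "provider": provider})
        except Exception as exc:  # broad on purpose: real error types are undocumented
            errors.append(f"{provider}: {exc}")
            continue
        return provider, output["voice_url"], time.monotonic() - started
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The returned duration can be compared against the typical range above to spot runs at risk of a Processing Timeout.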
