Text to speech custom

Action ID: text_to_speech_custom

Description

Generate custom speech from text with detailed control over voice characteristics and audio properties.

Connection

| Name | Description | Required |
|------|-------------|----------|

Input Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| provider | dropdown |  | replicate | AI provider for speech generation. Available options: replicate, baseten |
| text | string | ✓ |  | Text to convert to speech |
| description | string |  | A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone. | Detailed description of the desired output audio characteristics including gender, pitch, pace, environment, clarity, and tone |

Input Schema

{
  "description": "Text To speech node input.",
  "properties": {
    "provider": {
      "default": "replicate",
      "description": "Provider",
      "enum": [
        "replicate",
        "baseten"
      ],
      "title": "Provider",
      "type": "string"
    },
    "text": {
      "description": "Text to convert to speech",
      "title": "Text",
      "type": "string"
    },
    "description": {
      "default": "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone.",
      "description": "Provide description of the output audio",
      "title": "Provide description of the output audio",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechCustomNodeInput",
  "type": "object"
}
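
The schema above can be mirrored in a small client-side check before the action is called. The following is a minimal sketch, not part of the product: `build_input` is a hypothetical helper, and rejecting blank text is an assumption taken from the "Empty Text" entry under Error Handling (the schema itself only requires `text` to be a string).

```python
from typing import Any, Dict, Optional

# Values copied from the Input Schema.
ALLOWED_PROVIDERS = ("replicate", "baseten")
DEFAULT_PROVIDER = "replicate"
DEFAULT_DESCRIPTION = (
    "A male speaker with a low-pitched voice delivering his words at a fast "
    "pace in a small, confined space with a very clear audio and an animated tone."
)


def build_input(
    text: str,
    provider: Optional[str] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a payload that satisfies the Input Schema."""
    # Assumption: blank text is rejected up front, mirroring the "Empty Text" error.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' is required and must be a non-empty string")

    # Apply the schema defaults for the two optional fields.
    provider = DEFAULT_PROVIDER if provider is None else provider
    description = DEFAULT_DESCRIPTION if description is None else description

    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"'provider' must be one of {ALLOWED_PROVIDERS}")
    if not isinstance(description, str):
        raise ValueError("'description' must be a string")

    return {"provider": provider, "text": text, "description": description}
```

Calling `build_input("Hello")` returns a payload with the default provider and description filled in, so only `text` has to be supplied.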

Output Parameters

| Name | Type | Description |
|------|------|-------------|
| voice_url | string | URL of the generated audio file |

Output Schema

{
  "description": "Call other workflow node output.",
  "properties": {
    "voice_url": {
      "title": "Audio URL",
      "type": "string"
    }
  },
  "required": [
    "voice_url"
  ],
  "title": "TextToSpeechCustomNodeOutput",
  "type": "object"
}
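
A caller may want to sanity-check the output before passing the URL downstream. The sketch below assumes the node output arrives as a Python dict and that `voice_url` is an HTTP(S) URL, as in the usage examples; the schema itself only guarantees a string. `extract_voice_url` is a hypothetical helper name.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def extract_voice_url(output: Dict[str, Any]) -> str:
    """Hypothetical helper: return voice_url after basic sanity checks."""
    url = output.get("voice_url")
    # The Output Schema marks voice_url as a required string.
    if not isinstance(url, str):
        raise ValueError("output is missing the required string field 'voice_url'")

    # Assumption: the URL uses http or https, as in the documented examples.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"'voice_url' does not look like an HTTP(S) URL: {url!r}")
    return url
```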

How It Works

This node uses AI to generate custom speech from text based on detailed voice characteristic descriptions. Unlike standard text-to-speech or voice cloning, this node allows you to describe the exact voice properties you want, including speaker gender, pitch, speaking pace, environment acoustics, audio clarity, and emotional tone. The AI interprets your description and generates speech that matches those specifications.
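
Because the description is free text covering six characteristics, it can help to assemble it from named fields so that no characteristic is forgotten. This is a minimal sketch under that assumption; `VoiceSpec` and its sentence template are illustrative and not part of the node, which accepts any string.

```python
from dataclasses import dataclass


def _article(word: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical container for the characteristics named in this section."""

    gender: str       # e.g. "male", "female"
    pitch: str        # e.g. "low", "medium", "high"
    pace: str         # e.g. "slow", "moderate", "fast"
    environment: str  # e.g. "a studio", "a small, confined space"
    clarity: str      # e.g. "very clear", "crystal clear"
    tone: str         # e.g. "animated", "confident"

    def to_description(self) -> str:
        """Render the fields as one sentence for the `description` input."""
        return (
            f"{_article(self.gender).capitalize()} {self.gender} speaker with "
            f"{_article(self.pitch)} {self.pitch}-pitched voice, speaking at "
            f"{_article(self.pace)} {self.pace} pace in {self.environment}, "
            f"with {self.clarity} audio and {_article(self.tone)} {self.tone} tone."
        )


spec = VoiceSpec("female", "medium", "moderate", "a studio", "crystal clear", "confident")
print(spec.to_description())
```

Keeping the fields in one object also makes it easier to reuse the same voice across several generations.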

Usage Examples

Example 1: Professional Narrator

Input:

provider: "replicate"
text: "Welcome to our comprehensive guide on artificial intelligence and machine learning."
description: "A professional female narrator with a medium pitch, speaking at a moderate pace in a studio environment with crystal clear audio and a confident, educational tone."

Output:

voice_url: "https://storage.pixelml.com/narrator-audio.mp3"

Example 2: Energetic Advertisement

Input:

provider: "baseten"
text: "Don't miss out on our incredible summer sale! Up to 70% off on all items!"
description: "An enthusiastic male speaker with a high-pitched voice delivering his words at a very fast pace in a bright, open space with an energetic and exciting tone."

Output:

voice_url: "https://storage.pixelml.com/ad-audio.mp3"

Example 3: Calm Meditation Guide

Input:

provider: "replicate"
text: "Take a deep breath and let your body relax. Feel the tension leaving your muscles."
description: "A soothing female voice with a low pitch, speaking slowly and deliberately in a quiet, serene environment with a soft, calming, and peaceful tone."

Output:

voice_url: "https://storage.pixelml.com/meditation-audio.mp3"

Common Use Cases

Dynamic Content Creation: Generate varied voice styles for different content types without needing multiple voice clones
Character Voices: Create unique character voices for games, animations, or audiobooks
Mood-Based Audio: Adjust voice characteristics to match the emotional context of content
Brand Voice Creation: Experiment with different voice styles to find the perfect brand voice
A/B Testing: Generate multiple voice variations to test audience preferences
Accessibility Content: Create audio with specific characteristics for different accessibility needs
Multilingual Projects: Generate consistent voice styles across different language content
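
For the A/B testing use case, one possible approach is to generate a payload for every combination of the characteristics under test. The sketch below varies only pitch and pace; `ab_variants`, the `label` field, and the fixed parts of the description are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Sequence


def ab_variants(
    text: str,
    pitches: Sequence[str],
    paces: Sequence[str],
    provider: str = "replicate",
) -> List[Dict[str, object]]:
    """Hypothetical helper: one labelled input payload per pitch/pace pair."""
    variants = []
    for pitch, pace in product(pitches, paces):
        # Everything except pitch and pace is held constant across variants.
        description = (
            f"A speaker with a {pitch}-pitched voice delivering the words at a "
            f"{pace} pace in a studio with very clear audio and a neutral tone."
        )
        variants.append(
            {
                "label": f"{pitch}-{pace}",
                "input": {"provider": provider, "text": text, "description": description},
            }
        )
    return variants


variants = ab_variants("Welcome back!", ["low", "high"], ["slow", "fast"])
print([v["label"] for v in variants])
```

Each `input` dict can then be sent to the action separately, and the labels used to match listener feedback to a variant.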

Error Handling

| Error Type | Cause | Solution |
|------------|-------|----------|
| Provider Error | Selected provider is unavailable | Try switching to the alternative provider (replicate or baseten) |
| Invalid Description | Voice description is too vague or unclear | Provide more specific details about voice characteristics |
| Text Too Long | Input text exceeds maximum length | Split text into smaller segments and process separately |
| Empty Text | Text field is empty | Provide valid text content to convert to speech |
| Generation Failed | AI unable to interpret description or generate speech | Simplify the description or try different voice characteristics |
| Connection Failed | Unable to access PixelML API | Check PixelML connection credentials and API availability |
| Processing Timeout | Audio generation took too long | Try with shorter text or simpler description |
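
The "Provider Error" advice (switch to the alternative provider) can be automated with a simple fallback loop. This is a sketch only: how the action is actually invoked is not documented here, so `run_action` is a hypothetical callable supplied by the caller, and catching `Exception` stands in for the undocumented error types.

```python
from typing import Callable, Dict, List, Sequence


class SpeechGenerationError(Exception):
    """Hypothetical error raised when every provider has failed."""


def generate_with_fallback(
    run_action: Callable[[Dict[str, str]], Dict[str, str]],
    payload: Dict[str, str],
    providers: Sequence[str] = ("replicate", "baseten"),
) -> Dict[str, str]:
    """Try each provider in order and return the first successful output."""
    # Catch the "Empty Text" case locally instead of spending a request on it.
    if not payload.get("text", "").strip():
        raise ValueError("'text' must not be empty")

    failures: List[str] = []
    for provider in providers:
        attempt = {**payload, "provider": provider}
        try:
            return run_action(attempt)
        except Exception as exc:  # assumption: real exception types are unknown
            failures.append(f"{provider}: {exc}")

    raise SpeechGenerationError("all providers failed: " + "; ".join(failures))
```

The collected failure messages are kept in the final error so that the cause for each provider remains visible.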

Notes

Description Quality: More detailed and specific descriptions produce better results. Include details about gender, pitch, pace, environment, clarity, and emotional tone.
Provider Selection: Different providers may produce slightly different results. Try both to find which works best for your needs.
Voice Characteristics: You can control multiple aspects: speaker gender, voice pitch (low/medium/high), speaking pace (slow/moderate/fast), environment (studio/room/open space), audio clarity, and emotional tone.
Consistency: Use similar descriptions across multiple generations to maintain voice consistency in a project.
Experimentation: Don't hesitate to experiment with different descriptions to achieve your desired voice output.
Processing Time: Generation typically takes 10-30 seconds depending on text length and description complexity.
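
Since processing time grows with text length and over-long text is rejected, long input can be split at sentence boundaries before it is sent. The sketch below is one way to do that; the maximum length is not stated in this document, so `max_chars` is a caller-supplied assumption, and `split_text` is a hypothetical helper.

```python
import re
from typing import List


def split_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own `text` value with the same `description`, which follows the Consistency note above.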
