Text to speech custom
Action ID: text_to_speech_custom
Description
Generate custom speech from text with detailed control over voice characteristics and audio properties.
Connection
PixelML Connection (pixelml, required): The PixelML connection used to call the PixelML API.
Input Parameters
provider (dropdown, optional, default: replicate): AI provider for speech generation. Available options: replicate, baseten.
text (string, required): Text to convert to speech.
description (string, optional, default: "A male speaker with a low-pitched voice delivering his words at a fast pace in a small, confined space with a very clear audio and an animated tone."): Detailed description of the desired output audio characteristics, including gender, pitch, pace, environment, clarity, and tone.
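The parameter rules above (text is required, provider defaults to replicate and accepts only replicate or baseten, description has a default) can be checked on the caller's side before a run. The sketch below assumes the inputs are passed as a plain dict keyed by the documented parameter names; `build_inputs` is a hypothetical helper, and how the dict reaches PixelML is not shown.

```python
from typing import Dict, Optional

# Values taken from the parameter list above.
ALLOWED_PROVIDERS = ("replicate", "baseten")
DEFAULT_PROVIDER = "replicate"
DEFAULT_DESCRIPTION = (
    "A male speaker with a low-pitched voice delivering his words at a fast "
    "pace in a small, confined space with a very clear audio and an animated tone."
)


def build_inputs(text: str, provider: Optional[str] = None,
                 description: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: return an input dict with the documented defaults applied."""
    # `text` is the only required field; catch empty input locally
    # instead of waiting for an "Empty Text" error from the node.
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")

    provider = provider or DEFAULT_PROVIDER
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"provider must be one of {ALLOWED_PROVIDERS}, got {provider!r}")

    return {
        "provider": provider,
        "text": text,
        "description": description or DEFAULT_DESCRIPTION,
    }
```

For example, `build_inputs("Hello")` fills in replicate and the default description, while `build_inputs("Hi", provider="openai")` fails fast with a ValueError.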
Output Parameters
voice_url (string): URL of the generated audio file.
How It Works
This node uses AI to generate custom speech from text based on detailed voice characteristic descriptions. Unlike standard text-to-speech or voice cloning, this node allows you to describe the exact voice properties you want, including speaker gender, pitch, speaking pace, environment acoustics, audio clarity, and emotional tone. The AI interprets your description and generates speech that matches those specifications.
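Because the node takes its voice characteristics as free text, it can help to keep them as structured fields and render the description from them. The sketch below assumes nothing about how PixelML parses the description; `VoiceSpec` and `to_description` are hypothetical names, and the sentence shape simply mirrors the examples on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical structured form of the characteristics the description covers."""
    gender: str       # e.g. "female"
    pitch: str        # e.g. "low", "medium", "high"
    pace: str         # e.g. "slow", "moderate", "fast"
    environment: str  # e.g. "a studio environment"
    clarity: str      # e.g. "crystal clear audio"
    tone: str         # e.g. "a confident, educational tone"

    def to_description(self) -> str:
        # Order follows the documented aspects: gender, pitch, pace,
        # environment, clarity, tone.
        return (
            f"A {self.gender} speaker with a {self.pitch}-pitched voice, "
            f"speaking at a {self.pace} pace in {self.environment} "
            f"with {self.clarity} and {self.tone}."
        )
```

Keeping one `VoiceSpec` per character or brand voice makes it easy to reuse the exact same wording across generations.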
Usage Examples
Example 1: Professional Narrator
Input:
provider: "replicate"
text: "Welcome to our comprehensive guide on artificial intelligence and machine learning."
description: "A professional female narrator with a medium pitch, speaking at a moderate pace in a studio environment with crystal clear audio and a confident, educational tone."
Output:
voice_url: "https://storage.pixelml.com/narrator-audio.mp3"
Example 2: Energetic Advertisement
Input:
provider: "baseten"
text: "Don't miss out on our incredible summer sale! Up to 70% off on all items!"
description: "An enthusiastic male speaker with a high-pitched voice delivering his words at a very fast pace in a bright, open space with an energetic and exciting tone."
Output:
voice_url: "https://storage.pixelml.com/ad-audio.mp3"
Example 3: Calm Meditation Guide
Input:
provider: "replicate"
text: "Take a deep breath and let your body relax. Feel the tension leaving your muscles."
description: "A soothing female voice with a low pitch, speaking slowly and deliberately in a quiet, serene environment with a soft, calming, and peaceful tone."
Output:
voice_url: "https://storage.pixelml.com/meditation-audio.mp3"
Common Use Cases
Dynamic Content Creation: Generate varied voice styles for different content types without needing multiple voice clones
Character Voices: Create unique character voices for games, animations, or audiobooks
Mood-Based Audio: Adjust voice characteristics to match the emotional context of content
Brand Voice Creation: Experiment with different voice styles to find the perfect brand voice
A/B Testing: Generate multiple voice variations to test audience preferences
Accessibility Content: Create audio with specific characteristics for different accessibility needs
Multilingual Projects: Generate consistent voice styles across content in different languages
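For the A/B testing use case, each variation is just a different description sent with the same text. A minimal sketch, assuming you vary only pitch and pace and hold the speaker and tone fixed; `description_variants` is a hypothetical helper, not part of the node.

```python
from itertools import product
from typing import List


def description_variants(speaker: str, pitches: List[str], paces: List[str],
                         tone: str) -> List[str]:
    """Return one description per (pitch, pace) pair; speaker and tone stay fixed."""
    # product() yields every combination, so 2 pitches x 2 paces gives 4 variants.
    return [
        f"{speaker} with a {pitch}-pitched voice, speaking at a {pace} pace with {tone}."
        for pitch, pace in product(pitches, paces)
    ]
```

Holding the other characteristics constant means any difference in audience response can be attributed to the attributes you varied.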
Error Handling
Provider Error: The selected provider is unavailable. Solution: Switch to the alternative provider (replicate or baseten).
Invalid Description: The voice description is too vague or unclear. Solution: Provide more specific details about the voice characteristics.
Text Too Long: The input text exceeds the maximum length. Solution: Split the text into smaller segments and process them separately.
Empty Text: The text field is empty. Solution: Provide text content to convert to speech.
Generation Failed: The AI could not interpret the description or generate speech. Solution: Simplify the description or try different voice characteristics.
Connection Failed: The PixelML API cannot be reached. Solution: Check the PixelML connection credentials and API availability.
Processing Timeout: Audio generation took too long. Solution: Try shorter text or a simpler description.
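Two of the recovery steps above, splitting long text and switching providers, are easy to automate. The sketch below makes several assumptions: the node's maximum text length is not documented, so `max_chars` is a placeholder you would set yourself; `generate` is a hypothetical callable standing in for however you invoke the node, returning a voice_url; and any exception is treated as a reason to try the other provider.

```python
import re
from typing import Callable, List

PROVIDERS = ("replicate", "baseten")


def split_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Current chunk is full: flush it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def generate_with_fallback(generate: Callable[[str, str, str], str],
                           text: str, description: str) -> str:
    """Call generate(provider, text, description) per provider until one succeeds."""
    errors: List[str] = []
    for provider in PROVIDERS:
        try:
            return generate(provider, text, description)
        except Exception as exc:  # assumption: any failure is worth retrying elsewhere
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Each chunk from `split_text` would then be sent with the same description, which also follows the consistency advice in the notes.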
Notes
Description Quality: More detailed and specific descriptions produce better results. Include details about gender, pitch, pace, environment, clarity, and emotional tone.
Provider Selection: Different providers may produce slightly different results. Try both to find which works best for your needs.
Voice Characteristics: You can control multiple aspects: speaker gender, voice pitch (low/medium/high), speaking pace (slow/moderate/fast), environment (studio/room/open space), audio clarity, and emotional tone.
Consistency: Use similar descriptions across multiple generations to maintain voice consistency in a project.
Experimentation: Don't hesitate to experiment with different descriptions to achieve your desired voice output.
Processing Time: Generation typically takes 10-30 seconds depending on text length and description complexity.
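Since generation usually takes 10-30 seconds, a caller may want its own time budget so a stuck run surfaces as an error instead of hanging. A minimal sketch using Python's standard `concurrent.futures`, assuming the node call can be wrapped in a zero-argument function; `call_with_timeout` is a hypothetical helper, and the budget you pass (for example 60 seconds) is an arbitrary choice, not a documented limit.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout_s: float) -> T:
    """Run fn in a worker thread; raise FutureTimeout if it exceeds timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Stop waiting without blocking; a still-running call is not cancelled
        # and its worker thread finishes in the background.
        executor.shutdown(wait=False)
```

Note that this only limits how long the caller waits; it does not stop generation on the provider's side.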