Speech to text
Action ID: speech_to_text
Description
Transcribes an audio file to text using a selected speech-to-text provider.
Connection
PixelML Connection (required, category: pixelml): The PixelML connection used to call the PixelML API.
Input Parameters
provider (dropdown, optional, default: Groq): Which provider to use for speech to text. Available options: Groq, Azure, AWS_Transcribe.
language (dropdown, required): The language of the audio. Supported: English (UK, Canada, US, South Africa), French, Italian, Japanese, Russian, Vietnamese, and Chinese variants.
audio (string, required): URL of the audio file to transcribe.
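As a rough illustration of the input constraints listed above, the following sketch checks a set of inputs before they are handed to the node. The function name build_inputs and the dictionary shape are assumptions made for this example and are not part of the PixelML API; only the parameter names, the three provider options, and the Groq default come from this page.

```python
from urllib.parse import urlparse

# Provider options as listed on this page; "Groq" is the documented default.
PROVIDERS = ("Groq", "Azure", "AWS_Transcribe")


def build_inputs(language, audio, provider="Groq"):
    """Return a dict of node inputs, raising ValueError on obvious mistakes.

    Hypothetical helper: the node itself may validate its inputs differently.
    """
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {PROVIDERS}, got {provider!r}")
    if not language:
        raise ValueError("language is required")
    if not audio:
        raise ValueError("audio is required")
    parsed = urlparse(audio)
    # The audio parameter is documented as a URL, so reject anything else early.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"audio must be an http(s) URL, got {audio!r}")
    return {"provider": provider, "language": language, "audio": audio}


inputs = build_inputs("English (United States)", "https://example.com/team-meeting.mp3")
```

This check does not confirm that the chosen provider supports the chosen language, since this page does not list language support per provider.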
Output Parameters
transcript (string): The transcribed text from the audio file.
transcript_file (string): URL of a file containing the full transcript.
How It Works
This node takes an audio file URL and a language, sends the audio through the PixelML API to the selected speech-to-text provider (Groq, Azure, or AWS Transcribe), and returns two outputs: the transcribed text and a URL to a file containing the complete transcript.
Usage Examples
Example 1: English Meeting Transcription with Groq
Input:
provider: "Groq"
language: "English (United States)"
audio: "https://example.com/team-meeting.mp3"
Output:
transcript: "Good morning team. Today we'll discuss the Q4 roadmap and our strategic priorities for the upcoming quarter. Let's start with product updates from the engineering team."
transcript_file: "https://pixelml-storage.com/transcripts/abc123-meeting.txt"
Example 2: French Customer Call with Azure
Input:
provider: "Azure"
language: "French (France)"
audio: "https://example.com/customer-call-fr.wav"
Output:
transcript: "Bonjour, merci d'avoir appelé notre service client. Comment puis-je vous aider aujourd'hui? Je comprends votre problème et je vais vous aider à le résoudre."
transcript_file: "https://pixelml-storage.com/transcripts/def456-call-fr.txt"
Example 3: Japanese Interview with AWS Transcribe
Input:
provider: "AWS_Transcribe"
language: "Japanese (Japan)"
audio: "https://example.com/interview-jp.mp3"
Output:
transcript: "本日はインタビューにお越しいただきありがとうございます。まず、あなたの経験とスキルについて教えてください。"
transcript_file: "https://pixelml-storage.com/transcripts/ghi789-interview-jp.txt"
Common Use Cases
Meeting Transcription: Convert recorded business meetings, standups, or conference calls into searchable text documents
Customer Service Analysis: Transcribe support calls for quality assurance, training, or sentiment analysis
Interview Documentation: Create written records of job interviews, research interviews, or media interviews
Podcast Production: Generate transcripts for podcast episodes to improve accessibility and SEO
Voice Note Processing: Convert voice memos and audio notes into text for easier organization and search
Multilingual Content Creation: Transcribe audio content in multiple languages for translation or localization workflows
Legal Documentation: Create accurate transcripts of depositions, hearings, or client consultations
Error Handling
Invalid API Connection
Cause: PixelML connection credentials are missing or incorrect.
Solution: Verify your PixelML API credentials in the connection settings.
Audio URL Inaccessible
Cause: The audio file cannot be downloaded from the provided URL.
Solution: Ensure the URL is publicly accessible and returns a valid audio file.
Unsupported Audio Format
Cause: The audio format is not supported by the selected provider.
Solution: Convert the audio to a commonly supported format such as MP3, WAV, or M4A.
Language Not Supported
Cause: The selected language is not available for the chosen provider.
Solution: Select a different language or switch to a provider that supports it.
Transcription Failed
Cause: The provider was unable to process the audio.
Solution: Check the audio quality and ensure it contains clear speech.
Provider Unavailable
Cause: The selected speech-to-text provider is temporarily down.
Solution: Try a different provider or retry after a short delay.
Rate Limit Exceeded
Cause: Too many transcription requests were sent in a short time.
Solution: Add delays between requests or contact PixelML about rate limits.
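The advice above for Provider Unavailable and Rate Limit Exceeded (retry after a delay, or try a different provider) can be sketched as follows. This is a minimal illustration under stated assumptions: the exception classes, the run_node callable, and transcribe_with_fallback are hypothetical names invented for this example. This page does not specify how the node reports errors to calling code, so map these to whatever your workflow actually surfaces.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 'Rate Limit Exceeded' failure."""


class ProviderUnavailableError(Exception):
    """Hypothetical stand-in for a 'Provider Unavailable' failure."""


def transcribe_with_fallback(run_node, inputs,
                             providers=("Groq", "Azure", "AWS_Transcribe"),
                             max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Try each provider in order, backing off and retrying on rate limits.

    run_node is a caller-supplied function that executes the node with the
    given inputs and returns its outputs. How that call is made is outside
    the scope of this sketch.
    """
    last_error = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return run_node({**inputs, "provider": provider})
            except RateLimitError as exc:
                last_error = exc
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
            except ProviderUnavailableError as exc:
                last_error = exc
                break  # stop retrying this provider and move to the next one
    raise RuntimeError("all providers failed") from last_error
```

A fallback provider may not support the selected language, so in practice the provider list should be limited to those known to handle it.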
Notes
Provider Selection: Each provider (Groq, Azure, AWS Transcribe) has different strengths. Groq offers fast processing, Azure excels at multiple languages, and AWS provides robust accuracy.
Language Matching: Always select the correct language to ensure accurate transcription. Mismatched languages result in poor quality output.
Audio Quality: Clear audio with minimal background noise produces the best transcription results. Consider audio preprocessing for noisy files.
Supported Languages: The node supports 12 language variants including multiple English dialects, French, Italian, Japanese, Russian, Vietnamese, and Chinese dialects.
File Output: The transcript_file URL provides a persistent copy of the full transcript, useful for long audio files or archiving.
Cost Considerations: Different providers have different pricing models. Check PixelML's pricing for each provider.
Processing Time: Transcription time varies by provider and audio length. Longer files take more time to process.
Accuracy: Transcription accuracy depends on audio quality, speaker clarity, accent, and technical terminology. Review transcripts for critical applications.