# Speech to text

**Action ID:** `speech_to_text`

## Description

Transcribes an audio file into text using a selectable speech-to-text provider (Groq, Azure, or AWS Transcribe) via the PixelML API.

## Connection

| Name               | Description                                 | Required | Category |
| ------------------ | ------------------------------------------- | -------- | -------- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True     | pixelml  |

## Input Parameters

| Name     | Type     | Required | Default | Description                                                                                                                                             |
| -------- | -------- | :------: | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| provider | dropdown |     -    | Groq    | Which provider to use for speech to text. Available options: Groq, Azure, AWS\_Transcribe                                                               |
| language | dropdown |     ✓    | -       | The language the audio is in. Supports: English (UK, Canada, US, South Africa), French, Italian, Japanese, Russian, Vietnamese, Chinese variants |
| audio    | string   |     ✓    | -       | Audio file URL to convert and transcribe                                                                                                                |

<details>

<summary>View JSON Schema</summary>

**Input Schema**

```json
{
  "$defs": {
    "SpeechToTextProvider": {
      "description": "Speech to text provider.",
      "enum": [
        "Groq",
        "Azure",
        "AWS_Transcribe"
      ],
      "title": "SpeechToTextProvider",
      "type": "string"
    },
    "SpeechToTextSupportedLanguage": {
      "description": "Speech to text supported language.",
      "enum": [
        "English (United Kingdom)",
        "English (Canada)",
        "English (United States)",
        "English (South Africa)",
        "French (France)",
        "Italian (Italy)",
        "Japanese (Japan)",
        "Russian (Russia)",
        "Vietnamese (Vietnam)",
        "Chinese (Wu, Simplified)",
        "Chinese (Cantonese, Simplified)",
        "Chinese (Mandarin, Simplified)"
      ],
      "title": "SpeechToTextSupportedLanguage",
      "type": "string"
    }
  },
  "description": "Speech to text node input.",
  "properties": {
    "provider": {
      "$ref": "#/$defs/SpeechToTextProvider",
      "default": "Groq",
      "description": "Which provider to use for speech to text",
      "title": "Provider"
    },
    "language": {
      "$ref": "#/$defs/SpeechToTextSupportedLanguage",
      "description": "Which language that the audio is in",
      "title": "Language"
    },
    "audio": {
      "description": "Audio file url to convert transcribe",
      "title": "Audio file",
      "type": "string"
    }
  },
  "required": [
    "language",
    "audio"
  ],
  "title": "SpeechToTextNodeInput",
  "type": "object"
}
```

</details>

## Output Parameters

| Name             | Type   | Description                                  |
| ---------------- | ------ | -------------------------------------------- |
| transcript       | string | The transcribed text from the audio file     |
| transcript\_file | string | URL to a file containing the full transcript |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Speech To Text node output.",
  "properties": {
    "transcript": {
      "title": "Transcribed text",
      "type": "string"
    },
    "transcript_file": {
      "title": "Transcript file",
      "type": "string"
    }
  },
  "required": [
    "transcript",
    "transcript_file"
  ],
  "title": "SpeechToTextNodeOutput",
  "type": "object"
}
```

</details>
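As a sketch of consuming this output, the two required fields can be unpacked defensively before use. The field names come from the output schema above; the helper itself is a hypothetical convenience, not part of any PixelML SDK.

```python
# Illustrative helper: unpack a SpeechToTextNodeOutput dict.
# "transcript" and "transcript_file" are the schema's required fields;
# the function itself is an assumed convenience wrapper.

def parse_stt_output(output: dict) -> tuple:
    """Return (transcript, transcript_file), failing fast on missing keys."""
    missing = {"transcript", "transcript_file"} - output.keys()
    if missing:
        raise KeyError(f"missing required output fields: {sorted(missing)}")
    return output["transcript"], output["transcript_file"]

text, file_url = parse_stt_output({
    "transcript": "Good morning team.",
    "transcript_file": "https://pixelml-storage.com/transcripts/abc123-meeting.txt",
})
```

Failing fast here keeps a missing field from surfacing later as a confusing downstream error.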

## How It Works

This node takes an audio file URL and a language, sends the audio to your chosen speech-to-text provider (Groq, Azure, or AWS Transcribe) through the PixelML API, and returns both the transcribed text and a URL to a file containing the complete transcript.
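The request side of this flow can be sketched as a small helper that assembles a schema-valid input payload. The field names and the provider enum come from the input schema above; the function name and the validation logic are illustrative assumptions, not part of the PixelML SDK.

```python
# Illustrative helper: assemble a SpeechToTextNodeInput payload.
# Field names ("provider", "language", "audio") and the provider enum
# match the published input schema; everything else is an assumption.

SUPPORTED_PROVIDERS = {"Groq", "Azure", "AWS_Transcribe"}

def build_stt_input(audio_url: str, language: str, provider: str = "Groq") -> dict:
    """Return a dict matching the SpeechToTextNodeInput schema."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio must be an HTTP(S) URL to an audio file")
    return {"provider": provider, "language": language, "audio": audio_url}

payload = build_stt_input(
    "https://example.com/team-meeting.mp3", "English (United States)"
)
```

The resulting dict can then be submitted as the node's input wherever your workflow expects it.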

## Usage Examples

### Example 1: English Meeting Transcription with Groq

**Input:**

```
provider: "Groq"
language: "English (United States)"
audio: "https://example.com/team-meeting.mp3"
```

**Output:**

```
transcript: "Good morning team. Today we'll discuss the Q4 roadmap and our strategic priorities for the upcoming quarter. Let's start with product updates from the engineering team."
transcript_file: "https://pixelml-storage.com/transcripts/abc123-meeting.txt"
```

### Example 2: French Customer Call with Azure

**Input:**

```
provider: "Azure"
language: "French (France)"
audio: "https://example.com/customer-call-fr.wav"
```

**Output:**

```
transcript: "Bonjour, merci d'avoir appelé notre service client. Comment puis-je vous aider aujourd'hui? Je comprends votre problème et je vais vous aider à le résoudre."
transcript_file: "https://pixelml-storage.com/transcripts/def456-call-fr.txt"
```

### Example 3: Japanese Interview with AWS Transcribe

**Input:**

```
provider: "AWS_Transcribe"
language: "Japanese (Japan)"
audio: "https://example.com/interview-jp.mp3"
```

**Output:**

```
transcript: "本日はインタビューにお越しいただきありがとうございます。まず、あなたの経験とスキルについて教えてください。"
transcript_file: "https://pixelml-storage.com/transcripts/ghi789-interview-jp.txt"
```

## Common Use Cases

* **Meeting Transcription**: Convert recorded business meetings, standups, or conference calls into searchable text documents
* **Customer Service Analysis**: Transcribe support calls for quality assurance, training, or sentiment analysis
* **Interview Documentation**: Create written records of job interviews, research interviews, or media interviews
* **Podcast Production**: Generate transcripts for podcast episodes to improve accessibility and SEO
* **Voice Note Processing**: Convert voice memos and audio notes into text for easier organization and search
* **Multilingual Content Creation**: Transcribe audio content in multiple languages for translation or localization workflows
* **Legal Documentation**: Create accurate transcripts of depositions, hearings, or client consultations

## Error Handling

| Error Type               | Cause                                                   | Solution                                                               |
| ------------------------ | ------------------------------------------------------- | ---------------------------------------------------------------------- |
| Invalid API Connection   | PixelML connection credentials are missing or incorrect | Verify your PixelML API credentials in the connection settings         |
| Audio URL Inaccessible   | Cannot download audio file from provided URL            | Ensure the URL is publicly accessible and returns a valid audio file   |
| Unsupported Audio Format | Audio format not supported by the selected provider     | Convert audio to a commonly supported format like MP3, WAV, or M4A     |
| Language Not Supported   | Selected language not available for the chosen provider | Select a different language or switch to a provider that supports it   |
| Transcription Failed     | Provider unable to process the audio                    | Check audio quality and ensure it contains clear speech                |
| Provider Unavailable     | Selected speech-to-text provider is temporarily down    | Try a different provider or retry after a short delay                  |
| Rate Limit Exceeded      | Too many transcription requests in a short time         | Implement delays between requests or contact PixelML about rate limits |
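Two of the errors above (Provider Unavailable, Rate Limit Exceeded) are transient and worth retrying with a delay, as the solutions column suggests. The sketch below assumes the caller surfaces those errors as exceptions whose message matches the error type; adapt the matching to however your client actually reports failures.

```python
import time

# Error types from the table above that are safe to retry.
TRANSIENT_ERRORS = {"Provider Unavailable", "Rate Limit Exceeded"}

def call_with_retry(transcribe, payload, retries=3, base_delay=1.0):
    """Call transcribe(payload), retrying transient failures with
    exponential backoff (base_delay, 2*base_delay, 4*base_delay, ...)."""
    for attempt in range(retries):
        try:
            return transcribe(payload)
        except RuntimeError as err:
            if str(err) not in TRANSIENT_ERRORS or attempt == retries - 1:
                raise  # non-transient error, or out of attempts
            time.sleep(base_delay * (2 ** attempt))
```

Non-transient errors (for example, invalid credentials) are re-raised immediately, since retrying them only wastes quota.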

## Notes

* **Provider Selection**: Each provider (Groq, Azure, AWS Transcribe) has different strengths. Groq offers fast processing, Azure excels at multiple languages, and AWS provides robust accuracy.
* **Language Matching**: Always select the correct language to ensure accurate transcription. Mismatched languages result in poor quality output.
* **Audio Quality**: Clear audio with minimal background noise produces the best transcription results. Consider audio preprocessing for noisy files.
* **Supported Languages**: The node supports 12 language variants including multiple English dialects, French, Italian, Japanese, Russian, Vietnamese, and Chinese dialects.
* **File Output**: The transcript\_file URL provides a persistent copy of the full transcript, useful for long audio files or archiving.
* **Cost Considerations**: Different providers have different pricing models. Check PixelML's pricing for each provider.
* **Processing Time**: Transcription time varies by provider and audio length. Longer files take more time to process.
* **Accuracy**: Transcription accuracy depends on audio quality, speaker clarity, accent, and technical terminology. Review transcripts for critical applications.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agenticflow.ai/reference/nodes/speech_to_text.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
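For example, the query above could be issued from Python using only the standard library. The base URL and the `ask` parameter are exactly those documented above; the helper name is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

DOCS_URL = "https://docs.agenticflow.ai/reference/nodes/speech_to_text.md"

def ask_docs_url(question: str) -> str:
    """Build the documentation-query URL for a natural-language question."""
    return f"{DOCS_URL}?{urlencode({'ask': question})}"

# Performing the GET requires network access:
# with urlopen(ask_docs_url("Which audio formats does each provider accept?")) as resp:
#     print(resp.read().decode("utf-8"))
```

`urlencode` takes care of escaping spaces and punctuation so the question survives as a single query parameter.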
