Speech to text

Action ID: speech_to_text

Description

Transcribes an audio file, given by URL, into text using a selected speech-to-text provider.

Connection

| Name | Description | Required | Category |
| --- | --- | --- | --- |
| PixelML Connection | The PixelML connection used to call the PixelML API. | True | pixelml |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| provider | dropdown | - | Groq | Which provider to use for speech to text. Available options: Groq, Azure, AWS_Transcribe |
| language | dropdown | True | - | Language of the audio. Supported: English (United Kingdom, Canada, United States, South Africa), French, Italian, Japanese, Russian, Vietnamese, and Chinese (Wu, Cantonese, Mandarin; all Simplified) |
| audio | string | True | - | URL of the audio file to transcribe |

View JSON Schema

Input Schema

{
  "$defs": {
    "SpeechToTextProvider": {
      "description": "Speech to text provider.",
      "enum": [
        "Groq",
        "Azure",
        "AWS_Transcribe"
      ],
      "title": "SpeechToTextProvider",
      "type": "string"
    },
    "SpeechToTextSupportedLanguage": {
      "description": "Speech to text supported language.",
      "enum": [
        "English (United Kingdom)",
        "English (Canada)",
        "English (United States)",
        "English (South Africa)",
        "French (France)",
        "Italian (Italy)",
        "Japanese (Japan)",
        "Russian (Russia)",
        "Vietnamese (Vietnam)",
        "Chinese (Wu, Simplified)",
        "Chinese (Cantonese, Simplified)",
        "Chinese (Mandarin, Simplified)"
      ],
      "title": "SpeechToTextSupportedLanguage",
      "type": "string"
    }
  },
  "description": "Speech to text node input.",
  "properties": {
    "provider": {
      "$ref": "#/$defs/SpeechToTextProvider",
      "default": "Groq",
      "description": "Which provider to use for speech to text",
      "title": "Provider"
    },
    "language": {
      "$ref": "#/$defs/SpeechToTextSupportedLanguage",
      "description": "Which language that the audio is in",
      "title": "Language"
    },
    "audio": {
      "description": "Audio file url to convert transcribe",
      "title": "Audio file",
      "type": "string"
    }
  },
  "required": [
    "language",
    "audio"
  ],
  "title": "SpeechToTextNodeInput",
  "type": "object"
}
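
The schema above can be checked on the client side before the node runs. The following is a minimal sketch in Python, not part of the PixelML API: the helper name `build_speech_to_text_input` is hypothetical, and only the enum values, the `Groq` default, and the required fields are taken from the schema.

```python
from typing import Dict

# Enum values copied from the input schema above.
PROVIDERS = ("Groq", "Azure", "AWS_Transcribe")
LANGUAGES = (
    "English (United Kingdom)",
    "English (Canada)",
    "English (United States)",
    "English (South Africa)",
    "French (France)",
    "Italian (Italy)",
    "Japanese (Japan)",
    "Russian (Russia)",
    "Vietnamese (Vietnam)",
    "Chinese (Wu, Simplified)",
    "Chinese (Cantonese, Simplified)",
    "Chinese (Mandarin, Simplified)",
)


def build_speech_to_text_input(
    audio: str, language: str, provider: str = "Groq"
) -> Dict[str, str]:
    """Hypothetical helper: build an input object that satisfies the schema.

    `language` and `audio` are required; `provider` defaults to "Groq".
    """
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if not isinstance(audio, str) or not audio:
        raise ValueError("audio must be a non-empty URL string")
    return {"provider": provider, "language": language, "audio": audio}
```

Note that the language must match an enum value exactly, for example `"English (United States)"` rather than `"English"`.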

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| transcript | string | The transcribed text from the audio file |
| transcript_file | string | URL to a file containing the full transcript |

View JSON Schema

Output Schema

{
  "description": "Speech To Text node output.",
  "properties": {
    "transcript": {
      "title": "Transcribed text",
      "type": "string"
    },
    "transcript_file": {
      "title": "Transcript file",
      "type": "string"
    }
  },
  "required": [
    "transcript",
    "transcript_file"
  ],
  "title": "SpeechToTextNodeOutput",
  "type": "object"
}
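
A downstream step can confirm that both required output fields are present before using them. This is a minimal sketch in Python under the assumption that the node output arrives as a plain dictionary; the function name `parse_speech_to_text_output` is hypothetical.

```python
from typing import Any, Dict, Tuple

# Required fields, as listed in the output schema above.
REQUIRED_OUTPUT_FIELDS = ("transcript", "transcript_file")


def parse_speech_to_text_output(output: Dict[str, Any]) -> Tuple[str, str]:
    """Hypothetical helper: return (transcript, transcript_file) or raise."""
    for field in REQUIRED_OUTPUT_FIELDS:
        if field not in output:
            raise KeyError(f"missing required output field: {field}")
        if not isinstance(output[field], str):
            raise TypeError(f"output field {field} must be a string")
    return output["transcript"], output["transcript_file"]
```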

How It Works

This node takes an audio file URL and the language of the audio, sends the audio through the PixelML API to the selected speech-to-text provider (Groq, Azure, or AWS Transcribe), and returns both the transcribed text and a URL to a file containing the complete transcript.

Usage Examples

Example 1: English Meeting Transcription with Groq

Input:

provider: "Groq"
language: "English (United States)"
audio: "https://example.com/team-meeting.mp3"

Output:

transcript: "Good morning team. Today we'll discuss the Q4 roadmap and our strategic priorities for the upcoming quarter. Let's start with product updates from the engineering team."
transcript_file: "https://pixelml-storage.com/transcripts/abc123-meeting.txt"

Example 2: French Customer Call with Azure

Input:

provider: "Azure"
language: "French (France)"
audio: "https://example.com/customer-call-fr.wav"

Output:

transcript: "Bonjour, merci d'avoir appelé notre service client. Comment puis-je vous aider aujourd'hui? Je comprends votre problème et je vais vous aider à le résoudre."
transcript_file: "https://pixelml-storage.com/transcripts/def456-call-fr.txt"

Example 3: Japanese Interview with AWS Transcribe

Input:

provider: "AWS_Transcribe"
language: "Japanese (Japan)"
audio: "https://example.com/interview-jp.mp3"

Output:

transcript: "本日はインタビューにお越しいただきありがとうございます。まず、あなたの経験とスキルについて教えてください。"
transcript_file: "https://pixelml-storage.com/transcripts/ghi789-interview-jp.txt"

Common Use Cases

  • Meeting Transcription: Convert recorded business meetings, standups, or conference calls into searchable text documents

  • Customer Service Analysis: Transcribe support calls for quality assurance, training, or sentiment analysis

  • Interview Documentation: Create written records of job interviews, research interviews, or media interviews

  • Podcast Production: Generate transcripts for podcast episodes to improve accessibility and SEO

  • Voice Note Processing: Convert voice memos and audio notes into text for easier organization and search

  • Multilingual Content Creation: Transcribe audio content in multiple languages for translation or localization workflows

  • Legal Documentation: Create accurate transcripts of depositions, hearings, or client consultations

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid API Connection | PixelML connection credentials are missing or incorrect | Verify your PixelML API credentials in the connection settings |
| Audio URL Inaccessible | Cannot download audio file from provided URL | Ensure the URL is publicly accessible and returns a valid audio file |
| Unsupported Audio Format | Audio format not supported by the selected provider | Convert audio to a commonly supported format like MP3, WAV, or M4A |
| Language Not Supported | Selected language not available for the chosen provider | Select a different language or switch to a provider that supports it |
| Transcription Failed | Provider unable to process the audio | Check audio quality and ensure it contains clear speech |
| Provider Unavailable | Selected speech-to-text provider is temporarily down | Try a different provider or retry after a short delay |
| Rate Limit Exceeded | Too many transcription requests in a short time | Implement delays between requests or contact PixelML about rate limits |
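
The "retry after a short delay" and "try a different provider" advice can be combined in one wrapper. The sketch below is illustrative only: `run_node` is a hypothetical callable standing in for however your workflow invokes this node, and `TranscriptionError` is a hypothetical exception type, since the actual error objects are not specified here.

```python
import time
from typing import Callable, Dict, Optional, Sequence


class TranscriptionError(Exception):
    """Hypothetical stand-in for a failure reported by the node."""


def transcribe_with_fallback(
    run_node: Callable[[Dict[str, str]], Dict[str, str]],
    audio: str,
    language: str,
    providers: Sequence[str] = ("Groq", "Azure", "AWS_Transcribe"),
    retries_per_provider: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Try each provider in order, retrying with exponential backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            node_input = {"provider": provider, "language": language, "audio": audio}
            try:
                return run_node(node_input)
            except TranscriptionError as exc:
                last_error = exc
                # Wait before retrying the same provider: base_delay, then
                # 2 * base_delay, and so on. No wait before switching provider.
                if attempt < retries_per_provider - 1:
                    sleep(base_delay * 2 ** attempt)
    raise TranscriptionError(f"all providers failed: {last_error}")
```

Retrying only helps with transient failures such as an unavailable provider or a rate limit. Errors like an inaccessible URL or an unsupported format will fail the same way on every attempt, so a real workflow would likely distinguish them and stop early.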

Notes

  • Provider Selection: Each provider (Groq, Azure, AWS Transcribe) has different strengths: Groq offers fast processing, Azure excels across multiple languages, and AWS Transcribe provides robust accuracy.

  • Language Matching: Always select the language actually spoken in the audio. A mismatched language setting results in poor-quality output.

  • Audio Quality: Clear audio with minimal background noise produces the best transcription results. Consider audio preprocessing for noisy files.

  • Supported Languages: The node supports 12 language variants including multiple English dialects, French, Italian, Japanese, Russian, Vietnamese, and Chinese dialects.

  • File Output: The transcript_file URL provides a persistent copy of the full transcript, useful for long audio files or archiving.

  • Cost Considerations: Different providers have different pricing models. Check PixelML's pricing for each provider.

  • Processing Time: Transcription time varies by provider and audio length. Longer files take more time to process.

  • Accuracy: Transcription accuracy depends on audio quality, speaker clarity, accent, and technical terminology. Review transcripts for critical applications.
