Audio Transcription

Action ID: openai_transcriptions

Description

Transcribe audio to text using OpenAI's Whisper model. This node converts spoken audio in multiple formats into accurate written transcripts. Whisper handles numerous languages, accents, and technical vocabulary, making it highly versatile for audio-processing workflows.

Provider

OpenAI

Connection

| Name | Description | Required | Category |
|------|-------------|----------|----------|
| OpenAI Connection | The OpenAI connection to use for transcription. | Yes | openai |

Input Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| model | dropdown | No | whisper-1 | The model to use for transcription. |
| audio_file | string | Yes | - | The audio file to transcribe. Accepts a file_id, an HTTP URL, or an uploaded file. Supported formats: MP3, MP4, WebM. |
| response_format | dropdown | No | text | The format of the transcript output. Options: json, text, srt, verbose_json, vtt. |
| language | string | No | - | The language of the input audio. Supplying the input language in ISO-639-1 format (e.g., "en") improves accuracy and reduces latency. |
| prompt | string | No | - | Optional text to guide the model's style or continue a previous audio segment. |
| temperature | number | No | 0.0 | The sampling temperature, between 0 and 1. Higher values (e.g., 0.8) make the output more random; lower values (e.g., 0.2) make it more focused and deterministic. |

View JSON Schema
{
  "description": "Transcription node input.",
  "properties": {
    "model": {
      "default": "whisper-1",
      "description": "The model to use for transcription.",
      "enum": [
        "whisper-1"
      ],
      "title": "Model",
      "type": "string"
    },
    "audio_file": {
      "description": "The audio file to transcribe.",
      "title": "Audio File",
      "type": "string"
    },
    "response_format": {
      "default": "text",
      "description": "The format of the transcript output.",
      "enum": [
        "json",
        "text",
        "srt",
        "verbose_json",
        "vtt"
      ],
      "title": "Response Format",
      "type": "string"
    },
    "language": {
      "description": "The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.",
      "title": "Language",
      "type": "string"
    },
    "prompt": {
      "description": "An optional text to guide the model's style or continue a previous audio segment.",
      "title": "Prompt",
      "type": "string"
    },
    "temperature": {
      "default": 0.0,
      "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.",
      "maximum": 1.0,
      "minimum": 0.0,
      "title": "Temperature",
      "type": "number"
    }
  },
  "required": [
    "audio_file"
  ],
  "title": "TranscriptionInput",
  "type": "object"
}

Output Parameters

| Name | Type | Description |
|------|------|-------------|
| text | string | The transcribed text from the audio file. |

View JSON Schema
{
  "description": "Response from transcription.",
  "properties": {
    "text": {
      "title": "Text",
      "type": "string"
    }
  },
  "title": "TranscriptionResponse",
  "type": "object"
}

How It Works

This node sends your audio file to OpenAI's Whisper API for transcription. Whisper automatically detects the language if not specified, though providing the language code (ISO-639-1 format like "en" for English, "fr" for French) improves accuracy and speed. The response format determines how the transcription is returned: plain text, JSON with metadata, or timestamped formats (SRT/VTT). Optional prompts can guide the model's interpretation of ambiguous words or maintain consistency with previous segments.
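The mapping from node inputs to an API call can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the `build_transcription_kwargs` helper is a hypothetical name that mirrors the input schema above (enum check, temperature bounds, the single required field), and the commented-out call shows roughly what the official `openai` Python SDK expects.

```python
# Sketch: validate node inputs and assemble the keyword arguments a Whisper
# transcription call would receive. Validation mirrors the input JSON schema.

VALID_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}

def build_transcription_kwargs(audio_file, model="whisper-1",
                               response_format="text", language=None,
                               prompt=None, temperature=0.0):
    """Enforce the schema locally before anything is sent over the wire."""
    if not audio_file:
        raise ValueError("audio_file is required")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(VALID_FORMATS)}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    kwargs = {"model": model, "file": audio_file,
              "response_format": response_format, "temperature": temperature}
    # Optional fields are omitted entirely when unset, matching the schema.
    if language:
        kwargs["language"] = language
    if prompt:
        kwargs["prompt"] = prompt
    return kwargs

# With the official SDK the call would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   with open("interview.mp3", "rb") as f:
#       text = client.audio.transcriptions.create(
#           **build_transcription_kwargs(f, language="en"))
```

Failing fast on bad enum values and out-of-range temperatures surfaces the same errors listed under Error Handling without spending an API call.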

Usage Examples

Example 1: Simple Audio Transcription

Input:

audio_file: "https://example.com/interview.mp3"
model: "whisper-1"
response_format: "text"
language: "en"
temperature: 0.0

Output:

text: "Good morning, thank you for joining us today. I'm excited to discuss the new project developments..."

Example 2: Timestamped Video Subtitle Transcription

Input:

audio_file: "file_id_abc123"
model: "whisper-1"
response_format: "vtt"
language: "en"
prompt: "This is a technical presentation about machine learning"

Output:

text: "WEBVTT

00:00.000 --> 00:05.000
Good morning, thank you for joining us today.

00:05.000 --> 00:12.000
I'm excited to discuss the new project developments..."
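A VTT transcript like the one above can be split into timed cues downstream. The sketch below is a deliberately minimal, standard-library-only parser for output shaped like Example 2; it ignores WEBVTT headers, cue identifiers, and cue settings, so treat it as illustrative rather than a full WebVTT implementation.

```python
import re

# Timestamps may appear as MM:SS.mmm or HH:MM:SS.mmm.
_CUE_TIMING = re.compile(
    r"((?:\d{2}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2}:)?\d{2}:\d{2}\.\d{3})")

def parse_vtt(vtt_text):
    """Split a WEBVTT transcript (response_format="vtt") into
    (start, end, text) tuples, one per cue."""
    cues = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue
        m = _CUE_TIMING.match(lines[0])
        if m:  # non-matching blocks (e.g., the WEBVTT header) are skipped
            cues.append((m.group(1), m.group(2), " ".join(lines[1:])))
    return cues
```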

Example 3: Non-English Audio with Context

Input:

audio_file: "https://example.com/spanish_podcast.mp3"
model: "whisper-1"
response_format: "json"
language: "es"
prompt: "This is a Spanish business podcast discussing marketing strategies"
temperature: 0.2

Output:

text: "Buenos días, gracias por acompañarnos. Hoy hablaremos sobre nuevas estrategias de marketing digital..."

Common Use Cases

  • Meeting Transcriptions: Convert recorded meetings into text transcripts for documentation

  • Podcast Transcription: Create searchable text versions of podcast episodes

  • Accessibility: Generate captions and transcripts for video content

  • Customer Support: Transcribe customer service calls for training and quality assurance

  • Research: Convert interview recordings into text for analysis

  • Content Creation: Generate blog post content from audio recordings

  • Legal Documentation: Create accurate transcripts of depositions and proceedings

Error Handling

| Error Type | Cause | Solution |
|------------|-------|----------|
| Invalid Audio Format | Audio file format not supported | Use MP3, MP4, or WebM formats |
| Audio Too Long | Audio file exceeds size or duration limits | Split longer audio into smaller chunks |
| Invalid Language Code | Language code is incorrectly formatted | Use ISO-639-1 codes (e.g., "en", "es", "fr") |
| Invalid Response Format | Response format not supported | Use text, json, srt, verbose_json, or vtt |
| Invalid Temperature | Temperature outside the 0-1 range | Use a value between 0.0 and 1.0 |
| Authentication Error | Invalid or missing API key | Verify the OpenAI connection is properly configured |
| File Not Found | Audio file URL is invalid or inaccessible | Check the URL and ensure the file is publicly accessible |
| Audio Quality Issues | Audio is too unclear or noisy | Use clearer audio or a lower temperature |
| Timeout Error | Request took too long to process | Use shorter audio or ensure a stable connection |
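The first two rows above can be caught locally before any API call. The following is a sketch under two assumptions taken from this page: the 25 MB limit stated in the Notes section, and an extension allow-list limited to the formats named in the `audio_file` parameter description. Actual platform limits may differ, and `preflight_check` is a hypothetical helper, not part of the node.

```python
import os

MAX_BYTES = 25 * 1024 * 1024          # 25 MB upload limit (see Notes below)
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".webm"}

def preflight_check(path):
    """Catch the two most common local failures before the API call:
    an unsupported container format and an oversized file."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"unsupported format {ext!r}; use one of {sorted(ALLOWED_EXTENSIONS)}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(
            f"file is {size} bytes; split audio over {MAX_BYTES} bytes into chunks")
    return True
```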

Notes

  • Supported Formats: MP3, MP4, WebM, and other common audio formats are supported. Audio files should be under 25MB.

  • Language Detection: Whisper can auto-detect language, but specifying it (e.g., "en", "es", "fr") improves accuracy and speed.

  • Response Formats: Choose "text" for simple transcript, "srt" or "vtt" for timestamped subtitles, "json" or "verbose_json" for detailed metadata.

  • Prompt Engineering: Use prompts to guide interpretation of technical terms, proper nouns, or to maintain consistency across segments.

  • Temperature Control: Lower temperature (0.0-0.3) for accurate transcription. Higher (0.5-1.0) for more variable interpretation.

  • Multi-language Support: Whisper supports 99+ languages. Works reliably across accents and dialects.

  • Accuracy: For best results, use clear audio with minimal background noise. Whisper is quite robust but benefits from good audio quality.
