Audio Transcription
Action ID: openai_transcriptions
Description
Transcribe audio to text using OpenAI's Whisper model. This node converts spoken audio in several file formats into written transcripts. Whisper handles many languages, accents, and technical vocabulary, which makes it a good fit for a wide range of audio processing workflows.
Provider
OpenAI
Connection
OpenAI Connection (openai, required): The OpenAI connection to use for transcription.
Input Parameters
model (dropdown, optional, default: whisper-1): The model to use for transcription.
audio_file (string, required): The audio file to transcribe. Accepts: file_id, HTTP URL, or uploaded file. Supported formats: MP3, MP4, WebM.
response_format (dropdown, optional, default: text): The format of the transcript output. Options: json, text, srt, verbose_json, vtt
language (string, optional): The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
prompt (string, optional): An optional text to guide the model's style or continue a previous audio segment.
temperature (number, optional, default: 0.0): The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
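The defaults listed above can be pictured as a small parameter object. The following is a minimal sketch, not the node's actual implementation: the class name `TranscriptionParams` and the method `to_request` are hypothetical, and only the field names and default values come from this page.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TranscriptionParams:
    """Hypothetical container mirroring the node's input parameters."""

    audio_file: str                  # required: file_id, HTTP URL, or uploaded file
    model: str = "whisper-1"         # documented default
    response_format: str = "text"    # documented default
    language: Optional[str] = None   # no default; Whisper auto-detects when unset
    prompt: Optional[str] = None     # no default
    temperature: float = 0.0         # documented default

    def to_request(self) -> Dict[str, Any]:
        """Return the parameters as a dict, leaving out unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}


params = TranscriptionParams(
    audio_file="https://example.com/interview.mp3",
    language="en",
)
request = params.to_request()
```

Leaving unset optionals out of the request, rather than sending empty values, keeps language auto-detection active when `language` is not supplied.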
Output Parameters
text (string): The transcribed text from the audio file.
How It Works
This node sends your audio file to OpenAI's Whisper API for transcription. Whisper automatically detects the language if not specified, though providing the language code (ISO-639-1 format like "en" for English, "fr" for French) improves accuracy and speed. The response format determines how the transcription is returned: plain text, JSON with metadata, or timestamped formats (SRT/VTT). Optional prompts can guide the model's interpretation of ambiguous words or maintain consistency with previous segments.
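To make the flow described above concrete, here is a rough sketch of an equivalent request made directly with the official `openai` Python package (v1.x). This page does not document how the node calls the API internally, so the function name `transcribe_file` and the injected `client` argument are illustrative assumptions.

```python
from typing import Any, Optional


def transcribe_file(
    client: Any,
    path: str,
    response_format: str = "text",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: float = 0.0,
) -> str:
    """Send a local audio file for transcription and return the transcript text.

    `client` is expected to be an `openai.OpenAI` instance (assumption: the
    openai v1.x SDK). It is passed in so the function can be exercised with
    a stub and without network access.
    """
    kwargs = {
        "model": "whisper-1",
        "response_format": response_format,
        "temperature": temperature,
    }
    # Send optional fields only when supplied, so that language
    # auto-detection stays active when `language` is omitted.
    if language is not None:
        kwargs["language"] = language
    if prompt is not None:
        kwargs["prompt"] = prompt

    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(file=audio, **kwargs)

    # Assumption: text-like formats (text, srt, vtt) come back as a plain
    # string, while JSON formats come back as an object with a `.text` field.
    return result if isinstance(result, str) else result.text


# Usage (requires `pip install openai` and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   print(transcribe_file(OpenAI(), "interview.mp3", language="en"))
```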
Usage Examples
Example 1: Simple Audio Transcription
Input:
audio_file: "https://example.com/interview.mp3"
model: "whisper-1"
response_format: "text"
language: "en"
temperature: 0.0
Output:
text: "Good morning, thank you for joining us today. I'm excited to discuss the new project developments..."
Example 2: Timestamped Video Subtitle Transcription
Input:
audio_file: "file_id_abc123"
model: "whisper-1"
response_format: "vtt"
language: "en"
prompt: "This is a technical presentation about machine learning"
Output:
text: "WEBVTT
00:00.000 --> 00:05.000
Good morning, thank you for joining us today.
00:05.000 --> 00:12.000
I'm excited to discuss the new project developments..."
Example 3: Non-English Audio with Context
Input:
audio_file: "https://example.com/spanish_podcast.mp3"
model: "whisper-1"
response_format: "json"
language: "es"
prompt: "This is a Spanish business podcast discussing marketing strategies"
temperature: 0.2
Output:
text: "Buenos días, gracias por acompañarnos. Hoy hablaremos sobre nuevas estrategias de marketing digital..."
Common Use Cases
Meeting Transcriptions: Convert recorded meetings into text transcripts for documentation
Podcast Transcription: Create searchable text versions of podcast episodes
Accessibility: Generate captions and transcripts for video content
Customer Support: Transcribe customer service calls for training and quality assurance
Research: Convert interview recordings into text for analysis
Content Creation: Generate blog post content from audio recordings
Legal Documentation: Create accurate transcripts of depositions and proceedings
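Recordings for several of these use cases (meetings, podcasts, depositions) are often long enough that they need to be split before transcription. The sketch below shows one way to keep the transcript consistent across pre-split chunks by passing the tail of each transcript as the `prompt` for the next chunk. The `transcribe` callable, the function name `transcribe_in_chunks`, and the 200-character context window are all illustrative assumptions, and splitting the audio itself is left to an external tool.

```python
from typing import Callable, Iterable, List


def transcribe_in_chunks(
    chunk_paths: Iterable[str],
    transcribe: Callable[[str, str], str],
    context_chars: int = 200,
) -> str:
    """Transcribe pre-split audio chunks in order, carrying context forward.

    `transcribe(path, prompt)` is a hypothetical callable that runs a single
    transcription (for example, one execution of this node) and returns text.
    """
    parts: List[str] = []
    prompt = ""  # the first chunk has no previous segment to continue
    for path in chunk_paths:
        text = transcribe(path, prompt).strip()
        parts.append(text)
        # Use the end of this transcript as the prompt for the next chunk.
        prompt = text[-context_chars:]
    return " ".join(part for part in parts if part)
```

Joining with a single space is a simplification; a chunk boundary that falls mid-sentence may still need manual cleanup.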
Error Handling
Invalid Audio Format: Audio file format not supported. Use MP3, MP4, or WebM formats.
Audio Too Long: Audio file exceeds size or duration limits. Split longer audio into smaller chunks.
Invalid Language Code: Language code is incorrectly formatted. Use ISO-639-1 codes (e.g., "en", "es", "fr").
Invalid Response Format: Response format not supported. Use: text, json, srt, verbose_json, or vtt.
Invalid Temperature: Temperature outside range 0-1. Use a value between 0.0 and 1.0.
Authentication Error: Invalid or missing API key. Verify OpenAI connection is properly configured.
File Not Found: Audio file URL is invalid or inaccessible. Check URL validity and ensure file is publicly accessible.
Audio Quality Issues: Audio is too unclear or noisy. Try with clearer audio or lower temperature.
Timeout Error: Request took too long to process. Try with shorter audio or ensure stable connection.
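Several of the errors above can be caught before a request is sent. The following is a minimal pre-flight sketch, assuming a hypothetical helper named `preflight_errors`; the error labels are taken from this section, the 25 MB limit from the Notes, and the language check only verifies the two-letter shape of the code, not that it is a real ISO-639-1 entry.

```python
import re
from typing import List, Optional

ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
# Assumption: "25MB" is interpreted here as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def preflight_errors(
    response_format: str,
    temperature: float,
    language: Optional[str] = None,
    size_bytes: Optional[int] = None,
) -> List[str]:
    """Return the names of documented errors the inputs would likely trigger."""
    errors: List[str] = []
    if response_format not in ALLOWED_FORMATS:
        errors.append("Invalid Response Format")
    if not 0.0 <= temperature <= 1.0:
        errors.append("Invalid Temperature")
    # Shape check only: two lowercase letters, e.g. "en", "es", "fr".
    if language is not None and not re.fullmatch(r"[a-z]{2}", language):
        errors.append("Invalid Language Code")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        errors.append("Audio Too Long")
    return errors
```

Errors such as Authentication Error, File Not Found, and Timeout Error depend on the remote service and cannot be detected by a local check like this one.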
Notes
Supported Formats: MP3, MP4, WebM, and other common audio formats (the OpenAI API also lists MPEG, MPGA, M4A, and WAV). Audio files should be under 25 MB.
Language Detection: Whisper can auto-detect language, but specifying it (e.g., "en", "es", "fr") improves accuracy and speed.
Response Formats: Choose "text" for a simple transcript, "srt" or "vtt" for timestamped subtitles, "json" for the transcript wrapped in a JSON object, or "verbose_json" for detailed metadata such as segments and timestamps.
Prompt Engineering: Use prompts to guide interpretation of technical terms, proper nouns, or to maintain consistency across segments.
Temperature Control: Use a low temperature (0.0-0.3) for the most consistent transcription. Higher values (0.5-1.0) produce more variable output.
Multi-language Support: Whisper supports 99+ languages and handles a range of accents and dialects, though accuracy varies by language.
Accuracy: For best results, use clear audio with minimal background noise. Whisper is quite robust but benefits from good audio quality.
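When the "vtt" response format is used, the timestamped output usually needs to be parsed before it can be searched or re-timed. The sketch below is one possible parser for simple WebVTT text like the output in Example 2; the names `Cue` and `parse_vtt` are illustrative, and cue settings, styling blocks, and other WebVTT features are not handled.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches "mm:ss.ttt" with an optional leading "hh:".
_TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
_TIMING = re.compile(rf"^{_TS}\s+-->\s+{_TS}")


def _seconds(h: Optional[str], m: str, s: str, ms: str) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(vtt: str) -> List[Cue]:
    """Parse simple WebVTT text into a list of cues."""
    cues: List[Cue] = []
    current: Optional[Tuple[float, float, List[str]]] = None

    for raw in vtt.splitlines():
        line = raw.strip()
        match = _TIMING.match(line)
        if match or not line:
            # A timing line or a blank line ends the cue being read.
            if current is not None:
                cues.append(Cue(current[0], current[1], " ".join(current[2])))
                current = None
        if match:
            groups = match.groups()
            current = (_seconds(*groups[:4]), _seconds(*groups[4:]), [])
        elif line and current is not None:
            current[2].append(line)
        # Lines outside any cue (the WEBVTT header, cue identifiers) are skipped.

    if current is not None:
        cues.append(Cue(current[0], current[1], " ".join(current[2])))
    return cues
```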