# Audio Transcription

**Action ID:** `openai_transcriptions`

## Description

Transcribe audio to text using OpenAI's Whisper model. This node converts spoken audio in a range of common formats into accurate written transcripts. Whisper handles numerous languages, accents, and technical vocabulary, making it highly versatile for audio-processing workflows.

## Provider

**OpenAI**

## Connection

| Name              | Description                                     | Required | Category |
| ----------------- | ----------------------------------------------- | :------: | -------- |
| OpenAI Connection | The OpenAI connection to use for transcription. |     ✓    | openai   |

## Input Parameters

| Name             | Type     | Required | Default   | Description                                                                                                                                                                  |
| ---------------- | -------- | :------: | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| model            | dropdown |     -    | whisper-1 | The model to use for transcription.                                                                                                                                          |
| audio\_file      | string   |     ✓    | -         | The audio file to transcribe. Accepts: file\_id, HTTP URL, or uploaded file. Supported formats: MP3, MP4, WebM.                                                              |
| response\_format | dropdown |     -    | text      | The format of the transcript output. Options: json, text, srt, verbose\_json, vtt                                                                                            |
| language         | string   |     -    | -         | The language of the input audio. Supplying the language as an ISO-639-1 code improves accuracy and reduces latency.                                                          |
| prompt           | string   |     -    | -         | An optional text to guide the model's style or continue a previous audio segment.                                                                                            |
| temperature      | number   |     -    | 0.0       | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Transcription node input.",
  "properties": {
    "model": {
      "default": "whisper-1",
      "description": "The model to use for transcription.",
      "enum": [
        "whisper-1"
      ],
      "title": "Model",
      "type": "string"
    },
    "audio_file": {
      "description": "The audio file to transcribe.",
      "title": "Audio File",
      "type": "string"
    },
    "response_format": {
      "default": "text",
      "description": "The format of the transcript output.",
      "enum": [
        "json",
        "text",
        "srt",
        "verbose_json",
        "vtt"
      ],
      "title": "Response Format",
      "type": "string"
    },
    "language": {
      "description": "The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.",
      "title": "Language",
      "type": "string"
    },
    "prompt": {
      "description": "An optional text to guide the model's style or continue a previous audio segment.",
      "title": "Prompt",
      "type": "string"
    },
    "temperature": {
      "default": 0.0,
      "description": "The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.",
      "maximum": 1.0,
      "minimum": 0.0,
      "title": "Temperature",
      "type": "number"
    }
  },
  "required": [
    "audio_file"
  ],
  "title": "TranscriptionInput",
  "type": "object"
}
```

</details>

## Output Parameters

| Name | Type   | Description                               |
| ---- | ------ | ----------------------------------------- |
| text | string | The transcribed text from the audio file. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Response from transcription.",
  "properties": {
    "text": {
      "title": "Text",
      "type": "string"
    }
  },
  "title": "TranscriptionResponse",
  "type": "object"
}
```

</details>

## How It Works

This node sends your audio file to OpenAI's Whisper API for transcription. Whisper automatically detects the language if not specified, though providing the language code (ISO-639-1 format like "en" for English, "fr" for French) improves accuracy and speed. The response format determines how the transcription is returned: plain text, JSON with metadata, or timestamped formats (SRT/VTT). Optional prompts can guide the model's interpretation of ambiguous words or maintain consistency with previous segments.
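As a rough sketch of what the node does with these inputs (the node's internals are not exposed, so this is illustrative only): it collects the optional parameters from the input table and forwards them alongside the audio file. The helper below is hypothetical; the commented-out call at the end shows how the same parameters would be passed with the official `openai` Python SDK.

```python
from typing import Optional

def build_transcription_params(model: str = "whisper-1",
                               response_format: str = "text",
                               language: Optional[str] = None,
                               prompt: Optional[str] = None,
                               temperature: float = 0.0) -> dict:
    """Assemble keyword arguments mirroring the input table above.

    Only non-empty optional parameters are included; required validation
    (temperature range) matches the JSON schema's minimum/maximum.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    params = {"model": model,
              "response_format": response_format,
              "temperature": temperature}
    if language:
        params["language"] = language  # ISO-639-1, e.g. "en"
    if prompt:
        params["prompt"] = prompt
    return params

# With the official SDK, the call would then look roughly like
# (requires an API key and a local audio file):
#   from openai import OpenAI
#   client = OpenAI()
#   with open("interview.mp3", "rb") as f:
#       text = client.audio.transcriptions.create(
#           file=f, **build_transcription_params(language="en"))
```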

## Usage Examples

### Example 1: Simple Audio Transcription

**Input:**

```
audio_file: "https://example.com/interview.mp3"
model: "whisper-1"
response_format: "text"
language: "en"
temperature: 0.0
```

**Output:**

```
text: "Good morning, thank you for joining us today. I'm excited to discuss the new project developments..."
```

### Example 2: Timestamped Video Subtitle Transcription

**Input:**

```
audio_file: "file_id_abc123"
model: "whisper-1"
response_format: "vtt"
language: "en"
prompt: "This is a technical presentation about machine learning"
```

**Output:**

```
text: "WEBVTT

00:00.000 --> 00:05.000
Good morning, thank you for joining us today.

00:05.000 --> 00:12.000
I'm excited to discuss the new project developments..."
```
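A VTT transcript like the one above can be post-processed into structured cues for downstream use (search indexing, subtitle editing). A minimal parser sketch, using only the standard library (the helper is illustrative, not part of the node):

```python
import re

# Matches cue timing lines such as "00:00.000 --> 00:05.000"
CUE_RE = re.compile(r"(\d[\d:.]*)\s*-->\s*(\d[\d:.]*)")

def parse_vtt(vtt_text: str) -> list:
    """Return (start, end, text) tuples from a WEBVTT transcript."""
    cues = []
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        m = CUE_RE.match(lines[i].strip())
        if m:
            start, end = m.groups()
            i += 1
            text_lines = []
            # A cue's text runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                text_lines.append(lines[i].strip())
                i += 1
            cues.append((start, end, " ".join(text_lines)))
        else:
            i += 1
    return cues
```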

### Example 3: Non-English Audio with Context

**Input:**

```
audio_file: "https://example.com/spanish_podcast.mp3"
model: "whisper-1"
response_format: "json"
language: "es"
prompt: "This is a Spanish business podcast discussing marketing strategies"
temperature: 0.2
```

**Output:**

```
text: "Buenos días, gracias por acompañarnos. Hoy hablaremos sobre nuevas estrategias de marketing digital..."
```

## Common Use Cases

* **Meeting Transcriptions**: Convert recorded meetings into text transcripts for documentation
* **Podcast Transcription**: Create searchable text versions of podcast episodes
* **Accessibility**: Generate captions and transcripts for video content
* **Customer Support**: Transcribe customer service calls for training and quality assurance
* **Research**: Convert interview recordings into text for analysis
* **Content Creation**: Generate blog post content from audio recordings
* **Legal Documentation**: Create accurate transcripts of depositions and proceedings

## Error Handling

| Error Type              | Cause                                      | Solution                                                  |
| ----------------------- | ------------------------------------------ | --------------------------------------------------------- |
| Invalid Audio Format    | Audio file format not supported            | Use MP3, MP4, or WebM formats                             |
| Audio Too Long          | Audio file exceeds size or duration limits | Split longer audio into smaller chunks                    |
| Invalid Language Code   | Language code is incorrectly formatted     | Use ISO-639-1 codes (e.g., "en", "es", "fr")              |
| Invalid Response Format | Response format not supported              | Use: text, json, srt, verbose\_json, or vtt               |
| Invalid Temperature     | Temperature outside range 0-1              | Use a value between 0.0 and 1.0                           |
| Authentication Error    | Invalid or missing API key                 | Verify OpenAI connection is properly configured           |
| File Not Found          | Audio file URL is invalid or inaccessible  | Check URL validity and ensure file is publicly accessible |
| Audio Quality Issues    | Audio is too unclear or noisy              | Try with clearer audio or lower temperature               |
| Timeout Error           | Request took too long to process           | Try with shorter audio or ensure stable connection        |
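The "Audio Too Long" fix above (splitting long recordings into chunks) can be planned with a small helper. The sketch below computes chunk boundaries with a short overlap so that words cut at a boundary land in both chunks; the 600-second chunk length and 5-second overlap are illustrative assumptions, not documented limits.

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0,
                overlap_s: float = 5.0) -> list:
    """Return (start, end) times in seconds covering the full duration.

    Each chunk is at most `chunk_s` long and overlaps the previous
    one by `overlap_s` to avoid losing words at the cut points.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk length must exceed the overlap")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s
    return chunks
```

The actual splitting could then be done with a tool such as ffmpeg, and each chunk transcribed separately; passing the tail of the previous chunk's transcript via the `prompt` parameter helps keep terminology consistent across segments, as noted in the input table.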

## Notes

* **Supported Formats**: MP3, MP4, WebM, and other common audio formats are supported. Audio files should be under 25MB.
* **Language Detection**: Whisper can auto-detect language, but specifying it (e.g., "en", "es", "fr") improves accuracy and speed.
* **Response Formats**: Choose "text" for simple transcript, "srt" or "vtt" for timestamped subtitles, "json" or "verbose\_json" for detailed metadata.
* **Prompt Engineering**: Use prompts to guide interpretation of technical terms, proper nouns, or to maintain consistency across segments.
* **Temperature Control**: Lower temperature (0.0-0.3) for accurate transcription. Higher (0.5-1.0) for more variable interpretation.
* **Multi-language Support**: Whisper supports 99+ languages. Works reliably across accents and dialects.
* **Accuracy**: For best results, use clear audio with minimal background noise. Whisper is quite robust but benefits from good audio quality.
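When working with the `verbose_json` format, segment timings are returned as floating-point seconds; converting them to subtitle timestamps is a common follow-up step. A small formatting sketch (the helpers are illustrative, not part of the node):

```python
def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_srt_timestamp(seconds: float) -> str:
    """Same layout, but SRT separates milliseconds with a comma."""
    return to_vtt_timestamp(seconds).replace(".", ",")
```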


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agenticflow.ai/reference/nodes/openai_transcriptions.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
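The query URL can be built programmatically with the standard library, which takes care of URL-encoding the question (the endpoint is the one documented above):

```python
from urllib.parse import urlencode

DOC_URL = "https://docs.agenticflow.ai/reference/nodes/openai_transcriptions.md"

def ask_url(question: str) -> str:
    """Return the GET URL for a documentation question, URL-encoded."""
    return f"{DOC_URL}?{urlencode({'ask': question})}"

# The request itself could then be made with, e.g.,
# urllib.request.urlopen(ask_url("What audio formats are supported?"))
```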
