# Text-to-Speech (OpenAI)

**Action ID:** `openai_text_to_speech`

## Description

Generate an audio recording from text using OpenAI's text-to-speech API. This node converts written text into natural-sounding speech, with a choice of six voices, four audio formats, two model quality levels, and playback speeds from 0.25 to 4.0.

## Provider

**OpenAI**

## Connection

| Name              | Description                                                 | Required | Category |
| ----------------- | ----------------------------------------------------------- | :------: | -------- |
| OpenAI Connection | The OpenAI connection to use for text-to-speech conversion. |     ✓    | openai   |

## Input Parameters

| Name       | Type     | Required | Default | Description                                                                          |
| ---------- | -------- | :------: | ------- | ------------------------------------------------------------------------------------ |
| model      | dropdown |     -    | tts-1   | The model which will generate the audio. Options: tts-1, tts-1-hd                    |
| text       | string   |     ✓    | -       | The text you want to convert to speech.                                              |
| voice      | dropdown |     -    | alloy   | The voice to generate the audio in. Options: alloy, echo, fable, onyx, nova, shimmer |
| format     | dropdown |     -    | mp3     | The format you want the audio file in. Options: mp3, opus, aac, flac                 |
| speed      | number   |     -    | 1.0     | The speed of the audio. Minimum is 0.25 and maximum is 4.00.                         |
| file\_name | string   |     -    | audio   | The name of the output audio file (without extension).                               |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Text-to-Speech node input.",
  "properties": {
    "model": {
      "default": "tts-1",
      "description": "The model which will generate the audio.",
      "enum": [
        "tts-1",
        "tts-1-hd"
      ],
      "title": "Model",
      "type": "string"
    },
    "text": {
      "description": "The text you want to convert to speech.",
      "title": "Text",
      "type": "string"
    },
    "voice": {
      "default": "alloy",
      "description": "The voice to generate the audio in.",
      "enum": [
        "alloy",
        "echo",
        "fable",
        "onyx",
        "nova",
        "shimmer"
      ],
      "title": "Voice",
      "type": "string"
    },
    "format": {
      "default": "mp3",
      "description": "The format you want the audio file in.",
      "enum": [
        "mp3",
        "opus",
        "aac",
        "flac"
      ],
      "title": "Output Format",
      "type": "string"
    },
    "speed": {
      "default": 1.0,
      "description": "The speed of the audio. Minimum is 0.25 and maximum is 4.00.",
      "maximum": 4.0,
      "minimum": 0.25,
      "title": "Speed",
      "type": "number"
    },
    "file_name": {
      "default": "audio",
      "description": "The name of the output audio file (without extension).",
      "title": "File Name",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextToSpeechInput",
  "type": "object"
}
```

</details>
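The schema constraints above can be checked before a request is sent. The following is a minimal sketch, not the node's actual validation code: the function name `validate_tts_input` and the error messages are hypothetical, while the allowed values, defaults, and speed range are taken from the schema.

```python
# Allowed values and defaults copied from the input schema above.
ALLOWED_MODELS = ("tts-1", "tts-1-hd")
ALLOWED_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
ALLOWED_FORMATS = ("mp3", "opus", "aac", "flac")

DEFAULTS = {
    "model": "tts-1",
    "voice": "alloy",
    "format": "mp3",
    "speed": 1.0,
    "file_name": "audio",
}


def validate_tts_input(params: dict) -> dict:
    """Hypothetical helper: apply schema defaults and reject invalid values."""
    if not isinstance(params.get("text"), str):
        raise ValueError("'text' is required and must be a string")

    # Caller-supplied values override the schema defaults.
    merged = {**DEFAULTS, **params}

    if merged["model"] not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    if merged["voice"] not in ALLOWED_VOICES:
        raise ValueError(f"voice must be one of {ALLOWED_VOICES}")
    if merged["format"] not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")

    speed = merged["speed"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        raise ValueError("speed must be a number")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    if not isinstance(merged["file_name"], str):
        raise ValueError("file_name must be a string")

    return merged
```

Calling `validate_tts_input({"text": "Hello"})` returns the input with every default filled in, while an out-of-range `speed` or an unknown `voice` raises `ValueError` before any API call is made.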

## Output Parameters

| Name   | Type   | Description                             |
| ------ | ------ | --------------------------------------- |
| url    | string | URL to the generated audio file.        |
| format | string | The format of the generated audio file. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Response from text-to-speech conversion.",
  "properties": {
    "url": {
      "title": "Url",
      "type": "string"
    },
    "format": {
      "title": "Format",
      "type": "string"
    }
  },
  "title": "TextToSpeechResponse",
  "type": "object"
}
```

</details>

## How It Works

This node sends your text to OpenAI's text-to-speech API along with the selected model, voice, output format, and speed. The model converts the text into natural-sounding speech using the chosen voice profile. The generated audio file is returned as a URL that can be played, downloaded, or used in subsequent workflow steps.
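For orientation, here is a rough sketch of how the node inputs might translate into a request body for OpenAI's speech endpoint (`POST /v1/audio/speech`). OpenAI's API names the text field `input` and the format field `response_format`. The mapping from this node's `text` and `format` fields is an assumption, as is the helper name `build_speech_payload`. This page does not document how the node forwards parameters or where it stores the resulting file.

```python
# Public OpenAI speech endpoint; shown for reference only, no request is made here.
OPENAI_SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_payload(node_input: dict) -> dict:
    """Hypothetical mapping from node inputs to an OpenAI request body."""
    return {
        "model": node_input.get("model", "tts-1"),
        "input": node_input["text"],  # assumed: node `text` -> API `input`
        "voice": node_input.get("voice", "alloy"),
        # assumed: node `format` -> API `response_format`
        "response_format": node_input.get("format", "mp3"),
        "speed": node_input.get("speed", 1.0),
    }


payload = build_speech_payload(
    {"text": "Alert! System update available.", "voice": "onyx", "format": "opus", "speed": 1.3}
)
print(payload)
```

The `file_name` input is not part of the API payload; presumably the node uses it when storing the audio that backs the returned `url`.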

## Usage Examples

### Example 1: Standard Quality Marketing Voiceover

**Input:**

```
model: "tts-1"
text: "Welcome to our premium product line. Experience quality and innovation combined."
voice: "nova"
format: "mp3"
speed: 1.0
file_name: "marketing_voiceover"
```

**Output:**

```
url: "https://api.openai.com/v1/audio/speech/..."
format: "mp3"
```

### Example 2: High-Quality Audiobook

**Input:**

```
model: "tts-1-hd"
text: "Chapter 1: The Beginning. It was a dark and stormy night..."
voice: "fable"
format: "aac"
speed: 0.9
file_name: "audiobook_chapter_1"
```

**Output:**

```
url: "https://api.openai.com/v1/audio/speech/..."
format: "aac"
```

### Example 3: Fast-Paced Notification

**Input:**

```
model: "tts-1"
text: "Alert! System update available. Please restart your computer."
voice: "onyx"
format: "opus"
speed: 1.3
file_name: "system_alert"
```

**Output:**

```
url: "https://api.openai.com/v1/audio/speech/..."
format: "opus"
```

## Common Use Cases

* **Audiobook Generation**: Create audiobooks from written text content
* **Voiceovers**: Generate professional voiceovers for videos and presentations
* **Accessibility**: Convert written content to audio for accessibility purposes
* **Notifications**: Create audio notifications and alerts
* **Interactive Voice Responses**: Generate dynamic responses for voice applications
* **Language Learning**: Create pronunciation audio for language learning materials
* **Marketing**: Generate professional marketing voiceovers and promotional audio

## Error Handling

| Error Type           | Cause                                                       | Solution                                              |
| -------------------- | ----------------------------------------------------------- | ----------------------------------------------------- |
| Text Too Long        | Input text exceeds maximum allowed length (4096 characters) | Split text into smaller chunks and process separately |
| Invalid Model        | Model name doesn't exist                                    | Use either tts-1 or tts-1-hd                          |
| Invalid Voice        | Voice name doesn't exist or is misspelled                   | Select from: alloy, echo, fable, onyx, nova, shimmer  |
| Invalid Format       | Audio format not supported                                  | Use: mp3, opus, aac, or flac                          |
| Invalid Speed        | Speed is outside range 0.25-4.0                             | Ensure speed is between 0.25 and 4.0                  |
| Authentication Error | Invalid or missing API key                                  | Verify OpenAI connection is properly configured       |
| Timeout Error        | Request took too long to process                            | Retry with shorter text or switch to the tts-1 model  |
| Rate Limit Exceeded  | Too many requests in a short time                           | Implement delays between requests                     |
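One common way to "implement delays between requests" is to retry with exponential backoff. The sketch below is illustrative only: `RateLimitError` is a hypothetical placeholder for whatever error your workflow surfaces when the rate limit is hit, and `call_with_backoff` is not part of this node.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit error raised by your client."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter avoids many clients retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists so the waiting behavior can be swapped out in tests. In normal use, pass only the function that triggers the text-to-speech request.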

## Notes

* **Model Selection**: tts-1 is faster and cheaper but may produce lower quality audio. tts-1-hd produces higher quality but is slower and more expensive.
* **Voice Options**: Try different voices (alloy, echo, fable, onyx, nova, shimmer) to match your brand personality or content tone.
* **Speed Control**: Range is 0.25 (slowest) to 4.0 (fastest). Use 0.9-1.1 for natural-sounding speech.
* **Format Selection**: MP3 is widely compatible. FLAC provides lossless compression. OPUS and AAC are modern efficient formats.
* **Text Limitations**: Maximum 4096 characters per request. Plan for multiple requests for longer content.
* **Audio Storage**: URLs may expire. Download or persist audio if long-term storage is needed.
* **Cost Optimization**: tts-1 is significantly cheaper. Only use tts-1-hd when high quality is critical.
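For content longer than the 4096-character limit noted above, the text has to be split before it reaches the node. The following is a minimal sketch of one way to do that, preferring sentence boundaries so each audio segment ends at a natural pause. The function name `split_text` and the sentence-splitting heuristic are illustrative assumptions, not part of this node.

```python
import re

MAX_CHARS = 4096  # per-request limit stated in the notes above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request, for example with a distinct `file_name` per chunk, and the resulting audio files concatenated or played in sequence.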

