Audio Transcription
A guide to the Audio Transcription action for converting speech into written text.
The Audio Transcription Action (also known as Speech-to-Text) uses an AI speech recognition model to convert spoken audio into written text. It takes an audio file as input and returns a readable transcript.
This is useful for:
Transcribing interviews, meetings, or customer service calls.
Creating subtitles for videos.
Making audio content searchable.
Analyzing spoken content with text-based AI tools.
How It Works
You provide an audio file as input. The action sends this file to an advanced speech recognition model, which analyzes the audio and generates a text transcription. For the highest accuracy, it's best to use audio that is clear and has minimal background noise.
Configuration
Input Parameters
Audio (File): The audio file you want to transcribe (e.g., MP3, WAV, M4A). This should be a file object from a previous action or an uploaded file.
Language (Select / Text): The language spoken in the audio. You can select it from a list or enter it as text; if left blank, the model will attempt to auto-detect the language.
Output Parameters
Output (Text): The full text transcription of the audio file.
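The input contract described above can be illustrated with a minimal Python sketch. It is not the platform's API: the function name `build_transcription_inputs`, the dictionary shape, and the `EXAMPLE_FORMATS` set are assumptions made for illustration, and the format list contains only the three examples named in this guide, so the real supported list may be longer.

```python
from pathlib import Path
from typing import Optional

# Only the formats this guide names as examples; the action may accept more.
EXAMPLE_FORMATS = {".mp3", ".wav", ".m4a"}


def build_transcription_inputs(audio_path: str, language: Optional[str] = None) -> dict:
    """Return a dict mirroring the action's two inputs (hypothetical shape)."""
    suffix = Path(audio_path).suffix.lower()
    if suffix not in EXAMPLE_FORMATS:
        raise ValueError(f"Unexpected audio format: {suffix or '(none)'}")
    # A missing or blank language stands for "let the model auto-detect".
    cleaned = (language or "").strip()
    return {"audio": audio_path, "language": cleaned or None}
```

Calling `build_transcription_inputs("call.mp3")` leaves `language` as `None`, which corresponds to leaving the Language field blank in the action.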
Example: Transcribe and Summarize a Customer Call
Imagine you want to automatically log and summarize every customer support call.
Get the Audio: Your workflow is triggered when a new call recording is saved. The audio file is the trigger's output.
Configure the Audio Transcription Action:
Audio: Connect the call recording file from the trigger ({{trigger.audio_file}}).
Language: Leave blank for auto-detection, or specify "English" if all your calls are in English.
Summarize the Transcript:
The Output of the transcription action will be the full text of the conversation.
Connect this output to an LLM Action.
Set the LLM's prompt to:
Please summarize the following customer service call transcript and identify the customer's main issue and the resolution: {{audio_transcription_action.output}}
Log the Summary:
Connect the LLM's output to a Google Sheet Action or your CRM to log the call summary for future reference.
This workflow transforms an unstructured audio recording into a concise, actionable summary, saving significant manual effort.
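To make the variable references in this example concrete, the following Python sketch shows one way a placeholder such as {{audio_transcription_action.output}} could be resolved into the LLM prompt. The platform performs this substitution for you; `render_template` and the nested-dictionary context are hypothetical names used only to illustrate the idea, not the platform's actual implementation.

```python
import re

# Matches placeholders such as {{trigger.audio_file}} or {{ step.output }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_template(template: str, context: dict) -> str:
    """Replace each {{dotted.path}} with the matching value from context."""

    def lookup(match: re.Match) -> str:
        value = context
        # Walk the dotted path one key at a time; a missing key raises KeyError.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


prompt = render_template(
    "Please summarize the following customer service call transcript: "
    "{{audio_transcription_action.output}}",
    {"audio_transcription_action": {"output": "Hello, my order arrived damaged."}},
)
```

Raising an error on an unknown placeholder, rather than silently inserting an empty string, makes a mistyped action name visible before the prompt reaches the LLM.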