Audio Transcription

Build a workflow to transcribe audio files, identify different speakers, and use an LLM to generate a summary and extract key topics and quotes.

This guide demonstrates how to build a powerful workflow that not only transcribes audio files but also performs a high-level analysis, including generating a summary, identifying key themes, and extracting notable quotes. This is perfect for processing interviews, meetings, or any recorded audio.

Goal

The workflow will take an audio file as input and produce a structured analysis containing:

  1. An accurate transcription with speaker labels and timestamps.

  2. A high-level summary of the conversation.

  3. A list of the main topics and themes discussed.

  4. A collection of key quotes from the audio.

Required Nodes & MCPs

  • File Input Node: To upload the audio file (e.g., MP3, WAV, M4A).

  • OpenAI MCP: To use the Whisper model for transcription and a GPT model for analysis.

  • Save to File Node: To save the final, structured analysis.

Workflow Steps

Step 1: Transcribe the Audio

The first step is to convert the audio into text using OpenAI's Whisper model.

  • Node: OpenAI MCP

  • Purpose: To get a clean transcription of the audio file.

  • Setup:

    • Action: Create Transcription

    • File: Use the output from your File Input node: {{file_input_1.file}}.

    • Response Format: verbose_json (This provides detailed segments with timestamps).

    • Enable Speaker Diarization: True (This will label the different speakers, e.g., Speaker 0, Speaker 1).

Step 2: Analyze the Transcription

Now that we have the text, we'll use a powerful GPT model to analyze it.

  • Node: OpenAI MCP (a second one)

  • Purpose: To generate a summary, identify themes, and extract quotes.

  • Setup:

    • Action: Chat

    • Model: gpt-4-turbo

    • Prompt:

      Analyze the following audio transcription. Please provide a response in a structured JSON format with three keys: "summary", "themes", and "quotes".
      
      - "summary": A concise, one-paragraph summary of the entire conversation.
      - "themes": A list of the top 5 most important topics discussed.
      - "quotes": A list of the 3-5 most impactful or representative quotes from the text, including who said them if possible.
      
      Here is the transcription:
      {{openai_mcp_1.text}}

Step 3: Save the Structured Output

Finally, we'll save the JSON output from the analysis step into a file for easy access.

  • Node: Save to File

  • Purpose: To store the complete analysis.

  • Setup:

    • File Name: analysis_output.json

    • Content: {{openai_mcp_2.choices[0].message.content}}

Final Workflow

This workflow efficiently transforms a raw audio file into a structured, analyzable JSON object. You can easily access the summary, themes, or quotes in subsequent workflow steps for further processing, such as sending the summary in an email or saving the quotes to a database.

Last updated

Was this helpful?