Extract Structured Data

Action ID: openai_extract_structured_data

Description

Returns structured data from provided unstructured text. This node uses OpenAI's Structured Outputs capability to reliably extract and validate data according to a JSON Schema you define, ensuring the model always returns properly formatted results.

Provider

OpenAI

Connection

Name
Description
Required
Category

OpenAI Connection

The OpenAI connection to use for extracting structured data.

openai

Input Parameters

Name
Type
Required
Default
Description

model

dropdown

-

gpt-4o-2024-08-06

The model to use for extracting structured data. Use gpt-4o-2024-08-06 or later for best results.

text

string

-

The text from which to extract structured data.

schema_name

string

-

extracted_data

A name for the schema (required by OpenAI).

json_schema

object

-

The JSON Schema that defines the structure of the data to extract. Must be a valid JSON Schema object with type='object', properties defined, and additionalProperties=false.

strict

boolean

-

true

Whether to enforce strict schema validation. When true, the model will always generate responses that adhere to the supplied schema.

system_prompt

string

-

-

Optional system prompt to guide the extraction process.

View JSON Schema
{
  "description": "Extract Structured Data node input.",
  "properties": {
    "model": {
      "default": "gpt-4o-2024-08-06",
      "description": "The model to use for extracting structured data. Use gpt-4o-2024-08-06 or later for best results.",
      "title": "Model",
      "type": "string"
    },
    "text": {
      "description": "The text from which to extract structured data.",
      "title": "Unstructured Text",
      "type": "string"
    },
    "schema_name": {
      "default": "extracted_data",
      "description": "A name for the schema (required by OpenAI).",
      "title": "Schema Name",
      "type": "string"
    },
    "json_schema": {
      "description": "The JSON Schema that defines the structure of the data to extract. Must be a valid JSON Schema object with type='object', properties defined, and additionalProperties=false.",
      "title": "JSON Schema",
      "type": "object"
    },
    "strict": {
      "default": true,
      "description": "Whether to enforce strict schema validation. When true, the model will always generate responses that adhere to the supplied schema.",
      "title": "Strict Mode",
      "type": "boolean"
    },
    "system_prompt": {
      "description": "Optional system prompt to guide the extraction process.",
      "title": "System Prompt",
      "type": "string"
    }
  },
  "required": [
    "text",
    "json_schema"
  ],
  "title": "ExtractStructuredDataInput",
  "type": "object"
}

Output Parameters

Name
Type
Description

data

object

The extracted structured data matching the provided schema.

View JSON Schema
{
  "description": "Response from extracting structured data.",
  "properties": {
    "data": {
      "description": "The extracted structured data.",
      "title": "Data",
      "type": "object"
    }
  },
  "title": "ExtractStructuredDataResponse",
  "type": "object"
}

How It Works

This node leverages OpenAI's Structured Outputs feature to reliably extract data from unstructured text. You provide a JSON Schema that describes the structure you want to extract, and the model uses this schema to guide its response. In strict mode (default), the model is constrained to always return valid JSON that conforms to your schema, eliminating the need for complex parsing or validation logic. The extracted data is returned directly as a JSON object.

Usage Examples

Example 1: Extract Contact Information

Input:

text: "John Smith, Software Engineer, works at Acme Corp. His email is [email protected] and phone is 555-1234."
schema_name: "contact_info"
json_schema: {
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "title": {"type": "string"},
    "company": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"}
  },
  "required": ["name", "email"],
  "additionalProperties": false
}
strict: true

Output:

data: {
  "name": "John Smith",
  "title": "Software Engineer",
  "company": "Acme Corp",
  "email": "[email protected]",
  "phone": "555-1234"
}

Example 2: Extract Product Information

Input:

text: "The TechWidget 3000 is a revolutionary device that costs $299.99 and comes in silver, blue, and black colors. It has a 2-year warranty and is rated 4.5 stars."
schema_name: "product"
json_schema: {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "price": {"type": "number"},
    "colors": {"type": "array", "items": {"type": "string"}},
    "warranty_years": {"type": "integer"},
    "rating": {"type": "number"}
  },
  "required": ["product_name", "price"],
  "additionalProperties": false
}
strict: true

Output:

data: {
  "product_name": "TechWidget 3000",
  "price": 299.99,
  "colors": ["silver", "blue", "black"],
  "warranty_years": 2,
  "rating": 4.5
}

Common Use Cases

  • Data Extraction: Extract structured information from unstructured documents

  • Form Filling: Automatically populate form fields from text descriptions

  • Information Parsing: Parse complex text into organized data structures

  • Document Processing: Extract key information from PDFs, emails, or documents

  • API Data Preparation: Convert text into JSON for downstream APIs

  • Data Classification: Categorize and structure unstructured information

  • Entity Recognition: Extract and organize entities with relationships

Error Handling

Error Type
Cause
Solution

Invalid Schema

Schema missing required fields or incorrectly formatted

Ensure schema has type='object', properties defined, and additionalProperties=false

Schema Validation Failed

Model cannot strictly adhere to schema

Simplify schema, use more flexible field types, or disable strict mode

Model Not Found

Selected model doesn't exist or access not granted

Verify model availability and ensure API key has access

Extraction Failed

Model cannot extract data from provided text

Provide clearer text with more explicit information

Authentication Error

Invalid or missing OpenAI API key

Verify your OpenAI connection is properly configured

Token Limit Exceeded

Text is too long for model's context window

Reduce text length or use a model with larger context

Notes

  • Schema Requirements: Schemas must have type='object', define properties, and set additionalProperties=false for validation to work correctly.

  • Strict Mode: Highly recommended for reliable extraction. When enabled, responses are guaranteed to conform to your schema.

  • Model Selection: Use gpt-4o-2024-08-06 or later for best results with Structured Outputs.

  • System Prompts: Use system prompts to guide extraction behavior, such as "Extract only explicitly stated information" or "Infer implied values where reasonable."

  • Nested Objects: You can define nested properties for complex hierarchical data structures.

  • Array Handling: Use array types for extracting lists of items with consistent structure.

Last updated

Was this helpful?