Extract Structured Data
Action ID: openai_extract_structured_data
Description
Returns structured data from provided unstructured text. This node uses OpenAI's Structured Outputs capability to reliably extract and validate data according to a JSON Schema you define, ensuring the model always returns properly formatted results.
Provider
OpenAI
Connection
OpenAI Connection
The OpenAI connection to use for extracting structured data.
✓
openai
Input Parameters
model
dropdown
-
gpt-4o-2024-08-06
The model to use for extracting structured data. Use gpt-4o-2024-08-06 or later for best results.
text
string
✓
-
The text from which to extract structured data.
schema_name
string
-
extracted_data
A name for the schema (required by OpenAI).
json_schema
object
✓
-
The JSON Schema that defines the structure of the data to extract. Must be a valid JSON Schema object with type='object', properties defined, and additionalProperties=false.
strict
boolean
-
true
Whether to enforce strict schema validation. When true, the model will always generate responses that adhere to the supplied schema.
system_prompt
string
-
-
Optional system prompt to guide the extraction process.
Output Parameters
data
object
The extracted structured data matching the provided schema.
How It Works
This node leverages OpenAI's Structured Outputs feature to reliably extract data from unstructured text. You provide a JSON Schema that describes the structure you want to extract, and the model uses this schema to guide its response. In strict mode (default), the model is constrained to always return valid JSON that conforms to your schema, eliminating the need for complex parsing or validation logic. The extracted data is returned directly as a JSON object.
Usage Examples
Example 1: Extract Contact Information
Input:
text: "John Smith, Software Engineer, works at Acme Corp. His email is [email protected] and phone is 555-1234."
schema_name: "contact_info"
json_schema: {
"type": "object",
"properties": {
"name": {"type": "string"},
"title": {"type": "string"},
"company": {"type": "string"},
"email": {"type": "string"},
"phone": {"type": "string"}
},
"required": ["name", "email"],
"additionalProperties": false
}
strict: trueOutput:
data: {
"name": "John Smith",
"title": "Software Engineer",
"company": "Acme Corp",
"email": "[email protected]",
"phone": "555-1234"
}Example 2: Extract Product Information
Input:
text: "The TechWidget 3000 is a revolutionary device that costs $299.99 and comes in silver, blue, and black colors. It has a 2-year warranty and is rated 4.5 stars."
schema_name: "product"
json_schema: {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "number"},
"colors": {"type": "array", "items": {"type": "string"}},
"warranty_years": {"type": "integer"},
"rating": {"type": "number"}
},
"required": ["product_name", "price"],
"additionalProperties": false
}
strict: trueOutput:
data: {
"product_name": "TechWidget 3000",
"price": 299.99,
"colors": ["silver", "blue", "black"],
"warranty_years": 2,
"rating": 4.5
}Common Use Cases
Data Extraction: Extract structured information from unstructured documents
Form Filling: Automatically populate form fields from text descriptions
Information Parsing: Parse complex text into organized data structures
Document Processing: Extract key information from PDFs, emails, or documents
API Data Preparation: Convert text into JSON for downstream APIs
Data Classification: Categorize and structure unstructured information
Entity Recognition: Extract and organize entities with relationships
Error Handling
Invalid Schema
Schema missing required fields or incorrectly formatted
Ensure schema has type='object', properties defined, and additionalProperties=false
Schema Validation Failed
Model cannot strictly adhere to schema
Simplify schema, use more flexible field types, or disable strict mode
Model Not Found
Selected model doesn't exist or access not granted
Verify model availability and ensure API key has access
Extraction Failed
Model cannot extract data from provided text
Provide clearer text with more explicit information
Authentication Error
Invalid or missing OpenAI API key
Verify your OpenAI connection is properly configured
Token Limit Exceeded
Text is too long for model's context window
Reduce text length or use a model with larger context
Notes
Schema Requirements: Schemas must have type='object', define properties, and set additionalProperties=false for validation to work correctly.
Strict Mode: Highly recommended for reliable extraction. When enabled, responses are guaranteed to conform to your schema.
Model Selection: Use gpt-4o-2024-08-06 or later for best results with Structured Outputs.
System Prompts: Use system prompts to guide extraction behavior, such as "Extract only explicitly stated information" or "Infer implied values where reasonable."
Nested Objects: You can define nested properties for complex hierarchical data structures.
Array Handling: Use array types for extracting lists of items with consistent structure.
Last updated
Was this helpful?