Extract Content
Action ID: extract_content
Description
Extract structured content from text using a specified schema.
Input Parameters
extract_from (string, required): The content to extract from.
extract_schema (object, required): A JSON schema representing the structure for extracted content.
model (dropdown, optional, default: gpt-3.5-turbo-0613): The LLM to use for extracting content. Available options: gpt-3.5-turbo-0613, gpt-4-32k-0613.
Output Parameters
extracted_content (array): The extracted content from the text, as an array of objects matching the provided schema.
How It Works
This node uses large language models (LLMs) to intelligently parse unstructured text and extract specific information based on a JSON schema you define. The LLM analyzes the input text, identifies relevant data points matching your schema structure, and returns the extracted information in a structured, consistent format. This enables automated data extraction from emails, documents, web content, and other text sources.
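The idea of output "matching your schema structure" can be illustrated with a small check. The following is only a sketch: `matches_schema` is a hypothetical helper, not part of the node, and it covers just the `type`, `properties`, and `items` keywords used in the examples on this page rather than full JSON Schema validation.

```python
# Hypothetical helper (not part of the node): checks a value against the
# small subset of JSON Schema used on this page ("type", "properties", "items").
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def matches_schema(value, schema):
    expected = schema.get("type")
    if expected is None:
        return True  # no type constraint to check
    py_type = JSON_TYPES.get(expected)
    if py_type is None:
        return False  # type keyword not supported by this sketch
    # bool is a subclass of int in Python, so reject it explicitly for "number"
    if expected == "number" and isinstance(value, bool):
        return False
    if not isinstance(value, py_type):
        return False
    if expected == "object":
        props = schema.get("properties", {})
        # Only properties that are present are checked; missing ones are allowed
        return all(matches_schema(value[k], sub)
                   for k, sub in props.items() if k in value)
    if expected == "array" and "items" in schema:
        return all(matches_schema(v, schema["items"]) for v in value)
    return True


product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}

good = {"product_name": "Samsung Galaxy S24", "price": 899,
        "colors": ["blue", "black", "silver"]}
bad = {"product_name": "Samsung Galaxy S24", "price": "899"}

print(matches_schema(good, product_schema))  # True
print(matches_schema(bad, product_schema))   # False
```

A check like this can be run on each object in extracted_content before passing the data to later steps, since an LLM may occasionally return a value of the wrong type.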
Usage Examples
Example 1: Extract Contact Information
Input:
extract_from: "John Smith is the CEO. You can reach him at [email protected] or call (555) 123-4567. His office is at 123 Main St, New York, NY 10001."
extract_schema: {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"},
"phone": {"type": "string"},
"address": {"type": "string"}
}
}
model: "gpt-3.5-turbo-0613"
Output:
extracted_content: [
{
"name": "John Smith",
"email": "[email protected]",
"phone": "(555) 123-4567",
"address": "123 Main St, New York, NY 10001"
}
]
Example 2: Extract Product Information
Input:
extract_from: "We have the iPhone 15 Pro available for $999. The Samsung Galaxy S24 costs $899 and comes in blue, black, and silver. The Google Pixel 8 is priced at $699."
extract_schema: {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "number"},
"colors": {"type": "array", "items": {"type": "string"}}
}
}
model: "gpt-4-32k-0613"
Output:
extracted_content: [
{
"product_name": "iPhone 15 Pro",
"price": 999,
"colors": []
},
{
"product_name": "Samsung Galaxy S24",
"price": 899,
"colors": ["blue", "black", "silver"]
},
{
"product_name": "Google Pixel 8",
"price": 699,
"colors": []
}
]
Example 3: Extract Event Details
Input:
extract_from: "Join us for the Tech Summit 2024 on March 15-17 at the Convention Center. Registration opens at 8 AM. Keynote speaker: Dr. Jane Doe. Cost: $299 for early bird."
extract_schema: {
"type": "object",
"properties": {
"event_name": {"type": "string"},
"dates": {"type": "string"},
"location": {"type": "string"},
"speaker": {"type": "string"},
"price": {"type": "number"}
}
}
model: "gpt-3.5-turbo-0613"
Output:
extracted_content: [
{
"event_name": "Tech Summit 2024",
"dates": "March 15-17",
"location": "Convention Center",
"speaker": "Dr. Jane Doe",
"price": 299
}
]
Common Use Cases
Email Processing: Extract key information like names, dates, and action items from emails
Document Parsing: Pull structured data from invoices, contracts, and business documents
Web Scraping: Extract specific data points from web page content
Customer Data Extraction: Parse customer inquiries to extract contact details and requirements
Product Catalog Creation: Extract product details from descriptions to build structured catalogs
Resume Parsing: Extract candidate information like skills, experience, and education from resumes
Lead Generation: Extract business contact information from various text sources
Error Handling
Invalid Schema: The JSON schema is malformed or invalid. Validate your JSON schema structure and ensure it follows proper JSON syntax.
Model Error: The LLM API is unavailable or rate limited. Retry the operation or switch to an alternative model.
Empty Content: The extract_from field is empty or null. Provide valid text content for extraction.
Schema Mismatch: The content doesn't match the expected schema structure. Adjust the schema to match the actual content structure, or provide appropriate content.
Token Limit Exceeded: The input text is too long for the model. Split the text into smaller chunks, or use gpt-4-32k-0613 for larger content.
Authentication Failed: The API credentials are invalid. Verify your OpenAI API connection and credentials.
Extraction Failed: The LLM was unable to extract matching data. Simplify the schema, or provide more explicit content that matches the schema.
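For the Token Limit Exceeded case, splitting the text before it reaches the node might look like the sketch below. `split_into_chunks` and its `max_chars` default are assumptions made for illustration. Character count is only a rough proxy for tokens, so the budget would need tuning for the chosen model.

```python
import re


def split_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters,
    breaking at sentence boundaries where possible.

    Hypothetical helper; max_chars is an arbitrary illustrative budget.
    """
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is cut hard so no chunk exceeds the budget
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_chars=9))
# ['One. Two.', 'Three.']
```

Each chunk can then be passed as extract_from in a separate run. Note that an item whose details straddle a chunk boundary may be extracted incompletely, so some overlap or a larger budget can help.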
Notes
Schema Design: Design clear, specific schemas that match your expected data structure. Include field types and nested objects as needed.
Model Selection: Use gpt-3.5-turbo-0613 for simple extractions and gpt-4-32k-0613 for complex content or higher accuracy requirements.
Content Quality: Better structured input text produces more accurate extraction results.
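Array Output: The node returns an array of objects, allowing extraction of multiple items from a single text source. As the usage examples show, the schema describes a single item, and each matching item in the text becomes one object in the array.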
Data Types: Ensure your schema specifies appropriate data types (string, number, boolean, array, object) for accurate extraction.
Performance: Extraction time varies with content length and model choice. GPT-4 is generally slower but more accurate than GPT-3.5.
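Because every run returns an array, results from several runs (for example, one run per section of a long document) can be combined downstream. The sketch below shows one possible way to do this. `merge_extractions` is a hypothetical helper, not something the node provides, and it treats two objects as duplicates only when all their fields are identical.

```python
import json


def merge_extractions(runs):
    """Concatenate the arrays from several runs, dropping exact duplicates.

    Hypothetical helper; `runs` is a list of extracted_content arrays.
    """
    seen = set()
    merged = []
    for items in runs:
        for item in items:
            # Serialize with sorted keys so key order does not affect equality
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


run_a = [{"product_name": "iPhone 15 Pro", "price": 999, "colors": []}]
run_b = [
    {"price": 999, "product_name": "iPhone 15 Pro", "colors": []},
    {"product_name": "Google Pixel 8", "price": 699, "colors": []},
]

merged = merge_extractions([run_a, run_b])
print(len(merged))  # 2
```

Exact-match deduplication is deliberately conservative: two objects that describe the same item with slightly different wording are both kept.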