Extract Structured Data with Claude

Action ID: claude_extract_structured_data

Description

Extract structured data from text or images using Claude. This node allows you to define schemas in simple or advanced mode and extract data in a structured format.

Provider

Anthropic Claude

Connection

Name
Description
Required
Category

Claude Connection

The Claude connection to use for extracting structured data.

claude

Input Parameters

Name
Type
Required
Default
Description

model

dropdown

-

claude-3-haiku-20240307

The model to use for extracting structured data. Available options: claude-3-haiku-20240307, claude-3-sonnet-20240229, claude-3-opus-20240229, claude-3-5-sonnet-latest, claude-3-5-haiku-latest, claude-3-7-sonnet-latest

text

string

-

-

Text to extract structured data from.

images

array

-

-

Images to extract structured data from. Supported formats: JPG, PNG, JPEG

prompt

string

-

"Extract the following data from the provided content."

Prompt to guide the AI in extracting structured data.

schema_mode

dropdown

-

simple

Mode for defining the schema. Available options: simple, advanced

simple_schema

array

-

-

Schema definition in simple mode. Each entry should define a field with name, description, type, and whether it's required.

advanced_schema

object

-

-

JSON Schema for advanced mode. Provide a complete JSON Schema object for complex data extraction.

max_tokens

integer

-

2000

The maximum number of tokens to generate.

View JSON Schema
{
  "description": "Extract Structured Data node input.",
  "properties": {
    "model": {
      "default": "claude-3-haiku-20240307",
      "description": "The model to use for extracting structured data.",
      "enum": [
        "claude-3-haiku-20240307",
        "claude-3-sonnet-20240229",
        "claude-3-opus-20240229",
        "claude-3-5-sonnet-latest",
        "claude-3-5-haiku-latest",
        "claude-3-7-sonnet-latest"
      ],
      "title": "Model",
      "type": "string"
    },
    "text": {
      "default": null,
      "description": "Text to extract structured data from.",
      "title": "Text",
      "type": "string"
    },
    "images": {
      "default": null,
      "description": "Images to extract structured data from.",
      "items": {
        "type": "string"
      },
      "title": "Images",
      "type": "array"
    },
    "prompt": {
      "default": "Extract the following data from the provided content.",
      "description": "Prompt to guide the AI in extracting structured data.",
      "title": "Guide Prompt",
      "type": "string"
    },
    "schema_mode": {
      "default": "simple",
      "description": "Mode for defining the schema.",
      "enum": [
        "simple",
        "advanced"
      ],
      "title": "Schema Mode",
      "type": "string"
    },
    "simple_schema": {
      "default": null,
      "description": "Schema definition in simple mode.",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "title": "Name",
            "description": "Name of the field to extract."
          },
          "description": {
            "type": "string",
            "title": "Description",
            "description": "Description of the field to extract."
          },
          "type": {
            "type": "string",
            "title": "Data Type",
            "description": "Data type of the field to extract.",
            "default": "string"
          },
          "is_required": {
            "type": "boolean",
            "title": "Required",
            "description": "Whether the field is required.",
            "default": false
          }
        }
      },
      "title": "Simple Schema",
      "type": "array"
    },
    "advanced_schema": {
      "default": null,
      "description": "JSON Schema for advanced mode.",
      "title": "Advanced Schema",
      "type": "object"
    },
    "max_tokens": {
      "default": 2000,
      "description": "The maximum number of tokens to generate.",
      "title": "Maximum Tokens",
      "type": "integer"
    }
  },
  "required": [],
  "title": "ExtractStructuredDataInput",
  "type": "object"
}

Output Parameters

Name
Type
Description

data

object

The structured data extracted from the input

View JSON Schema
{
  "description": "Extract Structured Data node output.",
  "properties": {
    "data": {
      "title": "Extracted Data",
      "type": "object",
      "description": "The structured data extracted from the input."
    }
  },
  "required": [
    "data"
  ],
  "title": "ExtractStructuredDataOutput",
  "type": "object"
}

How It Works

This node sends your text or image content to Claude along with your defined schema. Claude analyzes the content against your schema definitions and extracts structured data that matches your specified fields. The extracted data is returned as a JSON object with the fields you defined. If your schema defines required fields but they're not found in the input, those fields may be null or omitted depending on the schema settings.

Usage Examples

Example 1: Extract Contact Information (Simple Mode)

Input:

text: "John Smith, Email: [email protected], Phone: 555-123-4567"
schema_mode: "simple"
simple_schema: [
  {"name": "full_name", "description": "Full name", "type": "string", "is_required": true},
  {"name": "email", "description": "Email address", "type": "string", "is_required": true},
  {"name": "phone", "description": "Phone number", "type": "string", "is_required": false}
]

Output:

data: {
  "full_name": "John Smith",
  "email": "[email protected]",
  "phone": "555-123-4567"
}

Example 2: Extract Product Details (Advanced Mode)

Input:

text: "Product: Blue Running Shoes, Size: 10, Price: $89.99, Colors: Blue, Red"
schema_mode: "advanced"
advanced_schema: {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "size": {"type": "string"},
    "price": {"type": "number"},
    "available_colors": {"type": "array", "items": {"type": "string"}}
  }
}

Output:

data: {
  "product_name": "Blue Running Shoes",
  "size": "10",
  "price": 89.99,
  "available_colors": ["Blue", "Red"]
}

Example 3: Extract from Image

Input:

images: ["https://example.com/invoice.jpg"]
prompt: "Extract invoice details"
schema_mode: "simple"
simple_schema: [
  {"name": "invoice_number", "type": "string", "is_required": true},
  {"name": "total_amount", "type": "number", "is_required": true},
  {"name": "vendor", "type": "string", "is_required": true}
]

Output:

data: {
  "invoice_number": "INV-2024-001",
  "total_amount": 1250.50,
  "vendor": "Acme Supplies Inc"
}

Common Use Cases

  • Form Data Extraction: Extract structured information from unstructured form submissions

  • Document Processing: Pull key information from invoices, receipts, and other business documents

  • Web Scraping: Extract data from web pages and convert to structured JSON format

  • Image Analysis: Extract structured information from images like screenshots or scanned documents

  • API Response Parsing: Convert complex API responses into simplified structured formats

  • Bulk Data Migration: Transform CSV, email, or text data into consistent structured formats

  • Contact List Building: Extract names, emails, and contact details from various sources

Error Handling

Error Type
Cause
Solution

Invalid Schema

Schema definition doesn't follow JSON Schema format

Review your schema syntax and ensure it's valid JSON Schema in advanced mode

Extraction Failed

Content doesn't contain the required fields

Verify the input content has the information you're trying to extract

Null Values

Required fields not found in the input

Adjust your schema to make fields optional or improve your guide prompt

Type Mismatch

Extracted data doesn't match the defined type

Update your schema or guide prompt to clarify the expected data types

Token Limit Exceeded

Input content is too large

Reduce input size or increase max_tokens parameter

Ambiguous Schema

Schema is too vague to extract consistently

Add more detailed field descriptions in your schema definition

Notes

  • Simple Mode: Use simple mode for straightforward field extraction. Define fields with names, descriptions, data types, and required status.

  • Advanced Mode: Use advanced mode for complex schemas with nested objects, arrays, and conditional fields. Provide a complete JSON Schema.

  • Schema Design: Be specific in your schema definitions. The more detailed your schema, the more accurate the extraction.

  • Image Support: You can extract structured data from images (JPG, PNG, JPEG) in addition to text.

  • Model Selection: Opus and Sonnet models handle complex extractions better than Haiku, especially for detailed schemas.

  • Guide Prompt: Customize the guide prompt to provide additional context about how to extract and interpret the data.

Last updated

Was this helpful?