# Extract Structured Data with Claude

**Action ID:** `claude_extract_structured_data`

## Description

Extract structured data from text or images using Claude. You define the target schema in simple mode (a list of fields) or advanced mode (a complete JSON Schema), and the node returns the extracted values as a JSON object.

## Provider

**Anthropic Claude**

## Connection

| Name              | Description                                                  | Required | Category |
| ----------------- | ------------------------------------------------------------ | :------: | -------- |
| Claude Connection | The Claude connection to use for extracting structured data. |     ✓    | claude   |

## Input Parameters

| Name             | Type     | Required | Default                                                 | Description                                                                                                                                                                                                                |
| ---------------- | -------- | :------: | ------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| model            | dropdown |     -    | claude-3-haiku-20240307                                 | The model to use for extracting structured data. Available options: claude-3-haiku-20240307, claude-3-sonnet-20240229, claude-3-opus-20240229, claude-3-5-sonnet-latest, claude-3-5-haiku-latest, claude-3-7-sonnet-latest |
| text             | string   |     -    | -                                                       | Text to extract structured data from.                                                                                                                                                                                      |
| images           | array    |     -    | -                                                       | Images to extract structured data from. Supported formats: JPG, PNG, JPEG                                                                                                                                                  |
| prompt           | string   |     -    | "Extract the following data from the provided content." | Prompt to guide the AI in extracting structured data.                                                                                                                                                                      |
| schema\_mode     | dropdown |     -    | simple                                                  | Mode for defining the schema. Available options: simple, advanced                                                                                                                                                          |
| simple\_schema   | array    |     -    | -                                                       | Schema definition in simple mode. Each entry should define a field with name, description, type, and whether it's required.                                                                                                |
| advanced\_schema | object   |     -    | -                                                       | JSON Schema for advanced mode. Provide a complete JSON Schema object for complex data extraction.                                                                                                                          |
| max\_tokens      | integer  |     -    | 2000                                                    | The maximum number of tokens to generate.                                                                                                                                                                                  |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Extract Structured Data node input.",
  "properties": {
    "model": {
      "default": "claude-3-haiku-20240307",
      "description": "The model to use for extracting structured data.",
      "enum": [
        "claude-3-haiku-20240307",
        "claude-3-sonnet-20240229",
        "claude-3-opus-20240229",
        "claude-3-5-sonnet-latest",
        "claude-3-5-haiku-latest",
        "claude-3-7-sonnet-latest"
      ],
      "title": "Model",
      "type": "string"
    },
    "text": {
      "default": null,
      "description": "Text to extract structured data from.",
      "title": "Text",
      "type": "string"
    },
    "images": {
      "default": null,
      "description": "Images to extract structured data from.",
      "items": {
        "type": "string"
      },
      "title": "Images",
      "type": "array"
    },
    "prompt": {
      "default": "Extract the following data from the provided content.",
      "description": "Prompt to guide the AI in extracting structured data.",
      "title": "Guide Prompt",
      "type": "string"
    },
    "schema_mode": {
      "default": "simple",
      "description": "Mode for defining the schema.",
      "enum": [
        "simple",
        "advanced"
      ],
      "title": "Schema Mode",
      "type": "string"
    },
    "simple_schema": {
      "default": null,
      "description": "Schema definition in simple mode.",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "title": "Name",
            "description": "Name of the field to extract."
          },
          "description": {
            "type": "string",
            "title": "Description",
            "description": "Description of the field to extract."
          },
          "type": {
            "type": "string",
            "title": "Data Type",
            "description": "Data type of the field to extract.",
            "default": "string"
          },
          "is_required": {
            "type": "boolean",
            "title": "Required",
            "description": "Whether the field is required.",
            "default": false
          }
        }
      },
      "title": "Simple Schema",
      "type": "array"
    },
    "advanced_schema": {
      "default": null,
      "description": "JSON Schema for advanced mode.",
      "title": "Advanced Schema",
      "type": "object"
    },
    "max_tokens": {
      "default": 2000,
      "description": "The maximum number of tokens to generate.",
      "title": "Maximum Tokens",
      "type": "integer"
    }
  },
  "required": [],
  "title": "ExtractStructuredDataInput",
  "type": "object"
}
```

</details>

## Output Parameters

| Name | Type   | Description                                  |
| ---- | ------ | -------------------------------------------- |
| data | object | The structured data extracted from the input |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Extract Structured Data node output.",
  "properties": {
    "data": {
      "title": "Extracted Data",
      "type": "object",
      "description": "The structured data extracted from the input."
    }
  },
  "required": [
    "data"
  ],
  "title": "ExtractStructuredDataOutput",
  "type": "object"
}
```

</details>

## How It Works

This node sends your text or image content to Claude together with your guide prompt and your schema. Claude reads the content and extracts the values that match the fields you defined, and the node returns them as a JSON object in the `data` output. If a required field cannot be found in the input, it may be returned as null or omitted, depending on the schema settings.
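
The relationship between a simple-mode field list and an equivalent JSON Schema can be sketched as follows. The node's internal conversion is not documented on this page, so `simple_to_json_schema` is a hypothetical helper written only for illustration; it assumes that simple-mode `type` values are JSON Schema type names and that `is_required` maps to the JSON Schema `required` list.

```python
from typing import Any


def simple_to_json_schema(fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a JSON Schema object from simple-mode entries (illustrative assumption)."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for field in fields:
        # "type" defaults to "string", matching the documented simple_schema default.
        prop: dict[str, Any] = {"type": field.get("type", "string")}
        if field.get("description"):
            prop["description"] = field["description"]
        properties[field["name"]] = prop
        # "is_required" defaults to false, matching the documented default.
        if field.get("is_required", False):
            required.append(field["name"])
    return {"type": "object", "properties": properties, "required": required}


contact_fields = [
    {"name": "full_name", "description": "Full name", "type": "string", "is_required": True},
    {"name": "email", "description": "Email address", "type": "string", "is_required": True},
    {"name": "phone", "description": "Phone number", "type": "string", "is_required": False},
]
schema = simple_to_json_schema(contact_fields)
print(schema["required"])
```

A schema built this way could be passed as `advanced_schema` if you later need features that simple mode does not offer, such as nested objects.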

## Usage Examples

### Example 1: Extract Contact Information (Simple Mode)

**Input:**

```
text: "John Smith, Email: john.smith@example.com, Phone: 555-123-4567"
schema_mode: "simple"
simple_schema: [
  {"name": "full_name", "description": "Full name", "type": "string", "is_required": true},
  {"name": "email", "description": "Email address", "type": "string", "is_required": true},
  {"name": "phone", "description": "Phone number", "type": "string", "is_required": false}
]
```

**Output:**

```
data: {
  "full_name": "John Smith",
  "email": "john.smith@example.com",
  "phone": "555-123-4567"
}
```

### Example 2: Extract Product Details (Advanced Mode)

**Input:**

```
text: "Product: Blue Running Shoes, Size: 10, Price: $89.99, Colors: Blue, Red"
schema_mode: "advanced"
advanced_schema: {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "size": {"type": "string"},
    "price": {"type": "number"},
    "available_colors": {"type": "array", "items": {"type": "string"}}
  }
}
```

**Output:**

```
data: {
  "product_name": "Blue Running Shoes",
  "size": "10",
  "price": 89.99,
  "available_colors": ["Blue", "Red"]
}
```

### Example 3: Extract from Image

**Input:**

```
images: ["https://example.com/invoice.jpg"]
prompt: "Extract invoice details"
schema_mode: "simple"
simple_schema: [
  {"name": "invoice_number", "type": "string", "is_required": true},
  {"name": "total_amount", "type": "number", "is_required": true},
  {"name": "vendor", "type": "string", "is_required": true}
]
```

**Output:**

```
data: {
  "invoice_number": "INV-2024-001",
  "total_amount": 1250.50,
  "vendor": "Acme Supplies Inc"
}
```

## Common Use Cases

* **Form Data Extraction**: Extract structured information from unstructured form submissions
* **Document Processing**: Pull key information from invoices, receipts, and other business documents
* **Web Scraping**: Extract data from web pages and convert to structured JSON format
* **Image Analysis**: Extract structured information from images like screenshots or scanned documents
* **API Response Parsing**: Convert complex API responses into simplified structured formats
* **Bulk Data Migration**: Transform CSV, email, or text data into consistent structured formats
* **Contact List Building**: Extract names, emails, and contact details from various sources

## Error Handling

| Error Type           | Cause                                               | Solution                                                                     |
| -------------------- | --------------------------------------------------- | ---------------------------------------------------------------------------- |
| Invalid Schema       | Schema definition doesn't follow JSON Schema format | Review your schema syntax and ensure it's valid JSON Schema in advanced mode |
| Extraction Failed    | Content doesn't contain the required fields         | Verify the input content has the information you're trying to extract        |
| Null Values          | Required fields not found in the input              | Adjust your schema to make fields optional or improve your guide prompt      |
| Type Mismatch        | Extracted data doesn't match the defined type       | Update your schema or guide prompt to clarify the expected data types        |
| Token Limit Exceeded | Input content is too large, or the output was cut off at the max\_tokens limit | Reduce the input size; if the output is truncated, increase the max\_tokens parameter |
| Ambiguous Schema     | Schema is too vague to extract consistently         | Add more detailed field descriptions in your schema definition               |
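
The Null Values and Type Mismatch rows can also be caught in a downstream step before the data is used. The following is a minimal sketch, not part of the node itself: `check_extraction` is a hypothetical helper, and it assumes simple-mode `type` values are JSON Schema type names.

```python
from typing import Any

# Assumed mapping from JSON Schema type names to Python types.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_extraction(data: dict[str, Any], fields: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in the node's `data` output."""
    problems: list[str] = []
    for field in fields:
        name = field["name"]
        expected = field.get("type", "string")
        value = data.get(name)
        if value is None:
            # Missing keys and explicit nulls are treated the same way.
            if field.get("is_required", False):
                problems.append(f"{name}: required field is missing or null")
            continue
        allowed = _JSON_TYPES.get(expected)
        if allowed is None:
            continue  # Unknown type name: nothing to check against.
        # In Python, bool is a subclass of int, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and expected in ("number", "integer")
        if bool_as_number or not isinstance(value, allowed):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


invoice_fields = [
    {"name": "invoice_number", "type": "string", "is_required": True},
    {"name": "total_amount", "type": "number", "is_required": True},
    {"name": "vendor", "type": "string", "is_required": True},
]
extracted = {"invoice_number": "INV-2024-001", "total_amount": "1250.50", "vendor": None}
problems = check_extraction(extracted, invoice_fields)
print(problems)
```

An empty list means every required field is present and every value has the expected type; otherwise, the messages indicate which field descriptions or guide prompt wording may need adjusting.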

## Notes

* **Simple Mode**: Use simple mode for straightforward field extraction. Define fields with names, descriptions, data types, and required status.
* **Advanced Mode**: Use advanced mode for complex schemas with nested objects, arrays, and conditional fields. Provide a complete JSON Schema.
* **Schema Design**: Be specific in your schema definitions. Clear field names and detailed descriptions generally lead to more accurate and consistent extraction.
* **Image Support**: You can extract structured data from images (JPG/JPEG, PNG) in addition to text.
* **Model Selection**: Opus and Sonnet models handle complex extractions better than Haiku, especially for detailed schemas.
* **Guide Prompt**: Customize the guide prompt to provide additional context about how to extract and interpret the data.
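
As an illustration of the nested objects and arrays mentioned for advanced mode, the sketch below builds a JSON Schema in Python and serializes it for use as `advanced_schema`. All field names (`order_id`, `customer`, `line_items`, and so on) are hypothetical, and this page does not state which JSON Schema keywords the node honors, so treat it as a starting point rather than a guaranteed-supported schema.

```python
import json

# Hypothetical order schema; every field name here is illustrative.
order_schema = {
    "type": "object",
    "properties": {
        "order_id": {
            "type": "string",
            "description": "Order identifier as printed on the document",
        },
        # Nested object: customer details grouped under one key.
        "customer": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["name"],
        },
        # Array of objects: one entry per line item.
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                },
                "required": ["sku", "quantity"],
            },
        },
    },
    "required": ["order_id", "line_items"],
}

# Serialize to JSON text that can be pasted into the advanced_schema input.
advanced_schema_json = json.dumps(order_schema, indent=2)
print(advanced_schema_json)
```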


