# Extract Structured Data

**Action ID:** `openai_extract_structured_data`

## Description

Returns structured data from provided unstructured text. This node uses OpenAI's Structured Outputs capability to reliably extract and validate data according to a JSON Schema you define, ensuring the model always returns properly formatted results.

## Provider

**OpenAI**

## Connection

| Name              | Description                                                  | Required | Category |
| ----------------- | ------------------------------------------------------------ | :------: | -------- |
| OpenAI Connection | The OpenAI connection to use for extracting structured data. |     ✓    | openai   |

## Input Parameters

| Name           | Type     | Required | Default           | Description                                                                                                                                                                   |
| -------------- | -------- | :------: | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| model          | dropdown |     -    | gpt-4o-2024-08-06 | The model to use for extracting structured data. Use gpt-4o-2024-08-06 or later for best results.                                                                             |
| text           | string   |     ✓    | -                 | The text from which to extract structured data.                                                                                                                               |
| schema\_name   | string   |     -    | extracted\_data   | A name for the schema (required by OpenAI).                                                                                                                                   |
| json\_schema   | object   |     ✓    | -                 | The JSON Schema that defines the structure of the data to extract. Must be a valid JSON Schema object with type='object', properties defined, and additionalProperties=false. |
| strict         | boolean  |     -    | true              | Whether to enforce strict schema validation. When true, the model will always generate responses that adhere to the supplied schema.                                          |
| system\_prompt | string   |     -    | -                 | Optional system prompt to guide the extraction process.                                                                                                                       |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Extract Structured Data node input.",
  "properties": {
    "model": {
      "default": "gpt-4o-2024-08-06",
      "description": "The model to use for extracting structured data. Use gpt-4o-2024-08-06 or later for best results.",
      "title": "Model",
      "type": "string"
    },
    "text": {
      "description": "The text from which to extract structured data.",
      "title": "Unstructured Text",
      "type": "string"
    },
    "schema_name": {
      "default": "extracted_data",
      "description": "A name for the schema (required by OpenAI).",
      "title": "Schema Name",
      "type": "string"
    },
    "json_schema": {
      "description": "The JSON Schema that defines the structure of the data to extract. Must be a valid JSON Schema object with type='object', properties defined, and additionalProperties=false.",
      "title": "JSON Schema",
      "type": "object"
    },
    "strict": {
      "default": true,
      "description": "Whether to enforce strict schema validation. When true, the model will always generate responses that adhere to the supplied schema.",
      "title": "Strict Mode",
      "type": "boolean"
    },
    "system_prompt": {
      "description": "Optional system prompt to guide the extraction process.",
      "title": "System Prompt",
      "type": "string"
    }
  },
  "required": [
    "text",
    "json_schema"
  ],
  "title": "ExtractStructuredDataInput",
  "type": "object"
}
```

</details>

## Output Parameters

| Name | Type   | Description                                                 |
| ---- | ------ | ----------------------------------------------------------- |
| data | object | The extracted structured data matching the provided schema. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Response from extracting structured data.",
  "properties": {
    "data": {
      "description": "The extracted structured data.",
      "title": "Data",
      "type": "object"
    }
  },
  "title": "ExtractStructuredDataResponse",
  "type": "object"
}
```

</details>

## How It Works

This node leverages OpenAI's Structured Outputs feature to reliably extract data from unstructured text. You provide a JSON Schema that describes the structure you want to extract, and the model uses this schema to guide its response. In strict mode (default), the model is constrained to always return valid JSON that conforms to your schema, eliminating the need for complex parsing or validation logic. The extracted data is returned directly as a JSON object.

## Usage Examples

### Example 1: Extract Contact Information

**Input:**

```
text: "John Smith, Software Engineer, works at Acme Corp. His email is john@acme.com and phone is 555-1234."
schema_name: "contact_info"
json_schema: {
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "title": {"type": "string"},
    "company": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"}
  },
  "required": ["name", "email"],
  "additionalProperties": false
}
strict: true
```

**Output:**

```
data: {
  "name": "John Smith",
  "title": "Software Engineer",
  "company": "Acme Corp",
  "email": "john@acme.com",
  "phone": "555-1234"
}
```

### Example 2: Extract Product Information

**Input:**

```
text: "The TechWidget 3000 is a revolutionary device that costs $299.99 and comes in silver, blue, and black colors. It has a 2-year warranty and is rated 4.5 stars."
schema_name: "product"
json_schema: {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "price": {"type": "number"},
    "colors": {"type": "array", "items": {"type": "string"}},
    "warranty_years": {"type": "integer"},
    "rating": {"type": "number"}
  },
  "required": ["product_name", "price"],
  "additionalProperties": false
}
strict: true
```

**Output:**

```
data: {
  "product_name": "TechWidget 3000",
  "price": 299.99,
  "colors": ["silver", "blue", "black"],
  "warranty_years": 2,
  "rating": 4.5
}
```

## Common Use Cases

* **Data Extraction**: Extract structured information from unstructured documents
* **Form Filling**: Automatically populate form fields from text descriptions
* **Information Parsing**: Parse complex text into organized data structures
* **Document Processing**: Extract key information from PDFs, emails, or documents
* **API Data Preparation**: Convert text into JSON for downstream APIs
* **Data Classification**: Categorize and structure unstructured information
* **Entity Recognition**: Extract and organize entities with relationships

## Error Handling

| Error Type               | Cause                                                   | Solution                                                                            |
| ------------------------ | ------------------------------------------------------- | ----------------------------------------------------------------------------------- |
| Invalid Schema           | Schema missing required fields or incorrectly formatted | Ensure schema has type='object', properties defined, and additionalProperties=false |
| Schema Validation Failed | Model cannot strictly adhere to schema                  | Simplify schema, use more flexible field types, or disable strict mode              |
| Model Not Found          | Selected model doesn't exist or access not granted      | Verify model availability and ensure API key has access                             |
| Extraction Failed        | Model cannot extract data from provided text            | Provide clearer text with more explicit information                                 |
| Authentication Error     | Invalid or missing OpenAI API key                       | Verify your OpenAI connection is properly configured                                |
| Token Limit Exceeded     | Text is too long for model's context window             | Reduce text length or use a model with larger context                               |

## Notes

* **Schema Requirements**: Schemas must have type='object', define properties, and set additionalProperties=false for validation to work correctly.
* **Strict Mode**: Highly recommended for reliable extraction. When enabled, responses are guaranteed to conform to your schema.
* **Model Selection**: Use gpt-4o-2024-08-06 or later for best results with Structured Outputs.
* **System Prompts**: Use system prompts to guide extraction behavior, such as "Extract only explicitly stated information" or "Infer implied values where reasonable."
* **Nested Objects**: You can define nested properties for complex hierarchical data structures.
* **Array Handling**: Use array types for extracting lists of items with consistent structure.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agenticflow.ai/reference/nodes/openai_extract_structured_data.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
