# Extract Content

**Action ID:** `extract_content`

## Description

Extract structured content from text using a specified schema.

## Input Parameters

| Name            | Type     | Required | Default            | Description                                                                                        |
| --------------- | -------- | :------: | ------------------ | -------------------------------------------------------------------------------------------------- |
| extract\_from   | string   |     ✓    | -                  | The content to extract from                                                                        |
| extract\_schema | object   |     ✓    | -                  | A JSON schema representing the structure for extracted content                                     |
| model           | dropdown |     -    | gpt-3.5-turbo-0613 | The LLM model to use for extracting content. Available options: gpt-3.5-turbo-0613, gpt-4-32k-0613 |

<details>

<summary>View JSON Schema</summary>

**Input Schema**

```json
{
  "$defs": {
    "AllowedModels": {
      "description": "Allowed models for the web scraping node.",
      "enum": [
        "gpt-3.5-turbo-0613",
        "gpt-4-32k-0613"
      ],
      "title": "AllowedModels",
      "type": "string"
    }
  },
  "description": "Extract content node input.",
  "properties": {
    "extract_from": {
      "description": "The content to extract from.",
      "title": "Extract From",
      "type": "string"
    },
    "extract_schema": {
      "additionalProperties": true,
      "description": "A json string represent schema for the extracted content.",
      "title": "Extract Schema",
      "type": "object"
    },
    "model": {
      "$ref": "#/$defs/AllowedModels",
      "default": "gpt-3.5-turbo-0613",
      "description": "The LLM model to use for extracting content.",
      "title": "Model"
    }
  },
  "required": [
    "extract_from",
    "extract_schema"
  ],
  "title": "ExtractContentNodeInput",
  "type": "object"
}
```

</details>

## Output Parameters

| Name               | Type  | Description                                                                             |
| ------------------ | ----- | --------------------------------------------------------------------------------------- |
| extracted\_content | array | The extracted content from the text as an array of objects matching the provided schema |

<details>

<summary>View JSON Schema</summary>

**Output Schema**

```json
{
  "description": "Extract content node output.",
  "properties": {
    "extracted_content": {
      "description": "The extracted content from the web URLs.",
      "items": {
        "additionalProperties": true,
        "type": "object"
      },
      "title": "Extracted Content",
      "type": "array"
    }
  },
  "required": [
    "extracted_content"
  ],
  "title": "ExtractContentNodeOutput",
  "type": "object"
}
```

</details>

## How It Works

This node uses a large language model (LLM) to parse unstructured text and extract the fields defined in your JSON schema. The model reads the input text, identifies the data points that match the schema's properties, and returns them as an array of objects in that shape. This enables automated data extraction from emails, documents, web content, and other text sources.
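As a rough illustration of the inputs described above, the following Python sketch assembles a payload with the three documented fields and checks it against the constraints in the Input Schema. The helper name `build_extract_input` is hypothetical and not part of the product; how the payload is actually submitted to the node is not covered here.

```python
import json

# Model identifiers listed in the Input Schema's AllowedModels enum.
ALLOWED_MODELS = ("gpt-3.5-turbo-0613", "gpt-4-32k-0613")


def build_extract_input(extract_from, extract_schema, model="gpt-3.5-turbo-0613"):
    """Assemble an input payload for the node (hypothetical helper, not a product API)."""
    if not isinstance(extract_from, str) or not extract_from.strip():
        raise ValueError("extract_from must be a non-empty string")
    if not isinstance(extract_schema, dict):
        raise ValueError("extract_schema must be a JSON object (dict)")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    # Serializing confirms the schema contains only JSON-compatible values.
    json.dumps(extract_schema)
    return {
        "extract_from": extract_from,
        "extract_schema": extract_schema,
        "model": model,
    }


payload = build_extract_input(
    "John Smith is the CEO. You can reach him at john@example.com.",
    {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
    },
)
print(json.dumps(payload, indent=2))
```

Checking the inputs before the node runs is one way to catch the "Empty Content" and "Invalid Schema" cases early, without spending an LLM call.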

## Usage Examples

### Example 1: Extract Contact Information

**Input:**

```
extract_from: "John Smith is the CEO. You can reach him at john@example.com or call (555) 123-4567. His office is at 123 Main St, New York, NY 10001."
extract_schema: {
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"},
    "address": {"type": "string"}
  }
}
model: "gpt-3.5-turbo-0613"
```

**Output:**

```
extracted_content: [
  {
    "name": "John Smith",
    "email": "john@example.com",
    "phone": "(555) 123-4567",
    "address": "123 Main St, New York, NY 10001"
  }
]
```

### Example 2: Extract Product Information

**Input:**

```
extract_from: "We have the iPhone 15 Pro available for $999. The Samsung Galaxy S24 costs $899 and comes in blue, black, and silver. The Google Pixel 8 is priced at $699."
extract_schema: {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "price": {"type": "number"},
    "colors": {"type": "array", "items": {"type": "string"}}
  }
}
model: "gpt-4-32k-0613"
```

**Output:**

```
extracted_content: [
  {
    "product_name": "iPhone 15 Pro",
    "price": 999,
    "colors": []
  },
  {
    "product_name": "Samsung Galaxy S24",
    "price": 899,
    "colors": ["blue", "black", "silver"]
  },
  {
    "product_name": "Google Pixel 8",
    "price": 699,
    "colors": []
  }
]
```

### Example 3: Extract Event Details

**Input:**

```
extract_from: "Join us for the Tech Summit 2024 on March 15-17 at the Convention Center. Registration opens at 8 AM. Keynote speaker: Dr. Jane Doe. Cost: $299 for early bird."
extract_schema: {
  "type": "object",
  "properties": {
    "event_name": {"type": "string"},
    "dates": {"type": "string"},
    "location": {"type": "string"},
    "speaker": {"type": "string"},
    "price": {"type": "number"}
  }
}
model: "gpt-3.5-turbo-0613"
```

**Output:**

```
extracted_content: [
  {
    "event_name": "Tech Summit 2024",
    "dates": "March 15-17",
    "location": "Convention Center",
    "speaker": "Dr. Jane Doe",
    "price": 299
  }
]
```

## Common Use Cases

* **Email Processing**: Extract key information like names, dates, and action items from emails
* **Document Parsing**: Pull structured data from invoices, contracts, and business documents
* **Web Scraping**: Extract specific data points from web page content
* **Customer Data Extraction**: Parse customer inquiries to extract contact details and requirements
* **Product Catalog Creation**: Extract product details from descriptions to build structured catalogs
* **Resume Parsing**: Extract candidate information like skills, experience, and education from resumes
* **Lead Generation**: Extract business contact information from various text sources

## Error Handling

| Error Type            | Cause                                           | Solution                                                                           |
| --------------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------- |
| Invalid Schema        | JSON schema is malformed or invalid             | Validate your JSON schema structure and ensure it follows proper JSON syntax       |
| Model Error           | LLM API is unavailable or rate limited          | Retry the operation or switch to an alternative model                              |
| Empty Content         | extract\_from field is empty or null            | Provide valid text content for extraction                                          |
| Schema Mismatch       | Content doesn't match expected schema structure | Adjust schema to match the actual content structure or provide appropriate content |
| Token Limit Exceeded  | Input text is too long for the model            | Split text into smaller chunks or use gpt-4-32k-0613 for larger content            |
| Authentication Failed | Invalid API credentials                         | Verify your OpenAI API connection and credentials                                  |
| Extraction Failed     | LLM unable to extract matching data             | Simplify the schema or provide more explicit content that matches the schema       |
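
The "split text into smaller chunks" remedy for the Token Limit Exceeded error could be sketched as follows. This is a minimal example under stated assumptions: the function name `split_into_chunks` is hypothetical, and it limits chunks by character count, which is only a rough proxy for tokens. The default of 8000 characters is an arbitrary placeholder, not a documented limit.

```python
def split_into_chunks(text, max_chars=8000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Character count is an assumed stand-in for the model's token limit.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # Skip empty paragraphs.
        # Hard-split any single paragraph that exceeds the limit on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The paragraph does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
for chunk in split_into_chunks(sample, max_chars=22):
    print(len(chunk))
```

Each chunk would then be passed as `extract_from` in a separate run, and the resulting `extracted_content` arrays concatenated. Items that straddle a chunk boundary may be split or missed, so some overlap or deduplication may be needed depending on the content.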

## Notes

* **Schema Design**: Design clear, specific schemas that match your expected data structure. Include field types and nested objects as needed.
* **Model Selection**: Use gpt-3.5-turbo-0613 for simple extractions and gpt-4-32k-0613 for complex content or higher accuracy requirements.
* **Content Quality**: Better-structured input text tends to produce more accurate extraction results.
* **Array Output**: The node returns an array of objects, allowing extraction of multiple items from a single text source.
* **Data Types**: Ensure your schema specifies appropriate data types (string, number, boolean, array, object) for accurate extraction.
* **Performance**: Extraction time varies with content length and model choice. gpt-4-32k-0613 is generally slower than gpt-3.5-turbo-0613 but tends to be more accurate.
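
Because the output comes from an LLM, it can be worth checking that each returned item actually uses the data types your schema declares. The sketch below is one possible post-processing step, not part of the node itself: `find_type_mismatches` is a hypothetical helper that covers only the five types named in the Data Types note and only top-level properties.

```python
# Map the JSON Schema type names from the Data Types note to Python types.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_type_mismatches(items, schema):
    """Return (index, field, expected_type) for each field with the wrong type."""
    mismatches = []
    properties = schema.get("properties", {})
    for index, item in enumerate(items):
        for field, spec in properties.items():
            if field not in item:
                continue  # Missing fields are not treated as mismatches here.
            expected = spec.get("type")
            python_type = JSON_TYPES.get(expected)
            if python_type is None:
                continue  # Type not covered by this sketch.
            value = item[field]
            # bool is a subclass of int in Python, so reject it for "number".
            if expected == "number" and isinstance(value, bool):
                mismatches.append((index, field, expected))
            elif not isinstance(value, python_type):
                mismatches.append((index, field, expected))
    return mismatches


schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "colors": {"type": "array", "items": {"type": "string"}},
    },
}
extracted_content = [
    {"product_name": "iPhone 15 Pro", "price": 999, "colors": []},
    {"product_name": "Samsung Galaxy S24", "price": "$899", "colors": ["blue"]},
]
print(find_type_mismatches(extracted_content, schema))
# [(1, 'price', 'number')]
```

For stricter checks, including nested objects and array item types, a full JSON Schema validator would be a better fit than this hand-rolled comparison.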


