# Firecrawl Extract

**Action ID:** `firecrawl_extract`

## Description

Extract structured data from websites using natural language prompts or JSON schemas. This node uses Firecrawl's AI-powered extraction to parse webpage content and return only the data fields you request.

## Provider

**Firecrawl**

## Connection

| Name                 | Description                                      | Required | Category  |
| -------------------- | ------------------------------------------------ | :------: | --------- |
| Firecrawl Connection | The Firecrawl connection to use for the extract. |     ✓    | firecrawl |

## Input Parameters

| Name                | Type    | Required | Default | Description                                                         |
| ------------------- | ------- | :------: | ------- | ------------------------------------------------------------------- |
| urls                | array   |     ✓    | -       | URLs of the pages to extract data from                              |
| prompt              | string  |     -    | -       | Natural language description of the data to extract                 |
| schema              | object  |     -    | -       | JSON schema describing the structure of the extracted data          |
| enable\_web\_search | boolean |     -    | false   | When true, extraction can follow links outside the specified domain |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Firecrawl extract node input.",
  "properties": {
    "urls": {
      "title": "URLs",
      "type": "array",
      "items": {"type": "string", "format": "uri"},
      "description": "The URLs to extract from."
    },
    "prompt": {
      "title": "Prompt",
      "type": "string",
      "description": "The prompt to use for the extract."
    },
    "schema": {
      "title": "Schema",
      "type": "object",
      "description": "The schema to use for the extract."
    },
    "enable_web_search": {
      "title": "Enable Web Search",
      "type": "boolean",
      "default": false,
      "description": "When true, extraction can follow links outside the specified domain."
    }
  },
  "required": [
    "urls"
  ],
  "title": "FirecrawlExtractInput",
  "type": "object"
}
```

</details>
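
The input schema above can be checked before a run. The following is a minimal sketch, not part of the node itself: `validate_extract_input` is a hypothetical helper, and two of its rules are assumptions that go beyond the schema (that `urls` must be non-empty, and that only absolute `http`/`https` URLs are accepted).

```python
from __future__ import annotations

from urllib.parse import urlparse

# Field names mirror the input table above; the helper itself is hypothetical.
ALLOWED_FIELDS = {"urls", "prompt", "schema", "enable_web_search"}


def validate_extract_input(payload: dict) -> list[str]:
    """Return a list of problems found in a firecrawl_extract input payload."""
    problems = []

    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")

    urls = payload.get("urls")
    # Assumption: an empty list is treated as an error, although the schema
    # only states that `urls` is a required array.
    if not isinstance(urls, list) or not urls:
        problems.append("urls must be a non-empty array")
    else:
        for url in urls:
            parts = urlparse(url) if isinstance(url, str) else None
            # Assumption: only absolute http(s) URLs are considered valid.
            if parts is None or parts.scheme not in ("http", "https") or not parts.netloc:
                problems.append(f"not an absolute http(s) URL: {url!r}")

    if "prompt" in payload and not isinstance(payload["prompt"], str):
        problems.append("prompt must be a string")
    if "schema" in payload and not isinstance(payload["schema"], dict):
        problems.append("schema must be an object")
    if not isinstance(payload.get("enable_web_search", False), bool):
        problems.append("enable_web_search must be a boolean")

    return problems
```

An empty list means the payload matches the documented shape; it does not guarantee that the pages are reachable or that the extraction will succeed.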

## Output Parameters

| Name   | Type   | Description                           |
| ------ | ------ | ------------------------------------- |
| result | object | The output from the Firecrawl extract |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Firecrawl extract node output.",
  "properties": {
    "result": {
      "title": "Result",
      "type": "object",
      "description": "The output from the Firecrawl extract."
    }
  },
  "required": [
    "result"
  ],
  "title": "FirecrawlExtractOutput",
  "type": "object"
}
```

</details>

## How It Works

This node processes one or more URLs and extracts specific information using either a natural language prompt or a structured JSON schema. Firecrawl's AI reads the page content and extracts only the data fields you request. When you supply a schema, the result follows that structure.

## Usage Examples

### Example 1: Extract Product Information with Prompt

**Input:**

```
urls: ["https://example.com/products/laptop-pro"]
prompt: "Extract the product name, price, specs, and customer rating"
enable_web_search: false
```

**Output:**

```
result: {
  "product_name": "Laptop Pro 15",
  "price": "$1299.99",
  "specs": {
    "cpu": "Intel i9",
    "ram": "32GB",
    "storage": "1TB SSD"
  },
  "customer_rating": 4.8
}
```

### Example 2: Extract Contact Info with Schema

**Input:**

```
urls: ["https://example.com/contact"]
schema: {
  "type": "object",
  "properties": {
    "company_name": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"},
    "address": {"type": "string"}
  }
}
enable_web_search: false
```

**Output:**

```
result: {
  "company_name": "Example Corp",
  "email": "contact@example.com",
  "phone": "+1-555-123-4567",
  "address": "123 Main St, City, State 12345"
}
```

### Example 3: Multi-URL Extraction with Web Search

**Input:**

```
urls: ["https://startup.example.com/about", "https://startup.example.com/team"]
prompt: "Extract founder names, founding year, and company mission"
enable_web_search: true
```

**Output:**

```
result: {
  "founders": ["Jane Smith", "John Johnson"],
  "founding_year": 2020,
  "mission": "To revolutionize digital transformation..."
}
```

## Common Use Cases

* **Data Extraction**: Extract structured data from unstructured web pages
* **Lead Enrichment**: Extract company information, contact details, and industry data
* **Product Information Gathering**: Collect product specs, pricing, and availability across websites
* **Research Data Collection**: Extract research papers, citations, and academic information
* **Real Estate Data**: Extract property details, pricing, and features from listings
* **Job Listing Analysis**: Extract job requirements, salary ranges, and qualifications
* **E-commerce Intelligence**: Extract competitor pricing, product features, and reviews

## Error Handling

| Error Type           | Cause                                                        | Solution                                                             |
| -------------------- | ------------------------------------------------------------ | -------------------------------------------------------------------- |
| Invalid URL          | URL format is incorrect or domain doesn't exist              | Verify all URLs are valid and properly formatted                     |
| Access Denied        | Website blocks automated scraping or requires authentication | Check if web search is enabled; consider authentication if available |
| No Data Extracted    | Prompt is too vague or schema doesn't match page content     | Refine the prompt or schema to match actual page structure           |
| Invalid Schema       | JSON schema is malformed or has syntax errors                | Validate the schema against the JSON Schema specification            |
| Multiple URLs Failed | Some or all URLs are inaccessible or time out                | Check URL availability and reduce page complexity                    |
| Web Search Error     | Web search feature encounters issues following links         | Disable web search and specify exact URLs needed                     |
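
The "No Data Extracted" case in the table above can be detected in a downstream step by comparing the result with the schema you supplied. This is a minimal sketch under stated assumptions: `missing_fields` is a hypothetical helper, it inspects only top-level properties, and it treats `None`, empty strings, empty lists, and empty objects as "not extracted".

```python
from __future__ import annotations


def missing_fields(schema: dict, result: dict) -> list[str]:
    """List top-level schema properties that are absent or empty in result."""
    expected = schema.get("properties", {})
    # Assumption: None, "", [] and {} all count as "nothing was extracted".
    return [name for name in expected if result.get(name) in (None, "", [], {})]


# Schema from Example 2, checked against a deliberately incomplete result.
contact_schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"},
        "address": {"type": "string"},
    },
}
partial_result = {"company_name": "Example Corp", "email": "", "phone": "+1-555-123-4567"}

print(missing_fields(contact_schema, partial_result))
```

A non-empty list is a hint to refine the prompt or schema, or to check that the page actually contains those fields.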

## Notes

* **Multiple URLs**: You can extract from multiple URLs in a single request. The result aggregates data from all pages.
* **Prompt vs Schema**: Use prompts for flexible, natural language extraction. Use schemas for a consistent, structured output format.
* **Web Search**: Enable web search to let the extractor follow links outside the specified domain when the information isn't on the specified pages.
* **Schema Format**: Provide a schema that follows standard JSON Schema conventions for best results.
* **Rate Limits**: Respect Firecrawl's rate limits, especially when extracting from many URLs or using web search.
* **Accuracy**: More specific prompts and schemas yield more accurate extraction results.
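
One way to stay within rate limits when you have many URLs is to split them into smaller batches and run the node once per batch. The sketch below is illustrative only: `batched_urls` is a hypothetical helper, and the batch size is an arbitrary value you choose, not a documented Firecrawl limit.

```python
from __future__ import annotations

from typing import Iterator


def batched_urls(urls: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each with at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


# Each batch would become the `urls` input of one node run.
pages = [
    "https://example.com/products/1",
    "https://example.com/products/2",
    "https://example.com/products/3",
]
for batch in batched_urls(pages, batch_size=2):
    print(batch)
```

Because a single request aggregates data across its URLs, splitting into batches also means each run returns its own separate result.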

