# FireCrawl

**Action ID:** `firecrawl`

## Description

Extract structured data from web pages using Firecrawl's AI-powered extraction. Provide a URL and a natural language prompt describing the data you want, and the node returns the requested information as a JSON string.

## Input Parameters

| Name   | Type   | Required | Default | Description                                           |
| ------ | ------ | :------: | ------- | ----------------------------------------------------- |
| url    | string |     ✓    | -       | URL to crawl and extract data from                    |
| prompt | string |     ✓    | -       | Natural language description of the data to extract   |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "FireCrawl node input.",
  "properties": {
    "url": {
      "description": "URL to crawl.",
      "title": "URL",
      "type": "string"
    },
    "prompt": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "The prompt to use for the extraction without a schema.",
      "title": "Prompt"
    }
  },
  "required": [
    "url",
    "prompt"
  ],
  "title": "FireCrawlNodeInput",
  "type": "object"
}
```

</details>
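
As a rough illustration of the input contract above, the sketch below checks both required fields before a request is made. The helper name `validate_firecrawl_input` is hypothetical and not part of the node; the node may apply its own, different validation.

```python
from urllib.parse import urlparse


def validate_firecrawl_input(url: str, prompt: str) -> dict:
    """Return a payload with the two required fields, or raise ValueError.

    Hypothetical helper: it only mirrors the documented input table so that
    obvious mistakes surface before the node runs.
    """
    parsed = urlparse(url)
    # Require an absolute web URL: an http(s) scheme plus a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    # Both fields are required, so treat a blank prompt as missing.
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return {"url": url, "prompt": prompt.strip()}


payload = validate_firecrawl_input(
    "https://example.com/products/laptop-pro",
    "Extract the product name, price, description, and availability status",
)
print(payload["url"])
```

Checking inputs up front is optional, but it can avoid spending a request on a URL that is missing its protocol.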

## Output Parameters

| Name | Type               | Description                                        |
| ---- | ------------------ | -------------------------------------------------- |
| data | string (json\_str) | Data extracted from the page, as a JSON string     |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Web scraping node output.",
  "properties": {
    "data": {
      "description": "The extracted data from the web URL.",
      "title": "Data",
      "type": "string"
    }
  },
  "required": [
    "data"
  ],
  "title": "FireCrawlNodeOutput",
  "type": "object"
}
```

</details>
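
Because `data` arrives as a string rather than an object, downstream code usually needs to decode it first. The following is a minimal sketch using Python's standard `json` module. The helper name `parse_firecrawl_data` and the sample value are illustrative assumptions, and the actual keys depend on your prompt.

```python
import json
from typing import Any


def parse_firecrawl_data(data: str) -> Any:
    """Decode the node's `data` string; raise ValueError if it is not JSON."""
    try:
        return json.loads(data)
    except json.JSONDecodeError as exc:
        raise ValueError(f"data is not valid JSON: {exc}") from exc


# Sample value shaped like the node output; real keys depend on the prompt.
sample = '{"product_name": "Laptop Pro 15", "price": "$1,299.99", "availability": "In Stock"}'
product = parse_firecrawl_data(sample)
print(product["price"])  # $1,299.99
```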

## How It Works

This node uses Firecrawl's AI-powered extraction engine to parse web pages based on natural language prompts. It loads the target URL, renders any JavaScript content, and then uses the provided prompt to locate and extract the data you asked for. Because the extraction takes page structure and context into account, you describe the data in plain language rather than writing selectors, and the result is returned as structured JSON.

## Usage Examples

### Example 1: Extract Product Information

**Input:**

```
url: "https://example.com/products/laptop-pro"
prompt: "Extract the product name, price, description, and availability status"
```

**Output:**

```
data: "{\"product_name\": \"Laptop Pro 15\", \"price\": \"$1,299.99\", \"description\": \"High-performance laptop with 16GB RAM and 512GB SSD\", \"availability\": \"In Stock\"}"
```

### Example 2: Extract Article Metadata

**Input:**

```
url: "https://news.example.com/tech-article-2024"
prompt: "Get the article title, author name, publication date, and main topic"
```

**Output:**

```
data: "{\"title\": \"The Future of AI in 2024\", \"author\": \"Jane Smith\", \"publication_date\": \"2024-01-15\", \"main_topic\": \"Artificial Intelligence\"}"
```

### Example 3: Extract Contact Information

**Input:**

```
url: "https://company.example.com/contact"
prompt: "Find the company's email address, phone number, and physical address"
```

**Output:**

```
data: "{\"email\": \"contact@company.com\", \"phone\": \"+1-555-123-4567\", \"address\": \"123 Main St, San Francisco, CA 94105\"}"
```

## Common Use Cases

* **Product Data Extraction**: Extract product details, pricing, and specifications from e-commerce sites
* **Lead Generation**: Collect contact information and business details from company websites
* **News Monitoring**: Extract article content, headlines, and metadata from news sources
* **Real Estate Data**: Gather property listings, prices, and details from real estate websites
* **Job Listings**: Extract job titles, descriptions, requirements, and application details
* **Research Data Collection**: Collect structured data from various websites for research purposes
* **Competitive Intelligence**: Monitor competitor websites for pricing, product, and content changes

## Error Handling

| Error Type              | Cause                                                | Solution                                                                       |
| ----------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------ |
| Invalid URL             | URL is malformed or inaccessible                     | Verify the URL is correctly formatted and starts with `http://` or `https://`  |
| Page Not Found          | The URL returns a 404 error or doesn't exist         | Check that the URL is correct and the page is currently available              |
| Extraction Failed       | AI couldn't find data matching the prompt            | Refine your prompt to be more specific or check if the data exists on the page |
| Timeout Error           | Page took too long to load or process                | Retry the request or check if the website is experiencing issues               |
| Rate Limit Exceeded     | Too many requests to Firecrawl API                   | Implement delays between requests or upgrade your Firecrawl plan               |
| Authentication Required | Page requires login or authentication                | Use a different scraping approach for authenticated pages                      |
| Prompt Too Vague        | Prompt doesn't provide enough context for extraction | Make your prompt more specific about what data to extract and where            |
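
For the transient rows in the table above (Timeout Error and Rate Limit Exceeded), one common approach is to retry with growing delays. The sketch below assumes a hypothetical `run_node` callable that runs the extraction and a hypothetical `TransientNodeError` exception. Neither is part of this node's API, so adapt both to however your workflow reports failures.

```python
import time
from typing import Callable


class TransientNodeError(Exception):
    """Hypothetical stand-in for a timeout or rate-limit failure."""


def run_with_backoff(
    run_node: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run_node, retrying transient failures with exponential backoff."""
    delay = base_delay
    for _ in range(max_attempts - 1):
        try:
            return run_node()
        except TransientNodeError:
            sleep(delay)  # Wait before the next attempt.
            delay *= 2    # Double the wait each time: 1x, 2x, 4x, ...
    # Final attempt: any error now propagates to the caller.
    return run_node()


# Demo with a fake node that fails twice, then succeeds.
calls = {"count": 0}


def flaky_node() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientNodeError("rate limit exceeded")
    return '{"title": "ok"}'


delays: list = []
result = run_with_backoff(flaky_node, sleep=delays.append)  # record, don't wait
print(result, delays)  # delays == [1.0, 2.0]
```

Errors such as Invalid URL or Authentication Required are unlikely to succeed on retry, so they are deliberately left out of the retried exception type.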

## Notes

* **Natural Language Prompts**: Write clear, specific prompts describing exactly what data you want to extract from the page.
* **AI-Powered**: Firecrawl uses AI to understand page context, making it more flexible than traditional CSS selector-based scraping.
* **JavaScript Rendering**: The service renders JavaScript, so it works with modern single-page applications.
* **Prompt Quality**: More specific prompts yield better results. Describe the data type, format, and location if possible.
* **JSON Output**: Extracted data is returned as a JSON string, which can be parsed for use in subsequent workflow nodes.
* **No Schema Required**: Unlike structured extraction APIs, this node works with natural language prompts instead of predefined schemas.
* **Best Practices**: Test your prompts on sample pages first to ensure they extract the correct data.
* **Cost Considerations**: Each extraction counts toward your Firecrawl API usage quota.
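
To illustrate the advice on prompt quality, the sketch below assembles a prompt that names each field along with its expected format and an optional location hint. The helper `build_extraction_prompt` and its wording are purely illustrative assumptions. Firecrawl does not require this structure, so test the resulting prompts on sample pages as recommended above.

```python
def build_extraction_prompt(fields: dict[str, str], location_hint: str = "") -> str:
    """Compose a prompt that lists each field with its expected format."""
    lines = ["Extract the following fields and return them as JSON:"]
    for name, description in fields.items():
        lines.append(f"- {name}: {description}")
    if location_hint:
        # Optional hint about where on the page the data appears.
        lines.append(f"Look for this data in {location_hint}.")
    return "\n".join(lines)


prompt = build_extraction_prompt(
    {
        "product_name": "the full product title as text",
        "price": "the current price including the currency symbol",
    },
    location_hint="the main product panel",
)
print(prompt)
```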

