# Firecrawl Scrape

**Action ID:** `firecrawl_scrape`

## Description

Scrape a single web page and extract its content in a structured format. This node uses Firecrawl's scraping service to fetch and parse a page, returning clean HTML, Markdown, or JSON data.

## Provider

**Firecrawl**

## Connection

| Name                 | Description                                     | Required | Category  |
| -------------------- | ----------------------------------------------- | :------: | --------- |
| Firecrawl Connection | The Firecrawl connection to use for the scrape. |     ✓    | firecrawl |

## Input Parameters

| Name          | Type   | Required | Default   | Description                            |
| ------------- | ------ | :------: | --------- | -------------------------------------- |
| url           | string |     ✓    | -         | The URL of the page to scrape, including `http://` or `https://` |
| format        | array  |     -    | \["json"] | One or more output formats to return (defaults to JSON)          |
| json\_options | object |     -    | {}        | Options that control JSON extraction                             |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Firecrawl scrape node input.",
  "properties": {
    "url": {
      "title": "URL",
      "type": "string",
      "format": "uri",
      "description": "The URL to scrape."
    },
    "format": {
      "title": "Format",
      "type": "array",
      "items": {"type": "string"},
      "default": ["json"],
      "description": "The format to use for the scrape."
    },
    "json_options": {
      "title": "JSON Options",
      "type": "object",
      "default": {},
      "description": "The options to use for the JSON scrape."
    }
  },
  "required": [
    "url"
  ],
  "title": "FirecrawlScrapeInput",
  "type": "object"
}
```

</details>
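The schema rules above can be checked before a run. The following Python sketch is not part of the node: `normalize_scrape_input` is a hypothetical helper that applies the documented defaults and rejects URLs without an explicit `http` or `https` scheme, as the Notes section recommends.

```python
from urllib.parse import urlparse

# Default taken from the FirecrawlScrapeInput schema.
DEFAULT_FORMAT = ["json"]


def normalize_scrape_input(payload: dict) -> dict:
    """Hypothetical helper: check a node input against the schema and fill defaults."""
    if "url" not in payload:
        raise ValueError("'url' is required")
    url = payload["url"]
    parsed = urlparse(url)
    # The schema declares "format": "uri"; require an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"'url' must be an absolute http(s) URL, got {url!r}")

    formats = payload.get("format", DEFAULT_FORMAT)
    if not isinstance(formats, list) or not all(isinstance(f, str) for f in formats):
        raise ValueError("'format' must be an array of strings")

    json_options = payload.get("json_options", {})
    if not isinstance(json_options, dict):
        raise ValueError("'json_options' must be an object")

    # Copy values so callers never share the mutable default list.
    return {"url": url, "format": list(formats), "json_options": dict(json_options)}
```

For example, `normalize_scrape_input({"url": "example.com"})` raises `ValueError` because the protocol is missing.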

## Output Parameters

| Name   | Type   | Description                          |
| ------ | ------ | ------------------------------------ |
| result | object | The output from the Firecrawl scrape |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Firecrawl scrape node output.",
  "properties": {
    "result": {
      "title": "Result",
      "type": "object",
      "description": "The output from the Firecrawl scrape."
    }
  },
  "required": [
    "result"
  ],
  "title": "FirecrawlScrapeOutput",
  "type": "object"
}
```

</details>
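The output schema only guarantees that `result` is an object; the keys inside it depend on the page and on `json_options`, as the usage examples on this page show. A hedged Python sketch of reading a nested field defensively follows; `get_field` and its dotted-path convention are hypothetical, not part of the node.

```python
from typing import Any


def get_field(output: dict, path: str, default: Any = None) -> Any:
    """Hypothetical helper: read e.g. "specifications.memory" from the node output."""
    # `result` is the only required property of FirecrawlScrapeOutput.
    result = output.get("result")
    if not isinstance(result, dict):
        raise ValueError("output has no 'result' object")
    value: Any = result
    for key in path.split("."):
        # Stop at the first missing key or non-object value.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value
```

Returning a default instead of raising keeps downstream steps working when a page omits a field.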

## How It Works

This node sends the URL to Firecrawl's scraping service, which loads the page, renders JavaScript-generated content, and extracts the main content. The extracted data is returned as a single `result` object in the requested format(s).
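This page does not document how the node calls Firecrawl. As an illustration only, the Python sketch below assumes the inputs map onto Firecrawl's public v1 `POST /v1/scrape` endpoint, with `format` sent as `formats` and `json_options` sent as `jsonOptions`; the mapping and the endpoint URL are assumptions. The request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Assumption: Firecrawl's public v1 scrape endpoint; the node's real call may differ.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(
    api_key: str,
    url: str,
    formats: Optional[list] = None,
    json_options: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring this node's inputs."""
    body = {"url": url, "formats": formats if formats is not None else ["json"]}
    if json_options:
        # Assumed mapping: node `json_options` -> Firecrawl `jsonOptions`.
        body["jsonOptions"] = json_options
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` requires a valid Firecrawl API key; inside a workflow, the Firecrawl Connection supplies the credentials instead.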

## Usage Examples

### Example 1: Scrape Article Content

**Input:**

```json
{
  "url": "https://example.com/article/tech-news",
  "format": ["json"],
  "json_options": {}
}
```

**Output:**

```json
{
  "result": {
    "title": "Latest Tech News",
    "author": "John Doe",
    "content": "Article content here...",
    "publish_date": "2024-01-15",
    "tags": ["technology", "news"]
  }
}
```

### Example 2: Scrape Product Page

**Input:**

```json
{
  "url": "https://example.com/products/laptop",
  "format": ["json"],
  "json_options": {
    "include_images": true,
    "include_pricing": true
  }
}
```

**Output:**

```json
{
  "result": {
    "product_name": "Professional Laptop",
    "price": "$999.99",
    "rating": 4.5,
    "reviews_count": 250,
    "images": ["url1", "url2", "url3"],
    "specifications": {
      "processor": "Intel i7",
      "memory": "16GB RAM",
      "storage": "512GB SSD"
    }
  }
}
```
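In this example `price` is a display string that includes a currency symbol. For price monitoring it usually has to become a number first. The Python sketch below is a hypothetical helper that assumes US-style formatting (`,` for thousands, `.` for decimals) and ignores the currency.

```python
import re
from decimal import Decimal

# First number in the string, allowing thousands separators and decimals.
_PRICE_RE = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Decimal:
    """Hypothetical helper: turn a scraped price such as "$999.99" into a Decimal."""
    match = _PRICE_RE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Drop thousands separators; assumes '.' is the decimal separator.
    return Decimal(match.group().replace(",", ""))
```

`Decimal` avoids binary floating-point rounding when comparing prices across runs.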

### Example 3: Scrape News Article

**Input:**

```json
{
  "url": "https://news.example.com/story/123",
  "format": ["json"],
  "json_options": {}
}
```

**Output:**

```json
{
  "result": {
    "headline": "Breaking News",
    "byline": "Staff Reporter",
    "publish_time": "2024-01-15T10:30:00Z",
    "body": "Full article content...",
    "images": ["image1.jpg", "image2.jpg"],
    "related_stories": ["story1", "story2"]
  }
}
```
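Examples 1 and 3 report dates under different keys and at different precision (`publish_date` is a date, `publish_time` is a UTC timestamp). When aggregating articles it can help to normalize both. A minimal Python sketch, assuming values are ISO 8601 strings and that date-only or offset-free values are UTC; `parse_published` is hypothetical.

```python
from datetime import datetime, timezone


def parse_published(value: str) -> datetime:
    """Hypothetical helper: parse "2024-01-15" or "2024-01-15T10:30:00Z" as UTC."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: date-only or naive values are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Aware UTC datetimes compare correctly, so articles from different sites can be sorted with `sorted(..., key=parse_published)`.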

## Common Use Cases

* **Content Extraction**: Extract article text, headlines, and metadata from web pages
* **Price Monitoring**: Scrape product pages to track pricing and availability changes
* **Research Data Collection**: Gather data from multiple websites for analysis
* **News Aggregation**: Collect news articles and summaries from news websites
* **Lead Generation**: Extract contact information and business details from company websites
* **Market Intelligence**: Monitor competitor websites for product updates and announcements
* **Data Enrichment**: Augment existing data with information scraped from web sources
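For monitoring use cases such as price tracking or competitor updates, a common pattern is to keep the previous `result` and compare it with the new one. The Python sketch below is a hypothetical helper that compares top-level keys only; a key that is missing and a key set to `null` both appear as `None`.

```python
def changed_fields(previous: dict, current: dict) -> dict:
    """Hypothetical helper: {key: (old, new)} for top-level keys that differ."""
    changes = {}
    # Union of keys so fields added or removed between scrapes are reported too.
    for key in previous.keys() | current.keys():
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes
```

An empty dict means nothing changed between the two scrapes.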

## Error Handling

| Error Type       | Cause                                                        | Solution                                                                    |
| ---------------- | ------------------------------------------------------------ | --------------------------------------------------------------------------- |
| Invalid URL      | URL format is incorrect or domain doesn't exist              | Verify the URL is valid and properly formatted                              |
| Access Denied    | Website blocks automated scraping or requires authentication | Check robots.txt and site terms; consider using proxies if allowed          |
| Page Not Found   | URL returns 404 status                                       | Verify the URL is correct and the page still exists                         |
| Timeout          | Page takes too long to load or render                        | Retry after a short delay; if it keeps timing out, simplify json\_options   |
| JavaScript Error | Page relies on scripts that fail to execute during rendering | Check that the page loads correctly in a regular browser; some pages cannot be rendered by automated browsers |
| Empty Result     | Page content cannot be extracted or parsed                   | Check that the page contains the expected content and that its structure has not changed |
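Of these errors, only transient ones such as Timeout are worth retrying; an invalid URL or a 404 will fail the same way every time. How the node surfaces errors is not documented here, so the Python sketch below uses the built-in `TimeoutError` as a stand-in and wraps a hypothetical `call` that runs one scrape.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying only `retry_on` errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

Any error type not listed in `retry_on` propagates immediately, so permanent failures are reported without delay.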

## Notes

* **URL Validation**: Ensure URLs are publicly accessible and include the full protocol (http\:// or https\://).
* **Dynamic Content**: Firecrawl handles JavaScript-rendered content automatically, so dynamic websites are supported.
* **Format Options**: Specify multiple formats to get the same content in different structures.
* **JSON Options**: Use json\_options to customize the output structure and include/exclude specific elements.
* **Rate Limits**: Be mindful of Firecrawl's rate limits when scraping multiple pages in rapid succession.
* **Robots.txt Compliance**: Respect website terms of service and robots.txt directives when scraping.
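The rate-limit and robots.txt notes can be applied before URLs are handed to the node. The Python sketch below uses the standard library's `urllib.robotparser` to skip disallowed URLs and pauses between requests. `scrape` is a hypothetical callable standing in for one run of this node, and the one-second interval is an arbitrary assumption, not Firecrawl's actual limit.

```python
import time
from typing import Callable, Dict, Iterable
from urllib.robotparser import RobotFileParser


def polite_batch(
    urls: Iterable[str],
    scrape: Callable[[str], dict],
    robots: RobotFileParser,
    user_agent: str = "*",
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Scrape URLs in order, skipping robots.txt-disallowed ones and pausing between calls."""
    results: Dict[str, dict] = {}
    first = True
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt: never request it
        if not first:
            sleep(min_interval)  # simple client-side throttle between requests
        first = False
        results[url] = scrape(url)
    return results
```

In practice, load the site's rules with `robots.set_url("https://example.com/robots.txt")` followed by `robots.read()` before calling `polite_batch`.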

