Firecrawl Scrape

Action ID: firecrawl_scrape

Description

Scrape a single web page and extract its content in a structured format. This node uses Firecrawl's web scraping service to fetch and parse the page, returning clean HTML, markdown, or JSON data.

Provider

Firecrawl

Connection

Name | Description | Required

Input Parameters

Name | Type | Required | Default | Description
url | string | ✓ | | The URL to scrape
format | array | | ["json"] | The format to use for the scrape
json_options | object | | {} | The options to use for the JSON scrape

View JSON Schema

{
  "description": "Firecrawl scrape node input.",
  "properties": {
    "url": {
      "title": "URL",
      "type": "string",
      "format": "uri",
      "description": "The URL to scrape."
    },
    "format": {
      "title": "Format",
      "type": "array",
      "items": {"type": "string"},
      "default": ["json"],
      "description": "The format to use for the scrape."
    },
    "json_options": {
      "title": "JSON Options",
      "type": "object",
      "default": {},
      "description": "The options to use for the JSON scrape."
    }
  },
  "required": [
    "url"
  ],
  "title": "FirecrawlScrapeInput",
  "type": "object"
}
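
The schema above can be mirrored in a small helper that fills in the documented defaults before the node runs. This is a minimal sketch, not part of the node itself: the function name build_scrape_input is hypothetical, and the http(s) check is one assumed reading of the schema's "uri" format combined with the protocol requirement stated in the Notes.

```python
from urllib.parse import urlparse


def build_scrape_input(url, format=None, json_options=None):
    """Return an input payload with the schema defaults filled in.

    Hypothetical helper: it mirrors FirecrawlScrapeInput but is not
    provided by the node.
    """
    if not isinstance(url, str):
        raise TypeError("url must be a string")
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL: {url!r}")

    if isinstance(format, str):
        raise TypeError("format must be an array of strings, not a single string")
    formats = ["json"] if format is None else list(format)
    if not all(isinstance(item, str) for item in formats):
        raise TypeError("format must be an array of strings")

    options = {} if json_options is None else dict(json_options)
    return {"url": url, "format": formats, "json_options": options}


payload = build_scrape_input("https://example.com/article/tech-news")
print(payload)
```

Only url is required, so omitting the other two arguments yields the schema defaults (["json"] and {}).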

Output Parameters

Name | Type | Description
result | object | The output from the Firecrawl scrape

View JSON Schema

{
  "description": "Firecrawl scrape node output.",
  "properties": {
    "result": {
      "title": "Result",
      "type": "object",
      "description": "The output from the Firecrawl scrape."
    }
  },
  "required": [
    "result"
  ],
  "title": "FirecrawlScrapeOutput",
  "type": "object"
}

How It Works

This node sends a URL to Firecrawl's scraping service, which navigates to the page, renders it (including JavaScript-generated content), extracts clean content, and returns it in the requested format. The node exposes that output as a single structured object, result.
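
For orientation, the request the node might send on your behalf could look roughly like the sketch below. Everything specific here is an assumption rather than documented node behavior: the v1 endpoint URL, the mapping of format to formats and json_options to jsonOptions, and the placeholder API key. The code only builds the request object; it does not send it.

```python
import json
import urllib.request

# Assumption: Firecrawl's hosted v1 scrape endpoint. The node may use a
# different route or API version internally.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_firecrawl_request(node_input, api_key):
    """Translate node input into an HTTP request (assumed field mapping)."""
    body = {
        "url": node_input["url"],
        "formats": node_input.get("format", ["json"]),
    }
    if node_input.get("json_options"):
        body["jsonOptions"] = node_input["json_options"]
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_firecrawl_request(
    {"url": "https://example.com/article/tech-news", "format": ["json"]},
    api_key="fc-YOUR-KEY",  # placeholder, not a real key
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```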

Usage Examples

Example 1: Scrape Article Content

Input:

url: "https://example.com/article/tech-news"
format: ["json"]
json_options: {}

Output:

result: {
  "title": "Latest Tech News",
  "author": "John Doe",
  "content": "Article content here...",
  "publish_date": "2024-01-15",
  "tags": ["technology", "news"]
}

Example 2: Scrape Product Page

Input:

url: "https://example.com/products/laptop"
format: ["json"]
json_options: {
  "include_images": true,
  "include_pricing": true
}

Output:

result: {
  "product_name": "Professional Laptop",
  "price": "$999.99",
  "rating": 4.5,
  "reviews_count": 250,
  "images": ["url1", "url2", "url3"],
  "specifications": {
    "processor": "Intel i7",
    "memory": "16GB RAM",
    "storage": "512GB SSD"
  }
}

Example 3: Scrape News Article

Input:

url: "https://news.example.com/story/123"
format: ["json"]
json_options: {}

Output:

result: {
  "headline": "Breaking News",
  "byline": "Staff Reporter",
  "publish_time": "2024-01-15T10:30:00Z",
  "body": "Full article content...",
  "images": ["image1.jpg", "image2.jpg"],
  "related_stories": ["story1", "story2"]
}
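
The examples above show that the keys inside result vary from page to page (title versus headline, content versus body). Downstream steps may therefore want to read fields defensively. The sketch below is one way to do that; the helper names first_present and normalize_article are hypothetical, and the key lists are taken only from Examples 1 and 3.

```python
def first_present(result, *keys, default=None):
    """Return the first non-empty value found under any of the given keys."""
    for key in keys:
        value = result.get(key)
        if value not in (None, "", []):
            return value
    return default


def normalize_article(result):
    """Map differently named article fields onto one common shape."""
    return {
        "title": first_present(result, "title", "headline"),
        "author": first_present(result, "author", "byline"),
        "text": first_present(result, "content", "body", default=""),
        "published": first_present(result, "publish_date", "publish_time"),
    }


example_1 = {
    "title": "Latest Tech News",
    "author": "John Doe",
    "content": "Article content here...",
    "publish_date": "2024-01-15",
    "tags": ["technology", "news"],
}
example_3 = {
    "headline": "Breaking News",
    "byline": "Staff Reporter",
    "publish_time": "2024-01-15T10:30:00Z",
    "body": "Full article content...",
}

print(normalize_article(example_1))
print(normalize_article(example_3))
```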

Common Use Cases

Content Extraction: Extract article text, headlines, and metadata from web pages
Price Monitoring: Scrape product pages to track pricing and availability changes
Research Data Collection: Gather data from multiple websites for analysis
News Aggregation: Collect news articles and summaries from news websites
Lead Generation: Extract contact information and business details from company websites
Market Intelligence: Monitor competitor websites for product updates and announcements
Data Enrichment: Augment existing data with information scraped from web sources

Error Handling

Error Type | Cause | Solution
Invalid URL | URL format is incorrect or the domain doesn't exist | Verify the URL is valid and properly formatted
Access Denied | Website blocks automated scraping or requires authentication | Check robots.txt and site terms; consider using proxies if allowed
Page Not Found | URL returns a 404 status | Verify the URL is correct and the page still exists
Timeout | Page takes too long to load or render | Try a different URL or reduce the complexity of json_options
JavaScript Error | Page requires complex JavaScript that fails to execute | Ensure the website supports standard JavaScript and has no rendering issues
Empty Result | Page content cannot be extracted or parsed | Check whether the page has the required content or whether its structure has changed
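
Transient failures such as timeouts are often worth retrying before giving up. The sketch below shows one common pattern, exponential backoff. It assumes a hypothetical scrape callable and a hypothetical ScrapeTimeout exception, since this page does not specify how the node surfaces errors to calling code.

```python
import time


class ScrapeTimeout(Exception):
    """Hypothetical error type standing in for the Timeout row above."""


def scrape_with_retry(scrape, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call scrape(url), retrying on ScrapeTimeout with exponential backoff."""
    for attempt in range(attempts):
        try:
            return scrape(url)
        except ScrapeTimeout:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            sleep(base_delay * 2 ** attempt)


# Demo with a fake scraper that times out twice, then succeeds.
calls = []


def flaky_scrape(url):
    calls.append(url)
    if len(calls) < 3:
        raise ScrapeTimeout(url)
    return {"title": "Latest Tech News"}


delays = []  # record the delays instead of actually sleeping
result = scrape_with_retry(flaky_scrape, "https://example.com/article/tech-news", sleep=delays.append)
print(result, delays)
```

Errors such as Invalid URL or Page Not Found are deliberately not retried here, because repeating the same request is unlikely to change the outcome.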

Notes

URL Validation: Ensure URLs are publicly accessible and include the full protocol (http:// or https://).
Dynamic Content: Firecrawl handles JavaScript-rendered content automatically, so dynamic websites are supported.
Format Options: Specify multiple formats to get the same content in different structures.
JSON Options: Use json_options to customize the output structure and include/exclude specific elements.
Rate Limits: Be mindful of Firecrawl's rate limits when scraping multiple pages in rapid succession.
Robots.txt Compliance: Respect website terms of service and robots.txt directives when scraping.
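
When scraping several pages in a row, spacing out the calls is a simple way to stay under a rate limit. The generator below is a minimal sketch of that idea. The name throttled and the one-second default interval are placeholders, not Firecrawl's actual limits, which depend on your plan.

```python
import time


def throttled(urls, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Yield URLs one at a time, at least min_interval seconds apart."""
    last = None
    for url in urls:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield url


pages = [
    "https://example.com/article/tech-news",
    "https://example.com/products/laptop",
]
for page in throttled(pages, min_interval=0.1):
    print("scraping", page)  # call the scrape node here
```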


Last updated 3 months ago
