Firecrawl Scrape
Action ID: firecrawl_scrape
Description
Scrape a single web page and extract its content in a structured format. This node uses Firecrawl to fetch and parse the page, returning clean HTML, Markdown, or JSON data.
Provider
Firecrawl
Connection
Firecrawl Connection (firecrawl, required): The Firecrawl connection to use for the scrape.
Input Parameters
url (string, required, no default): The URL to scrape
format (array, optional, default ["json"]): The format to use for the scrape
json_options (object, optional, default {}): The options to use for the JSON scrape
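The defaults listed above can be sketched as a small helper that assembles the node inputs. This is a minimal illustration, not part of the node itself: the function name `build_scrape_inputs` is hypothetical, and only the three parameter names and their defaults come from this page.

```python
from typing import Any, Dict, List, Optional


def build_scrape_inputs(
    url: str,
    format: Optional[List[str]] = None,  # named after the node parameter; shadows the builtin only inside this function
    json_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble inputs for a firecrawl_scrape node (hypothetical helper).

    Applies the documented defaults: format -> ["json"], json_options -> {}.
    """
    if not url:
        # url is the only required input parameter.
        raise ValueError("url is required")
    return {
        "url": url,
        "format": list(format) if format is not None else ["json"],
        "json_options": dict(json_options) if json_options is not None else {},
    }


inputs = build_scrape_inputs("https://example.com/article/tech-news")
print(inputs)
```

Copying `format` and `json_options` into fresh objects avoids sharing a mutable default between calls.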
Output Parameters
result (object): The output from the Firecrawl scrape
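Because the keys inside `result` depend on the page being scraped, downstream steps may want to read fields defensively. The sketch below assumes `result` arrives as a plain dictionary; the helper name `first_present` and the candidate key names are illustrative and not guaranteed by the node.

```python
from typing import Any, Dict, Optional, Sequence


def first_present(result: Dict[str, Any], keys: Sequence[str]) -> Optional[Any]:
    """Return the first non-empty value found under any of the candidate keys."""
    for key in keys:
        value = result.get(key)
        # Treat missing keys and empty containers/strings as "not present".
        if value not in (None, "", [], {}):
            return value
    return None


# Hypothetical result shaped like a news article.
article = {"headline": "Breaking News", "byline": "Staff Reporter"}
print(first_present(article, ["title", "headline", "product_name"]))  # Breaking News
```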
How It Works
This node sends a URL to Firecrawl's scraping service, which loads the page, renders any JavaScript-generated content, and extracts clean content from it. The result is returned as a structured object in your requested format.
Usage Examples
Example 1: Scrape Article Content
Input:
url: "https://example.com/article/tech-news"
format: ["json"]
json_options: {}
Output:
result: {
"title": "Latest Tech News",
"author": "John Doe",
"content": "Article content here...",
"publish_date": "2024-01-15",
"tags": ["technology", "news"]
}
Example 2: Scrape Product Page
Input:
url: "https://example.com/products/laptop"
format: ["json"]
json_options: {
"include_images": true,
"include_pricing": true
}
Output:
result: {
"product_name": "Professional Laptop",
"price": "$999.99",
"rating": 4.5,
"reviews_count": 250,
"images": ["url1", "url2", "url3"],
"specifications": {
"processor": "Intel i7",
"memory": "16GB RAM",
"storage": "512GB SSD"
}
}
Example 3: Scrape News Article
Input:
url: "https://news.example.com/story/123"
format: ["json"]
json_options: {}
Output:
result: {
"headline": "Breaking News",
"byline": "Staff Reporter",
"publish_time": "2024-01-15T10:30:00Z",
"body": "Full article content...",
"images": ["image1.jpg", "image2.jpg"],
"related_stories": ["story1", "story2"]
}
Common Use Cases
Content Extraction: Extract article text, headlines, and metadata from web pages
Price Monitoring: Scrape product pages to track pricing and availability changes
Research Data Collection: Gather data from multiple websites for analysis
News Aggregation: Collect news articles and summaries from news websites
Lead Generation: Extract contact information and business details from company websites
Market Intelligence: Monitor competitor websites for product updates and announcements
Data Enrichment: Augment existing data with information scraped from web sources
Error Handling
Invalid URL
Cause: The URL format is incorrect or the domain doesn't exist
Solution: Verify the URL is valid and properly formatted
Access Denied
Cause: The website blocks automated scraping or requires authentication
Solution: Check robots.txt and the site's terms; consider using proxies if allowed
Page Not Found
Cause: The URL returns a 404 status
Solution: Verify the URL is correct and the page still exists
Timeout
Cause: The page takes too long to load or render
Solution: Try a different URL or reduce the complexity of json_options
JavaScript Error
Cause: The page requires complex JavaScript that fails to execute
Solution: Check that the page renders without script errors in a regular browser
Empty Result
Cause: The page content cannot be extracted or parsed
Solution: Check whether the page contains the expected content and whether its structure has changed
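For transient failures such as timeouts, a caller might wrap the scrape in a simple retry with exponential backoff. The sketch below is an assumption-laden illustration: `scrape` stands in for whatever callable runs this node in your environment, and it is assumed to raise Python's built-in `TimeoutError` on a timeout, which the node itself may not do.

```python
import time
from typing import Any, Callable, Dict


def scrape_with_retry(
    scrape: Callable[[str], Dict[str, Any]],  # hypothetical callable that runs the scrape
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry a scrape on timeout, doubling the wait after each failed attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            result = scrape(url)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... with the defaults
            continue
        if not result:
            # An empty result is unlikely to change on retry, so fail immediately.
            raise ValueError(f"empty result for {url}")
        return result
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Spacing out retries this way also helps stay within rate limits when many pages are scraped in succession. The `sleep` parameter is injectable so the backoff can be exercised in tests without real delays.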
Notes
URL Validation: Ensure URLs are publicly accessible and include the full protocol (http:// or https://).
Dynamic Content: Firecrawl handles JavaScript-rendered content automatically, so dynamic websites are supported.
Format Options: Specify multiple formats to get the same content in different structures.
JSON Options: Use json_options to customize the output structure and include/exclude specific elements.
Rate Limits: Be mindful of Firecrawl's rate limits when scraping multiple pages in rapid succession.
Robots.txt Compliance: Respect website terms of service and robots.txt directives when scraping.
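The URL Validation note can be checked before the node runs. The following is a minimal sketch using Python's standard library; the function name `is_scrapable_url` is hypothetical, and the check only covers the URL's form (protocol and host), not whether the page is publicly accessible.

```python
from urllib.parse import urlparse


def is_scrapable_url(url: str) -> bool:
    """Return True if the URL includes an http(s) protocol and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_scrapable_url("https://example.com/article/tech-news"))  # True
print(is_scrapable_url("example.com/article/tech-news"))          # False: protocol missing
```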