FireCrawl

Action ID: firecrawl

Description

Extract structured data from web pages using Firecrawl's AI-powered extraction capabilities. Provide a URL and a natural language prompt describing what data to extract, and Firecrawl will intelligently parse the page content and return the requested information.

Input Parameters

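| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | URL to crawl and extract data from |
| prompt | string | Yes | - | Natural language prompt describing the data to extract; no schema is supplied |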

JSON Schema
{
  "description": "FireCrawl node input.",
  "properties": {
    "url": {
      "description": "URL to crawl.",
      "title": "URL",
      "type": "string"
    },
    "prompt": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "description": "The prompt to use for the extraction without a schema.",
      "title": "Prompt"
    }
  },
  "required": [
    "url",
    "prompt"
  ],
  "title": "FireCrawlNodeInput",
  "type": "object"
}
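
The schema above can be checked before a run. The following is a minimal sketch, assuming the input is available as a plain Python dict; `validate_input` is a hypothetical helper, not part of the node, and the http/https check comes from the "Invalid URL" guidance on this page rather than from the schema itself.

```python
from typing import List
from urllib.parse import urlparse


def validate_input(payload: dict) -> List[str]:
    """Return a list of problems found in a FireCrawl node input payload."""
    problems = []

    # Both fields are listed under "required" in the schema.
    for field in ("url", "prompt"):
        if field not in payload:
            problems.append(f"missing required field: {field}")

    if "url" in payload:
        url = payload["url"]
        if not isinstance(url, str):
            problems.append("url must be a string")
        else:
            parsed = urlparse(url)
            # Assumption: only absolute http(s) URLs are usable.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append("url must be an absolute http:// or https:// URL")

    if "prompt" in payload:
        prompt = payload["prompt"]
        # The schema allows prompt to be a string or null.
        if prompt is not None and not isinstance(prompt, str):
            problems.append("prompt must be a string or null")

    return problems
```

An empty list means the payload matches the schema as written here; whether the node accepts a null prompt at run time is not confirmed by this page.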

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| data | string (json_str) | The extracted data from the web URL in JSON format |

JSON Schema
{
  "description": "Web scraping node output.",
  "properties": {
    "data": {
      "description": "The extracted data from the web URL.",
      "title": "Data",
      "type": "string"
    }
  },
  "required": [
    "data"
  ],
  "title": "FireCrawlNodeOutput",
  "type": "object"
}
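
Because `data` is typed as a string, a downstream step has to decode it before reading individual fields. A minimal sketch, assuming the node output arrives as a Python dict with a `data` key; `parse_output` is a hypothetical helper name, and the sample value is only shaped like the outputs shown on this page.

```python
import json


def parse_output(node_output: dict):
    """Decode the JSON string held in the node's `data` field."""
    raw = node_output["data"]
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Surface a clearer error if the string is not valid JSON.
        raise ValueError(f"data is not valid JSON: {exc}") from exc


# Illustrative sample, not real extraction output.
sample = {"data": '{"product_name": "Laptop Pro 15", "price": "$1,299.99"}'}
record = parse_output(sample)
print(record["product_name"])
```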

How It Works

This node uses Firecrawl's AI-powered extraction engine to parse web pages based on natural language prompts. It loads the target URL, renders any JavaScript content, and then uses the provided prompt to identify and extract the specific data you need. The extraction draws on the page's structure and content context, and the result is returned as a JSON string in the data output field.

Usage Examples

Example 1: Extract Product Information

Input:

url: "https://example.com/products/laptop-pro"
prompt: "Extract the product name, price, description, and availability status"

Output:

data: "{\"product_name\": \"Laptop Pro 15\", \"price\": \"$1,299.99\", \"description\": \"High-performance laptop with 16GB RAM and 512GB SSD\", \"availability\": \"In Stock\"}"

Example 2: Extract Article Metadata

Input:

url: "https://news.example.com/tech-article-2024"
prompt: "Get the article title, author name, publication date, and main topic"

Output:

data: "{\"title\": \"The Future of AI in 2024\", \"author\": \"Jane Smith\", \"publication_date\": \"2024-01-15\", \"main_topic\": \"Artificial Intelligence\"}"

Example 3: Extract Contact Information

Input:

url: "https://company.example.com/contact"
prompt: "Find the company's email address, phone number, and physical address"

Output:

data: "{\"email\": \"[email protected]\", \"phone\": \"+1-555-123-4567\", \"address\": \"123 Main St, San Francisco, CA 94105\"}"
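
The example outputs above are JSON objects keyed by the requested fields. A short sketch for checking that a result contains the fields a workflow depends on follows; `missing_fields` is a hypothetical helper, and the key names are an assumption, since the extraction decides what the keys are called.

```python
import json
from typing import Iterable, List


def missing_fields(data: str, expected: Iterable[str]) -> List[str]:
    """Return the expected keys that are absent or empty in the decoded output."""
    record = json.loads(data)
    if not isinstance(record, dict):
        # Not a JSON object, so none of the expected keys can be present.
        return list(expected)
    return [key for key in expected if record.get(key) in (None, "", [], {})]


# Illustrative output string with one empty and one absent field.
data = '{"email": "hello@company.example", "phone": ""}'
print(missing_fields(data, ["email", "phone", "address"]))
```

A non-empty result is a reasonable signal to refine the prompt or confirm that the data exists on the page.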

Common Use Cases

  • Product Data Extraction: Extract product details, pricing, and specifications from e-commerce sites

  • Lead Generation: Collect contact information and business details from company websites

  • News Monitoring: Extract article content, headlines, and metadata from news sources

  • Real Estate Data: Gather property listings, prices, and details from real estate websites

  • Job Listings: Extract job titles, descriptions, requirements, and application details

  • Research Data Collection: Collect structured data from various websites for research purposes

  • Competitive Intelligence: Monitor competitor websites for pricing, product, and content changes

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid URL | URL is malformed or inaccessible | Verify the URL is correctly formatted with http:// or https:// protocol |
| Page Not Found | The URL returns a 404 error or doesn't exist | Check that the URL is correct and the page is currently available |
| Extraction Failed | AI couldn't find data matching the prompt | Refine your prompt to be more specific or check if the data exists on the page |
| Timeout Error | Page took too long to load or process | Retry the request or check if the website is experiencing issues |
| Rate Limit Exceeded | Too many requests to Firecrawl API | Implement delays between requests or upgrade your Firecrawl plan |
| Authentication Required | Page requires login or authentication | Use a different scraping approach for authenticated pages |
| Prompt Too Vague | Prompt doesn't provide enough context for extraction | Make your prompt more specific about what data to extract and where |
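
For the Timeout Error and Rate Limit Exceeded cases, the suggested retries and delays can be combined into exponential backoff. This is a sketch under stated assumptions: `run_node` stands in for whatever callable triggers the node in your environment, and `TransientNodeError` is a hypothetical exception, since this page does not specify how errors are surfaced.

```python
import time
from typing import Callable


class TransientNodeError(Exception):
    """Hypothetical error type standing in for timeouts and rate limits."""


def run_with_retry(
    run_node: Callable[[], dict],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call run_node, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_node()
        except TransientNodeError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the error.
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Errors such as Invalid URL or Prompt Too Vague are not retried here, because repeating the same request would not change the outcome.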

Notes

  • Natural Language Prompts: Write clear, specific prompts describing exactly what data you want to extract from the page.

  • AI-Powered: Firecrawl uses AI to understand page context, making it more flexible than traditional CSS selector-based scraping.

  • JavaScript Rendering: The service renders JavaScript, so it works with modern single-page applications.

  • Prompt Quality: More specific prompts yield better results. Describe the data type, format, and location if possible.

  • JSON Output: Extracted data is returned as a JSON string, which can be parsed for use in subsequent workflow nodes.

  • No Schema Required: Unlike structured extraction APIs, this node works with natural language prompts instead of predefined schemas.

  • Best Practices: Test your prompts on sample pages first to ensure they extract the correct data.

  • Cost Considerations: Each extraction counts toward your Firecrawl API usage quota.
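
One way to act on the prompt advice above is to assemble prompts from an explicit field list. The helper below is a hypothetical sketch: `build_prompt`, its parameters, and the phrasing it produces are assumptions, and how well any particular wording performs is something to test on sample pages.

```python
from typing import Dict


def build_prompt(fields: Dict[str, str], location_hint: str = "") -> str:
    """Build a specific extraction prompt from field names and format notes."""
    # Each entry pairs a field name with the type or format expected for it.
    parts = [f"{name} ({fmt})" for name, fmt in fields.items()]
    prompt = "Extract the following fields: " + "; ".join(parts) + "."
    if location_hint:
        # Optional hint about where on the page the data appears.
        prompt += f" Look in {location_hint}."
    return prompt


print(build_prompt(
    {"product name": "text", "price": "number in USD"},
    location_hint="the product summary at the top of the page",
))
```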
