Firecrawl Extract

Action ID: firecrawl_extract

Description

Extract structured data from websites using natural language prompts or JSON schemas. This node uses Firecrawl's AI-powered extraction to intelligently parse webpage content and return specifically requested data fields.

Provider

Firecrawl

Connection

| Name | Description | Required | Category |
|------|-------------|----------|----------|
| Firecrawl Connection | The Firecrawl connection to use for the extract. | | firecrawl |

Input Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| urls | array | Yes | - | The URLs to extract from |
| prompt | string | - | - | The prompt to use for the extract |
| schema | object | - | - | The schema to use for the extract |
| enable_web_search | boolean | - | false | When true, extraction can follow links outside the specified domain |

JSON Schema:
{
  "description": "Firecrawl extract node input.",
  "properties": {
    "urls": {
      "title": "URLs",
      "type": "array",
      "items": {"type": "string", "format": "uri"},
      "description": "The URLs to extract from."
    },
    "prompt": {
      "title": "Prompt",
      "type": "string",
      "description": "The prompt to use for the extract."
    },
    "schema": {
      "title": "Schema",
      "type": "object",
      "description": "The schema to use for the extract."
    },
    "enable_web_search": {
      "title": "Enable Web Search",
      "type": "boolean",
      "default": false,
      "description": "When true, extraction can follow links outside the specified domain."
    }
  },
  "required": [
    "urls"
  ],
  "title": "FirecrawlExtractInput",
  "type": "object"
}
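
The input schema above can be checked before the node runs. The following is a minimal sketch using only the Python standard library; the function name `validate_extract_input` is hypothetical, and restricting URLs to `http`/`https` is an assumption that goes beyond the schema's `format: uri`.

```python
from urllib.parse import urlparse


def validate_extract_input(params: dict) -> dict:
    """Check node inputs against the documented schema and apply defaults.

    Hypothetical helper: it mirrors the FirecrawlExtractInput schema,
    it is not part of Firecrawl or of the node itself.
    """
    urls = params.get("urls")
    # "urls" is the only required field and must be an array of strings.
    if not isinstance(urls, list):
        raise ValueError("'urls' is required and must be an array")
    for url in urls:
        parsed = urlparse(url) if isinstance(url, str) else None
        # Assumption: only absolute http(s) URLs are useful for extraction.
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid URL: {url!r}")

    prompt = params.get("prompt")
    if prompt is not None and not isinstance(prompt, str):
        raise ValueError("'prompt' must be a string")

    schema = params.get("schema")
    if schema is not None and not isinstance(schema, dict):
        raise ValueError("'schema' must be an object")

    # The schema declares false as the default for enable_web_search.
    web_search = params.get("enable_web_search", False)
    if not isinstance(web_search, bool):
        raise ValueError("'enable_web_search' must be a boolean")

    cleaned = {"urls": list(urls), "enable_web_search": web_search}
    if prompt is not None:
        cleaned["prompt"] = prompt
    if schema is not None:
        cleaned["schema"] = schema
    return cleaned
```

Calling it with only `urls` and `prompt` returns the same values plus `enable_web_search` set to its default, while a malformed URL raises `ValueError` before any request is made.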

Output Parameters

| Name | Type | Description |
|------|------|-------------|
| result | object | The output from the Firecrawl extract |

JSON Schema:
{
  "description": "Firecrawl extract node output.",
  "properties": {
    "result": {
      "title": "Result",
      "type": "object",
      "description": "The output from the Firecrawl extract."
    }
  },
  "required": [
    "result"
  ],
  "title": "FirecrawlExtractOutput",
  "type": "object"
}

How It Works

This node processes one or more URLs and extracts specific information using either a natural language prompt or a structured JSON schema. Firecrawl's AI reads the page content and extracts only the data fields you request. When you supply a schema, the result follows that structure; with a prompt alone, the field names are derived from the prompt, as in Example 1.

Usage Examples

Example 1: Extract Product Information with Prompt

Input:

urls: ["https://example.com/products/laptop-pro"]
prompt: "Extract the product name, price, specs, and customer rating"
enable_web_search: false

Output:

result: {
  "product_name": "Laptop Pro 15",
  "price": "$1299.99",
  "specs": {
    "cpu": "Intel i9",
    "ram": "32GB",
    "storage": "1TB SSD"
  },
  "customer_rating": 4.8
}

Example 2: Extract Contact Info with Schema

Input:

urls: ["https://example.com/contact"]
schema: {
  "type": "object",
  "properties": {
    "company_name": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"},
    "address": {"type": "string"}
  }
}
enable_web_search: false

Output:

result: {
  "company_name": "Example Corp",
  "email": "[email protected]",
  "phone": "+1-555-123-4567",
  "address": "123 Main St, City, State 12345"
}
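
A downstream step may want to confirm that a result such as the one in Example 2 actually matches the schema that was sent. The sketch below is one way to do that with the standard library; `find_schema_mismatches` is a hypothetical helper, it handles only top-level properties with primitive `type` values, and it treats every declared property as expected, which is stricter than JSON Schema (where properties are optional unless listed under `required`).

```python
# Map JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def find_schema_mismatches(result: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means no mismatch found."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in result:
            # Assumption: every declared property is expected in the result.
            problems.append(f"missing field: {name}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected is None:
            continue  # Unknown or absent type: nothing to compare against.
        value = result[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems
```

An empty list means every declared field is present with the declared type; anything else can be logged or used to trigger a retry with a refined prompt or schema.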

Example 3: Extract Company Details from Multiple URLs with Web Search

Input:

urls: ["https://startup.example.com/about", "https://startup.example.com/team"]
prompt: "Extract founder names, founding year, and company mission"
enable_web_search: true

Output:

result: {
  "founders": ["Jane Smith", "John Johnson"],
  "founding_year": 2020,
  "mission": "To revolutionize digital transformation..."
}

Common Use Cases

  • Data Extraction: Extract structured data from unstructured web pages

  • Lead Enrichment: Extract company information, contact details, and industry data

  • Product Information Gathering: Collect product specs, pricing, and availability across websites

  • Research Data Collection: Extract research papers, citations, and academic information

  • Real Estate Data: Extract property details, pricing, and features from listings

  • Job Listing Analysis: Extract job requirements, salary ranges, and qualifications

  • E-commerce Intelligence: Extract competitor pricing, product features, and reviews

Error Handling

| Error Type | Cause | Solution |
|------------|-------|----------|
| Invalid URL | URL format is incorrect or domain doesn't exist | Verify all URLs are valid and properly formatted |
| Access Denied | Website blocks automated scraping or requires authentication | Check if web search is enabled; consider authentication if available |
| No Data Extracted | Prompt is too vague or schema doesn't match page content | Refine the prompt or schema to match actual page structure |
| Invalid Schema | JSON schema is malformed or has syntax errors | Validate schema against JSON schema standards |
| Multiple URLs Failed | Some or all URLs are inaccessible or time out | Check URL availability and reduce page complexity |
| Web Search Error | Web search feature encounters issues following links | Disable web search and specify exact URLs needed |
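
The "Invalid Schema" case can often be caught locally before the node runs. The sketch below parses a schema supplied as text and reports where the JSON breaks; `load_schema` is a hypothetical helper, and the checks that the top level is an `object` with a `properties` map are assumptions drawn from the examples on this page, not documented Firecrawl requirements.

```python
import json


def load_schema(text: str) -> dict:
    """Parse a JSON schema string and run a few basic shape checks."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report the position so a syntax error is easy to locate.
        raise ValueError(
            f"schema is not valid JSON: {exc.msg} "
            f"(line {exc.lineno}, column {exc.colno})"
        ) from exc
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    # Assumption: extraction schemas describe an object with named properties.
    if schema.get("type") != "object":
        raise ValueError('top-level schema "type" should be "object"')
    if not isinstance(schema.get("properties"), dict):
        raise ValueError('schema needs a "properties" object')
    return schema
```

This covers syntax and basic shape only. Full conformance checking against the JSON Schema specification would need a dedicated validator.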

Notes

  • Multiple URLs: You can extract from multiple URLs in a single request. Results will aggregate data from all pages.

  • Prompt vs Schema: Use prompts for flexible, natural language extraction. Use schemas for consistent, structured output format.

  • Web Search: Enable web search to allow the extractor to follow external links if information isn't on the specified pages.

  • Schema Format: Provide JSON schema that matches standard JSON schema conventions for best results.

  • Rate Limits: Respect Firecrawl's rate limits, especially when extracting from many URLs or using web search.

  • Accuracy: More specific prompts and schemas yield more accurate extraction results.
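
For the rate-limit note above, one simple approach is to split a long URL list into small batches and pause between them. This is a sketch under stated assumptions: `run_extract` is a hypothetical callable standing in for however you invoke this node, and the batch size and pause length are illustrative values, not Firecrawl's actual limits.

```python
import time
from typing import Callable, Iterator


def chunked(urls: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def extract_in_batches(
    urls: list,
    run_extract: Callable[[list], dict],  # hypothetical: runs one extract call
    batch_size: int = 5,                  # illustrative value
    pause_seconds: float = 1.0,           # illustrative value
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Run one extract per batch, pausing between batches but not after the last."""
    batches = list(chunked(urls, batch_size))
    results = []
    for index, batch in enumerate(batches):
        results.append(run_extract(batch))
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return results
```

The `sleep` parameter is injectable so the pacing can be tested without real delays. Each entry in the returned list is the result for one batch, so results are grouped per batch rather than merged across all URLs.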
