Firecrawl Extract
Action ID: firecrawl_extract
Description
Extract structured data from websites using natural language prompts or JSON schemas. This node uses Firecrawl's AI-powered extraction to parse webpage content and return only the data fields you request.
Provider
Firecrawl
Connection
| Name | Description | Required | Category |
|---|---|---|---|
| Firecrawl Connection | The Firecrawl connection to use for the extract. | ✓ | firecrawl |
Input Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | array | ✓ | - | The URLs to extract from |
| prompt | string | - | - | The prompt to use for the extract |
| schema | object | - | - | The schema to use for the extract |
| enable_web_search | boolean | - | false | When true, extraction can follow links outside the specified domain |
Output Parameters
| Name | Type | Description |
|---|---|---|
| result | object | The output from the Firecrawl extract |
How It Works
This node processes one or more URLs and extracts specific information using either a natural language prompt or a structured JSON schema. Firecrawl's AI understands the page content and extracts only the relevant data fields you request, returning results in the schema format you specify.
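As a minimal sketch of how these inputs fit together, the Python helper below assembles the node's input parameters as a plain dictionary. The parameter names come from the Input Parameters section. The function name `build_extract_input` is hypothetical, and the rule that at least one of `prompt` or `schema` must be supplied is an assumption made for illustration, not documented behavior.

```python
from typing import Any, Optional


def build_extract_input(
    urls: list[str],
    prompt: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
    enable_web_search: bool = False,
) -> dict[str, Any]:
    """Assemble input parameters for the firecrawl_extract node (hypothetical helper)."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    # Assumption: the node needs a prompt, a schema, or both to know what to extract.
    if prompt is None and schema is None:
        raise ValueError("provide a prompt, a schema, or both")

    params: dict[str, Any] = {
        "urls": list(urls),
        "enable_web_search": enable_web_search,
    }
    # Optional parameters are included only when they were supplied.
    if prompt is not None:
        params["prompt"] = prompt
    if schema is not None:
        params["schema"] = schema
    return params


params = build_extract_input(
    ["https://example.com/products/laptop-pro"],
    prompt="Extract the product name, price, specs, and customer rating",
)
```

Here `params` holds `urls`, `prompt`, and `enable_web_search` (defaulting to `False`). The `schema` key is left out because none was given.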
Usage Examples
Example 1: Extract Product Information with Prompt
Input:
urls: ["https://example.com/products/laptop-pro"]
prompt: "Extract the product name, price, specs, and customer rating"
enable_web_search: false
Output:
result: {
  "product_name": "Laptop Pro 15",
  "price": "$1299.99",
  "specs": {
    "cpu": "Intel i9",
    "ram": "32GB",
    "storage": "1TB SSD"
  },
  "customer_rating": 4.8
}
Example 2: Extract Contact Info with Schema
Input:
urls: ["https://example.com/contact"]
schema: {
  "type": "object",
  "properties": {
    "company_name": {"type": "string"},
    "email": {"type": "string"},
    "phone": {"type": "string"},
    "address": {"type": "string"}
  }
}
enable_web_search: false
Output:
result: {
  "company_name": "Example Corp",
  "email": "[email protected]",
  "phone": "+1-555-123-4567",
  "address": "123 Main St, City, State 12345"
}
Example 3: Multi-URL Extraction with Web Search
Input:
urls: ["https://startup.example.com/about", "https://startup.example.com/team"]
prompt: "Extract founder names, founding year, and company mission"
enable_web_search: true
Output:
result: {
  "founders": ["Jane Smith", "John Johnson"],
  "founding_year": 2020,
  "mission": "To revolutionize digital transformation..."
}
Common Use Cases
Data Extraction: Extract structured data from unstructured web pages
Lead Enrichment: Extract company information, contact details, and industry data
Product Information Gathering: Collect product specs, pricing, and availability across websites
Research Data Collection: Extract research papers, citations, and academic information
Real Estate Data: Extract property details, pricing, and features from listings
Job Listing Analysis: Extract job requirements, salary ranges, and qualifications
E-commerce Intelligence: Extract competitor pricing, product features, and reviews
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Invalid URL | URL format is incorrect or domain doesn't exist | Verify all URLs are valid and properly formatted |
| Access Denied | Website blocks automated scraping or requires authentication | Check if web search is enabled; consider authentication if available |
| No Data Extracted | Prompt is too vague or schema doesn't match page content | Refine the prompt or schema to match actual page structure |
| Invalid Schema | JSON schema is malformed or has syntax errors | Validate schema against JSON schema standards |
| Multiple URLs Failed | Some or all URLs are inaccessible or time out | Check URL availability and reduce page complexity |
| Web Search Error | Web search feature encounters issues following links | Disable web search and specify exact URLs needed |
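Two of the errors above, Invalid URL and Invalid Schema, can often be caught before the node runs. The sketch below uses only the Python standard library. The helper names `find_invalid_urls` and `parse_schema` are hypothetical. The URL check is purely syntactic and cannot tell whether a domain exists, and the schema check only confirms that the text is well-formed JSON with an object at the top level, not that it is a fully valid JSON Schema.

```python
import json
from urllib.parse import urlparse


def find_invalid_urls(urls):
    """Return the URLs that lack an http(s) scheme or a host name."""
    invalid = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            invalid.append(url)
    return invalid


def parse_schema(schema_text):
    """Parse schema text as JSON and check that the top level is an object."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"schema is not valid JSON: {exc}") from exc
    if not isinstance(schema, dict):
        raise ValueError("schema must be a JSON object")
    return schema


bad = find_invalid_urls(
    ["https://example.com/contact", "example.com/contact", "ftp://example.com"]
)
# bad == ["example.com/contact", "ftp://example.com"]
```

Running these checks first means a malformed input is reported locally rather than surfacing as a failed extraction.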
Notes
Multiple URLs: You can extract from multiple URLs in a single request. The result aggregates data from all pages.
Prompt vs Schema: Use prompts for flexible, natural language extraction. Use schemas for consistent, structured output format.
Web Search: Enable web search to allow the extractor to follow external links if information isn't on the specified pages.
Schema Format: Provide a schema that follows standard JSON Schema conventions for best results.
Rate Limits: Respect Firecrawl's rate limits, especially when extracting from many URLs or using web search.
Accuracy: More specific prompts and schemas yield more accurate extraction results.
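One way to act on the rate-limit note is to split a long URL list into smaller batches and run the node once per batch. The sketch below is illustrative only: `batch_urls` is a hypothetical helper, and the batch size of 2 is an arbitrary value, not a Firecrawl limit.

```python
from typing import Iterator


def batch_urls(urls: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


urls = [
    "https://example.com/products/1",
    "https://example.com/products/2",
    "https://example.com/products/3",
    "https://example.com/products/4",
    "https://example.com/products/5",
]
batches = list(batch_urls(urls, batch_size=2))
# Three batches: two URLs, two URLs, then the remaining one.
```

Each batch produces its own `result`, so results are no longer aggregated across all URLs in one request. Merging them is left to the surrounding workflow.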