FireCrawl
Action ID: firecrawl
Description
Extract structured data from web pages using Firecrawl's AI-powered extraction capabilities. Provide a URL and a natural language prompt describing what data to extract, and Firecrawl will intelligently parse the page content and return the requested information.
Input Parameters
url: string, required, no default. URL to crawl and extract data from
prompt: string, required, no default. The prompt to use for the extraction without a schema
Output Parameters
data: string (json_str). The extracted data from the web URL in JSON format
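Because `data` arrives as a JSON string rather than a parsed object, a downstream step usually has to decode it first. The following is a minimal Python sketch of that step, not part of the node itself; the helper name `parse_node_output` and the sample string are assumptions made for illustration.

```python
import json
from typing import Any


def parse_node_output(data: str) -> Any:
    """Decode the node's `data` output (a JSON string) into Python values."""
    try:
        return json.loads(data)
    except json.JSONDecodeError as exc:
        # Surface a clearer error when the string is not valid JSON.
        raise ValueError(f"`data` is not valid JSON: {exc}") from exc


# Hypothetical sample shaped like the product example in this page.
sample = '{"product_name": "Laptop Pro 15", "price": "$1,299.99"}'
product = parse_node_output(sample)
print(product["product_name"])
```

Raising `ValueError` keeps the failure explicit, so a later step does not silently work with an undecoded string.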
How It Works
This node uses Firecrawl's AI-powered extraction engine to intelligently parse web pages based on natural language prompts. It loads the target URL, renders any JavaScript content, and then uses the provided prompt to identify and extract the specific data you need. The AI understands the page structure and content context, extracting data according to your instructions and returning it as structured JSON.
Usage Examples
Example 1: Extract Product Information
Input:
url: "https://example.com/products/laptop-pro"
prompt: "Extract the product name, price, description, and availability status"
Output:
data: "{\"product_name\": \"Laptop Pro 15\", \"price\": \"$1,299.99\", \"description\": \"High-performance laptop with 16GB RAM and 512GB SSD\", \"availability\": \"In Stock\"}"
Example 2: Extract Article Metadata
Input:
url: "https://news.example.com/tech-article-2024"
prompt: "Get the article title, author name, publication date, and main topic"
Output:
data: "{\"title\": \"The Future of AI in 2024\", \"author\": \"Jane Smith\", \"publication_date\": \"2024-01-15\", \"main_topic\": \"Artificial Intelligence\"}"
Example 3: Extract Contact Information
Input:
url: "https://company.example.com/contact"
prompt: "Find the company's email address, phone number, and physical address"
Output:
data: "{\"email\": \"[email protected]\", \"phone\": \"+1-555-123-4567\", \"address\": \"123 Main St, San Francisco, CA 94105\"}"
Common Use Cases
Product Data Extraction: Extract product details, pricing, and specifications from e-commerce sites
Lead Generation: Collect contact information and business details from company websites
News Monitoring: Extract article content, headlines, and metadata from news sources
Real Estate Data: Gather property listings, prices, and details from real estate websites
Job Listings: Extract job titles, descriptions, requirements, and application details
Research Data Collection: Collect structured data from various websites for research purposes
Competitive Intelligence: Monitor competitor websites for pricing, product, and content changes
Error Handling
Invalid URL: The URL is malformed or inaccessible. Verify that the URL is correctly formatted with the http:// or https:// protocol.
Page Not Found: The URL returns a 404 error or doesn't exist. Check that the URL is correct and the page is currently available.
Extraction Failed: The AI couldn't find data matching the prompt. Refine your prompt to be more specific, or check whether the data exists on the page.
Timeout Error: The page took too long to load or process. Retry the request, or check whether the website is experiencing issues.
Rate Limit Exceeded: Too many requests were sent to the Firecrawl API. Implement delays between requests or upgrade your Firecrawl plan.
Authentication Required: The page requires login or authentication. Use a different scraping approach for authenticated pages.
Prompt Too Vague: The prompt doesn't provide enough context for extraction. Make your prompt more specific about what data to extract and where.
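Two of the remedies above, checking the URL format and retrying with delays, can be handled in code around the node call. The Python sketch below shows one way to do that under stated assumptions: `extract` is a hypothetical callable that stands in for however you invoke the node, and `TransientError` is a hypothetical exception standing in for timeout or rate-limit failures. Neither is a Firecrawl API.

```python
import time
from urllib.parse import urlparse


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or rate-limit failure."""


def is_valid_url(url: str) -> bool:
    """Accept only http:// or https:// URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def run_with_retry(extract, url, prompt, max_attempts=3,
                   base_delay=1.0, sleep=time.sleep):
    """Validate the URL, then call `extract`, backing off on transient errors."""
    if not is_valid_url(url):
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    for attempt in range(1, max_attempts + 1):
        try:
            return extract(url, prompt)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Demo with a fake extractor (assumption), so nothing is sent over the network.
def fake_extract(url, prompt):
    return '{"title": "Example"}'


print(run_with_retry(fake_extract, "https://example.com", "Get the title"))
```

Passing `sleep` as a parameter makes the backoff easy to test without waiting; the delay values themselves are arbitrary and should be tuned to your plan's limits.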
Notes
Natural Language Prompts: Write clear, specific prompts describing exactly what data you want to extract from the page.
AI-Powered: Firecrawl uses AI to understand page context, making it more flexible than traditional CSS selector-based scraping.
JavaScript Rendering: The service renders JavaScript, so it works with modern single-page applications.
Prompt Quality: More specific prompts yield better results. Describe the data type, format, and location if possible.
JSON Output: Extracted data is returned as a JSON string, which can be parsed for use in subsequent workflow nodes.
No Schema Required: Unlike structured extraction APIs, this node works with natural language prompts instead of predefined schemas.
Best Practices: Test your prompts on sample pages first to ensure they extract the correct data.
Cost Considerations: Each extraction counts toward your Firecrawl API usage quota.
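When testing prompts on sample pages, it can help to check automatically that every field you asked for came back. The Python sketch below is one possible check, assuming the extraction returns a JSON object at the top level; the helper name `missing_fields` and the sample output are hypothetical.

```python
import json


def missing_fields(data: str, expected: list[str]) -> list[str]:
    """Return the expected field names that are absent or empty in `data`."""
    extracted = json.loads(data)  # assumes a JSON object at the top level
    empty_values = (None, "", [], {})
    return [name for name in expected if extracted.get(name) in empty_values]


# Hypothetical output from a test run of the article-metadata prompt.
sample_output = '{"title": "The Future of AI in 2024", "author": ""}'
print(missing_fields(sample_output, ["title", "author", "main_topic"]))
```

A non-empty result suggests the prompt needs to be more specific, or that the page does not contain the requested data.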