Firecrawl Map

Action ID: firecrawl_map

Description

Generate a map of the URLs on a website. This node quickly discovers and lists the accessible URLs on a domain. You can narrow the results with a search term, choose whether to include subdomains, and cap how many URLs are returned.

Provider

Firecrawl

Connection

Name | Description | Required | Category
Firecrawl Connection | The Firecrawl connection to use for the map. | Yes | firecrawl

Input Parameters

Name | Type | Required | Default | Description
url | string | Yes | - | The URL of the website to map.
search | string | No | - | A search query. Only URLs relevant to the query are returned; for example, entering 'blog' retrieves URLs related to 'blog'.
include_subdomains | boolean | No | true | Include URLs on subdomains of the url, such as docs.* and blog.*.
ignore_sitemap | boolean | No | false | Ignore the website's sitemap when mapping.
limit | integer | No | 1000 | The maximum number of URLs to return. The maximum allowed value is 5000.

Input JSON schema:
{
  "description": "Firecrawl map node input.",
  "properties": {
    "url": {
      "title": "URL",
      "type": "string",
      "format": "uri",
      "description": "The URL to scrape."
    },
    "search": {
      "title": "Search",
      "type": "string",
      "description": "Use the search feature to find URLs relevant to your query. For example, entering 'blog' will retrieve all URLs related to 'blog'."
    },
    "include_subdomains": {
      "title": "Include Subdomains",
      "type": "boolean",
      "default": true,
      "description": "Include subdomains of the url in the result such as docs.*, blog.*, etc."
    },
    "ignore_sitemap": {
      "title": "Ignore Sitemap",
      "type": "boolean",
      "default": false,
      "description": "Ignore the website sitemap when mapping."
    },
    "limit": {
      "title": "Limit",
      "type": "integer",
      "default": 1000,
      "description": "The maximum number of URLs to return. Maximum is 5000."
    }
  },
  "required": [
    "url"
  ],
  "title": "FirecrawlMapInput",
  "type": "object"
}
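The input schema can be mirrored in application code to catch bad inputs before the node runs. The following is a minimal sketch, assuming you assemble the node's inputs in Python yourself; the `MapInput` class and its `validate` method are hypothetical and are not part of the node or of any Firecrawl SDK. The defaults and the 5000 cap come from the schema above; the lower bound of 1 for limit is an additional assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

MAX_LIMIT = 5000  # documented maximum for the limit parameter


@dataclass
class MapInput:
    """Hypothetical mirror of FirecrawlMapInput; defaults follow the schema."""
    url: str
    search: Optional[str] = None
    include_subdomains: bool = True
    ignore_sitemap: bool = False
    limit: int = 1000

    def validate(self) -> None:
        # The schema declares "format": "uri"; this sketch only accepts
        # absolute http(s) URLs that include a host.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be an absolute http(s) URL, got {self.url!r}")
        # Assumption: limit must be at least 1; 5000 is the documented maximum.
        if not 1 <= self.limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {self.limit}")


# Example 2's inputs pass validation.
params = MapInput(url="https://example.com", search="blog",
                  include_subdomains=False, limit=500)
params.validate()
```

Validating up front turns an "Invalid URL" or out-of-range limit into an immediate, local error instead of a failed node run.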

Output Parameters

Name | Type | Description
result | array of strings | The URLs returned by the Firecrawl map.

Output JSON schema:
{
  "description": "Firecrawl map node output.",
  "properties": {
    "result": {
      "title": "Result",
      "type": "array",
      "items": {"type": "string"},
      "description": "The output from the Firecrawl map."
    }
  },
  "required": [
    "result"
  ],
  "title": "FirecrawlMapOutput",
  "type": "object"
}

How It Works

This node analyzes a website's structure and returns the accessible URLs it discovers, up to the configured limit. It can use the website's sitemap for efficiency, return only URLs relevant to a search term, and optionally include URLs on subdomains. The result is an array of URL strings that can be passed to later steps for further processing.
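To reproduce this behavior outside the node, you can call Firecrawl's REST API directly. The sketch below assumes Firecrawl's v1 map endpoint (`https://api.firecrawl.dev/v1/map`), which takes camelCase fields such as `includeSubdomains` and `ignoreSitemap` and returns the discovered URLs in a `links` array. That mapping is an assumption based on Firecrawl's public API and may not match how this node calls Firecrawl internally, so check the current Firecrawl API reference before relying on it. The helper names `build_map_payload`, `build_map_request`, and `parse_map_response` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Optional

# Assumed endpoint (Firecrawl v1 REST API); verify against current Firecrawl docs.
FIRECRAWL_MAP_URL = "https://api.firecrawl.dev/v1/map"


def build_map_payload(url: str, search: Optional[str] = None,
                      include_subdomains: bool = True,
                      ignore_sitemap: bool = False,
                      limit: int = 1000) -> dict[str, Any]:
    """Translate the node's snake_case inputs into the assumed camelCase fields."""
    payload: dict[str, Any] = {
        "url": url,
        "includeSubdomains": include_subdomains,
        "ignoreSitemap": ignore_sitemap,
        "limit": limit,
    }
    if search:  # omit the field entirely when no search term is given
        payload["search"] = search
    return payload


def build_map_request(api_key: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        FIRECRAWL_MAP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_map_response(body: dict[str, Any]) -> list[str]:
    """Turn an assumed {"success": ..., "links": [...]} body into a result-style list."""
    if not body.get("success", False):
        raise RuntimeError(f"Firecrawl map failed: {body.get('error', 'unknown error')}")
    return list(body.get("links", []))
```

For example, `parse_map_response(json.load(urllib.request.urlopen(req)))` would yield a plain list of URL strings shaped like the node's `result` output. The request is built separately from sending it so that it can be inspected or logged first.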

Usage Examples

Example 1: Map Entire Website

Input:

url: "https://example.com"
search: null
include_subdomains: false
ignore_sitemap: false
limit: 1000

Output:

result: [
  "https://example.com/",
  "https://example.com/about",
  "https://example.com/products",
  "https://example.com/products/item1",
  "https://example.com/products/item2",
  "https://example.com/contact",
  "https://example.com/blog"
]

Example 2: Map Blog URLs Only

Input:

url: "https://example.com"
search: "blog"
include_subdomains: false
ignore_sitemap: false
limit: 500

Output:

result: [
  "https://example.com/blog",
  "https://example.com/blog/post1",
  "https://example.com/blog/post2",
  "https://example.com/blog/post3",
  "https://example.com/blog/category/tech",
  "https://example.com/blog/category/news"
]

Example 3: Map with Subdomains

Input:

url: "https://example.com"
search: null
include_subdomains: true
ignore_sitemap: false
limit: 2000

Output:

result: [
  "https://example.com/",
  "https://docs.example.com/",
  "https://docs.example.com/api",
  "https://blog.example.com/",
  "https://blog.example.com/post1",
  "https://api.example.com/v1"
]
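When include_subdomains is true, as in Example 3, URLs from several hosts arrive in one flat array. This is a minimal sketch of splitting that array by host with Python's standard `urllib.parse`. The `group_by_host` helper is hypothetical, and the input is the Example 3 output above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(urls: list[str]) -> dict[str, list[str]]:
    """Group mapped URLs by host name, preserving their original order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""  # urlparse lowercases the hostname
        groups[host].append(url)
    return dict(groups)


result = [
    "https://example.com/",
    "https://docs.example.com/",
    "https://docs.example.com/api",
    "https://blog.example.com/",
    "https://blog.example.com/post1",
    "https://api.example.com/v1",
]
for host, urls in group_by_host(result).items():
    print(host, len(urls))
# example.com 1
# docs.example.com 2
# blog.example.com 2
# api.example.com 1
```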

Common Use Cases

  • SEO Auditing: Discover all pages on a website for SEO analysis and optimization

  • Website Crawling: Generate a complete URL list before scraping or analyzing a site

  • Link Analysis: Map internal links and site structure for analysis

  • Backup Planning: Create a comprehensive list of all URLs before migrating or backing up a site

  • Content Inventory: Take inventory of all content pages on a website

  • API Discovery: Find all API documentation pages across a domain

  • Competitive Analysis: Map competitor websites to understand their structure

Error Handling

Error Type | Cause | Solution
Invalid URL | The URL is malformed or the domain does not exist | Verify that the URL is valid and properly formatted
Sitemap Not Found | The website has no sitemap.xml file | Set ignore_sitemap to true so the crawler discovers URLs instead
Access Denied | The website blocks automated crawling | Check robots.txt and the website's terms; verify that bot access is allowed
Timeout | The website is too large or its structure too complex to map in time | Reduce the limit parameter, or increase the timeout if your environment allows it
Empty Result | No URLs matched the search criteria | Check the search term or remove the search filter
Rate Limited | Too many requests to the same domain | Space out requests or reduce the limit per request
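Rate Limited and Timeout errors are usually transient, so code that invokes the map repeatedly can retry with exponential backoff. The sketch below assumes you wrap the map call in your own function (`run_map` here is hypothetical) and that it raises the hypothetical exceptions `RateLimitedError` or `MapTimeoutError`. Adapt these to however your environment reports node errors.

```python
import random
import time
from typing import Callable


class RateLimitedError(Exception):
    """Hypothetical: raised by your wrapper when the map is rate limited."""


class MapTimeoutError(Exception):
    """Hypothetical: raised by your wrapper when the map times out."""


def map_with_retry(run_map: Callable[[], list[str]],
                   max_attempts: int = 4,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Call run_map, retrying transient errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_map()
        except (RateLimitedError, MapTimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Base delays grow 1s, 2s, 4s, ... with up to 10% random jitter added.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Other errors in the table, such as Invalid URL or Access Denied, are not transient, so the sketch deliberately lets them propagate without retrying.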

Notes

  • Sitemap Usage: By default, the mapper uses sitemap.xml for efficiency. Set ignore_sitemap to true to crawl instead.

  • Search Filter: The search parameter filters URLs containing specific keywords. Use lowercase for best results.

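  • Subdomain Inclusion: Subdomains are included by default (include_subdomains is true), which can significantly increase the URL count. On large sites, set include_subdomains to false unless you need subdomain URLs.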

  • URL Limits: The maximum limit is 5000 URLs. For larger sites, use multiple requests with search filters.

  • Performance: Mapping large websites can take time. Start with a lower limit to test, then increase as needed.

  • URL Patterns: Depending on the site's structure, results may include URLs with query parameters and fragments.
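The URL Limits and URL Patterns notes suggest two post-processing steps: merging the results of several searches, and collapsing URLs that differ only by fragment. This is a minimal sketch using `urllib.parse.urldefrag`. The `merge_map_results` helper is hypothetical. It keeps query strings because they can identify distinct pages; whether dropping fragments is safe depends on your site.

```python
from urllib.parse import urldefrag


def merge_map_results(*results: list[str]) -> list[str]:
    """Merge several map results, dropping #fragments and duplicate URLs.

    Each URL keeps the position where it was first seen. Query strings
    are left intact, since they can identify distinct pages.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for result in results:
        for url in result:
            clean = urldefrag(url).url  # strip the '#...' part, if any
            if clean not in seen:
                seen.add(clean)
                merged.append(clean)
    return merged


blog = ["https://example.com/blog", "https://example.com/blog/post1#comments"]
docs = ["https://example.com/docs", "https://example.com/blog/post1"]
print(merge_map_results(blog, docs))
# ['https://example.com/blog', 'https://example.com/blog/post1', 'https://example.com/docs']
```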
