URL to Markdown

Action ID: url_to_markdown

Description

Converts the content at a URL (a web page or a supported document file) to Markdown.

Input Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| url | string | Yes | - | URL to convert. Can be a website URL or a direct link to a PDF/HTML/DOCX/DOC/XLSX/XLS/PPTX/PPT/TXT file. |

View JSON Schema
{
  "description": "Url to markdown node input.",
  "properties": {
    "url": {
      "description": "It can be a url to a PDF/HTML/DOCX/DOC/XLSX/XLS/PPTX/PPT/TXT file or a website url",
      "title": "URL",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "title": "UrlToMarkdownInput",
  "type": "object"
}
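As a rough illustration of what the schema above requires, the following sketch checks a payload before it is sent to the node. The function name `validate_input` is hypothetical and not part of the product; it only mirrors the two constraints in the schema (an object with a required `url` property of type string).

```python
from typing import Any


def validate_input(payload: Any) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the schema."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    problems = []
    if "url" not in payload:
        problems.append("missing required property: url")
    elif not isinstance(payload["url"], str):
        problems.append("url must be a string")
    return problems


print(validate_input({"url": "https://example.com/blog/article-title"}))  # []
print(validate_input({}))  # ['missing required property: url']
```

This check covers only the shape of the input. Whether the URL is reachable or points to a supported format is decided when the node runs.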

Output Parameters

| Name | Type | Description |
|------|------|-------------|
| markdown | string | The markdown formatted content extracted from the URL. |

View JSON Schema
{
  "description": "Url to markdown node output.",
  "properties": {
    "markdown": {
      "title": "Markdown",
      "type": "string"
    }
  },
  "required": [
    "markdown"
  ],
  "title": "UrlToMarkdownOutput",
  "type": "object"
}

How It Works

This node fetches the content at the provided URL, identifies its type (web page, PDF, Office document, etc.), extracts the text and structure, and converts the result to markdown. For websites, it parses the HTML and converts elements such as headings, lists, links, and tables into their markdown equivalents. For documents, it extracts the text while attempting to preserve structure such as headings, lists, and tables. The output is a single markdown string that is easy to read, process, and analyze.
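The type-identification step described above can be approximated on the caller's side by looking at the file extension in the URL path. This is a sketch under that assumption only: the node's real detection logic is not documented here and may rely on the server response instead. The name `guess_kind` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File formats listed in this page's input parameter description.
DOCUMENT_EXTENSIONS = {
    "pdf", "html", "docx", "doc", "xlsx", "xls", "pptx", "ppt", "txt",
}


def guess_kind(url: str) -> str:
    """Guess the content kind from the URL path extension (assumption, not the node's logic)."""
    path = urlparse(url).path  # query string and fragment are excluded
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    if extension in DOCUMENT_EXTENSIONS:
        return extension
    return "web page"


print(guess_kind("https://company.com/documents/whitepaper.pdf"))  # pdf
print(guess_kind("https://example.com/blog/article-title"))        # web page
```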

Usage Examples

Example 1: Convert Website to Markdown

Input:

url: "https://example.com/blog/article-title"

Output:

markdown: "# Article Title\n\nThis is the introduction paragraph...\n\n## Section 1\n\nContent here with [links](https://example.com) and **bold text**.\n\n- List item 1\n- List item 2\n\n## Section 2\n\nMore content..."

Example 2: Extract PDF Content

Input:

url: "https://company.com/documents/whitepaper.pdf"

Output:

markdown: "# Company Whitepaper 2024\n\n## Executive Summary\n\nThis whitepaper discusses...\n\n### Key Findings\n\n1. Finding one\n2. Finding two\n3. Finding three\n\n## Detailed Analysis\n\nThe analysis reveals..."

Example 3: Convert Word Document

Input:

url: "https://storage.example.com/reports/quarterly-report.docx"

Output:

markdown: "# Q1 2024 Quarterly Report\n\n## Financial Performance\n\n**Revenue:** $5.2M (+15% YoY)\n**Expenses:** $3.8M\n**Net Profit:** $1.4M\n\n### Key Metrics\n\n| Metric | Q1 2024 | Q4 2023 | Change |\n|--------|---------|---------|--------|\n| Users  | 125K    | 110K    | +13.6% |"
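Because the output is a plain markdown string, downstream steps can work on it with ordinary text tools. The sketch below pulls a heading outline out of an output like the one in Example 1. The helper name `outline` is hypothetical, and the regular expression is a simplification that would also match `#` lines inside fenced code blocks.

```python
import re

# ATX-style headings: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*)$", re.MULTILINE)


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for every heading line in the markdown."""
    return [(len(m.group(1)), m.group(2).strip()) for m in HEADING.finditer(markdown)]


sample = (
    "# Article Title\n\nThis is the introduction paragraph...\n\n"
    "## Section 1\n\nContent here.\n\n## Section 2\n\nMore content..."
)
print(outline(sample))
# [(1, 'Article Title'), (2, 'Section 1'), (2, 'Section 2')]
```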

Common Use Cases

  • Web Scraping: Extract and structure content from web pages for analysis or processing

  • Document Processing: Convert various document formats into a unified markdown format

  • Content Archiving: Save web content in a clean, portable markdown format

  • AI Training Data: Prepare web and document content for AI model training or fine-tuning

  • Research Automation: Collect and structure information from multiple online sources

  • Knowledge Base Building: Extract content from documentation sites to build internal knowledge bases

  • Content Migration: Convert content from various sources into markdown for CMS migration

Error Handling

| Error Type | Cause | Solution |
|------------|-------|----------|
| Invalid URL | URL format is incorrect | Verify that the URL starts with http:// or https:// |
| URL Not Accessible | Cannot reach the URL | Check that the URL is public and the server is responding |
| Unsupported Format | File format is not supported | Use a supported format: PDF, HTML, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT |
| Download Failed | Cannot download content from the URL | Verify that the URL is accessible and check for authentication requirements |
| Parsing Error | Content cannot be parsed | Check whether the file is corrupted or the content structure is unusual |
| Empty Content | URL returns no extractable content | Verify that the URL contains actual content and not just scripts/styles |
| Timeout Error | Request took too long | Try again, or check whether the server is slow to respond |
| Authentication Required | URL requires login/credentials | Provide a publicly accessible URL or use authenticated endpoints |
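Two of the errors above can be handled on the caller's side: an invalid URL can be caught before the node runs, and a timeout can be retried. The following is a minimal sketch, not the node's own behavior. `preflight` and `with_retries` are hypothetical helpers, and `convert` stands in for whatever function actually invokes the node in your environment; the sketch assumes it raises Python's built-in `TimeoutError` on a timeout.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlparse


def preflight(url: str) -> Optional[str]:
    """Return "Invalid URL" if the URL lacks an http(s) scheme or a host, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "Invalid URL"
    return None


def with_retries(convert: Callable[[str], str], url: str,
                 attempts: int = 3, delay: float = 1.0) -> str:
    """Call convert(url), retrying on TimeoutError with a growing pause between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return convert(url)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            time.sleep(delay * attempt)
    raise ValueError("attempts must be at least 1")


print(preflight("https://example.com/report.pdf"))  # None
print(preflight("example.com/report.pdf"))          # Invalid URL
```

Retrying only makes sense for transient failures such as timeouts. Errors like Unsupported Format or Authentication Required will fail the same way on every attempt.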

Notes

  • Supported Formats: The node supports PDF, HTML, DOCX, DOC, XLSX, XLS, PPTX, PPT, and TXT formats. Ensure your URL points to a web page or to a file in one of these formats.

  • Public Accessibility: The URL must be publicly accessible. Password-protected or authentication-required URLs will fail.

  • Content Preservation: The node attempts to preserve document structure (headings, lists, tables) in the markdown output.

  • Large Documents: Very large documents may take time to process. Consider breaking them into smaller sections if possible.

  • Dynamic Content: JavaScript-heavy websites may not render fully. Static HTML content converts best.

  • Formatting Limitations: Complex formatting like colors, fonts, and advanced layouts may be simplified in markdown.

  • Link Preservation: External and internal links in the original content are preserved as markdown links.

  • Use with AI: The markdown output is ideal for feeding into AI models for analysis, summarization, or question answering.
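For the large-document and AI use cases noted above, one common approach is to split the markdown output into sections before passing it on. The sketch below splits at headings of a chosen level. The function name `split_by_heading` is hypothetical, and splitting on level-2 headings is an assumption that suits outputs shaped like the examples on this page.

```python
def split_by_heading(markdown: str, level: int = 2) -> list[str]:
    """Split markdown into chunks, starting a new chunk at each heading of `level`."""
    prefix = "#" * level + " "
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A matching heading closes the chunk collected so far.
        if line.startswith(prefix) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


document = "# Title\n\nIntro\n\n## Section 1\n\nFirst\n\n## Section 2\n\nSecond"
for chunk in split_by_heading(document):
    print(repr(chunk))
# '# Title\n\nIntro'
# '## Section 1\n\nFirst'
# '## Section 2\n\nSecond'
```

Text before the first matching heading (here the title and introduction) becomes its own chunk, and deeper headings such as `###` stay inside the section they belong to.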
