# URL to Markdown

**Action ID:** `url_to_markdown`

## Description

Fetches the content at a URL (a web page or a supported document file) and converts it to markdown.

## Input Parameters

| Name | Type   | Required | Default | Description                                                                                            |
| ---- | ------ | :------: | ------- | ------------------------------------------------------------------------------------------------------ |
| url  | string |     ✓    | -       | URL to convert. Can be a website URL or a direct link to PDF/HTML/DOCX/DOC/XLSX/XLS/PPTX/PPT/TXT file. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Url to markdown node input.",
  "properties": {
    "url": {
      "description": "It can be a url to a PDF/HTML/DOCX/DOC/XLSX/XLS/PPTX/PPT/TXT file or a website url",
      "title": "URL",
      "type": "string"
    }
  },
  "required": [
    "url"
  ],
  "title": "UrlToMarkdownInput",
  "type": "object"
}
```

</details>
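
Before calling the node, it can help to check a payload against the schema above. The following is a minimal sketch, not part of the node itself: `validate_input` is a hypothetical helper, and the `http`/`https` scheme check is taken from the "Invalid URL" row of the Error Handling table rather than from the schema.

```python
from urllib.parse import urlparse


def validate_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid.

    Hypothetical pre-flight check, not the node's own validation.
    """
    problems = []
    url = payload.get("url")
    if url is None:
        # "url" is the only required field in the input schema.
        problems.append("missing required field: url")
    elif not isinstance(url, str):
        problems.append("url must be a string")
    else:
        parts = urlparse(url)
        # Assumption: only http and https URLs are accepted.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("url must start with http:// or https://")
    return problems
```

An empty result only means the payload is well-formed; the URL can still be unreachable or point to an unsupported format.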

## Output Parameters

| Name     | Type   | Description                                            |
| -------- | ------ | ------------------------------------------------------ |
| markdown | string | The markdown formatted content extracted from the URL. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Url to markdown node output.",
  "properties": {
    "markdown": {
      "title": "Markdown",
      "type": "string"
    }
  },
  "required": [
    "markdown"
  ],
  "title": "UrlToMarkdownOutput",
  "type": "object"
}
```

</details>

## How It Works

This node fetches the content at the provided URL, identifies its type (web page, PDF, Office document, and so on), extracts the text and structure, and converts the result to markdown. For web pages, it parses the HTML and converts elements such as headings, lists, links, and tables to their markdown equivalents. For documents, it extracts the text and keeps structure such as headings, lists, and tables where possible; purely visual formatting may be simplified. The output is a single markdown string that is easy to read, process, and analyze.
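
The node's actual detection logic is not documented on this page. As a rough illustration of the "identify the content type" step, the sketch below guesses the kind of content from the URL's file extension alone. `guess_kind` and `DOCUMENT_EXTENSIONS` are hypothetical names, and a real implementation would likely also inspect the HTTP response (for example its `Content-Type` header).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Document extensions listed under "Supported Formats" on this page.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".txt"}


def guess_kind(url: str) -> str:
    """Guess the content kind from the URL path (assumption: extension-based only)."""
    path = urlparse(url).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return suffix.lstrip(".")
    # Anything else (.html, .htm, or no extension) is treated as a web page.
    return "html"
```

With the URLs from the usage examples, `guess_kind` would classify the blog article as `html`, the whitepaper as `pdf`, and the quarterly report as `docx`.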

## Usage Examples

### Example 1: Convert Website to Markdown

**Input:**

```
url: "https://example.com/blog/article-title"
```

**Output:**

```
markdown: "# Article Title\n\nThis is the introduction paragraph...\n\n## Section 1\n\nContent here with [links](https://example.com) and **bold text**.\n\n- List item 1\n- List item 2\n\n## Section 2\n\nMore content..."
```

### Example 2: Extract PDF Content

**Input:**

```
url: "https://company.com/documents/whitepaper.pdf"
```

**Output:**

```
markdown: "# Company Whitepaper 2024\n\n## Executive Summary\n\nThis whitepaper discusses...\n\n### Key Findings\n\n1. Finding one\n2. Finding two\n3. Finding three\n\n## Detailed Analysis\n\nThe analysis reveals..."
```

### Example 3: Convert Word Document

**Input:**

```
url: "https://storage.example.com/reports/quarterly-report.docx"
```

**Output:**

```
markdown: "# Q1 2024 Quarterly Report\n\n## Financial Performance\n\n**Revenue:** $5.2M (+15% YoY)\n**Expenses:** $3.8M\n**Net Profit:** $1.4M\n\n### Key Metrics\n\n| Metric | Q1 2024 | Q4 2023 | Change |\n|--------|---------|---------|--------|\n| Users  | 125K    | 110K    | +13.6% |"
```
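
Because the `markdown` output is a single string, downstream steps often need to break it up. The sketch below shows one possible way to split the output into sections at headings. `split_sections` is a hypothetical helper, not something the node provides, and it deliberately ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

# Matches ATX headings such as "# Title" or "## Section" at the start of a line.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.*)$", re.MULTILINE)


def split_sections(markdown: str) -> list[tuple[int, str, str]]:
    """Split markdown into (level, title, body) tuples, one per heading."""
    matches = list(HEADING.finditer(markdown))
    sections = []
    for i, match in enumerate(matches):
        # The body runs until the next heading or the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        sections.append((len(match.group(1)), match.group(2).strip(), body))
    return sections
```

Applied to the output of Example 1, this would yield three sections: "Article Title" at level 1, followed by "Section 1" and "Section 2" at level 2.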

## Common Use Cases

* **Web Scraping**: Extract and structure content from web pages for analysis or processing
* **Document Processing**: Convert various document formats into a unified markdown format
* **Content Archiving**: Save web content in a clean, portable markdown format
* **AI Training Data**: Prepare web and document content for AI model training or fine-tuning
* **Research Automation**: Collect and structure information from multiple online sources
* **Knowledge Base Building**: Extract content from documentation sites to build internal knowledge bases
* **Content Migration**: Convert content from various sources into markdown for CMS migration

## Error Handling

| Error Type              | Cause                              | Solution                                                               |
| ----------------------- | ---------------------------------- | ---------------------------------------------------------------------- |
| Invalid URL             | URL format is incorrect            | Verify the URL starts with `http://` or `https://`                     |
| URL Not Accessible      | Cannot reach the URL               | Check if URL is public and server is responding                        |
| Unsupported Format      | File format is not supported       | Use supported formats: PDF, HTML, DOCX, DOC, XLSX, XLS, PPTX, PPT, TXT |
| Download Failed         | Cannot download content from URL   | Verify URL accessibility and check for authentication requirements     |
| Parsing Error           | Content cannot be parsed           | Check if file is corrupted or content structure is unusual             |
| Empty Content           | URL returns no extractable content | Verify the URL contains actual content and not just scripts/styles     |
| Timeout Error           | Request took too long              | Try again or check if server is slow to respond                        |
| Authentication Required | URL requires login/credentials     | Provide a publicly accessible URL (login-protected URLs will fail)     |
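
For transient failures such as the Timeout Error row, a caller can retry with a growing delay. The sketch below assumes a hypothetical `convert` callable that takes a URL, returns the markdown string, and raises Python's built-in `TimeoutError` on timeout. How the node is actually invoked and how it reports errors is not specified on this page, so adapt the exception types to your client.

```python
import time
from typing import Callable


def convert_with_retry(
    convert: Callable[[str], str],  # hypothetical wrapper around the node call
    url: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> str:
    """Call convert(url), retrying on TimeoutError with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            markdown = convert(url)
        except TimeoutError as exc:
            last_error = exc
            # Wait 1x, 2x, 4x ... the base delay, but not after the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay_seconds * (2 ** attempt))
            continue
        if not markdown.strip():
            # Mirrors the "Empty Content" row: retrying is unlikely to help.
            raise ValueError(f"no extractable content at {url}")
        return markdown
    raise TimeoutError(f"gave up on {url} after {attempts} attempts") from last_error
```

Only timeouts are retried here. Errors such as an invalid URL or an unsupported format will not change between attempts, so they are better surfaced immediately.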

## Notes

* **Supported Formats**: The node supports PDF, HTML, DOCX, DOC, XLSX, XLS, PPTX, PPT, and TXT formats. Ensure your URL points to one of these formats.
* **Public Accessibility**: The URL must be publicly accessible. Password-protected or authentication-required URLs will fail.
* **Content Preservation**: The node attempts to preserve document structure (headings, lists, tables) in the markdown output.
* **Large Documents**: Very large documents may take time to process. Consider breaking them into smaller sections if possible.
* **Dynamic Content**: JavaScript-heavy websites may not render fully. Static HTML content converts best.
* **Formatting Limitations**: Complex formatting like colors, fonts, and advanced layouts may be simplified in markdown.
* **Link Preservation**: External and internal links in the original content are preserved as markdown links.
* **Use with AI**: The markdown output is ideal for feeding into AI models for analysis, summarization, or question answering.

