📄 URL → Markdown Action Guide
Overview
The URL → Markdown action is a data conversion node that turns files and web content into clean, readable Markdown text. It is the usual starting point for document processing workflows: given a URL or file path, it extracts structured content from PDFs, Word documents, Excel files, PowerPoint presentations, and HTML pages.
Supported File Formats
The URL → Markdown action can process the following formats:
📄 Document Formats
PDF - Portable Document Format files
DOCX - Microsoft Word documents
DOC - Legacy Word documents
RTF - Rich Text Format files
TXT - Plain text files
📊 Spreadsheet Formats
XLSX - Microsoft Excel spreadsheets
XLS - Legacy Excel files
CSV - Comma-separated values
TSV - Tab-separated values
📑 Presentation Formats
PPTX - Microsoft PowerPoint presentations
PPT - Legacy PowerPoint files
🌐 Web Formats
HTML - Web pages and HTML files
XML - Structured markup documents
JSON - JavaScript Object Notation (basic conversion)
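As a rough illustration of how the format list above could be used before calling the action, the following Python sketch maps a URL's file extension to one of the four categories. The function name `detect_format`, the `SUPPORTED_FORMATS` table, and the fallback to a format hint are assumptions made for this example; they are not part of the action's documented API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension -> category, copied from the supported-format list in this guide.
SUPPORTED_FORMATS = {
    "pdf": "document", "docx": "document", "doc": "document",
    "rtf": "document", "txt": "document",
    "xlsx": "spreadsheet", "xls": "spreadsheet",
    "csv": "spreadsheet", "tsv": "spreadsheet",
    "pptx": "presentation", "ppt": "presentation",
    "html": "web", "xml": "web", "json": "web",
}


def detect_format(url, format_hint=None):
    """Return (extension, category), or None if the format is not listed.

    Hypothetical helper: the extension is read from the URL path only, so
    query strings such as ?dl=1 are ignored. If the path has no extension,
    the optional format_hint is used instead.
    """
    path = urlparse(url).path
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if not ext and format_hint:
        ext = format_hint.lstrip(".").lower()
    category = SUPPORTED_FORMATS.get(ext)
    return (ext, category) if category else None
```

A check like this can reject unsupported files early, before a workflow spends time downloading them.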
How It Works
graph LR
A[📄 Input File/URL] --> B[🔄 URL → Markdown Action]
B --> C[📝 Clean Markdown Output]
A1[PDF Document] --> B
A2[Word DOCX] --> B
A3[Excel XLSX] --> B
A4[PowerPoint PPTX] --> B
A5[HTML Page] --> B
B --> C1[Structured Text]
B --> C2[Formatted Tables]
B --> C3[Bullet Points]
B --> C4[Headers & Sections]
style A fill:#e3f2fd
style B fill:#f3e5f5
style C fill:#e8f5e8
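To make the flow in the diagram concrete, here is a deliberately small Python sketch of one conversion path, HTML to Markdown, using only the standard library. It is not the action's actual implementation, which this guide does not describe; the class and function names are hypothetical, and only headings, paragraphs, list items, and bold text are handled.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: h1-h3, paragraphs, list items, bold."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (tag, markdown_line) pairs
        self.buf = []      # text fragments of the block being built
        self.prefix = ""   # Markdown prefix for the current block

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]
        elif tag == "li":
            self.prefix = "- "
        elif tag in ("b", "strong"):
            self.buf.append("**")

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.buf.append("**")
        elif tag in self.HEADINGS or tag in ("p", "li"):
            text = "".join(self.buf).strip()
            if text:
                self.blocks.append((tag, self.prefix + text))
            self.buf, self.prefix = [], ""

    def handle_data(self, data):
        self.buf.append(data)


def html_to_markdown(html):
    """Convert an HTML fragment to Markdown text."""
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    out, prev_tag = [], None
    for tag, line in parser.blocks:
        if out:
            # Consecutive list items share a list; other blocks get a blank line.
            out.append("\n" if tag == "li" and prev_tag == "li" else "\n\n")
        out.append(line)
        prev_tag = tag
    return "".join(out)
```

A real converter also has to deal with tables, links, nested lists, and malformed markup, which is why the output types in the diagram are best treated as goals rather than guarantees.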
Configuration Options
Input Parameters
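| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| URL | String | ✅ Yes | Direct URL to the file or document |
| File Path | String | ✅ Yes | Local file path (alternative to URL) |
| Format Hint | String | ⚪ Optional | Specify expected format if URL doesn't have extension |
| Extract Options | Object | ⚪ Optional | Additional extraction preferences |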
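Because File Path is described as an alternative to URL, a caller will typically supply one or the other. The Python sketch below shows one way to validate that before building the action's inputs. It assumes that exactly one source must be given and that the hint is passed under a key named `format_hint`; neither detail is confirmed by this guide, and `build_inputs` is a hypothetical helper.

```python
def build_inputs(url=None, file_path=None, format_hint=None, extract_options=None):
    """Assemble an input mapping for the action.

    Assumption: exactly one of url or file_path is provided. The key names
    url, file_path, and extract_options follow the examples in this guide;
    format_hint is a guessed key name.
    """
    if bool(url) == bool(file_path):
        raise ValueError("provide exactly one of url or file_path")
    inputs = {"url": url} if url else {"file_path": file_path}
    if format_hint:
        inputs["format_hint"] = format_hint
    if extract_options:
        inputs["extract_options"] = dict(extract_options)  # copy, do not alias
    return inputs
```

For example, `build_inputs(url="https://example.com/document.pdf")` returns a mapping with a single `url` key, while calling it with no source raises `ValueError`.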
Output Format
The action returns clean Markdown with:
Headers - Properly formatted #, ##, ### structure
Tables - Converted to Markdown table format
Lists - Bullet points and numbered lists
Text Formatting - Bold, italic, and other basic formatting
Links - Preserved as Markdown links where applicable
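To show what "Markdown table format" means in practice, the following Python sketch renders a header row and data rows as a pipe table with a separator line. It is an illustration of the output shape only, not the action's own code; `to_markdown_table` is a hypothetical name.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a Markdown pipe table."""
    def line(cells):
        return "| " + " | ".join(str(c) for c in cells) + " |"

    # One dash run per column, as wide as the header plus its padding spaces.
    separator = "|" + "|".join("-" * (len(h) + 2) for h in headers) + "|"
    return "\n".join([line(headers), separator] + [line(r) for r in rows])


print(to_markdown_table(
    ["Quarter", "Revenue", "Growth"],
    [["Q4 2024", "$2.5M", "15%"], ["Q1 2025", "$2.9M", "16%"]],
))
```

Cell contents that include a literal `|` would need escaping, which this sketch does not attempt.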
Step-by-Step Usage Guide
Step 1: Add the Action to Your Workflow
Open your workflow in the Visual Builder
From the Data Processing section, drag the URL → Markdown action
Connect it to your trigger or previous action
Step 2: Configure Input Source
Choose your input method:
Option A: Direct URL
url: "https://example.com/document.pdf"
Option B: File Upload
file_path: "/uploads/document.docx"
Option C: Variable from Previous Action
url: "{{previous_action.file_url}}"
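Option C relies on `{{...}}` placeholders being replaced with values from earlier steps. How the platform resolves them is not documented here; the Python sketch below shows one plausible approach, a regular-expression substitution over nested dictionaries. The `render` function and the shape of the `context` argument are assumptions for illustration.

```python
import re

# Matches {{ dotted.path }} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template, context):
    """Replace {{a.b.c}} placeholders with values looked up in nested dicts."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path does not exist
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


context = {"previous_action": {"file_url": "https://example.com/document.pdf"}}
print(render("{{previous_action.file_url}}", context))
```

Function-style placeholders such as `{{now()}}` do not match this pattern and are left untouched, so a real resolver would need additional handling for them.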
Step 3: Set Processing Options
extract_options:
  preserve_formatting: true
  include_tables: true
  convert_images: false
  remove_empty_lines: true
Step 4: Handle the Output
The action outputs clean Markdown that you can:
Store in a database or file system
Process with AI for summarization
Convert to other formats (HTML, PDF)
Send via email or notifications
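Before storing or summarizing the output, it is often convenient to break it into sections. The Python sketch below splits Markdown on its `#` headings; `split_sections` is a hypothetical helper, and it assumes the output uses ATX-style headings as described under Output Format.

```python
import re


def split_sections(markdown):
    """Split Markdown into (level, title, body) tuples, one per heading.

    Text before the first heading is dropped, and headings inside fenced
    code blocks are not treated specially in this simplified sketch.
    """
    sections = []
    current = None
    for raw in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", raw)
        if match:
            current = [len(match.group(1)), match.group(2).strip(), []]
            sections.append(current)
        elif current is not None:
            current[2].append(raw)
    return [(lvl, title, "\n".join(body).strip()) for lvl, title, body in sections]
```

Each section can then be stored as its own database row or sent to a summarization step separately, which keeps individual payloads small.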
Common Use Cases
📚 Document Processing Workflows
Use Case: Convert uploaded documents to searchable text
trigger: file_upload
actions:
  - url_to_markdown:
      file_path: "{{trigger.file_path}}"
  - text_analysis:
      content: "{{url_to_markdown.markdown}}"
  - database_store:
      processed_text: "{{text_analysis.summary}}"
🌐 Web Content Extraction
Use Case: Extract article content from web pages
trigger: webhook
actions:
  - url_to_markdown:
      url: "{{trigger.article_url}}"
  - ai_summarize:
      text: "{{url_to_markdown.markdown}}"
  - publish_summary:
      content: "{{ai_summarize.summary}}"
📊 Report Generation
Use Case: Convert Excel reports to readable format
trigger: schedule
actions:
  - url_to_markdown:
      url: "https://reports.company.com/monthly.xlsx"
  - format_report:
      markdown: "{{url_to_markdown.markdown}}"
  - email_report:
      body: "{{format_report.html}}"
Output Examples
PDF to Markdown
Input: Business proposal PDF
Output:
# Business Proposal: Q4 Expansion
## Executive Summary
This proposal outlines our strategic expansion plans for Q4 2024...
## Financial Projections
| Quarter | Revenue | Growth |
|---------|---------|--------|
| Q4 2024 | $2.5M | 15% |
| Q1 2025 | $2.9M | 16% |
## Key Initiatives
- Market expansion in Southeast Asia
- Product line diversification
- Team scaling (20+ new hires)
Excel to Markdown
Input: Sales data spreadsheet
Output:
# Sales Report - January 2025
## Regional Performance
| Region | Sales | Target | Achievement |
|--------|-------|--------|-------------|
| North | $450K | $400K | 112.5% |
| South | $380K | $350K | 108.6% |
| East | $290K | $300K | 96.7% |
| West | $510K | $450K | 113.3% |
## Top Performers
1. Sarah Johnson - $89K
2. Mike Chen - $76K
3. Lisa Rodriguez - $72K
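Once a spreadsheet has been converted, the tables can be read back into structured data for checks or further processing. The Python sketch below parses a pipe table like the Regional Performance example and recomputes the Achievement column from Sales and Target. Both helpers are hypothetical, and the parser assumes a simple table with no escaped `|` characters.

```python
def parse_markdown_table(text):
    """Parse a Markdown pipe table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    headers = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the separator row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows.append(dict(zip(headers, cells)))
    return rows


def achievement(sales, target):
    """Achievement percentage from '$450K'-style strings, to one decimal."""
    def to_number(value):
        return float(value.strip("$K"))

    return round(to_number(sales) / to_number(target) * 100, 1)


table = """
| Region | Sales | Target | Achievement |
|--------|-------|--------|-------------|
| North | $450K | $400K | 112.5% |
| East | $290K | $300K | 96.7% |
"""
for row in parse_markdown_table(table):
    print(row["Region"], achievement(row["Sales"], row["Target"]))
```

Recomputing derived columns this way is a cheap sanity check that a conversion did not shift or drop cells.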
Advanced Configuration
Custom Extraction Rules
url_to_markdown:
  url: "{{input.document_url}}"
  extract_options:
    # Table processing
    preserve_table_formatting: true
    merge_table_cells: false
    # Text processing
    remove_headers_footers: true
    preserve_whitespace: false
    # Image handling
    extract_image_alt_text: true
    include_image_urls: false
    # Quality settings
    ocr_quality: "high"
    encoding: "utf-8"
Error Handling
url_to_markdown:
  url: "{{input.file_url}}"
  on_error:
    action: "continue"
    fallback_content: "Document could not be processed"
  retry_attempts: 3
  timeout_seconds: 30
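The behavior these settings suggest can be sketched in Python as a retry loop with a fallback value. This is an interpretation, not documented behavior: it assumes `retry_attempts` counts total attempts and that `action: "continue"` means the workflow proceeds with `fallback_content` once all attempts fail. The `convert` argument stands in for whatever performs the conversion.

```python
def convert_with_retry(convert, url, retry_attempts=3,
                       fallback_content="Document could not be processed"):
    """Call convert(url) up to retry_attempts times, then return the fallback.

    Assumption: any exception counts as a failed attempt. A production
    version would catch narrower exception types and wait between attempts.
    """
    for _ in range(retry_attempts):
        try:
            return convert(url)
        except Exception:
            continue  # try again until the attempts are used up
    return fallback_content
```

Returning the fallback instead of raising lets downstream steps run, so they should be prepared to recognize the fallback text and skip work such as summarization.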
Performance Considerations
File Size Limits
PDF: Up to 50MB
DOCX/XLSX: Up to 25MB
HTML: Up to 10MB
Images: Extracted as alt-text only
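A workflow can check these limits up front rather than waiting for the action to fail. The Python sketch below encodes the limits listed above; the helper name is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the guide does not say which definition applies.

```python
MB = 1024 * 1024  # assumption: binary megabytes; the guide does not specify

# Limits in MB, taken from the list above.
SIZE_LIMITS_MB = {"pdf": 50, "docx": 25, "xlsx": 25, "html": 10}


def within_size_limit(fmt, size_bytes):
    """Return True or False for listed formats, None when no limit is given."""
    limit = SIZE_LIMITS_MB.get(fmt.lower())
    if limit is None:
        return None
    return size_bytes <= limit * MB
```

Returning `None` for unlisted formats keeps "no documented limit" distinct from "over the limit", so the caller can decide how cautious to be.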
Processing Time
Small files (<1MB): 1-3 seconds
Medium files (1-10MB): 5-15 seconds
Large files (10MB+): 15-45 seconds
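These ranges can inform the timeout you configure. The Python sketch below maps a file size to the ranges listed above and checks whether a given timeout covers the upper end. The function names are hypothetical, a file of exactly 10 MB is assigned to the large bucket, and 1 MB is taken as 1024 × 1024 bytes; all three are assumptions.

```python
MB = 1024 * 1024  # assumption: binary megabytes; the guide does not specify


def estimated_seconds(size_bytes):
    """Return the (low, high) processing-time range in seconds for a size."""
    size_mb = size_bytes / MB
    if size_mb < 1:
        return (1, 3)
    if size_mb < 10:
        return (5, 15)
    return (15, 45)  # exactly 10 MB is treated as large here


def timeout_covers(size_bytes, timeout_seconds):
    """True if the timeout is at least the upper end of the estimated range."""
    return timeout_seconds >= estimated_seconds(size_bytes)[1]
```

By this estimate, a 30-second timeout covers small and medium files but falls short of the 45-second upper bound for large ones.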
Best Practices
✅ Do
Validate URLs before processing
Handle large files asynchronously
Cache frequently accessed documents
Use appropriate timeout settings
❌ Don't
Process password-protected files without credentials
Assume perfect formatting preservation
Skip error handling for external URLs
Process extremely large files synchronously
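The first item in the Do list, validating URLs, can be as simple as a structural check before the action runs. The Python sketch below accepts only absolute `http` and `https` URLs; restricting schemes this way is an assumption, and `is_processable_url` is a hypothetical helper. It does not confirm that the URL is reachable.

```python
from urllib.parse import urlparse


def is_processable_url(url):
    """Return True for absolute http(s) URLs that include a host name."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Local paths such as `/uploads/document.docx` fail this check by design, since they belong in the file path input rather than the URL input.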
Troubleshooting
Common Issues
| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| Empty output | Invalid URL or unsupported format | Verify URL accessibility and format |
| Garbled text | Encoding issues | Specify correct encoding in options |
| Missing tables | Complex table structure | Enable preserve_table_formatting |
| Timeout errors | Large file or slow connection | Increase timeout or process asynchronously |
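Garbled text usually means bytes were decoded with the wrong encoding. If you have access to the raw file, the Python sketch below shows one way to find a workable encoding before setting it in the options: try a short list of candidates in order. The candidate list and the helper name are assumptions, and a successful decode does not prove the encoding is the right one.

```python
def decode_text(data, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that succeeds.

    Returns (text, encoding_used). If every candidate fails, falls back to
    the first encoding with undecodable bytes replaced by U+FFFD.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return data.decode(encodings[0], errors="replace"), encodings[0]
```

UTF-8 is tried first because it rejects most text written in other encodings, whereas single-byte encodings accept almost any input and would mask the problem.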
Supported URL Types
✅ Supported
Direct file URLs: https://site.com/doc.pdf
Cloud storage: Google Drive, Dropbox, OneDrive
Public repositories: GitHub, GitLab
Document platforms: Notion (public pages)
❌ Not Supported
Password-protected documents
Login-required content
Dynamic JavaScript content
Embedded media files
Integration Examples
With AI Analysis
workflow:
  name: "Document Intelligence"
  steps:
    - url_to_markdown:
        url: "{{trigger.document_url}}"
    - ai_analysis:
        prompt: "Analyze this document and extract key insights"
        content: "{{url_to_markdown.markdown}}"
    - generate_summary:
        insights: "{{ai_analysis.response}}"
With Database Storage
workflow:
  name: "Document Archive"
  steps:
    - url_to_markdown:
        url: "{{trigger.file_url}}"
    - database_insert:
        table: "documents"
        data:
          original_url: "{{trigger.file_url}}"
          markdown_content: "{{url_to_markdown.markdown}}"
          processed_at: "{{now()}}"
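For readers who want to see what the archive step amounts to outside the workflow engine, the Python sketch below performs the equivalent insert with the standard-library `sqlite3` module. The table and column names follow the example above; the use of SQLite, the column types, and the UTC ISO 8601 timestamp standing in for `{{now()}}` are assumptions.

```python
import sqlite3
from datetime import datetime, timezone


def archive_document(conn, original_url, markdown_content):
    """Insert one converted document and return its processed_at timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "original_url TEXT, markdown_content TEXT, processed_at TEXT)"
    )
    processed_at = datetime.now(timezone.utc).isoformat()
    # Parameter binding keeps Markdown containing quotes from breaking the SQL.
    conn.execute(
        "INSERT INTO documents (original_url, markdown_content, processed_at) "
        "VALUES (?, ?, ?)",
        (original_url, markdown_content, processed_at),
    )
    conn.commit()
    return processed_at


conn = sqlite3.connect(":memory:")
archive_document(conn, "https://example.com/document.pdf", "# Title")
```

Storing the original URL alongside the Markdown makes it possible to reprocess a document later if extraction options change.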
Related Actions
HTML to Text - For simple HTML text extraction
PDF Reader - For PDF-specific processing
File Upload - For handling file inputs
Text Processor - For post-processing markdown
💡 Pro Tip: Combine the URL → Markdown action with AI processing nodes to build document intelligence workflows that automatically summarize, analyze, and extract insights from any supported document format.