Text Extract

Action ID: text_extract

Description

Extract text content from various file formats including PDFs, images, and documents using OCR and document parsing technology.

Connection

| Name | Description | Required |
| --- | --- | --- |

Input Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | string | ✓ | | URL of the file to extract text from. Supports PDFs, images, and various document formats. |

JSON Schema

{
  "description": "Text extract node input.",
  "properties": {
    "file": {
      "description": "File to extract text from",
      "title": "File",
      "type": "string"
    }
  },
  "required": [
    "file"
  ],
  "title": "TextExtractNodeInput",
  "type": "object"
}
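The schema above requires a single string field, `file`, which the parameter table describes as a URL. A minimal pre-flight check might look like the sketch below. The function name `validate_input` and the rule that only http(s) URLs with a host are acceptable are assumptions made for illustration; the node's own validation is not documented here.

```python
from urllib.parse import urlparse


def validate_input(payload: dict) -> list[str]:
    """Return a list of problems with a text_extract input payload (empty if none)."""
    problems = []
    file_url = payload.get("file")
    if file_url is None:
        # "file" is the only entry in the schema's "required" list.
        problems.append("missing required field: file")
    elif not isinstance(file_url, str):
        # The schema declares "type": "string" for this field.
        problems.append("file must be a string")
    else:
        parsed = urlparse(file_url)
        # Assumption: the node needs an http(s) URL that includes a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("file does not look like an http(s) URL")
    return problems


print(validate_input({"file": "https://example.com/invoice.pdf"}))  # []
print(validate_input({}))  # ['missing required field: file']
```

Returning a list of problems rather than raising on the first one lets a workflow report every issue with a payload at once.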

Output Parameters

| Name | Type | Description |
| --- | --- | --- |
| text | string | The extracted text content from the file |

JSON Schema

{
  "description": "Text extract node output.",
  "properties": {
    "text": {
      "description": "Extracted text from the file",
      "title": "Extracted text",
      "type": "string"
    }
  },
  "required": [
    "text"
  ],
  "title": "TextExtractNodeOutput",
  "type": "object"
}

How It Works

This node uses OCR (Optical Character Recognition) and document parsing to extract text from various file formats. For PDFs, it extracts embedded text directly when available and applies OCR to scanned PDFs. For images, it uses computer vision to detect and recognize text characters. The extracted text preserves line breaks and paragraph structure when possible.
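The strategy described above (embedded text first for PDFs, OCR as the fallback, OCR for images) can be sketched as a small dispatch function. This is not the node's implementation: `extract_text`, the two callables, and the use of the file extension to detect PDFs are all hypothetical, and the stand-in functions exist only so the sketch runs without an OCR engine.

```python
from typing import Callable


def extract_text(
    file_name: str,
    read_embedded_text: Callable[[str], str],
    run_ocr: Callable[[str], str],
) -> str:
    """Choose an extraction strategy: embedded text for PDFs, OCR otherwise."""
    # Assumption: a ".pdf" extension is enough to identify a PDF.
    if file_name.lower().endswith(".pdf"):
        embedded = read_embedded_text(file_name)
        if embedded.strip():
            # The PDF carries a usable text layer, so OCR is unnecessary.
            return embedded
        # Scanned PDF: no text layer, fall back to OCR.
        return run_ocr(file_name)
    # Images always go through OCR.
    return run_ocr(file_name)


# Stand-ins for a real PDF parser and OCR engine (hypothetical).
def fake_embedded(name: str) -> str:
    return "Invoice #12345" if name == "digital.pdf" else ""


def fake_ocr(name: str) -> str:
    return f"[OCR] {name}"


print(extract_text("digital.pdf", fake_embedded, fake_ocr))  # Invoice #12345
print(extract_text("scanned.pdf", fake_embedded, fake_ocr))  # [OCR] scanned.pdf
print(extract_text("card.jpg", fake_embedded, fake_ocr))     # [OCR] card.jpg
```

Passing the parser and OCR engine in as callables keeps the routing logic testable on its own, independent of whichever libraries do the real work.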

Usage Examples

Example 1: Extract Text from PDF

Input:

file: "https://example.com/invoice.pdf"

Output:

text: "Invoice #12345
Date: January 15, 2025
Customer: Acme Corp
Total Amount: $1,250.00
..."
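Because the node returns plain text, a downstream step usually has to pull structured fields out of it. The sketch below parses the four lines shown in the output above with regular expressions. The function name `parse_invoice` and the field keys are hypothetical, and real invoices vary enough in layout that these patterns should be treated as a starting point only.

```python
import re
from decimal import Decimal


def parse_invoice(text: str) -> dict:
    """Pull a few labelled fields out of extracted invoice text."""
    fields = {}
    number = re.search(r"Invoice #(\d+)", text)
    if number:
        fields["invoice_number"] = number.group(1)
    # "Label: value" lines; MULTILINE lets ^ and $ match at each line boundary.
    for label, key in (("Date", "date"), ("Customer", "customer")):
        match = re.search(rf"^{label}:\s*(.+)$", text, flags=re.MULTILINE)
        if match:
            fields[key] = match.group(1).strip()
    total = re.search(r"Total Amount:\s*\$([\d,]+\.\d{2})", text)
    if total:
        # Drop thousands separators before converting to an exact decimal.
        fields["total"] = Decimal(total.group(1).replace(",", ""))
    return fields


sample = (
    "Invoice #12345\n"
    "Date: January 15, 2025\n"
    "Customer: Acme Corp\n"
    "Total Amount: $1,250.00\n"
)
print(parse_invoice(sample))
```

`Decimal` is used for the total so that the amount is not subject to binary floating-point rounding.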

Example 2: Extract Text from Image

Input:

file: "https://example.com/business-card.jpg"

Output:

text: "John Doe
Senior Developer
Acme Technology
[email protected]
+1 (555) 123-4567"

Example 3: Extract Text from Scanned Document

Input:

file: "https://example.com/scanned-contract.pdf"

Output:

text: "SERVICE AGREEMENT

This agreement is made on January 15, 2025 between...
Terms and Conditions:
1. Service Duration
2. Payment Terms
..."

Common Use Cases

Invoice Processing: Extract text from invoices for automated accounting and data entry
Document Digitization: Convert scanned documents and images into searchable, editable text
Receipt Processing: Extract information from receipts for expense tracking and reporting
Business Card Scanning: Extract contact information from business card images
Form Processing: Extract data from filled forms and applications
Legal Document Processing: Extract text from contracts and legal documents for review
ID Verification: Extract text from identification documents for verification workflows

Error Handling

| Error Type | Cause | Solution |
| --- | --- | --- |
| Invalid File URL | URL is malformed or file is inaccessible | Verify the file URL is valid and publicly accessible |
| Unsupported Format | File format is not supported for text extraction | Convert file to a supported format (PDF, JPG, PNG, etc.) |
| No Text Found | File contains no readable text | Ensure the file contains visible text and is not blank |
| File Too Large | File size exceeds maximum allowed | Compress or split the file into smaller parts |
| Low Image Quality | Image quality is too poor for OCR | Use a higher resolution scan or image |
| Connection Error | Cannot connect to PixelML API | Check your PixelML connection settings and API key |
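A workflow that calls this node can turn the error types above into a lookup that returns the suggested fix and whether a retry makes sense. This is a sketch under assumptions: the documentation does not say how errors are reported at runtime, so matching on these exact strings is hypothetical, as are the names `REMEDIES`, `RETRYABLE`, and `advise`. Treating only connection failures as retryable is also an assumption.

```python
# Error types and solutions copied from the Error Handling table.
REMEDIES = {
    "Invalid File URL": "Verify the file URL is valid and publicly accessible",
    "Unsupported Format": "Convert file to a supported format (PDF, JPG, PNG, etc.)",
    "No Text Found": "Ensure the file contains visible text and is not blank",
    "File Too Large": "Compress or split the file into smaller parts",
    "Low Image Quality": "Use a higher resolution scan or image",
    "Connection Error": "Check your PixelML connection settings and API key",
}

# Assumption: only connection failures may succeed if retried with the same input.
RETRYABLE = {"Connection Error"}


def advise(error_type: str) -> tuple[str, bool]:
    """Return (suggested fix, whether retrying the same input is reasonable)."""
    remedy = REMEDIES.get(error_type, "Unknown error type; inspect the raw error")
    return remedy, error_type in RETRYABLE


print(advise("Connection Error"))
print(advise("File Too Large"))
```

The other error types describe a problem with the file itself, so retrying without changing the input would be expected to fail the same way.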

Notes

Supported Formats: Works with PDF, JPG/JPEG, PNG, WebP, and other common image and document formats.
OCR Accuracy: Accuracy depends on image quality, text clarity, and font legibility. High-resolution images produce the best results.
Language Support: The system supports multiple languages for text extraction. The input schema lists only the file parameter, so how to specify a language when detection is inaccurate is not documented here.
Text Structure: The node attempts to preserve text structure including paragraphs and line breaks.
Processing Time: Extraction time varies based on file size and complexity. Large PDFs may take 30-60 seconds.
Handwriting: OCR works best with printed text. Handwritten text may have lower accuracy rates.
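The formats named in the notes above can be checked before a file is submitted, which avoids waiting on a request that ends in an Unsupported Format error. The sketch below inspects the extension in the URL path. The names `NAMED_EXTENSIONS` and `has_named_extension` are hypothetical, and since the notes also mention "other common image and document formats", a `False` result only means the extension is not one of those explicitly listed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the formats named explicitly in the notes (not an exhaustive list).
NAMED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp"}


def has_named_extension(file_url: str) -> bool:
    """Check whether the URL path ends in one of the explicitly named formats."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(file_url).path
    return PurePosixPath(path).suffix.lower() in NAMED_EXTENSIONS


print(has_named_extension("https://example.com/invoice.pdf"))                  # True
print(has_named_extension("https://example.com/business-card.JPG?size=large"))  # True
print(has_named_extension("https://example.com/notes.txt"))                    # False
```

An extension check is only a heuristic: URLs without an extension, or with a misleading one, need the file's content type inspected instead.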
