PDF to text

Extract text from a PDF - Supports OCR

The PDF-to-Text action extracts the text from a PDF file so that you can analyze it further or use it in applications such as question answering. You can save the extracted text to a knowledge set so that the PDF-to-Text step does not have to be repeated.

This page walks through the steps to convert a PDF to text in AgenticFlow AI.

How to Use the Convert PDF to Text Action

Add the Component

Navigate to the Workflow page.
Click on + Create Workflow or select an existing workflow.
Click on + Add Action.
Select Convert PDF to Text from the list of action components.

File URL

The PDF-to-text converter requires a file as input. If your file is publicly accessible on the web (i.e., it requires no authentication or sign-up), provide its URL directly or through a text input. Otherwise, add a File-to-URL input. In either case, use the {{variable_name}} syntax to pass the value to the converter.

Use OCR

OCR (Optical Character Recognition) is needed for image-based PDFs (e.g., scanned documents), where the pages are pictures of text rather than selectable text. This option uses more credits, so activate it only for image-based PDFs.
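
If you are unsure whether a file is image-based, one possible approach is to run the converter without OCR first and inspect the result. The Python sketch below is only an illustration of that idea and is not part of AgenticFlow AI: the function name likely_needs_ocr, the 20-characters-per-page threshold, and the sample outputs are all assumptions, and it relies only on the text and number_of_pages keys of the step output.

```python
def likely_needs_ocr(output: dict, min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is image-based from a non-OCR extraction result.

    Hypothetical helper: the threshold is an arbitrary assumption, not a
    value documented by the platform.
    """
    text = output.get("text") or ""
    # Guard against a missing or zero page count to avoid dividing by zero.
    pages = max(int(output.get("number_of_pages") or 0), 1)
    # Count only non-whitespace characters so blank lines do not hide an empty result.
    visible_chars = sum(1 for ch in text if not ch.isspace())
    return visible_chars / pages < min_chars_per_page


# Made-up outputs for illustration only.
scanned = {"text": "\n\n", "number_of_pages": 3}
digital = {"text": "Quarterly report. " * 50, "number_of_pages": 2}

print(likely_needs_ocr(scanned))
print(likely_needs_ocr(digital))
```

A result with almost no visible text per page suggests that the pages are images, so a second run with Use OCR enabled may be worthwhile.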

Available Converters

Fast Converter: The default PDF-to-text converter, which is fast and reasonably accurate.
Quality Converter: Slower but more accurate than the Fast Converter.

Access the Step Output

The output is a dictionary with two keys: text and number_of_pages, containing the extracted text and the number of pages in the file, respectively. Below are examples where the default name assigned to the step is pdf_to_text.

Example Access

# Accessing the extracted text
pdf_to_text.text

# Accessing the number of pages
pdf_to_text.number_of_pages

Note that a step name is different from the step title. The step title appears at the top left of a step. The step name appears at the bottom left, in a smaller font and highlighted in green.
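
To make the dotted references concrete, the Python sketch below models the step output as a nested dictionary and resolves a reference such as pdf_to_text.text one key at a time. It is a mental model only, not the platform's implementation: the resolve helper, the steps dictionary, and the sample text are assumptions made up for illustration.

```python
# Hypothetical step outputs, keyed by step name (not step title).
steps = {
    "pdf_to_text": {
        "text": "Page one text.\nPage two text.",
        "number_of_pages": 2,
    }
}


def resolve(path: str, context: dict):
    """Resolve a dotted reference such as 'pdf_to_text.text' against nested dicts."""
    value = context
    for key in path.split("."):
        # Each segment of the path is one dictionary lookup.
        value = value[key]
    return value


print(resolve("pdf_to_text.text", steps))
print(resolve("pdf_to_text.number_of_pages", steps))
```

In this model, using the step title instead of the step name in the path would raise a KeyError, because outputs are stored under the step name.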

Common Errors

Unsupported Protocol

An error similar to the one below indicates that the provided input is not a valid HTTP or HTTPS URL, for example a local file path or a link that uses another protocol.

Error:

Only HTTP(S) protocols are supported
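
If you prepare file URLs outside the workflow, a quick check can catch this problem before the step runs. The Python sketch below uses the standard library's urllib.parse to test for an http or https scheme and a host. The function name is_supported_url is hypothetical, and the check only approximates what the converter accepts; it does not verify that the file exists or is publicly accessible.

```python
from urllib.parse import urlparse


def is_supported_url(value: str) -> bool:
    """Return True if value looks like an HTTP(S) URL with a host part."""
    parsed = urlparse(value.strip())
    # urlparse lowercases the scheme, so "HTTPS://..." is also accepted.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Placeholder inputs for illustration only.
for candidate in (
    "https://example.com/report.pdf",
    "ftp://example.com/report.pdf",
    "report.pdf",
):
    print(candidate, is_supported_url(candidate))
```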
