PDF to text
Extract text from a PDF - Supports OCR
The PDF-to-Text action extracts all the text from a PDF file so that you can analyze it further or use it in applications such as question answering. You can save the extracted text to a knowledge set so that the PDF-to-Text step does not have to be repeated.
On this page, we will introduce the steps to convert PDF to text in AgenticFlow AI.
How to Use the Convert PDF to Text Action
Add the Component
Navigate to the Workflow page.
Click on + Create Workflow or select an existing workflow.
Click on + Add Action.
Select Convert PDF to Text from the list of action components.
File URL
A PDF-to-text converter requires a file as input. If your file is publicly accessible on the web (i.e., no authentication or sign-up is required), provide the URL directly or as a text input. Otherwise, add a File-to-URL input. In either case, use the {{variable_name}} syntax to pass the data to the converter.
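As a quick pre-check before handing a link to the action, the sketch below tests whether a string looks like an absolute web URL. It is only an illustration: the helper name is_valid_file_url and the set of accepted schemes are assumptions, not part of AgenticFlow AI.

```python
from urllib.parse import urlparse

# Assumption: the action fetches files over the web, so only these schemes pass.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_file_url(value: str) -> bool:
    """Return True if value is an absolute http(s) URL with a host part."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/report.pdf", "example.com/report.pdf"):
        print(candidate, is_valid_file_url(candidate))
```

A value without a scheme, such as a bare file name or a local path, fails this check. This only confirms the shape of the URL; it does not confirm that the file is publicly reachable.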
Use OCR
OCR (Optical Character Recognition) is needed for image-based PDFs (e.g., scanned documents). This option uses more credits, so enable it only for image-based PDFs.
Available Converters
Fast Converter: The default PDF-to-text converter, which is fast and reasonably accurate.
Quality Converter: Slower but more accurate than the Fast Converter.
Access the Step Output
The output is a dictionary with two keys: text and number_of_pages, containing the extracted text and the number of pages in the file, respectively. Below are examples where the default name assigned to the step is pdf_to_text.
Example Access
Note that a step name is different from a step title. The step title appears at the top left of a step. The step name appears at the bottom left, in a smaller font and highlighted in green.
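To picture the output structure described above, it can be modeled as a plain Python dictionary. This is a sketch only: the sample values are invented, and the summarize_output helper is hypothetical. Inside a workflow, the values are referenced through the step name rather than through Python.

```python
# Hypothetical sample of what a step named pdf_to_text might return.
# Only the two key names come from the documentation; the values are made up.
pdf_to_text = {
    "text": "Quarterly report\nRevenue grew in the third quarter.",
    "number_of_pages": 2,
}


def summarize_output(output: dict) -> str:
    """Build a short description from the two documented keys."""
    text = output["text"]
    pages = output["number_of_pages"]
    return f"{pages} page(s), {len(text)} characters"


if __name__ == "__main__":
    print(summarize_output(pdf_to_text))
```

Reading both keys by name, as the helper does, raises a KeyError early if a later step receives something other than the expected output.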
Common Errors
Unsupported Protocol
An error similar to the one noted below indicates that the provided input is not a valid URL.
Error: