Advanced Web Scraping
A guide to the Apify integration for advanced, large-scale web scraping tasks.
The standard scraping action works well for simple pages. The Advanced Web Scraping Action handles large-scale scraping by integrating with Apify.
Use this action when you need to:
Scrape data from complex, modern websites that rely heavily on JavaScript.
Handle features like infinite scrolling, pagination, and pop-ups.
Extract thousands of records from a site.
Run scrapers on a schedule.
You will need an Apify account and your Apify API token.
Sign up for an Apify account.
Find your API token in your Apify account settings under Settings > Integrations.
In AgenticFlow, navigate to Settings > Connections and add a new Apify Connection, providing your API token.
Apify works using "Actors," which are pre-built cloud programs designed for specific scraping tasks. The most common one is the Website Content Crawler, which can be configured to extract specific data from a site.
You don't configure the scraper inside AgenticFlow. Instead, you configure an Actor on the Apify platform and then simply tell the AgenticFlow action to run it.
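To make the "run an Actor by ID" idea concrete, here is a minimal sketch of what an equivalent direct call to Apify's REST API could look like. It is not a description of how AgenticFlow calls Apify internally, which this guide does not specify. The helper name build_run_request and the placeholder token are hypothetical; the sketch only builds the request and does not send it.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a request that runs an Actor and returns its dataset items."""
    # In URL paths, Apify writes "username/actor-name" IDs with a tilde instead of a slash.
    path_id = actor_id.replace("/", "~")
    url = f"{APIFY_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token and URL, for illustration only.
request = build_run_request(
    "apify/website-content-crawler",
    "YOUR_APIFY_TOKEN",
    {"startUrls": [{"url": "https://example.com"}]},
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would start the run and wait for the results, so it needs a real token and consumes Apify platform usage.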
Connection (Connection): Select the Apify connection you created.
Actor ID (Text): The ID of the Apify Actor you want to run (e.g., apify/website-content-crawler).
Run Parameters (JSON): A JSON object containing the specific configuration for the Actor run, such as the target URLs and the data to extract.
Output (Array): The structured data extracted by the Apify Actor, usually an array of JSON objects.
Let's say you want to get the names and prices of all products from a specific e-commerce category page.
On Apify:
Find the "Website Content Crawler" actor and create a new task.
Configure it to scrape the e-commerce category URL.
Specify the CSS selectors for the product name (h2.product-title) and price (span.price).
Save the task and note the Actor ID or the saved task's ID.
In AgenticFlow:
Add the Advanced Web Scraping Action.
Connection: Select your Apify connection.
Actor ID: apify/website-content-crawler (or the specific ID of your saved task).
Run Parameters: This JSON tells the actor what to do for this specific run.
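As a rough illustration of the Run Parameters value, the sketch below builds a small JSON object in Python and prints it. The field names startUrls and maxCrawlPages are assumed from the Website Content Crawler's input options and should be checked against the Actor's input schema on Apify; the category URL and the page limit are hypothetical.

```python
import json

# Per-run configuration for the Actor. The selectors themselves are
# configured on the Apify side, so only run-specific values appear here.
run_parameters = {
    # Hypothetical category page to scrape.
    "startUrls": [{"url": "https://shop.example.com/category/widgets"}],
    # Assumed safety limit so a misconfigured run cannot crawl the whole site.
    "maxCrawlPages": 50,
}

# The text to paste into the Run Parameters field.
run_parameters_json = json.dumps(run_parameters, indent=2)
print(run_parameters_json)
```

A different Actor will expect different keys, so treat the Actor's own input documentation as the source of truth.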
Result:
The action will trigger the Apify Actor, which will visit the URL, extract the data, and return it.
The Output of the action will be an array of objects, ready to be used in a Map action or saved to a Google Sheet.
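The shape of that output can be sketched as follows. The sample records and the field names name and price are hypothetical, since the real keys depend on how the Actor is configured. The helpers show one way to turn the array into spreadsheet-style rows while tolerating records that lack a field.

```python
import csv
import io

# Hypothetical output: an array of JSON objects, one per product.
sample_output = [
    {"name": "Blue Widget", "price": "$19.99"},
    {"name": "Red Widget", "price": "$24.50"},
    {"name": "Mystery Widget"},  # a record where the price was not found
]


def to_rows(items, columns=("name", "price")):
    """Convert each object to a list of cell values, using "" for missing fields."""
    return [[str(item.get(column, "")) for column in columns] for item in items]


def to_csv(items, columns=("name", "price")):
    """Render the items as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(to_rows(items, columns))
    return buffer.getvalue()


print(to_csv(sample_output))
```

Filling missing fields with an empty string keeps every row the same width, which matters when appending rows to a sheet.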