Import Data into Dataset

Action ID: import_dataset

Description

Import a file into a dataset.

Input Parameters

| Name       | Type   | Required | Default | Description                                                        |
| ---------- | ------ | -------- | ------- | ------------------------------------------------------------------ |
| file_url   | string | Yes      | -       | The URL of the file to import. Supports CSV, XLSX, and JSON Lines. |
| dataset_id | string | Yes      | -       | The ID of the dataset to import into.                              |

JSON Schema
{
  "description": "Import Data Set node input.",
  "properties": {
    "file_url": {
      "description": "The URL of the file to import. Supports CSV, XLSX, and JSON lines.",
      "title": "File URL",
      "type": "string"
    },
    "dataset_id": {
      "description": "The ID of the dataset to import into, if applicable.",
      "title": "Dataset ID",
      "type": "string"
    }
  },
  "required": [
    "file_url",
    "dataset_id"
  ],
  "title": "ImportDatasetNodeInput",
  "type": "object"
}
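Before invoking the action, a payload can be checked against the input schema above. A minimal stdlib-only sketch; the `validate_input` helper is illustrative, not part of AgenticFlow, and only covers the `required` and `type` keywords used in this schema:

```python
# Minimal required/type check against the ImportDatasetNodeInput schema.
INPUT_SCHEMA = {
    "required": ["file_url", "dataset_id"],
    "properties": {
        "file_url": {"type": "string"},
        "dataset_id": {"type": "string"},
    },
}

# Map JSON Schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "object": dict}

def validate_input(payload: dict, schema: dict = INPUT_SCHEMA) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    for field in schema["required"]:
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, spec in schema["properties"].items():
        if field in payload and not isinstance(payload[field], TYPE_MAP[spec["type"]]):
            errors.append(f"{field} must be {spec['type']}")
    return errors

print(validate_input({"file_url": "https://example.com/users.csv",
                      "dataset_id": "dataset_abc123"}))   # → []
print(validate_input({"file_url": "https://example.com/users.csv"}))
# → ['missing required field: dataset_id']
```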

Output Parameters

| Name             | Type    | Description                                  |
| ---------------- | ------- | -------------------------------------------- |
| records_imported | integer | The number of records successfully imported. |
| records_failed   | integer | The number of records that failed to import. |

JSON Schema
{
  "description": "Import Data Set node output.",
  "properties": {
    "records_imported": {
      "description": "The number of records imported.",
      "title": "Records Imported",
      "type": "integer"
    },
    "records_failed": {
      "description": "The number of records failed to import.",
      "title": "Records Failed",
      "type": "integer"
    }
  },
  "required": [
    "records_imported",
    "records_failed"
  ],
  "title": "ImportDatasetNodeOutput",
  "type": "object"
}
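The two counts together cover every record in the file, so a downstream step can branch on them. A small illustrative helper (not part of the node):

```python
def summarize(output: dict) -> str:
    """Turn the node's output counts into a human-readable status line."""
    total = output["records_imported"] + output["records_failed"]
    if output["records_failed"]:
        return (f"{output['records_imported']}/{total} records imported; "
                f"review logs for {output['records_failed']} failures")
    return f"all {total} records imported"

print(summarize({"records_imported": 1523, "records_failed": 7}))
# → 1523/1530 records imported; review logs for 7 failures
```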

How It Works

This node fetches a file from the specified URL, parses it based on its format (CSV, XLSX, or JSON lines), and imports the records into the target dataset. Each row or JSON object is processed as an individual record. The node validates data format, handles parsing errors, and provides a summary of successfully imported and failed records. The import operation is transactional, ensuring data integrity throughout the process.
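The parse-and-count behavior described above can be sketched as follows. This is an illustrative reimplementation, not the node's actual code; XLSX handling is omitted because it requires a third-party library:

```python
import csv
import io
import json

def import_file(filename: str, data: bytes) -> dict:
    """Parse CSV or JSON Lines content and count imported vs. failed records."""
    imported = failed = 0
    text = data.decode("utf-8")
    if filename.endswith(".csv"):
        # Each CSV row becomes one record, keyed by the header row.
        for row in csv.DictReader(io.StringIO(text)):
            imported += 1
    elif filename.endswith(".jsonl"):
        # JSON Lines: one JSON object per non-empty line.
        for line in text.splitlines():
            if not line.strip():
                continue
            try:
                record = json.loads(line)
                if not isinstance(record, dict):
                    raise ValueError("line is not a JSON object")
                imported += 1
            except ValueError:  # JSONDecodeError is a ValueError subclass
                failed += 1
    else:
        raise ValueError("Unsupported Format")
    return {"records_imported": imported, "records_failed": failed}

print(import_file("events.jsonl", b'{"id": 1}\n{"id": 2}\nnot json\n'))
# → {'records_imported': 2, 'records_failed': 1}
```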

Usage Examples

Example 1: Import CSV File

Input:

file_url: "https://example.com/users.csv"
dataset_id: "dataset_abc123"

Output:

records_imported: 1523
records_failed: 7

Example 2: Import Excel File

Input:

file_url: "https://storage.example.com/sales_data_2024.xlsx"
dataset_id: "dataset_sales_001"

Output:

records_imported: 8450
records_failed: 0

Example 3: Import JSON Lines File

Input:

file_url: "https://api.example.com/exports/events.jsonl"
dataset_id: "dataset_events_456"

Output:

records_imported: 12350
records_failed: 23
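If the action is triggered over HTTP, a request for Example 3 might look like the sketch below. The endpoint URL and bearer token are placeholders (check your AgenticFlow workspace for the real values); only the payload fields come from this page:

```python
import json
import urllib.request

# Placeholder endpoint and token - substitute your actual values.
ENDPOINT = "https://api.example.com/actions/import_dataset"

payload = {
    "file_url": "https://api.example.com/exports/events.jsonl",
    "dataset_id": "dataset_events_456",
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer YOUR_TOKEN"},
    method="POST",
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # e.g. {"records_imported": 12350, "records_failed": 23}
```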

Common Use Cases

  • Bulk Data Loading: Import large volumes of data from external sources into your datasets

  • Data Migration: Transfer data from legacy systems or external platforms into AgenticFlow

  • Periodic Data Updates: Schedule regular imports to keep datasets synchronized with external sources

  • ETL Pipelines: Part of extract-transform-load workflows for data integration

  • Data Consolidation: Combine data from multiple file sources into a centralized dataset

  • Initial Setup: Populate new datasets with historical or reference data

  • Third-Party Integration: Import data exported from CRM, analytics, or other business tools

Error Handling

| Error Type           | Cause                                       | Solution                                                          |
| -------------------- | ------------------------------------------- | ----------------------------------------------------------------- |
| Invalid URL          | File URL is malformed or inaccessible       | Verify the URL is correct and publicly accessible                 |
| Unsupported Format   | File format is not CSV, XLSX, or JSON Lines | Convert the file to a supported format before importing           |
| Dataset Not Found    | Dataset ID doesn't exist                    | Verify the dataset_id is correct and the dataset exists           |
| File Download Failed | Cannot access the file at the URL           | Check URL accessibility, authentication, or network connectivity  |
| Parsing Error        | File is corrupted or invalid                | Validate file structure and ensure proper formatting              |
| Schema Mismatch      | File columns don't match the dataset schema | Ensure file headers match the expected dataset fields             |
| Size Limit Exceeded  | File is too large to process                | Split large files into smaller chunks for import                  |
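The first two error types can often be caught client-side before calling the action. A minimal pre-check sketch; note it relies on the file extension, so a URL that serves a supported format without a matching extension would be flagged incorrectly:

```python
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = (".csv", ".xlsx", ".jsonl")

def precheck(file_url: str) -> list[str]:
    """Flag likely 'Invalid URL' and 'Unsupported Format' errors before importing."""
    problems = []
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Invalid URL: use an absolute http(s) URL")
    if not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("Unsupported Format: expected .csv, .xlsx, or .jsonl")
    return problems

print(precheck("https://example.com/users.csv"))       # → []
print(precheck("ftp://example.com/users.parquet"))     # flags both problems
```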

Notes

  • Supported Formats: The node accepts CSV, XLSX (Excel), and JSON Lines (JSONL) formats. Ensure your file matches one of these formats.

  • File Accessibility: The file_url must be publicly accessible or use pre-signed URLs if using cloud storage like S3 or Google Cloud Storage.

  • Column Mapping: CSV and XLSX files should have header rows that match the dataset field names for proper mapping.

  • JSON Lines Format: Each line must contain a valid JSON object representing one record.

  • Error Tracking: Check the records_failed count to identify if any records didn't import. Review logs for specific error details.

  • Performance: Large files may take time to process. Consider splitting extremely large datasets into multiple imports.

  • Data Validation: Records that fail validation checks (e.g., missing required fields, invalid data types) will be counted in records_failed.

  • Idempotency: Re-importing the same file may create duplicate records unless your dataset has unique constraints configured.
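Since re-importing the same file can create duplicates (last bullet above), one option is to deduplicate records by a stable key before generating the upload file. A minimal sketch, assuming your records carry a unique `id` field:

```python
import json

def dedupe_jsonl(lines, key: str):
    """Yield each JSON Lines record whose `key` value has not been seen before."""
    seen = set()
    for line in lines:
        record = json.loads(line)
        if record[key] not in seen:
            seen.add(record[key])
            yield record

rows = ['{"id": 1, "v": "a"}', '{"id": 2, "v": "b"}', '{"id": 1, "v": "a2"}']
print([r["id"] for r in dedupe_jsonl(rows, "id")])   # → [1, 2]
```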
