# Import Data into Dataset

**Action ID:** `import_dataset`

## Description

Import a file into a dataset.

## Input Parameters

| Name        | Type   | Required | Default | Description                                                        |
| ----------- | ------ | :------: | ------- | ------------------------------------------------------------------ |
| file\_url   | string |     ✓    | -       | The URL of the file to import. Supports CSV, XLSX, and JSON Lines. |
| dataset\_id | string |     ✓    | -       | The ID of the dataset to import into.                              |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Import Data Set node input.",
  "properties": {
    "file_url": {
      "description": "The URL of the file to import. Supports CSV, XLSX, and JSON lines.",
      "title": "File URL",
      "type": "string"
    },
    "dataset_id": {
      "description": "The ID of the dataset to import into, if applicable.",
      "title": "Dataset ID",
      "type": "string"
    }
  },
  "required": [
    "file_url",
    "dataset_id"
  ],
  "title": "ImportDatasetNodeInput",
  "type": "object"
}
```

</details>
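
Before invoking the action, it can help to sanity-check a payload against this schema. A minimal sketch using the `jsonschema` package (the schema is condensed from the one above; the payload values are placeholders):

```python
from jsonschema import ValidationError, validate

# Condensed from the input schema documented above.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "file_url": {"type": "string"},
        "dataset_id": {"type": "string"},
    },
    "required": ["file_url", "dataset_id"],
}

payload = {
    "file_url": "https://example.com/users.csv",
    "dataset_id": "dataset_abc123",
}

try:
    validate(instance=payload, schema=INPUT_SCHEMA)  # raises on mismatch
    print("payload is valid")
except ValidationError as err:
    print(f"invalid payload: {err.message}")
```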

## Output Parameters

| Name              | Type    | Description                                  |
| ----------------- | ------- | -------------------------------------------- |
| records\_imported | integer | The number of records successfully imported. |
| records\_failed   | integer | The number of records that failed to import. |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Import Data Set node output.",
  "properties": {
    "records_imported": {
      "description": "The number of records imported.",
      "title": "Records Imported",
      "type": "integer"
    },
    "records_failed": {
      "description": "The number of records failed to import.",
      "title": "Records Failed",
      "type": "integer"
    }
  },
  "required": [
    "records_imported",
    "records_failed"
  ],
  "title": "ImportDatasetNodeOutput",
  "type": "object"
}
```

</details>
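
Downstream workflow steps typically branch on these counts. A minimal Python sketch (the output dict mirrors the schema above; the alerting is an illustrative stand-in for your own handling):

```python
def handle_import_result(output: dict) -> None:
    """Branch on the import_dataset output counts."""
    imported = output["records_imported"]
    failed = output["records_failed"]
    total = imported + failed
    if failed:
        rate = failed / total if total else 0.0
        # Illustrative only -- swap in your own alerting or logging.
        print(f"WARNING: {failed}/{total} records failed ({rate:.1%}); review the import logs")
    else:
        print(f"All {imported} records imported successfully")


handle_import_result({"records_imported": 1523, "records_failed": 7})
```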

## How It Works

This node fetches a file from the specified URL, parses it according to its format (CSV, XLSX, or JSON Lines), and imports the records into the target dataset. Each row or JSON object is processed as an individual record. The node validates the data format, handles parsing errors, and reports counts of successfully imported and failed records. The import operation is transactional, preserving data integrity throughout the process.
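
The node's internals aren't exposed, but the described fetch-and-parse flow is roughly equivalent to the following Python sketch using `requests` and `pandas`. This is an illustration, not the actual implementation; for simplicity it infers the format from the URL extension:

```python
import io
import json

import pandas as pd
import requests


def fetch_and_parse(file_url: str) -> list[dict]:
    """Download a file and parse it into one dict per record."""
    resp = requests.get(file_url, timeout=60)
    resp.raise_for_status()
    if file_url.endswith(".csv"):
        df = pd.read_csv(io.BytesIO(resp.content))
    elif file_url.endswith(".xlsx"):
        df = pd.read_excel(io.BytesIO(resp.content))  # needs openpyxl
    elif file_url.endswith(".jsonl"):
        # JSON Lines: one JSON object per non-empty line.
        return [json.loads(line) for line in resp.text.splitlines() if line.strip()]
    else:
        raise ValueError("unsupported format: expected CSV, XLSX, or JSON Lines")
    return df.to_dict(orient="records")
```

Each parsed record would then be validated and imported, with failures tallied into `records_failed`.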

## Usage Examples

### Example 1: Import CSV File

**Input:**

```
file_url: "https://example.com/users.csv"
dataset_id: "dataset_abc123"
```

**Output:**

```
records_imported: 1523
records_failed: 7
```

### Example 2: Import Excel File

**Input:**

```
file_url: "https://storage.example.com/sales_data_2024.xlsx"
dataset_id: "dataset_sales_001"
```

**Output:**

```
records_imported: 8450
records_failed: 0
```

### Example 3: Import JSON Lines File

**Input:**

```
file_url: "https://api.example.com/exports/events.jsonl"
dataset_id: "dataset_events_456"
```

**Output:**

```
records_imported: 12350
records_failed: 23
```
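
For reference, a JSON Lines file like the `events.jsonl` above stores one JSON object per line with no enclosing array. A minimal Python sketch that writes and reads the format (field names are invented for illustration):

```python
import json

events = [
    {"event": "signup", "user_id": "u1"},
    {"event": "login", "user_id": "u2"},
]

# Write: one JSON object per line, no enclosing array or commas.
with open("events.jsonl", "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

# Read back: parse each non-empty line independently.
with open("events.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]
```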

## Common Use Cases

* **Bulk Data Loading**: Import large volumes of data from external sources into your datasets
* **Data Migration**: Transfer data from legacy systems or external platforms into AgenticFlow
* **Periodic Data Updates**: Schedule regular imports to keep datasets synchronized with external sources
* **ETL Pipelines**: Part of extract-transform-load workflows for data integration
* **Data Consolidation**: Combine data from multiple file sources into a centralized dataset
* **Initial Setup**: Populate new datasets with historical or reference data
* **Third-Party Integration**: Import data exported from CRM, analytics, or other business tools

## Error Handling

| Error Type           | Cause                                       | Solution                                                         |
| -------------------- | ------------------------------------------- | ---------------------------------------------------------------- |
| Invalid URL          | File URL is malformed or inaccessible       | Verify the URL is correct and publicly accessible                |
| Unsupported Format   | File format is not CSV, XLSX, or JSON lines | Convert file to a supported format before importing              |
| Dataset Not Found    | Dataset ID doesn't exist                    | Verify the dataset\_id is correct and the dataset exists         |
| File Download Failed | Cannot access the file at the URL           | Check URL accessibility, authentication, or network connectivity |
| Parsing Error        | File format is corrupted or invalid         | Validate file structure and ensure proper formatting             |
| Schema Mismatch      | File columns don't match dataset schema     | Ensure file headers match expected dataset fields                |
| Size Limit Exceeded  | File is too large to process                | Split large files into smaller chunks for import (see the sketch below) |
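
For the size-limit case, pandas' chunked CSV reader is one way to split an oversized file into separately importable parts. A minimal sketch (chunk size and filenames are arbitrary):

```python
import pandas as pd

# Stream the CSV in 10,000-row chunks and write each chunk to its
# own file; each part can then be imported as a separate run.
for i, chunk in enumerate(pd.read_csv("large_file.csv", chunksize=10_000)):
    chunk.to_csv(f"part_{i:03d}.csv", index=False)
```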

## Notes

* **Supported Formats**: The node accepts CSV, XLSX (Excel), and JSON Lines (JSONL) formats. Ensure your file matches one of these formats.
* **File Accessibility**: The file\_url must be publicly accessible or use pre-signed URLs if using cloud storage like S3 or Google Cloud Storage.
* **Column Mapping**: CSV and XLSX files should have header rows that match the dataset field names for proper mapping.
* **JSON Lines Format**: Each line must contain a valid JSON object representing one record.
* **Error Tracking**: Check the records\_failed count to identify if any records didn't import. Review logs for specific error details.
* **Performance**: Large files may take time to process. Consider splitting extremely large datasets into multiple imports.
* **Data Validation**: Records that fail validation checks (e.g., missing required fields, invalid data types) will be counted in records\_failed.
* **Idempotency**: Re-importing the same file may create duplicate records unless your dataset has unique constraints configured; a de-duplication sketch follows this list.
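
If the dataset lacks unique constraints, you can de-duplicate a file against a known key before re-importing. A minimal pandas sketch, assuming a hypothetical `id` column serves as the unique key:

```python
import pandas as pd

df = pd.read_csv("users.csv")
# Keep the first occurrence of each id; "id" is an assumed unique key.
df.drop_duplicates(subset=["id"], keep="first").to_csv(
    "users_deduped.csv", index=False
)
```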


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agenticflow.ai/reference/nodes/import_dataset.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
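
For example, with Python's `requests` library (the question is illustrative):

```python
import requests

resp = requests.get(
    "https://docs.agenticflow.ai/reference/nodes/import_dataset.md",
    params={"ask": "Is there a maximum file size for dataset imports?"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)
```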
