Knowledge & Data Sources

🧠 Powering Your Agent with Domain Knowledge

The Knowledge tab is where you transform your AI agent from a general assistant into a domain expert. By connecting relevant data sources, documents, and knowledge bases, you give your agent access to the specific information it needs to provide accurate, contextual responses.


🎯 Knowledge Source Types

📄 Document Upload

Upload files directly to your agent's knowledge base for semantic search and retrieval.

Supported File Types

  • PDF Documents: (.pdf) Research papers, manuals, reports

  • Word Documents: (.docx) Policies, procedures, guides

  • Text Files: (.txt, .md) Documentation, notes, plain text

  • HTML Files: (.html) Web pages, formatted documentation

  • Spreadsheets: (.xlsx, .xls, .csv) Data tables, catalogs, structured data

Document Processing Features

  • Intelligent Chunking: Configurable chunking strategies for optimal knowledge retrieval

  • Text Extraction: Automatic text extraction from supported formats

  • Space Normalization: Remove extra whitespace for cleaner text

Best Practices for Document Upload

✅ DO:
- Use clear, descriptive dataset names
- Keep documents focused on specific topics
- Ensure text is selectable (not just images)
- Configure chunking based on document structure

❌ AVOID:
- Uploading duplicate or conflicting information
- Using documents with poor formatting
- Including sensitive personal information without proper access controls
- Overloading the knowledge base with many near-identical documents

📊 Table Upload

Upload structured data in tabular format for precise lookups and semantic search.

Supported Table Formats

  • CSV Files: Comma-separated values

  • Excel Files: (.xlsx, .xls) Spreadsheets with single or multiple sheets

  • Manual Entry: Create tables directly in the interface

Table Configuration

  • Column Types: TEXT, NUMBER, INTEGER, BOOLEAN, DATE

  • Semantic Columns: Mark columns for semantic search indexing

  • Column Sequencing: Define display order for columns

  • Schema Analysis: Automatic type detection from uploaded files
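As a rough illustration of what automatic type detection might do, the sketch below guesses one of the column types listed above from sample cell values. The function name `infer_column_type` and the order of the checks are assumptions made for this example, not the platform's actual logic.

```python
from datetime import date


def infer_column_type(values):
    """Guess a column type from sample cell values (hypothetical helper)."""
    cleaned = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not cleaned:
        return "TEXT"  # nothing to inspect, fall back to the most general type

    def parses(convert, value):
        try:
            convert(value)
            return True
        except ValueError:
            return False

    # Checks run from most specific to least specific (assumed order).
    if all(v.lower() in {"true", "false"} for v in cleaned):
        return "BOOLEAN"
    if all(parses(int, v) for v in cleaned):
        return "INTEGER"
    if all(parses(float, v) for v in cleaned):
        return "NUMBER"
    if all(parses(date.fromisoformat, v) for v in cleaned):
        return "DATE"
    return "TEXT"
```

For example, `infer_column_type(["1.5", "2"])` yields `"NUMBER"`, while a column mixing words and numbers falls back to `"TEXT"`. Review detected types after upload, since a sample of values can be misleading.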

Table Use Cases

  • Product catalogs with specifications and pricing

  • Customer records and interaction history

  • FAQ databases with questions and answers

  • Knowledge articles with categorization

  • Configuration and settings databases

🗄️ Database Schema (Manual)

Create database-like schemas for structured knowledge organization.

Database Format Features

  • Custom Schema Design: Define your own table structures

  • Column Type Support: TEXT, NUMBER, INTEGER, BOOLEAN, DATE

  • Manual Data Entry: Populate data through the interface

  • Structured Queries: Enable precise data retrieval


⚙️ Knowledge Processing Settings

Chunking Strategy

Control how documents are broken down for processing and retrieval.

Chunking Configuration Options

  • Chunk Type: Strategy for dividing content

  • Max Tokens: Maximum size per chunk (configurable)

  • Separator: Custom separator for chunk boundaries

  • Remove Extra Spaces: Clean up whitespace

  • Remove URLs/Emails: Strip URLs and email addresses from the text
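To make these options concrete, here is a minimal sketch of how a separator-based chunker with a token limit could behave. It is not the platform's implementation: the function names are hypothetical, and tokens are approximated by whitespace-separated words, whereas a real tokenizer would count differently.

```python
import re

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_text(text, remove_extra_spaces=True, remove_urls_emails=False):
    """Apply the optional cleanup steps before chunking."""
    if remove_urls_emails:
        text = URL_RE.sub("", text)
        text = EMAIL_RE.sub("", text)
    if remove_extra_spaces:
        text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
        text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
        text = text.strip()
    return text


def chunk_text(text, max_tokens=500, separator="\n\n"):
    """Split on the separator, then pack pieces into chunks of at most max_tokens words."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks, current, current_len = [], [], 0

    def flush():
        nonlocal current, current_len
        if current:
            chunks.append(separator.join(current))
        current, current_len = [], 0

    for piece in text.split(separator):
        words = piece.split()
        # A single piece longer than the limit is cut into hard slices.
        while len(words) > max_tokens:
            flush()
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        if not words:
            continue
        if current_len + len(words) > max_tokens:
            flush()
        current.append(" ".join(words))
        current_len += len(words)
    flush()
    return chunks
```

Choosing a separator that matches the document's structure (for example, a blank line between paragraphs) keeps related sentences in the same chunk, which is why it helps to configure chunking per document type.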

Best Practices

Document Length Guidelines:
- Short documents (< 5 pages): Larger chunks (1000+ tokens)
- Medium documents (5-50 pages): Medium chunks (500-1000 tokens)
- Long documents (50+ pages): Smaller chunks (200-500 tokens)
- Technical docs with code: Preserve code blocks intact

🔍 Agent Knowledge Configuration

Configure how your agent retrieves and uses knowledge during conversations.

Retrieval Mode

Auto Retrieval (Default: Off)

Automatic Knowledge Access:
- Agent automatically retrieves relevant knowledge for every query
- No explicit tool call needed
- Best for: Agents that always need domain knowledge
- Trade-off: May retrieve unnecessary information

Manual Tool Call (Default: On)

On-Demand Knowledge Access:
- Agent decides when to retrieve knowledge using available tools
- More control over when knowledge is accessed
- Best for: General-purpose agents that sometimes need knowledge
- Trade-off: Requires agent to recognize when knowledge is needed

Search Strategy

Hybrid Search (Default)

Combined Approach:
- Semantic search: Understanding context and meaning
- Full-text search: Exact keyword matching
- Best for: Most use cases, balances accuracy and coverage
- Returns: Semantically relevant + keyword-matched results
- Trade-off: Costs extra credits for re-ranking documents
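This page does not say how semantic and full-text results are combined. One common technique is reciprocal rank fusion (RRF), sketched below purely as an illustration; the function name, the chunk IDs, and the constant `k=60` are assumptions for this example.

```python
def reciprocal_rank_fusion(semantic_ids, fulltext_ids, k=60):
    """Merge two ranked lists of chunk IDs into one ranking (illustrative only)."""
    scores = {}
    for ranking in (semantic_ids, fulltext_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes more for chunks it ranks near the top.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A chunk that appears in both lists accumulates score from each, so it tends to rise above chunks found by only one search method.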

Semantic Search Only

Vector-Based Retrieval:
- Understanding context and intent
- Finding conceptually related information
- Handling synonyms and variations
- Best for: Natural language queries and conceptual searches

Full-Text Search Only

Keyword-Based Retrieval:
- Exact term matching
- Faster for specific lookups
- Good for technical terminology and precise queries
- Best for: Known keywords and exact phrase matching

Retrieval Parameters

Top K (Default: 5, Range: 1-10)

Number of Knowledge Chunks to Retrieve:
- Higher values: More comprehensive context, higher cost
- Lower values: Focused context, lower cost
- Recommended: 3-5 for most use cases
- Adjust based on: Query complexity and knowledge base size

Threshold (Default: 0.5, Range: 0.0-1.0)

Relevance Score Threshold:
- Higher threshold (0.7-1.0): Only highly relevant results
- Medium threshold (0.4-0.7): Balanced relevance
- Lower threshold (0.0-0.4): Include more potential matches
- Recommended: Start at 0.5, adjust based on retrieval quality
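The interaction between Top K and Threshold can be pictured as a filter followed by a cut-off. The sketch below assumes each retrieved chunk carries a relevance score between 0.0 and 1.0 and that the threshold is inclusive; both details, along with the function name, are assumptions for illustration.

```python
def select_chunks(scored_chunks, top_k=5, threshold=0.5):
    """Keep chunks scoring at or above the threshold, then return the best top_k."""
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most relevant first
    return kept[:top_k]
```

Note that a high threshold can return fewer than Top K chunks, or none at all, which is often the cause of an agent answering without any retrieved context.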

Query Rewrite (Default: On)

Query Optimization:
- Rewrites user query before knowledge retrieval
- Improves search relevance by clarifying intent
- Expands abbreviations and adds context
- Recommended: Enable for most use cases

Rerank (Default: Off)

Result Reranking:
- Post-processes retrieved results for better ordering
- Uses cross-encoder models for more accurate relevance
- Trade-off: Better results but additional latency
- Recommended: Enable for critical accuracy use cases
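Conceptually, reranking is a second pass that rescores each (query, chunk) pair and reorders the retrieved chunks. The sketch below shows only that two-stage shape: `word_overlap` is a toy stand-in for a cross-encoder model, and both function names are hypothetical.

```python
def word_overlap(query, chunk):
    """Toy relevance score: fraction of query words that appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


def rerank(query, chunks, pair_scorer=word_overlap):
    """Reorder already-retrieved chunks by a pairwise relevance score."""
    scored = [(pair_scorer(query, chunk), chunk) for chunk in chunks]
    # The sort is stable, so chunks with equal scores keep their retrieval order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in scored]
```

Because every retrieved chunk is scored against the query individually, the extra latency grows with Top K.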

Connected Datasets

Multiple Dataset Support

  • Connect up to 100 datasets per agent

  • Each dataset appears as a searchable knowledge source

  • Datasets maintain their own:

    • Name and ID

    • Source type (UPLOAD, MANUAL)

    • Format type (TEXT, TABLE, DATABASE)

    • Processing status

Dataset Information Display

For each connected dataset, the agent has access to:

  • Dataset name (user-friendly identifier)

  • Dataset ID (unique identifier)

  • Source type (how data was added)

  • Status (PENDING, SUCCESS, FAILURE)

  • Format type (TEXT, TABLE, DATABASE)


📊 Knowledge Analytics & Management

Dataset Status Monitoring

Processing States

  • PENDING: Dataset creation or update in progress

  • SUCCESS: Dataset ready for use

  • FAILURE: Processing encountered errors
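If you track datasets in your own tooling, the three states map naturally onto an enumeration. The sketch below is illustrative only: the dictionary layout and the `ready_datasets` helper are assumptions, not a documented API.

```python
from enum import Enum


class DatasetStatus(Enum):
    PENDING = "PENDING"   # creation or update in progress
    SUCCESS = "SUCCESS"   # ready for use
    FAILURE = "FAILURE"   # processing encountered errors


def ready_datasets(datasets):
    """Return the names of datasets whose processing finished successfully."""
    return [
        d["name"]
        for d in datasets
        if DatasetStatus(d["status"]) is DatasetStatus.SUCCESS
    ]
```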

Progress Tracking

  • Monitor document import progress

  • Track embedding generation status

  • View chunk processing metrics

Embedding Updates

Manual Embedding Refresh

When to Update Embeddings:
- After modifying dataset content
- After bulk row updates in tables
- To incorporate new document versions

🔧 Knowledge Configuration Best Practices

Initial Setup Process

  1. Audit Existing Information: Catalog what knowledge you have

  2. Choose Dataset Format: TEXT for documents, TABLE for structured data, DATABASE for manually defined schemas

  3. Configure Processing: Set chunking and parsing options

  4. Select Embedding Model: Choose based on language and domain

  5. Test Retrieval: Verify agent responses with sample queries

Dataset Organization Strategies

By Topic

Product Knowledge:
├── Product Features (TEXT dataset)
├── Technical Specifications (TABLE dataset)
├── Pricing & Plans (TABLE dataset)
└── Troubleshooting Guides (TEXT dataset)

Customer Support:
├── Common Issues (TEXT dataset)
├── Resolution Procedures (TEXT dataset)
└── Product Updates (TEXT dataset)

By Source Type

Documentation:
├── User Manual.pdf → TEXT dataset
├── API Reference.pdf → TEXT dataset
└── FAQ.csv → TABLE dataset

Internal Knowledge:
├── Training Materials → TEXT dataset
├── Process Documents → TEXT dataset
└── Policy Database → DATABASE dataset

Optimization Guidelines

Document Preparation

Before Upload:
- Remove duplicate content across documents
- Ensure consistent formatting
- Split very large documents into logical sections
- Use descriptive filenames
- Remove or redact sensitive information

Table Design

Column Configuration:
- Mark relevant columns as "semantic" for search
- Use appropriate data types (TEXT, NUMBER, DATE, etc.)
- Include descriptive column names
- Maintain data consistency across rows
- Consider creating separate tables for different entity types

Retrieval Tuning

Adjustment Process:
1. Start with defaults (Hybrid, Top K=5, Threshold=0.5)
2. Test with representative queries
3. If too few results: Lower threshold, increase Top K
4. If too many irrelevant results: Raise threshold, enable rerank
5. If missing semantic matches: Switch to Semantic search
6. If missing exact matches: Switch to Full-text search
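The adjustment steps above can be sketched as a small rule table. The settings object, the symptom names, and the step sizes (0.1 for the threshold, 2 for Top K) are all assumptions chosen for illustration; tune in whatever increments suit your data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetrievalSettings:
    strategy: str = "hybrid"
    top_k: int = 5
    threshold: float = 0.5
    rerank: bool = False


def adjust(settings, symptom):
    """Return new settings for an observed retrieval problem (hypothetical helper)."""
    if symptom == "too_few_results":
        return replace(
            settings,
            threshold=max(0.0, round(settings.threshold - 0.1, 2)),
            top_k=min(10, settings.top_k + 2),
        )
    if symptom == "too_many_irrelevant":
        return replace(
            settings,
            threshold=min(1.0, round(settings.threshold + 0.1, 2)),
            rerank=True,
        )
    if symptom == "missing_semantic_matches":
        return replace(settings, strategy="semantic")
    if symptom == "missing_exact_matches":
        return replace(settings, strategy="full_text")
    raise ValueError(f"unknown symptom: {symptom}")
```

Change one thing at a time and rerun the same representative queries after each adjustment, so you can tell which change made the difference.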

🚀 Advanced Features

Multi-Dataset Retrieval

When connecting multiple datasets to an agent:

  • The agent can search across all connected datasets

  • Results are merged and ranked by relevance

  • Each result includes source dataset information

  • Useful for comprehensive knowledge coverage
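A minimal sketch of merging results from several datasets follows. It assumes each dataset returns (text, score) pairs with scores on a comparable scale, which may not hold in practice; the function name and the result layout are hypothetical.

```python
def merge_results(results_by_dataset, top_k=5):
    """Combine per-dataset hits into one ranking that records each hit's source."""
    merged = [
        {"dataset": dataset_name, "text": text, "score": score}
        for dataset_name, hits in results_by_dataset.items()
        for text, score in hits
    ]
    merged.sort(key=lambda hit: hit["score"], reverse=True)  # most relevant first
    return merged[:top_k]
```

Keeping the dataset name on every hit lets the agent cite where an answer came from, and helps you spot a dataset that dominates or never contributes.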

Semantic Column Configuration

For TABLE and DATABASE formats:

  • Mark specific columns for semantic search indexing

  • Non-semantic columns remain queryable but are not embedded

  • Reduces embedding costs for large tables

  • Improves search focus on relevant fields
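One way to picture semantic columns: only the marked columns contribute to the text that gets embedded for each row, while the other columns stay available for exact lookups. The helper below is a hypothetical sketch of that idea, not the platform's indexing code.

```python
def build_embedding_text(row, semantic_columns):
    """Build the text to embed for a row, using only the semantic columns."""
    parts = [
        f"{column}: {row[column]}"
        for column in semantic_columns
        if row.get(column) not in (None, "")  # skip missing or empty cells
    ]
    return "\n".join(parts)
```

In a product catalog, for example, marking `name` and `description` as semantic while leaving `sku` and `price` as plain columns keeps identifiers and numbers out of the embedded text.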


🎯 Knowledge Integration Checklist

Before activating your agent's knowledge base:


Your agent's knowledge is its competitive advantage. Invest in building a comprehensive, well-organized knowledge base that enables intelligent, accurate responses.
