Knowledge & Data Sources
🧠 Powering Your Agent with Domain Knowledge
The Knowledge tab is where you transform your AI agent from a general assistant into a domain expert. By connecting relevant data sources, documents, and knowledge bases, you give your agent access to the specific information it needs to provide accurate, contextual responses.
🎯 Knowledge Source Types
📄 Document Upload
Upload files directly to your agent's knowledge base for semantic search and retrieval.
Supported File Types
PDF Documents: Research papers, manuals, reports
Word Documents: (.docx) Policies, procedures, guides
Text Files: (.txt, .md) Documentation, notes, plain text
HTML Files: (.html) Web pages, formatted documentation
Spreadsheets: (.xlsx, .xls, .csv) Data tables, catalogs, structured data
Document Processing Features
Intelligent Chunking: Configurable chunking strategies for optimal knowledge retrieval
Text Extraction: Automatic text extraction from supported formats
Space Normalization: Remove extra whitespace for cleaner text
Best Practices for Document Upload
✅ DO:
- Use clear, descriptive dataset names
- Keep documents focused on specific topics
- Ensure text is selectable (not just images)
- Configure chunking based on document structure
❌ AVOID:
- Uploading duplicate or conflicting information
- Using documents with poor formatting
- Including sensitive personal information without proper access controls
- Overloading the knowledge base with many near-duplicate documents
📊 Table Upload
Upload structured data in tabular format for precise lookups and semantic search.
Supported Table Formats
CSV Files: Comma-separated values
Excel Files: (.xlsx, .xls) Spreadsheets with single or multiple sheets
Manual Entry: Create tables directly in the interface
Table Configuration
Column Types: TEXT, NUMBER, INTEGER, BOOLEAN, DATE
Semantic Columns: Mark columns for semantic search indexing
Column Sequencing: Define display order for columns
Schema Analysis: Automatic type detection from uploaded files
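The page does not document the actual rules behind Schema Analysis. As a rough illustration only, here is a minimal sketch that infers one of the five column types from a column's raw string cells. It assumes a simple precedence (BOOLEAN, then INTEGER, NUMBER, DATE, otherwise TEXT) and ISO-8601 dates. The function `infer_column_type` is hypothetical, not the platform's API.

```python
import csv
import io
from datetime import date

def _is_bool(v: str) -> bool:
    return v.strip().lower() in {"true", "false"}

def _is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False

def _is_number(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False

def _is_date(v: str) -> bool:
    try:
        date.fromisoformat(v.strip())  # assumption: ISO-8601 dates such as 2024-01-05
        return True
    except ValueError:
        return False

# Checked in order: the first type that fits every non-empty cell wins (assumed precedence)
CHECKS = [("BOOLEAN", _is_bool), ("INTEGER", _is_int), ("NUMBER", _is_number), ("DATE", _is_date)]

def infer_column_type(values: list[str]) -> str:
    """Return TEXT, NUMBER, INTEGER, BOOLEAN, or DATE for a column of raw cell strings."""
    cells = [v for v in values if v.strip()]  # ignore empty cells
    if not cells:
        return "TEXT"
    for type_name, check in CHECKS:
        if all(check(c) for c in cells):
            return type_name
    return "TEXT"  # fallback when no stricter type fits every cell

sample_csv = "sku,price,in_stock,added\nA1,9.99,true,2024-01-05\nB2,12,false,2024-02-10\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
schema = {col: infer_column_type([row[col] for row in rows]) for col in rows[0]}
print(schema)  # {'sku': 'TEXT', 'price': 'NUMBER', 'in_stock': 'BOOLEAN', 'added': 'DATE'}
```

Checking INTEGER before NUMBER means a column of whole numbers is not widened to NUMBER unnecessarily. A single decimal cell ("9.99") is enough to push the column to NUMBER.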
Table Use Cases
Product catalogs with specifications and pricing
Customer records and interaction history
FAQ databases with questions and answers
Knowledge articles with categorization
Configuration and settings databases
🗄️ Database Schema (Manual)
Create database-like schemas for structured knowledge organization.
Database Format Features
Custom Schema Design: Define your own table structures
Column Type Support: TEXT, NUMBER, INTEGER, BOOLEAN, DATE
Manual Data Entry: Populate data through the interface
Structured Queries: Enable precise data retrieval
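To make the idea of a custom schema with precise queries concrete, here is an analogy built with Python's standard `sqlite3` module rather than the platform itself. It defines a hypothetical `policies` table using the five column types and runs a structured query that returns exact matches instead of similarity-ranked results. The mapping of platform types onto SQLite types is an assumption, as are the table and column names.

```python
import sqlite3

# Assumed mapping from the platform's column types to SQLite storage types
TYPE_MAP = {"TEXT": "TEXT", "NUMBER": "REAL", "INTEGER": "INTEGER", "BOOLEAN": "INTEGER", "DATE": "TEXT"}

# Hypothetical schema: column name -> platform column type
schema = {"policy_id": "INTEGER", "title": "TEXT", "max_amount": "NUMBER", "active": "BOOLEAN", "effective": "DATE"}

def create_table(conn: sqlite3.Connection, name: str, schema: dict[str, str]) -> None:
    # Identifiers come from the trusted, hard-coded schema above, not from user input
    cols = ", ".join(f"{col} {TYPE_MAP[t]}" for col, t in schema.items())
    conn.execute(f"CREATE TABLE {name} ({cols})")

conn = sqlite3.connect(":memory:")
create_table(conn, "policies", schema)
conn.executemany(
    "INSERT INTO policies VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Travel reimbursement", 2500.0, 1, "2024-01-01"),  # BOOLEAN stored as 1/0
        (2, "Equipment purchase", 1000.0, 0, "2023-06-15"),    # DATE stored as ISO text
    ],
)

# Structured query: an exact filter rather than a semantic similarity search
rows = conn.execute(
    "SELECT title, max_amount FROM policies WHERE active = 1 AND max_amount > ?", (2000,)
).fetchall()
print(rows)  # [('Travel reimbursement', 2500.0)]
```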
⚙️ Knowledge Processing Settings
Chunking Strategy
Control how documents are broken down for processing and retrieval.
Chunking Configuration Options
Chunk Type: Strategy for dividing content
Max Tokens: Maximum size per chunk (configurable)
Separator: Custom separator for chunk boundaries
Remove Extra Spaces: Clean up whitespace
Remove URLs/Emails: Strip URLs and email addresses from the text
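The options above can be illustrated with a separator-based chunker. This is a minimal sketch, not the platform's implementation, and it rests on several assumptions:
- Whitespace-separated words stand in for real tokenizer tokens.
- Only one "chunk type" is modeled: split on the separator, then pack pieces greedily up to `max_tokens`.
- The names `chunk_text` and `clean` and the URL/email patterns are hypothetical.

```python
import re

# Hypothetical, deliberately simple patterns for URLs and email addresses
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def clean(text: str, remove_extra_spaces: bool, remove_urls_emails: bool) -> str:
    if remove_urls_emails:
        text = URL_RE.sub("", text)
        text = EMAIL_RE.sub("", text)
    if remove_extra_spaces:
        text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    return text.strip()

def chunk_text(text: str, max_tokens: int = 500, separator: str = "\n\n",
               remove_extra_spaces: bool = True, remove_urls_emails: bool = False) -> list[str]:
    """Split on `separator`, then pack pieces into chunks of at most `max_tokens` words."""
    text = clean(text, remove_extra_spaces, remove_urls_emails)
    pieces = []  # (piece_text, word_count) pairs
    for piece in text.split(separator):
        words = piece.split()  # word count approximates token count (assumption)
        if len(words) <= max_tokens:
            if words:
                pieces.append((piece.strip(), len(words)))
        else:
            # Hard-split a piece that alone exceeds the budget
            for i in range(0, len(words), max_tokens):
                part = words[i:i + max_tokens]
                pieces.append((" ".join(part), len(part)))
    chunks, current, size = [], [], 0
    for piece_text, n in pieces:
        # Close the current chunk if this piece would push it past max_tokens
        if current and size + n > max_tokens:
            chunks.append(separator.join(current))
            current, size = [], 0
        current.append(piece_text)
        size += n
    if current:
        chunks.append(separator.join(current))
    return chunks

doc = ("Intro paragraph one.\n\n"
       "Second   paragraph with  extra spaces. Contact support@example.com or visit https://example.com/help\n\n"
       "Third.")
chunks = chunk_text(doc, max_tokens=8, remove_urls_emails=True)
for c in chunks:
    print(repr(c))
```

With `max_tokens=8`, each paragraph lands in its own chunk, and the email address and URL are gone. Raising `max_tokens` to 12 packs all three paragraphs into a single chunk joined by the separator. This is the same trade-off the document length guidelines describe.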
Best Practices
Document Length Guidelines:
- Short documents (< 5 pages): Larger chunks (1000+ tokens)
- Medium documents (5-50 pages): Medium chunks (500-1000 tokens)
- Long documents (50+ pages): Smaller chunks (200-500 tokens)
- Technical docs with code: Preserve code blocks intact
🔍 Agent Knowledge Configuration
Configure how your agent retrieves and uses knowledge during conversations.
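The retrieval settings covered in this section can be summarized as one configuration object, using the defaults and ranges this page documents. This is a hypothetical sketch: the class and field names are not the platform's API. It validates the documented Top K and Threshold ranges.

```python
from dataclasses import dataclass
from enum import Enum

class SearchStrategy(Enum):
    HYBRID = "hybrid"        # documented default
    SEMANTIC = "semantic"
    FULL_TEXT = "full_text"

@dataclass
class RetrievalConfig:
    auto_retrieval: bool = False      # Auto Retrieval: default Off
    manual_tool_call: bool = True     # Manual Tool Call: default On
    search_strategy: SearchStrategy = SearchStrategy.HYBRID
    top_k: int = 5                    # documented range 1-10
    threshold: float = 0.5            # documented range 0.0-1.0
    query_rewrite: bool = True        # default On
    rerank: bool = False              # default Off

    def __post_init__(self) -> None:
        # Reject values outside the ranges the settings UI allows
        if not 1 <= self.top_k <= 10:
            raise ValueError(f"top_k must be between 1 and 10, got {self.top_k}")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be between 0.0 and 1.0, got {self.threshold}")

defaults = RetrievalConfig()
precise = RetrievalConfig(threshold=0.7, rerank=True)  # e.g. an accuracy-critical agent
```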
Retrieval Mode
Auto Retrieval (Default: Off)
Automatic Knowledge Access:
- Agent automatically retrieves relevant knowledge for every query
- No explicit tool call needed
- Best for: Agents that always need domain knowledge
- Trade-off: May retrieve unnecessary information
Manual Tool Call (Default: On)
On-Demand Knowledge Access:
- Agent decides when to retrieve knowledge using available tools
- More control over when knowledge is accessed
- Best for: General-purpose agents that sometimes need knowledge
- Trade-off: Requires agent to recognize when knowledge is needed
Search Strategy
Hybrid Search (Default - Recommended)
Combined Approach:
- Semantic search: Understanding context and meaning
- Full-text search: Exact keyword matching
- Best for: Most use cases, balances accuracy and coverage
- Returns: Semantically relevant + keyword-matched results
- Trade-off: Uses extra credits to re-rank documents
Semantic Search Only
Vector-Based Retrieval:
- Understanding context and intent
- Finding conceptually related information
- Handling synonyms and variations
- Best for: Natural language queries and conceptual searches
Full-Text Search Only
Keyword-Based Retrieval:
- Exact term matching
- Faster for specific lookups
- Good for technical terminology and precise queries
- Best for: Known keywords and exact phrase matching
Retrieval Parameters
Top K (Default: 5, Range: 1-10)
Number of Knowledge Chunks to Retrieve:
- Higher values: More comprehensive context, higher cost
- Lower values: Focused context, lower cost
- Recommended: 3-5 for most use cases
- Adjust based on: Query complexity and knowledge base size
Threshold (Default: 0.5, Range: 0.0-1.0)
Relevance Score Threshold:
- Higher threshold (0.7-1.0): Only highly relevant results
- Medium threshold (0.4-0.7): Balanced relevance
- Lower threshold (0.0-0.4): Include more potential matches
- Recommended: Start at 0.5, adjust based on retrieval quality
Query Rewrite (Default: On)
Query Optimization:
- Rewrites user query before knowledge retrieval
- Improves search relevance by clarifying intent
- Expands abbreviations and adds context
- Recommended: Enable for most use cases
Rerank (Default: Off)
Result Reranking:
- Post-processes retrieved results for better ordering
- Uses cross-encoder models for more accurate relevance
- Trade-off: Better results but additional latency
- Recommended: Enable for accuracy-critical use cases
Connected Datasets
Multiple Dataset Support
Connect up to 100 datasets per agent
Each dataset appears as a searchable knowledge source
Datasets maintain their own:
Name and ID
Source type (UPLOAD, MANUAL)
Format type (TEXT, TABLE, DATABASE)
Processing status
Dataset Information Display
For each connected dataset, the agent has access to:
Dataset name (user-friendly identifier)
Dataset ID (unique identifier)
Source type (how data was added)
Status (PENDING, SUCCESS, FAILURE)
Format type (TEXT, TABLE, DATABASE)
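The per-dataset metadata listed above maps naturally onto a small record type. This is a minimal sketch with hypothetical Python names (not the platform's API). It models these fields and enforces the documented limit of 100 connected datasets per agent. Treating only SUCCESS datasets as searchable is an assumption drawn from the status descriptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class SourceType(Enum):
    UPLOAD = "UPLOAD"
    MANUAL = "MANUAL"

class FormatType(Enum):
    TEXT = "TEXT"
    TABLE = "TABLE"
    DATABASE = "DATABASE"

class Status(Enum):
    PENDING = "PENDING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"

@dataclass
class Dataset:
    dataset_id: str          # unique identifier
    name: str                # user-friendly identifier
    source_type: SourceType  # how the data was added
    format_type: FormatType
    status: Status = Status.PENDING

MAX_DATASETS_PER_AGENT = 100  # documented limit

@dataclass
class Agent:
    datasets: list[Dataset] = field(default_factory=list)

    def connect(self, dataset: Dataset) -> None:
        if len(self.datasets) >= MAX_DATASETS_PER_AGENT:
            raise ValueError("an agent can connect at most 100 datasets")
        self.datasets.append(dataset)

    def searchable(self) -> list[Dataset]:
        # Assumption: only datasets whose processing succeeded are used for retrieval
        return [d for d in self.datasets if d.status is Status.SUCCESS]

agent = Agent()
agent.connect(Dataset("ds-1", "FAQ", SourceType.UPLOAD, FormatType.TABLE, Status.SUCCESS))
agent.connect(Dataset("ds-2", "User Manual", SourceType.UPLOAD, FormatType.TEXT))  # still PENDING
```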
📊 Knowledge Analytics & Management
Dataset Status Monitoring
Processing States
PENDING: Dataset creation or update in progress
SUCCESS: Dataset ready for use
FAILURE: Processing encountered errors
Progress Tracking
Monitor document import progress
Track embedding generation status
View chunk processing metrics
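Because processing runs in the background, a client script usually has to wait for a dataset to leave PENDING before testing retrieval. Here is a minimal polling sketch. It assumes a hypothetical `get_status` callable that returns one of the three documented state names; no real endpoint or SDK function is implied.

```python
import time
from typing import Callable

TERMINAL_STATES = {"SUCCESS", "FAILURE"}

def wait_for_dataset(get_status: Callable[[], str],
                     timeout_s: float = 300.0, interval_s: float = 5.0) -> str:
    """Poll until the dataset reaches SUCCESS or FAILURE; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset still {status} after {timeout_s} seconds")
        time.sleep(interval_s)

# Simulated status sequence standing in for a real status lookup
states = iter(["PENDING", "PENDING", "SUCCESS"])
result = wait_for_dataset(lambda: next(states), interval_s=0.01)
print(result)  # SUCCESS
```

Returning FAILURE instead of raising lets the caller decide whether to inspect the error or re-upload the source file.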
Embedding Updates
Manual Embedding Refresh
When to Update Embeddings:
- After modifying dataset content
- After bulk row updates in tables
- To incorporate new document versions
🔧 Knowledge Configuration Best Practices
Initial Setup Process
Audit Existing Information: Catalog what knowledge you have
Choose Dataset Format: TEXT for documents, TABLE for structured data
Configure Processing: Set chunking and parsing options
Select Embedding Model: Choose based on language and domain
Test Retrieval: Verify agent responses with sample queries
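One simple way to make "Test Retrieval" repeatable is a hit-rate check: for each sample query, does the chunk you expect appear in the top K results? This is a sketch under stated assumptions:
- `search(query, top_k)` is a hypothetical stand-in for whatever retrieval you can call.
- The toy word-overlap retriever, chunk IDs and sample queries are invented for illustration.
- Hit rate is only one of several possible metrics.

```python
from typing import Callable

def hit_rate(samples: list[tuple[str, str]],
             search: Callable[[str, int], list[str]], top_k: int = 5) -> float:
    """Fraction of sample queries whose expected chunk ID appears in the top_k results."""
    hits = sum(1 for query, expected in samples if expected in search(query, top_k))
    return hits / len(samples)

# Toy stand-in retriever: ranks chunks by the number of words shared with the query
CHUNKS = {
    "refund-policy": "refunds are issued within 30 days of purchase",
    "shipping": "orders ship within 2 business days",
    "warranty": "hardware carries a one year warranty",
}

def toy_search(query: str, top_k: int) -> list[str]:
    q = set(query.lower().split())
    ranked = sorted(CHUNKS, key=lambda cid: len(q & set(CHUNKS[cid].split())), reverse=True)
    return ranked[:top_k]

samples = [
    ("how many days until refunds are issued", "refund-policy"),
    ("when do orders ship", "shipping"),
    ("is there a warranty on hardware", "warranty"),
]
score = hit_rate(samples, toy_search, top_k=1)
print(score)  # 1.0
```

Re-running the same samples after each change to chunking or retrieval settings shows whether the change helped or hurt.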
Dataset Organization Strategies
By Topic
Product Knowledge:
├── Product Features (TEXT dataset)
├── Technical Specifications (TABLE dataset)
├── Pricing & Plans (TABLE dataset)
└── Troubleshooting Guides (TEXT dataset)
Customer Support:
├── Common Issues (TEXT dataset)
├── Resolution Procedures (TEXT dataset)
└── Product Updates (TEXT dataset)
By Source Type
Documentation:
├── User Manual.pdf → TEXT dataset
├── API Reference.pdf → TEXT dataset
└── FAQ.csv → TABLE dataset
Internal Knowledge:
├── Training Materials → TEXT dataset
├── Process Documents → TEXT dataset
└── Policy Database → DATABASE dataset
Optimization Guidelines
Document Preparation
Before Upload:
- Remove duplicate content across documents
- Ensure consistent formatting
- Split very large documents into logical sections
- Use descriptive filenames
- Remove or redact sensitive information
Table Design
Column Configuration:
- Mark relevant columns as "semantic" for search
- Use appropriate data types (TEXT, NUMBER, DATE, etc.)
- Include descriptive column names
- Maintain data consistency across rows
- Consider creating separate tables for different entity types
Retrieval Tuning
Adjustment Process:
1. Start with defaults (Hybrid, Top K=5, Threshold=0.5)
2. Test with representative queries
3. If too few results: Lower threshold, increase Top K
4. If too many irrelevant results: Raise threshold, enable rerank
5. If missing semantic matches: Switch to Semantic search
6. If missing exact matches: Switch to Full-text search
🚀 Advanced Features
Multi-Dataset Retrieval
When connecting multiple datasets to an agent:
Agent can search across all connected datasets
Results merged and ranked by relevance
Each result includes source dataset information
Useful for comprehensive knowledge coverage
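Merging across datasets can be pictured as a filter-sort-truncate step over every dataset's hits. This is a minimal sketch that applies the Threshold and Top K parameters described earlier. It assumes relevance scores are normalized to 0.0-1.0 and comparable across datasets, and that the threshold is inclusive. The `Hit` type and `merge_results` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    dataset_id: str   # source dataset, kept so each result can be traced back
    chunk: str
    score: float      # relevance score, assumed normalized to 0.0-1.0

def merge_results(per_dataset: dict[str, list[tuple[str, float]]],
                  top_k: int = 5, threshold: float = 0.5) -> list[Hit]:
    """Merge per-dataset hits, drop those below threshold, and keep the top_k overall."""
    hits = [
        Hit(dataset_id, chunk, score)
        for dataset_id, results in per_dataset.items()
        for chunk, score in results
        if score >= threshold
    ]
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:top_k]

results = merge_results(
    {
        "product-features": [("Supports SSO via SAML", 0.82), ("Dark mode available", 0.41)],
        "pricing-plans": [("SSO is included in the Enterprise plan", 0.77)],
        "troubleshooting": [("Reset SSO certificates after renewal", 0.58)],
    },
    top_k=2,
)
for h in results:
    print(f"{h.score:.2f} [{h.dataset_id}] {h.chunk}")
```

Here the 0.41 hit falls below the threshold, and Top K = 2 then keeps the two strongest of the remaining three. Each keeps its source dataset ID.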
Semantic Column Configuration
For TABLE and DATABASE formats:
Mark specific columns for semantic search indexing
Non-semantic columns remain queryable but not embedded
Reduces embedding costs for large tables
Improves search focus on relevant fields
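The split between semantic and non-semantic columns can be sketched as follows:
- Only the marked columns are concatenated into the text that would be embedded.
- Every column stays available for exact filtering.

The "column: value" text layout, the catalog rows and the `embedding_text` helper are all hypothetical illustrations, not the platform's actual indexing format.

```python
def embedding_text(row: dict[str, object], semantic_columns: list[str]) -> str:
    """Concatenate only the semantic columns into the text that would be embedded."""
    return "\n".join(f"{col}: {row[col]}" for col in semantic_columns
                     if row.get(col) not in (None, ""))

catalog = [
    {"sku": "KB-101", "name": "Wireless keyboard", "description": "Low-profile keys, Bluetooth 5.0", "price": 49.0},
    {"sku": "MS-202", "name": "Ergonomic mouse", "description": "Vertical grip reduces wrist strain", "price": 39.0},
]
semantic_columns = ["name", "description"]  # sku and price are not embedded

texts = [embedding_text(row, semantic_columns) for row in catalog]
cheap = [row["sku"] for row in catalog if row["price"] < 45]  # exact filter on a non-semantic column
print(texts[0])
print(cheap)  # ['MS-202']
```

Embedding two descriptive columns instead of every field keeps the embedded text short, which is where the cost saving comes from. Numeric fields like price are still usable for precise filtering.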
🎯 Knowledge Integration Checklist
Before activating your agent's knowledge base:
Your agent's knowledge is its competitive advantage—invest in building a comprehensive, well-organized knowledge base that enables intelligent, accurate responses.