# Knowledge Retrieval

**Action ID:** `knowledge_retrieval`

## Description

Retrieve knowledge from a dataset using semantic similarity search.

## Input Parameters

| Name    | Type     | Required | Default | Description                                                           |
| ------- | -------- | :------: | ------- | --------------------------------------------------------------------- |
| dataset | dropdown |     ✓    | -       | The dataset to retrieve knowledge from.                               |
| query   | string   |     ✓    | -       | The query to retrieve knowledge from the dataset (1-1000 characters). |
| top\_k  | integer  |     -    | 5       | The number of documents to return (1-100).                            |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Knowledge Retrieval node input.",
  "properties": {
    "dataset": {
      "description": "The dataset to retrieve knowledge from.",
      "title": "Dataset",
      "type": "string"
    },
    "query": {
      "description": "The query to retrieve knowledge from the dataset.",
      "maxLength": 1000,
      "minLength": 1,
      "title": "Query",
      "type": "string"
    },
    "top_k": {
      "default": 5,
      "description": "The number of documents to return.",
      "maximum": 100,
      "minimum": 1,
      "title": "Top K",
      "type": "integer"
    }
  },
  "required": [
    "dataset",
    "query"
  ],
  "title": "KnowledgeRetrievalNodeInput",
  "type": "object"
}
```

</details>
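
The constraints in the schema above can be checked before the node runs. The following is a minimal sketch, not an official SDK type: the class name `KnowledgeRetrievalInput` and its `validate` method are hypothetical and simply mirror the documented limits (required `dataset`, a `query` of 1-1000 characters, `top_k` between 1 and 100 with a default of 5).

```python
from dataclasses import dataclass


@dataclass
class KnowledgeRetrievalInput:
    """Hypothetical mirror of the documented input schema."""

    dataset: str
    query: str
    top_k: int = 5  # documented default

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the input is valid."""
        problems = []
        if not self.dataset:
            problems.append("dataset is required")
        # The Error Handling table treats whitespace-only queries as empty.
        if not self.query.strip():
            problems.append("query is empty")
        elif len(self.query) > 1000:
            problems.append("query exceeds 1000 characters")
        if not 1 <= self.top_k <= 100:
            problems.append("top_k must be between 1 and 100")
        return problems
```

Calling `KnowledgeRetrievalInput("product_docs_2024", "How do I reset my password?").validate()` returns an empty list, while an out-of-range `top_k` or a blank query yields one message per violated constraint.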

## Output Parameters

| Name      | Type  | Description                                   |
| --------- | ----- | --------------------------------------------- |
| documents | array | The documents that are relevant to the query. |

### Document Structure

Each document in the `documents` array contains:

| Field    | Type   | Description                                             |
| -------- | ------ | ------------------------------------------------------- |
| id       | string | Unique identifier of the row                            |
| content  | string | Combined content from all cell values                   |
| metadata | object | Metadata about the document including source dataset ID |

<details>

<summary>View JSON Schema</summary>

```json
{
  "description": "Knowledge Retrieval node output.",
  "properties": {
    "documents": {
      "description": "The documents that are relevant to the query.",
      "items": {
        "additionalProperties": true,
        "type": "object"
      },
      "title": "Documents",
      "type": "array"
    }
  },
  "required": [
    "documents"
  ],
  "title": "KnowledgeRetrievalNodeOutput",
  "type": "object"
}
```

</details>
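
Because the output schema allows arbitrary properties on each document, downstream code may want to read fields defensively. The helper below is an illustrative sketch; `summarize_documents` is a hypothetical name, and it assumes only the `id` and `content` fields described in the Document Structure table.

```python
def summarize_documents(output, max_chars=80):
    """Return one short 'id: content' line per retrieved document.

    `output` is assumed to be the node output as a dict with a
    'documents' list; missing fields fall back to placeholders.
    """
    lines = []
    for doc in output.get("documents", []):
        doc_id = doc.get("id", "<no id>")
        content = doc.get("content", "")
        # Truncate long content so the summary stays readable.
        lines.append(f"{doc_id}: {content[:max_chars]}")
    return lines
```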

## How It Works

This node performs semantic similarity search on a dataset to retrieve relevant documents. It uses vector embeddings to find rows whose content is semantically similar to the query text. The node combines all cell values from each matching row into a single content string and returns the top K most relevant documents along with their IDs and metadata. This enables natural language querying of structured data for knowledge retrieval and context-aware applications.
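
The retrieval flow described above can be sketched in a few lines. This is a toy illustration, not the node's actual implementation: it assumes cosine similarity as the ranking metric, takes precomputed embeddings as plain lists, joins cell values with a space, and uses a hypothetical `dataset_id` metadata key. None of these details are confirmed by this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(rows, query_vector, top_k=5):
    """Rank rows by similarity to the query vector and return the top_k documents.

    Each row is assumed to be a dict with 'id', 'dataset_id',
    'cells' (column name -> value) and 'embedding' (list of floats).
    """
    scored = []
    for row in rows:
        score = cosine_similarity(row["embedding"], query_vector)
        # Combine all cell values into a single content string.
        content = " ".join(str(value) for value in row["cells"].values())
        document = {
            "id": row["id"],
            "content": content,
            "metadata": {"dataset_id": row["dataset_id"]},  # assumed key name
        }
        scored.append((score, document))
    # Most similar first; sort on the score only so dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [document for _, document in scored[:top_k]]
```

In a real deployment the query vector would come from the same embedding model that indexed the dataset, and the search would typically run against a vector index rather than a linear scan.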

## Usage Examples

### Example 1: Product Documentation Search

**Input:**

```
dataset: "product_docs_2024"
query: "How do I reset my password?"
top_k: 3
```

**Output:**

```
documents: [
  {
    "id": "doc_456",
    "content": "Password Reset: Navigate to settings, click 'Security', then 'Reset Password'...",
    "score": 0.92,
    "metadata": {"category": "authentication", "updated": "2024-01-10"}
  },
  {
    "id": "doc_123",
    "content": "Account Security: Managing your password and security settings...",
    "score": 0.87,
    "metadata": {"category": "security", "updated": "2024-01-05"}
  },
  {
    "id": "doc_789",
    "content": "Login Issues: Troubleshooting common authentication problems...",
    "score": 0.78,
    "metadata": {"category": "troubleshooting", "updated": "2023-12-20"}
  }
]
```

### Example 2: Customer Support Knowledge Base

**Input:**

```
dataset: "support_kb_001"
query: "Shipping delays international orders"
top_k: 5
```

**Output:**

```
documents: [
  {
    "id": "kb_234",
    "content": "International shipping typically takes 7-14 business days. Customs processing may cause additional delays...",
    "score": 0.89
  },
  {
    "id": "kb_567",
    "content": "Track your international shipment using the tracking number provided...",
    "score": 0.82
  },
  ...
]
```

### Example 3: Research Paper Database

**Input:**

```
dataset: "research_papers_ai"
query: "transformer architecture attention mechanisms"
top_k: 10
```

**Output:**

```
documents: [
  {
    "id": "paper_1123",
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
    "score": 0.95,
    "authors": ["Vaswani et al."],
    "year": 2017
  },
  ...
]
```

## Common Use Cases

* **Customer Support**: Retrieve relevant help articles and documentation based on customer questions
* **Question Answering Systems**: Find contextually relevant information to answer user queries
* **Semantic Search**: Implement intelligent search that understands meaning beyond keywords
* **RAG (Retrieval-Augmented Generation)**: Provide context to AI models by retrieving relevant documents
* **Document Discovery**: Help users discover related content and documents in large repositories
* **Knowledge Base Navigation**: Enable natural language search across organizational knowledge bases
* **Research and Analysis**: Find relevant research papers, reports, or documents based on topics

## Error Handling

| Error Type          | Cause                                             | Solution                                                         |
| ------------------- | ------------------------------------------------- | ---------------------------------------------------------------- |
| Dataset Not Found   | Dataset ID doesn't exist                          | Verify the dataset parameter contains a valid dataset ID         |
| Empty Query         | Query string is empty or contains only whitespace | Provide a query with at least one non-whitespace character       |
| Query Too Long      | Query exceeds 1000 characters                     | Shorten your query to 1000 characters or fewer                   |
| Invalid Top K       | top\_k value is outside range 1-100               | Set top\_k to a value between 1 and 100                          |
| Dataset Not Indexed | Dataset lacks vector embeddings                   | Ensure the dataset has been properly indexed for semantic search |
| No Results Found    | No documents match the query                      | Try rephrasing your query or expanding the search criteria       |
| Embedding Error     | Failed to generate query embeddings               | Check embedding service availability and retry                   |

## Notes

* **Semantic vs Keyword Search**: This node uses semantic search, which understands context and meaning rather than just matching keywords.
* **Query Quality**: More specific, well-formed queries generally produce better results. Avoid overly vague or generic queries.
* **Top K Selection**: Balance retrieving enough context (higher top\_k) against maintaining relevance (lower top\_k). The default of 5 works well for most cases.
* **Result Ranking**: Documents are returned in order of relevance, with the most relevant documents first.
* **Score Interpretation**: Similarity scores typically range from 0 to 1, with higher scores indicating greater relevance.
* **Dataset Preparation**: Ensure your dataset is properly indexed with embeddings before using this node.
* **Performance**: Retrieval speed depends on dataset size. Larger datasets may take slightly longer to search.
* **Use with AI Nodes**: Combine with AI nodes like Claude or GPT to build RAG systems that answer questions based on retrieved context.
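
To illustrate the RAG pattern mentioned in the last note, the retrieved documents can be folded into a prompt for a downstream AI node. This is a minimal sketch under assumptions: `build_rag_prompt` is a hypothetical helper, the prompt wording is arbitrary, and the character budget is a crude stand-in for a real token limit.

```python
def build_rag_prompt(question, documents, max_context_chars=4000):
    """Build a prompt that asks a model to answer from retrieved context.

    `documents` is the node's `documents` output; only the `content`
    field is used. Documents are added in ranked order until the
    character budget would be exceeded.
    """
    context_parts = []
    used = 0
    for index, doc in enumerate(documents, start=1):
        snippet = f"[{index}] {doc.get('content', '')}"
        if used + len(snippet) > max_context_chars:
            break
        context_parts.append(snippet)
        used += len(snippet)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Because documents arrive most relevant first, truncating at the budget drops the least relevant context, which ties in with the Top K Selection note above.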

