Introduction
How to use the platform to store and access your datasets
The platform includes built-in datasets: NoSQL document collections that can store vector embeddings and support vector search.
What is a dataset?
A dataset is a collection of documents stored in a NoSQL database. Each document has a unique ID and a set of fields. Each field can be a string, a number, a date, or a vector. A vector is a list of numbers that represents a point in a multi-dimensional space.
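As a rough illustration, a single document can be pictured as the Python dictionary below. The field names (id, title, price, created_at, title_vector) and the three-dimensional vector are hypothetical; the platform's actual storage format is not described here.

```python
from datetime import date

# Hypothetical document: field names and values are illustrative only.
document = {
    "id": "doc-001",                      # unique ID
    "title": "Wireless headphones",       # string field
    "price": 59.9,                        # number field
    "created_at": date(2024, 1, 15),      # date field
    "title_vector": [0.12, -0.48, 0.83],  # vector field: a point in 3-D space
}

# A vector's dimensionality is the length of its list of numbers.
dimensions = len(document["title_vector"])
```

Real embeddings usually have hundreds or thousands of dimensions; three are used here only to keep the example readable.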
Vectors are an additional feature of a dataset. They improve performance in tasks such as search and answer retrieval; in other words, vectors give the AI extra knowledge. A dataset that contains vectors is therefore referred to as knowledge-enabled.
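To show why vectors help with search, here is a minimal sketch of vector search that ranks documents by cosine similarity to a query vector. Cosine similarity is an assumption chosen for illustration, since the similarity measure the platform uses is not stated, and the function names and the "vector" field name are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vector, documents, top_k=2):
    """Return the IDs of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, doc["vector"]), doc["id"])
        for doc in documents
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]

# Hypothetical two-dimensional vectors, for illustration only.
documents = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "b", "vector": [0.0, 1.0]},
    {"id": "c", "vector": [0.9, 0.1]},
]
results = vector_search([1.0, 0.0], documents)  # ["a", "c"]
```

Documents "a" and "c" point in nearly the same direction as the query, so they rank above "b". A dataset without vectors has nothing to compare against, which is why this kind of search only works on vectorized data.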
On the Data page, you can see all your uploaded datasets. A vectorized dataset appears under “Knowledge”; all other datasets appear under “Datasets”.
How to create a data table
On the Data page, click Create table in the top-right corner and choose the option that matches your data:
Blank: to create an empty table
Upload file: to upload a file (CSV, PDF, MP3, …)
PDF files automatically go through a PDF-to-Text step.
Audio/Video files automatically go through a transcription step.
Import from a website: to scrape a website and save the data in the table
Integration: to import data from a third party
Upload file
Select the file(s), or drag and drop them into the box, and type a name for your data table.
Data table names can contain only lowercase letters, numbers, and hyphens. Any other character is automatically replaced with a hyphen.
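If you want to predict how a name will be stored, the naming rule above can be approximated with the sketch below. The function name sanitize_table_name and the lowercase_first option are hypothetical. It is not stated whether the platform lowercases capital letters or replaces them with hyphens, so the default follows the rule literally and the option is only an assumption.

```python
import re

def sanitize_table_name(name, lowercase_first=False):
    """Replace every character that is not a-z, 0-9, or '-' with a hyphen.

    lowercase_first is an assumption: set it to True to convert capital
    letters to lowercase before the replacement instead of hyphenating them.
    """
    if lowercase_first:
        name = name.lower()
    return re.sub(r"[^a-z0-9-]", "-", name)

example = sanitize_table_name("sales_data 2024")  # "sales-data-2024"
```

Typing a name that already follows the rule avoids any surprises from the automatic replacement.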
You can upload multiple files at once.
Click Upload data to table and wait until the upload finishes. The larger the files, the longer the upload takes.
Enable knowledge
After the data is saved in a data table, the system detects all fields that can be vectorized and later used as knowledge for an AI agent or for further analysis.
You can vectorize all fields (Select all) or select only the fields you want, and then click Continue.
Note that only knowledge-enabled data can be used for further analysis.