FAQ

Frequently asked questions about data and data preparation

What is the best practice when preparing a CSV file?

The quality of your data significantly affects the results you get from your LLM, so prepare your dataset carefully. You can create the data manually or export it from your CRM or any other source; the preparation process is the same either way. Include a header for every field (column) in your file. Avoid spaces and special characters in headers: use lowercase letters, with dashes in place of spaces. Keep header names short (a few words) and meaningful, for example customer-name.
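The header rules above can be applied automatically before upload. The following is a minimal sketch, not part of the product: the function names normalize_header and normalize_headers are hypothetical, and it assumes that anything other than the ASCII letters a-z and the digits 0-9 counts as a "special character".

```python
import re


def normalize_header(name: str) -> str:
    """Lowercase a header and replace each run of other characters with one dash.

    Assumption: only a-z and 0-9 are kept; everything else (spaces,
    underscores, punctuation, accented letters) is treated as special.
    """
    lowered = name.strip().lower()
    dashed = re.sub(r"[^a-z0-9]+", "-", lowered)
    return dashed.strip("-")


def normalize_headers(headers):
    """Normalize every header and fail loudly on blank or colliding names."""
    cleaned = [normalize_header(h) for h in headers]
    if any(not h for h in cleaned):
        raise ValueError("every column needs a non-empty header")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("two headers normalize to the same name")
    return cleaned
```

For example, normalize_headers(["Customer Name", "Email (work)"]) returns ["customer-name", "email-work"]. Raising an error on collisions, rather than silently renaming columns, keeps the mapping between the original and cleaned headers easy to review.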

What is the maximum allowed file size?

  • Maximum file size to upload is 100 MB.

  • Maximum number of rows per data table is 50,000.

  • Maximum raw text size to upload for knowledge retrieval is 10 MB.
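The file size and row limits above can be checked locally before uploading a CSV file. This is a rough sketch under stated assumptions: the function check_csv_limits is hypothetical, 1 MB is taken to be 1024 * 1024 bytes, and the header row is assumed not to count toward the row limit. The platform may measure either limit differently.

```python
import csv
import os

# Limits taken from the list above; the byte conversion is an assumption.
MAX_FILE_BYTES = 100 * 1024 * 1024
MAX_ROWS = 50_000


def check_csv_limits(path, max_bytes=MAX_FILE_BYTES, max_rows=MAX_ROWS):
    """Return a list of problems found; an empty list means within limits."""
    problems = []

    size = os.path.getsize(path)
    if size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")

    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed not to count)
        rows = sum(1 for _ in reader)
    if rows > max_rows:
        problems.append(f"file has {rows} data rows, limit is {max_rows}")

    return problems
```

Counting rows with csv.reader rather than counting newlines means a quoted field that spans several lines is still counted as a single row.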

What does "Knowledge enabled" mean?

After uploading your data, you will see a pop-up window asking which fields to use to enable knowledge. The selected fields are vectorized, and vectors enable semantic search (i.e., search by meaning and not just word matching). In other words, vectors help match up a query with the most similar set of information from your dataset (e.g., the most similar responses from the past in a QA dataset).
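To illustrate how vectors support search by meaning, the sketch below ranks stored rows by cosine similarity to a query vector. It is a toy illustration, not the platform's implementation: the three-dimensional vectors are made up by hand, whereas real vectors come from an embedding model and have hundreds of dimensions, and the function names are hypothetical. Cosine similarity is one common choice of similarity measure; the source does not state which one the platform uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_vector, rows, top_k=1):
    """rows is a list of (text, vector) pairs; return the top_k closest texts."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vector, row[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Hand-made toy vectors standing in for model-generated embeddings.
rows = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("What are your opening hours?", [0.0, 0.2, 0.9]),
    ("How can I change my login details?", [0.8, 0.3, 0.1]),
]
query_vector = [0.9, 0.15, 0.0]  # stands in for "I forgot my password"

print(most_similar(query_vector, rows, top_k=2))
```

With these toy values, the two account-related questions rank above the opening-hours question, even though the query stands for a sentence that shares few exact words with either of them.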

How do I know if my dataset is vectorized?

A dataset labeled Knowledge enabled on the Data page has vectors associated with its data.

Is there a way to vectorize/re-vectorize a dataset after the upload process is completed?

  • Re-vectorize: Select the table, click the Knowledge button at the top right, and follow the vectorize wizard.

  • Vectorize after upload: If there are no vectors associated with a dataset, your table will appear under Datasets (i.e., not Knowledge). Click on the dataset that you wish to vectorize. On the new page, click on Convert to knowledge set and follow the wizard.

What models are used for vectorizing text data?

By default, MPNet is used for vectorizing text data. Other models are also available. To use one of them, skip enabling knowledge when uploading your dataset. Then select the uploaded dataset and click on the Vectorize button.

How do I know the name of the field containing the vectorized data?

Select the vectorized dataset, and you will see the name of the vector field at the top of the page.
