Question-Answering on Data

Use LLMs to retrieve the best matching answer from knowledge sources

Ever wondered what if we could provide all the PDF files containing company rules and regulations as the background knowledge to a large language model such as GPT and have it as an agent to reply to our queries? With the right setup, this is not a wonder anymore. We can actually build such an agent in a matter of minutes, and even embed it into websites.

What Do We Need

Data

Collect all sources of data that you want to provide the LLM as knowledge. Note that you are not limited to PDFs when providing your knowledge. Even audio/video files (e.g., user guides) can be the knowledge source.

No need to worry about OCR or transcription. The platform will automatically take care of such steps.

Tool Configuration

Put together a Workflow that:

  • Receives knowledge

  • Receives a question

  • Provides the best answer from the knowledge to the question

Let’s Build a Knowledge Retrieval Workflow

  1. Start with Creating a Workflow

    • Click on + Create workflow located on the top right of the Workflows page. For more information, see how to create a workflow.

  2. Add Knowledge

    • Click on + Add data in the knowledge section to add knowledge to your Workflow. Note that you can upload knowledge sources directly from the add knowledge window or select already existing data tables on your account.

    • Make sure to enable knowledge (i.e., vectorize your data). Vectors allow semantic search (as opposed to word-matching) and increase the accuracy of knowledge retrieval.

  3. Add User Input

    • Add a text input component which will carry the query/question. Let's call it query. Your Workflow should look similar to the image below.

  4. Add an LLM Component

    • The prompt should use {{}} and the component names (i.e., knowledge and query in our current example) to bring the knowledge and the question into the prompt. Provide precise instructions on what you need from the model. For example, a very simple prompt could be:

      Context: """ {{knowledge}} """
      Goal:
      Use the above Context and nothing else and answer the question below.
      Question: {{query}}

Handling Large Amount of Data

LLMs have limitations on the number of tokens included in the prompt. When dealing with large amounts of text such as rules and regulations, and trying to answer questions, we need to stick to the most relevant data.

  1. Set Up Most Relevant Data

    • Under LLM advanced options, click on Edit located in front of knowledge. By default, we select the most relevant data using vector search, but it is recommended to set it up manually.

    • With Most relevant data selected, click on Advanced options and type {{query}} (i.e., the name of the component containing our question) to filter out any non-relevant information to the query. More details are provided in the section on how to handle too much text.

System Prompt

Scroll further down and under System prompt, give some characteristics to your knowledge retrieval agent. For instance:

You are an expert in rules and regulations in company XYZ. You will answer questions based on the provided knowledge and ...

or

You are an expert in knowledge retrieval from large sources of data. You will answer questions precisely based on the provided knowledge.

Output Configuration

Click on the LLM output button (located on the top right of the component) if you wish to modify the output. Answer is the main output, the rest provide you with information regarding the execution and can be safely deleted.

Save and Test

Save the Workflow using the button on the top right of the page and you are ready to enter your query and get responses from your knowledge retrieval large language model.

Last updated