Question-Answering on Data
Use LLMs to retrieve the best matching answer from knowledge sources
Ever wondered what would happen if we could provide all the PDF files containing company rules and regulations as background knowledge to a large language model such as GPT, and have it act as an agent that replies to our queries? With the right setup, this is no longer just wishful thinking. We can build such an agent in a matter of minutes and even embed it into websites.
What Do We Need?
Data
Collect all the sources of data that you want to provide to the LLM as knowledge. Note that you are not limited to PDFs: even audio/video files (e.g., user guides) can serve as knowledge sources.
No need to worry about OCR or transcription. The platform will automatically take care of such steps.
Tool Configuration
Put together a Workflow that:
Receives knowledge
Receives a question
Provides the best answer from the knowledge to the question
Let’s Build a Knowledge Retrieval Workflow
Start with Creating a Workflow
Click on + Create workflow located on the top right of the Workflows page. For more information, see how to create a workflow.
Add Knowledge
Click on + Add data in the knowledge section to add knowledge to your Workflow. Note that you can upload knowledge sources directly from the add knowledge window or select existing data tables in your account.
Make sure to enable knowledge (i.e., vectorize your data). Vectors allow semantic search (as opposed to word-matching) and increase the accuracy of knowledge retrieval.
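To get an intuition for what vectorization does, here is a minimal sketch of semantic search outside the platform. The `sentence-transformers` package and the model name are illustrative assumptions, not what the platform uses under the hood.

```python
# A minimal sketch of semantic (vector) search -- for intuition only,
# not the platform's actual implementation.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model

chunks = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days of purchase.",
]
query = "Can I work from home?"

chunk_vecs = model.encode(chunks, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine similarity surfaces the remote-work rule even though the query
# shares almost no words with it -- which is why vectors beat word-matching.
scores = util.cos_sim(query_vec, chunk_vecs)[0]
print(chunks[int(scores.argmax())])
```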
Add User Input
Add a text input component which will carry the query/question. Let's call it query. Your Workflow should look similar to the image below.
Add an LLM Component
The prompt should use `{{}}` and the component names (i.e., `knowledge` and `query` in our current example) to bring the knowledge and the question into the prompt. Provide precise instructions on what you need from the model. For example, a very simple prompt could be:
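(The wording below is only an illustrative sketch reusing the component names from this example; adapt it to your own use case.)

```
Answer the question below using only the information provided as knowledge.

Knowledge:
{{knowledge}}

Question:
{{query}}

If the knowledge does not contain the answer, say that you don't know.
```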
Handling Large Amounts of Data
LLMs limit the number of tokens that can be included in a prompt. When answering questions over large amounts of text such as rules and regulations, we therefore need to pass only the most relevant data to the model.
Set Up Most Relevant Data
Under the LLM advanced options, click on Edit next to knowledge. By default, the most relevant data is selected using vector search, but it is recommended to configure this manually.
With Most relevant data selected, click on Advanced options and type `{{query}}` (i.e., the name of the component containing our question) to filter out information that is not relevant to the query. More details are provided in the section on how to handle too much text.
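Conceptually, this relevance filtering works like the sketch below: rank the knowledge chunks against the query and keep only the top-scoring ones that fit a rough token budget. The model, chunking, and budget figures are assumptions for illustration, not the platform's actual settings.

```python
# Illustrative sketch of "most relevant data" selection: rank chunks by
# similarity to the query, then keep only what fits a rough token budget.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model

def most_relevant(chunks, query, token_budget=2000):
    chunk_vecs = model.encode(chunks, convert_to_tensor=True)
    query_vec = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, chunk_vecs)[0]
    ranked = [chunks[int(i)] for i in scores.argsort(descending=True)]

    selected, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())  # crude word count as a token proxy
        if used + cost > token_budget:
            break
        selected.append(chunk)
        used += cost
    return selected
```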
System Prompt
Scroll down further and, under System prompt, give your knowledge retrieval agent some character. For instance:
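(The examples below are only sketches; tailor the persona and constraints to your organization.)

```
You are a company policy assistant. Answer questions strictly based on the
rules and regulations provided as knowledge. If the knowledge does not cover
a question, say that you don't know instead of guessing.
```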
or
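```
You are a friendly onboarding guide for new employees. Base every answer on
the provided knowledge, quote the relevant rule where possible, and keep
answers short and practical.
```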
Output Configuration
Click on the LLM output button (located on the top right of the component) if you wish to modify the output. Answer is the main output; the other fields provide information about the execution and can be safely deleted.
Save and Test
Save the Workflow using the button on the top right of the page, and you are ready to enter your query and get responses from your knowledge retrieval agent.