Multi-Turn Conversation Support in NVIDIA RAG Blueprint#
The NVIDIA RAG Blueprint server exposes an OpenAI-compatible API that developers can use to provide custom conversation history. For full details, see APIs for RAG Server.
Use the /generate endpoint of the RAG server in a RAG pipeline to generate responses to prompts with custom conversation history.
To support multi-turn conversations, include the following parameters in the request body.
| Parameter | Description | Type |
|---|---|---|
| messages | A sequence of messages that form a conversation history. Each message contains a role (such as system, user, or assistant) and the message content. | Array |
| use_knowledge_base | Whether to retrieve context from the knowledge base when generating the response. | Boolean |
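The following is a minimal sketch of how a client might accumulate this conversation history across turns by appending each assistant reply and the next user question to the messages array. The helper function and variable names are illustrative only and are not part of the API.

```python
# Illustrative sketch: maintaining the "messages" array across turns.
# Function and variable names here are hypothetical, not part of the API.

def build_request_body(history, user_question, use_knowledge_base=True):
    """Append the next user turn and return a /generate request body."""
    history.append({"role": "user", "content": user_question})
    return {"messages": history, "use_knowledge_base": use_knowledge_base}

# Start with an optional system message, then alternate user and assistant turns.
history = [
    {"role": "system", "content": "You are an assistant that provides information about FastAPI."}
]

body = build_request_body(history, "What is FastAPI?")
# ...send `body` to the /generate endpoint, read the reply, then record it so the
# next request carries the full conversation history:
history.append({"role": "assistant", "content": "FastAPI is a modern, fast web framework for building APIs with Python."})

body = build_request_body(history, "What are the key features of FastAPI?")
```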
Example payload for customization#
The following example payload includes a messages parameter that passes a custom conversation history to the /generate endpoint for better contextual answers. You can include or change the following parameters in the request body while trying out the generate API using this notebook.

    {
      "messages": [
        {
          "role": "system",
          "content": "You are an assistant that provides information about FastAPI."
        },
        {
          "role": "user",
          "content": "What is FastAPI?"
        },
        {
          "role": "assistant",
          "content": "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints."
        },
        {
          "role": "user",
          "content": "What are the key features of FastAPI?"
        }
      ],
      "use_knowledge_base": true
    }
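As a rough sketch, you can send this payload to the /generate endpoint with any HTTP client. The example below uses Python's requests library and assumes the RAG server is reachable at http://localhost:8081 with a /v1 path prefix; adjust the base URL, path, and response handling to match your deployment, since the endpoint may stream its output.

```python
import requests

# Assumption: the RAG server address and path prefix below are examples;
# change them to match your deployment.
RAG_SERVER_URL = "http://localhost:8081/v1/generate"

payload = {
    "messages": [
        {"role": "system", "content": "You are an assistant that provides information about FastAPI."},
        {"role": "user", "content": "What is FastAPI?"},
        {"role": "assistant", "content": "FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints."},
        {"role": "user", "content": "What are the key features of FastAPI?"},
    ],
    "use_knowledge_base": True,
}

# The endpoint may stream its response in chunks; stream=True lets the client
# consume them as they arrive.
with requests.post(RAG_SERVER_URL, json=payload, stream=True, timeout=120) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
```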
Tip
For better accuracy of multi-turn queries, consider enabling query rewriting.
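If your deployment exposes query rewriting as a request-level option, it can typically be toggled alongside the other parameters. The flag name used in the sketch below (enable_query_rewriting) is an assumption; confirm the exact parameter name, or whether it is controlled server-side, in the RAG server API reference.

```python
# Hypothetical sketch: enabling query rewriting per request.
# "enable_query_rewriting" is an assumed parameter name; check the API
# reference for the exact flag, which may instead be set via server configuration.
payload = {
    "messages": [
        {"role": "user", "content": "What are the key features of FastAPI?"},
    ],
    "use_knowledge_base": True,
    "enable_query_rewriting": True,  # assumed parameter name
}
```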