Text-to-SQL with Vanna#
The NVIDIA NeMo Agent toolkit provides text-to-SQL capabilities through the text2sql and execute_db_query functions, powered by the Vanna framework with Databricks support and vector-based few-shot learning.
Features#
Text-to-SQL Generation: Convert natural language questions to SQL queries using AI
Databricks Support: Optimized for Databricks SQL warehouses and compute clusters
Vector Store Integration: Milvus-based similarity search for few-shot learning
Streaming Support: Real-time progress updates during SQL generation
Database Execution: Optional query execution with result formatting
Customizable: Flexible configuration for prompts, examples, and database connections
Installation#
The text-to-SQL plugin is distributed as a separate package that can be installed alongside the NeMo Agent toolkit. If you have not yet installed the NeMo Agent toolkit, refer to the Installation guide.
If you have performed a source code checkout, you can install this with the following command:
uv pip install -e '.[vanna]'
If you have installed the NeMo Agent toolkit from a package, you can install this with the following command:
uv pip install "nvidia-nat[vanna]"
Quick Start#
Prerequisites#
NVIDIA API Key (refer to Obtaining API Keys)
Milvus vector database (local or cloud)
Databricks workspace with SQL warehouse or compute cluster access
1. Start Milvus#
Install and start Milvus standalone with docker compose following these steps.
2. Set Environment Variables#
Create a .env file:
# NVIDIA API
NVIDIA_API_KEY=nvapi-xxx
# Database (Databricks)
CONNECTION_URL=databricks://token:<token>@<db_host>:443/default?http_path=<http_path>&catalog=main&schema=default
# Milvus
MILVUS_URI=http://localhost:19530
MILVUS_PASSWORD=your-password
3. Create Workflow Configuration#
3.1 Create training config text2sql_training_config.yml#
general:
telemetry:
logging:
console:
_type: console
level: INFO
functions:
text2sql:
_type: text2sql
llm_name: nim_llm
embedder_name: nim_embedder
milvus_retriever: milvus_retriever
# Database config
database_type: databricks
connection_url: "${CONNECTION_URL}"
# Vanna settings
execute_sql: false
train_on_startup: true
auto_training: true # Auto-train Vanna (auto-extract DDL and generate training data from database) or manually train Vanna (uses training data from training_db_schema.py)
n_results: 5
milvus_search_limit: 1000
execute_db_query:
_type: execute_db_query
database_type: databricks
connection_url: "${CONNECTION_URL}"
max_rows: 100
llms:
nim_llm:
_type: nim
model_name: meta/llama-3.1-70b-instruct
api_key: "${NVIDIA_API_KEY}"
base_url: https://integrate.api.nvidia.com/v1
temperature: 0.0
embedders:
nim_embedder:
_type: nim
model_name: nvidia/llama-3.2-nv-embedqa-1b-v2
api_key: "${NVIDIA_API_KEY}"
base_url: https://integrate.api.nvidia.com/v1
retrievers:
milvus_retriever:
_type: milvus_retriever
uri: "${MILVUS_URI}"
connection_args:
user: "developer"
password: "${MILVUS_PASSWORD}"
db_name: "default"
embedding_model: nim_embedder
content_field: text
use_async_client: true
workflow:
_type: rewoo_agent
tool_names: [text2sql, execute_db_query]
llm_name: nim_llm
tool_call_max_retries: 3
Update training materials in training_db_schema.py:
VANNA_TRAINING_DOCUMENTATION: Add documentation about your tables and business logicVANNA_TRAINING_DDL: Provide DDL statements for your database schemaIf
auto_trainingis set totrue, make sureVANNA_ACTIVE_TABLESis updated with the tables in your database. This ensures that automatic DDL extraction works properly.
VANNA_TRAINING_EXAMPLES: Provide question-SQL example pairs for few-shot learning
3.2 Create inference config text2sql_config.yml#
Set train_on_startup to false for faster startup when using pre-trained data:
functions:
text2sql:
train_on_startup: false
auto_training: false
See text2sql_training_config.yml and text2sql_config.yml for reference.
4. Run the Workflow#
The following examples show how to use the text-to-SQL workflow with the NeMo Agent toolkit CLI or programmatically.
# Using NeMo Agent toolkit CLI
# If auto_training is set to true, training takes approximately 7 minutes depending on endpoints and network conditions.
nat run --config_file packages/nvidia_nat_vanna/text2sql_training_config.yml --input "Retrieve the total number of customers."
# Once training is complete, use the inference configuration for faster generation.
nat run --config_file packages/nvidia_nat_vanna/text2sql_config.yml --input "What is the total profit?"
Or use the Python API:
import asyncio
from nat.core import Workflow
async def main():
workflow = Workflow.from_config("text2sql_config.yml")
result = await workflow.run("Retrieve the total number of customers.")
print(result)
asyncio.run(main())
Expected output:
# Ingest DDL and synthesize query-SQL pairs for training
Training Vanna...
# ReWOO Agent Planning Phase
Plan 1: Generate SQL query from natural language
Tool: text2sql
Plan 2: Execute the generated SQL query
Tool: execute_db_query
# Execution Phase
Starting SQL generation...
Retrieved 1 similar SQL examples
SQL generated: SELECT COUNT(*) FROM customers
Executing SQL query...
Results: 42 customers found
Configuration#
Text2SQL Function#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
LLM reference for SQL generation |
Required |
|
|
Embedder reference for vector ops |
Required |
|
|
Milvus retriever reference (must use |
Required |
|
|
Database type (must be ‘Databricks’) |
“Databricks” |
|
|
Database connection string (SQLAlchemy format) |
Required |
|
|
Execute SQL or just return query |
false |
|
|
Allow intermediate queries |
false |
|
|
Train Vanna on startup |
false |
|
|
Auto-extract DDL and generate training data |
false |
|
|
Custom system prompt |
null |
|
|
Number of similar examples |
5 |
|
|
Milvus collection name for SQL examples |
|
|
|
Milvus collection name for DDL |
|
|
|
Milvus collection name for documentation |
|
|
|
Maximum limit for vector search operations |
1000 |
|
|
Models requiring think tag removal |
See below |
|
|
Models using standard response handling |
See below |
Default reasoning models: nvidia/llama-3.1-nemotron-ultra-253b-v1, nvidia/llama-3.3-nemotron-super-49b-v1.5, deepseek-ai/deepseek-v3.1, deepseek-ai/deepseek-r1
Default chat models: meta/llama-3.1-70b-instruct
Understanding train_on_startup and auto_training#
train_on_startup: Controls whether Vanna initializes and loads training data when the workflow starts.
true: Automatically creates Milvus collections with names specified bysql_collection,ddl_collection, anddoc_collectionparameters (defaults:"vanna_sql","vanna_ddl","vanna_documentation") and ingests training data during workflow initialization. This ensures the vector store is populated and ready for similarity search before the first query is processed. Use this setting when you want to ensure fresh training data is loaded each time the workflow starts.false(default): Skips automatic collection creation and training data ingestion. The workflow assumes Milvus collections already exist and contain previously trained data. Use this setting in production environments where training data is already loaded.
auto_training: Controls the source of training data (only used when train_on_startup=true).
true: Automatically extracts DDL from the database usingVANNA_ACTIVE_TABLESand generates question-SQL training pairs using the LLM. This is useful when you want to quickly bootstrap the system with your existing database schema.false(default): Uses manually defined training data fromtraining_db_schema.py(VANNA_TRAINING_DDL,VANNA_TRAINING_EXAMPLES,VANNA_TRAINING_DOCUMENTATION). This gives you full control over the training data quality.
Database Configuration#
Databricks:
database_type: databricks
connection_url: "databricks://token:${DB_TOKEN}@${DB_HOST}:443/default?http_path=${HTTP_PATH}&catalog=main&schema=default"
Connection URL Format:
databricks://token:<token>@<db_host>:443/default?http_path=<http_path>&catalog=<catalog>&schema=<schema>
Parameters:
<token>: Databricks personal access token or service principal token<db_host>: Your Databricks workspace URL, for exampleyour-workspace.cloud.databricks.com<http_path>: Path to your SQL warehouse or compute cluster, for example/sql/1.0/warehouses/abc123<catalog>: Catalog name, for examplemain<schema>: Schema name, for exampledefault
Example:
CONNECTION_URL="databricks://token:dapi-xxx@your-workspace.cloud.databricks.com:443/default?http_path=/sql/1.0/warehouses/abc123&catalog=main&schema=default"
Note: Only Databricks is currently supported. The connection uses SQLAlchemy with the databricks-sql-connector driver. Other databases can be customized as following:
# PostgreSQL
engine = create_engine("postgresql+psycopg://user:password@localhost:5432/mydb")
# MS SQL Server
engine = create_engine(
"mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+18+for+SQL+Server"
)
# SQLite
engine = create_engine("sqlite:///local.db")
Execute DB Query Function#
Parameter |
Type |
Description |
Default |
|---|---|---|---|
|
|
Database type (must be ‘Databricks’) |
“Databricks” |
|
|
Database connection string (SQLAlchemy format) |
Required |
|
|
Maximum rows to return |
100 |
Milvus Configuration#
The text2sql function connects to Milvus using environment variables and manages collections internally. For advanced use cases, you can configure Milvus connection settings:
# Optional: Custom retriever for additional collections
retrievers:
milvus_retriever:
_type: milvus_retriever
uri: "${MILVUS_URI}" # Supports both http://localhost:19530 or https://host:443
connection_args:
user: "developer"
password: "${MILVUS_PASSWORD}"
db_name: "default"
embedding_model: nim_embedder
use_async_client: true
Training Data#
Training data is defined in training_db_schema.py and is used when train_on_startup=true.
DDL (Data Definition Language)#
Provide table schemas to help Vanna understand your database structure in VANNA_TRAINING_DDL:
VANNA_TRAINING_DDL: list[str] = [
"CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100), created_at TIMESTAMP)",
"CREATE TABLE orders (id INT PRIMARY KEY, user_id INT, total DECIMAL(10,2))",
]
Documentation#
Add contextual information about your data in VANNA_TRAINING_DOCUMENTATION:
VANNA_TRAINING_DOCUMENTATION: list[str] = [
"The users table contains customer information. The created_at field shows when they signed up.",
"Orders table tracks all purchases. The total field is in USD.",
]
Examples (Few-Shot Learning)#
Provide question-SQL pairs for better accuracy in VANNA_TRAINING_EXAMPLES:
VANNA_TRAINING_EXAMPLES: list[dict[str, str]] = [
{
"question": "Who are our top 10 customers by revenue?",
"sql": "SELECT u.name, SUM(o.total) as revenue FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.id ORDER BY revenue DESC LIMIT 10",
},
{
"question": "How many new users signed up last month?",
"sql": "SELECT COUNT(*) FROM users WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')",
},
]
Active Tables (for Auto-Training)#
When auto_training=true, specify which tables to extract DDL from in VANNA_ACTIVE_TABLES:
VANNA_ACTIVE_TABLES = ['catalog.schema.table_a', 'catalog.schema.table_b']
Advanced Usage#
Multi-Step Query Planning#
The ReWOO agent automatically plans a two-step workflow:
Generate SQL from natural language using
text2sqlExecute the SQL using
execute_db_query
You can customize the planning and solving prompts:
workflow:
_type: rewoo_agent
tool_names: [text2sql, execute_db_query]
llm_name: nim_llm
tool_call_max_retries: 3
additional_planner_instructions: |
When generating SQL queries, prioritize performance and accuracy.
Always plan to verify the SQL before execution.
additional_solver_instructions: |
Format the final results in a clear, user-friendly manner.
For alternative agent types, for example ReAct for multi-turn conversations:
workflow:
_type: react_agent
tool_names: [text2sql, execute_db_query]
llm_name: nim_llm
max_history: 10
Custom Prompts#
Customize the system prompt for domain-specific SQL generation:
text2sql:
initial_prompt: |
You are an expert in supply chain analytics using Databricks SQL.
Generate queries that follow these conventions:
- Use CTE (WITH clauses) for complex queries
- Always include meaningful column aliases
- Use QUALIFY for deduplication when appropriate
Streaming Responses#
Access streaming progress in your application:
from nat.core import Workflow
workflow = Workflow.from_config("text2sql_config.yml")
async for update in workflow.stream("How many customers do we have?"):
if update["type"] == "status":
print(f"Status: {update['message']}")
elif update["type"] == "result":
print(f"Result: {update}")
Production Considerations#
Security#
Environment Variables: Store credentials in environment variables, not in config files
Database Permissions: Use read-only database users for query execution
Query Validation: Review generated SQL before execution in production
Connection Pooling: Configure connection limits for high-traffic scenarios
Performance#
Milvus Indexing: Use appropriate index types for your vector dimensions
Result Limits: Set
max_rowsto prevent large result setsCaching: Consider caching frequent queries
Connection Reuse: Vanna maintains a singleton instance for efficiency
Monitoring#
Enable telemetry for observability:
general:
telemetry:
tracing:
phoenix:
_type: phoenix
endpoint: "http://localhost:6006"
logging:
console:
_type: console
level: INFO
Other features include:
Full integration with the NeMo Agent toolkit intermediate step tracking system
Better UI Display - Front-ends can now properly render intermediate steps
Parent Tracking - Each function call has a
parent_idto group related steps
Troubleshooting#
Connection Issues#
Milvus connection failed:
Error: Failed to connect to Milvus
Verify Milvus is running:
docker ps | grep milvusCheck host and port configuration
Verify TLS settings match your Milvus deployment
Database connection failed:
Error: Failed to connect to database
Verify credentials and connection parameters
Check network connectivity
For Databricks, ensure HTTP path format is correct
SQL Generation Issues#
Poor quality SQL:
Add more training examples similar to your use case (aim for 20+)
Provide comprehensive DDL with column descriptions
Add documentation about business logic
Increase
n_resultsto retrieve more examples
SQL execution errors:
Enable
execute_sql: falseto review queries before executionVerify catalog and schema names
No training data found:
Vanna needs examples to work. Set
train_on_startup: trueand add at least 3-5 training examples intraining_db_schema.pyOr use
auto_training: trueto automatically generate training data from your database
Known Limitations#
LLM Limitations:
The
llama-3.1-70b-instructmodel does not always strictly follow instructions to output in the expected JSON format, which can cause parsing issues. A parsing fallback mechanism has been implemented to handle these cases.To ensure optimal performance and consistent JSON output formatting, we recommend using reasoning models in the configuration. These models demonstrate better instruction-following capabilities and reliably produce output in the expected format.
Database Privileges:
This package provides text-to-SQL functionality without built-in guardrails. To prevent destructive operations, always configure the database connection with read-only privileges.
Additional Resources#
For more information:
Writing Custom Functions - Learn how to create your own functions
Workflow Configuration - Complete configuration reference
Contributing Guidelines - How to contribute to the NeMo Agent toolkit
Support - Get help and support