API Key Authentication for RAG and Retriever Evaluations

NeMo Evaluator supports API key authentication for the external services used in RAG (Retrieval Augmented Generation) and retriever evaluation workflows. This lets you use NVIDIA NIM services in your evaluations without exposing your credentials.

Overview

You can provide API keys for the following components of your RAG and retriever evaluation pipelines:

  • Query embedding models - For encoding user queries using NVIDIA models

  • Index embedding models - For encoding documents in your knowledge base using NVIDIA models

  • Reranking models - For improving retrieval relevance

  • Large Language Models - For answer generation

  • Judge models - For evaluation metrics

The evaluator stores API keys as Kubernetes secrets and never logs them or exposes them in evaluation outputs.
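That guarantee covers the evaluator's own logs and outputs. If your own client scripts print or log request payloads before submitting them, a small helper can keep keys out of those logs as well. The following is a minimal client-side sketch, not part of NeMo Evaluator; the function name redact_api_keys and the placeholder string are hypothetical.

```python
import json
from typing import Any

# Hypothetical placeholder shown in place of any real key.
REDACTED = "***REDACTED***"


def redact_api_keys(payload: Any) -> Any:
    """Return a copy of payload with every "api_key" value replaced.

    Dicts and lists are rebuilt recursively, so the original payload
    (which still holds the real keys) is left unchanged.
    """
    if isinstance(payload, dict):
        return {
            key: REDACTED if key == "api_key" else redact_api_keys(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact_api_keys(item) for item in payload]
    return payload  # strings, numbers, None: returned as-is


if __name__ == "__main__":
    endpoint = {
        "api_endpoint": {
            "url": "https://integrate.api.nvidia.com/v1/embeddings",
            "model_id": "nvidia/nv-embedqa-e5-v5",
            "api_key": "nvapi-example",  # hypothetical key value
        }
    }
    # Safe to log: the key is replaced with the placeholder.
    print(json.dumps(redact_api_keys(endpoint), indent=2))
```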


Secret Lifecycle Management

The evaluator automatically manages Kubernetes secrets for API keys based on how you provide resources:

Target and Configuration Resources

When you create targets or configurations with API keys:

  • Secrets are created automatically when the target or configuration is created

  • Secrets persist across multiple evaluation jobs

  • Secrets are deleted only when you delete the target or configuration

This approach suits reusable resources: you can run multiple evaluation jobs against the same target or configuration without recreating its secrets each time.

Inline Job Resources

When you create a job with inline target or configuration definitions (not referencing existing resources by ID):

  • Secrets are created automatically when the job is submitted

  • Secrets are automatically deleted when the job completes (whether successful or failed)

This approach provides automatic cleanup for one-time evaluations where you don’t need to persist the configuration.
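To make the two lifecycles concrete, the following toy model tracks which secrets exist as resources are created, jobs run, and resources are deleted. It illustrates the rules above under our own simplifying assumptions and is not the evaluator's implementation: the SecretStore class, the secret-<id> naming, and the fail flag are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SecretStore:
    """Toy stand-in for the Kubernetes secrets the evaluator manages."""

    secrets: set[str] = field(default_factory=set)

    def create_resource(self, resource_id: str) -> None:
        # A target or configuration created with an API key gets its secret now.
        self.secrets.add(f"secret-{resource_id}")

    def delete_resource(self, resource_id: str) -> None:
        # That secret goes away only when the target or configuration is deleted.
        self.secrets.discard(f"secret-{resource_id}")

    def run_job(self, job_id: str, inline: bool, fail: bool = False) -> None:
        if not inline:
            # A job that references an existing resource reuses its secret
            # and creates or deletes nothing.
            if fail:
                raise RuntimeError(f"job {job_id} failed")
            return
        secret = f"secret-{job_id}"
        self.secrets.add(secret)  # created when the inline job is submitted
        try:
            if fail:
                raise RuntimeError(f"job {job_id} failed")
        finally:
            # Deleted when the job completes, whether it succeeded or failed.
            self.secrets.discard(secret)
```

For example, after create_resource("my-target") you can call run_job(..., inline=False) any number of times and the secret for my-target remains, whereas each inline job leaves no secret behind once it finishes.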


Supported Authentication Methods

API Key Authentication

API key authentication is the most common method for authenticating with external services. To use it, set the api_key field in the api_endpoint object of each model configuration:

{
  "api_endpoint": {
    "url": "https://integrate.api.nvidia.com/v1/embeddings",
    "model_id": "nvidia/nv-embedqa-e5-v5",
    "api_key": "your-nvidia-api-key"
  }
}
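Rather than pasting a key into a JSON file, you can build the api_endpoint block in Python and read the key from an environment variable. This is a sketch only; the environment variable name NVIDIA_API_KEY and the helper embedding_endpoint are assumptions, not names that NeMo Evaluator requires.

```python
import os


def embedding_endpoint(
    model_id: str = "nvidia/nv-embedqa-e5-v5",
    url: str = "https://integrate.api.nvidia.com/v1/embeddings",
    env_var: str = "NVIDIA_API_KEY",  # assumed variable name
) -> dict:
    """Build an api_endpoint block whose key comes from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early rather than submitting an empty or placeholder key.
        raise RuntimeError(f"Set {env_var} before building the endpoint config")
    return {"api_endpoint": {"url": url, "model_id": model_id, "api_key": api_key}}
```

Called with no arguments, embedding_endpoint() returns the same structure as the JSON above. Avoid printing the result, because it contains the key in plain text.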

Supported Models and Services

Warning

Current Implementation: NeMo Evaluator currently supports NVIDIA embedding models only. Third-party embedding services are not yet supported.

API key authentication works with NVIDIA NIM services and other compatible endpoints, including:

  • NVIDIA Embedding Models - For embeddings (e.g., nvidia/nv-embedqa-e5-v5)

  • NVIDIA Reranking Models - For reranking (e.g., nvidia/nv-rerankqa-mistral-4b-v3)

  • LLM Services - For answer generation and judge models

  • OpenAI-compatible endpoints - Any service that implements the OpenAI-compatible API format
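At the HTTP level, OpenAI-compatible services typically expect the API key as a bearer token in the Authorization header. The sketch below prepares, but does not send, such a request to the embeddings endpoint used in this page's examples. It assumes the standard bearer-token convention and the input_type field ("query" or "passage") that NVIDIA retrieval embedding models such as nv-embedqa-e5-v5 accept. It does not show how NeMo Evaluator calls these services internally, and build_embedding_request is a hypothetical helper.

```python
import json
import urllib.request


def build_embedding_request(
    api_key: str,
    texts: list[str],
    model_id: str = "nvidia/nv-embedqa-e5-v5",
    url: str = "https://integrate.api.nvidia.com/v1/embeddings",
    input_type: str = "query",  # assumed: "query" for queries, "passage" for documents
) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style embeddings request."""
    body = {"model": model_id, "input": texts, "input_type": input_type}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the API key as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keeping construction separate from sending makes it easy to inspect the headers and body without a network call.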


Configuration Examples

RAG Pipeline with Authentication

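The following example shows a complete RAG evaluation that uses authenticated NVIDIA services. The first JSON object is the evaluation target, which defines the retrieval and generation pipeline: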

{
  "type": "rag",
  "rag": {
    "pipeline": {
      "retriever": {
        "pipeline": {
          "query_embedding_model": {
            "api_endpoint": {
              "url": "https://integrate.api.nvidia.com/v1/embeddings",
              "model_id": "nvidia/nv-embedqa-e5-v5",
              "api_key": "your-nvidia-api-key"
            }
          },
          "index_embedding_model": {
            "api_endpoint": {
              "url": "https://integrate.api.nvidia.com/v1/embeddings",
              "model_id": "nvidia/nv-embedqa-e5-v5",
              "api_key": "your-nvidia-api-key"
            }
          },
          "reranker_model": {
            "api_endpoint": {
              "url": "http://nemo-ranking-ms.nemo-retrieval.svc.cluster.local:8080/v1/ranking",
              "model_id": "nvidia/nv-rerankqa-mistral-4b-v3",
              "api_key": "your-nvidia-api-key"
            }
          },
          "top_k": 5
        }
      },
      "model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1/chat/completions",
          "model_id": "meta/llama-3.1-70b-instruct",
          "api_key": "your-nvidia-api-key"
        }
      }
    }
  }
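}

The following configuration defines the evaluation task, including the authenticated judge LLM and judge embedding models:

{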
  "type": "rag",
  "params": {
    "temperature": 0.1,
    "max_tokens": 512
  },
  "tasks": {
    "my-beir-task": {
      "type": "beir",
      "dataset": {
        "files_url": "file://nfcorpus/"
      },
      "params": {
        "judge_llm": {
          "api_endpoint": {
            "url": "https://integrate.api.nvidia.com/v1",
            "model_id": "meta/llama-3.1-8b-instruct",
            "api_key": "your-judge-api-key"
          }
        },
        "judge_embeddings": {
          "api_endpoint": {
            "url": "https://integrate.api.nvidia.com/v1/embeddings",
            "model_id": "nvidia/nv-embedqa-e5-v5",
            "api_key": "your-nvidia-api-key"
          }
        },
        "judge_timeout": 300,
        "judge_max_retries": 5,
        "judge_max_workers": 16
      },
      "metrics": {
        "recall_5": {"type": "recall_5"},
        "ndcg_cut_5": {"type": "ndcg_cut_5"},
        "recall_10": {"type": "recall_10"},
        "ndcg_cut_10": {"type": "ndcg_cut_10"},
        "faithfulness": {"type": "faithfulness"},
        "answer_relevancy": {"type": "answer_relevancy"}
      }
    }
  }
}
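The target and configuration above repeat the same key in several api_endpoint objects. One way to avoid copying it by hand is to fill every api_endpoint in place from a single value, as in this sketch. inject_api_key is a hypothetical helper, and it assumes that every object named api_endpoint should receive the same key.

```python
from typing import Any


def inject_api_key(node: Any, api_key: str) -> int:
    """Set api_key on every api_endpoint dict in place; return how many were set."""
    count = 0
    if isinstance(node, dict):
        endpoint = node.get("api_endpoint")
        if isinstance(endpoint, dict):
            endpoint["api_key"] = api_key
            count += 1
        # Recurse into all values to reach nested pipelines and task params.
        for value in node.values():
            count += inject_api_key(value, api_key)
    elif isinstance(node, list):
        for item in node:
            count += inject_api_key(item, api_key)
    return count
```

For example, call inject_api_key(target, os.environ["NVIDIA_API_KEY"]) on the target. If the judge uses a separate key, as your-judge-api-key suggests in the configuration, call the helper again on the judge_llm object afterward so that its key overrides the shared one.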

Retriever Pipeline with Authentication

For retriever-only evaluations using NVIDIA embedding models:

{
  "type": "retriever", 
  "retriever": {
    "pipeline": {
      "query_embedding_model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1/embeddings",
          "model_id": "nvidia/nv-embedqa-e5-v5",
          "api_key": "your-nvidia-api-key"
        }
      },
      "index_embedding_model": {
        "api_endpoint": {
          "url": "https://integrate.api.nvidia.com/v1/embeddings",
          "model_id": "nvidia/nv-embedqa-e5-v5",
          "api_key": "your-nvidia-api-key"
        }
      },
      "top_k": 10
    }
  }
}