Evaluation Concepts#
NVIDIA NeMo Evaluator is the one-stop shop for evaluating your LLMs as part of the NeMo ecosystem. It enables real-time evaluation of your LLM application through APIs, guiding developers and researchers in refining and optimizing LLMs for better performance and real-world applicability. The NeMo Evaluator APIs can be automated within development pipelines, enabling faster iteration without the need for live data. This makes it cost-effective and well suited for pre-deployment checks and regression testing.
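As a minimal sketch of the regression-testing use case, a pipeline might gate a deployment on evaluation scores not dropping below a baseline. The function and metric names below are illustrative assumptions, not part of the NeMo Evaluator API.

```python
# Hypothetical CI regression gate: compare fresh evaluation scores
# against baseline thresholds with a small tolerance. Metric names
# ("accuracy", "f1") are examples only.
def passes_regression(scores: dict, baselines: dict, tolerance: float = 0.01) -> bool:
    """Return True if every baseline metric stays within `tolerance` of its baseline."""
    return all(
        scores.get(metric, 0.0) >= baseline - tolerance
        for metric, baseline in baselines.items()
    )

baselines = {"accuracy": 0.82, "f1": 0.78}
new_scores = {"accuracy": 0.84, "f1": 0.77}
print(passes_regression(new_scores, baselines))  # True: both metrics within tolerance
```

A pipeline would fail the build when this returns `False`, catching regressions before deployment.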
The development of Large Language Models (LLMs) has become pivotal in shaping intelligent applications across various domains. Enterprises today have a large number of LLMs to choose from, and need a rigorous and systematic evaluation framework to choose the LLM that best suits their use case.
NVIDIA NeMo Evaluator supports evaluation of LLMs through academic benchmarks, custom automated evaluations, and LLM-as-a-Judge. Beyond LLM evaluation, NeMo Evaluator also supports evaluation of Retriever and RAG pipelines. For more information, see Evaluation Types.
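To make the evaluation types concrete, the sketch below assembles a job payload that selects one of them. The field names, type strings, and payload shape are assumptions for illustration; consult the NeMo Evaluator API reference for the actual schema.

```python
import json

# Sketch of building an evaluation-job payload. All field names and
# evaluation-type identifiers here are assumed, not taken from the
# official NeMo Evaluator API reference.
SUPPORTED_TYPES = {"academic_benchmark", "custom_automated", "llm_as_a_judge"}

def build_eval_job(model_name: str, eval_type: str) -> dict:
    """Assemble a minimal evaluation-job request body."""
    if eval_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported evaluation type: {eval_type}")
    return {
        "target": {"type": "model", "model": model_name},
        "config": {"type": eval_type},
    }

payload = build_eval_job("my-llm", "llm_as_a_judge")
print(json.dumps(payload))
```

A pipeline would POST a payload like this to the Evaluator service and poll the returned job until results are available.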
NeMo Evaluator Use Cases#
The following table lists the use cases that NeMo Evaluator supports.
| Evaluation Focus | Use Cases | NeMo Evaluator Documentation |
|---|---|---|
| Models | | |
| Evaluations | | |
| Data | | |
NeMo Evaluator Interactions with Other Microservices#
The following diagram gives an overview of NeMo Evaluator’s interactions with other NeMo microservices.