Specialized Tools Containers

Containers for specialized evaluation tasks, including agentic AI capabilities and advanced reasoning assessments.


Agentic Evaluation Container

NGC Catalog: agentic_eval

Container for evaluating agentic AI models on tool usage and planning tasks.

Use Cases:

  • Tool usage evaluation

  • Planning task assessment

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/agentic_eval:25.08.1
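
The pull can also be scripted, for example in a CI job that pre-fetches the image. The sketch below is illustrative only. It assumes Docker is installed and on PATH. Only the registry path and the 25.08.1 tag come from the command above. The helpers image_ref, pull_command and pull are hypothetical names chosen for this example.

```python
import shlex
import subprocess

# Registry path, image name and tag taken from the documented pull command.
REGISTRY = "nvcr.io/nvidia/eval-factory"
IMAGE = "agentic_eval"
TAG = "25.08.1"


def image_ref(registry: str = REGISTRY, image: str = IMAGE, tag: str = TAG) -> str:
    """Join registry path, image name and tag into a full image reference (hypothetical helper)."""
    return f"{registry}/{image}:{tag}"


def pull_command(ref: str) -> list[str]:
    """Build the argument list for `docker pull` (hypothetical helper)."""
    return ["docker", "pull", ref]


def pull(ref: str) -> None:
    """Run `docker pull`; raises CalledProcessError if Docker exits non-zero."""
    subprocess.run(pull_command(ref), check=True)


if __name__ == "__main__":
    ref = image_ref()
    # Prints the same command shown above, without running it.
    print(shlex.join(pull_command(ref)))
    # Uncomment to actually pull (requires Docker and access to nvcr.io):
    # pull(ref)
```

Keeping the tag in a single constant makes it easy to bump when a newer container release is published.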

Supported Benchmarks:

  • agentic_eval_answer_accuracy

  • agentic_eval_goal_accuracy_with_reference

  • agentic_eval_goal_accuracy_without_reference

  • agentic_eval_topic_adherence

  • agentic_eval_tool_call_accuracy
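
When benchmarks are selected programmatically, a misspelled name may go unnoticed until the job runs. The following is a minimal sketch that assumes you drive evaluations from your own Python launcher script. It checks requested names against the list above before anything is started. The validate_benchmarks helper is hypothetical and is not part of the container.

```python
# Benchmark names as listed for the agentic_eval container.
SUPPORTED_BENCHMARKS = frozenset({
    "agentic_eval_answer_accuracy",
    "agentic_eval_goal_accuracy_with_reference",
    "agentic_eval_goal_accuracy_without_reference",
    "agentic_eval_topic_adherence",
    "agentic_eval_tool_call_accuracy",
})


def validate_benchmarks(requested: list[str]) -> list[str]:
    """Return the requested names in order, or raise ValueError naming the unknown ones.

    Hypothetical helper for a user-side launcher script.
    """
    unknown = [name for name in requested if name not in SUPPORTED_BENCHMARKS]
    if unknown:
        raise ValueError(
            f"unsupported benchmark(s): {', '.join(unknown)}; "
            f"expected one of: {', '.join(sorted(SUPPORTED_BENCHMARKS))}"
        )
    return list(requested)


if __name__ == "__main__":
    print(validate_benchmarks(["agentic_eval_tool_call_accuracy"]))
```

Failing early with the full list of valid names is usually cheaper than discovering the typo after the container has started.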