Specialized Tools Containers
Containers for specialized evaluation tasks including agentic AI capabilities and advanced reasoning assessments.
Agentic Evaluation Container
NGC Catalog: agentic_eval
Container for evaluating agentic AI models on tool usage and planning tasks.
Use Cases:
Tool usage evaluation
Planning tasks assessment
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/agentic_eval:25.10
Supported Benchmarks:
agentic_eval_answer_accuracy
agentic_eval_goal_accuracy_with_reference
agentic_eval_goal_accuracy_without_reference
agentic_eval_topic_adherence
agentic_eval_tool_call_accuracy
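Because several of these benchmark names differ only by a suffix, it can help to check a name before passing it to an evaluation run. The following is a minimal sketch, not part of the container tooling: the helper `is_supported_benchmark` and the variable `SUPPORTED_BENCHMARKS` are hypothetical names introduced here, and the list is copied from the benchmarks above.

```shell
# Hypothetical helper (not shipped with the container): checks a benchmark
# name against the agentic_eval list given on this page.
SUPPORTED_BENCHMARKS="agentic_eval_answer_accuracy
agentic_eval_goal_accuracy_with_reference
agentic_eval_goal_accuracy_without_reference
agentic_eval_topic_adherence
agentic_eval_tool_call_accuracy"

is_supported_benchmark() {
  # -F: treat the name as a fixed string, -x: match the whole line,
  # -q: no output, only the exit status (0 if found, 1 otherwise).
  printf '%s\n' "$SUPPORTED_BENCHMARKS" | grep -Fxq -- "$1"
}
```

For example, `is_supported_benchmark agentic_eval_topic_adherence` succeeds, while the truncated name `agentic_eval_goal_accuracy` fails because only whole-line matches are accepted.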
BFCL Container
NGC Catalog: bfcl
Container for the Berkeley Function-Calling Leaderboard (BFCL) evaluation framework.
Use Cases:
Tool usage evaluation
Multi-turn interactions
Native support for function/tool calling
Function calling evaluation
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/bfcl:25.10
Default Parameters:
| Parameter | Value |
|---|---|
ToolTalk Container
NGC Catalog: tooltalk
Container for evaluating AI models’ ability to use tools and APIs effectively.
Use Cases:
Tool usage evaluation
API interaction assessment
Function calling evaluation
External tool integration testing
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/tooltalk:25.10
Default Parameters:
| Parameter | Value |
|---|---|
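The three pull commands on this page share the same registry path and tag, so they can be generated from one place. The sketch below assumes only the image names and the `25.10` tag shown above. The function names `image_ref` and `pull_all` and the `DRY_RUN` switch are hypothetical conveniences, not part of any NVIDIA tooling.

```shell
# Registry path and tag as they appear in the pull commands on this page.
REGISTRY="nvcr.io/nvidia/eval-factory"
TAG="25.10"

# Print the full image reference for one container name.
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY" "$1" "$TAG"
}

# Pull every container listed on this page.
# With DRY_RUN=1 the commands are printed instead of executed.
pull_all() {
  for name in agentic_eval bfcl tooltalk; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker pull $(image_ref "$name")"
    else
      docker pull "$(image_ref "$name")"
    fi
  done
}
```

Running `DRY_RUN=1 pull_all` prints the three `docker pull` commands without contacting the registry, which is a convenient way to review them before a real pull. Changing `TAG` in one place keeps all three images on the same release.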