Specialized Tools Containers#
Containers for specialized evaluation tasks including agentic AI capabilities and advanced reasoning assessments.
Agentic Evaluation Container#
NGC Catalog: agentic_eval
Container for evaluating agentic AI models on tool usage and planning tasks.
Use Cases:
Tool usage evaluation
Planning tasks assessment
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/agentic_eval:25.09
Supported Benchmarks:
agentic_eval_answer_accuracy
agentic_eval_goal_accuracy_with_reference
agentic_eval_goal_accuracy_without_reference
agentic_eval_topic_adherence
agentic_eval_tool_call_accuracy
BFCL Container#
NGC Catalog: bfcl
Container for Berkeley Function-Calling Leaderboard evaluation framework.
Use Cases:
Tool usage evaluation
Multi-turn interactions
Native support for function/tool calling
Function calling evaluation
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/bfcl:25.09
Default Parameters:
Parameter |
Value |
---|---|
|
|
|
|
|
|
|
|
ToolTalk Container#
NGC Catalog: tooltalk
Container for evaluating AI models’ ability to use tools and APIs effectively.
Use Cases:
Tool usage evaluation
API interaction assessment
Function calling evaluation
External tool integration testing
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/tooltalk:25.09
Default Parameters:
Parameter |
Value |
---|---|
|
|