Specialized Tools Containers

Containers for specialized evaluation tasks, including agentic AI capabilities and advanced reasoning assessments.

Agentic Evaluation Container

Container for evaluating agentic AI models on tool usage and planning tasks.

Use Cases:

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/agentic_eval:25.10

Supported Benchmarks:

NGC Catalog: bfcl

Container for the Berkeley Function-Calling Leaderboard (BFCL) evaluation framework.

Use Cases:

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/bfcl:25.10

Default Parameters:

Parameter	Value
`limit_samples`	`None`
`parallelism`	`10`
`native_calling`	`False`
`custom_dataset`	`{'path': None, 'format': None, 'data_template_path': None}`
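
The defaults in the table above can be treated as a baseline that a run selectively overrides. The following is a minimal sketch of that idea in plain Python, not the container's own configuration API: the helper `with_overrides` and the file name `data.jsonl` are hypothetical, and this page does not show how the container actually receives parameter overrides.

```python
from copy import deepcopy

# Default values copied from the table above.
BFCL_DEFAULTS = {
    "limit_samples": None,
    "parallelism": 10,
    "native_calling": False,
    "custom_dataset": {"path": None, "format": None, "data_template_path": None},
}


def with_overrides(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied.

    Nested dicts (such as `custom_dataset`) are merged key by key, and
    unknown parameter names are rejected so that typos surface early.
    This helper is hypothetical and for illustration only.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown parameter: {key}")
        if isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Example: cap the run at 50 samples and point at a hypothetical dataset file.
params = with_overrides(
    BFCL_DEFAULTS,
    {"limit_samples": 50, "custom_dataset": {"path": "data.jsonl"}},
)
```

Rejecting unknown keys is a design choice of this sketch: a misspelled name such as `paralellism` raises an error instead of being silently ignored.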

NGC Catalog: tooltalk

Container for evaluating AI models’ ability to use tools and APIs effectively.

Use Cases:

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/tooltalk:25.10

Default Parameters:

Parameter	Value
`limit_samples`	`None`
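
The three pull commands on this page share the same registry path and tag, so they can be generated rather than typed by hand. The sketch below only builds the command strings; it assumes the `25.10` tag shown above, and the helper names `image_ref` and `pull_command` are hypothetical.

```python
# Registry path, tag, and container names as they appear in the pull commands on this page.
REGISTRY = "nvcr.io/nvidia/eval-factory"
TAG = "25.10"
CONTAINERS = ["agentic_eval", "bfcl", "tooltalk"]


def image_ref(name, tag=TAG):
    """Build the full image reference for one container (hypothetical helper)."""
    return f"{REGISTRY}/{name}:{tag}"


def pull_command(name, tag=TAG):
    """Return the docker pull command as an argument list (hypothetical helper)."""
    return ["docker", "pull", image_ref(name, tag)]


# Print one pull command per container listed on this page.
for name in CONTAINERS:
    print(" ".join(pull_command(name)))
```

Returning the command as an argument list keeps it ready to pass to `subprocess.run` if the pulls should be executed from Python instead of printed.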