Vision-Language Containers
Containers specialized for evaluating multimodal models that process both visual and textual information.
VLMEvalKit Container
NGC Catalog: vlmevalkit
Container packaging the VLMEvalKit toolkit for evaluating vision-language models.
Use Cases:
Multimodal model evaluation
Image-text understanding assessment
Visual reasoning evaluation
Cross-modal performance testing
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/vlmevalkit:25.10
Default Parameters:
| Parameter | Value |
|---|---|
Supported Benchmarks:
ocrbench - Optical character recognition and text understanding
slidevqa - Slide-based visual question answering (requires OPENAI_CLIENT_ID, OPENAI_CLIENT_SECRET)
chartqa - Chart and graph interpretation
ai2d_judge - AI2 Diagram understanding (requires OPENAI_CLIENT_ID, OPENAI_CLIENT_SECRET)
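Because slidevqa and ai2d_judge require OPENAI_CLIENT_ID and OPENAI_CLIENT_SECRET, it can help to confirm that both are set before starting a run. The following is a minimal sketch of such a check in POSIX shell. The function name check_judge_credentials is hypothetical and not part of the container; only the two variable names come from the benchmark list above.

```shell
# Sketch only: check_judge_credentials is a hypothetical helper, not shipped
# with the container. It verifies that the variables listed for slidevqa and
# ai2d_judge are set and non-empty in the current shell.
check_judge_credentials() {
    missing=""
    for name in OPENAI_CLIENT_ID OPENAI_CLIENT_SECRET; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        # Report the missing names on stderr and signal failure to the caller.
        echo "missing credentials:$missing" >&2
        return 1
    fi
    return 0
}
```

If the check succeeds, one way to forward the values into the container is to pass `-e OPENAI_CLIENT_ID -e OPENAI_CLIENT_SECRET` to `docker run`, which copies each variable from the host environment. How the evaluation entry point inside the container reads them is not specified here, so treat that step as an assumption to verify against the container documentation.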