Vision-Language Containers#
Containers specialized for evaluating multimodal models that process both visual and textual information.
VLMEvalKit Container#
NGC Catalog: vlmevalkit
Container for the VLMEvalKit vision-language model evaluation toolkit.
Use Cases:
Multimodal model evaluation
Image-text understanding assessment
Visual reasoning evaluation
Cross-modal performance testing
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/vlmevalkit:25.09
Default Parameters:
Parameter | Value |
---|---|
Supported Benchmarks:
ocrbench - Optical character recognition and text understanding
slidevqa - Slide-based visual question answering (requires OPENAI_CLIENT_ID, OPENAI_CLIENT_SECRET)
chartqa - Chart and graph interpretation
ai2d_judge - AI2 Diagram understanding (requires OPENAI_CLIENT_ID, OPENAI_CLIENT_SECRET)
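The credential requirements in the benchmark list above can be checked before a run is launched. The following is a minimal sketch of such a preflight check, not part of VLMEvalKit or the container. The function names `requires_credentials` and `check_benchmark_env` are hypothetical helpers introduced here for illustration. Only the benchmark names and the two environment variable names come from the list.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helpers, not shipped with the container):
# verify that the credentials a benchmark needs are set in the environment.

# Succeeds (exit status 0) for benchmarks that the list marks as requiring
# OPENAI_CLIENT_ID and OPENAI_CLIENT_SECRET.
requires_credentials() {
  case "$1" in
    slidevqa|ai2d_judge) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the named benchmark can run with the current environment.
# Otherwise prints each missing variable to stderr and returns 1.
check_benchmark_env() {
  cbe_benchmark="$1"
  # Benchmarks without a credential requirement always pass.
  requires_credentials "$cbe_benchmark" || return 0

  cbe_missing=0
  if [ -z "${OPENAI_CLIENT_ID:-}" ]; then
    echo "missing OPENAI_CLIENT_ID (required by $cbe_benchmark)" >&2
    cbe_missing=1
  fi
  if [ -z "${OPENAI_CLIENT_SECRET:-}" ]; then
    echo "missing OPENAI_CLIENT_SECRET (required by $cbe_benchmark)" >&2
    cbe_missing=1
  fi
  return "$cbe_missing"
}
```

Once `check_benchmark_env slidevqa` succeeds, the variables can be forwarded from the host into the container with Docker's `-e NAME` option, as in `docker run --rm -e OPENAI_CLIENT_ID -e OPENAI_CLIENT_SECRET nvcr.io/nvidia/eval-factory/vlmevalkit:25.09`. The command to run inside the container is not specified in this section and is left out here.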