Code Generation Containers#
Containers specialized for evaluating code generation models and programming language capabilities.
BigCode Evaluation Harness Container#
NGC Catalog: bigcode-evaluation-harness
Runs the BigCode Evaluation Harness for benchmarking code generation and programming-language models.
Use Cases:
- Code generation quality assessment
- Programming problem solving
- Code completion evaluation
- Software engineering task assessment
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/bigcode-evaluation-harness:25.09
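Beyond pulling the image, a typical invocation mounts a results directory and exposes the host GPUs. The sketch below uses only standard Docker flags; the interactive `bash` entrypoint and the `/workspace/results` mount point are assumptions, so consult the container's NGC page for its actual evaluation entrypoint and arguments.

```shell
# Minimal sketch: start the BigCode evaluation harness container interactively.
# --gpus all requires the NVIDIA Container Toolkit on the host.
# The /workspace/results mount point is an assumption, not documented here.
docker run --rm -it \
  --gpus all \
  -v "$(pwd)/results:/workspace/results" \
  nvcr.io/nvidia/eval-factory/bigcode-evaluation-harness:25.09 \
  bash
```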
Default Parameters:
Parameter | Value
---|---
LiveCodeBench Container#
NGC Catalog: livecodebench
LiveCodeBench provides holistic and contamination-free evaluation of coding capabilities of LLMs. It continuously collects new problems from contests across three competition platforms – LeetCode, AtCoder, and CodeForces.
Use Cases:
- Holistic coding capability evaluation
- Contamination-free assessment
- Contest-style problem solving
- Code generation and execution
- Test output prediction
- Self-repair capabilities
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/livecodebench:25.09
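A LiveCodeBench run typically needs GPU access and a place to write results. The sketch below uses standard Docker flags only; passing `HF_TOKEN` through is an assumption (useful when the evaluation downloads gated Hugging Face models or datasets), and the mount point and `bash` entrypoint are likewise assumptions rather than documented behavior.

```shell
# Minimal sketch: run LiveCodeBench with GPU access and a mounted output dir.
# HF_TOKEN pass-through is an assumption for gated model/dataset downloads.
docker run --rm -it \
  --gpus all \
  -e HF_TOKEN \
  -v "$(pwd)/lcb-results:/workspace/results" \
  nvcr.io/nvidia/eval-factory/livecodebench:25.09 \
  bash
```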
Default Parameters:
Parameter | Value
---|---
Supported Versions: v1-v6, 0724_0125, 0824_0225
SciCode Container#
NGC Catalog: scicode
SciCode is a challenging benchmark designed to evaluate the capabilities of language models in generating code for solving realistic scientific research problems with diverse coverage across 16 subdomains from six domains.
Use Cases:
- Scientific research code generation
- Multi-domain scientific programming
- Research workflow automation
- Scientific computation evaluation
- Domain-specific coding tasks
Pull Command:
docker pull nvcr.io/nvidia/eval-factory/scicode:25.09
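Because SciCode evaluation executes generated scientific code, it can help to give the container extra shared memory via Docker's standard `--shm-size` flag. Everything else in this sketch (the mount point, the `bash` entrypoint) is an assumption; see the container's NGC page for the real invocation.

```shell
# Minimal sketch: run the SciCode container with extra shared memory,
# since evaluation executes generated scientific code.
# The mount point and bash entrypoint are assumptions; see the NGC page.
docker run --rm -it \
  --gpus all \
  --shm-size=8g \
  -v "$(pwd)/scicode-results:/workspace/results" \
  nvcr.io/nvidia/eval-factory/scicode:25.09 \
  bash
```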
Default Parameters:
Parameter | Value
---|---
Supported Domains: Physics, Math, Material Science, Biology, Chemistry (16 subdomains)