Code Generation Containers#

Containers specialized for evaluating code generation models and programming language capabilities.


BigCode Evaluation Harness Container#

NGC Catalog: bigcode-evaluation-harness

Container specialized for evaluating code generation models and programming language models.

Use Cases:

  • Code generation quality assessment

  • Programming problem solving

  • Code completion evaluation

  • Software engineering task assessment

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/bigcode-evaluation-harness:25.09

Default Parameters:

Parameter

Value

limit_samples

None

max_new_tokens

512

temperature

1e-07

top_p

0.9999999

parallelism

10

max_retries

5

request_timeout

30

do_sample

True

n_samples

1


LiveCodeBench Container#

NGC Catalog: livecodebench

LiveCodeBench provides holistic and contamination-free evaluation of coding capabilities of LLMs. It continuously collects new problems from contests across three competition platforms – LeetCode, AtCoder, and CodeForces.

Use Cases:

  • Holistic coding capability evaluation

  • Contamination-free assessment

  • Contest-style problem solving

  • Code generation and execution

  • Test output prediction

  • Self-repair capabilities

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/livecodebench:25.09

Default Parameters:

Parameter

Value

limit_samples

None

max_new_tokens

4096

temperature

0.0

top_p

1e-05

parallelism

10

max_retries

5

request_timeout

60

n_samples

10

num_process_evaluate

5

cache_batch_size

10

support_system_role

False

cot_code_execution

False

Supported Versions: v1-v6, 0724_0125, 0824_0225


SciCode Container#

NGC Catalog: scicode

SciCode is a challenging benchmark designed to evaluate the capabilities of language models in generating code for solving realistic scientific research problems with diverse coverage across 16 subdomains from six domains.

Use Cases:

  • Scientific research code generation

  • Multi-domain scientific programming

  • Research workflow automation

  • Scientific computation evaluation

  • Domain-specific coding tasks

Pull Command:

docker pull nvcr.io/nvidia/eval-factory/scicode:25.09

Default Parameters:

Parameter

Value

limit_samples

None

temperature

0

max_new_tokens

2048

top_p

1e-05

request_timeout

60

max_retries

2

with_background

False

include_dev

False

n_samples

1

eval_threads

None

Supported Domains: Physics, Math, Material Science, Biology, Chemistry (16 subdomains from five domains)