bfcl

This page contains all evaluation tasks for the bfcl harness.

| Task | Description |
|---|---|
| bfclv2 | BFCL v2 with Single-turn, Live and Non-Live, AST and Exec evaluation. Does not use native function calling. |
| bfclv2_ast | BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Uses native function calling. |
| bfclv2_ast_prompting | BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Does not use native function calling. |
| bfclv3 | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST and Exec evaluation. Does not use native function calling. |
| bfclv3_ast | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Uses native function calling. |
| bfclv3_ast_prompting | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Does not use native function calling. |
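
As a quick orientation, the tasks listed above can be summarized by two defaults that differ between them: the test category passed to bfcl and whether native function calling is enabled. The sketch below copies those values from the per-task defaults on this page; the dictionary and the function name are illustrative and not part of the harness.

```python
# Values copied from the per-task defaults on this page.
# Each entry: task type -> (config.params.task, config.params.extra.native_calling)
BFCL_TASK_DEFAULTS = {
    "bfclv2": ("single_turn", False),
    "bfclv2_ast": ("ast", True),
    "bfclv2_ast_prompting": ("ast", False),
    "bfclv3": ("all", False),
    "bfclv3_ast": ("multi_turn,ast", True),
    "bfclv3_ast_prompting": ("multi_turn,ast", False),
}


def tasks_using_native_calling(defaults=BFCL_TASK_DEFAULTS):
    """Return the task types whose defaults enable native function calling."""
    return sorted(name for name, (_, native) in defaults.items() if native)


print(tasks_using_native_calling())  # ['bfclv2_ast', 'bfclv3_ast']
```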

bfclv2

BFCL v2 with Single-turn, Live and Non-Live, AST and Exec evaluation. Does not use native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv2

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: single_turn
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2
target:
  api_endpoint: {}
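
To make the command template easier to read, here is a minimal Python sketch of how its `bfcl generate` line might look once the placeholders are filled in. The flags and their order are taken from the template above; the function name, the example model ID, URL, and output directory are hypothetical, and the exact whitespace and boolean formatting of the real rendering are assumptions.

```python
def render_generate_command(model_id, url, output_dir, task="single_turn",
                            native_calling=False, parallelism=10,
                            limit_samples=None, api_key_name=None):
    """Assemble the `bfcl generate` line the way the template's conditionals do."""
    parts = []
    # The API key prefix is emitted only when an API key variable name is configured.
    if api_key_name is not None:
        parts.append(f"OPENAI_API_KEY=${api_key_name}")
    parts += [
        "bfcl", "generate",
        "--model", model_id,
        "--test-category", task,
        "--model-mapping", "oai",
        "--result-dir", output_dir,
        "--model-args", f"base_url={url},native_calling={native_calling}",
    ]
    # --limit is emitted only when limit_samples is set.
    if limit_samples is not None:
        parts += ["--limit", str(limit_samples)]
    parts += ["--num-threads", str(parallelism)]
    return " ".join(parts)


# Hypothetical endpoint values, bfclv2 defaults, and a 5-sample limit.
print(render_generate_command("my-model", "http://localhost:8000/v1",
                              "/results", limit_samples=5))
```

With `api_key_name` left unset, the printed command starts directly with `bfcl generate`; setting `limit_samples` to a small value is a convenient way to smoke-test an endpoint before a full run.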

bfclv2_ast

BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Uses native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv2_ast

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: ast
    extra:
      native_calling: true
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2_ast
target:
  api_endpoint: {}
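
The template above runs the `core-evals-process-custom-dataset` step only when both `custom_dataset.path` and `custom_dataset.format` are set; with the defaults (both `null`) the step is skipped. The following sketch mirrors that guard in Python. The function name, the example path, and the format value are placeholders chosen for illustration, not values defined by the harness.

```python
def uses_custom_dataset(custom_dataset):
    """Mirror the template guard: True only if both path and format are set."""
    return (
        custom_dataset.get("path") is not None
        and custom_dataset.get("format") is not None
    )


# Defaults from this page: nothing is set, so preprocessing is skipped.
defaults = {"path": None, "format": None, "data_template_path": None}
print(uses_custom_dataset(defaults))  # False

# Placeholder overrides: both keys set, so preprocessing would run.
overridden = dict(defaults, path="/data/my_dataset.jsonl", format="my_format")
print(uses_custom_dataset(overridden))  # True

# Setting only one of the two keys is not enough.
only_path = dict(defaults, path="/data/my_dataset.jsonl")
print(uses_custom_dataset(only_path))  # False
```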

bfclv2_ast_prompting

BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Does not use native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv2_ast_prompting

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: ast
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2_ast_prompting
target:
  api_endpoint: {}

bfclv3

BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST and Exec evaluation. Does not use native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv3

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: all
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3
target:
  api_endpoint: {}

bfclv3_ast

BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Uses native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv3_ast

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: multi_turn,ast
    extra:
      native_calling: true
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3_ast
target:
  api_endpoint: {}

bfclv3_ast_prompting

BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Does not use native function calling.

Harness: bfcl

Container: nvcr.io/nvidia/eval-factory/bfcl:26.01

Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9

Container Arch: multiarch

Task Type: bfclv3_ast_prompting

{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
  --dataset_format {{config.params.extra.custom_dataset.format}} \
  --dataset_path {{config.params.extra.custom_dataset.path}} \
  --test_category {{config.params.task}} \
  --processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
  {% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}  {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads  {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: multi_turn,ast
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3_ast_prompting
target:
  api_endpoint: {}
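
The defaults for bfclv3_ast and bfclv3_ast_prompting are nearly identical. A small sketch such as the one below can confirm which keys differ between two task configurations. The helper names are hypothetical, and the dictionaries reproduce only a subset of the defaults shown on this page.

```python
def diff_defaults(a, b, prefix=""):
    """Return the dotted keys whose values differ between two nested dicts."""
    changed = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested sections such as config.params.extra.
            changed += diff_defaults(left, right, prefix=path + ".")
        elif left != right:
            changed.append(path)
    return changed


def bfclv3_ast_like(native_calling, task_type):
    """Build a subset of the defaults shown on this page for one task type."""
    return {
        "config": {
            "params": {
                "parallelism": 10,
                "task": "multi_turn,ast",
                "extra": {"native_calling": native_calling},
            },
            "type": task_type,
        }
    }


native = bfclv3_ast_like(True, "bfclv3_ast")
prompting = bfclv3_ast_like(False, "bfclv3_ast_prompting")
print(diff_defaults(native, prompting))
# ['config.params.extra.native_calling', 'config.type']
```

Only the `native_calling` flag and the task type differ, which matches the task descriptions: the two variants evaluate the same test categories and differ in whether function calls go through the endpoint's native tool-calling interface or through prompting.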