bfcl
This page contains all evaluation tasks for the bfcl harness.
| Task | Description |
|---|---|
| bfclv2 | BFCL v2 with Single-turn, Live and Non-Live, AST and Exec evaluation. Not using native function calling. |
| bfclv2_ast | BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Uses native function calling. |
| bfclv2_ast_prompting | BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Not using native function calling. |
| bfclv3 | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST and Exec evaluation. Not using native function calling. |
| bfclv3_ast | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Uses native function calling. |
| bfclv3_ast_prompting | BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Not using native function calling. |
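The six tasks differ mainly in two default values listed in their sections on this page: `config.params.task` and `config.params.extra.native_calling`. As a quick reference, the Python sketch below collects those defaults and filters on them. `TASK_DEFAULTS` and `tasks_where` are illustrative names, not part of the harness.

```python
# Defaults copied from the task sections on this page.
# TASK_DEFAULTS and tasks_where are hypothetical helpers, not harness APIs.
TASK_DEFAULTS = {
    "bfclv2": {"task": "single_turn", "native_calling": False},
    "bfclv2_ast": {"task": "ast", "native_calling": True},
    "bfclv2_ast_prompting": {"task": "ast", "native_calling": False},
    "bfclv3": {"task": "all", "native_calling": False},
    "bfclv3_ast": {"task": "multi_turn,ast", "native_calling": True},
    "bfclv3_ast_prompting": {"task": "multi_turn,ast", "native_calling": False},
}


def tasks_where(native_calling):
    """Return the task names whose default native_calling equals the argument."""
    return sorted(
        name
        for name, defaults in TASK_DEFAULTS.items()
        if defaults["native_calling"] is native_calling
    )


print(tasks_where(True))
```

By default, only the two `_ast` tasks without the `_prompting` suffix enable native function calling.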
bfclv2
BFCL v2 with Single-turn, Live and Non-Live, AST and Exec evaluation. Not using native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv2
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: single_turn
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2
target:
  api_endpoint: {}
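The command template above expands to a `bfcl generate` call followed by a `bfcl evaluate` call. As a rough illustration of the first call, the Python sketch below assembles the same flags from this task's defaults. `build_generate_command` is a hypothetical helper, and the model ID, endpoint URL, and output directory in the usage lines are placeholders, not values from this page.

```python
import shlex


def build_generate_command(model_id, url, output_dir, task="single_turn",
                           native_calling=False, parallelism=10,
                           limit_samples=None):
    """Assemble the `bfcl generate` arguments in the order the template uses."""
    args = [
        "bfcl", "generate",
        "--model", model_id,
        "--test-category", task,
        "--model-mapping", "oai",
        "--result-dir", output_dir,
        "--model-args", f"base_url={url},native_calling={native_calling}",
    ]
    # The template adds --limit only when limit_samples is set.
    if limit_samples is not None:
        args += ["--limit", str(limit_samples)]
    args += ["--num-threads", str(parallelism)]
    return args


# Placeholder endpoint details, for illustration only.
command = build_generate_command(
    model_id="my-model",
    url="http://localhost:8000/v1",
    output_dir="/tmp/bfcl-results",
)
print(shlex.join(command))
```

Passing `limit_samples=5` would insert `--limit 5` before `--num-threads`, matching the conditional in the template.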
bfclv2_ast
BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Uses native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv2_ast
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: ast
    extra:
      native_calling: true
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2_ast
target:
  api_endpoint: {}
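The command template guards the `core-evals-process-custom-dataset` step with a condition on `custom_dataset.path` and `custom_dataset.format`: both must be non-null for preprocessing to run. The Python sketch below restates that guard. `uses_custom_dataset` is a hypothetical helper, and the path and format strings in the second example are placeholders rather than documented values.

```python
def uses_custom_dataset(custom_dataset):
    """Mirror the template guard: both `path` and `format` must be non-null."""
    return (
        custom_dataset.get("path") is not None
        and custom_dataset.get("format") is not None
    )


# Defaults from this page: every field is null, so preprocessing is skipped.
defaults = {"path": None, "format": None, "data_template_path": None}
print(uses_custom_dataset(defaults))

# Placeholder values for illustration only; real paths and format names will differ.
custom = {
    "path": "/data/my_dataset.jsonl",
    "format": "example-format",
    "data_template_path": None,
}
print(uses_custom_dataset(custom))
```

Setting only one of the two fields leaves the guard false, so the run falls back to the built-in dataset. `data_template_path` is optional: the template appends `--data_template_path` only when it is set.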
bfclv2_ast_prompting
BFCL v2 with Single-turn, Live and Non-Live, AST evaluation only. Not using native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv2_ast_prompting
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: ast
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv2_ast_prompting
target:
  api_endpoint: {}
bfclv3
BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST and Exec evaluation. Not using native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv3
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: all
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3
target:
  api_endpoint: {}
bfclv3_ast
BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Uses native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv3_ast
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: multi_turn,ast
    extra:
      native_calling: true
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3_ast
target:
  api_endpoint: {}
bfclv3_ast_prompting
BFCL v3 with Single-turn and Multi-turn, Live and Non-Live, AST evaluation. Not using native function calling.
Harness: bfcl
Container: nvcr.io/nvidia/eval-factory/bfcl:26.01
Container Digest: sha256:5016e1f2b9984f5d348ac3806974d7b5d6ff6f550605f3220a3f08318e0c60c9
Container Arch: multiarch
Task Type: bfclv3_ast_prompting
{%- if config.params.extra.custom_dataset.path is not none and config.params.extra.custom_dataset.format is not none -%} echo "Processing custom dataset..." && export BFCL_DATA_DIR=$(core-evals-process-custom-dataset \
--dataset_format {{config.params.extra.custom_dataset.format}} \
--dataset_path {{config.params.extra.custom_dataset.path}} \
--test_category {{config.params.task}} \
--processing_output_dir {{config.output_dir ~ "/custom_dataset_processing"}} \
{% if config.params.extra.custom_dataset.data_template_path %}--data_template_path {{config.params.extra.custom_dataset.data_template_path}}{% endif %}) && \
echo "Using custom dataset at ${BFCL_DATA_DIR}" && \
{% endif -%}
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl generate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}} {% if config.params.limit_samples is not none %} --limit {{config.params.limit_samples}}{% endif %} --num-threads {{config.params.parallelism}} && \
{% if target.api_endpoint.api_key_name is not none %}OPENAI_API_KEY=${{target.api_endpoint.api_key_name}}{% endif %} bfcl evaluate --model {{target.api_endpoint.model_id}} --test-category {{config.params.task}} --model-mapping oai --result-dir {{config.output_dir}} --score-dir {{config.output_dir}} --model-args base_url={{target.api_endpoint.url}},native_calling={{config.params.extra.native_calling}}
framework_name: bfcl
pkg_name: bfcl
config:
  params:
    parallelism: 10
    task: multi_turn,ast
    extra:
      native_calling: false
      custom_dataset:
        path: null
        format: null
        data_template_path: null
  supported_endpoint_types:
  - chat
  - vlm
  type: bfclv3_ast_prompting
target:
  api_endpoint: {}