Most benchmarks need a Hugging Face token to access gated datasets. Before running evaluations, store that token as a secret named hf_token:

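import os

# `client` and `workspace` are assumed to be defined earlier in this guide.
# The token is read from the HF_TOKEN environment variable; if the variable
# is unset, replace the placeholder string with your token before running.
client.secrets.create(
    workspace=workspace,
    name="hf_token",
    data=os.getenv("HF_TOKEN", "<your Hugging Face token>"),
)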
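
Note that the call above falls back to the literal placeholder string when HF_TOKEN is unset, so the secret could be created with an unusable value. As a minimal sketch of one way to guard against that, the helper below fails early instead. The name `read_hf_token` and the `PLACEHOLDER` constant are hypothetical, introduced only for illustration, and are not part of any client library.

```python
import os

# Hypothetical constant: the placeholder text used in the example above.
PLACEHOLDER = "<your Hugging Face token>"


def read_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before creating the hf_token secret."
        )
    return token
```

With this in place, the secret could be created with `data=read_hf_token()` so that a missing token surfaces as an error at creation time rather than later, when an evaluation tries to access a gated dataset.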