Most benchmarks require a Hugging Face token to access gated datasets. Before running evaluations, create a secret named hf_token that holds the token:
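import os

# `client` and `workspace` must already be defined before this call.
client.secrets.create(
    workspace=workspace,
    name="hf_token",
    # If HF_TOKEN is unset, the placeholder string is stored; replace it with a real token.
    data=os.getenv("HF_TOKEN", "<your Hugging Face token>"),
)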
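Because `os.getenv` falls back to the placeholder string when `HF_TOKEN` is unset, a missing variable may only show up later, when a benchmark fails to download a gated dataset. A minimal sketch of a stricter lookup follows; the helper name `read_hf_token` is hypothetical and not part of any SDK:

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token from the environment, or raise if it is missing.

    `read_hf_token` is an illustrative helper, not a library function.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the secret."
        )
    return token
```

With this helper, `data=read_hf_token()` could replace the `os.getenv(...)` call, so the secret is only created when a real value is present.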