Google Vertex AI | NVIDIA OpenShell

Google Vertex AI is a managed machine learning platform that hosts Anthropic Claude, Gemini, and third-party models through Google Cloud. OpenShell can route inference.local traffic to Vertex AI using gateway-managed credential refresh, so sandbox agents do not handle GCP credentials directly.

Prerequisites

Before creating a Vertex AI provider, ensure you have:

A GCP project with the Vertex AI API enabled.
One of the following:
- A GCP service account with the Vertex AI User role and a downloaded JSON key file, for production use.
- The gcloud CLI with Application Default Credentials configured, for local development.

Authentication

The google-vertex-ai provider supports two credential sources.

Service Account Key

Supply the JSON key file content as the GOOGLE_SERVICE_ACCOUNT_KEY credential. OpenShell persists that value only as gateway-side refresh bootstrap material until you update or delete it. The raw service-account JSON and private key are not sandbox runtime credentials and are not exposed to sandboxes. Runtime inference requests use short-lived access tokens minted by the gateway and stored under a separate credential key.

$ openshell provider create \
>   --name vertex-prod \
>   --type google-vertex-ai \
>   --credential GOOGLE_SERVICE_ACCOUNT_KEY="$(cat /path/to/key.json)" \
>   --config VERTEX_AI_PROJECT_ID=my-gcp-project \
>   --config VERTEX_AI_REGION=us-central1

Then configure gateway-managed refresh so the gateway uses the private key as refresh bootstrap material and rotates access tokens:

$ openshell provider refresh configure vertex-prod \
>   --credential-key GOOGLE_VERTEX_AI_SERVICE_ACCOUNT_TOKEN \
>   --strategy google-service-account-jwt \
>   --material client_email="sa@my-gcp-project.iam.gserviceaccount.com" \
>   --material private_key="$(jq -r .private_key /path/to/key.json)" \
>   --secret-material-key private_key

gcloud Application Default Credentials

For local development, configure ADC first, then pass --from-gcloud-adc:

$ gcloud auth application-default login

$ openshell provider create \
>   --name vertex-local \
>   --type google-vertex-ai \
>   --from-gcloud-adc \
>   --config VERTEX_AI_PROJECT_ID=my-gcp-project \
>   --config VERTEX_AI_REGION=us-central1

--from-gcloud-adc reads GOOGLE_APPLICATION_CREDENTIALS first, then falls back to $CLOUDSDK_CONFIG/application_default_credentials.json when that environment variable is set, then to ~/.config/gcloud/application_default_credentials.json. It configures an OAuth2 refresh token flow on the gateway and immediately mints the first access token before the command returns. If the command succeeds, the provider is ready for inference right away. It only works with user credentials generated by gcloud auth application-default login. If your ADC file is a service account key, the CLI returns an error and directs you to use the service account key method above.

ADC-backed providers mint and rotate access tokens into GOOGLE_VERTEX_AI_TOKEN.

--from-gcloud-adc is valid for google-vertex-ai and google-cloud providers.

Configuration Keys

Pass these as --config KEY=VALUE when creating the provider, or set them as environment variables and use --from-existing.

Key	Required	Default	Description
`VERTEX_AI_PROJECT_ID`	Yes (unless `GOOGLE_VERTEX_AI_BASE_URL` or `VERTEX_AI_BASE_URL` is set)	—	GCP project ID.
`VERTEX_AI_REGION`	No	`us-central1`	Vertex location selector. Use a regional location such as `us-central1`, or `global`, `us`, or `eu` for the supported global and multi-region endpoints.
`GOOGLE_VERTEX_AI_BASE_URL`	No	—	Full base URL override for non-Anthropic routes. Must be an official Vertex AI HTTPS endpoint root.
`VERTEX_AI_BASE_URL`	No	—	Backward-compatible alias for `GOOGLE_VERTEX_AI_BASE_URL`.
`VERTEX_AI_PUBLISHER`	No	Inferred from model name	Set to `anthropic` to force Anthropic Messages API routing, or any other value for OpenAI-compatible routing.

When VERTEX_AI_PROJECT_ID is set and no base URL override is present, the gateway maps VERTEX_AI_REGION to the Vertex host automatically:

Regional locations such as us-central1 use https://<region>-aiplatform.googleapis.com.
global uses https://aiplatform.googleapis.com.
us and eu use https://aiplatform.<region>.rep.googleapis.com.

For Anthropic models, OpenShell builds the publisher-model Vertex path automatically and injects anthropic_version into the request body. Vertex rawPredict does not receive anthropic-version as a header, and OpenShell strips anthropic-beta for Vertex Claude routes. For non-Anthropic models, OpenShell uses Vertex’s OpenAI-compatible Chat Completions route under .../endpoints/openapi/chat/completions.

Use GOOGLE_VERTEX_AI_BASE_URL or VERTEX_AI_BASE_URL only for non-Anthropic Vertex routes. OpenShell rejects Anthropic models when a base URL override is set because Anthropic routes require model-path shaping and anthropic_version body injection. Overrides must use https:// and an official Vertex AI hostname such as aiplatform.googleapis.com, aiplatform.us.rep.googleapis.com, aiplatform.eu.rep.googleapis.com, or <region>-aiplatform.googleapis.com.

Supported Models

Vertex AI hosts Anthropic Claude models (claude-3-5-sonnet, claude-3-opus, and others) through a native Messages API integration, and Gemini and other third-party models through Vertex’s OpenAI-compatible Chat Completions endpoint. OpenShell infers the routing path from the model name. For the full list of available models and regions, refer to the Google Cloud model garden documentation.

Model names that match the claude-* pattern route through the Anthropic Messages API on Vertex. All other model names route through Vertex Chat Completions. Set VERTEX_AI_PUBLISHER=anthropic to force Anthropic routing when the model name does not follow the standard pattern.

OpenShell exposes Anthropic Vertex routes for inference only. It does not advertise OpenAI-style model discovery for those routes, so use the Google Cloud docs or Model Garden to discover supported Anthropic model IDs.

Configure Inference Routing

Before configuring inference routing, enable provider endpoint injection so the Vertex AI network endpoints are automatically included in sandbox policies:

$ openshell settings set --global --key providers_v2_enabled --value true --yes

Then point inference.local at the provider:

$ openshell inference set \
>   --provider vertex-prod \
>   --model claude-sonnet-4-6

Use --no-verify if the endpoint verification fails. This is common with the global region, where the validation probe may not match the actual rawPredict path:

$ openshell inference set \
>   --provider vertex-prod \
>   --model claude-sonnet-4-6 \
>   --no-verify

Sandboxes on that gateway reach the model at https://inference.local. For full details on inference routing, refer to Inference Routing.

Use from a Sandbox

Agents inside sandboxes should reach Vertex AI through inference.local, not by connecting to Vertex AI directly. The gateway manages GCP credential refresh and request translation; the agent only needs to point its SDK at the local endpoint.

The complete setup from scratch:

$ # 1. Enable provider endpoint injection
$ openshell settings set --global --key providers_v2_enabled --value true --yes
$ 
$ # 2. Create the provider
$ openshell provider create \
>   --name vertex-local \
>   --type google-vertex-ai \
>   --from-gcloud-adc \
>   --config VERTEX_AI_PROJECT_ID=my-gcp-project \
>   --config VERTEX_AI_REGION=us-central1
$ 
$ # 3. Configure inference routing
$ openshell inference set --provider vertex-local --model claude-sonnet-4-6 --no-verify
$ 
$ # 4. Create a sandbox with the provider attached
$ openshell sandbox create --name my-sandbox --provider vertex-local

Then inside the sandbox, launch the agent as shown below.

Claude Code

OpenCode

$ ANTHROPIC_BASE_URL="https://inference.local" ANTHROPIC_API_KEY=unused claude --bare

--bare skips the OAuth login flow and uses ANTHROPIC_API_KEY directly for authentication. The key value does not reach Vertex AI — inference.local strips it and injects the real GCP access token before forwarding.

Do not set CLAUDE_CODE_USE_VERTEX=1 inside the sandbox. That flag makes Claude Code connect directly to Vertex AI and attempt GCP credential discovery (ADC file, metadata service), which fails because the sandbox does not expose GCP credentials. Use inference.local instead.

Policy Proposals

After running an agent, the TUI (openshell term) may show policy proposals for denied endpoints. Common ones for Vertex AI sandboxes:

Endpoint	Action	Reason
`metadata.google.internal:80`	Reject	Resolves to `169.254.169.254` (GCE metadata service). Always blocked regardless of policy — the proxy blocks the resolved IP unconditionally to prevent credential exfiltration.
`downloads.claude.ai:443`	Approve if desired	Claude Code update checking and asset loading. Not required for inference.
`storage.googleapis.com:443`	Approve if desired	Google Cloud Storage. Used by some Claude Code features. Not required for inference.

From Existing Environment

If one of these token env vars is already set in your shell, create the provider with --from-existing:

GOOGLE_VERTEX_AI_TOKEN or VERTEX_AI_TOKEN
GOOGLE_VERTEX_AI_SERVICE_ACCOUNT_TOKEN or VERTEX_AI_SERVICE_ACCOUNT_TOKEN

OpenShell also reads these config env vars during --from-existing:

VERTEX_AI_PROJECT_ID
VERTEX_AI_REGION
GOOGLE_VERTEX_AI_BASE_URL or VERTEX_AI_BASE_URL
VERTEX_AI_PUBLISHER

Then create the provider:

$ openshell provider create \
>   --name vertex-env \
>   --type google-vertex-ai \
>   --from-existing

This reads credentials and config from the environment variables listed in the configuration keys table above.

Next Steps

To configure inference.local routing, refer to Inference Routing.
To manage provider credentials and refresh, refer to Providers.
To apply network policies to sandboxes using this provider, refer to Policies.