Google Vertex AI is a managed machine learning platform that hosts Anthropic Claude, Gemini, and third-party models through Google Cloud. OpenShell can route inference.local traffic to Vertex AI using gateway-managed credential refresh, so sandbox agents do not handle GCP credentials directly.
Before creating a Vertex AI provider, ensure you have:
gcloud CLI with Application Default Credentials configured, for local development.The google-vertex-ai provider supports two credential sources.
Supply the JSON key file content as the GOOGLE_SERVICE_ACCOUNT_KEY credential. OpenShell persists that value only as gateway-side refresh bootstrap material until you update or delete it. The raw service-account JSON and private key are not sandbox runtime credentials and are not exposed to sandboxes. Runtime inference requests use short-lived access tokens minted by the gateway and stored under a separate credential key.
Then configure gateway-managed refresh so the gateway uses the private key as refresh bootstrap material and rotates access tokens:
For local development, configure ADC first, then pass --from-gcloud-adc:
--from-gcloud-adc reads GOOGLE_APPLICATION_CREDENTIALS first, then falls back to $CLOUDSDK_CONFIG/application_default_credentials.json when that environment variable is set, then to ~/.config/gcloud/application_default_credentials.json. It configures an OAuth2 refresh token flow on the gateway and immediately mints the first access token before the command returns. If the command succeeds, the provider is ready for inference right away. It only works with user credentials generated by gcloud auth application-default login. If your ADC file is a service account key, the CLI returns an error and directs you to use the service account key method above.
ADC-backed providers mint and rotate access tokens into GOOGLE_VERTEX_AI_TOKEN.
--from-gcloud-adc is only valid for google-vertex-ai providers.
Pass these as --config KEY=VALUE when creating the provider, or set them as environment variables and use --from-existing.
When VERTEX_AI_PROJECT_ID is set and no base URL override is present, the gateway maps VERTEX_AI_REGION to the Vertex host automatically:
us-central1 use https://<region>-aiplatform.googleapis.com.global uses https://aiplatform.googleapis.com.us and eu use https://aiplatform.<region>.rep.googleapis.com.For Anthropic models, OpenShell builds the publisher-model Vertex path automatically and injects anthropic_version into the request body. Vertex rawPredict does not receive anthropic-version as a header, and OpenShell strips anthropic-beta for Vertex Claude routes. For non-Anthropic models, OpenShell uses Vertex’s OpenAI-compatible Chat Completions route under .../endpoints/openapi/chat/completions.
Use GOOGLE_VERTEX_AI_BASE_URL or VERTEX_AI_BASE_URL only for non-Anthropic Vertex routes. OpenShell rejects Anthropic models when a base URL override is set because Anthropic routes require model-path shaping and anthropic_version body injection. Overrides must use https:// and an official Vertex AI hostname such as aiplatform.googleapis.com, aiplatform.us.rep.googleapis.com, aiplatform.eu.rep.googleapis.com, or <region>-aiplatform.googleapis.com.
Vertex AI hosts Anthropic Claude models (claude-3-5-sonnet, claude-3-opus, and others) through a native Messages API integration, and Gemini and other third-party models through Vertex’s OpenAI-compatible Chat Completions endpoint. OpenShell infers the routing path from the model name. For the full list of available models and regions, refer to the Google Cloud model garden documentation.
Model names that match the claude-* pattern route through the Anthropic Messages API on Vertex. All other model names route through Vertex Chat Completions. Set VERTEX_AI_PUBLISHER=anthropic to force Anthropic routing when the model name does not follow the standard pattern.
OpenShell exposes Anthropic Vertex routes for inference only. It does not advertise OpenAI-style model discovery for those routes, so use the Google Cloud docs or Model Garden to discover supported Anthropic model IDs.
Before configuring inference routing, enable provider endpoint injection so the Vertex AI network endpoints are automatically included in sandbox policies:
Then point inference.local at the provider:
Use --no-verify if the endpoint verification fails. This is common with the global region, where the validation probe may not match the actual rawPredict path:
Sandboxes on that gateway reach the model at https://inference.local. For full details on inference routing, refer to Inference Routing.
Agents inside sandboxes should reach Vertex AI through inference.local, not by connecting to Vertex AI directly. The gateway manages GCP credential refresh and request translation; the agent only needs to point its SDK at the local endpoint.
The complete setup from scratch:
Then inside the sandbox, launch the agent as shown below.
--bare skips the OAuth login flow and uses ANTHROPIC_API_KEY directly for authentication. The key value does not reach Vertex AI — inference.local strips it and injects the real GCP access token before forwarding.
Do not set CLAUDE_CODE_USE_VERTEX=1 inside the sandbox. That flag makes Claude Code connect directly to Vertex AI and attempt GCP credential discovery (ADC file, metadata service), which fails because the sandbox does not expose GCP credentials. Use inference.local instead.
After running an agent, the TUI (openshell term) may show policy proposals for denied endpoints. Common ones for Vertex AI sandboxes:
If one of these token env vars is already set in your shell, create the provider with --from-existing:
GOOGLE_VERTEX_AI_TOKEN or VERTEX_AI_TOKENGOOGLE_VERTEX_AI_SERVICE_ACCOUNT_TOKEN or VERTEX_AI_SERVICE_ACCOUNT_TOKENOpenShell also reads these config env vars during --from-existing:
VERTEX_AI_PROJECT_IDVERTEX_AI_REGIONGOOGLE_VERTEX_AI_BASE_URL or VERTEX_AI_BASE_URLVERTEX_AI_PUBLISHERThen create the provider:
This reads credentials and config from the environment variables listed in the configuration keys table above.
inference.local routing, refer to Inference Routing.