About Inference Routing
NVIDIA OpenShell handles inference traffic along two paths: requests to the inference.local gateway and direct calls to external inference endpoints.
The following table summarizes how OpenShell handles each kind of inference traffic.

Traffic                     Handling
inference.local             Routed by the privacy router to the configured backend; the configured model is applied and OpenShell supplies provider credentials
External inference hosts    Evaluated only by network_policies
How inference.local Works
When code inside a sandbox calls https://inference.local, the privacy router routes the request to the configured backend for that gateway. The configured model is applied to generation requests, and provider credentials are supplied by OpenShell rather than by code inside the sandbox.
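As a sketch of what a request from inside the sandbox might look like, the following builds an OpenAI-style chat request to inference.local. The completions path and payload shape here are assumptions for illustration; the key point is that no Authorization header is attached, because OpenShell supplies provider credentials outside the sandbox.

```python
import json
import urllib.request

# Hypothetical path for illustration; the real path depends on the
# provider configured for the inference.local gateway.
INFERENCE_URL = "https://inference.local/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat request for the inference.local gateway.

    No API key is attached: OpenShell injects provider credentials
    at the gateway, so sandboxed code never holds them.
    """
    payload = {
        # The gateway applies its configured model to generation
        # requests, so no model field is set here.
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # no Authorization header
        method="POST",
    )

req = build_request("Summarize the routing rules.")
```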
If code calls an external inference host directly, that traffic is evaluated only by network_policies; the configured model and OpenShell-supplied credentials do not apply.
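A network_policies fragment controlling direct external traffic might look like the following. The key names below are assumptions for illustration only, not OpenShell's actual schema:

```yaml
# Hypothetical shape for illustration; consult the network_policies
# reference for the real schema.
network_policies:
  - host: api.example-inference.com   # external inference host called directly
    action: deny                      # inference.local settings do not apply here
```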
Supported API Patterns
Supported request patterns depend on the provider configured for inference.local.
OpenAI-compatible
Anthropic-compatible
Requests to inference.local that do not match the configured provider’s supported patterns are denied.
Next Steps
Continue with one of the following: