Configure RBAC

Inject Istio

Label the namespace to enable Istio injection.
```
kubectl label namespace <namespace> istio-injection=enabled --overwrite
```
Replace <namespace> with your target namespace.

Delete the existing pods so that they are recreated with Istio sidecar containers.

kubectl delete pods --all -n <namespace>
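After the pods restart, it can be useful to confirm that each one received the sidecar. The following is a minimal sketch, assuming the sidecar appears as a regular container named `istio-proxy` (some Istio configurations run it as an init container instead, in which case this check does not apply). `missing_sidecar` is a hypothetical helper name, not part of kubectl or istioctl.

```shell
# Hypothetical helper: print the names of pods that have no istio-proxy container.
# Expects one pod per line on stdin: "<pod-name> <container> [<container> ...]"
missing_sidecar() {
  awk '{
    found = 0
    for (i = 2; i <= NF; i++) if ($i == "istio-proxy") found = 1
    if (!found) print $1
  }'
}

# Example against a live cluster (replace <namespace>):
# kubectl get pods -n <namespace> \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
#   | missing_sidecar
```

An empty result suggests that every pod in the namespace has the sidecar container.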

Deploy Manifests

The following sample manifest deploys a gateway and an ingress virtual service.

- Update the target namespace for the virtual service resource.
- The sample manifest applies to NVIDIA NIM for LLMs. For other NVIDIA microservices, update the match and route for the microservice endpoints. For information about the endpoints, refer to the documentation for that microservice.

---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: rag-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http2
        protocol: HTTP
      hosts:
        - "*"

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: sample-vs
  namespace: <namespace>
spec:
  hosts:
    - "*"
  gateways:
    - istio-system/rag-gateway
  http:
    - match:
        - uri:
            prefix: /admin
        - uri:
            prefix: /resources
        - uri:
            prefix: /welcome
        - uri:
            prefix: /realms
      route:
        - destination:
            host: keycloak.default.svc.cluster.local
            port:
              number: 8080
    - match:
        - uri:
            prefix: /v1/completions
        - uri:
            prefix: /v1/chat/completions
      route:
        - destination:
            host: inferencing
            port:
              number: 8080

Apply the manifest.

kubectl apply -f istio-sample-manifest.yaml

Determine the Istio ingress gateway node port.

kubectl get svc -n istio-system | grep ingress

Example Output

istio-ingressgateway   LoadBalancer   10.102.8.149     10.28.234.101   15021:32658/TCP,80:30611/TCP,443:31874/TCP,31400:30160/TCP,15443:32430/TCP   22h
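In the PORT(S) column, each entry has the form `<service-port>:<node-port>/<protocol>`, so `80:30611/TCP` in the example output means that node port 30611 maps to port 80. The following sketch extracts that value from the column text; `node_port_for` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: read a PORT(S) column value on stdin and print the
# node port that maps to the service port given as $1.
node_port_for() {
  tr ',' '\n' | awk -F'[:/]' -v p="$1" '$1 == p { print $2 }'
}

# Example with the PORT(S) value from the output above:
# echo '15021:32658/TCP,80:30611/TCP,443:31874/TCP' | node_port_for 80

# Alternative that queries the service directly. This assumes the service is
# named istio-ingressgateway, as in the example output:
# kubectl get svc istio-ingressgateway -n istio-system \
#   -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'
```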

List the worker IP addresses.

for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do printf '%s  ' "$node"; kubectl describe node "$node" | grep -i 'internalIP:' | awk '{print $2}'; done

Example Output

nim-test-cluster-03-worker-nbhk9-56b4b888dd-8lpqd  10.120.199.16
nim-test-cluster-03-worker-nbhk9-56b4b888dd-hnrxr  10.120.199.23

The following manifest creates request authentication resources.

- Update the target namespace.
- Set the issuer in the manifest to one of the preceding worker IP addresses and the preceding Istio ingress gateway node port that maps to port 80.

---
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: nim-request-authentication
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: inferencing
  jwtRules:
  - issuer: "http://10.176.21.249:30669/realms/nvidia-nim"
    jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
    forwardOriginalToken: true
    fromHeaders:
      - name: Authorization
        prefix: "Bearer"
  - issuer: "http://10.176.21.249/realms/nvidia-nim"
    jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
    forwardOriginalToken: true
    fromHeaders:
      - name: Authorization
        prefix: "Bearer"
---
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: nim-request-authentication-gw
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
  - issuer: "http://10.176.21.249:30669/realms/nvidia-nim"
    jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
    forwardOriginalToken: true
    fromHeaders:
      - name: Authorization
        prefix: "Bearer"
  - issuer: "http://10.176.21.249/realms/nvidia-nim"
    jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
    forwardOriginalToken: true
    fromHeaders:
      - name: Authorization
        prefix: "Bearer"
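One way to set the issuer without editing the file by hand is a small `sed` filter. This is a sketch under stated assumptions: `set_issuer` is a hypothetical helper, it rewrites only `issuer` entries that already carry an explicit port, and it leaves the port-less issuer and the `jwksUri` entries unchanged. Review the result before applying it.

```shell
# Hypothetical helper: rewrite "issuer" URLs that include an explicit port so
# they point at the given node IP ($1) and node port ($2).
# Reads a manifest on stdin and writes the updated manifest to stdout.
set_issuer() {
  node_ip=$1
  node_port=$2
  sed -E "s#(issuer: \"http://)[^/:\"]+:[0-9]+/#\1${node_ip}:${node_port}/#"
}

# Example, using values from the earlier example outputs:
# set_issuer 10.120.199.16 30611 < requestAuthentication.yaml > requestAuthentication.updated.yaml
```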

Apply the manifest.

kubectl apply -f requestAuthentication.yaml

The following manifest creates an authorization policy resource.

- Update the target namespace.
- Update the rules that apply to the target microservices.

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: nim-auth-policy
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: inferencing
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/completions*"]
    when:
    - key: request.auth.claims[realm_access][roles]
      values: ["completions"]
  - from:
    - source:
        requestPrincipals: ["*"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/chat/completions*"]
    when:
    - key: request.auth.claims[realm_access][roles]
      values: ["chat"]

Apply the manifest.

kubectl apply -f authorizationPolicy.yaml

Create a token for Keycloak authentication. Update the node IP address and ingress gateway node port, and make sure the realm in the URL matches the realm in the issuer of the request authentication manifest.

TOKEN=$(curl -s -X POST -d "client_id=nvidia-nim" -d "username=nim" -d "password=nvidia123" -d "grant_type=password" "http://10.217.19.114:30611/realms/nvidia-nim/protocol/openid-connect/token" | jq -r .access_token)
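Before sending requests, it can help to inspect the claims in the token, because the `iss` claim has to match an issuer in the request authentication manifest and the `realm_access.roles` claim has to contain the role that the authorization policy expects. The following is a minimal sketch that decodes the payload segment of a JWT without verifying its signature. `jwt_payload` is a hypothetical helper name, and the sketch assumes a `base64` command that accepts `-d`.

```shell
# Hypothetical helper: print the decoded JSON payload of the JWT passed as $1.
# This only decodes the token; it does not verify the signature.
jwt_payload() {
  # The payload is the second dot-separated segment, encoded as base64url.
  jwt_seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so that base64 -d accepts the input.
  case $(( ${#jwt_seg} % 4 )) in
    2) jwt_seg="${jwt_seg}==" ;;
    3) jwt_seg="${jwt_seg}=" ;;
  esac
  printf '%s' "$jwt_seg" | base64 -d
}

# Example: show the issuer and realm roles of the token created above.
# jwt_payload "$TOKEN" | jq '{iss, roles: .realm_access.roles}'
```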

Verify that you can access the microservice through the Istio gateway with the Keycloak token.

curl -v -X POST http://10.217.19.114:30611/v1/completions -H "Authorization: Bearer $TOKEN" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "model": "llama-2-13b-chat","prompt": "What is Kubernetes?","max_tokens": 16,"temperature": 1, "n": 1, "stream": false, "stop": "string", "frequency_penalty": 0.0 }'

Update the node IP address and ingress gateway port. Update the model name if it is not llama-2-13b-chat.
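If the verification request fails, the HTTP status code usually indicates which resource rejected it: Istio request authentication returns 401 for a token it cannot validate, and an authorization policy returns 403 when no rule allows the request. The following sketch maps the common codes to a likely cause. `explain_status` is a hypothetical helper, and the messages are troubleshooting hints rather than an exhaustive diagnosis.

```shell
# Hypothetical helper: print a likely cause for the HTTP status code in $1.
explain_status() {
  case "$1" in
    200) echo "allowed: token accepted and role matched" ;;
    401) echo "rejected by request authentication: token invalid, expired, or issuer mismatch" ;;
    403) echo "denied by authorization policy: token missing or required role absent" ;;
    *)   echo "unexpected status $1: check the gateway and virtual service routing" ;;
  esac
}

# Example: capture only the status code of the verification request.
# Replace <node-ip> and <node-port> with your values.
# code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "http://<node-ip>:<node-port>/v1/completions" \
#   -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \
#   -d '{"model": "llama-2-13b-chat", "prompt": "What is Kubernetes?", "max_tokens": 16}')
# explain_status "$code"
```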

Generate additional traffic so that it can be visualized on the Kiali dashboard in the next step.

for i in $(seq 1 100); do curl -X POST http://10.217.19.114:30611/v1/chat/completions -H 'accept: application/json' -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{"model": "llama-2-13b-chat","messages": [{"role": "system","content": "You are a helpful assistant."},{"role": "user", "content": "Hello!"}]}'  -s -o /dev/null; done
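The loop above discards every response, so failed requests go unnoticed. A possible variant, sketched here, prints one status code per request and tallies them. `summarize_codes` is a hypothetical helper name.

```shell
# Hypothetical helper: read one HTTP status code per line on stdin and
# print a count for each distinct code, for example "200: 100".
summarize_codes() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example: the same traffic loop, emitting status codes instead of discarding them.
# Replace <node-ip> and <node-port> with your values.
# for i in $(seq 1 100); do
#   curl -s -o /dev/null -w '%{http_code}\n' -X POST "http://<node-ip>:<node-port>/v1/chat/completions" \
#     -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \
#     -d '{"model": "llama-2-13b-chat","messages": [{"role": "user", "content": "Hello!"}]}'
# done | summarize_codes
```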

Access the Kiali dashboard, specifying your client system IP address.
```
istioctl dashboard kiali --address <system-ip>
```

Open http://<system-ip>:20001 in a browser.

Conclusion

This architecture deploys NVIDIA NeMo microservices behind an Istio service mesh with OIDC authentication from Keycloak. The ingress gateway routes external requests, request authentication validates the JWTs, and authorization policies limit each endpoint to callers with the required role.