Configure RBAC

Inject Istio

  1. Label the namespace to enable Istio injection.

    kubectl label namespace <namespace> istio-injection=enabled --overwrite
    

    Replace <namespace> with your target namespace.

  2. Delete the existing pods to recreate them with Istio sidecar containers.

    kubectl delete pod $(kubectl get pods -n <namespace> --no-headers | awk '{print $1}') -n <namespace>
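
To confirm that the recreated pods picked up the sidecar, one option is to list the containers of each pod and flag any pod that has none named istio-proxy. The sketch below assumes the sidecar runs as a regular container named istio-proxy (Istio's native sidecar mode places it among the init containers instead). The helper name pods_missing_sidecar is hypothetical.

```shell
# Hypothetical helper: reads lines of "<pod-name> <container> <container> ..."
# on stdin and prints the name of every pod that has no istio-proxy container.
pods_missing_sidecar() {
  awk '{
    found = 0
    for (i = 2; i <= NF; i++) if ($i == "istio-proxy") found = 1
    if (!found) print $1
  }'
}

# Usage against a live cluster (replace <namespace>):
# kubectl get pods -n <namespace> \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
#   | pods_missing_sidecar
```

An empty result suggests that every pod in the namespace has the sidecar container.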
    

Deploy Manifests

  1. Create the following sample manifest, which defines a gateway and an ingress virtual service, and save it as istio-sample-manifest.yaml.

    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: rag-gateway
      namespace: istio-system
    spec:
      selector:
        istio: ingressgateway
      servers:
        - port:
            number: 80
            name: http2
            protocol: HTTP
          hosts:
            - "*"
    
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: sample-vs
      namespace: <namespace>
    spec:
      hosts:
        - "*"
      gateways:
        - istio-system/rag-gateway
      http:
        - match:
            - uri:
                prefix: /admin
            - uri:
                prefix: /resources
            - uri:
                prefix: /welcome
            - uri:
                prefix: /realms
          route:
            - destination:
                host: keycloak.default.svc.cluster.local
                port:
                  number: 8080
        - match:
            - uri:
                prefix: /v1/completions
            - uri:
                prefix: /v1/chat/completions
          route:
            - destination:
                host: inferencing
                port:
                  number: 8080
    
  2. Apply the manifest.

    kubectl apply -f istio-sample-manifest.yaml
    
  3. Determine the Istio ingress gateway node port.

    kubectl get svc -n istio-system | grep ingress
    

    Example Output

    istio-ingressgateway   LoadBalancer   10.102.8.149     10.28.234.101   15021:32658/TCP,80:30611/TCP,443:31874/TCP,31400:30160/TCP,15443:32430/TCP   22h
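
In the example output, port 80 maps to node port 30611. As a sketch, a small helper can pull that value out of the PORT(S) column. The name node_port_for is hypothetical, and the input string is copied from the example output above.

```shell
# Hypothetical helper: given the PORT(S) column of `kubectl get svc` and a
# service port, print the node port that the service port maps to.
node_port_for() {
  ports=$1
  service_port=$2
  # Split "80:30611/TCP,443:31874/TCP" into one mapping per line, then split
  # each mapping on ":" and "/" and match on the service port.
  printf '%s\n' "$ports" | tr ',' '\n' |
    awk -F'[:/]' -v p="$service_port" '$1 == p { print $2 }'
}

node_port_for '15021:32658/TCP,80:30611/TCP,443:31874/TCP,31400:30160/TCP,15443:32430/TCP' 80
# prints 30611

# Alternative on a live cluster, assuming the service port is named http2
# as in a default Istio installation:
# kubectl get svc istio-ingressgateway -n istio-system \
#   -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'
```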
    
  4. List the worker IP addresses.

    for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do printf '%s  ' "$node"; kubectl describe node "$node" | grep -i 'internalIP:' | awk '{print $2}'; done
    

    Example Output

    nim-test-cluster-03-worker-nbhk9-56b4b888dd-8lpqd  10.120.199.16
    nim-test-cluster-03-worker-nbhk9-56b4b888dd-hnrxr  10.120.199.23
    
  5. The following manifest creates request authentication resources.

    • Update the target namespace.

    • Set the issuer in the manifest to use one of the worker IP addresses listed in the previous step and the Istio ingress gateway node port that is mapped to port 80.

    ---
    apiVersion: security.istio.io/v1beta1
    kind: RequestAuthentication
    metadata:
      name: nim-request-authentication
      namespace: <namespace>
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: inferencing
      jwtRules:
      - issuer: "http://10.176.21.249:30669/realms/nvidia-nim"
        jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
        forwardOriginalToken: true
        fromHeaders:
          - name: Authorization
            prefix: "Bearer"
      - issuer: "http://10.176.21.249/realms/nvidia-nim"
        jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
        forwardOriginalToken: true
        fromHeaders:
          - name: Authorization
            prefix: "Bearer"
    ---
    apiVersion: security.istio.io/v1beta1
    kind: RequestAuthentication
    metadata:
      name: nim-request-authentication-gw
      namespace: istio-system
    spec:
      selector:
        matchLabels:
          istio: ingressgateway
      jwtRules:
      - issuer: "http://10.176.21.249:30669/realms/nvidia-nim"
        jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
        forwardOriginalToken: true
        fromHeaders:
          - name: Authorization
            prefix: "Bearer"
      - issuer: "http://10.176.21.249/realms/nvidia-nim"
        jwksUri: "http://keycloak.default.svc.cluster.local:8080/realms/nvidia-nim/protocol/openid-connect/certs"
        forwardOriginalToken: true
        fromHeaders:
          - name: Authorization
            prefix: "Bearer"
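
The first issuer in each rule follows the pattern http://<node-ip>:<node-port>/realms/<realm>. Here is a minimal sketch for composing that value, using a worker IP address and the node port from the example outputs in the earlier steps. The helper issuer_url is hypothetical and is not part of Istio or Keycloak.

```shell
# Hypothetical helper: compose the issuer URL from a node IP address, the node
# port mapped to port 80, and the Keycloak realm name.
issuer_url() {
  printf 'http://%s:%s/realms/%s\n' "$1" "$2" "$3"
}

issuer_url 10.120.199.16 30611 nvidia-nim
# prints http://10.120.199.16:30611/realms/nvidia-nim
```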
    
  6. Apply the manifest.

    kubectl apply -f requestAuthentication.yaml
    
  7. The following manifest creates an authorization policy resource.

    • Update the target namespace.

    • Update the rules that apply to the target microservices.

    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: nim-auth-policy
      namespace: <namespace>
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: inferencing
      rules:
      - from:
        - source:
            requestPrincipals: ["*"]
        to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/completions*"]
        when:
        - key: request.auth.claims[realm_access][roles]
          values: ["completions"]
      - from:
        - source:
            requestPrincipals: ["*"]
        to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/chat/completions*"]
        when:
        - key: request.auth.claims[realm_access][roles]
          values: ["chat"]
    
  8. Apply the manifest.

    kubectl apply -f authorizationPolicy.yaml
    
  9. Request an access token from Keycloak. Update the node IP address, ingress gateway node port, and realm name so that the URL matches the issuer configured in the request authentication manifest.

    TOKEN=$(curl -s -X POST -d "client_id=nvidia-nim" -d "username=nim" -d "password=nvidia123" -d "grant_type=password" "http://10.217.19.114:30611/realms/nvidia-nim-llm/protocol/openid-connect/token" | jq -r .access_token)
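
If requests are later rejected, it can help to inspect the claims in the token, in particular iss (compared against the issuer in the request authentication manifest) and realm_access.roles (checked by the authorization policy). The sketch below decodes the payload segment of a JWT. It assumes a base64 command that accepts -d, as in GNU coreutils, and the helper name jwt_payload is hypothetical.

```shell
# Hypothetical helper: print the decoded JSON payload of a JWT.
jwt_payload() {
  # Take the second dot-separated segment and convert base64url to base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Usage with the token obtained above:
# jwt_payload "$TOKEN" | jq '{iss, roles: .realm_access.roles}'
```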
    
  10. Verify that the microservice is accessible through the Istio gateway with the Keycloak token.

    curl -v -X POST http://10.217.19.114:30611/v1/completions -H "Authorization: Bearer $TOKEN" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "model": "llama-2-13b-chat","prompt": "What is Kubernetes?","max_tokens": 16,"temperature": 1, "n": 1, "stream": false, "stop": "string", "frequency_penalty": 0.0 }'
    

    Update the node IP address and ingress gateway port. Update the model name if it is not llama-2-13b-chat.
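
When testing the policies, the HTTP status code usually indicates which layer handled the request: Istio returns 401 when request authentication rejects a token, and 403 when an authorization policy denies the request, which includes requests sent without any token. A small sketch for interpreting the result follows; explain_status is a hypothetical helper.

```shell
# Hypothetical helper: map an HTTP status code to the likely outcome.
explain_status() {
  case $1 in
    200) echo "allowed" ;;
    401) echo "token rejected by request authentication" ;;
    403) echo "denied by the authorization policy" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage (update the node IP address and node port as in the previous command):
# status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
#   http://10.217.19.114:30611/v1/completions \
#   -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \
#   -d '{"model": "llama-2-13b-chat", "prompt": "What is Kubernetes?", "max_tokens": 16}')
# explain_status "$status"
```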

  11. Generate additional traffic so that it can be visualized on the Kiali dashboard in the next step.

    for i in $(seq 1 100); do curl -X POST http://10.217.19.114:30611/v1/chat/completions -H 'accept: application/json' -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{"model": "llama-2-13b-chat","messages": [{"role": "system","content": "You are a helpful assistant."},{"role": "user", "content": "Hello!"}]}'  -s -o /dev/null; done
    
  12. Open the Kiali dashboard, specifying your client system IP address.

    istioctl dashboard kiali --address <system-ip>
    

    Open http://<system-ip>:20001 in a browser.

Conclusion

This architecture deploys NVIDIA NeMo microservices in a secure and scalable manner. The Istio service mesh handles ingress routing and traffic visibility, while OIDC authentication through Keycloak, combined with Istio authorization policies, restricts each inference endpoint to users who hold the required role.