Deployment/DevOps Customization Guide

Next Item Prediction

This customization guide provides an overview of the Helm charts used to set up and deploy the Next Item Prediction AI Workflow. These charts can be customized and integrated for your own infrastructure, software components, and environment.

This Helm chart will deploy an instance of MinIO, an S3-compatible object store. This AI Workflow requires S3-compatible storage; to simplify the creation of development environments, this chart provisions MinIO to provide those object storage targets. If you have your own S3 buckets you would like to use, you can disable this service and choose not to install this Helm chart.

MinIO Chart Installation

helm install minio-merlin ./deploy/charts/minio --set persistence.storageClass=<your-storage-class> --set bucketCount=2 --create-namespace --namespace $NAMESPACE

Chart Values Explained (./deploy/charts/minio/values.yaml)

You can update the storage class and size to match your own storage requirements. The bucketCount variable determines the number of S3 buckets created. This workflow uses two buckets: the first stores the prepped data as Parquet files in directories partitioned by date, and the second stores the trained models.

minio:
  persistence:
    storageClass: "local-path"
    size: "100Gi"
  bucketCount: 2
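
If you prefer to keep these overrides in a file instead of passing --set flags, you can supply your own values file at install time. The following is a minimal sketch; the file name and the longhorn storage class are examples only, and the key layout mirrors the values shown above:

# my-minio-values.yaml (example override file; keys mirror ./deploy/charts/minio/values.yaml)
minio:
  persistence:
    storageClass: "longhorn"   # substitute a storage class that exists in your cluster
    size: "250Gi"              # size the volume for your data and model footprint
  bucketCount: 2               # this workflow expects two buckets

# Install the chart with the override file
helm install minio-merlin ./deploy/charts/minio -f my-minio-values.yaml --create-namespace --namespace $NAMESPACE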

This Helm chart will deploy an instance of MLflow, a platform to manage the ML lifecycle. MLflow is used as a model repository in this case, but its functionality in the current workflow is limited. In future releases of this workflow, MLflow will become more integrated with the MLOps lifecycle.

MLflow Chart Installation

helm install mlflow ./deploy/charts/mlflow --set databasePersistence.storageClass=local-path --create-namespace --namespace $NAMESPACE

Chart Values Explained (./deploy/charts/mlflow/values.yaml)

For objectPersistence:

The credentials set when MinIO is deployed are s3-user; if you chose not to deploy MinIO, this should match the credentials associated with your own S3 storage. You can also update objectPersistence with the S3 bucket you want to use. In this workflow we assume that all models are stored in bucket-2, but you can change this to suit your needs.

Additionally, you can configure the databasePersistence section to match your own storage needs (storageClass and size).

# This configures how MLflow will be deployed for the workflow.
mlflow:
  replicaCount: 1
  # change this value when using your own s3 storage
  objectPersistence:
    credentials: "s3-user"
    bucket: bucket-2
  # Please update this section to match the storage you have available.
  databasePersistence:
    storageClass: "local-path"
    accessModes:
      - ReadWriteOnce
    size: "50Gi"
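
For example, if you are bringing your own S3 storage, a minimal sketch of overriding these values on the command line; the placeholder names are yours to substitute, and the value paths follow the installation command above (adjust them if your copy of the chart nests them under mlflow:):

# Point MLflow at your own S3 credentials and model bucket instead of the MinIO defaults
helm upgrade --install mlflow ./deploy/charts/mlflow \
  --set objectPersistence.credentials=<your-s3-credentials> \
  --set objectPersistence.bucket=<your-model-bucket> \
  --set databasePersistence.storageClass=<your-storage-class> \
  --create-namespace --namespace $NAMESPACE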

This Helm chart describes a job that prepares data. First, the job generates synthetic data for click interactions. Next, the data is converted to Parquet format and written to directories partitioned by day. The data is stored in S3 in bucket-1 (the storage functionality is implemented in the source Python scripts).

Note

The Python scripts that are viewable as source in ./src are mounted in /merlin-ai-workflow-t4r in the container.

Data Preparation Chart Installation

helm install next-item-wf-data-prep ./deploy/charts/next-item-wf-data-prep --create-namespace --namespace $NAMESPACE

Chart Values Explained (./deploy/charts/next-item-wf-data-prep/values.yaml)

This part of the workflow requires an NVIDIA GPU, which we specify in the resources field.

Environment variable values are also specified here. For example, the input data is downloaded to dataDir and the prepped output is written to dataDir/OUTPUT_DATA. Additionally, preprocessBucketName selects bucket-1 as the S3 bucket where the prepped data is stored.

resources:
  limits:
    nvidia.com/gpu: 1

# env variables
dataDir: "/merlin-ai-workflow-t4r"
outputFolder: "/merlin-ai-workflow-t4r/OUTPUT_DATA"
preprocessBucketName: "bucket-1"
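
Once the chart is installed, you can confirm that the data preparation job ran to completion and uploaded the prepped data; a minimal sketch with kubectl, where the job name is a placeholder to be read from the first command's output:

# Find the job created by the data preparation chart
kubectl get jobs -n $NAMESPACE

# Inspect its logs; the Python scripts handle the upload of the prepped data to bucket-1
kubectl logs job/<data-prep-job-name> -n $NAMESPACE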

This Helm chart defines a CronJob that trains the Transformers4Rec model. A CronJob creates jobs on a repeating schedule. For purposes of running the workflow, you can create a job from a CronJob to manually trigger the job outside of its scheduled time. The output of training the model is stored in S3 in bucket-2.
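
For example, once the chart below is installed, a one-off run can be triggered from the CronJob with kubectl; the CronJob name is a placeholder to be read from the first command's output:

# Find the CronJob created by the training chart
kubectl get cronjobs -n $NAMESPACE

# Create a one-off Job from it, outside of the regular schedule
kubectl create job manual-train-1 --from=cronjob/<training-cronjob-name> -n $NAMESPACE

# Follow the training logs
kubectl logs -f job/manual-train-1 -n $NAMESPACE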

Training Chart Installation

helm install next-item-wf-train ./deploy/charts/next-item-wf-train --create-namespace --namespace $NAMESPACE

Chart Values Explained (./deploy/charts/next-item-wf-train/values.yaml)

This part of the workflow requires an NVIDIA GPU, which we specify in the resources field.

Environment variable values are also specified here – these variables all determine input data and model output locations within the container.

The CronJob’s trainingSchedule can be specified. A schedule is defined with the unix-cron string format (* * * * *) to specify when the job should be run.

[Figure: unix-cron schedule fields]

So, for example, the current value 0 3 * * Sun means the job is set to run every Sunday at 3:00.

resources:
  limits:
    nvidia.com/gpu: 1
# use the following if we aren't using GPU device plugin
# resources: {}

# env variables
workflowOutput: "/workspace/data/output/nvt_workflow"
modelOutputDir: "/workspace/data/output/model"
modelOutputDirNoTrace: "/workspace/data/output/model/no_trace"
preprocFolder: "/workspace/data/cleaned"
workflowInput: "/workspace/data/output/nvt_workflow"
modelInput: "/workspace/data/output/model"
ensembleOutputFolder: "/workspace/data/"
trainingSchedule: "0 3 * * Sun"
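
For example, a minimal sketch of switching to a nightly run at install or upgrade time, assuming trainingSchedule is a top-level value as shown above:

# Run training every night at 02:30 instead of weekly on Sunday
helm upgrade --install next-item-wf-train ./deploy/charts/next-item-wf-train \
  --set trainingSchedule="30 2 * * *" \
  --create-namespace --namespace $NAMESPACE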

This Helm chart describes all the components needed for inference:

  • A Triton Inference server

  • A client to communicate with Triton

  • Integration with NVIDIA Cloud Native Add-On Pack components (such as Keycloak, Prometheus/Grafana, Cert/Trust Manager, HAProxy Ingress Controller, etc)

  • An Envoy proxy set up to authenticate and authorize requests to Triton via Keycloak

  • A sample Grafana dashboard
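
Once the chart below is installed, you can check that each of these components has come up; a minimal sketch:

# Triton, the client notebook, and the Envoy proxy pods should all reach the Running state
kubectl get pods -n $NAMESPACE

# The ingress resources show how the client and the Envoy-fronted Triton endpoint are exposed
kubectl get ingress -n $NAMESPACE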

Inference Chart Installation

helm install next-item-wf-infer ./deploy/charts/next-item-wf-infer --create-namespace --namespace $NAMESPACE

Chart Templates Explained

  • certificate.yaml - This template generates the certificate used by Envoy for encrypting traffic.

  • clientIngress.yaml - This template exposes the client to be accessible outside of the cluster.

  • clientService.yaml - This template sets up the service used by the client deployment.

  • envoyDeployment.yaml - This template sets up the Envoy deployment to connect to the Triton Inference Server service, as well as Keycloak and the realm created within Keycloak that is used for the authentication and authorization of requests.

  • envoyIngress.yaml - This template exposes the Triton service through Envoy to be accessible outside of the cluster.

  • envoyService.yaml - This template sets up the Envoy service that is used for routing traffic to Envoy.

  • grafanaDashboard.yaml - This template provides a sample Grafana dashboard showing Triton Inference Server metrics.

  • merlin-workflow-client-deployment.yaml - This template deploys the Jupyter notebook client environment that is used to submit requests to Triton for inference.

  • merlin-workflow-infer-deployment.yaml - This template is the main inference pipeline deployment with Triton Inference Server, configured for the models generated from the training chart.

  • merlin-workflow-infer-service.yaml - This template sets up all of the network ports required for proper communication with Triton.

  • serviceMonitor.yaml - This template sets up a service monitor for Prometheus to scrape metrics from Triton.
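
If you want to see how these templates render with your values before installing, Helm can render them locally; a minimal sketch:

# Render the inference chart's manifests locally to review what will be applied to the cluster
helm template next-item-wf-infer ./deploy/charts/next-item-wf-infer --namespace $NAMESPACE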

Chart Values Explained (./deploy/charts/next-item-wf-infer/values.yaml)

This part of the workflow is the main inference pipeline deployment, with Triton Inference Server and all of the supporting components and integrations for the pipeline, as well as a sample client to submit inference requests.

The Triton Inference Server deployment requires an NVIDIA GPU, which we specify in the resources field under infer. Triton configuration, such as the version, ports, and model repository are also specified under infer.

Various components and their required information, such as the client image/version, Envoy image/version, Keycloak realm, ingress and service configuration, etc. are found in the rest of the values.

replicaCount: 1
infer:
  name: "infer"
  registry: "nvcr.io/nvidia/merlin"
  image: "merlin-pytorch"
  version: "22.12"
  ports:
    - name: grpc
      containerPort: 8001
    - name: http
      containerPort: 8000
    - name: metrics
      containerPort: 8002
  command: ["/bin/bash", "-c"]
  args:
    - |
      tritonserver --model-repository=s3://$(S3_ENDPOINT)/bucket-2/
  # ensure Merlin gets at least one GPU
  # and make it Pending the driver install
  # assumes GPU device plugin
  resources:
    requests:
      nvidia.com/gpu: 1
    limits:
      nvidia.com/gpu: 1
inferClient:
  name: "client"
  registry: "nvcr.io/nvaie"
  image: "merlin-ai-workflow-t4r-client"
  version: "0.1.0"
# use 'Always' for development, 'IfNotPresent' for production
# TODO: change me when image is in NGC. Right now just using local image and not pulling.
imagePullPolicy: Always
imagePullSecrets:
  - name: ngc-secret
nameOverride: ""
fullnameOverride: ""
workflow:
  imageurl: "nvcr.io"
  org: "nvcr.io/nvaie"
certificateSecretName: "cert-secret"
envoy:
  envoyImage: "envoy-http"
  envoyTag: "latest"
  rawEnvoyImage: ""
keycloak:
  keycloakrealm: "merlin-workflows"
service:
  clientNotebook:
    type: ClusterIP
    nodeport: 32223
  envoy:
    type: ClusterIP
    nodeport: 32224
envoyIngress:
  # to use a default bare bones ingress controller.
  useIngress: true
  class: nvidia-ingress
clientIngress:
  # to use a default bare bones ingress controller.
  useIngress: true
  class: nvidia-ingress
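
Once the inference deployment is running, you can check that Triton is healthy and has loaded the models from bucket-2; a minimal sketch using port-forwarding and Triton's standard HTTP endpoints, where the service name is a placeholder to be read from the first command's output:

# Find the inference service created by the chart
kubectl get svc -n $NAMESPACE

# Forward Triton's HTTP port (8000 per the values above) to the local machine
kubectl port-forward svc/<triton-inference-service> 8000:8000 -n $NAMESPACE &

# Readiness and model-index endpoints exposed by Triton's HTTP/REST API
curl -s http://localhost:8000/v2/health/ready
curl -s -X POST http://localhost:8000/v2/repository/index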

© Copyright 2022-2023, NVIDIA. Last updated on May 23, 2023.