Developer Guide

Next Item Prediction

This Developer Guide is intended to help Data Scientists and ML Engineers understand the feature engineering and modeling process. The steps are executed on a single machine rather than in a Cloud Native environment.

With this guide, you will understand:

  • Which features are used in the model

  • How feature engineering is done with NVTabular

  • How to build the XLNet-based transformer model with Transformers4Rec

  • How schemas are defined, and the relationship between an NVTabular Workflow and Triton Inference Server request schema.

This guide is best run within one of the NVIDIA-provided Docker containers, which come with all NVIDIA drivers, Deep Learning frameworks, Triton Inference Server, and Merlin libraries pre-installed. In particular, the container includes:

  • Transformers4Rec and an NVIDIA-optimized version of PyTorch

  • Triton Inference Server, for serving

  • The latest stable Merlin libraries as of 22.12

  • cuDF, Dask, and other dependencies of the Merlin ecosystem

We will use the merlin-pytorch:22.12 container as a base, which is stored in NVIDIA’s container registry. You can create and log into your NVIDIA NGC account by following this guide.

Install Docker And Docker-compose

If you do not already have them installed, install Docker (version 20.10 or later) and Docker-Compose (version 1.29 or later) on your system. You can follow the guide from https://www.docker.com/

Once Docker has been installed, install the NVIDIA Container Toolkit if required, following the instructions here.

Note

This guide generally assumes that development will be occurring on a Linux-based system with an NVIDIA GPU installed. The NVIDIA Cloud Native Stack Developer version can be leveraged to set up all the required components, or the GPU-Optimized VMI available on most CSP Marketplaces can also be used.

Download The Workflow Source Code

Within the Next Item Prediction Workflow Collection on the Enterprise Catalog, we have included the source code for the workflow that will be referenced in this guide. Download the source code onto your system, then extract it, prior to proceeding.

Developing Locally Using MLflow

This repo contains a docker-compose configuration for launching a minimal MLflow server. To do so, run the following command:

docker-compose -f docker-compose.mlflow.yaml up -d --build

You should then be able to navigate to localhost:5005/#/models to see a local MLflow model registry, and can use this for reading/writing models during development.

To shut the MLflow server down, you can run:

docker-compose -f docker-compose.mlflow.yaml down

Note

The models/metrics that you record to the locally-running MLflow service will not be recorded to the production version, and vice-versa. So feel free to go wild training as many models as you want during development.

To switch between the development and production tracking servers, set the MLFLOW_TRACKING_URI environment variable in the .env file.
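
For reference, a script can pick up this variable at runtime. The snippet below is a minimal sketch, assuming the default port 5005 used by the local docker-compose service above:

import os

import mlflow

# Fall back to the local registry started by docker-compose.mlflow.yaml
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5005"))
print(mlflow.get_tracking_uri())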

Building The Docker Image

Most of our code runs in the merlin-pytorch:22.12 Docker image, but some additional packages need to be installed on top of it. A simple Dockerfile, docker/Dockerfile.nextitem, installs these.

Note

Before running the train or ensemble steps, you must build this image locally.

docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest

You can optionally push this to a container registry of your choice.

Running Scripts With Docker-compose

All scripts for training and data preparation are defined as docker-compose services. They are defined in docker-compose.yaml and executed by calling docker-compose run [service]. An example script configuration is below. Note that it uses environment variables such as ${RAW_DATA_FOLDER} to define where intermediate datasets and other files are stored.

services:
  data-prep:
    image: nvcr.io/nvidia/merlin/merlin-pytorch:22.12  # (1)
    env_file:
      - .env
    volumes:  # (2)
      - ./:/workspace
      - ${BASE_DATA_HOST_DIRECTORY}:/workspace/data
    environment:  # (3)
      - LOCAL_MODE=true
      - DATA_DIR=${RAW_DATA_FOLDER}
      - OUTPUT_FOLDER=${PREPROCESSED_DATA_FOLDER}
      - PROCESSED_DATA_BUCKET_NAME="bucket-1"
    command: "python /workspace/src/01-data-prep.py"
    deploy:  # (4)
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]

  1. This defines the docker image in which the command will run.

  2. We mount the dataset to /workspace/data, and also mount the current directory (your code) to /workspace, so that it can be executed inside the container.

  3. Parameters, mostly input/output paths, are passed to the Python scripts via the environment variables defined here.

  4. The resources section enables the use of GPUs inside the container.

Mounting Data Into The Containers

All of the directories where data will be stored are defined in .env, and passed into the python scripts via the docker-compose.yaml file shown above. Here’s an example of how it looks.

# This indicates where on the host machine the data is stored.
# It will be mounted to /workspace/data inside the container(s)
BASE_DATA_HOST_DIRECTORY=/home/userabc/data/yoochoose

# All of these paths are inside the container itself.
RAW_DATA_FOLDER=/workspace/data/raw
PREPROCESSED_DATA_FOLDER=/workspace/data/cleaned
BASE_OUTPUT_FOLDER=/workspace/data/output
NVT_WORKFLOW_FOLDER=nvt_workflow
MODEL_FOLDER=model
MODEL_FOLDER_NOTRACE=model_notrace
ENSEMBLE_OUTPUT_FOLDER=ensemble

Note

The first environment variable, BASE_DATA_HOST_DIRECTORY, is the actual location on your host machine where data is stored and will be mounted to /workspace/data in each container. You will need to change this to the path on your computer. All of the additional paths are relative to the containers themselves.

Important

These paths should be used for development purposes only. Production deployments should leverage more resilient enterprise storage solutions, such as object storage. More information is provided in the Deployment Guide.

There is also a helper docker-compose service called clean that will remove all of the intermediate/output data sets: PREPROCESSED_DATA_FOLDER, NVT_WORKFLOW_FOLDER, MODEL_FOLDER.

docker-compose run clean

Note

See Replacing the Sample Data With Your Training Data for more about swapping the “Data Prep” stage of this workflow.

In this workflow, you will use a dataset with similar characteristics to the yoochoose dataset from the 2015 RecSys Challenge. More information about the dataset is available on Kaggle here.

We use a script to generate 1,000,000 user/item interactions per day for an 85-day period. The columns in the generated dataset are:

  • Session ID - the ID of the session. One session contains one or more buying events. It can be represented as an integer number.

  • Timestamp - the time when the buy occurred. Format of YYYY-MM-DDThh:mm:ss.SSSZ

  • Item ID - the unique identifier of the item that was bought.

  • Category - the context of the click. This could be an item category, e.g., sport.

The code to generate this data is in src/generate_synthetic_data.py.

In order to keep our model up to date, we will re-train it each day. The primary reason for doing this is that the model is trained on item_ids, and in order for the model to learn about new items, we must re-train it daily with fresh data.

As we will see, the item_ids go through an NVTabular Categorify operation, which assigns each of them to a unique integer. When the NVTabular workflow is fit to the data, all known item_ids should be present in the training data.

Each day, we will re-Categorify all historical interaction data to create a new NVTabular workflow and train a fresh model.

Note

While it is possible to reload a snapshot of a previous Transformers4Rec model and update it with new training data, it is not currently possible to update the NVTabular workflow to learn new item_ids.

The mapping from item_id -> categorified_integer_representation must be consistent when re-training a model.

NVTabular’s Categorify operation assigns these integer representations based on the frequency of each item_id in the training set, so the mapping will change as items get more or less popular over time.
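
As a small illustration of this behavior, here is a minimal sketch (not the project’s actual workflow) of fitting a Categorify operation and transforming data that contains an unseen item_id:

import cudf
import nvtabular as nvt
from merlin.io import Dataset

train_df = cudf.DataFrame({"item_id": [1001, 1001, 1001, 2002, 3003]})

workflow = nvt.Workflow(["item_id"] >> nvt.ops.Categorify())
workflow.fit(Dataset(train_df))

# The most frequent item_id receives the lowest non-zero code; item_ids that
# were not seen during fit() are mapped to the reserved value 0 at transform time.
new_df = cudf.DataFrame({"item_id": [1001, 3003, 9999]})
print(workflow.transform(Dataset(new_df)).to_ddf().compute())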

Train / Validation Split

The days to use for training and validation are defined in the .env file with the environment variables CURRENT_DATE, NUM_EVAL_DAYS, and NUM_TRAIN_DAYS. This is often referred to as “out of time” validation. The training script will use the appropriate paths according to these environment variables, as shown below.

[Figure: train/validation date schedule (train_valid_schedule.png)]
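
A hedged sketch of how such a split can be derived from those variables is shown below; the actual logic lives in the training script’s training_datasets helper, and the exact boundary conventions may differ:

import datetime
import os

current_date = datetime.date.fromisoformat(os.getenv("CURRENT_DATE", "2014-05-08"))
num_eval_days = int(os.getenv("NUM_EVAL_DAYS", "2"))
num_train_days = int(os.getenv("NUM_TRAIN_DAYS", "14"))

# The most recent days are held out for evaluation; the days before them are
# used for training ("out of time" validation).
eval_dates = [current_date - datetime.timedelta(days=i) for i in range(num_eval_days)]
train_dates = [
    current_date - datetime.timedelta(days=num_eval_days + i)
    for i in range(num_train_days)
]

# Each date corresponds to a partition folder such as ${PREPROCESSED_DATA_FOLDER}/2014-05-08
print(sorted(d.isoformat() for d in train_dates))
print(sorted(d.isoformat() for d in eval_dates))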

Once the train and validation splits are defined, the training script will:

  • Fit our NVTabular workflow to the training set

  • Transform both the training set and eval set

  • Train a model on these transformed datasets

Any item_ids that exist in the validation set but not the training set will be mapped to an integer value of 0, which is a reserved number in NVTabular that represents all unknown items. These could either be item_ids that were newly added, or older ones that were not observed during the training set period.

In data preprocessing, we will split the entire synthetic data set into daily partitions, with one folder per day in YYYY-MM-DD format. This is necessary for our simulated dataset, but in reality it is assumed that new interaction data will be collected and made available in some sort of Data Lake each day.

Because a single session can cross the date boundary, we want to make sure that each full session ends up in the same date partition. To do this, we group the data by session_id and find the earliest timestamp for the session, and put all associated rows into the date partition of the earliest interaction. For example, all three of the events in the following table would be put in the 2014-05-08 directory.

session_id   timestamp    item_id     category   date
2209262      1399605897   214826705   0          2014-05-08
2209262      1399606257   214829670   0          2014-05-08
2209262      1399608054   214826705   0          2014-05-09
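
To make the partitioning rule concrete, here is a minimal sketch of assigning every row of a session to the date of the session’s earliest interaction (pandas is used here for illustration; the actual preprocessing runs on the GPU with cuDF):

import pandas as pd

df = pd.DataFrame({
    "session_id": [2209262, 2209262, 2209262],
    "timestamp": [1399605897, 1399606257, 1399608054],
    "item_id": [214826705, 214829670, 214826705],
    "category": [0, 0, 0],
})

# The earliest timestamp per session decides the partition for all of its rows.
first_ts = df.groupby("session_id")["timestamp"].transform("min")
df["partition_date"] = pd.to_datetime(first_ts, unit="s").dt.strftime("%Y-%m-%d")

# Every row of the session can now be written to ${OUTPUT_FOLDER}/<partition_date>/.
# Note that the exact calendar date depends on the timezone used when converting
# the epoch timestamps.
print(df["partition_date"].unique())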

Pre-grouping by session_id

We will cover this more in the Feature Engineering with NVTabular section, and in the Understanding Schemas section, but before we start to use this data to train models, we will pre-group by session_id, so that all of the interactions in a given session are contained in one row. The data stored in each of the date-partitioned parquet files looks like this:

timestamp                               item_id                              category
[1396444666, 1396445162, 1396445412]    [214716935, 214774687, 214832672]    [0, 0, 0]
[1396420745, 1396420733]                [214826715, 214826835]               [0, 0]
[1396460527, 1396460844]                [214532036, 214700432]               [0, 0]
[1396451237, 1396451257, 1396451287]    [214712235, 214581489, 214602605]    [0, 0, 0]

We also ensure that the following conditions are true:

  • The minimum session length is 2.

  • Interactions within each session (each row of the DataFrame) are sorted by timestamp.
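
A minimal sketch of this grouping and filtering is shown below (pandas is used for illustration; the actual data-prep script runs on the GPU with cuDF):

import pandas as pd

interactions = pd.DataFrame({
    "session_id": [1, 1, 2, 3, 3, 3],
    "timestamp": [30, 10, 5, 7, 9, 8],
    "item_id": [100, 101, 102, 103, 104, 105],
    "category": [0, 0, 0, 0, 0, 0],
})

# Sort so that the lists inside each session are ordered by timestamp,
# then collapse each session into a single row of lists.
sessions = (
    interactions.sort_values(["session_id", "timestamp"])
    .groupby("session_id")
    .agg(list)
)

# Enforce a minimum session length of 2 (session 2 is dropped here).
sessions = sessions[sessions["item_id"].str.len() >= 2]
print(sessions)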

Running The Data Prep Step

The data prep script will generate the synthetic data, group by session_id, and perform the filtering steps described above. You can run it locally with the following command:

docker-compose run data-prep

Note

As mentioned before, these docker-compose commands are intended for executing the stages of the AI Workflow in a development environment. See the Deployment Guide for how these steps are executed in a production environment.

This will run a docker-compose service called data-prep, which is defined in docker-compose.yaml. The service will mount your code and data into the appropriate Docker container and then execute the python command to run it.

Be sure to update the volume mount paths in .env to point to your downloaded data!

The output data will be stored in parquet files in dated directories:

${OUTPUT_FOLDER}/
├── 2014-04-01
│   └── interactions_sessions_df.parquet
├── 2014-04-02
│   └── interactions_sessions_df.parquet
├── 2014-04-03
│   └── interactions_sessions_df.parquet
├── 2014-04-04
│   └── interactions_sessions_df.parquet
├── 2014-04-05
│   └── interactions_sessions_df.parquet
├── 2014-04-06
│   └── interactions_sessions_df.parquet
├── 2014-04-07
│   └── interactions_sessions_df.parquet
├── 2014-04-08
│   └── interactions_sessions_df.parquet
├── 2014-04-09
│   └── interactions_sessions_df.parquet
├── 2014-04-10
...

The next step is to define an NVTabular workflow for transforming this data for model training and inference.

Feature engineering and model training are often demonstrated as two distinct steps of the modeling pipeline, but it is very important that the NVTabular workflow and Transformers4Rec (PyTorch) model that get produced are kept together. The PyTorch model will not work properly without the correct NVTabular workflow. For that reason, we perform these two steps together. This part of the workflow is where ML Engineers and Data Scientists will spend the majority of their time iterating on the model.

Building the Docker Image

If you haven’t already done so while configuring your local dev environment, build the Docker image that we will use for training the model:

docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest

Now that we have the image with up-to-date libraries, we again use docker-compose to execute the training script.

docker-compose run train-ensemble

The next two sections will dive deeper into the Feature Engineering and Model Training steps.

Feature Engineering With NVTabular

We use NVTabular for feature preprocessing and engineering. This will take the raw DataFrame that we produced in the data-prep step and prepare the columns to be fed to the model. NVTabular provides a high-level abstraction that simplifies the code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.

The workflow itself is defined in nvt_workflow.py. We will fit the NVTabular workflow to the training data and then use that to transform both the training and validation data. This is done so that our model doesn’t learn about any item_ids that may exist in the validation set but not in the training set.

from nvt_workflow import define_nvt_workflow

workflow = define_nvt_workflow()
workflow.fit(train_data_paths)

train_data = workflow.transform(train_data_paths)
eval_data = workflow.transform(eval_data_paths)

Here is a preview of how the data looks before being transformed:

   timestamp                              item_id          category
0  [1397183957, 1399737202]               [7, 10]          [3, 5]
1  [1397196378, 1403440345, 1410009519]   [169, 27, 52]    [76, 12, 23]
2  [1397229436]                           [4]              [2]
3  [1397206158, 1405004431]               [31, 4]          [14, 2]
4  [1397174848, 1407974276]               [182, 105]       [82, 47]

The NVTabular workflow will perform the following operations:

  • Categorify the category and item_id columns

  • Transform the timestamp into a float representing the sine of the day of the week, turning the timestamp into a cyclical temporal feature

  • Pad/crop each session to be a fixed length of 20 items (defined in defaults.max_sequence_length)
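
The exact operator graph is defined in nvt_workflow.py; below is only a rough sketch of the first and last of these operations, under the assumption that the pre-grouped list columns can be encoded with Categorify and padded/cropped with ListSlice (the cyclical day-of-week transform is omitted for brevity):

import nvtabular as nvt
from nvtabular import ops

MAX_SEQ_LEN = 20  # corresponds to defaults.max_sequence_length

# Encode item_id and category (list columns) as contiguous integers, then
# pad or crop every session to a fixed length of MAX_SEQ_LEN items.
features = (
    ["item_id", "category"]
    >> ops.Categorify()
    >> ops.ListSlice(0, MAX_SEQ_LEN, pad=True, pad_value=0)
)

workflow = nvt.Workflow(features)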

The transformed data looks like this:

   category-list                                        et_dayofweek_sin-list                                item_id-list
0  [4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...    [-0.9749281, -0.781831, 0.0, 0.0, 0.0, 0.0, 0....    [2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1  [79, 13, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...    [-0.9749281, 1.1285199e-06, -0.781831, 0.0, 0....    [170, 27, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
2  [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...    [-0.9749281, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...    [9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3  [15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...    [-0.9749281, -0.43388462, 0.0, 0.0, 0.0, 0.0, ...    [31, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
4  [84, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...    [-0.9749281, 0.43388295, 0.0, 0.0, 0.0, 0.0, 0...    [183, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...

Input And Output Schemas

The NVTabular Workflow that we have defined and fit will have an input schema and output schema that describe the shape and properties of the data that gets transformed by the Workflow.

Note

See Understanding Schemas for more about the end-to-end schemas for inference.

When inspecting workflow.input_schema and workflow.output_schema you will see the same column names as the DataFrames displayed above, but the output_schema also contains much more information about the shape of the data as well as the Tags that we applied to the different columns.

# workflow.output_schema
[
    {
        'name': 'category-list',
        'tags': {<Tags.LIST: 'list'>, <Tags.CATEGORICAL: 'categorical'>},
        'properties': {
            'num_buckets': None,
            'freq_threshold': 0,
            'max_size': 0,
            'start_index': 1,
            'cat_path': './/categories/unique.category.parquet',
            'domain': {'min': 0, 'max': 197, 'name': 'category'},
            'embedding_sizes': {'cardinality': 198, 'dimension': 31},
            'value_count': {'max': 20}
        },
        'dtype': dtype('int64'),
        'is_list': True,
        'is_ragged': False
    },
    {
        'name': 'et_dayofweek_sin-list',
        'tags': {<Tags.LIST: 'list'>, <Tags.CONTINUOUS: 'continuous'>},
        'properties': {'value_count': {'max': 20}},
        'dtype': dtype('float32'),
        'is_list': True,
        'is_ragged': False
    },
    {
        'name': 'item_id-list',
        'tags': {<Tags.ITEM_ID: 'item_id'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.LIST: 'list'>, <Tags.ID: 'id'>, <Tags.ITEM: 'item'>},
        'properties': {
            'num_buckets': None,
            'freq_threshold': 0,
            'max_size': 0,
            'start_index': 1,
            'cat_path': './/categories/unique.item_id.parquet',
            'domain': {'min': 0, 'max': 2226, 'name': 'item_id'},
            'embedding_sizes': {'cardinality': 2227, 'dimension': 120},
            'value_count': {'max': 20}
        },
        'dtype': dtype('int64'),
        'is_list': True,
        'is_ragged': False
    }
]

This schema information is critical when moving to the training stage. We will use the workflow.output_schema to specify the input schema of our Transformers4Rec model. If you add features or otherwise change the NVTabular workflow, no change to the Transformers4Rec model definition is necessary.
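
For example, the training script can build the model’s input block directly from this schema. The snippet below is a hedged sketch: it assumes workflow.output_schema can be passed to Transformers4Rec as in its published examples, and the specific values for d_model, the continuous projection size, and the masking mode are illustrative assumptions rather than the project’s actual settings:

import transformers4rec.torch as tr

max_sequence_length, d_model = 20, 64  # assumed values for illustration

# Build the input block straight from the fitted workflow's output schema; adding
# or removing features in the NVTabular workflow updates this block automatically.
input_module = tr.TabularSequenceFeatures.from_schema(
    workflow.output_schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=64,
    masking="clm",  # causal masking, i.e. next-item prediction
    d_output=d_model,
)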

The workflow.input_schema is also very important because it defines the inference request format. There is more on how the input schema is used in Understanding Schemas.

Model Training With Transformers4Rec

As described previously, the model training process included in this AI workflow involves a transformer-based model trained over a range of dates specified for the training and validation periods. Transformer-based models have gained popularity in recent years in the natural language processing (NLP) domain, and their success quickly carried over to the recommender systems domain. The Transformers4Rec library works as a bridge between NLP and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). The following figure shows the use of the library in a recommender system along with the underlying architectures adopted from the NLP domain.

[Figure: Transformers4Rec bridging NLP architectures and sequential/session-based recommendation (sequential_rec.png)]

What we implement in the training part of this workflow is a session-based recommendation model. This type of model considers short independent sequences (sessions) of interactions between users and items, which are commonly observed on e-commerce sites, news and media portals and on any other outlets where users choose to browse anonymously. This task is also relevant for scenarios where the users’ interests may change significantly over time depending on the user context or intent.

The workflow executes certain key steps during training that are further described below (note: the code is simplified to show only the most relevant lines).

Loading Data And Feature Engineering

The initial step is to specify the parameters (e.g., date ranges) for the training and evaluation datasets and load the datasets from the source directory.

# Definitions of the training/eval dates
CURRENT_DATE = os.getenv("CURRENT_DATE", "2014-05-08")
NUM_EVAL_DAYS = int(os.getenv("NUM_EVAL_DAYS", "2"))
NUM_TRAIN_DAYS = int(os.getenv("NUM_TRAIN_DAYS", "14"))

eval_dataset, train_dataset = training_datasets(
    PREPROC_FOLDER,
    datetime.datetime.strptime(CURRENT_DATE, "%Y-%m-%d"),
    NUM_EVAL_DAYS,
    NUM_TRAIN_DAYS,
)

As you may recall from the previous section, performing feature engineering and transformation on the training and evaluation datasets is accomplished as follows, by first defining an NVTabular workflow and then using this workflow to fit and transform:

workflow = define_nvt_workflow()
workflow.fit(train_dataset)

train_data = workflow.transform(train_dataset)
eval_data = workflow.transform(eval_dataset)

Model Definition

The next step is to define the components of the transformer model that you will be training. These include defining:

  • Metrics for model evaluation

  • The prediction task

  • The XLNet-based transformer model

In this workflow, two metrics commonly used in recommender systems will be defined and used to evaluate the trained transformer model: Normalized Discounted Cumulative Gain (NDCG@k) and Recall@k (over a list of the top-k items presented to the user). NDCG accounts for the rank of the relevant item in the recommendation list, while Recall@k considers only the relevancy of the items recommended in the top-k list. You can consider adding other metrics from here.

# Define the evaluation top-N metrics and the cut-offs
metrics = [
    NDCGAt(top_ks=[20, 40], labels_onehot=True),
    RecallAt(top_ks=[20, 40], labels_onehot=True),
]

Next, we define the prediction task, which is next item prediction in this AI workflow. Other prediction tasks can be found here.

# Define the next item prediction task
prediction_task = tr.NextItemPredictionTask(weight_tying=True, metrics=metrics)

Finally, we define and build the XLNet-based transformer model. A variety of other network architectures can be used; they are listed here.

An overview of Transformers4Rec model architectures can be found here.

# Define the config of the XLNet Transformer architecture
transformer_config = tr.XLNetConfig.build(
    d_model=d_model, n_head=8, n_layer=2, total_seq_length=max_sequence_length
)

# Get the end-to-end model
model = transformer_config.to_torch_model(input_module, prediction_task)

Training

The final step in the training portion of the workflow is to define a trainer object that will execute model training and evaluation:

training_args = tr.trainer.T4RecTrainingArguments(
    output_dir=output_dir, **t4rec_training_arguments
)

trainer = tr.Trainer(
    model=model, args=training_args, schema=schema, compute_metrics=True
)

Assuming the NVTabular workflow has already been used to fit and transform the data, the following lines will train the transformer model and evaluate it on the evaluation dataset using the evaluation metrics previously defined.

trainer.train_dataset_or_path = train_data
trainer.reset_lr_scheduler()
trainer.train()

trainer.eval_dataset_or_path = eval_data
eval_metrics = trainer.evaluate(metric_key_prefix="eval")

After the training is complete, the workflow will create and export the ensemble that will later be passed to the Triton Inference Server. This is described in the next section.

To serve a previously trained Merlin model at inference, we need to export an ensemble that the Triton Inference Server can understand. This ensemble includes two components:

  • The NVTabular workflow that provides the specifics of how the new data (that will be subject to inference) needs to be transformed before it is fed into prediction

  • The trained PyTorch model that will be used to perform inference

The following lines will export the ensemble to the specified target directory.

export_pytorch_ensemble(
    model,
    workflow,
    SPARSE_MAX,
    _T4R_MODEL_NAME,
    ENSEMBLE_OUTPUT_FOLDER,
)

We also utilize MLflow at this stage to register the three artifacts (NVTabular workflow, transformer model, and the ensemble) with MLflow for future retrieval and use. More information is available in the MLflow section of the Appendix.
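
A hedged sketch of what that registration might look like with the MLflow client API is shown below; the artifact paths and the registered model name are illustrative assumptions, not necessarily the names used by the actual script:

import os

import mlflow
import mlflow.pytorch

mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])

with mlflow.start_run():
    # Evaluation metrics returned by trainer.evaluate()
    mlflow.log_metrics({k: float(v) for k, v in eval_metrics.items()})

    # The exported Triton ensemble (NVTabular workflow + PyTorch model + ensemble config)
    mlflow.log_artifacts(ENSEMBLE_OUTPUT_FOLDER, artifact_path="ensemble")

    # Register the PyTorch model for later retrieval from the model registry
    mlflow.pytorch.log_model(
        model, "t4rec_model", registered_model_name="next-item-prediction"
    )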

Debugging the model on Triton

To run the model locally using Triton Inference Server, run the command below. This will allow you to send requests and inspect the results for development purposes.

docker-compose up triton

The ensemble that you exported should be stored in ~/path/to/models/ensemble, which we will mount into the merlin-pytorch Docker container, and then run the tritonserver command.

For deploying a model to production on Kubernetes, see the Deployment guide in these docs.

One of the advantages of NVTabular is that it allows us to use the same feature transformation logic for training and serving our model.

The NVTabular workflow is fit to the training data and will infer an input schema and output schema. When using the Workflow to transform data, it uses the input schema to know which columns to read and how to manipulate them. This also means that it expects the data used for fit and transform to have the same columns and types. This includes the data used for real-time inference, which means that the NVTabular Workflow’s input_schema will define our API specification. Some care must be taken to ensure it all works together smoothly.

Inspecting The NVTabular Workflow Input Schema

Since the API is defined by the Workflow’s input schema, let’s take a look at what that looks like:

[ { "name": "category", "tags": set(), "properties": {}, "dtype": dtype("int64"), "is_list": True, "is_ragged": True, }, { "name": "timestamp", "tags": set(), "properties": {}, "dtype": dtype("int64"), "is_list": True, "is_ragged": True, }, { "name": "item_id", "tags": set(), "properties": {}, "dtype": dtype("int64"), "is_list": True, "is_ragged": True, }, ]

Some things to note are:

  • There are 3 inputs: category, timestamp, and item_id

  • They are all variable-length (ragged) lists.

  • The elements of each list are int64 type.

The way we send ragged lists to Triton Inference Server uses a somewhat unusual format. Each input column is sent as two pieces: values and offsets.

Converting To Values And Offsets Format

One major reason for requiring the ragged list columns to be converted to a “values and offsets” format is that it’s possible to send requests for multiple inferences at the same time. That may not be as immediately useful for this AI Workflow as others, but the functionality is there and we need to do some special formatting to account for it.

A ragged list (or ragged tensor) in this context means that the length of each session is variable. The NVTabular Workflow will perform a padding/cropping operation to ensure that the input to the model is a fixed-length list with 20 items, but the input to the Workflow itself can vary. The NVTabular Workflow expects these ragged list inputs to come as not one but two columns, which are named <column>__values and <column>__nnzs (aka “offsets”).

The values of these come from the way that the cuDF library represents these list columns as leaves and offsets. The diagram below shows how a ragged input gets converted to arrays of values and offsets.

[Figure: converting a ragged list input into arrays of values and offsets (ragged_tensor_offsets.jpg)]

A bit of code might help explain how the conversion must happen. In the example below, we’ll make two different cuDF DataFrames - one with a single row (representing one session), and one with two rows. The two-row example is a better illustration of how the values and offsets are represented.

import cudf

df_2row = cudf.DataFrame(
    {
        "item_id": [[1, 2, 3, 4, 5], [101, 102]],
    }
)

# item_id__values
values = df_2row.item_id.list.leaves  # [1, 2, 3, 4, 5, 101, 102]

# item_id__nnzs
offsets = df_2row.item_id._column.offsets  # [0, 5, 7]

In this two-row case, you can see that the values are concatenated into a single array. The offsets [0, 5, 7] indicate the indices where each row starts and ends.

The single-row case will be the most common for this AI Workflow and is a bit simpler to construct. In this case, the item_id__values value will be the same as session_item_ids and item_id__nnzs will be [0, len(session_item_ids)].

session_item_ids = [1, 2, 3, 4, 5]

df_1row = cudf.DataFrame(
    {
        "item_id": [session_item_ids],
    }
)

df_1row.item_id.list.leaves      # [1, 2, 3, 4, 5]
df_1row.item_id._column.offsets  # [0, 5]

Defining The HTTP Payload

Triton Inference Server uses the KServe Predict Protocol V2, which defines the HTTP/gRPC payload. You can find the OpenApi spec and gRPC protobuf here.

An example HTTP payload for a session with 5 items is:

{ "id": "1", "inputs": [ { "name": "item_id__values", "shape": [ 5, 1 ], "datatype": "INT64", "data": [ 1, 2, 3, 4, 5 ] }, { "name": "item_id__nnzs", "shape": [ 2, 1 ], "datatype": "INT64", "data": [ 0, 5 ] }, { "name": "category__values", "shape": [ 5, 1 ], "datatype": "INT64", "data": [ 0, 0, 0, 0, 0 ] }, { "name": "category__nnzs", "shape": [ 2, 1 ], "datatype": "INT64", "data": [ 0, 5 ] }, { "name": "timestamp__values", "shape": [ 5, 1 ], "datatype": "INT64", "data": [ 1674198684, 1674198744, 1674198804, 1674198864, 1674198864 ] }, { "name": "timestamp__nnzs", "shape": [ 2, 1 ], "datatype": "INT64", "data": [ 0, 5 ] } ], "outputs": [ { "name": "output" } ] }

The sample data that we are using to train this model will not help the model learn about your products and how users interact with items. To do that, you will need to swap the “Data Prep” stage of this AI Workflow and train on your own user/item interactions.

There are many tools people use to prepare this data, from Spark jobs to SQL in data warehouses. In order to have a drop-in replacement for our training data, you must produce parquet files with the following fields and types:

Column Name   Type    Description
session_id    int64   An identifier for all transactions that occurred within the same “session”.
timestamp     int64   Seconds since epoch.
item_id       int64   The ID of the item in your catalog.
category      int64   The ID of the item category in your catalog. If you don’t have this, set them all to 0.

If you use a finer resolution for timestamps, update the definition of session_time in nvt_workflow.py.
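
As a quick illustration, a parquet file with the expected layout can be produced like this (a minimal sketch using pandas; the file name is a placeholder):

import pandas as pd

df = pd.DataFrame({
    "session_id": pd.Series([1, 1, 2], dtype="int64"),
    "timestamp": pd.Series([1674198684, 1674198744, 1674198804], dtype="int64"),
    "item_id": pd.Series([101, 102, 103], dtype="int64"),
    "category": pd.Series([0, 0, 0], dtype="int64"),
})

df.to_parquet("interactions.parquet")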

Loading Data With NVTabular

NVTabular is able to read directly from Amazon/MinIO S3, Google Cloud Storage (GCS), and Azure Blob Storage via the Dataset class. Point the dataset paths at your own storage locations to read your data instead of the sample data.

See the merlin.io.dataset.Dataset documentation for more details.
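
For example, a Dataset can be pointed at a parquet glob on object storage; the bucket path below is a placeholder:

from merlin.io import Dataset

train_dataset = Dataset(
    "s3://your-bucket/interactions/2014-04-*/*.parquet", engine="parquet"
)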

Reading From Data Warehouses

If you use a data warehouse such as Snowflake, BigQuery, or Redshift, you can replace the training_datasets function with queries to your warehouse.

eval_dataset, train_dataset = training_datasets(
    PREPROC_FOLDER,
    datetime.datetime.strptime(CURRENT_DATE, "%Y-%m-%d"),
    NUM_EVAL_DAYS,
    NUM_TRAIN_DAYS,
)

This will become something like the following, assuming your warehouse_client.query function returns a Pandas or cuDF DataFrame:

from merlin.io.dataset import Dataset

eval_dataset = Dataset(
    warehouse_client.query("SELECT * FROM eval_data")
)
train_dataset = Dataset(
    warehouse_client.query("SELECT * FROM train_data")
)

If you’re in a rush or just want a quick summary, you can run the following commands to train and serve a model. All details of what is actually being executed are in the docker-compose.yaml file. Make sure you’ve configured your local dev environment first!

# Start MLflow if you haven't already
docker-compose -f docker-compose.mlflow.yaml up -d --build

# Create a docker image with the necessary dependencies
docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest

# Train and serve the model
docker-compose run data-prep
docker-compose run train-ensemble
docker-compose up triton

At this point you’ll have a Triton Server running on ports 8000 (http) and 8001 (grpc). To send some requests, run the included python example file.

python src/triton_example_request.py

Continue with the rest of the guide to figure out what you just did.

© Copyright 2022-2023, NVIDIA. Last updated on May 23, 2023.