Next Item Prediction
The Development Guide is intended for Data Scientists and ML Engineers to understand the feature engineering and modeling process. The steps will be executed on a single machine rather than in a Cloud Native environment.
With this guide, you will understand:
Which features are used in the model
How feature engineering is done with NVTabular
How to build the XLNet-based transformer model with Transformers4Rec
How schemas are defined, and the relationship between an NVTabular Workflow and Triton Inference Server request schema.
This guide is best run within one of the NVIDIA-provided Docker containers, with all NVIDIA drivers, Deep Learning frameworks, Triton Inference Server, and Merlin libraries pre-installed. In particular, it has:
Transformers4Rec and an NVIDIA-optimized version of PyTorch
Triton Inference Server, for serving
The latest stable Merlin libraries as of 22.12
cuDF, Dask, and other dependencies of the Merlin ecosystem
We will use the merlin-pytorch:22.12 container as a base, which is stored in NVIDIA’s container registry. You can create and log into your NVIDIA NGC account by following this guide.
Install Docker And Docker-compose
If you do not already have them installed, install Docker (version 20.10 or later) and Docker-Compose (version 1.29 or later) on your system. You can follow the guide from https://www.docker.com/
Once Docker has been installed, install the NVIDIA Container Toolkit if required, following the instructions here.
This guide generally assumes that development will be occurring on a Linux-based system with an NVIDIA GPU installed. The NVIDIA Cloud Native Stack Developer version can be leveraged to set up all the required components, or the GPU-Optimized VMI available on most CSP Marketplaces can also be used.
Download The Workflow Source Code
Within the Next Item Prediction Workflow Collection on the Enterprise Catalog, we have included the source code for the workflow that will be referenced in this guide. Download the source code onto your system, then extract it, prior to proceeding.
Developing Locally Using MLflow
This repo contains a docker-compose configuration for launching a minimal MLflow server. To do so, run the following command:
docker-compose -f docker-compose.mlflow.yaml up -d --build
You should then be able to navigate to localhost:5005/#/models to see a local MLflow model registry, and can use this for reading/writing models during development.
To shut the MLflow server down, you can run:
docker-compose -f docker-compose.mlflow.yaml down
The models/metrics that you record to the locally-running MLflow service will not be recorded to the production version, and vice-versa. So feel free to go wild training as many models as you want during development.
To switch between the development and production tracking servers, set the MLFLOW_TRACKING_URI environment variable in the .env file.
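For local development, that value would point at the MLflow server started above (the production URI depends on your deployment and is not shown here):
# local development MLflow server (see docker-compose.mlflow.yaml)
MLFLOW_TRACKING_URI=http://localhost:5005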
Building The Docker Image
Most of our code runs in the merlin-pytorch:22.12 Docker image, but you need to install some additional packages on top of it. To do so, there is a simple Dockerfile in docker/Dockerfile.nextitem that installs these.
Before running the train or ensemble steps, you must build this image locally.
docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest
You can optionally push this to a container registry of your choice.
Running Scripts With Docker-compose
All scripts for training and data preparation are defined as docker-compose services. They are defined in docker-compose.yaml and executed by calling docker-compose run [service]. An example service configuration is below. Note that it uses environment variables such as ${RAW_DATA_FOLDER} to define where intermediate datasets and other files are stored.
services:
  data-prep:
    image: nvcr.io/nvidia/merlin/merlin-pytorch:22.12 # (1)
    env_file:
      - .env
    volumes: # (2)
      - ./:/workspace
      - ${BASE_DATA_HOST_DIRECTORY}:/workspace/data
    environment: # (3)
      - LOCAL_MODE=true
      - DATA_DIR=${RAW_DATA_FOLDER}
      - OUTPUT_FOLDER=${PREPROCESSED_DATA_FOLDER}
      - PROCESSED_DATA_BUCKET_NAME="bucket-1"
    command: "python /workspace/src/01-data-prep.py"
    deploy: # (4)
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]
(1) This defines the Docker image in which the command will run.
(2) We mount the dataset to /workspace/data, and also mount the current directory (your code) to /workspace, so that it can be executed inside the container.
(3) Parameters are passed to the Python scripts via environment variables that we define here, mostly input/output paths.
(4) The resources section enables the use of GPUs inside the container.
Mounting Data Into The Containers
All of the directories where data will be stored are defined in .env, and passed into the python scripts via the docker-compose.yaml file shown above. Here’s an example of how it looks.
# This indicates where on the host machine the data is stored.
# It will be mounted to /workspace/data inside the container(s)
BASE_DATA_HOST_DIRECTORY=/home/userabc/data/yoochoose
# All of these paths are inside the container itself.
RAW_DATA_FOLDER=/workspace/data/raw
PREPROCESSED_DATA_FOLDER=/workspace/data/cleaned
BASE_OUTPUT_FOLDER=/workspace/data/output
NVT_WORKFLOW_FOLDER=nvt_workflow
MODEL_FOLDER=model
MODEL_FOLDER_NOTRACE=model_notrace
ENSEMBLE_OUTPUT_FOLDER=ensemble
The first environment variable, BASE_DATA_HOST_DIRECTORY, is the actual location on your host machine where data is stored, and it will be mounted to /workspace/data in each container. You will need to change this to the path on your computer. All of the additional paths are relative to the containers themselves.
These paths should be used for development purposes only. Production deployments should leverage more resilient enterprise storage solutions, such as object storage. More information is provided in the Deployment Guide.
There is also a helper docker-compose service called clean that will remove all of the intermediate/output data sets: PREPROCESSED_DATA_FOLDER, NVT_WORKFLOW_FOLDER, and MODEL_FOLDER.
docker-compose run clean
See Replacing the Sample Data With Your Training Data for more about swapping the “Data Prep” stage of this workflow.
In this workflow, you will use a dataset with similar characteristics to the yoochoose dataset from the 2015 RecSys Challenge. More information about the dataset is available on Kaggle here.
We use a script to generate 1,000,000 user/item interactions per day for an 85 day period. The columns in the generated data set are:
Session ID - the ID of the session. One session may contain one or many buying events. Represented as an integer.
Timestamp - the time when the buy occurred, in the format YYYY-MM-DDThh:mm:ss.SSSZ.
Item ID - the unique identifier of the item that has been bought.
Category - the context of the click, such as an item category (e.g. sport).
The code to generate this data is in src/generate_synthetic_data.py.
In order to keep our model up to date, we will re-train it each day. The primary reason for doing this is that the model is trained on item_ids, and in order for the model to learn about new items, we must re-train it daily with fresh data.
As we will see, the item_ids go through an NVTabular Categorify operation, which assigns each of them to a unique integer. When the NVTabular workflow is fit to the data, all known item_ids should be present in the training data.
Each day, we will re-Categorify all historical interaction data to create a new NVTabular workflow and train a fresh model.
While it is possible to reload a snapshot of a previous Transformers4Rec model and update it with new training data, it is not currently possible to update the NVTabular workflow to learn new item_ids.
The mapping from item_id -> categorified_integer_representation must be consistent when re-training a model. NVTabular’s Categorify operation assigns these integer representations based on the frequency of each item_id in the training set, so the mapping will change as items get more or less popular over time.
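To build intuition for why the mapping drifts, here is a minimal sketch (toy data, not the workflow's actual code) of fitting Categorify to a small DataFrame; the integers assigned to each item_id depend on the data the workflow was fit to, so re-fitting on new interactions can reassign them:
import cudf
import nvtabular as nvt
from merlin.io import Dataset

# toy interactions: item frequencies determine the assigned integer codes
df = cudf.DataFrame({"item_id": [214826705, 214826705, 214829670, 214700432]})

workflow = nvt.Workflow(["item_id"] >> nvt.ops.Categorify())
encoded = workflow.fit_transform(Dataset(df)).to_ddf().compute()
print(encoded)  # item_id now holds small integer codes; 0 is reserved for unknown items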
Test / Validation Split
Which days to use for training and validation are defined in the .env file with the environment variables CURRENT_DATE, NUM_EVAL_DAYS, and NUM_TRAIN_DAYS. This is often referred to as “out of time” validation. The training script will use the appropriate paths according to these environment variables, as shown below.

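A minimal sketch of how the split could be derived from these variables (an assumption about the logic; the actual implementation lives in the training script's training_datasets helper):
import datetime
import os

# out-of-time split: the NUM_EVAL_DAYS most recent daily partitions (relative to
# CURRENT_DATE) are held out for evaluation, and the NUM_TRAIN_DAYS partitions
# before those are used for training
current_date = datetime.datetime.strptime(os.getenv("CURRENT_DATE", "2014-05-08"), "%Y-%m-%d")
num_eval_days = int(os.getenv("NUM_EVAL_DAYS", "2"))
num_train_days = int(os.getenv("NUM_TRAIN_DAYS", "14"))

eval_days = [current_date - datetime.timedelta(days=i) for i in range(num_eval_days)]
train_days = [
    current_date - datetime.timedelta(days=i)
    for i in range(num_eval_days, num_eval_days + num_train_days)
]
# daily partitions (described below) are named YYYY-MM-DD
eval_paths = [f"{os.getenv('PREPROCESSED_DATA_FOLDER')}/{d:%Y-%m-%d}" for d in eval_days]
train_paths = [f"{os.getenv('PREPROCESSED_DATA_FOLDER')}/{d:%Y-%m-%d}" for d in train_days]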
Once the train and validation splits are defined, the training script will:
Fit our NVTabular workflow to the training set
Transform both the training set and eval set
Train a model on these transformed datasets
Any item_ids that exist in the validation set but not the training set will be mapped to an integer value of 0, which is a reserved number in NVTabular that represents all unknown items. These could either be item_ids that were newly added, or older ones that were not observed during the training set period.
In data preprocessing, we will split the entire synthetic data set into daily partitions, with one folder per day in YYYY-MM-DD format. This is necessary for our simulated dataset, but in reality it is assumed that new interaction data will be collected and made available in some sort of Data Lake each day.
Because a single session can cross the date boundary, we want to make sure that each full session ends up in the same date partition. To do this, we group the data by session_id, find the earliest timestamp for the session, and put all associated rows into the date partition of the earliest interaction. For example, all three of the events in the following table would be put in the 2014-05-08 directory (a code sketch follows the table).
session_id | timestamp | item_id | category | date |
---|---|---|---|---|
2209262 | 1399605897 | 214826705 | 0 | 2014-05-08 |
2209262 | 1399606257 | 214829670 | 0 | 2014-05-08 |
2209262 | 1399608054 | 214826705 | 0 | 2014-05-09 |
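A minimal sketch (assumed approach, not the data prep script's exact code) of assigning every row of a session to the partition of its earliest interaction:
import cudf

# toy interactions spanning a date boundary (same rows as the table above)
df = cudf.DataFrame({
    "session_id": [2209262, 2209262, 2209262],
    "timestamp": [1399605897, 1399606257, 1399608054],
    "item_id": [214826705, 214829670, 214826705],
    "category": [0, 0, 0],
})

# find each session's earliest timestamp and derive the partition from it,
# so all rows of the session land in the same dated directory
session_start = (
    df.groupby("session_id")["timestamp"].min().rename("session_start").reset_index()
)
df = df.merge(session_start, on="session_id")
df["partition"] = cudf.to_datetime(df["session_start"], unit="s").dt.strftime("%Y-%m-%d")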
Pre-grouping by session_id
We will cover this more in the Feature Engineering with NVTabular section and in the Understanding Schemas section, but before we start to use this data to train models, we will pre-group by session_id so that all of the interactions in a given session are contained in one row (a short sketch of this step appears after the conditions below). The data stored in each of the date-partitioned parquet files looks like this:
timestamp item_id category
[1396444666, 1396445162, 1396445412] [214716935, 214774687, 214832672] [0, 0, 0]
[1396420745, 1396420733] [214826715, 214826835] [0, 0]
[1396460527, 1396460844] [214532036, 214700432] [0, 0]
[1396451237, 1396451257, 1396451287] [214712235, 214581489, 214602605] [0, 0, 0]
We also ensure that the following conditions are true:
The minimum session length is 2.
Interactions within each session (each row of the DataFrame) are sorted by timestamp.
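A minimal sketch of this pre-grouping with cuDF (an assumption of the approach; the actual implementation is in the data prep script), starting from a flat interactions DataFrame df like the one above:
import cudf

# sort so that each session's interactions are in timestamp order, then collect
# each column into one list per session
df = df.sort_values(["session_id", "timestamp"])
sessions = (
    df.groupby("session_id")
    .agg({"timestamp": "collect", "item_id": "collect", "category": "collect"})
    .reset_index()
)
# enforce the minimum session length of 2
sessions = sessions[sessions["item_id"].list.len() >= 2]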
Running The Data Prep Step
The data prep script will generate the synthetic data, group by session_id, and perform the filtering steps described above. You can run it locally with the following command:
docker-compose run data-prep
As mentioned before, these docker-compose commands are intended for executing the stages of the AI Workflow in a development environment. See the Deployment Guide for how these steps are executed in a production environment.
This will run a docker-compose service called data-prep, which is defined in docker-compose.yaml. The service mounts your code and data into the appropriate Docker container and then executes the Python command to run it.
Be sure to update the volume mount paths in .env to point to your downloaded data!
The output data will be stored in parquet files in dated directories:
${OUTPUT_FOLDER}/
├── 2014-04-01
│ └── interactions_sessions_df.parquet
├── 2014-04-02
│ └── interactions_sessions_df.parquet
├── 2014-04-03
│ └── interactions_sessions_df.parquet
├── 2014-04-04
│ └── interactions_sessions_df.parquet
├── 2014-04-05
│ └── interactions_sessions_df.parquet
├── 2014-04-06
│ └── interactions_sessions_df.parquet
├── 2014-04-07
│ └── interactions_sessions_df.parquet
├── 2014-04-08
│ └── interactions_sessions_df.parquet
├── 2014-04-09
│ └── interactions_sessions_df.parquet
├── 2014-04-10
...
The next step is to define an NVTabular workflow for transforming this data for model training and inference.
Feature engineering and model training are often demonstrated as two distinct steps of the modeling pipeline, but it is very important that the NVTabular workflow and Transformers4Rec (PyTorch) model that get produced are kept together. The PyTorch model will not work properly without the correct NVTabular workflow. For that reason, we perform these two steps together. This part of the workflow is where ML Engineers and Data Scientists will spend the majority of their time iterating on the model.
Building the Docker Image
If you haven’t already done so while configuring your local dev environment, build the Docker image that we will use for training the model:
docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest
Now that we have the image with up-to-date libraries, we again use docker-compose to execute the training script.
docker-compose run train-ensemble
The next two sections will dive deeper into the Feature Engineering and Model Training steps.
Feature Engineering With NVTabular
We use NVTabular for feature preprocessing and engineering. This will take the raw DataFrame that we produced in the data-prep step and prepare the columns to be fed to the model. NVTabular provides a high-level abstraction that simplifies code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.
The workflow itself is defined in nvt_workflow.py. We will fit the NVTabular workflow to the training data and then use that to transform both the training and validation data. This is done so that our model doesn’t learn about any item_ids that may exist in the validation set but not in the training set.
from nvt_workflow import define_nvt_workflow
workflow = define_nvt_workflow()
workflow.fit(train_data_paths)
train_data = workflow.transform(train_data_paths)
eval_data = workflow.transform(eval_data_paths)
Here is a preview of how the data looks before being transformed:
timestamp item_id category
0 [1397183957, 1399737202] [7, 10] [3, 5]
1 [1397196378, 1403440345, 1410009519] [169, 27, 52] [76, 12, 23]
2 [1397229436] [4] [2]
3 [1397206158, 1405004431] [31, 4] [14, 2]
4 [1397174848, 1407974276] [182, 105] [82, 47]
The NVTabular workflow will perform the following operations:
Categorify the category and item_id columns.
Transform the timestamp into a float representing the trigonometric sine of the day of the week, converting the timestamp into a cyclical temporal value (illustrated below).
Pad/crop each session to a fixed length of 20 items (defined in defaults.max_sequence_length).
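For intuition, the cyclical day-of-week encoding boils down to something like the following; the exact phase/offset used by nvt_workflow.py is an assumption here:
import datetime
import math

# encode the day of week on a sine wave so that days near the "wrap-around"
# (e.g. Sunday and Monday) stay numerically close instead of jumping from 6 to 0
def et_dayofweek_sin(ts_seconds: int) -> float:
    weekday = datetime.datetime.utcfromtimestamp(ts_seconds).weekday()  # 0 = Monday
    return math.sin(2 * math.pi * (weekday + 1) / 7)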
The transformed data looks like this:
category-list et_dayofweek_sin-list item_id-list
0 [4, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [-0.9749281, -0.781831, 0.0, 0.0, 0.0, 0.0, 0.... [2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1 [79, 13, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [-0.9749281, 1.1285199e-06, -0.781831, 0.0, 0.... [170, 27, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
2 [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [-0.9749281, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... [9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3 [15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [-0.9749281, -0.43388462, 0.0, 0.0, 0.0, 0.0, ... [31, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
4 [84, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... [-0.9749281, 0.43388295, 0.0, 0.0, 0.0, 0.0, 0... [183, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
Input And Output Schemas
The NVTabular Workflow that we have defined and fit will have an input schema and output schema that describe the shape and properties of the data that gets transformed by the Workflow.
See Understanding Schemas for more about the end-to-end schemas for inference.
When inspecting workflow.input_schema and workflow.output_schema, you will see the same column names as in the DataFrames displayed above, but the output_schema also contains much more information about the shape of the data, as well as the Tags that we applied to the different columns.
# workflow.output_schema
[
{
'name': 'category-list',
'tags': {<Tags.LIST: 'list'>, <Tags.CATEGORICAL: 'categorical'>},
'properties': {
'num_buckets': None,
'freq_threshold': 0,
'max_size': 0,
'start_index': 1,
'cat_path': './/categories/unique.category.parquet',
'domain': {'min': 0, 'max': 197, 'name': 'category'},
'embedding_sizes': {'cardinality': 198, 'dimension': 31},
'value_count': {'max': 20}
},
'dtype': dtype('int64'),
'is_list': True,
'is_ragged': False
},
{
'name': 'et_dayofweek_sin-list',
'tags': {<Tags.LIST: 'list'>, <Tags.CONTINUOUS: 'continuous'>},
'properties': {
'value_count': {'max': 20}},
'dtype': dtype('float32'),
'is_list': True,
'is_ragged': False
},
{
'name': 'item_id-list',
'tags': {<Tags.ITEM_ID: 'item_id'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.LIST: 'list'>, <Tags.ID: 'id'>, <Tags.ITEM: 'item'>},
'properties': {
'num_buckets': None,
'freq_threshold': 0,
'max_size': 0,
'start_index': 1,
'cat_path': './/categories/unique.item_id.parquet',
'domain': {'min': 0, 'max': 2226, 'name': 'item_id'},
'embedding_sizes': {'cardinality': 2227, 'dimension': 120},
'value_count': {'max': 20}
},
'dtype': dtype('int64'),
'is_list': True,
'is_ragged': False
}
]
This schema information is critical when moving to the training stage. We will use the workflow.output_schema to specify the input schema of our Transformers4Rec model. If you were to add features or otherwise change the NVTabular workflow, no change would be necessary to the Transformers4Rec model definition.
The workflow.input_schema is also very important because it defines the inference request format. There is more on how the input schema is used in Understanding Schemas.
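For example, you can inspect both schemas of the fitted workflow interactively and select columns by tag (a small illustration using the Merlin schema API):
from merlin.schema import Tags

# request-side and model-side column names of the fitted workflow
print(workflow.input_schema.column_names)   # ['category', 'timestamp', 'item_id']
print(workflow.output_schema.column_names)  # ['category-list', 'et_dayofweek_sin-list', 'item_id-list']

# tags make it easy to pull out specific column groups, e.g. the item-id column
print(workflow.output_schema.select_by_tag(Tags.ITEM_ID).column_names)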
Model Training With Transformers4Rec
As described previously, the model training process included in this AI workflow involves a transformer-based model trained over a range of dates specified for the training and validation periods. Transformer-based models have gained popularity in recent years in the natural language processing (NLP) domain, and this quickly carried over into the recommender systems domain. The Transformers4Rec library works as a bridge between NLP and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). The following figure shows the use of the library in a recommender system along with the underlying architectures adopted from the NLP domain.

What we implement in the training part of this workflow is a session-based recommendation model. This type of model considers short independent sequences (sessions) of interactions between users and items, which are commonly observed on e-commerce sites, news and media portals and on any other outlets where users choose to browse anonymously. This task is also relevant for scenarios where the users’ interests may change significantly over time depending on the user context or intent.
The workflow executes certain key steps during training that are further described below (note: the code is simplified to show only the most relevant lines).
Loading Data And Feature Engineering
The initial step is to specify the parameters (e.g. date ranges) for the training and evaluation datasets and load the datasets from the source directory.
# Definitions of the training/eval dates
CURRENT_DATE = os.getenv("CURRENT_DATE", "2014-05-08")
NUM_EVAL_DAYS = int(os.getenv("NUM_EVAL_DAYS", "2"))
NUM_TRAIN_DAYS = int(os.getenv("NUM_TRAIN_DAYS", "14"))
eval_dataset, train_dataset = training_datasets(
PREPROC_FOLDER,
datetime.datetime.strptime(CURRENT_DATE, "%Y-%m-%d"),
NUM_EVAL_DAYS,
NUM_TRAIN_DAYS,
)
As you may recall from the previous section, performing feature engineering and transformation on the training and evaluation datasets is accomplished as follows, by first defining an NVTabular workflow and then using this workflow to fit and transform:
workflow = define_nvt_workflow()
workflow.fit(train_dataset)
train_data = workflow.transform(train_dataset)
eval_data = workflow.transform(eval_dataset)
Model Definition
The next step is to define the components of the transformer model that you will be training. These include defining:
Metrics for model evaluation
The prediction task
The XLNet-based transformer model
In this workflow, two metrics that are commonly used in recommender systems will be used to evaluate the trained transformer model: Normalized Discounted Cumulative Gain (NDCG@k) and Recall@k (over a list of top-k items presented to the user). NDCG accounts for the rank of the relevant item in the recommendation list, while Recall@k considers only the relevancy of the items recommended in the top-k list. You can consider adding other metrics from here.
# Define the evaluation top-N metrics and the cut-offs
metrics = [
NDCGAt(top_ks=[20, 40], labels_onehot=True),
RecallAt(top_ks=[20, 40], labels_onehot=True),
]
Next, we define the prediction task, which is next item prediction in this AI workflow. Other prediction tasks can be found here.
# Define Next item prediction-task
prediction_task = tr.NextItemPredictionTask(weight_tying=True, metrics=metrics)
Finally, we define and build the XLNet-based transformer model. There is a variety of other network architectures one can use, which are listed here.
An overview of Transformers4Rec model architectures can be found here.
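The snippet below references an input_module built from the NVTabular output schema. A minimal sketch of constructing it is shown here; the parameter values are assumptions rather than the workflow's actual settings, and the Transformers4Rec version in the container is assumed to accept the Merlin schema object directly:
import transformers4rec.torch as tr

# build the sequential input module from the fitted NVTabular workflow's schema;
# max_sequence_length matches the padding length used in feature engineering,
# while d_model and masking are illustrative choices
max_sequence_length, d_model = 20, 64
input_module = tr.TabularSequenceFeatures.from_schema(
    workflow.output_schema,
    max_sequence_length=max_sequence_length,
    continuous_projection=64,
    aggregation="concat",
    d_output=d_model,
    masking="mlm",
)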
# Define the config of the XLNet Transformer architecture
transformer_config = tr.XLNetConfig.build(
d_model=d_model, n_head=8, n_layer=2, total_seq_length=max_sequence_length
)
# Get the end-to-end model
model = transformer_config.to_torch_model(input_module, prediction_task)
Training
The final step in the training portion of the workflow is to define a trainer object that will execute model training and evaluation:
training_args = tr.trainer.T4RecTrainingArguments(
output_dir=output_dir, **t4rec_training_arguments
)
trainer = tr.Trainer(
model=model, args=training_args, schema=schema, compute_metrics=True
)
Assuming the NVTabular workflow has already been used to fit and transform the data, the following lines will train the transformer model and evaluate it on the evaluation dataset using the evaluation metrics previously defined.
trainer.train_dataset_or_path = train_data
trainer.reset_lr_scheduler()
trainer.train()
trainer.eval_dataset_or_path = eval_data
eval_metrics = trainer.evaluate(metric_key_prefix="eval")
After the training is complete, the workflow will create and export the ensemble that will later be passed to the Triton Inference Server. This is described in the next section.
To serve a previously trained Merlin model at inference, we need to export an ensemble that the Triton Inference Server can understand. This ensemble includes two components:
The NVTabular workflow that provides the specifics of how the new data (that will be subject to inference) needs to be transformed before it is fed into prediction
The trained pytorch model that will be used to perform inference
The following lines will export the ensemble to the specified target directory.
export_pytorch_ensemble(
model,
workflow,
SPARSE_MAX,
_T4R_MODEL_NAME,
ENSEMBLE_OUTPUT_FOLDER,
)
We also utilize MLflow at this stage to register the three artifacts (NVTabular workflow, transformer model, and the ensemble) with MLflow for future retrieval and use. More information is available in the MLflow section of the Appendix.
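A minimal sketch of what that registration could look like with the MLflow client; the run name and artifact layout here are hypothetical, and the workflow's actual MLflow code may differ:
import os
import mlflow

# point at the tracking server configured in .env and log the evaluation
# metrics plus the exported ensemble directory for later retrieval
mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])
with mlflow.start_run(run_name="next-item-prediction"):
    mlflow.log_metrics({k: v for k, v in eval_metrics.items() if isinstance(v, (int, float))})
    mlflow.log_artifacts(ENSEMBLE_OUTPUT_FOLDER, artifact_path="ensemble")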
Debugging the model on Triton
To run the model locally using Triton Inference Server, run the command below. This will allow you to send requests and inspect the results for development purposes.
docker-compose up triton
The ensemble that you exported should be stored in ~/path/to/models/ensemble, which we will mount into the merlin-pytorch Docker container, and then run the tritonserver command.
For deploying a model to production on Kubernetes, see the Deployment guide in these docs.
One of the advantages of NVTabular is that it allows us to use the same feature transformation logic for training and serving our model.
The NVTabular workflow is fit to the training data and will infer an input schema and output schema. When using the Workflow to transform data, it uses the input schema to know which columns to read and how to manipulate them. This also means that it expects the data used for fit and transform to have the same columns and types. This includes the data used for real-time inference, which means that the NVTabular Workflow’s input_schema will define our API specification. Some care must be taken to ensure it all works together smoothly.
Inspecting The NVTabular Workflow Input Schema
Since the API is defined by the Workflow’s input schema, let’s take a look at what that looks like:
[
{
"name": "category",
"tags": set(),
"properties": {},
"dtype": dtype("int64"),
"is_list": True,
"is_ragged": True,
},
{
"name": "timestamp",
"tags": set(),
"properties": {},
"dtype": dtype("int64"),
"is_list": True,
"is_ragged": True,
},
{
"name": "item_id",
"tags": set(),
"properties": {},
"dtype": dtype("int64"),
"is_list": True,
"is_ragged": True,
},
]
Some things to note are:
There are 3 inputs: category, timestamp, and item_id.
They are all variable-length (ragged) lists.
The elements of each list are int64 type.
The way we send ragged lists to Triton Inference Server uses a somewhat unique format. Each input column is sent as two pieces: values and offsets.
Converting To Values And Offsets Format
One major reason for requiring the ragged list columns to be converted to a “values and offsets” format is that it is possible to send requests for multiple inferences at the same time. That may not be as immediately useful for this AI Workflow as it is for others, but the functionality is there and we need to do some special formatting to account for it.
A ragged list (or ragged tensor) in this context means that the length of each session is variable. The NVTabular Workflow will perform a padding/cropping operation to ensure that the input to the model is a fixed-length list with 20 items, but the input to the Workflow can vary. The NVTabular model expects these ragged list inputs to come as not one but two columns, which are named <column>__values and <column>__nnzs (aka “offsets”).
These names come from the way that the cuDF library represents list columns as leaves and offsets. The diagram below shows how a ragged input gets converted to arrays of values and offsets.

A bit of code might help explain how the conversion must happen. In the example below, we’ll make two different cuDF DataFrames - one with a single row (representing one session), and one with two rows. The two-row example is a better illustration of how the values and offsets are represented.
import cudf
df_2row = cudf.DataFrame(
{
"item_id": [[1, 2, 3, 4, 5], [101, 102]],
}
)
# item_id__values
values = df_2row.item_id.list.leaves
# [1, 2, 3, 4, 5, 101, 102]
# item_id__nnzs
offsets = df_2row.item_id._column.offsets
# [0, 5, 7]
In this two-input case, you can see that the values are concatenated into a single array. The offsets [0, 5, 7] indicate the indices where each row starts and ends.
This single-input case will be the most common for this AI Workflow and is a bit simpler to construct. In this case, the item_id__values value will be the same as session_item_ids and item_id__nnzs will be [0, len(session_item_ids)].
session_item_ids = [1,2,3,4,5]
df_1row = cudf.DataFrame(
{
"item_id": [session_item_ids],
}
)
df_1row.item_id.list.leaves
# [1, 2, 3, 4 ,5]
df_1row.item_id._column.offsets
# [0, 5]
Defining The HTTP Payload
Triton Inference Server uses the KServe Predict Protocol V2, which defines the HTTP/gRPC payload. You can find the OpenAPI spec and gRPC protobuf here.
An example HTTP payload for a session with 5 items is:
{
"id": "1",
"inputs": [
{
"name": "item_id__values",
"shape": [
5,
1
],
"datatype": "INT64",
"data": [
1,
2,
3,
4,
5
]
},
{
"name": "item_id__nnzs",
"shape": [
2,
1
],
"datatype": "INT64",
"data": [
0,
5
]
},
{
"name": "category__values",
"shape": [
5,
1
],
"datatype": "INT64",
"data": [
0,
0,
0,
0,
0
]
},
{
"name": "category__nnzs",
"shape": [
2,
1
],
"datatype": "INT64",
"data": [
0,
5
]
},
{
"name": "timestamp__values",
"shape": [
5,
1
],
"datatype": "INT64",
"data": [
1674198684,
1674198744,
1674198804,
1674198864,
1674198864
]
},
{
"name": "timestamp__nnzs",
"shape": [
2,
1
],
"datatype": "INT64",
"data": [
0,
5
]
}
],
"outputs": [
{
"name": "output"
}
]
}
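Instead of hand-writing the JSON, you can build the same request with the Triton Python client. The following is a minimal sketch; the ensemble model name "ensemble_model" is an assumption, so check your exported ensemble directory for the actual name:
import numpy as np
import tritonclient.http as triton_http

client = triton_http.InferenceServerClient(url="localhost:8000")

# same 5-item session as the JSON payload above
session_item_ids = [1, 2, 3, 4, 5]
session_categories = [0, 0, 0, 0, 0]
session_timestamps = [1674198684, 1674198744, 1674198804, 1674198864, 1674198864]
offsets = [0, len(session_item_ids)]

inputs = []
for name, values in [
    ("item_id__values", session_item_ids),
    ("item_id__nnzs", offsets),
    ("category__values", session_categories),
    ("category__nnzs", offsets),
    ("timestamp__values", session_timestamps),
    ("timestamp__nnzs", offsets),
]:
    arr = np.array(values, dtype=np.int64).reshape(-1, 1)
    infer_input = triton_http.InferInput(name, list(arr.shape), "INT64")
    infer_input.set_data_from_numpy(arr)
    inputs.append(infer_input)

response = client.infer("ensemble_model", inputs=inputs)
print(response.as_numpy("output"))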
The sample data that we are using to train this model will not help the model learn about your products and how users interact with items. To do that, you will need to swap the “Data Prep” stage of this AI Workflow and train on your own user/item interactions.
There are many tools people use to prepare this data, from Spark jobs to SQL in data warehouses. In order to have a drop-in replacement for our training data, you must produce parquet files with the following fields and types:
Column Name | Type | Description |
---|---|---|
session_id | int64 | An identifier for all transactions that occurred within the same “session”. |
timestamp | int64 | Seconds since epoch. |
item_id | int64 | The ID of the item in your catalog. |
category | int64 | The ID of the item category in your catalog. If you don’t have this, set them all to 0. |
If you use a finer resolution for timestamps, update the definition of session_time in nvt_workflow.py.
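As a concrete illustration, a drop-in parquet file with the required column names and dtypes could be produced like this (the path and values are placeholders):
import cudf

# toy interactions with the exact column names and int64 dtypes listed above
interactions = cudf.DataFrame(
    {
        "session_id": cudf.Series([1, 1, 2], dtype="int64"),
        "timestamp": cudf.Series([1674198684, 1674198744, 1674198804], dtype="int64"),
        "item_id": cudf.Series([214826705, 214829670, 214826705], dtype="int64"),
        "category": cudf.Series([0, 0, 0], dtype="int64"),
    }
)
# write wherever RAW_DATA_FOLDER points inside the container
interactions.to_parquet("/workspace/data/raw/interactions.parquet")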
Loading Data With NVTabular
NVTabular is able to read directly from Amazon/MinIO S3, Google Cloud Storage (GCS), and Azure Blob Storage via the Dataset class. You can replace our paths with your own cloud storage paths to read your data instead of the sample data. See the merlin.io.dataset.Dataset documentation for more details.
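For example (the bucket name and path below are placeholders):
from merlin.io import Dataset

# read parquet files directly from object storage instead of the local sample data;
# GCS ("gs://...") and Azure ("abfs://...") URLs work the same way via fsspec
train_dataset = Dataset("s3://your-bucket/interactions/train/*.parquet", engine="parquet")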
Reading From Data Warehouses
If you use a data warehouse such as Snowflake, BigQuery, or Redshift, you can replace the training_datasets function with queries to your warehouse.
eval_dataset, train_dataset = training_datasets(
PREPROC_FOLDER,
datetime.datetime.strptime(CURRENT_DATE, "%Y-%m-%d"),
NUM_EVAL_DAYS,
NUM_TRAIN_DAYS,
)
This will become something like the following, assuming your warehouse_client.query function returns a Pandas or cuDF DataFrame:
from merlin.io.dataset import Dataset
eval_dataset = Dataset(
warehouse_client.query("SELECT * FROM eval_data")
)
train_dataset = Dataset(
warehouse_client.query("SELECT * FROM train_data")
)
If you’re in a rush or want a quick summary of the commands to run, you can run the following commands in order to train and serve a model. All details of what is actually being executed are in the docker-compose.yaml file. Make sure you’ve configured your local dev environment first!
# Start MLflow if you haven't already
docker-compose -f docker-compose.mlflow.yaml up -d --build
# Create a docker image with the necessary dependencies
docker build . -f docker/Dockerfile.nextitem -t merlin-ai-workflow-t4r:latest
# Train and serve the model
docker-compose run data-prep
docker-compose run train-ensemble
docker-compose up triton
At this point you’ll have a Triton Server running on ports 8000 (HTTP) and 8001 (gRPC). To send some requests, run the included Python example file.
python src/triton_example_request.py
Continue with the rest of the guide to figure out what you just did.