Passing Data to Triton Inference Server#

This document provides an overview of how to pass data to a Triton Inference Server model—in this case, a Python backend model named “prediction_and_shapley”. According to the protocol buffer configuration file config.pbtxt produced by the Financial Fraud Training, the model expects four inputs and returns two outputs, and its configuration specifies the dynamic dimensions, static dimensions, and data types of each tensor.

Let’s take a closer look at the I/O section of the model configuration:

  name: "prediction_and_shapley"
  backend: "python"
  input [
    {
      name: "NODE_FEATURES"
      data_type: TYPE_FP32
      dims: [ -1, <NUM_INPUT_FEATURE> ]
    },
    {
      name: "EDGE_INDEX"
      data_type: TYPE_INT64
      dims: [ 2, -1 ]
    },
    {
      name: "COMPUTE_SHAP"
      data_type: TYPE_BOOL
      dims: [ 1 ]
    },
    {
      name: "FEATURE_MASK"
      data_type: TYPE_INT32
      dims: [ <NUM_INPUT_FEATURE> ]
    }
  ]


  output [
    {
      name: "PREDICTION"
      data_type: TYPE_FP32
      dims: [ -1, 1 ]
    },
    {
      name: "SHAP_VALUES"
      data_type: TYPE_FP32
      dims: [ -1, <NUM_INPUT_FEATURE> ]
    }
  ]

Model Configuration#

Inputs#

  • NODE_FEATURES

    • Data Type: FP32

    • Shape: [ -1, <NUM_INPUT_FEATURE> ]

    • Interpretation: A dynamic batch of node features, where each sample consists of <NUM_INPUT_FEATURE> floating-point values.

  • EDGE_INDEX

    • Data Type: INT64

    • Shape: [ 2, -1 ]

    • Interpretation: A tensor representing edge indices in a graph. The first dimension is fixed (2 rows) while the second dimension is dynamic (number of edges).

  • COMPUTE_SHAP

    • Data Type: BOOL

    • Shape: [ 1 ]

    • Interpretation: A single boolean flag (wrapped in an array) to indicate whether SHAP values should be computed.

  • FEATURE_MASK

    • Data Type: INT32

    • Shape: [ <NUM_INPUT_FEATURE> ]

    • Interpretation: FEATURE_MASK defines a mask over the node features, grouping features that should be added together. Values in FEATURE_MASK must be integers in the range 0 to the number of node features - 1, and indices corresponding to the same logical feature must share the same value. For example, if a customer_id is encoded into a 10-dimensional feature vector, the mask value should be identical across all 10 dimensions corresponding to that customer_id (see the sketch following this list).
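
To make the grouping concrete, the following is a minimal sketch of how such a mask could be built with NumPy. The column layout (ten one-hot columns for customer_id followed by four standalone numeric columns) and the feature count are hypothetical and chosen only for illustration:

import numpy as np

# Hypothetical layout: columns 0-9 hold the encoded customer_id, columns 10-13 are standalone features.
num_features = 14                                     # stands in for <NUM_INPUT_FEATURE>
feature_mask = np.empty(num_features, dtype=np.int32)
feature_mask[:10] = 0                                 # all 10 customer_id columns share group id 0
feature_mask[10:] = np.arange(1, 5)                   # each remaining column gets its own group id
print(feature_mask)                                   # [0 0 0 0 0 0 0 0 0 0 1 2 3 4]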

Outputs#

  • PREDICTION

    • Data Type: FP32

    • Shape: [ -1, 1 ]

    • Interpretation: The model’s predictions for each input sample.

  • SHAP_VALUES

    • Data Type: FP32

    • Shape: [ -1, <NUM_INPUT_FEATURE> ]

    • Interpretation: SHAP values for each feature per sample, used for interpretability.

Additional Model Parameters#

The model configuration includes parameters such as "in_channels", "hidden_channels", "out_channels", "n_hops", and file paths like "embedding_generator_model_state_dict" and "embeddings_based_xgboost_model". These parameters configure model internals. While they do not affect how data is passed at inference time, they determine how the backend processes the inputs.


Preparing and Passing Data#

When passing data to this model, ensure each input is a NumPy array (or a similar structure) with the correct shape and data type:

  • NODE_FEATURES:
    Create a NumPy array of type np.float32 with shape (batch_size, <NUM_INPUT_FEATURE>).

  • EDGE_INDEX:
    Create a NumPy array of type np.int64 with shape (2, num_edges).

  • COMPUTE_SHAP:
    Create a NumPy array of type bool with shape (1,). For example:

      np.array([True], dtype=bool)

  • FEATURE_MASK:
    Create a NumPy array of type np.int32 with shape (<NUM_INPUT_FEATURE>,), where all entries belonging to the same encoded feature share the same integer value.

Data Types and Shape#

Ensure each of these arrays matches the data type (for example, FP32 for floats, INT64 for integers, BOOL for booleans, and so on) and the dimensions defined in the model configuration. The dynamic dimensions (indicated by -1) let you vary the number of samples or edges, as long as the fixed dimensions (<NUM_INPUT_FEATURE> for node features and 2 for edge-index rows) are maintained.
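
The following is an optional, minimal sanity-check helper (not part of the model or the Triton client API; the function name is hypothetical) that encodes these shape and type constraints before a request is sent:

import numpy as np

def validate_inputs(node_features, edge_index, compute_shap, feature_mask, num_input_features):
    """Lightweight checks mirroring the dtypes and shapes declared in config.pbtxt."""
    assert node_features.dtype == np.float32 and node_features.ndim == 2
    assert node_features.shape[1] == num_input_features
    assert edge_index.dtype == np.int64 and edge_index.shape[0] == 2
    assert compute_shap.dtype == np.bool_ and compute_shap.shape == (1,)
    assert feature_mask.dtype == np.int32 and feature_mask.shape == (num_input_features,)
    assert 0 <= feature_mask.min() and feature_mask.max() < num_input_features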

Using the Triton Client Libraries#

You need to prepare batch data in the same way you prepared your training data. For demonstration purposes, the following code snippet uses random data. Note also that the feature mask should be constructed so that all indices corresponding to the same encoded feature share the same value.

Example with HTTP Client (Python)#

The following is an example code snippet using the Triton Python HTTP client to create an inference request:

import numpy as np
import tritonclient.http as httpclient

# Parameter: Define the number of node features
in_channels = <NUM_INPUT_FEATURE>  # Replace with the number of node features expected by the model

# Initialize the Triton client
triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Prepare input data
batch_size = 5            # batch size
num_edges = 10            # number of edges

node_features = np.random.rand(batch_size, in_channels).astype(np.float32)
edge_index = np.random.randint(0, batch_size, size=(2, num_edges)).astype(np.int64)
compute_shap = np.array([True], dtype=bool)
# Random mask for demonstration only; in practice, indices of the same encoded feature share a value.
feature_mask = np.random.randint(0, 2, size=(in_channels,)).astype(np.int32)

# Create Triton input objects
inputs = []
inputs.append(httpclient.InferInput("NODE_FEATURES", node_features.shape, "FP32"))
inputs[-1].set_data_from_numpy(node_features)

inputs.append(httpclient.InferInput("EDGE_INDEX", edge_index.shape, "INT64"))
inputs[-1].set_data_from_numpy(edge_index)

inputs.append(httpclient.InferInput("COMPUTE_SHAP", compute_shap.shape, "BOOL"))
inputs[-1].set_data_from_numpy(compute_shap)

inputs.append(httpclient.InferInput("FEATURE_MASK", feature_mask.shape, "INT32"))
inputs[-1].set_data_from_numpy(feature_mask)

# Specify outputs to retrieve
outputs = []
outputs.append(httpclient.InferRequestedOutput("PREDICTION"))
outputs.append(httpclient.InferRequestedOutput("SHAP_VALUES"))

# Perform the inference request
response = triton_client.infer("prediction_and_shapley", inputs=inputs, outputs=outputs)

# Retrieve the results
prediction = response.as_numpy("PREDICTION")
shap_values = response.as_numpy("SHAP_VALUES")

print("Prediction:", prediction)
print("SHAP Values:", shap_values)


The Triton Python client libraries (both HTTP and gRPC) simplify the process by handling serialization, but you can also send raw JSON payloads to the REST API if needed, as in the sketch below.
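
The following is a minimal sketch of such a raw JSON request against Triton’s KServe v2 REST endpoint, using the requests package. It assumes the server listens on the default HTTP port 8000 and reuses the same kind of random demonstration data as above:

import numpy as np
import requests

in_channels = <NUM_INPUT_FEATURE>  # Replace with the number of node features expected by the model
batch_size, num_edges = 5, 10

node_features = np.random.rand(batch_size, in_channels).astype(np.float32)
edge_index = np.random.randint(0, batch_size, size=(2, num_edges)).astype(np.int64)
feature_mask = np.random.randint(0, 2, size=(in_channels,)).astype(np.int32)

# Build the KServe v2 inference payload; tensor data is sent flattened in row-major order.
payload = {
    "inputs": [
        {"name": "NODE_FEATURES", "shape": list(node_features.shape), "datatype": "FP32",
         "data": node_features.flatten().tolist()},
        {"name": "EDGE_INDEX", "shape": list(edge_index.shape), "datatype": "INT64",
         "data": edge_index.flatten().tolist()},
        {"name": "COMPUTE_SHAP", "shape": [1], "datatype": "BOOL", "data": [True]},
        {"name": "FEATURE_MASK", "shape": list(feature_mask.shape), "datatype": "INT32",
         "data": feature_mask.flatten().tolist()},
    ],
    "outputs": [{"name": "PREDICTION"}, {"name": "SHAP_VALUES"}],
}

response = requests.post("http://localhost:8000/v2/models/prediction_and_shapley/infer", json=payload)
result = response.json()
for output in result["outputs"]:
    print(output["name"], output["data"])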