Retail Object Recognition

This model encodes retail items into embedding vectors and predicts their labels by matching those embeddings against the embedding vectors in the reference space.
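
The matching step in the reference space is not spelled out in this section. As an illustration only, the sketch below assumes a cosine-similarity nearest-neighbor rule over a gallery of reference embeddings; the gallery, the labels, and the metric are hypothetical, not the model's published inference code:

    import numpy as np

    def predict_label(query_emb, ref_embs, ref_labels):
        """query_emb: (2048,), ref_embs: (N, 2048), ref_labels: list of N labels."""
        # L2-normalize so the dot product equals cosine similarity
        q = query_emb / np.linalg.norm(query_emb)
        r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
        sims = r @ q  # (N,) cosine similarities
        return ref_labels[int(np.argmax(sims))]

    # Toy usage: a 3-item reference gallery of 2048-d embeddings
    gallery = np.random.randn(3, 2048).astype(np.float32)
    labels = ["cereal", "soda", "shampoo"]
    query = gallery[1] + 0.01 * np.random.randn(2048)
    print(predict_label(query, gallery, labels))  # almost certainly "soda"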

The model consists of a trunk and an embedder. The trunk uses the ResNet101 architecture with its fully connected layer removed. The embedder is a one-layer perceptron with an input size of 2048 (the output size of the average pooling layer in ResNet101) and an output size of 2048. The embedding dimension of the Retail Embedding model is therefore 2048.
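
As a concrete illustration of the description above, here is a minimal PyTorch sketch; the class name and the use of torchvision's resnet101 are assumptions for illustration, and the released model is a TAO artifact rather than this code:

    import torch
    import torch.nn as nn
    from torchvision.models import resnet101

    class RetailEmbedder(nn.Module):
        """Hypothetical sketch: ResNet101 trunk (FC removed) + one-layer embedder."""

        def __init__(self, embed_dim: int = 2048):
            super().__init__()
            trunk = resnet101()
            # Drop the final fully connected layer; keep the conv stages + avg pool
            self.trunk = nn.Sequential(*list(trunk.children())[:-1])
            self.embedder = nn.Linear(2048, embed_dim)  # one-layer perceptron

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            feats = self.trunk(x).flatten(1)  # (B, 2048) pooled features
            return self.embedder(feats)       # (B, 2048) embedding

    model = RetailEmbedder().eval()
    with torch.no_grad():
        emb = model(torch.randn(1, 3, 224, 224))  # matches infer-dims 3;224;224
    print(emb.shape)  # torch.Size([1, 2048])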

Model Card

More details on the model can be found on the model card.

Deploy With DeepStream

To deploy this model with the Perception App, use the configuration below to override the default settings:

property:
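    ## 1/255: rescale 8-bit pixel values to the [0, 1] range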
    net-scale-factor: 0.003921568627451
    offsets: 0;0;0
    model-color-format: 0
    tlt-model-key: nvidia_tlt
    tlt-encoded-model: ../../models/retailEmbedder/retailEmbedder.etlt
    model-engine-file: ../../models/retailEmbedder/retailEmbedder.etlt_b16_gpu0_fp16.engine
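    ## input dimensions in channel;height;width order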
    infer-dims: 3;224;224
    batch-size: 16
    ## 0=FP32, 1=INT8, 2=FP16 mode
    network-mode: 2
    network-type: 100
    interval: 0
    ## Infer Processing Mode 1=Primary Mode 2=Secondary Mode
    process-mode: 2
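    ## attach the raw output tensor (the embedding) to the metadata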
    output-tensor-meta: 1
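
Because output-tensor-meta is set to 1, nvinfer attaches the raw output tensor (here, the 2048-d embedding) to each object's metadata. Below is a hedged sketch of how a DeepStream Python (pyds) pad probe could read that embedding back, modeled on the tensor-meta handling in NVIDIA's Python sample apps; attaching the probe to the source pad of this secondary GIE, and the surrounding pipeline, are assumed and not shown:

    import ctypes
    import numpy as np
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    def sgie_src_pad_buffer_probe(pad, info, u_data):
        # Walk batch -> frames -> objects -> user meta to find the tensor
        # output attached by nvinfer when output-tensor-meta is enabled.
        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            l_obj = frame_meta.obj_meta_list
            while l_obj is not None:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
                l_user = obj_meta.obj_user_meta_list
                while l_user is not None:
                    user_meta = pyds.NvDsUserMeta.cast(l_user.data)
                    if user_meta.base_meta.meta_type == \
                            pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                        tensor_meta = pyds.NvDsInferTensorMeta.cast(
                            user_meta.user_meta_data)
                        layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # single output
                        ptr = ctypes.cast(pyds.get_ptr(layer.buffer),
                                          ctypes.POINTER(ctypes.c_float))
                        embedding = np.ctypeslib.as_array(ptr, shape=(2048,)).copy()
                        # `embedding` is this object's 2048-d vector; match it
                        # against the reference space here.
                    l_user = l_user.next
                l_obj = l_obj.next
            l_frame = l_frame.next
        return Gst.PadProbeReturn.OK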

Note

The sample perception app has an example configuration file packaged at /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-fewshot-learning-app/configs/fsl/fsl_sgie_config.txt.

The “Deploying to DeepStream” chapter of the TAO User Guide provides more details.