Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Framework Inference

For CLIP models, our inference script calculates CLIP similarity scores between a given image and a list of provided texts.

To enable the inference stage with a CLIP model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired CLIP configuration file. For example, to use the clip/clip_similarity configuration, set the fw_inference field to clip/clip_similarity:

defaults:
  - fw_inference: clip/clip_similarity
  ...
  2. In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example,

stages:
  - fw_inference
  ...
  3. Configure the image_path and texts fields of conf/fw_inference/clip/clip_similarity.yaml. Set image_path to the path of the image to run inference on, and provide a list of texts for the texts field, for example as sketched below.
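
A minimal sketch of these two fields in conf/fw_inference/clip/clip_similarity.yaml; the image path and text prompts below are hypothetical placeholders:

image_path: /path/to/your/image.jpg
texts:
  - "a photo of a cat"
  - "a photo of a dog"
  - "a photo of a remote control"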

  4. Execute the launcher pipeline: python3 main.py.
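
Because the launcher is a Hydra application, configuration values can also be overridden on the command line. The overrides below are illustrative and assume the defaults shown above:

python3 main.py
# assumed Hydra-style overrides selecting the inference config and stage:
python3 main.py fw_inference=clip/clip_similarity stages=[fw_inference]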

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/fw_inference/clip/clip_similarity.yaml to the path of the pretrained checkpoint in .nemo format. By default, this field points to the .nemo checkpoint saved in the CLIP training checkpoints folder.
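
For example, the relevant part of conf/fw_inference/clip/clip_similarity.yaml might look like this; the checkpoint path below is a hypothetical placeholder:

model:
  restore_from_path: /path/to/checkpoints/megatron_clip.nemo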