Framework Inference

For CLIP models, our inference script calculates CLIP similarity scores between a given image and a list of provided texts.

To enable the inference stage with a CLIP model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the fw_inference field to point to the desired CLIP configuration file. For example, to use the clip/clip_similarity configuration, set fw_inference to clip/clip_similarity:

defaults:
  - fw_inference: clip/clip_similarity
  ...

  2. In the stages field of conf/config.yaml, make sure the fw_inference stage is included. For example:

stages:
  - fw_inference
  ...

  3. Configure the image_path and texts fields of conf/fw_inference/clip/clip_similarity.yaml. Set image_path to the path of the image to run inference on, and provide a list of texts for the texts field, for example:
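
A minimal sketch of these fields with illustrative values (the image path and captions below are placeholders, not defaults shipped with the config):

# conf/fw_inference/clip/clip_similarity.yaml (excerpt; values are illustrative)
image_path: /path/to/image.jpg   # image to score
texts:                           # candidate texts to compare against the image
  - "a photo of a dog"
  - "a photo of a cat"
  - "a photo of a car"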

  4. Execute the launcher pipeline: python3 main.py.
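
Because the launcher is configured through Hydra, the same settings can typically also be passed as command-line overrides instead of editing conf/config.yaml; the invocation below is a sketch under that assumption:

# illustrative invocation; assumes Hydra-style overrides are accepted by the launcher
python3 main.py stages=[fw_inference] fw_inference=clip/clip_similarity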

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/fw_inference/clip/clip_similarity.yaml to the path of the pretrained checkpoint in .nemo format. By default, this field points to the .nemo checkpoint saved in the CLIP training checkpoints folder, as sketched below.
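
A hedged excerpt showing where that field lives; the checkpoint path here is a placeholder, not the actual default:

# conf/fw_inference/clip/clip_similarity.yaml (excerpt; path is illustrative)
model:
  restore_from_path: /results/clip/checkpoints/megatron_clip.nemo  # pretrained CLIP checkpoint in .nemo format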