Framework Inference

With NeVA models, the inference script generates responses to all the prompts provided in an input .jsonl file. Here’s an example of what an input file might look like:


{"image": "001.jpg", "prompt": "What is the name of this famous sight in the photo?\n<image>", "category": "conv", "question_id": 0} {"image": "001.jpg", "prompt": "Describe this photo in detail.\n<image>", "category": "detail", "question_id": 1} {"image": "001.jpg", "prompt": "What are the possible reasons for the formation of this sight?\n<image>", "category": "complex", "question_id": 2} ...

The crucial fields within each line are “image” and “prompt”. The trained NeVA model will generate responses for each line, which can be viewed both on the console and in the output file.
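If you need to assemble such an input file programmatically, a minimal Python sketch along the following lines can help. The image names, prompts, and output path below are illustrative placeholders, not values required by NeVA.

    import json

    # Illustrative entries; replace the images and prompts with your own.
    entries = [
        {"image": "001.jpg", "prompt": "What is the name of this famous sight in the photo?\n<image>",
         "category": "conv", "question_id": 0},
        {"image": "001.jpg", "prompt": "Describe this photo in detail.\n<image>",
         "category": "detail", "question_id": 1},
    ]

    # Write one JSON object per line -- the .jsonl layout shown above.
    with open("input_prompts.jsonl", "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")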

To facilitate inference with NeVA, follow these configuration steps:

  1. In the defaults section of conf/config.yaml, adjust the fw_inference field to reference your desired NeVA inference configuration file. For instance, to use the neva/inference.yaml configuration, set fw_inference to neva/inference:


    defaults:
      - fw_inference: neva/inference
      ...

  2. In the stages section of conf/config.yaml, ensure the fw_inference stage is present:


    stages:
      - fw_inference
      ...

  3. Inside conf/fw_inference/neva/inference.yaml, set the paths for prompt_file, inference.images_base_path, and neva_model_file so they point to the locations relevant to your inference task. In particular, neva_model_file must reference the trained NeVA checkpoint. A small path-validation sketch is included after these steps.


    inference:
      images_base_path: /path/to/images_associate_with_prompt_file/
    prompt_file: /path/to/input_jsonl_file.jsonl
    neva_model_file: /path/to/trained_neva_checkpoint.nemo

  4. Execute the launcher pipeline: python3 main.py
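Before launching the pipeline, it can help to confirm that every image referenced in prompt_file actually exists under inference.images_base_path. The following Python sketch is one way to do that; the two paths are placeholders and should match the values set in your inference.yaml.

    import json
    import os

    # Placeholder paths -- use the same values as in conf/fw_inference/neva/inference.yaml.
    images_base_path = "/path/to/images_associate_with_prompt_file/"
    prompt_file = "/path/to/input_jsonl_file.jsonl"

    missing = []
    with open(prompt_file) as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            image_path = os.path.join(images_base_path, entry["image"])
            if not os.path.exists(image_path):
                missing.append(image_path)

    if missing:
        print(f"{len(missing)} referenced image(s) not found:")
        for path in missing:
            print("  " + path)
    else:
        print("All referenced images were found.")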

Remarks:

  1. Ensure the value of run.model_train_name corresponds to the appropriate model size, either neva_llama2_7b_chat or neva_llama2_13b_chat.

  2. Ensure the tensor model parallel sizes are correctly assigned. By default, the 7B model uses tensor_model_parallel_size=4, while the 13B model uses tensor_model_parallel_size=8. A quick GPU-count sanity check is sketched below.
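A quick way to check the second remark is to compare the intended tensor parallel size against the number of visible GPUs. The sketch below assumes PyTorch is available, as it is inside the NeMo container:

    import torch

    # 4 for the 7B model, 8 for the 13B model (the defaults noted above).
    tensor_model_parallel_size = 4

    available_gpus = torch.cuda.device_count()
    if available_gpus < tensor_model_parallel_size:
        print(f"Only {available_gpus} GPU(s) visible; "
              f"tensor_model_parallel_size={tensor_model_parallel_size} will not fit on this node.")
    else:
        print(f"{available_gpus} GPU(s) visible; "
              f"tensor_model_parallel_size={tensor_model_parallel_size} is feasible.")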

This section outlines the steps to set up and use a Gradio server for interacting with NeMo models. Follow these instructions after entering the NeMo container environment.

Server Setup

  1. Edit Server Configuration:

    • Navigate to the server script at /opt/NeMo/examples/multimodal/mllm/neva/eval/gardio_server.py.

    • Update the script with the correct configurations and the path to your NeVA model.

  2. Install Gradio:

    • Run the following command to install Gradio:


      pip install gradio

  3. Environment Variables and Server Launch:

    • Set the necessary environment variables and start the Gradio server with the following commands:


      export NVTE_FLASH_ATTN=0
      export NVTE_FUSED_ATTN=0
      python /opt/NeMo/examples/multimodal/mllm/neva/eval/gardio_server.py

Querying the Server

  • Once the server is up and running, you can query it using a Python Gradio client.

  • The client sends text and image data as base64 encoded strings.

  • For an example of how to structure your client queries, refer to the script at /opt/NeMo/examples/multimodal/mllm/neva/eval/gardio_cli.py; a rough client sketch is also shown below.
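As a rough illustration of such a client, the sketch below uses the gradio_client package to send a base64-encoded image together with a text prompt. The server URL, the argument order, and the api_name are assumptions; consult gardio_cli.py for the exact interface your server exposes.

    import base64

    from gradio_client import Client

    # Assumed server address; adjust host and port to wherever the Gradio server runs.
    client = Client("http://localhost:7860")

    # Encode the image as a base64 string, as the server expects.
    with open("001.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = "What is the name of this famous sight in the photo?\n<image>"

    # The argument order and api_name below are placeholders; match them to the
    # Gradio interface actually defined in gardio_server.py.
    response = client.predict(prompt, image_b64, api_name="/predict")
    print(response)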
