Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

BERT Embedding Models

Sentence-BERT (SBERT) is a modification of the Bidirectional Encoder Representations from Transformers (BERT) model that is trained specifically to generate semantically meaningful sentence embeddings. The model architecture and training process are detailed in the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. SBERT keeps the BERT architecture but is trained with Siamese and triplet network structures to derive fixed-size sentence embeddings that capture semantic information. These embeddings are commonly used in downstream natural language processing tasks such as semantic textual similarity, clustering, and information retrieval.

Prepare the Data

The fine-tuning data for the SBERT model consists of data instances, each comprising a query, a positive document, and a list of negative documents. Negative mining is not yet supported in NeMo, so the negative documents must be selected offline, before training. The dataset should be in JSON format, with the following structure:

[
    {
        "query": "Query",
        "pos_doc": "Positive",
        "neg_doc": ["Negative_1", "Negative_2", ..., "Negative_n"]
    },
    {
        // Next data instance
    },
    ...,
    {
        // Subsequent data instance
    }
]

The comments and ellipses above are placeholders. An actual dataset file must be valid JSON in which every data instance provides the query, pos_doc, and neg_doc fields.
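Because negatives have to be prepared offline, a small script can assemble and check the dataset before training. The following is a minimal sketch, not part of NeMo: the helper names build_records and validate_records are hypothetical, and the negatives are drawn at random from the positives of other queries, which is an assumption made for illustration and usually a weaker strategy than hard-negative mining.

```python
import json
import random


def build_records(pairs, num_negatives, seed=0):
    """Attach randomly sampled negatives to each (query, positive) pair.

    Random sampling is an illustrative assumption, not a NeMo feature.
    """
    rng = random.Random(seed)
    records = []
    for i, (query, pos_doc) in enumerate(pairs):
        # Candidate negatives: positives that belong to the other queries.
        candidates = [
            doc for j, (_, doc) in enumerate(pairs) if j != i and doc != pos_doc
        ]
        k = min(num_negatives, len(candidates))
        records.append(
            {"query": query, "pos_doc": pos_doc, "neg_doc": rng.sample(candidates, k)}
        )
    return records


def validate_records(records):
    """Raise ValueError if a record deviates from the structure shown above."""
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be a list")
    for idx, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"instance {idx} is not an object")
        for key in ("query", "pos_doc"):
            if not isinstance(rec.get(key), str):
                raise ValueError(f"instance {idx}: '{key}' must be a string")
        neg = rec.get("neg_doc")
        if not isinstance(neg, list) or not all(isinstance(d, str) for d in neg):
            raise ValueError(f"instance {idx}: 'neg_doc' must be a list of strings")


# Toy data for demonstration only.
pairs = [
    ("what is bert", "BERT is a bidirectional transformer encoder."),
    ("capital of france", "Paris is the capital of France."),
    ("boiling point of water", "Water boils at 100 degrees Celsius at sea level."),
]
records = build_records(pairs, num_negatives=2)
validate_records(records)

# Serialize in the same layout as the structure shown above.
serialized = json.dumps(records, indent=4, ensure_ascii=False)
```

Writing serialized to a file yields a dataset in the layout described in this section. Running validate_records on a file loaded with json.load is a cheap way to catch malformed instances before a training job is queued.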

Convert the Checkpoint

To fine-tune the SBERT model, you must initialize it with a BERT model checkpoint. You have two options for obtaining this checkpoint:

If you already have a .nemo checkpoint for SBERT, you can use it directly.
If you have a Hugging Face BERT checkpoint, you’ll need to convert it to the NeMo Megatron Core (mcore) format by running the following conversion script:

python NeMo/scripts/nlp_language_modeling/convert_bert_hf_to_nemo.py \
       --input_name_or_path "intfloat/e5-large-unsupervised" \
       --output_path /path/to/output/nemo/file.nemo \
       --mcore True \
       --precision 32

Fine-tune the Model

Set the configuration for the fine-tuning pipeline in conf/config.yaml:

Set the fine_tuning configuration to the file that defines the training job. For BERT Embedding models, set it to bert_embedding/sft.
Include fine_tuning in stages to run the training pipeline.
Update the configuration to adjust the hyperparameters of the training runs.

Configure the Slurm Cluster

Set the configuration for your Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Set the job-specific training configurations in the run section of conf/fine_tuning/bert_embedding/sft.yaml:

run:
    name: bertembedding
    results_dir: ${base_results_dir}/${.name}
    time_limit: "4:00:00"
    dependency: "singleton"

To run only the fine-tuning pipeline, set the stages section of conf/config.yaml to:

stages:
  - fine_tuning

Enter the following command:

python3 main.py

Configure the Base Command Platform

Select the cluster-related configuration by following the information in the NVIDIA Base Command Platform documentation.
Launch the job and override the training job values of any configurations that need to be updated. Enter the following command:

python3 main.py
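
As an illustration of overriding configuration values at launch time, the sketch below assembles a main.py command line with key=value arguments. It assumes that main.py accepts Hydra-style command-line overrides. The key paths are inferred from the configuration files described in this section, and the helper build_launch_command and the run name are hypothetical. The command is only printed, not executed.

```python
import shlex


def build_launch_command(overrides):
    """Return the argv list for `python3 main.py` plus key=value overrides."""
    argv = ["python3", "main.py"]
    for key, value in overrides.items():
        argv.append(f"{key}={value}")
    return argv


# Assumed override keys, inferred from conf/config.yaml and
# conf/fine_tuning/bert_embedding/sft.yaml as described in this section.
overrides = {
    "stages": "[fine_tuning]",
    "fine_tuning": "bert_embedding/sft",
    "fine_tuning.run.name": "bertembedding_run2",  # hypothetical run name
}

argv = build_launch_command(overrides)

# shlex.join quotes arguments that a shell would otherwise interpret.
print(shlex.join(argv))
```

Building the arguments as a list and quoting them with shlex.join keeps values that contain brackets from being interpreted by the shell when the command is pasted into a terminal.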