Evaluate Megatron Bridge Checkpoints Trained by NeMo Framework#
This guide provides step-by-step instructions for evaluating Megatron Bridge checkpoints trained using the NeMo Framework with the Megatron Core backend. This section specifically covers evaluation with nvidia-lm-eval, a wrapper around the lm-evaluation-harness tool.
First, we focus on benchmarks in the lm-evaluation-harness that rely on text generation. Evaluation on log-probability-based benchmarks is covered below in Evaluate Megatron Bridge Checkpoints on Log-probability Benchmarks.
Deploy Megatron Bridge Checkpoints#
To evaluate a checkpoint saved during pretraining or fine-tuning with Megatron Bridge, provide the path to the saved checkpoint via the --megatron_checkpoint flag in the deployment command below. Alternatively, a Hugging Face checkpoint can be converted to the Megatron Bridge format with the following shell commands:
huggingface-cli login --token <your token>
python -c "from megatron.bridge import AutoBridge; AutoBridge.import_ckpt('meta-llama/Meta-Llama-3-8B','/workspace/mbridge_llama3_8b/')"
The deployment scripts are available in the /opt/Export-Deploy/scripts/deploy/nlp/ directory. Below is an example deployment command. It uses a Hugging Face Llama 3 8B checkpoint that has been converted to the Megatron Bridge format using the commands above.
python /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
    --megatron_checkpoint "/workspace/mbridge_llama3_8b/iter_0000000" \
    --model_id "megatron_model" \
    --port 8080 \
    --num_gpus 4 \
    --num_replicas 2 \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1 \
    --context_parallel_size 1
Note
Megatron Bridge creates checkpoints in directories named iter_N, where N is the iteration number. Each iter_N directory contains model weights and related artifacts. When using a checkpoint, make sure to provide the path to the appropriate iter_N directory. Hugging Face checkpoints converted for Megatron Bridge are typically stored in a directory named iter_0000000, as shown in the command above.
Note
Megatron Bridge deployment for evaluation is supported only with Ray Serve and not PyTriton.
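Once the Ray Serve deployment is up, you can optionally send a single completion request to confirm that the server responds before launching a full evaluation. The sketch below assumes the deployment exposes an OpenAI-compatible /v1/completions route on port 8080 with the same model_id used in the deployment command above; the exact payload fields may vary with your server version.

import requests

# Smoke-test the deployed endpoint; URL and model_id match the deployment command above.
# The payload follows the OpenAI completions schema, which the evaluation client also uses.
response = requests.post(
    "http://0.0.0.0:8080/v1/completions/",
    json={"model": "megatron_model", "prompt": "Hello, my name is", "max_tokens": 16},
    timeout=120,
)
response.raise_for_status()
print(response.json())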
Evaluate Megatron Bridge Checkpoints#
Once deployment is successful, you can run evaluations using the NeMo Evaluator API. See NeMo Evaluator for more details.
Before starting the evaluation, it’s recommended to use the check_endpoint function to verify that the endpoint is responsive and ready to accept requests.
from nemo_evaluator.api import check_endpoint, evaluate
from nemo_evaluator.api.api_dataclasses import (
    ApiEndpoint,
    ConfigParams,
    EvaluationConfig,
    EvaluationTarget,
)

# Configure the evaluation target
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
    type="completions",
    model_id="megatron_model",
)
eval_target = EvaluationTarget(api_endpoint=api_endpoint)
eval_params = ConfigParams(top_p=0, temperature=0, limit_samples=2, parallelism=1)
eval_config = EvaluationConfig(type="mmlu", params=eval_params, output_dir="results")

if __name__ == "__main__":
    check_endpoint(
        endpoint_url=eval_target.api_endpoint.url,
        endpoint_type=eval_target.api_endpoint.type,
        model_name=eval_target.api_endpoint.model_id,
    )
    evaluate(target_cfg=eval_target, eval_cfg=eval_config)
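The evaluation results are written to the directory specified by output_dir ("results" in this example). Note that limit_samples=2 caps the number of evaluated samples so the run completes quickly as a functional check; increase it to run the full MMLU benchmark.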
Evaluate Megatron Bridge Checkpoints on Log-probability Benchmarks#
To evaluate Megatron Bridge checkpoints on benchmarks that require log-probabilities, use the same deployment command provided in Deploy Megatron Bridge Checkpoints.
For evaluation, you must specify the path to the tokenizer and set the tokenizer_backend parameter as shown below. The tokenizer files are located within the tokenizer directory of the checkpoint.
from nemo_evaluator.api import check_endpoint, evaluate
from nemo_evaluator.api.api_dataclasses import (
    ApiEndpoint,
    ConfigParams,
    EvaluationConfig,
    EvaluationTarget,
)

# Configure the evaluation target
api_endpoint = ApiEndpoint(
    url="http://0.0.0.0:8080/v1/completions/",
    type="completions",
    model_id="megatron_model",
)
eval_target = EvaluationTarget(api_endpoint=api_endpoint)
eval_params = ConfigParams(
    top_p=0,
    temperature=0,
    limit_samples=1,
    parallelism=1,
    extra={
        "tokenizer": "/workspace/mbridge_llama3_8b/iter_0000000/tokenizer",
        "tokenizer_backend": "huggingface",
    },
)
eval_config = EvaluationConfig(
    type="arc_challenge", params=eval_params, output_dir="results"
)

if __name__ == "__main__":
    check_endpoint(
        endpoint_url=eval_target.api_endpoint.url,
        endpoint_type=eval_target.api_endpoint.type,
        model_name=eval_target.api_endpoint.model_id,
    )
    evaluate(target_cfg=eval_target, eval_cfg=eval_config)