Post-training With GR00T#
Setup GR00T#
If you haven’t already installed GR00T during the setup module, let’s do it now:
# Install the dependencies for GR00T N1
bash tools/env_setup_robot_us.sh --policy gr00tn1
# Download the model weights for GR00T N1
i4h-asset-retrieve --sub-path Policies/LiverScan/GR00TN1
# Example for GR00T N1
python -m policy_runner.run_policy --policy gr00tn1
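If you want to confirm that the GR00T dependencies are importable before moving on, a quick check such as the one below can help. This is only a sketch; the gr00t package name is an assumption based on the Isaac-GR00T repository and may differ in your environment.
# Quick sanity check that the GR00T dependencies are importable.
# The package name "gr00t" is an assumption based on the Isaac-GR00T repository;
# adjust it if your installation uses a different name.
import importlib

for module_name in ("torch", "gr00t"):
    try:
        importlib.import_module(module_name)
        print(f"{module_name}: OK")
    except ImportError as exc:
        print(f"{module_name}: not found ({exc})")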
Launch GR00T N1 Finetuning#
Vision-language-action (VLA) models such as GR00T combine perception with language understanding to predict actions for robots or agents.
Perception: Receiving visual inputs (e.g. camera or simulated sensor data)
Understanding: Interpreting these inputs in the context of language instructions, for example: “Perform a liver scan”
Control: Outputting actions that can be executed by a robot or agent, like a desired end-effector pose.
This integration makes VLA models especially powerful for embodied AI, robotics, and simulation-driven tasks, where understanding the environment and responding intelligently are tightly coupled.
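To make the perception–understanding–control loop concrete, the sketch below shows what a single VLA inference step could look like. The policy interface and observation keys are hypothetical and chosen only to illustrate the flow; they do not correspond to the GR00T API.
# Illustrative sketch of one VLA inference step; the interface is hypothetical,
# not the actual GR00T API.
import numpy as np

observation = {
    "camera_rgb": np.zeros((224, 224, 3), dtype=np.uint8),  # perception: visual input
    "instruction": "Perform a liver scan",                   # understanding: language context
}

def predict_action(obs: dict) -> np.ndarray:
    # A real VLA model encodes the image and the instruction together and decodes an action;
    # here we return a placeholder 6-DoF end-effector delta plus a probe/gripper command.
    return np.zeros(7, dtype=np.float32)

action = predict_action(observation)  # control: an executable robot command
print(action)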
Finetuning a VLA model typically means adapting a large, pretrained foundation model to a specific environment or task. In our case, we want to finetune GR00T N1. Our training script wraps the training module from the repo, uses the data we have prepared, and follows the implementation of the gr00t_finetune.py script.
We start from the pretrained GR00T model (N1 or N1.5) and post-train it on your own demonstration data. The default configuration does not use LoRA (Low-Rank Adaptation), so all model parameters are unfrozen and updated during training. Note that we use the EmbodimentTag new_embodiment to indicate that we are using a custom robot. We encourage you to experiment with the finetuning parameters and to have a look at the finetuning tutorial.
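The sketch below illustrates the practical difference between the default full-parameter finetuning and a LoRA-style run: without LoRA, every parameter keeps requires_grad=True and receives gradient updates. The model here is a small stand-in, not GR00T itself.
# Stand-in model to illustrate full-parameter finetuning vs. a LoRA-style run
# (this is not GR00T itself).
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 7))

def trainable_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

# Default configuration (no LoRA): every parameter stays unfrozen and is updated.
print("full finetuning:", trainable_params(model))

# LoRA-style alternative (for comparison): freeze the base weights and train only
# small adapter layers.
for p in model.parameters():
    p.requires_grad = False
adapter = nn.Linear(64, 7)  # placeholder for low-rank adapter weights
print("adapter-only:", trainable_params(adapter))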
Code Instructions#
Finetuning GR00T#
Modify this line to point to the dataset you have prepared in the previous step.
export DATASET_PATH="<user_dir>/.cache/huggingface/lerobot/gr00t_ultrasound"
export OUTPUT_DIR="<your_model_training_folder>"
Run the finetuning code.
cd "$I4H_HOME/workflows/robotic_ultrasound/scripts/training/gr00t_n1"
python train.py \
--data_config single_panda_us \
--dataset_path "$DATASET_PATH" \
--output-dir "$OUTPUT_DIR"
Command Arguments:
--data_config single_panda_us: Specifies the data configuration for a single Panda robot with an ultrasound setup. It is mapped to this configuration class, which is specific to our setup.
--dataset_path: Path to the HuggingFace dataset containing the robotic ultrasound training data; here we point to the data we collected in the previous chapter.
--output-dir: Directory where training outputs, checkpoints, and logs will be saved. Specify a location on your filesystem; we will load the trained model from this directory in the rollout phase.
There are more configuration options in the training script. Have a look at the Config class if you would like to change the base model, the checkpoint saving frequency, or any learning-related parameters.
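As a rough illustration of the kind of options such a configuration exposes, the following dataclass sketches plausible fields (base model, data configuration, checkpoint frequency, learning rate). The field names and defaults are assumptions made for illustration; consult the actual Config class in train.py for the real options.
# Hypothetical sketch of the kind of options a finetuning config exposes.
# Field names and defaults are assumptions for illustration, not the actual
# Config class from train.py.
from dataclasses import dataclass

@dataclass
class FinetuneConfig:
    base_model_path: str = "nvidia/GR00T-N1-2B"  # assumed pretrained checkpoint identifier
    data_config: str = "single_panda_us"         # data configuration used in the command above
    dataset_path: str = "/path/to/lerobot/gr00t_ultrasound"
    output_dir: str = "/path/to/output"
    save_every: int = 500                        # checkpoint saving frequency (see the note below)
    learning_rate: float = 1e-4                  # assumed default; adjust as needed

print(FinetuneConfig())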
Note
By default, the training script saves checkpoints every 500 epochs. If you want to test your own model, you will need to wait for the first checkpoint to be saved. Don’t expect the model to perform well, given the small dataset and short training time. A pre-trained model is available in the cached assets folder for the next section.
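To check whether a checkpoint has been written yet, you can list the output directory as in the sketch below. The checkpoint-<step> naming is an assumption based on the Hugging Face Trainer convention used by similar finetuning scripts; adjust the pattern if your run names checkpoints differently.
# List saved checkpoints in the training output directory.
# The "checkpoint-<step>" naming is an assumption based on the Hugging Face
# Trainer convention; adjust the pattern if your run names them differently.
import os
from pathlib import Path

output_dir = Path(os.environ.get("OUTPUT_DIR", "."))
checkpoints = sorted(output_dir.glob("checkpoint-*"), key=lambda p: int(p.name.split("-")[-1]))
if checkpoints:
    print("latest checkpoint:", checkpoints[-1])
else:
    print("no checkpoints saved yet in", output_dir)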
In this video, we prepare the command to launch the finetuning process for GR00T N1. You can train your model until convergence or stop early; you will be able to continue the course without your own model. A code deep dive into the training script is part of the next section, “Rollout a VLA Model.”