Real Evaluation#

In this session, you’ll run policy evaluation on the physical SO-101 robot using the same GR00T-based setup you used in simulation.

The client is now the real robot instead of the simulator!

Learning Objectives#

By the end of this session, you’ll be able to:

Run policy evaluation on the real robot using the GR00T server + client (Docker) setup
Observe the sim-to-real gap firsthand
Stop and restart the evaluation safely

What Policy Are We Running?#

We use the same policy you evaluated in simulation. The exact checkpoint is selected by the MODEL environment variable, which is set in the commands below.

Workspace Prep#

Before running real-robot evaluation:

Place 1-3 vials on the mat and keep the rack in its reference location.
Ensure both cameras have a clear view of the workspace.
Turn on the light and set brightness (see Set Up the Light).

Running Policy Evaluation on the Real Robot#

Throughout this course, when we run evaluations there will be two terminals involved:

The host terminal, where we start the GR00T container and policy server
The client terminal, where we run the evaluation rollout and actually control the robot

For sim, the client is our simulator. For the real robot, our client is the robot itself.

Terminal 1 (`real-robot` container) — Start the GR00T policy server#

Locate the terminal already running the real-robot container.

Inside this container, run the following. This is where we choose which model to evaluate.

export MODEL=aravindhs-NV/grootn16-finetune_sreetz-so101_teleop_vials_rack_left/checkpoint-10000

Run the policy server with that model.

python Isaac-GR00T/gr00t/eval/run_gr00t_server.py \
    --model-path "/workspace/models/$MODEL"
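
A mistyped or unset MODEL value is easy to miss until the server fails while loading. As a minimal sketch, the helper below checks that the checkpoint path exists before you launch the server. Note that `check_model_dir` is a hypothetical name and not part of Isaac-GR00T; the only assumption taken from this page is that checkpoints live under /workspace/models, as in the server command above.

```shell
# Hypothetical pre-flight helper (not part of Isaac-GR00T).
# Usage: check_model_dir <models_root> <model_subpath>
check_model_dir() {
    local root="$1" model="$2"
    if [ -z "$model" ]; then
        echo "MODEL is not set" >&2
        return 1
    fi
    # -e accepts either a directory or a file at the checkpoint path.
    if [ ! -e "$root/$model" ]; then
        echo "Checkpoint not found: $root/$model" >&2
        return 1
    fi
    echo "Found checkpoint: $root/$model"
}

# Example, run inside the container before starting the server:
#   check_model_dir /workspace/models "$MODEL"
```

If the check fails, fix MODEL in Terminal 1 before starting the policy server.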

Terminal 2 (`real-robot` container) — Evaluation rollout#

Open a second terminal and attach to the same real-robot container to run the robot client. This step assumes the robot is already calibrated, which you have most likely done by now.

Attach a second terminal to the real-robot container.

docker exec -it real-robot /bin/bash

Once inside the container, run the evaluation script:

python Isaac-GR00T/gr00t/eval/real_robot/SO100/so101_eval.py \
  --robot.type=so101_follower \
  --robot.port="$ROBOT_PORT" \
  --robot.id="$ROBOT_ID" \
  --robot.cameras="{
      wrist:  {type: opencv, index_or_path: $CAMERA_GRIPPER, width: 640, height: 480, fps: 30},
      front:  {type: opencv, index_or_path: $CAMERA_EXTERNAL, width: 640, height: 480, fps: 30}
  }" \
  --policy_host=localhost \
  --policy_port=5555 \
  --lang_instruction="Pick up the vial and place it in the yellow rack" \
  --rerun True

Note

The --rerun flag is optional.

It adds Rerun into the loop for debugging, so you can see joint actions and the camera feeds while the policy is running. This lets you confirm the camera views are reasonable and the assignments are correct.
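
The client command above reads four shell variables: ROBOT_PORT, ROBOT_ID, CAMERA_GRIPPER, and CAMERA_EXTERNAL. If any of them is empty, the command will receive a blank value. The sketch below shows one way to catch that before launching; `require_vars` is a hypothetical helper written for this example, not something provided by the course containers.

```shell
# Hypothetical pre-flight helper: report every variable that is unset or empty.
# Usage: require_vars NAME [NAME...]
require_vars() {
    local _missing=0 _name _value
    for _name in "$@"; do
        # Look up the variable whose name is stored in _name.
        eval "_value=\${$_name-}"
        if [ -z "$_value" ]; then
            echo "Missing or empty: $_name" >&2
            _missing=1
        fi
    done
    return "$_missing"
}

# Example, run in Terminal 2 before the evaluation script:
#   require_vars ROBOT_PORT ROBOT_ID CAMERA_GRIPPER CAMERA_EXTERNAL
```

The helper returns a non-zero status when anything is missing, so it can gate the evaluation command with `&&`.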

Watching the Evaluation#

Watch the robot and the terminal during execution. The policy runs until you stop it or the evaluation completes. Stay clear of the robot’s workspace, note any unexpected behavior, and be ready to intervene.

To stop the robot: Press CTRL+C in Terminal 2 (robot client). The policy server in Terminal 1 keeps running.

To run again: Rerun the same python Isaac-GR00T/gr00t/eval/real_robot/SO100/so101_eval.py ... command in Terminal 2.

To switch model or fully restart:

Stop the commands in both terminals (CTRL+C), robot client first
In Terminal 1, set the MODEL environment variable to the model you want to evaluate
Restart the command in each terminal: the policy server first, then the robot client

Note

At evaluation start, the robot slowly rises to its initial pose, then enters inference mode.
When you stop the robot (CTRL+C), it slowly drives itself back to its home pose.

Tip

Keep the policy server running between evaluation attempts. Only restart it if you want to load a different model checkpoint.
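
Because the server is meant to stay up between attempts, it can help to confirm that something is still listening before relaunching the client. The following is a minimal sketch under two assumptions: the shell is bash (it relies on bash's `/dev/tcp` redirection), and the server listens on the host and port the client targets (localhost and 5555 in the client command above). `port_open` is a hypothetical name, and the check only verifies that a TCP connection can be opened, not that the model loaded correctly.

```shell
# Hypothetical helper; requires bash, since /dev/tcp is a bash feature
# rather than a real device file.
# Usage: port_open <host> <port>
port_open() {
    # Try to open a TCP connection in a subshell; the descriptor is
    # closed automatically when the subshell exits.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example, run in Terminal 2 before the evaluation script:
#   port_open localhost 5555 && echo "policy server reachable" \
#       || echo "nothing listening on localhost:5555"
```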

Common Failure Modes#

When observing real evaluation runs, notice how perception and actuation differ from simulation. The same policy may miss grasps, overshoot, or behave differently under real lighting and dynamics. These differences are the sim-to-real gap you’ll address with the strategies in the modules that follow.

Key Takeaways#

Real robot evaluation uses the same GR00T server + client architecture as sim evaluation; only the client (robot vs. simulator) changes
The gap between sim and real performance is often visible immediately; perception and actuation both matter
To shut down safely, press CTRL+C in the robot client terminal first

What’s Next?#

Continue with Strategy 2: Co-Training With Real Data, where you’ll deploy policies trained on mixed simulation and real data to the physical robot.