Evaluating the Results#

Run the policy#

Run the following command. Notice that we’re using the Play version of the task we created earlier.

python scripts/skrl/play.py --task Template-Reach-Play-v0

Tip

For performance, you may want to limit the number of environments loaded during play even further. To control this, use the --num_envs parameter like this: python scripts/skrl/play.py --task Template-Reach-Play-v0 --num_envs 100

Here’s what an example trial looks like. Did it do what we expected?

The problem#

Well, there’s a problem: while our robot moves to the right position in space, its orientation isn’t changing to match the goal pose. How do we fix this?

If you want a challenge, consider trying to solve this on your own before seeing the solution below.

Improving the rewards#

Let’s add another reward for orientation.

First, let’s extend the reward manager’s configuration by adding another reward term (RewTerm).

Add the new RewTerm to your reach_env_cfg.py file, under RewardsCfg.

@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    #
    # ADD this code
    #
    end_effector_orientation_tracking = RewTerm(
        func=mdp.orientation_command_error,
        weight=-0.1,
        params={"asset_cfg": SceneEntityCfg("robot", body_names=["ee_link"]), "command_name": "ee_pose"},
    )

    # task terms
    end_effector_position_tracking = RewTerm(
        func=mdp.position_command_error,
        weight=-0.2,
        params={"asset_cfg": SceneEntityCfg("robot", body_names=["ee_link"]), "command_name": "ee_pose"},
    )
    end_effector_position_tracking_fine_grained = RewTerm(
        func=mdp.position_command_error_tanh,
        weight=0.1,
        params={"asset_cfg": SceneEntityCfg("robot", body_names=["ee_link"]), "std": 0.1, "command_name": "ee_pose"},
    )

    # action penalty
    action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.0001)
    joint_vel = RewTerm(
        # joint velocity penalty, completed here as in the generated template
        func=mdp.joint_vel_l2,
        weight=-0.0001,
        params={"asset_cfg": SceneEntityCfg("robot")},
    )
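Each RewTerm pairs a weight with a function that returns one reward value per parallel environment. Conceptually, the reward manager evaluates every term each step and sums the weighted results into the scalar reward the agent trains on. The sketch below is only an illustration of that bookkeeping (the names total_step_reward and reward_terms are made up for this example, and Isaac Lab’s actual RewardManager also scales each term by the step dt):

import torch


def total_step_reward(env, reward_terms: dict) -> torch.Tensor:
    """Illustrative sketch only: sum weighted reward terms, one value per parallel environment."""
    total = torch.zeros(env.num_envs, device=env.device)
    for func, weight, params in reward_terms.values():
        # each term function follows the same pattern as the RewTerm funcs above:
        # it takes the env (plus its params) and returns a tensor of shape (num_envs,)
        total += weight * func(env, **params)
    return total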

Next, we’ll add the corresponding function that the func=mdp.orientation_command_error argument expects. Remember, this is where the math happens.

Adding an orientation reward#

This function computes the error between the desired end-effector orientation from the command and the current orientation of the end-effector.
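Here, “shortest path” means the angle of the smallest rotation that takes the current end-effector orientation to the commanded one. For unit quaternions that angle can be written as:

θ_err = 2 · arccos( | ⟨q_des, q_curr⟩ | ),  with 0 ≤ θ_err ≤ π

so the penalty grows with the misalignment angle, up to π radians for a 180° flip.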

Add this code to your source/Reach/Reach/tasks/manager_based/reach/mdp/rewards.py file.

def orientation_command_error(env: ManagerBasedRLEnv, command_name: str, asset_cfg: SceneEntityCfg) -> torch.Tensor:
    """Penalize tracking orientation error using shortest path.

    The function computes the orientation error between the desired orientation (from the command) and the
    current orientation of the asset's body (in world frame). The orientation error is computed as the shortest
    path between the desired and current orientations.
    """
    # extract the asset (to enable type hinting)
    asset: RigidObject = env.scene[asset_cfg.name]
    command = env.command_manager.get_command(command_name)
    # obtain the desired and current orientations
    des_quat_b = command[:, 3:7]
    des_quat_w = quat_mul(asset.data.root_state_w[:, 3:7], des_quat_b)
    curr_quat_w = asset.data.body_state_w[:, asset_cfg.body_ids[0], 3:7]  # type: ignore
    return quat_error_magnitude(curr_quat_w, des_quat_w)
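The function above relies on a few imports that the template’s generated mdp/rewards.py normally already provides. If yours doesn’t, something along these lines at the top of the file should cover it (the isaaclab.* module paths below are an assumption based on recent Isaac Lab releases; older projects use the omni.isaac.lab.* namespace instead):

from __future__ import annotations

import torch
from typing import TYPE_CHECKING

# module paths assume a recent Isaac Lab release; older projects use omni.isaac.lab.*
from isaaclab.assets import RigidObject
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils.math import quat_error_magnitude, quat_mul

if TYPE_CHECKING:
    # imported only for type hints, to avoid a circular import at runtime
    from isaaclab.envs import ManagerBasedRLEnv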

Train and review the policy, again#

Let’s evaluate how our new reward affects the policy by re-running the training and reviewing the results.

As before, you can add the --headless argument to forego the Isaac Sim viewport and speed up training, or omit the argument to watch the training live.

Running without --headless has debugging benefits because you can watch the robots as they learn. And especially when you’re starting out, it can be fun to watch the process evolve.

Option 1: Train in Headless Mode#

Run the following command:

python scripts/skrl/train.py --task Template-Reach-v0 --headless

Option 2: Train With Visualization#

Run the following command:

python scripts/skrl/train.py --task Template-Reach-v0

Review results after training#

Run the following command:

python scripts/skrl/play.py --task Template-Reach-Play-v0
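By default the play script loads the latest checkpoint from your most recent training run. If you’d like to compare the policy before and after adding the orientation reward, you can point it at a specific checkpoint instead, assuming your generated play.py exposes the same --checkpoint argument as the Isaac Lab workflow scripts it is based on (the path below is only a placeholder for wherever your run’s checkpoints were written):

python scripts/skrl/play.py --task Template-Reach-Play-v0 --checkpoint logs/skrl/<your_run>/checkpoints/best_agent.pt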

Note

If you have issues running the code, you can open the files provided with this module and either compare or copy them over your work to get to a known-good configuration.

Reward Engineering#

How are reward functions determined?#

At this point, you may be thinking: how did we come up with these reward functions? While we explained the function of each reward, where did the idea to define those particular rewards come from?

The rewards we wrote above are not the only solution to this problem; they are just one particular solution that works for this robot and this task.

Beyond finding rewards that can produce optimal behavior, reward designs can also be evaluated on how quickly they train and on how robust the resulting behavior is.

How to design rewards for a given task is a huge topic, and it can be a continual engineering effort. It could fill a whole module on its own; we’re really just scratching the surface here to get you started and show you some of what’s possible!
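One practical way to compare reward designs is to look at the training logs rather than only the final behavior. The skrl training script writes TensorBoard logs for each run, so you can watch how quickly the return improves and, depending on the agent configuration, how the individual reward terms evolve. Assuming the default logs/skrl location used by the generated project, a command like this will open them:

tensorboard --logdir logs/skrl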

How can we make training faster?#

We could pursue strategies like optimizing colliders and meshes and making sure models are instanceable. Or we could simply use more compute: more GPU resources, cloud resources, and so on.
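Parallelism is another lever. The --num_envs parameter we used with the play script also works with train.py, so you can scale the number of environments up or down to match your GPU. The value below is just an example:

python scripts/skrl/train.py --task Template-Reach-v0 --headless --num_envs 4096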

Note

Read about Simulation Performance tips for Isaac Lab in the docs.

Training, Evaluating, and Iterating#

Overview#

Now that everything is configured in our project, let’s do a quick test of our robot configuration, then begin training and evaluating our policy.

Testing With zero_agent and random_agent#

Isaac Lab provides a few scripts to perform basic tests of our configuration. Running these can help isolate issues with our environment.

  1. Run the following command:

python scripts/zero_agent.py --task Template-Reach-v0 --num_envs=10
  • This will open an environment where instances of our robot are loaded, but they won’t move. This is a useful initial test that our robot loads properly.

  2. Run the following command:

python scripts/random_agent.py --task Template-Reach-v0 --num_envs=10
  • This will add some noise to the articulation in the scene, letting us see that the joints are all moving. It’s a simple sanity check before we proceed to training.

For both commands, the --num_envs flag is used to override the configured number of parallel environments. Since we’re only doing a quick check, we can load the scene faster by reducing this number to 10.

The task name Template-Reach-v0 was generated along with our project and was registered with Isaac Lab during the install process. It can be changed later if needed, and you can see where it was defined in source/Reach/Reach/tasks/manager_based/reach/__init__.py. We can also define more entry points here, such as the playback-focused configuration.
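If you open that __init__.py, you’ll see the task being registered with Gymnasium. The exact contents depend on the version of the template you generated, but it looks roughly like the sketch below (the entry-point string and config names here are illustrative, not necessarily what your file contains):

import gymnasium as gym

from . import agents

gym.register(
    id="Template-Reach-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        # these entry points tell the launcher which env config and agent config to load
        "env_cfg_entry_point": f"{__name__}.reach_env_cfg:ReachEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
    },
)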

Training#

We have a few more options for training.

Using the --headless argument will forego the Isaac Sim viewport and speed up training.

Running without --headless has debugging benefits because you can watch the robots as they learn. And especially when you’re starting out, it can be fun to watch the process evolve.

Option 1: Train in Headless Mode#

Run the following command:

python scripts/skrl/train.py --task Template-Reach-v0 --headless

Option 2: Train With Visualization#

Run the following command:

python scripts/skrl/train.py --task Template-Reach-v0

Remember, running in this mode will take more computational resources.

Note

If you have errors when running the code, you can open the files provided with this module and either compare or copy them over your work to get to a known-good configuration.