`nemo_rl.models.value.interfaces`

Module Contents

`ValueOutputSpec`	values: Tensor of value predictions [batch_size, sequence_length].
`ValueInterface`	Abstract base class defining the interface for value functions.

class nemo_rl.models.value.interfaces.ValueOutputSpec

Bases: typing.TypedDict

values: Tensor of value predictions [batch_size, sequence_length].

Initialization

Initialize self. See help(type(self)) for accurate signature.
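As a rough illustration of the `[batch_size, sequence_length]` shape contract, the sketch below defines a stand-in `TypedDict` and a shape check. `ValueOutputSketch` and `has_expected_shape` are hypothetical names introduced here, and nested Python lists stand in for the tensor so the example runs without a tensor library; the real `ValueOutputSpec` holds a tensor.

```python
from typing import TypedDict


class ValueOutputSketch(TypedDict):
    # Stand-in for the real spec: nested lists replace the tensor,
    # with one inner list of per-token values per batch element.
    values: list[list[float]]


def has_expected_shape(
    out: ValueOutputSketch, batch_size: int, sequence_length: int
) -> bool:
    """Return True if out["values"] is laid out as [batch_size, sequence_length]."""
    values = out["values"]
    return len(values) == batch_size and all(
        len(row) == sequence_length for row in values
    )


# Two sequences of three tokens each, one value prediction per token.
example: ValueOutputSketch = {"values": [[0.1, 0.2, 0.3], [0.0, -0.5, 1.2]]}
shape_ok = has_expected_shape(example, batch_size=2, sequence_length=3)
```

A `TypedDict` is a plain `dict` at runtime, so the annotation documents the expected key for type checkers; it does not validate shapes on its own, which is why the sketch checks them explicitly.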

class nemo_rl.models.value.interfaces.ValueInterface

Bases: abc.ABC

Abstract base class defining the interface for value functions.

Get value predictions for observations.

Parameters:

Returns:

Return type:

BatchedDataDict containing

abstractmethod train(data: nemo_rl.distributed.batched_data_dict.BatchedDataDict, loss_fn: nemo_rl.algorithms.loss.interfaces.LossFunction, eval_mode: bool = False, *, gbs: Optional[int] = None, mbs: Optional[int] = None, timer: Optional[nemo_rl.utils.timer.Timer] = None) → dict[str, Any]

Train the value function on a global batch of data.

Parameters:

Returns:

Dictionary containing training metrics (loss, grad_norm, etc.)
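To make the shape of a `train` call and its metrics dictionary more concrete, here is a minimal, self-contained sketch. It is not a `ValueInterface` subclass and does not use any NeMo RL types: `ToyValueFunction` is a hypothetical one-weight model, the `data` layout is invented for the example, and squared error is hard-coded in place of `loss_fn`. It also assumes that `gbs` and `mbs` mean global and micro batch size and that `eval_mode=True` computes metrics without updating weights; the source does not state either.

```python
from typing import Any, Optional


class ToyValueFunction:
    """Hypothetical stand-in (not the real interface): value = w * feature."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w = 0.0
        self.lr = lr

    def train(
        self,
        data: dict[str, list[float]],
        eval_mode: bool = False,
        *,
        gbs: Optional[int] = None,
        mbs: Optional[int] = None,
    ) -> dict[str, Any]:
        features, targets = data["features"], data["targets"]
        gbs = gbs or len(features)  # assumed meaning: global batch size
        mbs = mbs or gbs            # assumed meaning: microbatch size
        loss, grad = 0.0, 0.0
        # Accumulate the mean squared error and its gradient one
        # microbatch at a time, then apply a single update at the end.
        for start in range(0, gbs, mbs):
            stop = min(start + mbs, gbs)
            for x, y in zip(features[start:stop], targets[start:stop]):
                err = self.w * x - y
                loss += err * err / gbs
                grad += 2.0 * err * x / gbs
        if not eval_mode:  # assumed meaning: skip the weight update
            self.w -= self.lr * grad
        return {"loss": loss, "grad_norm": abs(grad)}


model = ToyValueFunction()
batch = {"features": [1.0, 2.0], "targets": [2.0, 4.0]}
train_metrics = model.train(batch, gbs=2, mbs=1)   # updates model.w
eval_metrics = model.train(batch, eval_mode=True)  # leaves model.w unchanged
```

Because the gradient is accumulated across microbatches before the update, the result of one step does not depend on the choice of `mbs`, only on `gbs`.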

abstractmethod prepare_for_training(*args: Any, **kwargs: Any) → None

Prepare the value model for training (e.g., load to GPU).

abstractmethod prepare_for_inference(*args: Any, **kwargs: Any) → None

Prepare the value model for inference (e.g., offload gradients).

abstractmethod finish_training(*args: Any, **kwargs: Any) → None

Clean up after training.

abstractmethod save_checkpoint(*args: Any, **kwargs: Any) → None

Save model checkpoint.
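The four lifecycle hooks above are all abstract, so a concrete value model has to implement each of them before it can be instantiated. The sketch below shows that mechanism with a local abstract base class that mirrors only these four method names. `LifecycleSketch`, `RecordingValueModel`, and `Incomplete` are hypothetical classes written for this example, and the call order shown is illustrative, not one prescribed by the library.

```python
from abc import ABC, abstractmethod
from typing import Any


class LifecycleSketch(ABC):
    """Local mirror of the four lifecycle hooks; not the real ValueInterface."""

    @abstractmethod
    def prepare_for_training(self, *args: Any, **kwargs: Any) -> None: ...

    @abstractmethod
    def prepare_for_inference(self, *args: Any, **kwargs: Any) -> None: ...

    @abstractmethod
    def finish_training(self, *args: Any, **kwargs: Any) -> None: ...

    @abstractmethod
    def save_checkpoint(self, *args: Any, **kwargs: Any) -> None: ...


class RecordingValueModel(LifecycleSketch):
    """Implements every hook by recording the call, to keep the example tiny."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def prepare_for_training(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append("prepare_for_training")

    def prepare_for_inference(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append("prepare_for_inference")

    def finish_training(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append("finish_training")

    def save_checkpoint(self, *args: Any, **kwargs: Any) -> None:
        self.calls.append("save_checkpoint")


class Incomplete(LifecycleSketch):
    """Implements only one hook, so it remains abstract."""

    def prepare_for_training(self, *args: Any, **kwargs: Any) -> None:
        pass


model = RecordingValueModel()
model.prepare_for_training()
model.save_checkpoint()
model.finish_training()
model.prepare_for_inference()

# Instantiating a class with unimplemented abstract methods raises TypeError.
try:
    Incomplete()
    incomplete_instantiated = True
except TypeError:
    incomplete_instantiated = False
```

Accepting `*args` and `**kwargs` in every hook, as the documented signatures do, lets each backend take its own options without changing the shared interface.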