nemo_rl.data.multimodal_utils#

Module Contents#

Classes#

PackedTensor

Wrapper around a list of torch tensors and a dimension along which to pack the tensors.

Functions#

get_multimodal_keys_from_processor

Get keys of the multimodal data that can be used as model inputs.

get_dim_to_pack_along

Special considerations for packing certain keys from certain processors.

API#

class nemo_rl.data.multimodal_utils.PackedTensor(
tensors: Union[torch.Tensor, list[torch.Tensor]],
dim_to_pack: int,
)#

Wrapper around a list of torch tensors and a dimension along which to pack the tensors.

This class wraps a list of tensors together with a dim_to_pack parameter. It is intended for data that can be packed along different dimensions (such as multimodal data).

dim_to_pack specifies the dimension along which the tensors are packed.

Calling as_tensor returns the list as a single packed tensor by concatenating the tensors along the dim_to_pack dimension.
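To make these semantics concrete, here is a hypothetical pure-Python sketch. The name `SimplePacked` and the use of plain lists as stand-ins for 1-D tensors are invented for illustration; the real class wraps `torch.Tensor` objects and supports arbitrary `dim_to_pack`:

```python
class SimplePacked:
    """Hypothetical stand-in for PackedTensor using Python lists as 1-D 'tensors'."""

    def __init__(self, tensors, dim_to_pack=0):
        self.tensors = list(tensors)    # underlying list of "tensors"
        self.dim_to_pack = dim_to_pack  # only dim 0 is meaningful for flat lists

    def as_tensor(self):
        # Concatenate along dim_to_pack; for flat lists, dim 0
        # concatenation is just extending an output list.
        out = []
        for t in self.tensors:
            out.extend(t)
        return out

    def __len__(self):
        # Length is the number of underlying tensors, not total elements.
        return len(self.tensors)

    def slice(self, indices):
        # Select a subset of the underlying tensors by index.
        return SimplePacked([self.tensors[i] for i in indices], self.dim_to_pack)

    @classmethod
    def concat(cls, packed):
        # All inputs must agree on dim_to_pack; combine their tensor lists.
        assert len({p.dim_to_pack for p in packed}) == 1
        tensors = [t for p in packed for t in p.tensors]
        return cls(tensors, packed[0].dim_to_pack)
```

For flat lists, packing along dim 0 reduces to list concatenation, which is why `as_tensor` simply extends an output list; the real implementation concatenates torch tensors along an arbitrary dimension instead.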

Initialization

as_tensor(
device: Optional[torch.device] = None,
) → torch.Tensor#
__len__() → int#
to(
device: str | torch.device,
) → nemo_rl.data.multimodal_utils.PackedTensor#
slice(
indices: Union[list[int], torch.Tensor],
) → nemo_rl.data.multimodal_utils.PackedTensor#
classmethod concat(
from_packed_tensors: list[nemo_rl.data.multimodal_utils.PackedTensor],
) → nemo_rl.data.multimodal_utils.PackedTensor#

Concatenate a list of PackedTensor objects into a single PackedTensor.

The underlying tensors from the PackedTensors are combined into a single list of tensors and used to create a new PackedTensor.

All input PackedTensors must have the same dim_to_pack.

Example:

>>> import torch
>>> from nemo_rl.data.multimodal_utils import PackedTensor
>>> p1 = PackedTensor([torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])], dim_to_pack=0)
>>> p2 = PackedTensor([torch.tensor([7, 8, 9])], dim_to_pack=0)
>>> p3 = PackedTensor.concat([p1, p2])
>>> p3.tensors
[tensor([1, 2, 3]), tensor([4, 5, 6]), tensor([7, 8, 9])]
>>> p3.as_tensor()
tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
classmethod flattened_concat(
from_packed_tensors: list[nemo_rl.data.multimodal_utils.PackedTensor],
) → nemo_rl.data.multimodal_utils.PackedTensor#

Given a list of PackedTensor objects, flattens each PackedTensor and then concatenates them into a single PackedTensor.

Each PackedTensor is first flattened by packing along the PackedTensor’s dim_to_pack dimension. Then, the resulting flattened tensors are used to create a new PackedTensor.

This is different from PackedTensor.concat, which simply extends the underlying list of tensors. The distinction matters because the slice and __len__ methods operate on the underlying list of tensors. Note, however, that calling as_tensor on the resulting PackedTensor yields the same tensor as concat would.

All input PackedTensors must have the same dim_to_pack.

Example:

>>> import torch
>>> from nemo_rl.data.multimodal_utils import PackedTensor
>>> p1 = PackedTensor([torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])], dim_to_pack=0)
>>> p2 = PackedTensor([torch.tensor([7, 8, 9])], dim_to_pack=0)
>>> p3 = PackedTensor.flattened_concat([p1, p2])
>>> p3.tensors
[tensor([1, 2, 3, 4, 5, 6]), tensor([7, 8, 9])]
>>> p3.as_tensor()
tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
nemo_rl.data.multimodal_utils.get_multimodal_keys_from_processor(processor) → list[str]#

Get keys of the multimodal data that can be used as model inputs.

This is used by the data_processor function to determine which keys to pass as model inputs.

nemo_rl.data.multimodal_utils.get_dim_to_pack_along(processor, key: str) → int#

Special considerations for packing certain keys from certain processors.

In most cases, the items are packed along dim 0.
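A minimal sketch of what such a dispatch might look like (the function name, the override table, and its entries are invented for illustration; the real function inspects the processor object itself):

```python
def dim_to_pack_along(processor_name: str, key: str) -> int:
    """Hypothetical sketch: most keys pack along dim 0, with a
    per-(processor, key) override table for the exceptions.
    The entry below is invented, not a real special case."""
    overrides = {("SomeProcessor", "pixel_values"): 1}
    return overrides.get((processor_name, key), 0)
```

Keeping the default at dim 0 and listing only exceptions keeps the dispatch table small as new processors are added.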