core.dist_checkpointing.strategies.nvrx

Helpers for interacting with the experimental nvidia-resiliency-ext API.

Module Contents

Functions

has_nvrx_async_support

Checks whether the NVRx async checkpointing symbols Megatron uses are importable.

make_nvrx_async_request

Builds an AsyncRequest using the expected NVRx API.

API

core.dist_checkpointing.strategies.nvrx.has_nvrx_async_support() -> bool

Checks whether the NVRx async checkpointing symbols Megatron uses are importable.
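A check of this kind is often written as a guarded import followed by attribute lookups. The sketch below illustrates that pattern only and is not Megatron's implementation: `symbols_importable` is a hypothetical helper, and the module path and symbol names in the last call are assumptions about how nvidia-resiliency-ext is laid out.

```python
import importlib
from typing import Iterable


def symbols_importable(module_name: str, symbol_names: Iterable[str]) -> bool:
    """Return True if the module imports and exposes every named symbol.

    Hypothetical helper for illustration; not part of Megatron or NVRx.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, e.g. when the package is not installed.
        return False
    return all(hasattr(module, name) for name in symbol_names)


# Demonstration against the standard library, so the sketch runs anywhere.
json_ok = symbols_importable("json", ["dumps", "loads"])
json_missing = symbols_importable("json", ["no_such_symbol"])

# ASSUMPTION: this module path and these symbol names are a guess at the NVRx
# layout; the call simply returns False when the package is absent.
nvrx_available = symbols_importable(
    "nvidia_resiliency_ext.checkpointing.async_ckpt.core",
    ["AsyncRequest", "AsyncCallsQueue"],
)
```

Returning a boolean instead of raising lets callers fall back to a non-NVRx code path when the optional dependency is not installed.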

core.dist_checkpointing.strategies.nvrx.make_nvrx_async_request(
    async_request_cls: type,
    async_fn: Callable[..., Any],
    async_fn_args: Any,
    finalize_fns: list[Callable[..., Any]],
    async_fn_kwargs: Dict[str, Any] | None = None,
    preload_fn: Callable[..., Any] | None = None,
)

Builds an AsyncRequest using the expected NVRx API.
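To make the parameter list concrete, here is a minimal sketch of a builder with the same parameters. It is an illustration under stated assumptions, not the actual function body: `StandInAsyncRequest` is a hypothetical stand-in for the NVRx request class, its field layout is assumed, and forwarding the optional arguments only when they are provided is a design choice of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StandInAsyncRequest:
    """Hypothetical stand-in for an NVRx-style request class (assumed fields)."""
    async_fn: Optional[Callable[..., Any]]
    async_fn_args: Any
    finalize_fns: List[Callable[..., Any]]
    async_fn_kwargs: Dict[str, Any] = field(default_factory=dict)
    preload_fn: Optional[Callable[..., Any]] = None


def build_request_sketch(
    async_request_cls: type,
    async_fn: Callable[..., Any],
    async_fn_args: Any,
    finalize_fns: List[Callable[..., Any]],
    async_fn_kwargs: Optional[Dict[str, Any]] = None,
    preload_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Forward the arguments to async_request_cls (illustrative only)."""
    # ASSUMPTION: optional arguments are passed only when given, so the
    # request class keeps its own defaults otherwise.
    optional: Dict[str, Any] = {}
    if async_fn_kwargs is not None:
        optional["async_fn_kwargs"] = async_fn_kwargs
    if preload_fn is not None:
        optional["preload_fn"] = preload_fn
    return async_request_cls(async_fn, async_fn_args, finalize_fns, **optional)


# Usage: record what the "save" and "finalize" steps would do.
saved: List[str] = []
request = build_request_sketch(
    StandInAsyncRequest,
    async_fn=saved.append,
    async_fn_args=("shard-0",),
    finalize_fns=[lambda: saved.append("finalized")],
)
request.async_fn(*request.async_fn_args, **request.async_fn_kwargs)
for finalize in request.finalize_fns:
    finalize()
```

Taking the request class as an argument, as the documented signature does, keeps the builder free of a hard import on the optional dependency.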