core.ssm.triton_cache_manager

Module Contents

Classes

ParallelFileCacheManager

A patched version of Triton's FileCacheManager that prevents errors when building the Triton compiler cache with more than one model-parallel rank, including on certain file systems such as Lustre.

Functions

_version_no_greater_than

default_cache_dir

Provides a default path for the Triton cache directory.

API

core.ssm.triton_cache_manager._version_no_greater_than(version, version_limit)
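The entry above gives only the signature. Judging from its name, the helper presumably reports whether a version string is at or below a limit. The sketch below assumes "major.minor[.patch]" strings compared on major and minor only; that is an assumption, not documented behavior, and `version_at_most` is a hypothetical name.

```python
def version_at_most(version: str, version_limit: str) -> bool:
    """Hypothetical sketch: True if version <= version_limit on (major, minor)."""
    # Keep only the numeric major and minor components, e.g. "3.1.0" -> (3, 1).
    major, minor = (int(part) for part in version.split(".")[:2])
    limit_major, limit_minor = (int(part) for part in version_limit.split(".")[:2])
    # Tuple comparison orders by major first, then by minor.
    return (major, minor) <= (limit_major, limit_minor)
```

Under this assumption, a Triton version of "3.1.0" passes a "3.1" limit, while "3.2.0" does not.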
core.ssm.triton_cache_manager.default_cache_dir()

Provides a default path for the Triton cache directory.
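The docstring does not say which path is returned. A plausible sketch, assuming the module follows Triton's own convention of a ".triton/cache" directory under the user's home directory, and that an explicit TRITON_CACHE_DIR takes precedence over the default. Both function names are hypothetical.

```python
import os
from pathlib import Path


def default_cache_dir_sketch() -> str:
    """Hypothetical default: ~/.triton/cache, following Triton's usual convention."""
    # Path.home() resolves the current user's home directory.
    return os.path.join(Path.home(), ".triton", "cache")


def resolve_cache_dir() -> str:
    """Prefer a non-empty TRITON_CACHE_DIR, else fall back to the default (assumption)."""
    return os.getenv("TRITON_CACHE_DIR", "").strip() or default_cache_dir_sketch()
```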

class core.ssm.triton_cache_manager.ParallelFileCacheManager

Bases: triton.runtime.cache.FileCacheManager

A patched version of Triton's FileCacheManager that prevents errors when building the Triton compiler cache with more than one model-parallel rank, including on certain file systems such as Lustre.

Usage:

    export TRITON_CACHE_DIR=
    export TRITON_CACHE_MANAGER=megatron.core.ssm.triton_cache_manager:ParallelFileCacheManager
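The same configuration can also be applied from Python, assuming it happens before Triton first creates its cache manager (an ordering assumption, not stated above). The sketch sets both variables through `os.environ` and splits the "module:ClassName" value into its two parts. `CACHE_DIR` is a hypothetical placeholder, since the usage line leaves the directory unspecified, and `split_cache_manager_spec` is a hypothetical helper that only illustrates the value's format.

```python
import os

# Hypothetical placeholder: the usage line leaves this value unspecified,
# so substitute the desired cache directory.
CACHE_DIR = "/tmp/triton_cache_example"

# Set early, before any Triton kernel is compiled (assumption: Triton reads
# these variables when it first creates a cache manager).
os.environ["TRITON_CACHE_DIR"] = CACHE_DIR
os.environ["TRITON_CACHE_MANAGER"] = (
    "megatron.core.ssm.triton_cache_manager:ParallelFileCacheManager"
)


def split_cache_manager_spec(spec: str) -> tuple[str, str]:
    """Split a 'module.path:ClassName' value into module path and class name."""
    module_path, sep, class_name = spec.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    return module_path, class_name
```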

This patch implements the changes in the following two Triton project pull requests:

  1. https://github.com/triton-lang/triton/pull/3544

  2. https://github.com/triton-lang/triton/pull/4295

The above changes will probably be included in Triton release version 3.2, making this patch no longer necessary.

put(data, filename, binary=True) → str

A patched version of put, implementing PR 3544 and PR 4295.
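The docstring does not spell out what the two pull requests change. As a hedged illustration of a common technique for this kind of fix (assumption: each write goes into a uniquely named per-process temporary directory and is then moved into place with an atomic `os.replace`), here is a standalone sketch. `AtomicFileStore` is a hypothetical class, not Triton's FileCacheManager, and it does not reproduce the pull requests' exact code.

```python
import os
import tempfile
import uuid


class AtomicFileStore:
    """Hypothetical stand-in for a file cache; only the put() write pattern is sketched."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def put(self, data, filename: str, binary: bool = True) -> str:
        # Assumption: the mode is inferred from the payload type, so str data
        # is written as text even though binary defaults to True.
        binary = isinstance(data, bytes)
        if not binary:
            data = str(data)
        filepath = os.path.join(self.cache_dir, filename)
        # A per-process, per-call temporary directory (PID plus random UUID)
        # keeps concurrent writers from ever sharing a temporary file.
        temp_dir = os.path.join(self.cache_dir, f"tmp.pid_{os.getpid()}_{uuid.uuid4()}")
        os.makedirs(temp_dir, exist_ok=True)
        temp_path = os.path.join(temp_dir, filename)
        with open(temp_path, "wb" if binary else "w") as f:
            f.write(data)
        # os.replace is atomic on POSIX when source and target share a file
        # system, so readers see either no file or the complete new file.
        os.replace(temp_path, filepath)
        # The temporary directory is now empty and can be removed.
        os.rmdir(temp_dir)
        return filepath


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = AtomicFileStore(root)
        path = store.put(b"\x00\x01", "kernel.cubin")
        print(path, os.listdir(root))
```

Writing into a private directory rather than directly to the final name means a crashed or interrupted writer leaves at most a stray temporary directory, never a half-written cache entry.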