`nemo_automodel.checkpoint._backports.consolidate_hf_safetensors`

Module Contents

Classes

`_FqnData`	Dataclass to store information about a tensor (identified by its fully qualified name).
`_OutputFileData`	Dataclass to store information about an output safetensors file.
`_InputFileData`	Dataclass to store information about an input safetensors file.

Functions

`_parse_input_metadata`	Parse metadata from input safetensors files to determine the full tensor shapes and types.
`_write_metadata`	Write metadata to the beginning of each output safetensors file.
`_process_output_file`	Process a single output file by writing tensor data from input files.
`_write_data`	Write tensor data from input files to the output files.
`_write_row_wise_tensor`	Writes a row-wise sharded tensor to the output file.
`_write_column_wise_tensor`	Writes a column-wise sharded 2D tensor to the output file.
`_write_element_by_element`	Writes a sub-tensor to the output file using a general element-by-element approach.
`_write_sub_tensor_to_file`	Writes a sub-tensor from a byte array into a file representing the full tensor at specified offsets.
`_write_overall_metadata_file`
`consolidate_safetensors_files`	Main function to consolidate sharded safetensors files into one or more output files.

API

class nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._FqnData

Dataclass to store information about a tensor (identified by its fully qualified name).

Attributes:

offset_in_file – Byte offset where this tensor’s data begins in the output file
shape_in_file – Shape of the tensor in the output file
dtype_size – Size of the tensor’s data type in bytes
dtype_str – String representation of the tensor’s data type

offset_in_file: int = 0

shape_in_file: list[int] = ‘field(…)’

dtype_size: int = 0

dtype_str: str = <Multiline-String>

class nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData

Dataclass to store information about an output safetensors file.

Attributes:

metadata_size – Size of the metadata section in bytes
fqn_data – Dictionary mapping tensor names to their metadata

metadata_size: int = 0

fqn_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._FqnData] = ‘field(…)’

class nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._InputFileData

Dataclass to store information about an input safetensors file.

Attributes:

metadata_size – Size of the metadata section in bytes
metadata – JSON metadata from the safetensors file

metadata_size: int = 0

metadata: Any = None
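
As a rough illustration of how the three dataclasses above relate, the following sketch mirrors their documented fields. The class names drop the leading underscore to mark them as stand-ins, and the defaults for `shape_in_file`, `fqn_data`, and `dtype_str` (an empty list, an empty dict, and an empty string) are assumptions, since this page elides the real values.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FqnData:
    """Stand-in for _FqnData: bookkeeping for one tensor in an output file."""
    offset_in_file: int = 0  # byte offset of the tensor's data in the output file
    shape_in_file: list[int] = field(default_factory=list)  # assumed default
    dtype_size: int = 0  # bytes per element
    dtype_str: str = ""  # assumed default; e.g. "F32" once populated


@dataclass
class OutputFileData:
    """Stand-in for _OutputFileData: one consolidated output file."""
    metadata_size: int = 0
    fqn_data: dict[str, FqnData] = field(default_factory=dict)  # assumed default


@dataclass
class InputFileData:
    """Stand-in for _InputFileData: one sharded input file."""
    metadata_size: int = 0
    metadata: Any = None


# One output file holding a single 4 x 8 float32 tensor (name is hypothetical).
out = OutputFileData()
out.fqn_data["model.embed.weight"] = FqnData(shape_in_file=[4, 8], dtype_size=4, dtype_str="F32")
```

Using `default_factory` keeps each instance's list or dict separate, which matters when many output files are tracked at once.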

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._parse_input_metadata( input_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._InputFileData], output_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData], ) → None

Parse metadata from input safetensors files to determine the full tensor shapes and types.

This function analyzes the metadata from all input files to determine the complete shape of each tensor after consolidation. It updates the output_files_data with this information.

Parameters:

input_files_data – Dictionary mapping input file paths to their metadata
output_files_data – Dictionary mapping output file paths to their metadata

Raises:

ValueError – If no DCP custom metadata is found in a safetensors file
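
One way to picture the shape reconstruction: if every shard records its starting offsets and its own shape, the full tensor shape is the per-dimension maximum of offset plus length. The sketch below assumes that representation; `full_shape_from_shards` is a hypothetical helper, not a function of this module.

```python
def full_shape_from_shards(shards: list[tuple[list[int], list[int]]]) -> list[int]:
    """Each shard is (offsets, shape); return the smallest shape that covers all shards."""
    ndim = len(shards[0][0])
    full = [0] * ndim
    for offsets, shape in shards:
        for d in range(ndim):
            # A shard that starts at offsets[d] and spans shape[d] ends at their sum.
            full[d] = max(full[d], offsets[d] + shape[d])
    return full


# Two row-wise shards of a 4 x 8 tensor, each holding 2 rows.
row_shards = [([0, 0], [2, 8]), ([2, 0], [2, 8])]
print(full_shape_from_shards(row_shards))  # [4, 8]
```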

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_metadata( fs: fsspec.AbstractFileSystem, output_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData], ) → None

Write metadata to the beginning of each output safetensors file.

This function writes the metadata section to each output file, including information about tensor shapes, data types, and offsets. It also updates the offset_in_file field for each tensor in the output_files_data.

Parameters:

fs – Filesystem interface for file operations
output_files_data – Dictionary mapping output file paths to their metadata
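
For orientation, a safetensors file starts with an 8-byte little-endian header length, followed by a JSON header whose `data_offsets` are relative to the end of that header. The sketch below builds such a prefix in memory and derives absolute byte offsets per tensor. `build_header` and its input layout are assumptions made for illustration and do not reproduce this function's implementation.

```python
import json
import struct


def build_header(tensors: dict[str, tuple[str, list[int], int]]) -> tuple[bytes, dict[str, int]]:
    """tensors maps name -> (dtype_str, shape, dtype_size).

    Returns the bytes that precede the tensor data and each tensor's absolute offset.
    """
    header: dict[str, dict] = {}
    relative: dict[str, int] = {}
    cursor = 0
    for name, (dtype_str, shape, dtype_size) in tensors.items():
        nbytes = dtype_size
        for dim in shape:
            nbytes *= dim
        # data_offsets are [begin, end) relative to the first byte after the header.
        header[name] = {"dtype": dtype_str, "shape": shape, "data_offsets": [cursor, cursor + nbytes]}
        relative[name] = cursor
        cursor += nbytes
    header_bytes = json.dumps(header).encode("utf-8")
    prefix = struct.pack("<Q", len(header_bytes)) + header_bytes
    # Absolute offset = size of the length field and header + relative offset.
    absolute = {name: len(prefix) + off for name, off in relative.items()}
    return prefix, absolute


prefix, offsets = build_header({"a": ("F32", [2, 2], 4), "b": ("F16", [3], 2)})
```

Here `offsets["a"]` equals `len(prefix)`, and `offsets["b"]` sits 16 bytes later, after the four float32 values of `a`.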

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._process_output_file( input_fs: fsspec.AbstractFileSystem, output_fs: fsspec.AbstractFileSystem, output_file: str, output_data: nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData, input_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._InputFileData], ) → None

Process a single output file by writing tensor data from input files.

This function is designed to be run in parallel for different output files.

Parameters:

input_fs – Filesystem interface for input file operations
output_fs – Filesystem interface for output file operations
output_file – Path to the output file
output_data – Metadata for the output file
input_files_data – Dictionary mapping input file paths to their metadata

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_data( input_fs: fsspec.AbstractFileSystem, output_fs: fsspec.AbstractFileSystem, input_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._InputFileData], output_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData], num_threads: int = 1, ) → None

Write tensor data from input files to the output files.

This function reads tensor data from each input file and writes it to the appropriate position in the output files based on the tensor’s offsets. When num_threads > 1, the work is split across threads with each thread handling a different output file.

Parameters:

input_fs – Filesystem interface for input file operations
output_fs – Filesystem interface for output file operations
input_files_data – Dictionary mapping input file paths to their metadata
output_files_data – Dictionary mapping output file paths to their metadata
num_threads – Number of threads to use for parallel processing
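
The per-output-file threading described above can be sketched with the standard library. This is a minimal illustration that assumes one task per output file; `process_outputs` and the `worker` callback are hypothetical names, and the real function may schedule work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_outputs(output_files: list[str], worker: Callable[[str], str], num_threads: int = 1) -> list[str]:
    """Run worker once per output file, serially or on a thread pool."""
    if num_threads <= 1 or len(output_files) <= 1:
        return [worker(path) for path in output_files]
    # Never start more threads than there are files to process.
    with ThreadPoolExecutor(max_workers=min(num_threads, len(output_files))) as pool:
        # list() waits for every task and re-raises the first worker exception.
        return list(pool.map(worker, output_files))


results = process_outputs(["out-1", "out-2", "out-3"], lambda p: f"done:{p}", num_threads=2)
```

Because each thread owns a distinct output file, no two threads write to the same file, which avoids the need for locking in this arrangement.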

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_row_wise_tensor( fs: fsspec.AbstractFileSystem, sub_tensor_bytes: bytearray, element_size: int, full_tensor_strides: list[int], sub_tensor_strides: list[int], sub_tensor_offsets: list[int], sub_tensor_shape: list[int], output_file_path: str, output_start_byte: int, )

Writes a row-wise sharded tensor to the output file.

This is an optimized path for tensors that are sharded along the first dimension, with all other dimensions being complete. This allows writing entire rows at once.

Parameters:

fs – Filesystem interface for file operations
sub_tensor_bytes – Byte array containing the sub-tensor data
element_size – The size of each element in bytes
full_tensor_strides – Strides of the full tensor
sub_tensor_strides – Strides of the sub-tensor
sub_tensor_offsets – The starting offsets of the sub-tensor within the full tensor
sub_tensor_shape – The shape of the sub-tensor
output_file_path – The path to the file where the full tensor is stored
output_start_byte – The starting byte of the full tensor in the file
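
Why the row-wise case is cheap: in a row-major tensor, a shard that covers whole rows occupies one contiguous byte range of the full tensor, so a single seek and write is enough. The sketch below uses an in-memory `io.BytesIO` buffer instead of an fsspec file, and `write_row_shard` is a hypothetical helper with a simplified argument list.

```python
import io


def write_row_shard(out: io.BytesIO, shard: bytes, element_size: int,
                    full_row_stride: int, row_offset: int, start_byte: int) -> None:
    """Copy a shard made of complete rows into its place in the full tensor."""
    # full_row_stride is the number of elements in one row of the full tensor.
    out.seek(start_byte + row_offset * full_row_stride * element_size)
    out.write(shard)


# Full tensor: 4 rows x 3 columns of 1-byte elements; the shard holds rows 2 and 3.
buf = io.BytesIO(bytes(12))
write_row_shard(buf, bytes([7, 8, 9, 10, 11, 12]), element_size=1,
                full_row_stride=3, row_offset=2, start_byte=0)
```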

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_column_wise_tensor( fs: fsspec.AbstractFileSystem, sub_tensor_bytes: bytearray, element_size: int, tensor_shape: list[int], sub_tensor_offsets: list[int], sub_tensor_shape: list[int], output_file_path: str, output_start_byte: int, )

Writes a column-wise sharded 2D tensor to the output file.

This is an optimized path for 2D tensors that are sharded along the second dimension, with the first dimension being complete. This requires writing column by column.

Parameters:

fs – Filesystem interface for file operations
sub_tensor_bytes – Byte array containing the sub-tensor data
element_size – The size of each element in bytes
tensor_shape – The shape of the overall tensor
sub_tensor_offsets – The starting offsets of the sub-tensor within the full tensor
sub_tensor_shape – The shape of the sub-tensor
output_file_path – The path to the file where the full tensor is stored
output_start_byte – The starting byte of the full tensor in the file
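
For a 2D tensor sharded along its second dimension, the shard's data is no longer contiguous in the full tensor: each shard row lands at a different offset. The sketch below shows one possible approach, copying one contiguous segment per row into an in-memory buffer. It assumes row-major layout, and `write_column_shard` is a hypothetical helper, not this function's implementation.

```python
import io


def write_column_shard(out: io.BytesIO, shard: bytes, element_size: int,
                       tensor_shape: list[int], col_offset: int, shard_cols: int,
                       start_byte: int) -> None:
    """Copy a shard that spans all rows but only some columns of a 2D tensor."""
    rows, full_cols = tensor_shape
    segment = shard_cols * element_size  # bytes contributed by the shard to each row
    for r in range(rows):
        # Destination: start of row r in the full tensor, shifted by the column offset.
        dst = start_byte + (r * full_cols + col_offset) * element_size
        out.seek(dst)
        out.write(shard[r * segment:(r + 1) * segment])


# Full tensor: 2 rows x 4 columns of 1-byte elements; the shard covers columns 2 and 3.
buf = io.BytesIO(bytes(8))
write_column_shard(buf, bytes([1, 2, 3, 4]), element_size=1, tensor_shape=[2, 4],
                   col_offset=2, shard_cols=2, start_byte=0)
```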

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_element_by_element( fs: fsspec.AbstractFileSystem, sub_tensor_bytes: bytearray, element_size: int, tensor_shape: list[int], full_tensor_strides: list[int], sub_tensor_strides: list[int], sub_tensor_offsets: list[int], sub_tensor_shape: list[int], output_file_path: str, output_start_byte: int, )

Writes a sub-tensor to the output file using a general element-by-element approach.

This is a general approach that works for any sharding pattern, but is less efficient than the specialized approaches for row-wise or column-wise sharding.

Parameters:

fs – Filesystem interface for file operations
sub_tensor_bytes – Byte array containing the sub-tensor data
element_size – The size of each element in bytes
tensor_shape – The shape of the overall tensor
full_tensor_strides – Strides of the full tensor
sub_tensor_strides – Strides of the sub-tensor
sub_tensor_offsets – The starting offsets of the sub-tensor within the full tensor
sub_tensor_shape – The shape of the sub-tensor
output_file_path – The path to the file where the full tensor is stored
output_start_byte – The starting byte of the full tensor in the file
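
The general path can be understood as an index translation: every multi-dimensional index inside the shard is shifted by the shard's offsets and then flattened with the full tensor's strides. The sketch below assumes contiguous row-major strides and writes into an in-memory buffer; `contiguous_strides` and `write_elementwise` are hypothetical helpers.

```python
import io
from itertools import product


def contiguous_strides(shape: list[int]) -> list[int]:
    """Row-major strides, in elements, for a tensor of the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def write_elementwise(out: io.BytesIO, shard: bytes, element_size: int,
                      tensor_shape: list[int], offsets: list[int],
                      shard_shape: list[int], start_byte: int) -> None:
    """Copy a shard into the full tensor one element at a time."""
    full_strides = contiguous_strides(tensor_shape)
    sub_strides = contiguous_strides(shard_shape)
    for idx in product(*(range(n) for n in shard_shape)):
        # Flat position of this element inside the shard and inside the full tensor.
        src = sum(i * s for i, s in zip(idx, sub_strides)) * element_size
        dst = sum((i + o) * s for i, o, s in zip(idx, offsets, full_strides)) * element_size
        out.seek(start_byte + dst)
        out.write(shard[src:src + element_size])


# Full tensor: 2 x 2 x 2 of 1-byte elements; the shard is the slice [:, 1:2, :].
buf = io.BytesIO(bytes(8))
write_elementwise(buf, bytes([1, 2, 3, 4]), element_size=1, tensor_shape=[2, 2, 2],
                  offsets=[0, 1, 0], shard_shape=[2, 1, 2], start_byte=0)
```

One seek and one write per element is what makes this path slower than the row-wise and column-wise variants.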

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_sub_tensor_to_file( fs: fsspec.AbstractFileSystem, sub_tensor_bytes: bytearray, element_size: int, tensor_shape: list[int], sub_tensor_offsets: list[int], sub_tensor_shape: list[int], output_file_path: str, output_start_byte: int, )

Writes a sub-tensor from a byte array into a file representing the full tensor at specified offsets.

This function handles the complex task of placing a tensor shard (sub-tensor) at the correct position within the consolidated tensor file. It works by calculating the exact byte offsets for each slice of data and writing them to the appropriate positions. This implementation supports tensors of any dimensionality with optimized paths for common sharding patterns:

Row-wise sharding (optimized path)
Column-wise sharding for 2D tensors (optimized path)
Any other arbitrary sharding pattern (general element-by-element approach)

Parameters:

fs – Filesystem interface for file operations
sub_tensor_bytes – Byte array containing the sub-tensor data
element_size – The size of each element in bytes
tensor_shape – The shape of the overall tensor (list)
sub_tensor_offsets – The starting offsets of the sub-tensor within the full tensor (list)
sub_tensor_shape – The shape of the sub-tensor (list)
output_file_path – The path to the file where the full tensor is stored
output_start_byte – The starting byte of the full tensor in the file
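
The choice between the three paths can be expressed as a comparison of the shard shape against the full shape. The sketch below is one plausible reading of the conditions stated above; `classify_sharding` is a hypothetical helper, and the actual checks in this function may differ.

```python
def classify_sharding(tensor_shape: list[int], sub_tensor_shape: list[int]) -> str:
    """Return "row", "column", or "general" for a shard of the given tensor."""
    # Row-wise: every dimension after the first is complete in the shard.
    if all(s == f for s, f in zip(sub_tensor_shape[1:], tensor_shape[1:])):
        return "row"
    # Column-wise: a 2D tensor whose first dimension is complete in the shard.
    if len(tensor_shape) == 2 and sub_tensor_shape[0] == tensor_shape[0]:
        return "column"
    # Anything else falls back to the element-by-element path.
    return "general"


print(classify_sharding([4, 8], [2, 8]))        # row
print(classify_sharding([4, 8], [4, 2]))        # column
print(classify_sharding([2, 2, 2], [2, 1, 2]))  # general
```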

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._write_overall_metadata_file( fs: fsspec.AbstractFileSystem, output_dir: str, output_files_data: dict[str, nemo_automodel.checkpoint._backports.consolidate_hf_safetensors._OutputFileData], ) → None
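
This function has no docstring on this page. Assuming it follows the Hugging Face sharded-checkpoint convention (an index JSON with a `metadata.total_size` entry and a `weight_map` from tensor name to file name), its output could resemble what the sketch below builds. `build_index` and its input layout are hypothetical and shown only to illustrate that convention.

```python
import json
import math
import os


def build_index(files: dict[str, dict[str, tuple[list[int], int]]]) -> dict:
    """files maps output path -> {tensor name: (shape, dtype_size)}."""
    total_size = 0
    weight_map: dict[str, str] = {}
    for path, tensors in files.items():
        for fqn, (shape, dtype_size) in tensors.items():
            total_size += math.prod(shape) * dtype_size
            # The weight map stores file names, not full paths.
            weight_map[fqn] = os.path.basename(path)
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}


index = build_index({
    "/tmp/out/model-00001-of-00002.safetensors": {"a": ([2, 2], 4)},
    "/tmp/out/model-00002-of-00002.safetensors": {"b": ([3], 2)},
})
index_json = json.dumps(index, indent=2)
```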

nemo_automodel.checkpoint._backports.consolidate_hf_safetensors.consolidate_safetensors_files( input_dir: str, output_dir: str, fqn_to_index_mapping: Optional[dict[str, int]] = None, num_threads: int = 1, ) → None

Main function to consolidate sharded safetensors files into one or more output files.

This function orchestrates the entire consolidation process:

1. Sets up the output file structure based on the fqn_to_index_mapping
2. Finds all safetensors files in the input directory
3. Parses metadata from all input files
4. Writes metadata to the output files
5. Writes tensor data from input files to output files
6. Writes the overall model.safetensors.index.json file with the weight map

Parameters:

input_dir – Directory containing sharded safetensors files
output_dir – Directory where consolidated files will be written
fqn_to_index_mapping – Optional mapping of tensor names to output file indices. If None, all tensors will be consolidated into a single file.
num_threads – Number of threads to use for parallel processing of saving data to output files.
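
A short usage sketch follows. The helper `build_fqn_to_index_mapping`, the tensor names, and the directory names are hypothetical, and the 1-based file indices are an assumption. The call itself is left as a comment because it needs real sharded files on disk; its arguments follow the signature documented above.

```python
def build_fqn_to_index_mapping(fqns: list[str], num_files: int) -> dict[str, int]:
    """Spread tensor names round-robin over num_files output files (indices start at 1)."""
    return {fqn: 1 + i % num_files for i, fqn in enumerate(fqns)}


mapping = build_fqn_to_index_mapping(["embed.weight", "layer0.weight", "head.weight"], num_files=2)
# mapping == {"embed.weight": 1, "layer0.weight": 2, "head.weight": 1}

# With sharded files in place, the consolidation call would look like this:
# from nemo_automodel.checkpoint._backports.consolidate_hf_safetensors import (
#     consolidate_safetensors_files,
# )
# consolidate_safetensors_files(
#     input_dir="sharded/",
#     output_dir="consolidated/",
#     fqn_to_index_mapping=mapping,
#     num_threads=4,
# )
```

Passing `fqn_to_index_mapping=None` instead would place all tensors in a single output file, as described in the parameter list.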
