core.resharding.nvshmem_copy_service.validation

Validation utilities for GPU-to-GPU communication.

Provides deterministic data generation and validation for verifying correctness of communication operations.

Module Contents

Classes

ValidationResult

Result of validating a single task.

ValidationSummary

Summary of validation across all tasks.

Functions

generate_deterministic_data

Generate deterministic data pattern for a task.

validate_received_data

Validate received data against expected deterministic pattern.

log_validation_summary

Log validation summary.

API

class core.resharding.nvshmem_copy_service.validation.ValidationResult

Result of validating a single task.

task_id: int

None

size: int

None

passed: bool

None

src_pe: int

None

mismatches: int

0

first_mismatch_idx: int

None

first_mismatch_expected: int

0

first_mismatch_actual: int

0

batch_index: int

None

iteration: int

None

class core.resharding.nvshmem_copy_service.validation.ValidationSummary

Summary of validation across all tasks.

total_tasks: int

None

passed_tasks: int

None

failed_tasks: int

None

total_bytes: int

None

results: List[core.resharding.nvshmem_copy_service.validation.ValidationResult]

None

property all_passed: bool

Check if all validated tasks passed.
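
The way per-task outcomes might roll up into the summary counters can be sketched as follows. This is a minimal pure-Python illustration, not the module's implementation: `SketchSummary` and `summarize` are hypothetical names, results are simplified to `(task_id, size, passed)` tuples, and the rule that `all_passed` means "no failed task was recorded" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SketchSummary:
    """Hypothetical stand-in mirroring ValidationSummary's documented fields."""
    total_tasks: int = 0
    passed_tasks: int = 0
    failed_tasks: int = 0
    total_bytes: int = 0
    # Each entry is (task_id, size, passed); the real class stores ValidationResult objects.
    results: List[Tuple[int, int, bool]] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        # Assumption: "all passed" means no failed task was recorded.
        return self.failed_tasks == 0


def summarize(results: List[Tuple[int, int, bool]]) -> SketchSummary:
    """Fold per-task outcomes into running totals."""
    summary = SketchSummary()
    for task_id, size, passed in results:
        summary.total_tasks += 1
        summary.total_bytes += size
        if passed:
            summary.passed_tasks += 1
        else:
            summary.failed_tasks += 1
        summary.results.append((task_id, size, passed))
    return summary


summary = summarize([(0, 1024, True), (1, 2048, True), (2, 512, False)])
```

Under this assumption an empty summary also reports `all_passed` as true, so a caller that needs at least one validated task would check `total_tasks` as well.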

core.resharding.nvshmem_copy_service.validation.generate_deterministic_data(
    task_id: int,
    size: int,
    device: str = 'cuda',
) -> torch.Tensor

Generate deterministic data pattern for a task.

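Pattern: each byte = (task_id * 31 + position) % 256. This gives each task a distinct pattern that varies along the data. Because values wrap modulo 256, task IDs that differ by a multiple of 256 produce identical patterns.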

Parameters:
  • task_id – Unique task identifier

  • size – Number of bytes to generate

  • device – Device to create the tensor on ('cuda' or 'cpu')

Returns:

torch.Tensor of uint8 with deterministic pattern
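
The documented byte formula can be checked by hand with a small pure-Python sketch. `pattern_bytes` is a hypothetical helper written only for illustration; it returns `bytes` rather than a `torch.Tensor` and is not the module's implementation.

```python
def pattern_bytes(task_id: int, size: int) -> bytes:
    """Hypothetical stand-in: byte at `pos` is (task_id * 31 + pos) % 256."""
    return bytes((task_id * 31 + pos) % 256 for pos in range(size))


# Task 3 starts at 3 * 31 = 93 and counts upward from there.
first_four = list(pattern_bytes(3, 4))  # [93, 94, 95, 96]

# The pattern wraps back to its starting value every 256 positions.
wrapped = pattern_bytes(3, 300)[256]  # 93
```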

core.resharding.nvshmem_copy_service.validation.validate_received_data(
    task_id: int,
    tensor: torch.Tensor,
    size: int,
    src_pe: int = -1,
) -> core.resharding.nvshmem_copy_service.validation.ValidationResult

Validate received data against expected deterministic pattern.

Parameters:
  • task_id – Task identifier to regenerate expected data

  • tensor – Received tensor to validate

  • size – Number of bytes to validate

  • src_pe – Source PE for the task (default: -1)

Returns:

ValidationResult with pass/fail status and details
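
The comparison this function describes can be sketched in pure Python as below. It is an illustration under stated assumptions, not the module's code: `SketchResult` and `check_bytes` are hypothetical names, the input is `bytes` rather than a `torch.Tensor`, only a subset of the `ValidationResult` fields is mirrored, `first_mismatch_idx` is assumed to stay `None` when nothing differs, and the received buffer is assumed to hold at least `size` bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in carrying a subset of ValidationResult's fields."""
    task_id: int
    size: int
    passed: bool
    mismatches: int = 0
    first_mismatch_idx: Optional[int] = None  # assumption: None when nothing differs
    first_mismatch_expected: int = 0
    first_mismatch_actual: int = 0


def check_bytes(task_id: int, received: bytes, size: int) -> SketchResult:
    """Compare the first `size` received bytes with the documented pattern."""
    result = SketchResult(task_id=task_id, size=size, passed=True)
    for pos in range(size):
        expected = (task_id * 31 + pos) % 256
        actual = received[pos]
        if actual != expected:
            if result.mismatches == 0:
                # Record details of the first differing byte only.
                result.first_mismatch_idx = pos
                result.first_mismatch_expected = expected
                result.first_mismatch_actual = actual
            result.mismatches += 1
    result.passed = result.mismatches == 0
    return result


good = bytes((7 * 31 + pos) % 256 for pos in range(16))
bad = bytearray(good)
bad[5] ^= 0xFF  # corrupt a single byte
ok_result = check_bytes(7, good, 16)
bad_result = check_bytes(7, bytes(bad), 16)
```

Because the expected data is regenerated from `task_id` alone, the receiver needs no copy of the sender's buffer to detect corruption.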

core.resharding.nvshmem_copy_service.validation.log_validation_summary(
    summary: core.resharding.nvshmem_copy_service.validation.ValidationSummary,
) -> None

Log validation summary.