core.resharding.nvshmem_copy_service.validation#
Validation utilities for GPU-to-GPU communication.
Provides deterministic data generation and validation for verifying
correctness of communication operations.
Module Contents#
Classes#
ValidationResult | Result of validating a single task.
ValidationSummary | Summary of validation across all tasks.
Functions#
generate_deterministic_data | Generate deterministic data pattern for a task.
validate_received_data | Validate received data against expected deterministic pattern.
log_validation_summary | Log validation summary.
API#
- class core.resharding.nvshmem_copy_service.validation.ValidationResult#
Result of validating a single task.
- task_id: int#
None
- size: int#
None
- passed: bool#
None
- src_pe: int#
None
- mismatches: int#
0
- first_mismatch_idx: int#
None
- first_mismatch_expected: int#
0
- first_mismatch_actual: int#
0
- batch_index: int#
None
- iteration: int#
None
- class core.resharding.nvshmem_copy_service.validation.ValidationSummary#
Summary of validation across all tasks.
- total_tasks: int#
None
- passed_tasks: int#
None
- failed_tasks: int#
None
- total_bytes: int#
None
- results: List[core.resharding.nvshmem_copy_service.validation.ValidationResult]#
None
- property all_passed: bool#
Check if all validated tasks passed.
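The aggregation that the summary fields imply can be sketched as follows. This is a minimal illustration and not the module's implementation: `TaskResult`, `Summary`, and `summarize` are hypothetical stand-ins, and it assumes that `all_passed` means "no failed tasks" and that `total_bytes` is the sum of the validated sizes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskResult:
    """Hypothetical stand-in holding a subset of ValidationResult's fields."""
    task_id: int
    size: int
    passed: bool


@dataclass
class Summary:
    """Hypothetical stand-in mirroring ValidationSummary's fields."""
    total_tasks: int = 0
    passed_tasks: int = 0
    failed_tasks: int = 0
    total_bytes: int = 0
    results: List[TaskResult] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        # Assumption: "all passed" means no failures were recorded.
        return self.failed_tasks == 0


def summarize(results: List[TaskResult]) -> Summary:
    """Fold per-task results into one summary (hypothetical helper)."""
    summary = Summary()
    for result in results:
        summary.total_tasks += 1
        summary.total_bytes += result.size  # assumption: sum of validated sizes
        if result.passed:
            summary.passed_tasks += 1
        else:
            summary.failed_tasks += 1
        summary.results.append(result)
    return summary


demo = summarize([
    TaskResult(task_id=0, size=1024, passed=True),
    TaskResult(task_id=1, size=2048, passed=False),
])
print(demo.total_tasks, demo.passed_tasks, demo.failed_tasks, demo.all_passed)
```

Under these assumptions an empty result list also reports `all_passed` as true, so callers that need at least one validated task should check `total_tasks` as well.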
- core.resharding.nvshmem_copy_service.validation.generate_deterministic_data(
- task_id: int,
- size: int,
- device: str = 'cuda',
- ) → torch.Tensor#
Generate deterministic data pattern for a task.
Pattern: each byte = (task_id * 31 + position) % 256. Byte values increase by one per position and wrap at 256, and task_id sets the starting value, so tasks whose IDs differ modulo 256 get distinct patterns.
- Parameters:
task_id – Unique task identifier
size – Number of bytes to generate
device – Device to create tensor on ('cuda' or 'cpu')
- Returns:
torch.Tensor of uint8 with deterministic pattern
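The pattern formula above can be checked by hand with a pure-Python sketch. It is only an illustration of the byte formula: `pattern_bytes` is a hypothetical helper that returns `bytes` rather than a `torch.Tensor`, and it ignores the `device` argument entirely.

```python
def pattern_bytes(task_id: int, size: int) -> bytes:
    """Return `size` bytes where byte i equals (task_id * 31 + i) % 256.

    Hypothetical helper for illustration; not part of the module.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    return bytes((task_id * 31 + i) % 256 for i in range(size))


# task_id=2 starts at 2 * 31 = 62 and counts upward.
print(list(pattern_bytes(task_id=2, size=4)))   # [62, 63, 64, 65]

# task_id=8 starts at 248 and wraps past 255 back to 0.
print(list(pattern_bytes(task_id=8, size=10)))  # [248, ..., 255, 0, 1]

# Task IDs that are equal modulo 256 produce the same bytes.
print(pattern_bytes(258, 4) == pattern_bytes(2, 4))  # True
```

Because the pattern depends only on `task_id` and the byte position, a receiver can regenerate the expected data locally without any extra communication.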
- core.resharding.nvshmem_copy_service.validation.validate_received_data(
- task_id: int,
- tensor: torch.Tensor,
- size: int,
- src_pe: int = -1,
- ) → ValidationResult#
Validate received data against expected deterministic pattern.
- Parameters:
task_id – Task identifier to regenerate expected data
tensor – Received tensor to validate
size – Number of bytes to validate
src_pe – Source PE of the received data (default: -1)
- Returns:
ValidationResult with pass/fail status and details
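The comparison described above can be sketched in pure Python. This is an assumption-laden illustration, not the module's code: `Result` and `validate_bytes` are hypothetical names, the input is `bytes` instead of a `torch.Tensor`, and using -1 for "no mismatch index" is a choice made here for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in holding a subset of ValidationResult's fields."""
    task_id: int
    size: int
    passed: bool
    src_pe: int
    mismatches: int = 0
    first_mismatch_idx: int = -1  # assumption: -1 means "no mismatch found"
    first_mismatch_expected: int = 0
    first_mismatch_actual: int = 0


def validate_bytes(task_id: int, received: bytes, size: int, src_pe: int = -1) -> Result:
    """Compare the first `size` bytes against the deterministic pattern."""
    if len(received) < size:
        raise ValueError("received buffer is shorter than size")
    result = Result(task_id=task_id, size=size, passed=True, src_pe=src_pe)
    for i in range(size):
        expected = (task_id * 31 + i) % 256  # regenerate the expected byte
        actual = received[i]
        if actual != expected:
            if result.mismatches == 0:
                # Record details of the first mismatch only.
                result.first_mismatch_idx = i
                result.first_mismatch_expected = expected
                result.first_mismatch_actual = actual
            result.mismatches += 1
    result.passed = result.mismatches == 0
    return result


# Build a correct buffer for task 5, then corrupt two bytes.
good = bytes((5 * 31 + i) % 256 for i in range(16))
bad = bytearray(good)
bad[3] ^= 0xFF
bad[7] = (bad[7] + 1) % 256

print(validate_bytes(5, good, 16, src_pe=1).passed)  # True
report = validate_bytes(5, bytes(bad), 16, src_pe=1)
print(report.passed, report.mismatches, report.first_mismatch_idx)  # False 2 3
```

Recording only the first mismatch alongside a total count keeps the result small while still pointing at where corruption starts.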
- core.resharding.nvshmem_copy_service.validation.log_validation_summary( ) → None#
Log validation summary.