NVTX API for Compute Sanitizer Reference Manual
The reference manual on NVTX API for Compute Sanitizer.
1. Introduction
1.1. Overview
The NVTX Memory API for Compute Sanitizer allows CUDA programs to notify Compute Sanitizer about memory restrictions: memory pools management or permissions restrictions, in addition to memory labeling. The tools are notified through NVTX (NVIDIA Tools Extension), a header-only C library used by various NVIDIA tools. Latest NVTX headers can be downloaded on our GitHub repository (experimental branch).
- Programs can mark allocations as memory pools, allowing Compute Sanitizer to be aware of which parts of this specific allocation are actually used. When using the Memcheck tool, you are notified if unregistered parts of the pool are accessed by the program, errors that could have been missed otherwise. When using the Initcheck tool, in combination with option --track-unused-memory yes, you are not notified for unused memory in non-registered regions, therefore avoiding false positives.
- Programs can label allocations with meaningful names, allowing you to identify an allocation associated to a specific error by its name (e.g., allocation that is leaking, or unused).
- Programs can restrict some allocations to a specific set of permissions (e.g., read-only or write-only) applicable for a specific scope (e.g., CUDA stream, device or whole program). When using the Memcheck tool, violation of these restrictions will result in an error.
2. Usage
2.1. Compatibility and Requirements
The Compute Sanitizer tools require CUDA 11.0 or newer.
The NVTX Memory API is supported by Compute Sanitizer starting CUDA 11.3, using the --nvtx yes option. Starting CUDA 12.0, this option is enabled by default.
// NVTX calls are not allowed before CUDA runtime initialization. // Forces CUDA runtime initialization. cudaFree(0); // NVTX calls are now allowed.
NVTX structures must be zero-initialized. Examples on this page use C++ empty initializer ({}). If you are using C, you can use memset or use the intializer syntax with at least one field (C does not support empty initalizers).
2.2. NVTX Domain
// Requires <nvtx3/nvToolsExt.h> auto nvtxDomain = nvtxDomainCreateA("my-domain");
For now, NVTX domains have no specific usage, but will have one in a future Compute Sanitizer version.
2.3. Suballocation API
2.3.1. Pools Management
// Requires <nvtx3/nvToolsExtMem.h> // (see https://github.com/NVIDIA/NVTX/tree/dev-mem-api/c/include) void *ptr; cudaMalloc(&ptr, 64); nvtxMemVirtualRangeDesc_t nvtxRangeDesc = {}; nvtxRangeDesc.size = 64; nvtxRangeDesc.ptr = ptr; nvtxMemHeapDesc_t nvtxHeapDesc = {}; nvtxHeapDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxHeapDesc.structSize = sizeof(nvtxMemHeapDesc_t); nvtxHeapDesc.usage = NVTX_MEM_HEAP_USAGE_TYPE_SUB_ALLOCATOR; nvtxHeapDesc.type = NVTX_MEM_TYPE_VIRTUAL_ADDRESS; nvtxHeapDesc.typeSpecificDescSize = sizeof(nvtxMemVirtualRangeDesc_t); nvtxHeapDesc.typeSpecificDesc = &nvtxRangeDesc; auto nvtxPool = nvtxMemHeapRegister( nvtxDomain, &nvtxHeapDesc);Please note that Compute Sanitizer only supports nvtxMemHeapRegister with parameters usage = NVTX_MEM_HEAP_USAGE_TYPE_SUB_ALLOCATOR and type = NVTX_MEM_TYPE_VIRTUAL_ADDRESS. If you are using the CUDA runtime API, nvtxMemHeapRegister can be used with allocations created with cuMemAlloc.
// Requires <nvtx3/nvToolsExtMem.h>
nvtxMemHeapReset(nvtxDomain, nvtxPool);
// Requires <nvtx3/nvToolsExtMem.h>
nvtxMemHeapUnregister(nvtxDomain, nvtxPool);
For your convenience, calling cudaFree on a memory
pool causes Compute Sanitizer to automatically unregister it.
2.3.2. Suballocations Management
// Requires <nvtx3/nvToolsExtMem.h> nvtxMemVirtualRangeDesc_t nvtxRangeDesc = {}; nvtxRangeDesc.size = 16; nvtxRangeDesc.ptr = ptr; nvtxMemRegionsRegisterBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemRegionsRegisterBatch_t); nvtxRegionsDesc.regionType = NVTX_MEM_TYPE_VIRTUAL_ADDRESS; nvtxRegionsDesc.heap = nvtxPool; nvtxRegionsDesc.regionCount = 1; nvtxRegionsDesc.regionDescElementSize = sizeof(nvtxMemVirtualRangeDesc_t); nvtxRegionsDesc.regionDescElements = &nvtxRangeDesc; nvtxMemRegionsRegister(nvtxDomain, &nvtxRegionsDesc);For your convenience, Initcheck assumes that a new suballocation is uninitialized, meaning failure to initialize it might result in error reports. Please note that Compute Sanitizer only supports nvtxMemRegionsRegister with parameter regionType = NVTX_MEM_TYPE_VIRTUAL_ADDRESS. Suballocations are considered as regular allocations for NVTX naming and permissions API, therefore it is possible to label them or change their permissions.
// Requires <nvtx3/nvToolsExtMem.h> nvtxMemVirtualRangeDesc_t nvtxRangeDesc = {}; nvtxRangeDesc.size = 32; nvtxRangeDesc.ptr = ptr; nvtxMemRegionsResizeBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemRegionsResizeBatch_t); nvtxRegionsDesc.regionType = NVTX_MEM_TYPE_VIRTUAL_ADDRESS; nvtxRegionsDesc.regionDescCount = 1; nvtxRegionsDesc.regionDescElementSize = sizeof(nvtxMemVirtualRangeDesc_t); nvtxRegionsDesc.regionDescElements = &nvtxRangeDesc; nvtxMemRegionsResize(nvtxDomain, &nvtxRegionsDesc);
nvtxMemRegionRef_t nvtxRegionRef; nvtxRegionRef.pointer = ptr; nvtxMemRegionsUnregisterBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemRegionsUnregisterBatch_t); nvtxRegionsDesc.refType = NVTX_MEM_REGION_REF_TYPE_POINTER; nvtxRegionsDesc.refCount = 1; nvtxRegionsDesc.refElementSize = sizeof(nvtxMemRegionRef_t); nvtxRegionsDesc.refElements = &nvtxRegionRef; nvtxMemRegionsUnregister(nvtxDomain, &nvtxRegionsDesc);Omitting to unregister a suballocation is reported as a memory leak if Compute Sanitizer is used in combination with option --leak-check yes.
2.4. Naming API
// Requires <nvtx3/nvToolsExtMem.h> nvtxMemRegionNameDesc_t nvtxLabelDesc; nvtxLabelDesc.regionRefType = NVTX_MEM_REGION_REF_TYPE_POINTER; nvtxLabelDesc.nameType = NVTX_MESSAGE_TYPE_ASCII; nvtxLabelDesc.region.pointer = ptr; nvtxLabelDesc.name.ascii = "My Allocation"; nvtxMemRegionsNameBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemRegionsNameBatch_t); nvtxRegionsDesc.regionCount = 1; nvtxRegionsDesc.regionElementSize = sizeof(nvtxMemRegionNameDesc_t); nvtxRegionsDesc.regionElements = &nvtxLabelDesc; nvtxMemRegionsName(nvtxDomain, &nvtxRegionsDesc);Please note that Compute Sanitizer only supports nvtxMemRegionsName with parameter nameType = NVTX_MESSAGE_TYPE_ASCII for all region elements in regionElements. As of now, only leak and unused memory reporting features allocation names.
2.5. Permissions API
2.5.1. Basic Permissions Management
// Requires <nvtx3/nvToolsExtMem.h> and <nvtx3/nvToolsExtMemCudaRt.h> auto processPermHandle = nvtxMemCudaGetProcessWidePermissions(nvtxDomain); nvtxMemPermissionsAssignRegionDesc_t nvtxPermDesc; nvtxPermDesc.flags = NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ; nvtxPermDesc.regionRefType = NVTX_MEM_REGION_REF_TYPE_POINTER; nvtxPermDesc.region.pointer = ptr; nvtxMemPermissionsAssignBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemPermissionsAssignBatch_t); nvtxRegionsDesc.permissions = processPermHandle; nvtxRegionsDesc.regionCount = 1; nvtxRegionsDesc.regionElementSize = sizeof(nvtxMemPermissionsAssignRegionDesc_t); nvtxRegionsDesc.regionElements = &nvtxPermDesc; nvtxMemPermissionsAssign(nvtxDomain, &nvtxRegionsDesc);Valid permissions are:
- Read: NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ
- Write: NVTX_MEM_PERMISSIONS_REGION_FLAGS_WRITE
- Atomic: NVTX_MEM_PERMISSIONS_REGION_FLAGS_ATOMIC
- A combination of read, write and atomic (using XORs).
- Reset: NVTX_MEM_PERMISSIONS_REGION_FLAGS_RESET
// Requires <nvtx3/nvToolsExtMem.h> and <nvtx3/nvToolsExtMemCudaRt.h> auto devicePermHandle = nvtxMemCudaGetDeviceWidePermissions(nvtxDomain, device); nvtxMemPermissionsAssignRegionDesc_t nvtxPermDesc; nvtxPermDesc.flags = NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ | NVTX_MEM_PERMISSIONS_REGION_FLAGS_WRITE; nvtxPermDesc.regionRefType = NVTX_MEM_REGION_REF_TYPE_POINTER; nvtxPermDesc.region.pointer = ptr; nvtxMemPermissionsAssignBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemPermissionsAssignBatch_t); nvtxRegionsDesc.permissions = devicePermHandle; nvtxRegionsDesc.regionCount = 1; nvtxRegionsDesc.regionElementSize = sizeof(nvtxMemPermissionsAssignRegionDesc_t); nvtxRegionsDesc.regionElements = &nvtxPermDesc; nvtxMemPermissionsAssign(nvtxDomain, &nvtxRegionsDesc);
2.5.2. Advanced Permissions Management
// Requires <nvtx3/nvToolsExtMem.h> and <nvtx3/nvToolsExtMemCudaRt.h> // Create new permissions object. auto permHandle = nvtxMemPermissionsCreate(nvtxDomain, NVTX_MEM_PERMISSIONS_CREATE_FLAGS_NONE); nvtxMemPermissionsAssignRegionDesc_t nvtxPermDesc; nvtxPermDesc.flags = NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ; nvtxPermDesc.regionRefType = NVTX_MEM_REGION_REF_TYPE_POINTER; nvtxPermDesc.region.pointer = ptr; nvtxMemPermissionsAssignBatch_t nvtxRegionsDesc = {}; nvtxRegionsDesc.extCompatID = NVTX_EXT_COMPATID_MEM; nvtxRegionsDesc.structSize = sizeof(nvtxMemPermissionsAssignBatch_t); nvtxRegionsDesc.permissions = permHandle; nvtxRegionsDesc.regionCount = 1; nvtxRegionsDesc.regionElementSize = sizeof(nvtxMemPermissionsAssignRegionDesc_t); nvtxRegionsDesc.regionElements = &nvtxPermDesc; // Assign read-only permissions to allocation at address ptr. // Permissions will be applied on scope bound to permHandle. nvtxMemPermissionsAssign(nvtxDomain, &nvtxRegionsDesc); // Binding will happen on next kernel launch on this CPU thread, meaning the // stream for this launch will be the one bound to this permissions object. nvtxMemPermissionsBind( nvtxDomain, permHandle, NVTX_MEM_PERMISSIONS_BIND_SCOPE_CUDA_STREAM, NVTX_MEM_PERMISSIONS_BIND_FLAGS_NONE); // permHandle is now bound to stream. MyKernel<<<BlocksNb, ThreadsNb, 0, stream>>>(ptr);On permissions object creation or binding, you can specify inheritance restriction flags. For example, excluding write permissions will block access for all allocations with unassigned permissions on that scope. These are applied:
- nvtxMemPermissionsCreate: applied for kernel launches on stream bound to the created object.
- nvtxMemPermissionsBind: applied for next kernel launch (on this CPU thread) and others using the same stream.
// Requires <nvtx3/nvToolsExtMem.h>
nvtxMemPermissionsUnbind(nvtxDomain, NVTX_MEM_PERMISSIONS_BIND_SCOPE_CUDA_STREAM)
nvtxMemPermissionsDestroy(nvtxDomain, permHandle);
Please note that Compute Sanitizer only supports
nvtxMemPermissionsUnbind with parameter
scope = NVTX_MEM_PERMISSIONS_BIND_SCOPE_CUDA_STREAM.
// Requires <nvtx3/nvToolsExtMem.h> auto permHandle = nvtxMemCudaGetDeviceWidePermissions(nvtxDomain, device); nvtxMemCudaSetPeerAccess( nvtxDomain, permHandle, NVTX_MEM_CUDA_PEER_ALL_DEVICES, NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ); nvtxMemCudaSetPeerAccess( nvtxDomain, permHandle, device, NVTX_MEM_PERMISSIONS_REGION_FLAGS_READ | NVTX_MEM_PERMISSIONS_REGION_FLAGS_WRITE | NVTX_MEM_PERMISSIONS_REGION_FLAGS_ATOMIC);
3. Limitations
- Allocation names are visible on leak and unused memory reports, but not on other error reports for now.
- Allocation names must be encoded in ASCII, contain only printable characters, and contain between 1 and 49 characters (must comply to the following regex: ^[:print:]{1,49}$)
- Permissions are only applied to kernel launches. Other operations, such as cudaMemcpy or cudaMemset, are not supported for now.
Notices
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication of otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.