NVIDIA NvNeural SDK
2022.2
GPU inference framework for NVIDIA Nsight Deep Learning Designer
INetworkBackend is a runtime-specific interface for CUDA, DirectX, or other system-specific operations needed during inference.
#include <nvneural/CoreTypes.h>
Public Member Functions | |
virtual NeuralResult | bindCurrentThread () noexcept=0 |
Rebinds internal data structures to the current thread. | |
virtual NetworkBackendId | id () const noexcept=0 |
Introspection function: Returns the backend ID implemented by this interface. | |
virtual NeuralResult | saveImage (const ILayer *pLayer, const INetworkRuntime *pNetwork, IImage *pImage, ImageSpace imageSpace, size_t channels) noexcept=0 |
Converts a layer's output tensor to a CPU image. | |
virtual NeuralResult | synchronize () noexcept=0 |
Performs a CPU/GPU sync and completes all pending operations on the device. | |
virtual NeuralResult | transformTensor (void *pDeviceDestination, TensorFormat destinationFormat, TensorDimension destinationSize, const void *pDeviceSource, TensorFormat sourceFormat, TensorDimension sourceSize) noexcept=0 |
Transforms a tensor from one format to another. | |
Device management functions | |
virtual NeuralResult | initializeFromDeviceOrdinal (std::uint32_t deviceOrdinal) noexcept=0 |
Initializes the backend to point to a specific device ordinal. | |
virtual NeuralResult | initializeFromDeviceIdentifier (const IBackendDeviceIdentifier *pDeviceIdentifier) noexcept=0 |
Initializes the backend to point to a specific device identifier. | |
virtual const IBackendDeviceIdentifier * | deviceIdentifier () const noexcept=0 |
Retrieves an opaque device identifier object corresponding to the device associated with this backend. | |
Low-level memory allocation functions | |
virtual NeuralResult | setDeviceMemory (void *pDeviceDestination, std::uint8_t value, std::size_t byteCount) noexcept=0 |
Fills a buffer with a preset value. Equivalent to memset. | |
virtual NeuralResult | copyMemoryD2D (void *pDeviceDestination, const void *pDeviceSource, std::size_t byteCount) noexcept=0 |
Device-to-device memory copy. | |
virtual NeuralResult | copyMemoryH2D (void *pDeviceDestination, const void *pHostSource, std::size_t byteCount) noexcept=0 |
Host-to-device memory copy. | |
virtual NeuralResult | copyMemoryD2H (void *pHostDestination, const void *pDeviceSource, std::size_t byteCount) noexcept=0 |
Device-to-host memory copy. | |
Library handle management functions | |
Many layer classes rely on external libraries for computation, and these libraries may be expensive to initialize repeatedly. These functions allow layers to query the presence of preexisting library contexts, and initialize them if required. Since many of these libraries share the same synchronization/binding behavior as the core backend, library support is implemented as a key-value store of ILibraryContext objects rather than direct handle access. A registered ILibraryContext object receives callbacks when rebinding or synchronization is required. | |
virtual NeuralResult | registerLibraryContext (ILibraryContext *pLibraryContext) noexcept=0 |
Registers a new library context with the backend. | |
virtual ILibraryContext * | getLibraryContext (ILibraryContext::LibraryId libraryId) noexcept=0 |
Retrieves a library context by its identifier. | |
virtual const ILibraryContext * | getLibraryContext (ILibraryContext::LibraryId libraryId) const noexcept=0 |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
High-level memory allocation functions | |
virtual NeuralResult | allocateMemoryBlock (MemoryHandle *pHandle, size_t byteCount) noexcept=0 |
Allocates a memory block of the requested size. | |
virtual NeuralResult | freeMemoryBlock (MemoryHandle handle) noexcept=0 |
Frees a memory block that was allocated with allocateMemoryBlock. | |
virtual void * | getAddressForMemoryBlock (MemoryHandle handle) noexcept=0 |
Retrieves the raw address corresponding to a MemoryHandle. | |
virtual size_t | getSizeForMemoryBlock (MemoryHandle handle) noexcept=0 |
Retrieves the buffer size corresponding to a MemoryHandle. | |
virtual NeuralResult | lockMemoryBlock (MemoryHandle handle) noexcept=0 |
Locks a memory block to prevent reuse. | |
virtual NeuralResult | unlockMemoryBlock (MemoryHandle handle) noexcept=0 |
Unlocks a memory block. | |
virtual MemoryHandle | updateTensor (const ILayer *pLayer, INetworkRuntime *pNetwork, TensorFormat format, MemoryHandle hOriginal, TensorDimension stepping, TensorDimension internalDimensions) noexcept=0 |
Updates a memory handle. | |
Weights management | |
virtual NeuralResult | clearLoadedWeights () noexcept=0 |
Clears all loaded weights. | |
virtual NeuralResult | uploadWeights (const void **ppUploadedWeightsOut, const ILayer *pLayer, const IWeightsLoader *pOriginWeightLoader, const char *pName, const void *pWeightsData, std::size_t weightsDataSize, TensorDimension weightsDim, TensorFormat format, bool memManagedExternally) noexcept=0 |
Uploads weights data to an internal cache. | |
virtual const void * | getAddressForWeightsData (const ILayer *pLayer, const IWeightsLoader *pOriginWeightLoader, const char *pName, TensorFormat format) const noexcept=0 |
Retrieves loaded weights data from the internal cache. | |
virtual NeuralResult | getDimensionsForWeightsData (TensorDimension *pDimensionOut, const ILayer *pLayer, const IWeightsLoader *pOriginWeightLoader, const char *pName, TensorFormat format) const noexcept=0 |
Retrieves loaded weights dimensions from the internal cache. | |
virtual NeuralResult | getWeightsNamesForLayer (IStringList **ppListOut, const ILayer *pLayer, const IWeightsLoader *pOriginWeightLoader) const noexcept=0 |
Retrieves names of loaded weights objects from the internal cache. | |
Public Member Functions inherited from IRefObject | |
virtual RefCount | addRef () const noexcept=0 |
Increments the object's reference count. | |
virtual const void * | queryInterface (TypeId interface) const noexcept=0 |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
virtual void * | queryInterface (TypeId interface) noexcept=0 |
Retrieves a new object interface pointer. | |
virtual RefCount | release () const noexcept=0 |
Decrements the object's reference count and destroys the object if the reference count reaches zero. | |
Static Public Attributes | |
static const IRefObject::TypeId | typeID = 0xacd7828da90108ddul |
Interface TypeId for InterfaceOf purposes. | |
Static Public Attributes inherited from IRefObject | |
static const TypeId | typeID = 0x14ecc3f9de638e1dul |
Interface TypeId for InterfaceOf purposes. | |
Optimization capability checking | |
Not all backends are compatible with every network optimization the Network class can perform. Backends therefore expose a capabilities system that the Network class queries before it applies a given optimization. When creating a network backend, please "fail safe" in these queries: if you do not recognize the optimization in question (perhaps the Network implementation is newer than your backend), do not claim to support it. Unconditionally answering "no, this is unsupported" to these checks is always safe, though it may result in reduced performance. | |
enum class | OptimizationCapability : std::uint64_t { SkipConcatenation = 0xdd13d58fbabb2f5bul , FuseBatchNormAndConvolution = 0xeaffe6d9a4acfdc9ul } |
List of optional optimizations supported by backends. | |
virtual bool | supportsOptimization (OptimizationCapability optimization) const noexcept=0 |
Returns true if the indicated optimization is applicable to this backend. | |
Additional Inherited Members | |
Public Types inherited from IRefObject | |
using | RefCount = std::uint32_t |
Typedef used to track the number of active references to an object. | |
using | TypeId = std::uint64_t |
Every interface must define a unique TypeId. This should be randomized. | |
Protected Member Functions inherited from IRefObject | |
virtual | ~IRefObject ()=default |
A protected destructor prevents accidental stack-allocation of IRefObjects or use with other smart pointer classes like std::unique_ptr. | |
enum class OptimizationCapability : std::uint64_t | strong |
List of optional optimizations supported by backends.
Enumerator | |
---|---|
SkipConcatenation | Backends exposing buffers as GPU virtual addresses rather than resource handles can skip concatenation layers by copying tensors directly into the relevant part of the concatenated output. Concatenation layers are identified by the presence of IConcatenationLayer. |
FuseBatchNormAndConvolution | Batch normalization and convolution can be fused into a single launch. Batch normalization layers are identified by the presence of the IBatchNormalizationLayer interface. Convolution layers are identified by the presence of IConvolutionLayer. |
allocateMemoryBlock() | pure virtual, noexcept |
Allocates a memory block of the requested size.
This memory must be freed with freeMemoryBlock.
If this function fails, the handle referenced by pHandle is set to nullptr.
pHandle | [out] Pointer receiving a MemoryHandle to the new memory |
byteCount | Number of bytes to allocate |
deviceIdentifier() | pure virtual, noexcept |
Retrieves an opaque device identifier object corresponding to the device associated with this backend.
freeMemoryBlock() | pure virtual, noexcept |
Frees a memory block that was allocated with allocateMemoryBlock.
Freeing nullptr is explicitly permitted and does nothing.
handle | MemoryHandle to the buffer to be freed |
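The allocate/free pairing described for allocateMemoryBlock and freeMemoryBlock lends itself to a scope-bound wrapper. The following is a minimal sketch, not SDK code: `ToyBackend`, `ToyBlock`, `ScopedMemoryBlock`, and the two-value `NeuralResult` are hypothetical stand-ins so the example compiles without the NvNeural headers, and the rule that zero-byte requests fail is an arbitrary choice made only to exercise the failure path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-ins for SDK types; the real definitions come from the NvNeural headers.
enum class NeuralResult { Success, Failure };
struct ToyBlock { void* pData; std::size_t size; };
using MemoryHandle = ToyBlock*; // the real MemoryHandle is opaque

// Hypothetical host-memory backend implementing only the two calls discussed here.
class ToyBackend
{
public:
    NeuralResult allocateMemoryBlock(MemoryHandle* pHandle, std::size_t byteCount) noexcept
    {
        if (!pHandle) return NeuralResult::Failure;
        *pHandle = nullptr; // documented value on failure
        if (byteCount == 0) return NeuralResult::Failure; // toy rule, see lead-in
        void* pData = std::malloc(byteCount);
        if (!pData) return NeuralResult::Failure;
        ToyBlock* pBlock = new (std::nothrow) ToyBlock{pData, byteCount};
        if (!pBlock) { std::free(pData); return NeuralResult::Failure; }
        *pHandle = pBlock;
        ++m_liveBlocks;
        return NeuralResult::Success;
    }

    NeuralResult freeMemoryBlock(MemoryHandle handle) noexcept
    {
        if (!handle) return NeuralResult::Success; // freeing nullptr does nothing
        std::free(handle->pData);
        delete handle;
        --m_liveBlocks;
        return NeuralResult::Success;
    }

    int liveBlocks() const noexcept { return m_liveBlocks; }

private:
    int m_liveBlocks = 0;
};

// Frees the block automatically when the wrapper goes out of scope.
class ScopedMemoryBlock
{
public:
    ScopedMemoryBlock(ToyBackend& backend, std::size_t byteCount) noexcept
        : m_backend(backend)
    {
        m_backend.allocateMemoryBlock(&m_handle, byteCount);
    }
    ~ScopedMemoryBlock() { m_backend.freeMemoryBlock(m_handle); }
    ScopedMemoryBlock(const ScopedMemoryBlock&) = delete;
    ScopedMemoryBlock& operator=(const ScopedMemoryBlock&) = delete;

    MemoryHandle get() const noexcept { return m_handle; } // nullptr if allocation failed

private:
    ToyBackend& m_backend;
    MemoryHandle m_handle = nullptr;
};

// Usage: the block is released at the closing brace of the inner scope.
void scopedBlockExample()
{
    ToyBackend backend;
    {
        ScopedMemoryBlock block(backend, 256);
        assert(block.get() != nullptr);
    }
    assert(backend.liveBlocks() == 0);
}
```

Because freeing nullptr is permitted, the destructor does not need to check whether the allocation succeeded.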
getAddressForMemoryBlock() | pure virtual, noexcept |
Retrieves the raw address corresponding to a MemoryHandle.
This address may correspond to a GPU pointer (such as CUDA device address space), so do not assume the return value is CPU-accessible by default. Check the documentation for the backend in question.
handle | MemoryHandle to query |
getAddressForWeightsData() | pure virtual, noexcept |
Retrieves loaded weights data from the internal cache.
If the weights have not been uploaded with uploadWeights in the desired format, this call is allowed to fail rather than causing a tensor transformation.
pLayer | Layer associated with the requested weights data |
pOriginWeightLoader | Weight loader associated with the weight |
pName | Name associated with the requested weights data |
format | Desired tensor format of the weights data |
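One way to picture the "allowed to fail" rule is a cache keyed on layer, weight loader, name, and format together, where a lookup in a format that was never uploaded simply misses. The sketch below is an assumption about how a backend might organize such a cache, not the SDK's implementation: `ToyWeightsCache`, `ToyFormat`, and the empty `ILayer` and `IWeightsLoader` structs are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for SDK types so the sketch is self-contained.
struct ILayer {};
struct IWeightsLoader {};
enum class ToyFormat { NchwFloat, NhwcHalf }; // hypothetical substitute for TensorFormat

class ToyWeightsCache
{
public:
    // Records a pointer to weights that are already in the given format.
    void upload(const ILayer* pLayer, const IWeightsLoader* pLoader,
                const char* pName, ToyFormat format, const void* pData)
    {
        m_entries.push_back(Entry{pLayer, pLoader, pName, format, pData});
    }

    // Returns nullptr on a miss; no tensor transformation is attempted.
    const void* getAddressForWeightsData(const ILayer* pLayer, const IWeightsLoader* pLoader,
                                         const char* pName, ToyFormat format) const
    {
        for (const Entry& entry : m_entries)
        {
            if (entry.pLayer == pLayer && entry.pLoader == pLoader &&
                entry.name == pName && entry.format == format)
            {
                return entry.pData;
            }
        }
        return nullptr;
    }

private:
    struct Entry
    {
        const ILayer* pLayer;
        const IWeightsLoader* pLoader;
        std::string name;
        ToyFormat format;
        const void* pData;
    };
    std::vector<Entry> m_entries;
};

// Usage: the same name in a different format is a cache miss.
void weightsLookupExample()
{
    ILayer layer;
    IWeightsLoader loader;
    const float kernel[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    ToyWeightsCache cache;
    cache.upload(&layer, &loader, "kernel", ToyFormat::NchwFloat, kernel);
    assert(cache.getAddressForWeightsData(&layer, &loader, "kernel", ToyFormat::NchwFloat) == kernel);
    assert(cache.getAddressForWeightsData(&layer, &loader, "kernel", ToyFormat::NhwcHalf) == nullptr);
}
```

Callers that need a different format would therefore check for a null result and upload the weights in that format first.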
getDimensionsForWeightsData() | pure virtual, noexcept |
Retrieves loaded weights dimensions from the internal cache.
If the weights have not been uploaded with uploadWeights in the desired format, this call is allowed to fail rather than causing a tensor transformation.
pDimensionOut | Output pointer receiving the weights dimensions |
pLayer | Layer associated with the requested weights data |
pOriginWeightLoader | Weight loader associated with the weight |
pName | Name associated with the requested weights data |
format | Desired tensor format of the weights data |
getLibraryContext() | pure virtual, noexcept |
Retrieves a library context by its identifier.
libraryId | Library context identifier to retrieve. |
getSizeForMemoryBlock() | pure virtual, noexcept |
Retrieves the buffer size corresponding to a MemoryHandle.
System memory allocation functions (e.g., VirtualAlloc, cuMemAlloc) may add padding or alignment bytes to the requested buffer size. Those extra bytes are not included in the value returned by this function.
handle | MemoryHandle to query |
getWeightsNamesForLayer() | pure virtual, noexcept |
Retrieves names of loaded weights objects from the internal cache.
ppListOut | Variable receiving a reference to a new IStringList. Caller must release the reference. |
pLayer | Layer associated with the requested weights data |
pOriginWeightLoader | Weight loader associated with the weight |
initializeFromDeviceIdentifier() | pure virtual, noexcept |
Initializes the backend to point to a specific device identifier.
This is used to ensure all backends execute on the same GPU; typically you should initialize one backend with initializeFromDeviceOrdinal, then retrieve its IBackendDeviceIdentifier and pass it to the other backends.
Backends do not support reinitialization; attempts to call this function after the backend has been initialized will fail.
pDeviceIdentifier | System-specific device identifier |
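The recommended sequence (initialize one backend by ordinal, then hand its identifier to the others) can be sketched as follows. This is an illustration under stated assumptions: `ToyBackend`, `ToyDeviceIdentifier`, `initializeOnSameDevice`, and the two-value `NeuralResult` are hypothetical stand-ins, and the stand-in `IBackendDeviceIdentifier` omits the reference counting of the real interface.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for SDK types so the sketch compiles on its own.
enum class NeuralResult { Success, Failure };

struct IBackendDeviceIdentifier
{
    virtual ~IBackendDeviceIdentifier() = default;
};

// Hypothetical identifier that just remembers an ordinal.
struct ToyDeviceIdentifier : IBackendDeviceIdentifier
{
    std::uint32_t ordinal = 0;
};

// Hypothetical backend implementing only the device management calls.
class ToyBackend
{
public:
    NeuralResult initializeFromDeviceOrdinal(std::uint32_t deviceOrdinal) noexcept
    {
        if (m_initialized) return NeuralResult::Failure; // reinitialization is rejected
        m_identifier.ordinal = deviceOrdinal;
        m_initialized = true;
        return NeuralResult::Success;
    }

    NeuralResult initializeFromDeviceIdentifier(const IBackendDeviceIdentifier* pDeviceIdentifier) noexcept
    {
        const auto* pToy = dynamic_cast<const ToyDeviceIdentifier*>(pDeviceIdentifier);
        if (m_initialized || !pToy) return NeuralResult::Failure;
        m_identifier.ordinal = pToy->ordinal;
        m_initialized = true;
        return NeuralResult::Success;
    }

    const IBackendDeviceIdentifier* deviceIdentifier() const noexcept
    {
        return m_initialized ? &m_identifier : nullptr;
    }

    std::uint32_t ordinal() const noexcept { return m_identifier.ordinal; }

private:
    ToyDeviceIdentifier m_identifier;
    bool m_initialized = false;
};

// Initializes the primary backend by ordinal, then points the secondary at the same device.
NeuralResult initializeOnSameDevice(ToyBackend& primary, ToyBackend& secondary,
                                    std::uint32_t deviceOrdinal) noexcept
{
    const NeuralResult status = primary.initializeFromDeviceOrdinal(deviceOrdinal);
    if (status != NeuralResult::Success) return status;
    return secondary.initializeFromDeviceIdentifier(primary.deviceIdentifier());
}

// Usage: both backends end up on ordinal 1; a second initialization attempt fails.
void sharedDeviceExample()
{
    ToyBackend first;
    ToyBackend second;
    assert(initializeOnSameDevice(first, second, 1) == NeuralResult::Success);
    assert(second.ordinal() == first.ordinal());
    assert(second.initializeFromDeviceOrdinal(0) == NeuralResult::Failure);
}
```

Passing the identifier rather than repeating the ordinal matters because the enumeration order is backend-specific, so the same ordinal may name different devices in different backends.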
initializeFromDeviceOrdinal() | pure virtual, noexcept |
Initializes the backend to point to a specific device ordinal.
The enumeration order is backend-specific, and not guaranteed to be stable between runs or reboots. Typically ordinal zero refers to the backend's "primary" device.
Backends do not support reinitialization; attempts to call this function after the backend has been initialized will fail.
deviceOrdinal | Index for device enumeration |
lockMemoryBlock() | pure virtual, noexcept |
Locks a memory block to prevent reuse.
To avoid leaks, be sure to unlock the memory block with unlockMemoryBlock.
handle | MemoryHandle to lock. |
registerLibraryContext() | pure virtual, noexcept |
Registers a new library context with the backend.
The backend takes a reference to the context. This function is in INetworkBackend rather than host-application-specific interfaces because layers might perform their own on-demand library context registration.
pLibraryContext | Context to register |
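On-demand registration by a layer typically follows a get-or-create pattern: query the backend for an existing context and register a new one only on a miss. The sketch below illustrates that pattern with hypothetical stand-ins (`ToyBackend`, `ToyMathContext`, `kToyMathLibraryId`, the `id()` accessor, and a two-value `NeuralResult`). It models "the backend takes a reference" with `std::shared_ptr` instead of the SDK's reference counting, and it assumes duplicate identifiers are rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Stand-ins for SDK types so the sketch compiles on its own.
enum class NeuralResult { Success, Failure };

struct ILibraryContext
{
    using LibraryId = std::uint64_t;
    virtual ~ILibraryContext() = default;
    virtual LibraryId id() const noexcept = 0; // hypothetical accessor
};

// Hypothetical backend holding contexts in a key-value store.
class ToyBackend
{
public:
    NeuralResult registerLibraryContext(const std::shared_ptr<ILibraryContext>& pContext)
    {
        if (!pContext) return NeuralResult::Failure;
        // Assumption: a second context with the same ID is rejected.
        const bool inserted = m_contexts.emplace(pContext->id(), pContext).second;
        return inserted ? NeuralResult::Success : NeuralResult::Failure;
    }

    ILibraryContext* getLibraryContext(ILibraryContext::LibraryId libraryId) noexcept
    {
        const auto it = m_contexts.find(libraryId);
        return it == m_contexts.end() ? nullptr : it->second.get();
    }

private:
    std::map<ILibraryContext::LibraryId, std::shared_ptr<ILibraryContext>> m_contexts;
};

constexpr ILibraryContext::LibraryId kToyMathLibraryId = 0x1234; // made-up identifier

// Placeholder for an expensive-to-create library handle.
struct ToyMathContext : ILibraryContext
{
    LibraryId id() const noexcept override { return kToyMathLibraryId; }
};

// Returns the registered context, creating and registering it on first use.
ILibraryContext* getOrCreateToyMathContext(ToyBackend& backend)
{
    if (ILibraryContext* pExisting = backend.getLibraryContext(kToyMathLibraryId))
    {
        return pExisting;
    }
    auto pNew = std::make_shared<ToyMathContext>();
    if (backend.registerLibraryContext(pNew) != NeuralResult::Success)
    {
        return nullptr;
    }
    return pNew.get(); // the backend's reference keeps the context alive
}

// Usage: two layers asking for the same library share one context.
void libraryContextExample()
{
    ToyBackend backend;
    ILibraryContext* pFirst = getOrCreateToyMathContext(backend);
    ILibraryContext* pSecond = getOrCreateToyMathContext(backend);
    assert(pFirst != nullptr);
    assert(pFirst == pSecond);
}
```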
saveImage() | pure virtual, noexcept |
Converts a layer's output tensor to a CPU image.
pLayer | Layer to read. |
pNetwork | Network associated with the layer |
pImage | IImage object to receive image data. |
imageSpace | Conversion function to apply when mapping floats to RGB. |
channels | Number of channels to copy into the image. |
supportsOptimization() | pure virtual, noexcept |
Returns true if the indicated optimization is applicable to this backend.
Reminder to implementers: "return false" is always safe. Do not return true if the optimization identifier being queried is not known to your code.
See https://devblogs.microsoft.com/oldnewthing/20040211-00/?p=40663 for an example.
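A fail-safe implementation can be as small as a switch that names each known optimization and falls through to false for everything else. In the sketch below, the enum is copied from the declaration on this page so the example compiles without the SDK headers. `ExampleBackend` and its choice of which optimizations to accept are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the enum documented on this page; real code would use the SDK header.
enum class OptimizationCapability : std::uint64_t
{
    SkipConcatenation = 0xdd13d58fbabb2f5bul,
    FuseBatchNormAndConvolution = 0xeaffe6d9a4acfdc9ul,
};

// Hypothetical backend fragment; only the capability query is shown.
class ExampleBackend
{
public:
    bool supportsOptimization(OptimizationCapability optimization) const noexcept
    {
        switch (optimization)
        {
        case OptimizationCapability::SkipConcatenation:
            // Assumption: this backend exposes buffers as GPU virtual addresses.
            return true;
        case OptimizationCapability::FuseBatchNormAndConvolution:
            // Assumption: no fused kernel is available, so decline.
            return false;
        }
        // Unrecognized identifier, possibly from a newer Network: fail safe.
        return false;
    }
};

// Usage: an identifier this backend has never heard of is reported as unsupported.
void capabilityQueryExample()
{
    const ExampleBackend backend;
    const auto unknown = static_cast<OptimizationCapability>(0x1u);
    assert(backend.supportsOptimization(OptimizationCapability::SkipConcatenation));
    assert(!backend.supportsOptimization(unknown));
}
```

Placing the final `return false` after the switch, rather than in a `default:` label, keeps compiler warnings about unhandled enumerators active when new capabilities are added to the enum.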
unlockMemoryBlock() | pure virtual, noexcept |
Unlocks a memory block.
The memory block must have been locked previously by a call to lockMemoryBlock.
handle | MemoryHandle to unlock. |
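Because every lockMemoryBlock call needs a matching unlockMemoryBlock call, a scope guard is a natural fit. The following is a minimal sketch with hypothetical stand-ins: `ToyBackend`, `ToyBlock`, `MemoryBlockLock`, and the two-value `NeuralResult` are not SDK types, and the counted-lock behavior of the toy backend is an assumption made for illustration.

```cpp
#include <cassert>

// Stand-ins for SDK types so the sketch compiles on its own.
enum class NeuralResult { Success, Failure };
struct ToyBlock { int lockCount = 0; };
using MemoryHandle = ToyBlock*; // the real MemoryHandle is opaque

// Hypothetical backend implementing only the locking calls.
class ToyBackend
{
public:
    NeuralResult lockMemoryBlock(MemoryHandle handle) noexcept
    {
        if (!handle) return NeuralResult::Failure;
        ++handle->lockCount; // assumption: locks are counted
        return NeuralResult::Success;
    }

    NeuralResult unlockMemoryBlock(MemoryHandle handle) noexcept
    {
        // The block must have been locked previously.
        if (!handle || handle->lockCount == 0) return NeuralResult::Failure;
        --handle->lockCount;
        return NeuralResult::Success;
    }
};

// Locks on construction and unlocks on destruction, but only if the lock succeeded.
class MemoryBlockLock
{
public:
    MemoryBlockLock(ToyBackend& backend, MemoryHandle handle) noexcept
        : m_backend(backend)
        , m_handle(handle)
        , m_locked(backend.lockMemoryBlock(handle) == NeuralResult::Success)
    {
    }
    ~MemoryBlockLock()
    {
        if (m_locked) m_backend.unlockMemoryBlock(m_handle);
    }
    MemoryBlockLock(const MemoryBlockLock&) = delete;
    MemoryBlockLock& operator=(const MemoryBlockLock&) = delete;

    bool locked() const noexcept { return m_locked; }

private:
    ToyBackend& m_backend;
    MemoryHandle m_handle;
    bool m_locked;
};

// Usage: the block is unlocked when the guard leaves scope, including on early return.
void lockGuardExample()
{
    ToyBackend backend;
    ToyBlock block;
    {
        MemoryBlockLock guard(backend, &block);
        assert(guard.locked());
        assert(block.lockCount == 1);
    }
    assert(block.lockCount == 0);
}
```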
uploadWeights() | pure virtual, noexcept |
Uploads weights data to an internal cache.
ppUploadedWeightsOut | Variable which will receive a pointer to the uploaded data |
pLayer | Layer to associate with the weights data |
pOriginWeightLoader | Weight loader associated with the weight |
pName | Name to associate with the weights data |
pWeightsData | Binary buffer to upload |
weightsDataSize | Size of pWeightsData, in bytes |
weightsDim | Dimensions of the tensor stored in pWeightsData |
format | Desired tensor format of the uploaded data |
memManagedExternally | True if the memory behind pWeightsData is owned externally, most likely by the caller |