NVIDIA NvNeural SDK
2022.2
GPU inference framework for NVIDIA Nsight Deep Learning Designer
Defines an input layer that accepts data from CUDA device memory. More...
#include <nvneural/layers/ICudaInputLayer.h>
Public Member Functions

virtual NeuralResult copyCudaTensorAsync (const void *pBuffer, TensorDimension bufferSize) noexcept=0
    Loads a raw tensor from a CUDA device pointer. More...

Public Member Functions inherited from IRefObject

virtual RefCount addRef () const noexcept=0
    Increments the object's reference count. More...
virtual const void * queryInterface (TypeId interface) const noexcept=0
    This is an overloaded member function, provided for convenience. It differs from the above function only in the arguments it accepts.
virtual void * queryInterface (TypeId interface) noexcept=0
    Retrieves a new object interface pointer. More...
virtual RefCount release () const noexcept=0
    Decrements the object's reference count and destroys the object if the reference count reaches zero. More...
Static Public Attributes

static const IRefObject::TypeId typeID = 0x66f6b04c4edaf310ul
    Interface TypeId for InterfaceOf purposes.

Static Public Attributes inherited from IRefObject

static const TypeId typeID = 0x14ecc3f9de638e1dul
    Interface TypeId for InterfaceOf purposes.
Additional Inherited Members

Public Types inherited from IRefObject

using RefCount = std::uint32_t
    Typedef used to track the number of active references to an object.
using TypeId = std::uint64_t
    Every interface must define a unique TypeId. This should be randomized.

Protected Member Functions inherited from IRefObject

virtual ~IRefObject ()=default
    A protected destructor prevents accidental stack allocation of IRefObjects or use with other smart pointer classes like std::unique_ptr.
Defines an input layer that accepts data from CUDA device memory.
Note that IInputLayer is a type trait that should accompany this interface in almost all cases.
Member Function Documentation

copyCudaTensorAsync()

virtual NeuralResult copyCudaTensorAsync (const void *pBuffer, TensorDimension bufferSize) noexcept [pure virtual]
Loads a raw tensor from a CUDA device pointer.
The tensor must be in the layer's native format as described by ILayer::tensorFormat. Layers with additional tiling requirements (as in IStandardInputLayer::tilingFactor or ILayer::stepping) should reject unaligned data instead of inserting extra padding.
The copy is performed asynchronously on the CUDA backend's primary stream. Do not modify the source buffer until the copy completes. Use cuStreamSynchronize or INetworkBackend::synchronize to ensure all pending operations complete.
Parameters

    pBuffer     Pointer to the first element of the input tensor.
    bufferSize  Size of the buffer in elements.
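A minimal usage sketch of the call sequence described above. Only copyCudaTensorAsync and INetworkBackend::synchronize come from this reference; the nvneural namespace, the INetworkBackend header path, the NeuralResult::Success enumerator, and the helper's name and parameters are assumptions for illustration, and obtaining the layer and backend pointers is out of scope here.

```cpp
#include <nvneural/layers/ICudaInputLayer.h>
#include <nvneural/CoreTypes.h>  // assumed location of INetworkBackend / NeuralResult

// Hypothetical helper: pLayer and pBackend are assumed to have been
// obtained elsewhere (e.g. while building the network).
NeuralResult uploadInputTensor(nvneural::ICudaInputLayer* pLayer,
                               nvneural::INetworkBackend* pBackend,
                               const void* pDeviceBuffer,
                               nvneural::TensorDimension bufferSize)
{
    // Enqueue the asynchronous copy on the CUDA backend's primary stream.
    // pDeviceBuffer must already be in the layer's native format
    // (ILayer::tensorFormat) and satisfy any tiling requirements.
    const NeuralResult status =
        pLayer->copyCudaTensorAsync(pDeviceBuffer, bufferSize);
    if (status != NeuralResult::Success)  // assumed success enumerator
        return status;

    // Wait for all pending backend work, including this copy, before the
    // caller modifies or frees pDeviceBuffer.
    return pBackend->synchronize();
}
```

Synchronizing immediately is the simplest safe pattern; callers that keep the source buffer alive can instead defer the synchronize until just before the buffer is reused.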