NVIDIA NvNeural SDK
2021.2
GPU inference framework for NVIDIA Nsight Deep Learning Designer
Represents a runtime-compiled function object from ICudaRuntimeCompiler.
#include <nvneural/CudaTypes.h>
Public Member Functions
virtual const void * | compiledBinary () const noexcept=0 |
Provides a pointer to the compiled binary representation (PTX or cubin) of this function.
virtual std::size_t | compiledBinarySize () const noexcept=0 |
Provides the size of the compiled binary (PTX or cubin) representation of this function in bytes.
virtual CUfunction | function () const noexcept=0 |
Returns the CUfunction represented by this function object.
virtual NeuralResult | launch (INetworkBackendCuda *pBackend, std::size_t gridSizeX, std::size_t gridSizeY, std::size_t gridSizeZ, std::size_t blockSizeX, std::size_t blockSizeY, std::size_t blockSizeZ, void **ppArguments, std::uint32_t smem) const noexcept=0 |
Launches the function on the specified CUDA backend's stream.
virtual CUmodule | module () const noexcept=0 |
Returns the CUmodule containing this function object.
Public Member Functions inherited from IRefObject
virtual RefCount | addRef () const noexcept=0 |
Increments the object's reference count.
virtual const void * | queryInterface (TypeId interface) const noexcept=0 |
This is an overloaded member function, provided for convenience. It differs from the non-const queryInterface only in its const qualification and return type.
virtual void * | queryInterface (TypeId interface) noexcept=0 |
Retrieves a new object interface pointer.
virtual RefCount | release () const noexcept=0 |
Decrements the object's reference count and destroys the object if the reference count reaches zero.
Static Public Attributes
static const IRefObject::TypeId | typeID = 0x467d6d0e91bcc332ul |
Interface TypeId for InterfaceOf purposes.
Static Public Attributes inherited from IRefObject
static const TypeId | typeID = 0x14ecc3f9de638e1dul |
Interface TypeId for InterfaceOf purposes.
Additional Inherited Members
Public Types inherited from IRefObject
using | RefCount = std::uint32_t |
Typedef used to track the number of active references to an object.
using | TypeId = std::uint64_t |
Every interface must define a unique TypeId. This should be randomized.
Protected Member Functions inherited from IRefObject
virtual | ~IRefObject ()=default |
A protected destructor prevents accidental stack-allocation of IRefObjects or use with other smart pointer classes like std::unique_ptr.
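The addRef/release contract listed above can be illustrated with a small scope guard. This is only a sketch under stated assumptions: `FakeRefObject` and `ScopedRef` are hypothetical names invented for this example, not SDK types, and the stand-in assumes that a newly created object starts with a reference count of one. The SDK may ship its own smart pointer, which should be preferred where available.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that mimics only the addRef/release contract described above.
// It is NOT the SDK's IRefObject and is not thread-safe.
class FakeRefObject
{
public:
    using RefCount = std::uint32_t;

    // Assumption: a freshly created object starts with one reference.
    static FakeRefObject* create(bool* pDestroyedFlag)
    {
        return new FakeRefObject(pDestroyedFlag);
    }

    RefCount addRef() const noexcept { return ++m_refs; }

    RefCount release() const noexcept
    {
        const RefCount remaining = --m_refs;
        if (remaining == 0)
        {
            delete this; // the object destroys itself when the count reaches zero
        }
        return remaining;
    }

protected:
    // Protected destructor: callers cannot delete or stack-allocate the object.
    ~FakeRefObject() { *m_pDestroyedFlag = true; }

private:
    explicit FakeRefObject(bool* pDestroyedFlag) : m_pDestroyedFlag(pDestroyedFlag) {}

    mutable RefCount m_refs = 1;
    bool* m_pDestroyedFlag;
};

// Hypothetical guard: takes one reference on construction, drops it on destruction.
template <typename T>
class ScopedRef
{
public:
    explicit ScopedRef(T* pObject) noexcept : m_pObject(pObject)
    {
        if (m_pObject) { m_pObject->addRef(); }
    }
    ScopedRef(const ScopedRef&) = delete;
    ScopedRef& operator=(const ScopedRef&) = delete;
    ~ScopedRef()
    {
        if (m_pObject) { m_pObject->release(); }
    }
    T* get() const noexcept { return m_pObject; }

private:
    T* m_pObject;
};

// Walks the count through 1 -> 2 -> 1 -> 0 and reports whether destruction happened last.
bool refCountDemo()
{
    bool destroyed = false;
    FakeRefObject* pObject = FakeRefObject::create(&destroyed); // count == 1
    {
        ScopedRef<FakeRefObject> guard(pObject);                // count == 2
    }                                                           // count == 1
    assert(!destroyed);  // the creator's reference keeps the object alive
    pObject->release();  // count == 0, object deletes itself
    return destroyed;
}
```

Because the destructor is protected, release() is the only way to destroy such an object, which is why a guard of this shape pairs every addRef() with exactly one release().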
compiledBinary() | pure virtual, noexcept
Provides a pointer to the compiled binary representation (PTX or cubin) of this function.
The size of the buffer is given by compiledBinarySize. If no binary representation is available, returns nullptr. The pointer returned by this function remains valid as long as this object does.
compiledBinarySize() | pure virtual, noexcept
Provides the size of the compiled binary (PTX or cubin) representation of this function in bytes.
If a precompiled format is not available, returns 0. If the function was compiled to a cubin, the size reported is that of the cubin; PTX is returned only when compiling to virtual (compute_##) architectures.
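Since the pointer from compiledBinary remains valid only as long as the function object does, a caller that wants to keep the PTX or cubin (for caching, say) has to copy it out. The following is a minimal sketch of that pattern. `copyCompiledBinary` and `FakeCompiledFunction` are hypothetical names for illustration; the helper is templated so that it works with any type exposing the two accessors documented above, and the stub exists only to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies the compiled binary out of a function object so the bytes can outlive it.
// Fn is any type exposing compiledBinary() and compiledBinarySize() as documented above.
template <typename Fn>
std::vector<unsigned char> copyCompiledBinary(const Fn& fn)
{
    const void* pData = fn.compiledBinary();
    const std::size_t size = fn.compiledBinarySize();
    if (pData == nullptr || size == 0)
    {
        return {}; // no binary representation is available
    }
    const auto* pBytes = static_cast<const unsigned char*>(pData);
    return std::vector<unsigned char>(pBytes, pBytes + size);
}

// Hypothetical stub standing in for a compiled function object (not an SDK type).
struct FakeCompiledFunction
{
    std::string ptx;

    const void* compiledBinary() const noexcept
    {
        return ptx.empty() ? nullptr : ptx.data();
    }
    std::size_t compiledBinarySize() const noexcept { return ptx.size(); }
};
```

Checking both the pointer and the size covers the two documented "not available" signals (nullptr and 0) without assuming they are always reported together.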
function() | pure virtual, noexcept
Returns the CUfunction represented by this function object.
If this function was compiled to an incompatible architecture, this function returns nullptr.
launch() | pure virtual, noexcept
Launches the function on the specified CUDA backend's stream.
This function is conceptually equivalent to the cuLaunchKernel driver API.
pBackend | CUDA network backend owning the context/stream |
gridSizeX | X-dimension of launch grid in blocks |
gridSizeY | Y-dimension of launch grid in blocks |
gridSizeZ | Z-dimension of launch grid in blocks |
blockSizeX | X-dimension of a block in threads |
blockSizeY | Y-dimension of a block in threads |
blockSizeZ | Z-dimension of a block in threads |
ppArguments | Array of pointers to kernel parameters; see cuLaunchKernel 'kernelParams' for details |
smem | Dynamic shared-memory size per block in bytes; see cuLaunchKernel 'sharedMemBytes' for details |
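The parameter list above can be exercised with a one-dimensional launch. The sketch below assumes a kernel whose parameters are `(float* data, std::uint32_t count)`; that signature, the helper names `blocksFor` and `launchOverElements`, and the stub types `FakeBackend`, `FakeResult`, and `RecordingFunction` are all hypothetical and exist only so the example compiles on its own. The helper is templated so it accepts any object with a launch member of the documented shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of blocks needed to cover elementCount items with blockSize threads per block.
constexpr std::size_t blocksFor(std::size_t elementCount, std::size_t blockSize)
{
    return (elementCount + blockSize - 1) / blockSize; // round up
}

// Launches a 1-D grid over `count` elements with no dynamic shared memory.
// Assumes the kernel takes (float* data, std::uint32_t count) in that order.
template <typename Fn, typename Backend>
auto launchOverElements(const Fn& fn, Backend* pBackend, float* pDeviceData,
                        std::uint32_t count, std::size_t blockSize = 256)
{
    // As with cuLaunchKernel's kernelParams, each entry points at the storage
    // of one kernel parameter, in declaration order.
    void* arguments[] = { &pDeviceData, &count };
    return fn.launch(pBackend,
                     blocksFor(count, blockSize), 1, 1, // grid size in blocks
                     blockSize, 1, 1,                   // block size in threads
                     arguments,
                     0);                                // dynamic shared memory in bytes
}

// Hypothetical stubs (not SDK types) that record what launch() received.
struct FakeBackend {};
enum class FakeResult { Success, Failure };

struct RecordingFunction
{
    mutable std::size_t gridX = 0;
    mutable std::size_t blockX = 0;
    mutable std::uint32_t seenCount = 0;

    FakeResult launch(FakeBackend*, std::size_t gx, std::size_t, std::size_t,
                      std::size_t bx, std::size_t, std::size_t,
                      void** ppArguments, std::uint32_t) const noexcept
    {
        gridX = gx;
        blockX = bx;
        seenCount = *static_cast<std::uint32_t*>(ppArguments[1]);
        return FakeResult::Success;
    }
};
```

Rounding the grid size up means the last block may contain threads past the end of the data, so a kernel launched this way is expected to bounds-check against `count`.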
module() | pure virtual, noexcept
Returns the CUmodule containing this function object.
If this function was compiled to an incompatible architecture, this function returns nullptr.
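Because both function() and module() return nullptr when the code was compiled for an incompatible architecture, a caller that uses the raw handles may want to check them first. This is a small sketch of such a check; `hasLoadedHandles` and `FakeHandles` are hypothetical names, and the stub uses `void*` in place of the CUDA driver's opaque pointer handle types so the example needs no CUDA headers.

```cpp
#include <cassert>

// Returns true when both driver handles are available, i.e. neither accessor
// reported the documented "incompatible architecture" nullptr.
template <typename Fn>
bool hasLoadedHandles(const Fn& fn) noexcept
{
    return fn.function() != nullptr && fn.module() != nullptr;
}

// Hypothetical stub (not an SDK type); void* stands in for CUfunction and CUmodule.
struct FakeHandles
{
    void* pFunction;
    void* pModule;

    void* function() const noexcept { return pFunction; }
    void* module() const noexcept { return pModule; }
};
```

A function object that fails this check may still expose its PTX or cubin through compiledBinary, so the check says nothing about whether compilation itself succeeded.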