NVIDIA NvNeural SDK  2022.2
GPU inference framework for NVIDIA Nsight Deep Learning Designer
nvneural::ICudaRuntimeCompiler Class Reference (abstract)

Represents a runtime compiler that can transform CUDA source code into compiled functions. More...

#include <nvneural/CudaTypes.h>

Inheritance diagram for nvneural::ICudaRuntimeCompiler:
nvneural::IRefObject

Classes

struct  CompilationDetails
 Params struct describing a compilation request. More...
 

Public Member Functions

virtual NeuralResult compile (ICudaCompiledFunction **ppCompiledFunctionOut, const CompilationDetails &compilationDetails) noexcept=0
 Compiles source code into an ICudaCompiledFunction object. More...
 
virtual NeuralResult loadCubin (ICudaCompiledFunction **ppCompiledFunctionOut, std::uint8_t *pCode, std::size_t codeSize, const char *pEntryPoint) noexcept=0
 Converts a binary representation of a cubin into an ICudaCompiledFunction object. More...
 
virtual NeuralResult setTargetArchitecture (const char *pTargetArch) noexcept=0
 Retargets the compiler for a particular GPU architecture. More...
 
virtual const char * targetArchitecture () const noexcept=0
 Returns the current target GPU architecture for compilation. More...
 
- Public Member Functions inherited from nvneural::IRefObject
virtual RefCount addRef () const noexcept=0
 Increments the object's reference count. More...
 
virtual const void * queryInterface (TypeId interface) const noexcept=0
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
 
virtual void * queryInterface (TypeId interface) noexcept=0
 Retrieves a new object interface pointer. More...
 
virtual RefCount release () const noexcept=0
 Decrements the object's reference count and destroys the object if the reference count reaches zero. More...
 

Static Public Attributes

static const IRefObject::TypeId typeID = 0xe39c2816f916d342ul
 Interface TypeId for InterfaceOf purposes.
 
- Static Public Attributes inherited from nvneural::IRefObject
static const TypeId typeID = 0x14ecc3f9de638e1dul
 Interface TypeId for InterfaceOf purposes.
 

Additional Inherited Members

- Public Types inherited from nvneural::IRefObject
using RefCount = std::uint32_t
 Typedef used to track the number of active references to an object.
 
using TypeId = std::uint64_t
 Every interface must define a unique TypeId. This should be randomized.
 
- Protected Member Functions inherited from nvneural::IRefObject
virtual ~IRefObject ()=default
 A protected destructor prevents accidental stack-allocation of IRefObjects or use with other smart pointer classes like std::unique_ptr.
 

Detailed Description

Represents a runtime compiler that can transform CUDA source code into compiled functions.

Compiled functions are contained in ICudaCompiledFunction objects. Compilation is typically performed with the CUDA nvrtc library; see https://docs.nvidia.com/cuda/nvrtc/index.html for more details.

Member Function Documentation

◆ compile()

virtual NeuralResult nvneural::ICudaRuntimeCompiler::compile ( ICudaCompiledFunction **  ppCompiledFunctionOut,
const CompilationDetails &  compilationDetails 
)
pure virtual, noexcept

Compiles source code into an ICudaCompiledFunction object.

If compilation fails, the compiler output is reported through the default logger. Implementations of ICudaRuntimeCompiler are allowed and encouraged to cache compiler output.

Parameters
ppCompiledFunctionOut  Variable receiving a reference to a compiled function object
compilationDetails  Parameter structure containing details of the requested compilation
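Because ICudaRuntimeCompiler inherits IRefObject, compile follows a COM-style out-parameter convention: on success the callee writes a referenced object into *ppCompiledFunctionOut and the caller must eventually call release. The sketch below illustrates that ownership pattern with minimal stand-in types; everything except the addRef/release names and the out-parameter shape is an assumption, not SDK behavior:

```cpp
#include <cstdint>

enum class NeuralResult { Success, Failure };  // stand-in for the SDK's NeuralResult
using RefCount = std::uint32_t;

// Minimal stand-in for ICudaCompiledFunction's IRefObject behavior; the real
// interface is abstract and is handed out by the SDK, never constructed directly.
class MockCompiledFunction {
public:
    RefCount addRef() noexcept { return ++m_refs; }
    RefCount release() noexcept {
        const RefCount remaining = --m_refs;
        if (remaining == 0)
            delete this;  // object destroys itself when the last reference is released
        return remaining;
    }
private:
    ~MockCompiledFunction() = default;  // private destructor, as with IRefObject
    RefCount m_refs = 1;                // the caller starts with one reference
};

// Hypothetical compile(): on success, stores a referenced object in *ppOut.
NeuralResult mockCompile(MockCompiledFunction** ppCompiledFunctionOut,
                         const char* pSourceCode) {
    if (!ppCompiledFunctionOut || !pSourceCode)
        return NeuralResult::Failure;
    *ppCompiledFunctionOut = new MockCompiledFunction();  // caller now owns one reference
    return NeuralResult::Success;
}
```

A typical call site would check the returned NeuralResult, use the function object, and balance the implicit reference with a single release.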

◆ loadCubin()

virtual NeuralResult nvneural::ICudaRuntimeCompiler::loadCubin ( ICudaCompiledFunction **  ppCompiledFunctionOut,
std::uint8_t *  pCode,
std::size_t  codeSize,
const char *  pEntryPoint 
)
pure virtual, noexcept

Converts a binary representation of a cubin into an ICudaCompiledFunction object.

Parameters
ppCompiledFunctionOut  Variable receiving a reference to a compiled function object
pCode  Pointer to the first element in a codeSize-sized byte array containing the cubin
codeSize  Number of elements in the pCode buffer
pEntryPoint  Name of the compiled function's entry point

◆ setTargetArchitecture()

virtual NeuralResult nvneural::ICudaRuntimeCompiler::setTargetArchitecture ( const char *  pTargetArch)
pure virtual, noexcept

Retargets the compiler for a particular GPU architecture.

Note that invalid architectures may lead to later failures in compile, and compiling for valid-but-incompatible architectures (e.g., sm_86 on GV100) will result in unlaunchable ICudaCompiledFunction objects.

You do not need to call this function inside inference scenarios; the default architecture for the compiler is compatible with the owning backend's CUDA context. Only code generation and "force PTX/force CUBIN" use cases will need to change target architectures.

It is good practice to save the original target architecture, then restore it after cross-compiling kernels.

Parameters
pTargetArch  Architecture suitable for nvrtc's --gpu-architecture argument
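The save-then-restore practice described above maps naturally onto an RAII guard. The sketch below uses a hypothetical mock in place of the real interface (the real setTargetArchitecture returns a NeuralResult rather than void, and the guard type is an assumption, not part of the SDK):

```cpp
#include <string>

// Hypothetical stand-in for an ICudaRuntimeCompiler's architecture methods.
class MockCompiler {
public:
    void setTargetArchitecture(const char* pTargetArch) { m_arch = pTargetArch; }
    const char* targetArchitecture() const { return m_arch.c_str(); }
private:
    std::string m_arch = "sm_70";  // assumed default target for illustration
};

// RAII guard for the documented good practice: save the original target
// architecture on entry, retarget, and restore the original on scope exit.
template <typename Compiler>
class ScopedTargetArchitecture {
public:
    ScopedTargetArchitecture(Compiler& compiler, const char* pTempArch)
        : m_compiler(compiler), m_savedArch(compiler.targetArchitecture()) {
        m_compiler.setTargetArchitecture(pTempArch);  // retarget for cross-compilation
    }
    ~ScopedTargetArchitecture() {
        m_compiler.setTargetArchitecture(m_savedArch.c_str());  // restore original
    }
private:
    Compiler& m_compiler;
    std::string m_savedArch;  // copied, since the pointer may not outlive the retarget
};
```

Within a scope, constructing the guard with, say, "sm_86" retargets the compiler for any cross-compiled kernels, and the destructor restores the previous architecture even on early return.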

◆ targetArchitecture()

virtual const char* nvneural::ICudaRuntimeCompiler::targetArchitecture ( ) const
pure virtual, noexcept

Returns the current target GPU architecture for compilation.

The returned value is compatible with setTargetArchitecture and can be passed back to it unchanged.


The documentation for this class was generated from the following file: nvneural/CudaTypes.h