Represents a runtime compiler that can transform CUDA source code into compiled functions.
#include <nvneural/CudaTypes.h>
using RefCount = std::uint32_t
    Typedef used to track the number of active references to an object.

using TypeId = std::uint64_t
    Every interface must define a unique TypeId. This should be randomized.

virtual ~IRefObject() = default
    A protected destructor prevents accidental stack allocation of IRefObjects or use with other smart-pointer classes such as std::unique_ptr.
Compiled functions are contained in ICudaCompiledFunction objects. This is typically done with the CUDA nvrtc library; see https://docs.nvidia.com/cuda/nvrtc/index.html for more details.
◆ compile()
Compiles source code into an ICudaCompiledFunction object.
If the compilation fails, compiler output is returned using the default logger. Implementations of ICudaRuntimeCompiler are allowed and encouraged to cache compiler output.
Parameters:
    ppCompiledFunctionOut: Variable receiving a reference to a compiled function object
    compilationDetails: Parameter structure containing details of the requested compilation
◆ loadCubin()
virtual NeuralResult nvneural::ICudaRuntimeCompiler::loadCubin(ICudaCompiledFunction** ppCompiledFunctionOut, std::uint8_t* pCode, std::size_t codeSize, const char* pEntryPoint)
pure virtual noexcept
Converts a binary representation of a cubin into an ICudaCompiledFunction object.
Parameters:
    ppCompiledFunctionOut: Variable receiving a reference to a compiled function object
    pCode: Pointer to the first element in a codeSize-sized byte array containing the cubin
    codeSize: Number of elements in the pCode buffer
    pEntryPoint: Name of the compiled function's entry point
◆ setTargetArchitecture()
virtual NeuralResult nvneural::ICudaRuntimeCompiler::setTargetArchitecture(const char* pTargetArch)
pure virtual noexcept
Retargets the compiler for a particular GPU architecture.
Note that invalid architectures may lead to later failures in compile, and compiling for valid-but-incompatible architectures (e.g., sm_86 on GV100) will result in unlaunchable ICudaCompiledFunction objects.
You do not need to call this function inside inference scenarios; the default architecture for the compiler is compatible with the owning backend's CUDA context. Only code generation and "force PTX/force CUBIN" use cases will need to change target architectures.
It is good practice to save the original target architecture, then restore it after cross-compiling kernels.
Parameters:
    pTargetArch: Architecture suitable for nvrtc's --gpu-architecture argument
◆ targetArchitecture()
virtual const char* nvneural::ICudaRuntimeCompiler::targetArchitecture() const
pure virtual noexcept
Returns the current target GPU architecture for compilation.
Value returned is compatible with setTargetArchitecture.