Represents a runtime compiler that can transform CUDA source code into compiled functions.
#include <nvneural/CudaTypes.h>
using RefCount = std::uint32_t
    Typedef used to track the number of active references to an object.

using TypeId = std::uint64_t
    Every interface must define a unique TypeId. This should be randomized.

virtual ~IRefObject() = default
    A protected destructor prevents accidental stack allocation of IRefObjects or use with other smart-pointer classes such as std::unique_ptr.
Compiled functions are contained in ICudaCompiledFunction objects. This is typically done with the CUDA nvrtc library; see https://docs.nvidia.com/cuda/nvrtc/index.html for more details.
◆ compile()
Compiles source code into an ICudaCompiledFunction object.
If the compilation fails, compiler output is returned using the default logger. Implementations of ICudaRuntimeCompiler are allowed and encouraged to cache compiler output.
Parameters:
    ppCompiledFunctionOut: Variable receiving a reference to a compiled function object
    compilationDetails: Parameter structure containing details of the requested compilation
◆ loadCubin()
virtual NeuralResult nvneural::ICudaRuntimeCompiler::loadCubin(ICudaCompiledFunction** ppCompiledFunctionOut, std::uint8_t* pCode, std::size_t codeSize, const char* pEntryPoint)
pure virtual noexcept
Converts a binary representation of a cubin into an ICudaCompiledFunction object.
Parameters:
    ppCompiledFunctionOut: Variable receiving a reference to a compiled function object
    pCode: Pointer to the first element in a codeSize-sized byte array containing the cubin
    codeSize: Number of elements in the pCode buffer
    pEntryPoint: Name of the compiled function's entry point
◆ setTargetArchitecture()
virtual NeuralResult nvneural::ICudaRuntimeCompiler::setTargetArchitecture(const char* pTargetArch)
pure virtual noexcept
Retargets the compiler for a particular GPU architecture.
Note that invalid architectures may lead to later failures in compile, and compiling for valid-but-incompatible architectures (e.g., sm_86 on GV100) will result in unlaunchable ICudaCompiledFunction objects.
You do not need to call this function inside inference scenarios; the default architecture for the compiler is compatible with the owning backend's CUDA context. Only code generation and "force PTX/force CUBIN" use cases will need to change target architectures.
It is good practice to save the original target architecture, then restore it after cross-compiling kernels.
Parameters:
    pTargetArch: Architecture suitable for nvrtc's --gpu-architecture argument
◆ targetArchitecture()
virtual const char* nvneural::ICudaRuntimeCompiler::targetArchitecture() const
pure virtual noexcept
Returns the current target GPU architecture for compilation.
Value returned is compatible with setTargetArchitecture.