Base class for the CUDA prototype layers shipped with NvNeural.

#include <BasePrototypeLayer.h>

Inheritance diagram for nvneural::BasePrototypeLayer:

Public Member Functions
ILayer inherited members
const char *	serializedType () const noexcept
	Retrieves the layer type.

NeuralResult	setName (const char *pName) noexcept
	Sets the layer name.

const char *	name () const noexcept
	Retrieves the layer name.

NeuralResult	setNetworkRuntime (INetworkRuntime *pNetworkRuntime) noexcept
	Informs the layer it has been attached to a new network.

NetworkBackendId	backendId () const noexcept
	Returns the backend ID associated with this layer implementation.

TensorFormat	tensorFormat () const noexcept
	Returns the tensor format consumed by this layer implementation.

TensorDimension	stepping () const noexcept
	Returns the internal storage stride consumed by this layer implementation.

TensorDimension	dimensions () const noexcept
	Retrieves the dimensions of the layer's output tensor.

TensorDimension	internalDimensions () const noexcept
	Retrieves the dimensions of the layer's output tensor as allocated internally.

size_t	tensorBufferSize () const noexcept
	Retrieves the size of the layer's output tensor buffer in bytes.

size_t	tensorInternalBufferSize () const noexcept
	Retrieves the size in bytes of the layer's output tensor buffer as allocated internally.

NeuralResult	loadFromParameters (const IParameterNode *pParameters) noexcept
	Loads layer parameters from a serialized key-value representation.

NeuralResult	getInputLayers (ILayerList **ppInputLayers) const noexcept
	Retrieves the inputs for this layer.

NeuralResult	setInputLayer (size_t index, ILayer *pLayer) noexcept
	Sets an input layer by index.

NeuralResult	setPermanent (bool permanent) noexcept
	Sets or clears the "permanent" flag on a layer's output tensor.

bool	isPermanent () const noexcept
	Returns the current status of the "permanent" flag.

NeuralResult	setAffected (bool affected) noexcept
	Sets or clears the "affected" flag on a layer's output tensor.

bool	isAffected () const noexcept
	Returns the current status of the "affected" flag.

NeuralResult	setActivationFunction (ActivationFunctionId activationFunction) noexcept
	Sets the activation function attached to the layer.

ActivationFunctionId	activationFunction () const noexcept
	Retrieves the activation function attached to this layer.

NeuralResult	setActivationCoefficient (std::size_t coefficientIndex, float value) noexcept
	Sets an activation coefficient.

float	activationCoefficient (std::size_t coefficientIndex) const noexcept
	Retrieves the activation coefficient for the specified index.

const char *	weightsName () const noexcept
	Retrieves the name used to identify this layer's weights.

NeuralResult	setWeightsName (const char *pWeightsName) noexcept
	Sets the name used to identify this layer's weights.

TensorDimension	weightsDimensions (const char *pWeightsName, WeightsQuery queryType) const noexcept
	Retrieves the tensor dimension of a layer's named weight input.

NeuralResult	reshape () noexcept
	Initializes (or reinitializes) the layer implementation with the current set of parameters.

NeuralResult	evaluateForward () noexcept
	Performs forward evaluation for this layer.

NeuralResult	getData (void **ppOut, TensorFormat format, const ILayer *pRequestingLayer) noexcept
	Retrieves device-side memory for the layer's output.

NeuralResult	getConstData (const void **ppOut, TensorFormat format, const ILayer *pRequestingLayer) const noexcept
	Retrieves read-only device-side memory for the layer's output.

NeuralResult	getCpuConstData (void *pOutBuffer, size_t bufferByteCount, size_t *pBytesCopied, TensorFormat format) const noexcept
	Retrieves read-only CPU-side memory for the layer's output.

IPrototypeLayer inherited members
NeuralResult	addPrototypeCode (const char *pName, const char *pCode) noexcept final
	Adds a kernel definition for a particular tensor format and backend.

NeuralResult	addForwardEvalCall (const char *pName, const char *pCall) noexcept final
	Adds a call script for a particular tensor format and backend.

NeuralResult	addParameterKey (const char *pKey) noexcept final
	Adds a named parameter to the layer.

ICppCodeGenerationLayer inherited members
NeuralResult	generateLayerCpp (ICppCodeGenerationLayerHost *pHost) noexcept final
	Generates C++ code to configure the layer.

NeuralResult	networkGenerationComplete () noexcept final
	Indicates the entire network has been generated.

Public Member Functions inherited from nvneural::refobj::RefObjectBase< refobj::Implements< ILayer >, refobj::Implements< IPrototypeLayer >, refobj::Implements< ICppCodeGenerationLayer > >
IRefObject::RefCount	addRef () const noexcept
	Increments the object's reference count.

const void *	queryInterface (IRefObject::TypeId interfaceId) const noexcept
	This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

void *	queryInterface (IRefObject::TypeId interfaceId) noexcept
	Retrieves a new object interface pointer.

	RefObjectBase ()
	Default constructor. Logs object creation.

IRefObject::RefCount	release () const noexcept
	Decrements the object's reference count and destroys the object if the reference count reaches zero.

Protected Member Functions
	BasePrototypeLayer ()
	Creates a new BasePrototypeLayer object.

NeuralResult	loadBasePrototypeInfo (NetworkBackendId backendId, const TensorFormat &format, CompilationLevel compilationLevel, const std::string &currentImpl, const IParameterNode *pParameters)
	Loads prototype information from a parameter node.

NeuralResult	setBasePrototypeInfo (std::size_t inputsCount, const std::string &type, NetworkBackendId backend, const TensorFormat &format, const std::string &dim, const std::string &impl) noexcept
	Stores implementation-agnostic details of the prototype layer.

Methods to be implemented by derived classes
virtual NeuralResult	implementationReshape () noexcept=0
	Implementation-specific logic for ILayer::reshape.

virtual NeuralResult	implementationForward () noexcept=0
	Implementation-specific logic for ILayer::evaluateForward.

Protected Attributes
detail::SizeValue	m_block
	Preferred block size in threads.

std::string	m_entry
	Name of compiled entry point (e.g., CUDA function name)

detail::SizeValue	m_grid
	Preferred grid size in blocks.

std::vector< detail::AnyOperand >	m_ops
	Arguments to pass to the compiled entry point.

INetworkRuntime *	m_pNetwork
	Pointer to owning network.

std::string	m_selected_code
	Source code for the currently selected implementation.

Detailed Description

Base class for the CUDA prototype layers shipped with NvNeural.

Member Function Documentation

◆ activationCoefficient()

float BasePrototypeLayer::activationCoefficient ( std::size_t coefficientIndex ) const

noexcept

Retrieves the activation coefficient for the specified index.

The meaning of this value is activation-specific. If no setActivationCoefficient call was previously issued for this index, the layer should return a default value of zero.

Parameters

coefficientIndex Coefficient index
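
The default-to-zero rule can be illustrated with a small sketch. CoefficientStore and coefficientExample are hypothetical names invented for this example and are not part of NvNeural; the sketch only assumes that coefficients are kept in a sparse map keyed by index.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for a layer's coefficient storage; not NvNeural code.
class CoefficientStore {
public:
    // Stores a coefficient, replacing any earlier value at the same index.
    void set(std::size_t index, float value) { m_values[index] = value; }

    // Returns the stored coefficient, or zero if the index was never set.
    float get(std::size_t index) const {
        const auto it = m_values.find(index);
        return it == m_values.end() ? 0.0f : it->second;
    }

private:
    std::unordered_map<std::size_t, float> m_values;
};

// Usage: index 0 is set explicitly, index 1 falls back to the default.
inline void coefficientExample() {
    CoefficientStore store;
    store.set(0, 0.25f);
    assert(store.get(0) == 0.25f);
    assert(store.get(1) == 0.0f);
}
```

A sparse map keeps unset indices cheap, which suits activations that use only one or two coefficients.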

◆ activationFunction()

ActivationFunctionId BasePrototypeLayer::activationFunction ( ) const

noexcept

Retrieves the activation function attached to this layer.

Should return the most recent value passed to setActivationFunction.

◆ addForwardEvalCall()

NeuralResult BasePrototypeLayer::addForwardEvalCall	(	const char *	pName,
		const char *	pCall
	)

final noexcept

Adds a call script for a particular tensor format and backend.

Replaces previous definitions of that kernel.

Parameters

pName	Name of the prototype (e.g., "cuda_fp32_nchw")
pCall	Call script for the prototype

◆ addParameterKey()

NeuralResult BasePrototypeLayer::addParameterKey ( const char * pKey )

final noexcept

Adds a named parameter to the layer.

Parameters

pKey	Parameter name that should be included in kernel calls

◆ addPrototypeCode()

NeuralResult BasePrototypeLayer::addPrototypeCode	(	const char *	pName,
		const char *	pCode
	)

final noexcept

Adds a kernel definition for a particular tensor format and backend.

Replaces previous definitions of that kernel.

Parameters

pName	Name of the prototype provided (e.g., "cuda_fp32_nchw")
pCode	Kernel code for the prototype

◆ backendId()

NetworkBackendId BasePrototypeLayer::backendId ( ) const

noexcept

Returns the backend ID associated with this layer implementation.

Networks will reshape tensors as necessary in order to match the return value.

◆ dimensions()

TensorDimension BasePrototypeLayer::dimensions ( ) const

noexcept

Retrieves the dimensions of the layer's output tensor.

Should not change except in response to a reshape event.

◆ evaluateForward()

NeuralResult BasePrototypeLayer::evaluateForward ( )

noexcept

Performs forward evaluation for this layer.

If the layer does not provide a pre-fused implementation of the selected activation function, it MUST call INetworkRuntime::defaultForwardActivation(this).
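
As a rough illustration of this rule, the sketch below uses mock types rather than the real interfaces. MockActivationRuntime, MockActivatedLayer, kernelFusesActivation, and activationExample are hypothetical names, and bool return values stand in for NeuralResult.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime; it only counts default-activation requests.
struct MockActivationRuntime {
    int defaultActivationCalls = 0;

    // Plays the role of INetworkRuntime::defaultForwardActivation(layer).
    bool defaultForwardActivation(const void* /*pLayer*/) {
        ++defaultActivationCalls;
        return true;
    }
};

// Hypothetical layer; bool return values stand in for NeuralResult.
struct MockActivatedLayer {
    MockActivationRuntime* pRuntime = nullptr;
    bool kernelFusesActivation = false;

    bool evaluateForward() {
        // ... the layer's own kernel launch would go here ...
        if (kernelFusesActivation) {
            return true; // the kernel already applied the activation
        }
        // No fused path: hand the activation to the runtime.
        return pRuntime->defaultForwardActivation(this);
    }
};

// Usage: only the layer without a fused activation calls into the runtime.
inline void activationExample() {
    MockActivationRuntime runtime;
    MockActivatedLayer plain;
    plain.pRuntime = &runtime;
    MockActivatedLayer fused;
    fused.pRuntime = &runtime;
    fused.kernelFusesActivation = true;

    const bool plainOk = plain.evaluateForward();
    const bool fusedOk = fused.evaluateForward();
    assert(plainOk && fusedOk);
    assert(runtime.defaultActivationCalls == 1);
}
```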

◆ generateLayerCpp()

NeuralResult BasePrototypeLayer::generateLayerCpp ( ICppCodeGenerationLayerHost * pHost )

final noexcept

Generates C++ code to configure the layer.

See the ICppCodeGenerationLayerHost interface for details of the generated code.

A failure result from this function will cause the tool to report failure to the user.

◆ getConstData()

NeuralResult BasePrototypeLayer::getConstData	(	const void **	ppOut,
		TensorFormat	format,
		const ILayer *	pRequestingLayer
	)		const

noexcept

Retrieves read-only device-side memory for the layer's output.

◆ getCpuConstData()

NeuralResult BasePrototypeLayer::getCpuConstData	(	void *	pOutBuffer,
		size_t	bufferByteCount,
		size_t *	pBytesCopied,
		TensorFormat	format
	)		const

noexcept

Retrieves read-only CPU-side memory for the layer's output.

This function is intended for debugging purposes. The size of the buffer in bytes should be dimensions().elementCount() multiplied by the size of the desired element type.

To check buffer sizes without preallocating memory, pass nullptr in pOutBuffer and zero in bufferByteCount. pBytesCopied will then receive the optimal size for pOutBuffer. Other uses of nullptr are considered errors.

pBytesCopied is not modified if this function fails.

Parameters

pOutBuffer	Memory region receiving the layer output data
bufferByteCount	Size of pOutBuffer in bytes
pBytesCopied	Optional output variable receiving the number of bytes copied into pOutBuffer
format	Format for the output buffer
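
The size-query convention described above can be sketched with a mock that follows the same contract. MockOutputLayer and readMockOutput are hypothetical names, the format argument is omitted for brevity, and bool stands in for NeuralResult. Treating an undersized buffer, or a size query without pBytesCopied, as a failure is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical mock following the documented contract; not the NvNeural implementation.
struct MockOutputLayer {
    std::vector<float> output{1.0f, 2.0f, 3.0f, 4.0f};

    // Returns true on success. On failure, *pBytesCopied is left untouched.
    bool getCpuConstData(void* pOutBuffer, std::size_t bufferByteCount,
                         std::size_t* pBytesCopied) const {
        const std::size_t needed = output.size() * sizeof(float);
        if (pOutBuffer == nullptr) {
            // Size query: valid only with a zero byte count and somewhere to put the answer.
            if (bufferByteCount != 0 || pBytesCopied == nullptr) return false;
            *pBytesCopied = needed;
            return true;
        }
        if (bufferByteCount < needed) return false; // assumption: too small is an error
        std::memcpy(pOutBuffer, output.data(), needed);
        if (pBytesCopied != nullptr) *pBytesCopied = needed;
        return true;
    }
};

// Queries the required size first, then allocates and copies.
// Returns an empty vector if either call fails.
inline std::vector<float> readMockOutput(const MockOutputLayer& layer) {
    std::size_t byteCount = 0;
    if (!layer.getCpuConstData(nullptr, 0, &byteCount)) return {};
    std::vector<float> host(byteCount / sizeof(float));
    if (!layer.getCpuConstData(host.data(), byteCount, nullptr)) return {};
    return host;
}
```

Querying first avoids guessing the buffer size from dimensions() and the element type by hand.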

◆ getData()

NeuralResult BasePrototypeLayer::getData	(	void **	ppOut,
		TensorFormat	format,
		const ILayer *	pRequestingLayer
	)

noexcept

Retrieves device-side memory for the layer's output.

◆ getInputLayers()

NeuralResult BasePrototypeLayer::getInputLayers ( ILayerList ** ppInputLayers ) const

noexcept

Retrieves the inputs for this layer.

Parameters

ppInputLayers Pointer receiving a reference to an ILayerList object.

Layer may create a new list or return an incremented reference to a cached object; this is an implementation detail.

In either case, caller is responsible for releasing its reference when done with *ppInputLayers.
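
The ownership rule can be sketched with a mock reference-counted list. MockLayerList, MockInputOwner, refCount, and inputListExample are hypothetical names for this illustration; the sketch takes the cached-object path, where the layer hands out an incremented reference to a list it also holds.

```cpp
#include <cassert>

// Hypothetical ref-counted list standing in for ILayerList.
class MockLayerList {
public:
    unsigned addRef() const { return ++m_refCount; }
    unsigned release() const {
        const unsigned remaining = --m_refCount;
        if (remaining == 0) {
            delete this;
        }
        return remaining;
    }
    unsigned refCount() const { return m_refCount; } // for illustration only

private:
    ~MockLayerList() = default; // destroyed only through release()
    mutable unsigned m_refCount = 1;
};

// Hypothetical layer that caches its input list and shares it with callers.
class MockInputOwner {
public:
    MockInputOwner() : m_pCached(new MockLayerList) {}
    ~MockInputOwner() { m_pCached->release(); }
    MockInputOwner(const MockInputOwner&) = delete;
    MockInputOwner& operator=(const MockInputOwner&) = delete;

    // Hands out an incremented reference; bool stands in for NeuralResult.
    bool getInputLayers(MockLayerList** ppInputLayers) const {
        if (ppInputLayers == nullptr) return false;
        m_pCached->addRef();
        *ppInputLayers = m_pCached;
        return true;
    }

private:
    MockLayerList* m_pCached;
};

// Usage: the caller releases its own reference when it is done with the list.
inline void inputListExample() {
    MockInputOwner layer;
    MockLayerList* pInputs = nullptr;
    const bool ok = layer.getInputLayers(&pInputs);
    assert(ok && pInputs != nullptr);
    assert(pInputs->refCount() == 2); // one held by the layer, one by the caller
    pInputs->release();
}
```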

◆ internalDimensions()

TensorDimension BasePrototypeLayer::internalDimensions ( ) const

noexcept

Retrieves the dimensions of the layer's output tensor as allocated internally.

Internal allocations may have additional padding or alignment restrictions.

See also: stepping()

◆ isAffected()

bool BasePrototypeLayer::isAffected ( ) const

noexcept

Returns the current status of the "affected" flag.

By default, layers should be "affected" after initialization.

◆ isPermanent()

bool BasePrototypeLayer::isPermanent ( ) const

noexcept

Returns the current status of the "permanent" flag.

By default, layers should not be "permanent."

◆ loadBasePrototypeInfo()

NeuralResult BasePrototypeLayer::loadBasePrototypeInfo	(	NetworkBackendId	backendId,
		const TensorFormat &	format,
		CompilationLevel	compilationLevel,
		const std::string &	currentImpl,
		const IParameterNode *	pParameters
	)

protected

Loads prototype information from a parameter node.

Replaces calls to setBasePrototypeInfo; this path is used by network builder classes instead.

The parameter node used for this call does not typically map to a <Parameters> block in XML. Instead, it must provide the following key-value pairs:

inputCount
dim
parameterKeys

Implementation-specific blocks must also be provided for at least the currentImpl parameter:

[currentImpl]_call (call script)
[currentImpl] (inference code)

Parameters

backendId	Backend to prefer in ILayer::backendId
format	Format to prefer in ILayer::tensorFormat
compilationLevel	Determines whether to decode currentImpl's code as source or binary
currentImpl	Implementation type string like "cuda_fp32_nchw"
pParameters	IParameterNode containing additional type details
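
A quick way to picture the required keys is a pre-flight check over a flat key-value map. MockParameterMap, missingPrototypeKeys, and prototypeKeysExample are hypothetical names, the map stands in for IParameterNode (whose real API is not shown here), and the angle-bracket values are placeholders rather than real scripts.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat key-value view of a parameter node.
using MockParameterMap = std::map<std::string, std::string>;

// Lists the documented keys that are absent from params.
inline std::vector<std::string> missingPrototypeKeys(const MockParameterMap& params,
                                                     const std::string& currentImpl) {
    const std::vector<std::string> required = {
        "inputCount",
        "dim",
        "parameterKeys",
        currentImpl + "_call", // call script
        currentImpl,           // inference code
    };
    std::vector<std::string> missing;
    for (const std::string& key : required) {
        if (params.find(key) == params.end()) {
            missing.push_back(key);
        }
    }
    return missing;
}

// Usage: the call script for the selected implementation was left out.
inline void prototypeKeysExample() {
    const MockParameterMap params = {
        {"inputCount", "1"},
        {"dim", "<dimension script>"},
        {"parameterKeys", "<parameter keys>"},
        {"cuda_fp32_nchw", "<inference code>"},
    };
    const std::vector<std::string> missing =
        missingPrototypeKeys(params, "cuda_fp32_nchw");
    assert(missing.size() == 1);
    assert(missing[0] == "cuda_fp32_nchw_call");
}
```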

◆ loadFromParameters()

NeuralResult BasePrototypeLayer::loadFromParameters ( const IParameterNode * pParameters )

noexcept

Loads layer parameters from a serialized key-value representation.

Tools such as ConvergenceNG call this during network construction. Deployed applications may use this function themselves (often with a container class such as StringMapParameterNode) or may use layer-specific object interfaces; either approach is valid.

◆ name()

const char * BasePrototypeLayer::name ( ) const

noexcept

Retrieves the layer name.

Returns: Layer name; pointer must be valid until object is destroyed or setName is called.

◆ networkGenerationComplete()

NeuralResult BasePrototypeLayer::networkGenerationComplete ( )

final noexcept

Indicates the entire network has been generated.

Layers that rely on out-of-band synchronization to avoid multiple definitions of key types may use this callback to know that all layers have been visited, and can then reset any static variables they used to track this procedure.

This function will be called whether network generation succeeded or failed, but any failure result reported should be treated by the caller as an overall generation error.

◆ reshape()

NeuralResult BasePrototypeLayer::reshape ( )

noexcept

Initializes (or reinitializes) the layer implementation with the current set of parameters.

It is safe to call this function multiple times, and doing so may be necessary as parameters change.

◆ serializedType()

const char * BasePrototypeLayer::serializedType ( ) const

noexcept

Retrieves the layer type.

Returns: Layer type; pointer must be a non-null string.

◆ setActivationCoefficient()

NeuralResult BasePrototypeLayer::setActivationCoefficient	(	std::size_t	coefficientIndex,
		float	value
	)

noexcept

Sets an activation coefficient.

The meaning of this value is activation-specific.

Parameters

coefficientIndex	Coefficient index
value	Value to store

◆ setActivationFunction()

NeuralResult BasePrototypeLayer::setActivationFunction ( ActivationFunctionId activationFunction )

noexcept

Sets the activation function attached to the layer.

Repeated calls to this function replace the activation rather than stacking them.

Parameters

activationFunction New activation function ID

◆ setAffected()

NeuralResult BasePrototypeLayer::setAffected ( bool affected )

noexcept

Sets or clears the "affected" flag on a layer's output tensor.

Layers that are "affected" have pending computation work and will be reevaluated as necessary during inference. Networks will clear the "affected" flag after successful inference.

Layers should mark themselves as "affected" in response to parameter changes that would be visible during inference. Examples of such changes include loadFromParameters and the IRuntimeOptionsLayer::setRuntimeOptionValue callback.

Parameters

affected New value for the "affected" flag
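
The life cycle of the "affected" flag can be sketched with mock types. MockAffectedLayer, setScale, and runMockInference are hypothetical names; a real network also has to consider layer dependencies, which this sketch leaves out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layer with one tunable parameter; not an NvNeural class.
struct MockAffectedLayer {
    bool affected = true; // layers start out "affected"
    int evaluations = 0;
    float scale = 1.0f;

    // A parameter change visible during inference marks the layer as affected.
    void setScale(float value) {
        scale = value;
        affected = true;
    }

    bool evaluateForward() {
        ++evaluations;
        return true;
    }
};

// Evaluates only affected layers and clears the flags after a successful pass.
inline bool runMockInference(std::vector<MockAffectedLayer>& layers) {
    for (MockAffectedLayer& layer : layers) {
        if (layer.affected && !layer.evaluateForward()) {
            return false; // flags stay set so the work is retried later
        }
    }
    for (MockAffectedLayer& layer : layers) {
        layer.affected = false;
    }
    return true;
}
```

Running inference twice in a row evaluates each layer once; calling setScale on one layer causes only that layer to be evaluated again on the next pass.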

◆ setBasePrototypeInfo()

NeuralResult BasePrototypeLayer::setBasePrototypeInfo	(	std::size_t	inputsCount,
		const std::string &	type,
		NetworkBackendId	backend,
		const TensorFormat &	format,
		const std::string &	dim,
		const std::string &	impl
	)

protected noexcept

Stores implementation-agnostic details of the prototype layer.

Parameters

inputsCount	Number of inputs accepted by the layer
type	Type to report in ILayer::serializedType
backend	Backend to prefer in ILayer::backendId
format	Format to prefer in ILayer::tensorFormat
dim	Script code that determines the size reported in ILayer::dimensions during reshape
impl	Implementation type string like "cuda_fp32_nchw"

◆ setInputLayer()

NeuralResult BasePrototypeLayer::setInputLayer	(	size_t	index,
		ILayer *	pLayer
	)

noexcept

Sets an input layer by index.

Input indices are explicitly allowed to be sparse in order to support optional inputs; usually such inputs are used to provide weights from other parts of the graph instead of going out to IWeightsLoader.

Parameters

index	Index of the input to assign
pLayer	Layer to assign as an input
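
One way to store sparse input indices is a vector that grows on demand and leaves gaps as null pointers. MockSparseInputLayer is a hypothetical type for this sketch and says nothing about how BasePrototypeLayer stores its inputs; bool stands in for NeuralResult.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer that tolerates gaps between assigned input indices.
struct MockSparseInputLayer {
    std::vector<const MockSparseInputLayer*> inputs; // nullptr marks an unassigned slot

    bool setInputLayer(std::size_t index, const MockSparseInputLayer* pLayer) {
        if (index >= inputs.size()) {
            inputs.resize(index + 1, nullptr); // grow, leaving skipped slots empty
        }
        inputs[index] = pLayer;
        return true;
    }
};

// Usage: input 0 carries data, input 2 carries optional weights, input 1 stays empty.
inline void sparseInputExample() {
    MockSparseInputLayer data;
    MockSparseInputLayer weights;
    MockSparseInputLayer layer;
    layer.setInputLayer(0, &data);
    layer.setInputLayer(2, &weights);
    assert(layer.inputs.size() == 3);
    assert(layer.inputs[1] == nullptr);
}
```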

◆ setName()

NeuralResult BasePrototypeLayer::setName ( const char * pName )

noexcept

Sets the layer name.

Layer names are unique within a network.

Parameters

pName New layer name; layer must copy this data.

◆ setNetworkRuntime()

NeuralResult BasePrototypeLayer::setNetworkRuntime ( INetworkRuntime * pNetworkRuntime )

noexcept

Informs the layer it has been attached to a new network.

INetwork::pushLayer implementations will call this function for you; host applications do not normally need to call this function.

Parameters

pNetworkRuntime New owning network

◆ setPermanent()

NeuralResult BasePrototypeLayer::setPermanent ( bool permanent )

noexcept

Sets or clears the "permanent" flag on a layer's output tensor.

Layers that are "permanent" are kept resident in memory and their tensor buffers are not reused. Setting too many layers as "permanent" will have negative effects on memory utilization.

Parameters

permanent New value for the "permanent" flag

◆ setWeightsName()

NeuralResult BasePrototypeLayer::setWeightsName ( const char * pWeightsName )

noexcept

Sets the name used to identify this layer's weights.

Parameters

pWeightsName New layer name to use for weights instead of name()

◆ stepping()

TensorDimension BasePrototypeLayer::stepping ( ) const

noexcept

Returns the internal storage stride consumed by this layer implementation.

Output tensors are padded to be multiples of this value.

◆ tensorBufferSize()

size_t BasePrototypeLayer::tensorBufferSize ( ) const

noexcept

Retrieves the size of the layer's output tensor buffer in bytes.

This result is defined as the number of elements in dimensions() multiplied by the size of an individual element (e.g., 4 for FP32).

◆ tensorFormat()

TensorFormat BasePrototypeLayer::tensorFormat ( ) const

noexcept

Returns the tensor format consumed by this layer implementation.

Networks will reshape tensors as necessary in order to match the return value.

◆ tensorInternalBufferSize()

size_t BasePrototypeLayer::tensorInternalBufferSize ( ) const

noexcept

Retrieves the size in bytes of the layer's output tensor buffer as allocated internally.

Internal allocations may have additional padding or alignment restrictions. The result is defined as the number of elements in internalDimensions() multiplied by the size of an individual element (e.g., 4 for FP32).
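
The relationship between dimensions(), stepping(), and the two buffer sizes can be worked through with a small sketch. MockDim4, roundUp, padToStepping, and bufferBytes are hypothetical names, and the sketch assumes padding is applied independently to each of four components; the real TensorDimension type may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical four-component dimension; not the real TensorDimension.
struct MockDim4 {
    std::size_t n, c, h, w;
    std::size_t elementCount() const { return n * c * h * w; }
};

// Rounds value up to the next multiple; a zero multiple means "no padding".
inline std::size_t roundUp(std::size_t value, std::size_t multiple) {
    return multiple == 0 ? value : ((value + multiple - 1) / multiple) * multiple;
}

// Pads each component to a multiple of the matching stepping component.
inline MockDim4 padToStepping(const MockDim4& dims, const MockDim4& step) {
    return MockDim4{roundUp(dims.n, step.n), roundUp(dims.c, step.c),
                    roundUp(dims.h, step.h), roundUp(dims.w, step.w)};
}

// Buffer size in bytes: element count times element size (e.g., 4 for FP32).
inline std::size_t bufferBytes(const MockDim4& dims, std::size_t elementSize) {
    return dims.elementCount() * elementSize;
}

// Usage: a 1x3x5x7 FP32 tensor with stepping 1x4x1x8 is stored as 1x4x5x8.
inline void bufferSizeExample() {
    const MockDim4 dims{1, 3, 5, 7};
    const MockDim4 step{1, 4, 1, 8};
    const MockDim4 internal = padToStepping(dims, step);
    assert(bufferBytes(dims, 4) == 420);     // 105 elements
    assert(bufferBytes(internal, 4) == 640); // 160 elements
}
```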

◆ weightsDimensions()

TensorDimension BasePrototypeLayer::weightsDimensions	(	const char *	pWeightsName,
		WeightsQuery	queryType
	)		const

noexcept

Retrieves the tensor dimension of a layer's named weight input.

Parameters

pWeightsName	Name of a weights object being queried
queryType	Type of dimensions being queried; see WeightsQuery for details

◆ weightsName()

const char * BasePrototypeLayer::weightsName ( ) const

noexcept

Retrieves the name used to identify this layer's weights.

By default, this should be the name assigned with setName. Alternate weights names allow layer aliasing and namespacing within subnetworks.
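
The fallback from weights name to layer name might look like the following sketch. MockNamedLayer and weightsNameExample are hypothetical names, and treating an empty weights name as "not set" is an assumption of this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical layer that tracks a name and an optional weights name.
class MockNamedLayer {
public:
    void setName(const char* pName) { m_name = pName ? pName : ""; }
    void setWeightsName(const char* pWeightsName) {
        m_weightsName = pWeightsName ? pWeightsName : "";
    }

    const char* name() const { return m_name.c_str(); }

    // Falls back to the layer name until a weights name has been assigned.
    const char* weightsName() const {
        return m_weightsName.empty() ? m_name.c_str() : m_weightsName.c_str();
    }

private:
    std::string m_name;
    std::string m_weightsName;
};

// Usage: two layers can share one set of weights by using the same weights name.
inline void weightsNameExample() {
    MockNamedLayer layer;
    layer.setName("conv1_copy");
    assert(std::string(layer.weightsName()) == "conv1_copy");
    layer.setWeightsName("conv1");
    assert(std::string(layer.weightsName()) == "conv1");
    assert(std::string(layer.name()) == "conv1_copy");
}
```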

The documentation for this class was generated from the following files:

Host/Inc/BasePrototypeLayer.h
Host/Src/BasePrototypeLayer.cpp
