CUDA Driver API :: CUDA Toolkit Documentation

6.35. Green Contexts

This section describes the APIs for creation and manipulation of green contexts in the CUDA driver. Green contexts are a lightweight alternative to traditional contexts, with the ability to pass in a set of resources that they should be initialized with. This allows the developer to represent distinct spatial partitions of the GPU, provision resources for them, and target them via the same programming model that CUDA exposes (streams, kernel launches, etc.).

There are four main steps to using this new set of APIs.

(1) Start with an initial set of resources, for example via cuDeviceGetDevResource. Only SM type is supported today.
(2) Partition this set of resources by providing them as input to a partition API, for example: cuDevSmResourceSplitByCount.
(3) Finalize the specification of resources by creating a descriptor via cuDevResourceGenerateDesc.
(4) Provision the resources and create a green context via cuGreenCtxCreate.

For CU_DEV_RESOURCE_TYPE_SM, the partitions created have minimum SM count requirements, often rounding up and aligning the minCount provided to cuDevSmResourceSplitByCount. The following is a guideline for each architecture and may be subject to change:

On Compute Architecture 6.X: The minimum count is 1 SM.
On Compute Architecture 7.X: The minimum count is 2 SMs and must be a multiple of 2.
On Compute Architecture 8.X: The minimum count is 4 SMs and must be a multiple of 2.
On Compute Architecture 9.0+: The minimum count is 8 SMs and must be a multiple of 8.
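The guideline above can be modeled as plain arithmetic. The sketch below is illustrative only: guideline_min_sm_count is a hypothetical helper, it does not query the driver, it assumes the rounding is "raise to the minimum, then round up to the next multiple", and it treats architectures older than 6.X like 6.X. The actual values chosen by cuDevSmResourceSplitByCount may differ and are subject to change.

```c
/* Hypothetical helper: applies the documented per-architecture guideline to
 * a requested SM count. `ccMajor` is the major compute capability.
 * Assumption: anything below 7.X is treated like 6.X (minimum of 1 SM). */
unsigned int guideline_min_sm_count(int ccMajor, unsigned int requested)
{
    unsigned int minimum;   /* smallest partition size for the architecture */
    unsigned int multiple;  /* partition size must be a multiple of this */

    if (ccMajor >= 9)      { minimum = 8; multiple = 8; }
    else if (ccMajor == 8) { minimum = 4; multiple = 2; }
    else if (ccMajor == 7) { minimum = 2; multiple = 2; }
    else                   { minimum = 1; multiple = 1; }

    /* Raise the request to the architecture minimum. */
    if (requested < minimum) requested = minimum;

    /* Round up to the next multiple. */
    return ((requested + multiple - 1) / multiple) * multiple;
}
```

Under these assumptions, a request for 10 SMs on a 9.0 device would be modeled as 16 SMs, and a request for 5 SMs on an 8.X device as 6 SMs.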

In the future, flags can be provided to trade off functional and performance characteristics against finer-grained SM partitions.

Even if the green contexts have disjoint SM partitions, it is not guaranteed that the kernels launched in them will run concurrently or have forward progress guarantees. This is due to other resources (like HW connections, see CUDA_DEVICE_MAX_CONNECTIONS) that could cause a dependency. Additionally, in certain scenarios, it is possible for the workload to run on more SMs than were provisioned (but never fewer). The following are two scenarios which can exhibit this behavior:

On Volta+ MPS: When CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is used, the set of SMs that are used for running kernels can be scaled up to the value of SMs used for the MPS client.
On Compute Architecture 9.x: When a module with dynamic parallelism (CDP) is loaded, all future kernels running under green contexts may use and share an additional set of 2 SMs.

Classes

struct CUdevResource
struct CUdevSmResource

Typedefs

typedef CUdevResourceDesc_st * CUdevResourceDesc
typedef CUgreenCtx_st * CUgreenCtx

Enumerations

enum CUdevResourceType

Functions

CUresult cuCtxFromGreenCtx ( CUcontext* pContext, CUgreenCtx hCtx ): Converts a green context into the primary context.
CUresult cuCtxGetDevResource ( CUcontext hCtx, CUdevResource* resource, CUdevResourceType type ): Get context resources.
CUresult cuDevResourceGenerateDesc ( CUdevResourceDesc* phDesc, CUdevResource* resources, unsigned int nbResources ): Generate a resource descriptor.
CUresult cuDevSmResourceSplitByCount ( CUdevResource* result, unsigned int* nbGroups, const CUdevResource* input, CUdevResource* remaining, unsigned int useFlags, unsigned int minCount ): Splits CU_DEV_RESOURCE_TYPE_SM resources.
CUresult cuDeviceGetDevResource ( CUdevice device, CUdevResource* resource, CUdevResourceType type ): Get device resources.
CUresult cuGreenCtxCreate ( CUgreenCtx* phCtx, CUdevResourceDesc desc, CUdevice dev, unsigned int flags ): Creates a green context with a specified set of resources.
CUresult cuGreenCtxDestroy ( CUgreenCtx hCtx ): Destroys a green context.
CUresult cuGreenCtxGetDevResource ( CUgreenCtx hCtx, CUdevResource* resource, CUdevResourceType type ): Get green context resources.
CUresult cuGreenCtxRecordEvent ( CUgreenCtx hCtx, CUevent hEvent ): Records an event.
CUresult cuGreenCtxWaitEvent ( CUgreenCtx hCtx, CUevent hEvent ): Make a green context wait on an event.
CUresult cuStreamGetGreenCtx ( CUstream hStream, CUgreenCtx* phCtx ): Query the green context associated with a stream.

Typedefs

typedef CUdevResourceDesc_st * CUdevResourceDesc: An opaque descriptor handle. The descriptor encapsulates multiple created and configured resources. Created via cuDevResourceGenerateDesc
typedef CUgreenCtx_st * CUgreenCtx: A green context handle. This handle can be used safely from only one CPU thread at a time. Created via cuGreenCtxCreate

Enumerations

enum CUdevResourceType

Type of resource

Values

CU_DEV_RESOURCE_TYPE_INVALID = 0
CU_DEV_RESOURCE_TYPE_SM = 1: Streaming multiprocessors related information

Functions

CUresult cuCtxFromGreenCtx ( CUcontext* pContext, CUgreenCtx hCtx )

Converts a green context into the primary context.

Parameters

pContext: Returned primary context with green context resources
hCtx: Green context to convert

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Description

The API converts a green context into the primary context returned in pContext. It is important to note that the converted context pContext is a normal primary context but with the resources of the specified green context hCtx. Once converted, it can then be used to set the context current with cuCtxSetCurrent or with any of the CUDA APIs that accept a CUcontext parameter.

Users are expected to call this API before calling any CUDA APIs that accept a CUcontext. Failing to do so will result in the APIs returning CUDA_ERROR_INVALID_CONTEXT.

See also:

cuGreenCtxCreate

CUresult cuCtxGetDevResource ( CUcontext hCtx, CUdevResource* resource, CUdevResourceType type )

Get context resources.

Parameters

hCtx: - Context to get resource for
resource: - Output pointer to a CUdevResource structure
type: - Type of resource to retrieve

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_RESOURCE_TYPE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_CONTEXT

Description

Get the resources of the requested type available to the context represented by hCtx.

Note: The API is not supported on 32-bit platforms.

See also:

cuDevResourceGenerateDesc

CUresult cuDevResourceGenerateDesc ( CUdevResourceDesc* phDesc, CUdevResource* resources, unsigned int nbResources )

Generate a resource descriptor.

Parameters

phDesc: - Output descriptor
resources: - Array of resources to be included in the descriptor
nbResources: - Number of resources passed in resources

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_RESOURCE_TYPE, CUDA_ERROR_INVALID_RESOURCE_CONFIGURATION

Description

Generates a resource descriptor with the set of resources specified in resources. The generated resource descriptor is necessary for the creation of green contexts via the cuGreenCtxCreate API. The API expects nbResources == 1, as there is only one type of resource and merging the same types of resource is currently not supported.

Note: The API is not supported on 32-bit platforms.

See also:

cuDevSmResourceSplitByCount

CUresult cuDevSmResourceSplitByCount ( CUdevResource* result, unsigned int* nbGroups, const CUdevResource* input, CUdevResource* remaining, unsigned int useFlags, unsigned int minCount )

Splits CU_DEV_RESOURCE_TYPE_SM resources.

Parameters

result: - Output array of CUdevResource resources. Can be NULL to query the number of groups.
nbGroups: - This is a pointer, specifying the number of groups that would be or should be created as described below.
input: - Input SM resource to be split. Must be a valid CU_DEV_RESOURCE_TYPE_SM resource.
remaining: - If the input resource cannot be cleanly split among nbGroups, the remainder is placed here. Can be omitted (NULL) if the user does not need the remaining set.
useFlags: - Flags specifying how these partitions are used or which constraints to abide by when splitting the input.
minCount: - Minimum number of SMs required

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_RESOURCE_TYPE, CUDA_ERROR_INVALID_RESOURCE_CONFIGURATION

Description

Splits CU_DEV_RESOURCE_TYPE_SM resources into nbGroups, adhering to the minimum SM count specified in minCount and the usage flags in useFlags. If result is NULL, the API simulates a split and provides the number of groups that would be created in nbGroups. Otherwise, nbGroups must point to the number of elements in result and, on return, the API will overwrite nbGroups with the number actually created. The groups are written to the array in result. nbGroups can be less than the total number if a smaller number of groups is needed.

This API is used to spatially partition the input resource. The input resource needs to come from one of cuDeviceGetDevResource, cuCtxGetDevResource, or cuGreenCtxGetDevResource. A limitation of the API is that the output results cannot be split again without first creating a descriptor and a green context with that descriptor.

When creating the groups, the API will take into account the performance and functional characteristics of the input resource, and guarantee a split that will create a disjoint set of symmetrical partitions. This may lead to fewer groups being created than purely dividing the total SM count by minCount would suggest, due to cluster requirements or alignment and granularity requirements for minCount.

The remainder set might not have the same functional or performance guarantees as the groups in result. Its use should be carefully planned, and further partitioning of the remainder set is discouraged.

A successful API call must either have:

A valid array of result pointers of the size passed in nbGroups, with input of type CU_DEV_RESOURCE_TYPE_SM. The value of minCount must be between 0 and the SM count specified in input. remaining and useFlags are optional.
NULL passed in for result, with a valid integer pointer in nbGroups and input of type CU_DEV_RESOURCE_TYPE_SM. The value of minCount must be between 0 and the SM count specified in input. This queries the number of groups that would be created by the API.

Note: The API is not supported on 32-bit platforms.

CUresult cuDeviceGetDevResource ( CUdevice device, CUdevResource* resource, CUdevResourceType type )

Get device resources.

Parameters

device: - Device to get resource for
resource: - Output pointer to a CUdevResource structure
type: - Type of resource to retrieve

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_RESOURCE_TYPE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Description

Get the resources of the requested type available to the device. This is often the starting point for further partitioning or configuring of resources.

Note: The API is not supported on 32-bit platforms.

See also:

cuDevResourceGenerateDesc

CUresult cuGreenCtxCreate ( CUgreenCtx* phCtx, CUdevResourceDesc desc, CUdevice dev, unsigned int flags )

Creates a green context with a specified set of resources.

Parameters

phCtx: - Pointer for the output handle to the green context
desc: - Descriptor generated via cuDevResourceGenerateDesc which contains the set of resources to be used
dev: - Device on which to create the green context.
flags: - One of the supported green context creation flags. CU_GREEN_CTX_DEFAULT_STREAM is required.

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED, CUDA_ERROR_OUT_OF_MEMORY

Description

This API creates a green context with the resources specified in the descriptor desc and returns it in the handle represented by phCtx. This API will retain the primary context on device dev, which is released when the green context is destroyed. It is advised to have the primary context active before calling this API to avoid the heavy cost of triggering primary context initialization and deinitialization multiple times.

The API does not set the green context current. To make it current, first convert the green context to a CUcontext using cuCtxFromGreenCtx and then call cuCtxSetCurrent or cuCtxPushCurrent. Note that a green context can be current to only one thread at a time. There is no internal synchronization to make API calls accessing the same green context from multiple threads work.

Note: The API is not supported on 32-bit platforms.

The supported flags are:

CU_GREEN_CTX_DEFAULT_STREAM : Creates a default stream to use inside the green context. Required.

CUresult cuGreenCtxDestroy ( CUgreenCtx hCtx )

Destroys a green context.

Parameters

hCtx: - Green context to be destroyed

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_CONTEXT_IS_DESTROYED

Description

Destroys the green context, releasing the primary context of the device that this green context was created for. Any resources provisioned for this green context (that were initially available via the resource descriptor) are released as well.

See also:

cuGreenCtxCreate, cuCtxDestroy

CUresult cuGreenCtxGetDevResource ( CUgreenCtx hCtx, CUdevResource* resource, CUdevResourceType type )

Get green context resources.

Parameters

hCtx: - Green context to get resource for
resource: - Output pointer to a CUdevResource structure
type: - Type of resource to retrieve

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_RESOURCE_TYPE, CUDA_ERROR_INVALID_VALUE

Description

Get the resources of the requested type available to the green context represented by hCtx.

See also:

cuDevResourceGenerateDesc

CUresult cuGreenCtxRecordEvent ( CUgreenCtx hCtx, CUevent hEvent )

Records an event.

Parameters

hCtx: - Green context to record event for
hEvent: - Event to record

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Description

Captures in hEvent all the activities of the green context hCtx at the time of this call. hEvent and hCtx must be from the same CUDA context. Calls such as cuEventQuery() or cuGreenCtxWaitEvent() will then examine or wait for completion of the work that was captured. Uses of hCtx after this call do not modify hEvent.

Note:

The API will return an error if the specified green context hCtx has a stream in capture mode. In such a case, the call will invalidate all the conflicting captures.

See also:

cuGreenCtxWaitEvent, cuEventRecord

CUresult cuGreenCtxWaitEvent ( CUgreenCtx hCtx, CUevent hEvent )

Make a green context wait on an event.

Parameters

hCtx: - Green context to wait
hEvent: - Event to wait on (may not be NULL)

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Description

Makes all future work submitted to the green context hCtx wait for all work captured in hEvent. The synchronization will be performed on the device and will not block the calling CPU thread. See cuGreenCtxRecordEvent() for details on what is captured by an event.

Note:

The API will return an error and invalidate the capture if the specified event hEvent is part of an ongoing capture sequence.

CUresult cuStreamGetGreenCtx ( CUstream hStream, CUgreenCtx* phCtx )

Query the green context associated with a stream.

Parameters

hStream: - Handle to the stream to be queried
phCtx: - Returned green context associated with the stream

Returns

CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Description

Returns the CUDA green context that the stream is associated with, or NULL if the stream is not associated with any green context.

The stream handle hStream can refer to any of the following:

a stream created via any of the CUDA driver APIs such as cuStreamCreate. If during stream creation the context that was active in the calling thread was obtained with cuCtxFromGreenCtx, that green context is returned in phCtx. Otherwise, *phCtx is set to NULL instead.
a special stream such as the NULL stream or CU_STREAM_LEGACY. In that case, if the context that is active in the calling thread was obtained with cuCtxFromGreenCtx, that green context is returned. Otherwise, *phCtx is set to NULL instead.

Passing an invalid handle will result in undefined behavior.

Note:

Note that this function may also return error codes from previous, asynchronous launches.