Programs

This chapter describes the different kinds of OptiX programs, which provide programmatic control over ray intersection, shading, and other general computation in OptiX ray tracing kernels. OptiX programs are associated with binding points that serve different semantic roles during a ray tracing computation. As with other concepts, OptiX abstracts programs through its object model as program objects.

4.1. OptiX Program Objects

The central theme of the OptiX API is programmability. OptiX programs are written in CUDA C, and specified to the API through a string or file containing PTX, the parallel thread execution virtual assembly language associated with CUDA. The nvcc compiler that is distributed with the CUDA SDK is used to create PTX in conjunction with the OptiX header files.

These PTX files are then bound to Program objects via the host API. Program objects can be used for any of the OptiX program types discussed later in this section.

4.1.1. Managing Program Objects

OptiX provides two API entry points for creating Program objects: rtProgramCreateFromPTXString and rtProgramCreateFromPTXFile. The former creates a new Program object from a string of PTX source code. The latter creates a new Program object from a file of PTX source on disk:

RTcontext context = ...;
const char *ptx_filename = ...;
const char *function_name = ...;
RTprogram program = ...;
rtProgramCreateFromPTXFile( context, ptx_filename,
			    function_name, &program );

In this example, ptx_filename names a file of PTX source on disk, and function_name names a particular function of interest within that source. If the program is ill-formed and cannot compile, these entry points return an error code.

Program objects may be checked for completeness using the rtProgramValidate function, as the following example demonstrates:

if( rtProgramValidate( program ) != RT_SUCCESS )
{
    printf( "Program is not complete." );
}

An error code returned from rtProgramValidate indicates an error condition due to the program object or any other objects bound to it.

Finally, the rtProgramGetContext function reports the context object owning the program object, while rtProgramDestroy invalidates the object and frees all resources related to it.

4.1.2. Communication Through Variables

OptiX program objects communicate with the host program through variables. Variables are declared in an OptiX program using the rtDeclareVariable macro:

rtDeclareVariable( float, x, , );

This declaration creates a variable named x of type float which is available to both the host program through the OptiX variable object API, and to the device program code through usual C language semantics. Notice that the last two arguments are left blank in this example. The commas must still be specified.

Taking the address of a variable on the device is not supported. This means that pointers and references to x in the above example are not allowed. If, for instance, you needed to pass x into a function taking a float* argument you would need to first copy x into a stack variable and then pass in the address of this local variable:

void my_func( float* my_float) {…}

RT_PROGRAM void call_my_func()
{
     my_func(&x);         // not allowed
     float local_x = x;
     my_func(&local_x);   // allowed
}

Variables declared in this way may be read and written by the host program through the rtVariableGet* and rtVariableSet* family of functions. When variables are declared this way, they are implicitly const-qualified from the device program’s perspective. If communication from the program to the host is necessary, an rtBuffer should be used instead.

As of OptiX 2.0, variables may be declared inside arbitrarily nested namespaces to avoid name conflicts. References from the host program to namespace-enclosed OptiX variables will need to include the full namespace.

Program variables may also be declared with semantics. Declaring a variable with a semantic binds the variable to a special value which OptiX manages internally over the lifetime of the ray tracing kernel. For example, declaring a variable with the rtCurrentRay semantic creates a special read-only program variable that mirrors the value of the Ray currently being traced through the program flow:

rtDeclareVariable( optix::Ray, ray, rtCurrentRay, );

Variables declared with a built-in semantic exist only during ray tracing kernel runtime and may not be modified or queried by the host program. Unlike regular variables, some semantic variables may be modified by the device program.

Declaring a variable with an annotation associates with it a read-only string which, for example, may be interpreted by the host program as a human-readable description of the variable. For example:

rtDeclareVariable( float, shininess, , "The shininess of the sphere" );

A variable's annotation is the fourth argument of rtDeclareVariable, following the variable’s optional semantic argument. The host program may query a variable’s annotation with the rtVariableGetAnnotation function.

4.1.3. Internally Provided Semantics

OptiX manages five internal semantics for program variable binding. Table 5 summarizes in which types of program these semantics are available, along with their access rules from device programs and a brief description of their meaning.

Name	`rtLaunchIndex`	`rtCurrentRay`	`rtPayload`	`rtIntersectionDistance`	`rtSubframeIndex`
Access	read only	read only	read/write	read only	read only
Description	The unique index identifying each thread launched by `rtContextLaunch{1|2|3}D`.	The state of the current ray.	The state of the current ray’s payload of user-defined data.	The parametric distance from the current ray’s origin to the closest intersection point yet discovered.	The unique index identifying each subframe in a progressive launch. Zero for non-progressive launches.
Ray Generation	Yes	No	No	No	Yes
Exception	Yes	No	No	No	Yes
Closest Hit	Yes	Yes	Yes	Yes	Yes
Any Hit	Yes	Yes	Yes	Yes	Yes
Miss	Yes	Yes	Yes	No	Yes
Intersection	Yes	Yes	No	Yes	Yes
Bounding Box	No	No	No	No	No
Visit	Yes	Yes	Yes	Yes	Yes

Table 5 Semantic Variables

4.1.4. Attribute Variables

In addition to the semantics provided by OptiX, variables may also be declared with user-defined semantics called attributes. Unlike built-in semantics, the value of variables declared in this way must be managed by the programmer. Attribute variables provide a mechanism for communicating data between the intersection program and the shading programs (e.g., surface normal, texture coordinates). Attribute variables may only be written in an intersection program between calls to rtPotentialIntersection and rtReportIntersection. Although OptiX may not find all object intersections in order along the ray, the value of the attribute variable is guaranteed to reflect the value at the closest intersection at the time that the closest hit program is invoked. For this reason, programs should use attribute variables (as opposed to the ray payload) to communicate information about the local hit point between intersection and shading programs.

The following example declares an attribute variable of type float3 named normal. The semantic association of the attribute is specified with the user-defined name normal_vec. This name is arbitrary, and is the link between the variable declared here and another variable declared in the closest hit program. The two attribute variables need not have the same name as long as their attribute names match.

rtDeclareVariable( float3, normal, attribute normal_vec, );

4.1.5. Program Variable Scoping

OptiX program variables can have their values defined in two ways: static initializations, and (more typically) by variable declarations attached to API objects. A variable declared with a static initializer will only use that value if it does not find a definition attached to an API object. A declaration with static initialization is written:

rtDeclareVariable( float, x, , ) = 5.0f;

The OptiX variable scoping rules provide a valuable inheritance mechanism that is designed to create compact representations of material and object parameters. To enable this, each program type has an ordered list of scopes that it searches for variable definitions. For example, a closest hit program that refers to a variable named color will search the Program, GeometryInstance, Material and Context API objects for definitions created with the rt*DeclareVariable functions, in that order. Similar to scoping rules in a programming language, variables in an earlier scope shadow those in a later scope. Table 6 summarizes the scopes that are searched for variable declarations for each type of program.

Ray Generation	Program	Context
Exception	Program	Context
Closest Hit	Program	GeometryInstance	Material	Context
Any Hit	Program	GeometryInstance	Material	Context
Miss	Program	Context
Intersection	Program	GeometryInstance	Geometry	Context
Bounding Box	Program	GeometryInstance	Geometry	Context
Visit	Program	Node

Table 6 Scope search order for each type of program (from left to right)

It is possible for a program to find multiple definitions for a variable in its scopes depending upon where the program is called. For example, a closest hit program may be attached to several Material objects and reference a variable named shininess. We can attach a variable definition to the Material object as well as attach a variable definition to specific GeometryInstance objects that we create that reference that Material.

During execution of a specific GeometryInstance's closest hit program, the value of shininess depends on whether the particular instance has a definition attached: if the GeometryInstance defines shininess, then that value will be used. Otherwise, the value will be taken from the Material object. As you can see from Table 6 above, the program searches the GeometryInstance scope before the Material scope. Variables with definitions in multiple scopes are said to be dynamic and may incur a performance penalty. Dynamic variables are therefore best used sparingly.
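
The lookup rule described above can be modeled on the host as a first-match search over an ordered list of scopes. The following is only a sketch of that rule, not OptiX code: the Scope type and the resolve_variable function are hypothetical names, and variables are restricted to float for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the variable table attached to one API object
// (Program, GeometryInstance, Material, Context, ...).
typedef std::map<std::string, float> Scope;

// Search the scopes in order and return the first definition found.
// If no scope defines the variable, fall back to the static initializer.
float resolve_variable( const std::vector<const Scope*>& search_order,
                        const std::string& name,
                        float static_initializer )
{
    for( std::size_t i = 0; i < search_order.size(); ++i ) {
        assert( search_order[i] != 0 );
        Scope::const_iterator it = search_order[i]->find( name );
        if( it != search_order[i]->end() )
            return it->second;   // earlier scopes shadow later ones
    }
    return static_initializer;
}
```

With the closest hit search order from Table 6, a shininess value attached to a GeometryInstance would win over one attached to the Material, and the static initializer would be used only when neither exists.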

4.1.6. Program Variable Transformation

Recall that rays have a projective transformation applied to them upon encountering Transform nodes during traversal. The transformed ray is said to be in object space, while the original ray is said to be in world space.

Programs with access to the rtCurrentRay semantic operate in the spaces summarized in Table 7:

Ray Generation	World
Closest Hit	World
Any Hit	Object
Miss	World
Intersection	Object
Visit	Object

Table 7 Space of rtCurrentRay for Each Program Type

To facilitate transforming variables from one space to another, OptiX’s CUDA C API provides a set of functions:

__device__ float3 rtTransformPoint( RTtransformkind kind,
 				    const float3& p )
__device__ float3 rtTransformVector( RTtransformkind kind,
				    const float3& v )
__device__ float3 rtTransformNormal( RTtransformkind kind,
				    const float3& n )
__device__ void rtGetTransform( RTtransformkind kind,
				float matrix[16] )

The first three functions transform a float3, interpreted as a point, vector, or normal vector, from object to world space or vice versa depending on the value of a RTtransformkind flag passed as an argument. rtGetTransform returns the four-by-four matrix representing the current transformation from object to world space (or vice versa depending on the RTtransformkind argument). For best performance, use the rtTransform* functions rather than performing your own explicit matrix multiplication with the result of rtGetTransform.

A common use case of variable transformation occurs when interpreting attributes passed from the intersection program to the closest hit program. Intersection programs often produce attributes, such as normal vectors, in object space. Should a closest hit program wish to consume that attribute, it often must transform the attribute from object space to world space:

float3 n = rtTransformNormal( RT_OBJECT_TO_WORLD, normal );
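
The difference between the point, vector, and normal interpretations can be illustrated with the conventional matrix arithmetic. This host-side sketch is not the OptiX implementation; it assumes a row-major, affine four-by-four matrix, and the Vec3 type and the three function names are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Assumption for this sketch: m is a row-major 4x4 matrix whose last row
// is (0, 0, 0, 1), i.e. an affine transformation.

// Points carry an implicit w = 1, so the translation column applies.
Vec3 transform_point( const float m[16], Vec3 p )
{
    assert( m != 0 );
    Vec3 r = { m[0]*p.x + m[1]*p.y + m[2]*p.z  + m[3],
               m[4]*p.x + m[5]*p.y + m[6]*p.z  + m[7],
               m[8]*p.x + m[9]*p.y + m[10]*p.z + m[11] };
    return r;
}

// Vectors carry an implicit w = 0, so translation is ignored.
Vec3 transform_vector( const float m[16], Vec3 v )
{
    assert( m != 0 );
    Vec3 r = { m[0]*v.x + m[1]*v.y + m[2]*v.z,
               m[4]*v.x + m[5]*v.y + m[6]*v.z,
               m[8]*v.x + m[9]*v.y + m[10]*v.z };
    return r;
}

// Normals are multiplied by the transpose of the inverse matrix, which
// keeps them perpendicular to the surface under non-uniform scaling.
// The caller passes the inverse of the forward transformation.
Vec3 transform_normal( const float inv[16], Vec3 n )
{
    assert( inv != 0 );
    Vec3 r = { inv[0]*n.x + inv[4]*n.y + inv[8]*n.z,
               inv[1]*n.x + inv[5]*n.y + inv[9]*n.z,
               inv[2]*n.x + inv[6]*n.y + inv[10]*n.z };
    return r;
}
```

The result of the normal transformation is generally not unit length, which is why shading code typically normalizes it afterwards.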

4.2. Which OptiX calls are supported where?

Not all OptiX function calls are supported in all types of user provided programs. For example, it doesn’t make sense to spawn a new ray inside an intersection program, so this behavior is disallowed. A complete table of what device-side functions are allowed is given below:

	Ray Generation	Exception	Closest Hit	Any Hit	Miss	Intersection	Bounding Box	Visit	Bindless Callable Program
`rtTransform*`			●	●	●	●	●	●
`rtTrace`	●		●		●
`rtThrow`	●		●	●	●	●	●	●	●
`rtPrintf`	●	●	●	●	●	●	●	●	●
`rtTerminateRay`				●
`rtIgnoreIntersection`				●
`rtIntersectChild`								●
`rtPotentialIntersection`						●
`rtReportIntersection`						●
Callable Program	●	●	●	●	●	●	●	●	●

Table 8 Device API Function Allowed Scopes

4.3. Ray Generation Programs

A ray generation program serves as the first point of entry upon a call to rtContextLaunch{1|2|3}D. As such, it serves a role analogous to the main function of a C program. Like C's main function, any subsequent computation performed by the kernel, from casting rays to reading and writing from buffers, is spawned by the ray generation program. However, unlike a serial C program, an OptiX ray generation program is executed many times in parallel — once for each thread implied by rtContextLaunch{1|2|3}D's parameters.

Each thread is assigned a unique rtLaunchIndex. The value of this variable may be used to distinguish it from its neighbors for the purpose of, e.g., writing to a unique location in an rtBuffer:

rtBuffer<float, 1> output_buffer;
rtDeclareVariable( unsigned int, index, rtLaunchIndex, );
...;
float result = ...;
output_buffer[index] = result;

In this case, the result is written to a unique location in the output buffer. In general, a ray generation program may write to any location in output buffers, as long as care is taken to avoid race conditions between buffer writes.
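
To make the one-invocation-per-index model concrete, the following host-side sketch emulates a one-dimensional launch serially. It is an illustration only: ray_generation and launch_1d are hypothetical names, the launch index is passed as an ordinary parameter, and the computed value is a placeholder.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a ray generation program. It receives the
// launch index explicitly instead of through the rtLaunchIndex semantic.
void ray_generation( unsigned int index, std::vector<float>& output_buffer )
{
    assert( index < output_buffer.size() );
    float result = 2.0f * static_cast<float>( index );  // placeholder work
    output_buffer[index] = result;   // each index owns exactly one element
}

// Serial emulation of a 1D launch: one invocation per index in [0, width).
std::vector<float> launch_1d( unsigned int width )
{
    std::vector<float> output_buffer( width, 0.0f );
    for( unsigned int index = 0; index < width; ++index )
        ray_generation( index, output_buffer );
    return output_buffer;
}
```

Because every invocation writes only to the element selected by its own index, the order in which invocations run does not affect the result, which is the property that makes the parallel version free of write races.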

4.3.1. Entry Point Indices

To configure a ray tracing kernel launch, the programmer must specify the desired ray generation program using an entry point index. The total number of entry points for a context is specified with rtContextSetEntryPointCount:

RTcontext context = ...;
unsigned int num_entry_points = ...;
rtContextSetEntryPointCount( context, num_entry_points );

OptiX requires that each entry point index created in this manner have a ray generation program associated with it. A ray generation program may be associated with multiple indices. Use the rtContextSetRayGenerationProgram function to associate a ray generation program with an entry point index in the range [0, num_entry_points):

RTprogram prog = ...;
// index is >= 0 and < num_entry_points
unsigned int index = ...;
rtContextSetRayGenerationProgram( context, index, prog );

4.3.2. Launching a Ray Generation Program

rtContextLaunch{1|2|3}D takes as a parameter the entry point index of the ray generation program to launch:

RTsize width = ...;
rtContextLaunch1D( context, index, width );

If no ray generation program has been associated with the entry point index specified by rtContextLaunch{1|2|3}D's parameter, the launch will fail.

4.3.3. Ray Generation Program Function Signature

In CUDA C, ray generation programs return void and take no parameters. Like all OptiX programs, ray generation programs written in CUDA C must be tagged with the RT_PROGRAM qualifier. The following snippet shows an example ray generation program function prototype:

RT_PROGRAM void ray_generation_program( void );

4.3.4. Example Ray Generation Program

The following example ray generation program implements a pinhole camera model in a rendering application. This example demonstrates that ray generation programs act as the gateway to all ray tracing computation by initiating traversal through the rtTrace function, and often store the result of a ray tracing computation to an output buffer.

Note the variables eye, U, V, and W. Together, these four variables allow the host API to specify the position and orientation of the camera.

rtBuffer<uchar4, 2> output_buffer;
rtDeclareVariable( uint2, index, rtLaunchIndex, );
rtDeclareVariable( rtObject, top_object, , );
rtDeclareVariable(float3,        eye, , );
rtDeclareVariable(float3,        U, , );
rtDeclareVariable(float3,        V, , );
rtDeclareVariable(float3,        W, , );

struct Payload
{
    uchar4 result;
};

RT_PROGRAM void pinhole_camera( void )
{
    uint2 screen = output_buffer.size();

    float2 d = make_float2( index ) /
		make_float2( screen ) * 2.f - 1.f;
    float3 origin = eye;
    float3 direction = normalize( d.x*U + d.y*V + W );

    optix::Ray ray =
        optix::make_Ray( origin, direction, 0,
                         0.05f, RT_DEFAULT_MAX );

    Payload payload;

    rtTrace( top_object, ray, payload );

    output_buffer[index] = payload.result;
}
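
The direction computation in pinhole_camera can be checked on the host with plain arithmetic. The sketch below repeats the same steps for a single pixel; Vec3 and pinhole_direction are hypothetical names, and the OptiX vector helpers are replaced by explicit component math.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Map a pixel index to a normalized ray direction, following the same
// arithmetic as the pinhole_camera program: d = index / screen * 2 - 1,
// direction = normalize( d.x*U + d.y*V + W ).
Vec3 pinhole_direction( unsigned int ix, unsigned int iy,
                        unsigned int width, unsigned int height,
                        Vec3 U, Vec3 V, Vec3 W )
{
    assert( width > 0 && height > 0 );
    float dx = float( ix ) / float( width )  * 2.0f - 1.0f;
    float dy = float( iy ) / float( height ) * 2.0f - 1.0f;
    Vec3 dir = { dx*U.x + dy*V.x + W.x,
                 dx*U.y + dy*V.y + W.y,
                 dx*U.z + dy*V.z + W.z };
    float len = std::sqrt( dir.x*dir.x + dir.y*dir.y + dir.z*dir.z );
    Vec3 unit = { dir.x / len, dir.y / len, dir.z / len };
    return unit;
}
```

A pixel at the center of the screen maps to d = (0, 0), so its ray points along W; U and V only contribute as the pixel moves away from the center.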

4.4. Exception Programs

OptiX ray tracing kernels invoke an exception program when certain types of serious errors are encountered. Exception programs provide a means of communicating to the host program that something has gone wrong during a launch. The information an exception program provides may be useful in avoiding an error state in a future launch or for debugging during application development.

4.4.1. Exception Program Entry Point Association

An exception program is associated with an entry point using the rtContextSetExceptionProgram function:

RTcontext context = ...;
RTprogram program = ...;
// index is >= 0 and < num_entry_points
unsigned int index = ...;
rtContextSetExceptionProgram( context, index, program );

Unlike with ray generation programs, the programmer need not associate an exception program with an entry point. By default, entry points are associated with an internally provided exception program that silently ignores errors.

As with ray generation programs, a single exception program may be associated with many different entry points.

4.4.2. Exception Types

OptiX detects a number of different error conditions that result in exception programs being invoked. An exception is identified by its code, which is an integer defined by the OptiX API. For example, the exception code for the stack overflow exception is RT_EXCEPTION_STACK_OVERFLOW.

The type or code of a caught exception can be queried by calling rtGetExceptionCode from the exception program. More detailed information on the exception can be printed to the standard output using rtPrintExceptionDetails.

In addition to the built in exception types, OptiX provides means to introduce user-defined exceptions. Exception codes between RT_EXCEPTION_USER (0x400) and 0xFFFF are reserved for user exceptions. To trigger such an exception, rtThrow is used:

// Define user-specified exception codes.
#define MY_EXCEPTION_0 (RT_EXCEPTION_USER + 0)
#define MY_EXCEPTION_1 (RT_EXCEPTION_USER + 1)
RT_PROGRAM void some_program()
{
    ...
    // Throw user exceptions from within a program.
    if( condition0 )
        rtThrow( MY_EXCEPTION_0 );
    if( condition1 )
        rtThrow( MY_EXCEPTION_1 );
    ...
}

In order to control the runtime overhead involved in checking for error conditions, individual types of exceptions may be switched on or off using rtContextSetExceptionEnabled. Disabling exceptions usually results in faster performance, but is less safe. By default, only RT_EXCEPTION_STACK_OVERFLOW is enabled. During debugging, it is often useful to turn on all available exceptions. This can be achieved with a single call:

...
rtContextSetExceptionEnabled(context, RT_EXCEPTION_ALL, 1);
...
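
When an application defines many user exceptions, it can help to build and check codes through small helpers rather than ad hoc arithmetic. The sketch below is a host-side illustration only. The two constants repeat the values quoted in the text (0x400 and 0xFFFF) under hypothetical names, the range is assumed to be inclusive at both ends, and real code would take RT_EXCEPTION_USER from the OptiX headers.

```cpp
#include <cassert>

// Values quoted in the text above, under hypothetical names. Real code
// would use RT_EXCEPTION_USER from the OptiX headers instead.
const unsigned int SKETCH_EXCEPTION_USER     = 0x400;
const unsigned int SKETCH_EXCEPTION_USER_MAX = 0xFFFF;

// Build a user exception code from a zero-based offset.
unsigned int make_user_exception( unsigned int offset )
{
    assert( offset <= SKETCH_EXCEPTION_USER_MAX - SKETCH_EXCEPTION_USER );
    return SKETCH_EXCEPTION_USER + offset;
}

// Report whether a code lies in the range reserved for user exceptions
// (assumed inclusive at both ends).
bool is_user_exception( unsigned int code )
{
    return code >= SKETCH_EXCEPTION_USER &&
           code <= SKETCH_EXCEPTION_USER_MAX;
}
```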

4.4.3. Exception Program Function Signature

In CUDA C, exception programs return void, take no parameters, and use the RT_PROGRAM qualifier:

RT_PROGRAM void exception_program( void );

4.4.4. Example Exception Program

The following example code demonstrates a simple exception program which indicates a stack overflow error by outputting a special value to an output buffer which is otherwise used as a buffer of pixels. In this way, the exception program indicates the rtLaunchIndex of the failed thread by marking its location in a buffer of pixels with a known color. Exceptions which are not caused by a stack overflow are reported by printing their details to the console.

rtDeclareVariable( uint2, launch_index, rtLaunchIndex, );
rtDeclareVariable( float3, error, , ) = make_float3(1,0,0);
rtBuffer<float3, 2> output_buffer;

RT_PROGRAM void exception_program( void )
{
    const unsigned int code = rtGetExceptionCode();

    if( code == RT_EXCEPTION_STACK_OVERFLOW )
        output_buffer[launch_index] = error;
    else
        rtPrintExceptionDetails();
}

4.5. Closest Hit Programs

After a call to the rtTrace function, OptiX invokes a closest hit program once it identifies the nearest primitive intersected along the ray from its origin. Closest hit programs are useful for performing primitive-dependent processing that should occur once a ray’s visibility has been established. A closest hit program may communicate the results of its computation by modifying per-ray data or writing to an output buffer. It may also recursively call the rtTrace function. For example, a computer graphics application might implement a surface shading algorithm with a closest hit program.

4.5.1. Closest Hit Program Material Association

A closest hit program is associated with each (material, ray_type) pair. Each pair's default program is a no-op. This is convenient when an OptiX application requires many types of rays but only a small number of those types require special closest hit processing.

The programmer may change an association with the rtMaterialSetClosestHitProgram function:

RTmaterial material = ...;
RTprogram program = ...;
unsigned int type = ...;
rtMaterialSetClosestHitProgram( material, type, program );

4.5.2. Closest Hit Program Function Signature

In CUDA C, closest hit programs return void, take no parameters, and use the RT_PROGRAM qualifier:

RT_PROGRAM void closest_hit_program( void );

4.5.3. Recursion in a Closest Hit Program

Though the rtTrace function is also available to ray generation and miss programs (see Table 8), a common use case of closest hit programs is to perform recursion by tracing more rays upon identification of the closest surface intersected by a ray. For example, a computer graphics application might implement Whitted-style ray tracing by recursive invocation of rtTrace and closest hit programs. Care must be taken to limit the recursion depth to avoid stack overflow.
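
One way to bound the recursion is to carry a depth counter in the per-ray payload and stop spawning rays once it reaches a limit. The sketch below models that idea on the host. It is not OptiX code: the trace function stands in for an rtTrace call followed by a closest hit program, and the Payload layout, the max_depth parameter, and the weights are all hypothetical.

```cpp
#include <cassert>

// Hypothetical per-ray payload that carries the recursion depth.
struct Payload
{
    float result;
    int   depth;
};

// Toy stand-in for "trace a ray, then run its closest hit program".
// Every hit contributes 1 and spawns one secondary ray at half weight
// until the depth limit is reached.
void trace( Payload& payload, int max_depth )
{
    assert( max_depth >= 0 );
    payload.result = 1.0f;                    // local shading contribution
    if( payload.depth >= max_depth )
        return;                               // depth limit: stop recursing
    Payload secondary = { 0.0f, payload.depth + 1 };
    trace( secondary, max_depth );            // plays the role of rtTrace
    payload.result += 0.5f * secondary.result;
}
```

With a limit of 2 the primary ray accumulates 1 + 0.5 * (1 + 0.5 * 1) = 1.75, and the call chain never grows beyond three levels regardless of the scene.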

4.5.4. Example Closest Hit Program

The following code example demonstrates a closest hit program that transforms the normal vector computed by an intersection program (not shown) from the intersected primitive's local coordinate system to a global coordinate system. The transformed normal vector is returned to the calling function through a variable declared with the rtPayload semantic. Note that this program is quite trivial; normally the transformed normal vector would be used by the closest hit program to perform some calculation (e.g., lighting). See the OptiX Quickstart Guide for examples.

rtDeclareVariable( float3, normal, attribute normal_vec, );
struct Payload
{
    float3 result;
};
rtDeclareVariable( Payload, ray_data, rtPayload, );

RT_PROGRAM void closest_hit_program( void )
{
    float3 norm;
    norm = rtTransformNormal( RT_OBJECT_TO_WORLD, normal );
    norm = normalize( norm );
    ray_data.result = norm;
}

4.6. Any Hit Programs

Instead of the closest intersected primitive, an application may wish to perform some computation for any primitive intersection that occurs along a ray cast during the rtTrace function; this usage model can be implemented using any hit programs. For example, a rendering application may require some value to be accumulated along a ray at each surface intersection.

4.6.1. Any Hit Program Material Association

Like closest hit programs, an any hit program is associated with each (material, ray_type) pair. Each pair's default association is with an internally-provided any hit program which implements a no-op.

The rtMaterialSetAnyHitProgram function changes a (material, ray_type) pair’s association:

RTmaterial material = ...;
RTprogram program = ...;
unsigned int type = ...;
rtMaterialSetAnyHitProgram( material, type, program );

4.6.2. Termination in an Any Hit Program

A common OptiX usage pattern is for an any hit program to halt ray traversal upon discovery of an intersection. The any hit program can do this by calling rtTerminateRay. This technique can increase performance by eliminating redundant traversal computations when an application only needs to determine whether any intersection occurs and identification of the nearest intersection is irrelevant. For example, a rendering application might use this technique to implement shadow ray casting, which is often a binary true or false computation.

4.6.3. Any Hit Program Function Signature

In CUDA C, any hit programs return void, take no parameters, and use the RT_PROGRAM qualifier:

RT_PROGRAM void any_hit_program( void );

4.6.4. Example Any Hit Program

The following code example demonstrates an any hit program that implements early termination of shadow ray traversal upon intersection. The program also sets the value of a per-ray payload member, attenuation, to zero to indicate the material associated with the program is totally opaque.

struct Payload
{
    float attenuation;
};

rtDeclareVariable( Payload, payload, rtPayload, );

RT_PROGRAM void any_hit_program( void )
{
    payload.attenuation = 0.f;

    rtTerminateRay();
}
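
The benefit of early termination can be seen in a toy traversal loop. The following host-side sketch is only a model: candidate intersections are given as a list of distances in arbitrary order, any_hit_opaque returns true where a device program would call rtTerminateRay, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Payload { float attenuation; };

// Model of the any hit program above: mark the ray as fully blocked and
// request termination (the return value plays the role of rtTerminateRay).
bool any_hit_opaque( Payload& payload )
{
    payload.attenuation = 0.0f;
    return true;
}

// Toy traversal of a shadow ray. Returns the number of candidate
// intersections visited before traversal stopped.
unsigned int traverse_shadow_ray( const std::vector<float>& hit_distances,
                                  Payload& payload )
{
    unsigned int visited = 0;
    for( std::size_t i = 0; i < hit_distances.size(); ++i ) {
        assert( hit_distances[i] >= 0.0f );
        ++visited;
        if( any_hit_opaque( payload ) )
            break;   // no need to find the nearest hit for a shadow query
    }
    return visited;
}
```

Only the first candidate is visited even when several primitives lie along the ray, which is the saving the text describes for binary visibility queries.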

4.7. Miss Programs

When a ray traced by the rtTrace function intersects no primitive, a miss program is invoked. Miss programs may access variables declared with the rtPayload semantic in the same way as closest hit and any hit programs.

4.7.1. Miss Program Function Signature

In CUDA C, miss programs return void, take no parameters, and use the RT_PROGRAM qualifier:

RT_PROGRAM void miss_program( void );

4.7.2. Example Miss Program

In a computer graphics application, the miss program may implement an environment mapping algorithm using a simple gradient, as this example demonstrates:

rtDeclareVariable( float3, environment_light, , );
rtDeclareVariable( float3, environment_dark, , );
rtDeclareVariable( float3, up, , );

struct Payload
{
    float3 result;
};
rtDeclareVariable( Payload, payload, rtPayload, );
rtDeclareVariable( optix::Ray, ray, rtCurrentRay, );

RT_PROGRAM void miss(void)
{
    float t = max( dot( ray.direction, up ), 0.0f );
    payload.result = lerp( environment_light,
  			   environment_dark, t );
}
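
The gradient can be evaluated on the host to see which color each direction receives. This sketch assumes that lerp( a, b, t ) computes a + t * (b - a); the Vec3 type and the dot, lerp3, and miss_color functions are hypothetical stand-ins for the device code above.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

float dot( Vec3 a, Vec3 b )
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Linear interpolation (assumed convention): a at t = 0, b at t = 1.
Vec3 lerp3( Vec3 a, Vec3 b, float t )
{
    Vec3 r = { a.x + t*( b.x - a.x ),
               a.y + t*( b.y - a.y ),
               a.z + t*( b.z - a.z ) };
    return r;
}

// Same arithmetic as the miss program: clamp the cosine between the ray
// direction and the up vector to [0, 1] and blend the two colors.
Vec3 miss_color( Vec3 direction, Vec3 up, Vec3 light, Vec3 dark )
{
    float t = std::max( dot( direction, up ), 0.0f );
    assert( t <= 1.0f + 1e-4f );   // expects unit-length inputs
    return lerp3( light, dark, t );
}
```

Under that convention, rays pointing straight up receive environment_dark, while rays at or below the horizon receive environment_light.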

4.8. Intersection and Bounding Box Programs

Intersection and bounding box programs represent geometry by implementing ray-primitive intersection and bounding algorithms. These program types are associated with and queried from Geometry objects using rtGeometrySetIntersectionProgram, rtGeometryGetIntersectionProgram, rtGeometrySetBoundingBoxProgram, and rtGeometryGetBoundingBoxProgram.

4.8.1. Intersection and Bounding Box Program Function Signatures

Like the previously discussed OptiX programs, in CUDA C, intersection and bounding box programs return void and use the RT_PROGRAM qualifier. Because Geometry objects are collections of primitives, these functions require a parameter to specify the index of the primitive of interest to the computation. This parameter is always in the range [0, N), where N is given by the argument to the rtGeometrySetPrimitiveCount function.

Additionally, the bounding box program requires an array of floats to store the result of the bounding box computation, yielding these function signatures:

RT_PROGRAM void intersection_program( int prim_index );
RT_PROGRAM void bounding_box_program( int prim_index,
                                      float result[6] );

4.8.2. Reporting Intersections

Ray traversal invokes an intersection program when the current ray encounters one of a Geometry object's primitives. It is the responsibility of an intersection program to compute whether the ray intersects with the primitive, and to report the parametric t-value of the intersection. Additionally, the intersection program is responsible for computing and reporting any details of the intersection, such as surface normal vectors, through attribute variables.

Once the intersection program has determined the t-value of a ray-primitive intersection, it must report the result by calling a pair of OptiX functions, rtPotentialIntersection and rtReportIntersection:

__device__ bool rtPotentialIntersection( float tmin )
__device__ bool rtReportIntersection( unsigned int material )

rtPotentialIntersection takes the intersection's t-value as an argument. If the t-value could potentially be the closest intersection of the current traversal the function narrows the t-interval of the current ray accordingly and returns true. If the t-value lies outside the t-interval the function returns false, whereupon the intersection program may trivially return.

If rtPotentialIntersection returns true, the intersection program may then set any attribute variable values and call rtReportIntersection. This function takes an unsigned int specifying the index of a material that must be associated with an any hit and closest hit program. This material index can be used to support primitives of several different materials flattened into a single Geometry object. Traversal then immediately invokes the corresponding any hit program. Should that any hit program invalidate the intersection via the rtIgnoreIntersection function, then rtReportIntersection will return false. Otherwise, it will return true.

The values of attribute variables must be modified only between the call to rtPotentialIntersection and the call to rtReportIntersection. The result of writing to an attribute variable outside the bounds of these two calls is undefined. The values of attribute variables written in this way are accessible by any hit and closest hit programs.

If the any hit program invokes rtIgnoreIntersection, any attributes computed will be reset to their previous values and the previous t-interval will be restored.

If no intersection exists between the current ray and the primitive, an intersection program need only return.
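
The interplay between the two calls and the ray's t-interval can be modeled with a few lines of host code. This is a simplified sketch and not the OptiX implementation: RayInterval, potential_intersection, and report_intersection are hypothetical names, the interval bounds are treated as exclusive by assumption, and the narrowed interval is committed only on a successful report, which yields the same observable state as narrowing first and restoring on rtIgnoreIntersection.

```cpp
#include <cassert>

// Toy model of the current ray's t-interval.
struct RayInterval
{
    float tmin, tmax;   // currently valid interval
    float candidate;    // t-value accepted by potential_intersection
};

// Analogue of rtPotentialIntersection: accept t only inside the interval.
bool potential_intersection( RayInterval& ray, float t )
{
    assert( ray.tmin < ray.tmax );
    if( t <= ray.tmin || t >= ray.tmax )
        return false;
    ray.candidate = t;
    return true;
}

// Analogue of rtReportIntersection. The `ignored` flag stands in for an
// any hit program calling rtIgnoreIntersection.
bool report_intersection( RayInterval& ray, bool ignored )
{
    if( ignored )
        return false;            // interval keeps its previous bounds
    ray.tmax = ray.candidate;    // later candidates must be closer
    return true;
}
```

After a hit at t = 6 is reported, a candidate at t = 8 is rejected immediately, while a candidate at t = 4 that the any hit program ignores leaves the interval ending at 6.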

4.8.3. Specifying Bounding Boxes

Acceleration structures use bounding boxes to bound the spatial extent of scene primitives to accelerate the performance of ray traversal. A bounding box program’s responsibility is to describe the minimal three dimensional axis-aligned bounding box that contains the primitive specified by its first argument and store the result in its second argument. Bounding boxes are always specified in object space, so the user should not apply any transformations to them.

For correct results bounding boxes must merely contain the primitive. For best performance bounding boxes should be as tight as possible.
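
As an illustration of a tight box, the sketch below computes the axis-aligned bounds of a triangle as the component-wise minimum and maximum of its vertices, and checks containment. The triangle primitive, the Vec3 type, and both function names are hypothetical; the result layout (minimum corner followed by maximum corner) is an assumption chosen to match the six-float result array.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

// Store the tight axis-aligned bounds of a triangle in result[6],
// minimum corner first, then maximum corner.
void bound_triangle( Vec3 a, Vec3 b, Vec3 c, float result[6] )
{
    assert( result != 0 );
    result[0] = std::min( a.x, std::min( b.x, c.x ) );
    result[1] = std::min( a.y, std::min( b.y, c.y ) );
    result[2] = std::min( a.z, std::min( b.z, c.z ) );
    result[3] = std::max( a.x, std::max( b.x, c.x ) );
    result[4] = std::max( a.y, std::max( b.y, c.y ) );
    result[5] = std::max( a.z, std::max( b.z, c.z ) );
}

// Correctness only requires containment: every point of the primitive
// must lie inside the box.
bool contains( const float box[6], Vec3 p )
{
    return p.x >= box[0] && p.y >= box[1] && p.z >= box[2] &&
           p.x <= box[3] && p.y <= box[4] && p.z <= box[5];
}
```

A larger box would still pass the containment check and give correct images, but it would cause the intersection program to run for more rays that miss the primitive.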

4.8.4. Example Intersection and Bounding Box Programs

The following code demonstrates how an intersection and bounding box program combine to describe a simple geometric primitive. The sphere is a simple analytic shape with a well-known ray intersection algorithm. In the following code example, the sphere variable encodes the center and radius of a three-dimensional sphere in a float4:

rtDeclareVariable( float4, sphere, , );
rtDeclareVariable( optix::Ray, ray, rtCurrentRay, );
rtDeclareVariable( float3, normal, attribute normal, );

RT_PROGRAM void intersect_sphere( int prim_index )
{
    float3 center = make_float3( sphere.x, sphere.y,
				 sphere.z );
    float radius = sphere.w;
    float3 O = ray.origin - center;
    float3 D = ray.direction;
    float b = dot( O, D );
    float c = dot( O, O ) - radius*radius;
    float disc = b*b - c;
    if( disc > 0.0f ) {
	float sdisc = sqrtf( disc );
	float root1 = (-b - sdisc);
	bool check_second = true;
	if( rtPotentialIntersection( root1 ) ) {
	    normal = (O + root1*D) / radius;
	    if( rtReportIntersection( 0 ) )
		check_second = false;
	}
	if( check_second ) {
	    float root2 = (-b + sdisc);
	    if( rtPotentialIntersection( root2 ) ) {
		normal = (O + root2*D) / radius;
		rtReportIntersection( 0 );
	    }
	}
    }
}

Note that this intersection program ignores its prim_index argument and passes a material index of 0 to rtReportIntersection; it represents only the single primitive of its corresponding Geometry object.
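The root computation in intersect_sphere can be checked in isolation. The following is a minimal host-side sketch in plain C++, assuming a unit-length ray direction; Vec3, sub, dot, and sphere_roots are hypothetical stand-ins and not part of the OptiX API.

```cpp
#include <cassert>
#include <cmath>

// Plain host-side stand-in for CUDA's float3 (illustration only).
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Solves |O + t*D|^2 = r^2 for t, where O = origin - center and |D| = 1.
// Expanding gives t^2 + 2*b*t + c = 0 with b = dot(O, D) and
// c = dot(O, O) - r^2, so t = -b -/+ sqrt(b*b - c).
// Returns the number of roots written to `roots` (0 or 2), nearest first.
int sphere_roots(Vec3 origin, Vec3 dir, Vec3 center, float radius,
                 float roots[2])
{
    // The simplified discriminant is only valid for a normalized direction.
    assert(std::fabs(dot(dir, dir) - 1.0f) < 1e-4f);

    Vec3 O = sub(origin, center);
    float b = dot(O, dir);
    float c = dot(O, O) - radius * radius;
    float disc = b * b - c;
    if (disc <= 0.0f)
        return 0;  // miss (or grazing hit, which the guide's code also skips)

    float sdisc = std::sqrt(disc);
    roots[0] = -b - sdisc;
    roots[1] = -b + sdisc;
    return 2;
}
```

For a unit sphere at the origin and a ray starting at (0, 0, -5) heading along +z, the sketch yields t = 4 and t = 6, the entry and exit distances. In the OptiX program, each root is offered to rtPotentialIntersection, which decides whether it lies in the current t-interval.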

The bounding box program for the sphere is very simple:

RT_PROGRAM void bound_sphere( int, float result[6] )
{
    float3 cen = make_float3( sphere.x, sphere.y, sphere.z );
    float3 rad = make_float3( sphere.w, sphere.w, sphere.w );

    // compute the minimal and maximal corners of
    // the axis-aligned bounding box
    float3 min = cen - rad;
    float3 max = cen + rad;
    // store results in order
    result[0] = min.x;
    result[1] = min.y;
    result[2] = min.z;
    result[3] = max.x;
    result[4] = max.y;
    result[5] = max.z;
}

4.9. Selector Programs

Ray traversal invokes selector visit programs upon encountering a Selector node to programmatically select which of the node’s children the ray shall visit. A visit program dispatches the current ray to a particular child by calling the rtIntersectChild function. The argument to rtIntersectChild selects the child by specifying its index in the range [0, N), where N is given by the argument to rtSelectorSetChildCount.

4.9.1. Selector Visit Program Function Signature

In CUDA C, visit programs return void, take no parameters, and use the RT_PROGRAM qualifier:

RT_PROGRAM void visit_program( void );

4.9.2. Example Visit Program

Visit programs may implement, for example, sophisticated level-of-detail systems or simple selections based on ray direction. The following code sample demonstrates an example visit program that selects between two children based on the direction of the current ray:

rtDeclareVariable( optix::Ray, ray, rtCurrentRay, );

RT_PROGRAM void visit( void )
{
    unsigned int index = (unsigned int)( ray.direction.y < 0 );
    rtIntersectChild( index );
}
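The level-of-detail case mentioned above reduces to mapping some measure, such as distance, to a child index in [0, N). The following plain C++ sketch shows one way to do that mapping. It is an assumption-laden illustration: select_lod_child and the threshold values are hypothetical, and how a visit program would obtain the distance is deliberately left out.

```cpp
#include <cassert>

// Maps a distance to a child index in [0, child_count).
// `thresholds` holds child_count - 1 ascending distances; child 0 is
// taken to be the most detailed representation (an assumption of this
// sketch, not an OptiX requirement).
unsigned int select_lod_child(float distance,
                              const float* thresholds,
                              unsigned int child_count)
{
    assert(child_count > 0);
    unsigned int index = 0;
    // Advance to a coarser child each time a threshold is reached,
    // but never past the last child.
    while (index + 1 < child_count && distance >= thresholds[index])
        ++index;
    return index;
}
```

The returned value would then be passed to rtIntersectChild. Clamping to the last child keeps the index valid even for very distant rays.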

4.10. Callable Programs

Callable programs allow for additional programmability within the standard set of OptiX programs. Callable programs are referenced by handles that are set via RTvariables or RTbuffers on the host. This makes it possible to change the target of a function call at runtime, for example to achieve different shading effects in response to user input or to customize a more general program based on the scene setup. Also, if a function is invoked from many different places in your OptiX node graph, making it an RT_CALLABLE_PROGRAM can reduce code replication and compile time, and potentially improve runtime through increased warp utilization.

Using a callable program involves three pieces. The first is the program you wish to call. The second is a declaration of a proxy function used to call the callable program. The third is the host code used to associate a callable program with the proxy function that will call it within the OptiX node graph.

Callable programs come in two variants, bound and bindless. Bound programs are invoked by direct use of a program bound to a variable through the host API, and they inherit the same semantic type and variable scope lookup as the calling program. Bindless programs are called via an ID obtained from the RTprogram on the host and, unlike bound programs, do not inherit the semantic type or scope lookup of the calling program.

4.10.1. Defining a Callable Program in CUDA

Defining an RT_CALLABLE_PROGRAM is similar to defining an RT_PROGRAM:

// launch_index, launch_dim, and N are assumed to be
// declared elsewhere in this file
RT_CALLABLE_PROGRAM float3 get_color(
        float3 input_color, float scale)
{
    uint2 tile_size = make_uint2(launch_dim.x / N,
                                 launch_dim.y / N);
    if (launch_index.x/tile_size.x ^
        launch_index.y/tile_size.y)
        return input_color;
    else
        return input_color * scale;
}

RT_CALLABLE_PROGRAMs can take arguments and return values just like other functions in CUDA, whereas RT_PROGRAMs must return void.
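To see which pixels the branch in get_color affects, the condition can be mirrored on the host. The following is a minimal sketch in plain C++, not OptiX code: UInt2 and tiles_differ are hypothetical names, and n stands in for N, whose value the listing does not give.

```cpp
#include <cassert>

// Plain host-side stand-in for CUDA's uint2 (illustration only).
struct UInt2 { unsigned int x, y; };

// Mirrors the branch condition in get_color: returns true when the
// pixel's column tile index and row tile index differ, which is the
// case where get_color returns input_color unscaled.
bool tiles_differ(UInt2 launch_index, UInt2 launch_dim, unsigned int n)
{
    // Guard against a zero tile size, which would divide by zero below.
    assert(n > 0 && launch_dim.x >= n && launch_dim.y >= n);

    UInt2 tile_size = { launch_dim.x / n, launch_dim.y / n };
    unsigned int tile_x = launch_index.x / tile_size.x;
    unsigned int tile_y = launch_index.y / tile_size.y;
    // XOR of two unsigned values is nonzero exactly when they differ.
    return (tile_x ^ tile_y) != 0;
}
```

With an 8x8 launch and n = 4, each tile is 2x2 pixels, and only pixels whose column and row tile indices match take the scaled branch. A condition that tested just the lowest bit of the XOR would instead alternate tiles in a checkerboard; whether that was the intent is not stated in the listing.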

4.10.2. Using a Callable Program Variable in CUDA

To invoke an RT_CALLABLE_PROGRAM from inside another RT_PROGRAM, you must first declare its handle. The handles can be one of two types, rtCallableProgramId or rtCallableProgramX. Both of these types are templated on the return type followed by the argument types (up to 10 arguments are supported as of OptiX 3.6). The difference between these two will be discussed later in this section.

typedef rtCallableProgramId<int(int)> callT; 
rtDeclareVariable(callT, do_work,,);
typedef rtCallableProgramX<float(int,int)> call2T;
rtDeclareVariable(call2T, do_more_work,,);

OptiX versions 3.5 and older declared callable programs via the rtCallableProgram macro. This macro still works for compatibility, but for SM_20 and newer targets rtCallableProgram now creates a declaration similar to rtCallableProgramX.

rtCallableProgram(return_type, function_name,
		(argument_list) );

(Note that the third argument must be contained in parentheses).

It is recommended to replace all uses of the macro version of rtCallableProgram with the templated version, rtCallableProgramX. In addition, if the preprocessor macro RT_USE_TEMPLATED_RTCALLABLEPROGRAM is defined then the old rtCallableProgram macro is supplanted by a definition that uses rtCallableProgramX.

// Before 
#include <optix_world.h>
rtCallableProgram(int, func, (int,float));

// After
#define RT_USE_TEMPLATED_RTCALLABLEPROGRAM
#include <optix_world.h>
rtDeclareVariable(rtCallableProgram<int(int,float)>,
func,,);

Once the program variable is declared, your OptiX program may invoke function_name as if it were a standard CUDA function. For example:

rtDeclareVariable( 
    rtCallableProgramId<float3(float3,float)>,
    get_color,,);

RT_PROGRAM void camera()
{
    float3 initial_color, final_color;
    // … trace a ray, get the initial color …
    final_color = get_color( initial_color, 0.5f );
    // … write new final color to output buffer …
}

Because the target of the get_color program variable is specified at runtime by the host, camera does not need to know how its colors are being modified by the get_color function.

In addition to declaring single rtCallableProgramId variables, you can also declare a buffer of them, as follows.

rtBuffer<rtCallableProgramId<float3(float3,float)>, 1>
    get_colors;

RT_PROGRAM void camera()
{
    float3 initial_color, final_color;
    // … trace a ray, get the initial color …
    // index 0 is chosen here only for illustration
    final_color = get_colors[0]( initial_color, 0.5f );
    // … write new final color to output buffer …
}

You can also pass rtCallableProgramId objects to other functions and store them for later use.

4.10.3. Setting a Callable Program on the Host

To set up an RT_CALLABLE_PROGRAM in your host code, simply load the PTX function using rtProgramCreateFromPTXFile, just like you would any other OptiX program. The resulting RTprogram object can be used in one of two ways. You can use the object directly to set an RTvariable via rtVariableSetObject. This is done for rtCallableProgramX and rtCallableProgram declared variables.

Alternatively, an ID for the RTprogram can be obtained through rtProgramGetId. This ID can be used to set the value of an rtCallableProgramId typed RTvariable (via rtVariableSetInt) or the values in an RTbuffer declared with type RT_FORMAT_PROGRAM_ID. For example:

RTprogram color_program;
RTvariable color_program_variable;

rtProgramCreateFromPTXFile( context, ptx_path,
                            "my_color_program",
                            &color_program );
rtProgramDeclareVariable( camera_program,
                          "get_color",
                          &color_program_variable );

// Use one of the following two alternatives, depending
// on how get_color is declared in the device code.

// for rtCallableProgramX and rtCallableProgram
rtVariableSetObject( color_program_variable,
                     color_program );

// for rtCallableProgramId
int id;
rtProgramGetId( color_program, &id );
rtVariableSetInt( color_program_variable, id );

// For convenience the C++ wrapper has a
// Variable::setProgramId method that gets the ID and
// sets the variable with it
camera_program["get_color"]->setProgramId(
    color_program );

Here is an example of creating a buffer of rtCallableProgramIds using the C++ API. It sets up several programs, one of which ("times_multiplier") makes use of a locally defined RTvariable called "multiplier" that is unique to each instance of the program.

Program plus10 = 
    context->createProgramFromPTXFile( ptx_path, 
			"plus10" ); 
Program minus10 =
    context->createProgramFromPTXFile( ptx_path,
			"minus10" );
Program times_multiplier2 =
    context->createProgramFromPTXFile( ptx_path,
			"times_multiplier" ); 
times_multiplier2["multiplier"]->setInt(2);

Program times_multiplier3 =
    context->createProgramFromPTXFile( ptx_path,
			"times_multiplier" );
times_multiplier3["multiplier"]->setInt(3);

Buffer functions =
    context->createBuffer( RT_BUFFER_INPUT,
			RT_FORMAT_PROGRAM_ID, 5 );
context["functions"]->set( functions ); 

// Here you can use the host defined type of
// callableProgramId<> or int
callableProgramId<int(int)>* f_data =
    static_cast<callableProgramId<int(int)>*>(functions->map());
f_data[ 0 ] = callableProgramId<int(int)>(plus10->getId());
f_data[ 1 ] = callableProgramId<int(int)>(plus10->getId());
f_data[ 2 ] = callableProgramId<int(int)>(times_multiplier2->getId());
f_data[ 3 ] = callableProgramId<int(int)>(minus10->getId());
f_data[ 4 ] = callableProgramId<int(int)>(times_multiplier3->getId());
functions->unmap();

int* f_data_int = static_cast<int*>(functions->map());
f_data_int[ 0 ] = plus10->getId();
f_data_int[ 1 ] = plus10->getId();
f_data_int[ 2 ] = times_multiplier2->getId();
f_data_int[ 3 ] = minus10->getId();
f_data_int[ 4 ] = times_multiplier3->getId();
functions->unmap();

For buffers created with RT_FORMAT_PROGRAM_ID, the mapped pointer can be cast either to a callableProgramId type or to int, as shown above.
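Conceptually, a buffer of program IDs behaves like a table of functions indexed by an integer. The following plain C++ sketch models that idea without any OptiX calls. IntFunc, make_times_multiplier, make_function_table, and call_by_id are hypothetical names, and the arithmetic performed by plus10, minus10, and times_multiplier is only inferred from their names, since their device code is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using IntFunc = std::function<int(int)>;

// Analogue of a program instance with its own "multiplier" variable:
// each returned function captures a private multiplier value.
IntFunc make_times_multiplier(int multiplier)
{
    return [multiplier](int x) { return x * multiplier; };
}

// Analogue of the five-entry buffer of program IDs built in the host
// code above, in the same order.
std::vector<IntFunc> make_function_table()
{
    return {
        [](int x) { return x + 10; },   // plus10 (assumed behavior)
        [](int x) { return x + 10; },   // plus10 again
        make_times_multiplier(2),       // times_multiplier, multiplier = 2
        [](int x) { return x - 10; },   // minus10 (assumed behavior)
        make_times_multiplier(3),       // times_multiplier, multiplier = 3
    };
}

// Analogue of invoking an entry of the buffer from device code.
int call_by_id(const std::vector<IntFunc>& table, std::size_t id, int arg)
{
    assert(id < table.size());
    return table[id](arg);
}
```

The two times_multiplier entries share one function body but produce different results, which mirrors how each RTprogram instance carries its own value of "multiplier".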

4.10.4. Bound versus Bindless Callable Programs

Bound callable programs are defined using either the rtCallableProgramX templated class or the backward compatible rtCallableProgram macro. They are called bound because you bind an RTprogram directly to an RTvariable that is then used to call the program. Binding a program to a variable enables OptiX to extend certain features to the program. Bound programs can be thought of as an extension of the caller, inheriting the semantic type as well as the RTvariable lookup scope based on where the program variable is called from. For example, if a callable program is called from a closest hit program, then attributes are available to the callable program and it can call functions such as rtTrace. Additionally, OptiX will look up identifiers in your callable program in the same scopes as the OptiX programs that invoke it. For example, if invoked from a closest hit program, the lookup scopes will be program, geometry instance, material, then context, where the program scope is the callable program itself instead of the caller's.

Bindless callable programs, on the other hand, inherit neither a program semantic type nor scope. Their lookup scope is always the program itself (the RTprogram object), then the context, regardless of where the program is invoked from. This enables calling these programs from arbitrary locations. Obtaining the ID via rtProgramGetId marks the RTprogram as bindless, and this RTprogram object can no longer be bound to an RTvariable (used with rtCallableProgramX or rtCallableProgram). Bindless programs can only call callable programs, rtPrintf, rtThrow, and inlineable CUDA functions. Buffer, texture, and variable accesses also work.
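The two lookup orders described above can be modeled as ordered chains of scopes searched front to back. The following plain C++ sketch is only a conceptual model, not how OptiX implements variable lookup. Scope, lookup, bound_closest_hit_chain, and bindless_chain are hypothetical names, and the bound chain shown is the one given above for a call from a closest hit program.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A scope is modeled as a simple name-to-value map.
using Scope = std::map<std::string, int>;
using ScopeChain = std::vector<const Scope*>;

// Searches the scopes in order and returns a pointer to the first
// match, or nullptr if no scope declares the name.
const int* lookup(const ScopeChain& chain, const std::string& name)
{
    for (const Scope* scope : chain) {
        assert(scope != nullptr);
        auto it = scope->find(name);
        if (it != scope->end())
            return &it->second;
    }
    return nullptr;
}

// Bound program called from a closest hit program:
// program, geometry instance, material, then context.
ScopeChain bound_closest_hit_chain(const Scope& program,
                                   const Scope& geometry_instance,
                                   const Scope& material,
                                   const Scope& context)
{
    return { &program, &geometry_instance, &material, &context };
}

// Bindless program: the program itself, then the context.
ScopeChain bindless_chain(const Scope& program, const Scope& context)
{
    return { &program, &context };
}
```

A name declared only on the material or geometry instance resolves through the bound chain but not through the bindless one, which is the practical consequence of a bindless program not inheriting the caller's scope.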

Where the callable program variable is attached to the OptiX node graph determines which callable program is invoked when called from another OptiX program. This follows the same variable lookup method that other rtVariables employ. The only difference is that you cannot specify a default initializer.