1.1.2. Half2 Arithmetic Functions
To use these functions, include the header file cuda_fp16.h in your program.
Functions
 __device__ __half2 __h2div ( const __half2 a, const __half2 b )
 Performs half2 vector division in round-to-nearest-even mode.
 __device__ __half2 __habs2 ( const __half2 a )
 Calculates the absolute value of both halves of the input half2 number and returns the result.
 __device__ __half2 __hadd2 ( const __half2 a, const __half2 b )
 Performs half2 vector addition in round-to-nearest-even mode.
 __device__ __half2 __hadd2_rn ( const __half2 a, const __half2 b )
 Performs half2 vector addition in round-to-nearest-even mode. Prevents floating-point contraction of mul+add into fma.
 __device__ __half2 __hadd2_sat ( const __half2 a, const __half2 b )
 Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
 __device__ __half2 __hcmadd ( const __half2 a, const __half2 b, const __half2 c )
 Performs fast complex multiply-accumulate.
 __device__ __half2 __hfma2 ( const __half2 a, const __half2 b, const __half2 c )
 Performs half2 vector fused multiply-add in round-to-nearest-even mode.
 __device__ __half2 __hfma2_relu ( const __half2 a, const __half2 b, const __half2 c )
 Performs half2 vector fused multiply-add in round-to-nearest-even mode, with ReLU saturation.
 __device__ __half2 __hfma2_sat ( const __half2 a, const __half2 b, const __half2 c )
 Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
 __device__ __half2 __hmul2 ( const __half2 a, const __half2 b )
 Performs half2 vector multiplication in round-to-nearest-even mode.
 __device__ __half2 __hmul2_rn ( const __half2 a, const __half2 b )
 Performs half2 vector multiplication in round-to-nearest-even mode. Prevents floating-point contraction of mul+add or mul+sub into fma.
 __device__ __half2 __hmul2_sat ( const __half2 a, const __half2 b )
 Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
 __device__ __half2 __hneg2 ( const __half2 a )
 Negates both halves of the input half2 number and returns the result.
 __device__ __half2 __hsub2 ( const __half2 a, const __half2 b )
 Performs half2 vector subtraction in round-to-nearest-even mode.
 __device__ __half2 __hsub2_rn ( const __half2 a, const __half2 b )
 Performs half2 vector subtraction in round-to-nearest-even mode. Prevents floating-point contraction of mul+sub into fma.
 __device__ __half2 __hsub2_sat ( const __half2 a, const __half2 b )
 Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
 __device__ __half2 atomicAdd ( const __half2* address, const __half2 val )
 Performs a vector add of val to the value stored at address in global or shared memory and writes the result back to address. Atomicity of the add is guaranteed separately for each of the two __half elements; the __half2 as a whole is not guaranteed to be updated atomically as a single 32-bit access.
Functions
 __device__ __half2 __h2div ( const __half2 a, const __half2 b )

Performs half2 vector division in round-to-nearest-even mode.
Description
Divides half2 input vector a by input vector b in round-to-nearest-even mode.
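Every half2 operation in this section acts on the two halves independently, as two scalar operations. The sketch below is a host-side C++ model of that lane-wise behavior for __h2div, not device code. Assumptions: the hypothetical struct Half2Model stands in for __half2, and float stands in for __half, so results are rounded to float precision rather than to binary16.

```cpp
// Hypothetical host-side stand-in for __half2: two independent lanes named
// after the .x and .y members of __half2. float replaces __half here, so
// rounding is to float precision, not binary16.
struct Half2Model {
    float x;
    float y;
};

// Models __h2div: each lane of a is divided by the matching lane of b,
// with no interaction between lanes.
Half2Model h2div_model(Half2Model a, Half2Model b) {
    return Half2Model{a.x / b.x, a.y / b.y};
}
```

For example, h2div_model({1.0f, 3.0f}, {2.0f, 4.0f}) yields {0.5f, 0.75f}.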
 __device__ __half2 __habs2 ( const __half2 a )

Calculates the absolute value of both halves of the input half2 number and returns the result.
Parameters
 a
  half2. Is only being read.
Returns
half2
 The absolute value of both halves of a.
Description
Calculates the absolute value of both halves of the input half2 number and returns the result.
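A minimal host-side model of __habs2, assuming float as a stand-in for __half and a hypothetical Half2Model struct for __half2. It uses std::fabs per lane, which clears the sign bit; whether the device intrinsic treats -0.0 identically is an assumption of this sketch, not something stated above.

```cpp
#include <cmath>

// Hypothetical stand-in for __half2 (float replaces __half).
struct Half2Model {
    float x;
    float y;
};

// Models __habs2: absolute value of each lane independently.
// std::fabs clears the sign bit, so -0.0 becomes +0.0.
Half2Model habs2_model(Half2Model a) {
    return Half2Model{std::fabs(a.x), std::fabs(a.y)};
}
```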
 __device__ __half2 __hadd2 ( const __half2 a, const __half2 b )

Performs half2 vector addition in round-to-nearest-even mode.
Description
Performs half2 vector addition of inputs a and b in round-to-nearest-even mode.
 __device__ __half2 __hadd2_rn ( const __half2 a, const __half2 b )

Performs half2 vector addition in round-to-nearest-even mode.
Description
Performs half2 vector addition of inputs a and b in round-to-nearest-even mode. Prevents floating-point contraction of mul+add into fma.
 __device__ __half2 __hadd2_sat ( const __half2 a, const __half2 b )

Performs half2 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
Returns
half2
 The sum of a and b, with saturation applied.
Description
Performs half2 vector addition of inputs a and b in round-to-nearest-even mode and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.
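The saturation rule shared by the _sat variants (clamp to [0.0, 1.0], flush NaN to +0.0) can be modeled on the host as below. This is a sketch, assuming float in place of __half and a hypothetical Half2Model struct in place of __half2; saturate_model and hadd2_sat_model are illustrative names, not part of cuda_fp16.h.

```cpp
#include <cmath>

// Hypothetical stand-in for __half2 (float replaces __half).
struct Half2Model {
    float x;
    float y;
};

// Saturation as described for the _sat variants.
float saturate_model(float v) {
    if (std::isnan(v)) return 0.0f;  // NaN results are flushed to +0.0
    if (v < 0.0f) return 0.0f;       // clamp below
    if (v > 1.0f) return 1.0f;       // clamp above
    return v;
}

// Models __hadd2_sat: add each lane, then saturate each lane.
Half2Model hadd2_sat_model(Half2Model a, Half2Model b) {
    return Half2Model{saturate_model(a.x + b.x), saturate_model(a.y + b.y)};
}
```

For instance, {0.75, -0.5} + {0.5, 0.25} saturates to {1.0, 0.0}, and a NaN lane becomes +0.0 without affecting the other lane.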
 __device__ __half2 __hcmadd ( const __half2 a, const __half2 b, const __half2 c )

Performs fast complex multiply-accumulate.
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
 c
  half2. Is only being read.
Returns
half2
 The result of the complex multiply-accumulate operation on complex numbers a, b, and c.
Description
Interprets each half2 input a, b, and c as a complex number in half precision and performs the complex multiply-accumulate operation a*b + c.
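The complex arithmetic behind __hcmadd can be sketched on the host as follows. Assumptions, not stated in the description above: .x holds the real part and .y the imaginary part, float replaces __half, and Half2Model is a hypothetical stand-in for __half2. Because the intrinsic is documented as "fast", its intermediate rounding may differ from this model.

```cpp
// Hypothetical stand-in for __half2 (float replaces __half).
// Assumption: x = real part, y = imaginary part.
struct Half2Model {
    float x;
    float y;
};

// Models __hcmadd: (a.x + i*a.y) * (b.x + i*b.y) + (c.x + i*c.y)
Half2Model hcmadd_model(Half2Model a, Half2Model b, Half2Model c) {
    float re = a.x * b.x - a.y * b.y + c.x;  // real part
    float im = a.x * b.y + a.y * b.x + c.y;  // imaginary part
    return Half2Model{re, im};
}
```

With a = 1 + 2i, b = 3 + 4i, and c = 5 + 6i, the product a*b is -5 + 10i, so the result is 0 + 16i.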
 __device__ __half2 __hfma2 ( const __half2 a, const __half2 b, const __half2 c )

Performs half2 vector fused multiply-add in round-to-nearest-even mode.
Description
Performs a half2 vector multiply of inputs a and b, then a half2 vector add of the result with c, rounding the result once in round-to-nearest-even mode.
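The difference between rounding once (fused, as __hfma2 does per lane) and rounding the product before the add (the unfused behavior that __hadd2_rn, __hmul2_rn, and __hsub2_rn preserve by preventing contraction) can be demonstrated on the host with float and std::fma. This is an illustration in float precision, not a binary16 model; the function names are hypothetical.

```cpp
#include <cmath>

// Fused: a*b + c is computed exactly and rounded once.
float fused_model(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to float before the add. The volatile
// store keeps the compiler from contracting mul+add into an FMA.
float unfused_model(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}
```

With a = 1 + 2^-23 and c = -(1 + 2^-22), the exact value of a*a + c is 2^-46. The fused version keeps it, while the unfused version loses it when a*a is rounded to 1 + 2^-22, and returns 0.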
 __device__ __half2 __hfma2_relu ( const __half2 a, const __half2 b, const __half2 c )

Performs half2 vector fused multiply-add in round-to-nearest-even mode, with ReLU saturation.
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
 c
  half2. Is only being read.
Returns
half2
 The result of the element-wise fused multiply-add operation on vectors a, b, and c, with ReLU saturation.
Description
Performs a half2 vector multiply of inputs a and b, then a half2 vector add of the result with c, rounding the result once in round-to-nearest-even mode. Negative results are then clamped to 0, and NaN results are converted to the canonical NaN.
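One lane of __hfma2_relu can be modeled on the host as below. This is a sketch, assuming float in place of __half; std::numeric_limits<float>::quiet_NaN() stands in for the device's canonical NaN, and hfma_relu_lane_model is a hypothetical name.

```cpp
#include <cmath>
#include <limits>

// Models one lane of __hfma2_relu: fused multiply-add with a single
// rounding, then negative results clamped to 0 and NaN mapped to a
// quiet NaN (stand-in for the canonical NaN).
float hfma_relu_lane_model(float a, float b, float c) {
    float r = std::fma(a, b, c);
    if (std::isnan(r)) return std::numeric_limits<float>::quiet_NaN();
    return r < 0.0f ? 0.0f : r;
}
```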
 __device__ __half2 __hfma2_sat ( const __half2 a, const __half2 b, const __half2 c )

Performs half2 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
 c
  half2. Is only being read.
Returns
half2
 The result of the element-wise fused multiply-add operation on vectors a, b, and c, with saturation applied.
Description
Performs a half2 vector multiply of inputs a and b, then a half2 vector add of the result with c, rounding the result once in round-to-nearest-even mode, and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.
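Saturation in __hfma2_sat applies to the single-rounded result, not to the intermediate product. A host-side sketch of one lane makes the ordering visible; it assumes float in place of __half, and hfma_sat_lane_model is a hypothetical name.

```cpp
#include <cmath>

// Models one lane of __hfma2_sat: a*b + c is rounded once (std::fma),
// and only that final result is clamped to [0.0, 1.0], with NaN
// flushed to +0.0.
float hfma_sat_lane_model(float a, float b, float c) {
    float r = std::fma(a, b, c);
    if (std::isnan(r)) return 0.0f;
    if (r < 0.0f) return 0.0f;
    if (r > 1.0f) return 1.0f;
    return r;
}
```

For a = 2, b = 0.75, c = -1, the product 1.5 is out of range on its own, but the final result 0.5 is inside [0.0, 1.0] and is returned unchanged.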
 __device__ __half2 __hmul2 ( const __half2 a, const __half2 b )

Performs half2 vector multiplication in round-to-nearest-even mode.
Description
Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode.
 __device__ __half2 __hmul2_rn ( const __half2 a, const __half2 b )

Performs half2 vector multiplication in round-to-nearest-even mode.
Description
Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode. Prevents floating-point contraction of mul+add or mul+sub into fma.
 __device__ __half2 __hmul2_sat ( const __half2 a, const __half2 b )

Performs half2 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
Returns
half2
 The result of the element-wise multiplication of vectors a and b, with saturation applied.
Description
Performs half2 vector multiplication of inputs a and b in round-to-nearest-even mode and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.
 __device__ __half2 __hneg2 ( const __half2 a )

Negates both halves of the input half2 number and returns the result.
Description
Negates both halves of the input half2 number a and returns the result.
 __device__ __half2 __hsub2 ( const __half2 a, const __half2 b )

Performs half2 vector subtraction in round-to-nearest-even mode.
Description
Subtracts half2 input vector b from input vector a in round-to-nearest-even mode.
 __device__ __half2 __hsub2_rn ( const __half2 a, const __half2 b )

Performs half2 vector subtraction in round-to-nearest-even mode.
Description
Subtracts half2 input vector b from input vector a in round-to-nearest-even mode. Prevents floating-point contraction of mul+sub into fma.
 __device__ __half2 __hsub2_sat ( const __half2 a, const __half2 b )

Performs half2 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
Parameters
 a
  half2. Is only being read.
 b
  half2. Is only being read.
Returns
half2
 The result of subtracting vector b from a, with saturation applied.
Description
Subtracts half2 input vector b from input vector a in round-to-nearest-even mode and clamps the results to the range [0.0, 1.0]. NaN results are flushed to +0.0.
 __device__ __half2 atomicAdd ( const __half2* address, const __half2 val )

Performs a vector add of val to the value stored at address in global or shared memory and writes the result back to address. Atomicity of the add is guaranteed separately for each of the two __half elements; the __half2 as a whole is not guaranteed to be updated atomically as a single 32-bit access.
Parameters
 address
  half2*. An address in global or shared memory.
 val
  half2. The value to be added.
Returns
half2
 The old value read from address.
Description
address must point to global or shared memory; otherwise, the behavior is undefined. This operation is supported only on devices of compute capability 6.x and higher.
Note: For more details on this function, see the Atomic Functions section in the CUDA C++ Programming Guide.
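The per-element atomicity guarantee can be pictured with a host-side C++ model in which each lane is its own atomic object. This is an illustrative sketch, not the device API: std::atomic<float> stands in for a __half lane, and Half2AtomicModel, Half2Pair, atomic_add_lane, and atomic_add_model are hypothetical names.

```cpp
#include <atomic>

// Hypothetical model: each lane is a separate atomic object, so each
// lane's add is atomic, but the pair is never updated as one unit.
struct Half2AtomicModel {
    std::atomic<float> x{0.0f};
    std::atomic<float> y{0.0f};
};

struct Half2Pair {
    float x;
    float y;
};

// Atomically adds v to one lane with a compare-and-swap loop and
// returns the value the lane held before the add.
float atomic_add_lane(std::atomic<float>& lane, float v) {
    float old = lane.load();
    while (!lane.compare_exchange_weak(old, old + v)) {
        // On failure, old is refreshed with the current value; retry.
    }
    return old;
}

// Mirrors the return contract of atomicAdd on __half2: the old values.
// The lanes are updated one after the other, so another thread may
// observe x already updated while y is not.
Half2Pair atomic_add_model(Half2AtomicModel& target, Half2Pair val) {
    return Half2Pair{atomic_add_lane(target.x, val.x),
                     atomic_add_lane(target.y, val.y)};
}
```

Two successive calls with val = {1, 2} on a zeroed target return {0, 0} and then {1, 2}, leaving the target at {2, 4}.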