5.6. Bfloat162 Arithmetic Functions
To use these functions, include the header file cuda_bf16.h in your program. 
Functions
- __host__ __device__ __nv_bfloat162 __h2div(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector division in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __habs2(const __nv_bfloat162 a)
  - Calculates the absolute value of both halves of the input nv_bfloat162 number and returns the result.
- __host__ __device__ __nv_bfloat162 __hadd2(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector addition in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hadd2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector addition in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hadd2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- __device__ __nv_bfloat162 __hcmadd(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
  - Performs fast complex multiply-accumulate.
- __device__ __nv_bfloat162 __hfma2(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
  - Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode.
- __device__ __nv_bfloat162 __hfma2_relu(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
  - Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode with relu saturation.
- __device__ __nv_bfloat162 __hfma2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
  - Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- __host__ __device__ __nv_bfloat162 __hmul2(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hmul2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hmul2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- __host__ __device__ __nv_bfloat162 __hneg2(const __nv_bfloat162 a)
  - Negates both halves of the input nv_bfloat162 number and returns the result.
- __host__ __device__ __nv_bfloat162 __hsub2(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hsub2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode.
- __host__ __device__ __nv_bfloat162 __hsub2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
  - Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- __device__ __nv_bfloat162 atomicAdd(__nv_bfloat162 *const address, const __nv_bfloat162 val)
  - Adds the vector val to the value stored at address in global or shared memory and writes the result back to address.
- __host__ __device__ __nv_bfloat162 operator*(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 multiplication operation.
- __host__ __device__ __nv_bfloat162 &operator*=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 compound assignment with multiplication operation.
- __host__ __device__ __nv_bfloat162 operator+(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 addition operation.
- __host__ __device__ __nv_bfloat162 operator+(const __nv_bfloat162 &h)
  - Implements packed nv_bfloat16 unary plus operator, returns input value.
- __host__ __device__ __nv_bfloat162 operator++(__nv_bfloat162 &h, const int ignored)
  - Performs packed nv_bfloat16 postfix increment operation.
- __host__ __device__ __nv_bfloat162 &operator++(__nv_bfloat162 &h)
  - Performs packed nv_bfloat16 prefix increment operation.
- __host__ __device__ __nv_bfloat162 &operator+=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 compound assignment with addition operation.
- __host__ __device__ __nv_bfloat162 operator-(const __nv_bfloat162 &h)
  - Implements packed nv_bfloat16 unary minus operator.
- __host__ __device__ __nv_bfloat162 operator-(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 subtraction operation.
- __host__ __device__ __nv_bfloat162 operator--(__nv_bfloat162 &h, const int ignored)
  - Performs packed nv_bfloat16 postfix decrement operation.
- __host__ __device__ __nv_bfloat162 &operator--(__nv_bfloat162 &h)
  - Performs packed nv_bfloat16 prefix decrement operation.
- __host__ __device__ __nv_bfloat162 &operator-=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 compound assignment with subtraction operation.
- __host__ __device__ __nv_bfloat162 operator/(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 division operation.
- __host__ __device__ __nv_bfloat162 &operator/=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
  - Performs packed nv_bfloat16 compound assignment with division operation.
5.6.1. Functions
- 
__host__ __device__ __nv_bfloat162 __h2div(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector division in round-to-nearest-even mode.
- Divides nv_bfloat162 input vector a by input vector b in round-to-nearest-even mode.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The elementwise quotient of a divided by b.
 
 
- 
__host__ __device__ __nv_bfloat162 __habs2(const __nv_bfloat162 a)
- Calculates the absolute value of both halves of the input nv_bfloat162 number and returns the result.
- Parameters
  - a – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: a with both halves replaced by their absolute values.
 
 
- 
__host__ __device__ __nv_bfloat162 __hadd2(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector addition in round-to-nearest-even mode.
- Performs nv_bfloat162 vector add of inputs a and b, in round-to-nearest-even mode.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The sum of vectors a and b.
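
The per-element rounding described above can be illustrated with a host-side reference model in standard C++. This is only a sketch, not the library's implementation: Bf16x2Model, round_to_bf16, and model_hadd2 are hypothetical names that do not exist in cuda_bf16.h, and each half is held as a float whose value is assumed to be representable in bfloat16. The model adds in float and then rounds to bfloat16, on the assumption that this matches a correctly rounded bfloat16 add.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __nv_bfloat162: two lanes kept as floats whose
// values are assumed to be exactly representable in bfloat16.
struct Bf16x2Model {
    float x;  // first lane
    float y;  // second lane
};

// Rounds a float to the nearest bfloat16 value, ties to even.
// bfloat16 keeps the upper 16 bits of the IEEE-754 binary32 encoding.
inline float round_to_bf16(float f) {
    if (std::isnan(f)) return f;                  // leave NaN untouched
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives
    bits += 0x7FFFu + lsb;                        // round half to even
    bits &= 0xFFFF0000u;                          // drop the discarded bits
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Models the elementwise sum: each lane is added and rounded independently.
inline Bf16x2Model model_hadd2(Bf16x2Model a, Bf16x2Model b) {
    return { round_to_bf16(a.x + b.x), round_to_bf16(a.y + b.y) };
}
```

For example, adding 2^-8 to both lanes of {1.0, 1.0078125} lands each lane exactly halfway between two bfloat16 values. The model resolves the ties to the even significand and yields {1.0, 1.015625}.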
 
 
- 
__host__ __device__ __nv_bfloat162 __hadd2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector addition in round-to-nearest-even mode.
- Performs nv_bfloat162 vector add of inputs a and b, in round-to-nearest-even mode. Prevents floating-point contractions of mul+add into fma.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The sum of vectors a and b.
 
 
- 
__host__ __device__ __nv_bfloat162 __hadd2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector addition in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- Performs nv_bfloat162 vector add of inputs a and b, in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The sum of a and b, with respect to saturation.
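
The saturation rule above (clamp to [0.0, 1.0], NaN flushed to +0.0) can be sketched on the host in standard C++. The names Bf16x2Model, saturate_lane, and model_hadd2_sat are hypothetical and are not part of cuda_bf16.h. Rounding to bfloat16 is left out to keep the focus on the clamp, so the sketch is only meaningful for inputs whose sums are exactly representable. The description does not say how -0.0 is treated; this sketch passes it through unchanged.

```cpp
#include <cmath>

// Hypothetical stand-in for __nv_bfloat162 (two lanes held as floats).
struct Bf16x2Model {
    float x;
    float y;
};

// Clamps one lane to [0.0, 1.0]; NaN becomes +0.0 as described above.
inline float saturate_lane(float v) {
    if (std::isnan(v)) return 0.0f;
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

// Elementwise add followed by the clamp (bfloat16 rounding omitted).
inline Bf16x2Model model_hadd2_sat(Bf16x2Model a, Bf16x2Model b) {
    return { saturate_lane(a.x + b.x), saturate_lane(a.y + b.y) };
}
```

With a = {0.75, -0.5} and b = {0.5, 0.25}, the raw sums 1.25 and -0.25 are clamped to {1.0, 0.0}.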
 
 
- 
__device__ __nv_bfloat162 __hcmadd(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
- Performs fast complex multiply-accumulate.
- Interprets vector nv_bfloat162 input pairs a, b, and c as complex numbers in nv_bfloat16 precision and performs a complex multiply-accumulate operation: a*b + c.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
  - c – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of the complex multiply-accumulate operation on complex numbers a, b, and c.
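
As an illustration of the arithmetic, the complex multiply-accumulate a*b + c can be written out per component in standard C++. This sketch rests on assumptions the description does not state: the first lane (x) is taken as the real part and the second lane (y) as the imaginary part, and the operation is arranged as two chained fused multiply-adds per component. Bf16x2Model and model_hcmadd are hypothetical names, and rounding to bfloat16 is omitted.

```cpp
#include <cmath>

// Hypothetical stand-in for __nv_bfloat162.
// Assumption: x holds the real part, y the imaginary part.
struct Bf16x2Model {
    float x;
    float y;
};

// (a.x + i*a.y) * (b.x + i*b.y) + (c.x + i*c.y), bfloat16 rounding omitted.
inline Bf16x2Model model_hcmadd(Bf16x2Model a, Bf16x2Model b, Bf16x2Model c) {
    // real = a.x*b.x - a.y*b.y + c.x
    float re = std::fma(a.x, b.x, c.x);
    re = std::fma(-a.y, b.y, re);
    // imag = a.x*b.y + a.y*b.x + c.y
    float im = std::fma(a.x, b.y, c.y);
    im = std::fma(a.y, b.x, im);
    return { re, im };
}
```

For a = 1 + 2i, b = 3 + 4i, and c = 5 + 6i, the product a*b is -5 + 10i, so the model returns {0, 16}.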
 
 
- 
__device__ __nv_bfloat162 __hfma2(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
- Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode.
- Performs nv_bfloat162 vector multiply on inputs a and b, then performs an nv_bfloat162 vector add of the result with c, rounding the result once in round-to-nearest-even mode.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
  - c – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of the elementwise fused multiply-add operation on vectors a, b, and c.
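
The phrase "rounding the result once" is what separates a fused multiply-add from a multiply followed by an add. The host-side C++ sketch below contrasts the two for a single lane. The helper names round_to_bf16, fused_lane, and unfused_lane are hypothetical. The fused path is approximated by evaluating in double and narrowing, which is equivalent to a single rounding for the sample values shown but is not claimed to be for every input.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Rounds a float to the nearest bfloat16 value, ties to even.
inline float round_to_bf16(float f) {
    if (std::isnan(f)) return f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round half to even
    bits &= 0xFFFF0000u;                    // keep the upper 16 bits
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// One rounding: product and sum are formed at higher precision first.
inline float fused_lane(float a, float b, float c) {
    const double wide = static_cast<double>(a) * static_cast<double>(b)
                      + static_cast<double>(c);
    return round_to_bf16(static_cast<float>(wide));
}

// Two roundings: the product is rounded to bfloat16 before the add.
inline float unfused_lane(float a, float b, float c) {
    const float product = round_to_bf16(a * b);
    return round_to_bf16(product + c);
}
```

Take a = b = 1.0078125 (1 + 2^-7) and c = -1.015625 (-(1 + 2^-6)). The exact product is 1 + 2^-6 + 2^-14. The fused path keeps the 2^-14 term and returns it, while the unfused path rounds the product to 1 + 2^-6 first and returns 0.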
 
 
- 
__device__ __nv_bfloat162 __hfma2_relu(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
- Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode with relu saturation.
- Performs nv_bfloat162 vector multiply on inputs a and b, then performs an nv_bfloat162 vector add of the result with c, rounding the result once in round-to-nearest-even mode. Negative results are then clamped to 0. A NaN result is converted to canonical NaN.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
  - c – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of the elementwise fused multiply-add operation on vectors a, b, and c, with relu saturation.
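
Relu saturation differs from saturation to [0.0, 1.0] in two ways: there is no upper bound, and NaN is kept as NaN instead of being flushed to +0.0. A minimal host-side C++ sketch of that behavior follows. Bf16x2Model, relu_lane, and model_hfma2_relu are hypothetical names, bfloat16 rounding is omitted, and the quiet NaN returned here merely stands in for whatever bit pattern the canonical NaN has.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for __nv_bfloat162.
struct Bf16x2Model {
    float x;
    float y;
};

// Negative values become 0; NaN stays NaN; everything else passes through.
inline float relu_lane(float v) {
    if (std::isnan(v)) return std::numeric_limits<float>::quiet_NaN();
    return v < 0.0f ? 0.0f : v;
}

// Elementwise a*b + c followed by the relu clamp (bfloat16 rounding omitted).
inline Bf16x2Model model_hfma2_relu(Bf16x2Model a, Bf16x2Model b, Bf16x2Model c) {
    return { relu_lane(std::fma(a.x, b.x, c.x)),
             relu_lane(std::fma(a.y, b.y, c.y)) };
}
```

With a = {2, 2}, b = {3, -3}, and c = {1, 1}, the raw results are 7 and -5, and the model returns {7, 0}. The 7 is not clamped to 1.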
 
 
- 
__device__ __nv_bfloat162 __hfma2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b, const __nv_bfloat162 c)
- Performs nv_bfloat162 vector fused multiply-add in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- Performs nv_bfloat162 vector multiply on inputs a and b, then performs an nv_bfloat162 vector add of the result with c, rounding the result once in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
  - c – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of the elementwise fused multiply-add operation on vectors a, b, and c, with respect to saturation.
 
 
- 
__host__ __device__ __nv_bfloat162 __hmul2(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode.
- Performs nv_bfloat162 vector multiplication of inputs a and b, in round-to-nearest-even mode.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The elementwise product of vectors a and b.
 
 
- 
__host__ __device__ __nv_bfloat162 __hmul2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode.
- Performs nv_bfloat162 vector multiplication of inputs a and b, in round-to-nearest-even mode. Prevents floating-point contractions of mul+add or sub into fma.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The elementwise product of vectors a and b.
 
 
- 
__host__ __device__ __nv_bfloat162 __hmul2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector multiplication in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- Performs nv_bfloat162 vector multiplication of inputs a and b, in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The elementwise product of vectors a and b, with respect to saturation.
 
 
- 
__host__ __device__ __nv_bfloat162 __hneg2(const __nv_bfloat162 a)
- Negates both halves of the input nv_bfloat162 number and returns the result.
- Negates both halves of the input nv_bfloat162 number a and returns the result.
- Parameters
  - a – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: a with both halves negated.
 
 
- 
__host__ __device__ __nv_bfloat162 __hsub2(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode.
- Subtracts nv_bfloat162 input vector b from input vector a in round-to-nearest-even mode.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of subtracting vector b from a.
 
 
- 
__host__ __device__ __nv_bfloat162 __hsub2_rn(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode.
- Subtracts nv_bfloat162 input vector b from input vector a in round-to-nearest-even mode. Prevents floating-point contractions of mul+sub into fma.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of subtracting vector b from a.
 
 
- 
__host__ __device__ __nv_bfloat162 __hsub2_sat(const __nv_bfloat162 a, const __nv_bfloat162 b)
- Performs nv_bfloat162 vector subtraction in round-to-nearest-even mode, with saturation to [0.0, 1.0].
- Subtracts nv_bfloat162 input vector b from input vector a in round-to-nearest-even mode, and clamps the results to range [0.0, 1.0]. NaN results are flushed to +0.0.
- Parameters
  - a – [in] nv_bfloat162. Read only.
  - b – [in] nv_bfloat162. Read only.
- Returns
  - nv_bfloat162: The result of subtracting vector b from a, with respect to saturation.
 
 
- 
__device__ __nv_bfloat162 atomicAdd(__nv_bfloat162 *const address, const __nv_bfloat162 val)
- Adds the vector val to the value stored at address in global or shared memory and writes the result back to address.
- The atomicity of the add operation is guaranteed separately for each of the two nv_bfloat16 elements; the entire __nv_bfloat162 is not guaranteed to be atomic as a single 32-bit access.
- The location of address must be in global or shared memory; otherwise the behavior is undefined. This operation is natively supported by devices of compute capability 9.x and higher; older devices use an emulation path.
- Note: For more details about this function, see the Atomic Functions section in the CUDA C++ Programming Guide.
- Parameters
  - address – [in] __nv_bfloat162*. An address in global or shared memory.
  - val – [in] __nv_bfloat162. The value to be added.
- Returns
  - The old value read from address.
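
The read-modify-write contract (add, write back, return the old value) can be modeled on the host with a compare-and-swap loop over a 32-bit word. This is a hypothetical analogue in standard C++ using std::atomic, not a description of the library's emulation path, which is not specified here. The helper names and the packing order (first lane in the low 16 bits) are assumptions made for the sketch.

```cpp
#include <atomic>
#include <cmath>
#include <cstdint>
#include <cstring>

// bfloat16 bit pattern -> float (bfloat16 is the upper half of a binary32).
inline float bf16_bits_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// float -> bfloat16 bit pattern, round-to-nearest-even.
inline std::uint16_t float_to_bf16_bits(float f) {
    if (std::isnan(f)) return 0x7FFFu;  // a NaN encoding chosen for the sketch
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Packs two lanes into one word; low 16 bits hold the first lane (assumed).
inline std::uint32_t pack2(float lo, float hi) {
    return static_cast<std::uint32_t>(float_to_bf16_bits(lo)) |
           (static_cast<std::uint32_t>(float_to_bf16_bits(hi)) << 16);
}

// Hypothetical packed atomic add: returns the value held before the update.
inline std::uint32_t model_atomic_add2(std::atomic<std::uint32_t>& word,
                                       std::uint32_t val) {
    std::uint32_t old = word.load();
    std::uint32_t desired;
    do {
        const float lo = bf16_bits_to_float(static_cast<std::uint16_t>(old & 0xFFFFu))
                       + bf16_bits_to_float(static_cast<std::uint16_t>(val & 0xFFFFu));
        const float hi = bf16_bits_to_float(static_cast<std::uint16_t>(old >> 16))
                       + bf16_bits_to_float(static_cast<std::uint16_t>(val >> 16));
        desired = pack2(lo, hi);
        // On failure, compare_exchange_weak reloads 'old' and the loop retries.
    } while (!word.compare_exchange_weak(old, desired));
    return old;
}
```

A whole-word compare-and-swap is stronger than the per-element guarantee stated above, so portable code should rely only on each nv_bfloat16 element being updated atomically, not on the pair being updated together.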
 
 
- 
__host__ __device__ __nv_bfloat162 operator*(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 multiplication operation.
- 
__host__ __device__ __nv_bfloat162 &operator*=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 compound assignment with multiplication operation.
- 
__host__ __device__ __nv_bfloat162 operator+(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 addition operation.
- 
__host__ __device__ __nv_bfloat162 operator+(const __nv_bfloat162 &h)
- Implements packed nv_bfloat16 unary plus operator, returns input value.
- 
__host__ __device__ __nv_bfloat162 operator++(__nv_bfloat162 &h, const int ignored)
- Performs packed nv_bfloat16 postfix increment operation.
- 
__host__ __device__ __nv_bfloat162 &operator++(__nv_bfloat162 &h)
- Performs packed nv_bfloat16 prefix increment operation.
- 
__host__ __device__ __nv_bfloat162 &operator+=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 compound assignment with addition operation.
- 
__host__ __device__ __nv_bfloat162 operator-(const __nv_bfloat162 &h)
- Implements packed nv_bfloat16 unary minus operator.
- See also __hneg2(__nv_bfloat162)
- 
__host__ __device__ __nv_bfloat162 operator-(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 subtraction operation.
- 
__host__ __device__ __nv_bfloat162 operator--(__nv_bfloat162 &h, const int ignored)
- Performs packed nv_bfloat16 postfix decrement operation.
- 
__host__ __device__ __nv_bfloat162 &operator--(__nv_bfloat162 &h)
- Performs packed nv_bfloat16 prefix decrement operation.
- 
__host__ __device__ __nv_bfloat162 &operator-=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 compound assignment with subtraction operation.
- 
__host__ __device__ __nv_bfloat162 operator/(const __nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 division operation.
- 
__host__ __device__ __nv_bfloat162 &operator/=(__nv_bfloat162 &lh, const __nv_bfloat162 &rh)
- Performs packed nv_bfloat16 compound assignment with division operation.
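
The operator signatures follow the usual C++ conventions: compound assignment and prefix forms return a reference to the updated object, and postfix forms return the prior value. The sketch below mirrors the addition and increment operators on a hypothetical host-side type. Bf16x2Model and round_to_bf16 are illustrative names only, and the sketch assumes that increment adds 1 to both halves with the same rounding as packed addition.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __nv_bfloat162.
struct Bf16x2Model {
    float x;
    float y;
};

// Rounds a float to the nearest bfloat16 value, ties to even.
inline float round_to_bf16(float f) {
    if (std::isnan(f)) return f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    bits &= 0xFFFF0000u;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Packed addition: each lane is added and rounded independently.
inline Bf16x2Model operator+(const Bf16x2Model& lh, const Bf16x2Model& rh) {
    return { round_to_bf16(lh.x + rh.x), round_to_bf16(lh.y + rh.y) };
}

// Compound assignment: updates lh in place and returns it by reference.
inline Bf16x2Model& operator+=(Bf16x2Model& lh, const Bf16x2Model& rh) {
    lh = lh + rh;
    return lh;
}

// Prefix increment: updates both lanes, returns the updated object.
inline Bf16x2Model& operator++(Bf16x2Model& h) {
    h += Bf16x2Model{1.0f, 1.0f};
    return h;
}

// Postfix increment: updates both lanes, returns the value before the update.
inline Bf16x2Model operator++(Bf16x2Model& h, int) {
    const Bf16x2Model before = h;
    ++h;
    return before;
}
```

One consequence of the 8-bit significand is visible in this model: incrementing a lane that holds 256 leaves it at 256, because 257 is not representable in bfloat16 and the tie rounds back to the even neighbor.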