1.2.5. Bfloat16 Precision Conversion And Data Movement

To use these functions, include the header file cuda_bf16.h in your program.

Functions

__host__ __device__ float2 __bfloat1622float2 ( const __nv_bfloat162 a )
Converts both halves of nv_bfloat162 to float2 and returns the result.
__device__ __nv_bfloat162 __bfloat162bfloat162 ( const __nv_bfloat16 a )
Returns nv_bfloat162 with both halves equal to the input value.
__host__ __device__ float __bfloat162float ( const __nv_bfloat16 a )
Converts nv_bfloat16 number to float.
__device__ int __bfloat162int_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-down mode.
__device__ int __bfloat162int_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-to-nearest-even mode.
__device__ int __bfloat162int_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-up mode.
__device__ int __bfloat162int_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-towards-zero mode.
__device__ long long int __bfloat162ll_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-down mode.
__device__ long long int __bfloat162ll_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-to-nearest-even mode.
__device__ long long int __bfloat162ll_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-up mode.
__device__ long long int __bfloat162ll_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-towards-zero mode.
__device__ short int __bfloat162short_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-down mode.
__device__ short int __bfloat162short_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-to-nearest-even mode.
__device__ short int __bfloat162short_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-up mode.
__device__ short int __bfloat162short_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-towards-zero mode.
__device__ unsigned int __bfloat162uint_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-down mode.
__device__ unsigned int __bfloat162uint_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-to-nearest-even mode.
__device__ unsigned int __bfloat162uint_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-up mode.
__device__ unsigned int __bfloat162uint_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-towards-zero mode.
__device__ unsigned long long int __bfloat162ull_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-down mode.
__device__ unsigned long long int __bfloat162ull_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-to-nearest-even mode.
__device__ unsigned long long int __bfloat162ull_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-up mode.
__device__ unsigned long long int __bfloat162ull_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-towards-zero mode.
__device__ unsigned short int __bfloat162ushort_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-down mode.
__device__ unsigned short int __bfloat162ushort_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-to-nearest-even mode.
__device__ unsigned short int __bfloat162ushort_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-up mode.
__device__ unsigned short int __bfloat162ushort_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-towards-zero mode.
__device__ short int __bfloat16_as_short ( const __nv_bfloat16 h )
Reinterprets bits in a nv_bfloat16 as a signed short integer.
__device__ unsigned short int __bfloat16_as_ushort ( const __nv_bfloat16 h )
Reinterprets bits in a nv_bfloat16 as an unsigned short integer.
__host__ __device__ __nv_bfloat16 __double2bfloat16 ( const double a )
Converts double number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat162 __float22bfloat162_rn ( const float2 a )
Converts both components of float2 number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat162 with converted values.
__host__ __device__ __nv_bfloat16 __float2bfloat16 ( const float a )
Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat162 __float2bfloat162_rn ( const float a )
Converts input to nv_bfloat16 precision in round-to-nearest-even mode and populates both halves of nv_bfloat162 with converted value.
__host__ __device__ __nv_bfloat16 __float2bfloat16_rd ( const float a )
Converts float number to nv_bfloat16 precision in round-down mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat16 __float2bfloat16_rn ( const float a )
Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat16 __float2bfloat16_ru ( const float a )
Converts float number to nv_bfloat16 precision in round-up mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat16 __float2bfloat16_rz ( const float a )
Converts float number to nv_bfloat16 precision in round-towards-zero mode and returns nv_bfloat16 with converted value.
__host__ __device__ __nv_bfloat162 __floats2bfloat162_rn ( const float a, const float b )
Converts both input floats to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat162 with converted values.
__device__ __nv_bfloat162 __halves2bfloat162 ( const __nv_bfloat16 a, const __nv_bfloat16 b )
Combines two nv_bfloat16 numbers into one nv_bfloat162 number.
__device__ __nv_bfloat16 __high2bfloat16 ( const __nv_bfloat162 a )
Returns high 16 bits of nv_bfloat162 input.
__device__ __nv_bfloat162 __high2bfloat162 ( const __nv_bfloat162 a )
Extracts high 16 bits from nv_bfloat162 input and returns a nv_bfloat162 with both halves equal to them.
__host__ __device__ float __high2float ( const __nv_bfloat162 a )
Converts high 16 bits of nv_bfloat162 to float and returns the result.
__device__ __nv_bfloat162 __highs2bfloat162 ( const __nv_bfloat162 a, const __nv_bfloat162 b )
Extracts high 16 bits from each of the two nv_bfloat162 inputs and combines them into one nv_bfloat162 number.
__device__ __nv_bfloat16 __int2bfloat16_rd ( int i )
Convert a signed integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __int2bfloat16_rn ( int i )
Convert a signed integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __int2bfloat16_ru ( int i )
Convert a signed integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __int2bfloat16_rz ( int i )
Convert a signed integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __ldca ( const __nv_bfloat16* ptr )
Generates a `ld.global.ca` load instruction.
__device__ __nv_bfloat162 __ldca ( const __nv_bfloat162* ptr )
Generates a `ld.global.ca` load instruction.
__device__ __nv_bfloat16 __ldcg ( const __nv_bfloat16* ptr )
Generates a `ld.global.cg` load instruction.
__device__ __nv_bfloat162 __ldcg ( const __nv_bfloat162* ptr )
Generates a `ld.global.cg` load instruction.
__device__ __nv_bfloat16 __ldcs ( const __nv_bfloat16* ptr )
Generates a `ld.global.cs` load instruction.
__device__ __nv_bfloat162 __ldcs ( const __nv_bfloat162* ptr )
Generates a `ld.global.cs` load instruction.
__device__ __nv_bfloat16 __ldcv ( const __nv_bfloat16* ptr )
Generates a `ld.global.cv` load instruction.
__device__ __nv_bfloat162 __ldcv ( const __nv_bfloat162* ptr )
Generates a `ld.global.cv` load instruction.
__device__ __nv_bfloat16 __ldg ( const __nv_bfloat16* ptr )
Generates a `ld.global.nc` load instruction.
__device__ __nv_bfloat162 __ldg ( const __nv_bfloat162* ptr )
Generates a `ld.global.nc` load instruction.
__device__ __nv_bfloat16 __ldlu ( const __nv_bfloat16* ptr )
Generates a `ld.global.lu` load instruction.
__device__ __nv_bfloat162 __ldlu ( const __nv_bfloat162* ptr )
Generates a `ld.global.lu` load instruction.
__device__ __nv_bfloat16 __ll2bfloat16_rd ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __ll2bfloat16_rn ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __ll2bfloat16_ru ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __ll2bfloat16_rz ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __low2bfloat16 ( const __nv_bfloat162 a )
Returns low 16 bits of nv_bfloat162 input.
__device__ __nv_bfloat162 __low2bfloat162 ( const __nv_bfloat162 a )
Extracts low 16 bits from nv_bfloat162 input and returns a nv_bfloat162 with both halves equal to them.
__host__ __device__ float __low2float ( const __nv_bfloat162 a )
Converts low 16 bits of nv_bfloat162 to float and returns the result.
__device__ __nv_bfloat162 __lowhigh2highlow ( const __nv_bfloat162 a )
Swaps both halves of the nv_bfloat162 input.
__device__ __nv_bfloat162 __lows2bfloat162 ( const __nv_bfloat162 a, const __nv_bfloat162 b )
Extracts low 16 bits from each of the two nv_bfloat162 inputs and combines them into one nv_bfloat162 number.
__device__ __nv_bfloat16 __shfl_down_sync ( unsigned mask, __nv_bfloat16 var, unsigned int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with higher ID relative to the caller.
__device__ __nv_bfloat162 __shfl_down_sync ( unsigned mask, __nv_bfloat162 var, unsigned int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with higher ID relative to the caller.
__device__ __nv_bfloat16 __shfl_sync ( unsigned mask, __nv_bfloat16 var, int delta, int width = warpSize )
Exchange a variable between threads within a warp. Direct copy from indexed thread.
__device__ __nv_bfloat162 __shfl_sync ( unsigned mask, __nv_bfloat162 var, int delta, int width = warpSize )
Exchange a variable between threads within a warp. Direct copy from indexed thread.
__device__ __nv_bfloat16 __shfl_up_sync ( unsigned mask, __nv_bfloat16 var, unsigned int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with lower ID relative to the caller.
__device__ __nv_bfloat162 __shfl_up_sync ( unsigned mask, __nv_bfloat162 var, unsigned int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with lower ID relative to the caller.
__device__ __nv_bfloat16 __shfl_xor_sync ( unsigned mask, __nv_bfloat16 var, int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread based on bitwise XOR of own thread ID.
__device__ __nv_bfloat162 __shfl_xor_sync ( unsigned mask, __nv_bfloat162 var, int delta, int width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread based on bitwise XOR of own thread ID.
__device__ __nv_bfloat16 __short2bfloat16_rd ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __short2bfloat16_rn ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __short2bfloat16_ru ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __short2bfloat16_rz ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __short_as_bfloat16 ( const short int i )
Reinterprets bits in a signed short integer as a nv_bfloat16.
__device__ void __stcg ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.cg` store instruction.
__device__ void __stcg ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.cg` store instruction.
__device__ void __stcs ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.cs` store instruction.
__device__ void __stcs ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.cs` store instruction.
__device__ void __stwb ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.wb` store instruction.
__device__ void __stwb ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.wb` store instruction.
__device__ void __stwt ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.wt` store instruction.
__device__ void __stwt ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.wt` store instruction.
__device__ __nv_bfloat16 __uint2bfloat16_rd ( unsigned int i )
Convert an unsigned integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __uint2bfloat16_rn ( unsigned int i )
Convert an unsigned integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __uint2bfloat16_ru ( unsigned int i )
Convert an unsigned integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __uint2bfloat16_rz ( unsigned int i )
Convert an unsigned integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __ull2bfloat16_rd ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __ull2bfloat16_rn ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __ull2bfloat16_ru ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __ull2bfloat16_rz ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __ushort2bfloat16_rd ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-down mode.
__device__ __nv_bfloat16 __ushort2bfloat16_rn ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-to-nearest-even mode.
__device__ __nv_bfloat16 __ushort2bfloat16_ru ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-up mode.
__device__ __nv_bfloat16 __ushort2bfloat16_rz ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-towards-zero mode.
__device__ __nv_bfloat16 __ushort_as_bfloat16 ( const unsigned short int i )
Reinterprets bits in an unsigned short integer as a nv_bfloat16.

Functions

__host__ __device__ float2 __bfloat1622float2 ( const __nv_bfloat162 a )
Converts both halves of nv_bfloat162 to float2 and returns the result.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

float2

  • `a` converted to float2.

Description

Converts both halves of nv_bfloat162 input a to float2 and returns the result.
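The unpacking can be illustrated with a host-only C++ sketch that models a nv_bfloat162 as a 32-bit word holding two raw bf16 bit patterns. It assumes IEEE-754 binary32 `float`, and it assumes that the low 16 bits map to `x` and the high 16 bits to `y`, mirroring the layout documented for __float22bfloat162_rn. `float2_host`, `widen_bf16` and `unpack_bf162` are hypothetical names, not part of cuda_bf16.h.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for CUDA's float2.
struct float2_host {
    float x;
    float y;
};

// Widen raw bf16 bits to float: bf16 is the upper 16 bits of a binary32 value.
static float widen_bf16(std::uint16_t bits) {
    std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Emulates __bfloat1622float2 on a packed 32-bit word,
// assuming low 16 bits -> x and high 16 bits -> y.
float2_host unpack_bf162(std::uint32_t packed) {
    return float2_host{widen_bf16(static_cast<std::uint16_t>(packed & 0xFFFFu)),
                       widen_bf16(static_cast<std::uint16_t>(packed >> 16))};
}
```

With 1.0 (0x3F80) in the low half and 2.0 (0x4000) in the high half, the packed word 0x40003F80 unpacks to x = 1.0 and y = 2.0.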

__device__ __nv_bfloat162 __bfloat162bfloat162 ( const __nv_bfloat16 a )
Returns nv_bfloat162 with both halves equal to the input value.
Parameters
a
- nv_bfloat16. Is only being read.
Returns

nv_bfloat162

  • The vector which has both its halves equal to the input a.

Description

Returns a nv_bfloat162 number with both halves equal to the input nv_bfloat16 number a.

__host__ __device__ float __bfloat162float ( const __nv_bfloat16 a )
Converts nv_bfloat16 number to float.
Parameters
a
- nv_bfloat16. Is only being read.
Returns

float

  • `a` converted to float.

Description

Converts nv_bfloat16 number a to float.
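Because nv_bfloat16 keeps the sign bit, the full 8-bit exponent and the top 7 mantissa bits of a binary32 value, this widening is exact. A minimal host-side sketch of the bit-level operation, assuming IEEE-754 binary32 `float` and a bf16 value carried as its raw bits in a `std::uint16_t` (`bf16_bits_to_float` is a hypothetical helper, not the library's implementation):

```cpp
#include <cstdint>
#include <cstring>

// Emulates __bfloat162float on a raw bf16 bit pattern. The conversion is exact:
// the 16 bits become the high half of a binary32 word and the low half is zero.
float bf16_bits_to_float(std::uint16_t bits) {
    std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);  // well-defined way to reinterpret the bits
    return f;
}
```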

__device__ int __bfloat162int_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

int

  • `h` converted to a signed integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed integer in round-down mode.

__device__ int __bfloat162int_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

int

  • `h` converted to a signed integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed integer in round-to-nearest-even mode.

__device__ int __bfloat162int_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

int

  • `h` converted to a signed integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed integer in round-up mode.

__device__ int __bfloat162int_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

int

  • `h` converted to a signed integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed integer in round-towards-zero mode.
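The four rounding modes differ only in how a fractional value is resolved. The host-only sketch below models them with `std::floor` (round-down), `std::nearbyint` (round-to-nearest-even, assuming the default floating-point rounding environment), `std::ceil` (round-up) and `std::trunc` (round-towards-zero) applied to the exact float value of the bf16 bits. The helper names are hypothetical, and the sketch assumes in-range, non-NaN inputs; it does not show how the intrinsics handle out-of-range values.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Widen raw bf16 bits to float (exact).
static float bf16_to_float(std::uint16_t b) {
    std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical host-side models of the four conversion modes.
int to_int_rd(std::uint16_t b) { return static_cast<int>(std::floor(bf16_to_float(b))); }
int to_int_rn(std::uint16_t b) { return static_cast<int>(std::nearbyint(bf16_to_float(b))); }
int to_int_ru(std::uint16_t b) { return static_cast<int>(std::ceil(bf16_to_float(b))); }
int to_int_rz(std::uint16_t b) { return static_cast<int>(std::trunc(bf16_to_float(b))); }
```

For example, 2.5 (bits 0x4020) becomes 2, 2, 3 and 2 in the rd, rn, ru and rz modes, while -2.5 (bits 0xC020) becomes -3, -2, -2 and -2.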

__device__ long long int __bfloat162ll_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

long long int

  • `h` converted to a signed 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed 64-bit integer in round-down mode.

__device__ long long int __bfloat162ll_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

long long int

  • `h` converted to a signed 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed 64-bit integer in round-to-nearest-even mode.

__device__ long long int __bfloat162ll_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

long long int

  • `h` converted to a signed 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed 64-bit integer in round-up mode.

__device__ long long int __bfloat162ll_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed 64-bit integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

long long int

  • `h` converted to a signed 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed 64-bit integer in round-towards-zero mode.

__device__ short int __bfloat162short_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

short int

  • `h` converted to a signed short integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed short integer in round-down mode.

__device__ short int __bfloat162short_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

short int

  • `h` converted to a signed short integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed short integer in round-to-nearest-even mode.

__device__ short int __bfloat162short_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

short int

  • `h` converted to a signed short integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed short integer in round-up mode.

__device__ short int __bfloat162short_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to a signed short integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

short int

  • `h` converted to a signed short integer.

Description

Convert the nv_bfloat16-precision floating point value h to a signed short integer in round-towards-zero mode.

__device__ unsigned int __bfloat162uint_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned int

  • `h` converted to an unsigned integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned integer in round-down mode.

__device__ unsigned int __bfloat162uint_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned int

  • `h` converted to an unsigned integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned integer in round-to-nearest-even mode.

__device__ unsigned int __bfloat162uint_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned int

  • `h` converted to an unsigned integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned integer in round-up mode.

__device__ unsigned int __bfloat162uint_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned int

  • `h` converted to an unsigned integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned integer in round-towards-zero mode.

__device__ unsigned long long int __bfloat162ull_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned long long int

  • `h` converted to an unsigned 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned 64-bit integer in round-down mode.

__device__ unsigned long long int __bfloat162ull_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned long long int

  • `h` converted to an unsigned 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned 64-bit integer in round-to-nearest-even mode.

__device__ unsigned long long int __bfloat162ull_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned long long int

  • `h` converted to an unsigned 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned 64-bit integer in round-up mode.

__device__ unsigned long long int __bfloat162ull_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned 64-bit integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned long long int

  • `h` converted to an unsigned 64-bit integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned 64-bit integer in round-towards-zero mode.

__device__ unsigned short int __bfloat162ushort_rd ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-down mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned short int

  • `h` converted to an unsigned short integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned short integer in round-down mode.

__device__ unsigned short int __bfloat162ushort_rn ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-to-nearest-even mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned short int

  • `h` converted to an unsigned short integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned short integer in round-to-nearest-even mode.

__device__ unsigned short int __bfloat162ushort_ru ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-up mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned short int

  • `h` converted to an unsigned short integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned short integer in round-up mode.

__device__ unsigned short int __bfloat162ushort_rz ( __nv_bfloat16 h )
Convert a nv_bfloat16 to an unsigned short integer in round-towards-zero mode.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned short int

  • `h` converted to an unsigned short integer.

Description

Convert the nv_bfloat16-precision floating point value h to an unsigned short integer in round-towards-zero mode.

__device__ short int __bfloat16_as_short ( const __nv_bfloat16 h )
Reinterprets bits in a nv_bfloat16 as a signed short integer.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

short int

  • The reinterpreted value.

Description

Reinterprets the bits in the nv_bfloat16-precision floating point number h as a signed short integer.

__device__ unsigned short int __bfloat16_as_ushort ( const __nv_bfloat16 h )
Reinterprets bits in a nv_bfloat16 as an unsigned short integer.
Parameters
h
- nv_bfloat16. Is only being read.
Returns

unsigned short int

  • The reinterpreted value.

Description

Reinterprets the bits in the nv_bfloat16-precision floating point number h as an unsigned short integer.
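The difference between reinterpreting bits and converting a value shows up clearly with 1.0, whose bf16 bit pattern is 0x3F80. In the host-only sketch below, a bf16 value is modelled as its raw 16 bits, all names are hypothetical, and the value conversion assumes an in-range, non-negative input. Reinterpretation returns 0x3F80 (16256), while a round-towards-zero value conversion in the spirit of __bfloat162ushort_rz returns 1.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for a bf16 value: just its 16 raw bits.
struct bf16_bits {
    std::uint16_t raw;
};

// Models __bfloat16_as_ushort: the bit pattern is returned unchanged.
std::uint16_t as_ushort(bf16_bits h) { return h.raw; }

// Models __bfloat16_as_short: the same bits, read as a two's-complement integer.
std::int16_t as_short(bf16_bits h) {
    std::int16_t s;
    std::memcpy(&s, &h.raw, sizeof s);
    return s;
}

// Value conversion for comparison (round towards zero, in-range input assumed).
std::uint16_t value_to_ushort_rz(bf16_bits h) {
    std::uint32_t u = static_cast<std::uint32_t>(h.raw) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return static_cast<std::uint16_t>(f);  // float-to-integer conversion truncates
}
```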

__host__ __device__ __nv_bfloat16 __double2bfloat16 ( const double a )
Converts double number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
Parameters
a
- double. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts double number a to nv_bfloat16 precision in round-to-nearest-even mode.
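Rounding directly from double matters because converting double to float first and then to nv_bfloat16 can round twice. The host-only sketch below compares a direct round-to-nearest-even conversion with that two-step path. It assumes IEEE-754 types, the default round-to-nearest-even environment for the `double` to `float` cast, and finite inputs whose result is a normal bf16 number. Both function names are hypothetical, and neither is the library's implementation.

```cpp
#include <cstdint>
#include <cstring>

// Direct RNE double -> bf16 bits (finite inputs with a normal bf16 result assumed).
std::uint16_t double_to_bf16_direct(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    const int drop = 52 - 7;                           // keep 7 explicit mantissa bits
    const std::uint64_t lsb = (u >> drop) & 1u;        // lowest kept bit, for ties-to-even
    u += ((std::uint64_t{1} << (drop - 1)) - 1) + lsb;
    u &= ~((std::uint64_t{1} << drop) - 1);            // clear the discarded bits
    double r;
    std::memcpy(&r, &u, sizeof r);
    const float f = static_cast<float>(r);             // exact: r has 8 significant bits
    std::uint32_t fu;
    std::memcpy(&fu, &f, sizeof fu);
    return static_cast<std::uint16_t>(fu >> 16);
}

// Two-step path: double -> float (RNE), then float -> bf16 (RNE).
std::uint16_t double_to_bf16_via_float(double d) {
    const float f = static_cast<float>(d);
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>(u >> 16);
}
```

For 1 + 2^-8 + 2^-30 the direct path yields 0x3F81 (1.0078125), but the float step first rounds the value to 1 + 2^-8, an exact tie for bf16, which then goes to the even neighbor 0x3F80 (1.0).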

__host__ __device__ __nv_bfloat162 __float22bfloat162_rn ( const float2 a )
Converts both components of float2 number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat162 with converted values.
Parameters
a
- float2. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 which has corresponding halves equal to the converted float2 components.

Description

Converts both components of float2 to nv_bfloat16 precision in round-to-nearest-even mode and combines the results into one nv_bfloat162 number. The low 16 bits of the return value correspond to a.x and the high 16 bits correspond to a.y.

__host__ __device__ __nv_bfloat16 __float2bfloat16 ( const float a )
Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts float number a to nv_bfloat16 precision in round-to-nearest-even mode.
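Round-to-nearest-even from float to nv_bfloat16 amounts to rounding away the low 16 bits of the binary32 bit pattern. A common host-side bit trick is sketched below. It assumes IEEE-754 binary32 `float`, `float_to_bf16_rn` is a hypothetical name, and the quiet-NaN encoding it returns is an assumption that may differ from what cuda_bf16.h produces.

```cpp
#include <cstdint>
#include <cstring>

// Models round-to-nearest-even float -> bf16 on the raw bit pattern.
std::uint16_t float_to_bf16_rn(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: keep the sign and high payload bits and force the quiet bit (assumed encoding).
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }
    const std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives
    u += 0x7FFFu + lsb;                        // ties go to even; may carry into the exponent
    return static_cast<std::uint16_t>(u >> 16);
}
```

Adding 0x7FFF plus the lowest kept bit makes exact ties round to the even neighbor: 1.00390625f (1 + 2^-8) rounds to 0x3F80, while 1.01171875f (1 + 3 * 2^-8) rounds up to 0x3F82.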

__host__ __device__ __nv_bfloat162 __float2bfloat162_rn ( const float a )
Converts input to nv_bfloat16 precision in round-to-nearest-even mode and populates both halves of nv_bfloat162 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 value with both halves equal to the converted nv_bfloat16 precision number.

Description

Converts input a to nv_bfloat16 precision in round-to-nearest-even mode and populates both halves of nv_bfloat162 with converted value.

__host__ __device__ __nv_bfloat16 __float2bfloat16_rd ( const float a )
Converts float number to nv_bfloat16 precision in round-down mode and returns nv_bfloat16 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts float number a to nv_bfloat16 precision in round-down mode.

__host__ __device__ __nv_bfloat16 __float2bfloat16_rn ( const float a )
Converts float number to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat16 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts float number a to nv_bfloat16 precision in round-to-nearest-even mode.

__host__ __device__ __nv_bfloat16 __float2bfloat16_ru ( const float a )
Converts float number to nv_bfloat16 precision in round-up mode and returns nv_bfloat16 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts float number a to nv_bfloat16 precision in round-up mode.

__host__ __device__ __nv_bfloat16 __float2bfloat16_rz ( const float a )
Converts float number to nv_bfloat16 precision in round-towards-zero mode and returns nv_bfloat16 with converted value.
Parameters
a
- float. Is only being read.
Returns

nv_bfloat16

  • `a` converted to nv_bfloat16.

Description

Converts float number a to nv_bfloat16 precision in round-towards-zero mode.
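The directed modes can be sketched at the bit level as well. Truncating the low 16 bits gives round-towards-zero; round-up and round-down add one bf16 unit in the last place to the magnitude only when the discarded bits are nonzero and the sign points away from zero in the requested direction. This host-only sketch assumes IEEE-754 binary32 `float` and non-NaN input; `Dir` and `float_to_bf16_dir` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

enum class Dir { Down, Up, TowardZero };

// Models directed-rounding float -> bf16 on the raw bit pattern (NaN not handled).
std::uint16_t float_to_bf16_dir(float f, Dir d) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    const std::uint16_t truncated = static_cast<std::uint16_t>(u >> 16);  // toward zero
    const bool inexact = (u & 0xFFFFu) != 0;
    const bool negative = (u >> 31) != 0;
    bool grow = false;  // step the magnitude up by one bf16 ulp?
    if (inexact) {
        if (d == Dir::Up) grow = !negative;   // toward +infinity
        if (d == Dir::Down) grow = negative;  // toward -infinity
    }
    // Sign-magnitude encoding: adding 1 increases the magnitude, carrying into the exponent.
    return static_cast<std::uint16_t>(truncated + (grow ? 1 : 0));
}
```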

__host__ __device__ __nv_bfloat162 __floats2bfloat162_rn ( const float a, const float b )
Converts both input floats to nv_bfloat16 precision in round-to-nearest-even mode and returns nv_bfloat162 with converted values.
Parameters
a
- float. Is only being read.
b
- float. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 value with corresponding halves equal to the converted input floats.

Description

Converts both input floats to nv_bfloat16 precision in round-to-nearest-even mode and combines the results into one nv_bfloat162 number. Low 16 bits of the return value correspond to the input a, high 16 bits correspond to the input b.
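The resulting layout can be shown with a host-only sketch that packs the two rounded bit patterns into a 32-bit word, with `a` in the low 16 bits and `b` in the high 16 bits as described above. It assumes IEEE-754 binary32 `float` and finite inputs; `rn_bits` and `floats_to_bf162_rn` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Round-to-nearest-even float -> bf16 bits (finite inputs assumed).
static std::uint16_t rn_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Models the layout of __floats2bfloat162_rn(a, b): a -> low 16 bits, b -> high 16 bits.
std::uint32_t floats_to_bf162_rn(float a, float b) {
    return static_cast<std::uint32_t>(rn_bits(a)) |
           (static_cast<std::uint32_t>(rn_bits(b)) << 16);
}
```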

__device__ __nv_bfloat162 __halves2bfloat162 ( const __nv_bfloat16 a, const __nv_bfloat16 b )
Combines two nv_bfloat16 numbers into one nv_bfloat162 number.
Parameters
a
- nv_bfloat16. Is only being read.
b
- nv_bfloat16. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 with one nv_bfloat16 equal to a and the other to b.

Description

Combines two input nv_bfloat16 numbers a and b into one nv_bfloat162 number. Input a is stored in the low 16 bits of the return value, and input b is stored in the high 16 bits of the return value.

__device__ __nv_bfloat16 __high2bfloat16 ( const __nv_bfloat162 a )
Returns high 16 bits of nv_bfloat162 input.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

nv_bfloat16

  • The high 16 bits of the input.

Description

Returns high 16 bits of nv_bfloat162 input a.

__device__ __nv_bfloat162 __high2bfloat162 ( const __nv_bfloat162 a )
Extracts high 16 bits from nv_bfloat162 input.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 with both halves equal to the high 16 bits of the input.

Description

Extracts high 16 bits from nv_bfloat162 input a and returns a new nv_bfloat162 number which has both halves equal to the extracted bits.

__host__ __device__ float __high2float ( const __nv_bfloat162 a )
Converts high 16 bits of nv_bfloat162 to float and returns the result.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

float

  • The high 16 bits of a converted to float.

Description

Converts the high 16 bits of nv_bfloat162 input a to a 32-bit floating-point number and returns the result.

__device__ __nv_bfloat162 __highs2bfloat162 ( const __nv_bfloat162 a, const __nv_bfloat162 b )
Extracts high 16 bits from each of the two nv_bfloat162 inputs and combines into one nv_bfloat162 number.
Parameters
a
- nv_bfloat162. Is only being read.
b
- nv_bfloat162. Is only being read.
Returns

nv_bfloat162

  • The high 16 bits of a and of b.

Description

Extracts the high 16 bits from each of the two nv_bfloat162 inputs and combines them into one nv_bfloat162 number. The high 16 bits of input a are stored in the low 16 bits of the return value, and the high 16 bits of input b are stored in the high 16 bits of the return value.
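Which half lands where is easiest to see on raw bits. In this sketch, `highs_to_bf162` is a hypothetical model operating on packed 32-bit words, assuming the first element sits in the low 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of __highs2bfloat162(a, b):
// high half of a -> low half of the result, high half of b -> high half.
uint32_t highs_to_bf162(uint32_t a, uint32_t b) {
    return (a >> 16) | (b & 0xFFFF0000u);
}
```

With a = {1.0, 2.0} (0x40003F80) and b = {0.5, 4.0} (0x40803F00), the result holds {2.0, 4.0}, i.e. 0x40804000.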

__device__ ​ __nv_bfloat16 __int2bfloat16_rd ( int  i )
Convert a signed integer to a nv_bfloat16 in round-down mode.
Parameters
i
- int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __int2bfloat16_rn ( int  i )
Convert a signed integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __int2bfloat16_ru ( int  i )
Convert a signed integer to a nv_bfloat16 in round-up mode.
Parameters
i
- int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __int2bfloat16_rz ( int  i )
Convert a signed integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.
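bfloat16 keeps 8 significant bits, so integers whose magnitude exceeds 256 may need rounding, and there the four modes differ. The following C++ sketch models the documented rounding modes on the host and returns the rounded value as a double; `int_to_bf16_value` and `Round` are hypothetical, and the sketch ignores the bfloat16 bit encoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Round { Down, Nearest, Up, Zero };

// Hypothetical model: round a signed 32-bit integer to 8 significant bits
// (the bfloat16 significand width) in the given mode.
double int_to_bf16_value(int32_t i, Round mode) {
    bool neg = i < 0;
    // Work on the magnitude in 64 bits so INT32_MIN does not overflow.
    uint64_t m = neg ? static_cast<uint64_t>(-static_cast<int64_t>(i))
                     : static_cast<uint64_t>(i);
    int bits = 0;
    for (uint64_t t = m; t != 0; t >>= 1) ++bits;
    int shift = bits > 8 ? bits - 8 : 0;  // low bits that do not fit
    uint64_t kept = m >> shift;
    uint64_t rem = m & ((uint64_t{1} << shift) - 1);
    bool inexact = rem != 0;
    bool up = false;  // round the magnitude up?
    switch (mode) {
        case Round::Zero: up = false; break;
        case Round::Up:   up = inexact && !neg; break;  // toward +infinity
        case Round::Down: up = inexact && neg;  break;  // toward -infinity
        case Round::Nearest: {
            uint64_t half = shift ? (uint64_t{1} << (shift - 1)) : 0;
            up = shift && (rem > half || (rem == half && (kept & 1)));
            break;
        }
    }
    if (up && ++kept == 256) { kept = 128; ++shift; }  // carry into the next binade
    double v = std::ldexp(static_cast<double>(kept), shift);
    return neg ? -v : v;
}
```

For example, 257 lies exactly halfway between the neighboring bfloat16 values 256 and 258: round-to-nearest-even picks 256 (even significand), round-up picks 258, and round-down and round-towards-zero both pick 256.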

__device__ ​ __nv_bfloat16 __ldca ( const __nv_bfloat16* ptr )
Generates a `ld.global.ca` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldca ( const __nv_bfloat162* ptr )
Generates a `ld.global.ca` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ldcg ( const __nv_bfloat16* ptr )
Generates a `ld.global.cg` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldcg ( const __nv_bfloat162* ptr )
Generates a `ld.global.cg` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ldcs ( const __nv_bfloat16* ptr )
Generates a `ld.global.cs` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldcs ( const __nv_bfloat162* ptr )
Generates a `ld.global.cs` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ldcv ( const __nv_bfloat16* ptr )
Generates a `ld.global.cv` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldcv ( const __nv_bfloat162* ptr )
Generates a `ld.global.cv` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ldg ( const __nv_bfloat16* ptr )
Generates a `ld.global.nc` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldg ( const __nv_bfloat162* ptr )
Generates a `ld.global.nc` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ldlu ( const __nv_bfloat16* ptr )
Generates a `ld.global.lu` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat162 __ldlu ( const __nv_bfloat162* ptr )
Generates a `ld.global.lu` load instruction.
Parameters
ptr
- memory location
Returns

The value pointed to by `ptr`

Description

__device__ ​ __nv_bfloat16 __ll2bfloat16_rd ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-down mode.
Parameters
i
- long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed 64-bit integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __ll2bfloat16_rn ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed 64-bit integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __ll2bfloat16_ru ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-up mode.
Parameters
i
- long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed 64-bit integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __ll2bfloat16_rz ( long long int i )
Convert a signed 64-bit integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed 64-bit integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.

__device__ ​ __nv_bfloat16 __low2bfloat16 ( const __nv_bfloat162 a )
Returns low 16 bits of nv_bfloat162 input.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

nv_bfloat16

  • The nv_bfloat16 containing the low 16 bits of the input a.

Description

Returns low 16 bits of nv_bfloat162 input a.

__device__ ​ __nv_bfloat162 __low2bfloat162 ( const __nv_bfloat162 a )
Extracts low 16 bits from nv_bfloat162 input.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

nv_bfloat162

  • The nv_bfloat162 with both halves equal to the low 16 bits of the input.

Description

Extracts low 16 bits from nv_bfloat162 input a and returns a new nv_bfloat162 number which has both halves equal to the extracted bits.

__host____device__ ​ float __low2float ( const __nv_bfloat162 a )
Converts low 16 bits of nv_bfloat162 to float and returns the result.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

float

  • The low 16 bits of a converted to float.

Description

Converts the low 16 bits of nv_bfloat162 input a to a 32-bit floating-point number and returns the result.

__device__ ​ __nv_bfloat162 __lowhigh2highlow ( const __nv_bfloat162 a )
Swaps both halves of the nv_bfloat162 input.
Parameters
a
- nv_bfloat162. Is only being read.
Returns

nv_bfloat162

  • a with its halves swapped.

Description

Swaps both halves of the nv_bfloat162 input and returns a new nv_bfloat162 number with swapped halves.

__device__ ​ __nv_bfloat162 __lows2bfloat162 ( const __nv_bfloat162 a, const __nv_bfloat162 b )
Extracts the low 16 bits from each of the two nv_bfloat162 inputs and combines them into one nv_bfloat162 number.
Parameters
a
- nv_bfloat162. Is only being read.
b
- nv_bfloat162. Is only being read.
Returns

nv_bfloat162

  • The low 16 bits of a and of b.

Description

Extracts the low 16 bits from each of the two nv_bfloat162 inputs and combines them into one nv_bfloat162 number. The low 16 bits of input a are stored in the low 16 bits of the return value, and the low 16 bits of input b are stored in the high 16 bits of the return value.
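The low-half combine and the half swap described above, expressed on raw packed words. `lows_to_bf162` and `swap_halves` are hypothetical host models of `__lows2bfloat162` and `__lowhigh2highlow`, assuming the first element sits in the low 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of __lows2bfloat162(a, b):
// low half of a stays low, low half of b moves to the high half.
uint32_t lows_to_bf162(uint32_t a, uint32_t b) {
    return (a & 0xFFFFu) | (b << 16);  // b << 16 discards b's high half
}

// Hypothetical model of __lowhigh2highlow: exchange the two 16-bit halves.
uint32_t swap_halves(uint32_t p) {
    return (p << 16) | (p >> 16);
}
```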

__device__ ​ __nv_bfloat16 __shfl_down_sync ( unsigned mask, __nv_bfloat16 var, unsigned int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with higher ID relative to the caller.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat16. Is only being read.
delta
- unsigned int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 2-byte word referenced by var from the source thread ID as nv_bfloat16. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by adding delta to the caller's thread ID. The value of var held by the resulting thread ID is returned: this has the effect of shifting var down the warp by delta threads. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. As with __shfl_up_sync(), the source thread ID does not wrap around the value of width, so the upper delta threads of each subsection keep their own value of var.

__device__ ​ __nv_bfloat162 __shfl_down_sync ( unsigned mask, __nv_bfloat162 var, unsigned int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with higher ID relative to the caller.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat162. Is only being read.
delta
- unsigned int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 4-byte word referenced by var from the source thread ID as nv_bfloat162. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by adding delta to the caller's thread ID. The value of var held by the resulting thread ID is returned: this has the effect of shifting var down the warp by delta threads. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. As with __shfl_up_sync(), the source thread ID does not wrap around the value of width, so the upper delta threads of each subsection keep their own value of var.
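The source-lane rule for `__shfl_down_sync` can be simulated in plain C++ over a 32-lane array. This sketch does not call the CUDA intrinsic; `shfl_down_src` and `lane0_sum` are hypothetical, and width is assumed to be a power of 2 no larger than 32. `lane0_sum` shows the common reduction pattern in which only lane 0 is guaranteed to end with the full total.

```cpp
#include <array>
#include <cassert>

constexpr int kWarp = 32;

// Hypothetical model of the lane that __shfl_down_sync reads from.
int shfl_down_src(int lane, unsigned delta, int width) {
    int segment_last = (lane & ~(width - 1)) + width - 1;  // last lane of this subsection
    long long src = static_cast<long long>(lane) + delta;
    return src <= segment_last ? static_cast<int>(src) : lane;  // no wrap: own value
}

// Tree reduction built on the rule: after log2(32) steps lane 0 holds the sum.
int lane0_sum(std::array<int, kWarp> v) {
    for (unsigned offset = kWarp / 2; offset > 0; offset /= 2) {
        std::array<int, kWarp> shifted{};
        for (int lane = 0; lane < kWarp; ++lane)
            shifted[lane] = v[shfl_down_src(lane, offset, kWarp)];
        for (int lane = 0; lane < kWarp; ++lane) v[lane] += shifted[lane];
    }
    return v[0];
}
```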

__device__ ​ __nv_bfloat16 __shfl_sync ( unsigned mask, __nv_bfloat16 var, int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Direct copy from indexed thread.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat16. Is only being read.
delta
- int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 2-byte word referenced by var from the source thread ID as nv_bfloat16. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Returns the value of var held by the thread whose ID is given by delta. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. If delta is outside the range [0:width-1], the value returned is the value of var held by thread delta modulo width (i.e. within the same subsection). width must be a power of 2; results are undefined if width is not a power of 2 or is greater than warpSize.

__device__ ​ __nv_bfloat162 __shfl_sync ( unsigned mask, __nv_bfloat162 var, int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Direct copy from indexed thread.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat162. Is only being read.
delta
- int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 4-byte word referenced by var from the source thread ID as nv_bfloat162. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Returns the value of var held by the thread whose ID is given by delta. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. If delta is outside the range [0:width-1], the value returned is the value of var held by thread delta modulo width (i.e. within the same subsection). width must be a power of 2; results are undefined if width is not a power of 2 or is greater than warpSize.
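A sketch of the source-lane computation for `__shfl_sync`, assuming width is a power of 2 no larger than 32 so that "delta modulo width" can be computed as `delta & (width - 1)`. `shfl_idx_src` is a hypothetical helper, not part of cuda_bf16.h.

```cpp
#include <cassert>

// Hypothetical model of the lane that __shfl_sync reads from.
// For power-of-two width, delta & (width - 1) equals delta modulo width,
// and also maps a negative delta into [0, width - 1].
int shfl_idx_src(int lane, int delta, int width) {
    int segment_first = lane & ~(width - 1);  // logical thread 0 of this subsection
    return segment_first + (delta & (width - 1));
}
```

With width 8, lane 13 belongs to the subsection of lanes 8 to 15, so delta 3 (or 11) reads from lane 11; with width 32, delta 0 broadcasts lane 0 to the whole warp.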

__device__ ​ __nv_bfloat16 __shfl_up_sync ( unsigned mask, __nv_bfloat16 var, unsigned int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with lower ID relative to the caller.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat16. Is only being read.
delta
- unsigned int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 2-byte word referenced by var from the source thread ID as nv_bfloat16. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by subtracting delta from the caller's lane ID. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the warp by delta threads. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. The source thread index will not wrap around the value of width, so effectively the lower delta threads will be unchanged. width must have a value which is a power of 2; results are undefined if width is not a power of 2, or is a number greater than warpSize.

__device__ ​ __nv_bfloat162 __shfl_up_sync ( unsigned mask, __nv_bfloat162 var, unsigned int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread with lower ID relative to the caller.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat162. Is only being read.
delta
- unsigned int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 4-byte word referenced by var from the source thread ID as nv_bfloat162. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by subtracting delta from the caller's lane ID. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the warp by delta threads. If width is less than warpSize then each subsection of the warp behaves as a separate entity with a starting logical thread ID of 0. The source thread index will not wrap around the value of width, so effectively the lower delta threads will be unchanged. width must have a value which is a power of 2; results are undefined if width is not a power of 2, or is a number greater than warpSize.
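Because the lowest delta threads of each subsection get their own value back, prefix scans built on `__shfl_up_sync` must guard the addition. The host simulation below illustrates this; `shfl_up_src` and `inclusive_scan` are hypothetical, with width assumed to be a power of 2 no larger than 32.

```cpp
#include <array>
#include <cassert>

constexpr int kWarp = 32;

// Hypothetical model of the lane that __shfl_up_sync reads from.
int shfl_up_src(int lane, unsigned delta, int width) {
    int segment_first = lane & ~(width - 1);
    long long src = static_cast<long long>(lane) - delta;
    return src >= segment_first ? static_cast<int>(src) : lane;  // lower lanes keep own value
}

// Inclusive prefix sum within each width-sized subsection (Hillis-Steele style).
std::array<int, kWarp> inclusive_scan(std::array<int, kWarp> v, int width) {
    for (unsigned delta = 1; delta < static_cast<unsigned>(width); delta *= 2) {
        std::array<int, kWarp> up{};
        for (int lane = 0; lane < kWarp; ++lane) up[lane] = v[shfl_up_src(lane, delta, width)];
        // Lanes that received their own value must not add it again.
        for (int lane = 0; lane < kWarp; ++lane)
            if ((lane & (width - 1)) >= static_cast<int>(delta)) v[lane] += up[lane];
    }
    return v;
}
```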

__device__ ​ __nv_bfloat16 __shfl_xor_sync ( unsigned mask, __nv_bfloat16 var, int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread based on bitwise XOR of own thread ID.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat16. Is only being read.
delta
- int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 2-byte word referenced by var from the source thread ID as nv_bfloat16. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by performing a bitwise XOR of the caller's thread ID with delta: the value of var held by the resulting thread ID is returned. If width is less than warpSize then each group of width consecutive threads is able to access elements from earlier groups of threads; however, if they attempt to access elements from later groups of threads, their own value of var is returned. This mode implements a butterfly addressing pattern such as is used in tree reduction and broadcast.

__device__ ​ __nv_bfloat162 __shfl_xor_sync ( unsigned mask, __nv_bfloat162 var, int  delta, int  width = warpSize )
Exchange a variable between threads within a warp. Copy from a thread based on bitwise XOR of own thread ID.
Parameters
mask
- unsigned int. Is only being read.
var
- nv_bfloat162. Is only being read.
delta
- int. Is only being read.
width
- int. Is only being read.
Returns

Returns the 4-byte word referenced by var from the source thread ID as nv_bfloat162. If the source thread ID is out of range or the source thread has exited, the calling thread's own var is returned.

Description

Calculates a source thread ID by performing a bitwise XOR of the caller's thread ID with delta: the value of var held by the resulting thread ID is returned. If width is less than warpSize then each group of width consecutive threads is able to access elements from earlier groups of threads; however, if they attempt to access elements from later groups of threads, their own value of var is returned. This mode implements a butterfly addressing pattern such as is used in tree reduction and broadcast.
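The butterfly pattern mentioned above can be simulated on the host: after log2(width) XOR steps, every lane holds the sum of its subsection. This is a sketch under stated assumptions (32 lanes, power-of-2 width, 0 <= delta < 32), and `shfl_xor_src` and `butterfly_sum` are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int kWarp = 32;

// Hypothetical model of the lane that __shfl_xor_sync reads from.
// Reading from an earlier subsection is allowed; a later one yields the caller's own value.
int shfl_xor_src(int lane, int delta, int width) {
    int segment_last = (lane & ~(width - 1)) + width - 1;
    int src = lane ^ delta;
    return src <= segment_last ? src : lane;
}

// Butterfly all-reduce: unlike the shift-down reduction, every lane ends with the sum.
std::array<int, kWarp> butterfly_sum(std::array<int, kWarp> v, int width) {
    for (int lane_mask = width / 2; lane_mask > 0; lane_mask /= 2) {
        std::array<int, kWarp> partner{};
        for (int lane = 0; lane < kWarp; ++lane)
            partner[lane] = v[shfl_xor_src(lane, lane_mask, width)];
        for (int lane = 0; lane < kWarp; ++lane) v[lane] += partner[lane];
    }
    return v;
}
```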

__device__ ​ __nv_bfloat16 __short2bfloat16_rd ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-down mode.
Parameters
i
- short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed short integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __short2bfloat16_rn ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed short integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __short2bfloat16_ru ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-up mode.
Parameters
i
- short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed short integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __short2bfloat16_rz ( short int i )
Convert a signed short integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the signed short integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.

__device__ ​ __nv_bfloat16 __short_as_bfloat16 ( const short int i )
Reinterprets bits in a signed short integer as a nv_bfloat16.
Parameters
i
- short int. Is only being read.
Returns

nv_bfloat16

  • The reinterpreted value.

Description

Reinterprets the bits in the signed short integer i as a nv_bfloat16-precision floating point number.

__device__ ​ void __stcg ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.cg` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stcg ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.cg` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stcs ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.cs` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stcs ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.cs` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stwb ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.wb` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stwb ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.wb` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stwt ( __nv_bfloat16* ptr, __nv_bfloat16 value )
Generates a `st.global.wt` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ void __stwt ( __nv_bfloat162* ptr, __nv_bfloat162 value )
Generates a `st.global.wt` store instruction.
Parameters
ptr
- memory location
value
- the value to be stored
Description

__device__ ​ __nv_bfloat16 __uint2bfloat16_rd ( unsigned int  i )
Convert an unsigned integer to a nv_bfloat16 in round-down mode.
Parameters
i
- unsigned int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __uint2bfloat16_rn ( unsigned int  i )
Convert an unsigned integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- unsigned int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __uint2bfloat16_ru ( unsigned int  i )
Convert an unsigned integer to a nv_bfloat16 in round-up mode.
Parameters
i
- unsigned int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __uint2bfloat16_rz ( unsigned int  i )
Convert an unsigned integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- unsigned int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.

__device__ ​ __nv_bfloat16 __ull2bfloat16_rd ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-down mode.
Parameters
i
- unsigned long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned 64-bit integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __ull2bfloat16_rn ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- unsigned long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned 64-bit integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __ull2bfloat16_ru ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-up mode.
Parameters
i
- unsigned long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned 64-bit integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __ull2bfloat16_rz ( unsigned long long int i )
Convert an unsigned 64-bit integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- unsigned long long int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned 64-bit integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.
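For unsigned inputs, round-down and round-towards-zero always agree, while round-up and round-to-nearest-even can carry into the next power of two. The sketch below is a hypothetical host model, `ull_to_bf16_value`, that rounds to bfloat16's 8 significant bits and returns the value as a double; it does not reproduce the bfloat16 bit encoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Round { Down, Nearest, Up, Zero };

// Hypothetical model: round an unsigned 64-bit integer to 8 significant bits.
// For non-negative inputs, Down and Zero both simply truncate.
double ull_to_bf16_value(uint64_t m, Round mode) {
    int bits = 0;
    for (uint64_t t = m; t != 0; t >>= 1) ++bits;
    int shift = bits > 8 ? bits - 8 : 0;  // low bits that do not fit
    uint64_t kept = m >> shift;
    uint64_t rem = shift ? (m & ((uint64_t{1} << shift) - 1)) : 0;
    bool up = false;
    if (mode == Round::Up) {
        up = rem != 0;  // any discarded bit rounds the value up
    } else if (mode == Round::Nearest && shift) {
        uint64_t half = uint64_t{1} << (shift - 1);
        up = rem > half || (rem == half && (kept & 1));  // ties to even
    }
    if (up && ++kept == 256) { kept = 128; ++shift; }  // carry into the next binade
    return std::ldexp(static_cast<double>(kept), shift);
}
```

For example, UINT64_MAX rounds up (and to nearest) to exactly 2^64, but down and towards zero to 255 × 2^56.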

__device__ ​ __nv_bfloat16 __ushort2bfloat16_rd ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-down mode.
Parameters
i
- unsigned short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned short integer value i to a nv_bfloat16-precision floating point value in round-down mode.

__device__ ​ __nv_bfloat16 __ushort2bfloat16_rn ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-to-nearest-even mode.
Parameters
i
- unsigned short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned short integer value i to a nv_bfloat16-precision floating point value in round-to-nearest-even mode.

__device__ ​ __nv_bfloat16 __ushort2bfloat16_ru ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-up mode.
Parameters
i
- unsigned short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned short integer value i to a nv_bfloat16-precision floating point value in round-up mode.

__device__ ​ __nv_bfloat16 __ushort2bfloat16_rz ( unsigned short int i )
Convert an unsigned short integer to a nv_bfloat16 in round-towards-zero mode.
Parameters
i
- unsigned short int. Is only being read.
Returns

nv_bfloat16

  • i converted to nv_bfloat16.

Description

Convert the unsigned short integer value i to a nv_bfloat16-precision floating point value in round-towards-zero mode.

__device__ ​ __nv_bfloat16 __ushort_as_bfloat16 ( const unsigned short int i )
Reinterprets bits in an unsigned short integer as a nv_bfloat16.
Parameters
i
- unsigned short int. Is only being read.
Returns

nv_bfloat16

  • The reinterpreted value.

Description

Reinterprets the bits in the unsigned short integer i as a nv_bfloat16-precision floating point number.
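Reinterpretation and conversion differ. `__ushort_as_bfloat16(0x3F80)` is 1.0, whereas `__ushort2bfloat16_rn(0x3F80)` converts the integer 16256, which is exactly representable in bfloat16. A host sketch of the reinterpretation (widened to float for inspection), assuming bfloat16 is the upper 16 bits of a binary32 value; `ushort_bits_as_bf16_float` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of __ushort_as_bfloat16 followed by widening to float:
// the 16 bits are reused unchanged as the top half of a binary32 value.
float ushort_bits_as_bf16_float(uint16_t bits) {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```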