BLAS Runtime APIs
This section describes the Fortran interfaces to the CUDA BLAS libraries. There are currently four separate collections of function entry points, which are commonly referred to collectively as cuBLAS:
The original CUDA implementation of the BLAS routines, referred to as the legacy API. These routines are callable from the host and operate on device data.
The newer “v2” CUDA implementation of the BLAS routines, plus some extensions for batched operations. These are also callable from the host and operate on device data. In Fortran terms, these entry points have been changed from subroutines to functions which return status.
The cuBLAS XT library which can target multiple GPUs using only host-resident data.
The cuBLAS MP library which can target multiple GPUs using distributed device data, similar to the ScaLAPACK PBLAS functions. The cublasMp and cusolverMp libraries are built, in part, upon a communications library named CAL, which is documented in another section of this document.
NVIDIA currently ships four Fortran modules which programmers can use to call into this cuBLAS functionality:
cublas, which provides interfaces into the main cublas library. Both the legacy and v2 names are supported. In this module, the cublas names (such as cublasSaxpy) use the legacy calling conventions. Interfaces to a host BLAS library (for instance libblas.a in the NVIDIA distribution) are also included in the cublas module. These interfaces are exposed by adding the line
use cublas
to your program unit.
cublas_v2, which is similar to the cublas module in most ways except the cublas names (such as cublasSaxpy) use the v2 calling conventions. For instance, instead of a subroutine, cublasSaxpy is a function which takes a handle as the first argument and returns an integer containing the status of the call. These interfaces are exposed by adding the line
use cublas_v2
to your program unit.
cublasxt, which interfaces directly to the cublasXT API. These interfaces are exposed by adding the line
use cublasxt
to your program unit.
cublasmp, which provides interfaces into the cublasMp API. These interfaces are exposed by adding the line
use cublasMp
to your program unit.
The v2 routines are integer functions that return an error status code: CUBLAS_STATUS_SUCCESS if the call was successful, or another cuBLAS status value if there was an error.
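The status-return convention described above can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: the program name, the variable names, and the build line in the comment are assumptions, and only entry points documented in this section are called.

```fortran
! Sketch: checking the status returned by v2 entry points.
! Assumed build line: nvfortran -cuda -cudalib=cublas v2_status.cuf
program v2_status
  use cublas_v2
  implicit none
  integer, parameter :: n = 8
  type(cublasHandle) :: h
  real(4), device :: x_d(n), y_d(n)
  real(4) :: y(n), alpha
  integer :: istat

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  x_d = 1.0
  y_d = 2.0
  alpha = 3.0

  ! v2 convention: the handle comes first and the status is the function value.
  istat = cublasSaxpy(h, n, alpha, x_d, 1, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSaxpy status: ', istat

  y = y_d   ! copy the result back to the host
  print *, 'max deviation from 3*1 + 2: ', maxval(abs(y - 5.0))

  istat = cublasDestroy(h)
end program v2_status
```

With the cublas module instead of cublas_v2, the same name cublasSaxpy would be a subroutine called without a handle.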
Documented interfaces to the traditional BLAS names in the subsequent sections, which contain the comment ! device or host variable, should not be confused with the pointer mode issue from section 1.6. The traditional BLAS names are overloaded generic names in the cublas module. For instance, in this interface
subroutine scopy(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
The arrays x and y can either both be device arrays, in which case cublasScopy is called via the generic interface, or they can both be host arrays, in which case scopy from the host BLAS library is called. Using CUDA Fortran managed data as actual arguments to scopy is ambiguous, because managed data is accessible from both the host and the device; cublasScopy is chosen by default. If you wish to call the host library version of scopy with managed data, don't expose the generic scopy interface at the call site.
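The generic resolution described above can be illustrated with a short sketch. The program and variable names are hypothetical, and the build line in the comment (including linking a host BLAS with -lblas) is an assumption about the local installation.

```fortran
! Sketch: one generic name, two implementations chosen by argument attributes.
! Assumed build line: nvfortran -cuda -cudalib=cublas generic_scopy.cuf -lblas
program generic_scopy
  use cublas
  implicit none
  integer, parameter :: n = 16
  real(4) :: x(n), y(n), y_from_device(n)
  real(4), device :: x_d(n), y_d(n)
  integer :: i

  x = [(real(i), i = 1, n)]
  y = 0.0
  x_d = x
  y_d = 0.0

  call scopy(n, x_d, 1, y_d, 1)   ! both arguments are device arrays: cublasScopy
  call scopy(n, x, 1, y, 1)       ! both arguments are host arrays: host BLAS scopy

  y_from_device = y_d
  print *, 'device and host copies agree: ', all(y == y_from_device)
end program generic_scopy
```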
Unless a specific kind is provided, in the following interfaces the plain integer type implies integer(4) and the plain real type implies real(4).
CUBLAS Definitions and Helper Functions
This section contains definitions and data types used in the cuBLAS library and interfaces to the cuBLAS Helper Functions.
The cublas module contains the following derived type definitions:
TYPE cublasHandle
TYPE(C_PTR) :: handle
END TYPE
The cuBLAS module contains the following enumerations:
enum, bind(c)
enumerator :: CUBLAS_STATUS_SUCCESS =0
enumerator :: CUBLAS_STATUS_NOT_INITIALIZED =1
enumerator :: CUBLAS_STATUS_ALLOC_FAILED =3
enumerator :: CUBLAS_STATUS_INVALID_VALUE =7
enumerator :: CUBLAS_STATUS_ARCH_MISMATCH =8
enumerator :: CUBLAS_STATUS_MAPPING_ERROR =11
enumerator :: CUBLAS_STATUS_EXECUTION_FAILED=13
enumerator :: CUBLAS_STATUS_INTERNAL_ERROR =14
end enum
enum, bind(c)
enumerator :: CUBLAS_FILL_MODE_LOWER=0
enumerator :: CUBLAS_FILL_MODE_UPPER=1
end enum
enum, bind(c)
enumerator :: CUBLAS_DIAG_NON_UNIT=0
enumerator :: CUBLAS_DIAG_UNIT=1
end enum
enum, bind(c)
enumerator :: CUBLAS_SIDE_LEFT =0
enumerator :: CUBLAS_SIDE_RIGHT=1
end enum
enum, bind(c)
enumerator :: CUBLAS_OP_N=0
enumerator :: CUBLAS_OP_T=1
enumerator :: CUBLAS_OP_C=2
end enum
enum, bind(c)
enumerator :: CUBLAS_POINTER_MODE_HOST = 0
enumerator :: CUBLAS_POINTER_MODE_DEVICE = 1
end enum
cublasCreate
This function initializes the CUBLAS library and creates a handle to an opaque structure holding the CUBLAS library context. It allocates hardware resources on the host and device and must be called prior to making any other CUBLAS library calls. The CUBLAS library context is tied to the current CUDA device. To use the library on multiple devices, one CUBLAS handle needs to be created for each device. Furthermore, for a given device, multiple CUBLAS handles with different configurations can be created. Because cublasCreate allocates some internal resources, and releasing those resources by calling cublasDestroy implicitly calls cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences. For multi-threaded applications that use the same device from different threads, the recommended programming model is to create one CUBLAS handle per thread and use that CUBLAS handle for the entire life of the thread.
integer(4) function cublasCreate(handle)
type(cublasHandle) :: handle
cublasDestroy
This function releases hardware resources used by the CUBLAS library. This function is usually the last call with a particular handle to the CUBLAS library. Because cublasCreate allocates some internal resources, and releasing those resources by calling cublasDestroy implicitly calls cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences.
integer(4) function cublasDestroy(handle)
type(cublasHandle) :: handle
cublasGetVersion
This function returns the version number of the cuBLAS library.
integer(4) function cublasGetVersion(handle, version)
type(cublasHandle) :: handle
integer(4) :: version
cublasSetStream
This function sets the cuBLAS library stream, which will be used to execute all subsequent calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream. In particular, this routine can be used to change the stream between kernel launches and then to reset the cuBLAS library stream back to NULL.
integer(4) function cublasSetStream(handle, stream)
type(cublasHandle) :: handle
integer(kind=cuda_stream_kind()) :: stream
cublasGetStream
This function gets the cuBLAS library stream, which is being used to execute all calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream.
integer(4) function cublasGetStream(handle, stream)
type(cublasHandle) :: handle
integer(kind=cuda_stream_kind()) :: stream
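A minimal sketch of setting and querying the library stream is shown below. It assumes the cudafor module for cudaStreamCreate, cudaStreamSynchronize, and cudaStreamDestroy, and declares the stream with the cuda_stream_kind constant from that module; the program and variable names are illustrative only.

```fortran
! Sketch: run a cuBLAS call on a user-created stream, then query the stream back.
! Assumed build line: nvfortran -cuda -cudalib=cublas stream_demo.cuf
program stream_demo
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 1024
  type(cublasHandle) :: h
  integer(kind=cuda_stream_kind) :: stream, current
  real(4), device :: x_d(n)
  real(4) :: alpha
  integer :: istat

  istat = cublasCreate(h)
  istat = cudaStreamCreate(stream)
  istat = cublasSetStream(h, stream)   ! later cuBLAS calls on h use this stream

  x_d = 1.0
  alpha = 2.0
  istat = cublasSscal(h, n, alpha, x_d, 1)

  istat = cublasGetStream(h, current)
  print *, 'handle reports the stream that was set: ', current == stream

  istat = cudaStreamSynchronize(stream)   ! wait for the scaling to finish
  istat = cudaStreamDestroy(stream)
  istat = cublasDestroy(h)
end program stream_demo
```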
cublasGetStatusName
This function returns the cuBLAS status name associated with a given status value.
character(128) function cublasGetStatusName(ierr)
integer(4) :: ierr
cublasGetStatusString
This function returns the cuBLAS status string associated with a given status value.
character(128) function cublasGetStatusString(ierr)
integer(4) :: ierr
cublasGetPointerMode
This function obtains the pointer mode used by the cuBLAS library. In the cublas module, the pointer mode is set and reset on a call-by-call basis depending on whether the device attribute is set on scalar actual arguments. See section 1.6 for a discussion of pointer modes.
integer(4) function cublasGetPointerMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
cublasSetPointerMode
This function sets the pointer mode used by the cuBLAS library. When using the cublas module, the pointer mode is set on a call-by-call basis depending on whether the device attribute is set on scalar actual arguments. When using the cublas_v2 module with v2 interfaces, it is the programmer's responsibility to make calls to cublasSetPointerMode so scalar arguments are handled correctly by the library. See section 1.6 for a discussion of pointer modes.
integer(4) function cublasSetPointerMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
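The following sketch shows one way the pointer mode might be managed by hand with the cublas_v2 module, using cublasSdot with a device-resident result and then a host-resident result. The names are hypothetical and the build line is an assumption.

```fortran
! Sketch: matching the pointer mode to where the scalar result lives.
! Assumed build line: nvfortran -cuda -cudalib=cublas pointer_mode.cuf
program pointer_mode
  use cublas_v2
  implicit none
  integer, parameter :: n = 100
  type(cublasHandle) :: h
  real(4), device :: x_d(n), y_d(n)
  real(4), device :: res_d
  real(4) :: res_from_device, res_host
  integer :: istat, mode

  istat = cublasCreate(h)
  x_d = 1.0
  y_d = 2.0

  ! The result is a device scalar, so select device pointer mode first.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE)
  istat = cublasSdot(h, n, x_d, 1, y_d, 1, res_d)
  res_from_device = res_d

  istat = cublasGetPointerMode(h, mode)
  print *, 'device mode active: ', mode == CUBLAS_POINTER_MODE_DEVICE

  ! Switch back before passing a host scalar for the result.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_HOST)
  istat = cublasSdot(h, n, x_d, 1, y_d, 1, res_host)

  print *, 'results agree: ', res_from_device == res_host
  istat = cublasDestroy(h)
end program pointer_mode
```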
cublasGetAtomicsMode
This function obtains the atomics mode used by the cuBLAS library.
integer(4) function cublasGetAtomicsMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
cublasSetAtomicsMode
This function sets the atomics mode used by the cuBLAS library. Some routines in the cuBLAS library have alternate implementations that use atomics to accumulate results. These alternate implementations may run faster but may also generate results which are not identical from one run to the other. The default is to not allow atomics in cuBLAS functions.
integer(4) function cublasSetAtomicsMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
cublasGetMathMode
This function obtains the math mode used by the cuBLAS library.
integer(4) function cublasGetMathMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
cublasSetMathMode
This function sets the math mode used by the cuBLAS library. Some routines in the cuBLAS library allow you to choose the compute precision used to generate results. These alternate approaches may run faster but may also generate different, less accurate results.
integer(4) function cublasSetMathMode(handle, mode)
type(cublasHandle) :: handle
integer(4) :: mode
cublasGetSmCountTarget
This function obtains the SM count target used by the cuBLAS library.
integer(4) function cublasGetSmCountTarget(handle, counttarget)
type(cublasHandle) :: handle
integer(4) :: counttarget
cublasSetSmCountTarget
This function sets the SM count target used by the cuBLAS library.
integer(4) function cublasSetSmCountTarget(handle, counttarget)
type(cublasHandle) :: handle
integer(4) :: counttarget
cublasGetHandle
This function gets the cuBLAS handle currently in use by a thread. The CUDA Fortran runtime keeps track of each CPU thread's current handle, which is useful if you are using the legacy BLAS API or do not wish to pass the handle through to low-level functions or subroutines manually.
type(cublashandle) function cublasGetHandle()
integer(4) function cublasGetHandle(handle)
type(cublasHandle) :: handle
cublasSetVector
This function copies n elements from a vector x in host memory space to a vector y in GPU memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy or array assignment statements.
integer(4) function cublassetvector(n, elemsize, x, incx, y, incy)
integer :: n, elemsize, incx, incy
integer*1, dimension(*) :: x
integer*1, device, dimension(*) :: y
cublasGetVector
This function copies n elements from a vector x in GPU memory space to a vector y in host memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy or array assignment statements.
integer(4) function cublasgetvector(n, elemsize, x, incx, y, incy)
integer :: n, elemsize, incx, incy
integer*1, device, dimension(*) :: x
integer*1, dimension(*) :: y
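A round trip through cublasSetVector and cublasGetVector might look like the sketch below. The element size is passed as the literal 4 because the data is real(4); the program and variable names are illustrative assumptions.

```fortran
! Sketch: host -> device -> host copy of a real(4) vector.
! Assumed build line: nvfortran -cuda -cudalib=cublas vector_copy.cuf
program vector_copy
  use cublas
  implicit none
  integer, parameter :: n = 10
  integer, parameter :: elemsize = 4   ! bytes per real(4) element
  real(4) :: x(n), y(n)
  real(4), device :: x_d(n)
  integer :: istat, i

  x = [(real(i), i = 1, n)]
  y = 0.0

  istat = cublasSetVector(n, elemsize, x, 1, x_d, 1)   ! host x to device x_d
  istat = cublasGetVector(n, elemsize, x_d, 1, y, 1)   ! device x_d to host y

  print *, 'round trip preserved the data: ', all(x == y)
end program vector_copy
```

The same transfers could be written as the array assignments x_d = x and y = x_d.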
cublasSetMatrix
This function copies a tile of rows x cols elements from a matrix A in host memory space to a matrix B in GPU memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of Matrix A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy, cudaMemcpy2D, or array assignment statements.
integer(4) function cublassetmatrix(rows, cols, elemsize, a, lda, b, ldb)
integer :: rows, cols, elemsize, lda, ldb
integer*1, dimension(lda, *) :: a
integer*1, device, dimension(ldb, *) :: b
cublasGetMatrix
This function copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in host memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of Matrix A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy, cudaMemcpy2D, or array assignment statements.
integer(4) function cublasgetmatrix(rows, cols, elemsize, a, lda, b, ldb)
integer :: rows, cols, elemsize, lda, ldb
integer*1, device, dimension(lda, *) :: a
integer*1, dimension(ldb, *) :: b
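The tile semantics of rows, cols, and the leading dimensions can be sketched as follows: the whole host matrix is copied to the device, then only its top-left rows x cols tile is copied back into a smaller host array. All names and sizes are hypothetical.

```fortran
! Sketch: copying a full matrix in, and a tile of it back out.
! Assumed build line: nvfortran -cuda -cudalib=cublas matrix_tile.cuf
program matrix_tile
  use cublas
  implicit none
  integer, parameter :: lda = 6, ncol = 5   ! full matrix shape
  integer, parameter :: rows = 3, cols = 2  ! tile shape
  integer, parameter :: elemsize = 4        ! bytes per real(4) element
  real(4) :: a(lda, ncol), tile(rows, cols)
  real(4), device :: a_d(lda, ncol)
  integer :: istat

  call random_number(a)

  ! Copy the whole matrix: the tile is lda x ncol and both leading dimensions are lda.
  istat = cublasSetMatrix(lda, ncol, elemsize, a, lda, a_d, lda)

  ! Copy back only the top-left tile; the destination leading dimension is rows.
  istat = cublasGetMatrix(rows, cols, elemsize, a_d, lda, tile, rows)

  print *, 'tile matches a(1:rows,1:cols): ', all(tile == a(1:rows, 1:cols))
end program matrix_tile
```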
cublasSetVectorAsync
This function copies n elements from a vector x in host memory space to a vector y in GPU memory space, asynchronously, on the given CUDA stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync.
integer(4) function cublassetvectorasync(n, elemsize, x, incx, y, incy, stream)
integer :: n, elemsize, incx, incy
integer*1, dimension(*) :: x
integer*1, device, dimension(*) :: y
integer(kind=cuda_stream_kind()) :: stream
cublasGetVectorAsync
This function copies n elements from a vector x in GPU memory space to a vector y in host memory space, asynchronously, on the given CUDA stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync.
integer(4) function cublasgetvectorasync(n, elemsize, x, incx, y, incy, stream)
integer :: n, elemsize, incx, incy
integer*1, device, dimension(*) :: x
integer*1, dimension(*) :: y
integer(kind=cuda_stream_kind()) :: stream
cublasSetMatrixAsync
This function copies a tile of rows x cols elements from a matrix A in host memory space to a matrix B in GPU memory space, asynchronously using the specified stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of Matrix A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync or cudaMemcpy2DAsync.
integer(4) function cublassetmatrixasync(rows, cols, elemsize, a, lda, b, ldb, stream)
integer :: rows, cols, elemsize, lda, ldb
integer*1, dimension(lda, *) :: a
integer*1, device, dimension(ldb, *) :: b
integer(kind=cuda_stream_kind()) :: stream
cublasGetMatrixAsync
This function copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in host memory space, asynchronously, using the specified stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of Matrix A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync or cudaMemcpy2DAsync.
integer(4) function cublasgetmatrixasync(rows, cols, elemsize, a, lda, b, ldb, stream)
integer :: rows, cols, elemsize, lda, ldb
integer*1, device, dimension(lda, *) :: a
integer*1, dimension(ldb, *) :: b
integer(kind=cuda_stream_kind()) :: stream
Single Precision Functions and Subroutines
This section contains interfaces to the single precision BLAS and cuBLAS functions and subroutines.
isamax
ISAMAX finds the index of the element having the maximum absolute value.
integer(4) function isamax(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIsamax(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIsamax_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
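As a brief illustration of the generic interface, the sketch below calls isamax on a device array and compares the result with the Fortran intrinsic maxloc applied to the host copy. The returned index follows the usual 1-based Fortran convention; the program and variable names are assumptions.

```fortran
! Sketch: generic isamax on device data, checked against maxloc on the host.
! Assumed build line: nvfortran -cuda -cudalib=cublas isamax_demo.cuf
program isamax_demo
  use cublas
  implicit none
  integer, parameter :: n = 5
  real(4) :: x(n)
  real(4), device :: x_d(n)
  integer :: idx

  x = [1.0, -7.0, 3.0, 5.0, -2.0]
  x_d = x

  idx = isamax(n, x_d, 1)   ! x_d is a device array, so cublasIsamax is selected

  print *, 'isamax index: ', idx
  print *, 'agrees with maxloc(abs(x)): ', idx == maxloc(abs(x), 1)
end program isamax_demo
```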
isamin
ISAMIN finds the index of the element having the minimum absolute value.
integer(4) function isamin(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIsamin(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIsamin_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
sasum
SASUM takes the sum of the absolute values.
real(4) function sasum(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x ! device or host variable
integer :: incx
real(4) function cublasSasum(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasSasum_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
saxpy
SAXPY computes a constant times a vector plus a vector, y := a*x + y.
subroutine saxpy(n, a, x, incx, y, incy)
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasSaxpy(n, a, x, incx, y, incy)
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasSaxpy_v2(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
scopy
SCOPY copies a vector, x, to a vector, y.
subroutine scopy(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasScopy(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasScopy_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
sdot
SDOT forms the dot product of two vectors.
real(4) function sdot(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
real(4) function cublasSdot(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasSdot_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
real(4), device :: res ! device or host variable
snrm2
SNRM2 returns the Euclidean norm of a vector via the function name, so that SNRM2 := sqrt( x'*x ).
real(4) function snrm2(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x ! device or host variable
integer :: incx
real(4) function cublasSnrm2(n, x, incx)
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasSnrm2_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
srot
SROT applies a plane rotation.
subroutine srot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc, ss ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasSrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc, ss ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasSrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc, ss ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
srotg
SROTG constructs a Givens plane rotation.
subroutine srotg(sa, sb, sc, ss)
real(4), device :: sa, sb, sc, ss ! device or host variable
subroutine cublasSrotg(sa, sb, sc, ss)
real(4), device :: sa, sb, sc, ss ! device or host variable
integer(4) function cublasSrotg_v2(h, sa, sb, sc, ss)
type(cublasHandle) :: h
real(4), device :: sa, sb, sc, ss ! device or host variable
srotm
SROTM applies the modified Givens transformation, H, to the 2 by N matrix whose rows are SX**T and SY**T, where **T indicates transpose. The elements of SX are in SX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for SY using LY and INCY. With SPARAM(1)=SFLAG, H has one of the following forms (each matrix is written row by row, with rows separated by a semicolon):
SFLAG=-1.E0: H = (SH11 SH12; SH21 SH22)
SFLAG=0.E0: H = (1.E0 SH12; SH21 1.E0)
SFLAG=1.E0: H = (SH11 1.E0; -1.E0 SH22)
SFLAG=-2.E0: H = (1.E0 0.E0; 0.E0 1.E0)
See SROTMG for a description of data storage in SPARAM.
subroutine srotm(n, x, incx, y, incy, param)
integer :: n
real(4), device :: param(*) ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasSrotm(n, x, incx, y, incy, param)
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
real(4), device :: param(*) ! device or host variable
integer(4) function cublasSrotm_v2(h, n, x, incx, y, incy, param)
type(cublasHandle) :: h
integer :: n
real(4), device :: param(*) ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
srotmg
SROTMG constructs the modified Givens transformation matrix H which zeros the second component of the 2-vector (SQRT(SD1)*SX1,SQRT(SD2)*SY2)**T. With SPARAM(1)=SFLAG, H has one of the following forms (each matrix is written row by row, with rows separated by a semicolon):
SFLAG=-1.E0: H = (SH11 SH12; SH21 SH22)
SFLAG=0.E0: H = (1.E0 SH12; SH21 1.E0)
SFLAG=1.E0: H = (SH11 1.E0; -1.E0 SH22)
SFLAG=-2.E0: H = (1.E0 0.E0; 0.E0 1.E0)
Locations 2-5 of SPARAM contain SH11, SH21, SH12, and SH22 respectively. (Values of 1.E0, -1.E0, or 0.E0 implied by the value of SPARAM(1) are not stored in SPARAM.)
subroutine srotmg(d1, d2, x1, y1, param)
real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
subroutine cublasSrotmg(d1, d2, x1, y1, param)
real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
integer(4) function cublasSrotmg_v2(h, d1, d2, x1, y1, param)
type(cublasHandle) :: h
real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
sscal
SSCAL scales a vector by a constant.
subroutine sscal(n, a, x, incx)
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasSscal(n, a, x, incx)
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasSscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x
integer :: incx
sswap
SSWAP interchanges two vectors.
subroutine sswap(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasSswap(n, x, incx, y, incy)
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasSswap_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
sgbmv
SGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine sgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
sgemv
SGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine sgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
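The difference between the legacy character argument and the v2 integer argument for t can be seen in a short sketch. Here the v2 form is called with CUBLAS_OP_T to compute y := alpha*A**T*x + beta*y, and the result is compared with the Fortran intrinsics matmul and transpose on the host. The names and sizes are hypothetical.

```fortran
! Sketch: v2 sgemv with an integer operation code instead of a character.
! Assumed build line: nvfortran -cuda -cudalib=cublas gemv_v2.cuf
program gemv_v2
  use cublas_v2
  implicit none
  integer, parameter :: m = 4, n = 3
  type(cublasHandle) :: h
  real(4) :: a(m, n), x(m), y(n), alpha, beta
  real(4), device :: a_d(m, n), x_d(m), y_d(n)
  integer :: istat

  call random_number(a)
  call random_number(x)
  a_d = a
  x_d = x
  y_d = 0.0
  alpha = 1.0
  beta = 0.0

  istat = cublasCreate(h)

  ! A is m by n with lda = m; with A**T, x has m elements and y has n elements.
  istat = cublasSgemv(h, CUBLAS_OP_T, m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)

  y = y_d
  print *, 'max abs difference from host result: ', &
           maxval(abs(y - matmul(transpose(a), x)))

  istat = cublasDestroy(h)
end program gemv_v2
```

With the legacy interface in the cublas module, the equivalent call would pass the character 'T' and omit the handle.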
sger
SGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine sger(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasSger(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
integer(4) function cublasSger_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
ssbmv
SSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals.
subroutine ssbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: k, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: k, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsbmv_v2(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: k, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
sspmv
SSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspmv(t, n, alpha, a, x, incx, beta, y, incy)
character*1 :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSspmv(t, n, alpha, a, x, incx, beta, y, incy)
character*1 :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSspmv_v2(h, t, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha, beta ! device or host variable
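Packed form can be easier to follow with a concrete layout. The sketch below assumes the standard BLAS upper packed convention, in which element A(i,j) with i <= j is stored at position i + (j-1)*j/2 of the packed array, then calls the generic sspmv on device data and compares against matmul on the full host matrix. The names are hypothetical.

```fortran
! Sketch: building upper packed storage and calling sspmv on device arrays.
! Assumed build line: nvfortran -cuda -cudalib=cublas packed_spmv.cuf
program packed_spmv
  use cublas
  implicit none
  integer, parameter :: n = 4
  integer, parameter :: np = n*(n+1)/2   ! number of packed elements
  real(4) :: a(n, n), ap(np), x(n), y(n)
  real(4), device :: ap_d(np), x_d(n), y_d(n)
  integer :: i, j

  ! Build a full symmetric matrix and its upper packed copy side by side.
  do j = 1, n
    do i = 1, j
      a(i, j) = real(i + 10*j)
      a(j, i) = a(i, j)
      ap(i + (j-1)*j/2) = a(i, j)
    end do
  end do

  x = 1.0
  ap_d = ap
  x_d = x
  y_d = 0.0

  ! y := 1.0*A*x + 0.0*y, reading the upper triangle from packed storage.
  call sspmv('U', n, 1.0, ap_d, x_d, 1, 0.0, y_d, 1)

  y = y_d
  print *, 'max abs difference from matmul(a, x): ', maxval(abs(y - matmul(a, x)))
end program packed_spmv
```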
sspr
SSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
real(4), device, dimension(*) :: a, x ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasSspr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
real(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
integer(4) function cublasSspr_v2(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
real(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
sspr2
SSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasSspr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha ! device or host variable
integer(4) function cublasSspr2_v2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha ! device or host variable
ssymv
SSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
ssyr
SSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine ssyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasSsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
real(4), device :: alpha ! device or host variable
integer(4) function cublasSsyr_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
real(4), device :: alpha ! device or host variable
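The v2 form of the rank 1 update can be sketched as follows. This is a minimal illustration, not a definitive implementation: the program name, array sizes, and values are assumptions, and it assumes the cublas_v2 module exposes cublasCreate, cublasDestroy, and the CUBLAS_FILL_MODE_LOWER and CUBLAS_STATUS_SUCCESS constants. It passes alpha as a host variable, which matches the default cuBLAS pointer mode.

```fortran
program ssyr_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4          ! illustrative size
  type(cublasHandle) :: h
  real(4) :: a(n,n), x(n), alpha
  real(4), device :: a_d(n,n), x_d(n)
  integer :: istat

  a = 0.0
  x = 1.0
  alpha = 2.0
  a_d = a                              ! host-to-device copies
  x_d = x

  istat = cublasCreate(h)
  ! In the v2 interface the fill mode is an integer enum, not a character
  istat = cublasSsyr_v2(h, CUBLAS_FILL_MODE_LOWER, n, alpha, x_d, 1, a_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSsyr_v2 failed: ', istat

  a = a_d                              ! device-to-host copy
  istat = cublasDestroy(h)
  print *, 'lower triangle, first column: ', a(:,1)
end program ssyr_v2_sketch
```

Only the triangle selected by the fill mode is referenced and updated, so the strictly upper part of a is left unchanged in this sketch.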
ssyr2
SSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine ssyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x, y ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasSsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
integer(4) function cublasSsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
stbmv
STBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine stbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
subroutine cublasStbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
integer(4) function cublasStbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
stbsv
STBSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine stbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
subroutine cublasStbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
integer(4) function cublasStbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
stpmv
STPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine stpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasStpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
integer(4) function cublasStpmv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
stpsv
STPSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine stpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasStpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
integer(4) function cublasStpsv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
strmv
STRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine strmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
subroutine cublasStrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
integer(4) function cublasStrmv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
strsv
STRSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine strsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(*) :: x ! device or host variable
subroutine cublasStrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
integer(4) function cublasStrsv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
sgemm
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
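As a hedged sketch of how the overloaded BLAS name is typically used, the following program calls sgemm with device arrays through the cublas module and compares the result with the intrinsic matmul on the host. The program name, matrix sizes, and the comparison step are illustrative assumptions; alpha and beta are passed as host variables.

```fortran
program sgemm_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 64, n = 32, k = 16   ! illustrative sizes
  real(4) :: a(m,k), b(k,n), c(m,n)
  real(4), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  real(4) :: alpha, beta

  call random_number(a)
  call random_number(b)
  a_d = a                  ! host-to-device copies
  b_d = b
  c_d = 0.0
  alpha = 1.0
  beta = 0.0

  ! Because the array arguments have the device attribute, the generic
  ! sgemm interface is expected to resolve to the cuBLAS implementation
  call sgemm('N', 'N', m, n, k, alpha, a_d, m, b_d, k, beta, c_d, m)

  c = c_d                  ! device-to-host copy
  print *, 'max abs difference vs. matmul: ', maxval(abs(c - matmul(a, b)))
end program sgemm_sketch
```

With the NVIDIA compilers, a program like this is usually built with options along the lines of `nvfortran -cuda -cudalib=cublas`; the exact flags depend on the compiler release.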
ssymm
SSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine ssymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyrk
SSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine ssyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyr2k
SSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine ssyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyrkx
SSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine ssyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasSsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasSsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
strmm
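STRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The cublasStrmm_v2 interface follows the cuBLAS v2 API, which takes an additional output matrix C and computes C := alpha*op( A )*B, or C := alpha*B*op( A ); passing B as C, with ldc equal to ldb, gives the in-place behavior.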
subroutine strmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasStrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device :: alpha ! device or host variable
integer(4) function cublasStrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha ! device or host variable
strsm
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
subroutine strsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a ! device or host variable
real(4), device, dimension(ldb, *) :: b ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasStrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device :: alpha ! device or host variable
integer(4) function cublasStrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device :: alpha ! device or host variable
cublasSgemvBatched
SGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasSgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasSgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
cublasSgemmBatched
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasSgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(4), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
integer(4) function cublasSgemmBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(4), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
cublasSgelsBatched
SGELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = ‘N’ and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = ‘N’ and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = ‘T’ and m >= n: find the minimum norm solution of an underdetermined system A**T * X = B.
4. If TRANS = ‘T’ and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**T * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasSgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: info(*)
integer, device :: devinfo(*)
integer :: batchCount
cublasSgeqrfBatched
SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.
integer(4) function cublasSgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
type(cublasHandle) :: h
integer :: m, n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Tau(*)
integer :: info(*)
integer :: batchCount
cublasSgetrfBatched
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasSgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
integer, device :: info(*)
integer :: batchCount
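The batched interfaces take a device array of device pointers, one per matrix. A minimal sketch of building that array for cublasSgetrfBatched is shown below. The program name, sizes, and test matrices are illustrative assumptions, and the sketch assumes c_devloc from the cudafor module is used to obtain each matrix address.

```fortran
program getrf_batched_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 3, nbatch = 4        ! illustrative sizes
  type(cublasHandle) :: h
  real(4) :: a(n,n,nbatch)
  real(4), device :: a_d(n,n,nbatch)
  type(c_devptr) :: ptrs(nbatch)                 ! host copy of the pointer array
  type(c_devptr), device :: ptrs_d(nbatch)       ! device array passed to cuBLAS
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info(nbatch), istat, i, k

  ! Build diagonally dominant matrices so each factorization is well defined
  call random_number(a)
  do k = 1, nbatch
    do i = 1, n
      a(i,i,k) = a(i,i,k) + real(n)
    end do
  end do
  a_d = a

  ! Collect the device address of each matrix, then copy the list to the device
  do k = 1, nbatch
    ptrs(k) = c_devloc(a_d(1,1,k))
  end do
  ptrs_d = ptrs

  istat = cublasCreate(h)
  istat = cublasSgetrfBatched(h, n, ptrs_d, n, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSgetrfBatched failed: ', istat

  info = info_d                                  ! per-matrix status values
  print *, 'info per matrix: ', info
  istat = cublasDestroy(h)
end program getrf_batched_sketch
```

The same pointer array pattern applies to the other routines in this group that declare type(c_devptr), device arguments.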
cublasSgetriBatched
SGETRI computes the inverse of a matrix using the LU factorization computed by SGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasSgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Carray(*)
integer :: ldc
integer, device :: info(*)
integer :: batchCount
cublasSgetrsBatched
SGETRS solves a system of linear equations A * X = B or A**T * X = B with a general N-by-N matrix A using the LU factorization computed by SGETRF.
integer(4) function cublasSgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Barray(*)
integer :: ldb
integer :: info(*)
integer :: batchCount
cublasSmatinvBatched
cublasSmatinvBatched is a shortcut for cublasSgetrfBatched followed by cublasSgetriBatched. However, it only works if n is less than 32. Otherwise, the user has to call cublasSgetrfBatched and cublasSgetriBatched separately.
integer(4) function cublasSmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Ainv(*)
integer :: lda_inv
integer, device :: info(*)
integer :: batchCount
cublasStrsmBatched
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasStrsmBatched(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side ! integer or character(1) variable
integer :: uplo ! integer or character(1) variable
integer :: trans ! integer or character(1) variable
integer :: diag ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
integer(4) function cublasStrsmBatched_v2(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side
integer :: uplo
integer :: trans
integer :: diag
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
cublasSgemvStridedBatched
SGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasSgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
real(4), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(4), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasSgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
real(4), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(4), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
cublasSgemmStridedBatched
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasSgemmStridedBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(4), device :: alpha ! device or host variable
real(4), device :: Aarray(*)
integer :: lda
integer :: strideA
real(4), device :: Barray(*)
integer :: ldb
integer :: strideB
real(4), device :: beta ! device or host variable
real(4), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
integer(4) function cublasSgemmStridedBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(4), device :: alpha ! device or host variable
real(4), device :: Aarray(*)
integer :: lda
integer :: strideA
real(4), device :: Barray(*)
integer :: ldb
integer :: strideB
real(4), device :: beta ! device or host variable
real(4), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
Double Precision Functions and Subroutines
This section contains interfaces to the double precision BLAS and cuBLAS functions and subroutines.
idamax
IDAMAX finds the index of the element having the maximum absolute value.
integer(4) function idamax(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIdamax(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIdamax_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
idamin
IDAMIN finds the index of the element having the minimum absolute value.
integer(4) function idamin(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIdamin(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIdamin_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
dasum
DASUM takes the sum of the absolute values.
real(8) function dasum(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x ! device or host variable
integer :: incx
real(8) function cublasDasum(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasDasum_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
daxpy
DAXPY computes a constant times a vector plus a vector, y := a*x + y.
subroutine daxpy(n, a, x, incx, y, incy)
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasDaxpy(n, a, x, incx, y, incy)
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasDaxpy_v2(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
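A minimal sketch of the v2 calling convention for this routine follows. The program name, vector length, and values are illustrative assumptions, and the scalar a is passed as a host variable. The point of interest is that the call is a function returning a status, with the handle as its first argument.

```fortran
program daxpy_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 1000       ! illustrative length
  type(cublasHandle) :: h
  real(8) :: x(n), y(n), a
  real(8), device :: x_d(n), y_d(n)
  integer :: istat

  x = 1.0d0
  y = 2.0d0
  a = 3.0d0
  x_d = x                              ! host-to-device copies
  y_d = y

  istat = cublasCreate(h)
  istat = cublasDaxpy_v2(h, n, a, x_d, 1, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasDaxpy_v2 failed: ', istat

  y = y_d                              ! device-to-host copy
  istat = cublasDestroy(h)
  ! Each element should now hold a*x + y = 3*1 + 2
  print *, 'max deviation from 5: ', maxval(abs(y - 5.0d0))
end program daxpy_v2_sketch
```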
dcopy
DCOPY copies a vector, x, to a vector, y.
subroutine dcopy(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasDcopy(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasDcopy_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
ddot
DDOT forms the dot product of two vectors.
real(8) function ddot(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
real(8) function cublasDdot(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasDdot_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
real(8), device :: res ! device or host variable
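The difference between the function form and the v2 form can be sketched side by side. In this illustrative program (names, sizes, and values are assumptions), ddot returns the dot product as its function value, while cublasDdot_v2 returns a status and writes the result into its last argument, here a host variable.

```fortran
program ddot_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 256        ! illustrative length
  type(cublasHandle) :: h
  real(8) :: x(n), y(n), r1, r2
  real(8), device :: x_d(n), y_d(n)
  integer :: istat

  x = 2.0d0
  y = 0.5d0
  x_d = x                              ! host-to-device copies
  y_d = y

  ! Overloaded BLAS name: the result comes back as the function value
  r1 = ddot(n, x_d, 1, y_d, 1)

  ! v2 name: the function value is a status, the result is an argument
  istat = cublasCreate(h)
  istat = cublasDdot_v2(h, n, x_d, 1, y_d, 1, r2)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasDdot_v2 failed: ', istat
  istat = cublasDestroy(h)

  print *, 'ddot: ', r1, ' cublasDdot_v2: ', r2
end program ddot_sketch
```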
dnrm2
DNRM2 returns the Euclidean norm of a vector via the function name, so that DNRM2 := sqrt( x**T*x ).
real(8) function dnrm2(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x ! device or host variable
integer :: incx
real(8) function cublasDnrm2(n, x, incx)
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasDnrm2_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
drot
DROT applies a plane rotation.
subroutine drot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc, ss ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasDrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc, ss ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasDrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc, ss ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
drotg
DROTG constructs a Givens plane rotation.
subroutine drotg(sa, sb, sc, ss)
real(8), device :: sa, sb, sc, ss ! device or host variable
subroutine cublasDrotg(sa, sb, sc, ss)
real(8), device :: sa, sb, sc, ss ! device or host variable
integer(4) function cublasDrotg_v2(h, sa, sb, sc, ss)
type(cublasHandle) :: h
real(8), device :: sa, sb, sc, ss ! device or host variable
drotm
DROTM applies the modified Givens transformation, H, to the 2 by N matrix whose rows are DX**T and DY**T, where **T indicates transpose. The elements of DX are in DX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for DY using LY and INCY. With DPARAM(1)=DFLAG, H has one of the following forms, written as (first row), (second row):
DFLAG=-1.D0: H = (DH11 DH12), (DH21 DH22)
DFLAG=0.D0: H = (1.D0 DH12), (DH21 1.D0)
DFLAG=1.D0: H = (DH11 1.D0), (-1.D0 DH22)
DFLAG=-2.D0: H = (1.D0 0.D0), (0.D0 1.D0)
See DROTMG for a description of data storage in DPARAM.
subroutine drotm(n, x, incx, y, incy, param)
integer :: n
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
real(8), device :: param(*) ! device or host variable
subroutine cublasDrotm(n, x, incx, y, incy, param)
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
real(8), device :: param(*) ! device or host variable
integer(4) function cublasDrotm_v2(h, n, x, incx, y, incy, param)
type(cublasHandle) :: h
integer :: n
real(8), device :: param(*) ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
drotmg
DROTMG constructs the modified Givens transformation matrix H which zeros the second component of the 2-vector (SQRT(DD1)*DX1,SQRT(DD2)*DY1)**T. With DPARAM(1)=DFLAG, H has one of the following forms, written as (first row), (second row):
DFLAG=-1.D0: H = (DH11 DH12), (DH21 DH22)
DFLAG=0.D0: H = (1.D0 DH12), (DH21 1.D0)
DFLAG=1.D0: H = (DH11 1.D0), (-1.D0 DH22)
DFLAG=-2.D0: H = (1.D0 0.D0), (0.D0 1.D0)
Locations 2-5 of DPARAM contain DH11, DH21, DH12, and DH22 respectively. (Values of 1.D0, -1.D0, or 0.D0 implied by the value of DPARAM(1) are not stored in DPARAM.)
subroutine drotmg(d1, d2, x1, y1, param)
real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
subroutine cublasDrotmg(d1, d2, x1, y1, param)
real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
integer(4) function cublasDrotmg_v2(h, d1, d2, x1, y1, param)
type(cublasHandle) :: h
real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
dscal
DSCAL scales a vector by a constant.
subroutine dscal(n, a, x, incx)
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasDscal(n, a, x, incx)
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasDscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x
integer :: incx
dswap
DSWAP interchanges two vectors.
subroutine dswap(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasDswap(n, x, incx, y, incy)
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasDswap_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
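A minimal sketch of the v2 calling convention for the two Level 1 routines above, assuming the cublas_v2 module and a build along the lines of nvfortran -cuda -cudalib=cublas. The program and variable names are hypothetical. The scalar a is a host variable, which matches the default cuBLAS pointer mode.

```fortran
program dscal_dswap_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4
  real(8) :: x(n), y(n), a
  real(8), device :: x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: istat

  x = (/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /)
  y = 0.0d0
  x_d = x                     ! host-to-device copies
  y_d = y
  a = 2.0d0                   ! host scalar

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasDscal(h, n, a, x_d, 1)        ! x_d := 2*x_d
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDscal failed'
  istat = cublasDswap(h, n, x_d, 1, y_d, 1)   ! exchange x_d and y_d
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDswap failed'

  x = x_d                     ! device-to-host copies
  y = y_d
  ! x should now be all zeros and y should hold 2, 4, 6, 8
  if (all(x == 0.0d0) .and. &
      all(y == (/ 2.0d0, 4.0d0, 6.0d0, 8.0d0 /))) then
     print *, 'ok'
  else
     print *, 'mismatch', x, y
  end if
  istat = cublasDestroy(h)
end program dscal_dswap_sketch
```

With use cublas instead, the same operations would be written as subroutine calls without the handle, for example call cublasDscal(n, a, x_d, 1).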
dgbmv
DGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine dgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dgemv
DGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine dgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
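The following sketch calls DGEMV through the v2 interface, where the transpose option is an integer such as CUBLAS_OP_N rather than a character. It assumes the cublas_v2 module, host scalars under the default pointer mode, and a build along the lines of nvfortran -cuda -cudalib=cublas. The names and data are illustrative only.

```fortran
program dgemv_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 2, n = 3
  real(8) :: a(m,n), x(n), y(m), alpha, beta
  real(8), device :: a_d(m,n), x_d(n), y_d(m)
  type(cublasHandle) :: h
  integer :: istat

  ! Column-major storage: A has rows (1 2 3) and (4 5 6)
  a = reshape((/ 1.0d0, 4.0d0, 2.0d0, 5.0d0, 3.0d0, 6.0d0 /), (/ m, n /))
  x = 1.0d0
  y = 0.0d0
  alpha = 1.0d0
  beta = 0.0d0
  a_d = a
  x_d = x
  y_d = y

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! y := alpha*A*x + beta*y, with lda = m
  istat = cublasDgemv(h, CUBLAS_OP_N, m, n, alpha, a_d, m, &
                      x_d, 1, beta, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgemv failed'

  y = y_d
  ! The row sums of A are 6 and 15
  if (all(y == (/ 6.0d0, 15.0d0 /))) then
     print *, 'ok'
  else
     print *, 'mismatch', y
  end if
  istat = cublasDestroy(h)
end program dgemv_sketch
```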
dger
DGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine dger(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDger(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
integer(4) function cublasDger_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
dsbmv
DSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals.
subroutine dsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: k, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: k, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsbmv_v2(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: k, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dspmv
DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspmv(t, n, alpha, a, x, incx, beta, y, incy)
character*1 :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDspmv(t, n, alpha, a, x, incx, beta, y, incy)
character*1 :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDspmv_v2(h, t, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha, beta ! device or host variable
dspr
DSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
real(8), device, dimension(*) :: a, x ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDspr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
real(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
integer(4) function cublasDspr_v2(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
real(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
dspr2
DSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDspr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha ! device or host variable
integer(4) function cublasDspr2_v2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha ! device or host variable
dsymv
DSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dsyr
DSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine dsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
real(8), device :: alpha ! device or host variable
integer(4) function cublasDsyr_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
real(8), device :: alpha ! device or host variable
dsyr2
DSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine dsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x, y ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
integer(4) function cublasDsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
dtbmv
DTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine dtbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
integer(4) function cublasDtbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtbsv
DTBSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
integer(4) function cublasDtbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtpmv
DTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine dtpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasDtpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
integer(4) function cublasDtpmv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
dtpsv
DTPSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasDtpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
integer(4) function cublasDtpsv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
dtrmv
DTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine dtrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
integer(4) function cublasDtrmv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtrsv
DTRSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
integer(4) function cublasDtrsv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dgemm
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
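For contrast with the v2 form, here is a minimal sketch of the legacy calling convention: a subroutine call with character transpose arguments and no handle. It assumes the cublas module and a build along the lines of nvfortran -cuda -cudalib=cublas. The program name and data are hypothetical.

```fortran
program dgemm_legacy_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2
  real(8) :: a(n,n), b(n,n), c(n,n), expected(n,n), alpha, beta
  real(8), device :: a_d(n,n), b_d(n,n), c_d(n,n)

  ! Column-major storage: A has rows (1 2), (3 4); B has rows (5 6), (7 8)
  a = reshape((/ 1.0d0, 3.0d0, 2.0d0, 4.0d0 /), (/ n, n /))
  b = reshape((/ 5.0d0, 7.0d0, 6.0d0, 8.0d0 /), (/ n, n /))
  ! A*B has rows (19 22), (43 50)
  expected = reshape((/ 19.0d0, 43.0d0, 22.0d0, 50.0d0 /), (/ n, n /))
  c = 0.0d0
  alpha = 1.0d0
  beta = 0.0d0
  a_d = a
  b_d = b
  c_d = c

  ! Legacy convention: no handle, no status return
  call cublasDgemm('N', 'N', n, n, n, alpha, a_d, n, b_d, n, beta, c_d, n)

  c = c_d
  if (all(c == expected)) then
     print *, 'ok'
  else
     print *, 'mismatch', c
  end if
end program dgemm_legacy_sketch
```

Under use cublas_v2 the equivalent call would be a function reference of the form istat = cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, ...), with h created by cublasCreate.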
dsymm
DSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine dsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyrk
DSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine dsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyr2k
DSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine dsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyrkx
DSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine dsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dtrmm
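DTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The v2 interface is out-of-place: it takes two additional arguments, c and ldc, and writes the result to the m by n matrix C instead of overwriting B. Passing B as C gives the in-place BLAS behavior.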
subroutine dtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device :: alpha ! device or host variable
integer(4) function cublasDtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha ! device or host variable
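A minimal sketch of the out-of-place v2 form, whose signature above ends in c, ldc. It assumes the cublas_v2 module, in which the name cublasDtrmm follows the v2 calling convention, and a build along the lines of nvfortran -cuda -cudalib=cublas. The names and data are illustrative. The lower-triangle entry of A is set to 99 to show that it is not referenced when the fill mode is upper.

```fortran
program dtrmm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 2, n = 2
  real(8) :: a(m,m), b(m,n), c(m,n), alpha
  real(8), device :: a_d(m,m), b_d(m,n), c_d(m,n)
  type(cublasHandle) :: h
  integer :: istat

  ! Upper triangle of A has rows (1 2), (0 3); a(2,1) = 99 is ignored
  a = reshape((/ 1.0d0, 99.0d0, 2.0d0, 3.0d0 /), (/ m, m /))
  b = 1.0d0
  c = 0.0d0
  alpha = 1.0d0
  a_d = a
  b_d = b
  c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! C := alpha*A*B, with A upper triangular on the left
  istat = cublasDtrmm(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, &
                      CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                      a_d, m, b_d, m, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDtrmm failed'

  c = c_d
  b = b_d
  ! Every entry of C should be 3; B should be unchanged
  if (all(c == 3.0d0) .and. all(b == 1.0d0)) then
     print *, 'ok'
  else
     print *, 'mismatch', c, b
  end if
  istat = cublasDestroy(h)
end program dtrmm_v2_sketch
```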
dtrsm
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
subroutine dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a ! device or host variable
real(8), device, dimension(ldb, *) :: b ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasDtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device :: alpha ! device or host variable
integer(4) function cublasDtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device :: alpha ! device or host variable
cublasDgemvBatched
DGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasDgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(8), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasDgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(8), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
cublasDgemmBatched
cublasDgemmBatched performs a batch of matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. Aarray, Barray and Carray are device arrays holding one device pointer per matrix in the batch.
integer(4) function cublasDgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(8), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
integer(4) function cublasDgemmBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(8), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
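The pointer arrays can be built on the host with c_devloc and then copied to the device. The following is a minimal sketch under stated assumptions: the cublas_v2 module, host scalars under the default pointer mode, and a build along the lines of nvfortran -cuda -cudalib=cublas. The names ap, bp, cp and the data are hypothetical.

```fortran
program dgemm_batched_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 2, nbatch = 3
  real(8) :: a(n,n,nbatch), b(n,n,nbatch), c(n,n,nbatch), alpha, beta
  real(8), device, target :: a_d(n,n,nbatch), b_d(n,n,nbatch), c_d(n,n,nbatch)
  type(c_devptr) :: ap(nbatch), bp(nbatch), cp(nbatch)
  type(c_devptr), device :: ap_d(nbatch), bp_d(nbatch), cp_d(nbatch)
  type(cublasHandle) :: h
  integer :: istat, i

  do i = 1, nbatch
     a(:,:,i) = real(i, 8)      ! every entry of matrix i equals i
     b(:,:,i) = 0.0d0           ! B is the identity
     b(1,1,i) = 1.0d0
     b(2,2,i) = 1.0d0
  end do
  c = 0.0d0
  alpha = 1.0d0
  beta = 0.0d0
  a_d = a
  b_d = b
  c_d = c

  ! One device address per matrix, gathered on the host first
  do i = 1, nbatch
     ap(i) = c_devloc(a_d(1,1,i))
     bp(i) = c_devloc(b_d(1,1,i))
     cp(i) = c_devloc(c_d(1,1,i))
  end do
  ap_d = ap
  bp_d = bp
  cp_d = cp

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'
  istat = cublasDgemmBatched(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, &
                             ap_d, n, bp_d, n, beta, cp_d, n, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgemmBatched failed'

  c = c_d
  ! A times the identity should reproduce A for every batch entry
  if (all(c == a)) then
     print *, 'ok'
  else
     print *, 'mismatch'
  end if
  istat = cublasDestroy(h)
end program dgemm_batched_sketch
```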
cublasDgelsBatched
DGELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = ‘N’ and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = ‘N’ and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = ‘T’ and m >= n: find the minimum norm solution of an underdetermined system A**T * X = B.
4. If TRANS = ‘T’ and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**T * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasDgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: info(*)
integer, device :: devinfo(*)
integer :: batchCount
cublasDgeqrfBatched
DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.
integer(4) function cublasDgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
type(cublasHandle) :: h
integer :: m, n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Tau(*)
integer :: info(*)
integer :: batchCount
cublasDgetrfBatched
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasDgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
integer, device :: info(*)
integer :: batchCount
cublasDgetriBatched
DGETRI computes the inverse of a matrix using the LU factorization computed by DGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasDgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Carray(*)
integer :: ldc
integer, device :: info(*)
integer :: batchCount
cublasDgetrsBatched
DGETRS solves a system of linear equations A * X = B or A**T * X = B with a general N-by-N matrix A using the LU factorization computed by DGETRF.
integer(4) function cublasDgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Barray(*)
integer :: ldb
integer :: info(*)
integer :: batchCount
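A minimal sketch that chains cublasDgetrfBatched and cublasDgetrsBatched to solve two small systems. It follows the interfaces listed above, in which the factorization reports info in a device array while the solve takes a host info array. It assumes the cublas_v2 module and a build along the lines of nvfortran -cuda -cudalib=cublas. The names and data are hypothetical.

```fortran
program getrf_getrs_batched_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 2, nrhs = 1, nbatch = 2
  real(8) :: a(n,n,nbatch), b(n,nrhs,nbatch)
  real(8), device, target :: a_d(n,n,nbatch), b_d(n,nrhs,nbatch)
  type(c_devptr) :: ap(nbatch), bp(nbatch)
  type(c_devptr), device :: ap_d(nbatch), bp_d(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info_f(nbatch), info_s(1), istat, i
  type(cublasHandle) :: h

  a = 0.0d0
  a(1,1,1) = 2.0d0                   ! system 1: diag(2, 4)
  a(2,2,1) = 4.0d0
  a(1,2,2) = 1.0d0                   ! system 2: rows (0 1), (1 0),
  a(2,1,2) = 1.0d0                   ! which needs a row interchange
  b(:,1,1) = (/ 2.0d0, 4.0d0 /)      ! solution (1, 1)
  b(:,1,2) = (/ 3.0d0, 5.0d0 /)      ! solution (5, 3)
  a_d = a
  b_d = b

  do i = 1, nbatch
     ap(i) = c_devloc(a_d(1,1,i))
     bp(i) = c_devloc(b_d(1,1,i))
  end do
  ap_d = ap
  bp_d = bp

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! LU factorization in place; pivots and info stay on the device
  istat = cublasDgetrfBatched(h, n, ap_d, n, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgetrfBatched failed'
  info_f = info_d
  if (any(info_f /= 0)) stop 'factorization reported a problem'

  ! Solve A*X = B using the factors; B is overwritten with X
  istat = cublasDgetrsBatched(h, CUBLAS_OP_N, n, nrhs, ap_d, n, ipvt_d, &
                              bp_d, n, info_s, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgetrsBatched failed'

  b = b_d
  if (maxval(abs(b(:,1,1) - (/ 1.0d0, 1.0d0 /))) < 1.0d-12 .and. &
      maxval(abs(b(:,1,2) - (/ 5.0d0, 3.0d0 /))) < 1.0d-12) then
     print *, 'ok'
  else
     print *, 'mismatch', b
  end if
  istat = cublasDestroy(h)
end program getrf_getrs_batched_sketch
```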
cublasDmatinvBatched
cublasDmatinvBatched is a shortcut for cublasDgetrfBatched followed by cublasDgetriBatched. However, it only works if n is no greater than 32. For larger n, the user has to call cublasDgetrfBatched and then cublasDgetriBatched.
integer(4) function cublasDmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Ainv(*)
integer :: lda_inv
integer, device :: info(*)
integer :: batchCount
cublasDtrsmBatched
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasDtrsmBatched( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side ! integer or character(1) variable
integer :: uplo ! integer or character(1) variable
integer :: trans ! integer or character(1) variable
integer :: diag ! integer or character(1) variable
integer :: m, n
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
integer(4) function cublasDtrsmBatched_v2( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side
integer :: uplo
integer :: trans
integer :: diag
integer :: m, n
real(8), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
cublasDgemvStridedBatched
DGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasDgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(8), device :: alpha ! device or host variable
real(8), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(8), device :: X(*)
integer :: incx
integer(8) :: strideX
real(8), device :: beta ! device or host variable
real(8), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasDgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(8), device :: alpha ! device or host variable
real(8), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(8), device :: X(*)
integer :: incx
integer(8) :: strideX
real(8), device :: beta ! device or host variable
real(8), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
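As a reading aid for the stride arguments, here is a minimal host-side sketch of what a strided batch of matrix-vector products computes. It assumes strides are counted in elements from the start of each buffer, and it covers only op( A ) = A with incx = incy = 1. The module and routine names (gemv_strided_ref, dgemv_strided_ref) are hypothetical and are not part of the cublas modules.

```fortran
module gemv_strided_ref
  implicit none
contains
  ! Hypothetical host reference: non-transposed case, unit increments.
  ! Problem b (0-based) reads its matrix at offset b*strideA and its
  ! vectors at offsets b*strideX and b*strideY inside the flat buffers.
  subroutine dgemv_strided_ref(m, n, alpha, a, lda, strideA, x, strideX, &
                               beta, y, strideY, batchCount)
    integer, intent(in) :: m, n, lda, batchCount
    integer(8), intent(in) :: strideA, strideX, strideY
    real(8), intent(in) :: alpha, beta
    real(8), intent(in) :: a(*), x(*)
    real(8), intent(inout) :: y(*)
    integer :: b, i, j
    integer(8) :: a0, x0, y0
    real(8) :: acc
    do b = 0, batchCount - 1
      a0 = b * strideA
      x0 = b * strideX
      y0 = b * strideY
      do i = 1, m
        acc = 0.0d0
        do j = 1, n
          ! Column-major element (i, j) of matrix b
          acc = acc + a(a0 + i + (j - 1) * lda) * x(x0 + j)
        end do
        y(y0 + i) = alpha * acc + beta * y(y0 + i)
      end do
    end do
  end subroutine
end module
```

With two 2-by-2 matrices stored back to back, strideA would be 4 and strideX and strideY would be 2.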
cublasDgemmStridedBatched
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasDgemmStridedBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(8), device :: alpha ! device or host variable
real(8), device :: Aarray(*)
integer :: lda
integer :: strideA
real(8), device :: Barray(*)
integer :: ldb
integer :: strideB
real(8), device :: beta ! device or host variable
real(8), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
integer(4) function cublasDgemmStridedBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(8), device :: alpha ! device or host variable
real(8), device :: Aarray(*)
integer :: lda
integer :: strideA
real(8), device :: Barray(*)
integer :: ldb
integer :: strideB
real(8), device :: beta ! device or host variable
real(8), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
Single Precision Complex Functions and Subroutines
This section contains interfaces to the single precision complex BLAS and cuBLAS functions and subroutines.
icamax
ICAMAX finds the index of the element having the maximum absolute value.
integer(4) function icamax(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIcamax(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIcamax_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
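For complex vectors, "maximum absolute value" in the BLAS sense is measured as |Re| + |Im| rather than the true modulus, so the result can differ from maxloc(abs(x)). The host-side sketch below illustrates this. The names icamax_ref, cabs1 and icamax_host are hypothetical helpers, not entry points of the cublas modules.

```fortran
module icamax_ref
  implicit none
contains
  ! |Re(z)| + |Im(z)|: the magnitude measure used by the reference BLAS
  ! for complex index and sum routines.
  pure real(4) function cabs1(z)
    complex(4), intent(in) :: z
    cabs1 = abs(real(z)) + abs(aimag(z))
  end function

  ! 1-based index of the first element with the largest cabs1 value.
  ! Returns 0 when n < 1 or incx < 1, as the reference BLAS does.
  pure integer function icamax_host(n, x, incx)
    integer, intent(in) :: n, incx
    complex(4), intent(in) :: x(*)
    integer :: i
    real(4) :: best
    icamax_host = 0
    if (n < 1 .or. incx < 1) return
    icamax_host = 1
    best = cabs1(x(1))
    do i = 2, n
      if (cabs1(x(1 + (i - 1) * incx)) > best) then
        best = cabs1(x(1 + (i - 1) * incx))
        icamax_host = i
      end if
    end do
  end function
end module
```

For x = [(3,4), (4,4), (-5,3)], the |Re| + |Im| values are 7, 8 and 8, so this sketch selects index 2, whereas the largest modulus belongs to element 3.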
icamin
ICAMIN finds the index of the element having the minimum absolute value.
integer(4) function icamin(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIcamin(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIcamin_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
scasum
SCASUM takes the sum of the absolute values of the real and imaginary parts, |Re(x)| + |Im(x)|, over the elements of a complex vector and returns a single precision result.
real(4) function scasum(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
real(4) function cublasScasum(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasScasum_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
caxpy
CAXPY computes a constant times a vector plus a vector.
subroutine caxpy(n, a, x, incx, y, incy)
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasCaxpy(n, a, x, incx, y, incy)
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCaxpy_v2(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
ccopy
CCOPY copies a vector x to a vector y.
subroutine ccopy(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasCcopy(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCcopy_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
cdotc
CDOTC forms the dot product of two vectors, conjugating the first vector.
complex(4) function cdotc(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
complex(4) function cublasCdotc(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCdotc_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
complex(4), device :: res ! device or host variable
cdotu
CDOTU forms the dot product of two vectors.
complex(4) function cdotu(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
complex(4) function cublasCdotu(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCdotu_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
complex(4), device :: res ! device or host variable
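The only difference between CDOTC and CDOTU is whether the first vector is conjugated. A minimal host-side sketch, assuming unit strides, makes the distinction concrete. The names cdot_ref, cdotc_host and cdotu_host are hypothetical and are not part of the cublas modules.

```fortran
module cdot_ref
  implicit none
contains
  ! CDOTC-style product: sum of conjg(x(i)) * y(i).
  pure complex(4) function cdotc_host(n, x, y)
    integer, intent(in) :: n
    complex(4), intent(in) :: x(n), y(n)
    ! The dot_product intrinsic conjugates its first complex argument.
    cdotc_host = dot_product(x, y)
  end function

  ! CDOTU-style product: sum of x(i) * y(i), with no conjugation.
  pure complex(4) function cdotu_host(n, x, y)
    integer, intent(in) :: n
    complex(4), intent(in) :: x(n), y(n)
    cdotu_host = sum(x * y)
  end function
end module
```

For x = [(1,2), (0,1)] and y = [(3,0), (0,1)], the conjugated product is (4,-6) and the unconjugated product is (2,6).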
scnrm2
SCNRM2 returns the Euclidean norm of a vector via the function name, so that SCNRM2 := sqrt( x**H*x ).
real(4) function scnrm2(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
real(4) function cublasScnrm2(n, x, incx)
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasScnrm2_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
crot
CROT applies a plane rotation, where the cosine (sc) is real and the sine (ss) is complex, and the vectors x and y are complex.
subroutine crot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc ! device or host variable
complex(4), device :: ss ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasCrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc ! device or host variable
complex(4), device :: ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc ! device or host variable
complex(4), device :: ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
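The rotation can be written out element by element. The sketch below follows the update used by the reference LAPACK CROT, restricted to unit strides. The names crot_ref and crot_host are hypothetical, and the routine runs on host arrays only.

```fortran
module crot_ref
  implicit none
contains
  ! x(i) <- c*x(i) + s*y(i)
  ! y(i) <- c*y(i) - conjg(s)*x(i)   (using the old x(i))
  pure subroutine crot_host(n, x, y, c, s)
    integer, intent(in) :: n
    complex(4), intent(inout) :: x(n), y(n)
    real(4), intent(in) :: c
    complex(4), intent(in) :: s
    complex(4) :: tmp
    integer :: i
    do i = 1, n
      tmp  = c * x(i) + s * y(i)
      y(i) = c * y(i) - conjg(s) * x(i)
      x(i) = tmp
    end do
  end subroutine
end module
```

With c = 0 and s = (0,1), the pair x = (1,0), y = (0,0) becomes x = (0,0), y = (0,1).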
csrot
CSROT applies a plane rotation, where the cosine and sine (sc and ss) are real and the vectors x and y are complex.
subroutine csrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc, ss ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasCsrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(4), device :: sc, ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCsrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc, ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
crotg
CROTG determines a complex Givens rotation.
subroutine crotg(sa, sb, sc, ss)
complex(4), device :: sa, sb, ss ! device or host variable
real(4), device :: sc ! device or host variable
subroutine cublasCrotg(sa, sb, sc, ss)
complex(4), device :: sa, sb, ss ! device or host variable
real(4), device :: sc ! device or host variable
integer(4) function cublasCrotg_v2(h, sa, sb, sc, ss)
type(cublasHandle) :: h
complex(4), device :: sa, sb, ss ! device or host variable
real(4), device :: sc ! device or host variable
cscal
CSCAL scales a vector by a constant.
subroutine cscal(n, a, x, incx)
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasCscal(n, a, x, incx)
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasCscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
csscal
CSSCAL scales a complex vector by a real constant.
subroutine csscal(n, a, x, incx)
integer :: n
real(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasCsscal(n, a, x, incx)
integer :: n
real(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
integer(4) function cublasCsscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
cswap
CSWAP interchanges two vectors.
subroutine cswap(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasCswap(n, x, incx, y, incy)
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasCswap_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
cgbmv
CGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine cgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
cgemv
CGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine cgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
cgerc
CGERC performs the rank 1 operation A := alpha*x*y**H + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine cgerc(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCgerc(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCgerc_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
cgeru
CGERU performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine cgeru(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCgeru(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCgeru_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
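CGERC and CGERU differ only in whether y is conjugated in the rank 1 update. A minimal host-side sketch, assuming unit strides and a dense m by n array, is shown below. The names cger_ref and cger_host, and the conj_y switch, are hypothetical and exist only for illustration.

```fortran
module cger_ref
  implicit none
contains
  ! conj_y = .true.  gives the CGERC-style update A := alpha*x*y**H + A
  ! conj_y = .false. gives the CGERU-style update A := alpha*x*y**T + A
  pure subroutine cger_host(conj_y, m, n, alpha, x, y, a)
    logical, intent(in) :: conj_y
    integer, intent(in) :: m, n
    complex(4), intent(in) :: alpha, x(m), y(n)
    complex(4), intent(inout) :: a(m, n)
    integer :: j
    do j = 1, n
      if (conj_y) then
        a(:, j) = a(:, j) + alpha * x * conjg(y(j))
      else
        a(:, j) = a(:, j) + alpha * x * y(j)
      end if
    end do
  end subroutine
end module
```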
csymv
CSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine csymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
csyr
CSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a complex scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine csyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCsyr_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
complex(4), device :: alpha ! device or host variable
csyr2
CSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a complex scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine csyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
ctbmv
CTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine ctbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
integer(4) function cublasCtbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
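The band routines expect the matrix in the compact band layout of the reference BLAS, not as a dense array. The sketch below packs the upper triangle of a dense matrix into that layout on the host, so the index mapping is explicit. The names band_ref and pack_upper_band are hypothetical helpers, and only the upper triangular case is shown.

```fortran
module band_ref
  implicit none
contains
  ! Pack the upper triangle of a dense n-by-n matrix with k super-diagonals
  ! into a (k+1)-by-n band array: dense(i, j) is stored at
  ! ab(k+1+i-j, j) for max(1, j-k) <= i <= j.
  pure function pack_upper_band(n, k, dense) result(ab)
    integer, intent(in) :: n, k
    complex(4), intent(in) :: dense(n, n)
    complex(4) :: ab(k + 1, n)
    integer :: i, j
    ab = (0.0, 0.0)   ! corner entries outside the band are left as zero
    do j = 1, n
      do i = max(1, j - k), j
        ab(k + 1 + i - j, j) = dense(i, j)
      end do
    end do
  end function
end module
```

In this layout the main diagonal lands in row k+1 of the band array, and lda must be at least k+1.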
ctbsv
CTBSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
integer(4) function cublasCtbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
ctpmv
CTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine ctpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasCtpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
integer(4) function cublasCtpmv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
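In packed form, the referenced triangle is stored column by column in a one-dimensional array of n*(n+1)/2 elements. The helpers below give the 1-based position of element (i, j) in that array, following the reference BLAS packed layout. The names packed_ref, upper_packed_index and lower_packed_index are hypothetical.

```fortran
module packed_ref
  implicit none
contains
  ! Position of element (i, j), with i <= j, in upper packed storage.
  pure integer function upper_packed_index(i, j)
    integer, intent(in) :: i, j
    upper_packed_index = i + (j - 1) * j / 2
  end function

  ! Position of element (i, j), with i >= j, in lower packed storage
  ! of an order-n matrix.
  pure integer function lower_packed_index(n, i, j)
    integer, intent(in) :: n, i, j
    lower_packed_index = i + (j - 1) * (2 * n - j) / 2
  end function
end module
```

For n = 3, the upper packed order is (1,1), (1,2), (2,2), (1,3), (2,3), (3,3), and the lower packed order is (1,1), (2,1), (3,1), (2,2), (3,2), (3,3).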
ctpsv
CTPSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasCtpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
integer(4) function cublasCtpsv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
ctrmv
CTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine ctrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
integer(4) function cublasCtrmv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
ctrsv
CTRSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
integer(4) function cublasCtrsv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
chbmv
CHBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian band matrix, with k super-diagonals.
subroutine chbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: k, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: k, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChbmv_v2(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: k, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
chemv
CHEMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine chemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(*) :: x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChemv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
chpmv
CHPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChpmv_v2(h, uplo, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha, beta ! device or host variable
cher
CHER performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix.
subroutine cher(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(4), device, dimension(*) :: a, x ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasCher(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
integer(4) function cublasCher_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
cher2
CHER2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine cher2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(*) :: a, x, y ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCher2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCher2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
chpr
CHPR performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
complex(4), device, dimension(*) :: a, x ! device or host variable
real(4), device :: alpha ! device or host variable
subroutine cublasChpr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
integer(4) function cublasChpr_v2(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
chpr2
CHPR2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasChpr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
integer(4) function cublasChpr2_v2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
cgemm
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine cgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
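The three choices for op( X ) can be spelled out with Fortran intrinsics. The following host-side sketch assumes the character flags 'N', 'T' and 'C' used by the legacy interfaces and operates on ordinary host arrays. The names cgemm_ref, apply_op and cgemm_host are hypothetical and are not part of the cublas modules.

```fortran
module cgemm_ref
  implicit none
contains
  ! Returns op(x) for a BLAS-style flag:
  ! 'T' -> transpose, 'C' -> conjugate transpose, anything else -> x.
  pure function apply_op(flag, x) result(opx)
    character(1), intent(in) :: flag
    complex(4), intent(in) :: x(:, :)
    complex(4), allocatable :: opx(:, :)
    select case (flag)
    case ('T', 't')
      opx = transpose(x)
    case ('C', 'c')
      opx = conjg(transpose(x))
    case default
      opx = x
    end select
  end function

  ! C := alpha*op( A )*op( B ) + beta*C on host arrays.
  subroutine cgemm_host(transa, transb, alpha, a, b, beta, c)
    character(1), intent(in) :: transa, transb
    complex(4), intent(in) :: alpha, beta, a(:, :), b(:, :)
    complex(4), intent(inout) :: c(:, :)
    c = alpha * matmul(apply_op(transa, a), apply_op(transb, b)) + beta * c
  end subroutine
end module
```

A reference like this can be useful for checking small device results, since 'T' and 'C' give different answers whenever the input has a nonzero imaginary part.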
csymm
CSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine csymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyrk
CSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine csyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyr2k
CSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine csyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyrkx
CSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
subroutine csyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
ctrmm
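CTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. In the v2 interface the result is written to the m by n matrix C rather than to B; passing B as C (with ldc equal to ldb) recovers the in-place BLAS behavior.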
subroutine ctrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
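The extra c and ldc arguments of cublasCtrmm_v2 can be illustrated with a minimal sketch along the following lines. It assumes the cublas module (which exposes both the legacy and the _v2 names) and a build such as nvfortran -cuda -cudalib=cublas; the program and variable names are illustrative only.

```fortran
program ctrmm_v2_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 3, n = 2
  complex(4) :: a(m,m), b(m,n), c(m,n), ref(m,n)
  complex(4), device :: a_d(m,m), b_d(m,n), c_d(m,n)
  complex(4) :: alpha
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Upper triangular A; the strictly lower part is left at zero so that
  ! a plain matmul on the host gives the same product as the triangular one.
  a = cmplx(0.0, 0.0)
  do j = 1, m
    do i = 1, j
      a(i,j) = cmplx(real(i), real(j))
    end do
  end do
  b = cmplx(1.0, 2.0)
  alpha = cmplx(2.0, 0.0)
  a_d = a; b_d = b

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Left side, upper triangle, no transpose, non-unit diagonal:
  ! C := alpha*A*B, written to c_d while b_d is left unchanged.
  istat = cublasCtrmm_v2(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, &
                         CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                         a_d, m, b_d, m, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasCtrmm_v2 status: ', istat

  c = c_d
  ref = alpha * matmul(a, b)
  print *, 'max abs error: ', maxval(abs(c - ref))

  istat = cublasDestroy(h)
end program ctrmm_v2_sketch
```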
ctrsm
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
subroutine ctrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device :: alpha ! device or host variable
subroutine cublasCtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device :: alpha ! device or host variable
integer(4) function cublasCtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device :: alpha ! device or host variable
chemm
CHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices.
subroutine chemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChemm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
cherk
CHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine cherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
real(4), device :: alpha, beta ! device or host variable
subroutine cublasCherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCherk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
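Note that alpha and beta are real(4) here even though A and C are complex. A minimal sketch of the v2 call, assuming the cublas_v2 module and a build such as nvfortran -cuda -cudalib=cublas (program and variable names are illustrative), could look like this:

```fortran
program cherk_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3, k = 2
  complex(4) :: a(n,k), c(n,n), ref(n,n)
  complex(4), device :: a_d(n,k), c_d(n,n)
  real(4) :: alpha, beta             ! real scalars, as in the interface
  real(4) :: err
  type(cublasHandle) :: h
  integer :: istat, i, j

  do j = 1, k
    do i = 1, n
      a(i,j) = cmplx(real(i), real(j))
    end do
  end do
  c = cmplx(0.0, 0.0)
  alpha = 1.0
  beta  = 0.0
  a_d = a; c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! C := alpha*A*A**H + beta*C, updating only the lower triangle of C
  istat = cublasCherk(h, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_N, n, k, &
                      alpha, a_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasCherk status: ', istat

  c = c_d
  ref = alpha * matmul(a, conjg(transpose(a)))

  ! Compare the lower triangle only; the upper triangle is not referenced
  err = 0.0
  do j = 1, n
    do i = j, n
      err = max(err, abs(c(i,j) - ref(i,j)))
    end do
  end do
  print *, 'max abs error in lower triangle: ', err

  istat = cublasDestroy(h)
end program cherk_v2_sketch
```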
cher2k
CHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine cher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
subroutine cublasCher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
integer(4) function cublasCher2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
cherkx
CHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
subroutine cherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a ! device or host variable
complex(4), device, dimension(ldb, *) :: b ! device or host variable
complex(4), device, dimension(ldc, *) :: c ! device or host variable
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
subroutine cublasCherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
integer(4) function cublasCherkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
cublasCgemvBatched
CGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasCgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
complex(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasCgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
complex(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
cublasCgemmBatched
CGEMM performs a batch of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasCgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
complex(4), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
integer(4) function cublasCgemmBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
complex(4), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
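The Aarray, Barray, and Carray arguments are device arrays of device pointers, one pointer per matrix in the batch. One way to build them, sketched here under the assumption that the matrices live in contiguous rank-3 device arrays, is to fill a host array of type(c_devptr) with c_devloc and copy it to the device. The sketch uses the cublas_v2 module and a build such as nvfortran -cuda -cudalib=cublas; all program and variable names are illustrative.

```fortran
program cgemm_batched_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 2, nbatch = 3
  complex(4) :: a(n,n,nbatch), b(n,n,nbatch), c(n,n,nbatch)
  complex(4), device :: a_d(n,n,nbatch), b_d(n,n,nbatch), c_d(n,n,nbatch)
  type(c_devptr) :: aptr(nbatch), bptr(nbatch), cptr(nbatch)
  type(c_devptr), device :: aptr_d(nbatch), bptr_d(nbatch), cptr_d(nbatch)
  complex(4) :: alpha, beta
  real(4) :: err
  type(cublasHandle) :: h
  integer :: istat, i

  do i = 1, nbatch
    a(:,:,i) = cmplx(real(i), 1.0)
    b(:,:,i) = cmplx(1.0, -real(i))
  end do
  c = cmplx(0.0, 0.0)
  alpha = cmplx(1.0, 0.0)
  beta  = cmplx(0.0, 0.0)
  a_d = a; b_d = b; c_d = c

  ! Collect the device address of each matrix, then move the pointer
  ! arrays themselves to the device.
  do i = 1, nbatch
    aptr(i) = c_devloc(a_d(1,1,i))
    bptr(i) = c_devloc(b_d(1,1,i))
    cptr(i) = c_devloc(c_d(1,1,i))
  end do
  aptr_d = aptr; bptr_d = bptr; cptr_d = cptr

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasCgemmBatched(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, alpha, &
                             aptr_d, n, bptr_d, n, beta, cptr_d, n, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasCgemmBatched status: ', istat

  c = c_d
  err = 0.0
  do i = 1, nbatch
    err = max(err, maxval(abs(c(:,:,i) - matmul(a(:,:,i), b(:,:,i)))))
  end do
  print *, 'max abs error over the batch: ', err

  istat = cublasDestroy(h)
end program cgemm_batched_sketch
```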
cublasCgelsBatched
CGELS solves overdetermined or underdetermined complex linear systems involving an M-by-N matrix A, or its conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = ‘N’ and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = ‘N’ and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = ‘C’ and m >= n: find the minimum norm solution of an underdetermined system A**H * X = B.
4. If TRANS = ‘C’ and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**H * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasCgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: info(*)
integer, device :: devinfo(*)
integer :: batchCount
cublasCgeqrfBatched
CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.
integer(4) function cublasCgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
type(cublasHandle) :: h
integer :: m, n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Tau(*)
integer :: info(*)
integer :: batchCount
cublasCgetrfBatched
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm. The batched interface below takes a single dimension n and so operates on a batch of square n by n matrices.
integer(4) function cublasCgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
integer, device :: info(*)
integer :: batchCount
cublasCgetriBatched
CGETRI computes the inverse of a matrix using the LU factorization computed by CGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasCgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Carray(*)
integer :: ldc
integer, device :: info(*)
integer :: batchCount
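A typical use pairs cublasCgetrfBatched with cublasCgetriBatched: factor each matrix in place, then write the inverses to a second set of matrices. The following is a minimal sketch under the assumptions that the cublas_v2 module provides both routines as listed and that the program is built with something like nvfortran -cuda -cudalib=cublas; the matrices, names, and sizes are illustrative.

```fortran
program cgetri_batched_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 2, nbatch = 2
  complex(4) :: a(n,n,nbatch), ainv(n,n,nbatch), prod(n,n)
  complex(4), device :: a_d(n,n,nbatch), ainv_d(n,n,nbatch)
  type(c_devptr) :: aptr(nbatch), cptr(nbatch)
  type(c_devptr), device :: aptr_d(nbatch), cptr_d(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info(nbatch)
  real(4) :: err
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Diagonally dominant test matrices, so each one is invertible
  do i = 1, nbatch
    a(:,:,i) = cmplx(1.0, 0.5)
    a(1,1,i) = cmplx(4.0 + i, 0.0)
    a(2,2,i) = cmplx(5.0 + i, 1.0)
  end do
  a_d = a

  do i = 1, nbatch
    aptr(i) = c_devloc(a_d(1,1,i))
    cptr(i) = c_devloc(ainv_d(1,1,i))
  end do
  aptr_d = aptr; cptr_d = cptr

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! LU factorization in place; pivots and per-matrix status stay on the device
  istat = cublasCgetrfBatched(h, n, aptr_d, n, ipvt_d, info_d, nbatch)
  info = info_d
  if (istat /= CUBLAS_STATUS_SUCCESS .or. any(info /= 0)) print *, 'getrf: ', istat, info

  ! Inverses are written out of place to the matrices addressed by cptr_d
  istat = cublasCgetriBatched(h, n, aptr_d, n, ipvt_d, cptr_d, n, info_d, nbatch)
  info = info_d
  if (istat /= CUBLAS_STATUS_SUCCESS .or. any(info /= 0)) print *, 'getri: ', istat, info

  ainv = ainv_d
  err = 0.0
  do i = 1, nbatch
    prod = matmul(a(:,:,i), ainv(:,:,i))   ! host copy of a is still unfactored
    do j = 1, n
      prod(j,j) = prod(j,j) - cmplx(1.0, 0.0)
    end do
    err = max(err, maxval(abs(prod)))
  end do
  print *, 'max abs deviation of A*inv(A) from identity: ', err

  istat = cublasDestroy(h)
end program cgetri_batched_sketch
```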
cublasCgetrsBatched
CGETRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general N-by-N matrix A using the LU factorization computed by CGETRF.
integer(4) function cublasCgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Barray(*)
integer :: ldb
integer :: info(*)
integer :: batchCount
cublasCmatinvBatched
cublasCmatinvBatched is a shortcut for cublasCgetrfBatched followed by cublasCgetriBatched. It only works if n is less than 32; otherwise, the user has to call cublasCgetrfBatched and cublasCgetriBatched.
integer(4) function cublasCmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Ainv(*)
integer :: lda_inv
integer, device :: info(*)
integer :: batchCount
cublasCtrsmBatched
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasCtrsmBatched( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side ! integer or character(1) variable
integer :: uplo ! integer or character(1) variable
integer :: trans ! integer or character(1) variable
integer :: diag ! integer or character(1) variable
integer :: m, n
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
integer(4) function cublasCtrsmBatched_v2( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side
integer :: uplo
integer :: trans
integer :: diag
integer :: m, n
complex(4), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
cublasCgemvStridedBatched
CGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasCgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
complex(4), device :: alpha ! device or host variable
complex(4), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
complex(4), device :: X(*)
integer :: incx
integer(8) :: strideX
complex(4), device :: beta ! device or host variable
complex(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasCgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
complex(4), device :: alpha ! device or host variable
complex(4), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
complex(4), device :: X(*)
integer :: incx
integer(8) :: strideX
complex(4), device :: beta ! device or host variable
complex(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
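In the strided form, the batch is described by a base array plus an element stride between consecutive problems instead of an array of pointers. The sketch below stores the matrices side by side in a rank-2 device array so that the actual arguments match the ranks in the listed interface, and passes integer(8) strides as declared above. It assumes the cublas_v2 module and a build such as nvfortran -cuda -cudalib=cublas; the names and sizes are illustrative.

```fortran
program cgemv_strided_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 3, n = 2, nbatch = 4
  complex(4) :: a(m,n*nbatch), x(n*nbatch), y(m*nbatch), ref(m)
  complex(4), device :: a_d(m,n*nbatch), x_d(n*nbatch), y_d(m*nbatch)
  complex(4) :: alpha, beta
  integer(8) :: strideA, strideX, strideY
  real(4) :: err
  type(cublasHandle) :: h
  integer :: istat, i, j

  do j = 1, n*nbatch
    do i = 1, m
      a(i,j) = cmplx(real(i), real(j))
    end do
    x(j) = cmplx(1.0, real(j))
  end do
  y = cmplx(0.0, 0.0)
  alpha = cmplx(1.0, 0.0)
  beta  = cmplx(0.0, 0.0)
  a_d = a; x_d = x; y_d = y

  ! Strides are counted in elements between consecutive batch members
  strideA = int(m, 8) * int(n, 8)
  strideX = int(n, 8)
  strideY = int(m, 8)

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasCgemvStridedBatched(h, CUBLAS_OP_N, m, n, alpha, &
            a_d, m, strideA, x_d, 1, strideX, beta, y_d, 1, strideY, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'gemvStridedBatched status: ', istat

  y = y_d
  err = 0.0
  do i = 1, nbatch
    ! Batch member i uses columns (i-1)*n+1 .. i*n of a
    ref = matmul(a(:, (i-1)*n+1:i*n), x((i-1)*n+1:i*n))
    err = max(err, maxval(abs(y((i-1)*m+1:i*m) - ref)))
  end do
  print *, 'max abs error over the batch: ', err

  istat = cublasDestroy(h)
end program cgemv_strided_sketch
```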
cublasCgemmStridedBatched
CGEMM performs a batch of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasCgemmStridedBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
complex(4), device :: alpha ! device or host variable
complex(4), device :: Aarray(*)
integer :: lda
integer :: strideA
complex(4), device :: Barray(*)
integer :: ldb
integer :: strideB
complex(4), device :: beta ! device or host variable
complex(4), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
integer(4) function cublasCgemmStridedBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
complex(4), device :: alpha ! device or host variable
complex(4), device :: Aarray(*)
integer :: lda
integer :: strideA
complex(4), device :: Barray(*)
integer :: ldb
integer :: strideB
complex(4), device :: beta ! device or host variable
complex(4), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
Double Precision Complex Functions and Subroutines
This section contains interfaces to the double precision complex BLAS and cuBLAS functions and subroutines.
izamax
IZAMAX finds the index of the element having the maximum absolute value.
integer(4) function izamax(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIzamax(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIzamax_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
izamin
IZAMIN finds the index of the element having the minimum absolute value.
integer(4) function izamin(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
integer(4) function cublasIzamin(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasIzamin_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
dzasum
DZASUM takes the sum of the absolute values of the real and imaginary parts of the elements of a complex vector.
real(8) function dzasum(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
real(8) function cublasDzasum(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasDzasum_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
zaxpy
ZAXPY computes a constant times a vector plus a vector: y := a*x + y.
subroutine zaxpy(n, a, x, incx, y, incy)
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasZaxpy(n, a, x, incx, y, incy)
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZaxpy_v2(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
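For comparison with the handle-based form, the legacy subroutine interface might be used as in this minimal sketch. It assumes the cublas module (where cublasZaxpy follows the legacy calling convention) and a build such as nvfortran -cuda -cudalib=cublas; the program and variable names are illustrative.

```fortran
program zaxpy_legacy_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 8
  complex(8) :: x(n), y(n), ref(n), a
  complex(8), device :: x_d(n), y_d(n)
  integer :: i

  do i = 1, n
    x(i) = cmplx(i, -i, kind=8)
    y(i) = cmplx(1, i, kind=8)
  end do
  a = cmplx(2.0d0, 1.0d0, kind=8)
  ref = a*x + y                      ! host reference result

  x_d = x; y_d = y

  ! Legacy convention: a subroutine call with no handle and no status
  call cublasZaxpy(n, a, x_d, 1, y_d, 1)

  y = y_d
  print *, 'max abs error: ', maxval(abs(y - ref))
end program zaxpy_legacy_sketch
```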
zcopy
ZCOPY copies a vector, x, to a vector, y.
subroutine zcopy(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasZcopy(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZcopy_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zdotc
ZDOTC forms the dot product of two vectors, conjugating the first: x**H * y.
complex(8) function zdotc(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
complex(8) function cublasZdotc(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZdotc_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
complex(8), device :: res ! device or host variable
zdotu
ZDOTU forms the dot product of two vectors without conjugation: x**T * y.
complex(8) function zdotu(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
complex(8) function cublasZdotu(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZdotu_v2(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
complex(8), device :: res ! device or host variable
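The difference between the conjugated and unconjugated dot products, and the way the v2 functions return their result through the res argument, can be seen in a minimal sketch such as the following. It assumes the cublas_v2 module and a build such as nvfortran -cuda -cudalib=cublas; the host checks use the Fortran intrinsic dot_product, which conjugates its first argument for complex data, and a plain sum of products. Names are illustrative.

```fortran
program zdot_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 5
  complex(8) :: x(n), y(n), resc, resu
  complex(8), device :: x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: istat, i

  do i = 1, n
    x(i) = cmplx(i, 1, kind=8)
    y(i) = cmplx(1, -i, kind=8)
  end do
  x_d = x; y_d = y

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! The function value is the status; the dot product comes back in the
  ! last argument, here a host variable.
  istat = cublasZdotc(h, n, x_d, 1, y_d, 1, resc)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasZdotc status: ', istat
  istat = cublasZdotu(h, n, x_d, 1, y_d, 1, resu)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasZdotu status: ', istat

  print *, 'zdotc error vs dot_product(x, y): ', abs(resc - dot_product(x, y))
  print *, 'zdotu error vs sum(x*y):          ', abs(resu - sum(x*y))

  istat = cublasDestroy(h)
end program zdot_v2_sketch
```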
dznrm2
DZNRM2 returns the Euclidean norm of a vector via the function name, so that DZNRM2 := sqrt( x**H*x ).
real(8) function dznrm2(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
real(8) function cublasDznrm2(n, x, incx)
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasDznrm2_v2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
zrot
ZROT applies a plane rotation, where the cos (sc) is real and the sin (ss) is complex, and the vectors x and y are complex.
subroutine zrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc ! device or host variable
complex(8), device :: ss ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasZrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc ! device or host variable
complex(8), device :: ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc ! device or host variable
complex(8), device :: ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zsrot
ZSROT applies a plane rotation, where the cos and sin (sc and ss) are real and the vectors x and y are complex.
subroutine zsrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc, ss ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasZsrot(n, x, incx, y, incy, sc, ss)
integer :: n
real(8), device :: sc, ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZsrot_v2(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc, ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zrotg
ZROTG determines a double complex Givens rotation.
subroutine zrotg(sa, sb, sc, ss)
complex(8), device :: sa, sb, ss ! device or host variable
real(8), device :: sc ! device or host variable
subroutine cublasZrotg(sa, sb, sc, ss)
complex(8), device :: sa, sb, ss ! device or host variable
real(8), device :: sc ! device or host variable
integer(4) function cublasZrotg_v2(h, sa, sb, sc, ss)
type(cublasHandle) :: h
complex(8), device :: sa, sb, ss ! device or host variable
real(8), device :: sc ! device or host variable
zscal
ZSCAL scales a vector by a constant.
subroutine zscal(n, a, x, incx)
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasZscal(n, a, x, incx)
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasZscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
zdscal
ZDSCAL scales a complex vector by a real constant.
subroutine zdscal(n, a, x, incx)
integer :: n
real(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
integer :: incx
subroutine cublasZdscal(n, a, x, incx)
integer :: n
real(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
integer(4) function cublasZdscal_v2(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
zswap
ZSWAP interchanges two vectors.
subroutine zswap(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y ! device or host variable
integer :: incx, incy
subroutine cublasZswap(n, x, incx, y, incy)
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
integer(4) function cublasZswap_v2(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zgbmv
ZGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine zgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
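The band matrix argument a is not a dense m by n array. The sketch below assumes the conventional BLAS band layout, in which element A(i,j) is stored at a(ku+1+i-j, j), and checks the result against a dense host product. The matrix values and names (full, ab, zgbmv_sketch) are hypothetical and chosen only for illustration.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zgbmv_sketch.cuf
program zgbmv_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4, kl = 1, ku = 1, lda = kl + ku + 1
  complex(8) :: full(n,n), ab(lda,n), x(n), y(n), yref(n), alpha, beta
  complex(8), device :: ab_d(lda,n), x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Dense tridiagonal reference matrix (one sub- and one super-diagonal).
  full = (0.0d0, 0.0d0)
  do i = 1, n
    full(i,i) = (2.0d0, 0.0d0)
    if (i < n) full(i,i+1) = (0.0d0, 1.0d0)
    if (i > 1) full(i,i-1) = (0.0d0, -1.0d0)
  end do

  ! Pack the band: A(i,j) is stored at ab(ku+1+i-j, j).
  ab = (0.0d0, 0.0d0)
  do j = 1, n
    do i = max(1, j-ku), min(n, j+kl)
      ab(ku+1+i-j, j) = full(i,j)
    end do
  end do

  do i = 1, n
    x(i) = cmplx(i, -i, kind=8)
  end do
  y = (1.0d0, 0.0d0)
  alpha = (1.0d0, 1.0d0)
  beta = (0.5d0, 0.0d0)
  yref = alpha*matmul(full, x) + beta*y      ! host reference

  ab_d = ab; x_d = x; y_d = y
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZgbmv_v2(h, CUBLAS_OP_N, n, n, kl, ku, alpha, ab_d, lda, &
                         x_d, 1, beta, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZgbmv_v2 failed'
  y = y_d
  istat = cublasDestroy(h)

  if (maxval(abs(y - yref)) > 1.0d-12) error stop 'zgbmv mismatch'
  print *, 'zgbmv ok'
end program zgbmv_sketch
```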
zgemv
ZGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine zgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: t
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
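The legacy entry point takes a character transpose flag and no handle. The sketch below assumes the cublas module and that the legacy wrappers need no explicit initialization call; the data values and the name zgemv_sketch are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zgemv_sketch.cuf
program zgemv_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 3
  complex(8) :: a(m,n), x(n), y(m), yref(m), alpha, beta
  complex(8), device :: a_d(m,n), x_d(n), y_d(m)
  integer :: i, j

  do j = 1, n
    do i = 1, m
      a(i,j) = cmplx(i, j, kind=8)
    end do
  end do
  x = (1.0d0, 1.0d0)
  y = (0.0d0, 2.0d0)
  alpha = (2.0d0, 0.0d0)
  beta = (0.0d0, 1.0d0)
  yref = alpha*matmul(a, x) + beta*y         ! host reference

  a_d = a; x_d = x; y_d = y
  ! Legacy convention: a subroutine call with a character(1) transpose flag.
  call cublasZgemv('N', m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)
  y = y_d                                    ! device-to-host copy

  if (maxval(abs(y - yref)) > 1.0d-12) error stop 'zgemv mismatch'
  print *, 'zgemv ok'
end program zgemv_sketch
```

The v2 form of the same call would pass a cublasHandle first, use an integer constant such as CUBLAS_OP_N in place of 'N', and return a status value.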
zgerc
ZGERC performs the rank 1 operation A := alpha*x*y**H + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine zgerc(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZgerc(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZgerc_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
zgeru
ZGERU performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine zgeru(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZgeru(m, n, alpha, x, incx, y, incy, a, lda)
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZgeru_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
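ZGERC and ZGERU differ only in whether y is conjugated. The sketch below, which assumes the cublas_v2 module and uses hypothetical data and names, applies both updates to a zero matrix and compares each entry with the corresponding outer product computed on the host.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zger_sketch.cuf
program zger_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 2, n = 2
  complex(8) :: x(m), y(n), ac(m,n), au(m,n), alpha
  complex(8), device :: x_d(m), y_d(n), ac_d(m,n), au_d(m,n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  x(1) = (1.0d0, 1.0d0);  x(2) = (0.0d0, 2.0d0)
  y(1) = (0.0d0, 1.0d0);  y(2) = (3.0d0, -1.0d0)
  alpha = (1.0d0, 0.0d0)
  ac = (0.0d0, 0.0d0)
  au = (0.0d0, 0.0d0)
  x_d = x; y_d = y; ac_d = ac; au_d = au

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZgerc_v2(h, m, n, alpha, x_d, 1, y_d, 1, ac_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZgerc_v2 failed'
  istat = cublasZgeru_v2(h, m, n, alpha, x_d, 1, y_d, 1, au_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZgeru_v2 failed'
  ac = ac_d; au = au_d
  istat = cublasDestroy(h)

  do j = 1, n
    do i = 1, m
      ! gerc uses conjg(y(j)); geru uses y(j) unchanged.
      if (abs(ac(i,j) - alpha*x(i)*conjg(y(j))) > 1.0d-12) error stop 'zgerc mismatch'
      if (abs(au(i,j) - alpha*x(i)*y(j)) > 1.0d-12) error stop 'zgeru mismatch'
    end do
  end do
  print *, 'zgerc/zgeru ok'
end program zger_sketch
```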
zsymv
ZSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine zsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zsyr
ZSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a complex scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine zsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZsyr(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZsyr_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
complex(8), device :: alpha ! device or host variable
zsyr2
ZSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a complex scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine zsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZsyr2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
ztbmv
ZTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine ztbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtbmv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
integer(4) function cublasZtbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztbsv
ZTBSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtbsv(u, t, d, n, k, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
integer(4) function cublasZtbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztpmv
ZTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine ztpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasZtpmv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
integer(4) function cublasZtpmv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
ztpsv
ZTPSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasZtpsv(u, t, d, n, a, x, incx)
character*1 :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
integer(4) function cublasZtpsv_v2(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
ztrmv
ZTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine ztrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtrmv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
integer(4) function cublasZtrmv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztrsv
ZTRSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtrsv(u, t, d, n, a, lda, x, incx)
character*1 :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
integer(4) function cublasZtrsv_v2(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
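In the v2 interface the u, t and d arguments are integers. The sketch below assumes they take the usual cuBLAS enumeration constants exposed by the cublas_v2 module (CUBLAS_FILL_MODE_UPPER, CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT). It solves a small upper triangular system and verifies the residual on the host; the matrix values and names are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas ztrsv_sketch.cuf
program ztrsv_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3
  complex(8) :: a(n,n), b(n), x(n)
  complex(8), device :: a_d(n,n), x_d(n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Upper triangular matrix; the strictly lower part stays zero.
  a = (0.0d0, 0.0d0)
  do j = 1, n
    do i = 1, j
      a(i,j) = cmplx(i + j, i - j, kind=8)   ! diagonal entries are 2*i, so nonzero
    end do
  end do
  b = (1.0d0, 1.0d0)

  a_d = a
  x_d = b                                    ! right-hand side in, solution out

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZtrsv_v2(h, CUBLAS_FILL_MODE_UPPER, CUBLAS_OP_N, &
                         CUBLAS_DIAG_NON_UNIT, n, a_d, n, x_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZtrsv_v2 failed'
  x = x_d
  istat = cublasDestroy(h)

  ! The routine does not test for singularity, so check the residual A*x - b.
  if (maxval(abs(matmul(a, x) - b)) > 1.0d-12) error stop 'ztrsv residual too large'
  print *, 'ztrsv ok'
end program ztrsv_sketch
```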
zhbmv
ZHBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian band matrix, with k super-diagonals.
subroutine zhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: k, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: k, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhbmv_v2(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: k, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zhemv
ZHEMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine zhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(*) :: x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhemv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zhpmv
ZHPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
character*1 :: uplo
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhpmv_v2(h, uplo, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha, beta ! device or host variable
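Packed form stores one triangle of the matrix column by column in a one-dimensional array. The sketch below assumes the standard BLAS upper-packed layout, where A(i,j) with i <= j is stored at ap(i + (j-1)*j/2), and compares the result with a dense host product. The values and names are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zhpmv_sketch.cuf
program zhpmv_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3, np = n*(n+1)/2
  complex(8) :: full(n,n), ap(np), x(n), y(n), yref(n), alpha, beta
  complex(8), device :: ap_d(np), x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  do j = 1, n
    do i = 1, j
      full(i,j) = cmplx(i + j, j - i, kind=8)   ! imaginary part is zero when i == j
      full(j,i) = conjg(full(i,j))              ! mirror to keep the matrix hermitian
      ap(i + (j-1)*j/2) = full(i,j)             ! upper triangle, column by column
    end do
  end do

  do i = 1, n
    x(i) = cmplx(1, i, kind=8)
  end do
  y = (1.0d0, -1.0d0)
  alpha = (1.0d0, 0.0d0)
  beta = (0.0d0, 1.0d0)
  yref = alpha*matmul(full, x) + beta*y         ! host reference

  ap_d = ap; x_d = x; y_d = y
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZhpmv_v2(h, CUBLAS_FILL_MODE_UPPER, n, alpha, ap_d, &
                         x_d, 1, beta, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZhpmv_v2 failed'
  y = y_d
  istat = cublasDestroy(h)

  if (maxval(abs(y - yref)) > 1.0d-12) error stop 'zhpmv mismatch'
  print *, 'zhpmv ok'
end program zhpmv_sketch
```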
zher
ZHER performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix.
subroutine zher(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(8), device, dimension(*) :: a, x ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasZher(t, n, alpha, x, incx, a, lda)
character*1 :: t
integer :: n, incx, lda
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
integer(4) function cublasZher_v2(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
zher2
ZHER2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine zher2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(*) :: a, x, y ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZher2(t, n, alpha, x, incx, y, incy, a, lda)
character*1 :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZher2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
zhpr
ZHPR performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
complex(8), device, dimension(*) :: a, x ! device or host variable
real(8), device :: alpha ! device or host variable
subroutine cublasZhpr(t, n, alpha, x, incx, a)
character*1 :: t
integer :: n, incx
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
integer(4) function cublasZhpr_v2(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
zhpr2
ZHPR2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZhpr2(t, n, alpha, x, incx, y, incy, a)
character*1 :: t
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZhpr2_v2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
zgemm
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
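The leading dimensions refer to the arrays as stored, not to op( A ) or op( B ). The sketch below computes C := alpha*A**H*B + beta*C with the v2 interface, so A is stored as a k by m array and lda is k. It assumes the cublas_v2 module and host-resident scalars; the data and names are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zgemm_sketch.cuf
program zgemm_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 2, n = 3, k = 4
  complex(8) :: a(k,m), b(k,n), c(m,n), cref(m,n), alpha, beta
  complex(8), device :: a_d(k,m), b_d(k,n), c_d(m,n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  do j = 1, m
    do i = 1, k
      a(i,j) = cmplx(i, -j, kind=8)
    end do
  end do
  do j = 1, n
    do i = 1, k
      b(i,j) = cmplx(i - j, 1, kind=8)
    end do
  end do
  c = (1.0d0, 0.0d0)
  alpha = (1.0d0, -1.0d0)
  beta = (0.0d0, 2.0d0)
  cref = alpha*matmul(conjg(transpose(a)), b) + beta*c   ! host reference

  a_d = a; b_d = b; c_d = c
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  ! op(A) = A**H is m by k; A itself is k by m, hence lda = k.
  istat = cublasZgemm_v2(h, CUBLAS_OP_C, CUBLAS_OP_N, m, n, k, alpha, &
                         a_d, k, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZgemm_v2 failed'
  c = c_d
  istat = cublasDestroy(h)

  if (maxval(abs(c - cref)) > 1.0d-12) error stop 'zgemm mismatch'
  print *, 'zgemm ok'
end program zgemm_sketch
```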
zsymm
ZSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine zsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyrk
ZSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine zsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyr2k
ZSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine zsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyrkx
ZSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine zsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
ztrmm
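ZTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The v2 interface takes an additional m by n output matrix C and writes the result there instead of overwriting B, that is C := alpha*op( A )*B, or C := alpha*B*op( A ); passing B as C gives the in-place behavior.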
subroutine ztrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
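The v2 signature above has two more arguments (c, ldc) than the legacy one. The sketch below assumes the out-of-place behavior of the underlying cuBLAS trmm routine, where the product is written to c and b is left untouched, and checks both properties on the host. The data and names are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas ztrmm_sketch.cuf
program ztrmm_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 3, n = 2
  complex(8) :: a(m,m), b(m,n), bchk(m,n), c(m,n), cref(m,n), alpha
  complex(8), device :: a_d(m,m), b_d(m,n), c_d(m,n)
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Upper triangular A; the strictly lower part stays zero.
  a = (0.0d0, 0.0d0)
  do j = 1, m
    do i = 1, j
      a(i,j) = cmplx(i, j, kind=8)
    end do
  end do
  do j = 1, n
    do i = 1, m
      b(i,j) = cmplx(j, -i, kind=8)
    end do
  end do
  c = (0.0d0, 0.0d0)
  alpha = (2.0d0, 1.0d0)
  cref = alpha*matmul(a, b)                  ! host reference

  a_d = a; b_d = b; c_d = c
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZtrmm_v2(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, &
                         CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                         a_d, m, b_d, m, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZtrmm_v2 failed'
  c = c_d
  bchk = b_d
  istat = cublasDestroy(h)

  if (maxval(abs(c - cref)) > 1.0d-12) error stop 'ztrmm mismatch'
  if (any(bchk /= b)) error stop 'b was modified'
  print *, 'ztrmm ok'
end program ztrmm_sketch
```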
ztrsm
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
subroutine ztrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device :: alpha ! device or host variable
subroutine cublasZtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
character*1 :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device :: alpha ! device or host variable
integer(4) function cublasZtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device :: alpha ! device or host variable
zhemm
ZHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices.
subroutine zhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhemm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zherk
ZHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine zherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
real(8), device :: alpha, beta ! device or host variable
subroutine cublasZherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZherk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
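Unlike most of the complex routines, ZHERK takes real(8) scalars, and only the triangle selected by uplo is meaningful on output. The sketch below assumes the cublas_v2 module and compares just the lower triangle with a host reference; the data and names are hypothetical.

```fortran
! Build line is an assumption: nvfortran -cuda -cudalib=cublas zherk_sketch.cuf
program zherk_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3, k = 2
  complex(8) :: a(n,k), c(n,n), cref(n,n)
  complex(8), device :: a_d(n,k), c_d(n,n)
  real(8) :: alpha, beta
  type(cublasHandle) :: h
  integer :: istat, i, j

  do j = 1, k
    do i = 1, n
      a(i,j) = cmplx(i, j - i, kind=8)
    end do
  end do
  ! Start from the identity, which is hermitian.
  c = (0.0d0, 0.0d0)
  do i = 1, n
    c(i,i) = (1.0d0, 0.0d0)
  end do
  alpha = 2.0d0
  beta = 1.0d0
  cref = alpha*matmul(a, conjg(transpose(a))) + beta*c   ! host reference

  a_d = a; c_d = c
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasCreate failed'
  istat = cublasZherk_v2(h, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_N, n, k, &
                         alpha, a_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) error stop 'cublasZherk_v2 failed'
  c = c_d
  istat = cublasDestroy(h)

  ! Compare only the lower triangle (i >= j), which is the part selected by uplo.
  do j = 1, n
    do i = j, n
      if (abs(c(i,j) - cref(i,j)) > 1.0d-12) error stop 'zherk mismatch'
    end do
  end do
  print *, 'zherk ok'
end program zherk_sketch
```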
zher2k
ZHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine zher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
subroutine cublasZher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
integer(4) function cublasZher2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
zherkx
ZHERKX performs a variation of the hermitian rank k operation C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the cuBLAS documentation for more details.
subroutine zherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a ! device or host variable
complex(8), device, dimension(ldb, *) :: b ! device or host variable
complex(8), device, dimension(ldc, *) :: c ! device or host variable
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
subroutine cublasZherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
character*1 :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
integer(4) function cublasZherkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
cublasZgemvBatched
ZGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasZgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
complex(8), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasZgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
complex(8), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
cublasZgemmBatched
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasZgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
complex(8), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
integer(4) function cublasZgemmBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
complex(8), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
cublasZgelsBatched
ZGELS solves overdetermined or underdetermined complex linear systems involving an M-by-N matrix A, or its conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = ‘N’ and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = ‘N’ and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = ‘C’ and m >= n: find the minimum norm solution of an underdetermined system A**H * X = B.
4. If TRANS = ‘C’ and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**H * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasZgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: info(*)
integer, device :: devinfo(*)
integer :: batchCount
cublasZgeqrfBatched
ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.
integer(4) function cublasZgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
type(cublasHandle) :: h
integer :: m, n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Tau(*)
integer :: info(*)
integer :: batchCount
cublasZgetrfBatched
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasZgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
integer, device :: info(*)
integer :: batchCount
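The batched interfaces take arrays of device pointers (type(c_devptr), device) rather than the matrices themselves. The following is a minimal sketch of one way to build such a pointer array for cublasZgetrfBatched, assuming the matrices are stored contiguously in a single device array and that c_devloc from the cudafor module is used to take each matrix's device address. The program name, array names, and test data are illustrative only.

```fortran
program zgetrf_batched_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2, nbatch = 3, lda = n
  complex(8) :: a(lda, n, nbatch)
  complex(8), device :: a_d(lda, n, nbatch)
  type(c_devptr) :: ptrs(nbatch)             ! pointer array staged on the host
  type(c_devptr), device :: ptrs_d(nbatch)   ! pointer array passed to cuBLAS
  integer :: info(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  type(cublasHandle) :: h
  integer :: istat, i, j

  ! Fill each matrix with ones and a larger diagonal so it is nonsingular.
  a = (1.0d0, 0.0d0)
  do i = 1, nbatch
    do j = 1, n
      a(j, j, i) = cmplx(n + i, 0, kind=8)
    end do
  end do
  a_d = a

  ! Collect the device address of each matrix, then copy the list to the device.
  do i = 1, nbatch
    ptrs(i) = c_devloc(a_d(1, 1, i))
  end do
  ptrs_d = ptrs

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Factor all nbatch matrices in place with one call.
  istat = cublasZgetrfBatched(h, n, ptrs_d, lda, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZgetrfBatched failed'

  ! info is a device array here, so copy it back before inspecting it.
  info = info_d
  if (any(info /= 0)) then
    print *, 'factorization reported a problem, info =', info
  else
    print *, 'all factorizations succeeded'
  end if
  istat = cublasDestroy(h)
end program zgetrf_batched_sketch
```

The same pointer-array setup should carry over to the other pointer-based batched routines in this section, such as cublasZgetriBatched and cublasZgemmBatched, with one pointer array per matrix argument.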
cublasZgetriBatched
ZGETRI computes the inverse of a matrix using the LU factorization computed by ZGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasZgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Carray(*)
integer :: ldc
integer, device :: info(*)
integer :: batchCount
cublasZgetrsBatched
ZGETRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general N-by-N matrix A using the LU factorization computed by ZGETRF.
integer(4) function cublasZgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: n, nrhs
type(c_devptr), device :: Aarray(*)
integer :: lda
integer, device :: ipvt(*)
type(c_devptr), device :: Barray(*)
integer :: ldb
integer :: info(*)
integer :: batchCount
cublasZmatinvBatched
cublasZmatinvBatched is a shortcut for cublasZgetrfBatched followed by cublasZgetriBatched. However, it does not work if n is greater than 32. For larger n, the user has to call cublasZgetrfBatched and cublasZgetriBatched.
integer(4) function cublasZmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
type(cublasHandle) :: h
integer :: n
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Ainv(*)
integer :: lda_inv
integer, device :: info(*)
integer :: batchCount
cublasZtrsmBatched
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasZtrsmBatched( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side ! integer or character(1) variable
integer :: uplo ! integer or character(1) variable
integer :: trans ! integer or character(1) variable
integer :: diag ! integer or character(1) variable
integer :: m, n
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
integer(4) function cublasZtrsmBatched_v2( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
type(cublasHandle) :: h
integer :: side
integer :: uplo
integer :: trans
integer :: diag
integer :: m, n
complex(8), device :: alpha ! device or host variable
type(c_devptr), device :: A(*)
integer :: lda
type(c_devptr), device :: B(*)
integer :: ldb
integer :: batchCount
cublasZgemvStridedBatched
ZGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
integer(4) function cublasZgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
complex(8), device :: alpha ! device or host variable
complex(8), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
complex(8), device :: X(*)
integer :: incx
integer(8) :: strideX
complex(8), device :: beta ! device or host variable
complex(8), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasZgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
complex(8), device :: alpha ! device or host variable
complex(8), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
complex(8), device :: X(*)
integer :: incx
integer(8) :: strideX
complex(8), device :: beta ! device or host variable
complex(8), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
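In the strided batched interfaces, every problem in the batch lives in one device array and the stride arguments give the distance, in elements, from one problem to the next. The sketch below illustrates this layout for cublasZgemvStridedBatched under the assumption that the batches are packed back to back with no padding. The array shapes are chosen to match the ranks shown in the interface above; the names and test data are illustrative.

```fortran
program zgemv_strided_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 3, n = 2, nbatch = 4, lda = m
  ! Batches are stored back to back: batch i owns columns (i-1)*n+1 .. i*n of a.
  complex(8) :: a(lda, n*nbatch), x(n*nbatch), y(m*nbatch)
  complex(8), device :: a_d(lda, n*nbatch), x_d(n*nbatch), y_d(m*nbatch)
  complex(8) :: alpha, beta
  integer(8) :: strideA, strideX, strideY
  type(cublasHandle) :: h
  integer :: istat, i
  logical :: ok

  ! Matrix i is filled with the value i; all x vectors are ones.
  do i = 1, nbatch
    a(:, (i-1)*n+1 : i*n) = cmplx(i, 0, kind=8)
  end do
  x = (1.0d0, 0.0d0)
  y = (0.0d0, 0.0d0)
  alpha = (1.0d0, 0.0d0)
  beta = (0.0d0, 0.0d0)
  a_d = a; x_d = x; y_d = y

  ! Strides are counted in elements, not bytes.
  strideA = int(lda, 8) * n
  strideX = n
  strideY = m

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'
  istat = cublasZgemvStridedBatched(h, CUBLAS_OP_N, m, n, alpha, &
            a_d, lda, strideA, x_d, 1, strideX, beta, y_d, 1, strideY, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZgemvStridedBatched failed'
  y = y_d

  ! Each entry of batch i should equal i*n: n terms, each equal to i.
  ok = .true.
  do i = 1, nbatch
    ok = ok .and. all(abs(y((i-1)*m+1 : i*m) - i*n) < 1.0d-12)
  end do
  print *, 'results match expected values:', ok
  istat = cublasDestroy(h)
end program zgemv_strided_sketch
```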
cublasZgemmStridedBatched
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasZgemmStridedBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
complex(8), device :: alpha ! device or host variable
complex(8), device :: Aarray(*)
integer :: lda
integer :: strideA
complex(8), device :: Barray(*)
integer :: ldb
integer :: strideB
complex(8), device :: beta ! device or host variable
complex(8), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
integer(4) function cublasZgemmStridedBatched_v2(h, transa, transb, m, n, k, alpha, &
Aarray, lda, strideA, Barray, ldb, strideB, beta, Carray, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
complex(8), device :: alpha ! device or host variable
complex(8), device :: Aarray(*)
integer :: lda
integer :: strideA
complex(8), device :: Barray(*)
integer :: ldb
integer :: strideB
complex(8), device :: beta ! device or host variable
complex(8), device :: Carray(*)
integer :: ldc
integer :: strideC
integer :: batchCount
Half Precision Functions and Extension Functions
This section contains interfaces to the half precision cuBLAS functions and the BLAS extension functions which allow the user to individually specify the types of the arrays and computation (many or all of which support half precision).
The extension functions can accept one of many supported datatypes. Users should always check the latest cuBLAS documentation for supported combinations. In this document we will use the real(2) datatype since those functions are not otherwise supported by the S, D, C, and Z variants in the libraries. In addition, the user is responsible for properly setting the pointer mode by making calls to cublasSetPointerMode for all extension functions.
The type(cudaDataType) is now common to several of the newer library functions covered in this document. Though some functions will accept an appropriately valued integer, the use of type(cudaDataType) is now recommended going forward.
cublasHgemvBatched
HGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
In the HSH versions, alpha and beta are real(4), and the arrays which are pointed to should all contain real(2) data.
integer(4) function cublasHSHgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasHSHgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
In the HSS versions, alpha and beta are real(4), the arrays pointed to by Aarray and xarray should contain real(2) data, and the arrays pointed to by yarray should contain real(4) data.
integer(4) function cublasHSSgemvBatched(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
integer(4) function cublasHSSgemvBatched_v2(h, trans, m, n, alpha, &
Aarray, lda, xarray, incx, beta, yarray, incy, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: xarray(*)
integer :: incx
real(4), device :: beta ! device or host variable
type(c_devptr), device :: yarray(*)
integer :: incy
integer :: batchCount
cublasHgemvStridedBatched
HGEMV performs a batch of the matrix-vector operations Y := alpha*op( A ) * X + beta*Y, where op( A ) is one of op( A ) = A or op( A ) = A**T, alpha and beta are scalars, A is an m by n matrix, and X and Y are vectors.
In the HSH versions, alpha and beta are real(4), and the arrays A, X, and Y all contain real(2) data.
integer(4) function cublasHSHgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(2), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasHSHgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(2), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
In the HSS versions, alpha and beta are real(4), the A and X arrays contain real(2) data, and the Y array contains real(4) data.
integer(4) function cublasHSSgemvStridedBatched(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans ! integer or character(1) variable
integer :: m, n
real(4), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
integer(4) function cublasHSSgemvStridedBatched_v2(h, trans, m, n, alpha, &
A, lda, strideA, X, incx, strideX, beta, Y, incy, strideY, batchCount)
type(cublasHandle) :: h
integer :: trans
integer :: m, n
real(4), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: X(*)
integer :: incx
integer(8) :: strideX
real(4), device :: beta ! device or host variable
real(4), device :: Y(*)
integer :: incy
integer(8) :: strideY
integer :: batchCount
cublasHgemm
HGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine cublasHgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, &
beta, c, ldc)
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k, lda, ldb, ldc
real(2), device, dimension(lda, *) :: a
real(2), device, dimension(ldb, *) :: b
real(2), device, dimension(ldc, *) :: c
real(2), device :: alpha, beta ! device or host variable
In the v2 version, the user is responsible for setting the pointer mode for the alpha and beta arguments.
integer(4) function cublasHgemm_v2(h, transa, transb, m, n, k, alpha, &
a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(2), device, dimension(lda, *) :: a
real(2), device, dimension(ldb, *) :: b
real(2), device, dimension(ldc, *) :: c
real(2), device :: alpha, beta ! device or host variable
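As an illustration of the legacy-style interface, the following sketch calls cublasHgemm on small real(2) matrices. It assumes a compiler version that supports real(2) assignments and conversions in host code, and a GPU that supports half precision arithmetic. The program name, sizes, and values are illustrative only.

```fortran
program hgemm_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 4, n = 4, k = 4
  real(2) :: a(m, k), b(k, n), c(m, n)
  real(2), device :: a_d(m, k), b_d(k, n), c_d(m, n)
  real(2) :: alpha, beta

  ! Small values that are exactly representable in half precision.
  a = 1.0
  b = 2.0
  c = 0.0
  alpha = 1.0
  beta = 0.0
  a_d = a; b_d = b; c_d = c

  ! Legacy calling convention: a subroutine, no handle, character trans arguments.
  call cublasHgemm('N', 'N', m, n, k, alpha, a_d, m, b_d, k, beta, c_d, m)
  c = c_d

  ! Each entry of C is a sum of k products 1*2, so it should equal 2*k.
  if (all(real(c, 4) == 2.0 * k)) then
    print *, 'hgemm result matches'
  else
    print *, 'hgemm result differs'
  end if
end program hgemm_sketch
```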
cublasHgemmBatched
HGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasHgemmBatched(h, transa, transb, m, n, k, &
alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(2), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(2), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
integer(4) function cublasHgemmBatched_v2(h, transa, transb, m, n, k, &
alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(2), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
integer :: lda
type(c_devptr), device :: Barray(*)
integer :: ldb
real(2), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
integer :: ldc
integer :: batchCount
cublasHgemmStridedBatched
HGEMM performs a set of matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasHgemmStridedBatched(h, transa, transb, m, n, k, &
alpha, A, lda, strideA, B, ldb, strideB, beta, C, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa ! integer or character(1) variable
integer :: transb ! integer or character(1) variable
integer :: m, n, k
real(2), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: B(ldb,*)
integer :: ldb
integer(8) :: strideB
real(2), device :: beta ! device or host variable
real(2), device :: C(ldc,*)
integer :: ldc
integer(8) :: strideC
integer :: batchCount
integer(4) function cublasHgemmStridedBatched_v2(h, transa, transb, m, n, k, &
alpha, A, lda, strideA, B, ldb, strideB, beta, C, ldc, strideC, batchCount)
type(cublasHandle) :: h
integer :: transa
integer :: transb
integer :: m, n, k
real(2), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
integer :: lda
integer(8) :: strideA
real(2), device :: B(ldb,*)
integer :: ldb
integer(8) :: strideB
real(2), device :: beta ! device or host variable
real(2), device :: C(ldc,*)
integer :: ldc
integer(8) :: strideC
integer :: batchCount
cublasIamaxEx
IAMAX finds the index of the element having the maximum absolute value.
integer(4) function cublasIamaxEx(h, n, x, xtype, incx, res)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
integer, device :: res ! device or host variable
cublasIaminEx
IAMIN finds the index of the element having the minimum absolute value.
integer(4) function cublasIaminEx(h, n, x, xtype, incx, res)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
integer, device :: res ! device or host variable
cublasAsumEx
ASUM takes the sum of the absolute values.
integer(4) function cublasAsumEx(h, n, x, xtype, incx, res, &
restype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device :: res ! device or host variable
type(cudaDataType) :: restype
type(cudaDataType) :: extype
cublasAxpyEx
AXPY computes a constant times a vector plus a vector.
integer(4) function cublasAxpyEx(h, n, alpha, alphatype, &
x, xtype, incx, y, ytype, incy, extype)
type(cublasHandle) :: h
integer :: n
real(2), device :: alpha
type(cudaDataType) :: alphatype
real(2), device, dimension(*) :: x
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y
type(cudaDataType) :: ytype
integer :: incy
type(cudaDataType) :: extype
cublasCopyEx
COPY copies a vector, x, to a vector, y.
integer(4) function cublasCopyEx(h, n, x, xtype, incx, &
y, ytype, incy)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y
type(cudaDataType) :: ytype
integer :: incy
cublasDotEx
DOT forms the dot product of two vectors.
integer(4) function cublasDotEx(h, n, x, xtype, incx, &
y, ytype, incy, res, restype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y ! Type and kind as specified by ytype
type(cudaDataType) :: ytype
integer :: incy
real(2), device :: res ! device or host variable
type(cudaDataType) :: restype
type(cudaDataType) :: extype
cublasDotcEx
DOTC forms the conjugated dot product of two vectors.
integer(4) function cublasDotcEx(h, n, x, xtype, incx, &
y, ytype, incy, res, restype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y ! Type and kind as specified by ytype
type(cudaDataType) :: ytype
integer :: incy
real(2), device :: res ! device or host variable
type(cudaDataType) :: restype
type(cudaDataType) :: extype
cublasNrm2Ex
NRM2 produces the euclidean norm of a vector.
integer(4) function cublasNrm2Ex(h, n, x, xtype, incx, res, &
restype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device :: res ! device or host variable
type(cudaDataType) :: restype
type(cudaDataType) :: extype
cublasRotEx
ROT applies a plane rotation.
integer(4) function cublasRotEx(h, n, x, xtype, incx, &
y, ytype, incy, c, s, cstype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y ! Type and kind as specified by ytype
type(cudaDataType) :: ytype
integer :: incy
real(2), device :: c, s ! device or host variable
type(cudaDataType) :: cstype
type(cudaDataType) :: extype
cublasRotgEx
ROTG constructs a Givens plane rotation.
integer(4) function cublasRotgEx(h, a, b, abtype, &
c, s, cstype, extype)
type(cublasHandle) :: h
real(2), device :: a, b ! Type and kind as specified by abtype
type(cudaDataType) :: abtype
real(2), device :: c, s ! device or host variable
type(cudaDataType) :: cstype
type(cudaDataType) :: extype
cublasRotmEx
ROTM applies a modified Givens transformation.
integer(4) function cublasRotmEx(h, n, x, xtype, incx, &
y, ytype, incy, param, paramtype, extype)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x ! Type and kind as specified by xtype
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y ! Type and kind as specified by ytype
type(cudaDataType) :: ytype
integer :: incy
real(2), device, dimension(*) :: param
type(cudaDataType) :: paramtype
type(cudaDataType) :: extype
cublasRotmgEx
ROTMG constructs a modified Givens transformation matrix.
integer(4) function cublasRotmgEx(h, d1, d1type, d2, d2type, &
x1, x1type, y1, y1type, param, paramtype, extype)
type(cublasHandle) :: h
real(2), device :: d1 ! Type and kind as specified by d1type
type(cudaDataType) :: d1type
real(2), device :: d2 ! Type and kind as specified by d2type
type(cudaDataType) :: d2type
real(2), device :: x1 ! Type and kind as specified by x1type
type(cudaDataType) :: x1type
real(2), device :: y1 ! Type and kind as specified by y1type
type(cudaDataType) :: y1type
real(2), device, dimension(*) :: param
type(cudaDataType) :: paramtype
type(cudaDataType) :: extype
cublasScalEx
SCAL scales a vector by a constant.
integer(4) function cublasScalEx(h, n, alpha, alphatype, &
x, xtype, incx, extype)
type(cublasHandle) :: h
integer :: n
real(2), device :: alpha
type(cudaDataType) :: alphatype
real(2), device, dimension(*) :: x
type(cudaDataType) :: xtype
integer :: incx
type(cudaDataType) :: extype
cublasSwapEx
SWAP interchanges two vectors.
integer(4) function cublasSwapEx(h, n, x, xtype, incx, &
y, ytype, incy)
type(cublasHandle) :: h
integer :: n
real(2), device, dimension(*) :: x
type(cudaDataType) :: xtype
integer :: incx
real(2), device, dimension(*) :: y
type(cudaDataType) :: ytype
integer :: incy
cublasGemmEx
GEMM performs the matrix-matrix multiply operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix, and C an m by n matrix.
The data type of alpha and beta mainly follows the computeType argument. See the cuBLAS documentation for data type combinations currently supported.
integer(4) function cublasGemmEx(h, transa, transb, m, n, k, alpha, &
A, atype, lda, B, btype, ldb, beta, C, ctype, ldc, computeType, algo)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k
real(2), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
type(cudaDataType) :: atype
integer :: lda
real(2), device :: B(ldb,*)
type(cudaDataType) :: btype
integer :: ldb
real(2), device :: beta ! device or host variable
real(2), device :: C(ldc,*)
type(cudaDataType) :: ctype
integer :: ldc
type(cublasComputeType) :: computeType ! also accept integer
type(cublasGemmAlgoType) :: algo ! also accept integer
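The sketch below shows how the type tags and the compute type might be passed to cublasGemmEx when every operand is real(2). It assumes the cublas module exports the constants CUDA_R_16F, CUBLAS_COMPUTE_16F, and CUBLAS_GEMM_DEFAULT under the same names as the cuBLAS C API, and that host-side real(2) conversions are available. Because alpha and beta are host variables here, the pointer mode is set to host explicitly, as required for the extension functions.

```fortran
program gemmex_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 4, n = 4, k = 4
  real(2) :: a(m, k), b(k, n), c(m, n)
  real(2), device :: a_d(m, k), b_d(k, n), c_d(m, n)
  real(2) :: alpha, beta
  type(cublasHandle) :: h
  integer :: istat

  a = 1.0
  b = 2.0
  c = 0.0
  alpha = 1.0
  beta = 0.0
  a_d = a; b_d = b; c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! alpha and beta live on the host, so say so before calling an Ex routine.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_HOST)

  ! Every array is tagged with its storage type; with a 16-bit compute type,
  ! alpha and beta are half precision as well.
  istat = cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, alpha, &
            a_d, CUDA_R_16F, m, b_d, CUDA_R_16F, k, beta, &
            c_d, CUDA_R_16F, m, CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasGemmEx failed'
  c = c_d

  ! Each entry of C is a sum of k products 1*2, so it should equal 2*k.
  print *, 'gemmEx result matches:', all(real(c, 4) == 2.0 * k)
  istat = cublasDestroy(h)
end program gemmex_sketch
```

Other type and compute type combinations change the expected kind of alpha and beta, so check the cuBLAS documentation before adapting this sketch.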
cublasGemmBatchedEx
GEMM performs a batch of matrix-matrix multiply operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix, and C an m by n matrix.
The data type of alpha and beta mainly follows the computeType argument. See the cuBLAS documentation for data type combinations currently supported.
integer(4) function cublasGemmBatchedEx(h, transa, transb, m, n, k, &
alpha, Aarray, atype, lda, Barray, btype, ldb, beta, &
Carray, ctype, ldc, batchCount, computeType, algo)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k
real(2), device :: alpha ! device or host variable
type(c_devptr), device :: Aarray(*)
type(cudaDataType) :: atype
integer :: lda
type(c_devptr), device :: Barray(*)
type(cudaDataType) :: btype
integer :: ldb
real(2), device :: beta ! device or host variable
type(c_devptr), device :: Carray(*)
type(cudaDataType) :: ctype
integer :: ldc
integer :: batchCount
type(cublasComputeType) :: computeType ! also accept integer
type(cublasGemmAlgoType) :: algo ! also accept integer
cublasGemmStridedBatchedEx
GEMM performs a batch of matrix-matrix multiply operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix, and C an m by n matrix.
The data type of alpha and beta mainly follows the computeType argument. See the cuBLAS documentation for data type combinations currently supported.
integer(4) function cublasGemmStridedBatchedEx(h, transa, transb, m, n, k, &
alpha, A, atype, lda, strideA, B, btype, ldb, strideB, beta, &
C, ctype, ldc, strideC, batchCount, computeType, algo)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k
real(2), device :: alpha ! device or host variable
real(2), device :: A(lda,*)
type(cudaDataType) :: atype
integer :: lda
integer(8) :: strideA
real(2), device :: B(ldb,*)
type(cudaDataType) :: btype
integer :: ldb
integer(8) :: strideB
real(2), device :: beta ! device or host variable
real(2), device :: C(ldc,*)
type(cudaDataType) :: ctype
integer :: ldc
integer(8) :: strideC
integer :: batchCount
type(cublasComputeType) :: computeType ! also accept integer
type(cublasGemmAlgoType) :: algo ! also accept integer
CUBLAS V2 Module Functions
This section contains interfaces to the cuBLAS V2 Module Functions. Users can access this module by inserting the line use cublas_v2 into the program unit. One major difference between the cublas_v2 module and the cublas module is that the cublas entry points, such as cublasIsamax, are changed to take the handle as the first argument. The second difference is that the v2 entry points in the cublas_v2 module, such as cublasIsamax_v2, do not implicitly handle the pointer modes for the user. It is up to the programmer to make calls to cublasSetPointerMode to tell the library whether scalar arguments reside on the host or device. The actual interfaces to the v2 entry points do not change, and are not listed in this section.
Single Precision Functions and Subroutines
This section contains the V2 interfaces to the single precision BLAS and cuBLAS functions and subroutines.
isamax
If you use the cublas_v2 module, the interface for cublasIsamax is changed to the following:
integer(4) function cublasIsamax(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
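The two calling styles available through the cublas_v2 module can be contrasted with a short sketch. The first call uses the cublasIsamax name, where the result may be a host variable. The second uses the cublasIsamax_v2 entry point with a device result, so the pointer mode is set explicitly beforehand. The program name, variable names, and test data are illustrative assumptions.

```fortran
program isamax_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 8
  real(4) :: x(n)
  real(4), device :: x_d(n)
  integer :: res_h, res_copy
  integer, device :: res_d
  type(cublasHandle) :: h
  integer :: istat

  ! The element with the largest magnitude sits at position 5.
  x = 1.0
  x(5) = -7.0
  x_d = x

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! cublas name from the cublas_v2 module: handle first, status returned,
  ! and the result is delivered to a host variable.
  istat = cublasIsamax(h, n, x_d, 1, res_h)

  ! Explicit v2 entry point: the caller declares where the result lives.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE)
  istat = cublasIsamax_v2(h, n, x_d, 1, res_d)
  res_copy = res_d
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_HOST)

  ! cuBLAS reports 1-based indices, so both results should be 5.
  print *, 'host result:', res_h, ' device result:', res_copy
  istat = cublasDestroy(h)
end program isamax_v2_sketch
```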
isamin
If you use the cublas_v2 module, the interface for cublasIsamin is changed to the following:
integer(4) function cublasIsamin(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
sasum
If you use the cublas_v2 module, the interface for cublasSasum is changed to the following:
integer(4) function cublasSasum(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
saxpy
If you use the cublas_v2 module, the interface for cublasSaxpy is changed to the following:
integer(4) function cublasSaxpy(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
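As a minimal sketch of the function form of cublasSaxpy, the following program computes y = a*x + y on the device and checks the returned status. The program name, array sizes, and values are assumptions chosen for illustration.

```fortran
program saxpy_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 1024
  type(cublasHandle) :: h
  real(4), device :: x_d(n), y_d(n)
  real(4) :: y(n), a
  integer :: istat

  x_d = 1.0                  ! fill device arrays with constants
  y_d = 2.0
  a = 3.0                    ! host scalar; a device scalar is also accepted

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasSaxpy(h, n, a, x_d, 1, y_d, 1)   ! y = a*x + y
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSaxpy status: ', istat

  y = y_d                    ! copy the result back to the host
  print *, 'max abs deviation from 5.0: ', maxval(abs(y - 5.0))

  istat = cublasDestroy(h)
end program saxpy_v2_sketch
```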
scopy
If you use the cublas_v2 module, the interface for cublasScopy is changed to the following:
integer(4) function cublasScopy(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
sdot
If you use the cublas_v2 module, the interface for cublasSdot is changed to the following:
integer(4) function cublasSdot(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
real(4), device :: res ! device or host variable
snrm2
If you use the cublas_v2 module, the interface for cublasSnrm2 is changed to the following:
integer(4) function cublasSnrm2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
srot
If you use the cublas_v2 module, the interface for cublasSrot is changed to the following:
integer(4) function cublasSrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc, ss ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
srotg
If you use the cublas_v2 module, the interface for cublasSrotg is changed to the following:
integer(4) function cublasSrotg(h, sa, sb, sc, ss)
type(cublasHandle) :: h
real(4), device :: sa, sb, sc, ss ! device or host variable
srotm
If you use the cublas_v2 module, the interface for cublasSrotm is changed to the following:
integer(4) function cublasSrotm(h, n, x, incx, y, incy, param)
type(cublasHandle) :: h
integer :: n
real(4), device :: param(*) ! device or host variable
real(4), device, dimension(*) :: x, y
integer :: incx, incy
srotmg
If you use the cublas_v2 module, the interface for cublasSrotmg is changed to the following:
integer(4) function cublasSrotmg(h, d1, d2, x1, y1, param)
type(cublasHandle) :: h
real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
sscal
If you use the cublas_v2 module, the interface for cublasSscal is changed to the following:
integer(4) function cublasSscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
real(4), device, dimension(*) :: x
integer :: incx
sswap
If you use the cublas_v2 module, the interface for cublasSswap is changed to the following:
integer(4) function cublasSswap(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(4), device, dimension(*) :: x, y
integer :: incx, incy
sgbmv
If you use the cublas_v2 module, the interface for cublasSgbmv is changed to the following:
integer(4) function cublasSgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
sgemv
If you use the cublas_v2 module, the interface for cublasSgemv is changed to the following:
integer(4) function cublasSgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
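The sketch below calls cublasSgemv to form y = alpha*A*x + beta*y and compares the result with the host intrinsic matmul. The sizes and names are illustrative assumptions; the t argument is passed as CUBLAS_OP_N so that A is used untransposed.

```fortran
program sgemv_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 4, n = 3
  type(cublasHandle) :: h
  real(4) :: a(m,n), x(n), y(m), alpha, beta
  real(4), device :: a_d(m,n), x_d(n), y_d(m)
  integer :: istat

  call random_number(a)
  call random_number(x)
  y = 0.0
  a_d = a                    ! host-to-device copies
  x_d = x
  y_d = y
  alpha = 1.0
  beta = 0.0

  istat = cublasCreate(h)
  ! A is stored m-by-n in column-major order, so lda = m.
  istat = cublasSgemv(h, CUBLAS_OP_N, m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSgemv status: ', istat

  y = y_d
  print *, 'max abs difference from matmul: ', maxval(abs(y - matmul(a, x)))
  istat = cublasDestroy(h)
end program sgemv_v2_sketch
```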
sger
If you use the cublas_v2 module, the interface for cublasSger is changed to the following:
integer(4) function cublasSger(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
ssbmv
If you use the cublas_v2 module, the interface for cublasSsbmv is changed to the following:
integer(4) function cublasSsbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: k, n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
sspmv
If you use the cublas_v2 module, the interface for cublasSspmv is changed to the following:
integer(4) function cublasSspmv(h, t, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha, beta ! device or host variable
sspr
If you use the cublas_v2 module, the interface for cublasSspr is changed to the following:
integer(4) function cublasSspr(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
real(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
sspr2
If you use the cublas_v2 module, the interface for cublasSspr2 is changed to the following:
integer(4) function cublasSspr2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(4), device, dimension(*) :: a, x, y
real(4), device :: alpha ! device or host variable
ssymv
If you use the cublas_v2 module, the interface for cublasSsymv is changed to the following:
integer(4) function cublasSsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha, beta ! device or host variable
ssyr
If you use the cublas_v2 module, the interface for cublasSsyr is changed to the following:
integer(4) function cublasSsyr(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
real(4), device :: alpha ! device or host variable
ssyr2
If you use the cublas_v2 module, the interface for cublasSsyr2 is changed to the following:
integer(4) function cublasSsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x, y
real(4), device :: alpha ! device or host variable
stbmv
If you use the cublas_v2 module, the interface for cublasStbmv is changed to the following:
integer(4) function cublasStbmv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
stbsv
If you use the cublas_v2 module, the interface for cublasStbsv is changed to the following:
integer(4) function cublasStbsv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
stpmv
If you use the cublas_v2 module, the interface for cublasStpmv is changed to the following:
integer(4) function cublasStpmv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
stpsv
If you use the cublas_v2 module, the interface for cublasStpsv is changed to the following:
integer(4) function cublasStpsv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(4), device, dimension(*) :: a, x
strmv
If you use the cublas_v2 module, the interface for cublasStrmv is changed to the following:
integer(4) function cublasStrmv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
strsv
If you use the cublas_v2 module, the interface for cublasStrsv is changed to the following:
integer(4) function cublasStrsv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(*) :: x
sgemm
If you use the cublas_v2 module, the interface for cublasSgemm is changed to the following:
integer(4) function cublasSgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
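A hedged usage sketch for cublasSgemm follows. It multiplies two small random matrices on the device and compares against matmul on the host; the dimensions, names, and the choice of host-resident alpha and beta are assumptions made for the example.

```fortran
program sgemm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 8, n = 6, k = 5
  type(cublasHandle) :: h
  real(4) :: a(m,k), b(k,n), c(m,n), alpha, beta
  real(4), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  integer :: istat

  call random_number(a)
  call random_number(b)
  a_d = a
  b_d = b
  c_d = 0.0
  alpha = 1.0
  beta = 0.0

  istat = cublasCreate(h)
  ! C = alpha*A*B + beta*C; leading dimensions equal the row counts.
  istat = cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                      alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSgemm status: ', istat

  c = c_d
  print *, 'max abs difference from matmul: ', maxval(abs(c - matmul(a, b)))
  istat = cublasDestroy(h)
end program sgemm_v2_sketch
```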
ssymm
If you use the cublas_v2 module, the interface for cublasSsymm is changed to the following:
integer(4) function cublasSsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyrk
If you use the cublas_v2 module, the interface for cublasSsyrk is changed to the following:
integer(4) function cublasSsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyr2k
If you use the cublas_v2 module, the interface for cublasSsyr2k is changed to the following:
integer(4) function cublasSsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
ssyrkx
If you use the cublas_v2 module, the interface for cublasSsyrkx is changed to the following:
integer(4) function cublasSsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
strmm
If you use the cublas_v2 module, the interface for cublasStrmm is changed to the following:
integer(4) function cublasStrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device, dimension(ldc, *) :: c
real(4), device :: alpha ! device or host variable
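Unlike the classic BLAS routine, this interface carries separate c and ldc arguments, so the product can be written to a different array than b. The following sketch, with illustrative names and sizes, computes C = alpha*A*B for a lower-triangular A; the strict upper triangle of the host copy is zeroed only so that the matmul reference matches what the library reads.

```fortran
program strmm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 5, n = 3
  type(cublasHandle) :: h
  real(4) :: a(m,m), b(m,n), c(m,n), alpha
  real(4), device :: a_d(m,m), b_d(m,n), c_d(m,n)
  integer :: istat, j

  call random_number(a)
  call random_number(b)
  do j = 2, m
    a(1:j-1, j) = 0.0        ! zero the strict upper triangle of A
  end do
  a_d = a
  b_d = b
  c_d = 0.0
  alpha = 1.0

  istat = cublasCreate(h)
  ! C = alpha*A*B with A lower triangular, applied from the left.
  istat = cublasStrmm(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER, &
                      CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                      a_d, m, b_d, m, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasStrmm status: ', istat

  c = c_d
  print *, 'max abs difference from matmul: ', maxval(abs(c - matmul(a, b)))
  istat = cublasDestroy(h)
end program strmm_v2_sketch
```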
strsm
If you use the cublas_v2 module, the interface for cublasStrsm is changed to the following:
integer(4) function cublasStrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(4), device, dimension(lda, *) :: a
real(4), device, dimension(ldb, *) :: b
real(4), device :: alpha ! device or host variable
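As an illustration of cublasStrsm, the sketch below solves A*X = alpha*B in place, overwriting b with the solution. The test matrix is an assumption chosen so the answer is easy to check: A is lower triangular with ones on and below the diagonal, and row i of B holds the value i, which makes every entry of X equal to one.

```fortran
program strsm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 6, n = 2
  type(cublasHandle) :: h
  real(4) :: a(m,m), b(m,n), alpha
  real(4), device :: a_d(m,m), b_d(m,n)
  integer :: istat, i, j

  a = 0.0
  do j = 1, m
    a(j:m, j) = 1.0          ! ones on and below the diagonal
  end do
  do i = 1, m
    b(i, :) = real(i)        ! row i of B equals i
  end do
  a_d = a
  b_d = b
  alpha = 1.0

  istat = cublasCreate(h)
  ! Solve A*X = alpha*B; the solution X overwrites b_d.
  istat = cublasStrsm(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER, &
                      CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                      a_d, m, b_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasStrsm status: ', istat

  b = b_d
  print *, 'max abs deviation from 1.0: ', maxval(abs(b - 1.0))
  istat = cublasDestroy(h)
end program strsm_v2_sketch
```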
Double Precision Functions and Subroutines
This section contains the V2 interfaces to the double precision BLAS and cuBLAS functions and subroutines.
idamax
If you use the cublas_v2 module, the interface for cublasIdamax is changed to the following:
integer(4) function cublasIdamax(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
idamin
If you use the cublas_v2 module, the interface for cublasIdamin is changed to the following:
integer(4) function cublasIdamin(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
dasum
If you use the cublas_v2 module, the interface for cublasDasum is changed to the following:
integer(4) function cublasDasum(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
daxpy
If you use the cublas_v2 module, the interface for cublasDaxpy is changed to the following:
integer(4) function cublasDaxpy(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
dcopy
If you use the cublas_v2 module, the interface for cublasDcopy is changed to the following:
integer(4) function cublasDcopy(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
ddot
If you use the cublas_v2 module, the interface for cublasDdot is changed to the following:
integer(4) function cublasDdot(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
real(8), device :: res ! device or host variable
dnrm2
If you use the cublas_v2 module, the interface for cublasDnrm2 is changed to the following:
integer(4) function cublasDnrm2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
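The "device or host variable" comment on res can be illustrated with a short sketch that calls cublasDnrm2 twice, once with a host result and once with a device result. The names and the constant test vector are assumptions; with four elements equal to 2, the Euclidean norm is sqrt(4*2**2) = 4.

```fortran
program dnrm2_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4
  type(cublasHandle) :: h
  real(8), device :: x_d(n), res_d
  real(8) :: res_host, res_from_device
  integer :: istat

  x_d = 2.0d0

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Result written to a host variable.
  istat = cublasDnrm2(h, n, x_d, 1, res_host)

  ! Result written to a device variable, then copied back explicitly.
  istat = cublasDnrm2(h, n, x_d, 1, res_d)
  res_from_device = res_d

  print *, 'norm (host result):   ', res_host
  print *, 'norm (device result): ', res_from_device

  istat = cublasDestroy(h)
end program dnrm2_v2_sketch
```

Keeping the result on the device can avoid a host synchronization when the value is consumed by a later device operation.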
drot
If you use the cublas_v2 module, the interface for cublasDrot is changed to the following:
integer(4) function cublasDrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc, ss ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
drotg
If you use the cublas_v2 module, the interface for cublasDrotg is changed to the following:
integer(4) function cublasDrotg(h, sa, sb, sc, ss)
type(cublasHandle) :: h
real(8), device :: sa, sb, sc, ss ! device or host variable
drotm
If you use the cublas_v2 module, the interface for cublasDrotm is changed to the following:
integer(4) function cublasDrotm(h, n, x, incx, y, incy, param)
type(cublasHandle) :: h
integer :: n
real(8), device :: param(*) ! device or host variable
real(8), device, dimension(*) :: x, y
integer :: incx, incy
drotmg
If you use the cublas_v2 module, the interface for cublasDrotmg is changed to the following:
integer(4) function cublasDrotmg(h, d1, d2, x1, y1, param)
type(cublasHandle) :: h
real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
dscal
If you use the cublas_v2 module, the interface for cublasDscal is changed to the following:
integer(4) function cublasDscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
real(8), device, dimension(*) :: x
integer :: incx
dswap
If you use the cublas_v2 module, the interface for cublasDswap is changed to the following:
integer(4) function cublasDswap(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
real(8), device, dimension(*) :: x, y
integer :: incx, incy
dgbmv
If you use the cublas_v2 module, the interface for cublasDgbmv is changed to the following:
integer(4) function cublasDgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dgemv
If you use the cublas_v2 module, the interface for cublasDgemv is changed to the following:
integer(4) function cublasDgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dger
If you use the cublas_v2 module, the interface for cublasDger is changed to the following:
integer(4) function cublasDger(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
dsbmv
If you use the cublas_v2 module, the interface for cublasDsbmv is changed to the following:
integer(4) function cublasDsbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: k, n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dspmv
If you use the cublas_v2 module, the interface for cublasDspmv is changed to the following:
integer(4) function cublasDspmv(h, t, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha, beta ! device or host variable
dspr
If you use the cublas_v2 module, the interface for cublasDspr is changed to the following:
integer(4) function cublasDspr(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
real(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
dspr2
If you use the cublas_v2 module, the interface for cublasDspr2 is changed to the following:
integer(4) function cublasDspr2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
real(8), device, dimension(*) :: a, x, y
real(8), device :: alpha ! device or host variable
dsymv
If you use the cublas_v2 module, the interface for cublasDsymv is changed to the following:
integer(4) function cublasDsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha, beta ! device or host variable
dsyr
If you use the cublas_v2 module, the interface for cublasDsyr is changed to the following:
integer(4) function cublasDsyr(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
real(8), device :: alpha ! device or host variable
dsyr2
If you use the cublas_v2 module, the interface for cublasDsyr2 is changed to the following:
integer(4) function cublasDsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x, y
real(8), device :: alpha ! device or host variable
dtbmv
If you use the cublas_v2 module, the interface for cublasDtbmv is changed to the following:
integer(4) function cublasDtbmv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtbsv
If you use the cublas_v2 module, the interface for cublasDtbsv is changed to the following:
integer(4) function cublasDtbsv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtpmv
If you use the cublas_v2 module, the interface for cublasDtpmv is changed to the following:
integer(4) function cublasDtpmv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
dtpsv
If you use the cublas_v2 module, the interface for cublasDtpsv is changed to the following:
integer(4) function cublasDtpsv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
real(8), device, dimension(*) :: a, x
dtrmv
If you use the cublas_v2 module, the interface for cublasDtrmv is changed to the following:
integer(4) function cublasDtrmv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dtrsv
If you use the cublas_v2 module, the interface for cublasDtrsv is changed to the following:
integer(4) function cublasDtrsv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(*) :: x
dgemm
If you use the cublas_v2 module, the interface for cublasDgemm is changed to the following:
integer(4) function cublasDgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
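The transa and transb arguments select whether each operand is used as stored or transposed. A minimal sketch, with assumed names and sizes, forms C = transpose(A)*B using CUBLAS_OP_T; note that A is then stored k-by-m, so its leading dimension is k.

```fortran
program dgemm_trans_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 4, n = 3, k = 7
  type(cublasHandle) :: h
  real(8) :: a(k,m), b(k,n), c(m,n), alpha, beta
  real(8), device :: a_d(k,m), b_d(k,n), c_d(m,n)
  integer :: istat

  call random_number(a)
  call random_number(b)
  a_d = a
  b_d = b
  c_d = 0.0d0
  alpha = 1.0d0
  beta = 0.0d0

  istat = cublasCreate(h)
  ! C (m-by-n) = alpha * transpose(A) * B + beta * C
  istat = cublasDgemm(h, CUBLAS_OP_T, CUBLAS_OP_N, m, n, k, &
                      alpha, a_d, k, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasDgemm status: ', istat

  c = c_d
  print *, 'max abs difference from matmul: ', &
           maxval(abs(c - matmul(transpose(a), b)))
  istat = cublasDestroy(h)
end program dgemm_trans_v2_sketch
```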
dsymm
If you use the cublas_v2 module, the interface for cublasDsymm is changed to the following:
integer(4) function cublasDsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyrk
If you use the cublas_v2 module, the interface for cublasDsyrk is changed to the following:
integer(4) function cublasDsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyr2k
If you use the cublas_v2 module, the interface for cublasDsyr2k is changed to the following:
integer(4) function cublasDsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dsyrkx
If you use the cublas_v2 module, the interface for cublasDsyrkx is changed to the following:
integer(4) function cublasDsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
dtrmm
If you use the cublas_v2 module, the interface for cublasDtrmm is changed to the following:
integer(4) function cublasDtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device, dimension(ldc, *) :: c
real(8), device :: alpha ! device or host variable
dtrsm
If you use the cublas_v2 module, the interface for cublasDtrsm is changed to the following:
integer(4) function cublasDtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
real(8), device, dimension(lda, *) :: a
real(8), device, dimension(ldb, *) :: b
real(8), device :: alpha ! device or host variable
Single Precision Complex Functions and Subroutines
This section contains the V2 interfaces to the single precision complex BLAS and cuBLAS functions and subroutines.
icamax
If you use the cublas_v2 module, the interface for cublasIcamax is changed to the following:
integer(4) function cublasIcamax(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
icamin
If you use the cublas_v2 module, the interface for cublasIcamin is changed to the following:
integer(4) function cublasIcamin(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
scasum
If you use the cublas_v2 module, the interface for cublasScasum is changed to the following:
integer(4) function cublasScasum(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
caxpy
If you use the cublas_v2 module, the interface for cublasCaxpy is changed to the following:
integer(4) function cublasCaxpy(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
ccopy
If you use the cublas_v2 module, the interface for cublasCcopy is changed to the following:
integer(4) function cublasCcopy(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
cdotc
If you use the cublas_v2 module, the interface for cublasCdotc is changed to the following:
integer(4) function cublasCdotc(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
complex(4), device :: res ! device or host variable
cdotu
If you use the cublas_v2 module, the interface for cublasCdotu is changed to the following:
integer(4) function cublasCdotu(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
complex(4), device :: res ! device or host variable
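The difference between cublasCdotc and cublasCdotu is that the former conjugates x before forming the product. The sketch below uses illustrative names and fills both vectors with the imaginary unit, so mathematically the conjugated product sums to (n, 0) and the unconjugated product to (-n, 0).

```fortran
program cdot_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 8
  type(cublasHandle) :: h
  complex(4), device :: x_d(n), y_d(n)
  complex(4) :: resc, resu
  integer :: istat

  x_d = cmplx(0.0, 1.0)      ! every element is the imaginary unit
  y_d = cmplx(0.0, 1.0)

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasCdotc(h, n, x_d, 1, y_d, 1, resc)   ! sum of conjg(x)*y
  istat = cublasCdotu(h, n, x_d, 1, y_d, 1, resu)   ! sum of x*y

  print *, 'cdotc result: ', resc
  print *, 'cdotu result: ', resu

  istat = cublasDestroy(h)
end program cdot_v2_sketch
```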
scnrm2
If you use the cublas_v2 module, the interface for cublasScnrm2 is changed to the following:
integer(4) function cublasScnrm2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x
integer :: incx
real(4), device :: res ! device or host variable
crot
If you use the cublas_v2 module, the interface for cublasCrot is changed to the following:
integer(4) function cublasCrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc ! device or host variable
complex(4), device :: ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
csrot
If you use the cublas_v2 module, the interface for cublasCsrot is changed to the following:
integer(4) function cublasCsrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(4), device :: sc, ss ! device or host variable
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
crotg
If you use the cublas_v2 module, the interface for cublasCrotg is changed to the following:
integer(4) function cublasCrotg(h, sa, sb, sc, ss)
type(cublasHandle) :: h
complex(4), device :: sa, sb, ss ! device or host variable
real(4), device :: sc ! device or host variable
cscal
If you use the cublas_v2 module, the interface for cublasCscal is changed to the following:
integer(4) function cublasCscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
complex(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
csscal
If you use the cublas_v2 module, the interface for cublasCsscal is changed to the following:
integer(4) function cublasCsscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(4), device :: a ! device or host variable
complex(4), device, dimension(*) :: x
integer :: incx
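cublasCscal and cublasCsscal differ only in the type of the scale factor: complex(4) for the former, real(4) for the latter. A brief sketch with assumed names applies both in sequence; starting from (1,2), scaling by the real value 2 gives (2,4), and scaling that by the imaginary unit gives (-4,2).

```fortran
program cscal_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4
  type(cublasHandle) :: h
  complex(4), device :: x_d(n)
  complex(4) :: x(n), ca
  real(4) :: ra
  integer :: istat

  x_d = cmplx(1.0, 2.0)
  ra = 2.0                   ! real scale factor for cublasCsscal
  ca = cmplx(0.0, 1.0)       ! complex scale factor for cublasCscal

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasCsscal(h, n, ra, x_d, 1)   ! x = ra * x
  istat = cublasCscal(h, n, ca, x_d, 1)    ! x = ca * x

  x = x_d
  print *, 'first element after both scalings: ', x(1)

  istat = cublasDestroy(h)
end program cscal_v2_sketch
```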
cswap
If you use the cublas_v2 module, the interface for cublasCswap is changed to the following:
integer(4) function cublasCswap(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(4), device, dimension(*) :: x, y
integer :: incx, incy
cgbmv
If you use the cublas_v2 module, the interface for cublasCgbmv is changed to the following:
integer(4) function cublasCgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
cgemv
If you use the cublas_v2 module, the interface for cublasCgemv is changed to the following:
integer(4) function cublasCgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
cgerc
If you use the cublas_v2 module, the interface for cublasCgerc is changed to the following:
integer(4) function cublasCgerc(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
cgeru
If you use the cublas_v2 module, the interface for cublasCgeru is changed to the following:
integer(4) function cublasCgeru(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
csymv
If you use the cublas_v2 module, the interface for cublasCsymv is changed to the following:
integer(4) function cublasCsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
csyr
If you use the cublas_v2 module, the interface for cublasCsyr is changed to the following:
integer(4) function cublasCsyr(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
complex(4), device :: alpha ! device or host variable
csyr2
If you use the cublas_v2 module, the interface for cublasCsyr2 is changed to the following:
integer(4) function cublasCsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha ! device or host variable
ctbmv
If you use the cublas_v2 module, the interface for cublasCtbmv is changed to the following:
integer(4) function cublasCtbmv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
ctbsv
If you use the cublas_v2 module, the interface for cublasCtbsv is changed to the following:
integer(4) function cublasCtbsv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
ctpmv
If you use the cublas_v2 module, the interface for cublasCtpmv is changed to the following:
integer(4) function cublasCtpmv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
ctpsv
If you use the cublas_v2 module, the interface for cublasCtpsv is changed to the following:
integer(4) function cublasCtpsv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(4), device, dimension(*) :: a, x
ctrmv
If you use the cublas_v2 module, the interface for cublasCtrmv is changed to the following:
integer(4) function cublasCtrmv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
ctrsv
If you use the cublas_v2 module, the interface for cublasCtrsv is changed to the following:
integer(4) function cublasCtrsv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x
chbmv
If you use the cublas_v2 module, the interface for cublasChbmv is changed to the following:
integer(4) function cublasChbmv(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: k, n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
chemv
If you use the cublas_v2 module, the interface for cublasChemv is changed to the following:
integer(4) function cublasChemv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(*) :: x, y
complex(4), device :: alpha, beta ! device or host variable
chpmv
If you use the cublas_v2 module, the interface for cublasChpmv is changed to the following:
integer(4) function cublasChpmv(h, uplo, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha, beta ! device or host variable
cher
If you use the cublas_v2 module, the interface for cublasCher is changed to the following:
integer(4) function cublasCher(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
cher2
If you use the cublas_v2 module, the interface for cublasCher2 is changed to the following:
integer(4) function cublasCher2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
chpr
If you use the cublas_v2 module, the interface for cublasChpr is changed to the following:
integer(4) function cublasChpr(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
complex(4), device, dimension(*) :: a, x
real(4), device :: alpha ! device or host variable
chpr2
If you use the cublas_v2 module, the interface for cublasChpr2 is changed to the following:
integer(4) function cublasChpr2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
complex(4), device, dimension(*) :: a, x, y
complex(4), device :: alpha ! device or host variable
cgemm
If you use the cublas_v2 module, the interface for cublasCgemm is changed to the following:
integer(4) function cublasCgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
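The v2 calling convention can be sketched with a small single precision complex multiply. This is only an illustration, assuming the nvfortran compiler with CUDA Fortran enabled and cuBLAS linked in (for instance -cuda -cudalib=cublas); the program name, array names, and sizes are made up for the example.

```fortran
program cgemm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4              ! hypothetical problem size
  complex(4) :: a(n,n), b(n,n), c(n,n)     ! host copies
  complex(4), device :: a_d(n,n), b_d(n,n), c_d(n,n)
  complex(4) :: alpha, beta                ! host scalars
  type(cublasHandle) :: h
  integer :: istat

  a = (1.0, 0.0)
  b = (2.0, 0.0)
  c = (0.0, 0.0)
  alpha = (1.0, 0.0)
  beta = (0.0, 0.0)
  a_d = a                                  ! host-to-device copies
  b_d = b
  c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasCreate failed: ", istat

  ! The handle comes first and the status is the function result
  istat = cublasCgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &
                      alpha, a_d, n, b_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasCgemm failed: ", istat

  c = c_d                                  ! device-to-host copy
  istat = cublasDestroy(h)

  if (all(real(c) == 2.0*n)) then
    print *, "Test PASSED"
  else
    print *, "Test FAILED"
  end if
end program cgemm_v2_sketch
```

Every entry of the product is a sum of n terms equal to 1*2, which is why the check compares against 2.0*n.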
csymm
If you use the cublas_v2 module, the interface for cublasCsymm is changed to the following:
integer(4) function cublasCsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyrk
If you use the cublas_v2 module, the interface for cublasCsyrk is changed to the following:
integer(4) function cublasCsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyr2k
If you use the cublas_v2 module, the interface for cublasCsyr2k is changed to the following:
integer(4) function cublasCsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
csyrkx
If you use the cublas_v2 module, the interface for cublasCsyrkx is changed to the following:
integer(4) function cublasCsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
ctrmm
If you use the cublas_v2 module, the interface for cublasCtrmm is changed to the following:
integer(4) function cublasCtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
ctrsm
If you use the cublas_v2 module, the interface for cublasCtrsm is changed to the following:
integer(4) function cublasCtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device :: alpha ! device or host variable
chemm
If you use the cublas_v2 module, the interface for cublasChemm is changed to the following:
integer(4) function cublasChemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha, beta ! device or host variable
cherk
If you use the cublas_v2 module, the interface for cublasCherk is changed to the following:
integer(4) function cublasCherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldc, *) :: c
real(4), device :: alpha, beta ! device or host variable
cher2k
If you use the cublas_v2 module, the interface for cublasCher2k is changed to the following:
integer(4) function cublasCher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
cherkx
If you use the cublas_v2 module, the interface for cublasCherkx is changed to the following:
integer(4) function cublasCherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(4), device, dimension(lda, *) :: a
complex(4), device, dimension(ldb, *) :: b
complex(4), device, dimension(ldc, *) :: c
complex(4), device :: alpha ! device or host variable
real(4), device :: beta ! device or host variable
Double Precision Complex Functions and Subroutines
This section contains the V2 interfaces to the double precision complex BLAS and cuBLAS functions and subroutines.
izamax
If you use the cublas_v2 module, the interface for cublasIzamax is changed to the following:
integer(4) function cublasIzamax(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
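As a hedged sketch of how the res argument can be used, the example below passes an ordinary host integer to receive the index. It assumes nvfortran with CUDA Fortran and cuBLAS linked in; the program and variable names are invented for illustration.

```fortran
program izamax_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 5              ! hypothetical vector length
  complex(8) :: x(n)
  complex(8), device :: x_d(n)
  type(cublasHandle) :: h
  integer :: istat, imax                   ! imax is a host variable here

  x = (1.0d0, 0.0d0)
  x(3) = (-4.0d0, 3.0d0)                   ! made the largest entry on purpose
  x_d = x

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasCreate failed: ", istat

  ! The result is returned through the last argument, not the function value
  istat = cublasIzamax(h, n, x_d, 1, imax)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasIzamax failed: ", istat

  print *, "index of largest element: ", imax
  istat = cublasDestroy(h)
end program izamax_sketch
```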
izamin
If you use the cublas_v2 module, the interface for cublasIzamin is changed to the following:
integer(4) function cublasIzamin(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
integer, device :: res ! device or host variable
dzasum
If you use the cublas_v2 module, the interface for cublasDzasum is changed to the following:
integer(4) function cublasDzasum(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
zaxpy
If you use the cublas_v2 module, the interface for cublasZaxpy is changed to the following:
integer(4) function cublasZaxpy(h, n, a, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zcopy
If you use the cublas_v2 module, the interface for cublasZcopy is changed to the following:
integer(4) function cublasZcopy(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zdotc
If you use the cublas_v2 module, the interface for cublasZdotc is changed to the following:
integer(4) function cublasZdotc(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
complex(8), device :: res ! device or host variable
zdotu
If you use the cublas_v2 module, the interface for cublasZdotu is changed to the following:
integer(4) function cublasZdotu(h, n, x, incx, y, incy, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
complex(8), device :: res ! device or host variable
dznrm2
If you use the cublas_v2 module, the interface for cublasDznrm2 is changed to the following:
integer(4) function cublasDznrm2(h, n, x, incx, res)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x
integer :: incx
real(8), device :: res ! device or host variable
zrot
If you use the cublas_v2 module, the interface for cublasZrot is changed to the following:
integer(4) function cublasZrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc ! device or host variable
complex(8), device :: ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zsrot
If you use the cublas_v2 module, the interface for cublasZsrot is changed to the following:
integer(4) function cublasZsrot(h, n, x, incx, y, incy, sc, ss)
type(cublasHandle) :: h
integer :: n
real(8), device :: sc, ss ! device or host variable
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zrotg
If you use the cublas_v2 module, the interface for cublasZrotg is changed to the following:
integer(4) function cublasZrotg(h, sa, sb, sc, ss)
type(cublasHandle) :: h
complex(8), device :: sa, sb, ss ! device or host variable
real(8), device :: sc ! device or host variable
zscal
If you use the cublas_v2 module, the interface for cublasZscal is changed to the following:
integer(4) function cublasZscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
complex(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
zdscal
If you use the cublas_v2 module, the interface for cublasZdscal is changed to the following:
integer(4) function cublasZdscal(h, n, a, x, incx)
type(cublasHandle) :: h
integer :: n
real(8), device :: a ! device or host variable
complex(8), device, dimension(*) :: x
integer :: incx
zswap
If you use the cublas_v2 module, the interface for cublasZswap is changed to the following:
integer(4) function cublasZswap(h, n, x, incx, y, incy)
type(cublasHandle) :: h
integer :: n
complex(8), device, dimension(*) :: x, y
integer :: incx, incy
zgbmv
If you use the cublas_v2 module, the interface for cublasZgbmv is changed to the following:
integer(4) function cublasZgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, kl, ku, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zgemv
If you use the cublas_v2 module, the interface for cublasZgemv is changed to the following:
integer(4) function cublasZgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: t
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zgerc
If you use the cublas_v2 module, the interface for cublasZgerc is changed to the following:
integer(4) function cublasZgerc(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
zgeru
If you use the cublas_v2 module, the interface for cublasZgeru is changed to the following:
integer(4) function cublasZgeru(h, m, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: m, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
zsymv
If you use the cublas_v2 module, the interface for cublasZsymv is changed to the following:
integer(4) function cublasZsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zsyr
If you use the cublas_v2 module, the interface for cublasZsyr is changed to the following:
integer(4) function cublasZsyr(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
complex(8), device :: alpha ! device or host variable
zsyr2
If you use the cublas_v2 module, the interface for cublasZsyr2 is changed to the following:
integer(4) function cublasZsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha ! device or host variable
ztbmv
If you use the cublas_v2 module, the interface for cublasZtbmv is changed to the following:
integer(4) function cublasZtbmv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztbsv
If you use the cublas_v2 module, the interface for cublasZtbsv is changed to the following:
integer(4) function cublasZtbsv(h, u, t, d, n, k, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, k, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztpmv
If you use the cublas_v2 module, the interface for cublasZtpmv is changed to the following:
integer(4) function cublasZtpmv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
ztpsv
If you use the cublas_v2 module, the interface for cublasZtpsv is changed to the following:
integer(4) function cublasZtpsv(h, u, t, d, n, a, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx
complex(8), device, dimension(*) :: a, x
ztrmv
If you use the cublas_v2 module, the interface for cublasZtrmv is changed to the following:
integer(4) function cublasZtrmv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
ztrsv
If you use the cublas_v2 module, the interface for cublasZtrsv is changed to the following:
integer(4) function cublasZtrsv(h, u, t, d, n, a, lda, x, incx)
type(cublasHandle) :: h
integer :: u, t, d
integer :: n, incx, lda
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x
zhbmv
If you use the cublas_v2 module, the interface for cublasZhbmv is changed to the following:
integer(4) function cublasZhbmv(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: k, n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zhemv
If you use the cublas_v2 module, the interface for cublasZhemv is changed to the following:
integer(4) function cublasZhemv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, lda, incx, incy
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(*) :: x, y
complex(8), device :: alpha, beta ! device or host variable
zhpmv
If you use the cublas_v2 module, the interface for cublasZhpmv is changed to the following:
integer(4) function cublasZhpmv(h, uplo, n, alpha, a, x, incx, beta, y, incy)
type(cublasHandle) :: h
integer :: uplo
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha, beta ! device or host variable
zher
If you use the cublas_v2 module, the interface for cublasZher is changed to the following:
integer(4) function cublasZher(h, t, n, alpha, x, incx, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, lda
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
zher2
If you use the cublas_v2 module, the interface for cublasZher2 is changed to the following:
integer(4) function cublasZher2(h, t, n, alpha, x, incx, y, incy, a, lda)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy, lda
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
zhpr
If you use the cublas_v2 module, the interface for cublasZhpr is changed to the following:
integer(4) function cublasZhpr(h, t, n, alpha, x, incx, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx
complex(8), device, dimension(*) :: a, x
real(8), device :: alpha ! device or host variable
zhpr2
If you use the cublas_v2 module, the interface for cublasZhpr2 is changed to the following:
integer(4) function cublasZhpr2(h, t, n, alpha, x, incx, y, incy, a)
type(cublasHandle) :: h
integer :: t
integer :: n, incx, incy
complex(8), device, dimension(*) :: a, x, y
complex(8), device :: alpha ! device or host variable
zgemm
If you use the cublas_v2 module, the interface for cublasZgemm is changed to the following:
integer(4) function cublasZgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: transa, transb
integer :: m, n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsymm
If you use the cublas_v2 module, the interface for cublasZsymm is changed to the following:
integer(4) function cublasZsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyrk
If you use the cublas_v2 module, the interface for cublasZsyrk is changed to the following:
integer(4) function cublasZsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyr2k
If you use the cublas_v2 module, the interface for cublasZsyr2k is changed to the following:
integer(4) function cublasZsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zsyrkx
If you use the cublas_v2 module, the interface for cublasZsyrkx is changed to the following:
integer(4) function cublasZsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
ztrmm
If you use the cublas_v2 module, the interface for cublasZtrmm is changed to the following:
integer(4) function cublasZtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
ztrsm
If you use the cublas_v2 module, the interface for cublasZtrsm is changed to the following:
integer(4) function cublasZtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasHandle) :: h
integer :: side, uplo, transa, diag
integer :: m, n, lda, ldb
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device :: alpha ! device or host variable
zhemm
If you use the cublas_v2 module, the interface for cublasZhemm is changed to the following:
integer(4) function cublasZhemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: side, uplo
integer :: m, n, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha, beta ! device or host variable
zherk
If you use the cublas_v2 module, the interface for cublasZherk is changed to the following:
integer(4) function cublasZherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldc, *) :: c
real(8), device :: alpha, beta ! device or host variable
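Unlike cublasZsyrk, the Hermitian rank k update takes real scalars. The following sketch shows that difference in a call; it assumes nvfortran with CUDA Fortran and cuBLAS linked in, and all names and sizes are hypothetical.

```fortran
program zherk_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3, k = 2       ! hypothetical sizes
  complex(8) :: a(n,k), c(n,n)
  complex(8), device :: a_d(n,k), c_d(n,n)
  real(8) :: alpha, beta                   ! real, not complex, for zherk
  type(cublasHandle) :: h
  integer :: istat

  a = (0.0d0, 1.0d0)
  c = (0.0d0, 0.0d0)
  alpha = 1.0d0
  beta = 0.0d0
  a_d = a
  c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasCreate failed: ", istat

  ! C := alpha*A*A**H + beta*C, updating the upper triangle of C only
  istat = cublasZherk(h, CUBLAS_FILL_MODE_UPPER, CUBLAS_OP_N, n, k, &
                      alpha, a_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasZherk failed: ", istat

  c = c_d
  print *, "c(1,1) = ", c(1,1)
  istat = cublasDestroy(h)
end program zherk_sketch
```

Entries below the diagonal of c are not touched by this call, so only the upper triangle should be read afterwards.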
zher2k
If you use the cublas_v2 module, the interface for cublasZher2k is changed to the following:
integer(4) function cublasZher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
zherkx
If you use the cublas_v2 module, the interface for cublasZherkx is changed to the following:
integer(4) function cublasZherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasHandle) :: h
integer :: uplo, trans
integer :: n, k, lda, ldb, ldc
complex(8), device, dimension(lda, *) :: a
complex(8), device, dimension(ldb, *) :: b
complex(8), device, dimension(ldc, *) :: c
complex(8), device :: alpha ! device or host variable
real(8), device :: beta ! device or host variable
CUBLAS XT Module Functions
This section contains interfaces to the cuBLAS XT Module Functions. Users can access this module by inserting the line
use cublasXt
into the program unit. The cublasXt library is a host-side library that supports multiple GPUs. Here is an example:
subroutine testxt(n)
  use cublasXt
  implicit none
  integer :: n, istat
  ! All matrices and scalars are host-resident
  complex(8) :: a(n,n), b(n,n), c(n,n), alpha, beta
  type(cublasXtHandle) :: h
  integer :: ndevices(1)
  ! kind=8 keeps cmplx from producing a single precision result
  a = cmplx(1.0d0, 0.0d0, kind=8)
  b = cmplx(2.0d0, 0.0d0, kind=8)
  c = cmplx(-1.0d0, 0.0d0, kind=8)
  alpha = cmplx(1.0d0, 0.0d0, kind=8)
  beta = cmplx(0.0d0, 0.0d0, kind=8)
  istat = cublasXtCreate(h)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  ! Run on a single GPU, device 0
  ndevices(1) = 0
  istat = cublasXtDeviceSelect(h, 1, ndevices)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  ! c := alpha*a*b + beta*c
  istat = cublasXtZgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, &
                        n, n, n, &
                        alpha, a, n, b, n, beta, c, n)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  istat = cublasXtDestroy(h)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  ! Each entry is a sum of n products 1*2
  if (all(dble(c).eq.2.0d0*n)) then
    print *,"Test PASSED"
  else
    print *,"Test FAILED"
  endif
end
The cublasXt module contains all the types and definitions from the cublas module, and these additional types and enumerations:
TYPE cublasXtHandle
TYPE(C_PTR) :: handle
END TYPE
! Pinned memory mode
enum, bind(c)
enumerator :: CUBLASXT_PINNING_DISABLED=0
enumerator :: CUBLASXT_PINNING_ENABLED=1
end enum
! cublasXtOpType
enum, bind(c)
enumerator :: CUBLASXT_FLOAT=0
enumerator :: CUBLASXT_DOUBLE=1
enumerator :: CUBLASXT_COMPLEX=2
enumerator :: CUBLASXT_DOUBLECOMPLEX=3
end enum
! cublasXtBlasOp
enum, bind(c)
enumerator :: CUBLASXT_GEMM=0
enumerator :: CUBLASXT_SYRK=1
enumerator :: CUBLASXT_HERK=2
enumerator :: CUBLASXT_SYMM=3
enumerator :: CUBLASXT_HEMM=4
enumerator :: CUBLASXT_TRSM=5
enumerator :: CUBLASXT_SYR2K=6
enumerator :: CUBLASXT_HER2K=7
enumerator :: CUBLASXT_SPMM=8
enumerator :: CUBLASXT_SYRKX=9
enumerator :: CUBLASXT_HERKX=10
enumerator :: CUBLASXT_TRMM=11
enumerator :: CUBLASXT_ROUTINE_MAX=12
end enum
cublasXtCreate
This function initializes the cublasXt API and creates a handle to an opaque structure holding the cublasXT library context. It allocates hardware resources on the host and device and must be called prior to making any other cublasXt API library calls.
integer(4) function cublasXtcreate(h)
type(cublasXtHandle) :: h
cublasXtDestroy
This function releases hardware resources used by the cublasXt API context. This function is usually the last call with a particular handle to the cublasXt API.
integer(4) function cublasXtdestroy(h)
type(cublasXtHandle) :: h
cublasXtDeviceSelect
This function allows the user to provide the number of GPU devices, and their respective IDs, that will participate in subsequent cublasXt API math function calls. This function will create a cuBLAS context for every GPU provided in that list. Currently the device configuration is static and cannot be changed between math function calls. For that reason, this function should be called only once after cublasXtCreate. To run multiple configurations, create multiple cublasXt API contexts.
integer(4) function cublasXtdeviceselect(h, ndevices, deviceid)
type(cublasXtHandle) :: h
integer :: ndevices
integer, dimension(*) :: deviceid
cublasXtSetBlockDim
This function allows the user to set the block dimension used for the tiling of the matrices for subsequent math function calls. Matrices are split into square tiles of blockDim x blockDim dimension. This function can be called at any time and takes effect for the following math function calls. The block dimension should be chosen to optimize the math operation and to make sure that the PCI transfers are well overlapped with the computation.
integer(4) function cublasXtsetblockdim(h, blockdim)
type(cublasXtHandle) :: h
integer :: blockdim
cublasXtGetBlockDim
This function allows the user to query the block dimension used for the tiling of the matrices.
integer(4) function cublasXtgetblockdim(h, blockdim)
type(cublasXtHandle) :: h
integer :: blockdim
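A minimal sketch of querying and changing the tile size follows, using only the interfaces listed above. It assumes nvfortran with cuBLAS linked in; the value 2048 is an arbitrary choice for illustration, not a recommendation.

```fortran
program xt_blockdim_sketch
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h
  integer :: istat, blockdim

  istat = cublasXtCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtCreate failed: ", istat

  ! Query the tile size currently associated with the handle
  istat = cublasXtGetBlockDim(h, blockdim)
  print *, "initial block dimension: ", blockdim

  ! Request a different tile size (2048 is an arbitrary example value)
  istat = cublasXtSetBlockDim(h, 2048)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtSetBlockDim failed: ", istat

  istat = cublasXtGetBlockDim(h, blockdim)
  print *, "block dimension after set: ", blockdim

  istat = cublasXtDestroy(h)
end program xt_blockdim_sketch
```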
cublasXtSetCpuRoutine
This function allows the user to provide a CPU implementation of the corresponding BLAS routine. This function can be used with the function cublasXtSetCpuRatio() to define a hybrid computation between the CPU and the GPUs. Currently the hybrid feature is only supported for the xGEMM routines.
integer(4) function cublasXtsetcpuroutine(h, blasop, blastype)
type(cublasXtHandle) :: h
integer :: blasop, blastype
cublasXtSetCpuRatio
This function allows the user to define the percentage of workload that should be done on a CPU in the context of a hybrid computation. This function can be used with the function cublasXtSetCpuRoutine() to define a hybrid computation between the CPU and the GPUs. Currently the hybrid feature is only supported for the xGEMM routines.
integer(4) function cublasXtsetcpuratio(h, blasop, blastype, ratio)
type(cublasXtHandle) :: h
integer :: blasop, blastype
real(4) :: ratio
cublasXtSetPinningMemMode
This function allows the user to enable or disable the Pinning Memory mode. When enabled, the matrices passed in subsequent cublasXt API calls will be pinned and unpinned using the CUDART routines cudaHostRegister and cudaHostUnregister, respectively, if the matrices are not already pinned. A matrix that is only partially pinned will also not be pinned. Pinning the memory improves PCI transfer performance and allows PCI memory transfers to overlap with computation. However, pinning and unpinning the memory takes some time, which might not be amortized. It is advised that users pin the memory themselves using cudaMallocHost or cudaHostRegister and unpin it when the computation sequence is completed. By default, the Pinning Memory mode is disabled.
integer(4) function cublasXtsetpinningmemmode(h, mode)
type(cublasXtHandle) :: h
integer :: mode
cublasXtGetPinningMemMode
This function allows the user to query the Pinning Memory mode. By default, the Pinning Memory mode is disabled.
integer(4) function cublasXtgetpinningmemmode(h, mode)
type(cublasXtHandle) :: h
integer :: mode
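The pinning mode can be switched and read back as sketched below, using the enumerators defined earlier in this section. The example assumes nvfortran with cuBLAS linked in; the program and variable names are illustrative only.

```fortran
program xt_pinning_sketch
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h
  integer :: istat, mode

  istat = cublasXtCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtCreate failed: ", istat

  ! Ask the library to pin and unpin host matrices around each call
  istat = cublasXtSetPinningMemMode(h, CUBLASXT_PINNING_ENABLED)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "set pinning mode failed: ", istat

  ! Read the mode back to confirm the setting
  istat = cublasXtGetPinningMemMode(h, mode)
  if (mode == CUBLASXT_PINNING_ENABLED) then
    print *, "pinning mode is enabled"
  else
    print *, "pinning mode is disabled"
  end if

  istat = cublasXtDestroy(h)
end program xt_pinning_sketch
```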
cublasXtSgemm
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtsgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: transa, transb
integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
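The sketch below calls cublasXtSgemm on host arrays, with the dimension arguments declared as integer(kind=c_intptr_t) to match the interface above. Allocating the matrices with the CUDA Fortran pinned attribute is one way to pin the memory up front; that choice, the single-GPU device list, and all names and sizes are assumptions made for the example. It assumes nvfortran with CUDA Fortran enabled and cuBLAS linked in.

```fortran
program xt_sgemm_sketch
  use iso_c_binding
  use cudafor
  use cublasXt
  implicit none
  integer(kind=c_intptr_t), parameter :: n = 512   ! hypothetical size
  real(4), allocatable, pinned :: a(:,:), b(:,:), c(:,:)
  real(4) :: alpha, beta
  type(cublasXtHandle) :: h
  integer :: istat, devices(1)

  ! Host arrays; the pinned attribute requests page-locked memory
  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0
  b = 2.0
  c = -1.0
  alpha = 1.0
  beta = 0.0

  istat = cublasXtCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtCreate failed: ", istat

  devices(1) = 0                                   ! use GPU 0 only
  istat = cublasXtDeviceSelect(h, 1, devices)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "device select failed: ", istat

  ! c := alpha*a*b + beta*c, with all operands resident on the host
  istat = cublasXtSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &
                        alpha, a, n, b, n, beta, c, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtSgemm failed: ", istat

  istat = cublasXtDestroy(h)

  if (all(c == 2.0*n)) then
    print *, "Test PASSED"
  else
    print *, "Test FAILED"
  end if
  deallocate(a, b, c)
end program xt_sgemm_sketch
```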
cublasXtSsymm
SSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtssymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
cublasXtSsyrk
SSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtssyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
cublasXtSsyr2k
SSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtssyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
cublasXtSsyrkx
SSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtssyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
cublasXtStrmm
STRMM performs one of the matrix-matrix operations C := alpha*op( A )*B, or C := alpha*B*op( A ), where alpha is a scalar, B and C are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. Unlike the standard BLAS routine, which overwrites B, this interface writes the result to the separate output matrix C.
integer(4) function cublasXtstrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha
cublasXtStrsm
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasXtstrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb
real(4), dimension(lda, *) :: a
real(4), dimension(ldb, *) :: b
real(4) :: alpha
cublasXtSspmm
SSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtsspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, ldb, ldc
real(4), dimension(*) :: ap
real(4), dimension(ldb, *) :: b
real(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
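The ap argument is a one-dimensional array holding one triangle of the symmetric matrix. The sketch below shows how such an array could be filled for the upper triangle, assuming the standard BLAS column-major packed layout in which element (i,j) with i <= j is stored at position i + (j-1)*j/2. The 3 by 3 test matrix is a made-up example, and the layout expected for a given uplo value should be confirmed against the CUBLAS documentation.

```fortran
program pack_symmetric_sketch
  implicit none
  integer, parameter :: n = 3
  real(4) :: a(n,n)            ! full symmetric matrix
  real(4) :: ap(n*(n+1)/2)     ! packed upper triangle
  integer :: i, j

  ! Build a small symmetric test matrix: a(i,j) = i + j
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j, 4)
    end do
  end do

  ! Pack column by column: column j contributes rows 1..j
  do j = 1, n
    do i = 1, j
      ap(i + (j-1)*j/2) = a(i,j)
    end do
  end do

  ! ap now holds a(1,1), a(1,2), a(2,2), a(1,3), a(2,3), a(3,3)
  print *, ap
end program pack_symmetric_sketch
```

The packed array needs only n*(n+1)/2 elements, which is why the spmm interfaces take no leading dimension for ap.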
cublasXtCgemm
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtcgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: transa, transb
integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtChemm
CHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a Hermitian matrix and B and C are m by n matrices.
integer(4) function cublasXtchemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtCherk
CHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtcherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldc, *) :: c
real(4) :: alpha, beta
cublasXtCher2k
CHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtcher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha
real(4) :: beta
cublasXtCherkx
CHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
integer(4) function cublasXtcherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha
real(4) :: beta
cublasXtCsymm
CSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtcsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtCsyrk
CSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtcsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtCsyr2k
CSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtcsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtCsyrkx
CSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtcsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtCtrmm
CTRMM performs one of the matrix-matrix operations C := alpha*op( A )*B, or C := alpha*B*op( A ), where alpha is a scalar, B and C are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. Unlike the standard BLAS routine, which overwrites B, this interface writes the result to the separate output matrix C.
integer(4) function cublasXtctrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha
cublasXtCtrsm
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasXtctrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb
complex(4), dimension(lda, *) :: a
complex(4), dimension(ldb, *) :: b
complex(4) :: alpha
cublasXtCspmm
CSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtcspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, ldb, ldc
complex(4), dimension(*) :: ap
complex(4), dimension(ldb, *) :: b
complex(4), dimension(ldc, *) :: c
complex(4) :: alpha, beta
cublasXtDgemm
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtdgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: transa, transb
integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtDsymm
DSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtdsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtDsyrk
DSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtdsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtDsyr2k
DSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtdsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtDsyrkx
DSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtdsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtDtrmm
DTRMM performs one of the matrix-matrix operations C := alpha*op( A )*B, or C := alpha*B*op( A ), where alpha is a scalar, B and C are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. Unlike the standard BLAS routine, which overwrites B, this interface writes the result to the separate output matrix C.
integer(4) function cublasXtdtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha
cublasXtDtrsm
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasXtdtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb
real(8), dimension(lda, *) :: a
real(8), dimension(ldb, *) :: b
real(8) :: alpha
cublasXtDspmm
DSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtdspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, ldb, ldc
real(8), dimension(*) :: ap
real(8), dimension(ldb, *) :: b
real(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtZgemm
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtzgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: transa, transb
integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZhemm
ZHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a Hermitian matrix and B and C are m by n matrices.
integer(4) function cublasXtzhemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZherk
ZHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtzherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldc, *) :: c
real(8) :: alpha, beta
cublasXtZher2k
ZHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtzher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha
real(8) :: beta
cublasXtZherkx
ZHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
integer(4) function cublasXtzherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha
real(8) :: beta
cublasXtZsymm
ZSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtzsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZsyrk
ZSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtzsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZsyr2k
ZSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtzsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZsyrkx
ZSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtzsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: uplo, trans
integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
cublasXtZtrmm
ZTRMM performs one of the matrix-matrix operations C := alpha*op( A )*B, or C := alpha*B*op( A ), where alpha is a scalar, B and C are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. Unlike the standard BLAS routine, which overwrites B, this interface writes the result to the separate output matrix C.
integer(4) function cublasXtztrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha
cublasXtZtrsm
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasXtztrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
type(cublasXtHandle) :: h
integer :: side, uplo, transa, diag
integer(kind=c_intptr_t) :: m, n, lda, ldb
complex(8), dimension(lda, *) :: a
complex(8), dimension(ldb, *) :: b
complex(8) :: alpha
cublasXtZspmm
ZSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtzspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
type(cublasXtHandle) :: h
integer :: side, uplo
integer(kind=c_intptr_t) :: m, n, ldb, ldc
complex(8), dimension(*) :: ap
complex(8), dimension(ldb, *) :: b
complex(8), dimension(ldc, *) :: c
complex(8) :: alpha, beta
CUBLAS MP Module Functions
This section contains interfaces to the cuBLAS MP Module Functions. Users can access this module by inserting the line
use cublasMp
into the program unit. The cublasMp library is a host-side library which operates on distributed device data, and which supports multiple processes and GPUs. It is based on the ScaLAPACK PBLAS library.
Beginning with the 25.1 release, the cublasMp library has a newer API for CUDA versions > 12.0, specifically cublasMp version 0.3.0 and higher. For users of CUDA versions <= 11.8, the old module has been renamed, and you can access it by inserting the line
use cublasMp02
in the program unit. One major difference in version 0.3.x is that all cublasMp functions now return a type(cublasMpStatus) rather than an integer(4). There are other additions and changes which we will point out in the individual descriptions below. For complete documentation of the Fortran interfaces for cublasMp 0.2.x, please see the documentation from a 2024 NVHPC release.
Some overloaded operations for comparing and assigning type(cublasMpStatus) variables and expressions are provided in the new module.
The cublasMp module contains all the common types and definitions from the cublas module, types and interfaces from the nvf_cal_comm module, and these additional types and enumerations:
! Version information
integer, parameter :: CUBLASMP_VER_MAJOR = 0
integer, parameter :: CUBLASMP_VER_MINOR = 3
integer, parameter :: CUBLASMP_VER_PATCH = 0
integer, parameter :: CUBLASMP_VERSION = &
(CUBLASMP_VER_MAJOR * 1000 + CUBLASMP_VER_MINOR * 100 + CUBLASMP_VER_PATCH)
! New status type, with version 0.3.0
TYPE cublasMpStatus
integer(4) :: stat
END TYPE
TYPE(cublasMpStatus), parameter :: &
CUBLASMP_STATUS_SUCCESS = cublasMpStatus(0), &
CUBLASMP_STATUS_NOT_INITIALIZED = cublasMpStatus(1), &
CUBLASMP_STATUS_ALLOCATION_FAILED = cublasMpStatus(2), &
CUBLASMP_STATUS_INVALID_VALUE = cublasMpStatus(3), &
CUBLASMP_STATUS_ARCHITECTURE_MISMATCH = cublasMpStatus(4), &
CUBLASMP_STATUS_EXECUTION_FAILED = cublasMpStatus(5), &
CUBLASMP_STATUS_INTERNAL_ERROR = cublasMpStatus(6), &
CUBLASMP_STATUS_NOT_SUPPORTED = cublasMpStatus(7)
! Grid Layout
TYPE cublasMpGridLayout
integer(4) :: grid
END TYPE
TYPE(cublasMpGridLayout), parameter :: &
CUBLASMP_GRID_LAYOUT_COL_MAJOR = cublasMpGridLayout(0), &
CUBLASMP_GRID_LAYOUT_ROW_MAJOR = cublasMpGridLayout(1)
! Matmul Descriptor Attributes
TYPE cublasMpMatmulDescriptorAttribute
integer(4) :: attr
END TYPE
TYPE(cublasMpMatmulDescriptorAttribute), parameter :: &
CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSA = cublasMpMatmulDescriptorAttribute(0), &
CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_TRANSB = cublasMpMatmulDescriptorAttribute(1), &
CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_COMPUTE_TYPE = cublasMpMatmulDescriptorAttribute(2), &
CUBLASMP_MATMUL_DESCRIPTOR_ATTRIBUTE_ALGO_TYPE = cublasMpMatmulDescriptorAttribute(3)
! Matmul Algorithm Type
TYPE cublasMpMatmulAlgoType
integer(4) :: atyp
END TYPE
TYPE(cublasMpMatmulAlgoType), parameter :: &
CUBLASMP_MATMUL_ALGO_TYPE_DEFAULT = cublasMpMatmulAlgoType(0), &
CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_P2P = cublasMpMatmulAlgoType(1), &
CUBLASMP_MATMUL_ALGO_TYPE_SPLIT_MULTICAST = cublasMpMatmulAlgoType(2), &
CUBLASMP_MATMUL_ALGO_TYPE_ATOMIC_P2P = cublasMpMatmulAlgoType(3), &
CUBLASMP_MATMUL_ALGO_TYPE_ATOMIC_MULTICAST = cublasMpMatmulAlgoType(4)
TYPE cublasMpHandle
TYPE(C_PTR) :: handle
END TYPE
TYPE cublasMpGrid
TYPE(C_PTR) :: handle
END TYPE
TYPE cublasMpMatrixDescriptor
TYPE(C_PTR) :: handle
END TYPE
TYPE cublasMpMatmulDescriptor
TYPE(C_PTR) :: handle
END TYPE
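To make the version encoding and the status type concrete, here is a small standalone sketch. It re-declares local copies of a few of the definitions listed above so that it compiles without the library; in real code these come from the cublasMp module and must not be redefined. The check compares the stat component directly, which relies only on the type definition shown above rather than on the overloaded operators.

```fortran
program cublasmp_defs_sketch
  implicit none
  ! Local copies of definitions from the listing above (illustration only;
  ! "use cublasMp" provides the real ones)
  integer, parameter :: CUBLASMP_VER_MAJOR = 0
  integer, parameter :: CUBLASMP_VER_MINOR = 3
  integer, parameter :: CUBLASMP_VER_PATCH = 0
  integer, parameter :: CUBLASMP_VERSION = &
      (CUBLASMP_VER_MAJOR * 1000 + CUBLASMP_VER_MINOR * 100 + CUBLASMP_VER_PATCH)
  type cublasMpStatus
    integer(4) :: stat
  end type
  type(cublasMpStatus), parameter :: &
      CUBLASMP_STATUS_SUCCESS       = cublasMpStatus(0), &
      CUBLASMP_STATUS_INVALID_VALUE = cublasMpStatus(3)
  type(cublasMpStatus) :: istat

  ! 0*1000 + 3*100 + 0 gives an encoded version of 300
  print *, 'encoded version: ', CUBLASMP_VERSION
  ! Decode the three fields again from the encoded value
  print *, 'major, minor, patch: ', CUBLASMP_VERSION / 1000, &
           mod(CUBLASMP_VERSION, 1000) / 100, mod(CUBLASMP_VERSION, 100)

  ! Pretend a library call returned an error status
  istat = CUBLASMP_STATUS_INVALID_VALUE
  if (istat%stat /= CUBLASMP_STATUS_SUCCESS%stat) then
    print *, 'call failed with status code ', istat%stat
  end if
end program cublasmp_defs_sketch
```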
cublasMpCreate
This function initializes the cublasMp API and creates a handle to an opaque structure holding the cublasMp library context. It allocates hardware resources on the host and device and must be called prior to making any other cublasMp library calls.
type(cublasMpStatus) function cublasMpCreate(handle, stream)
type(cublasMpHandle) :: handle
integer(kind=cuda_stream_kind()) :: stream
cublasMpDestroy
This function releases resources used by the cublasMp handle and context.
type(cublasMpStatus) function cublasMpDestroy(handle)
type(cublasMpHandle) :: handle
cublasMpStreamSet
This function sets the CUDA stream to be used in the cublasMp computations.
type(cublasMpStatus) function cublasMpStreamSet(handle, stream)
type(cublasMpHandle) :: handle
integer(kind=cuda_stream_kind()) :: stream
cublasMpStreamGet
This function returns the current CUDA stream used in the cublasMp computations.
type(cublasMpStatus) function cublasMpStreamGet(handle, stream)
type(cublasMpHandle) :: handle
integer(kind=cuda_stream_kind()) :: stream
cublasMpGetVersion
This function returns the version number of the cublasMp library.
type(cublasMpStatus) function cublasMpGetVersion(handle, version)
type(cublasMpHandle) :: handle
integer(4) :: version
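The handle functions above are typically used together. The following sketch creates a stream and a handle, queries the library version, and releases both. It assumes cudaStreamCreate and cudaStreamDestroy from the cudafor module, and it declares the stream with kind cuda_stream_kind as in the CUDA Fortran documentation. Error handling is reduced to a single check for brevity.

```fortran
program cublasmp_handle_sketch
  use cudafor        ! assumed source of cudaStreamCreate / cudaStreamDestroy
  use cublasMp
  implicit none
  type(cublasMpHandle) :: handle
  type(cublasMpStatus) :: mpstat
  integer(kind=cuda_stream_kind) :: stream
  integer(4) :: version
  integer :: istat

  ! The handle is bound to a CUDA stream at creation time
  istat  = cudaStreamCreate(stream)
  mpstat = cublasMpCreate(handle, stream)
  if (mpstat%stat /= CUBLASMP_STATUS_SUCCESS%stat) stop 'cublasMpCreate failed'

  mpstat = cublasMpGetVersion(handle, version)
  print *, 'cublasMp library version: ', version

  ! Release the handle before the stream it was created on
  mpstat = cublasMpDestroy(handle)
  istat  = cudaStreamDestroy(stream)
end program cublasmp_handle_sketch
```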
cublasMpGridCreate
This function initializes the grid data structure used in the cublasMp library. It takes a communicator and other information related to the data layout as inputs. Starting in version 0.3.0, it no longer takes a handle argument.
type(cublasMpStatus) function cublasMpGridCreate(nprow, npcol, &
layout, comm, grid)
integer(8) :: nprow, npcol
type(cublasMpGridLayout) :: layout ! usually column major in Fortran
type(cal_comm) :: comm
type(cublasMpGrid), intent(out) :: grid
cublasMpGridDestroy
This function releases the grid data structure used in the cublasMp library. Starting in version 0.3.0, it no longer takes a handle argument.
type(cublasMpStatus) function cublasMpGridDestroy(grid)
type(cublasMpGrid) :: grid
cublasMpMatrixDescriptorCreate
This function initializes the matrix descriptor object used in the cublasMp library. It takes the number of rows (M) and the number of columns (N) in the global array, along with the blocking factor over each dimension. RSRC and CSRC must currently be 0. LLD is the leading dimension of the local matrix, after blocking and distributing the matrix. Starting in version 0.3.0, it no longer takes a handle argument.
type(cublasMpStatus) function cublasMpMatrixDescriptorCreate(M, N, MB, NB, &
RSRC, CSRC, LLD, dataType, grid, descr)
integer(8) :: M, N, MB, NB, RSRC, CSRC, LLD
type(cudaDataType) :: dataType
type(cublasMpGrid) :: grid
type(cublasMpMatrixDescriptor), intent(out) :: descr
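A grid and a matrix descriptor are usually created back to back. The sketch below wraps the two calls in a hypothetical helper, make_descriptor, which is not part of the library. It assumes the CAL communicator has already been created by the caller, that the caller supplies the local leading dimension lld, and that the CUDA_R_32F data type constant is visible through the cublasMp module.

```fortran
module mp_descr_sketch
  use cublasMp
  implicit none
contains
  ! Hypothetical helper: build a column-major process grid and a
  ! single-precision real matrix descriptor on top of it.
  subroutine make_descriptor(comm, nprow, npcol, m, n, mb, nb, lld, grid, descr, ok)
    type(cal_comm) :: comm                       ! created elsewhere via CAL
    integer(8), intent(in) :: nprow, npcol       ! process grid shape
    integer(8), intent(in) :: m, n               ! global matrix dimensions
    integer(8), intent(in) :: mb, nb             ! blocking factors
    integer(8), intent(in) :: lld                ! local leading dimension
    type(cublasMpGrid), intent(out) :: grid
    type(cublasMpMatrixDescriptor), intent(out) :: descr
    logical, intent(out) :: ok
    type(cublasMpStatus) :: mpstat
    integer(8) :: rsrc, csrc

    ok = .false.
    mpstat = cublasMpGridCreate(nprow, npcol, &
                                CUBLASMP_GRID_LAYOUT_COL_MAJOR, comm, grid)
    if (mpstat%stat /= CUBLASMP_STATUS_SUCCESS%stat) return

    ! RSRC and CSRC must currently be 0
    rsrc = 0
    csrc = 0
    mpstat = cublasMpMatrixDescriptorCreate(m, n, mb, nb, rsrc, csrc, lld, &
                                            CUDA_R_32F, grid, descr)
    ok = (mpstat%stat == CUBLASMP_STATUS_SUCCESS%stat)
  end subroutine make_descriptor
end module mp_descr_sketch
```

The objects created here should later be released with cublasMpMatrixDescriptorDestroy and cublasMpGridDestroy.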
cublasMpMatrixDescriptorDestroy
This function frees the matrix descriptor object used in the cublasMp library. Starting in version 0.3.0, it no longer takes a handle argument.
type(cublasMpStatus) function cublasMpMatrixDescriptorDestroy(descr)
type(cublasMpMatrixDescriptor) :: descr
cublasMpMatrixDescriptorInit
This function initializes the values within the matrix descriptor object used in the cublasMp library. It takes the number of rows (M) and the number of columns (N) in the global array, along with the blocking factor over each dimension. RSRC and CSRC must currently be 0. LLD is the leading dimension of the local matrix, after blocking and distributing the matrix.
type(cublasMpStatus) function cublasMpMatrixDescriptorInit(M, N, MB, NB, &
RSRC, CSRC, LLD, dataType, grid, descr)
integer(8) :: M, N, MB, NB, RSRC, CSRC, LLD
type(cudaDataType) :: dataType
type(cublasMpGrid) :: grid
type(cublasMpMatrixDescriptor), intent(out) :: descr
cublasMpNumroc
This function computes (and returns) the local number of rows or columns of a distributed matrix, similar to the ScaLAPACK NUMROC function.
type(cublasMpStatus) function cublasMpNumroc(N, NB, iproc, isrcproc, nprocs)
integer(8) :: N, NB
integer(4) :: iproc, isrcproc, nprocs
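For readers unfamiliar with NUMROC, the following standalone sketch reproduces the block-cyclic counting rule used by the ScaLAPACK reference routine. The function local_count is a hypothetical stand-in written for this illustration, not the library routine, and the sizes (10 items, block size 2, 3 processes) are arbitrary.

```fortran
program numroc_sketch
  implicit none
  integer(8), parameter :: n = 10, nb = 2
  integer(4), parameter :: nprocs = 3
  integer(4) :: iproc

  ! Blocks of 2 are dealt out cyclically, so the three processes
  ! own 4, 4, and 2 items respectively
  do iproc = 0, nprocs - 1
    print *, 'process ', iproc, ' owns ', local_count(n, nb, iproc, 0, nprocs)
  end do

contains

  ! Number of rows or columns owned by process iproc when n items are
  ! distributed in blocks of nb over nprocs processes, starting at isrcproc
  integer(8) function local_count(n, nb, iproc, isrcproc, nprocs) result(cnt)
    integer(8), intent(in) :: n, nb
    integer(4), intent(in) :: iproc, isrcproc, nprocs
    integer(8) :: nblocks, extra
    integer(4) :: mydist

    mydist  = mod(nprocs + iproc - isrcproc, nprocs)  ! distance from source process
    nblocks = n / nb                                  ! number of whole blocks
    cnt     = (nblocks / nprocs) * nb                 ! blocks every process receives
    extra   = mod(nblocks, int(nprocs, 8))            ! leftover whole blocks
    if (mydist < extra) then
      cnt = cnt + nb                                  ! one extra whole block
    else if (mydist == extra) then
      cnt = cnt + mod(n, nb)                          ! the final partial block
    end if
  end function local_count
end program numroc_sketch
```

The local row count obtained this way is a natural lower bound for the LLD argument of the matrix descriptor functions.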
cublasMpMatmulDescriptorCreate
This function initializes the matmul descriptor object used in the cublasMp library.
type(cublasMpStatus) function cublasMpMatmulDescriptorCreate(descr, computeType)
type(cublasMpMatmulDescriptor) :: descr
type(cublasComputeType) :: computeType
cublasMpMatmulDescriptorDestroy
This function destroys the matmul descriptor object used in the cublasMp library.
type(cublasMpStatus) function cublasMpMatmulDescriptorDestroy(descr)
type(cublasMpMatmulDescriptor) :: descr
cublasMpMatmulDescriptorAttributeSet
This function sets attributes within the matmul descriptor object used in the cublasMp library.
type(cublasMpStatus) function cublasMpMatmulDescriptorAttributeSet(descr, attr, &
buf, sizeInBytes)
type(cublasMpMatmulDescriptor) :: descr
type(cublasMpMatmulDescriptorAttribute) :: attr
integer(1) :: buf(sizeInBytes) ! Any type, kind, or rank allowed
integer(8) :: sizeInBytes
cublasMpMatmulDescriptorAttributeGet
This function retrieves attributes within the matmul descriptor object used in the cublasMp library.
type(cublasMpStatus) function cublasMpMatmulDescriptorAttributeGet(descr, attr, &
buf, sizeInBytes, sizeWritten)
type(cublasMpMatmulDescriptor) :: descr
type(cublasMpMatmulDescriptorAttribute) :: attr
integer(1) :: buf(sizeInBytes) ! Any type, kind, or rank allowed
integer(8) :: sizeInBytes, sizeWritten
cublasMpGemr2D_bufferSize
This function computes the workspace requirements of cublasMpGemr2D.
type(cublasMpStatus) function cublasMpGemr2D_bufferSize(handle, M, N, &
A, IA, JA, descrA, B, IB, JB, descrB, &
devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes, comm)
type(cublasMpHandle) :: handle
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
type(cal_comm) :: comm
cublasMpGemr2D
This function copies a matrix from one distributed form to another. The layout of each matrix is defined in the matrix descriptor. M and N are the global matrix dimensions. IA, JA, IB, and JB are 1-based, and typically equal to 1 for a full matrix.
type(cublasMpStatus) function cublasMpGemr2D(handle, M, N, &
A, IA, JA, descrA, B, IB, JB, descrB, &
bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes, comm)
type(cublasMpHandle) :: handle
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
type(cal_comm) :: comm
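The _bufferSize function and cublasMpGemr2D are typically used as a pair: query the workspace sizes, allocate, then call. The sketch below shows that pattern for a full matrix (all offsets equal to 1). It assumes the interfaces come from a module named cublasMp and that type(cal_comm) comes from a module named nvf_cal_comm; the module gemr2d_helper and the subroutine redistribute_full are hypothetical names. The workspace arrays are owned by the caller, on the assumption that they may still be in use after the call returns.

```fortran
! Sketch only. Assumptions: module names cublasMp and nvf_cal_comm.
module gemr2d_helper
  use cudafor
  use cublasMp
  use nvf_cal_comm
  implicit none
contains
  ! Hypothetical helper: copy the full M x N matrix A (layout descrA) into
  ! B (layout descrB), sizing the workspaces from the query function.
  subroutine redistribute_full(handle, M, N, A, descrA, B, descrB, comm, &
                               devWork, hostWork, stat)
    type(cublasMpHandle) :: handle
    integer(8), intent(in) :: M, N
    real(4), device, dimension(*) :: A, B
    type(cublasMpMatrixDescriptor) :: descrA, descrB
    type(cal_comm) :: comm
    integer(1), device, allocatable, intent(inout) :: devWork(:)
    integer(1), allocatable, intent(inout) :: hostWork(:)
    type(cublasMpStatus), intent(out) :: stat
    integer(8) :: devBytes, hostBytes
    integer(8), parameter :: one = 1_8

    ! Step 1: ask the library how much workspace this copy needs.
    stat = cublasMpGemr2D_bufferSize(handle, M, N, &
             A, one, one, descrA, B, one, one, descrB, &
             devBytes, hostBytes, comm)

    ! Step 2: (re)allocate the workspaces; keep at least one byte so the
    ! arrays are never zero-sized.
    if (allocated(devWork)) deallocate(devWork)
    if (allocated(hostWork)) deallocate(hostWork)
    allocate(devWork(max(devBytes, one)))
    allocate(hostWork(max(hostBytes, one)))

    ! Step 3: perform the copy with the sizes returned by the query.
    stat = cublasMpGemr2D(handle, M, N, &
             A, one, one, descrA, B, one, one, descrB, &
             devWork, devBytes, hostWork, hostBytes, comm)
  end subroutine redistribute_full
end module gemr2d_helper
```

The caller should keep devWork and hostWork allocated until the work issued by the call is known to have completed. The same query, allocate, call sequence applies to the other routines in this section that have a _bufferSize companion.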
cublasMpTrmr2D_bufferSize
This function computes the workspace requirements of cublasMpTrmr2D.
type(cublasMpStatus) function cublasMpTrmr2D_bufferSize(handle, uplo, diag, &
M, N, A, IA, JA, descrA, B, IB, JB, descrB, &
devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes, comm)
type(cublasMpHandle) :: handle
integer(4), intent(in) :: uplo, diag
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
type(cal_comm) :: comm
cublasMpTrmr2D
This function copies a trapezoidal matrix from one distributed form to another. The layout of each matrix is defined in its matrix descriptor. M and N are the global matrix dimensions. IA, JA, IB, and JB are 1-based, and typically equal to 1 for a full matrix.
type(cublasMpStatus) function cublasMpTrmr2D(handle, uplo, diag, &
M, N, A, IA, JA, descrA, B, IB, JB, descrB, &
bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes, comm)
type(cublasMpHandle) :: handle
integer(4), intent(in) :: uplo, diag
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
type(cal_comm) :: comm
cublasMpGemm_bufferSize
This function computes the workspace requirements of cublasMpGemm.
type(cublasMpStatus) function cublasMpGemm_bufferSize(handle, transA, transB, M, N, K, &
alpha, A, IA, JA, descrA, B, IB, JB, descrB, beta, C, IC, JC, descrC, &
computeType, devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: transA, transB
integer(8), intent(in) :: M, N, K, IA, JA, IB, JB, IC, JC
real(4) :: alpha, beta ! type and kind compatible with computeType
real(4), device, dimension(*) :: A, B, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB, descrC
type(cublasComputeType) :: computeType
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
cublasMpGemm
This is the multi-processor version of the BLAS GEMM operation, similar to the ScaLAPACK PBLAS functions pdgemm, pzgemm, etc.
GEMM performs one of the matrix-matrix operations
C := alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X**T,
alpha and beta are scalars, and A, B, and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix, and C an m by n matrix. The data for A, B, and C should be properly distributed over the process grid. That mapping is held in the descriptors descrA, descrB, and descrC, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. M, N, and K are the global matrix dimensions. IA, JA, IB, JB, IC, and JC are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpGemm(handle, transA, transB, M, N, K, &
alpha, A, IA, JA, descrA, B, IB, JB, descrB, beta, C, IC, JC, descrC, &
computeType, bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: transA, transB
integer(8), intent(in) :: M, N, K, IA, JA, IB, JB, IC, JC
real(4) :: alpha, beta ! type and kind compatible with computeType
real(4), device, dimension(*) :: A, B, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB, descrC
type(cublasComputeType) :: computeType
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
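To make the role of the 1-based offsets concrete, here is a host-only reference for the non-transposed case, written in plain Fortran with no GPU or cublasMp calls. It assumes the offsets select sub-blocks of the global matrices in the same way as the ScaLAPACK PBLAS routines; the module gemm_reference and the subroutine ref_gemm_nn are hypothetical names used only for this illustration.

```fortran
! Host-only reference sketch; not part of the cublasMp library.
module gemm_reference
  implicit none
contains
  ! With op(A) = A and op(B) = B, update the M x N block of C that starts
  ! at (IC, JC) using the M x K block of A at (IA, JA) and the K x N block
  ! of B at (IB, JB). All offsets are 1-based global indices.
  subroutine ref_gemm_nn(M, N, K, alpha, A, IA, JA, B, IB, JB, beta, C, IC, JC)
    integer(8), intent(in) :: M, N, K, IA, JA, IB, JB, IC, JC
    real(4), intent(in) :: alpha, beta
    real(4), intent(in) :: A(:,:), B(:,:)
    real(4), intent(inout) :: C(:,:)

    C(IC:IC+M-1, JC:JC+N-1) = &
        alpha * matmul(A(IA:IA+M-1, JA:JA+K-1), B(IB:IB+K-1, JB:JB+N-1)) &
      + beta * C(IC:IC+M-1, JC:JC+N-1)
  end subroutine ref_gemm_nn
end module gemm_reference
```

With all offsets equal to 1 and M, N, and K equal to the full global dimensions, this reduces to the usual C := alpha*A*B + beta*C on whole matrices.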
cublasMpMatmul_bufferSize
This function computes the workspace requirements of cublasMpMatmul.
type(cublasMpStatus) function cublasMpMatmul_bufferSize(handle, matmulDescr, M, N, K, &
alpha, A, IA, JA, descrA, B, IB, JB, descrB, beta, C, IC, JC, descrC, &
D, ID, JD, descrD, devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
type(cublasMpMatmulDescriptor) :: matmulDescr
integer(8), intent(in) :: M, N, K, IA, JA, IB, JB, IC, JC, ID, JD
real(4) :: alpha, beta ! Any compatible kind
real(4), device, dimension(*) :: A, B, C, D ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB, descrC, descrD
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
cublasMpMatmul
This is the multi-processor version of the matrix multiplication operation.
Matmul performs one of the matrix-matrix operations
D := alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X**T, as set by a call to cublasMpMatmulDescriptorAttributeSet().
alpha and beta are scalars, and A, B, C, and D are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix, and C and D m by n matrices. The data for A, B, C, and D should be properly distributed over the process grid. That mapping is held in the descriptors descrA, descrB, descrC, and descrD, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. M, N, and K are the global matrix dimensions. IA, JA, IB, JB, IC, JC, ID, and JD are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpMatmul(handle, matmulDescr, M, N, K, &
alpha, A, IA, JA, descrA, B, IB, JB, descrB, beta, C, IC, JC, descrC, &
D, ID, JD, descrD, bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
type(cublasMpMatmulDescriptor) :: matmulDescr
integer(8), intent(in) :: M, N, K, IA, JA, IB, JB, IC, JC, ID, JD
real(4) :: alpha, beta ! Any supported type and kind
real(4), device, dimension(*) :: A, B, C, D ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB, descrC, descrD
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
cublasMpSyrk
This is the multi-processor version of the BLAS SYRK operation, similar to the ScaLAPACK PBLAS functions pdsyrk, pzsyrk, etc.
SYRK performs one of the symmetric rank k operations
C := alpha*A*A**T + beta*C, or
C := alpha*A**T*A + beta*C
alpha and beta are scalars, and A and C are matrices. A is either N x K or K x N depending on the trans argument, and C is N x N. The data for A and C should be properly distributed over the process grid. That mapping is held in the descriptors descrA and descrC, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. N and K are the global matrix dimensions. IA, JA, IC, and JC are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpSyrk(handle, uplo, trans, &
N, K, alpha, A, IA, JA, descrA, beta, C, IC, JC, descrC, &
computeType, bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: uplo, trans
integer(8), intent(in) :: N, K, IA, JA, IC, JC
real(4) :: alpha, beta ! type and kind compatible with computeType
real(4), device, dimension(*) :: A, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrC
type(cublasComputeType) :: computeType
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
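For reference, the first SYRK form can be written on the host in a few lines of plain Fortran. This sketch assumes the standard BLAS convention that only the triangle of C selected by uplo is read and written; whether cublasMpSyrk follows that convention exactly is not stated in this section. The module syrk_reference and the subroutine ref_syrk_n are hypothetical names, and a logical flag stands in for the uplo argument.

```fortran
! Host-only reference sketch; not part of the cublasMp library.
module syrk_reference
  implicit none
contains
  ! C := alpha*A*A**T + beta*C for an N x K matrix A, touching only the
  ! upper triangle of C (upper = .true.) or the lower one (upper = .false.).
  subroutine ref_syrk_n(upper, N, K, alpha, A, beta, C)
    logical, intent(in) :: upper
    integer, intent(in) :: N, K
    real(4), intent(in) :: alpha, beta
    real(4), intent(in) :: A(N,K)
    real(4), intent(inout) :: C(N,N)
    integer :: i, j

    do j = 1, N
      do i = 1, N
        if ((upper .and. i <= j) .or. (.not. upper .and. i >= j)) then
          ! Entry (i,j) of A*A**T is the dot product of rows i and j of A.
          C(i,j) = alpha * dot_product(A(i,:), A(j,:)) + beta * C(i,j)
        end if
      end do
    end do
  end subroutine ref_syrk_n
end module syrk_reference
```

Entries in the other triangle of C are left untouched by this reference routine.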
cublasMpTrsm_bufferSize
This function computes the workspace requirements of cublasMpTrsm.
type(cublasMpStatus) function cublasMpTrsm_bufferSize(handle, side, uplo, trans, diag, &
M, N, alpha, A, IA, JA, descrA, B, IB, JB, descrB, &
computeType, devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: side, uplo, trans, diag
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4) :: alpha ! type and kind compatible with computeType
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
type(cublasComputeType) :: computeType
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
cublasMpTrsm
This is the multi-processor version of the BLAS TRSM operation, similar to the ScaLAPACK PBLAS functions pdtrsm, pztrsm, etc.
TRSM solves one of the matrix equations
op( A )*X = alpha*B, or
X*op( A ) = alpha*B
alpha is a scalar, and A and B are matrices whose dimensions are determined by the side argument. The data for A and B should be properly distributed over the process grid. That mapping is held in the descriptors descrA and descrB, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. M and N are the global matrix dimensions. IA, JA, IB, and JB are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpTrsm(handle, side, uplo, trans, diag, &
M, N, alpha, A, IA, JA, descrA, B, IB, JB, descrB, &
computeType, bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: side, uplo, trans, diag
integer(8), intent(in) :: M, N, IA, JA, IB, JB
real(4) :: alpha ! type and kind compatible with computeType
real(4), device, dimension(*) :: A, B ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrB
type(cublasComputeType) :: computeType
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
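As a concrete instance of the first equation, op( A )*X = alpha*B, the host-only sketch below handles one case: A applied from the left, lower triangular, not transposed, with a non-unit diagonal. It assumes the standard BLAS TRSM convention that the solution X overwrites B, which this section does not state explicitly. The module trsm_reference and the subroutine ref_trsm_lln are hypothetical names.

```fortran
! Host-only reference sketch; not part of the cublasMp library.
module trsm_reference
  implicit none
contains
  ! Solve A*X = alpha*B by forward substitution, where A is M x M lower
  ! triangular with a non-unit diagonal. On return B holds X.
  subroutine ref_trsm_lln(M, N, alpha, A, B)
    integer, intent(in) :: M, N
    real(4), intent(in) :: alpha
    real(4), intent(in) :: A(M,M)
    real(4), intent(inout) :: B(M,N)
    integer :: i, j

    do j = 1, N
      do i = 1, M
        ! Rows 1..i-1 of column j already hold the solution, so subtract
        ! their contribution and divide by the diagonal entry.
        B(i,j) = (alpha * B(i,j) - dot_product(A(i,1:i-1), B(1:i-1,j))) / A(i,i)
      end do
    end do
  end subroutine ref_trsm_lln
end module trsm_reference
```

The other combinations of side, uplo, trans, and diag follow the same idea with the loop order and the triangle adjusted.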
cublasMpGeadd_bufferSize
This function computes the workspace requirements of cublasMpGeadd.
type(cublasMpStatus) function cublasMpGeadd_bufferSize(handle, trans, &
M, N, alpha, A, IA, JA, descrA, beta, C, IC, JC, descrC, &
devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: trans
integer(8), intent(in) :: M, N, IA, JA, IC, JC
real(4) :: alpha, beta ! Any compatible kind
real(4), device, dimension(*) :: A, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrC
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
cublasMpGeadd
This is the multi-processor version of a general matrix addition function.
GEADD performs the matrix-matrix addition operation
C := alpha*A + beta*C
alpha and beta are scalars, and A and C are matrices. A is either M x N or N x M depending on the trans argument, and C is M x N. The data for A and C should be properly distributed over the process grid. That mapping is held in the descriptors descrA and descrC, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. M and N are the global matrix dimensions. IA, JA, IC, and JC are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpGeadd(handle, trans, &
M, N, alpha, A, IA, JA, descrA, beta, C, IC, JC, descrC, &
bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: trans
integer(8), intent(in) :: M, N, IA, JA, IC, JC
real(4) :: alpha, beta ! Any compatible type and kind
real(4), device, dimension(*) :: A, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrC
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
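The effect of the trans argument on the shape of A can be seen in a small host-only reference. In this sketch a logical flag stands in for trans, and the transposed case is assumed to mean that the N x M matrix A is transposed before being added to the M x N matrix C. The module geadd_reference and the subroutine ref_geadd are hypothetical names.

```fortran
! Host-only reference sketch; not part of the cublasMp library.
module geadd_reference
  implicit none
contains
  ! C := alpha*A + beta*C when transposeA is .false. (A is M x N), or
  ! C := alpha*A**T + beta*C when transposeA is .true. (A is N x M).
  subroutine ref_geadd(transposeA, M, N, alpha, A, beta, C)
    logical, intent(in) :: transposeA
    integer, intent(in) :: M, N
    real(4), intent(in) :: alpha, beta
    real(4), intent(in) :: A(:,:)
    real(4), intent(inout) :: C(M,N)

    if (transposeA) then
      C = alpha * transpose(A(1:N,1:M)) + beta * C
    else
      C = alpha * A(1:M,1:N) + beta * C
    end if
  end subroutine ref_geadd
end module geadd_reference
```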
cublasMpTradd_bufferSize
This function computes the workspace requirements of cublasMpTradd.
type(cublasMpStatus) function cublasMpTradd_bufferSize(handle, uplo, trans, &
M, N, alpha, A, IA, JA, descrA, beta, C, IC, JC, descrC, &
devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: uplo, trans
integer(8), intent(in) :: M, N, IA, JA, IC, JC
real(4) :: alpha, beta ! Any compatible kind
real(4), device, dimension(*) :: A, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrC
integer(8), intent(out) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
cublasMpTradd
This is the multi-processor version of a trapezoidal matrix addition function.
TRADD performs the trapezoidal matrix-matrix addition operation
C := alpha*A + beta*C
alpha and beta are scalars, and A and C are matrices. A is either M x N or N x M depending on the trans argument, and C is M x N. The data for A and C should be properly distributed over the process grid. That mapping is held in the descriptors descrA and descrC, which are set up with the cublasMpMatrixDescriptorCreate() function. The datatype is also specified there. M and N are the global matrix dimensions. IA, JA, IC, and JC are 1-based, and typically equal to 1 for a full matrix. Integer(4) input values will be promoted to integer(8) according to the interface.
type(cublasMpStatus) function cublasMpTradd(handle, uplo, trans, &
M, N, alpha, A, IA, JA, descrA, beta, C, IC, JC, descrC, &
bufferOnDevice, devWorkspaceSizeInBytes, &
bufferOnHost, hostWorkspaceSizeInBytes)
type(cublasMpHandle) :: handle
integer(4) :: uplo, trans
integer(8), intent(in) :: M, N, IA, JA, IC, JC
real(4) :: alpha, beta ! Any compatible type and kind
real(4), device, dimension(*) :: A, C ! Any supported type and kind
type(cublasMpMatrixDescriptor) :: descrA, descrC
integer(8), intent(in) :: devWorkspaceSizeInBytes, hostWorkspaceSizeInBytes
integer(1), device :: bufferOnDevice(devWorkspaceSizeInBytes) ! Any type
integer(1) :: bufferOnHost(hostWorkspaceSizeInBytes) ! Any type
cublasMpLoggerSetFile
This function specifies the Fortran unit to be used as the cublasMp logfile.
type(cublasMpStatus) function cublasMpLoggerSetFile(unit)
integer :: unit
cublasMpLoggerOpenFile
This function specifies a Fortran character string to be opened and used as the cublasMp logfile.
type(cublasMpStatus) function cublasMpLoggerOpenFile(logFile)
character*(*) :: logFile
cublasMpLoggerSetLevel
This function specifies the cublasMp logging level.
type(cublasMpStatus) function cublasMpLoggerSetLevel(level)
integer :: level
cublasMpLoggerSetMask
This function specifies the cublasMp logging mask.
type(cublasMpStatus) function cublasMpLoggerSetMask(mask)
integer :: mask
cublasMpLoggerForceDisable
This function disables cublasMp logging.
type(cublasMpStatus) function cublasMpLoggerForceDisable()
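The logger functions above might be combined as in the following sketch, which opens a named log file and then sets the logging level. It assumes the interfaces come from a module named cublasMp; the module cublasmp_logging_helper and the subroutine enable_cublasmp_logging are hypothetical names, and the level is passed through from the caller because the valid level values are not listed in this section.

```fortran
! Sketch only. Assumption: the interfaces come from a module named cublasMp.
module cublasmp_logging_helper
  use cublasMp
  implicit none
contains
  ! Hypothetical helper: direct cublasMp logging to the file named by path
  ! and set the requested logging level.
  subroutine enable_cublasmp_logging(path, level, openStat, levelStat)
    character(len=*), intent(in) :: path
    integer, intent(in) :: level
    type(cublasMpStatus), intent(out) :: openStat, levelStat

    openStat  = cublasMpLoggerOpenFile(path)
    levelStat = cublasMpLoggerSetLevel(level)
  end subroutine enable_cublasmp_logging
end module cublasmp_logging_helper
```

To log to an already opened Fortran unit instead, cublasMpLoggerSetFile takes the unit number in place of a file name.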