Preface
This document describes the PGI Fortran interfaces to cuBLAS, cuFFT, cuRAND, and cuSPARSE, which are CUDA Libraries used in scientific and engineering applications built upon the CUDA computing architecture.
Intended Audience
This guide is intended for application programmers, scientists and engineers proficient in programming with the Fortran language. This guide assumes some familiarity with either CUDA Fortran or OpenACC.
Organization
The organization of this document is as follows:
- Introduction: contains a general introduction to Fortran interfaces, OpenACC, CUDA Fortran, and CUDA Library functions.
- BLAS Runtime Library APIs: describes the Fortran interfaces to the various cuBLAS libraries.
- FFT Runtime Library APIs: describes the module types, definitions and Fortran interfaces to the cuFFT library.
- Random Number Runtime APIs: describes the Fortran interfaces to the host and device cuRAND libraries.
- Sparse Matrix Runtime APIs: describes the module types, definitions and Fortran interfaces to the cuSPARSE Library.
- Examples: provides sample code and an explanation of each of the simple examples.
Conventions
This guide uses the following conventions:
- italic: used for emphasis.
- Constant Width: used for filenames, directories, arguments, options, examples, and for language statements in the text, including assembly language statements.
- Bold: used for commands.
- [ item1 ]: in general, square brackets indicate optional items. In this case item1 is optional. In the context of p/t-sets, square brackets are required to specify a p/t-set.
- { item2 | item3 }: braces indicate that a selection is required. In this case, you must select either item2 or item3.
- filename ...: an ellipsis indicates a repetition. Zero or more of the preceding item may occur. In this example, multiple filenames are allowed.
- FORTRAN: Fortran language statements are shown in the text of this guide using a reduced fixed point size.
- C/C++: C/C++ language statements are shown in the text of this guide using a reduced fixed point size.
The PGI compilers and tools are supported on a wide variety of Linux, macOS and Windows operating systems running on 64-bit x86-compatible processors, and on Linux running on OpenPOWER processors. (Currently, the PGI debugger is supported on x86-64/x64 only.) See the Compatibility and Installation section on the PGI website for a comprehensive listing of supported platforms.
1. Introduction
This document provides a reference for calling CUDA Library functions from PGI Fortran. It can be used from Fortran code using the OpenACC programming model, or from PGI CUDA Fortran. Currently, the CUDA libraries which PGI provides pre-built interface modules for, and which are documented here, are:
- cuBLAS, an implementation of the BLAS.
- cuFFT, a library of Fast Fourier Transform (FFT) routines.
- cuRAND, a library for random number generation.
- cuSPARSE, a library of linear algebra routines used with sparse matrices.
The OpenACC Application Program Interface is a collection of compiler directives and runtime routines that allows the programmer to specify loops and regions of code for offloading from a host CPU to an attached accelerator, such as a GPU. The OpenACC API was designed and is maintained by an industry consortium. See the OpenACC website for more information about the OpenACC API.
CUDA Fortran is a small set of extensions to Fortran that supports and is built upon the CUDA computing architecture. CUDA Fortran includes a Fortran 2003 compiler and tool chain for programming NVIDIA GPUs using Fortran, and is an analog to NVIDIA's CUDA C compiler. Compared to the PGI Accelerator and OpenACC directives-based model and compilers, CUDA Fortran is a lower-level explicit programming model with substantial runtime library components that give expert programmers direct control of all aspects of GPGPU programming.
This document does not contain explanations or purposes of the library functions, nor does it contain details of the approach used in the CUDA implementation to target GPUs. For that information, please see the appropriate library document that comes with the NVIDIA CUDA Toolkit. This document does provide the Fortran module contents: derived types, enumerations, and interfaces, to make use of the libraries from Fortran rather than from C or C++.
Many of the examples used in this document are provided in the PGI compiler and tools distribution, along with Makefiles, and are stored in the yearly directory, such as 2016/CUDA-Libraries.
1.1. Fortran Interfaces and Wrappers
Almost all of the function interfaces shown in this document make use of features from the Fortran 2003 iso_c_binding intrinsic module. This module provides a standard way of dealing with issues such as inter-language data types, capitalization, adding underscores to symbol names, or passing arguments by value.
Often, the iso_c_binding module enables Fortran programs containing properly written interfaces to call directly into the C library functions. In some cases, PGI has written small wrappers around the C library function, to make the Fortran call site more "Fortran-like", hiding some issues exposed in the C interfaces like handle management, host vs. device pointer management, or character and complex data type issues.
In a small number of cases, the C Library may contain multiple entry points to handle different data types, perhaps an int in one function and a size_t in another, while the functions are otherwise identical. In these cases, PGI may provide just one generic Fortran interface, and will call the appropriate C function under the hood.
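As a minimal sketch of the direct-binding case described above, the following plain Fortran program (no GPU or PGI module required) declares an explicit interface to the C standard library function strlen. The Fortran-side names call_strlen, c_strlen, and msg are hypothetical and chosen only for illustration; strlen is the only external symbol used.

```fortran
program call_strlen
  use iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! bind(C,name=...) pins the exact, case-sensitive C symbol name,
    ! so no compiler-specific underscores or case folding apply.
    function c_strlen(s) bind(C, name='strlen') result(n)
      import :: c_char, c_size_t
      ! An assumed-size character array is passed by address,
      ! which matches the C parameter type "const char *".
      character(kind=c_char), dimension(*), intent(in) :: s
      integer(c_size_t) :: n
    end function c_strlen
  end interface

  character(kind=c_char, len=6) :: msg

  ! C strings are NUL-terminated, so the terminator is appended explicitly.
  msg = c_char_'hello' // c_null_char
  print *, c_strlen(msg)
end program call_strlen
```

The program should print 5, the length of the string without its terminator. The same pattern, with the device attribute added to array dummies, is what most of the interfaces in this document use.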
1.2. Using CUDA Libraries from OpenACC Host Code
All four of the libraries covered in this document contain functions which are callable from OpenACC host code. Most functions take some arguments which are expected to be device pointers (the address of a variable in device global memory). There are several ways to do that in OpenACC.
If the call is lexically nested within an OpenACC data directive, the PGI Fortran compiler, in the presence of an explicit interface such as those provided by the PGI library modules, will default to passing the device pointer when required.
subroutine hostcall(a, b, n)
use cublas
real a(n), b(n)
!$acc data copy(a, b)
call cublasSswap(n, a, 1, b, 1)
!$acc end data
return
end
A Fortran interface is made explicit when you use the module that contains it, as in the line use cublas in the example above. If you look ahead to the actual interface for cublasSswap, you will see that the arrays a and b are declared with the CUDA Fortran device attribute, so they take only device addresses as arguments.
When using OpenACC, it is more portable and general to pass device pointers to subprograms by using the host_data construct, as most implementations don't have a way to mark arguments as device pointers. The host_data construct with the use_device clause makes the device addresses available in host code for passing to the subprogram.
use cufft
use openacc
. . .
!$acc data copyin(a), copyout(b,c)
ierr = cufftPlan2D(iplan1,m,n,CUFFT_C2C)
ierr = ierr + cufftSetStream(iplan1,acc_get_cuda_stream(acc_async_sync))
!$acc host_data use_device(a,b,c)
ierr = ierr + cufftExecC2C(iplan1,a,b,CUFFT_FORWARD)
ierr = ierr + cufftExecC2C(iplan1,b,c,CUFFT_INVERSE)
!$acc end host_data

! scale c
!$acc kernels
c = c / (m*n)
!$acc end kernels
!$acc end data
This code snippet also shows an example of sharing the stream that OpenACC and the cuFFT library use. Every library in this document has a function for setting the CUDA stream which the library runs on. Usually, when using OpenACC, you want the OpenACC kernels to run on the same stream as the library functions. In the case above, this guarantees that the kernel c = c / (m*n) does not start until the FFT operations complete. The function acc_get_cuda_stream and the definition for acc_async_sync are in the openacc module.
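The same stream-sharing idea can be sketched for cuBLAS. The fragment below is an untested illustration that combines only interfaces documented elsewhere in this guide (cublasGetHandle, cublasSetStream, the legacy cublasSswap) with acc_get_cuda_stream. The subroutine name swap_on_acc_stream is hypothetical, and the sketch assumes the stream value returned by acc_get_cuda_stream can be passed directly to cublasSetStream, as it is to cufftSetStream above.

```fortran
subroutine swap_on_acc_stream(a, b, n)
  use cublas
  use openacc
  implicit none
  integer :: n
  real :: a(n), b(n)
  type(cublasHandle) :: h
  integer :: istat

  ! Handle that the runtime tracks for this CPU thread (see cublasGetHandle).
  h = cublasGetHandle()
  ! Ask cuBLAS to run on the stream used by synchronous OpenACC regions.
  istat = cublasSetStream(h, acc_get_cuda_stream(acc_async_sync))

  !$acc data copy(a, b)
  ! host_data exposes the device addresses of a and b to the library call.
  !$acc host_data use_device(a, b)
  call cublasSswap(n, a, 1, b, 1)
  !$acc end host_data

  ! Queued on the same stream, so it runs after the swap has completed.
  !$acc kernels
  a = 2.0 * a
  !$acc end kernels
  !$acc end data
end subroutine swap_on_acc_stream
```

Compiling a file like this would be expected to need both OpenACC and the cuBLAS library enabled, for example with -ta=tesla -Mcudalib=cublas.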
1.3. Using CUDA Libraries from OpenACC Device Code
Two libraries are available from within OpenACC compute regions, though the two can behave quite differently. Functions in both the openacc_cublas module and the openacc_curand module are marked acc routine seq. The cuBLAS interfaces from device code closely mirror what's available from the host, but the underlying implementation may launch a new kernel using CUDA dynamic parallelism. The routines should not be called by multiple threads if you expect the threads to cooperate to compute the answer.
subroutine testdev( a, b, n )
use openacc_cublas
real :: a(n), b(n)
type(cublasHandle) :: h
!$acc parallel num_gangs(1) copy(a,b,h)
j = cublasCreate(h)
j = cublasSswap(h,n,a,1,b,1)
j = cublasDestroy(h)
!$acc end parallel
return
end subroutine
When using the openacc_cublas module, you must link with -lcublas_device (or -defaultlib:cublas_device on Windows) and compile and link with -Mcuda.
The cuRAND device library is contained entirely within CUDA header files. In device code, it is designed to return one or a small number of random numbers per thread. The threads' random generators run independently of each other, and for performance reasons it is usually advised to give each thread a different seed, rather than a different offset.
program t
use openacc_curand
integer, parameter :: n = 500
real a(n,n,4)
type(curandStateXORWOW) :: h
integer(8) :: seed, seq, offset
a = 0.0
!$acc parallel num_gangs(n) vector_length(n) copy(a)
!$acc loop gang
do j = 1, n
!$acc loop vector private(h)
  do i = 1, n
    seed = 12345_8 + j*n*n + i*2
    seq = 0_8
    offset = 0_8
    call curand_init(seed, seq, offset, h)
!$acc loop seq
    do k = 1, 4
      a(i,j,k) = curand_uniform(h)
    end do
  end do
end do
!$acc end parallel
print *,maxval(a),minval(a),sum(a)/(n*n*4)
end
When using the openacc_curand module, since all the code is contained in CUDA header files, you do not need any additional libraries on the link line. However, since the current implementation relies on CUDA compilation, you must compile with -ta=tesla,nollvm.
1.4. Using CUDA Libraries from CUDA Fortran Host Code
The predominant usage model for the library functions listed in this document is to call them from CUDA Fortran host code. CUDA Fortran allows some special capabilities in that the compiler is able to recognize the device and managed attributes when resolving generic interfaces. Device actual arguments can only match the interface's device dummy arguments; managed actual arguments, by precedence, match managed dummy arguments first, then device dummies, then host.
program testisamax  ! link with -Mcudalib=cublas -lblas
use cublas
real*4 x(1000)
real*4, device :: xd(1000)
real*4, managed :: xm(1000)

call random_number(x)

! Call host BLAS
j = isamax(1000,x,1)

xd = x
! Call cuBLAS
k = isamax(1000,xd,1)
print *,j.eq.k

xm = x
! Also calls cuBLAS
k = isamax(1000,xm,1)
print *,j.eq.k
end
Using the cudafor module, the full set of CUDA functionality is available to programmers for managing CUDA events, streams, synchronization, and asynchronous behaviors. CUDA Fortran can be used in OpenMP programs, and the CUDA Libraries in this document are thread safe with respect to host CPU threads. Further examples are included in the Examples chapter.
1.5. Using CUDA Libraries from CUDA Fortran Device Code
The cuBLAS and cuRAND libraries have functions callable from CUDA Fortran device code, and their interfaces are accessed via the cublas_device and curand_device modules, respectively. The module interfaces are very similar to the modules used in OpenACC device code, but for CUDA Fortran, each subroutine and function is declared attributes(device), and the subroutines and functions do not need to be marked as acc routine seq.
! cuBLAS in device code requires -Mcuda=cc35 or higher
! since it potentially uses dynamic parallelism to launch kernels.
! pgfortran -Mcuda=cc35 testcu.cuf -lcublas_device
attributes(global) subroutine testcu( a, b, n )
use cublas_device
real, device :: a(*), b(*)
type(cublasHandle) :: h
integer, value :: n
i = threadIdx%x
if (i.eq.1) then
  j = cublasCreate(h)
  j = cublasSswap(h,n,a,1,b,1)
  j = cublasDestroy(h)
end if
return
end subroutine
Using the device cuRAND library with CUDA Fortran also requires compiling with -Mcuda=nollvm so that the CUDA code in the cuRAND headers can be compiled.
module mrand
  use curand_device
  integer, parameter :: n = 500
contains
  attributes(global) subroutine randsub(a)
    real, device :: a(n,n,4)
    type(curandStateXORWOW) :: h
    integer(8) :: seed, seq, offset
    j = blockIdx%x; i = threadIdx%x
    seed = 12345_8 + j*n*n + i*2
    seq = 0_8
    offset = 0_8
    call curand_init(seed, seq, offset, h)
    do k = 1, 4
      a(i,j,k) = curand_uniform(h)
    end do
  end subroutine
end module

program t  ! pgfortran -Mcuda=nollvm t.cuf
use mrand
use cudafor  ! recognize maxval, minval, sum w/managed
real, managed :: a(n,n,4)
a = 0.0
call randsub<<<n,n>>>(a)
print *,maxval(a),minval(a),sum(a)/(n*n*4)
end program
1.6. Pointer Modes in cuBLAS and cuSPARSE
Because the PGI Fortran compiler can distinguish between host and device arguments, the PGI modules for interfacing to cuBLAS and cuSPARSE handle pointer modes differently than CUDA C, which requires setting the mode explicitly for scalar arguments. Examples of scalar arguments which can reside either on the host or device are the alpha and beta scale factors to the *gemm functions.
Typically, when using the normal "non-_v2" interfaces in the cuBLAS and cuSPARSE modules, the runtime wrappers will implicitly add the setting and restoring of the library pointer modes behind the scenes. This adds some negligible but non-zero overhead to the calls.
To avoid the implicit getting and setting of the pointer mode with every invocation of a library function, do the following:
- For the BLAS, use the cublas_v2 module, and the v2 entry points, such as cublasIsamax_v2. It is the programmer's responsibility to properly set the pointer mode when needed. Examples of scalar arguments which do require setting the pointer mode are the alpha and beta scale factors passed to the *gemm routines, and the scalar results returned from the v2 versions of the *amax(), *amin(), *asum(), *rotg(), *rotmg(), *nrm2(), and *dot() functions. In the v2 interfaces shown in chapter 2, these scalar arguments will have the comment ! device or host variable. Examples of scalar arguments which do not require setting the pointer mode are increments, extents, and lengths such as incx, incy, n, lda, ldb, and ldc.
- For the cuSPARSE library, each function listed in chapter 5 which contains scalar arguments with the comment ! device or host variable has a corresponding v2 interface, though it is not documented here. For instance, in addition to the interface named cusparseSaxpyi, there is another interface named cusparseSaxpyi_v2 with the exact same argument list which calls into the cuSPARSE library directly and will not implicitly get or set the library pointer mode.
The CUDA default pointer mode is that the scalar arguments reside on the host. The PGI runtime does not change that setting.
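A minimal, untested sketch of managing the pointer mode by hand with the v2 entry points might look like the following. It uses cublasIsamax_v2, cublasSetPointerMode, and the CUBLAS_POINTER_MODE_* enumerators named in this document. The argument order of cublasIsamax_v2 (handle, n, x, incx, result) is an assumption here and should be checked against the interface listing in chapter 2; the program and variable names are hypothetical.

```fortran
program ptrmode
  use cublas_v2
  implicit none
  integer, parameter :: n = 1000
  real(4) :: x(n)
  real(4), device :: x_d(n)
  integer :: k, k_from_device, istat
  integer, device :: k_d
  type(cublasHandle) :: h

  call random_number(x)
  x_d = x
  istat = cublasCreate(h)

  ! Default (host) pointer mode: the scalar result lands in host memory.
  istat = cublasIsamax_v2(h, n, x_d, 1, k)

  ! Device pointer mode: the scalar result lands in device memory.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE)
  istat = cublasIsamax_v2(h, n, x_d, 1, k_d)

  ! Restore the default so later calls with host scalars behave as expected.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_HOST)

  k_from_device = k_d          ! device-to-host copy of the result
  print *, k .eq. k_from_device
  istat = cublasDestroy(h)
end program ptrmode
```

Both calls compute the same index, so the comparison is expected to be true. The point of the sketch is that, with the v2 names, nothing switches the mode for you between the two calls.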
1.7. Writing Your Own CUDA Interfaces
Despite the large number of interfaces included in the modules described in this document, users will from time to time need to write their own interfaces to new libraries or to their own tuned CUDA code, perhaps written in C/C++. There are some standard techniques to use, and some non-standard PGI extensions which can make creating working interfaces easier.
! cufftExecC2C
interface cufftExecC2C
  integer function cufftExecC2C( plan, idata, odata, direction ) bind(C,name='cufftExecC2C')
    integer, value :: plan
    complex, device, dimension(*) :: idata, odata
    integer, value :: direction
  end function cufftExecC2C
end interface cufftExecC2C
This interface calls the C library function directly. You can deal with Fortran's capitalization issues by putting the properly capitalized C function in the bind(C) attribute. If the C function expects input arguments passed by value, you can add the value attribute to the dummy declaration as well. A nice feature of Fortran is that the interface can change, but the code at the call site may not have to. The compiler changes the details of the call to fit the interface.
Now suppose a user of this interface would like to call this function with REAL data (F77 code is notorious for mixing REAL and COMPLEX declarations). There are two ways to do this:
! cufftExecC2C
interface cufftExecC2C
  integer function cufftExecC2C( plan, idata, odata, direction ) bind(C,name='cufftExecC2C')
    integer, value :: plan
    complex, device, dimension(*) :: idata, odata
    integer, value :: direction
  end function cufftExecC2C
  integer function cufftExecR2R( plan, idata, odata, direction ) bind(C,name='cufftExecC2C')
    integer, value :: plan
    real, device, dimension(*) :: idata, odata
    integer, value :: direction
  end function cufftExecR2R
end interface cufftExecC2C
Here the C name hasn't changed. The compiler will now accept actual arguments corresponding to idata and odata that are declared REAL. A generic interface is created named cufftExecC2C. If you have problems debugging your generic interface, as a debugging aid you can try calling the specific name, cufftExecR2R in this case, to help diagnose the problem.
A commonly used extension which is supported by PGI is ignore_tkr. A programmer can use it in an interface to instruct the compiler to ignore any combination of the type, kind, and rank during the interface matching process. The previous example using ignore_tkr looks like this:
! cufftExecC2C
interface cufftExecC2C
  integer function cufftExecC2C( plan, idata, odata, direction ) bind(C,name='cufftExecC2C')
    integer, value :: plan
    !dir$ ignore_tkr(tr) idata, (tr) odata
    complex, device, dimension(*) :: idata, odata
    integer, value :: direction
  end function cufftExecC2C
end interface cufftExecC2C
Now the compiler will ignore both the type and rank (F77 could also be sloppy in its handling of array dimensions) of idata and odata when matching the call site to the interface. An unfortunate side-effect is that the interface will now allow integer, logical, and character data for idata and odata. It is up to the implementor to determine if that is acceptable.
A final aid, specific to PGI, worth mentioning here is ignore_tkr (d), which ignores the device attribute of an actual argument during interface matching.
Of course, if you write a wrapper, a narrow strip of code between the Fortran call and your library function, you are not limited by the simple transformations that a compiler can do, such as those listed here. As mentioned earlier, many of the interfaces provided in the cuBLAS and cuSPARSE modules use wrappers.
A common request is a way for Fortran programmers to take advantage of the thrust library. Explaining thrust and C++ programming is outside of the scope of this document, but this simple example can show how to take advantage of the excellent sort capabilities in thrust:
// Filename: csort.cu
// nvcc -c -arch sm_35 csort.cu
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>

extern "C" {

  //Sort for integer arrays
  void thrust_int_sort_wrapper( int *data, int N)
  {
    thrust::device_ptr <int> dev_ptr(data);
    thrust::sort(dev_ptr, dev_ptr+N);
  }

  //Sort for float arrays
  void thrust_float_sort_wrapper( float *data, int N)
  {
    thrust::device_ptr <float> dev_ptr(data);
    thrust::sort(dev_ptr, dev_ptr+N);
  }

  //Sort for double arrays
  void thrust_double_sort_wrapper( double *data, int N)
  {
    thrust::device_ptr <double> dev_ptr(data);
    thrust::sort(dev_ptr, dev_ptr+N);
  }
}
Set up an interface to the sort subroutine in Fortran, and the calls are simple:
program t
interface sort
  subroutine sort_int(array, n) &
      bind(C,name='thrust_int_sort_wrapper')
    integer(4), device, dimension(*) :: array
    integer(4), value :: n
  end subroutine
end interface
integer(4), parameter :: n = 100
integer(4), device :: a_d(n)
integer(4) :: a_h(n)
!$cuf kernel do
do i = 1, n
  a_d(i) = 1 + mod(47*i,n)
end do
call sort(a_d, n)
a_h = a_d
nres = count(a_h .eq. (/(i,i=1,n)/))
if (nres.eq.n) then
  print *,"test PASSED"
else
  print *,"test FAILED"
endif
end
1.8. PGI Fortran Compiler Options
The PGI Fortran compiler driver is called pgfortran. General information on the compiler options which can be passed to pgfortran can be obtained by typing pgfortran -help. To enable targeting NVIDIA GPUs using OpenACC, use pgfortran -ta=tesla. To enable targeting NVIDIA GPUs using CUDA Fortran, use pgfortran -Mcuda. CUDA Fortran is also supported by the PGI Fortran compilers when the filename uses the .cuf extension. Uppercase file extensions, .F90 or .CUF, for example, may also be used, in which case the program is processed by the preprocessor before being compiled.
Other options which are pertinent to the examples in this document are:
- -Mcudalib[=cublas|cufft|curand|cusparse]: this option adds the appropriate versions of the CUDA-optimized libraries to the link line. It handles static and dynamic linking, and platform (Linux, Windows, macOS) differences unobtrusively.
- -Mcuda=[no]llvm: this option selects between two compiler back-end code generators. Currently, using the cuRAND library from device code requires -Mcuda=nollvm.
- -ta=tesla:cc35: this option compiles for compute capability 3.5. Certain device functionality, such as dynamic parallelism in the cuBLAS library, requires compute capability 3.5 or higher.
- -lcublas_device: this adds the cuBLAS device library to the set of linker options. On Windows, use -defaultlib:cublas_device.
2. BLAS Runtime APIs
This section describes the Fortran interfaces to the CUDA BLAS libraries. There are currently four somewhat separate collections of function entry points which are commonly referred to as the cuBLAS:
- The original CUDA implementation of the BLAS routines, referred to as the legacy API, which are callable from the host and expect and operate on device data.
- The newer "v2" CUDA implementation of the BLAS routines, plus some extensions for batched operations. These are also callable from the host and operate on device data. In Fortran terms, these entry points have been changed from subroutines to functions which return status.
- Another implementation of the BLAS routines using the v2 entry points, callable from device code and which may take advantage of dynamic parallelism.
- A new cuBLAS XT library which can target multiple GPUs using only host-resident data.
PGI currently ships with five Fortran modules which programmers can use to call into this cuBLAS functionality:
- cublas, which provides interfaces into the main cublas library. Both the legacy and v2 names are supported. In this module, the cublas names (such as cublasSaxpy) use the legacy calling conventions. Interfaces to a host BLAS library (for instance libblas.a in the PGI distribution) are also included in the cublas module.
- cublas_v2, which is similar to the cublas module in most ways except the cublas names (such as cublasSaxpy) use the v2 calling conventions. For instance, instead of a subroutine, cublasSaxpy is a function which takes a handle as the first argument and returns an integer containing the status of the call.
- cublasxt, which interfaces directly to the cublasXT API.
- cublas_device, which is usable from CUDA Fortran device code and interfaces into the static cuBLAS Library cublas_device.a. The legacy cuBLAS API is not supported in this library or module.
- openacc_cublas, which is usable from OpenACC device code and also provides interfaces into cublas_device.a. For convenience, this module marks each function as "!$acc routine seq". The legacy cuBLAS API is not supported in this library or module.
The v2 routines are integer functions that return an error status code: CUBLAS_STATUS_SUCCESS if the call was successful, or another cuBLAS status value if there was an error.
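As an untested sketch of checking these status codes, the program below calls the v2 form of cublasSaxpy from the cublas_v2 module and compares each return value against CUBLAS_STATUS_SUCCESS. The argument order shown for cublasSaxpy (handle, n, alpha, x, incx, y, incy) follows the usual v2 convention and is an assumption to be checked against the interface listing; the program and variable names are hypothetical.

```fortran
program saxpy_status
  use cublas_v2
  implicit none
  integer, parameter :: n = 100
  real(4), device :: x_d(n), y_d(n)
  real(4) :: y(n), alpha
  type(cublasHandle) :: h
  integer :: istat

  x_d = 1.0
  y_d = 2.0
  alpha = 3.0      ! host scalar, matching the default host pointer mode

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! y = alpha*x + y on the device; the function result is the status code.
  istat = cublasSaxpy(h, n, alpha, x_d, 1, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSaxpy failed: ', istat

  y = y_d          ! copy back; each element should be 3*1 + 2 = 5
  print *, all(y == 5.0)

  istat = cublasDestroy(h)
end program saxpy_status
```

With the cublas module instead of cublas_v2, the same cublasSaxpy name is a subroutine with no handle argument and no status to check.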
In the subsequent sections, documented interfaces to the traditional BLAS names that contain the comment ! device or host variable should not be confused with the pointer mode issue from section 1.6. The traditional BLAS names are overloaded generic names in the cublas module. For instance, consider this interface:
subroutine scopy(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy
The arrays x and y can either both be device arrays, in which case cublasScopy is called via the generic interface, or they can both be host arrays, in which case scopy from the host BLAS library is called. Using CUDA Fortran managed data as actual arguments to scopy poses an interesting case, and calling cublasScopy is chosen by default. If you wish to call the host library version of scopy with managed data, don't expose the generic scopy interface at the call site.
Unless a specific kind is provided, in the following interfaces the plain integer type implies integer(4) and the plain real type implies real(4).
2.1. CUBLAS Definitions and Helper Functions
This section contains definitions and data types used in the cuBLAS library and interfaces to the cuBLAS Helper Functions.
The cublas module contains the following derived type definitions:
TYPE cublasHandle
  TYPE(C_PTR) :: handle
END TYPE
The cuBLAS module contains the following enumerations:
enum, bind(c)
    enumerator :: CUBLAS_STATUS_SUCCESS         =0
    enumerator :: CUBLAS_STATUS_NOT_INITIALIZED =1
    enumerator :: CUBLAS_STATUS_ALLOC_FAILED    =3
    enumerator :: CUBLAS_STATUS_INVALID_VALUE   =7
    enumerator :: CUBLAS_STATUS_ARCH_MISMATCH   =8
    enumerator :: CUBLAS_STATUS_MAPPING_ERROR   =11
    enumerator :: CUBLAS_STATUS_EXECUTION_FAILED=13
    enumerator :: CUBLAS_STATUS_INTERNAL_ERROR  =14
end enum

enum, bind(c)
    enumerator :: CUBLAS_FILL_MODE_LOWER=0
    enumerator :: CUBLAS_FILL_MODE_UPPER=1
end enum

enum, bind(c)
    enumerator :: CUBLAS_DIAG_NON_UNIT=0
    enumerator :: CUBLAS_DIAG_UNIT=1
end enum

enum, bind(c)
    enumerator :: CUBLAS_SIDE_LEFT =0
    enumerator :: CUBLAS_SIDE_RIGHT=1
end enum

enum, bind(c)
    enumerator :: CUBLAS_OP_N=0
    enumerator :: CUBLAS_OP_T=1
    enumerator :: CUBLAS_OP_C=2
end enum

enum, bind(c)
    enumerator :: CUBLAS_POINTER_MODE_HOST   = 0
    enumerator :: CUBLAS_POINTER_MODE_DEVICE = 1
end enum
2.1.1. cublasCreate
This function initializes the cuBLAS library and creates a handle to an opaque structure holding the cuBLAS library context. It allocates hardware resources on the host and device and must be called prior to making any other cuBLAS library calls. The cuBLAS library context is tied to the current CUDA device. To use the library on multiple devices, one cuBLAS handle needs to be created for each device. Furthermore, for a given device, multiple cuBLAS handles with different configurations can be created. Because cublasCreate allocates some internal resources, and the release of those resources by calling cublasDestroy will implicitly call cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences. For multi-threaded applications that use the same device from different threads, the recommended programming model is to create one cuBLAS handle per thread and use that cuBLAS handle for the entire life of the thread.
integer(4) function cublasCreate(handle)
  type(cublasHandle) :: handle
2.1.2. cublasDestroy
This function releases hardware resources used by the cuBLAS library. This function is usually the last call with a particular handle to the cuBLAS library. Because cublasCreate allocates some internal resources, and the release of those resources by calling cublasDestroy will implicitly call cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences.
integer(4) function cublasDestroy(handle)
  type(cublasHandle) :: handle
2.1.3. cublasGetVersion
This function returns the version number of the cuBLAS library.
integer(4) function cublasGetVersion(handle, version)
  type(cublasHandle) :: handle
  integer(4) :: version
2.1.4. cublasSetStream
This function sets the cuBLAS library stream, which will be used to execute all subsequent calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream. In particular, this routine can be used to change the stream between kernel launches and then to reset the cuBLAS library stream back to NULL.
integer(4) function cublasSetStream(handle, stream)
  type(cublasHandle) :: handle
  integer(kind=cuda_stream_kind()) :: stream
2.1.5. cublasGetStream
This function gets the cuBLAS library stream, which is being used to execute all calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream.
integer(4) function cublasGetStream(handle, stream)
  type(cublasHandle) :: handle
  integer(kind=cuda_stream_kind()) :: stream
2.1.6. cublasGetPointerMode
This function obtains the pointer mode used by the cuBLAS library. In the cublas module, the pointer mode is set and reset on a call-by-call basis depending on whether the device attribute is set on scalar actual arguments. See section 1.6 for a discussion of pointer modes.
integer(4) function cublasGetPointerMode(handle, mode)
  type(cublasHandle) :: handle
  integer(4) :: mode
2.1.7. cublasSetPointerMode
This function sets the pointer mode used by the cuBLAS library. When using the cublas module, the pointer mode is set on a call-by-call basis depending on whether the device attribute is set on scalar actual arguments. When using the cublas_v2 module with v2 interfaces, it is the programmer's responsibility to make calls to cublasSetPointerMode so scalar arguments are handled correctly by the library. See section 1.6 for a discussion of pointer modes.
integer(4) function cublasSetPointerMode(handle, mode)
  type(cublasHandle) :: handle
  integer(4) :: mode
2.1.8. cublasGetAtomicsMode
This function obtains the atomics mode used by the cuBLAS library.
integer(4) function cublasGetAtomicsMode(handle, mode)
  type(cublasHandle) :: handle
  integer(4) :: mode
2.1.9. cublasSetAtomicsMode
This function sets the atomics mode used by the cuBLAS library. Some routines in the cuBLAS library have alternate implementations that use atomics to accumulate results. These alternate implementations may run faster but may also generate results which are not identical from one run to the other. The default is to not allow atomics in cuBLAS functions.
integer(4) function cublasSetAtomicsMode(handle, mode)
  type(cublasHandle) :: handle
  integer(4) :: mode
2.1.10. cublasGetHandle
This function gets the cuBLAS handle currently in use by a thread. The CUDA Fortran runtime keeps track of a CPU thread's current handle, which is useful if you are using the legacy BLAS API or do not wish to pass the handle through to low-level functions or subroutines manually. Two forms are provided:
type(cublashandle) function cublasGetHandle()
integer(4) function cublasGetHandle(handle)
  type(cublasHandle) :: handle
2.1.11. cublasSetVector
This function copies n elements from a vector x in host memory space to a vector y in GPU memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy or array assignment statements.
integer(4) function cublassetvector(n, elemsize, x, incx, y, incy)
  integer :: n, elemsize, incx, incy
  integer*1, dimension(*) :: x
  integer*1, device, dimension(*) :: y
2.1.12. cublasGetVector
This function copies n elements from a vector x in GPU memory space to a vector y in host memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy or array assignment statements.
integer(4) function cublasgetvector(n, elemsize, x, incx, y, incy)
  integer :: n, elemsize, incx, incy
  integer*1, device, dimension(*) :: x
  integer*1, dimension(*) :: y
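A small round-trip sketch using the two vector helpers above, written against the interfaces as listed and not tested here. The element size is given in bytes (4 for real(4)), and the program and variable names are hypothetical.

```fortran
program setget_vector
  use cublas
  implicit none
  integer, parameter :: n = 8
  real(4) :: x(n), y(n)
  real(4), device :: x_d(n)
  integer :: istat, i

  x = (/ (real(i), i = 1, n) /)

  ! Host to device: n elements of 4 bytes each, unit stride on both sides.
  istat = cublasSetVector(n, 4, x, 1, x_d, 1)
  ! Device to host, into a second host array.
  istat = cublasGetVector(n, 4, x_d, 1, y, 1)

  ! The round trip should reproduce the input exactly.
  print *, all(x == y)
end program setget_vector
```

In CUDA Fortran the same transfers can be written as the array assignments x_d = x and y = x_d; the helper functions are mainly useful when porting code that already uses them or when non-unit strides are needed.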
2.1.13. cublasSetMatrix
This function copies a tile of rows x cols elements from a matrix A in host memory space to a matrix B in GPU memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of matrices A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy, cudaMemcpy2D, or array assignment statements.
integer(4) function cublassetmatrix(rows, cols, elemsize, a, lda, b, ldb)
  integer :: rows, cols, elemsize, lda, ldb
  integer*1, dimension(lda, *) :: a
  integer*1, device, dimension(ldb, *) :: b
2.1.14. cublasGetMatrix
This function copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in host memory space. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of matrices A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpy, cudaMemcpy2D, or array assignment statements.
integer(4) function cublasgetmatrix(rows, cols, elemsize, a, lda, b, ldb)
  integer :: rows, cols, elemsize, lda, ldb
  integer*1, device, dimension(lda, *) :: a
  integer*1, dimension(ldb, *) :: b
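The role of the leading dimensions is easiest to see when the tile is smaller than the source array. The sketch below, with illustrative names and sizes, copies the leading 2 x 3 tile of a 4 x 4 host matrix to the device and back.

```fortran
program setmatrix_tile_sketch
  use cublas
  implicit none
  integer, parameter :: lda = 4, rows = 2, cols = 3
  real(4) :: a(lda, 4), b(rows, cols)
  real(4), device :: b_d(rows, cols)
  integer :: istat

  call random_number(a)

  ! Copy the leading rows x cols tile of a. lda and ldb are the leading
  ! dimensions of the source and destination arrays, not of the tile.
  istat = cublasSetMatrix(rows, cols, 4, a, lda, b_d, rows)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSetMatrix failed: ', istat

  istat = cublasGetMatrix(rows, cols, 4, b_d, rows, b, rows)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasGetMatrix failed: ', istat

  print *, 'tile matches: ', all(b == a(1:rows, 1:cols))
end program setmatrix_tile_sketch
```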
2.1.15. cublasSetVectorAsync
This function copies n elements from a vector x in host memory space to a vector y in GPU memory space, asynchronously, on the given CUDA stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync.
integer(4) function cublassetvectorasync(n, elemsize, x, incx, y, incy, stream)
  integer :: n, elemsize, incx, incy
  integer*1, dimension(*) :: x
  integer*1, device, dimension(*) :: y
  integer(kind=cuda_stream_kind()) :: stream
2.1.16. cublasGetVectorAsync
This function copies n elements from a vector x in GPU memory space to a vector y in host memory space, asynchronously, on the given CUDA stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of vector x and y is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync.
integer(4) function cublasgetvectorasync(n, elemsize, x, incx, y, incy, stream)
  integer :: n, elemsize, incx, incy
  integer*1, device, dimension(*) :: x
  integer*1, dimension(*) :: y
  integer(kind=cuda_stream_kind()) :: stream
2.1.17. cublasSetMatrixAsync
This function copies a tile of rows x cols elements from a matrix A in host memory space to a matrix B in GPU memory space, asynchronously, using the specified stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of matrices A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync or cudaMemcpy2DAsync.
integer(4) function cublassetmatrixasync(rows, cols, elemsize, a, lda, b, ldb, stream)
  integer :: rows, cols, elemsize, lda, ldb
  integer*1, dimension(lda, *) :: a
  integer*1, device, dimension(ldb, *) :: b
  integer(kind=cuda_stream_kind()) :: stream
2.1.18. cublasGetMatrixAsync
This function copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in host memory space, asynchronously, using the specified stream. It is assumed that each element requires storage of elemSize bytes. In CUDA Fortran, the type of matrices A and B is overloaded to take any data type, but the size of the data type must still be specified in bytes. This functionality can also be implemented using cudaMemcpyAsync or cudaMemcpy2DAsync.
integer(4) function cublasgetmatrixasync(rows, cols, elemsize, a, lda, b, ldb, stream)
  integer :: rows, cols, elemsize, lda, ldb
  integer*1, device, dimension(lda, *) :: a
  integer*1, dimension(ldb, *) :: b
  integer(kind=cuda_stream_kind()) :: stream
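The asynchronous copies return before the transfer has necessarily finished, so the host must synchronize with the stream before reading the destination. The sketch below shows one way to do this with the vector routines. It assumes the cudafor module for stream management and uses pinned host arrays, which CUDA generally needs for a copy to overlap with other work; all names are illustrative.

```fortran
program async_copy_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 1024
  real(4), pinned, allocatable :: x(:), y(:)   ! page-locked host memory
  real(4), device, allocatable :: x_d(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(x(n), y(n), x_d(n))
  call random_number(x)

  istat = cudaStreamCreate(stream)

  ! Both copies are queued on the same stream, so they run in order.
  istat = cublasSetVectorAsync(n, 4, x, 1, x_d, 1, stream)
  istat = cublasGetVectorAsync(n, 4, x_d, 1, y, 1, stream)

  ! Wait for the stream to drain before touching y on the host.
  istat = cudaStreamSynchronize(stream)

  print *, 'round trip matches: ', all(x == y)

  istat = cudaStreamDestroy(stream)
  deallocate(x, y, x_d)
end program async_copy_sketch
```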
2.2. Single Precision Functions and Subroutines
This section contains interfaces to the single precision BLAS and cuBLAS functions and subroutines.
2.2.1. isamax
ISAMAX finds the index of the element having the maximum absolute value.
integer(4) function isamax(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x ! device or host variable
  integer :: incx

integer(4) function cublasIsamax(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx

integer(4) function cublasIsamax_v2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
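The three interfaces differ mainly in how the result comes back. The sketch below, using placeholder names, calls the generic BLAS name with a device array and then the handle-based form, which returns a status code and delivers the index through its last argument. It assumes the default cuBLAS pointer mode, so that a host variable can receive the result.

```fortran
program isamax_sketch
  use cublas
  implicit none
  integer, parameter :: n = 1000
  real(4) :: x(n)
  real(4), device :: x_d(n)
  type(cublasHandle) :: h
  integer :: k1, k2, istat

  call random_number(x)
  x_d = x

  ! Generic BLAS name: the device attribute of x_d selects the GPU version.
  k1 = isamax(n, x_d, 1)

  ! Handle-based form: status is the function value, index comes back in k2.
  istat = cublasCreate(h)
  istat = cublasIsamax_v2(h, n, x_d, 1, k2)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasIsamax_v2 failed: ', istat
  istat = cublasDestroy(h)

  ! Host reference for comparison
  print *, k1, k2, maxloc(abs(x), 1)
end program isamax_sketch
```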
2.2.2. isamin
ISAMIN finds the index of the element having the minimum absolute value.
integer(4) function isamin(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x ! device or host variable
  integer :: incx

integer(4) function cublasIsamin(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx

integer(4) function cublasIsamin_v2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.2.3. sasum
SASUM takes the sum of the absolute values.
real(4) function sasum(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x ! device or host variable
  integer :: incx

real(4) function cublasSasum(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx

integer(4) function cublasSasum_v2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.2.4. saxpy
SAXPY computes a constant times a vector plus a vector, y := a*x + y.
subroutine saxpy(n, a, x, incx, y, incy)
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy

subroutine cublasSaxpy(n, a, x, incx, y, incy)
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy

integer(4) function cublasSaxpy_v2(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
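A short sketch of the generic saxpy name applied to device arrays, with a host scalar, is shown below. The names and values are illustrative only.

```fortran
program saxpy_sketch
  use cublas
  implicit none
  integer, parameter :: n = 8
  real(4) :: x(n), y(n), a
  real(4), device :: x_d(n), y_d(n)

  x = 1.0
  y = 2.0
  a = 3.0
  x_d = x
  y_d = y

  ! y_d := a*x_d + y_d, computed on the GPU
  call saxpy(n, a, x_d, 1, y_d, 1)

  y = y_d
  print *, y
end program saxpy_sketch
```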
2.2.5. scopy
SCOPY copies a vector, x, to a vector, y.
subroutine scopy(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy

subroutine cublasScopy(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy

integer(4) function cublasScopy_v2(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.2.6. sdot
SDOT forms the dot product of two vectors.
real(4) function sdot(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy

real(4) function cublasSdot(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy

integer(4) function cublasSdot_v2(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
  real(4), device :: res ! device or host variable
2.2.7. snrm2
SNRM2 returns the Euclidean norm of a vector via the function name, so that SNRM2 := sqrt( x'*x ).
real(4) function snrm2(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x ! device or host variable
  integer :: incx

real(4) function cublasSnrm2(n, x, incx)
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx

integer(4) function cublasSnrm2_v2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.2.8. srot
SROT applies a plane rotation.
subroutine srot(n, x, incx, y, incy, sc, ss)
  integer :: n
  real(4), device :: sc, ss ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy

subroutine cublasSrot(n, x, incx, y, incy, sc, ss)
  integer :: n
  real(4), device :: sc, ss ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy

integer(4) function cublasSrot_v2(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: sc, ss ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.2.9. srotg
SROTG constructs a Givens plane rotation.
subroutine srotg(sa, sb, sc, ss)
  real(4), device :: sa, sb, sc, ss ! device or host variable

subroutine cublasSrotg(sa, sb, sc, ss)
  real(4), device :: sa, sb, sc, ss ! device or host variable

integer(4) function cublasSrotg_v2(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  real(4), device :: sa, sb, sc, ss ! device or host variable
2.2.10. srotm
SROTM applies the modified Givens transformation, H, to the 2 by N matrix

  (SX**T)
  (SY**T)

where **T indicates transpose. The elements of SX are in SX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for SY using LY and INCY. With SPARAM(1)=SFLAG, H has one of the following forms:

  SFLAG=-1.E0     SFLAG=0.E0      SFLAG=1.E0      SFLAG=-2.E0

    (SH11  SH12)    (1.E0  SH12)    (SH11  1.E0)    (1.E0  0.E0)
  H=(          )    (          )    (          )    (          )
    (SH21  SH22),   (SH21  1.E0),   (-1.E0 SH22),   (0.E0  1.E0).

See SROTMG for a description of data storage in SPARAM. In the interfaces below, SX, SY, and SPARAM correspond to the arguments x, y, and param.
subroutine srotm(n, x, incx, y, incy, param)
  integer :: n
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy
  real(4), device :: param(*) ! device or host variable

subroutine cublasSrotm(n, x, incx, y, incy, param)
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
  real(4), device :: param(*) ! device or host variable

integer(4) function cublasSrotm_v2(h, n, x, incx, y, incy, param)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: param(*) ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.2.11. srotmg
SROTMG constructs the modified Givens transformation matrix H which zeros the second component of the 2-vector (SQRT(SD1)*SX1,SQRT(SD2)*SY1)**T. With SPARAM(1)=SFLAG, H has one of the following forms:

  SFLAG=-1.E0     SFLAG=0.E0      SFLAG=1.E0      SFLAG=-2.E0

    (SH11  SH12)    (1.E0  SH12)    (SH11  1.E0)    (1.E0  0.E0)
  H=(          )    (          )    (          )    (          )
    (SH21  SH22),   (SH21  1.E0),   (-1.E0 SH22),   (0.E0  1.E0).

Locations 2-5 of SPARAM contain SH11, SH21, SH12, and SH22 respectively. (Values of 1.E0, -1.E0, or 0.E0 implied by the value of SPARAM(1) are not stored in SPARAM.) In the interfaces below, SD1, SD2, SX1, SY1, and SPARAM correspond to the arguments d1, d2, x1, y1, and param.
subroutine srotmg(d1, d2, x1, y1, param)
  real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable

subroutine cublasSrotmg(d1, d2, x1, y1, param)
  real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable

integer(4) function cublasSrotmg_v2(h, d1, d2, x1, y1, param)
  type(cublasHandle) :: h
  real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
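To make the storage convention concrete, the following host-only sketch rebuilds the full 2 x 2 matrix H from a five-element parameter array, following the four forms listed above. The module and function names are hypothetical and are not part of the cublas module; the code uses only standard Fortran.

```fortran
module rotm_param_sketch
  implicit none
contains
  ! Hypothetical helper: expand param(1:5) into the full matrix H.
  ! param(1) is SFLAG; param(2:5) hold SH11, SH21, SH12, SH22 in that order.
  function rotm_matrix(param) result(h)
    real(4), intent(in) :: param(5)
    real(4) :: h(2,2)

    select case (nint(param(1)))
    case (-1)                     ! all four entries are stored
      h(1,1) = param(2); h(2,1) = param(3)
      h(1,2) = param(4); h(2,2) = param(5)
    case (0)                      ! unit diagonal is implied
      h(1,1) = 1.0;      h(2,1) = param(3)
      h(1,2) = param(4); h(2,2) = 1.0
    case (1)                      ! off-diagonal 1 and -1 are implied
      h(1,1) = param(2); h(2,1) = -1.0
      h(1,2) = 1.0;      h(2,2) = param(5)
    case default                  ! SFLAG = -2: H is the identity
      h = reshape([1.0, 0.0, 0.0, 1.0], [2, 2])
    end select
  end function rotm_matrix
end module rotm_param_sketch
```

This sketch treats any unrecognized flag value as the identity case; a stricter version would report an error instead.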
2.2.12. sscal
SSCAL scales a vector by a constant.
subroutine sscal(n, a, x, incx)
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable
  integer :: incx

subroutine cublasSscal(n, a, x, incx)
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x
  integer :: incx

integer(4) function cublasSscal_v2(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x
  integer :: incx
2.2.13. sswap
SSWAP interchanges two vectors.
subroutine sswap(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y ! device or host variable
  integer :: incx, incy

subroutine cublasSswap(n, x, incx, y, incy)
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy

integer(4) function cublasSswap_v2(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.2.14. sgbmv
SGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine sgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable
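This section does not spell out how the band matrix is laid out in a. Assuming the conventional BLAS band storage, in which element A(i,j) is held in row ku+1+i-j of column j and lda is at least kl+ku+1, a host-side packing routine might look like the sketch below. The name pack_band is hypothetical and is not part of the cublas module.

```fortran
! Hypothetical helper: pack a dense m x n matrix into BLAS band storage.
subroutine pack_band(m, n, kl, ku, dense, ab)
  implicit none
  integer, intent(in) :: m, n, kl, ku
  real(4), intent(in) :: dense(m, n)
  real(4), intent(out) :: ab(kl+ku+1, n)
  integer :: i, j

  ab = 0.0
  do j = 1, n
    ! Only rows within ku above and kl below the diagonal are stored.
    do i = max(1, j-ku), min(m, j+kl)
      ab(ku+1+i-j, j) = dense(i, j)
    end do
  end do
end subroutine pack_band
```

The packed array ab would then be copied to the device and passed as a, with lda = kl+ku+1.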
2.2.15. sgemv
SGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine sgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable
2.2.16. sger
SGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine sger(m, n, alpha, x, incx, y, incy, a, lda)
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha ! device or host variable

subroutine cublasSger(m, n, alpha, x, incx, y, incy, a, lda)
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha ! device or host variable

integer(4) function cublasSger_v2(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha ! device or host variable
2.2.17. ssbmv
SSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals.
subroutine ssbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: k, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: t
  integer :: k, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsbmv_v2(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: k, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable
2.2.18. sspmv
SSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspmv(t, n, alpha, a, x, incx, beta, y, incy)
  character*1 :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSspmv(t, n, alpha, a, x, incx, beta, y, incy)
  character*1 :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSspmv_v2(h, t, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4), device :: alpha, beta ! device or host variable
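The packed form is not defined in this section. Assuming the conventional BLAS packed layout, in which the stored triangle is laid out column by column in a one-dimensional array, the position of element A(i,j) can be computed as in the sketch below. The function name packed_index is hypothetical.

```fortran
! Hypothetical helper: 1-based position of A(i,j) in packed storage.
! uplo = 'U' stores the upper triangle (i <= j), otherwise the lower (i >= j).
pure integer function packed_index(uplo, n, i, j) result(idx)
  implicit none
  character(1), intent(in) :: uplo
  integer, intent(in) :: n, i, j

  if (uplo == 'U' .or. uplo == 'u') then
    idx = i + (j-1)*j/2             ! columns 1..j-1 hold 1+2+...+(j-1) entries
  else
    idx = i + (j-1)*(2*n-j)/2       ! columns 1..j-1 hold n+(n-1)+... entries
  end if
end function packed_index
```

A packed array therefore needs n*(n+1)/2 elements in either case.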
2.2.19. sspr
SSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspr(t, n, alpha, x, incx, a)
  character*1 :: t
  integer :: n, incx
  real(4), device, dimension(*) :: a, x ! device or host variable
  real(4), device :: alpha ! device or host variable

subroutine cublasSspr(t, n, alpha, x, incx, a)
  character*1 :: t
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable

integer(4) function cublasSspr_v2(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
2.2.20. sspr2
SSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine sspr2(t, n, alpha, x, incx, y, incy, a)
  character*1 :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y ! device or host variable
  real(4), device :: alpha ! device or host variable

subroutine cublasSspr2(t, n, alpha, x, incx, y, incy, a)
  character*1 :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4), device :: alpha ! device or host variable

integer(4) function cublasSspr2_v2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4), device :: alpha ! device or host variable
2.2.21. ssymv
SSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha, beta ! device or host variable
2.2.22. ssyr
SSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine ssyr(t, n, alpha, x, incx, a, lda)
  character*1 :: t
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable
  real(4), device :: alpha ! device or host variable

subroutine cublasSsyr(t, n, alpha, x, incx, a, lda)
  character*1 :: t
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
  real(4), device :: alpha ! device or host variable

integer(4) function cublasSsyr_v2(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
  real(4), device :: alpha ! device or host variable
2.2.23. ssyr2
SSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine ssyr2(t, n, alpha, x, incx, y, incy, a, lda)
  character*1 :: t
  integer :: n, incx, incy, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x, y ! device or host variable
  real(4), device :: alpha ! device or host variable

subroutine cublasSsyr2(t, n, alpha, x, incx, y, incy, a, lda)
  character*1 :: t
  integer :: n, incx, incy, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha ! device or host variable

integer(4) function cublasSsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4), device :: alpha ! device or host variable
2.2.24. stbmv
STBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine stbmv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable

subroutine cublasStbmv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x

integer(4) function cublasStbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.2.25. stbsv
STBSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine stbsv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable

subroutine cublasStbsv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x

integer(4) function cublasStbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.2.26. stpmv
STPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine stpmv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x ! device or host variable

subroutine cublasStpmv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x

integer(4) function cublasStpmv_v2(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
2.2.27. stpsv
STPSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine stpsv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x ! device or host variable

subroutine cublasStpsv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x

integer(4) function cublasStpsv_v2(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
2.2.28. strmv
STRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine strmv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable

subroutine cublasStrmv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x

integer(4) function cublasStrmv_v2(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.2.29. strsv
STRSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine strsv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(*) :: x ! device or host variable

subroutine cublasStrsv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x

integer(4) function cublasStrsv_v2(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.2.30. sgemm
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(ldb, *) :: b ! device or host variable
  real(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
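The sketch below contrasts the generic BLAS name, which takes character transpose flags, with the handle-based form, which takes integer operation codes. It assumes the cublas module defines the CUBLAS_OP_N enumerator and that the default pointer mode is in effect, so alpha and beta may be host variables; the names and sizes are illustrative.

```fortran
program sgemm_sketch
  use cublas
  implicit none
  integer, parameter :: m = 4, n = 3, k = 2
  real(4) :: a(m,k), b(k,n), c(m,n)
  real(4), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  real(4) :: alpha, beta
  type(cublasHandle) :: h
  integer :: istat

  call random_number(a)
  call random_number(b)
  a_d = a
  b_d = b
  c_d = 0.0
  alpha = 1.0
  beta = 0.0

  ! Generic BLAS name with device arrays: c_d := alpha*a_d*b_d + beta*c_d
  call sgemm('N', 'N', m, n, k, alpha, a_d, m, b_d, k, beta, c_d, m)
  c = c_d
  print *, 'max diff (generic): ', maxval(abs(c - matmul(a, b)))

  ! Handle-based form: integer op codes replace the character flags
  istat = cublasCreate(h)
  istat = cublasSgemm_v2(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                         alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSgemm_v2 failed: ', istat
  c = c_d
  print *, 'max diff (v2):      ', maxval(abs(c - matmul(a, b)))
  istat = cublasDestroy(h)
end program sgemm_sketch
```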
2.2.31. ssymm
SSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine ssymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(ldb, *) :: b ! device or host variable
  real(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
2.2.32. ssyrk
SSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine ssyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
2.2.33. ssyr2k
SSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine ssyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(ldb, *) :: b ! device or host variable
  real(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
2.2.34. ssyrkx
SSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine ssyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a ! device or host variable
  real(4), device, dimension(ldb, *) :: b ! device or host variable
  real(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable

subroutine cublasSsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable

integer(4) function cublasSsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
2.2.35. strmm
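STRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The cublasStrmm_v2 interface takes an additional m by n output matrix C and computes C := alpha*op( A )*B, or C := alpha*B*op( A ); passing B as C, with ldc equal to ldb, gives the in-place behavior.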
subroutine strmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a ! device or host variable real(4), device, dimension(ldb, *) :: b ! device or host variable real(4), device :: alpha ! device or host variable
subroutine cublasStrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device :: alpha ! device or host variable
integer(4) function cublasStrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device, dimension(ldc, *) :: c real(4), device :: alpha ! device or host variable
2.2.36. strsm
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
subroutine strsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a ! device or host variable real(4), device, dimension(ldb, *) :: b ! device or host variable real(4), device :: alpha ! device or host variable
subroutine cublasStrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device :: alpha ! device or host variable
integer(4) function cublasStrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device :: alpha ! device or host variable
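As a minimal sketch of the legacy-style call (not taken from the original guide), the program below solves A*X = B for a 2 by 2 upper triangular A held in device memory. The matrix values and the names a_d and b_d are illustrative assumptions, and the sketch assumes CUDA Fortran is enabled and cuBLAS is linked, for example with pgfortran -Mcuda -Mcudalib=cublas.

```fortran
program strsm_example
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 1
  real(4) :: a(m, m), b(m, n), alpha
  real(4), device :: a_d(m, m), b_d(m, n)   ! illustrative device copies

  ! Column-major fill: A = [2 1; 0 4]. With uplo = 'U' the strictly
  ! lower part of A is not referenced.
  a = reshape([2.0, 0.0, 1.0, 4.0], [m, m])
  b = reshape([4.0, 8.0], [m, n])
  alpha = 1.0

  a_d = a
  b_d = b

  ! side = 'L': solve op(A)*X = alpha*B; transa = 'N': op(A) = A;
  ! diag = 'N': A is not assumed to have a unit diagonal.
  call strsm('L', 'U', 'N', 'N', m, n, alpha, a_d, m, b_d, m)

  b = b_d   ! X has overwritten B on the device
  print *, 'X =', b(:, 1)   ! expected X = (1, 2)
end program strsm_example
```

Because the solution overwrites B, keep a separate copy of the right-hand side if it is needed after the call.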
2.2.37. cublasSgetrfBatched
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasSgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) integer, device :: info(*) integer :: batchCount
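The batched interface expects Aarray to be a device array of device pointers, one per matrix. The following is a hedged sketch of one way to build that array, assuming that c_devptr and c_devloc are available from the cudafor module and that the batch is stored as a single rank-3 device array; the sizes, values and variable names are illustrative only.

```fortran
program getrf_batched_example
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2, nbatch = 3
  real(4) :: a(n, n, nbatch)
  real(4), device :: a_d(n, n, nbatch)
  type(c_devptr) :: ptrs(nbatch)             ! host copy of the pointer array
  type(c_devptr), device :: ptrs_d(nbatch)   ! device pointer array passed as Aarray
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info(nbatch), istat, i
  type(cublasHandle) :: h

  ! Each matrix is a scaled copy of [4 1; 2 3] (column-major fill).
  do i = 1, nbatch
    a(:, :, i) = real(i) * reshape([4.0, 2.0, 1.0, 3.0], [n, n])
  end do
  a_d = a

  ! Collect the device address of each matrix, then copy the list to the device.
  do i = 1, nbatch
    ptrs(i) = c_devloc(a_d(1, 1, i))
  end do
  ptrs_d = ptrs

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasCreate failed:', istat

  istat = cublasSgetrfBatched(h, n, ptrs_d, n, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasSgetrfBatched failed:', istat

  info = info_d   ! one entry per matrix; 0 means the factorization succeeded
  a = a_d         ! L and U now overwrite each matrix
  print *, 'info =', info

  istat = cublasDestroy(h)
end program getrf_batched_example
```

The pivot array holds n entries per matrix, so it is sized n*batchCount, while info holds one entry per matrix.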
2.2.38. cublasSgetriBatched
SGETRI computes the inverse of a matrix using the LU factorization computed by SGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasSgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) type(c_devptr), device :: Carray(*) integer :: ldc integer, device :: info(*) integer :: batchCount
2.2.39. cublasSgetrsBatched
SGETRS solves a system of linear equations A * X = B or A**T * X = B with a general N-by-N matrix A using the LU factorization computed by SGETRF.
integer(4) function cublasSgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount) type(cublasHandle) :: h integer :: trans ! integer or character(1) variable integer :: n, nrhs type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) type(c_devptr), device :: Barray(*) integer :: ldb integer :: info(*) integer :: batchCount
2.2.40. cublasSgemmBatched
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasSgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount) type(cublasHandle) :: h integer :: transa ! integer or character(1) variable integer :: transb ! integer or character(1) variable integer :: m, n, k real(4), device :: alpha ! device or host variable type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Barray(*) integer :: ldb real(4), device :: beta ! device or host variable type(c_devptr), device :: Carray(*) integer :: ldc integer :: batchCount
integer(4) function cublasSgemmBatched_v2(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount) type(cublasHandle) :: h integer :: transa integer :: transb integer :: m, n, k real(4), device :: alpha ! device or host variable type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Barray(*) integer :: ldb real(4), device :: beta ! device or host variable type(c_devptr), device :: Carray(*) integer :: ldc integer :: batchCount
2.2.41. cublasStrsmBatched
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasStrsmBatched( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount) type(cublasHandle) :: h integer :: side ! integer or character(1) variable integer :: uplo ! integer or character(1) variable integer :: trans ! integer or character(1) variable integer :: diag ! integer or character(1) variable integer :: m, n real(4), device :: alpha ! device or host variable type(c_devptr), device :: A(*) integer :: lda type(c_devptr), device :: B(*) integer :: ldb integer :: batchCount
integer(4) function cublasStrsmBatched_v2( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount) type(cublasHandle) :: h integer :: side integer :: uplo integer :: trans integer :: diag integer :: m, n real(4), device :: alpha ! device or host variable type(c_devptr), device :: A(*) integer :: lda type(c_devptr), device :: B(*) integer :: ldb integer :: batchCount
2.2.42. cublasSmatinvBatched
cublasSmatinvBatched is a shortcut for cublasSgetrfBatched followed by cublasSgetriBatched. It works only when n is less than 32; for larger matrices, call cublasSgetrfBatched and cublasSgetriBatched directly.
integer(4) function cublasSmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Ainv(*) integer :: lda_inv integer, device :: info(*) integer :: batchCount
2.2.43. cublasSgeqrfBatched
SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.
integer(4) function cublasSgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount) type(cublasHandle) :: h integer :: m, n type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Tau(*) integer :: info(*) integer :: batchCount
2.2.44. cublasSgelsBatched
SGELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'T' and m >= n: find the minimum norm solution of an underdetermined system A**T * X = B.
4. If TRANS = 'T' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**T * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasSgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount) type(cublasHandle) :: h integer :: trans ! integer or character(1) variable integer :: m, n, nrhs type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Carray(*) integer :: ldc integer :: info(*) integer, device :: devinfo(*) integer :: batchCount
2.3. Double Precision Functions and Subroutines
This section contains interfaces to the double precision BLAS and cuBLAS functions and subroutines.
2.3.1. idamax
IDAMAX finds the index of the element having the maximum absolute value.
integer(4) function idamax(n, x, incx) integer :: n real(8), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIdamax(n, x, incx) integer :: n real(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasIdamax_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
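The three interfaces above differ mainly in how the result comes back. As a rough sketch with illustrative data and variable names, the legacy form returns the index as the function value, while the _v2 form returns a status code and writes the index to its last argument. The sketch assumes the cublas module exposes both names, as the listings above suggest.

```fortran
program idamax_example
  use cublas
  implicit none
  integer, parameter :: n = 5
  real(8) :: x(n)
  real(8), device :: x_d(n)       ! illustrative device copy
  type(cublasHandle) :: h
  integer :: istat, imax, imax_v2

  x = [1.0d0, -7.0d0, 3.0d0, 5.0d0, -2.0d0]
  x_d = x

  ! Legacy-style call: the 1-based index is the function result.
  imax = idamax(n, x_d, 1)

  ! Handle-based call: the function result is a status code and the
  ! index is stored in imax_v2 (a host variable here).
  istat = cublasCreate(h)
  istat = cublasIdamax_v2(h, n, x_d, 1, imax_v2)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, 'cublasIdamax_v2 failed:', istat
  istat = cublasDestroy(h)

  print *, imax, imax_v2   ! both are 2, the position of -7.0
end program idamax_example
```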
2.3.2. idamin
IDAMIN finds the index of the element having the minimum absolute value.
integer(4) function idamin(n, x, incx) integer :: n real(8), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIdamin(n, x, incx) integer :: n real(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasIdamin_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.3.3. dasum
DASUM takes the sum of the absolute values.
real(8) function dasum(n, x, incx) integer :: n real(8), device, dimension(*) :: x ! device or host variable integer :: incx
real(8) function cublasDasum(n, x, incx) integer :: n real(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasDasum_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
2.3.4. daxpy
DAXPY computes a constant times a vector plus a vector: y := a*x + y.
subroutine daxpy(n, a, x, incx, y, incy) integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasDaxpy(n, a, x, incx, y, incy) integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasDaxpy_v2(h, n, a, x, incx, y, incy) type(cublasHandle) :: h integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
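A short sketch of the legacy-style call, using made-up data: the vectors live in device memory and the scalar is an ordinary host variable. The names x_d and y_d are assumptions for illustration, and the build is assumed to enable CUDA Fortran and link cuBLAS (for example pgfortran -Mcuda -Mcudalib=cublas).

```fortran
program daxpy_example
  use cublas
  implicit none
  integer, parameter :: n = 3
  real(8) :: x(n), y(n), a
  real(8), device :: x_d(n), y_d(n)   ! illustrative device copies

  x = [1.0d0, 2.0d0, 3.0d0]
  y = [10.0d0, 20.0d0, 30.0d0]
  a = 2.0d0

  x_d = x
  y_d = y

  ! y := a*x + y, computed on the device with unit strides.
  call daxpy(n, a, x_d, 1, y_d, 1)

  y = y_d
  print *, y   ! 12, 24, 36
end program daxpy_example
```

According to the interface comments above, the scalar a may also be declared with the device attribute.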
2.3.5. dcopy
DCOPY copies a vector, x, to a vector, y.
subroutine dcopy(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasDcopy(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasDcopy_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
2.3.6. ddot
DDOT forms the dot product of two vectors.
real(8) function ddot(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
real(8) function cublasDdot(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasDdot_v2(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy real(8), device :: res ! device or host variable
2.3.7. dnrm2
DNRM2 returns the Euclidean norm of a vector via the function name, so that DNRM2 := sqrt( x**T*x ).
real(8) function dnrm2(n, x, incx) integer :: n real(8), device, dimension(*) :: x ! device or host variable integer :: incx
real(8) function cublasDnrm2(n, x, incx) integer :: n real(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasDnrm2_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
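The relation between DDOT and DNRM2 can be checked with a small sketch such as the one below, which uses the legacy function forms on a device vector. The vector values and names are illustrative assumptions.

```fortran
program dnrm2_example
  use cublas
  implicit none
  integer, parameter :: n = 2
  real(8) :: x(n), dot, nrm
  real(8), device :: x_d(n)   ! illustrative device copy

  x = [3.0d0, 4.0d0]
  x_d = x

  dot = ddot(n, x_d, 1, x_d, 1)   ! x**T*x = 9 + 16 = 25
  nrm = dnrm2(n, x_d, 1)          ! sqrt(25) = 5

  print *, dot, nrm
end program dnrm2_example
```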
2.3.8. drot
DROT applies a plane rotation.
subroutine drot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc, ss ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasDrot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc, ss ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasDrot_v2(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(8), device :: sc, ss ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
2.3.9. drotg
DROTG constructs a Givens plane rotation.
subroutine drotg(sa, sb, sc, ss) real(8), device :: sa, sb, sc, ss ! device or host variable
subroutine cublasDrotg(sa, sb, sc, ss) real(8), device :: sa, sb, sc, ss ! device or host variable
integer(4) function cublasDrotg_v2(h, sa, sb, sc, ss) type(cublasHandle) :: h real(8), device :: sa, sb, sc, ss ! device or host variable
2.3.10. drotm
DROTM applies the modified Givens transformation, H, to the 2 by N matrix whose rows are DX**T and DY**T, where **T indicates transpose. The elements of DX are in DX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for DY using LY and INCY. With DPARAM(1)=DFLAG, H has one of the following forms (each matrix is written row by row, with rows separated by a semicolon):
- DFLAG=-1.D0: H = (DH11 DH12; DH21 DH22)
- DFLAG=0.D0: H = (1.D0 DH12; DH21 1.D0)
- DFLAG=1.D0: H = (DH11 1.D0; -1.D0 DH22)
- DFLAG=-2.D0: H = (1.D0 0.D0; 0.D0 1.D0)
See DROTMG for a description of data storage in DPARAM.
subroutine drotm(n, x, incx, y, incy, param) integer :: n real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy real(8), device :: param(*) ! device or host variable
subroutine cublasDrotm(n, x, incx, y, incy, param) integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy real(8), device :: param(*) ! device or host variable
integer(4) function cublasDrotm_v2(h, n, x, incx, y, incy, param) type(cublasHandle) :: h integer :: n real(8), device :: param(*) ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
2.3.11. drotmg
DROTMG constructs the modified Givens transformation matrix H which zeros the second component of the 2-vector (SQRT(DD1)*DX1, SQRT(DD2)*DY1)**T. With DPARAM(1)=DFLAG, H has one of the following forms (each matrix is written row by row, with rows separated by a semicolon):
- DFLAG=-1.D0: H = (DH11 DH12; DH21 DH22)
- DFLAG=0.D0: H = (1.D0 DH12; DH21 1.D0)
- DFLAG=1.D0: H = (DH11 1.D0; -1.D0 DH22)
- DFLAG=-2.D0: H = (1.D0 0.D0; 0.D0 1.D0)
Locations 2-5 of DPARAM contain DH11, DH21, DH12, and DH22 respectively. (Values of 1.D0, -1.D0, or 0.D0 implied by the value of DPARAM(1) are not stored in DPARAM.)
subroutine drotmg(d1, d2, x1, y1, param) real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
subroutine cublasDrotmg(d1, d2, x1, y1, param) real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
integer(4) function cublasDrotmg_v2(h, d1, d2, x1, y1, param) type(cublasHandle) :: h real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
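To make the DPARAM storage convention concrete, here is a host-only sketch in standard Fortran that expands a five-element parameter array into the 2 by 2 matrix H for each DFLAG value. The function name h_from_param and the sample values are hypothetical and are not part of the library; the sketch assumes elements 2 through 5 hold DH11, DH21, DH12 and DH22 in that order.

```fortran
program drotm_h_matrix
  implicit none
  real(8) :: param(5), h(2, 2)

  ! DFLAG = 0: only DH21 and DH12 are read; the 9.0 entries are ignored.
  param = [0.0d0, 9.0d0, 0.5d0, -0.25d0, 9.0d0]
  h = h_from_param(param)
  print '(2f8.3)', h(1, :), h(2, :)

contains

  ! Hypothetical helper: build H from DPARAM according to DFLAG = p(1).
  function h_from_param(p) result(hm)
    real(8), intent(in) :: p(5)
    real(8) :: hm(2, 2)
    if (p(1) == -1.0d0) then
      ! Full matrix: p(2:5) is already H in column-major order.
      hm = reshape(p(2:5), [2, 2])
    else if (p(1) == 0.0d0) then
      ! Unit diagonal: H = (1 DH12; DH21 1).
      hm = reshape([1.0d0, p(3), p(4), 1.0d0], [2, 2])
    else if (p(1) == 1.0d0) then
      ! Fixed off-diagonal: H = (DH11 1; -1 DH22).
      hm = reshape([p(2), -1.0d0, 1.0d0, p(5)], [2, 2])
    else
      ! DFLAG = -2: H is the identity.
      hm = reshape([1.0d0, 0.0d0, 0.0d0, 1.0d0], [2, 2])
    end if
  end function h_from_param

end program drotm_h_matrix
```

For the sample values the program prints the rows (1.000, -0.250) and (0.500, 1.000).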
2.3.12. dscal
DSCAL scales a vector by a constant.
subroutine dscal(n, a, x, incx) integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable integer :: incx
subroutine cublasDscal(n, a, x, incx) integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasDscal_v2(h, n, a, x, incx) type(cublasHandle) :: h integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x integer :: incx
2.3.13. dswap
DSWAP interchanges two vectors.
subroutine dswap(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasDswap(n, x, incx, y, incy) integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasDswap_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
2.3.14. dgbmv
DGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine dgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, kl, ku, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
2.3.15. dgemv
DGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine dgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
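The sketch below shows one possible legacy-style DGEMV call on device data, using a 2 by 3 matrix chosen purely for illustration. The variable names are assumptions, and the leading dimension is taken to equal the number of rows.

```fortran
program dgemv_example
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 3
  real(8) :: a(m, n), x(n), y(m), alpha, beta
  real(8), device :: a_d(m, n), x_d(n), y_d(m)   ! illustrative device copies

  ! Column-major fill: A = [1 2 3; 4 5 6].
  a = reshape([1.0d0, 4.0d0, 2.0d0, 5.0d0, 3.0d0, 6.0d0], [m, n])
  x = 1.0d0
  y = 1.0d0
  alpha = 1.0d0
  beta = 2.0d0

  a_d = a
  x_d = x
  y_d = y

  ! t = 'N': y := alpha*A*x + beta*y, so x has n elements and y has m.
  call dgemv('N', m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)

  y = y_d
  print *, y   ! 6 + 2 = 8 and 15 + 2 = 17
end program dgemv_example
```

With t = 'T' the roles of the vector lengths swap: x must then have m elements and y must have n.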
2.3.16. dger
DGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine dger(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDger(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha ! device or host variable
integer(4) function cublasDger_v2(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha ! device or host variable
2.3.17. dsbmv
DSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals.
subroutine dsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: k, n, lda, incx, incy real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsbmv(t, n, k, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: k, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsbmv_v2(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: k, n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
2.3.18. dspmv
DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspmv(t, n, alpha, a, x, incx, beta, y, incy) character*1 :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDspmv(t, n, alpha, a, x, incx, beta, y, incy) character*1 :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDspmv_v2(h, t, n, alpha, a, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y real(8), device :: alpha, beta ! device or host variable
2.3.19. dspr
DSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspr(t, n, alpha, x, incx, a) character*1 :: t integer :: n, incx real(8), device, dimension(*) :: a, x ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDspr(t, n, alpha, x, incx, a) character*1 :: t integer :: n, incx real(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
integer(4) function cublasDspr_v2(h, t, n, alpha, x, incx, a) type(cublasHandle) :: h integer :: t integer :: n, incx real(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
2.3.20. dspr2
DSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form.
subroutine dspr2(t, n, alpha, x, incx, y, incy, a) character*1 :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDspr2(t, n, alpha, x, incx, y, incy, a) character*1 :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y real(8), device :: alpha ! device or host variable
integer(4) function cublasDspr2_v2(h, t, n, alpha, x, incx, y, incy, a) type(cublasHandle) :: h integer :: t integer :: n, incx, incy real(8), device, dimension(*) :: a, x, y real(8), device :: alpha ! device or host variable
2.3.21. dsymv
DSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, lda, incx, incy real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha, beta ! device or host variable
2.3.22. dsyr
DSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine dsyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDsyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x real(8), device :: alpha ! device or host variable
integer(4) function cublasDsyr_v2(h, t, n, alpha, x, incx, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x real(8), device :: alpha ! device or host variable
2.3.23. dsyr2
DSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine dsyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x, y ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDsyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha ! device or host variable
integer(4) function cublasDsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, incy, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x, y real(8), device :: alpha ! device or host variable
2.3.24. dtbmv
DTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine dtbmv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtbmv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
integer(4) function cublasDtbmv_v2(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
2.3.25. dtbsv
DTBSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtbsv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtbsv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
integer(4) function cublasDtbsv_v2(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
2.3.26. dtpmv
DTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine dtpmv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasDtpmv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x
integer(4) function cublasDtpmv_v2(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x
2.3.27. dtpsv
DTPSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtpsv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasDtpsv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x
integer(4) function cublasDtpsv_v2(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx real(8), device, dimension(*) :: a, x
2.3.28. dtrmv
DTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine dtrmv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtrmv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
integer(4) function cublasDtrmv_v2(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
2.3.29. dtrsv
DTRSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine dtrsv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(*) :: x ! device or host variable
subroutine cublasDtrsv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
integer(4) function cublasDtrsv_v2(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda real(8), device, dimension(lda, *) :: a real(8), device, dimension(*) :: x
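The following is a minimal sketch of a call to cublasDtrsv, assuming the cublas module is available and the arrays fit in device memory. The program name, the variable names, and the 2 by 2 test data are illustrative only. Because the routine does not test for singularity, the sketch checks the diagonal on the host before the call. As in the reference BLAS, x holds the right-hand side b on entry and is overwritten with the solution.

```fortran
program dtrsv_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2
  real(8) :: a(n,n), x(n)
  real(8), device :: a_d(n,n), x_d(n)
  integer :: i

  ! Lower triangular A in column-major order: A = [2 0; 1 3]
  a = reshape([2.0d0, 1.0d0, 0.0d0, 3.0d0], [n, n])
  ! On entry x holds the right-hand side b
  x = [4.0d0, 11.0d0]

  ! The routine does no singularity test, so check the diagonal first
  do i = 1, n
     if (a(i,i) == 0.0d0) stop 'zero on the diagonal: A is singular'
  end do

  ! Copy host data to the device
  a_d = a
  x_d = x

  ! Solve A*x = b: lower triangle ('L'), no transpose ('N'), non-unit diagonal ('N')
  call cublasDtrsv('L', 'N', 'N', n, a_d, n, x_d, 1)

  ! Copy the solution back; for this data the exact solution is (2, 3)
  x = x_d
  print *, 'x =', x
end program dtrsv_sketch
```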
2.3.30. dgemm
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: transa, transb integer :: m, n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device, dimension(ldc, *) :: c ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: transa, transb integer :: m, n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: transa, transb integer :: m, n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
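As an illustration of the interfaces above, the sketch below calls both cublasDgemm and cublasDgemm_v2 on the same small matrices. It assumes the cublas module provides both names, that alpha and beta are host scalars (the default host pointer mode), and that the program is built with CUDA Fortran enabled and cuBLAS linked (for example, options along the lines of -Mcuda -Mcudalib=cublas, which may differ on your installation). The program and variable names are hypothetical.

```fortran
program dgemm_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 2, k = 3
  real(8) :: a(m,k), b(k,n), c(m,n)
  real(8), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  real(8) :: alpha, beta
  type(cublasHandle) :: h
  integer :: istat

  ! Simple test data: every entry of A*B is 1*2 summed over k = 3 terms, i.e. 6
  a = 1.0d0
  b = 2.0d0
  alpha = 1.0d0
  beta = 0.0d0

  ! Copy operands to the device
  a_d = a
  b_d = b
  c_d = 0.0d0

  ! Character-argument interface: C := alpha*A*B + beta*C
  call cublasDgemm('N', 'N', m, n, k, alpha, a_d, m, b_d, k, beta, c_d, m)
  c = c_d
  print *, 'cublasDgemm    :', c

  ! Handle-based interface: integer operation codes and a returned status
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'
  c_d = 0.0d0
  istat = cublasDgemm_v2(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                         alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgemm_v2 failed'
  c = c_d
  print *, 'cublasDgemm_v2 :', c
  istat = cublasDestroy(h)
end program dgemm_sketch
```

The v2 form returns a status code and takes the operation as an integer constant, which makes error checking explicit. The character form is closer to existing BLAS code.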
2.3.31. dsymm
DSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine dsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: side, uplo integer :: m, n, lda, ldb, ldc real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device, dimension(ldc, *) :: c ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: side, uplo integer :: m, n, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: side, uplo integer :: m, n, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
2.3.32. dsyrk
DSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine dsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldc real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldc, *) :: c ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
2.3.33. dsyr2k
DSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine dsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device, dimension(ldc, *) :: c ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
2.3.34. dsyrkx
DSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine dsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device, dimension(ldc, *) :: c ! device or host variable real(8), device :: alpha, beta ! device or host variable
subroutine cublasDsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
integer(4) function cublasDsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha, beta ! device or host variable
2.3.35. dtrmm
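DTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The cublasDtrmm_v2 interface follows the cuBLAS API, which is out-of-place: the result is written to the additional matrix argument c, with leading dimension ldc, rather than overwriting b. Passing b as the c argument gives the in-place BLAS behavior.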
subroutine dtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device :: alpha ! device or host variable
integer(4) function cublasDtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb, ldc real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device, dimension(ldc, *) :: c real(8), device :: alpha ! device or host variable
2.3.36. dtrsm
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
subroutine dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(8), device, dimension(lda, *) :: a ! device or host variable real(8), device, dimension(ldb, *) :: b ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasDtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) character*1 :: side, uplo, transa, diag integer :: m, n, lda, ldb real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device :: alpha ! device or host variable
integer(4) function cublasDtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb real(8), device, dimension(lda, *) :: a real(8), device, dimension(ldb, *) :: b real(8), device :: alpha ! device or host variable
2.3.37. cublasDgetrfBatched
DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasDgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) integer, device :: info(*) integer :: batchCount
2.3.38. cublasDgetriBatched
DGETRI computes the inverse of a matrix using the LU factorization computed by DGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasDgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) type(c_devptr), device :: Carray(*) integer :: ldc integer, device :: info(*) integer :: batchCount
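The batched routines take device arrays of device pointers rather than the matrices themselves. The sketch below shows one way to build those pointer arrays with c_devloc and then chain cublasDgetrfBatched and cublasDgetriBatched to invert a small batch. It is a sketch under stated assumptions: the matrix values, the names aptr, cptr and eye, and the error handling are illustrative, and the pivot array is assumed to hold n entries per matrix.

```fortran
program getrf_getri_batched_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2, nbatch = 3, lda = n, ldc = n
  real(8) :: a(n,n,nbatch), c(n,n,nbatch), eye(n,n)
  real(8), device :: a_d(n,n,nbatch), c_d(n,n,nbatch)
  type(c_devptr) :: aptr(nbatch), cptr(nbatch)
  type(c_devptr), device :: aptr_d(nbatch), cptr_d(nbatch)
  integer :: info(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  type(cublasHandle) :: h
  integer :: istat, k

  ! Well-conditioned test matrices: A_k = [k+1 1; 1 k+1]
  do k = 1, nbatch
     a(:,:,k) = reshape([k + 1.0d0, 1.0d0, 1.0d0, k + 1.0d0], [n, n])
  end do
  eye = reshape([1.0d0, 0.0d0, 0.0d0, 1.0d0], [n, n])
  a_d = a

  ! Build host arrays of device addresses, then copy them to the device
  do k = 1, nbatch
     aptr(k) = c_devloc(a_d(1,1,k))
     cptr(k) = c_devloc(c_d(1,1,k))
  end do
  aptr_d = aptr
  cptr_d = cptr

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! LU factorization of every matrix in the batch (overwrites a_d)
  istat = cublasDgetrfBatched(h, n, aptr_d, lda, ipvt_d, info_d, nbatch)
  info = info_d
  if (istat /= CUBLAS_STATUS_SUCCESS .or. any(info /= 0)) stop 'getrfBatched failed'

  ! Inverses are written out-of-place to the matrices addressed by cptr_d
  istat = cublasDgetriBatched(h, n, aptr_d, lda, ipvt_d, cptr_d, ldc, info_d, nbatch)
  info = info_d
  if (istat /= CUBLAS_STATUS_SUCCESS .or. any(info /= 0)) stop 'getriBatched failed'

  ! Host check: A_k * inv(A_k) should be close to the identity
  c = c_d
  do k = 1, nbatch
     print *, 'matrix', k, ' max |A*inv(A) - I| =', &
              maxval(abs(matmul(a(:,:,k), c(:,:,k)) - eye))
  end do

  istat = cublasDestroy(h)
end program getrf_getri_batched_sketch
```

The host copy a still holds the original matrices because only the device copy a_d is overwritten by the factorization, which is what makes the final check possible.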
2.3.39. cublasDgetrsBatched
DGETRS solves a system of linear equations A * X = B or A**T * X = B with a general N-by-N matrix A using the LU factorization computed by DGETRF.
integer(4) function cublasDgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount) type(cublasHandle) :: h integer :: trans ! integer or character(1) variable integer :: n, nrhs type(c_devptr), device :: Aarray(*) integer :: lda integer, device :: ipvt(*) type(c_devptr), device :: Barray(*) integer :: ldb integer :: info(*) integer :: batchCount
2.3.40. cublasDgemmBatched
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasDgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount) type(cublasHandle) :: h integer :: transa ! integer or character(1) variable integer :: transb ! integer or character(1) variable integer :: m, n, k real(8), device :: alpha ! device or host variable type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Barray(*) integer :: ldb real(8), device :: beta ! device or host variable type(c_devptr), device :: Carray(*) integer :: ldc integer :: batchCount
integer(4) function cublasDgemmBatched_v2(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount) type(cublasHandle) :: h integer :: transa integer :: transb integer :: m, n, k real(8), device :: alpha ! device or host variable type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Barray(*) integer :: ldb real(8), device :: beta ! device or host variable type(c_devptr), device :: Carray(*) integer :: ldc integer :: batchCount
2.3.41. cublasDtrsmBatched
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasDtrsmBatched( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount) type(cublasHandle) :: h integer :: side ! integer or character(1) variable integer :: uplo ! integer or character(1) variable integer :: trans ! integer or character(1) variable integer :: diag ! integer or character(1) variable integer :: m, n real(8), device :: alpha ! device or host variable type(c_devptr), device :: A(*) integer :: lda type(c_devptr), device :: B(*) integer :: ldb integer :: batchCount
integer(4) function cublasDtrsmBatched_v2( h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount) type(cublasHandle) :: h integer :: side integer :: uplo integer :: trans integer :: diag integer :: m, n real(8), device :: alpha ! device or host variable type(c_devptr), device :: A(*) integer :: lda type(c_devptr), device :: B(*) integer :: ldb integer :: batchCount
2.3.42. cublasDmatinvBatched
cublasDmatinvBatched is a shortcut for cublasDgetrfBatched followed by cublasDgetriBatched. However, it only works if n is no greater than 32. For larger n, call cublasDgetrfBatched and cublasDgetriBatched directly.
integer(4) function cublasDmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount) type(cublasHandle) :: h integer :: n type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Ainv(*) integer :: lda_inv integer, device :: info(*) integer :: batchCount
2.3.43. cublasDgeqrfBatched
DGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R.
integer(4) function cublasDgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount) type(cublasHandle) :: h integer :: m, n type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Tau(*) integer :: info(*) integer :: batchCount
2.3.44. cublasDgelsBatched
DGELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'T' and m >= n: find the minimum norm solution of an underdetermined system A**T * X = B.
4. If TRANS = 'T' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**T * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasDgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount) type(cublasHandle) :: h integer :: trans ! integer or character(1) variable integer :: m, n, nrhs type(c_devptr), device :: Aarray(*) integer :: lda type(c_devptr), device :: Carray(*) integer :: ldc integer :: info(*) integer, device :: devinfo(*) integer :: batchCount
2.4. Single Precision Complex Functions and Subroutines
This section contains interfaces to the single precision complex BLAS and cuBLAS functions and subroutines.
2.4.1. icamax
ICAMAX finds the index of the element having the maximum absolute value.
integer(4) function icamax(n, x, incx) integer :: n complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIcamax(n, x, incx) integer :: n complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasIcamax_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.4.2. icamin
ICAMIN finds the index of the element having the minimum absolute value.
integer(4) function icamin(n, x, incx) integer :: n complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIcamin(n, x, incx) integer :: n complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasIcamin_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
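For complex vectors, the BLAS convention measures the "absolute value" of an element as |Re| + |Im| rather than the complex modulus, and the returned index is 1-based. The sketch below illustrates this with cublasIcamax and cublasIcamin on data chosen so that the two measures would disagree. The program name and test values are illustrative assumptions.

```fortran
program icamax_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 3
  complex(4) :: x(n)
  complex(4), device :: x_d(n)
  integer :: imax, imin

  ! |Re|+|Im| gives 6, 5, 1 while the modulus gives about 4.24, 5, 1
  x = [cmplx(3.0, 3.0), cmplx(5.0, 0.0), cmplx(0.0, 1.0)]
  x_d = x

  imax = cublasIcamax(n, x_d, 1)
  imin = cublasIcamin(n, x_d, 1)
  print *, 'cublasIcamax index:', imax
  print *, 'cublasIcamin index:', imin

  ! Host reference using the same |Re|+|Im| measure
  print *, 'host maxloc       :', maxloc(abs(real(x)) + abs(aimag(x)))
  print *, 'host minloc       :', minloc(abs(real(x)) + abs(aimag(x)))
end program icamax_sketch
```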
2.4.3. scasum
SCASUM takes the sum of the absolute values of a complex vector and returns a single precision result.
real(4) function scasum(n, x, incx) integer :: n complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
real(4) function cublasScasum(n, x, incx) integer :: n complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasScasum_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x integer :: incx real(4), device :: res ! device or host variable
2.4.4. caxpy
CAXPY computes a constant times a vector plus a vector.
subroutine caxpy(n, a, x, incx, y, incy) integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasCaxpy(n, a, x, incx, y, incy) integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCaxpy_v2(h, n, a, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
2.4.5. ccopy
CCOPY copies a vector x to a vector y.
subroutine ccopy(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasCcopy(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCcopy_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
2.4.6. cdotc
CDOTC forms the dot product of two vectors, conjugating the first vector.
complex(4) function cdotc(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
complex(4) function cublasCdotc(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCdotc_v2(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy complex(4), device :: res ! device or host variable
2.4.7. cdotu
CDOTU forms the dot product of two vectors without conjugating either vector.
complex(4) function cdotu(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
complex(4) function cublasCdotu(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCdotu_v2(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy complex(4), device :: res ! device or host variable
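The difference between the conjugated and unconjugated dot products can be seen in a small sketch. It assumes the handle-based cublasCdotc_v2 and cublasCdotu_v2 interfaces with a host result variable (the default host pointer mode). The program name and vector values are illustrative. The host lines use the Fortran intrinsic dot_product, which conjugates its first argument for complex data, and sum(x*y), which does not.

```fortran
program cdot_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2
  complex(4) :: x(n), y(n), resc, resu
  complex(4), device :: x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: istat

  x = [cmplx(1.0, 1.0), cmplx(0.0, 2.0)]
  y = [cmplx(2.0, 0.0), cmplx(1.0, -1.0)]
  x_d = x
  y_d = y

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Conjugated: sum(conjg(x)*y); for this data the value is (0,-4)
  istat = cublasCdotc_v2(h, n, x_d, 1, y_d, 1, resc)
  ! Unconjugated: sum(x*y); for this data the value is (4,4)
  istat = cublasCdotu_v2(h, n, x_d, 1, y_d, 1, resu)

  istat = cublasDestroy(h)

  print *, 'cdotc:', resc, ' host reference:', dot_product(x, y)
  print *, 'cdotu:', resu, ' host reference:', sum(x*y)
end program cdot_sketch
```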
2.4.8. scnrm2
SCNRM2 returns the Euclidean norm of a vector via the function name, so that SCNRM2 := sqrt( x**H*x ).
real(4) function scnrm2(n, x, incx) integer :: n complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
real(4) function cublasScnrm2(n, x, incx) integer :: n complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasScnrm2_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x integer :: incx real(4), device :: res ! device or host variable
2.4.9. crot
CROT applies a plane rotation, where the cosine (sc) is real and the sine (ss) is complex, and the vectors x and y are complex.
subroutine crot(n, x, incx, y, incy, sc, ss) integer :: n real(4), device :: sc ! device or host variable complex(4), device :: ss ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasCrot(n, x, incx, y, incy, sc, ss) integer :: n real(4), device :: sc ! device or host variable complex(4), device :: ss ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCrot_v2(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(4), device :: sc ! device or host variable complex(4), device :: ss ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
2.4.10. csrot
CSROT applies a plane rotation, where the cosine and sine (sc and ss) are real and the vectors x and y are complex.
subroutine csrot(n, x, incx, y, incy, sc, ss) integer :: n real(4), device :: sc, ss ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasCsrot(n, x, incx, y, incy, sc, ss) integer :: n real(4), device :: sc, ss ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCsrot_v2(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(4), device :: sc, ss ! device or host variable complex(4), device, dimension(*) :: x, y integer :: incx, incy
2.4.11. crotg
CROTG determines a complex Givens rotation.
subroutine crotg(sa, sb, sc, ss) complex(4), device :: sa, sb, ss ! device or host variable real(4), device :: sc ! device or host variable
subroutine cublasCrotg(sa, sb, sc, ss) complex(4), device :: sa, sb, ss ! device or host variable real(4), device :: sc ! device or host variable
integer(4) function cublasCrotg_v2(h, sa, sb, sc, ss) type(cublasHandle) :: h complex(4), device :: sa, sb, ss ! device or host variable real(4), device :: sc ! device or host variable
2.4.12. cscal
CSCAL scales a vector by a constant.
subroutine cscal(n, a, x, incx) integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
subroutine cublasCscal(n, a, x, incx) integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasCscal_v2(h, n, a, x, incx) type(cublasHandle) :: h integer :: n complex(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x integer :: incx
2.4.13. csscal
CSSCAL scales a complex vector by a real constant.
subroutine csscal(n, a, x, incx) integer :: n real(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x ! device or host variable integer :: incx
subroutine cublasCsscal(n, a, x, incx) integer :: n real(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x integer :: incx
integer(4) function cublasCsscal_v2(h, n, a, x, incx) type(cublasHandle) :: h integer :: n real(4), device :: a ! device or host variable complex(4), device, dimension(*) :: x integer :: incx
2.4.14. cswap
CSWAP interchanges two vectors.
subroutine cswap(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasCswap(n, x, incx, y, incy) integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasCswap_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(4), device, dimension(*) :: x, y integer :: incx, incy
2.4.15. cgbmv
CGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine cgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, kl, ku, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
2.4.16. cgemv
CGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine cgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
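To show how the t argument selects the operation, here is a minimal sketch that computes y := A**H*x with cublasCgemv and compares it with a host reference. Note that for the transposed forms x has m elements and y has n elements. The program name and test data are illustrative assumptions, and alpha and beta are passed as host scalars.

```fortran
program cgemv_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 2
  complex(4) :: a(m,n), x(m), y(n), alpha, beta
  complex(4), device :: a_d(m,n), x_d(m), y_d(n)

  ! Column-major test matrix and input vector
  a = reshape([cmplx(1.0, 1.0), cmplx(0.0, 2.0), &
               cmplx(3.0, 0.0), cmplx(0.0, -1.0)], [m, n])
  x = [cmplx(1.0, 0.0), cmplx(0.0, 1.0)]
  y = cmplx(0.0, 0.0)
  alpha = cmplx(1.0, 0.0)
  beta = cmplx(0.0, 0.0)

  a_d = a
  x_d = x
  y_d = y

  ! t = 'C' selects the conjugate transpose: y := alpha*A**H*x + beta*y
  call cublasCgemv('C', m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)
  y = y_d

  print *, 'cublasCgemv, t = C:', y
  print *, 'host reference    :', matmul(conjg(transpose(a)), x)
end program cgemv_sketch
```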
2.4.17. cgerc
CGERC performs the rank 1 operation A := alpha*x*y**H + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine cgerc(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha ! device or host variable
subroutine cublasCgerc(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
integer(4) function cublasCgerc_v2(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
2.4.18. cgeru
CGERU performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine cgeru(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha ! device or host variable
subroutine cublasCgeru(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
integer(4) function cublasCgeru_v2(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
2.4.19. csymv
CSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine csymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, lda, incx, incy complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha, beta ! device or host variable
2.4.20. csyr
CSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a complex scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine csyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x ! device or host variable complex(4), device :: alpha ! device or host variable
subroutine cublasCsyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x complex(4), device :: alpha ! device or host variable
integer(4) function cublasCsyr_v2(h, t, n, alpha, x, incx, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, lda complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x complex(4), device :: alpha ! device or host variable
2.4.21. csyr2
CSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a complex scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine csyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(4), device, dimension(lda, *) :: a ! device or host variable complex(4), device, dimension(*) :: x, y ! device or host variable complex(4), device :: alpha ! device or host variable
subroutine cublasCsyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
integer(4) function cublasCsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, incy, lda complex(4), device, dimension(lda, *) :: a complex(4), device, dimension(*) :: x, y complex(4), device :: alpha ! device or host variable
2.4.22. ctbmv
CTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine ctbmv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtbmv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
integer(4) function cublasCtbmv_v2(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
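The band routines expect the matrix argument a in band storage rather than as a full n by n array. The following host-only sketch assumes the standard BLAS convention for an upper triangular band matrix, in which element (i, j) is stored at a(k+1+i-j, j), so that lda must be at least k+1. The program name and the array names full and ab are illustrative and are not part of the interfaces above.

```fortran
program band_storage_sketch
  implicit none
  integer, parameter :: n = 4, k = 1      ! 4 by 4 matrix with one super-diagonal
  complex(4) :: full(n, n), ab(k+1, n)    ! ab plays the role of a, with lda = k+1
  integer :: i, j

  ! Build an upper triangular band matrix in full storage.
  full = (0.0, 0.0)
  do j = 1, n
    do i = max(1, j-k), j
      full(i, j) = cmplx(real(i), real(j))
    end do
  end do

  ! Copy it to band storage: element (i, j) goes to row k+1+i-j of column j.
  ab = (0.0, 0.0)
  do j = 1, n
    do i = max(1, j-k), j
      ab(k+1+i-j, j) = full(i, j)
    end do
  end do

  ! The main diagonal lands in row k+1 and the first super-diagonal in row k.
  do j = 1, n
    if (ab(k+1, j) /= full(j, j)) error stop "diagonal mismatch"
  end do
  do j = 2, n
    if (ab(k, j) /= full(j-1, j)) error stop "super-diagonal mismatch"
  end do
  print *, "band storage checks passed"
end program band_storage_sketch
```

An array laid out this way could then be copied to a device array and passed as a with lda = k+1.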
2.4.23. ctbsv
CTBSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctbsv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtbsv(u, t, d, n, k, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
integer(4) function cublasCtbsv_v2(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.4.24. ctpmv
CTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine ctpmv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasCtpmv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
integer(4) function cublasCtpmv_v2(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
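For the packed routines, a is a one-dimensional array holding only one triangle of the matrix, column by column. The following host-only sketch assumes the standard BLAS packed layout: for the upper triangle, element (i, j) with i <= j is at position i + (j-1)*j/2, and for the lower triangle, element (i, j) with i >= j is at position i + (j-1)*(2*n-j)/2. The helper functions upper_index and lower_index are hypothetical names used only for this illustration.

```fortran
program packed_storage_sketch
  implicit none
  integer, parameter :: n = 3
  complex(4) :: full(n, n), ap(n*(n+1)/2)   ! ap plays the role of the packed a
  integer :: i, j, pos

  do j = 1, n
    do i = 1, n
      full(i, j) = cmplx(real(i), real(j))
    end do
  end do

  ! Pack the upper triangle column by column and compare the running
  ! position with the closed-form index.
  pos = 0
  do j = 1, n
    do i = 1, j
      pos = pos + 1
      ap(pos) = full(i, j)
      if (pos /= upper_index(i, j)) error stop "upper index mismatch"
    end do
  end do

  ! Walk the lower triangle the same way to check its closed-form index.
  pos = 0
  do j = 1, n
    do i = j, n
      pos = pos + 1
      if (pos /= lower_index(i, j, n)) error stop "lower index mismatch"
    end do
  end do
  print *, "packed storage checks passed"

contains

  ! Position of element (i, j), i <= j, in upper packed storage.
  pure integer function upper_index(i, j)
    integer, intent(in) :: i, j
    upper_index = i + (j-1)*j/2
  end function upper_index

  ! Position of element (i, j), i >= j, in lower packed storage.
  pure integer function lower_index(i, j, n)
    integer, intent(in) :: i, j, n
    lower_index = i + (j-1)*(2*n-j)/2
  end function lower_index

end program packed_storage_sketch
```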
2.4.25. ctpsv
CTPSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctpsv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x ! device or host variable
subroutine cublasCtpsv(u, t, d, n, a, x, incx)
  character*1 :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
integer(4) function cublasCtpsv_v2(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
2.4.26. ctrmv
CTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine ctrmv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtrmv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
integer(4) function cublasCtrmv_v2(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.4.27. ctrsv
CTRSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ctrsv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x ! device or host variable
subroutine cublasCtrsv(u, t, d, n, a, lda, x, incx)
  character*1 :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
integer(4) function cublasCtrsv_v2(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.4.28. chbmv
CHBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian band matrix, with k super-diagonals.
subroutine chbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: k, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x, y ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: k, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChbmv_v2(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: k, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.4.29. chemv
CHEMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine chemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(*) :: x, y ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChemv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.4.30. chpmv
CHPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChpmv(uplo, n, alpha, a, x, incx, beta, y, incy)
  character*1 :: uplo
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChpmv_v2(h, uplo, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha, beta ! device or host variable
2.4.31. cher
CHER performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix.
subroutine cher(t, n, alpha, x, incx, a, lda)
  character*1 :: t
  integer :: n, incx, lda
  complex(4), device, dimension(*) :: a, x ! device or host variable
  real(4), device :: alpha ! device or host variable
subroutine cublasCher(t, n, alpha, x, incx, a, lda)
  character*1 :: t
  integer :: n, incx, lda
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
integer(4) function cublasCher_v2(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
2.4.32. cher2
CHER2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine cher2(t, n, alpha, x, incx, y, incy, a, lda)
  character*1 :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(*) :: a, x, y ! device or host variable
  complex(4), device :: alpha ! device or host variable
subroutine cublasCher2(t, n, alpha, x, incx, y, incy, a, lda)
  character*1 :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
integer(4) function cublasCher2_v2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
2.4.33. chpr
CHPR performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpr(t, n, alpha, x, incx, a)
  character*1 :: t
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x ! device or host variable
  real(4), device :: alpha ! device or host variable
subroutine cublasChpr(t, n, alpha, x, incx, a)
  character*1 :: t
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
integer(4) function cublasChpr_v2(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
2.4.34. chpr2
CHPR2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine chpr2(t, n, alpha, x, incx, y, incy, a)
  character*1 :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y ! device or host variable
  complex(4), device :: alpha ! device or host variable
subroutine cublasChpr2(t, n, alpha, x, incx, y, incy, a)
  character*1 :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
integer(4) function cublasChpr2_v2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
2.4.35. cgemm
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine cgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
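As an illustration of the cgemm interface, the following minimal sketch multiplies two small matrices held in device arrays, passing alpha and beta as host scalars. It assumes a CUDA Fortran compiler with the cublas module available, the cuBLAS library linked (for example with -Mcudalib=cublas on PGI compilers), and a CUDA-capable device. The program name and the a_d, b_d, c_d variable names are illustrative only.

```fortran
program cgemm_sketch
  use cublas                          ! provides the cgemm interface for device arrays
  implicit none
  integer, parameter :: m = 2, n = 2, k = 2
  complex(4) :: a(m, k), b(k, n), c(m, n)
  complex(4), device :: a_d(m, k), b_d(k, n), c_d(m, n)
  complex(4) :: alpha, beta           ! host scalars

  a = (1.0, 1.0)
  b = (2.0, 0.0)
  c = (0.0, 0.0)
  alpha = (1.0, 0.0)
  beta  = (0.0, 0.0)

  ! Copy the operands to the device.
  a_d = a
  b_d = b
  c_d = c

  ! C := alpha*A*B + beta*C, with no transposition of A or B.
  call cgemm('N', 'N', m, n, k, alpha, a_d, m, b_d, k, beta, c_d, m)

  ! Copy the result back. Mathematically each element is
  ! (1+i)*2 + (1+i)*2 = (4.0, 4.0).
  c = c_d
  print *, c
end program cgemm_sketch
```

Because the arrays carry the device attribute, the generic cgemm name resolves to the device version; the same call with host arrays would use the host BLAS interface instead.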
2.4.36. csymm
CSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine csymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.4.37. csyrk
CSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine csyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.4.38. csyr2k
CSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine csyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.4.39. csyrkx
CSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the cuBLAS documentation for more details.
subroutine csyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasCsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.4.40. ctrmm
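CTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The cublasCtrmm_v2 interface takes an additional matrix c with leading dimension ldc; in the cuBLAS v2 API the result is written to C rather than overwriting B.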
subroutine ctrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device :: alpha ! device or host variable
subroutine cublasCtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device :: alpha ! device or host variable
integer(4) function cublasCtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
2.4.41. ctrsm
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
subroutine ctrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device :: alpha ! device or host variable
subroutine cublasCtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device :: alpha ! device or host variable
integer(4) function cublasCtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device :: alpha ! device or host variable
2.4.42. chemm
CHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices.
subroutine chemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha, beta ! device or host variable
subroutine cublasChemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
integer(4) function cublasChemm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.4.43. cherk
CHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine cherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  real(4), device :: alpha, beta ! device or host variable
subroutine cublasCherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
integer(4) function cublasCherk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
2.4.44. cher2k
CHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine cher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
subroutine cublasCher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
integer(4) function cublasCher2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
2.4.45. cherkx
CHERKX performs a variation of the hermitian rank k operation C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the cuBLAS documentation for more details.
subroutine cherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a ! device or host variable
  complex(4), device, dimension(ldb, *) :: b ! device or host variable
  complex(4), device, dimension(ldc, *) :: c ! device or host variable
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
subroutine cublasCherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
integer(4) function cublasCherkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
2.4.46. cublasCgetrfBatched
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasCgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  integer, device :: info(*)
  integer :: batchCount
2.4.47. cublasCgetriBatched
CGETRI computes the inverse of a matrix using the LU factorization computed by CGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasCgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer, device :: info(*)
  integer :: batchCount
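The batched interfaces take device arrays of type(c_devptr), one device pointer per matrix in the batch. The following sketch shows one way such arrays might be built with c_devloc and then passed to cublasCgetrfBatched and cublasCgetriBatched. It assumes a CUDA Fortran compiler with the cudafor and cublas modules, a linked cuBLAS library, and a CUDA-capable device; the pivot array is sized n*batchCount, following the cuBLAS documentation. All variable names are illustrative.

```fortran
program getrf_getri_batched_sketch
  use cudafor                 ! c_devptr, c_devloc
  use cublas                  ! cublasCreate, cublasCgetrfBatched, cublasCgetriBatched
  implicit none
  integer, parameter :: n = 2, nbatch = 2
  complex(4) :: a(n, n, nbatch), ainv(n, n, nbatch)
  complex(4), device :: a_d(n, n, nbatch), ainv_d(n, n, nbatch)
  type(c_devptr) :: aptr(nbatch), cptr(nbatch)
  type(c_devptr), device :: aptr_d(nbatch), cptr_d(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info(nbatch), istat, i
  type(cublasHandle) :: h

  ! Two small diagonal matrices, chosen so the inverses are easy to check.
  a = (0.0, 0.0)
  do i = 1, nbatch
    a(1, 1, i) = cmplx(2.0*i, 0.0)
    a(2, 2, i) = cmplx(4.0*i, 0.0)
  end do
  a_d = a

  ! Collect the device address of each matrix, then copy the pointer
  ! arrays themselves to the device.
  do i = 1, nbatch
    aptr(i) = c_devloc(a_d(1, 1, i))
    cptr(i) = c_devloc(ainv_d(1, 1, i))
  end do
  aptr_d = aptr
  cptr_d = cptr

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop "cublasCreate failed"

  ! LU-factor every matrix in place, then form the inverses in ainv_d.
  istat = cublasCgetrfBatched(h, n, aptr_d, n, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop "cublasCgetrfBatched failed"
  istat = cublasCgetriBatched(h, n, aptr_d, n, ipvt_d, cptr_d, n, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop "cublasCgetriBatched failed"

  ! info holds one status entry per matrix; zero means success.
  info = info_d
  ainv = ainv_d
  print *, "info:", info
  print *, "inverse of first matrix:", ainv(:, :, 1)

  istat = cublasDestroy(h)
end program getrf_getri_batched_sketch
```

For matrices this small, cublasCmatinvBatched offers a single-call alternative, subject to the size limit described in its section.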
2.4.48. cublasCgetrsBatched
CGETRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general N-by-N matrix A using the LU factorization computed by CGETRF.
integer(4) function cublasCgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
  type(cublasHandle) :: h
  integer :: trans ! integer or character(1) variable
  integer :: n, nrhs
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  integer :: info(*)
  integer :: batchCount
2.4.49. cublasCgemmBatched
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasCgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa ! integer or character(1) variable
  integer :: transb ! integer or character(1) variable
  integer :: m, n, k
  complex(4), device :: alpha ! device or host variable
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  complex(4), device :: beta ! device or host variable
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount
integer(4) function cublasCgemmBatched_v2(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa
  integer :: transb
  integer :: m, n, k
  complex(4), device :: alpha ! device or host variable
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  complex(4), device :: beta ! device or host variable
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount
2.4.50. cublasCtrsmBatched
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasCtrsmBatched(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side ! integer or character(1) variable
  integer :: uplo ! integer or character(1) variable
  integer :: trans ! integer or character(1) variable
  integer :: diag ! integer or character(1) variable
  integer :: m, n
  complex(4), device :: alpha ! device or host variable
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount
integer(4) function cublasCtrsmBatched_v2(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side
  integer :: uplo
  integer :: trans
  integer :: diag
  integer :: m, n
  complex(4), device :: alpha ! device or host variable
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount
2.4.51. cublasCmatinvBatched
cublasCmatinvBatched is a shortcut for cublasCgetrfBatched followed by cublasCgetriBatched. However, it only works if n is less than 32; otherwise, the user has to call cublasCgetrfBatched and cublasCgetriBatched.
integer(4) function cublasCmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Ainv(*)
  integer :: lda_inv
  integer, device :: info(*)
  integer :: batchCount
2.4.52. cublasCgeqrfBatched
CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.
integer(4) function cublasCgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
  type(cublasHandle) :: h
  integer :: m, n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Tau(*)
  integer :: info(*)
  integer :: batchCount
2.4.53. cublasCgelsBatched
CGELS solves overdetermined or underdetermined complex linear systems involving an M-by-N matrix A, or its conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'C' and m >= n: find the minimum norm solution of an underdetermined system A**H * X = B.
4. If TRANS = 'C' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**H * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasCgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
  type(cublasHandle) :: h
  integer :: trans ! integer or character(1) variable
  integer :: m, n, nrhs
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: info(*)
  integer, device :: devinfo(*)
  integer :: batchCount
2.5. Double Precision Complex Functions and Subroutines
This section contains interfaces to the double precision complex BLAS and cuBLAS functions and subroutines.
2.5.1. izamax
IZAMAX finds the index of the element having the maximum absolute value.
integer(4) function izamax(n, x, incx) integer :: n complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIzamax(n, x, incx) integer :: n complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasIzamax_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.5.2. izamin
IZAMIN finds the index of the element having the minimum absolute value.
integer(4) function izamin(n, x, incx) integer :: n complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
integer(4) function cublasIzamin(n, x, incx) integer :: n complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasIzamin_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.5.3. dzasum
DZASUM takes the sum of the absolute values of the real and imaginary parts of the vector elements, i.e., the sum of |Re(x(i))| + |Im(x(i))|.
real(8) function dzasum(n, x, incx) integer :: n complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
real(8) function cublasDzasum(n, x, incx) integer :: n complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasDzasum_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
2.5.4. zaxpy
ZAXPY computes a constant times a vector plus a vector: y := a*x + y.
subroutine zaxpy(n, a, x, incx, y, incy) integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasZaxpy(n, a, x, incx, y, incy) integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZaxpy_v2(h, n, a, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
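As a minimal sketch of how the first (BLAS-named) interface might be used, the program below calls zaxpy on device arrays and checks the result against a host computation. It assumes the cublas module described in this chapter, which lets the standard BLAS name resolve to the GPU version when the vector arguments have the device attribute. The program and variable names are hypothetical.

```fortran
program zaxpy_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 4
  complex(8) :: x(n), y(n), y0(n), a
  complex(8), device :: x_d(n), y_d(n)
  integer :: i

  a = (2.0d0, 0.0d0)                         ! host scalar
  do i = 1, n
    x(i) = cmplx(real(i, 8), 0.0d0, kind=8)  ! x = 1, 2, 3, 4
  end do
  y0 = (1.0d0, 1.0d0)

  ! Assignment between host and device arrays performs the copies.
  x_d = x
  y_d = y0

  ! y_d := a*x_d + y_d, computed on the device.
  call zaxpy(n, a, x_d, 1, y_d, 1)

  y = y_d
  ! Each y(i) should equal 2*i + (1,1); the difference from the
  ! host reference should be zero or at rounding level.
  print *, 'max abs difference:', maxval(abs(y - (a*x + y0)))
end program zaxpy_sketch
```

With PGI compilers such a program is typically built with options along the lines of -Mcuda -Mcudalib=cublas; check the compiler documentation for the exact flags on your installation.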
2.5.5. zcopy
ZCOPY copies a vector, x, to a vector, y.
subroutine zcopy(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasZcopy(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZcopy_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
2.5.6. zdotc
ZDOTC forms the dot product of two vectors, conjugating the first vector: ZDOTC := x**H * y.
complex(8) function zdotc(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
complex(8) function cublasZdotc(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZdotc_v2(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy complex(8), device :: res ! device or host variable
2.5.7. zdotu
ZDOTU forms the dot product of two vectors without conjugation: ZDOTU := x**T * y.
complex(8) function zdotu(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
complex(8) function cublasZdotu(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZdotu_v2(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy complex(8), device :: res ! device or host variable
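The difference between the conjugated and unconjugated dot products is easy to miss, so here is a small sketch that calls both on the same data and compares them with host intrinsics. It assumes the cublas module so that zdotc and zdotu resolve to the GPU versions for device arrays. The program and variable names are hypothetical.

```fortran
program zdot_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2
  complex(8) :: x(n), y(n), rc, ru
  complex(8), device :: x_d(n), y_d(n)

  x = [ (1.0d0, 1.0d0), (0.0d0, 2.0d0) ]
  y = [ (1.0d0, 0.0d0), (0.0d0, 1.0d0) ]
  x_d = x
  y_d = y

  rc = zdotc(n, x_d, 1, y_d, 1)   ! sum of conjg(x(i)) * y(i)
  ru = zdotu(n, x_d, 1, y_d, 1)   ! sum of x(i) * y(i)

  ! For complex arguments the Fortran intrinsic dot_product conjugates
  ! its first argument, so it is the host reference for zdotc;
  ! sum(x*y) is the host reference for zdotu.
  print *, 'zdotc:', rc, ' host:', dot_product(x, y)
  print *, 'zdotu:', ru, ' host:', sum(x * y)
end program zdot_sketch
```

For this data the conjugated product is 3 - i and the unconjugated product is -1 + i, so the two routines are not interchangeable for general complex vectors.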
2.5.8. dznrm2
DZNRM2 returns the Euclidean norm of a vector via the function name, so that DZNRM2 := sqrt( x**H*x ).
real(8) function dznrm2(n, x, incx) integer :: n complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
real(8) function cublasDznrm2(n, x, incx) integer :: n complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasDznrm2_v2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
2.5.9. zrot
ZROT applies a plane rotation, where the cosine (sc) is real and the sine (ss) is complex, and the vectors x and y are complex.
subroutine zrot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc ! device or host variable complex(8), device :: ss ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasZrot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc ! device or host variable complex(8), device :: ss ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZrot_v2(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(8), device :: sc ! device or host variable complex(8), device :: ss ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
2.5.10. zsrot
ZSROT applies a plane rotation, where the cosine and sine (sc and ss) are real and the vectors x and y are complex.
subroutine zsrot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc, ss ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasZsrot(n, x, incx, y, incy, sc, ss) integer :: n real(8), device :: sc, ss ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZsrot_v2(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(8), device :: sc, ss ! device or host variable complex(8), device, dimension(*) :: x, y integer :: incx, incy
2.5.11. zrotg
ZROTG determines a double complex Givens rotation.
subroutine zrotg(sa, sb, sc, ss) complex(8), device :: sa, sb, ss ! device or host variable real(8), device :: sc ! device or host variable
subroutine cublasZrotg(sa, sb, sc, ss) complex(8), device :: sa, sb, ss ! device or host variable real(8), device :: sc ! device or host variable
integer(4) function cublasZrotg_v2(h, sa, sb, sc, ss) type(cublasHandle) :: h complex(8), device :: sa, sb, ss ! device or host variable real(8), device :: sc ! device or host variable
2.5.12. zscal
ZSCAL scales a vector by a constant.
subroutine zscal(n, a, x, incx) integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
subroutine cublasZscal(n, a, x, incx) integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasZscal_v2(h, n, a, x, incx) type(cublasHandle) :: h integer :: n complex(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x integer :: incx
2.5.13. zdscal
ZDSCAL scales a complex vector by a real constant.
subroutine zdscal(n, a, x, incx) integer :: n real(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable integer :: incx
subroutine cublasZdscal(n, a, x, incx) integer :: n real(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x integer :: incx
integer(4) function cublasZdscal_v2(h, n, a, x, incx) type(cublasHandle) :: h integer :: n real(8), device :: a ! device or host variable complex(8), device, dimension(*) :: x integer :: incx
2.5.14. zswap
ZSWAP interchanges two vectors.
subroutine zswap(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y ! device or host variable integer :: incx, incy
subroutine cublasZswap(n, x, incx, y, incy) integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
integer(4) function cublasZswap_v2(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n complex(8), device, dimension(*) :: x, y integer :: incx, incy
2.5.15. zgbmv
ZGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals.
subroutine zgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgbmv(t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, kl, ku, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgbmv_v2(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, kl, ku, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
2.5.16. zgemv
ZGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
subroutine zgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgemv(t, m, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: t integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgemv_v2(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
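The sketch below illustrates how the t argument changes the expected vector lengths: with 'N' the input has n elements and the output m, while with 'C' (conjugate transpose) it is the reverse. It assumes the cublas module so that zgemv resolves to the GPU version for device arrays. The program and variable names are hypothetical.

```fortran
program zgemv_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 3, n = 2
  complex(8) :: a(m,n), x(n), y(m), z(n), alpha, beta
  complex(8), device :: a_d(m,n), x_d(n), y_d(m), z_d(n)

  alpha = (1.0d0, 0.0d0)
  beta  = (0.0d0, 0.0d0)
  a = (0.0d0, 1.0d0)          ! every entry of A is i
  x = (1.0d0, 0.0d0)
  y = (0.0d0, 0.0d0)
  z = (0.0d0, 0.0d0)
  a_d = a
  x_d = x
  y_d = y
  z_d = z

  ! y := A*x, where y has m elements and x has n elements.
  call zgemv('N', m, n, alpha, a_d, m, x_d, 1, beta, y_d, 1)

  ! z := A**H * y, where z has n elements and y has m elements.
  ! m and n still describe A itself, not op(A).
  call zgemv('C', m, n, alpha, a_d, m, y_d, 1, beta, z_d, 1)

  y = y_d
  z = z_d
  ! Compare with host references built from matmul.
  print *, 'N error:', maxval(abs(y - matmul(a, x)))
  print *, 'C error:', maxval(abs(z - matmul(conjg(transpose(a)), y)))
end program zgemv_sketch
```

Note that the _v2 interface takes t as an integer rather than a character, so the corresponding cuBLAS operation constants would be passed there instead of 'N' or 'C'.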
2.5.17. zgerc
ZGERC performs the rank 1 operation A := alpha*x*y**H + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine zgerc(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZgerc(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
integer(4) function cublasZgerc_v2(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
2.5.18. zgeru
ZGERU performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix.
subroutine zgeru(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZgeru(m, n, alpha, x, incx, y, incy, a, lda) integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
integer(4) function cublasZgeru_v2(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
2.5.19. zsymv
ZSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine zsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsymv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
2.5.20. zsyr
ZSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a complex scalar, x is an n element vector and A is an n by n symmetric matrix.
subroutine zsyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZsyr(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x complex(8), device :: alpha ! device or host variable
integer(4) function cublasZsyr_v2(h, t, n, alpha, x, incx, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x complex(8), device :: alpha ! device or host variable
2.5.21. zsyr2
ZSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a complex scalar, x and y are n element vectors and A is an n by n symmetric matrix.
subroutine zsyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZsyr2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
integer(4) function cublasZsyr2_v2(h, t, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, incy, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha ! device or host variable
2.5.22. ztbmv
ZTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals.
subroutine ztbmv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtbmv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
integer(4) function cublasZtbmv_v2(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
2.5.23. ztbsv
ZTBSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztbsv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtbsv(u, t, d, n, k, a, lda, x, incx) character*1 :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
integer(4) function cublasZtbsv_v2(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
2.5.24. ztpmv
ZTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form.
subroutine ztpmv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasZtpmv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x
integer(4) function cublasZtpmv_v2(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x
2.5.25. ztpsv
ZTPSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztpsv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x ! device or host variable
subroutine cublasZtpsv(u, t, d, n, a, x, incx) character*1 :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x
integer(4) function cublasZtpsv_v2(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx complex(8), device, dimension(*) :: a, x
2.5.26. ztrmv
ZTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix.
subroutine ztrmv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtrmv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
integer(4) function cublasZtrmv_v2(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
2.5.27. ztrsv
ZTRSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine.
subroutine ztrsv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x ! device or host variable
subroutine cublasZtrsv(u, t, d, n, a, lda, x, incx) character*1 :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
integer(4) function cublasZtrsv_v2(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x
2.5.28. zhbmv
ZHBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian band matrix, with k super-diagonals.
subroutine zhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: k, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: k, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhbmv_v2(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: k, n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
2.5.29. zhemv
ZHEMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine zhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(*) :: x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) character*1 :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhemv_v2(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, lda, incx, incy complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(*) :: x, y complex(8), device :: alpha, beta ! device or host variable
2.5.30. zhpmv
ZHPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpmv(uplo, n, alpha, a, x, incx, beta, y, incy) character*1 :: uplo integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZhpmv(uplo, n, alpha, a, x, incx, beta, y, incy) character*1 :: uplo integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZhpmv_v2(h, uplo, n, alpha, a, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha, beta ! device or host variable
2.5.31. zher
ZHER performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix.
subroutine zher(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(8), device, dimension(*) :: a, x ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasZher(t, n, alpha, x, incx, a, lda) character*1 :: t integer :: n, incx, lda complex(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
integer(4) function cublasZher_v2(h, t, n, alpha, x, incx, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, lda complex(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
2.5.32. zher2
ZHER2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix.
subroutine zher2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(8), device, dimension(*) :: a, x, y ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZher2(t, n, alpha, x, incx, y, incy, a, lda) character*1 :: t integer :: n, incx, incy, lda complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha ! device or host variable
integer(4) function cublasZher2_v2(h, t, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, incy, lda complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha ! device or host variable
2.5.33. zhpr
ZHPR performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpr(t, n, alpha, x, incx, a) character*1 :: t integer :: n, incx complex(8), device, dimension(*) :: a, x ! device or host variable real(8), device :: alpha ! device or host variable
subroutine cublasZhpr(t, n, alpha, x, incx, a) character*1 :: t integer :: n, incx complex(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
integer(4) function cublasZhpr_v2(h, t, n, alpha, x, incx, a) type(cublasHandle) :: h integer :: t integer :: n, incx complex(8), device, dimension(*) :: a, x real(8), device :: alpha ! device or host variable
2.5.34. zhpr2
ZHPR2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form.
subroutine zhpr2(t, n, alpha, x, incx, y, incy, a) character*1 :: t integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y ! device or host variable complex(8), device :: alpha ! device or host variable
subroutine cublasZhpr2(t, n, alpha, x, incx, y, incy, a) character*1 :: t integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha ! device or host variable
integer(4) function cublasZhpr2_v2(h, t, n, alpha, x, incx, y, incy, a) type(cublasHandle) :: h integer :: t integer :: n, incx, incy complex(8), device, dimension(*) :: a, x, y complex(8), device :: alpha ! device or host variable
2.5.35. zgemm
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
subroutine zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: transa, transb integer :: m, n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(ldb, *) :: b ! device or host variable complex(8), device, dimension(ldc, *) :: c ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: transa, transb integer :: m, n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZgemm_v2(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: transa, transb integer :: m, n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
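The _v2 interfaces differ from the BLAS-named ones in that they are functions returning a status, take a cublasHandle as the first argument, and use integer operation codes instead of characters. The following minimal sketch shows one possible call sequence. It assumes that cublasCreate, cublasDestroy, CUBLAS_STATUS_SUCCESS, and CUBLAS_OP_N are provided by the cublas module, and that host scalars are accepted for alpha and beta as the "device or host variable" comments above suggest. The program and variable names are hypothetical.

```fortran
program zgemm_v2_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 2, n = 2, k = 2
  type(cublasHandle) :: h
  complex(8) :: a(m,k), b(k,n), c(m,n), alpha, beta
  complex(8), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  integer :: istat

  alpha = (1.0d0, 0.0d0)
  beta  = (0.0d0, 0.0d0)
  a = reshape([ (1.0d0, 0.0d0), (0.0d0, 1.0d0), &
                (2.0d0, 0.0d0), (0.0d0, -1.0d0) ], [m, k])
  b = reshape([ (0.0d0, 1.0d0), (1.0d0, 0.0d0), &
                (1.0d0, 1.0d0), (3.0d0, 0.0d0) ], [k, n])
  c = (0.0d0, 0.0d0)
  a_d = a
  b_d = b
  c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! C := alpha*A*B + beta*C with neither operand transposed.
  istat = cublasZgemm_v2(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                         alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZgemm_v2 failed'

  c = c_d
  ! Compare with the host intrinsic as a reference.
  print *, 'max abs difference:', maxval(abs(c - matmul(a, b)))

  istat = cublasDestroy(h)
end program zgemm_v2_sketch
```

Checking the returned status after each call is the main practical benefit of the function form; the subroutine forms give no such indication.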
2.5.36. zsymm
ZSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
subroutine zsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: side, uplo integer :: m, n, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(ldb, *) :: b ! device or host variable complex(8), device, dimension(ldc, *) :: c ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: side, uplo integer :: m, n, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsymm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: side, uplo integer :: m, n, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
2.5.37. zsyrk
ZSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine zsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldc complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(ldc, *) :: c ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsyrk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
2.5.38. zsyr2k
ZSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine zsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a ! device or host variable complex(8), device, dimension(ldb, *) :: b ! device or host variable complex(8), device, dimension(ldc, *) :: c ! device or host variable complex(8), device :: alpha, beta ! device or host variable
subroutine cublasZsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) character*1 :: uplo, trans integer :: n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
integer(4) function cublasZsyr2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldb, ldc complex(8), device, dimension(lda, *) :: a complex(8), device, dimension(ldb, *) :: b complex(8), device, dimension(ldc, *) :: c complex(8), device :: alpha, beta ! device or host variable
2.5.39. zsyrkx
ZSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
subroutine zsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device, dimension(ldc, *) :: c ! device or host variable
  complex(8), device :: alpha, beta ! device or host variable

subroutine cublasZsyrkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable

integer(4) function cublasZsyrkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.5.40. ztrmm
ZTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ) where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H.
subroutine ztrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device :: alpha ! device or host variable

subroutine cublasZtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device :: alpha ! device or host variable

integer(4) function cublasZtrmm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
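Unlike ztrmm and cublasZtrmm, the cublasZtrmm_v2 interface takes an additional output matrix c, so the product is written to c rather than over b. The following is a minimal sketch of such a call, not a definitive implementation. It assumes CUDA Fortran is enabled and cuBLAS is linked, and that cublasCreate, cublasDestroy, and the enumerators CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, and CUBLAS_STATUS_SUCCESS are provided by the cublas module. The program name, sizes, and test data are illustrative.

```fortran
program ztrmm_v2_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: m = 3, n = 2
  complex(8) :: a(m,m), b(m,n), c(m,n), cref(m,n)
  complex(8), device :: a_d(m,m), b_d(m,n), c_d(m,n)
  complex(8) :: alpha
  type(cublasHandle) :: h
  integer :: i, j, istat

  ! Upper triangular A with a non-unit diagonal; entries below the diagonal stay zero.
  a = (0.0d0, 0.0d0)
  do j = 1, m
    do i = 1, j
      a(i,j) = cmplx(i, j, kind=8)
    end do
  end do
  do j = 1, n
    do i = 1, m
      b(i,j) = cmplx(i + j, i - j, kind=8)
    end do
  end do
  alpha = (2.0d0, 0.0d0)
  cref = alpha * matmul(a, b)      ! host reference for C = alpha*A*B

  a_d = a
  b_d = b
  c_d = (0.0d0, 0.0d0)

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! alpha is a host variable here (assumption: the handle is in its default host pointer mode).
  istat = cublasZtrmm_v2(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, &
                         CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, m, n, alpha, &
                         a_d, m, b_d, m, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZtrmm_v2 failed'

  c = c_d
  print *, 'max abs error vs. host matmul: ', maxval(abs(c - cref))

  istat = cublasDestroy(h)
end program ztrmm_v2_sketch
```

The input b_d is left unchanged by this call, which is the practical difference from the two subroutine forms.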
2.5.41. ztrsm
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
subroutine ztrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device :: alpha ! device or host variable

subroutine cublasZtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  character*1 :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device :: alpha ! device or host variable

integer(4) function cublasZtrsm_v2(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device :: alpha ! device or host variable
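As an illustration of the "X is overwritten on B" convention, the sketch below builds B = A*X on the host for a known X, solves A*X = B on the device with the ztrsm overload, and compares the result with the original X. It is a minimal example under stated assumptions: the program name, sizes, and test data are illustrative, and the generic ztrsm is assumed to resolve to the device-array version through the cublas module.

```fortran
program ztrsm_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 3, nrhs = 2
  complex(8) :: a(n,n), b(n,nrhs), x(n,nrhs)
  complex(8), device :: a_d(n,n), b_d(n,nrhs)
  complex(8) :: alpha
  integer :: i, j

  ! Lower triangular A with a non-zero, non-unit diagonal.
  a = (0.0d0, 0.0d0)
  do j = 1, n
    do i = j, n
      a(i,j) = cmplx(i + j, i - j, kind=8)
    end do
  end do

  ! Known solution X, and the matching right-hand side B = A*X.
  do j = 1, nrhs
    do i = 1, n
      x(i,j) = cmplx(i, j, kind=8)
    end do
  end do
  b = matmul(a, x)

  a_d = a
  b_d = b
  alpha = (1.0d0, 0.0d0)

  ! Solve op(A)*X = alpha*B with side='L', uplo='L', transa='N', diag='N'.
  ! On return b_d holds X.
  call ztrsm('L', 'L', 'N', 'N', n, nrhs, alpha, a_d, n, b_d, n)

  b = b_d
  print *, 'max abs error vs. known X: ', maxval(abs(b - x))
end program ztrsm_sketch
```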
2.5.42. zhemm
ZHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices.
subroutine zhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device, dimension(ldc, *) :: c ! device or host variable
  complex(8), device :: alpha, beta ! device or host variable

subroutine cublasZhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable

integer(4) function cublasZhemm_v2(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.5.43. zherk
ZHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
subroutine zherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldc, *) :: c ! device or host variable
  real(8), device :: alpha, beta ! device or host variable

subroutine cublasZherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable

integer(4) function cublasZherk_v2(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
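Note that for the hermitian rank k update the scalars alpha and beta are real(8) even though the matrices are complex(8). A minimal sketch of a zherk call with the first form, C := alpha*A*A**H + beta*C, follows. The program name, sizes, and test data are illustrative assumptions; only the triangle selected by uplo is compared with the host reference, since that is the only part the routine updates.

```fortran
program zherk_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 4, k = 2
  complex(8) :: a(n,k), c(n,n), cref(n,n)
  complex(8), device :: a_d(n,k), c_d(n,n)
  real(8) :: alpha, beta, err
  integer :: i, j

  do j = 1, k
    do i = 1, n
      a(i,j) = cmplx(i, j, kind=8)
    end do
  end do
  c = (0.0d0, 0.0d0)

  ! alpha and beta are real scalars for the hermitian update.
  alpha = 2.0d0
  beta  = 0.0d0

  ! Host reference: alpha * A * A**H.
  cref = alpha * matmul(a, conjg(transpose(a)))

  a_d = a
  c_d = c

  ! uplo = 'U': only the upper triangle of c_d is referenced and updated.
  ! trans = 'N': C := alpha*A*A**H + beta*C, with A an n by k matrix.
  call zherk('U', 'N', n, k, alpha, a_d, n, beta, c_d, n)

  c = c_d
  err = 0.0d0
  do j = 1, n
    do i = 1, j
      err = max(err, abs(c(i,j) - cref(i,j)))
    end do
  end do
  print *, 'max abs error in upper triangle: ', err
end program zherk_sketch
```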
2.5.44. zher2k
ZHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
subroutine zher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device, dimension(ldc, *) :: c ! device or host variable
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable

subroutine cublasZher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable

integer(4) function cublasZher2k_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable
2.5.45. zherkx
ZHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
subroutine zherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a ! device or host variable
  complex(8), device, dimension(ldb, *) :: b ! device or host variable
  complex(8), device, dimension(ldc, *) :: c ! device or host variable
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable

subroutine cublasZherkx(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  character*1 :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable

integer(4) function cublasZherkx_v2(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable
2.5.46. cublasZgetrfBatched
ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm.
integer(4) function cublasZgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  integer, device :: info(*)
  integer :: batchCount
2.5.47. cublasZgetriBatched
ZGETRI computes the inverse of a matrix using the LU factorization computed by ZGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A).
integer(4) function cublasZgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer, device :: info(*)
  integer :: batchCount
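The batched interfaces take a device array of type(c_devptr), one entry per matrix in the batch, rather than a plain device array. The sketch below shows one way to build such pointer arrays and then chain cublasZgetrfBatched and cublasZgetriBatched to invert a small batch. It is a hedged illustration, not a reference implementation: it assumes c_devloc from the cudafor module can take the address of a device array element, and that cublasCreate and cublasDestroy are available from the cublas module. All names, sizes, and matrix values are illustrative.

```fortran
program zgetri_batched_sketch
  use cudafor
  use cublas
  implicit none
  integer, parameter :: n = 2, nbatch = 3, lda = n
  complex(8) :: a(n,n,nbatch), ainv(n,n,nbatch), prod(n,n)
  complex(8), device :: a_d(n,n,nbatch), ainv_d(n,n,nbatch)
  type(c_devptr) :: aptr(nbatch), cptr(nbatch)
  type(c_devptr), device :: aptr_d(nbatch), cptr_d(nbatch)
  integer, device :: ipvt_d(n*nbatch), info_d(nbatch)
  integer :: info(nbatch), istat, ib, i
  real(8) :: err
  type(cublasHandle) :: h

  ! Small non-singular test matrices (determinant 6*ib - i).
  do ib = 1, nbatch
    a(1,1,ib) = cmplx(2.0d0*ib, 0.0d0, kind=8)
    a(2,1,ib) = (0.0d0, 1.0d0)
    a(1,2,ib) = (1.0d0, 0.0d0)
    a(2,2,ib) = (3.0d0, 0.0d0)
  end do
  a_d = a

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Collect the device address of each matrix on the host,
  ! then copy the pointer arrays to the device.
  do ib = 1, nbatch
    aptr(ib) = c_devloc(a_d(1,1,ib))
    cptr(ib) = c_devloc(ainv_d(1,1,ib))
  end do
  aptr_d = aptr
  cptr_d = cptr

  ! LU factorization in place, then inversion into separate storage.
  istat = cublasZgetrfBatched(h, n, aptr_d, lda, ipvt_d, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZgetrfBatched failed'
  istat = cublasZgetriBatched(h, n, aptr_d, lda, ipvt_d, cptr_d, lda, info_d, nbatch)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasZgetriBatched failed'

  info = info_d
  if (any(info /= 0)) print *, 'non-zero info returned: ', info
  ainv = ainv_d

  ! Check A * inv(A) against the identity using the untouched host copy of A.
  err = 0.0d0
  do ib = 1, nbatch
    prod = matmul(a(:,:,ib), ainv(:,:,ib))
    do i = 1, n
      prod(i,i) = prod(i,i) - 1.0d0
    end do
    err = max(err, maxval(abs(prod)))
  end do
  print *, 'max deviation from identity: ', err

  istat = cublasDestroy(h)
end program zgetri_batched_sketch
```

After the factorization a_d holds the LU factors, so the check uses the host copy a rather than copying a_d back.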
2.5.48. cublasZgetrsBatched
ZGETRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general N-by-N matrix A using the LU factorization computed by ZGETRF.
integer(4) function cublasZgetrsBatched(h, trans, n, nrhs, Aarray, lda, ipvt, Barray, ldb, info, batchCount)
  type(cublasHandle) :: h
  integer :: trans ! integer or character(1) variable
  integer :: n, nrhs
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  integer :: info(*)
  integer :: batchCount
2.5.49. cublasZgemmBatched
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasZgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa ! integer or character(1) variable
  integer :: transb ! integer or character(1) variable
  integer :: m, n, k
  complex(8), device :: alpha ! device or host variable
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  complex(8), device :: beta ! device or host variable
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount

integer(4) function cublasZgemmBatched_v2(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa
  integer :: transb
  integer :: m, n, k
  complex(8), device :: alpha ! device or host variable
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  complex(8), device :: beta ! device or host variable
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount
2.5.50. cublasZtrsmBatched
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasZtrsmBatched(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side ! integer or character(1) variable
  integer :: uplo ! integer or character(1) variable
  integer :: trans ! integer or character(1) variable
  integer :: diag ! integer or character(1) variable
  integer :: m, n
  complex(8), device :: alpha ! device or host variable
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount

integer(4) function cublasZtrsmBatched_v2(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side
  integer :: uplo
  integer :: trans
  integer :: diag
  integer :: m, n
  complex(8), device :: alpha ! device or host variable
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount
2.5.51. cublasZmatinvBatched
cublasZmatinvBatched is a shortcut for cublasZgetrfBatched followed by cublasZgetriBatched. However, it only works if n is less than 32. Otherwise, the user must call cublasZgetrfBatched and cublasZgetriBatched directly.
integer(4) function cublasZmatinvBatched(h, n, Aarray, lda, Ainv, lda_inv, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Ainv(*)
  integer :: lda_inv
  integer, device :: info(*)
  integer :: batchCount
2.5.52. cublasZgeqrfBatched
ZGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R.
integer(4) function cublasZgeqrfBatched(h, m, n, Aarray, lda, Tau, info, batchCount)
  type(cublasHandle) :: h
  integer :: m, n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Tau(*)
  integer :: info(*)
  integer :: batchCount
2.5.53. cublasZgelsBatched
ZGELS solves overdetermined or underdetermined complex linear systems involving an M-by-N matrix A, or its conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'C' and m >= n: find the minimum norm solution of an underdetermined system A**H * X = B.
4. If TRANS = 'C' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**H * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X.
integer(4) function cublasZgelsBatched(h, trans, m, n, nrhs, Aarray, lda, Carray, ldc, info, devinfo, batchCount)
  type(cublasHandle) :: h
  integer :: trans ! integer or character(1) variable
  integer :: m, n, nrhs
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: info(*)
  integer, device :: devinfo(*)
  integer :: batchCount
2.6. CUBLAS V2 Module Functions
This section contains interfaces to the cuBLAS V2 Module Functions. Users can access this module by inserting the line use cublas_v2 into the program unit. One major difference between the cublas_v2 module and the cublas module is that the cublas entry points, such as cublasIsamax, are changed to take the handle as the first argument. The second difference is that the v2 entry points, such as cublasIsamax_v2, do not implicitly handle the pointer modes for the user. It is up to the programmer to call cublasSetPointerMode to tell the library whether scalar arguments reside on the host or on the device. The actual interfaces to the v2 entry points do not change, and are not listed in this section.
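The two differences described above can be sketched in one short program: a call through the cublas_v2 form of cublasSaxpy, which takes the handle first and accepts a host scalar, followed by a call through cublasSaxpy_v2 with a device scalar after the pointer mode has been set explicitly. This is a minimal illustration under stated assumptions: cublasCreate, cublasDestroy, cublasSetPointerMode, and the enumerators CUBLAS_POINTER_MODE_HOST, CUBLAS_POINTER_MODE_DEVICE, and CUBLAS_STATUS_SUCCESS are assumed to be available from the cublas_v2 module, and the program name and data are illustrative.

```fortran
program pointer_mode_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4
  real(4) :: x(n), y(n), a
  real(4), device :: x_d(n), y_d(n), a_d
  type(cublasHandle) :: h
  integer :: istat

  x = 1.0
  y = 2.0
  a = 3.0
  x_d = x
  y_d = y
  a_d = a

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! cublas_v2 form of the cublas entry point: the handle comes first,
  ! and the scalar a may live on the host.  y := a*x + y = 3*1 + 2 = 5
  istat = cublasSaxpy(h, n, a, x_d, 1, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasSaxpy failed'

  ! Explicit v2 entry point with a device scalar: the pointer mode
  ! must be set by the programmer first.  y := a*x + y = 3*1 + 5 = 8
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_DEVICE)
  istat = cublasSaxpy_v2(h, n, a_d, x_d, 1, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasSaxpy_v2 failed'

  ! Restore host pointer mode for any later calls that pass host scalars.
  istat = cublasSetPointerMode(h, CUBLAS_POINTER_MODE_HOST)

  y = y_d
  print *, 'y after both updates: ', y

  istat = cublasDestroy(h)
end program pointer_mode_sketch
```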
2.6.1. Single Precision Functions and Subroutines
This section contains the V2 interfaces to the single precision BLAS and cuBLAS functions and subroutines.
2.6.1.1. isamax
If you use the cublas_v2 module, the interface for cublasIsamax is changed to the following:
integer(4) function cublasIsamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.1.2. isamin
If you use the cublas_v2 module, the interface for cublasIsamin is changed to the following:
integer(4) function cublasIsamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.1.3. sasum
If you use the cublas_v2 module, the interface for cublasSasum is changed to the following:
integer(4) function cublasSasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.6.1.4. saxpy
If you use the cublas_v2 module, the interface for cublasSaxpy is changed to the following:
integer(4) function cublasSaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.1.5. scopy
If you use the cublas_v2 module, the interface for cublasScopy is changed to the following:
integer(4) function cublasScopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.1.6. sdot
If you use the cublas_v2 module, the interface for cublasSdot is changed to the following:
integer(4) function cublasSdot(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
  real(4), device :: res ! device or host variable
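With the cublas_v2 interface, the function value of cublasSdot is a status code and the dot product itself comes back through the trailing res argument. The sketch below illustrates that calling pattern with a host result variable. The program name and data are illustrative, and cublasCreate, cublasDestroy, and CUBLAS_STATUS_SUCCESS are assumed to be provided by the cublas_v2 module.

```fortran
program sdot_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 5
  real(4) :: x(n), y(n), res
  real(4), device :: x_d(n), y_d(n)
  type(cublasHandle) :: h
  integer :: i, istat

  x = [(real(i), i = 1, n)]
  y = 2.0
  x_d = x
  y_d = y

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! The return value is the status; the dot product is stored in res,
  ! which is a host variable in this sketch.
  istat = cublasSdot(h, n, x_d, 1, y_d, 1, res)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasSdot failed'

  print *, 'device dot product: ', res
  print *, 'host dot product:   ', dot_product(x, y)

  istat = cublasDestroy(h)
end program sdot_v2_sketch
```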
2.6.1.7. snrm2
If you use the cublas_v2 module, the interface for cublasSnrm2 is changed to the following:
integer(4) function cublasSnrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.6.1.8. srot
If you use the cublas_v2 module, the interface for cublasSrot is changed to the following:
integer(4) function cublasSrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: sc, ss ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.1.9. srotg
If you use the cublas_v2 module, the interface for cublasSrotg is changed to the following:
integer(4) function cublasSrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  real(4), device :: sa, sb, sc, ss ! device or host variable
2.6.1.10. srotm
If you use the cublas_v2 module, the interface for cublasSrotm is changed to the following:
integer(4) function cublasSrotm(h, n, x, incx, y, incy, param)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: param(*) ! device or host variable
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.1.11. srotmg
If you use the cublas_v2 module, the interface for cublasSrotmg is changed to the following:
integer(4) function cublasSrotmg(h, d1, d2, x1, y1, param)
  type(cublasHandle) :: h
  real(4), device :: d1, d2, x1, y1, param(*) ! device or host variable
2.6.1.12. sscal
If you use the cublas_v2 module, the interface for cublasSscal is changed to the following:
integer(4) function cublasSscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: a ! device or host variable
  real(4), device, dimension(*) :: x
  integer :: incx
2.6.1.13. sswap
If you use the cublas_v2 module, the interface for cublasSswap is changed to the following:
integer(4) function cublasSswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.1.14. sgbmv
If you use the cublas_v2 module, the interface for cublasSgbmv is changed to the following:
integer(4) function cublasSgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, kl, ku, lda, incx, incy real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha, beta ! device or host variable
2.6.1.15. sgemv
If you use the cublas_v2 module, the interface for cublasSgemv is changed to the following:
integer(4) function cublasSgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: m, n, lda, incx, incy real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha, beta ! device or host variable
2.6.1.16. sger
If you use the cublas_v2 module, the interface for cublasSger is changed to the following:
integer(4) function cublasSger(h, m, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: m, n, lda, incx, incy real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha ! device or host variable
2.6.1.17. ssbmv
If you use the cublas_v2 module, the interface for cublasSsbmv is changed to the following:
integer(4) function cublasSsbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: k, n, lda, incx, incy real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha, beta ! device or host variable
2.6.1.18. sspmv
If you use the cublas_v2 module, the interface for cublasSspmv is changed to the following:
integer(4) function cublasSspmv(h, t, n, alpha, a, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: t integer :: n, incx, incy real(4), device, dimension(*) :: a, x, y real(4), device :: alpha, beta ! device or host variable
2.6.1.19. sspr
If you use the cublas_v2 module, the interface for cublasSspr is changed to the following:
integer(4) function cublasSspr(h, t, n, alpha, x, incx, a) type(cublasHandle) :: h integer :: t integer :: n, incx real(4), device, dimension(*) :: a, x real(4), device :: alpha ! device or host variable
2.6.1.20. sspr2
If you use the cublas_v2 module, the interface for cublasSspr2 is changed to the following:
integer(4) function cublasSspr2(h, t, n, alpha, x, incx, y, incy, a) type(cublasHandle) :: h integer :: t integer :: n, incx, incy real(4), device, dimension(*) :: a, x, y real(4), device :: alpha ! device or host variable
2.6.1.21. ssymv
If you use the cublas_v2 module, the interface for cublasSsymv is changed to the following:
integer(4) function cublasSsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy) type(cublasHandle) :: h integer :: uplo integer :: n, lda, incx, incy real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha, beta ! device or host variable
2.6.1.22. ssyr
If you use the cublas_v2 module, the interface for cublasSsyr is changed to the following:
integer(4) function cublasSsyr(h, t, n, alpha, x, incx, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x real(4), device :: alpha ! device or host variable
2.6.1.23. ssyr2
If you use the cublas_v2 module, the interface for cublasSsyr2 is changed to the following:
integer(4) function cublasSsyr2(h, t, n, alpha, x, incx, y, incy, a, lda) type(cublasHandle) :: h integer :: t integer :: n, incx, incy, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x, y real(4), device :: alpha ! device or host variable
2.6.1.24. stbmv
If you use the cublas_v2 module, the interface for cublasStbmv is changed to the following:
integer(4) function cublasStbmv(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x
2.6.1.25. stbsv
If you use the cublas_v2 module, the interface for cublasStbsv is changed to the following:
integer(4) function cublasStbsv(h, u, t, d, n, k, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, k, incx, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x
2.6.1.26. stpmv
If you use the cublas_v2 module, the interface for cublasStpmv is changed to the following:
integer(4) function cublasStpmv(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx real(4), device, dimension(*) :: a, x
2.6.1.27. stpsv
If you use the cublas_v2 module, the interface for cublasStpsv is changed to the following:
integer(4) function cublasStpsv(h, u, t, d, n, a, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx real(4), device, dimension(*) :: a, x
2.6.1.28. strmv
If you use the cublas_v2 module, the interface for cublasStrmv is changed to the following:
integer(4) function cublasStrmv(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x
2.6.1.29. strsv
If you use the cublas_v2 module, the interface for cublasStrsv is changed to the following:
integer(4) function cublasStrsv(h, u, t, d, n, a, lda, x, incx) type(cublasHandle) :: h integer :: u, t, d integer :: n, incx, lda real(4), device, dimension(lda, *) :: a real(4), device, dimension(*) :: x
2.6.1.30. sgemm
If you use the cublas_v2 module, the interface for cublasSgemm is changed to the following:
integer(4) function cublasSgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
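In the cublas_v2 form of cublasSgemm the transpose options are integers rather than character*1 values. The sketch below passes the CUBLAS_OP_N enumerator for both and compares the device result with a host matmul. It is an illustration under stated assumptions: CUBLAS_OP_N, CUBLAS_STATUS_SUCCESS, cublasCreate, and cublasDestroy are assumed to be provided by the cublas_v2 module, and the program name, sizes, and random test data are illustrative.

```fortran
program sgemm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 3, n = 2, k = 4
  real(4) :: a(m,k), b(k,n), c(m,n)
  real(4), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  real(4) :: alpha, beta
  type(cublasHandle) :: h
  integer :: istat

  call random_number(a)
  call random_number(b)
  c = 0.0
  alpha = 1.0
  beta  = 0.0

  a_d = a
  b_d = b
  c_d = c

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! C := alpha*A*B + beta*C with no transposition of A or B.
  ! Leading dimensions are the first extents of the arrays as declared.
  istat = cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                      alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasSgemm failed'

  c = c_d
  print *, 'max abs difference vs. host matmul: ', maxval(abs(c - matmul(a, b)))

  istat = cublasDestroy(h)
end program sgemm_v2_sketch
```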
2.6.1.31. ssymm
If you use the cublas_v2 module, the interface for cublasSsymm is changed to the following:
integer(4) function cublasSsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: side, uplo integer :: m, n, lda, ldb, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device, dimension(ldc, *) :: c real(4), device :: alpha, beta ! device or host variable
2.6.1.32. ssyrk
If you use the cublas_v2 module, the interface for cublasSsyrk is changed to the following:
integer(4) function cublasSsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldc, *) :: c real(4), device :: alpha, beta ! device or host variable
2.6.1.33. ssyr2k
If you use the cublas_v2 module, the interface for cublasSsyr2k is changed to the following:
integer(4) function cublasSsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldb, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device, dimension(ldc, *) :: c real(4), device :: alpha, beta ! device or host variable
2.6.1.34. ssyrkx
If you use the cublas_v2 module, the interface for cublasSsyrkx is changed to the following:
integer(4) function cublasSsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) type(cublasHandle) :: h integer :: uplo, trans integer :: n, k, lda, ldb, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device, dimension(ldc, *) :: c real(4), device :: alpha, beta ! device or host variable
2.6.1.35. strmm
If you use the cublas_v2 module, the interface for cublasStrmm is changed to the following:
integer(4) function cublasStrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb, ldc real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device, dimension(ldc, *) :: c real(4), device :: alpha ! device or host variable
2.6.1.36. strsm
If you use the cublas_v2 module, the interface for cublasStrsm is changed to the following:
integer(4) function cublasStrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) type(cublasHandle) :: h integer :: side, uplo, transa, diag integer :: m, n, lda, ldb real(4), device, dimension(lda, *) :: a real(4), device, dimension(ldb, *) :: b real(4), device :: alpha ! device or host variable
2.6.2. Double Precision Functions and Subroutines
This section contains the V2 interfaces to the double precision BLAS and cuBLAS functions and subroutines.
2.6.2.1. idamax
If you use the cublas_v2 module, the interface for cublasIdamax is changed to the following:
integer(4) function cublasIdamax(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.6.2.2. idamin
If you use the cublas_v2 module, the interface for cublasIdamin is changed to the following:
integer(4) function cublasIdamin(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx integer, device :: res ! device or host variable
2.6.2.3. dasum
If you use the cublas_v2 module, the interface for cublasDasum is changed to the following:
integer(4) function cublasDasum(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
2.6.2.4. daxpy
If you use the cublas_v2 module, the interface for cublasDaxpy is changed to the following:
integer(4) function cublasDaxpy(h, n, a, x, incx, y, incy) type(cublasHandle) :: h integer :: n real(8), device :: a ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
2.6.2.5. dcopy
If you use the cublas_v2 module, the interface for cublasDcopy is changed to the following:
integer(4) function cublasDcopy(h, n, x, incx, y, incy) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy
2.6.2.6. ddot
If you use the cublas_v2 module, the interface for cublasDdot is changed to the following:
integer(4) function cublasDdot(h, n, x, incx, y, incy, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x, y integer :: incx, incy real(8), device :: res ! device or host variable
2.6.2.7. dnrm2
If you use the cublas_v2 module, the interface for cublasDnrm2 is changed to the following:
integer(4) function cublasDnrm2(h, n, x, incx, res) type(cublasHandle) :: h integer :: n real(8), device, dimension(*) :: x integer :: incx real(8), device :: res ! device or host variable
2.6.2.8. drot
If you use the cublas_v2 module, the interface for cublasDrot is changed to the following:
integer(4) function cublasDrot(h, n, x, incx, y, incy, sc, ss) type(cublasHandle) :: h integer :: n real(8), device :: sc, ss ! device or host variable real(8), device, dimension(*) :: x, y integer :: incx, incy
2.6.2.9. drotg
If you use the cublas_v2 module, the interface for cublasDrotg is changed to the following:
integer(4) function cublasDrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  real(8), device :: sa, sb, sc, ss ! device or host variable
2.6.2.10. drotm
If you use the cublas_v2 module, the interface for cublasDrotm is changed to the following:
integer(4) function cublasDrotm(h, n, x, incx, y, incy, param)
  type(cublasHandle) :: h
  integer :: n
  real(8), device :: param(*) ! device or host variable
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.2.11. drotmg
If you use the cublas_v2 module, the interface for cublasDrotmg is changed to the following:
integer(4) function cublasDrotmg(h, d1, d2, x1, y1, param)
  type(cublasHandle) :: h
  real(8), device :: d1, d2, x1, y1, param(*) ! device or host variable
2.6.2.12. dscal
If you use the cublas_v2 module, the interface for cublasDscal is changed to the following:
integer(4) function cublasDscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(8), device :: a ! device or host variable
  real(8), device, dimension(*) :: x
  integer :: incx
2.6.2.13. dswap
If you use the cublas_v2 module, the interface for cublasDswap is changed to the following:
integer(4) function cublasDswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.2.14. dgbmv
If you use the cublas_v2 module, the interface for cublasDgbmv is changed to the following:
integer(4) function cublasDgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha, beta ! device or host variable
2.6.2.15. dgemv
If you use the cublas_v2 module, the interface for cublasDgemv is changed to the following:
integer(4) function cublasDgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha, beta ! device or host variable
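The sketch below shows one plausible way to call the interface above for y = alpha*A*x + beta*y with a non-transposed matrix, and compares the device result against the host intrinsic matmul. The matrix contents and program name are illustrative only; CUBLAS_OP_N is the operation constant supplied by the cublas modules.

```fortran
program dgemv_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 3, n = 2
  type(cublasHandle) :: h
  real(8) :: a(m,n), x(n), y(m), alpha, beta
  real(8), device :: a_d(m,n), x_d(n), y_d(m)
  integer :: istat, i, j

  ! Fill A with easily recognizable values.
  do j = 1, n
    do i = 1, m
      a(i,j) = real(i + 10*j, 8)
    end do
  end do
  x = (/ 1.0d0, -1.0d0 /)

  a_d = a
  x_d = x
  y_d = 0.0d0
  alpha = 1.0d0
  beta  = 0.0d0

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! lda is the leading dimension of a_d as declared, here m.
  istat = cublasDgemv(h, CUBLAS_OP_N, m, n, alpha, a_d, m, &
                      x_d, 1, beta, y_d, 1)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgemv failed'

  y = y_d
  print *, 'max difference from host matmul:', maxval(abs(y - matmul(a, x)))

  istat = cublasDestroy(h)
end program dgemv_v2_sketch
```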
2.6.2.16. dger
If you use the cublas_v2 module, the interface for cublasDger is changed to the following:
integer(4) function cublasDger(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha ! device or host variable
2.6.2.17. dsbmv
If you use the cublas_v2 module, the interface for cublasDsbmv is changed to the following:
integer(4) function cublasDsbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: k, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha, beta ! device or host variable
2.6.2.18. dspmv
If you use the cublas_v2 module, the interface for cublasDspmv is changed to the following:
integer(4) function cublasDspmv(h, t, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(8), device, dimension(*) :: a, x, y
  real(8), device :: alpha, beta ! device or host variable
2.6.2.19. dspr
If you use the cublas_v2 module, the interface for cublasDspr is changed to the following:
integer(4) function cublasDspr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  real(8), device, dimension(*) :: a, x
  real(8), device :: alpha ! device or host variable
2.6.2.20. dspr2
If you use the cublas_v2 module, the interface for cublasDspr2 is changed to the following:
integer(4) function cublasDspr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(8), device, dimension(*) :: a, x, y
  real(8), device :: alpha ! device or host variable
2.6.2.21. dsymv
If you use the cublas_v2 module, the interface for cublasDsymv is changed to the following:
integer(4) function cublasDsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha, beta ! device or host variable
2.6.2.22. dsyr
If you use the cublas_v2 module, the interface for cublasDsyr is changed to the following:
integer(4) function cublasDsyr(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
  real(8), device :: alpha ! device or host variable
2.6.2.23. dsyr2
If you use the cublas_v2 module, the interface for cublasDsyr2 is changed to the following:
integer(4) function cublasDsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8), device :: alpha ! device or host variable
2.6.2.24. dtbmv
If you use the cublas_v2 module, the interface for cublasDtbmv is changed to the following:
integer(4) function cublasDtbmv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
2.6.2.25. dtbsv
If you use the cublas_v2 module, the interface for cublasDtbsv is changed to the following:
integer(4) function cublasDtbsv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
2.6.2.26. dtpmv
If you use the cublas_v2 module, the interface for cublasDtpmv is changed to the following:
integer(4) function cublasDtpmv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(8), device, dimension(*) :: a, x
2.6.2.27. dtpsv
If you use the cublas_v2 module, the interface for cublasDtpsv is changed to the following:
integer(4) function cublasDtpsv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(8), device, dimension(*) :: a, x
2.6.2.28. dtrmv
If you use the cublas_v2 module, the interface for cublasDtrmv is changed to the following:
integer(4) function cublasDtrmv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
2.6.2.29. dtrsv
If you use the cublas_v2 module, the interface for cublasDtrsv is changed to the following:
integer(4) function cublasDtrsv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
2.6.2.30. dgemm
If you use the cublas_v2 module, the interface for cublasDgemm is changed to the following:
integer(4) function cublasDgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
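To make the argument order above concrete, here is a hedged sketch that computes C = A*B on the device and checks it against the host intrinsic matmul. The dimensions, random inputs, and program name are assumptions for illustration; the leading dimensions are simply the declared first extents of each array.

```fortran
program dgemm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: m = 4, n = 3, k = 5
  type(cublasHandle) :: h
  real(8) :: a(m,k), b(k,n), c(m,n), alpha, beta
  real(8), device :: a_d(m,k), b_d(k,n), c_d(m,n)
  integer :: istat

  call random_number(a)
  call random_number(b)
  a_d = a
  b_d = b
  c_d = 0.0d0
  alpha = 1.0d0
  beta  = 0.0d0

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! C(m,n) = alpha * A(m,k) * B(k,n) + beta * C(m,n), no transposes
  istat = cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &
                      alpha, a_d, m, b_d, k, beta, c_d, m)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDgemm failed'

  c = c_d
  ! The difference should be at the level of rounding error.
  print *, 'max difference from host matmul:', maxval(abs(c - matmul(a, b)))

  istat = cublasDestroy(h)
end program dgemm_v2_sketch
```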
2.6.2.31. dsymm
If you use the cublas_v2 module, the interface for cublasDsymm is changed to the following:
integer(4) function cublasDsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
2.6.2.32. dsyrk
If you use the cublas_v2 module, the interface for cublasDsyrk is changed to the following:
integer(4) function cublasDsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
2.6.2.33. dsyr2k
If you use the cublas_v2 module, the interface for cublasDsyr2k is changed to the following:
integer(4) function cublasDsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
2.6.2.34. dsyrkx
If you use the cublas_v2 module, the interface for cublasDsyrkx is changed to the following:
integer(4) function cublasDsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
2.6.2.35. dtrmm
If you use the cublas_v2 module, the interface for cublasDtrmm is changed to the following:
integer(4) function cublasDtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha ! device or host variable
2.6.2.36. dtrsm
If you use the cublas_v2 module, the interface for cublasDtrsm is changed to the following:
integer(4) function cublasDtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(ldb, *) :: b
  real(8), device :: alpha ! device or host variable
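The integer arguments side, uplo, transa, and diag above take the enumerated constants from the cublas modules. The following sketch, with made-up data and names, solves the lower-triangular system A*X = alpha*B for a single right-hand side; the solution overwrites B on the device.

```fortran
program dtrsm_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 3, nrhs = 1
  type(cublasHandle) :: h
  real(8) :: a(n,n), b(n,nrhs), x_true(n,nrhs), alpha
  real(8), device :: a_d(n,n), b_d(n,nrhs)
  integer :: istat

  ! Lower-triangular A, listed column by column.
  a = reshape((/ 2.0d0, 1.0d0, 4.0d0, &
                 0.0d0, 3.0d0, 5.0d0, &
                 0.0d0, 0.0d0, 6.0d0 /), (/ n, n /))
  x_true(:,1) = (/ 1.0d0, 2.0d0, 3.0d0 /)
  b = matmul(a, x_true)         ! right-hand side with a known solution

  a_d = a
  b_d = b
  alpha = 1.0d0

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Solve A * X = alpha * B; X is returned in b_d.
  istat = cublasDtrsm(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER, &
                      CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, &
                      n, nrhs, alpha, a_d, n, b_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasDtrsm failed'

  b = b_d
  print *, 'max difference from known solution:', maxval(abs(b - x_true))

  istat = cublasDestroy(h)
end program dtrsm_v2_sketch
```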
2.6.3. Single Precision Complex Functions and Subroutines
This section contains the V2 interfaces to the single precision complex BLAS and cuBLAS functions and subroutines.
2.6.3.1. icamax
If you use the cublas_v2 module, the interface for cublasIcamax is changed to the following:
integer(4) function cublasIcamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.3.2. icamin
If you use the cublas_v2 module, the interface for cublasIcamin is changed to the following:
integer(4) function cublasIcamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.3.3. scasum
If you use the cublas_v2 module, the interface for cublasScasum is changed to the following:
integer(4) function cublasScasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.6.3.4. caxpy
If you use the cublas_v2 module, the interface for cublasCaxpy is changed to the following:
integer(4) function cublasCaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device :: a ! device or host variable
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.3.5. ccopy
If you use the cublas_v2 module, the interface for cublasCcopy is changed to the following:
integer(4) function cublasCcopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.3.6. cdotc
If you use the cublas_v2 module, the interface for cublasCdotc is changed to the following:
integer(4) function cublasCdotc(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(4), device :: res ! device or host variable
2.6.3.7. cdotu
If you use the cublas_v2 module, the interface for cublasCdotu is changed to the following:
integer(4) function cublasCdotu(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(4), device :: res ! device or host variable
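cublasCdotc and cublasCdotu share the same argument list and differ only in whether the elements of x are conjugated before multiplication. The sketch below, using invented data in which every element is the imaginary unit, makes the difference visible; the program and variable names are illustrative.

```fortran
program cdot_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 4
  type(cublasHandle) :: h
  complex(4), device :: x_d(n), y_d(n)
  complex(4) :: res_c, res_u
  integer :: istat

  x_d = cmplx(0.0, 1.0)         ! every element is i
  y_d = cmplx(0.0, 1.0)

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  ! Conjugated dot product: sum of conjg(x(k)) * y(k)
  istat = cublasCdotc(h, n, x_d, 1, y_d, 1, res_c)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCdotc failed'

  ! Unconjugated dot product: sum of x(k) * y(k)
  istat = cublasCdotu(h, n, x_d, 1, y_d, 1, res_u)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCdotu failed'

  ! conjg(i)*i = 1 and i*i = -1, so the sums are (4,0) and (-4,0).
  print *, 'dotc =', res_c
  print *, 'dotu =', res_u

  istat = cublasDestroy(h)
end program cdot_v2_sketch
```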
2.6.3.8. scnrm2
If you use the cublas_v2 module, the interface for cublasScnrm2 is changed to the following:
integer(4) function cublasScnrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  real(4), device :: res ! device or host variable
2.6.3.9. crot
If you use the cublas_v2 module, the interface for cublasCrot is changed to the following:
integer(4) function cublasCrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: sc ! device or host variable
  complex(4), device :: ss ! device or host variable
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.3.10. csrot
If you use the cublas_v2 module, the interface for cublasCsrot is changed to the following:
integer(4) function cublasCsrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: sc, ss ! device or host variable
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.3.11. crotg
If you use the cublas_v2 module, the interface for cublasCrotg is changed to the following:
integer(4) function cublasCrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  complex(4), device :: sa, sb, ss ! device or host variable
  real(4), device :: sc ! device or host variable
2.6.3.12. cscal
If you use the cublas_v2 module, the interface for cublasCscal is changed to the following:
integer(4) function cublasCscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device :: a ! device or host variable
  complex(4), device, dimension(*) :: x
  integer :: incx
2.6.3.13. csscal
If you use the cublas_v2 module, the interface for cublasCsscal is changed to the following:
integer(4) function cublasCsscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(4), device :: a ! device or host variable
  complex(4), device, dimension(*) :: x
  integer :: incx
2.6.3.14. cswap
If you use the cublas_v2 module, the interface for cublasCswap is changed to the following:
integer(4) function cublasCswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.3.15. cgbmv
If you use the cublas_v2 module, the interface for cublasCgbmv is changed to the following:
integer(4) function cublasCgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.16. cgemv
If you use the cublas_v2 module, the interface for cublasCgemv is changed to the following:
integer(4) function cublasCgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.17. cgerc
If you use the cublas_v2 module, the interface for cublasCgerc is changed to the following:
integer(4) function cublasCgerc(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha ! device or host variable
2.6.3.18. cgeru
If you use the cublas_v2 module, the interface for cublasCgeru is changed to the following:
integer(4) function cublasCgeru(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha ! device or host variable
2.6.3.19. csymv
If you use the cublas_v2 module, the interface for cublasCsymv is changed to the following:
integer(4) function cublasCsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.20. csyr
If you use the cublas_v2 module, the interface for cublasCsyr is changed to the following:
integer(4) function cublasCsyr(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
  complex(4), device :: alpha ! device or host variable
2.6.3.21. csyr2
If you use the cublas_v2 module, the interface for cublasCsyr2 is changed to the following:
integer(4) function cublasCsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha ! device or host variable
2.6.3.22. ctbmv
If you use the cublas_v2 module, the interface for cublasCtbmv is changed to the following:
integer(4) function cublasCtbmv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.6.3.23. ctbsv
If you use the cublas_v2 module, the interface for cublasCtbsv is changed to the following:
integer(4) function cublasCtbsv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.6.3.24. ctpmv
If you use the cublas_v2 module, the interface for cublasCtpmv is changed to the following:
integer(4) function cublasCtpmv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
2.6.3.25. ctpsv
If you use the cublas_v2 module, the interface for cublasCtpsv is changed to the following:
integer(4) function cublasCtpsv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
2.6.3.26. ctrmv
If you use the cublas_v2 module, the interface for cublasCtrmv is changed to the following:
integer(4) function cublasCtrmv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.6.3.27. ctrsv
If you use the cublas_v2 module, the interface for cublasCtrsv is changed to the following:
integer(4) function cublasCtrsv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.6.3.28. chbmv
If you use the cublas_v2 module, the interface for cublasChbmv is changed to the following:
integer(4) function cublasChbmv(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: k, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.29. chemv
If you use the cublas_v2 module, the interface for cublasChemv is changed to the following:
integer(4) function cublasChemv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.30. chpmv
If you use the cublas_v2 module, the interface for cublasChpmv is changed to the following:
integer(4) function cublasChpmv(h, uplo, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.31. cher
If you use the cublas_v2 module, the interface for cublasCher is changed to the following:
integer(4) function cublasCher(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
2.6.3.32. cher2
If you use the cublas_v2 module, the interface for cublasCher2 is changed to the following:
integer(4) function cublasCher2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
2.6.3.33. chpr
If you use the cublas_v2 module, the interface for cublasChpr is changed to the following:
integer(4) function cublasChpr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
  real(4), device :: alpha ! device or host variable
2.6.3.34. chpr2
If you use the cublas_v2 module, the interface for cublasChpr2 is changed to the following:
integer(4) function cublasChpr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4), device :: alpha ! device or host variable
2.6.3.35. cgemm
If you use the cublas_v2 module, the interface for cublasCgemm is changed to the following:
integer(4) function cublasCgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.36. csymm
If you use the cublas_v2 module, the interface for cublasCsymm is changed to the following:
integer(4) function cublasCsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.37. csyrk
If you use the cublas_v2 module, the interface for cublasCsyrk is changed to the following:
integer(4) function cublasCsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.38. csyr2k
If you use the cublas_v2 module, the interface for cublasCsyr2k is changed to the following:
integer(4) function cublasCsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.39. csyrkx
If you use the cublas_v2 module, the interface for cublasCsyrkx is changed to the following:
integer(4) function cublasCsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.40. ctrmm
If you use the cublas_v2 module, the interface for cublasCtrmm is changed to the following:
integer(4) function cublasCtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
2.6.3.41. ctrsm
If you use the cublas_v2 module, the interface for cublasCtrsm is changed to the following:
integer(4) function cublasCtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device :: alpha ! device or host variable
2.6.3.42. chemm
If you use the cublas_v2 module, the interface for cublasChemm is changed to the following:
integer(4) function cublasChemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha, beta ! device or host variable
2.6.3.43. cherk
If you use the cublas_v2 module, the interface for cublasCherk is changed to the following:
integer(4) function cublasCherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  real(4), device :: alpha, beta ! device or host variable
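Note that alpha and beta in the interface above are real(4) even though the matrices are complex(4). The following sketch, with illustrative data and names, performs the rank-k update C = alpha*A*A^H + beta*C and prints the upper triangle next to a host reference. Only the triangle selected by uplo is updated by the routine, so the comparison is restricted to it.

```fortran
program cherk_v2_sketch
  use cudafor
  use cublas_v2
  implicit none
  integer, parameter :: n = 2, k = 3
  type(cublasHandle) :: h
  complex(4) :: a(n,k), c(n,n), ref(n,n)
  complex(4), device :: a_d(n,k), c_d(n,n)
  real(4) :: alpha, beta        ! real scalars, unlike cublasCgemm
  integer :: istat, i, j

  do j = 1, k
    do i = 1, n
      a(i,j) = cmplx(real(i), real(j))
    end do
  end do
  ref = matmul(a, conjg(transpose(a)))   ! host reference for A * A^H

  a_d = a
  c_d = cmplx(0.0, 0.0)
  alpha = 1.0
  beta  = 0.0

  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCreate failed'

  istat = cublasCherk(h, CUBLAS_FILL_MODE_UPPER, CUBLAS_OP_N, n, k, &
                      alpha, a_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop 'cublasCherk failed'

  c = c_d
  ! Print device result and host reference for the upper triangle only.
  do j = 1, n
    do i = 1, j
      print *, i, j, c(i,j), ref(i,j)
    end do
  end do

  istat = cublasDestroy(h)
end program cherk_v2_sketch
```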
2.6.3.44. cher2k
If you use the cublas_v2 module, the interface for cublasCher2k is changed to the following:
integer(4) function cublasCher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
2.6.3.45. cherkx
If you use the cublas_v2 module, the interface for cublasCherkx is changed to the following:
integer(4) function cublasCherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4), device :: alpha ! device or host variable
  real(4), device :: beta ! device or host variable
2.6.4. Double Precision Complex Functions and Subroutines
This section contains the V2 interfaces to the double precision complex BLAS and cuBLAS functions and subroutines.
2.6.4.1. izamax
If you use the cublas_v2 module, the interface for cublasIzamax is changed to the following:
integer(4) function cublasIzamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.4.2. izamin
If you use the cublas_v2 module, the interface for cublasIzamin is changed to the following:
integer(4) function cublasIzamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x
  integer :: incx
  integer, device :: res ! device or host variable
2.6.4.3. dzasum
If you use the cublas_v2 module, the interface for cublasDzasum is changed to the following:
integer(4) function cublasDzasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x
  integer :: incx
  real(8), device :: res ! device or host variable
2.6.4.4. zaxpy
If you use the cublas_v2 module, the interface for cublasZaxpy is changed to the following:
integer(4) function cublasZaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device :: a ! device or host variable
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.4.5. zcopy
If you use the cublas_v2 module, the interface for cublasZcopy is changed to the following:
integer(4) function cublasZcopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.4.6. zdotc
If you use the cublas_v2 module, the interface for cublasZdotc is changed to the following:
integer(4) function cublasZdotc(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(8), device :: res ! device or host variable
2.6.4.7. zdotu
If you use the cublas_v2 module, the interface for cublasZdotu is changed to the following:
integer(4) function cublasZdotu(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(8), device :: res ! device or host variable
2.6.4.8. dznrm2
If you use the cublas_v2 module, the interface for cublasDznrm2 is changed to the following:
integer(4) function cublasDznrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x
  integer :: incx
  real(8), device :: res ! device or host variable
2.6.4.9. zrot
If you use the cublas_v2 module, the interface for cublasZrot is changed to the following:
integer(4) function cublasZrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(8), device :: sc ! device or host variable
  complex(8), device :: ss ! device or host variable
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.4.10. zsrot
If you use the cublas_v2 module, the interface for cublasZsrot is changed to the following:
integer(4) function cublasZsrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(8), device :: sc, ss ! device or host variable
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.4.11. zrotg
If you use the cublas_v2 module, the interface for cublasZrotg is changed to the following:
integer(4) function cublasZrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  complex(8), device :: sa, sb, ss ! device or host variable
  real(8), device :: sc ! device or host variable
2.6.4.12. zscal
If you use the cublas_v2 module, the interface for cublasZscal is changed to the following:
integer(4) function cublasZscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device :: a ! device or host variable
  complex(8), device, dimension(*) :: x
  integer :: incx
2.6.4.13. zdscal
If you use the cublas_v2 module, the interface for cublasZdscal is changed to the following:
integer(4) function cublasZdscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(8), device :: a ! device or host variable
  complex(8), device, dimension(*) :: x
  integer :: incx
2.6.4.14. zswap
If you use the cublas_v2 module, the interface for cublasZswap is changed to the following:
integer(4) function cublasZswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.6.4.15. zgbmv
If you use the cublas_v2 module, the interface for cublasZgbmv is changed to the following:
integer(4) function cublasZgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.16. zgemv
If you use the cublas_v2 module, the interface for cublasZgemv is changed to the following:
integer(4) function cublasZgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.17. zgerc
If you use the cublas_v2 module, the interface for cublasZgerc is changed to the following:
integer(4) function cublasZgerc(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha ! device or host variable
2.6.4.18. zgeru
If you use the cublas_v2 module, the interface for cublasZgeru is changed to the following:
integer(4) function cublasZgeru(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha ! device or host variable
2.6.4.19. zsymv
If you use the cublas_v2 module, the interface for cublasZsymv is changed to the following:
integer(4) function cublasZsymv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.20. zsyr
If you use the cublas_v2 module, the interface for cublasZsyr is changed to the following:
integer(4) function cublasZsyr(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x
  complex(8), device :: alpha ! device or host variable
2.6.4.21. zsyr2
If you use the cublas_v2 module, the interface for cublasZsyr2 is changed to the following:
integer(4) function cublasZsyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha ! device or host variable
2.6.4.22. ztbmv
If you use the cublas_v2 module, the interface for cublasZtbmv is changed to the following:
integer(4) function cublasZtbmv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x
2.6.4.23. ztbsv
If you use the cublas_v2 module, the interface for cublasZtbsv is changed to the following:
integer(4) function cublasZtbsv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x
2.6.4.24. ztpmv
If you use the cublas_v2 module, the interface for cublasZtpmv is changed to the following:
integer(4) function cublasZtpmv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(8), device, dimension(*) :: a, x
2.6.4.25. ztpsv
If you use the cublas_v2 module, the interface for cublasZtpsv is changed to the following:
integer(4) function cublasZtpsv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(8), device, dimension(*) :: a, x
2.6.4.26. ztrmv
If you use the cublas_v2 module, the interface for cublasZtrmv is changed to the following:
integer(4) function cublasZtrmv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x
2.6.4.27. ztrsv
If you use the cublas_v2 module, the interface for cublasZtrsv is changed to the following:
integer(4) function cublasZtrsv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x
2.6.4.28. zhbmv
If you use the cublas_v2 module, the interface for cublasZhbmv is changed to the following:
integer(4) function cublasZhbmv(h, uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: k, n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.29. zhemv
If you use the cublas_v2 module, the interface for cublasZhemv is changed to the following:
integer(4) function cublasZhemv(h, uplo, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, lda, incx, incy
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(*) :: x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.30. zhpmv
If you use the cublas_v2 module, the interface for cublasZhpmv is changed to the following:
integer(4) function cublasZhpmv(h, uplo, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: uplo
  integer :: n, incx, incy
  complex(8), device, dimension(*) :: a, x, y
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.31. zher
If you use the cublas_v2 module, the interface for cublasZher is changed to the following:
integer(4) function cublasZher(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(8), device, dimension(*) :: a, x
  real(8), device :: alpha ! device or host variable
2.6.4.32. zher2
If you use the cublas_v2 module, the interface for cublasZher2 is changed to the following:
integer(4) function cublasZher2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(8), device, dimension(*) :: a, x, y
  complex(8), device :: alpha ! device or host variable
2.6.4.33. zhpr
If you use the cublas_v2 module, the interface for cublasZhpr is changed to the following:
integer(4) function cublasZhpr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  complex(8), device, dimension(*) :: a, x
  real(8), device :: alpha ! device or host variable
2.6.4.34. zhpr2
If you use the cublas_v2 module, the interface for cublasZhpr2 is changed to the following:
integer(4) function cublasZhpr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  complex(8), device, dimension(*) :: a, x, y
  complex(8), device :: alpha ! device or host variable
2.6.4.35. zgemm
If you use the cublas_v2 module, the interface for cublasZgemm is changed to the following:
integer(4) function cublasZgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
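The handle-based calling sequence can be sketched as follows. This is a minimal illustration rather than a definitive usage: it assumes CUDA Fortran is enabled (for the device attribute), that cublasCreate and cublasDestroy are provided by the cublas_v2 module, and that a GPU is present. The subroutine name zgemm_v2_sketch and the fill values are hypothetical.

```fortran
subroutine zgemm_v2_sketch(n)
  use cudafor        ! assumption: CUDA Fortran, needed for device arrays
  use cublas_v2
  implicit none
  integer, intent(in) :: n
  complex(8), allocatable :: c(:,:)
  complex(8), device, allocatable :: a_d(:,:), b_d(:,:), c_d(:,:)
  complex(8) :: alpha, beta          ! host scalars
  type(cublasHandle) :: h
  integer :: istat

  allocate(c(n,n), a_d(n,n), b_d(n,n), c_d(n,n))
  a_d = cmplx(1.0d0, 0.0d0, kind=8)  ! fill device arrays with constants
  b_d = cmplx(2.0d0, 0.0d0, kind=8)
  c_d = cmplx(0.0d0, 0.0d0, kind=8)
  alpha = cmplx(1.0d0, 0.0d0, kind=8)
  beta  = cmplx(0.0d0, 0.0d0, kind=8)

  ! Assumption: cublasCreate/cublasDestroy come from cublas_v2.
  istat = cublasCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasCreate:", istat

  ! The handle is the first argument in the v2 interface.
  istat = cublasZgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &
                      alpha, a_d, n, b_d, n, beta, c_d, n)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasZgemm:", istat

  c = c_d                            ! copy the result back to the host
  if (all(dble(c) == 2.0d0*n)) then
    print *, "Test PASSED"
  else
    print *, "Test FAILED"
  end if

  istat = cublasDestroy(h)
  deallocate(c, a_d, b_d, c_d)
end subroutine zgemm_v2_sketch
```

Every element of the product of an all-ones matrix and an all-twos matrix is 2n, which is what the final check relies on.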
2.6.4.36. zsymm
If you use the cublas_v2 module, the interface for cublasZsymm is changed to the following:
integer(4) function cublasZsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.37. zsyrk
If you use the cublas_v2 module, the interface for cublasZsyrk is changed to the following:
integer(4) function cublasZsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.38. zsyr2k
If you use the cublas_v2 module, the interface for cublasZsyr2k is changed to the following:
integer(4) function cublasZsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.39. zsyrkx
If you use the cublas_v2 module, the interface for cublasZsyrkx is changed to the following:
integer(4) function cublasZsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.40. ztrmm
If you use the cublas_v2 module, the interface for cublasZtrmm is changed to the following:
integer(4) function cublasZtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
2.6.4.41. ztrsm
If you use the cublas_v2 module, the interface for cublasZtrsm is changed to the following:
integer(4) function cublasZtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device :: alpha ! device or host variable
2.6.4.42. zhemm
If you use the cublas_v2 module, the interface for cublasZhemm is changed to the following:
integer(4) function cublasZhemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha, beta ! device or host variable
2.6.4.43. zherk
If you use the cublas_v2 module, the interface for cublasZherk is changed to the following:
integer(4) function cublasZherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldc, *) :: c
  real(8), device :: alpha, beta ! device or host variable
2.6.4.44. zher2k
If you use the cublas_v2 module, the interface for cublasZher2k is changed to the following:
integer(4) function cublasZher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable
2.6.4.45. zherkx
If you use the cublas_v2 module, the interface for cublasZherkx is changed to the following:
integer(4) function cublasZherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(8), device, dimension(lda, *) :: a
  complex(8), device, dimension(ldb, *) :: b
  complex(8), device, dimension(ldc, *) :: c
  complex(8), device :: alpha ! device or host variable
  real(8), device :: beta ! device or host variable
2.7. CUBLAS XT Module Functions
This section contains interfaces to the cuBLAS XT Module Functions. Users can access this module by inserting the line use cublasXt into the program unit. The cublasXt library is a host-side library, which supports multiple GPUs. Here is an example:
subroutine testxt(n)
  use cublasXt
  complex*16 :: a(n,n), b(n,n), c(n,n), alpha, beta
  type(cublasXtHandle) :: h
  integer ndevices(1)
  a = cmplx(1.0d0,0.0d0)
  b = cmplx(2.0d0,0.0d0)
  c = cmplx(-1.0d0,0.0d0)
  alpha = cmplx(1.0d0,0.0d0)
  beta = cmplx(0.0d0,0.0d0)
  istat = cublasXtCreate(h)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  ndevices(1) = 0
  istat = cublasXtDeviceSelect(h, 1, ndevices)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  istat = cublasXtZgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, &
                        n, n, n, &
                        alpha, A, n, B, n, beta, C, n)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  istat = cublasXtDestroy(h)
  if (istat .ne. CUBLAS_STATUS_SUCCESS) print *,istat
  if (all(dble(c).eq.2.0d0*n)) then
    print *,"Test PASSED"
  else
    print *,"Test FAILED"
  endif
end
The cublasXt module contains all the types and definitions from the cublas module, and these additional types and enumerations:
TYPE cublasXtHandle
  TYPE(C_PTR) :: handle
END TYPE

! Pinned memory mode
enum, bind(c)
  enumerator :: CUBLASXT_PINNING_DISABLED=0
  enumerator :: CUBLASXT_PINNING_ENABLED=1
end enum

! cublasXtOpType
enum, bind(c)
  enumerator :: CUBLASXT_FLOAT=0
  enumerator :: CUBLASXT_DOUBLE=1
  enumerator :: CUBLASXT_COMPLEX=2
  enumerator :: CUBLASXT_DOUBLECOMPLEX=3
end enum

! cublasXtBlasOp
enum, bind(c)
  enumerator :: CUBLASXT_GEMM=0
  enumerator :: CUBLASXT_SYRK=1
  enumerator :: CUBLASXT_HERK=2
  enumerator :: CUBLASXT_SYMM=3
  enumerator :: CUBLASXT_HEMM=4
  enumerator :: CUBLASXT_TRSM=5
  enumerator :: CUBLASXT_SYR2K=6
  enumerator :: CUBLASXT_HER2K=7
  enumerator :: CUBLASXT_SPMM=8
  enumerator :: CUBLASXT_SYRKX=9
  enumerator :: CUBLASXT_HERKX=10
  enumerator :: CUBLASXT_TRMM=11
  enumerator :: CUBLASXT_ROUTINE_MAX=12
end enum
2.7.1. cublasXtCreate
This function initializes the cublasXt API and creates a handle to an opaque structure holding the cublasXt library context. It allocates hardware resources on the host and device and must be called prior to making any other cublasXt API library calls.
integer(4) function cublasXtcreate(h)
  type(cublasXtHandle) :: h
2.7.2. cublasXtDestroy
This function releases hardware resources used by the cublasXt API context. This function is usually the last call with a particular handle to the cublasXt API.
integer(4) function cublasXtdestroy(h)
  type(cublasXtHandle) :: h
2.7.3. cublasXtDeviceSelect
This function allows the user to provide the number of GPU devices and their respective IDs that will participate in the subsequent cublasXt API math function calls. This function will create a cuBLAS context for every GPU provided in that list. Currently the device configuration is static and cannot be changed between math function calls. For that reason, this function should be called only once, after cublasXtCreate. To run multiple configurations, create multiple cublasXt API contexts.
integer(4) function cublasXtdeviceselect(h, ndevices, deviceid)
  type(cublasXtHandle) :: h
  integer :: ndevices
  integer, dimension(*) :: deviceid
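A multi-GPU selection might look like the sketch below. It assumes that the handle has already been created with cublasXtCreate and that GPUs with IDs 0 and 1 are both present; the subroutine name select_two_gpus is hypothetical.

```fortran
subroutine select_two_gpus(h)
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h          ! already created by cublasXtCreate
  integer :: devices(2), istat

  ! Assumption: GPUs with IDs 0 and 1 both exist on this system.
  devices = (/ 0, 1 /)

  ! Call once per handle; the configuration is static afterwards.
  istat = cublasXtDeviceSelect(h, 2, devices)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtDeviceSelect:", istat
end subroutine select_two_gpus
```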
2.7.4. cublasXtSetBlockDim
This function allows the user to set the block dimension used for tiling the matrices in subsequent math function calls. Matrices are split into square tiles of dimension blockDim x blockDim. This function can be called at any time and takes effect for the math function calls that follow. The block dimension should be chosen to optimize the math operation and to ensure that the PCI transfers overlap well with the computation.
integer(4) function cublasXtsetblockdim(h, blockdim)
  type(cublasXtHandle) :: h
  integer :: blockdim
2.7.5. cublasXtGetBlockDim
This function allows the user to query the block dimension used for the tiling of the matrices.
integer(4) function cublasXtgetblockdim(h, blockdim)
  type(cublasXtHandle) :: h
  integer :: blockdim
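The set and query functions for the block dimension can be exercised together, as in this minimal sketch. The value 2048 is purely illustrative and not a recommended setting, and the program name is hypothetical.

```fortran
program xt_blockdim_sketch
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h
  integer :: istat, blockdim

  istat = cublasXtCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) stop "cublasXtCreate failed"

  ! Query the tile size currently in effect.
  istat = cublasXtGetBlockDim(h, blockdim)
  print *, "initial block dimension:", blockdim

  ! Illustrative value only; tune it for the target system.
  blockdim = 2048
  istat = cublasXtSetBlockDim(h, blockdim)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtSetBlockDim:", istat

  ! Read it back to confirm the new setting.
  istat = cublasXtGetBlockDim(h, blockdim)
  print *, "block dimension now:", blockdim

  istat = cublasXtDestroy(h)
end program xt_blockdim_sketch
```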
2.7.6. cublasXtSetCpuRoutine
This function allows the user to provide a CPU implementation of the corresponding BLAS routine. This function can be used with the function cublasXtSetCpuRatio() to define a hybrid computation between the CPU and the GPUs. Currently the hybrid feature is only supported for the xGEMM routines.
integer(4) function cublasXtsetcpuroutine(h, blasop, blastype)
  type(cublasXtHandle) :: h
  integer :: blasop, blastype
2.7.7. cublasXtSetCpuRatio
This function allows the user to define the percentage of workload that should be done on a CPU in the context of a hybrid computation. This function can be used with the function cublasXtSetCpuRoutine() to define a hybrid computation between the CPU and the GPUs. Currently the hybrid feature is only supported for the xGEMM routines.
integer(4) function cublasXtsetcpuratio(h, blasop, blastype, ratio)
  type(cublasXtHandle) :: h
  integer :: blasop, blastype
  real(4) :: ratio
2.7.8. cublasXtSetPinningMemMode
This function allows the user to enable or disable the Pinning Memory mode. When enabled, the matrices passed in subsequent cublasXt API calls will be pinned and unpinned using the CUDART routines cudaHostRegister and cudaHostUnregister, respectively, if the matrices are not already pinned. If a matrix happens to be partially pinned, it will also not be pinned. Pinning the memory improves PCI transfer performance and allows PCI memory transfers to overlap with computation. However, pinning and unpinning the memory takes some time, which might not be amortized. It is advised that users pin the memory themselves using cudaMallocHost or cudaHostRegister and unpin it when the computation sequence is completed. By default, the Pinning Memory mode is disabled.
integer(4) function cublasXtsetpinningmemmode(h, mode)
  type(cublasXtHandle) :: h
  integer :: mode
2.7.9. cublasXtGetPinningMemMode
This function allows the user to query the Pinning Memory mode. By default, the Pinning Memory mode is disabled.
integer(4) function cublasXtgetpinningmemmode(h, mode)
  type(cublasXtHandle) :: h
  integer :: mode
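As a small sketch of how the pinning mode might be toggled and checked, the subroutine below enables the mode and reads it back. It assumes the handle was created earlier with cublasXtCreate; the subroutine name enable_pinning is hypothetical.

```fortran
subroutine enable_pinning(h)
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h          ! already created by cublasXtCreate
  integer :: mode, istat

  ! Ask the library to pin and unpin host matrices itself.
  mode = CUBLASXT_PINNING_ENABLED
  istat = cublasXtSetPinningMemMode(h, mode)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtSetPinningMemMode:", istat

  ! Read the mode back and report it.
  istat = cublasXtGetPinningMemMode(h, mode)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtGetPinningMemMode:", istat
  if (mode == CUBLASXT_PINNING_ENABLED) then
    print *, "pinning mode: enabled"
  else
    print *, "pinning mode: disabled"
  end if
end subroutine enable_pinning
```

Because the library-managed mode pays the pinning cost on each call, it is most useful for quick experiments; for repeated calls on the same matrices, pinning the host memory once in user code follows the advice given above.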
2.7.10. cublasXtSgemm
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtsgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: transa, transb
  integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
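A single-precision call might be written as in the sketch below. The dimension arguments are declared with kind c_intptr_t to match the interface as listed; whether default-kind integers are also accepted is not assumed here. The sketch further assumes that GPU 0 is present, and the subroutine name xt_sgemm_sketch is hypothetical.

```fortran
subroutine xt_sgemm_sketch(n)
  use iso_c_binding, only: c_intptr_t
  use cublasXt
  implicit none
  integer, intent(in) :: n
  real(4), allocatable :: a(:,:), b(:,:), c(:,:)   ! ordinary host arrays
  real(4) :: alpha, beta
  integer(kind=c_intptr_t) :: nn
  type(cublasXtHandle) :: h
  integer :: devices(1), istat

  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0
  b = 2.0
  c = -1.0
  alpha = 1.0
  beta = 0.0
  nn = n                     ! convert to the kind used by the interface

  istat = cublasXtCreate(h)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtCreate:", istat

  devices(1) = 0             ! assumption: GPU 0 is present
  istat = cublasXtDeviceSelect(h, 1, devices)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtDeviceSelect:", istat

  ! C := alpha*A*B + beta*C, all operands in host memory
  istat = cublasXtSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, nn, nn, nn, &
                        alpha, a, nn, b, nn, beta, c, nn)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtSgemm:", istat

  istat = cublasXtDestroy(h)

  ! Each element of ones(n,n) times twos(n,n) equals 2n.
  if (all(c == 2.0*n)) then
    print *, "Test PASSED"
  else
    print *, "Test FAILED"
  end if
  deallocate(a, b, c)
end subroutine xt_sgemm_sketch
```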
2.7.11. cublasXtSsymm
SSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtssymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.12. cublasXtSsyrk
SSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtssyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.13. cublasXtSsyr2k
SSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtssyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.14. cublasXtSsyrkx
SSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtssyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.15. cublasXtStrmm
STRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T.
integer(4) function cublasXtstrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha
2.7.16. cublasXtStrsm
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasXtstrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb
  real(4), dimension(lda, *) :: a
  real(4), dimension(ldb, *) :: b
  real(4) :: alpha
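The in-place behavior (X overwrites B) can be illustrated with a small triangular solve. This sketch assumes that the handle has been created and a device selected, and that the constants CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, and CUBLAS_DIAG_NON_UNIT are available from the cublas definitions that the cublasXt module includes; those names are not listed in this section. The subroutine name xt_strsm_sketch is hypothetical.

```fortran
subroutine xt_strsm_sketch(h, n)
  use iso_c_binding, only: c_intptr_t
  use cublasXt
  implicit none
  type(cublasXtHandle) :: h          ! already created, device selected
  integer, intent(in) :: n
  real(4), allocatable :: a(:,:), b(:,:)
  real(4) :: alpha
  integer(kind=c_intptr_t) :: nn
  integer :: i, j, istat

  allocate(a(n,n), b(n,n))

  ! A: upper triangular matrix of ones.
  a = 0.0
  do j = 1, n
    do i = 1, j
      a(i,j) = 1.0
    end do
  end do

  ! B = A*X with X all ones, so row i of B holds n-i+1.
  do i = 1, n
    b(i,:) = real(n - i + 1)
  end do

  alpha = 1.0
  nn = n

  ! Solve A*X = alpha*B; the solution X overwrites B.
  ! Assumption: the side/fill/diag constants come from the cublas definitions.
  istat = cublasXtStrsm(h, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER, &
                        CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, &
                        nn, nn, alpha, a, nn, b, nn)
  if (istat /= CUBLAS_STATUS_SUCCESS) print *, "cublasXtStrsm:", istat

  ! B should now be close to all ones.
  if (maxval(abs(b - 1.0)) < 1.0e-3) then
    print *, "Test PASSED"
  else
    print *, "Test FAILED"
  end if
  deallocate(a, b)
end subroutine xt_strsm_sketch
```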
2.7.17. cublasXtSspmm
SSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtsspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, ldb, ldc
  real(4), dimension(*) :: ap
  real(4), dimension(ldb, *) :: b
  real(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.18. cublasXtCgemm
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtcgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: transa, transb
  integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.19. cublasXtChemm
CHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a hermitian matrix and B and C are m by n matrices.
integer(4) function cublasXtchemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.20. cublasXtCherk
CHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtcherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.7.21. cublasXtCher2k
CHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtcher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha
  real(4) :: beta
2.7.22. cublasXtCherkx
CHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha is a complex scalar, beta is a real scalar, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
integer(4) function cublasXtcherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha
  real(4) :: beta
2.7.23. cublasXtCsymm
CSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtcsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.24. cublasXtCsyrk
CSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtcsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.25. cublasXtCsyr2k
CSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtcsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.26. cublasXtCsyrkx
CSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtcsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.27. cublasXtCtrmm
CTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ) where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H.
integer(4) function cublasXtctrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha
2.7.28. cublasXtCtrsm
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasXtctrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb
  complex(4), dimension(lda, *) :: a
  complex(4), dimension(ldb, *) :: b
  complex(4) :: alpha
2.7.29. cublasXtCspmm
CSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtcspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, ldb, ldc
  complex(4), dimension(*) :: ap
  complex(4), dimension(ldb, *) :: b
  complex(4), dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.7.30. cublasXtDgemm
DGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtdgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: transa, transb
  integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.31. cublasXtDsymm
DSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtdsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.32. cublasXtDsyrk
DSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtdsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.33. cublasXtDsyr2k
DSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtdsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.34. cublasXtDsyrkx
DSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtdsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.35. cublasXtDtrmm
DTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. Note that this interface takes a separate output matrix c: as in the cuBLAS library, the result is written to C rather than overwriting B.
integer(4) function cublasXtdtrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha
2.7.36. cublasXtDtrsm
DTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B.
integer(4) function cublasXtdtrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb
  real(8), dimension(lda, *) :: a
  real(8), dimension(ldb, *) :: b
  real(8) :: alpha
2.7.37. cublasXtDspmm
DSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtdspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, ldb, ldc
  real(8), dimension(*) :: ap
  real(8), dimension(ldb, *) :: b
  real(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
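The packed format used for the ap argument is not spelled out in this section. The host-only sketch below assumes the conventional BLAS column-packed layout, in which the upper triangle of a symmetric matrix is stored column by column so that element A(i,j) with i <= j lands at ap(i + (j-1)*j/2). It is plain standard Fortran that only illustrates the index mapping and makes no library calls; the program name and the test matrix are made up for the example.

```fortran
program packed_storage_sketch
  implicit none
  integer, parameter :: n = 3
  real(8) :: a(n,n), ap(n*(n+1)/2)
  integer :: i, j

  ! Symmetric test matrix whose entries encode their own indices:
  ! a(i,j) = 10*min(i,j) + max(i,j), e.g. a(1,3) = a(3,1) = 13.
  do j = 1, n
    do i = 1, n
      a(i,j) = 10*min(i,j) + max(i,j)
    end do
  end do

  ! Pack the upper triangle column by column (assumed BLAS layout).
  do j = 1, n
    do i = 1, j
      ap(i + (j-1)*j/2) = a(i,j)
    end do
  end do

  ! Prints the packed order: 11 12 22 13 23 33
  print '(6f6.1)', ap
end program packed_storage_sketch
```

A packed array of length n*(n+1)/2 prepared this way would correspond to uplo = CUBLAS_FILL_MODE_UPPER; the lower-triangle layout packs each column from the diagonal downwards instead.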
2.7.38. cublasXtZgemm
ZGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
integer(4) function cublasXtzgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: transa, transb
  integer(kind=c_intptr_t) :: m, n, k, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.39. cublasXtZhemm
ZHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices.
integer(4) function cublasXtzhemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.40. cublasXtZherk
ZHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtzherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldc, *) :: c
  real(8) :: alpha, beta
2.7.41. cublasXtZher2k
ZHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtzher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha
  real(8) :: beta
2.7.42. cublasXtZherkx
ZHERKX performs a variation of the hermitian rank k operations C := alpha*A*B**H + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix stored in lower or upper mode, and A and B are n by k matrices. See the CUBLAS documentation for more details.
integer(4) function cublasXtzherkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha
  real(8) :: beta
2.7.43. cublasXtZsymm
ZSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
integer(4) function cublasXtzsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.44. cublasXtZsyrk
ZSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case.
integer(4) function cublasXtzsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.45. cublasXtZsyr2k
ZSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case.
integer(4) function cublasXtzsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.46. cublasXtZsyrkx
ZSYRKX performs a variation of the symmetric rank k update C := alpha*A*B**T + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix stored in lower or upper mode, and A and B are n by k matrices. This routine can be used when B is such that the result is guaranteed to be symmetric. See the CUBLAS documentation for more details.
integer(4) function cublasXtzsyrkx(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: uplo, trans
  integer(kind=c_intptr_t) :: n, k, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.7.47. cublasXtZtrmm
ZTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. Note that this interface takes a separate output matrix c: as in the cuBLAS library, the result is written to C rather than overwriting B.
integer(4) function cublasXtztrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb, ldc
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha
2.7.48. cublasXtZtrsm
ZTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B.
integer(4) function cublasXtztrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasXtHandle) :: h
  integer :: side, uplo, transa, diag
  integer(kind=c_intptr_t) :: m, n, lda, ldb
  complex(8), dimension(lda, *) :: a
  complex(8), dimension(ldb, *) :: b
  complex(8) :: alpha
2.7.49. cublasXtZspmm
ZSPMM performs one of the symmetric packed matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an n by n symmetric matrix stored in packed format, and B and C are m by n matrices.
integer(4) function cublasXtzspmm(h, side, uplo, m, n, alpha, ap, b, ldb, beta, c, ldc)
  type(cublasXtHandle) :: h
  integer :: side, uplo
  integer(kind=c_intptr_t) :: m, n, ldb, ldc
  complex(8), dimension(*) :: ap
  complex(8), dimension(ldb, *) :: b
  complex(8), dimension(ldc, *) :: c
  complex(8) :: alpha, beta
2.8. CUBLAS DEVICE Module Functions
This section contains interfaces to the cuBLAS functions accessible from device code. CUDA Fortran users can access this module by inserting the line use cublas_device into the program unit. OpenACC users can access this module by inserting the line use openacc_cublas into the program unit. Examples for making cuBLAS calls from device code are included in Chapter 6.
The cublas_device module and the openacc_cublas module contain all the types and definitions from the cublas module:
TYPE cublasHandle
  TYPE(C_PTR) :: handle
END TYPE
Each device module contains the following enumerations:
enum, bind(c)
  enumerator :: CUBLAS_STATUS_SUCCESS         =0
  enumerator :: CUBLAS_STATUS_NOT_INITIALIZED =1
  enumerator :: CUBLAS_STATUS_ALLOC_FAILED    =3
  enumerator :: CUBLAS_STATUS_INVALID_VALUE   =7
  enumerator :: CUBLAS_STATUS_ARCH_MISMATCH   =8
  enumerator :: CUBLAS_STATUS_MAPPING_ERROR   =11
  enumerator :: CUBLAS_STATUS_EXECUTION_FAILED=13
  enumerator :: CUBLAS_STATUS_INTERNAL_ERROR  =14
end enum
enum, bind(c)
  enumerator :: CUBLAS_FILL_MODE_LOWER=0
  enumerator :: CUBLAS_FILL_MODE_UPPER=1
end enum
enum, bind(c)
  enumerator :: CUBLAS_DIAG_NON_UNIT=0
  enumerator :: CUBLAS_DIAG_UNIT=1
end enum
enum, bind(c)
  enumerator :: CUBLAS_SIDE_LEFT =0
  enumerator :: CUBLAS_SIDE_RIGHT=1
end enum
enum, bind(c)
  enumerator :: CUBLAS_OP_N=0
  enumerator :: CUBLAS_OP_T=1
  enumerator :: CUBLAS_OP_C=2
end enum
enum, bind(c)
  enumerator :: CUBLAS_POINTER_MODE_HOST   = 0
  enumerator :: CUBLAS_POINTER_MODE_DEVICE = 1
end enum
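The interfaces in this section are integer(4) functions, and the status enumeration above lists the values they can return. As a small host-only sketch, a status value can be mapped back to its enumerator name when reporting errors. The function name status_name is hypothetical and is not part of either module; the numeric values are copied from the enumeration above so that the example compiles with any standard Fortran compiler.

```fortran
program status_name_sketch
  implicit none
  ! Two sample lookups using values from the status enumeration.
  print '(a)', trim(status_name(0))
  print '(a)', trim(status_name(13))
contains
  ! Hypothetical helper: returns the enumerator name for a status value.
  function status_name(istat) result(name)
    integer, intent(in) :: istat
    character(len=32) :: name
    select case (istat)
    case (0)
      name = 'CUBLAS_STATUS_SUCCESS'
    case (1)
      name = 'CUBLAS_STATUS_NOT_INITIALIZED'
    case (3)
      name = 'CUBLAS_STATUS_ALLOC_FAILED'
    case (7)
      name = 'CUBLAS_STATUS_INVALID_VALUE'
    case (8)
      name = 'CUBLAS_STATUS_ARCH_MISMATCH'
    case (11)
      name = 'CUBLAS_STATUS_MAPPING_ERROR'
    case (13)
      name = 'CUBLAS_STATUS_EXECUTION_FAILED'
    case (14)
      name = 'CUBLAS_STATUS_INTERNAL_ERROR'
    case default
      name = 'unknown status'
    end select
  end function status_name
end program status_name_sketch
```

In code that uses one of the device modules, the literal case values would normally be replaced by the enumerators themselves.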
2.8.1. Device Library Helper Functions
This section contains interfaces to the device-side cuBLAS helper functions, which create and destroy library handles, query the library version, and set or get the stream used by the library.
2.8.1.1. cublasCreate
This function initializes the CUBLAS library and creates a handle to an opaque structure holding the CUBLAS library context. It allocates hardware resources on the host and device and must be called prior to making any other CUBLAS library calls. The CUBLAS library context is tied to the current CUDA device. To use the library on multiple devices, one CUBLAS handle needs to be created for each device. Furthermore, for a given device, multiple CUBLAS handles with different configurations can be created. Because cublasCreate allocates some internal resources and the release of those resources by calling cublasDestroy will implicitly call cublasDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences. For multi-threaded applications that use the same device from different threads, the recommended programming model is to create one CUBLAS handle per thread and use that CUBLAS handle for the entire life of the thread. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCreate(handle)
  type(cublasHandle) :: handle
2.8.1.2. cublasDestroy
This function releases hardware resources used by the CUBLAS library. This function is usually the last call with a particular handle to the CUBLAS library. Because cublasCreate allocates some internal resources and the release of those resources by calling cublasDestroy will implicitly call cublasDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasDestroy(handle)
  type(cublasHandle) :: handle
2.8.1.3. cublasGetVersion
This function returns the version number of the cuBLAS library. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasGetVersion(handle, version)
  type(cublasHandle) :: handle
  integer(4) :: version
2.8.1.4. cublasSetStream
This function sets the cuBLAS library stream, which will be used to execute all subsequent calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream. In particular, this routine can be used to change the stream between kernel launches and then to reset the cuBLAS library stream back to NULL. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSetStream(handle, stream)
  type(cublasHandle) :: handle
  integer(kind=cuda_stream_kind()) :: stream
2.8.1.5. cublasGetStream
This function gets the cuBLAS library stream, which is being used to execute all calls to the cuBLAS library functions. If the cuBLAS library stream is not set, all kernels use the default NULL stream. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasGetStream(handle, stream)
  type(cublasHandle) :: handle
  integer(kind=cuda_stream_kind()) :: stream
2.8.2. Single Precision Functions and Subroutines
This section contains the cuBLAS interfaces to the device-side single precision BLAS and cuBLAS functions and subroutines.
2.8.2.1. cublasIsamax
ISAMAX finds the index of the first element having maximum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasisamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer :: res
2.8.2.2. cublasIsamin
ISAMIN finds the index of the first element having minimum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasisamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  integer :: res
2.8.2.3. cublasSasum
SASUM takes the sum of the absolute values. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4) :: res
2.8.2.4. cublasSaxpy
SAXPY computes a constant times a vector plus a vector. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: a
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.2.5. cublasScopy
SCOPY copies a vector, x, to a vector, y. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasscopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.2.6. cublasSdot
SDOT forms the dot product of two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassdot(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
  real(4) :: res
2.8.2.7. cublasSnrm2
SNRM2 computes the Euclidean norm of a vector, sqrt( x'*x ). In this interface the norm is returned in the argument res rather than via the function name. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassnrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x
  integer :: incx
  real(4) :: res
2.8.2.8. cublasSrot
SROT applies a plane rotation. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassrot(h, n, x, incx, y, incy, sc, ss)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: sc, ss
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.2.9. cublasSrotg
SROTG constructs a Givens plane rotation. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  real(4) :: sa, sb, sc, ss
2.8.2.10. cublasSrotm
SROTM applies the modified Givens transformation, H, to the 2 by N matrix

  (SX**T)
  (SY**T)

where **T indicates transpose. The elements of SX are in SX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for SY using LY and INCY. With SPARAM(1)=SFLAG, H has one of the following forms:

  SFLAG=-1.E0      SFLAG=0.E0       SFLAG=1.E0       SFLAG=-2.E0

    (SH11  SH12)     (1.E0  SH12)     (SH11  1.E0)     (1.E0  0.E0)
  H=(          )     (          )     (          )     (          )
    (SH21  SH22),    (SH21  1.E0),    (-1.E0 SH22),    (0.E0  1.E0).

See SROTMG for a description of data storage in SPARAM. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassrotm(h, n, x, incx, y, incy, param)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: param(*)
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
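To make the role of SFLAG concrete, the following host-only sketch applies the transformation H to two short vectors in plain standard Fortran. It is an illustration of the arithmetic and not the library implementation: it handles unit stride only, the subroutine name apply_rotm is hypothetical, and it assumes the reference BLAS parameter layout in which param(1) is SFLAG and param(2:5) hold SH11, SH21, SH12 and SH22.

```fortran
program rotm_reference_sketch
  implicit none
  real(4) :: x(3), y(3), param(5)

  x = [1.0, 2.0, 3.0]
  y = [4.0, 5.0, 6.0]
  ! SFLAG = 0: H = (1, SH12; SH21, 1) with SH21 = -0.5 and SH12 = 0.25.
  ! param(2) and param(5) are not used for this flag value.
  param = [0.0, 0.0, -0.5, 0.25, 0.0]

  call apply_rotm(x, y, param)

  ! x becomes x + 0.25*y      = 2.00 3.25 4.50
  ! y becomes -0.5*x_old + y  = 3.50 4.00 4.50
  print '(3f8.3)', x
  print '(3f8.3)', y
contains
  ! Hypothetical reference routine: applies H to each pair (x(i), y(i)).
  subroutine apply_rotm(x, y, param)
    real(4), intent(inout) :: x(:), y(:)
    real(4), intent(in) :: param(5)
    real(4) :: h11, h12, h21, h22, w(size(x))

    select case (nint(param(1)))
    case (-1)      ! full matrix stored in param(2:5)
      h11 = param(2); h21 = param(3); h12 = param(4); h22 = param(5)
    case (0)       ! unit diagonal implied
      h11 = 1.0;      h21 = param(3); h12 = param(4); h22 = 1.0
    case (1)       ! off-diagonal entries 1 and -1 implied
      h11 = param(2); h21 = -1.0;     h12 = 1.0;      h22 = param(5)
    case default   ! SFLAG = -2: H is the identity, nothing to do
      return
    end select

    w = x                    ! keep the old x for the second update
    x = h11*w + h12*y
    y = h21*w + h22*y
  end subroutine apply_rotm
end program rotm_reference_sketch
```

The implied entries are the reason only some locations of the parameter array are read for a given flag value.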
2.8.2.11. cublasSrotmg
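SROTMG constructs the modified Givens transformation matrix H which zeros the second component of the 2-vector (sqrt(SD1)*SX1, sqrt(SD2)*SY1)**T, where SD1, SD2, SX1, SY1 and SPARAM correspond to the arguments d1, d2, x1, y1 and param. With SPARAM(1)=SFLAG, H has one of the following forms:

  SFLAG=-1.E0      SFLAG=0.E0       SFLAG=1.E0       SFLAG=-2.E0

    (SH11  SH12)     (1.E0  SH12)     (SH11  1.E0)     (1.E0  0.E0)
  H=(          )     (          )     (          )     (          )
    (SH21  SH22),    (SH21  1.E0),    (-1.E0 SH22),    (0.E0  1.E0).

Locations 2-5 of SPARAM contain SH11, SH21, SH12, and SH22 respectively. (Values of 1.E0, -1.E0, or 0.E0 implied by the value of SPARAM(1) are not stored in SPARAM.) Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassrotmg(h, d1, d2, x1, y1, param)
  type(cublasHandle) :: h
  real(4) :: d1, d2, x1, y1, param(*)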
2.8.2.12. cublasSscal
SSCAL scales a vector by a constant. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: a
  real(4), device, dimension(*) :: x
  integer :: incx
2.8.2.13. cublasSswap
SSWAP interchanges two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.2.14. cublasSgbmv
SGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha, beta
2.8.2.15. cublasSgemv
SGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha, beta
2.8.2.16. cublasSger
SGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassger(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha
2.8.2.17. cublasSsbmv
SSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: k, n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha, beta
2.8.2.18. cublasSspmv
SSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassspmv(h, t, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4) :: alpha, beta
2.8.2.19. cublasSspr
SSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassspr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
  real(4) :: alpha
2.8.2.20. cublasSspr2
SSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassspr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(4), device, dimension(*) :: a, x, y
  real(4) :: alpha
2.8.2.21. cublasSsymv
SSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssymv(h, t, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, lda, incx, incy
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha, beta
2.8.2.22. cublasSsyr
SSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssyr(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
  real(4) :: alpha
2.8.2.23. cublasSsyr2
SSYR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssyr2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x, y
  real(4) :: alpha
2.8.2.24. cublasStbmv
STBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstbmv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.8.2.25. cublasStbsv
STBSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstbsv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.8.2.26. cublasStpmv
STPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstpmv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
2.8.2.27. cublasStpsv
STPSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstpsv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  real(4), device, dimension(*) :: a, x
2.8.2.28. cublasStrmv
STRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstrmv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.8.2.29. cublasStrsv
STRSV solves one of the systems of equations A*x = b, or A**T*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstrsv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(*) :: x
2.8.2.30. cublasSgemm
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublassgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4) :: alpha, beta
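The following CUDA Fortran sketch shows one way the device-side interfaces might be combined: a kernel launched with a single thread creates a handle, calls cublasSgemm, and destroys the handle. It assumes the cublas_device module described in this section and a compiler and CUDA toolkit release that still ship the device-side cuBLAS library; the module, kernel and variable names are illustrative, and the required compile and link options depend on the compiler release and are not shown here.

```fortran
module gemm_kernel_m
contains
  ! Illustrative kernel: one thread issues a device-side SGEMM.
  attributes(global) subroutine gemm_kernel(a, b, c, n)
    use cublas_device
    implicit none
    integer, value :: n
    real(4), device :: a(n,n), b(n,n), c(n,n)
    type(cublasHandle) :: h
    integer :: istat

    istat = cublasCreate(h)
    ! C := 1.0*A*B + 0.0*C with no transposition.
    istat = cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &
                        1.0, a, n, b, n, 0.0, c, n)
    istat = cublasDestroy(h)
  end subroutine gemm_kernel
end module gemm_kernel_m

program device_sgemm_sketch
  use cudafor
  use gemm_kernel_m
  implicit none
  integer, parameter :: n = 64
  real(4), device, allocatable :: a_d(:,:), b_d(:,:), c_d(:,:)
  real(4) :: c(n,n)
  integer :: istat

  allocate(a_d(n,n), b_d(n,n), c_d(n,n))
  a_d = 1.0
  b_d = 2.0
  c_d = 0.0

  ! A single thread is enough; the library launches its own work.
  call gemm_kernel<<<1,1>>>(a_d, b_d, c_d, n)
  istat = cudaDeviceSynchronize()

  c = c_d                      ! copy the result back to the host
  print *, 'c(1,1) = ', c(1,1)
end program device_sgemm_sketch
```

Creating and destroying the handle inside the kernel keeps the sketch short; given the synchronization cost noted for cublasCreate and cublasDestroy, real code would normally create the handle once and reuse it.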
2.8.2.31. cublasSsymm
SSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.8.2.32. cublasSsyrk
SSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.8.2.33. cublasSsyr2k
SSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasssyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4) :: alpha, beta
2.8.2.34. cublasStrmm
STRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ), where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. Note that this interface takes a separate output matrix c: as in the cuBLAS library, the result is written to C rather than overwriting B. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4), device, dimension(ldc, *) :: c
  real(4) :: alpha
2.8.2.35. cublasStrsm
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasstrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  real(4), device, dimension(lda, *) :: a
  real(4), device, dimension(ldb, *) :: b
  real(4) :: alpha
2.8.2.36. cublasSgemmBatched
SGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa
  integer :: transb
  integer :: m, n, k
  real(4) :: alpha
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  real(4) :: beta
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount
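A batched GEMM can be read as the same C := alpha*op( A )*op( B ) + beta*C update applied independently to each of batchCount matrix triples. The host-side sketch below shows that reading for the non-transposed case. It is an assumption-laden illustration: a third array dimension stands in for the arrays of device pointers (Aarray, Barray, Carray) that the interface above takes, and nothing here calls cublasSgemmBatched.

```fortran
program gemm_batched_reference
  implicit none
  integer, parameter :: m = 2, n = 2, k = 3, batchCount = 4   ! assumed sizes
  ! The third dimension indexes the batch (stand-in for pointer arrays)
  real(4) :: a(m,k,batchCount), b(k,n,batchCount), c(m,n,batchCount)
  real(4) :: alpha, beta
  integer :: ib

  alpha = 1.0
  beta  = 0.0

  ! Simple test data: every entry of A(:,:,ib) is ib, every entry of B is 1
  do ib = 1, batchCount
    a(:,:,ib) = real(ib)
    b(:,:,ib) = 1.0
  end do
  c = 0.0

  ! One independent GEMM per batch entry
  do ib = 1, batchCount
    c(:,:,ib) = alpha*matmul(a(:,:,ib), b(:,:,ib)) + beta*c(:,:,ib)
  end do

  ! With this data every entry of C(:,:,ib) equals k*ib
  do ib = 1, batchCount
    if (any(abs(c(:,:,ib) - real(k*ib)) > 1.0e-5)) error stop "unexpected result"
  end do

  print '(4f8.2)', c(1,1,:)
end program gemm_batched_reference
```

All matrices in a batch share the same m, n, k, and leading dimensions, which is why the interface takes those as scalars and only the matrix locations as arrays.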
2.8.2.37. cublasSgetrfBatched
SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  integer, device :: info(*)
  integer :: batchCount
2.8.2.38. cublasSgetriBatched
SGETRI computes the inverse of a matrix using the LU factorization computed by SGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A). Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer, device :: info(*)
  integer :: batchCount
2.8.2.39. cublasSgetrsBatched
SGETRS solves a system of linear equations A * X = B or A**T * X = B with a general N-by-N matrix A using the LU factorization computed by SGETRF. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgetrsBatched(h, trans, n, nrhs, A, lda, ipvt, B, ldb, info, batchCount)
  type(cublasHandle) :: h
  integer :: trans
  integer :: n, nrhs
  type(c_devptr), device :: A(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer, device :: info(*)
  integer :: batchCount
2.8.2.40. cublasStrsmBatched
STRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T. The matrix X is overwritten on B. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasStrsmBatched(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side
  integer :: uplo
  integer :: trans
  integer :: diag
  integer :: m, n
  real(4) :: alpha
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount
2.8.2.41. cublasSmatinvBatched
cublasSmatinvBatched is a shortcut for calling cublasSgetrfBatched followed by cublasSgetriBatched. However, it only works if n is less than 32. Otherwise, the user must call cublasSgetrfBatched and cublasSgetriBatched explicitly. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSmatinvBatched(h, n, A, lda, Ainv, lda_inv, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: Ainv(*)
  integer :: lda_inv
  integer, device :: info(*)
  integer :: batchCount
2.8.2.42. cublasSgeqrfBatched
SGEQRF computes a QR factorization of a real M-by-N matrix A: A = Q * R. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgeqrfBatched(h, m, n, A, lda, Tau, info, batchCount)
  type(cublasHandle) :: h
  integer :: m, n
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: Tau(*)
  integer, device :: info(*)
  integer :: batchCount
2.8.2.43. cublasSgelsBatched
SGELS solves overdetermined or underdetermined real linear systems involving an M-by-N matrix A, or its transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'T' and m >= n: find the minimum norm solution of an underdetermined system A**T * X = B.
4. If TRANS = 'T' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**T * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasSgelsBatched(h, trans, m, n, nrhs, A, lda, C, ldc, info, dinfo, batchCount)
  type(cublasHandle) :: h
  integer :: trans
  integer :: m, n, nrhs
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: C(*)
  integer :: ldc
  integer, device :: info(*)
  integer, device :: dinfo(*)
  integer :: batchCount
2.8.3. Single Precision Complex Functions and Subroutines
This section contains the cuBLAS interfaces to the device-side single precision complex BLAS and cuBLAS functions and subroutines.
2.8.3.1. cublasCaxpy
CAXPY computes a constant times a vector plus a vector. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4) :: a
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.3.2. cublasCcopy
CCOPY copies a vector x to a vector y. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasccopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.3.3. cublasCdotc
CDOTC forms the dot product of two vectors, conjugating the first vector. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascdotc(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(4) :: res
2.8.3.4. cublasCdotu
CDOTU forms the dot product of two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascdotu(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
  complex(4) :: res
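The difference between the conjugated (CDOTC) and unconjugated (CDOTU) dot products can be seen in a small host-side sketch. It uses only standard Fortran intrinsics and does not call the cuBLAS interfaces; the program name and the test vectors are assumptions for illustration. For complex arguments the dot_product intrinsic conjugates its first argument, which matches the CDOTC definition.

```fortran
program cdot_reference
  implicit none
  complex(4) :: x(2), y(2), dotc, dotu

  x = [cmplx(1.0, 2.0), cmplx(3.0, -1.0)]
  y = [cmplx(0.0, 1.0), cmplx(2.0,  2.0)]

  ! Conjugated form: sum(conjg(x)*y)
  dotc = dot_product(x, y)

  ! Unconjugated form: sum(x*y)
  dotu = sum(x*y)

  ! Hand-computed values for this data: dotc = 6 + 9i, dotu = 6 + 5i
  if (abs(dotc - cmplx(6.0, 9.0)) > 1.0e-5) error stop "dotc mismatch"
  if (abs(dotu - cmplx(6.0, 5.0)) > 1.0e-5) error stop "dotu mismatch"

  print *, 'dotc =', dotc
  print *, 'dotu =', dotu
end program cdot_reference
```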
2.8.3.5. cublasCrot
CROT applies a plane rotation, where the cos (C) is real and the sin (S) is complex, and the vectors CX and CY are complex. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascrot(h, n, x, incx, y, incy, sc, cs)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: sc
  complex(4) :: cs
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
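The rotation itself follows the reference LAPACK CROT update: each pair (x(i), y(i)) is replaced by (c*x(i) + s*y(i), c*y(i) - conjg(s)*x(i)). The host-side sketch below applies that update in standard Fortran and checks that the combined 2-norm is preserved when c**2 + |s|**2 = 1. It does not call cublascrot. Judging by the declared types, the real argument sc in the interface above presumably carries the cosine and the complex argument cs the sine; the local names c and s, the program name, and the test data are assumptions.

```fortran
program crot_reference
  implicit none
  integer, parameter :: n = 2            ! illustrative length (assumed)
  complex(4) :: x(n), y(n), temp
  real(4)    :: c                        ! cosine of the rotation (real)
  complex(4) :: s                        ! sine of the rotation (complex)
  real(4)    :: norm_before, norm_after
  integer :: i

  c = 0.6
  s = cmplx(0.0, 0.8)                    ! c**2 + |s|**2 = 1

  x = [cmplx(1.0, 0.0), cmplx(2.0, -1.0)]
  y = [cmplx(0.0, 1.0), cmplx(1.0,  1.0)]

  norm_before = sum(abs(x)**2 + abs(y)**2)

  ! Apply the plane rotation element by element
  do i = 1, n
    temp = c*x(i) + s*y(i)
    y(i) = c*y(i) - conjg(s)*x(i)
    x(i) = temp
  end do

  norm_after = sum(abs(x)**2 + abs(y)**2)

  ! A unitary rotation leaves the combined squared norm unchanged
  if (abs(norm_after - norm_before) > 1.0e-4) error stop "norm not preserved"

  print *, 'x =', x
  print *, 'y =', y
end program crot_reference
```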
2.8.3.6. cublasCscal
CSCAL scales a vector by a constant. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  complex(4) :: a
  complex(4), device, dimension(*) :: x
  integer :: incx
2.8.3.7. cublasCsscal
CSSCAL scales a complex vector by a real constant. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascsscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(4) :: a
  complex(4), device, dimension(*) :: x
  integer :: incx
2.8.3.8. cublasCswap
CSWAP interchanges two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.3.9. cublasIcamax
ICAMAX finds the index of the first element having maximum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasicamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  integer :: res
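For complex vectors, "absolute value" in the reference BLAS ICAMAX means |Re(x(i))| + |Im(x(i))| rather than the complex modulus, and the two measures can select different elements. The host-side sketch below shows such a case with standard Fortran intrinsics. It does not call cublasicamax; the program name and the test vector are assumptions for illustration.

```fortran
program icamax_reference
  implicit none
  complex(4) :: x(3)
  integer :: idx_blas, idx_modulus

  x = [cmplx(3.0, 3.0), cmplx(0.0, 5.0), cmplx(4.0, 0.0)]

  ! BLAS-style measure |Re| + |Im|: values 6, 5, 4, so element 1 wins
  idx_blas = maxloc(abs(real(x)) + abs(aimag(x)), dim=1)

  ! Complex modulus: values about 4.24, 5, 4, so element 2 wins
  idx_modulus = maxloc(abs(x), dim=1)

  if (idx_blas /= 1)    error stop "unexpected BLAS-style index"
  if (idx_modulus /= 2) error stop "unexpected modulus index"

  print *, 'index by |Re|+|Im|:', idx_blas
  print *, 'index by modulus:  ', idx_modulus
end program icamax_reference
```

The maxloc intrinsic returns the first location of the maximum and uses 1-based indexing, matching the "index of the first element" wording in the description.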
2.8.3.10. cublasIcamin
ICAMIN finds the index of the first element having minimum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasicamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  integer :: res
2.8.3.11. cublasScasum
SCASUM takes the sum of the absolute values of a complex vector and returns a single precision result. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasscasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  real(4) :: res
2.8.3.12. cublasScnrm2
SCNRM2 returns the Euclidean norm of a vector via the function name, so that SCNRM2 := sqrt( x**H*x ). Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasscnrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  complex(4), device, dimension(*) :: x
  integer :: incx
  real(4) :: res
2.8.3.13. cublasCgbmv
CGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha, beta
2.8.3.14. cublasCgemv
CGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, or y := alpha*A**H*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha, beta
2.8.3.15. cublasCgerc
CGERC performs the rank 1 operation A := alpha*x*y**H + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascgerc(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha
2.8.3.16. cublasCgeru
CGERU performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascgeru(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha
2.8.3.17. cublasChbmv
CHBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian band matrix, with k super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: k, n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha, beta
2.8.3.18. cublasChemv
CHEMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschemv(h, t, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, lda, incx, incy
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha, beta
2.8.3.19. cublasCher
CHER performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascher(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
  real(4) :: alpha
2.8.3.20. cublasCher2
CHER2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascher2(h, t, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x, y
  complex(4) :: alpha
2.8.3.21. cublasChpmv
CHPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschpmv(h, t, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4) :: alpha, beta
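"Packed form" means that only one triangle of the Hermitian matrix is stored, column by column, in a one-dimensional array of length n*(n+1)/2. Under the standard BLAS convention for the upper triangle, element A(i,j) with i <= j is stored at position i + (j-1)*j/2. The host-side sketch below packs a small Hermitian matrix that way and confirms that a matrix-vector product computed from the packed array agrees with matmul on the full matrix. It does not call cublaschpmv; the program name, the array names, and the test data are assumptions for illustration.

```fortran
program hpmv_packed_reference
  implicit none
  integer, parameter :: n = 3            ! illustrative size (assumed)
  complex(4) :: a(n,n), ap(n*(n+1)/2)
  complex(4) :: x(n), y_full(n), y_packed(n), aij
  integer :: i, j

  ! Build a Hermitian matrix: real diagonal, A(j,i) = conjg(A(i,j))
  do j = 1, n
    do i = 1, j
      if (i == j) then
        a(i,j) = cmplx(real(i), 0.0)
      else
        a(i,j) = cmplx(real(i), real(j))
        a(j,i) = conjg(a(i,j))
      end if
    end do
  end do

  ! Pack the upper triangle column by column
  do j = 1, n
    do i = 1, j
      ap(i + (j-1)*j/2) = a(i,j)
    end do
  end do

  x = [(cmplx(real(i), -1.0), i = 1, n)]

  ! Reference product from the full matrix
  y_full = matmul(a, x)

  ! Same product using only the packed upper triangle
  y_packed = cmplx(0.0, 0.0)
  do j = 1, n
    do i = 1, n
      if (i <= j) then
        aij = ap(i + (j-1)*j/2)
      else
        aij = conjg(ap(j + (i-1)*i/2))   ! lower part by Hermitian symmetry
      end if
      y_packed(i) = y_packed(i) + aij*x(j)
    end do
  end do

  if (maxval(abs(y_full - y_packed)) > 1.0e-5) error stop "packed product mismatch"

  print *, y_packed
end program hpmv_packed_reference
```

Packed storage roughly halves the memory needed for A, which is why the packed interfaces take a one-dimensional a with no lda argument.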
2.8.3.22. cublasChpr
CHPR performs the hermitian rank 1 operation A := alpha*x*x**H + A, where alpha is a real scalar, x is an n element vector and A is an n by n hermitian matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschpr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
  real(4) :: alpha
2.8.3.23. cublasChpr2
CHPR2 performs the hermitian rank 2 operation A := alpha*x*y**H + conjg( alpha )*y*x**H + A, where alpha is a scalar, x and y are n element vectors and A is an n by n hermitian matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschpr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  complex(4), device, dimension(*) :: a, x, y
  complex(4) :: alpha
2.8.3.24. cublasCtbmv
CTBMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctbmv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.8.3.25. cublasCtbsv
CTBSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular band matrix, with ( k + 1 ) diagonals. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctbsv(h, u, t, d, n, k, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, k, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.8.3.26. cublasCtpmv
CTPMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctpmv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
2.8.3.27. cublasCtpsv
CTPSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctpsv(h, u, t, d, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx
  complex(4), device, dimension(*) :: a, x
2.8.3.28. cublasCtrmv
CTRMV performs one of the matrix-vector operations x := A*x, or x := A**T*x, or x := A**H*x, where x is an n element vector and A is an n by n unit, or non-unit, upper or lower triangular matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctrmv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.8.3.29. cublasCtrsv
CTRSV solves one of the systems of equations A*x = b, or A**T*x = b, or A**H*x = b, where b and x are n element vectors and A is an n by n unit, or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this routine. Such tests must be performed before calling this routine. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctrsv(h, u, t, d, n, a, lda, x, incx)
  type(cublasHandle) :: h
  integer :: u, t, d
  integer :: n, incx, lda
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(*) :: x
2.8.3.30. cublasCgemm
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascgemm(h, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: transa, transb
  integer :: m, n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.8.3.31. cublasChemm
CHEMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is an hermitian matrix and B and C are m by n matrices. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublaschemm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.8.3.32. cublasCherk
CHERK performs one of the hermitian rank k operations C := alpha*A*A**H + beta*C, or C := alpha*A**H*A + beta*C, where alpha and beta are real scalars, C is an n by n hermitian matrix and A is an n by k matrix in the first case and a k by n matrix in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascherk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  real(4) :: alpha, beta
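The requirement that alpha and beta be real is what keeps C Hermitian after the update. The host-side sketch below evaluates the first form, C := alpha*A*A**H + beta*C, in standard Fortran and checks that the result equals its own conjugate transpose and has a real diagonal. It does not call cublascherk; the program name, sizes, and test data are assumptions for illustration, and the sketch fills the whole matrix rather than only the triangle selected by uplo.

```fortran
program herk_reference
  implicit none
  integer, parameter :: n = 2, k = 3     ! illustrative sizes (assumed)
  complex(4) :: a(n,k), c(n,n)
  real(4) :: alpha, beta                 ! real scalars, as in the interface
  integer :: i, j

  alpha = 1.5
  beta  = 2.0

  ! Arbitrary complex test data for the n by k matrix A
  do j = 1, k
    do i = 1, n
      a(i,j) = cmplx(real(i), real(j))
    end do
  end do

  ! C must be Hermitian on entry; start from the identity
  c = cmplx(0.0, 0.0)
  do i = 1, n
    c(i,i) = cmplx(1.0, 0.0)
  end do

  ! Hermitian rank k update, first case (A is n by k)
  c = alpha*matmul(a, conjg(transpose(a))) + beta*c

  ! The result is Hermitian ...
  if (maxval(abs(c - conjg(transpose(c)))) > 1.0e-5) error stop "C is not Hermitian"

  ! ... and therefore has a real diagonal
  do i = 1, n
    if (abs(aimag(c(i,i))) > 1.0e-5) error stop "diagonal is not real"
  end do

  do i = 1, n
    print *, c(i,:)
  end do
end program herk_reference
```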
2.8.3.33. cublasCher2k
CHER2K performs one of the hermitian rank 2k operations C := alpha*A*B**H + conjg( alpha )*B*A**H + beta*C, or C := alpha*A**H*B + conjg( alpha )*B**H*A + beta*C, where alpha and beta are scalars with beta real, C is an n by n hermitian matrix and A and B are n by k matrices in the first case and k by n matrices in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascher2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha
  real(4) :: beta
2.8.3.34. cublasCsymm
CSYMM performs one of the matrix-matrix operations C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascsymm(h, side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.8.3.35. cublasCsyrk
CSYRK performs one of the symmetric rank k operations C := alpha*A*A**T + beta*C, or C := alpha*A**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A is an n by k matrix in the first case and a k by n matrix in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascsyrk(h, uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.8.3.36. cublasCsyr2k
CSYR2K performs one of the symmetric rank 2k operations C := alpha*A*B**T + alpha*B*A**T + beta*C, or C := alpha*A**T*B + alpha*B**T*A + beta*C, where alpha and beta are scalars, C is an n by n symmetric matrix and A and B are n by k matrices in the first case and k by n matrices in the second case. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublascsyr2k(h, uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
  type(cublasHandle) :: h
  integer :: uplo, trans
  integer :: n, k, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha, beta
2.8.3.37. cublasCtrmm
CTRMM performs one of the matrix-matrix operations B := alpha*op( A )*B, or B := alpha*B*op( A ) where alpha is a scalar, B is an m by n matrix, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctrmm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb, c, ldc)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb, ldc
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4), device, dimension(ldc, *) :: c
  complex(4) :: alpha
2.8.3.38. cublasCtrsm
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasctrsm(h, side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
  type(cublasHandle) :: h
  integer :: side, uplo, transa, diag
  integer :: m, n, lda, ldb
  complex(4), device, dimension(lda, *) :: a
  complex(4), device, dimension(ldb, *) :: b
  complex(4) :: alpha
2.8.3.39. cublasCgemmBatched
CGEMM performs one of the matrix-matrix operations C := alpha*op( A )*op( B ) + beta*C, where op( X ) is one of op( X ) = X or op( X ) = X**T or op( X ) = X**H, alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgemmBatched(h, transa, transb, m, n, k, alpha, Aarray, lda, Barray, ldb, beta, Carray, ldc, batchCount)
  type(cublasHandle) :: h
  integer :: transa
  integer :: transb
  integer :: m, n, k
  complex(4) :: alpha
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  type(c_devptr), device :: Barray(*)
  integer :: ldb
  complex(4) :: beta
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer :: batchCount
2.8.3.40. cublasCgetrfBatched
CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges. The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n). This is the right-looking Level 3 BLAS version of the algorithm. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgetrfBatched(h, n, Aarray, lda, ipvt, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  integer, device :: info(*)
  integer :: batchCount
2.8.3.41. cublasCgetriBatched
CGETRI computes the inverse of a matrix using the LU factorization computed by CGETRF. This method inverts U and then computes inv(A) by solving the system inv(A)*L = inv(U) for inv(A). Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgetriBatched(h, n, Aarray, lda, ipvt, Carray, ldc, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: Aarray(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: Carray(*)
  integer :: ldc
  integer, device :: info(*)
  integer :: batchCount
2.8.3.42. cublasCgetrsBatched
CGETRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general N-by-N matrix A using the LU factorization computed by CGETRF. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgetrsBatched(h, trans, n, nrhs, A, lda, ipvt, B, ldb, info, batchCount)
  type(cublasHandle) :: h
  integer :: trans
  integer :: n, nrhs
  type(c_devptr), device :: A(*)
  integer :: lda
  integer, device :: ipvt(*)
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer, device :: info(*)
  integer :: batchCount
2.8.3.43. cublasCtrsmBatched
CTRSM solves one of the matrix equations op( A )*X = alpha*B, or X*op( A ) = alpha*B, where alpha is a scalar, X and B are m by n matrices, A is a unit, or non-unit, upper or lower triangular matrix and op( A ) is one of op( A ) = A or op( A ) = A**T or op( A ) = A**H. The matrix X is overwritten on B. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCtrsmBatched(h, side, uplo, trans, diag, m, n, alpha, A, lda, B, ldb, batchCount)
  type(cublasHandle) :: h
  integer :: side
  integer :: uplo
  integer :: trans
  integer :: diag
  integer :: m, n
  complex(4) :: alpha
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: B(*)
  integer :: ldb
  integer :: batchCount
2.8.3.44. cublasCmatinvBatched
cublasCmatinvBatched is a shortcut for calling cublasCgetrfBatched followed by cublasCgetriBatched. However, it only works if n is less than 32. Otherwise, the user must call cublasCgetrfBatched and cublasCgetriBatched explicitly. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCmatinvBatched(h, n, A, lda, Ainv, lda_inv, info, batchCount)
  type(cublasHandle) :: h
  integer :: n
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: Ainv(*)
  integer :: lda_inv
  integer, device :: info(*)
  integer :: batchCount
2.8.3.45. cublasCgeqrfBatched
CGEQRF computes a QR factorization of a complex M-by-N matrix A: A = Q * R. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgeqrfBatched(h, m, n, A, lda, Tau, info, batchCount)
  type(cublasHandle) :: h
  integer :: m, n
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: Tau(*)
  integer, device :: info(*)
  integer :: batchCount
2.8.3.46. cublasCgelsBatched
CGELS solves overdetermined or underdetermined complex linear systems involving an M-by-N matrix A, or its conjugate-transpose, using a QR or LQ factorization of A. It is assumed that A has full rank. The following options are provided:
1. If TRANS = 'N' and m >= n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A*X ||.
2. If TRANS = 'N' and m < n: find the minimum norm solution of an underdetermined system A * X = B.
3. If TRANS = 'C' and m >= n: find the minimum norm solution of an underdetermined system A**H * X = B.
4. If TRANS = 'C' and m < n: find the least squares solution of an overdetermined system, i.e., solve the least squares problem minimize || B - A**H * X ||.
Several right hand side vectors b and solution vectors x can be handled in a single call; they are stored as the columns of the M-by-NRHS right hand side matrix B and the N-by-NRHS solution matrix X. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasCgelsBatched(h, trans, m, n, nrhs, A, lda, C, ldc, info, dinfo, batchCount)
  type(cublasHandle) :: h
  integer :: trans
  integer :: m, n, nrhs
  type(c_devptr), device :: A(*)
  integer :: lda
  type(c_devptr), device :: C(*)
  integer :: ldc
  integer, device :: info(*)
  integer, device :: dinfo(*)
  integer :: batchCount
2.8.4. Double Precision Functions and Subroutines
This section contains the cuBLAS interfaces to the device-side double precision BLAS and cuBLAS functions and subroutines.
2.8.4.1. cublasIdamax
IDAMAX finds the index of the first element having maximum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasidamax(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x
  integer :: incx
  integer :: res
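As a host-side illustration of the result convention only (not the cuBLAS implementation), the following Python sketch mimics what IDAMAX is described as computing. The helper name idamax_ref is hypothetical. It assumes a positive incx and returns a 1-based index, following the Fortran BLAS convention; ties resolve to the first occurrence.

```python
def idamax_ref(n, x, incx=1):
    """Return the 1-based index of the first element of largest absolute value."""
    if n < 1 or incx < 1:
        return 0  # reference BLAS returns 0 for empty or invalid input
    best, best_val = 1, abs(x[0])
    for i in range(1, n):
        v = abs(x[i * incx])
        if v > best_val:  # strict '>' keeps the first occurrence on ties
            best, best_val = i + 1, v
    return best
```

The same loop with the comparison reversed gives the behavior described for IDAMIN.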
2.8.4.2. cublasIdamin
IDAMIN finds the index of the first element having minimum absolute value. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasidamin(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x
  integer :: incx
  integer :: res
2.8.4.3. cublasDasum
DASUM takes the sum of the absolute values. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdasum(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x
  integer :: incx
  real(8) :: res
2.8.4.4. cublasDaxpy
DAXPY computes a constant times a vector plus a vector: y := a*x + y. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdaxpy(h, n, a, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(8) :: a
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
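The role of the increments incx and incy can be illustrated with a small host-side Python sketch of the DAXPY update y := a*x + y. The name daxpy_ref is hypothetical, and the sketch assumes positive increments and 0-based Python lists; it is a reference for the arithmetic, not the device routine.

```python
def daxpy_ref(n, a, x, incx, y, incy):
    """Update y in place: y := a*x + y over n strided elements."""
    for i in range(n):
        # Element i of each logical vector lives at offset i*inc in its array.
        y[i * incy] += a * x[i * incx]
    return y
```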
2.8.4.5. cublasDcopy
DCOPY copies a vector, x, to a vector, y. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdcopy(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.4.6. cublasDdot
DDOT forms the dot product of two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasddot(h, n, x, incx, y, incy, res)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
  real(8) :: res
2.8.4.7. cublasDnrm2
DNRM2 computes the Euclidean norm of a vector, so that res := sqrt( x'*x ). Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdnrm2(h, n, x, incx, res)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x
  integer :: incx
  real(8) :: res
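Evaluating sqrt( x'*x ) literally can overflow when elements are large, even if the norm itself is representable. The Python sketch below shows the scaled accumulation used by the classic reference BLAS for this quantity; whether cuBLAS uses the same scheme internally is not stated here. The name dnrm2_ref is hypothetical and a positive incx is assumed.

```python
import math

def dnrm2_ref(n, x, incx=1):
    """Euclidean norm of n strided elements, accumulated with running scaling."""
    scale, ssq = 0.0, 1.0  # invariant: sum of squares so far == scale**2 * ssq
    for i in range(n):
        v = abs(x[i * incx])
        if v != 0.0:
            if scale < v:
                # New largest magnitude: rescale the accumulated sum to it.
                ssq = 1.0 + ssq * (scale / v) ** 2
                scale = v
            else:
                ssq += (v / scale) ** 2
    return scale * math.sqrt(ssq)
```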
2.8.4.8. cublasDrot
DROT applies a plane rotation. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdrot(h, n, x, incx, y, incy, dc, ds)
  type(cublasHandle) :: h
  integer :: n
  real(8) :: dc, ds
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
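For reference, the plane rotation applied by DROT in the standard BLAS definition replaces each pair (x(i), y(i)) with (c*x(i) + s*y(i), c*y(i) - s*x(i)). The Python sketch below illustrates that arithmetic on the host; drot_ref is a hypothetical name, c and s correspond to the dc and ds arguments, and positive increments are assumed.

```python
def drot_ref(n, x, incx, y, incy, c, s):
    """Apply the plane rotation [c s; -s c] to each (x_i, y_i) pair in place."""
    for i in range(n):
        xi, yi = x[i * incx], y[i * incy]
        x[i * incx] = c * xi + s * yi
        y[i * incy] = c * yi - s * xi
```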
2.8.4.9. cublasDrotg
DROTG constructs a Givens plane rotation. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdrotg(h, sa, sb, sc, ss)
  type(cublasHandle) :: h
  real(8) :: sa, sb, sc, ss
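A Givens rotation for the pair (a, b) is a cosine c and sine s such that c*a + s*b = r and -s*a + c*b = 0. The Python sketch below follows the sign and reconstruction conventions of the classic reference BLAS DROTG, where a is overwritten by r and b by the value z; it is assumed, not confirmed here, that sa, sb, sc and ss in the interface above play the same roles. The name drotg_ref is hypothetical.

```python
import math

def drotg_ref(a, b):
    """Return (r, z, c, s) following the classic reference BLAS DROTG convention."""
    roe = a if abs(a) > abs(b) else b  # r takes the sign of the larger input
    scale = abs(a) + abs(b)
    if scale == 0.0:
        return 0.0, 0.0, 1.0, 0.0
    r = scale * math.sqrt((a / scale) ** 2 + (b / scale) ** 2)
    r = math.copysign(r, roe)
    c, s = a / r, b / r
    # z encodes the rotation in one number so c and s can be rebuilt later.
    z = 1.0
    if abs(a) > abs(b):
        z = s
    elif c != 0.0:
        z = 1.0 / c
    return r, z, c, s
```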
2.8.4.10. cublasDrotm
DROTM applies the modified Givens transformation, H, to the 2 by N matrix
  (DX**T)
  (DY**T),
where **T indicates transpose. The elements of DX are in DX(LX+I*INCX), I = 0 to N-1, where LX = 1 if INCX .GE. 0, else LX = (-INCX)*N, and similarly for DY using LY and INCY. With DPARAM(1)=DFLAG, H has one of the following forms:

  DFLAG=-1.D0     DFLAG=0.D0      DFLAG=1.D0      DFLAG=-2.D0

  (DH11  DH12)    (1.D0  DH12)    (DH11  1.D0)    (1.D0  0.D0)
H=(          )    (          )    (          )    (          )
  (DH21  DH22),   (DH21  1.D0),   (-1.D0 DH22),   (0.D0  1.D0).

See DROTMG for a description of data storage in DPARAM. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdrotm(h, n, x, incx, y, incy, param)
  type(cublasHandle) :: h
  integer :: n
  real(8) :: param(*)
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
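The four forms of H can be made concrete with a host-side Python sketch. It assumes the reference BLAS parameter layout, in which the five-element array holds DFLAG followed by DH11, DH21, DH12 and DH22, and the entries implied by DFLAG are ignored. The name drotm_ref is hypothetical, positive increments are assumed, and lists are 0-based, so param[0] corresponds to DPARAM(1).

```python
def drotm_ref(n, x, incx, y, incy, param):
    """Apply the modified Givens transformation H to each (x_i, y_i) in place."""
    flag = param[0]
    if flag == -2.0:
        return  # H is the identity; nothing to do
    if flag == -1.0:
        h11, h21, h12, h22 = param[1], param[2], param[3], param[4]
    elif flag == 0.0:
        h11, h22 = 1.0, 1.0              # implied unit diagonal
        h21, h12 = param[2], param[3]
    elif flag == 1.0:
        h12, h21 = 1.0, -1.0             # implied off-diagonal entries
        h11, h22 = param[1], param[4]
    else:
        raise ValueError("DFLAG must be -2, -1, 0 or 1")
    for i in range(n):
        xi, yi = x[i * incx], y[i * incy]
        x[i * incx] = h11 * xi + h12 * yi
        y[i * incy] = h21 * xi + h22 * yi
```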
2.8.4.11. cublasDrotmg
DROTMG constructs the modified Givens transformation matrix H. With DPARAM(1)=DFLAG, H has one of the following forms:

  DFLAG=-1.D0     DFLAG=0.D0      DFLAG=1.D0      DFLAG=-2.D0

  (DH11  DH12)    (1.D0  DH12)    (DH11  1.D0)    (1.D0  0.D0)
H=(          )    (          )    (          )    (          )
  (DH21  DH22),   (DH21  1.D0),   (-1.D0 DH22),   (0.D0  1.D0).

Locations 2-5 of DPARAM contain DH11, DH21, DH12, and DH22 respectively. (Values of 1.D0, -1.D0, or 0.D0 implied by the value of DPARAM(1) are not stored in DPARAM.) Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdrotmg(h, d1, d2, x1, y1, param)
  type(cublasHandle) :: h
  real(8) :: d1, d2, x1, y1, param(*)
2.8.4.12. cublasDscal
DSCAL scales a vector by a constant. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdscal(h, n, a, x, incx)
  type(cublasHandle) :: h
  integer :: n
  real(8) :: a
  real(8), device, dimension(*) :: x
  integer :: incx
2.8.4.13. cublasDswap
DSWAP interchanges two vectors. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdswap(h, n, x, incx, y, incy)
  type(cublasHandle) :: h
  integer :: n
  real(8), device, dimension(*) :: x, y
  integer :: incx, incy
2.8.4.14. cublasDgbmv
DGBMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n band matrix, with kl sub-diagonals and ku super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdgbmv(h, t, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, kl, ku, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8) :: alpha, beta
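The band matrix is not passed as a full m by n array. In the standard BLAS band storage scheme, which this interface is assumed to follow, column j of the matrix occupies column j of the array a, with matrix element (i, j) stored in row ku+1+i-j (1-based) and lda at least kl+ku+1. The Python sketch below illustrates the non-transposed product under that assumption, using a flat column-major list, 0-based indices and unit increments. The name dgbmv_ref is hypothetical.

```python
def dgbmv_ref(m, n, kl, ku, alpha, a, lda, x, beta, y):
    """Return alpha*A*x + beta*y for a band matrix A held in BLAS band storage."""
    out = [beta * y[i] for i in range(m)]
    for j in range(n):
        # Only rows inside the band of column j are stored and touched.
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            out[i] += alpha * a[(ku + i - j) + j * lda] * x[j]
    return out
```

For example, the 3 by 3 tridiagonal matrix with 2 on the diagonal and 1 on both off-diagonals is stored with kl = ku = 1 and lda = 3 as the flat list [0, 2, 1, 1, 2, 1, 1, 2, 0], where the two zeros are unused padding.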
2.8.4.15. cublasDgemv
DGEMV performs one of the matrix-vector operations y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y, where alpha and beta are scalars, x and y are vectors and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdgemv(h, t, m, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: m, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8) :: alpha, beta
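The two operations differ in the lengths of x and y: without transposition x has n elements and y has m, and with transposition the lengths are swapped. The Python sketch below illustrates both cases on the host with a flat column-major list and leading dimension lda. The name dgemv_ref and the boolean trans flag are hypothetical stand-ins (the interface above takes an integer t), and unit increments are assumed.

```python
def dgemv_ref(trans, m, n, alpha, a, lda, x, beta, y):
    """Return alpha*op(A)*x + beta*y, where op(A) is A or A**T."""
    ylen = n if trans else m
    out = [beta * y[i] for i in range(ylen)]
    for j in range(n):
        for i in range(m):
            aij = a[i + j * lda]  # column-major: element (i, j)
            if trans:
                out[j] += alpha * aij * x[i]
            else:
                out[i] += alpha * aij * x[j]
    return out
```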
2.8.4.16. cublasDger
DGER performs the rank 1 operation A := alpha*x*y**T + A, where alpha is a scalar, x is an m element vector, y is an n element vector and A is an m by n matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdger(h, m, n, alpha, x, incx, y, incy, a, lda)
  type(cublasHandle) :: h
  integer :: m, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8) :: alpha
2.8.4.17. cublasDsbmv
DSBMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric band matrix, with k super-diagonals. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdsbmv(h, t, n, k, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: k, n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8) :: alpha, beta
2.8.4.18. cublasDspmv
DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdspmv(h, t, n, alpha, a, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(8), device, dimension(*) :: a, x, y
  real(8) :: alpha, beta
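"Packed form" means that only one triangle of the symmetric matrix is stored, column by column, in a one-dimensional array of n*(n+1)/2 elements. The Python sketch below shows the index mapping of the standard BLAS packed scheme, which this interface is assumed to follow, and uses it for the product. The names packed_index and dspmv_ref and the boolean upper flag are hypothetical; indices are 0-based and unit increments are assumed.

```python
def packed_index(i, j, n, upper=True):
    """Offset of symmetric element (i, j) in a packed array (0-based)."""
    if upper:
        if i > j:
            i, j = j, i  # reflect into the stored upper triangle
        return i + j * (j + 1) // 2
    if i < j:
        i, j = j, i      # reflect into the stored lower triangle
    return i + j * (2 * n - j - 1) // 2

def dspmv_ref(upper, n, alpha, ap, x, beta, y):
    """Return alpha*A*x + beta*y for a symmetric A held in packed form."""
    return [
        beta * y[i]
        + alpha * sum(ap[packed_index(i, j, n, upper)] * x[j] for j in range(n))
        for i in range(n)
    ]
```

The same index mapping applies to the packed rank 1 and rank 2 updates, which write into the packed array rather than read from it.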
2.8.4.19. cublasDspr
DSPR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdspr(h, t, n, alpha, x, incx, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx
  real(8), device, dimension(*) :: a, x
  real(8) :: alpha
2.8.4.20. cublasDspr2
DSPR2 performs the symmetric rank 2 operation A := alpha*x*y**T + alpha*y*x**T + A, where alpha is a scalar, x and y are n element vectors and A is an n by n symmetric matrix, supplied in packed form. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdspr2(h, t, n, alpha, x, incx, y, incy, a)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, incy
  real(8), device, dimension(*) :: a, x, y
  real(8) :: alpha
2.8.4.21. cublasDsymv
DSYMV performs the matrix-vector operation y := alpha*A*x + beta*y, where alpha and beta are scalars, x and y are n element vectors and A is an n by n symmetric matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdsymv(h, t, n, alpha, a, lda, x, incx, beta, y, incy)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, lda, incx, incy
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x, y
  real(8) :: alpha, beta
2.8.4.22. cublasDsyr
DSYR performs the symmetric rank 1 operation A := alpha*x*x**T + A, where alpha is a real scalar, x is an n element vector and A is an n by n symmetric matrix. Device Functions are declared "attributes(device)" in CUDA Fortran and "!$acc routine() seq" in OpenACC.
integer(4) function cublasdsyr(h, t, n, alpha, x, incx, a, lda)
  type(cublasHandle) :: h
  integer :: t
  integer :: n, incx, lda
  real(8), device, dimension(lda, *) :: a
  real(8), device, dimension(*) :: x
  real(8) :: alpha
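In the reference BLAS definition of DSYR, only one triangle of A is referenced and updated; the other triangle is left untouched because it is implied by symmetry. The Python sketch below illustrates this on the host, assuming the t argument above selects the triangle as the uplo argument does in reference BLAS. The name dsyr_ref and the boolean upper flag are hypothetical; a is a flat column-major list and a unit increment is assumed.

```python
def dsyr_ref(upper, n, alpha, x, a, lda):
    """Update one triangle of A in place: A := alpha*x*x**T + A."""
    for j in range(n):
        # Rows 0..j form the upper triangle of column j; rows j..n-1 the lower.
        rows = range(0, j + 1) if upper else range(j, n)
        for i in rows:
            a[i + j * lda] += alpha * x[i] * x[j]
    return a
```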