API User Guide#

Compute Model#

The cuEST API provides high-performance GPU implementations of the rate-limiting contributions to DFT and electronic structure problems. Rather than providing an end-to-end solution, cuEST provides routines that can be easily integrated into existing DFT software packages to provide performance uplift on NVIDIA GPUs. Access to these routines is through an imperative C API (Python bindings of the C API are also provided). Due to the verbosity of the electronic structure problem in gaussian orbitals and the limitations of a C API, there are several key concepts that cuEST users must understand.

  • cuEST Objects: All cuEST objects are associated with an opaque “handle”. These handles contain configuration parameters and pointers to internal resources, providing the user with a convenient way to interact with the cuEST object. For example, things like the definition of the basis set and the DFT integration grid live in these opaque handles. Every handle has paired Create and Destroy routines that enable the user to explicitly manage the lifetime of the handle. These handles are used as input to create other handles and as input to perform the desired computations. It is the users responsibility to make sure that the handles remain live while they are needed or in use.

    Additionally, cuEST objects can be queried, allowing the user to extract configuration information and metadata from the object during its lifetime. For example, a user might query a basis set handle for the total number of basis functions present.

  • Compute Routines: The cuEST objects are simply data structures that enable the evaluation of the quantities needed to perform gaussian orbital DFT calculations. These quantities (for example, the overlap matrix, exchange correlation potential matrix, kinetic energy integral derivatives, etc.) are accessed through Compute routines. Generally, these compute routines take as input some number of cuEST handles, some configuration parameters and some user data (often as raw arrays on either the host or device). For example, a gradient routine might take a density matrix as input in a one-dimensional array, contract against some integral derivatives (as defined by one or more cuEST handles) and write the result into another user-provided output array.

  • Memory Model: cuEST does not allocate GPU memory inside library function calls. The user must make and manage all of the GPU memory allocations themselves. Prior to making the desired function calls (Create or Compute calls), the user makes a WorkspaceQuery call that tells the user how much memory to allocate through a workspaceDescriptor struct. Then, the users allocates the requested memory and passes the memory as a function argument when the Create or Compute call is made. Compute calls require only a temporaryWorkspace that can be freed after the call is completed; this is the scratch space required to complete the calculation. Create calls require temporaryWorkspace and persistentWorkspace. The temporaryWorkspace has the same meaning as in the Compute call. This is scratch space that can be released after the handle is created. The persistentWorkspace must remain untouched for the lifetime of the handle. The persistentWorkspace is where the internal data of the handle is stored. It is the responsibility of the user to ensure that the lifetime of the handles and its associated persistentWorkspace are closely tethered.

  • Error Checking: Where possible, cuEST does rigorous error checking on function inputs. In general, cuEST functions will check that the requested computation is topologically possible. cuEST does not check that a computation is reasonable; that is left to the purview of the user. Due to the nature of a C API, certain checks are not possible. For example, if a user-provided density matrix is smaller than the expected size, out-of-bounds reads will occur. This is part of the contract between the user and the API. The user is expected to provide valid handles to function calls, buffers of the expected size and to leave these objects in valid states for the duration of their use. The user should query the metadata of opaque handles and check return codes of cuEST functions to ensure correct operation of the library.

Note

cuEST is designed to provide robust computation of the continuous quantities involved in Gaussian basis electronic structure theory. However, there are cases where a user could provide a physically non-sensical geometry that, while non-infinite, may produce intermediates that result in severe roundoff error. It is strongly recommended that users do not attempt to run computations on molecules with bond lengths that lie far outside of their equilibrium positions (e.g. atoms lying on top of one another).

Basis and Integration Grids#

Basis Sets#

Constructing a cuEST basis set handle provides a good example of the general design philosophy of the library: handle construction is explicit and verbatim. A Gaussian basis set is composed of basis function centers, at each center there are some number of contracted Gaussian shells, each shell has an associated angular momentum and list of primitive Gaussian functions (with exponents and contraction coefficients).

To construct a basis set, the user defines each of the shells (an opaque shell handle). An array of shell handles defines the basis on a particular center. An array of shell arrays (one shell array for each center) defines the overall basis for the molecule (in practice this is flattened to a one-dimensional array).

This is typical of how objects are constructed in cuEST. The basis set definition is fully explicit. Further, it is also taken in verbatim to the internal data structures. As a consequence of this verbatim construction, the user is responsible for normalization of the the contraction coefficients.

Primitive Basis Function Normalization#

The cuEST API function to construct a shell handle is cuestAOShellCreate(). This function does limited error checking and then builds the shell exactly as indicated by the user. In this case, the user should pass in the “finished” primitive-and-L-normalized contraction coefficients. These are very often different from what is present in standard text files defining the basis set.

One specific example commonly arises when constructing basis functions from a basis set format such as .gbs that stores coefficients in an unnormalized form. This case is clearly described in the cuest_scf/ao_shell.py example, where the normalized coefficients in the compute_normalized_coefficients function are computed according to:

\[d = \prod_{l}^{L+1} 2 l - 1\]
\[c_k = c_k^{\text{raw}} \sqrt{\frac{2^L \times (2 e_k)^{L + \frac{3}{2}}}{\pi^{3/2} \times d}}\]
\[V = \sum_{i,j} c_{i}^{\text{raw}} c_{j}^{\text{raw}} \left( 2 \frac{\sqrt{e_i e_j} } { (e_i + e_j) } \right) ^ {L + \frac{3}{2}}\]
\[c_k^N = c_k \times \sqrt{\frac{N_{\text{fact}}}{V}}\]

where:

  • \(L\) is the function angular momentum

  • \(c^{raw}\) are the raw basis function coefficients (e.g. from the basis file)

  • \(e\) are the basis function exponents

  • \(N_{fact}\) is a user-provided normalization factor (default to 1.0)

  • \(c^{N}\) are the computed, normalized primitive basis function coefficients. These are used as input to cuestAOShellCreate() to construct the shell handle.

Pair List and Sparsity Discovery#

Notice that the discussion of basis set definition did not include the Cartesian coordinates of the atom centers. This allows the basis set handle to remain valid (without modification) while the positions of those centers vary. One of the first places where the Cartesian positions of the basis function centers is captured is in the pair list handle. The pair list contains – as the name would indicate – a list of all the significant function pairs at a particular geometry and at a particular sparsity threshold. The pair list handle becomes one of the key objects needed to compute one-electron integral matrices as well as density-fitted J and K matrices.

Pair list construction requires a basis set, XYZ coordinates of all the basis function centers and a pair screening threshold: thresholdPQ. Pairs of primitive Gaussians are included in the pair list if their spherical overlap exceeds the user-selected thresholdPQ (ignoring angular momentum and including contraction coefficients). Shell pairs are included in the pair list if any of their constituent primitive pairs exceed the pair screening threshold. In general, we recommend values of thresholdPQ between \(10^{-10}\) and \(10^{-16}\) depending on the tradeoff between accuracy and efficiency required for a particular application.

Integration Grids#

The definition of the quadrature grid used for exchange-correlation integration follows closely the paradigm outlined for the basis set. That is, the definition of the grid is provided explicitly and (largely) verbatim. cuEST allows the user to define Becke-style integration grids with direct-product radial/angular quadratures at each atom center and a molecular partition (Becke weights).

Atom Grids#

A cuEST atom grid handle defines the radial and angular quadratures on each atom center. To construct this handle, the user provides:

  • The number of radial nodes

  • The position of each radial node

  • The quadrature weight of each radial node

  • Number of points in the angular quadrature at each radial node

The angular integration is done using Lebedev quadrature, so the number of angular integration points is limited to the Lebedev numbers. The radial quadrature is taken verbatim from the user input.

Molecular Grids#

In analogy to the basis set, the molecular grid handle is an array of atom grid handles for all of the atoms in the molecule. This specification is sufficiently verbose to cover application to any standard DFT integration grid utilizing Lebedev quadratures. Custom or standard pruning schemes are provided explicitly in the definition of the atom grid handles.

The resulting direct-product-type grid is partitioned using Becke’s approach [1], with grid points being assigned to their nearest atoms using Voronoi mapping, the weights assigned to the grid points belonging to a given atom sum to 1, and these weights decay smoothly to 0 as the distance between the weight and its owning atom increase. cuEST relies on the modified weighting scheme described by Stratmann [2] to achieve linear scaling in the quadrature dimension while retaining good numerical stability characteristics.

Effective Core Pseudopotentials (ECP)#

Effective Core Pseudopotentials (ECPs), replace the chemically inert core electrons of heavy atoms with an effective potential. This reduces the computational cost by decreasing the number of electrons that must be treated explicitly while maintaining accuracy for valence-electron properties.

cuEST implements semi-local ECPs using numerical quadrature on atom-centered grids. The ECP matrix is decomposed into:

  • Local component: An isotropic potential that screens all angular momentum channels equally.

  • Non-local component: Angular momentum-dependent potentials that project onto specific spherical harmonic channels.

The ECP integral matrix \(V^{\text{ECP}}_{\mu\nu}\) contributes to the one-electron part of the Fock matrix:

\[V^{\text{ECP}}_{\mu\nu} = V^{\text{local}}_{\mu\nu} + V^{\text{nonlocal}}_{\mu\nu}\]

Defining ECPs in cuEST follows along similar lines to the grid and basis set. The definition starts from ECP shell handles to define individual angular momenta channels of the ECP. ECP atom handles contain all the shells centered on a particular atom. An array of ECP atom handles provides the full definition of the ECPs for a molecule.

ECP Shells#

ECP shells are individual angular momenta channels of the ECP, defined by:

  • radial powers

  • exponents

  • contraction coefficients.

Each shell has an angular momentum quantum number \(L\).

ECP Atoms#

An ECP atom is a collection of ECPShell objects representing the complete ECP for a single element. Each ECPAtom has a “top shell” (the local/isotropic component at \(L = L_{\max}\)) and additional shells for \(L < L_{\max}\) (the non-local components).

Integral Compute Plans#

One-Electron Integrals#

To compute one-electron integrals and their derivatives, a cuestOEIntPlan_t one-electron integral plan handle must be created. With such a handle, overlap integrals (S) and their gradients, kinetic energy integrals (T) and their gradients, and potential integrals (V) and their gradients can be computed.

The one-electron integral plan handle requires a cuestAOBasis_t basis set handle and corresponding cuestAOPairList_t pair list handle for construction. This defines the one-particle basis consisting of the basis function and locations of their centers. Subsequent compute calls using the one-electron integral plan handle may take additional data as input, such as an array of point charges when evaluating potential integrals.

Density-Fitted Integrals#

When evaluating two-electron contributions to the Fock matrix, Coulomb (J) and exchange (K) matrices, cuEST uses the density fitting approximation. To carry out such computations, a cuestDFIntPlan_t must be constructed. In addition to the usual orbital basis (in this context, the “primary” basis), an auxiliary or fitting basis must be defined. The cuestDFIntPlan_t density-fitted integral plan handle is constructed from two cuestAOBasis_t handles: a primary basis handle and an auxiliary basis handle. It also requires a cuestAOPairList_t pair list handle corresponding to the primary basis. With the density-fitted integral plan handle, J and K matrices can be evaluated. A single routine to compute J and K gradients simultaneously is also provided.

In contrast to some of the other handles described here, the cuestDFIntPlan_t density-fitted integral plan handle requires a significant amount of compute to construct and a significant amount of GPU memory to store. It is strongly recommended that only a single density-fitted integral plan handle be constructed for each DFT calculation (energy and gradient). If a new density-fitted integral plan handle is constructed at each SCF iteration, performance will degrade significantly. Similarly, making unnecessary copies of the density-fitted integral plan handle will reduce the apparent tractability of the cuEST library.

DFT Exchange-Correlation#

cuEST allows users to evaluate the exchange correlation (XC) energy, potential and gradients for both local and nonlocal functionals. These API function go through a grid-based representation internally, but inputs and output arrays are not in grid space. Other more advanced API functions are available for users that want to perform custom operations on quantities while they are in the grid representation.

To construct an cuestXCIntPlan_t XC integral plan, the user must provide a cuestAOBasis_t basis handle and a cuestMolecularGrid_t handle. It is required that the number of basis function centers in the basis handle match exactly the number of atom centers in the molecular grid.

Additionally, an cuestXCIntPlanParametersFunctional_t is specified to identify which XC functional should be evaluated.

ECP Integrals#

The calculation of ECP integrals requires a cuestECPIntPlan_t ECP integral plan handle. These are constructed from a cuestAOBasis_t basis handle and an array of cuestECPAtom_t ECP atom handles. The Cartesian coordinates of the molecule and the indices of the ECP atoms in the larger molecule must also be provided. With the ECP integral plan handle, ECP integral matrices and their gradients can be computed using the definition of the ECP provided during construction. Note that cuEST uses a novel quadrature-based implementation of the ECP integrals; some of the attributes of cuestECPIntPlanParameters_t may not be found in other software packages.

Polarizable Continuum Models#

An implementation of various flavors of PCM models can be accessed by constructing a cuestPCMIntPlan_t PCM plan handle. The implementation of PCM in cuEST uses the improved switching Gaussian approach (iSWIG) [3] for cavity discretization.

To perform PCM calculations, the basis set and pair list are provided via a previously constructed cuestOEIntPlan_t plan handle. What remains is to define the cavity and nuclear contributions to the PCM potential. The user must provide an explicit definition of the cavity by providing atomic radii for every atom in the molecule. Additionally, each atom has an associated Lebedev quadrature (specified by Lebedev number) to discretize the cavity. The zeta coefficients (see York and Karplus [5]) that enter the iSWIG discretization must also be provided for each atom. Finally, nuclear charges of each atom must be provided; these should be adjusted to account for ECPs if ECPs are present.

A reasonable set of atomic radii can be found in the CRC Handbook [4]. These were originally from Bondi and then extended by Cramer and Truhlar to cover more of the periodic table. Recommended zeta coefficient values, taken from York [5] are as follows:

Table 3 PCM Zeta Coefficient Values#

Num Angular Points

Optimized Zeta

14

4.865

26

4.855

50

4.893

110

4.901

194

4.903

302

4.905

434

4.906

590

4.905

770

4.899

974

4.907

1202

4.907