3.3. LAPACK Functions

LAPACK routines solve complex Numerical Linear Algebra problems. These functions are organized in the following categories:

Note

Throughout the APIs’ descriptions, we use the following notations:

  • x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

  • If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

  • x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.

3.3.1. Triangular factorizations

rocsolver_<type>potf2()

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = j > 0, the leading minor of order j of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potf2_batched()

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_i\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potf2_strided_batched()

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_i\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrf()

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful factorization of matrix A. If info = j > 0, the leading minor of order j of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potrf_batched()

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_i\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrf_strided_batched()

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_i\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_i = U_i'U_i & \: \text{if uplo is upper, or}\\ A_i = L_iL_i' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_i\) is an upper triangular matrix and \(L_i\) is lower triangular.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i to be factored. On exit, the upper or lower triangular factors.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful factorization of matrix A_i. If info[i] = j > 0, the leading minor of order j of A_i is not positive definite. The i-th factorization stopped at this point.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2()

rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j > 0, U is singular. U[j,j] is the first zero pivot.

rocsolver_<type>getf2_batched()

rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_i\) in the batch has the form

\[ A_i = P_iL_iU_i \]

where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2_strided_batched()

rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_i\) in the batch has the form

\[ A_i = P_iL_iU_i \]

where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivots indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf()

rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension min(m,n).

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j > 0, U is singular. U[j,j] is the first zero pivot.

rocsolver_<type>getrf_batched()

rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_i\) in the batch has the form

\[ A_i = P_iL_iU_i \]

where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorizations. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivot indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_strided_batched()

rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_i\) in the batch has the form

\[ A_i = P_iL_iU_i \]

where \(P_i\) is a permutation matrix, \(L_i\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_i\) is upper triangular (upper trapezoidal if m < n).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_i to be factored. On exit, the factors L_i and U_i from the factorization. The unit diagonal elements of L_i are not stored.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors of pivots indices ipiv_i (corresponding to A_i). Dimension of ipiv_i is min(m,n). Elements of ipiv_i are 1-based indices. For each instance A_i in the batch and for 1 <= j <= min(m,n), the row j of the matrix A_i was interchanged with row ipiv_i[j]. Matrix P_i of the factorization can be derived from ipiv_i.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, U_i is singular. U_i[j,j] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytf2()

rocblas_status rocsolver_zsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTF2 computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D(k)\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D(k)\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D(k)\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D(k)\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D(k)\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D(k)\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info[i] = j > 0, D is singular. D[j,j] is the first diagonal zero.

rocsolver_<type>sytf2_batched()

rocblas_status rocsolver_zsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).

Specifically, \(U_i\) and \(L_i\) are computed as

\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytf2_strided_batched()

rocblas_status rocsolver_zsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).

Specifically, \(U_i\) and \(L_i\) are computed as

\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytrf()

rocblas_status rocsolver_zsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTRF computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D(k)\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D(k)\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D(k)\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D(k)\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D(k)\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D(k)\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info[i] = j > 0, D is singular. D[j,j] is the first diagonal zero.

rocsolver_<type>sytrf_batched()

rocblas_status rocsolver_zsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).

Specifically, \(U_i\) and \(L_i\) are computed as

\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_i in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_i.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytrf_strided_batched()

rocblas_status rocsolver_zsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_i = U_i D_i U_i^T & \: \text{or}\\ A_i = L_i D_i L_i^T & \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_i\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_i(k)\).

Specifically, \(U_i\) and \(L_i\) are computed as

\[\begin{split} \begin{array}{cl} U_i = P_i(n) U_i(n) \cdots P_i(k) U_i(k) \cdots & \: \text{and}\\ L_i = P_i(1) L_i(1) \cdots P_i(k) L_i(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_i(k)\), and \(P_i(k)\) is a permutation matrix defined by \(ipiv_i[k]\). If we let \(s\) denote the order of block \(D_i(k)\), then \(U_i(k)\) and \(L_i(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_i(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_i(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_i(k)\) is stored in \(A_i[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_i\). If \(s = 2\) and uplo is upper, then \(D_i(k)\) is stored in \(A_i[k-1,k-1]\), \(A_i[k-1,k]\), and \(A_i[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_i\). If \(s = 2\) and uplo is lower, then \(D_i(k)\) is stored in \(A_i[k,k]\), \(A_i[k+1,k]\), and \(A_i[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_i\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_i is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_i in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric matrices A_i to be factored. On exit, the block diagonal matrices D_i and the multipliers needed to compute U_i or L_i.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_i[k] > 0 then rows and columns k and ipiv_i[k] were interchanged and D_i[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_i[k] = ipiv_i[k-1] < 0 and uplo is upper (or ipiv_i[k] = ipiv_i[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_i[k] (or rows and columns k+1 and -ipiv_i[k]) were interchanged and D_i[k-1,k-1] to D_i[k,k] (or D_i[k,k] to D_i[k+1,k+1]) is a 2-by-2 diagonal block.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_i to the next one ipiv_(i+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for factorization of A_i. If info[i] = j > 0, D_i is singular. D_i[j,j] is the first diagonal zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

3.3.2. Orthogonal factorizations

rocsolver_<type>geqr2()

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>geqr2_batched()

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]

where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqr2_strided_batched()

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]

where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqrf()

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>geqrf_batched()

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]

where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqrf_strided_batched()

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}\]

where \(R_j\) is upper triangular (upper trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gerq2()

rocblas_status rocsolver_zgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GERQ2 computes a RQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>gerq2_batched()

rocblas_status rocsolver_zgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]

where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gerq2_strided_batched()

rocblas_status rocsolver_zgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]

where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gerqf()

rocblas_status rocsolver_zgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GERQF computes a RQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>gerqf_batched()

rocblas_status rocsolver_zgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]

where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gerqf_strided_batched()

rocblas_status rocsolver_zgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j \]

where \(R_j\) is upper triangular (upper trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last n-i elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geql2()

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>geql2_batched()

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]

where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]

where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i v_i' \]

where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]

where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}\]

where \(L_j\) is lower triangular (lower trapezoidal if m < n), and \(Q_j\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}' \]

where the last m-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes a LQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]

where L is lower triangular (lower trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i' v_i \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]

where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]

where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes a LQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]

where L is lower triangular (lower trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{ipiv}[i] \cdot v_i' v_i \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • [out] ipiv: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars.

rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]

where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j \]

where \(L_j\) is lower triangular (lower trapezoidal if m > n), and \(Q_j\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i} \]

where the first i-1 elements of Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

3.3.3. Problem and matrix reductions

rocsolver_<type>gebd2()

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

\[ B = Q' A P \]

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_n\: \text{and} \: P = G_1G_2\cdots G_{n-1}, & \: \text{if}\: m >= n, \:\text{or}\\ Q = H_1H_2\cdots H_{m-1}\: \text{and} \: P = G_1G_2\cdots G_{m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_i\) and \(G_i\) is given by

\[\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i] \cdot v_i v_i', & \: \text{and}\\ G_i = I - \text{taup}[i] \cdot u_i' u_i. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal are the last n - i elements of Householder vector u_i.

  • [in] lda: rocblas_int. lda >= m.

    specifies the leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • [out] tauq: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars associated with matrix Q.

  • [out] taup: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars associated with matrix P.

rocsolver_<type>gebd2_batched()

rocblas_status rocsolver_zgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_j = Q_j' A_j P_j \]

where \(B_j\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_j\) and \(P_j\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m >= n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) and \(G_{j_i}\) is given by

\[\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\); while the first i elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of Householder scalars associated with matrices P_j.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebd2_strided_batched()

rocblas_status rocsolver_zgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_j = Q_j' A_j P_j \]

where \(B_j\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_j\) and \(P_j\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m >= n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) and \(G_{j_i}\) is given by

\[\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\); while the first i elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of Householder scalars associated with matrices P_j.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebrd()

rocblas_status rocsolver_zgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBRD computes the bidiagonal form of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

\[ B = Q' A P \]

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_n\: \text{and} \: P = G_1G_2\cdots G_{n-1}, & \: \text{if}\: m >= n, \:\text{or}\\ Q = H_1H_2\cdots H_{m-1}\: \text{and} \: P = G_1G_2\cdots G_{m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_i\) and \(G_i\) is given by

\[\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i] \cdot v_i v_i', & \: \text{and}\\ G_i = I - \text{taup}[i] \cdot u_i' u_i. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal are the last n - i elements of Householder vector u_i.

  • [in] lda: rocblas_int. lda >= m.

    specifies the leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension min(m,n).

    The diagonal elements of B.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    The off-diagonal elements of B.

  • [out] tauq: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars associated with matrix Q.

  • [out] taup: pointer to type. Array on the GPU of dimension min(m,n).

    The Householder scalars associated with matrix P.

rocsolver_<type>gebrd_batched()

rocblas_status rocsolver_zgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_j = Q_j' A_j P_j \]

where \(B_j\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_j\) and \(P_j\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m >= n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) and \(G_{j_i}\) is given by

\[\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\); while the first i elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of Householder scalars associated with matrices P_j.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gebrd_strided_batched()

rocblas_status rocsolver_zgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBRD_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_j = Q_j' A_j P_j \]

where \(B_j\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_j\) and \(P_j\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m >= n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) and \(G_{j_i}\) is given by

\[\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\); while the first i elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{j_i}\) are zero, and \(u_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all the matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all the matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(j_i).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of B_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of B_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [out] tauq: pointer to type. Array on the GPU (the size depends on the value of strideQ).

    Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

  • [in] strideQ: rocblas_stride.

    Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • [out] taup: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors taup_j of Householder scalars associated with matrices P_j.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytd2()

rocblas_status rocsolver_dsytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)
rocblas_status rocsolver_ssytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)

SYTD2 computes the tridiagonal form of a real symmetric matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_{n-1} & \: \text{if uplo indicates lower, or}\\ Q = H_{n-1}H_{n-2}\cdots H_1 & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{tau}[i] \cdot v_i v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • [out] E: pointer to type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • [out] tau: pointer to type. Array on the GPU of dimension n-1.

    The Householder scalars.

rocsolver_<type>sytd2_batched()

rocblas_status rocsolver_dsytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTD2_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is symmetric tridiagonal and \(Q_j\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytd2_strided_batched()

rocblas_status rocsolver_dsytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTD2_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is symmetric tridiagonal and \(Q_j\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hetd2()

rocblas_status rocsolver_zhetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)
rocblas_status rocsolver_chetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)

HETD2 computes the tridiagonal form of a complex hermitian matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is hermitian tridiagonal and Q is an unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_{n-1} & \: \text{if uplo indicates lower, or}\\ Q = H_{n-1}H_{n-2}\cdots H_1 & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{tau}[i] \cdot v_i v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householders vector v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • [out] E: pointer to real type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • [out] tau: pointer to type. Array on the GPU of dimension n-1.

    The Householder scalars.

rocsolver_<type>hetd2_batched()

rocblas_status rocsolver_zhetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETD2_BATCHED computes the tridiagonal form of a batch of complex hermitian matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is Hermitian tridiagonal and \(Q_j\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hetd2_strided_batched()

rocblas_status rocsolver_zhetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETD2_STRIDED_BATCHED computes the tridiagonal form of a batch of complex hermitian matrices A_j.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is Hermitian tridiagonal and \(Q_j\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytrd()

rocblas_status rocsolver_dsytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)
rocblas_status rocsolver_ssytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)

SYTRD computes the tridiagonal form of a real symmetric matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_{n-1} & \: \text{if uplo indicates lower, or}\\ Q = H_{n-1}H_{n-2}\cdots H_1 & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{tau}[i] \cdot v_i v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • [out] E: pointer to type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • [out] tau: pointer to type. Array on the GPU of dimension n-1.

    The Householder scalars.

rocsolver_<type>sytrd_batched()

rocblas_status rocsolver_dsytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTRD_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is symmetric tridiagonal and \(Q_j\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sytrd_strided_batched()

rocblas_status rocsolver_dsytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)

SYTRD_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is symmetric tridiagonal and \(Q_j\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hetrd()

rocblas_status rocsolver_zhetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)
rocblas_status rocsolver_chetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)

HETRD computes the tridiagonal form of a complex hermitian matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is hermitian tridiagonal and Q is an unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_{n-1} & \: \text{if uplo indicates lower, or}\\ Q = H_{n-1}H_{n-2}\cdots H_1 & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_i\) is given by

\[ H_i = I - \text{tau}[i] \cdot v_i v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    The diagonal elements of T.

  • [out] E: pointer to real type. Array on the GPU of dimension n-1.

    The off-diagonal elements of T.

  • [out] tau: pointer to type. Array on the GPU of dimension n-1.

    The Householder scalars.

rocsolver_<type>hetrd_batched()

rocblas_status rocsolver_zhetrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETRD_BATCHED computes the tridiagonal form of a batch of complex hermitian matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is Hermitian tridiagonal and \(Q_j\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hetrd_strided_batched()

rocblas_status rocsolver_zhetrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_chetrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)

HETRD_STRIDED_BATCHED computes the tridiagonal form of a batch of complex hermitian matrices A_j.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_j\) is given by:

\[ T_j = Q_j' A_j Q_j \]

where \(T_j\) is Hermitian tridiagonal and \(Q_j\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_{n-1}} & \: \text{if uplo indicates lower, or}\\ Q_j = H_{j_{n-1}}H_{j_{n-2}}\cdots H_{j_1} & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_{j_i}\) is given by

\[ H_{j_i} = I - \text{tau}_j[i] \cdot v_{j_i} v_{j_i}' \]

where \(\text{tau}_j[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{j_i}\) are zero, and \(v_{j_i}[i] = 1\).

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the hermitian matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_j; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(j_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_j; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(j_i) stored as columns.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The diagonal elements of T_j.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    The off-diagonal elements of T_j.

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • [out] tau: pointer to type. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors tau_j of corresponding Householder scalars.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector tau_j to the next one tau_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= n-1.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygs2()

rocblas_status rocsolver_dsygs2(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb)
rocblas_status rocsolver_ssygs2(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb)

SYGS2 reduces a real symmetric-definite generalized eigenproblem to standard form.

(This is the unblocked version of the algorithm).

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U^{-T} A U^{-1}, & \: \text{or}\\ L^{-1} A L^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix B has been factorized as either \(U^T U\) or \(L L^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U A U^T, & \: \text{or}\\ L^T A L, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored, and whether the factorization applied to B was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the transformed matrix associated with the equivalent standard eigenvalue problem.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    The triangular factor of the matrix B, as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

rocsolver_<type>sygs2_batched()

rocblas_status rocsolver_dsygs2_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_ssygs2_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, const rocblas_int batch_count)

SYGS2_BATCHED reduces a batch of real symmetric-definite generalized eigenproblems to standard form.

(This is the unblocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-T} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix \(B_i\) has been factorized as either \(U_i^T U_i\) or \(L_i L_i^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^T, & \: \text{or}\\ L_i^T A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    The triangular factors of the matrices B_i, as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygs2_strided_batched()

rocblas_status rocsolver_dsygs2_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_ssygs2_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

SYGS2_STRIDED_BATCHED reduces a batch of real symmetric-definite generalized eigenproblems to standard form.

(This is the unblocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-T} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix \(B_i\) has been factorized as either \(U_i^T U_i\) or \(L_i L_i^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^T, & \: \text{or}\\ L_i^T A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    The triangular factors of the matrices B_i, as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*n.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegs2()

rocblas_status rocsolver_zhegs2(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_chegs2(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb)

HEGS2 reduces a hermitian-definite generalized eigenproblem to standard form.

(This is the unblocked version of the algorithm).

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U^{-H} A U^{-1}, & \: \text{or}\\ L^{-1} A L^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix B has been factorized as either \(U^H U\) or \(L L^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U A U^H, & \: \text{or}\\ L^H A L, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored, and whether the factorization applied to B was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the transformed matrix associated with the equivalent standard eigenvalue problem.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    The triangular factor of the matrix B, as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

rocsolver_<type>hegs2_batched()

rocblas_status rocsolver_zhegs2_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_chegs2_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)

HEGS2_BATCHED reduces a batch of hermitian-definite generalized eigenproblems to standard form.

(This is the unblocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-H} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix \(B_i\) has been factorized as either \(U_i^H U_i\) or \(L_i L_i^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^H, & \: \text{or}\\ L_i^H A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    The triangular factors of the matrices B_i, as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegs2_strided_batched()

rocblas_status rocsolver_zhegs2_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_chegs2_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

HEGS2_STRIDED_BATCHED reduces a batch of hermitian-definite generalized eigenproblems to standard form.

(This is the unblocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-H} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix \(B_i\) has been factorized as either \(U_i^H U_i\) or \(L_i L_i^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^H, & \: \text{or}\\ L_i^H A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    The triangular factors of the matrices B_i, as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*n.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygst()

rocblas_status rocsolver_dsygst(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb)
rocblas_status rocsolver_ssygst(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb)

SYGST reduces a real symmetric-definite generalized eigenproblem to standard form.

(This is the blocked version of the algorithm).

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U^{-T} A U^{-1}, & \: \text{or}\\ L^{-1} A L^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix B has been factorized as either \(U^T U\) or \(L L^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U A U^T, & \: \text{or}\\ L^T A L, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored, and whether the factorization applied to B was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the transformed matrix associated with the equivalent standard eigenvalue problem.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    The triangular factor of the matrix B, as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

rocsolver_<type>sygst_batched()

rocblas_status rocsolver_dsygst_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_ssygst_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, const rocblas_int batch_count)

SYGST_BATCHED reduces a batch of real symmetric-definite generalized eigenproblems to standard form.

(This is the blocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-T} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix \(B_i\) has been factorized as either \(U_i^T U_i\) or \(L_i L_i^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^T, & \: \text{or}\\ L_i^T A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    The triangular factors of the matrices B_i, as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygst_strided_batched()

rocblas_status rocsolver_dsygst_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_ssygst_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

SYGST_STRIDED_BATCHED reduces a batch of real symmetric-definite generalized eigenproblems to standard form.

(This is the blocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-T} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-T}, \end{array} \end{split}\]

where the symmetric-definite matrix \(B_i\) has been factorized as either \(U_i^T U_i\) or \(L_i L_i^T\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^T, & \: \text{or}\\ L_i^T A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    The triangular factors of the matrices B_i, as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*n.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegst()

rocblas_status rocsolver_zhegst(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_chegst(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb)

HEGST reduces a hermitian-definite generalized eigenproblem to standard form.

(This is the blocked version of the algorithm).

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U^{-H} A U^{-1}, & \: \text{or}\\ L^{-1} A L^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix B has been factorized as either \(U^H U\) or \(L L^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U A U^H, & \: \text{or}\\ L^H A L, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored, and whether the factorization applied to B was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the transformed matrix associated with the equivalent standard eigenvalue problem.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    The triangular factor of the matrix B, as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

rocsolver_<type>hegst_batched()

rocblas_status rocsolver_zhegst_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_chegst_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)

HEGST_BATCHED reduces a batch of hermitian-definite generalized eigenproblems to standard form.

(This is the blocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-H} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix \(B_i\) has been factorized as either \(U_i^H U_i\) or \(L_i L_i^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^H, & \: \text{or}\\ L_i^H A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    The triangular factors of the matrices B_i, as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegst_strided_batched()

rocblas_status rocsolver_zhegst_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_chegst_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

HEGST_STRIDED_BATCHED reduces a batch of hermitian-definite generalized eigenproblems to standard form.

(This is the blocked version of the algorithm).

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype.

If the problem is of the 1st form, then \(A_i\) is overwritten with

\[\begin{split} \begin{array}{cl} U_i^{-H} A_i U_i^{-1}, & \: \text{or}\\ L_i^{-1} A_i L_i^{-H}, \end{array} \end{split}\]

where the hermitian-definite matrix \(B_i\) has been factorized as either \(U_i^H U_i\) or \(L_i L_i^H\) as returned by POTRF, depending on the value of uplo.

If the problem is of the 2nd or 3rd form, then A is overwritten with

\[\begin{split} \begin{array}{cl} U_i A_i U_i^H, & \: \text{or}\\ L_i^H A_i L_i, \end{array} \end{split}\]

also depending on the value of uplo.

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_i are stored, and whether the factorization applied to B_i was upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i. On exit, the transformed matrices associated with the equivalent standard eigenvalue problems.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    The triangular factors of the matrices B_i, as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*n.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

3.3.4. Linear-systems solvers

rocsolver_<type>trtri()

rocblas_status rocsolver_ztrtri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_ctrtri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dtrtri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_strtri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

TRTRI inverts a triangular n-by-n matrix A.

A can be upper or lower triangular, depending on the value of uplo, and unit or non-unit triangular, depending on the value of diag.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] diag: rocblas_diagonal.

    If diag indicates unit, then the diagonal elements of A are not referenced and assumed to be one.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the triangular matrix. On exit, the inverse of A if info = 0.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, A is singular. A[i,i] is the first zero element in the diagonal.

rocsolver_<type>trtri_batched()

rocblas_status rocsolver_ztrtri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ctrtri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dtrtri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_strtri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

TRTRI_BATCHED inverts a batch of triangular n-by-n matrices \(A_j\).

\(A_j\) can be upper or lower triangular, depending on the value of uplo, and unit or non-unit triangular, depending on the value of diag.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_j are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] diag: rocblas_diagonal.

    If diag indicates unit, then the diagonal elements of matrices A_j are not referenced and assumed to be one.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the triangular matrices A_j. On exit, the inverses of A_j if info[j] = 0.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, A_j is singular. A_j[i,i] is the first zero element in the diagonal.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>trtri_strided_batched()

rocblas_status rocsolver_ztrtri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ctrtri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dtrtri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_strtri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_diagonal diag, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

TRTRI_STRIDED_BATCHED inverts a batch of triangular n-by-n matrices \(A_j\).

\(A_j\) can be upper or lower triangular, depending on the value of uplo, and unit or non-unit triangular, depending on the value of diag.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the matrices A_j are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] diag: rocblas_diagonal.

    If diag indicates unit, then the diagonal elements of matrices A_j are not referenced and assumed to be one.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the triangular matrices A_j. On exit, the inverses of A_j if info[j] = 0.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, A_j is singular. A_j[i,i] is the first zero element in the diagonal.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri()

rocblas_status rocsolver_zgetri(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_cgetri(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dgetri(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_sgetri(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

GETRI inverts a general n-by-n matrix A using the LU factorization computed by GETRF.

The inverse is computed by solving the linear system

\[ A^{-1}L = U^{-1} \]

where L is the lower triangular factor of A with unit diagonal elements, and U is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the factors L and U of the factorization A = P*L*U returned by

    GETRF. On exit, the inverse of A if info = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by

    GETRF.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getri_batched()

rocblas_status rocsolver_zgetri_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetri_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetri_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetri_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRI_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_BATCHED.

The inverse of matrix \(A_j\) in the batch is computed by solving the linear system

\[ A_j^{-1} L_j = U_j^{-1} \]

where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the factors L_j and U_j of the factorization A = P_j*L_j*U_j returned by

    GETRF_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by

    GETRF_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(i+j). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_strided_batched()

rocblas_status rocsolver_zgetri_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetri_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetri_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetri_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRI_STRIDED_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_STRIDED_BATCHED.

The inverse of matrix \(A_j\) in the batch is computed by solving the linear system

\[ A_j^{-1} L_j = U_j^{-1} \]

where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by

    GETRF_STRIDED_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by

    GETRF_STRIDED_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrs()

rocblas_status rocsolver_zgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_int *ipiv, rocblas_double_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_cgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_int *ipiv, rocblas_float_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_dgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_int *ipiv, double *B, const rocblas_int ldb)
rocblas_status rocsolver_sgetrs(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_int *ipiv, float *B, const rocblas_int ldb)

GETRS solves a system of n linear equations on n variables in its factorized form.

It solves one of the following systems, depending on the value of trans:

\[\begin{split} \begin{array}{cl} A X = B & \: \text{not transposed,}\\ A^T X = B & \: \text{transposed, or}\\ A^H X = B & \: \text{conjugate transposed.} \end{array} \end{split}\]

Matrix A is defined by its triangular factors as returned by GETRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of the matrix B.

  • [in] A: pointer to type. Array on the GPU of dimension lda*n.

    The factors L and U of the factorization A = P*L*U returned by

    GETRF.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by

    GETRF.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrix B. On exit, the solution matrix X.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of B.

rocsolver_<type>getrs_batched()

rocblas_status rocsolver_zgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, double *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrs_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, const rocblas_int *ipiv, const rocblas_stride strideP, float *const B[], const rocblas_int ldb, const rocblas_int batch_count)

GETRS_BATCHED solves a batch of systems of n linear equations on n variables in its factorized forms.

For each instance j in the batch, it solves one of the following systems, depending on the value of trans:

\[\begin{split} \begin{array}{cl} A_j X_j = B_j & \: \text{not transposed,}\\ A_j^T X_j = B_j & \: \text{transposed, or}\\ A_j^H X_j = B_j & \: \text{conjugate transposed.} \end{array} \end{split}\]

Matrix \(A_j\) is defined by its triangular factors as returned by GETRF_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations of each instance in the batch.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by

    GETRF_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of pivot indices returned by

    GETRF_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [inout] B: Array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>getrs_strided_batched()

rocblas_status rocsolver_zgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, double *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrs_strided_batched(rocblas_handle handle, const rocblas_operation trans, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, const rocblas_int *ipiv, const rocblas_stride strideP, float *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

GETRS_STRIDED_BATCHED solves a batch of systems of n linear equations on n variables in its factorized forms.

For each instance j in the batch, it solves one of the following systems, depending on the value of trans:

\[\begin{split} \begin{array}{cl} A_j X_j = B_j & \: \text{not transposed,}\\ A_j^T X_j = B_j & \: \text{transposed, or}\\ A_j^H X_j = B_j & \: \text{conjugate transposed.} \end{array} \end{split}\]

Matrix \(A_j\) is defined by its triangular factors as returned by GETRF_STRIDED_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations of each instance in the batch.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by

    GETRF_STRIDED_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [in] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    Contains the vectors ipiv_j of pivot indices returned by

    GETRF_STRIDED_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [inout] B: pointer to type. Array on the GPU (size depends on the value of strideB).

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>gesv()

rocblas_status rocsolver_zgesv(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_double_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_cgesv(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_float_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_dgesv(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, rocblas_int *ipiv, double *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_sgesv(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, rocblas_int *ipiv, float *B, const rocblas_int ldb, rocblas_int *info)

GESV solves a general system of n linear equations on n variables.

The linear system is of the form

\[ A X = B \]

where A is a general n-by-n matrix. Matrix A is first factorized in triangular factors L and U using GETRF; then, the solution is computed with GETRS.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of the matrix B.

  • [in] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, if info = 0, the factors L and U of the LU decomposition of A returned by

    GETRF.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by

    GETRF.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrix B. On exit, the solution matrix X.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of B.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular, and the solution could not be computed. U[i,i] is the first zero element in the diagonal.

rocsolver_<type>gesv_batched()

rocblas_status rocsolver_zgesv_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesv_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesv_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, double *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesv_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, float *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)

GESV_BATCHED solves a batch of general systems of n linear equations on n variables.

The linear systems are of the form

\[ A_j X_j = B_j \]

where \(A_j\) is a general n-by-n matrix. Matrix \(A_j\) is first factorized in triangular factors \(L_j\) and \(U_j\) using GETRF_BATCHED; then, the solutions are computed with GETRS_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, if info_j = 0, the factors L_j and U_j of the LU decomposition of A_j returned by

    GETRF_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The vectors ipiv_j of pivot indices returned by

    GETRF_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [inout] B: Array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for A_j. If info[i] = j > 0, U_i is singular, and the solution could not be computed. U_j[i,i] is the first zero element in the diagonal.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>gesv_strided_batched()

rocblas_status rocsolver_zgesv_strided_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesv_strided_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesv_strided_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, double *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesv_strided_batched(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, float *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)

GESV_STRIDED_BATCHED solves a batch of general systems of n linear equations on n variables.

The linear systems are of the form

\[ A_j X_j = B_j \]

where \(A_j\) is a general n-by-n matrix. Matrix \(A_j\) is first factorized in triangular factors \(L_j\) and \(U_j\) using GETRF_STRIDED_BATCHED; then, the solutions are computed with GETRS_STRIDED_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, if info_j = 0, the factors L_j and U_j of the LU decomposition of A_j returned by

    GETRF_STRIDED_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] ipiv: pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The vectors ipiv_j of pivot indices returned by

    GETRF_STRIDED_BATCHED.

  • [in] strideP: rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • [inout] B: pointer to type. Array on the GPU (size depends on the value of strideB).

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for A_j. If info[i] = j > 0, U_i is singular, and the solution could not be computed. U_j[i,i] is the first zero element in the diagonal.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>potri()

rocblas_status rocsolver_zpotri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_cpotri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_dpotri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)
rocblas_status rocsolver_spotri(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)

POTRI inverts a symmetric/hermitian positive definite matrix A.

The inverse of matrix \(A\) is computed as

\[\begin{split} \begin{array}{cl} A^{-1} = U^{-1} {U^{-1}}' & \: \text{if uplo is upper, or}\\ A^{-1} = {L^{-1}}' L^{-1} & \: \text{if uplo is lower.} \end{array} \end{split}\]

where \(U\) or \(L\) is the triangular factor of the Cholesky factorization of \(A\) returned by POTRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the factor L or U of the Cholesky factorization of A returned by

    POTRF. On exit, the inverses of A if info = 0.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit for inversion of A. If info = j > 0, A is singular. L[j,j] or U[j,j] is zero.

rocsolver_<type>potri_batched()

rocblas_status rocsolver_zpotri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotri_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)

POTRI_BATCHED inverts a batch of symmetric/hermitian positive definite matrices \(A_i\).

The inverse of matrix \(A_i\) in the batch is computed as

\[\begin{split} \begin{array}{cl} A_i^{-1} = U_i^{-1} {U_i^{-1}}' & \: \text{if uplo is upper, or}\\ A_i^{-1} = {L_i^{-1}}' L_i^{-1} & \: \text{if uplo is lower.} \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is the triangular factor of the Cholesky factorization of \(A_i\) returned by POTRF_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the factor L_i or U_i of the Cholesky factorization of A_i returned by

    POTRF_BATCHED. On exit, the inverses of A_i if info[i] = 0.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for inversion of A_i. If info[i] = j > 0, A_i is singular. L_i[j,j] or U_i[j,j] is zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potri_strided_batched()

rocblas_status rocsolver_zpotri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cpotri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dpotri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_spotri_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)

POTRI_STRIDED_BATCHED inverts a batch of symmetric/hermitian positive definite matrices \(A_i\).

The inverse of matrix \(A_i\) in the batch is computed as

\[\begin{split} \begin{array}{cl} A_i^{-1} = U_i^{-1} {U_i^{-1}}' & \: \text{if uplo is upper, or}\\ A_i^{-1} = {L_i^{-1}}' L_i^{-1} & \: \text{if uplo is lower.} \end{array} \end{split}\]

where \(U_i\) or \(L_i\) is the triangular factor of the Cholesky factorization of \(A_i\) returned by POTRF_STRIDED_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The number of rows and columns of matrix A_i.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the factor L_i or U_i of the Cholesky factorization of A_i returned by

    POTRF_STRIDED_BATCHED. On exit, the inverses of A_i if info[i] = 0.

  • [in] lda: rocblas_int. lda >= n.

    specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for inversion of A_i. If info[i] = j > 0, A_i is singular. L_i[j,j] or U_i[j,j] is zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>potrs()

rocblas_status rocsolver_zpotrs(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_cpotrs(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb)
rocblas_status rocsolver_dpotrs(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, double *B, const rocblas_int ldb)
rocblas_status rocsolver_spotrs(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, float *B, const rocblas_int ldb)

POTRS solves a symmetric/hermitian system of n linear equations on n variables in its factorized form.

It solves the system

\[ A X = B \]

where A is a real symmetric (complex hermitian) positive definite matrix defined by its triangular factor

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

as returned by POTRF.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of the matrix B.

  • [in] A: pointer to type. Array on the GPU of dimension lda*n.

    The factor L or U of the Cholesky factorization of A returned by

    POTRF.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrix B. On exit, the solution matrix X.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of B.

rocsolver_<type>potrs_batched()

rocblas_status rocsolver_zpotrs_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrs_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrs_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, const rocblas_int batch_count)
rocblas_status rocsolver_spotrs_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, const rocblas_int batch_count)

POTRS_BATCHED solves a batch of symmetric/hermitian systems of n linear equations on n variables in its factorized forms.

For each instance j in the batch, it solves the system

\[ A_j X_j = B_j \]

where \(A_j\) is a real symmetric (complex hermitian) positive definite matrix defined by its triangular factor

\[\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}\]

as returned by POTRF_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    The factor L_j or U_j of the Cholesky factorization of A_j returned by

    POTRF_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [inout] B: Array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>potrs_strided_batched()

rocblas_status rocsolver_zpotrs_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_cpotrs_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_dpotrs_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)
rocblas_status rocsolver_spotrs_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, const rocblas_int batch_count)

POTRS_STRIDED_BATCHED solves a batch of symmetric/hermitian systems of n linear equations on n variables in its factorized forms.

For each instance j in the batch, it solves the system

\[ A_j X_j = B_j \]

where \(A_j\) is a real symmetric (complex hermitian) positive definite matrix defined by its triangular factor

\[\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}\]

as returned by POTRF_STRIDED_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    The factor L_j or U_j of the Cholesky factorization of A_j returned by

    POTRF_STRIDED_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [inout] B: pointer to type. Array on the GPU (size depends on the value of strideB).

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>posv()

rocblas_status rocsolver_zposv(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_cposv(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_dposv(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_sposv(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, rocblas_int *info)

POSV solves a symmetric/hermitian system of n linear equations on n variables.

It solves the system

\[ A X = B \]

where A is a real symmetric (complex hermitian) positive definite matrix. Matrix A is first factorized as \(A=LL'\) or \(A=U'U\), depending on the value of uplo, using POTRF; then, the solution is computed with POTRS.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of the matrix B.

  • [in] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric/hermitian matrix A. On exit, if info = 0, the factor L or U of the Cholesky factorization of A returned by

    POTRF.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of A.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrix B. On exit, the solution matrix X.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of B.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j > 0, the leading minor of order j of A is not positive definite. The solution could not be computed.

rocsolver_<type>posv_batched()

rocblas_status rocsolver_zposv_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cposv_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dposv_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sposv_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)

POSV_BATCHED solves a batch of symmetric/hermitian systems of n linear equations on n variables.

For each instance j in the batch, it solves the system

\[ A_j X_j = B_j \]

where \(A_j\) is a real symmetric (complex hermitian) positive definite matrix. Matrix \(A_j\) is first factorized as \(A_j=L_jL_j'\) or \(A_j=U_j'U_j\), depending on the value of uplo, using POTRF_BATCHED; then, the solution is computed with POTRS_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric/hermitian matrices A_j. On exit, if info[j] = 0, the factor L_j or U_j of the Cholesky factorization of A_j returned by

    POTRF_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [inout] B: Array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th solution could not be computed.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

rocsolver_<type>posv_strided_batched()

rocblas_status rocsolver_zposv_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cposv_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dposv_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sposv_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)

POSV_STRIDED_BATCHED solves a batch of symmetric/hermitian systems of n linear equations on n variables.

For each instance j in the batch, it solves the system

\[ A_j X_j = B_j \]

where \(A_j\) is a real symmetric (complex hermitian) positive definite matrix. Matrix \(A_j\) is first factorized as \(A_j=L_jL_j'\) or \(A_j=U_j'U_j\), depending on the value of uplo, using POTRF_STRIDED_BATCHED; then, the solution is computed with POTRS_STRIDED_BATCHED.

Parameters
  • [in] handle: rocblas_handle.

  • [in] uplo: rocblas_fill.

    Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    The order of the system, i.e. the number of columns and rows of all A_j matrices.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of right hand sides, i.e., the number of columns of all the matrices B_j.

  • [in] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric/hermitian matrices A_j. On exit, if info[j] = 0, the factor L_j or U_j of the Cholesky factorization of A_j returned by

    POTRF_STRIDED_BATCHED.

  • [in] lda: rocblas_int. lda >= n.

    The leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [inout] B: pointer to type. Array on the GPU (size depends on the value of strideB).

    On entry, the right hand side matrices B_j. On exit, the solution matrix X_j of each system in the batch.

  • [in] ldb: rocblas_int. ldb >= n.

    The leading dimension of matrices B_j.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th solution could not be computed.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of instances (systems) in the batch.

3.3.5. Least-squares solvers

rocsolver_<type>gels()

rocblas_status rocsolver_zgels(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_cgels(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_dgels(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, rocblas_int *info)
rocblas_status rocsolver_sgels(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, rocblas_int *info)

GELS solves an overdetermined (or underdetermined) linear system defined by an m-by-n matrix A, and a corresponding matrix B, using the QR factorization computed by GEQRF (or the LQ factorization computed by GELQF).

Depending on the value of trans, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = B & \: \text{not transposed, or}\\ A' X = B & \: \text{transposed if real, or conjugate transposed if complex} \end{array} \end{split}\]

If m >= n (or m < n in the case of transpose/conjugate transpose), the system is overdetermined and a least-squares solution approximating X is found by minimizing

\[ || B - A X || \quad \text{(or} \: || B - A' X ||\text{)} \]

If m < n (or m >= n in the case of transpose/conjugate transpose), the system is underdetermined and a unique solution for X is chosen such that \(|| X ||\) is minimal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of matrix A.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of columns of matrices B and X; i.e., the columns on the right hand side.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the QR (or LQ) factorization of A as returned by

    GEQRF (or GELQF).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrix A.

  • [inout] B: pointer to type. Array on the GPU of dimension ldb*nrhs.

    On entry, the matrix B. On exit, when info = 0, B is overwritten by the solution vectors (and the residuals in the overdetermined cases) stored as columns.

  • [in] ldb: rocblas_int. ldb >= max(m,n).

    Specifies the leading dimension of matrix B.

  • [out] info: pointer to rocblas_int on the GPU.

    If info = 0, successful exit. If info = j > 0, the solution could not be computed because input matrix A is rank deficient; the j-th diagonal element of its triangular factor is zero.

rocsolver_<type>gels_batched()

rocblas_status rocsolver_zgels_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgels_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgels_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgels_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, rocblas_int *info, const rocblas_int batch_count)

GELS_BATCHED solves a batch of overdetermined (or underdetermined) linear systems defined by a set of m-by-n matrices \(A_i\), and corresponding matrices \(B_i\), using the QR factorizations computed by GEQRF_BATCHED (or the LQ factorizations computed by GELQF_BATCHED).

For each instance in the batch, depending on the value of trans, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = B_i & \: \text{not transposed, or}\\ A_i' X_i = B_i & \: \text{transposed if real, or conjugate transposed if complex} \end{array} \end{split}\]

If m >= n (or m < n in the case of transpose/conjugate transpose), the system is overdetermined and a least-squares solution approximating X_i is found by minimizing

\[ || B_i - A_i X_i || \quad \text{(or} \: || B_i - A_i' X_i ||\text{)} \]

If m < n (or m >= n in the case of transpose/conjugate transpose), the system is underdetermined and a unique solution for X_i is chosen such that \(|| X_i ||\) is minimal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of columns of all matrices B_i and X_i in the batch; i.e., the columns on the right hand side.

  • [inout] A: array of pointer to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_i. On exit, the QR (or LQ) factorizations of A_i as returned by

    GEQRF_BATCHED (or GELQF_BATCHED).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [inout] B: array of pointer to type. Each pointer points to an array on the GPU of dimension ldb*nrhs.

    On entry, the matrices B_i. On exit, when info[i] = 0, B_i is overwritten by the solution vectors (and the residuals in the overdetermined cases) stored as columns.

  • [in] ldb: rocblas_int. ldb >= max(m,n).

    Specifies the leading dimension of matrices B_i.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for solution of A_i. If info[i] = j > 0, the solution of A_i could not be computed because input matrix A_i is rank deficient; the j-th diagonal element of its triangular factor is zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gels_strided_batched()

rocblas_status rocsolver_zgels_strided_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgels_strided_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgels_strided_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgels_strided_batched(rocblas_handle handle, rocblas_operation trans, const rocblas_int m, const rocblas_int n, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_int *info, const rocblas_int batch_count)

GELS_STRIDED_BATCHED solves a batch of overdetermined (or underdetermined) linear systems defined by a set of m-by-n matrices \(A_i\), and corresponding matrices \(B_i\), using the QR factorizations computed by GEQRF_STRIDED_BATCHED (or the LQ factorizations computed by GELQF_STRIDED_BATCHED).

For each instance in the batch, depending on the value of trans, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = B_i & \: \text{not transposed, or}\\ A_i' X_i = B_i & \: \text{transposed if real, or conjugate transposed if complex} \end{array} \end{split}\]

If m >= n (or m < n in the case of transpose/conjugate transpose), the system is overdetermined and a least-squares solution approximating X_i is found by minimizing

\[ || B_i - A_i X_i || \quad \text{(or} \: || B_i - A_i' X_i ||\text{)} \]

If m < n (or m >= n in the case of transpose/conjugate transpose), the system is underdetermined and a unique solution for X_i is chosen such that \(|| X_i ||\) is minimal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] trans: rocblas_operation.

    Specifies the form of the system of equations.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_i in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_i in the batch.

  • [in] nrhs: rocblas_int. nrhs >= 0.

    The number of columns of all matrices B_i and X_i in the batch; i.e., the columns on the right hand side.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_i. On exit, the QR (or LQ) factorizations of A_i as returned by

    GEQRF_STRIDED_BATCHED (or GELQF_STRIDED_BATCHED).

  • [in] lda: rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • [inout] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the matrices B_i. On exit, when info = 0, each B_i is overwritten by the solution vectors (and the residuals in the overdetermined cases) stored as columns.

  • [in] ldb: rocblas_int. ldb >= max(m,n).

    Specifies the leading dimension of matrices B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nrhs

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit for solution of A_i. If info[i] = j > 0, the solution of A_i could not be computed because input matrix A_i is rank deficient; the j-th diagonal element of its triangular factor is zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

3.3.6. Symmetric eigensolvers

rocsolver_<type>syev()

rocblas_status rocsolver_dsyev(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssyev(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, rocblas_int *info)

SYEV computes the eigenvalues and optionally the eigenvectors of a real symmetric matrix A.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • [out] E: pointer to type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with A. On exit, if info > 0, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues of A (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, the algorithm did not converge. i elements of E did not converge to zero.

rocsolver_<type>syev_batched()

rocblas_status rocsolver_dsyev_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssyev_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYEV_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0, the algorithm did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>syev_strided_batched()

rocblas_status rocsolver_dsyev_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssyev_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYEV_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0, the algorithm did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heev()

rocblas_status rocsolver_zheev(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_cheev(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_int *info)

HEEV computes the eigenvalues and optionally the eigenvectors of a Hermitian matrix A.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • [out] E: pointer to real type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with A. On exit, if info > 0, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues of A (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, the algorithm did not converge. i elements of E did not converge to zero.

rocsolver_<type>heev_batched()

rocblas_status rocsolver_zheev_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cheev_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEEV_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of Hermitian matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0, the algorithm did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heev_strided_batched()

rocblas_status rocsolver_zheev_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cheev_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEEV_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of Hermitian matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0, the algorithm did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>syevd()

rocblas_status rocsolver_dsyevd(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssyevd(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, rocblas_int *info)

SYEVD computes the eigenvalues and optionally the eigenvectors of a real symmetric matrix A.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • [out] E: pointer to type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with A. On exit, if info > 0, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues of A (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E did not converge to zero. If info = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

rocsolver_<type>syevd_batched()

rocblas_status rocsolver_dsyevd_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssyevd_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYEVD_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E_j did not converge to zero. If info[j] = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>syevd_strided_batched()

rocblas_status rocsolver_dsyevd_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssyevd_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYEVD_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E_j did not converge to zero. If info[j] = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heevd()

rocblas_status rocsolver_zheevd(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_cheevd(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_int *info)

HEEVD computes the eigenvalues and optionally the eigenvectors of a Hermitian matrix A.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • [out] E: pointer to real type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with A. On exit, if info > 0, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues of A (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E did not converge to zero. If info = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

rocsolver_<type>heevd_batched()

rocblas_status rocsolver_zheevd_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cheevd_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEEVD_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of Hermitian matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E_j did not converge to zero. If info[j] = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heevd_strided_batched()

rocblas_status rocsolver_zheevd_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cheevd_strided_batched(rocblas_handle handle, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEEVD_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of Hermitian matrices A_j.

The eigenvalues are returned in ascending order. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect. The computed eigenvectors are orthonormal.

Parameters
  • [in] handle: rocblas_handle.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • [in] n: rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    The eigenvalues of A_j in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_j associated with A_j. On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of T_j (or properly speaking, a tridiagonal matrix equivalent to T_j). The diagonal elements of this matrix are in D_j; those that converged correspond to a subset of the eigenvalues of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = i > 0 and evect is rocblas_evect_none, the algorithm did not converge. i elements of E_j did not converge to zero. If info[j] = i > 0 and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [i/(n+1), i/(n+1)] to [i%(n+1), i%(n+1)].

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygv()

rocblas_status rocsolver_dsygv(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssygv(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, float *D, float *E, rocblas_int *info)

SYGV computes the eigenvalues and (optionally) eigenvectors of a real generalized symmetric-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^T B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^T B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • [out] E: pointer to type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with the reduced eigenvalue problem. On exit, if 0 < info <= n, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info = n + j, the leading minor of order j of B is not positive definite.

rocsolver_<type>sygv_batched()

rocblas_status rocsolver_dsygv_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssygv_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYGV_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^T B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^T B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, E_i contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch instance i. If info[i] = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygv_strided_batched()

rocblas_status rocsolver_dsygv_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssygv_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYGV_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^T B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^T B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the symmetric positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegv()

rocblas_status rocsolver_zhegv(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_chegv(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, float *D, float *E, rocblas_int *info)

HEGV computes the eigenvalues and (optionally) eigenvectors of a complex generalized hermitian-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^H B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^H B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the hermitian matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • [out] E: pointer to real type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with the reduced eigenvalue problem. On exit, if 0 < info <= n, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info = n + j, the leading minor of order j of B is not positive definite.

rocsolver_<type>hegv_batched()

rocblas_status rocsolver_zhegv_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_chegv_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEGV_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^H B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^H B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the hermitian matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegv_strided_batched()

rocblas_status rocsolver_zhegv_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_chegv_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEGV_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^H B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^H B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the hermitian matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the hermitian positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygvd()

rocblas_status rocsolver_dsygvd(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_ssygvd(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, float *D, float *E, rocblas_int *info)

SYGVD computes the eigenvalues and (optionally) eigenvectors of a real generalized symmetric-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^T B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^T B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • [out] D: pointer to type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • [out] E: pointer to type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with the reduced eigenvalue problem. On exit, if 0 < info <= n, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info = n + j, the leading minor of order j of B is not positive definite.

rocsolver_<type>sygvd_batched()

rocblas_status rocsolver_dsygvd_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssygvd_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYGVD_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^T B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^T B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygvd_strided_batched()

rocblas_status rocsolver_dsygvd_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssygvd_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

SYGVD_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^T B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^T B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the symmetric positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • [out] D: pointer to type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegvd()

rocblas_status rocsolver_zhegvd(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, double *D, double *E, rocblas_int *info)
rocblas_status rocsolver_chegvd(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, float *D, float *E, rocblas_int *info)

HEGVD computes the eigenvalues and (optionally) eigenvectors of a complex generalized hermitian-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^H B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^H B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblem.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the hermitian matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • [out] B: pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • [out] D: pointer to real type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • [out] E: pointer to real type. Array on the GPU of dimension n.

    This array is used to work internally with the tridiagonal matrix T associated with the reduced eigenvalue problem. On exit, if 0 < info <= n, it contains the unconverged off-diagonal elements of T (or properly speaking, a tridiagonal matrix equivalent to T). The diagonal elements of this matrix are in D; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info = n + j, the leading minor of order j of B is not positive definite.

rocsolver_<type>hegvd_batched()

rocblas_status rocsolver_zhegvd_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_chegvd_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEGVD_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^H B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^H B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the hermitian matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [out] B: array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegvd_strided_batched()

rocblas_status rocsolver_zhegvd_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_chegvd_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_int *info, const rocblas_int batch_count)

HEGVD_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_i X_i = \lambda B_i X_i & \: \text{1st form,}\\ A_i B_i X_i = \lambda X_i & \: \text{2nd form, or}\\ B_i A_i X_i = \lambda X_i & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvectors are computed using a divide-and-conquer algorithm, depending on the value of evect.

When computed, the matrix \(Z_i\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_i^H B_i Z_i=I & \: \text{if 1st or 2nd form, or}\\ Z_i^H B_i^{-1} Z_i=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters
  • [in] handle: rocblas_handle.

  • [in] itype: rocblas_eform.

    Specifies the form of the generalized eigenproblems.

  • [in] evect: rocblas_evect.

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • [in] uplo: rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_i and B_i are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_i and B_i are not used.

  • [in] n: rocblas_int. n >= 0.

    The matrix dimensions.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the hermitian matrices A_i. On exit, if evect is original, the normalized matrix Z_i of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_i (including the diagonal) are destroyed, depending on the value of uplo.

  • [in] lda: rocblas_int. lda >= n.

    Specifies the leading dimension of A_i.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_i to the next one A_(i+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • [out] B: pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the hermitian positive definite matrices B_i. On exit, the triangular factor of B_i as returned by

    POTRF_STRIDED_BATCHED.

  • [in] ldb: rocblas_int. ldb >= n.

    Specifies the leading dimension of B_i.

  • [in] strideB: rocblas_stride.

    Stride from the start of one matrix B_i to the next one B_(i+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • [out] D: pointer to real type. Array on the GPU (the size depends on the value of strideD).

    On exit, the eigenvalues in increasing order.

  • [in] strideD: rocblas_stride.

    Stride from the start of one vector D_i to the next one D_(i+1). There is no restriction for the value of strideD. Normal use is strideD >= n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the tridiagonal matrix T_i associated with the ith reduced eigenvalue problem. On exit, if 0 < info[i] <= n, it contains the unconverged off-diagonal elements of T_i (or properly speaking, a tridiagonal matrix equivalent to T_i). The diagonal elements of this matrix are in D_i; those that converged correspond to a subset of the eigenvalues (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_i to the next one E_(i+1). There is no restriction for the value of strideE. Normal use is strideE >= n.

  • [out] info: pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[i] = 0, successful exit of batch i. If info[i] = j <= n and evect is rocblas_evect_none, j off-diagonal elements of an intermediate tridiagonal form did not converge to zero. If info[i] = j <= n and evect is rocblas_evect_original, the algorithm failed to compute an eigenvalue in the submatrix from [j/(n+1), j/(n+1)] to [j%(n+1), j%(n+1)]. If info[i] = n + j, the leading minor of order j of B_i is not positive definite.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

3.3.7. Singular value decomposition

rocsolver_<type>gesvd()

rocblas_status rocsolver_zgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *S, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *V, const rocblas_int ldv, double *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_cgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *S, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *V, const rocblas_int ldv, float *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_dgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *S, double *U, const rocblas_int ldu, double *V, const rocblas_int ldv, double *E, const rocblas_workmode fast_alg, rocblas_int *info)
rocblas_status rocsolver_sgesvd(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *S, float *U, const rocblas_int ldu, float *V, const rocblas_int ldv, float *E, const rocblas_workmode fast_alg, rocblas_int *info)

GESVD computes the singular values and optionally the singular vectors of a general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A is given by:

\[ A = U S V' \]

where the m-by-n matrix S is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of A. U and V are orthogonal (unitary) matrices. The first min(m,n) columns of U and V are the left and right singular vectors of A, respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or transpose conjugate) of the right singular vectors, i.e. the rows of V’.

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix U (or V’) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of U or rows of V’) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of A are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of U (or V’) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of A are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix A via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details, see the “Tuning rocSOLVER performance” and “Memory model” sections of the documentation.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of matrix A.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of matrix A.

  • [inout] A: pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) contain the left (or right) singular vectors; otherwise, the contents of A are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A.

  • [out] S: pointer to real type. Array on the GPU of dimension min(m,n).

    The singular values of A in decreasing order.

  • [out] U: pointer to type. Array on the GPU of dimension ldu*min(m,n) if left_svect is set to singular, or ldu*m when left_svect is equal to all.

    The matrix of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U.

  • [out] V: pointer to type. Array on the GPU of dimension ldv*n.

    The matrix of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [out] E: pointer to real type. Array on the GPU of dimension min(m,n)-1.

    This array is used to work internally with the bidiagonal matrix B associated with A (using

    BDSQR). On exit, if info > 0, it contains the unconverged off-diagonal elements of B (or properly speaking, a bidiagonal matrix orthogonally equivalent to B). The diagonal elements of this matrix are in S; those that converged correspond to a subset of the singular values of A (not necessarily ordered).

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0,

    BDSQR did not converge. i elements of E did not converge to zero.

rocsolver_<type>gesvd_batched()

rocblas_status rocsolver_zgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesvd_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)

GESVD_BATCHED computes the singular values and optionally the singular vectors of a batch of general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A_j in the batch is given by:

\[ A_j = U_j S_j V_j' \]

where the m-by-n matrix \(S_j\) is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of \(A_j\). \(U_j\) and \(V_j\) are orthogonal (unitary) matrices. The first min(m,n) columns of \(U_j\) and \(V_j\) are the left and right singular vectors of \(A_j\), respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or transpose conjugate) of the right singular vectors, i.e. the rows of \(V_j'\).

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix \(U_j\) (or \(V_j'\)) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of \(U_j\) or rows of \(V_j'\)) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of \(A_j\) are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of \(U_j\) (or \(V_j'\)) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of \(A_j\) are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix \(A_j\) via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details, see the “Tuning rocSOLVER performance” and “Memory model” sections of the documentation.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • [inout] A: Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) of A_j contain the left (or right) corresponding singular vectors; otherwise, the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A_j.

  • [out] S: pointer to real type. Array on the GPU (the size depends on the value of strideS).

    The singular values of A_j in decreasing order.

  • [in] strideS: rocblas_stride.

    Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).

  • [out] U: pointer to type. Array on the GPU (the side depends on the value of strideU).

    The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U_j.

  • [in] strideU: rocblas_stride.

    Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.

  • [out] V: pointer to type. Array on the GPU (the size depends on the value of strideV).

    The matrices V_j of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [in] strideV: rocblas_stride.

    Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the bidiagonal matrix B_j associated with A_j (using

    BDSQR). On exit, if info[j] > 0, E_j contains the unconverged off-diagonal elements of B_j (or properly speaking, a bidiagonal matrix orthogonally equivalent to B_j). The diagonal elements of this matrix are in S_j; those that converged correspond to a subset of the singular values of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info[j] = 0, successful exit. If info[j] = i > 0,

    BDSQR did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>gesvd_strided_batched()

rocblas_status rocsolver_zgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, double *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgesvd_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, float *E, const rocblas_stride strideE, const rocblas_workmode fast_alg, rocblas_int *info, const rocblas_int batch_count)

GESVD_STRIDED_BATCHED computes the singular values and optionally the singular vectors of a batch of general m-by-n matrix A (Singular Value Decomposition).

The SVD of matrix A_j in the batch is given by:

\[ A_j = U_j S_j V_j' \]

where the m-by-n matrix \(S_j\) is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of \(A_j\). \(U_j\) and \(V_j\) are orthogonal (unitary) matrices. The first min(m,n) columns of \(U_j\) and \(V_j\) are the left and right singular vectors of \(A_j\), respectively.

The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or transpose conjugate) of the right singular vectors, i.e. the rows of \(V_j'\).

left_svect and right_svect are rocblas_svect enums that can take the following values:

  • rocblas_svect_all: the entire matrix \(U_j\) (or \(V_j'\)) is computed,

  • rocblas_svect_singular: only the singular vectors (first min(m,n) columns of \(U_j\) or rows of \(V_j'\)) are computed,

  • rocblas_svect_overwrite: the first columns (or rows) of \(A_j\) are overwritten with the singular vectors, or

  • rocblas_svect_none: no columns (or rows) of \(U_j\) (or \(V_j'\)) are computed, i.e. no singular vectors.

left_svect and right_svect cannot both be set to overwrite. When neither is set to overwrite, the contents of \(A_j\) are destroyed by the time the function returns.

Note

When m >> n (or n >> m) the algorithm could be sped up by compressing the matrix \(A_j\) via a QR (or LQ) factorization, and working with the triangular factor afterwards (thin-SVD). If the singular vectors are also requested, its computation could be sped up as well via executing some intermediate operations out-of-place, and relying more on matrix multiplications (GEMMs); this will require, however, a larger memory workspace. The parameter fast_alg controls whether the fast algorithm is executed or not. For more details, see the “Tuning rocSOLVER performance” and “Memory model” sections of the documentation.

Parameters
  • [in] handle: rocblas_handle.

  • [in] left_svect: rocblas_svect.

    Specifies how the left singular vectors are computed.

  • [in] right_svect: rocblas_svect.

    Specifies how the right singular vectors are computed.

  • [in] m: rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • [in] n: rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • [inout] A: pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, if left_svect (or right_svect) is equal to overwrite, the first columns (or rows) of A_j contain the left (or right) corresponding singular vectors; otherwise, the contents of A_j are destroyed.

  • [in] lda: rocblas_int. lda >= m.

    The leading dimension of A_j.

  • [in] strideA: rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • [out] S: pointer to real type. Array on the GPU (the size depends on the value of strideS).

    The singular values of A_j in decreasing order.

  • [in] strideS: rocblas_stride.

    Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).

  • [out] U: pointer to type. Array on the GPU (the side depends on the value of strideU).

    The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to overwrite or none.

  • [in] ldu: rocblas_int. ldu >= m if left_svect is all or singular; ldu >= 1 otherwise.

    The leading dimension of U_j.

  • [in] strideU: rocblas_stride.

    Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.

  • [out] V: pointer to type. Array on the GPU (the size depends on the value of strideV).

    The matrices V_j of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to overwrite or none.

  • [in] ldv: rocblas_int. ldv >= n if right_svect is all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.

    The leading dimension of V.

  • [in] strideV: rocblas_stride.

    Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.

  • [out] E: pointer to real type. Array on the GPU (the size depends on the value of strideE).

    This array is used to work internally with the bidiagonal matrix B_j associated with A_j (using

    BDSQR). On exit, if info > 0, E_j contains the unconverged off-diagonal elements of B_j (or properly speaking, a bidiagonal matrix orthogonally equivalent to B_j). The diagonal elements of this matrix are in S_j; those that converged correspond to a subset of the singular values of A_j (not necessarily ordered).

  • [in] strideE: rocblas_stride.

    Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • [in] fast_alg: rocblas_workmode.

    If set to rocblas_outofplace, the function will execute the fast thin-SVD version of the algorithm when possible.

  • [out] info: pointer to a rocblas_int on the GPU.

    If info[j] = 0, successful exit. If info[j] = i > 0, BDSQR did not converge. i elements of E_j did not converge to zero.

  • [in] batch_count: rocblas_int. batch_count >= 0.

    Number of matrices in the batch.