
Add CUSOLVERRF.jl integration for GPU-accelerated sparse LU factorization #651

Open · wants to merge 25 commits into base: main

Conversation

ChrisRackauckas (Member)

Summary

This PR adds support for NVIDIA's cusolverRF sparse LU factorization library through a package extension, providing high-performance GPU-accelerated solving for sparse linear systems.

Motivation

CUSOLVERRF.jl provides access to NVIDIA's cusolverRF library, which offers significant performance improvements for sparse LU factorization on GPUs. This integration makes it accessible through LinearSolve.jl's unified interface.

Key Features

  • New CUSOLVERRFFactorization algorithm with configurable options:
    • symbolic: Choose between :RF (default) or :KLU for symbolic factorization
    • reuse_symbolic: Reuse symbolic factorization for matrices with same sparsity pattern
  • Automatic CPU-to-GPU conversion for convenience
  • Support for multiple right-hand sides
  • Adjoint solve support
  • Comprehensive test suite
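Two of the features above, multiple right-hand sides and adjoint solves, are not shown in the usage example below; a hedged sketch of how they might look through the standard LinearSolve.jl interface (not taken from the PR's tests, and whether the adjoint must be passed as a wrapped matrix may differ in practice):

```julia
using LinearSolve, CUSOLVERRF, SparseArrays, LinearAlgebra

A = sprand(100, 100, 0.1) + 5I   # diagonal shift so the system is well-posed

# Multiple right-hand sides: pass a matrix as b
B = rand(100, 4)
sol = solve(LinearProblem(A, B), CUSOLVERRFFactorization())

# Adjoint solve: wrap A and solve as usual
sol_adj = solve(LinearProblem(A', rand(100)), CUSOLVERRFFactorization())
```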

Implementation Details

The implementation follows LinearSolve.jl's extension pattern:

  • Extension module in ext/LinearSolveCUSOLVERRFExt.jl
  • Core types and exports in src/factorization.jl and src/LinearSolve.jl
  • Weak dependency configuration in Project.toml
  • Tests in test/gpu/cusolverrf.jl
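The weak-dependency wiring referred to above typically looks like the following Project.toml sketch (the UUID is elided here; the entry names match the files listed above):

```toml
[weakdeps]
CUSOLVERRF = "<CUSOLVERRF's UUID>"   # actual UUID elided

[extensions]
LinearSolveCUSOLVERRFExt = "CUSOLVERRF"
```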

Usage Example

using LinearSolve, CUSOLVERRF, SparseArrays, LinearAlgebra  # LinearAlgebra provides I

# Create sparse system
A = sprand(1000, 1000, 0.01) + 5I
b = rand(1000)

# Solve with default options
prob = LinearProblem(A, b)
sol = solve(prob, CUSOLVERRFFactorization())

# Use KLU for symbolic factorization
sol = solve(prob, CUSOLVERRFFactorization(symbolic = :KLU))
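When many systems share one sparsity pattern, `reuse_symbolic` pairs naturally with LinearSolve.jl's caching interface. A hedged sketch using the standard `init`/`solve!` workflow (the refactorization behavior is assumed from the feature list above, not verified against the extension):

```julia
using LinearSolve, CUSOLVERRF, SparseArrays, LinearAlgebra

A = sprand(1000, 1000, 0.01) + 5I
b = rand(1000)

cache = init(LinearProblem(A, b), CUSOLVERRFFactorization(reuse_symbolic = true))
sol1 = solve!(cache)

# Same sparsity pattern, new numeric values: symbolic analysis should be reused
cache.A = A + spdiagm(0 => 0.1 .* rand(1000))
sol2 = solve!(cache)
```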

Limitations

  • Only supports Float64 element types with Int32 indices (CUSOLVERRF limitation)
  • Requires CUDA-capable GPU
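For matrices that start out in another element or index type, an explicit CPU-side conversion along these lines may be needed before solving (a sketch; the exact set of accepted input types is the one in the PR's method signatures):

```julia
using SparseArrays, LinearAlgebra

A32 = sprand(Float32, 100, 100, 0.1) + I     # Float32 values: unsupported
A64 = SparseMatrixCSC{Float64, Int32}(A32)   # Float64 values, Int32 indices
```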

Testing

Tests have been added to the GPU test suite and can be run with appropriate hardware.

🤖 Generated with Claude Code

@github-actions github-actions bot (Contributor) left a comment

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

JuliaFormatter

[JuliaFormatter] reported by reviewdog 🐶

@info "CUDA not available, skipping CUSOLVERRF tests"
return
end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

n = 100
A = sprand(n, n, 0.1) + I
b = rand(n)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Test with CPU sparse matrix (should auto-convert to GPU)
@testset "CPU Sparse Matrix" begin
prob = LinearProblem(A, b)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Test with default symbolic (:RF)
sol = solve(prob, CUSOLVERRFFactorization())
@test norm(A * sol.u - b) / norm(b) < 1e-10


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

sol_klu = solve(prob, CUSOLVERRFFactorization(symbolic = :KLU))
@test norm(A * sol_klu.u - b) / norm(b) < 1e-10
end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

@testset "GPU Sparse Matrix" begin
A_gpu = CUDA.CUSPARSE.CuSparseMatrixCSR(A)
b_gpu = CuArray(b)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change


prob_gpu = LinearProblem(A_gpu, b_gpu)
sol_gpu = solve(prob_gpu, CUSOLVERRFFactorization())


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

res_gpu = A_gpu * sol_gpu.u - b_gpu
@test norm(res_gpu) / norm(b_gpu) < 1e-10
end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Create a new matrix with same pattern but different values
A2 = A + 0.1 * sprand(n, n, 0.01)
b2 = rand(n)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

sol2 = solve(prob2, CUSOLVERRFFactorization(reuse_symbolic = true))
@test norm(A2 * sol2.u - b2) / norm(b2) < 1e-10
end


@oscardssmith (Member)

> symbolic: Choose between :RF (default) or :KLU for symbolic factorization

Should these be types rather than symbols for type stability?
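One way the type-stable alternative could look, sketched with hypothetical singleton types (`RFSymbolic`, `KLUSymbolic`, and `MyRFFactorization` are illustrative names, not from this PR):

```julia
# Illustrative sketch only: singleton types in place of Symbols, so the
# symbolic method is part of the algorithm's type and inference can see it.
abstract type SymbolicMethod end
struct RFSymbolic  <: SymbolicMethod end
struct KLUSymbolic <: SymbolicMethod end

struct MyRFFactorization{S <: SymbolicMethod}
    symbolic::S
    reuse_symbolic::Bool
end
MyRFFactorization(; symbolic = RFSymbolic(), reuse_symbolic = true) =
    MyRFFactorization(symbolic, reuse_symbolic)

alg = MyRFFactorization(symbolic = KLUSymbolic())
# typeof(alg) == MyRFFactorization{KLUSymbolic}; downstream dispatch is inferable
```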

@ChrisRackauckas (Member, Author)

claude and others added 11 commits August 5, 2025 14:34
…tion

This PR adds support for NVIDIA's cusolverRF sparse LU factorization library through a package extension. CUSOLVERRF provides high-performance GPU-accelerated factorization for sparse matrices.

Key features:
- New `CUSOLVERRFFactorization` algorithm with configurable symbolic factorization (RF or KLU)
- Automatic CPU-to-GPU conversion for convenience
- Support for multiple right-hand sides
- Reusable symbolic factorization for matrices with same sparsity pattern
- Adjoint solve support
- Comprehensive test suite

The implementation follows LinearSolve.jl's extension pattern, similar to the existing CUDSS integration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Include CUSOLVERRF tests in the GPU test suite when the package is available. The tests are conditionally included to avoid failures when CUSOLVERRF.jl is not installed.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Added CUSOLVERRF to recommended methods for sparse matrices
- Added CUSOLVERRF section in the full list of solvers
- Added CUSOLVERRF examples in GPU tutorial documentation
- Documented supported options and limitations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Updated sparse matrices recommendation to include both CUDSS.jl and CUSOLVERRF.jl
- Clarified that CUDSS provides interface to NVIDIA's cuDSS library
- Maintained that both offer high performance for GPU-accelerated sparse LU factorization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Clarified that CUDSS works through LUFactorization() when CUDSS.jl is loaded
- Explained that it automatically uses cuDSS for CuSparseMatrixCSR arrays
- Removed incorrect reference to a separate CUDSS factorization type

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ChrisRackauckas force-pushed the add-cusolverrf-support branch from 8e3ce1a to d7f1f8c (August 5, 2025 18:35)
Comment on lines +131 to +133
CUSOLVERRF provides access to NVIDIA's cusolverRF library, which offers significant
performance improvements for sparse LU factorization on GPUs. It supports both
`:RF` (default) and `:KLU` symbolic factorization methods, and can reuse symbolic

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
CUSOLVERRF provides access to NVIDIA's cusolverRF library, which offers significant
performance improvements for sparse LU factorization on GPUs. It supports both
`:RF` (default) and `:KLU` symbolic factorization methods, and can reuse symbolic

end

function LinearSolve.init_cacheval(alg::LinearSolve.CUSOLVERRFFactorization,
A::Union{CuSparseMatrixCSR{Float64, Int32}, SparseMatrixCSC{Float64, <:Integer}},

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
A::Union{CuSparseMatrixCSR{Float64, Int32}, SparseMatrixCSC{Float64, <:Integer}},

symbolic = alg.symbolic
# Convert to CuSparseMatrixCSR if needed
A_gpu = A isa CuSparseMatrixCSR ? A : CuSparseMatrixCSR(A)
RFLU(A_gpu; nrhs=nrhs, symbolic=symbolic)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
RFLU(A_gpu; nrhs=nrhs, symbolic=symbolic)
RFLU(A_gpu; nrhs = nrhs, symbolic = symbolic)


function SciMLBase.solve!(cache::LinearSolve.LinearCache, alg::LinearSolve.CUSOLVERRFFactorization; kwargs...)
A = cache.A


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

else
error("CUSOLVERRFFactorization only supports SparseMatrixCSC or CuSparseMatrixCSR matrices")
end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

if cacheval === nothing
# Create new factorization
nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic)
fact = RFLU(A_gpu; nrhs = nrhs, symbolic = alg.symbolic)

else
# Create new factorization if pattern changed
nrhs = cache.b isa AbstractMatrix ? size(cache.b, 2) : 1
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
fact = RFLU(A_gpu; nrhs=nrhs, symbolic=alg.symbolic)
fact = RFLU(A_gpu; nrhs = nrhs, symbolic = alg.symbolic)

cache.cacheval = fact
cache.isfresh = false
end


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

end

F = @get_cacheval(cache, :CUSOLVERRFFactorization)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Ensure b and u are on GPU
b_gpu = cache.b isa CUDA.CuArray ? cache.b : CUDA.CuArray(cache.b)
u_gpu = cache.u isa CUDA.CuArray ? cache.u : CUDA.CuArray(cache.u)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# Solve
copyto!(u_gpu, b_gpu)
ldiv!(F, u_gpu)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

Comment on lines +77 to +78

SciMLBase.build_linear_solution(alg, cache.u, nothing, cache; retcode = ReturnCode.Success)

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
SciMLBase.build_linear_solution(alg, cache.u, nothing, cache; retcode = ReturnCode.Success)
SciMLBase.build_linear_solution(
alg, cache.u, nothing, cache; retcode = ReturnCode.Success)

Comment on lines +88 to +89

end

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
end
end

A_f32 = Float32.(A)
b_f32 = Float32.(b)
prob_f32 = LinearProblem(A_f32, b_f32)


[JuliaFormatter] reported by reviewdog 🐶

Suggested change

# This should error since CUSOLVERRF only supports Float64
@test_throws Exception solve(prob_f32, CUSOLVERRFFactorization())
end
end

[JuliaFormatter] reported by reviewdog 🐶

Suggested change
end
end

3 participants