C++
A C++ header-only, Eigen-based library for Lie group operations
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Large-scale LLM inference engine
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Examples from Programming in Parallel with CUDA
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more