Starred repositories

Nano vLLM

Python 1 Updated Jun 27, 2025

Nano vLLM

Python 5,591 674 Updated Jun 27, 2025

Fast and memory-efficient exact attention

Python 1 Updated Jul 24, 2025

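fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can serve the original full-precision DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps for a single request and 60+ tps under multiple concurrent requests.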

C++ 3,834 391 Updated Jul 14, 2025

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python 1 Updated Jul 15, 2025

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python 3,926 560 Updated Jul 15, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 1 Updated Jul 10, 2025

HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more

Cuda 1 Updated Jun 5, 2025

HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more

Cuda 27 5 Updated Jul 24, 2025

Best practices & guides on how to write distributed pytorch training code

Python 462 38 Updated Feb 24, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 287 58 Updated Jun 13, 2025

A lightweight LLaMA-like LLM inference framework based on Triton kernels.

Python 1 Updated Jun 8, 2025
Python 1 Updated May 11, 2025
Python 33 9 Updated May 11, 2025
Python 18 5 Updated Feb 10, 2025

Examples from Programming in Parallel with CUDA

Cuda 157 57 Updated Mar 17, 2023

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda 1 Updated May 28, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 4 Updated Apr 30, 2025

TTRL: Test-Time Reinforcement Learning

Python 745 62 Updated Jul 11, 2025

Large-scale LLM inference engine

C++ 1,495 160 Updated Aug 4, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1 Updated Apr 15, 2025

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda 2,139 177 Updated Aug 4, 2025

Yan (炎) is a high-performance CUDA operator library designed for learning purposes while emphasizing clean code and maximum performance.

Cuda 18 Updated Jul 21, 2025

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 2,959 226 Updated Jul 11, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

1,025 46 Updated Jul 15, 2025

Implementing DeepSeek R1's GRPO algorithm from scratch

Python 1,508 69 Updated Apr 18, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 1 Updated Apr 8, 2025

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Python 1 Updated Apr 9, 2025

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Python 138 8 Updated Apr 9, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 647 35 Updated Jul 28, 2025