
Starred repositories
Fast and memory-efficient exact attention
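For a sense of what this provides: PyTorch's built-in fused attention entry point can dispatch to a FlashAttention-style kernel on supported GPUs, avoiding materializing the full attention matrix. A minimal sketch, with shapes and the causal flag assumed for illustration (none of this is taken from the repo itself):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim); fp16 on CUDA
# is what lets the fused FlashAttention-style kernel be selected.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Computes softmax(QK^T / sqrt(d)) V in one fused call, without ever
# materializing the (seq_len x seq_len) attention matrix in memory.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```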
fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference of dense models and mixed-mode inference of MoE models; any GPU with more than 10 GB of memory can run the full DeepSeek model. A dual-socket 9004/9005 server plus a single GPU can serve the original full-size, full-precision DeepSeek model at 20 tps with a single concurrent request; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.
Transformer: PyTorch Implementation of "Attention Is All You Need"
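The formula at the core of that paper is softmax(QK^T / sqrt(d_k)) V; a minimal PyTorch sketch of it, with tensor shapes assumed for illustration:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    # Similarity of every query to every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v  # weighted sum of values
```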
coderonion / tilelang
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
coderonion / ai-infra-hpc
Forked from jinbooooom/ai-infra-hpc
HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
Best practices & guides on how to write distributed PyTorch training code
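For reference, the skeleton such guides typically build on is DistributedDataParallel with one process per GPU, launched via torchrun; a minimal sketch with a placeholder model and synthetic data (not taken from the repo):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda()  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # placeholder training loop
        x = torch.randn(32, 512, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS script.py
```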
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
coderonion / lite_llama
Forked from harleyszhang/lite_llama
A lightweight llama-like LLM inference framework built on Triton kernels.
Examples from Programming in Parallel with CUDA
coderonion / LeetCUDA
Forked from xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
coderonion / Qwen3
Forked from QwenLM/Qwen3
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Large-scale LLM inference engine
coderonion / SageAttention
Forked from thu-ml/SageAttention
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Quantized attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
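Stripped to its simplest form, the idea behind quantized attention is to quantize Q and K to INT8 with per-tensor scales, run the score matmul in integer arithmetic, and dequantize before the softmax. The sketch below shows only that generic pattern, not SageAttention's actual smoothing or per-block scheme:

```python
import torch

def int8_quantize(x):
    # Symmetric per-tensor INT8: scale so that max|x| maps to 127.
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = (x / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale

q, k, v = (torch.randn(128, 64) for _ in range(3))  # toy shapes
q_i8, sq = int8_quantize(q)
k_i8, sk = int8_quantize(k)

# Integer score matmul, then dequantize with the product of the scales.
scores = (q_i8.to(torch.int32) @ k_i8.to(torch.int32).T).float() * (sq * sk)
attn = torch.softmax(scores / 64 ** 0.5, dim=-1) @ v
```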
Yan (炎) is a high-performance CUDA operator library designed for learning purposes while emphasizing clean code and maximum performance.
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
Implementing DeepSeek R1's GRPO algorithm from scratch
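The core of GRPO is its critic-free advantage: sample a group of completions per prompt and standardize each completion's reward against its group's mean and standard deviation. A minimal sketch of that computation, with the group size and rewards invented for illustration:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_prompts, group_size) — one scalar reward per sampled
    # completion. Standardizing within the group replaces a learned value
    # function (critic) as the baseline.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)  # positive for above-group-average completions
```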
coderonion / Video-R1
Forked from tulerfeng/Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
coderonion / MAYE
Forked from GAIR-NLP/MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme