Pull requests: ggml-org/llama.cpp
Pull requests list
opencl: support sinks in soft_max (attn sinks)
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#15152
opened Aug 7, 2025 by
lhez
sycl: Fix and disable more configurations of mul_mat
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#15151
opened Aug 7, 2025 by
Rbiessy
kleidiai: fix unsigned overflow bug
ggml
changes relating to the ggml tensor library for machine learning
#15150
opened Aug 7, 2025 by
chaxu01
chat : Avoid partial reasoning tags in response content
testing
Everything test related
#15149
opened Aug 7, 2025 by
p1-0tr
SVE support for exponential functions
ggml
changes relating to the ggml tensor library for machine learning
#15145
opened Aug 7, 2025 by
s-goto-11
OpenCL: allow mixed f16/f32 add
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#15140
opened Aug 6, 2025 by
rmatif
CUDA: Optimize reduce_rows_f32 kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#15132
opened Aug 6, 2025 by
ORippler
vulkan: support flash attention sinks
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15126
opened Aug 6, 2025 by
jeffbolznv
Add T5Gemma support #14940
python
python script changes
#15123
opened Aug 6, 2025 by
baonudesifeizhai
ggml: aarch64: Implement SVE F16 kernels for vector functions
ggml
changes relating to the ggml tensor library for machine learning
#15115
opened Aug 6, 2025 by
Vithulep
gguf-py : add Numpy MXFP4 de/quantization support
ggml
changes relating to the ggml tensor library for machine learning
python
python script changes
Tensor Encoding Scheme
https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
#15111
opened Aug 6, 2025 by
compilade
vulkan: Add env var to disable host visible vidmem
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15109
opened Aug 6, 2025 by
jeffbolznv
mtmd: server: Support basic multimodal data in /completions endpoint of server
examples
server
#15108
opened Aug 6, 2025 by
65a
CUDA/HIP: ssm-scan: switch from shared memory to registers, fixes indexing problem on warp64 devices
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#15101
opened Aug 5, 2025 by
IMbackK
model : add reasoning/tool parsing to Llama 3.x Nemotron
testing
Everything test related
#15083
opened Aug 5, 2025 by
aldehir
CANN: GGML_OP_CPY optimization
Ascend NPU
issues specific to Ascend NPUs
ggml
changes relating to the ggml tensor library for machine learning
#15070
opened Aug 4, 2025 by
noemotiovon
quantize : configurable neutral imatrix prior
examples
generation quality
Quality of model output
need feedback
Testing and feedback with results are needed
research 🔬
ggml-cpu : add basic RVV support for vector f32 ops
ggml
changes relating to the ggml tensor library for machine learning
#15057
opened Aug 3, 2025 by
xctan
vulkan: conv2d addressing optimizations
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#15056
opened Aug 3, 2025 by
jeffbolznv
Support streaming delta.reasoning_content in WebUI
examples
server
#15052
opened Aug 3, 2025 by
mostlygeek
Fix: respect localStorage base URL override in Web UI
examples
server
#15048
opened Aug 3, 2025 by
insanerest
Fix: flush partial stop string when <EOG> is reached in /completion endpoint in streaming mode
examples
server
#15007
opened Aug 1, 2025 by
matteoserva
fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 2…
devops
improvements to build systems and github actions
#15005
opened Aug 1, 2025 by
simevo
Add support for CogVLM model
examples
python
python script changes
#15002
opened Aug 1, 2025 by
Tianyue-Zhao
2 of 4 tasks
OpenCL: add initial FA support
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
#14987
opened Jul 31, 2025 by
rmatif