Pull requests: ggml-org/llama.cpp

Pull requests list

common: fix split model loading by sorting file list
Labels: testing (Everything test related)
#21535 opened Apr 6, 2026 by brettp

YATF (Yet Another Tokenizer Fix) for Gemma 4. With tests!
Labels: python (python script changes), testing
#21534 opened Apr 6, 2026 by pwilkin

ggml-webgpu: parameterize submission size and add iOS specific limits
Labels: ggml (changes relating to the ggml tensor library for machine learning), WebGPU
#21533 opened Apr 6, 2026 by reeselevine

llama: remove per-arch tensor name lists
Labels: merge ready (A maintainer can use this label to indicate that they consider the changes final and ready to merge.)
#21531 opened Apr 6, 2026 by JohannesGaessler

metal: Q1_0 backend
Labels: Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), ggml, testing
#21528 opened Apr 6, 2026 by khosravipasha

[SYCL] Add Q8_0 reorder optimization for Intel GPUs (~3x token generation speedup)
Labels: ggml, SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
#21527 opened Apr 6, 2026 by PMZFX

ggml-webgpu: address quantization precision and backend lifecycle managment
Labels: ggml, testing, WebGPU
#21521 opened Apr 6, 2026 by Constannnnnt

ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210)
Labels: ggml, Nvidia GPU (Issues specific to Nvidia GPUs)
#21519 opened Apr 6, 2026 by aviallon

docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu)
Labels: documentation (Improvements or additions to documentation), merge ready
#21518 opened Apr 6, 2026 by CastelDazur

llama-quant : overlap compute and write with double buffering
#21507 opened Apr 6, 2026 by nuri-yoo
6 tasks done

vocab : remove </s> eog token if gemma4
#21492 opened Apr 6, 2026 by aldehir

llama-quantize: fix tensor-type logic
#21482 opened Apr 5, 2026 by theo77186

gguf-py: Fix lazy tensor handling for keyword arguments
Labels: python
#21476 opened Apr 5, 2026 by lainon1

CUDA: make cuda graphs props check faster
Labels: ggml, Nvidia GPU
#21472 opened Apr 5, 2026 by am17an

ggml : fix repeat_back assert with non-contiguous gradients
Labels: ggml
#21467 opened Apr 5, 2026 by RealOrko

ggml : add GGML_OP_GATHER for DeepSeek Sparse Attention (DSA) #21149
Labels: ggml, testing
#21458 opened Apr 5, 2026 by LilySu