Pull requests: ggml-org/llama.cpp
server : fix json_schema response_format ignored by some chat templates
examples
server
#21537
opened Apr 7, 2026 by
wiktoraleksanderkaczor
common: fix split model loading by sorting file list
testing
Everything test related
#21535
opened Apr 6, 2026 by
brettp
YATF (Yet Another Tokenizer Fix) for Gemma 4. With tests!
python
python script changes
testing
Everything test related
#21534
opened Apr 6, 2026 by
pwilkin
ggml-webgpu: parameterize submission size and add iOS specific limits
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#21533
opened Apr 6, 2026 by
reeselevine
llama: remove per-arch tensor name lists
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
#21531
opened Apr 6, 2026 by
JohannesGaessler
metal: Q1_0 backend
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#21528
opened Apr 6, 2026 by
khosravipasha
[SYCL] Add Q8_0 reorder optimization for Intel GPUs (~3x token generation speedup)
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#21527
opened Apr 6, 2026 by
PMZFX
common : preserve original Gemma 4 tool responses even when JSON-like
#21522
opened Apr 6, 2026 by
kiwixz
ggml-webgpu: address quantization precision and backend lifecycle management
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
WebGPU
#21521
opened Apr 6, 2026 by
Constannnnnt
ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210)
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21519
opened Apr 6, 2026 by
aviallon
docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu)
documentation
Improvements or additions to documentation
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
#21518
opened Apr 6, 2026 by
CastelDazur
kv-cache : support attention rotation for heterogeneous iSWA
#21513
opened Apr 6, 2026 by
ggerganov
server : fix restore for checkpoints with pos_min == 0
examples
server
#21510
opened Apr 6, 2026 by
ggerganov
llama-server: fix model params not propagated
examples
server
#21509
opened Apr 6, 2026 by
taronaeo
llama-quant : overlap compute and write with double buffering
#21507
opened Apr 6, 2026 by
nuri-yoo
6 tasks done
mtmd: fit_params now take into account mmproj
examples
server
#21489
opened Apr 5, 2026 by
ngxson
server: add null check for context to prevent segfault on init failure
examples
server
#21477
opened Apr 5, 2026 by
Anirudh171202
gguf-py: Fix lazy tensor handling for keyword arguments
python
python script changes
#21476
opened Apr 5, 2026 by
lainon1
llama-quant: use LLM_KV constants instead of hardcoded strings
#21475
opened Apr 5, 2026 by
lainon1
CUDA: make cuda graphs props check faster
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21472
opened Apr 5, 2026 by
am17an
ggml : fix repeat_back assert with non-contiguous gradients
ggml
changes relating to the ggml tensor library for machine learning
#21467
opened Apr 5, 2026 by
RealOrko
ggml : add GGML_OP_GATHER for DeepSeek Sparse Attention (DSA) #21149
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#21458
opened Apr 5, 2026 by
LilySu