TL;DR: llama.cpp release b9494 adds non-causal vision support for the Gemma 4 unified model and fixes MTP for Qwen 3.5 so that it uses post-norm hidden states.
Summary: Release b9494 of llama.cpp enables non-causal vision support for the Gemma 4 unified model. It also makes multi-token prediction (MTP) for Qwen 3.5 use post-norm hidden states, and it resolves a race condition in the CUDA kernels that use Programmatic Dependent Launch (PDL) by selectively disabling the restrict keyword on affected architectures. Pre-compiled binaries are available for macOS, Windows, Linux, and Android.
Why it matters: Developers using llama.cpp can now run the Gemma 4 unified model with non-causal vision support and use MTP with Qwen 3.5 models locally. Local AI builders who rely on either model, or on the affected PDL kernels, should update their llama.cpp binaries to pick up these compatibility and correctness fixes.
Source: github.com