llama.cpp Adds Gemma 4 Vision and Qwen 3.5 Support


TL;DR: The latest llama.cpp release adds non-causal vision support for Gemma 4 and switches Qwen 3.5 Multi-Token Prediction (MTP) to post-norm hidden states.

Summary: Release b9496 of llama.cpp adds support for Gemma 4 unified models. It fixes the unified FPE and enables non-causal vision. It also updates Qwen 3.5 support so that Multi-Token Prediction (MTP) uses the post-norm hidden state. Together, these changes extend llama.cpp's compatibility with Google's and Alibaba's newest open models.
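To make "non-causal vision" concrete, here is a minimal sketch of the idea. It is not llama.cpp's code. It assumes the term means that image tokens attend to each other bidirectionally, as in Gemma 3's design, while text tokens keep the usual causal rule. The function names `build_mask` and `show` and the `image_span` parameter are hypothetical.

```python
def build_mask(n_tokens, image_span=None):
    """Return an n x n boolean mask; mask[i][j] is True if token i may attend to token j.

    Assumption (illustrative only): text tokens follow the causal rule (j <= i).
    If image_span = (start, end) is given (end exclusive), every token inside
    that span may also attend to every other token in the same span, which
    makes attention non-causal (bidirectional) within the image.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]
    if image_span is not None:
        start, end = image_span
        # Open up the image block so image tokens can also "look ahead"
        # to later image tokens in the same span.
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask


def show(mask):
    # Print 'x' where attention is allowed and '.' where it is blocked.
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


if __name__ == "__main__":
    # 6 tokens: 1 text token, 3 image tokens (positions 1..3), 2 text tokens.
    print("causal:")
    show(build_mask(6))
    print("bidirectional image span:")
    show(build_mask(6, image_span=(1, 4)))
```

In the second mask, rows 1 to 3 become `xxxx..`, so each image token can see the entire image. The text tokens that follow still attend causally, and the text token before the image still cannot see it.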

Why it matters: Developers can now run Gemma 4 and Qwen 3.5 models locally on llama.cpp's hardware-accelerated backends. Watch for gains in inference performance and broader support for multimodal (image and text) tasks on consumer hardware.

Source: github.com