TL;DR: Google has released Gemma 4 models trained with Quantization-Aware Training (QAT) so that they retain quality when run in low-bit quantized formats.
Summary: Google announced Gemma 4 models trained with Quantization-Aware Training, providing official collections for mobile and Q4_0 formats on Hugging Face. Unlike post-training quantization, this technique simulates low-bit precision during training to minimize quality loss. Unsloth has supported this release with optimized collections and integration guides.
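To make the contrast with post-training quantization concrete, here is a minimal pure-Python sketch of the general idea. It is illustrative only: it is not Google's training code, and it uses a simplified symmetric 4-bit grid with one scale per weight list rather than the actual Q4_0 block layout. The names `fake_quantize` and `qat_step` are hypothetical, and the straight-through gradient update is an assumption about how such training is commonly done, not something the announcement confirms.

```python
def fake_quantize(weights, bits=4):
    """Round-trip weights through a symmetric signed integer grid.

    Simplified assumption: one scale for the whole list, integer
    levels in [-8, 7] for 4 bits. Real formats use per-block scales.
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for 4 bits
    qmin = -(2 ** (bits - 1))        # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)         # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # snap to the grid
        out.append(q * scale)                       # back to float
    return out


def qat_step(weights, x, target, lr=0.05):
    """One training step for a toy linear model y = sum(w_i * x_i).

    The forward pass sees the quantized weights, so the loss reflects
    quantization error. The gradient is applied to the full-precision
    weights as if rounding were the identity (straight-through).
    """
    wq = fake_quantize(weights)
    pred = sum(w * xi for w, xi in zip(wq, x))
    err = pred - target
    new_weights = [w - lr * 2.0 * err * xi for w, xi in zip(weights, x)]
    return new_weights, err * err


if __name__ == "__main__":
    w = [0.9, -0.35, 0.12, 0.6]
    x = [1.0, 2.0, -1.0, 0.5]
    for step in range(5):
        w, loss = qat_step(w, x, target=1.0)
        print(step, round(loss, 6))
```

Post-training quantization would apply something like `fake_quantize` once, after training has finished. In this sketch the rounding happens inside every forward pass, so the full-precision weights are pushed toward values that still work after they are snapped to the low-bit grid.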
Why it matters: Developers can deploy these models on consumer hardware and mobile devices with less accuracy loss than post-training quantization typically incurs. Developers targeting low-resource environments should evaluate the new collections.
Source: r/LocalLLaMA