Google Releases Gemma 4 with Quantization-Aware Training

Research

TL;DR: Google has released Gemma 4 models using Quantization-Aware Training to maintain high performance in low-bit quantized formats.

Summary: Google announced Gemma 4 models trained with Quantization-Aware Training (QAT) and published official collections on Hugging Face for mobile and Q4_0 formats. Unlike post-training quantization, which compresses a model after training has finished, QAT simulates low-bit precision during training so the weights can adapt to rounding error, minimizing quality loss. Unsloth supports the release with its own optimized collections and integration guides.
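
To make "simulates low-bit precision during training" concrete, here is a minimal sketch of the quantize-then-dequantize ("fake quantization") step that QAT schemes generally apply to weights in the forward pass. It is not Google's training recipe and not the exact Q4_0 encoding. The function names, the block size of 32, and the symmetric 4-bit rounding are illustrative assumptions.

```python
import random


def fake_quantize_block(weights, bits=4):
    """Round one block of weights to a low-bit grid, then map back to floats.

    Assumption: symmetric quantization with one shared scale per block.
    Real formats such as Q4_0 differ in their exact encoding.
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for 4-bit
    qmin = -(2 ** (bits - 1))        # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)         # nothing to quantize in an all-zero block
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = round(w / scale)         # snap to the nearest integer level
        q = max(qmin, min(qmax, q))  # clamp to the representable range
        out.append(q * scale)        # dequantize back to a float
    return out


def fake_quantize(weights, block_size=32, bits=4):
    """Apply fake quantization block by block (block_size is an assumption)."""
    out = []
    for start in range(0, len(weights), block_size):
        out.extend(fake_quantize_block(weights[start:start + block_size], bits))
    return out


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(64)]

# In QAT, the forward pass would use `simulated` in place of `weights`,
# while updates are still applied to the full-precision `weights`.
simulated = fake_quantize(weights)

max_err = max(abs(a - b) for a, b in zip(weights, simulated))
distinct_levels = len({round(v, 12) for v in simulated[:32]})
print(f"max rounding error: {max_err:.6f}")
print(f"distinct values in first block: {distinct_levels}")
```

Because the model sees the rounded weights while it is still learning, it can settle on values that survive rounding well. Post-training quantization applies the same rounding only once, after the weights are fixed.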

Why it matters: Developers can deploy these models on consumer hardware and mobile devices with less accuracy degradation than traditional post-training quantization typically incurs. Anyone targeting low-resource environments should evaluate the new collections.

Source: r/LocalLLaMA