TL;DR: Google has released Quantization-Aware Training (QAT) weights for Gemma 4, cutting memory use by about 3x so that larger models fit on consumer hardware.
Summary: Google released QAT-optimized Gemma 4 checkpoints on Hugging Face and Ollama. QAT reduces memory requirements while maintaining model quality, so the Gemma 4 E4B model can run on mobile devices with 2GB of RAM and the 31B model can run on laptops. The optimized weights are available for all Gemma 4 model sizes and their drafters.
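To give a rough sense of why quantization matters for the 31B model mentioned in the summary, here is a back-of-the-envelope sketch of weight memory at two precisions. The bit widths (16-bit baseline, 4-bit quantized) are assumptions for illustration and are not stated in the source; the helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_31B = 31e9  # parameter count implied by the "31B" model name

baseline_gb = weight_memory_gb(PARAMS_31B, 16)  # assumed 16-bit baseline
quantized_gb = weight_memory_gb(PARAMS_31B, 4)  # assumed 4-bit quantized weights

print(f"16-bit weights: {baseline_gb:.1f} GB")
print(f"4-bit weights:  {quantized_gb:.1f} GB")
print(f"Idealized reduction: {baseline_gb / quantized_gb:.1f}x")
```

This counts weights only. It ignores activations, the KV cache, and any tensors a given format keeps at higher precision, so it will not match the reported memory savings exactly.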
Why it matters: This release lowers the hardware barrier for running state-of-the-art open models locally. Developers building resource-constrained applications can try the QAT weights through Ollama or the Hugging Face collections.
Source: @itspaulai