Hugging Face Transformers v5.10.1 Adds Gemma 4 12B Unified

Research

TL;DR: Hugging Face released Transformers v5.10.1, introducing support for the encoder-free Gemma 4 12B Unified multimodal model.

Summary: Transformers v5.10.1 adds support for Gemma 4 12B Unified, an encoder-free multimodal model available in pretrained and instruction-tuned variants. Instead of routing each modality through a dedicated encoder tower, the architecture projects raw inputs directly into the language model's embedding space using lightweight linear pipelines. The release follows v5.10.0, which was yanked immediately because of a corrupted branch.
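To make the encoder-free idea concrete, here is a toy sketch of projecting raw image patches straight into a token embedding space with a single linear map. It is an illustration only, not the Gemma 4 implementation or the Transformers API: the patch size, embedding width, random weights, and the helper names `patchify` and `linear_project` are all assumptions chosen for the example.

```python
import random

# Hypothetical sizes chosen for illustration; not Gemma 4's real configuration.
PATCH = 4        # patch side length in pixels
CHANNELS = 3     # RGB
EMBED_DIM = 8    # width of the (toy) language-model embedding space


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])
            patches.append(vec)
    return patches


def linear_project(vectors, weight, bias):
    """Apply y = xW + b to each vector; weight has shape in_dim x out_dim."""
    projected = []
    for x in vectors:
        row = [
            sum(xi * weight[i][j] for i, xi in enumerate(x)) + bias[j]
            for j in range(len(bias))
        ]
        projected.append(row)
    return projected


rng = random.Random(0)

# A random 8 x 8 RGB "image" standing in for raw pixel input.
image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(8)] for _ in range(8)]

# One linear layer replaces the dedicated vision encoder in this sketch.
in_dim = PATCH * PATCH * CHANNELS
weight = [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(in_dim)]
bias = [0.0] * EMBED_DIM

image_tokens = linear_project(patchify(image, PATCH), weight, bias)

# Stand-in for text token embeddings produced by an embedding-table lookup.
text_tokens = [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(5)]

# Both modalities now share one embedding space and form a single sequence.
sequence = image_tokens + text_tokens
print(len(sequence), len(sequence[0]))
```

The point of the sketch is that image patches and text tokens end up as rows of the same width, so a single decoder can consume them as one sequence without a separate vision tower in front of it.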

Why it matters: Dropping the separate encoder towers simplifies multimodal pipelines, which should reduce deployment complexity for developers building vision-language applications. Builders should benchmark the unified model on their own workloads to check whether the simpler pipeline matches standard Gemma 4 performance.

Source: github.com