TL;DR: llama.cpp has added support for Mellum and Granite embedding models, expanding the options for local vector search and RAG pipelines.
Summary: Support for Mellum and Granite embedding models has been merged into the llama.cpp repository in pull requests 23966 and 22716. Builders can now run these embedding models locally with llama.cpp's inference engine. Text embeddings can be generated from quantized models on local hardware, with no calls to external cloud APIs.
Why it matters: Developers can now generate embeddings on consumer hardware, and quantized models keep the memory footprint low. To try them, AI builders should update to a llama.cpp build that includes these pull requests and look for compatible GGUF quants of the embedding models.
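Example: a minimal sketch of how a locally served embedding model might feed a vector search step. It assumes a llama-server instance is already running with embeddings enabled (for instance via the `--embedding` flag; check `llama-server --help` on your build) and that it exposes the OpenAI-style `/v1/embeddings` endpoint. The URL, port, and sample texts are placeholders, and the helper names are made up for illustration; nothing here is specific to Mellum or Granite.

```python
import json
import math
import urllib.request

# Assumption: a local llama-server started with an embedding GGUF, e.g.
#   llama-server -m <embedding-model>.gguf --embedding --port 8080
# The host and port below are placeholders.
SERVER_URL = "http://localhost:8080/v1/embeddings"


def build_request(texts, url=SERVER_URL):
    """Build an OpenAI-style embeddings POST request for a list of strings."""
    payload = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )


def parse_embeddings(response_body):
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, url=SERVER_URL):
    """Send texts to the local server and return one vector per text."""
    with urllib.request.urlopen(build_request(texts, url)) as resp:
        return parse_embeddings(json.load(resp))


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Requires the local server described above to be running.
    docs = ["def add(a, b): return a + b", "How to bake sourdough bread"]
    doc_vecs = embed(docs)
    query_vec = embed(["function that sums two numbers"])[0]
    for index, score in rank_documents(query_vec, doc_vecs):
        print(f"{score:.3f}  {docs[index]}")
```

For a real RAG pipeline the brute-force ranking would normally be replaced by a vector index, and some embedding models expect a particular pooling mode or input prefix, so consult the model card before relying on the scores.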
Source: r/LocalLLaMA