TL;DR: The latest llama.cpp release unifies tool parsing for LFM models and optimizes OpenCL performance.
Summary: llama.cpp release b9535 unifies and fixes the tool parser for LFM2 and LFM2.5 models in its common chat interface. The release also speeds up the OpenCL backend, with optimizations for memory copy, concatenation, and flat GEMV operations. In addition, it rounds tensor parallelism granularity up to 128 to improve scaling.
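To make the rounding step concrete, here is a minimal Python sketch of one possible reading: each device's share of a tensor dimension is rounded up to a multiple of 128, and the last device takes whatever remains. The function names `round_up` and `split_rows` are hypothetical and the splitting policy is an assumption for illustration; this is not llama.cpp's actual implementation.

```python
def round_up(n: int, granularity: int = 128) -> int:
    """Round n up to the nearest multiple of granularity."""
    return ((n + granularity - 1) // granularity) * granularity


def split_rows(total_rows: int, num_devices: int, granularity: int = 128) -> list[int]:
    """Split total_rows across devices in chunks aligned to granularity.

    Hypothetical policy (an assumption, not taken from llama.cpp):
    every device gets an even share rounded up to the granularity,
    and later devices receive whatever rows are left.
    """
    even_share = -(-total_rows // num_devices)  # ceiling division
    chunk = round_up(even_share, granularity)
    sizes = []
    remaining = total_rows
    for _ in range(num_devices):
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    print(split_rows(4096, 3))
```

Under this assumed policy, every chunk boundary falls on a multiple of 128, which is the kind of alignment GPU kernels tend to favor.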
Why it matters: This release makes local function calling with LFM2 and LFM2.5 models more reliable and speeds up inference on the OpenCL backend. Developers who use OpenCL or tensor parallelism for local LLM deployment should update to pick up these fixes and performance gains.
Source: github.com