llama.cpp release b9531 updates tensor parallelism granularity

LocalAI OpenSource Tools

TL;DR: llama.cpp release b9531 rounds tensor parallelism granularity up to 128 and adds FWHT support for Intel GPUs in the Vulkan backend.

Summary: Release b9531 of llama.cpp rounds tensor parallelism granularity up to 128 and removes the assertions associated with it. In the Vulkan backend, it adds FWHT (fast Walsh-Hadamard transform) support for Intel GPUs, implemented with a shared-memory reduction. It also fixes build failures on several platforms.
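For readers unfamiliar with the two operations named in the summary, here is a minimal Python sketch of the arithmetic involved. It is illustrative only and is not llama.cpp's code: the function names `round_up` and `fwht` are hypothetical, and how the release applies the 128 granularity to tensor splits is an assumption not confirmed by the release notes. The Vulkan shader works in GPU shared memory, while this version is a plain CPU reference.

```python
def round_up(value: int, multiple: int = 128) -> int:
    """Round `value` up to the nearest multiple of `multiple`.

    128 matches the granularity named in the summary; where llama.cpp
    applies it is an assumption of this sketch.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (CPU reference).

    The input length must be a power of two. Each pass combines pairs of
    elements that are `h` apart into their sum and difference.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


if __name__ == "__main__":
    print(round_up(129))       # a size just over 128 rounds to the next multiple
    print(fwht([1, 2, 3, 4]))  # applying fwht twice returns the input scaled by n
```

Applying the unnormalized transform twice returns the original vector multiplied by its length, which is a convenient sanity check for any implementation, including a GPU one.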

Why it matters: The granularity change affects how work is split when a model runs across multiple GPUs, and the FWHT addition extends Vulkan backend coverage on Intel hardware. Developers running llama.cpp locally on multi-GPU or Intel Vulkan setups may want to update to pick up these changes and the build fixes.

Source: github.com