llama.cpp b9523 release adds KleidiAI scheduling

Coding OpenSource

TL;DR: The llama.cpp b9523 release introduces dynamic chunk-based scheduling for KleidiAI hybrid execution, alongside a refactor of model hyperparameter handling and hardware-specific optimizations.

Summary: The ggerganov/llama.cpp project released version b9523, which adds dynamic chunk-based scheduling for hybrid execution with KleidiAI. The release also refactors how model hyperparameters are handled (layer counting, for example) and fixes Step3.5 Multi-Token Prediction (MTP). It ships updated pre-built binaries and build configurations for macOS, Linux, Windows, and Android.
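For readers unfamiliar with the term, the general idea behind dynamic chunk-based scheduling can be sketched as follows. This is a minimal illustration of the generic technique, not the llama.cpp or KleidiAI implementation: the function `run_dynamic_chunks`, its parameters, and the `square_rows` workload are all hypothetical names chosen for this example. Instead of splitting the work evenly up front, each worker claims the next chunk of rows from a shared counter whenever it becomes free, so faster workers naturally end up processing more chunks.

```python
import threading


def run_dynamic_chunks(n_rows, chunk_size, n_workers, work_fn):
    """Process rows [0, n_rows) in chunks that workers claim on demand.

    Hypothetical helper for illustration only. Returns, per worker,
    the list of (start, end) chunks that worker ended up processing.
    """
    next_row = 0                      # shared cursor into the row range
    lock = threading.Lock()           # guards next_row
    claimed = [[] for _ in range(n_workers)]

    def worker(worker_id):
        nonlocal next_row
        while True:
            # Claim the next unprocessed chunk, or stop if none remain.
            with lock:
                start = next_row
                if start >= n_rows:
                    return
                end = min(start + chunk_size, n_rows)
                next_row = end
            # Do the actual work outside the lock.
            work_fn(start, end)
            claimed[worker_id].append((start, end))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


# Usage: square ten values with two workers and chunks of three rows.
out = [0] * 10


def square_rows(start, end):
    for i in range(start, end):
        out[i] = i * i


claimed = run_dynamic_chunks(n_rows=10, chunk_size=3, n_workers=2, work_fn=square_rows)
```

Which worker gets which chunk varies from run to run, but every row is processed exactly once. In a hybrid setup the workers would presumably be execution paths of differing speed, which is where claiming work on demand tends to help compared with a fixed even split.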

Why it matters: AI builders running llama.cpp on local or edge hardware may see better performance and scheduling efficiency, particularly on Arm CPUs that use KleidiAI. Watch Step3.5 MTP stability after the fix, and test the new chunk-based scheduling on compatible CPUs.

Source: github.com