TL;DR: llama.cpp release b9480 integrates StepFun 3.5 with Multi-Token Prediction (MTP), enabling efficient local inference of this model.
Summary: llama.cpp release b9480 adds support for StepFun 3.5 Multi-Token Prediction (MTP) in a simplified, single-layer form. The release also adds a ping interval for the server's SSE (Server-Sent Events) stream and deprecates llama_set_warmup. Together, these changes extend llama.cpp's local inference support to a new model architecture.
Why it matters: With MTP, StepFun 3.5 predicts more than one upcoming token per step, which can raise generation speed when the extra predictions are accepted. Builders can now try the model locally on llama.cpp's optimized inference stack.
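To illustrate why multi-token prediction can speed up generation, here is a minimal toy sketch of a greedy draft-and-verify step. It is a generic illustration and not llama.cpp's implementation: the names greedy_verify and toy_next_token are hypothetical, the "model" is a stand-in function, and a real engine would check the whole draft in one batched forward pass instead of one call per token.

```python
from typing import Callable, List


def greedy_verify(
    context: List[int],
    draft: List[int],
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens emitted by one draft-and-verify step.

    `draft` holds tokens proposed cheaply (for example by an MTP head).
    `next_token` stands in for the main model's greedy choice.
    """
    accepted: List[int] = []
    for proposed in draft:
        target = next_token(context + accepted)
        accepted.append(target)  # the main model's token is always the one kept
        if target != proposed:
            break  # first mismatch: the rest of the draft is discarded
    else:
        # Every drafted token matched, so the same verification yields one extra token.
        accepted.append(next_token(context + accepted))
    return accepted


def toy_next_token(tokens: List[int]) -> int:
    """Hypothetical toy model: the next token is the last token plus one, modulo 10."""
    return (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(greedy_verify([1, 2, 3], [4, 5, 9], toy_next_token))
    print(greedy_verify([1, 2, 3], [4, 5], toy_next_token))
```

In this sketch the emitted tokens are always the ones plain greedy decoding would have produced; a good draft only changes how many tokens are produced per verification step.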
Source: github.com