llama.cpp b9480 adds StepFun 3.5 MTP support

OpenSource LocalAI Architecture

TL;DR: llama.cpp release b9480 integrates StepFun 3.5 with Multi-Token Prediction (MTP), enabling efficient local inference of this model.

Summary: The latest llama.cpp release (b9480) adds support for StepFun 3.5 Multi-Token Prediction (MTP), with MTP simplified to a single layer. The release also adds an SSE ping interval to the server and deprecates llama_set_warmup. The MTP support extends llama.cpp's local inference to a new model architecture.

Why it matters: MTP lets a model propose more than one token per decoding step instead of exactly one, which can improve generation speed when the extra tokens are accepted. Builders can now experiment with StepFun 3.5 locally using llama.cpp's optimized inference stack.
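
To make the speed argument concrete, here is a toy sketch of the general draft-and-verify idea behind multi-token prediction. It is not llama.cpp's or StepFun's implementation: the function names (accept_drafts, toy_next) and the toy "model" are hypothetical, and a real engine would verify all draft tokens in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

# Hypothetical stand-in for the main model's greedy next-token choice.
NextToken = Callable[[List[int]], int]


def toy_next(seq: List[int]) -> int:
    """Toy 'model' (an assumption for illustration): counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def accept_drafts(context: List[int], drafts: List[int], next_token: NextToken) -> List[int]:
    """Keep draft tokens while they match what the main model would emit.

    Returns the tokens actually appended this step. The first mismatching
    draft is replaced by the main model's token and the rest are discarded,
    so the output is identical to plain greedy decoding.
    """
    accepted: List[int] = []
    for tok in drafts:
        expected = next_token(context + accepted)
        if tok != expected:
            accepted.append(expected)  # correct the first wrong draft, then stop
            return accepted
        accepted.append(tok)
    # Every draft matched, so the main model adds one more token on top.
    accepted.append(next_token(context + accepted))
    return accepted


if __name__ == "__main__":
    context = [1, 2, 3]
    print(accept_drafts(context, [4, 5, 9], toy_next))  # third draft is wrong
    print(accept_drafts(context, [4, 5], toy_next))     # all drafts accepted
```

In this sketch a step always yields at least one token, and yields more whenever the drafts are right, which is where any speedup would come from.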

Source: github.com