TL;DR: A pull request adds Multi-Token Prediction support for StepFun 3.5/3.7 Flash models in llama.cpp, letting users toggle thinking mode.
Summary: Pull request #23274 by pwilkin introduces MTP (Multi-Token Prediction) support for StepFun 3.5 and 3.7 Flash models in llama.cpp. The feature allows users to enable, disable, or limit thinking during inference, and it extends MTP availability beyond previously supported models such as Gemma.
Why it matters: MTP lets a model propose several tokens per step instead of one, which can speed up local inference for compatible models. Developers using llama.cpp can test this PR for faster generation on StepFun GGUF models and watch whether MTP support spreads to other model families.
Source: r/LocalLLaMA