Reasoning models use 'Wait' tokens to backtrack and self-correct

Research LocalAI Prompt Engineering

TL;DR: Developers report that reasoning LLMs emit tokens such as 'Wait' to trigger self-correction during chain-of-thought generation.

Summary: Analysis of reasoning models trained with reinforcement learning shows that they frequently emit conversational transitions such as 'Wait' to begin backtracking after detecting an error. This learned behavior lets the model pause, re-evaluate its logic, and correct its line of reasoning. It improves accuracy, but the extra tokens add latency and conversational noise to raw model output.

Why it matters: Understanding these reasoning markers allows developers to design better parsing pipelines for chain-of-thought streams. Builders should implement filtering layers to strip these conversational artifacts before displaying outputs to end users.
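
A filtering layer of this kind might look like the minimal sketch below. It is an illustration under stated assumptions, not a reference implementation: it assumes the model wraps its reasoning in <think>...</think> delimiters (some open reasoning models do, but check your model's output format), and the marker list and the function names split_reasoning and count_backtracks are hypothetical.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>. Adjust the
# delimiters to whatever your model actually emits.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Hypothetical list of backtracking markers; extend it from your own logs.
BACKTRACK_MARKER = re.compile(r"\b(?:wait|hmm|hold on)\b", re.IGNORECASE)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, user-facing answer)."""
    # Collect the text inside every complete reasoning block.
    reasoning = "\n".join(m.group(1) for m in THINK_BLOCK.finditer(raw))
    # Drop the reasoning blocks, keeping only what the user should see.
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer


def count_backtracks(reasoning: str) -> int:
    """Count backtracking markers, e.g. for latency or quality monitoring."""
    return len(BACKTRACK_MARKER.findall(reasoning))


raw = "<think>2+2=5. Wait, that is wrong. 2+2=4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)
print(count_backtracks(reasoning))
```

Stripping the whole reasoning block, rather than deleting individual 'Wait' tokens, leaves the model's self-correction intact and hides it only at display time. The sketch handles complete responses only. A streaming pipeline would also need to buffer text until the closing delimiter arrives.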

Source: r/localllama