TL;DR: Developers are noting reasoning LLMs outputting 'Wait' tokens to initiate self-correction during chain-of-thought processing.
Summary: Analysis of reinforcement learning-trained reasoning models shows they frequently output conversational transitions like 'Wait' to initiate backtracking upon detecting an error. This learned behavior allows the model to pause, re-evaluate its logic, and correct its execution path. While it improves accuracy, this pattern introduces extra latency and conversational noise to raw model outputs.
Why it matters: Understanding these reasoning markers allows developers to design better parsing pipelines for chain-of-thought streams. Builders should implement filtering layers to strip these conversational artifacts before displaying outputs to end users.
Source: r/localllama