NeurIPS Reviewers Targeted with Prompt Injection Attacks

Research Security

TL;DR: NeurIPS peer reviewers are being warned of prompt injection attacks hidden in paper submissions designed to manipulate LLM-assisted reviews.

Summary: Similar to attacks observed during the ICML review cycle, submissions to NeurIPS have been found to contain clever prompt injections. These hidden instructions attempt to hijack LLM-assisted reviewer workflows to generate favorable reviews or bypass critical evaluations. The discovery highlights the growing vulnerability of academic peer review processes that rely on automated LLM pipelines.

Why it matters: AI builders and researchers using LLMs for document synthesis or automated evaluation must implement robust sanitation to prevent instruction hijacking. Academic organizers need to screen submissions for prompt injection payloads before passing texts to reviewer-assistance tools.

Source: r/machinelearning