No Hidden Prompts Needed!
You Can Game AI Peer Review with Presentation-Only Revisions
No Hidden Prompts Needed! Gaming AI Review ARGAR
As LLM-generated reviews enter peer review, robustness work has focused on explicit attacks like prompt injection. We surface a subtler, policy-relevant risk: modifying only presentation-level content (abstract, contribution statements, narrative) with scientific content held fixed can systematically overturn AI-review outcomes.
- Contribution list enhancement
- Abstract reframing
- Related-work repositioning
- Analytical discussion expansion
Resistance to presentation-only review gaming.
We propose this as a necessary condition for AI review automation: necessary, not sufficient.
Condition: when scientific content is unchanged, AI reviewer scores should not become systematically more favorable merely because the presentation is adjusted.
Unlike prompt injection or hidden text (already banned and grounds for desk rejection), these edits are legitimate, visible, and within normal writing practice. That makes them far harder to guard against.
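One way to audit a reviewer against this condition is to score the same papers before and after presentation-only revisions and ask whether scores rise more often than chance would allow. The page does not prescribe a test. The sketch below is our own illustration and uses a standard one-sided sign test. The function names `one_sided_sign_test` and `passes_condition` and the 0.05 threshold are hypothetical.

```python
from math import comb

def one_sided_sign_test(deltas):
    """One-sided sign test: p-value for 'scores rise no more often than they fall'.

    Ties (delta == 0) are dropped, as in the standard sign test.
    """
    pos = sum(1 for d in deltas if d > 0)
    neg = sum(1 for d in deltas if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0  # no movement at all: nothing to flag
    # P(X >= pos) for X ~ Binomial(n, 1/2)
    return sum(comb(n, k) for k in range(pos, n + 1)) / 2 ** n

def passes_condition(original_scores, revised_scores, alpha=0.05):
    """True unless presentation-only revisions raise scores significantly often.

    Hypothetical setup: original_scores[i] and revised_scores[i] are the same
    reviewer's scores for paper i before and after a presentation-only revision.
    """
    deltas = [r - o for o, r in zip(original_scores, revised_scores)]
    return one_sided_sign_test(deltas) >= alpha
```

For example, `passes_condition([5, 5, 6, 4, 5, 6, 5, 5], [6, 7, 6, 6, 6, 7, 6, 6])` is `False`: seven of eight papers rise and none fall, giving p = 1/128. A sign test ignores effect size, so in practice it would be paired with the mean score change.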
Reframed, reorganized, and renarrated, but with no scientific claim unsupported by the original.
May be reworded or reorganized, but factual content must be preserved.
Experimental evidence is untouched: this is what the work actually contributes.
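The page does not say how these constraints are enforced. As an illustration only, one cheap automated guard for the "evidence untouched" rule flags any numeric token in a revision that appears nowhere in the original paper. The `new_numbers` helper and its regular expression are hypothetical. This check cannot detect reworded claims, only invented figures.

```python
import re

# Integers, decimals and percentages, e.g. "5", "0.96", "75.1%".
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")

def new_numbers(original, revised):
    """Numeric tokens that appear in `revised` but nowhere in `original`.

    A crude heuristic: it catches invented figures, not reworded claims.
    """
    seen = set(NUMBER.findall(original))
    return sorted(set(NUMBER.findall(revised)) - seen)
```

A revised abstract can quote a number that previously appeared only in the results section, so `original` should be the full original paper, not just the original abstract.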
Adversarial Repackaging.
Three design choices define it; each round then runs them as a closed loop.
Closed-loop iteration
Best-version tracking across rounds, not a single blind rewrite.
Full-paper editing
The whole presentation layer is in scope, not just the abstract.
Signal-driven strategy
Each round conditions on the reviewer's own specific feedback.
Profile
Extract structured signals from the N reviews: recurring perceptions, tagged by frequency & severity.
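The Profile step can be pictured as an aggregation over the N reviews. This is a minimal sketch that assumes each review has already been reduced to `(label, severity)` pairs with normalized labels. That extraction step, the 1 to 3 severity scale, and the names `Signal` and `build_profile` are all hypothetical. Frequency counts how many reviews raise a perception, and severity is averaged across mentions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Signal:
    label: str
    frequency: int        # how many of the N reviews raise this perception
    mean_severity: float  # averaged over all mentions (hypothetical 1-3 scale)

def build_profile(reviews):
    """reviews: one list per review of (label, severity) pairs.

    Returns signals with the most widely shared, most severe perceptions first.
    """
    counts = defaultdict(int)
    severities = defaultdict(list)
    for review in reviews:
        for label, severity in review:
            severities[label].append(severity)
        for label in {lbl for lbl, _ in review}:  # count each review once per label
            counts[label] += 1
    profile = [Signal(label, counts[label], sum(s) / len(s))
               for label, s in severities.items()]
    profile.sort(key=lambda sig: (-sig.frequency, -sig.mean_severity, sig.label))
    return profile
```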
Best score climbs only on accepted rounds; failed edits are rolled back, never carried forward.
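The closed loop with best-version tracking and rollback can be sketched as follows. This is a simplified illustration, not the authors' implementation. `review` and `rewrite` are hypothetical callables standing in for the AI reviewer and the signal-driven editor, and the gate is assumed to accept only strict score improvements.

```python
from typing import Callable, List, Tuple

Review = Callable[[str], Tuple[float, str]]   # paper -> (score, feedback)
Rewrite = Callable[[str, str], str]           # (paper, feedback) -> candidate

def repackage_loop(paper: str, review: Review, rewrite: Rewrite,
                   rounds: int = 5) -> Tuple[str, float, List[bool]]:
    best_paper = paper
    best_score, feedback = review(paper)
    accepted: List[bool] = []
    for _ in range(rounds):
        # Signal-driven: edit the current best version using the reviewer's own feedback.
        candidate = rewrite(best_paper, feedback)
        score, candidate_feedback = review(candidate)
        if score > best_score:
            # Gate passed: the candidate becomes the new best version.
            best_paper, best_score, feedback = candidate, score, candidate_feedback
            accepted.append(True)
        else:
            # Gate failed: roll back by discarding the candidate entirely.
            accepted.append(False)
    return best_paper, best_score, accepted
```

With a toy reviewer that scores `min(text.count("novel"), 3)` and an editor that appends " novel", the first three rounds are accepted and the last two are rolled back, so the best score stops at 3.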
Overall attack effectiveness.
Repackaging alone shifts AI reviews. With methods, experiments, figures and numbers held fixed, presentation edits alone produce systematic score gains across all three reviewer models: +1.21 mean, 75.1% success rate.
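The two headline numbers are simple aggregates over per-paper score changes ΔS. In this sketch, we assume that success means ΔS > 0 and that ΔS is measured between the original score and the best score reached. The page does not spell out either definition, and `attack_summary` is a hypothetical name.

```python
def attack_summary(pairs):
    """pairs: (original_score, best_score_after_attack) for each paper.

    Returns (mean ΔS, success rate), counting a paper as a success if ΔS > 0.
    """
    deltas = [after - before for before, after in pairs]
    mean_delta = sum(deltas) / len(deltas)
    success_rate = sum(d > 0 for d in deltas) / len(deltas)
    return mean_delta, success_rate
```

For toy scores `[(4, 6), (5, 5), (3, 5), (6, 7)]`, the result is `(1.25, 0.75)`. Because the loop keeps the best version, ΔS is never negative under this definition.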
Built on three design dimensions.
Comparison with baselines on a paper subset (ΔS per reviewer; ✓ has it · ◐ signal-guided, but no strategy pool · ✗ missing): our attack beats every baseline on both reviewers, and its effectiveness comes from the synergy of all three dimensions.
The strength–weakness asymmetry.
Easier to impress than to convince. AI reviewers respond to positive presentation signals in a stable, predictable manner, yet their response to attempts at dissolving criticism is uncontrollable and frequently backfires. They reward the salience of strengths more readily than they forgive the evidence of weaknesses.
Δ strength is right-skewed and unimodal; Δ severity is bimodal, and its second hump, on the positive side, is criticism that got harsher.
In rounds that fail the gate, strengths are still enhanced 79.9% of the time, yet weaknesses deteriorate in 45.3%; failures stem not from insufficient strength gains but from weaknesses resistant to dissolution. For 79.2% of papers, mean strength gain exceeds mean weakness reduction.
Among all rounds where the overall score rises, 15.8% simultaneously exhibit worsening weaknesses. Even when deficiencies are criticized more harshly, the score still climbs as long as sufficiently salient new strengths are introduced, and the aggregate judgment is swamped by amplified strength signals.
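The asymmetry statistics above are conditional shares over per-round records. The sketch below shows one way to compute them. The `Round` fields are hypothetical, and "weakness reduction" is assumed to be the negated change in weakness severity. Any numbers in a usage example would be toy data, not the paper's.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Round:
    paper: str
    accepted: bool          # passed the gate (became the new best)
    score_delta: float
    strength_delta: float   # > 0: strengths rated more favourably
    severity_delta: float   # > 0: weaknesses criticised more harshly

def share(rounds, predicate):
    return sum(1 for r in rounds if predicate(r)) / len(rounds) if rounds else 0.0

def asymmetry_stats(rounds):
    failed = [r for r in rounds if not r.accepted]
    rising = [r for r in rounds if r.score_delta > 0]
    by_paper = {}
    for r in rounds:
        by_paper.setdefault(r.paper, []).append(r)
    # Assumption: weakness reduction = -severity_delta, averaged per paper.
    strength_dominant = sum(
        1 for rs in by_paper.values()
        if mean(r.strength_delta for r in rs) > mean(-r.severity_delta for r in rs)
    )
    return {
        "failed_strength_up": share(failed, lambda r: r.strength_delta > 0),
        "failed_weakness_worse": share(failed, lambda r: r.severity_delta > 0),
        "rising_with_worse_weakness": share(rising, lambda r: r.severity_delta > 0),
        "papers_strength_dominant": strength_dominant / len(by_paper) if by_paper else 0.0,
    }
```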
Strategy effectiveness gradient.
Reframing beats polishing. Success is not “better writing → higher score.” Edits that change how the reviewer understands the paper (what it contributes and how significant it is) far outperform ones that only improve surface appearance. And a strategy that opens the attack is not the one that sustains it.
What opens the attack
First-hit attribution: share of first successful rounds that contain the strategy.
Surface edits (table formatting, text polishing, algorithm boxes) rarely appear in a first breakthrough at all. The opener is structural or narrative.
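Under the definition above, first-hit attribution considers only each paper's first accepted round and counts the strategies it contains. This is a minimal sketch. It assumes each round is logged as a `(strategies, accepted)` tuple in round order, and that logging format and the name `first_hit_attribution` are hypothetical.

```python
from collections import Counter

def first_hit_attribution(papers):
    """papers: one list per paper of (strategies, accepted) tuples, in round order.

    Returns, for each strategy, the share of first successful rounds containing it.
    """
    hits = Counter()
    n_first = 0
    for rounds in papers:
        for strategies, accepted in rounds:
            if accepted:                 # first breakthrough for this paper
                hits.update(set(strategies))
                n_first += 1
                break                    # later rounds do not count
    return {s: c / n_first for s, c in hits.items()} if n_first else {}
```

Shares can sum to more than 1 because one round may combine several strategies.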
What sustains the gain
Accepted-exposure rate: share of a strategy’s rounds accepted as the new best (baseline 30.8%).
Contribution-list enhancement opens 87.2% of first breakthroughs but sustains only a 36.8% accepted-exposure rate, with diminishing returns in later rounds. Narrative restructuring (related-work repositioning, discussion expansion) is what sustains effectiveness once the baseline is raised, while surface edits (table formatting, polishing, algorithm boxes) stay below the 30.8% baseline.
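The accepted-exposure rate divides the number of a strategy's rounds accepted as the new best by the total number of rounds in which the strategy was used. The baseline is the overall acceptance rate across all rounds. The sketch below uses the same hypothetical `(strategies, accepted)` round log as an assumption. `accepted_exposure` is an illustrative name.

```python
from collections import defaultdict

def accepted_exposure(papers):
    """papers: one list per paper of (strategies, accepted) tuples.

    Returns ({strategy: accepted-exposure rate}, baseline acceptance rate).
    """
    used = defaultdict(int)
    accepted_with = defaultdict(int)
    total = accepted_total = 0
    for rounds in papers:
        for strategies, accepted in rounds:
            total += 1
            accepted_total += accepted
            for s in set(strategies):
                used[s] += 1
                accepted_with[s] += accepted
    rates = {s: accepted_with[s] / used[s] for s in used}
    baseline = accepted_total / total if total else 0.0
    return rates, baseline
```

A strategy whose rate falls below the baseline is accepted less often than an average round, which is how surface edits are characterized above.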
Anatomy of a case study.
A paper proposing an information-theoretic metric for low-dimensional embedding quality (Shannon entropy + stable rank), validated on two datasets across five dimensionality-reduction methods.
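The page names the two ingredients of the case-study metric but not how the paper combines them. For reference, here are the standard definitions computed from an embedding matrix's singular values. Stable rank is ‖A‖_F² / ‖A‖₂². Spectral entropy is taken here as the Shannon entropy of singular values normalized to sum to 1, which is one common convention and an assumption on our part.

```python
import math

def stable_rank(singular_values):
    """||A||_F^2 / ||A||_2^2, computed from A's singular values."""
    sq = [s * s for s in singular_values]
    return sum(sq) / max(sq)

def spectral_entropy(singular_values):
    """Shannon entropy (nats) of the singular values normalized to sum to 1."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    return -sum(p * math.log(p) for p in probs)
```

A flat spectrum such as `[1, 1, 1, 1]` gives stable rank 4 and entropy ln 4. A rank-one spectrum such as `[3, 0, 0]` gives stable rank 1 and entropy 0.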
Same 2 datasets · 5 methods · every number unchanged. Only the presentation moved.
Before: This paper introduces an information-theoretic metric for evaluating the quality of low-dimensional embeddings. The authors argue that existing metrics focus on geometric distortions but do not directly assess information preservation. The metric is validated on two datasets and five reduction methods, showing strong average correlation with a geometric baseline but significant local discrepancies.
After: This paper introduces an information-theoretic metric for evaluating low-dimensional embedding quality. Unlike existing metrics that focus on geometric distortions, it quantifies information preservation via Shannon entropy and stable rank. Experiments across five reduction methods on synthetic and real-world data demonstrate that (i) distance preservation does not imply information preservation; (ii) strong global correlation (|ρ|=0.96) yet local divergence; (iii) global averages mask pathological neighborhoods.
The reviewer is not evaluating the paper's contributions; it is relaying the paper's claims about them, adopting the manuscript's own phrasing as if it were independent judgment.
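One rough way to quantify this "relaying" is to measure how much of a review's wording appears verbatim in the manuscript. The sketch below is our own illustration and not a method from the page. It computes the fraction of a review's word trigrams that also occur in the manuscript. The function names and the choice of n = 3 are hypothetical.

```python
import re

def ngrams(text, n):
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def echoed_fraction(review, manuscript, n=3):
    """Fraction of the review's word n-grams that also occur verbatim in the manuscript."""
    review_grams = ngrams(review, n)
    if not review_grams:
        return 0.0
    return len(review_grams & ngrams(manuscript, n)) / len(review_grams)
```

For example, "Consistent behavior across complementary data regimes." scored against a manuscript containing "across complementary data regimes" echoes 2 of its 4 trigrams, giving 0.5.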
Novelty claim
Strategy · Contribution list enhancement + Abstract reframing
Before: Addresses a genuine gap in dimensionality-reduction evaluation by proposing an explicitly information-theoretic quality metric.
After: Novel information-theoretic perspective: the first embedding-quality metric explicitly grounded in information theory, addressing a fundamental gap.
No new evidence: the same contribution, stated more assertively. The reviewer upgrades its verdict straight from the paper's own wording, without independently verifying the claim.
Experiments
Strategy · Preemptive framing
Before: Experimental validation includes comparison with established metrics on synthetic and real-world data.
After: Comprehensive experimental validation across five reduction methods, demonstrating consistent behavior across complementary data regimes.
The same two datasets, pre-described as “complementary data regimes”; the reviewer adopts the phrase as evidence of comprehensiveness.
Theory
Strategy · Theoretical formalization
Before: The connection between stable rank, entropy, and information content is conceptually interesting.
After: Thorough comparison with existing metrics: systematically compares the metric against established baselines.
An added proposition restates an existing bound (no new math), yet “conceptually interesting” becomes “systematically compares.”
Across every mechanism the logic is identical: the AI reviewer equates the appearance of having addressed an issue with actually having resolved it. An explanation counts as a methodological decision; a proposition counts as a theoretical contribution; an acknowledged limitation counts as a resolved one.
Not tied to one model or template: it transfers.
An attack optimized against one reviewer still works when a different model re-scores the paper, and when the paper is re-scored under another venue’s guidelines. Every off-diagonal entry stays positive.
ICLR-optimized papers re-scored under other venues' guidelines (reviewer: Sonnet 4.5). Scales differ (ICLR 1–10; NeurIPS/ICML 1–6), so absolute ΔS is not directly comparable across templates, but stays positive within each.
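If ΔS values from different templates must be placed side by side, one rough convention is to divide ΔS by the width of each venue's score range. The scale bounds below come from the caption above. The normalization itself is our assumption and does not make the rubrics equivalent. `normalized_delta` is a hypothetical name.

```python
# Score scales from the caption: ICLR 1-10, NeurIPS/ICML 1-6.
SCALES = {"ICLR": (1, 10), "NeurIPS": (1, 6), "ICML": (1, 6)}

def normalized_delta(delta, venue):
    """ΔS as a fraction of the venue's score range (a rough, assumed convention)."""
    low, high = SCALES[venue]
    return delta / (high - low)
```

A +1 change is then 1/9 of the ICLR range but 0.2 of the NeurIPS range. This illustrates why raw ΔS values are not comparable across templates.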
All off-diagonal entries are positive, so attacks transfer across models: the matched mean (same model used for attacking and re-scoring) is +1.42, versus a mismatched mean of +0.88.
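The transfer summary reduces an attacker-by-scorer matrix of mean ΔS to three quantities: the diagonal mean (matched), the off-diagonal mean (mismatched), and whether every off-diagonal entry is positive. In this sketch, the matrix layout and `transfer_summary` are hypothetical, and the page's reported means may be averaged differently, for example over papers instead of cells.

```python
def transfer_summary(matrix):
    """matrix[i][j]: mean ΔS when an attack optimized against reviewer i
    is re-scored by reviewer j (square matrix, hypothetical layout)."""
    k = len(matrix)
    diag = [matrix[i][i] for i in range(k)]
    off = [matrix[i][j] for i in range(k) for j in range(k) if i != j]
    return {
        "matched_mean": sum(diag) / len(diag),
        "mismatched_mean": sum(off) / len(off),
        "all_off_diagonal_positive": all(x > 0 for x in off),
    }
```

With a toy matrix `[[1.5, 0.75], [1.0, 1.25]]`, the matched mean is 1.375, the mismatched mean is 0.875, and every off-diagonal entry is positive.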
A contamination-free testbed for AI-review robustness.
We release the full construction pipeline, not a frozen snapshot: a contamination-free, rolling benchmark of unpublished papers that refreshes as models evolve, with source and PDF paired to mirror the real review workflow.
What makes it a clean testbed
A re-runnable construction pipeline
Released as open code: fully automated, no manual curation. Re-run it anytime to rebuild a fresh, uncontaminated set.
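The core filtering idea behind a rolling, contamination-free set might look like the sketch below. It keeps papers that are unpublished and have both source and PDF available. It also assumes an extra criterion: submission after the reviewer model's training cutoff. The page does not state that cutoff rule, and the `Submission` fields and `contamination_free` are hypothetical names, not the released pipeline's API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Submission:
    paper_id: str
    submitted: date
    published: bool    # already appeared in a venue or proceedings
    has_source: bool   # source files available
    has_pdf: bool      # compiled PDF available

def contamination_free(submissions, model_cutoff):
    """Keep unpublished papers with paired source and PDF, submitted after the
    reviewer model's training cutoff (assumed criterion)."""
    return [
        s for s in submissions
        if s.submitted > model_cutoff and not s.published and s.has_source and s.has_pdf
    ]
```

Re-running the filter with a newer `model_cutoff` when a reviewer model is updated is what would make the set "rolling" in this sketch.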