By ICML 2026 Program Chairs Alekh Agarwal, Miroslav Dudik, Sharon Li, and Martin Jaggi; Scientific Integrity Chair Nihar B. Shah; and Communications Chairs Katherine Gorman and Gautam Kamath.
AI has increasingly become a valuable part of researchers’ workflows. Unfortunately, it also has the potential to harm the integrity of peer review if used improperly. Conferences must adapt: creating rules and policies to handle the new normal, and taking disciplinary action against those who break the rules and violate the trust that we all place in the review process.
ICML is actively working to adapt. This year, we desk-rejected 497 papers (~2% of all submissions): the submissions of the 506 reciprocal reviewers who violated the rules on LLM usage to which they had previously and explicitly agreed.
ICML 2026 has two policies regarding LLM use in reviewing:
Policy A (Conservative): No LLM use allowed.
Policy B (Permissive): LLMs allowed to help understand the paper and related work, and to polish reviews.
This two-policy framework was shaped by community preferences and feedback; indeed, the community is divided on how best to use LLMs in peer review, with issues such as author consent colliding with preferred reviewer workflows. Further details on the policy are available here.
In a selection process, reviewers chose which policy they would like to operate under and were then assigned to either Policy A or Policy B. In the end, based on author demands and reviewer signups, the only reviewers assigned to Policy A (no LLMs) were those who explicitly selected “Policy A” or “I am okay with either [Policy] A or B.” To be clear, no reviewer who strongly preferred Policy B was assigned to Policy A.
We detected LLM use in 795 reviews (~1% of all reviews), written by 506 unique reviewers who had been assigned Policy A (no LLMs). Again, recall that these are reviewers who explicitly agreed not to use LLMs in their reviews. The detection method is described below; generic AI-text detectors were not used, and every flagged instance was manually verified by a human in order to avoid false positives.
If the designated Reciprocal Reviewer for a submission produced such a review, their submission was rejected; in total, this resulted in 497 rejections. All Policy A (no LLMs) reviews detected to be LLM-generated were removed from the system. If more than half of the reviews submitted by a Policy A reviewer were detected to be LLM-generated, then all of their reviews were deleted and the reviewer was removed from the reviewer pool. 51 Policy A reviewers, about 10% of the 506 detected reviewers, were found to have used LLMs in more than half of their reviews.
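For concreteness, here is a minimal sketch of the enforcement rules just described. The record structure and function names below are hypothetical illustrations, not ICML's actual tooling; the detection step itself (described later in this post) is taken as given.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewerRecord:
    """Hypothetical bookkeeping for one Policy A reviewer (illustration only)."""
    reviewer_id: str
    review_ids: list[str]                        # all reviews they submitted
    flagged_ids: list[str]                       # subset manually verified as LLM-generated
    reciprocal_submission: Optional[str] = None  # paper for which they are the designated Reciprocal Reviewer

def enforce_policy_a(records: list[ReviewerRecord]):
    """Apply the three enforcement rules described above (sketch, not real tooling)."""
    rejected_submissions: list[str] = []
    removed_reviews: list[str] = []
    removed_reviewers: list[str] = []
    for r in records:
        if not r.flagged_ids:
            continue  # no verified LLM use; nothing to enforce
        # Rule 1: the designated Reciprocal Reviewer's own submission is rejected.
        if r.reciprocal_submission is not None:
            rejected_submissions.append(r.reciprocal_submission)
        # Rule 2: every verified LLM-generated review is removed from the system.
        removed_reviews.extend(r.flagged_ids)
        # Rule 3: if more than half of their reviews were LLM-generated,
        # all of their remaining reviews are deleted and they leave the reviewer pool.
        if len(r.flagged_ids) > len(r.review_ids) / 2:
            removed_reviews.extend(i for i in r.review_ids if i not in r.flagged_ids)
            removed_reviewers.append(r.reviewer_id)
    return rejected_submissions, removed_reviews, removed_reviewers
```

Under this sketch, a reviewer with 4 of 6 reviews flagged would lose all 6 reviews and be removed from the pool, while a reviewer with 1 of 6 flagged would lose only the flagged review (plus their own submission, if they were its designated Reciprocal Reviewer).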