We've identified multiple loopholes with SWE Bench Verified where agents may look at future repository state (by querying it directly or through a variety of methods), and cases in which future repository state includes either solutions or detailed approaches to solving problems (commit messages and more).
Examples:
A trajectory with Claude 4 Sonnet, Pytest-dev__pytest-6202 (complete output here), the agent uses git log --all which leaks future commits that directly fix the issue:
The results of which directly reveal the fix:
Fix incorrect result of getmodpath method. diff --git a/src/_pytest/python.py b/src/_pytest/python.py index b8b365ad3..734a92f9b 100644 --- a/src/_pytest/python.py +++ b/src/_pytest/python.py @@ -285,8 +285,7 @@ class PyobjMixin(PyobjContext): break parts.append(name) parts.reverse() - s = ".".join(parts) - return s.replace(".[", "[") + return ".".join(parts)
Qwen3-Coder 480B ( 20250805-openhands-Qwen3-Coder-480B-A35B-Instruct ) also has several cases of looking ahead: some examples include django__django-13513 (complete output here) uses git log grep=[issue ID] which directly reveals the fix PR which is in the future repo state (future commits).
Running command: cd /workspace/django__django__3.2 && �[1m�[91mgit log�[0m --oneline --grep="31926" -i
In another Qwen3-Coder trajectory, Django__django-15572 , (complete output here) where the model specifically finds the commit containing the fix: 62739b6e2630e37faa68a86a59fad135cc788cd7
Command cd /workspace/django__django__4.1 && �[1m�[91mgit log�[0m --oneline --grep="33628" �[92m--all�[0m executed with exit code 0.
... continue reading