Most benchmarks tell us how AI coding models perform in carefully constructed scenarios. But they don’t tell us what developers actually think when they use these tools every day. That gap is why I built a Reddit sentiment analysis dashboard to see how real engineers compare Claude Code vs Codex in the wild. You can find the dashboard at https://claude-vs-codex-dashboard.vercel.app/ and the source code at https://github.com/waprin/claude-vs-codex-dashboard.
The dashboard offers options to view sentiment weighted or unweighted by upvotes, and to compare the tools on specific categories like speed, problem solving, and workflows.
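To make the weighted view concrete, here is a minimal sketch of how upvote weighting can work: each comment’s sentiment score counts in proportion to its upvotes rather than counting once. The data model, scale, and function names below are hypothetical illustrations, not the dashboard’s actual code.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    sentiment: float  # hypothetical scale: -1.0 (prefers Codex) to +1.0 (prefers Claude Code)
    upvotes: int

def aggregate_sentiment(comments: list[Comment], weight_by_upvotes: bool = True) -> float:
    """Average sentiment across comments, optionally weighting each comment by its upvotes."""
    if not comments:
        return 0.0
    if weight_by_upvotes:
        # Use max(upvotes, 1) so zero-score or downvoted comments still count at least once.
        total_weight = sum(max(c.upvotes, 1) for c in comments)
        return sum(c.sentiment * max(c.upvotes, 1) for c in comments) / total_weight
    return sum(c.sentiment for c in comments) / len(comments)
```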
In this newsletter edition, I’ll discuss:
What trends the sentiment analysis dashboard uncovers on Claude Code vs Codex discussions on Reddit
The methodology I used to build the dashboard and plans for future improvements
While notable AI benchmarks like SWE-bench, PR Arena, Terminal-Bench, and LMArena help us navigate the landscape of AI model quality, I don’t think any benchmark can truly capture how most software engineers use agentic coding models day-to-day. We don’t typically “set it and forget it” with the agent on a constructed task; rather, we work in an interactive, back-and-forth conversational session. Furthermore, engineers in the wild face a far greater diversity of tasks than any single benchmark could hope to capture.
For those reasons, I believe a survey of the “wisdom of the crowd” is valuable for gaining a broader understanding of which agentic coding models are performing better. To do so, I scraped a wide variety of comments from AI-coding-focused subreddits such as /r/ChatGPTCoding, /r/ClaudeCode, and /r/Codex. I then used the Claude Haiku model to classify whether each comment directly compared Claude Code and Codex, and, if so, to score the sentiment of the comparison.
(note: this analysis was done before the new Haiku model that Anthropic announced yesterday)
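For readers curious about the classification step, here is a rough sketch of what a Haiku-based classifier call can look like with the Anthropic Python SDK. The prompt, JSON output shape, and model alias are illustrative assumptions, not the exact ones used for the dashboard.

```python
import json
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = """You are labeling Reddit comments about AI coding tools.
Given the comment below, reply with JSON only, using this shape:
{{"compares_claude_code_and_codex": true or false, "preference": "claude_code", "codex", "neutral", or null}}

Comment:
{comment}"""

def classify_comment(comment: str) -> dict:
    """Ask a Haiku-class model whether a comment compares Claude Code and Codex,
    and if so, which tool it favors."""
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumption: any small, fast Claude model works here
        max_tokens=200,
        messages=[{"role": "user", "content": PROMPT.format(comment=comment)}],
    )
    return json.loads(response.content[0].text)

print(classify_comment("Switched from Codex to Claude Code last month and never looked back."))
```

Using a small, fast model for this labeling step keeps the per-comment cost low when classifying thousands of scraped comments.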