GPT-5 vs. Sonnet: Complex Agentic Coding

OpenAI released GPT-5 yesterday, promoting it as their best model yet for agentic coding. When it arrived in my GitHub Copilot this morning, I immediately decided to test it with a complex, long-running agentic coding task, and later gave the exact same task to Claude Sonnet 4 for comparison.

While this isn't a tightly controlled scientific comparison — more of a "vibe check" — both models impressed me with their results. It's worth noting that while Claude Sonnet has been established for coding for a while, GPT-5 is brand new, available in preview with some kinks to iron out, and had only been available for a few hours when I tested it.

The Task

The challenge I set for both models: review the current implementation of Ruler (a tool I built with AI help in TypeScript) and port it to Rust. Ruler isn't especially sophisticated. It's primarily a tool for managing text and configuration, moving text from here to there. But its support for many different agents and formats adds enough complexity to make the port a stretch for a model to complete in one go.

Testing GPT-5: Intelligence and Agency

I opened Visual Studio Code, set GitHub Copilot Chat to GPT-5, and dictated my requirements pretty much stream of consciousness. I intentionally didn't do much planning myself — this model is advertised as good at following instructions with minimal prompting, and I wanted to see how good it is at planning.

The task is to create a port of Ruler, the tool implemented in this repository, in the Rust programming language. You will create a new branch to work in and create a new directory where the Rust implementation will live. You will analyze the code base and everything you have here to understand exactly how the tool ruler works. You will create a functional test that uses a shell script to test both the TypeScript implementation and the Rust implementation independently by running the tool in a temporary directory and seeing how it behaves. When you have that functional test in a shell script and it works correctly with the TypeScript implementation, you can go ahead and create the Rust implementation. The Rust implementation needs to work exactly the same as the TypeScript implementation. all the invocation parameters, all the locations of files and all of its behavior. Unlike the TypeScript implementation it doesn't need to have testing like unit tests or end-to-end tests. You will only use the functional test you wrote in a shell script to validate that it is working correctly. You need to continuously work on the Rust implementation and test it with your functional test until it works correctly exactly as the TypeScript implementation works. When you have a working implementation, you will commit the changes Remember that you're working in a different branch, not in the main branch and you'll push that branch to github but don't create a pull request. Before you begin, research the task based on everything I just told you. look at the code base and do research online if you need, and write up a very detailed plan for what you're going to do and put it in a markdown file plan.md and then stop and let me review that file.
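The heart of the functional test described in the prompt is a parity check: run each implementation in its own temporary directory, then confirm that both left the same files behind. The test in this experiment was a shell script; the sketch below shows the comparison step in Rust purely as an illustration. The names `snapshot` and `diff` are hypothetical, and the sketch does not invoke Ruler itself.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Relative file path -> file contents for everything under a root directory.
type Snapshot = BTreeMap<String, Vec<u8>>;

/// Recursively record every file under `root`.
/// Note: `is_dir` follows symlinks, so this assumes the tree has no symlink cycles.
fn snapshot(root: &Path) -> io::Result<Snapshot> {
    fn walk(root: &Path, dir: &Path, out: &mut Snapshot) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            if path.is_dir() {
                walk(root, &path, out)?;
            } else {
                let rel = path
                    .strip_prefix(root)
                    .expect("walked path is always under root")
                    .to_string_lossy()
                    .into_owned();
                out.insert(rel, fs::read(&path)?);
            }
        }
        Ok(())
    }
    let mut out = Snapshot::new();
    walk(root, root, &mut out)?;
    Ok(out)
}

/// Sorted list of relative paths that are missing on one side or differ in content.
fn diff(a: &Snapshot, b: &Snapshot) -> Vec<String> {
    let mut changed: Vec<String> = a
        .iter()
        .filter(|(k, v)| b.get(*k) != Some(*v))
        .map(|(k, _)| k.clone())
        .collect();
    changed.extend(b.keys().filter(|k| !a.contains_key(*k)).cloned());
    changed.sort();
    changed
}
```

In a real harness, each implementation would first be run against an identical fixture in its own temporary directory; an empty result from `diff` then means the two runs produced the same file tree.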

The model immediately impressed. It reviewed my codebase, read online information about relevant packages and libraries, then produced a very nice detailed plan. After I reviewed and approved it, I instructed GPT-5 to execute.

# Ruler Rust Port: Detailed Plan

Date: 2025-08-08

## Goal

Create a faithful Rust port of the Ruler CLI that behaves identically to the TypeScript implementation, validated by an end-to-end functional shell test. Work happens on a dedicated branch and will be pushed without opening a PR.

## Scope

- Implement the Ruler CLI in Rust with the same commands, flags, config discovery, and outputs:
  - Commands: `apply`, `init`, `revert`.
  - Options parity (names, defaults, negations, behavior):
    - `--project-root ` (default: cwd)
    - `--agents ` (comma-separated identifiers or substrings)
    - `--config `
    - `--mcp`/`--with-mcp` (default: true)
    - `--no-mcp`
    - `--mcp-overwrite` (strategy: overwrite vs merge)
    - `--gitignore` / `--no-gitignore` (default: enabled)
    - `--verbose` / `-v`
    - `--dry-run`
    - `--local-only`
    - `revert` additional: `--keep-backups`
    - `init` additional: `--global`
  - Exit codes and error prefix: `[RulerError]` for errors.
  - Output files and directory structure are identical.

...

> See https://github.com/intellectronica/ruler/blob/feat/rust-port/plan.md
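To make "options parity" concrete, here is a minimal sketch of how the boolean switches in the plan, with their defaults and negations, could be modeled in Rust using only the standard library. It is not taken from the actual port. `Flags` and `parse_flags` are hypothetical names, the value-taking options (`--project-root`, `--agents`, `--config`) are left out, and two behaviors are my assumptions rather than anything the plan states: switches without a listed default start off, and when a flag and its negation both appear, the later one wins.

```rust
/// Boolean switches from the plan's "Options parity" list.
#[derive(Debug, Clone, PartialEq)]
struct Flags {
    mcp: bool,
    gitignore: bool,
    verbose: bool,
    dry_run: bool,
    local_only: bool,
}

impl Default for Flags {
    fn default() -> Self {
        Flags {
            mcp: true,       // plan: default true
            gitignore: true, // plan: default enabled
            // Assumption: the remaining switches are off unless passed.
            verbose: false,
            dry_run: false,
            local_only: false,
        }
    }
}

/// Apply switches left to right, so a later flag overrides an earlier one
/// (an assumption; the plan does not spell out precedence).
/// Anything unrecognized is reported as an error string.
fn parse_flags<'a, I>(args: I) -> Result<Flags, String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut flags = Flags::default();
    for arg in args {
        match arg {
            "--mcp" | "--with-mcp" => flags.mcp = true,
            "--no-mcp" => flags.mcp = false,
            "--gitignore" => flags.gitignore = true,
            "--no-gitignore" => flags.gitignore = false,
            "--verbose" | "-v" => flags.verbose = true,
            "--dry-run" => flags.dry_run = true,
            "--local-only" => flags.local_only = true,
            other => return Err(format!("unknown option: {}", other)),
        }
    }
    Ok(flags)
}
```

For example, `parse_flags(["--no-mcp", "-v"])` yields flags with `mcp` off and `verbose` on, while `gitignore` keeps its default. A real port would more likely use an argument-parsing crate; the point here is only that each default and negation pair in the plan has to be reproduced exactly for the two implementations to behave identically.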

The first thing I noticed: GPT-5 is way more agentic than previous models. It went on and on — looking at different parts of the codebase, writing new things, all without stopping, doing lots of work with almost no intervention. It stopped twice during the interaction, declaring it had finished some work, explaining clearly what it did and what was left to do, and asking how to continue. This was acceptable — it gave me correct information on the project status — though I would have been fine with it continuing without returning to me.
