For almost 1.5 years after the release of ChatGPT, the question of which LLM you should use had an obvious answer: there was GPT-4, and there was everyone else. Now, at the end of 2025, there are dozens of LLMs to choose from, and navigating the space, if you're not a terminally online user of X, is almost impossible. Well, it was until I wrote this post.
To demonstrate my street cred, look at my Cursor wrapped:
It shows I joined 666 days ago (Cursor didn't expect such veterans to still use their app?), which is roughly March 4, 2024, the release date of Claude 3 Sonnet by Anthropic. Since then, I've seen things you people wouldn't believe. GPT-4 one-shotting a matplotlib projection of a protein structure onto a 2D plane, all in a single chat message. I watched the 03-25 checkpoint of Gemini 2.5 Pro show full reasoning traces. All those moments...
Anyway, this post is both an ultimate guide to LLMs in 2026 and a commentary on the general state of AI, how we got here, and where we're going. My thoughts are grouped by use cases.
Use Case 1: Writing Code
TL;DR: My overall take is that vibe coding is a huge distraction; it's actively killing Cursor (which is why I stopped using it). You should remain in charge, and the best way to do that is to either skip agentic workflows entirely (just talk to Gemini 2.5/3 Pro in AI Studio) or use OpenCode, which is like Claude Code but shows you all the code changes in git diff format; I honestly can't understand how anyone settles for anything else. In OpenCode, use the latest Sonnet model (today it's 4.5) for most tasks, and switch to the latest Opus (today also 4.5) for complex ones. You can get all that for just $20/mo: the Claude Pro plan includes Claude Code usage, and you can use that plan to authorize OpenCode. If you're rich, or your company is willing to cover the cost, the $100/mo or $200/mo plans will let you use Opus more frequently (see your usage here).
Vibe coding is killing Cursor
Being able to create fully functioning software, a website, or an app just by writing English prompts is borderline sci-fi, which is probably why it has captured so much of everyone's attention ever since late 2023. And it'd be foolish to pretend LLMs can't do that already (in fact, I vibe coded a v0 of what became the best todo + pomodoro app on the market, Grow). The problem is that unleashed vibe coding is insanely token-inefficient and expensive. Consider the following interaction.
You want to create a landing page for your new project. You write a prompt (P1, 100 tokens), and the LLM responds with code (O1, 2k tokens) that you (or your agentic harness) put into index.tsx. Say the page contains 5 sections O1-S{i}, one of which, O1-S4, contains a grid of features, each feature accompanied by an icon.
Say you don't like the choice of icons or the phrasing of a particular description in that section O1-S4. If you were a reasonable person, you'd open the file, navigate to the desired section, and tweak the wording or change the icon name to something you like more (you can have the Lucide icon website open in a different window). If you're a vibe coding maximalist, however, you'd keep prompting the LLM to make such simple changes for you. You'd write another prompt (P2, 100 tokens). The problem is that, under the hood, the LLM will not see just those 100 P2 tokens as input; it'll see the whole chain of messages (P1, O1, P2), which is 2.2k tokens, and will produce another response (O2), which will be either a drop-in replacement of the whole file (2k tokens) or a patch with instructions on how/where to apply it, so in practice maybe 100-500 tokens. But say you still dislike the icons, and you keep prompting. Each time, the chain grows, and you keep running LLM inference on longer and longer message chains just to change an icon. And chances are, you'll have plenty of similar interactions for every little thing you want to change.
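To make the waste concrete, here's a back-of-the-envelope sketch of how many input tokens the LLM re-reads as the chain grows. The numbers (100-token prompts, a 2k-token initial output, ~300-token patch replies) are the rough figures from the example above, not measurements from any real API:

```python
def cumulative_input_tokens(turns, prompt=100, first_output=2000, patch=300):
    """Total input tokens processed across `turns` user prompts,
    assuming the full conversation is resent on every turn."""
    chain = 0        # tokens accumulated in the conversation so far
    total_input = 0
    for i in range(turns):
        chain += prompt                              # user prompt P_i joins the chain
        total_input += chain                         # the whole chain is re-read as input
        chain += first_output if i == 0 else patch   # model reply O_i joins the chain
    return total_input

# Turn 2 alone already re-reads the 2.2k-token chain (P1 + O1 + P2);
# after ten icon tweaks you've burned tens of thousands of input tokens.
print(cumulative_input_tokens(2))   # 2,300 tokens so far
print(cumulative_input_tokens(10))  # 34,300 tokens so far
```

The cost grows roughly quadratically with the number of tweaks, which is why "just prompt it again" is so much more expensive than opening the file yourself.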