
Management as AI superpower: Thriving in a world of agentic AI


I just taught an experimental class at the University of Pennsylvania where I challenged students to create a startup from scratch in four days. Most of the people in the class were in the executive MBA program, so they were taking classes while also working as doctors, managers, or leaders in a variety of large and small companies. Few had ever coded. I introduced them to Claude Code and Google Antigravity, which they needed to use to build a working prototype. But a prototype alone is not a startup, so they used ChatGPT, Claude, and Gemini to accelerate the idea generation, market research, competitive positioning, pitching, and financial modelling processes. I was curious how far they could get in such a short time. It turns out they got very far.

I’ve been teaching entrepreneurship for a decade and a half, and I've seen thousands of startup ideas (some of which turned into large companies), so I have a good sense of the expectations for what a class of smart MBA students can accomplish. I would estimate that what I saw in a couple of days was an order of magnitude further along the path to a real startup than I had seen out of students working over a full semester before AI. Most of the prototypes were not just sample screens but actually had a core feature working. Ideas were far more diverse and interesting than usual. Market and customer analyses were insightful. It was really impressive. These were not yet working startups, nor were they fully operational products (with a couple of exceptions), but they had shaved months and huge amounts of money and effort from the traditional process. And there was something else: most early startups need to pivot, changing direction as they learn more about what the market wants and what is technically possible. Because AI lowers the cost of pivoting, it was much easier to explore the possibilities without being locked in, or even to explore multiple startups at once: you just tell the AI what you want.

I wish I could say this impressive output was the result of my brilliant teaching, but we don’t really have a great framework yet for how to use all these tools; the students largely figured it out on their own. It helped that they had some management and subject matter expertise, because it turns out that the key to success was actually the last bit of the previous paragraph: telling the AI what you want. As AIs are increasingly capable of tasks that would take a human hours to do, and as evaluating those results becomes increasingly time-consuming, the value of being good at delegation increases. But when should you delegate to AI?

The Equation of Agentic Work

We actually have an answer, but it is a bit complicated. Consider three factors: First, because of the Jagged Frontier of AI ability, you don’t reliably know what the AI will be good or bad at on complex tasks. Second, whether the AI is good or bad, it is definitely fast. It produces work in minutes that would take many hours for a human to do. Third, it is cheap (relative to professional wages), and it doesn’t mind if you generate multiple versions and throw most of them away.

These three factors mean that deciding to delegate to AI depends on three variables:

Human Baseline Time: how long the task would take you to do yourself
Probability of Success: how likely the AI is to produce an output that meets your bar on a given attempt
AI Process Time: how long it takes you to request, wait for, and evaluate an AI output

A useful mental model is that you’re trading off “doing the whole task” (Human Baseline Time) against “paying the overhead cost” (AI Process Time), possibly multiple times until you get something acceptable. The higher Probability of Success is, the fewer times you have to pay AI Process Time, and the more useful it is to turn things over to the AI. For example, consider a task that takes you an hour to do, but the AI can do it in minutes, though checking the answer takes thirty minutes. In that case, you should only give the work to the AI if Probability of Success is very high; otherwise, you’ll spend more time generating and checking drafts than just doing it yourself. If the Human Baseline Time is 10 hours, though, it could be worth several hours of working with the AI, assuming that the AI can be made to do a competent job.
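One way to make this trade-off concrete is a rough break-even calculation. Here is a minimal sketch in Python; the function name, the assumption that attempts are independent (so the expected number of attempts is 1 divided by Probability of Success), and the example numbers are mine, not part of the original argument:

```python
def should_delegate(human_baseline_hours, ai_process_hours, p_success):
    """Rough break-even check for handing a task to an AI agent.

    human_baseline_hours: how long the task takes you to do yourself
    ai_process_hours: time to prompt, wait for, and evaluate one AI attempt
    p_success: chance a single attempt meets your bar (between 0 and 1)

    Treats attempts as independent, so the expected number of attempts
    until one succeeds is 1 / p_success (a simplifying assumption).
    """
    expected_ai_hours = ai_process_hours / p_success
    return expected_ai_hours < human_baseline_hours, expected_ai_hours


# One-hour task, half an hour to check each draft: only worth delegating
# if the success probability is high.
print(should_delegate(1.0, 0.5, 0.9))   # (True, ~0.56 hours)
print(should_delegate(1.0, 0.5, 0.4))   # (False, 1.25 hours)

# Ten-hour task: even a couple of hours of AI overhead per attempt can pay off.
print(should_delegate(10.0, 2.0, 0.5))  # (True, 4.0 hours)
```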

[Figure: an example of a prompt with a Human Baseline Time of many hours and an initial AI Process Time of about 30 minutes (during which you can be doing something else), plus the time to write the prompt and check the output. If you have to make a lot of corrections, though, it isn’t worth it.]

We know this equation works because this past summer, OpenAI released one of the more important papers on AI and real work, GDPval. I have discussed it before, but the key was that it pitted experienced human experts in diverse fields from finance to medicine to government against the latest AIs, with another set of experts working as judges. It took experts seven hours on average to do the work, so, in this case, that is the Human Baseline Time. The AI Process Time was interesting: the AI took only minutes for tasks, but it required an hour for experts to actually check the work, and, of course, prompts take time to write as well. As for Probability of Success, when GDPval first came out, judges gave human work the win the majority of the time, but, with the release of GPT-5.2, the balance shifted. GPT-5.2 Thinking and Pro models tied or beat human experts an average of 72% of the time.
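For illustration only, we can plug the rough GDPval numbers above into the sketch from earlier, treating the roughly one-hour expert check as the AI Process Time and the 72% tie-or-win rate as the per-attempt Probability of Success (both simplifications on my part):

```python
# Seven-hour human baseline, ~1 hour to check each AI attempt, ~72% success rate:
# expected AI-side cost is about 1.4 hours, well under the 7-hour baseline.
print(should_delegate(7.0, 1.0, 0.72))  # (True, ~1.39 hours)
```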
