The machines are fine. I'm worried about us.
Imagine you're a new assistant professor at a research university. You just got the job, you just got a small pot of startup funding, and you just hired your first two PhD students: Alice and Bob. You're in astrophysics. This is the beginning of everything.
You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year, because they don't know what they're doing yet, and that's the point. The project isn't the deliverable. The project is the vehicle. The deliverable is the scientist that comes out the other end.
Alice's project is to build an analysis pipeline for measuring a particular statistical signature in galaxy clustering data. Bob's is similar in scope and difficulty: a different signal, a different dataset, the same basic arc of learning. You send them each a few papers to read, point them at some publicly available data, and tell them to start by reproducing a known result. Then you wait.
The academic year unfolds the way academic years do. You have weekly meetings with each student. Alice gets stuck on the coordinate system. Bob can't get his likelihood function to converge. Alice writes a plotting script that produces garbage. Bob misreads a sign convention in a key paper and spends two weeks chasing a factor-of-two error. You give them both similar feedback: read the paper again, check your units, try printing the intermediate output, think about what the answer should look like before you look at what the code gives you. Normal things. The kind of things you say fifty times a year and never remember saying.
By summer, both students have finished. Both papers are solid. Not groundbreaking, not going to change the field, but correct, useful, and publishable. Both go through a round of minor revisions at a decent journal and come out the other side. A perfectly ordinary outcome. The kind of outcome that the entire apparatus of academic training is designed to produce.
But Bob has a secret.
Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
Here's where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can't be.
It gets worse. The majority of PhD students will leave academia within a few years of finishing. Everyone knows this. The department knows it, the funding body knows it, the supervisor probably knows it too even if nobody says it out loud. Which means that, from the institution's perspective, the question of whether Alice or Bob becomes a better scientist is largely someone else's problem. The department needs papers, because papers justify funding, and funding justifies the department. The student is the means of production. Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant. The incentive structure doesn't just fail to distinguish between Alice and Bob. It has no reason to try.