Against Vibes: When is a Generative Model Useful
Let’s suppose I wanted to answer a question: is tool X useful for task Y? If I were scientific about this, I would analyze the properties of tool X and develop a model of it, analyze task Y and its requirements and develop a model of that, and then use my models to predict the behaviour of tool X in the context of task Y. “Can I use timber instead of stainless steel as a support beam for this structure?” “Will this acid be an appropriate solvent for this reaction?” “Will this programming language provide these real-time guarantees?”
The discourse on generative models is not like this. Instead, you get claims like “software engineering is dead”, and attempts to shove generative models into literally everything without a thought. Search? Generative models. Code completion? Generative models. Summarization? Generative models. Voice to text? Generative models. Stock images? Generative models.
Any attempt to criticize this tends to go in circles and/or have people arguing past each other. Is a generative model useful for internet search? Well, look, it produces text that is plausibly related to the input prompt, so. So… what? That doesn’t answer the question. But the newest models are so much better! Better at… what?
I was upset about this when it was being called “prompt engineering”, and I found no sign of engineering, but instead a series of vibes about how to phrase a prompt in a particular version of a particular model, which sometimes produced output that is plausibly related to the input prompt and therefore plausibly close to what you might have intended. I’m upset now when people claim that agents are so useful, but can’t tell me when or why or how they’re useful beyond vibes about feeling more productive (vibes that have been refuted by real science contrasting objective measures of productivity with subjective reports), or examples of having produced a lot of plausible output.
(Okay, there are some researchers doing actual science and writing papers; I’m talking about the arguments being made as this stuff is integrating into schools, workplaces, etc.)
I want to know when generative models are useful. I don’t want to feel like they’re useful; that’s just a vibe. I’ve been a generative model skeptic basically from the beginning. I could not convince myself that generative models were useful. But I was also skeptical of my own subjective experience. I could imagine that a model capable of producing code from natural language would be useful in some use cases that I had not found. I imagine there must be a model of when a generative model X is useful for task Y.
In this post, I’m not addressing ethical, political, or social questions. Those questions are important, and I want to address them separately from what the technology is capable of. Just for context: I think the widespread deployment of this technology is deeply problematic and irresponsible. I think further investment in it at its current scale is an almost criminal level of fiduciary negligence and will cause economic harm. I think the ethics of all of this are deeply troubling.
But for now, I just want to know what they’re technically capable of.
A model of generative model utility