Tech News

The Future of Everything Is Lies, I Guess: New Jobs

Why This Matters

As machine learning and large language models become more integrated into various industries, new roles focused on managing, prompting, and quality-controlling AI systems are emerging. This shift highlights the importance of specialized human expertise in ensuring AI outputs are accurate, reliable, and ethically managed, and it affects both the tech industry and consumers who rely on AI-driven services.

Key Takeaways

Previously: Work.

As we deploy ML more broadly, there will be new kinds of work. I think much of it will take place at the boundary between human and ML systems. Incanters could specialize in prompting models. Process and statistical engineers might control errors in the systems around ML outputs and in the models themselves. A surprising number of people are now employed as model trainers, feeding their human expertise to automated systems. Meat shields may be required to take accountability when ML systems fail, and haruspices could interpret model behavior.

LLMs are weird. You can sometimes get better results by threatening them, telling them they’re experts, repeating your commands, or lying to them that they’ll receive a financial bonus. Their performance degrades over longer inputs, and tokens that were helpful in one task can contaminate another, so good LLM users think a lot about limiting the context that’s fed to the model.
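One way to limit context is to rank candidate snippets by relevance and pack only the best ones into a bounded prompt. Here is a minimal sketch of that idea; `build_prompt` and the toy word-overlap scorer are hypothetical stand-ins for whatever retrieval scoring (BM25, embeddings) a real system would use:

```python
def score_relevance(question, snippet):
    """Toy relevance score: number of lowercase words shared with the question.
    A real system would use BM25 or embedding similarity instead."""
    return len(set(question.lower().split()) & set(snippet.lower().split()))

def build_prompt(question, snippets, max_chars=2000):
    """Greedily pack the highest-scoring snippets into a bounded context,
    so irrelevant tokens don't contaminate the model's answer."""
    ranked = sorted(snippets, key=lambda s: score_relevance(question, s),
                    reverse=True)
    context, used = [], 0
    for s in ranked:
        if used + len(s) > max_chars:
            break
        context.append(s)
        used += len(s)
    return "\n\n".join(context) + "\n\nQuestion: " + question
```

The point is not the scorer but the discipline: the prompt is assembled from a budget, not by dumping in everything on hand.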

I imagine that there will probably be people (in all kinds of work!) who specialize in knowing how to feed LLMs the kind of inputs that lead to good results. Some people in software seem to be headed this way: becoming LLM incanters who speak to Claude, instead of programmers who work directly with code.

The unpredictable nature of LLM output requires quality control. For example, lawyers keep getting in trouble because they submit AI confabulations in court. If they want to keep using LLMs, law firms are going to need some kind of process engineers who help them catch LLM errors. You can imagine a process where the people who write a court document deliberately insert subtle (but easily correctable) errors, and delete things which should have been present. These introduced errors are registered for later use. The document is then passed to an editor who reviews it carefully without knowing what errors were introduced. The document can only leave the firm once all the intentional errors (and hopefully accidental ones) are caught. I imagine provenance-tracking software, integration with LexisNexis and document workflow systems, and so on to support this kind of quality-control workflow.
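The core of that process is a release gate: the document may only go out once every deliberately seeded error has been caught. A minimal sketch, with hypothetical names, might look like:

```python
def review_gate(seeded_errors, errors_caught):
    """Release a document only if every deliberately seeded error was caught.

    seeded_errors:  set of error IDs registered when the errors were inserted
    errors_caught:  set of error IDs the reviewer flagged (seeded or accidental)

    Returns the release decision, which seeded errors were missed, and the
    catch rate, which the process engineer can track over time.
    """
    missed = seeded_errors - errors_caught
    catch_rate = (1 - len(missed) / len(seeded_errors)) if seeded_errors else 1.0
    return {"release": not missed, "missed": missed, "catch_rate": catch_rate}
```

The catch rate doubles as a measurement of the review process itself: if reviewers reliably find the seeded errors, that is some evidence they are finding the accidental ones too.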

These process engineers would help build and tune that quality-control process: training people, identifying where extra review is needed, adjusting the level of automated support, measuring whether the whole process is better than doing the work by hand, and so on.

A closely related role might be statistical engineers: people who attempt to measure, model, and control variability in ML systems directly. For instance, a statistical engineer could figure out that the choice an LLM makes when presented with a list of options is influenced by the order in which those options were presented, and develop ways to compensate. I suspect this might look something like psychometrics—a field in which psychologists have gone to great lengths to statistically model and measure the messy behavior of humans via indirect means.
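Measuring an order effect like that can be done by presenting the same options in every order and tallying which position the chosen option came from. A sketch, where `ask_model` is a stand-in for an actual LLM call:

```python
import collections
import itertools

def position_bias(ask_model, options, trials_per_order=1):
    """Tally which presented position the model's choice came from.

    ask_model(ordered_options) -> the chosen option (stand-in for an LLM call).
    Returns {position: Counter of chosen options}. An order-insensitive model
    would pick the same option from every permutation, spreading its choices
    evenly across positions; a spike at one position reveals order bias.
    """
    by_position = collections.defaultdict(collections.Counter)
    for perm in itertools.permutations(options):
        for _ in range(trials_per_order):
            choice = ask_model(list(perm))
            by_position[perm.index(choice)][choice] += 1
    return by_position
```

Once the bias is quantified, compensation can be as simple as averaging the model's answers over several shuffled presentations.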

Since LLMs are chaotic systems, this work will be complex and challenging: models will not simply be “95% accurate”. Instead, an ML optimizer for database queries might perform well on English text, but pathologically on timeseries data. A healthcare LLM might be highly accurate for queries in English, but perform abominably when those same questions are presented in Spanish. This will require deep, domain-specific work.
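This is why a single aggregate accuracy number is misleading: the evaluation has to be sliced by domain, language, or data type. A minimal sketch of that kind of sliced scoring:

```python
def sliced_accuracy(examples):
    """Report accuracy per slice instead of one aggregate number.

    examples: iterable of (slice_name, correct) pairs, e.g.
    ("english", True) or ("spanish", False). A model that is "95% accurate"
    overall may still be near-zero on one slice.
    """
    totals, hits = {}, {}
    for slice_name, correct in examples:
        totals[slice_name] = totals.get(slice_name, 0) + 1
        hits[slice_name] = hits.get(slice_name, 0) + int(correct)
    return {s: hits[s] / totals[s] for s in totals}
```

The hard, domain-specific work is choosing the slices: a statistical engineer has to know that "queries in Spanish" or "timeseries data" are axes worth measuring at all.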

As slop takes over the Internet, labs may struggle to obtain high-quality corpuses for training models. Trainers must also contend with false sources: Almira Osmanovic Thunström demonstrated that just a handful of obviously fake articles could cause Gemini, ChatGPT, and Copilot to inform users about an imaginary disease with a ridiculous name. There are financial, cultural, and political incentives to influence what LLMs say; it seems safe to assume future corpuses will be increasingly tainted by misinformation.

One solution is to use the informational equivalent of low-background steel: uncontaminated works produced prior to 2023 are more likely to be accurate. Another option is to employ human experts as model trainers. OpenAI could hire, say, postdocs in the Carolingian Renaissance to teach their models all about Alcuin. These subject-matter experts would write documents for the initial training pass, develop benchmarks for evaluation, and check the model’s responses during conditioning. LLMs are also prone to making subtle errors that look correct. Perhaps fixing that problem involves hiring very smart people to carefully read lots of LLM output and catch where it made mistakes.
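The low-background-steel approach reduces, mechanically, to filtering a corpus by publication date. A sketch, assuming a hypothetical document schema where each document records when it was published:

```python
from datetime import date

def low_background(docs, cutoff=date(2023, 1, 1)):
    """Keep only documents published before the cutoff.

    docs: iterable of dicts with a 'published' date (hypothetical schema).
    Text from before the LLM era is less likely to be machine-generated
    slop, much as pre-1945 steel is free of bomb-test radionuclides.
    """
    return [d for d in docs if d["published"] < cutoff]
```

Date metadata can itself be forged, of course, so in practice this filter would be one signal among several rather than a guarantee of purity.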
