
Why AI Shouldn’t Replace Historians Anytime Soon


Microsoft recently released a study about the types of jobs most likely to be augmented by AI in the future. Many people took it as a list of the jobs that machines will soon replace, and since “historian” ranked second highest, the study raised more than a few eyebrows among historians on social media. But after I tested generative AI tools on some historical facts, it seems historians shouldn’t worry too much about the robots taking over just yet. At the very least, they shouldn’t be afraid that AI could do their job well.

Presidential movies

What history facts did I test? I’m fascinated by the movies that presidents have watched while in office. So that’s where I started.

It’s an odd hobby, but I’ve been researching the topic since 2012, when I found a list of the movies that Ronald Reagan watched at the White House and Camp David. I was inspired to file a Freedom of Information Act (FOIA) request for the movies that then-president Barack Obama had been watching, but I found out that presidential records are exempt from FOIA until five years after a president leaves office. But that didn’t deter me. I dug into the subject and have been combing through a vast array of sources on presidential movie-watching habits ever since, dating back to Teddy Roosevelt’s first screening of a bird documentary in 1908.

If you’re going to test a generative artificial intelligence tool, you need to test it using something you know really well. When people ask questions of ChatGPT, they’re typically asking about things they don’t know, which makes sense. These are supposed to be tools that help us do things better and faster. And if they worked as advertised, they would be fantastic. The problem is that they often do not work as advertised.

I asked questions that were a mix of things I knew could be easily answered by a simple Google search and other questions that would be more difficult to find in books and archives. And the results might be eye-opening if you’re still trusting AI chatbots for work you care about getting right.

OpenAI’s GPT-5 flopped

I first tried to test OpenAI’s GPT-5, asking questions about what movies various presidents may have watched on specific days. I chose dates from when the White House was occupied by Woodrow Wilson, Dwight Eisenhower, Richard Nixon, Ronald Reagan, George H.W. Bush, Bill Clinton, and George W. Bush. Each time, ChatGPT replied that it could find no record of any of those presidents watching movies on the dates I provided.

Thankfully, ChatGPT didn’t just lie to me, as it’s been known to do, but it still failed to answer some pretty basic questions. GPT-5 is now available to all free users, but there’s a lack of transparency about which model answers a given question, and it wasn’t clear what was going on under the hood when I asked about specific dates.

OpenAI has gotten a lot of flak since it released GPT-5 last week. CEO Sam Altman promised it would be like having “a legitimate PhD-level expert in anything, any area you need, on demand, that can help you with whatever your goals are.” But the company nixed users’ ability to choose from the older models, breaking all kinds of workflows and angering hardcore users. Altman has since backtracked, and ChatGPT now offers subscribers access to 4o. Still, my tests were not encouraging for anyone who wants answers to unique questions without a pricey subscription.
