I was unsure if my parents would notice that the voice on the other end wasn’t mine — or that it was mine, sort of, but it wasn’t me. The voice said hello, asked my dad how he was doing, and asked again when he didn’t respond quickly enough. “What is that, Gaby?” He realized something was wrong almost immediately. I explained I had tried to trick him and it clearly hadn’t worked. “It didn’t,” he said. “It sounded like a robot.”
It wasn’t a perfect experiment. My parents were out of the country, which made for a shoddy connection. They were having lunch with friends, and the voice couldn’t deal with crosstalk or delays in the audio — it tried to fill the silences. And most importantly, the voice sounded human, but it didn’t sound like me.
The voice was generated by the deepfake detection company Reality Defender. The problem of manipulated media isn’t new, but the advent of consumer-grade AI tools has made the creation of fake audio, video, and images essentially frictionless, and a number of companies have sprung up in recent years to combat it. Reality Defender, Pindrop, and GetReal are part of a rapidly growing deepfake detection cottage industry valued at an estimated $5.5 billion as of 2023. These startups use machine learning to identify manipulated media. To fight deepfakes, you have to be able to make them.
The term “deepfake” refers to a specific type of manipulated media, one generated with “deep” learning, but aside from the way they’re made, deepfakes have little in common. They have been used for fraud, harassment, and memes. Tools like Grok AI have led to a proliferation of nonconsensual sexual deepfakes, including child sexual abuse material. Scammers have cloned people’s voices, called their relatives, and had the voice say they’re being held for ransom. During the 2024 election, a political strategist and a magician teamed up to create a deepfake of former President Joe Biden, which they used to discourage registered Democrats in New Hampshire from voting in the state’s primary. The head of the Senate Foreign Relations Committee took a Zoom call from someone using AI to pose as a Ukrainian official. At the corporate level, deepfake fraud is now “industrial,” according to one study.
The deepfake detection industry primarily exists to address one of these problems: the issue of corporate fraud.
Reality Defender is effectively training AI to combat AI. The company uses an “inference-based model” to detect deepfakes, CTO Alex Lisle told me. “Our foundational model uses something called a student/teacher paradigm. We take a bunch of real things and say, ‘These are real,’ and then a bunch of fake things and say ‘This is fake.’”
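The student/teacher idea Lisle describes can be sketched in a few lines. This is not Reality Defender's code; it is a toy illustration, using made-up two-number "audio features" and plain logistic regression, of the general pattern: a teacher model is fit on clips labeled real or fake, then a smaller student learns to mimic the teacher's soft scores.

```python
# Toy sketch of a student/teacher (distillation) setup for fake-vs-real
# classification. All names and data here are invented for illustration.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, targets, lr=0.5, epochs=500):
    """Fit a logistic-regression model (weights + bias) by gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - t  # gradient of the cross-entropy loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(model, x):
    """Return the model's probability that a clip is fake."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

random.seed(0)
# Stand-ins for audio statistics: real clips cluster low, synthetic ones high.
real_clips = [[random.gauss(0.2, 0.1), random.gauss(0.3, 0.1)] for _ in range(50)]
fake_clips = [[random.gauss(0.8, 0.1), random.gauss(0.7, 0.1)] for _ in range(50)]
clips = real_clips + fake_clips
hard_labels = [0.0] * 50 + [1.0] * 50          # "these are real, these are fake"

teacher = train(clips, hard_labels)
soft_labels = [predict(teacher, x) for x in clips]  # teacher's soft scores
student = train(clips, soft_labels)                 # student mimics the teacher

print(predict(student, [0.2, 0.3]) < 0.5)  # real-sounding clip: low fake score
print(predict(student, [0.8, 0.7]) > 0.5)  # fake-sounding clip: high fake score
```

Real detectors operate on spectrograms or raw waveforms with deep networks rather than two hand-picked numbers, but the training loop has the same shape: labeled real and fake examples in, a probability of "fake" out.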
For the fake me, we spent some time fine-tuning the voice: fiddling with the consistency, stability, and tone to make it sound more like the actual me. We could only do so much. There isn’t much publicly available footage of me speaking Spanish — the language I use to communicate with my parents — aside from a single podcast interview from 2021, most of which is unusable because there’s music in the background. But with nine seconds of audio and data scraped from years of posts, we managed to cobble together a somewhat convincing AI agent that was able to carry on a conversation with my parents, albeit an impersonal one. The English model we used on my brother was better, because we had much more training data, but even then it wasn’t convincing enough.
But family is the toughest test.
“They know what your voice sounds like,” Scott Steinhardt, the head of communications at Reality Defender, told me. Steinhardt made the deepfake with my consent and tinkered with it until it more or less sounded like me. It might not fool my family, but it’d probably be good enough for, say, colleagues or corporate entities like banks.
We’ve gone the last 40,000-odd years believing our ears and eyesight, but now we can’t