Gen AI Needs Synthetic Data. We Need to Be Able to Trust It
Published on: 2025-10-28 14:00:00
Today's generative AI models, like those behind ChatGPT and Gemini, are trained on reams of real-world data, but even all the content on the internet is not enough to prepare a model for every possible situation.
To continue to grow, these models need to be trained on simulated, or synthetic, data: scenarios that are plausible but not real. AI developers need to do this responsibly, experts said on a panel at South by Southwest, or things could go haywire quickly.
The use of simulated data in training artificial intelligence models has gained new attention this year since the launch of DeepSeek AI, a new model produced in China that was trained using more synthetic data than other models, saving money and processing power.
But experts say it's about more than saving on the collection and processing of data. Synthetic data (computer-generated, often by AI itself) can teach a model about scenarios that don't exist in the real-world information it's been provided but that it
...