The world’s top artificial intelligence groups are stepping up their focus on so-called world models that can better understand human environments, in the search for new ways to achieve machine “superintelligence.”
Google DeepMind, Meta, and Nvidia are among the companies attempting to gain ground in the AI race by developing systems that learn from video and robotics data, rather than from language alone, in order to navigate the physical world.
The push comes amid growing questions over whether large language models, the technology that powers popular chatbots such as OpenAI's ChatGPT, are approaching a ceiling in their progress.
The leaps in performance between successive LLMs released by companies across the sector, including OpenAI, Google, and Elon Musk's xAI, have been shrinking, despite the vast sums invested in their development.
The potential market for world models could be almost the size of the global economy, according to Rev Lebaredian, vice-president of Omniverse and simulation technology at Nvidia, because the technology would extend AI into physical industries such as manufacturing and health care.
“What is the opportunity for world foundation models? Essentially... $100 trillion if we can make an intelligence that can understand the physical world and operate in the physical world,” he said.
World models are trained on data streams from real or simulated environments. They are viewed as an important step toward advances in self-driving cars, robotics, and so-called AI agents, but they demand vast amounts of data and computing power and remain an unsolved technical challenge.
The focus on this alternative to LLMs has become increasingly visible in recent months, as several AI groups have unveiled a series of world-model advances.