The World Wide Web (Web) emerged as a new medium in the mid-1990s. It was invented by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in 1989, but its exploding popularity was also enabled by the release of the Mosaic Web browser in 1993 and the Internet becoming commercially available in 1995. A communication revolution was launched.
Roughly 30 years later, the release of ChatGPT by OpenAI in November 2022 launched another revolution. High-quality generation of natural-language text, proposed by Alan Turing in 1950 as the hallmark of intelligence, is suddenly widely available. I wonder, however, whether the generative AI (GenAI) revolution will end up devouring the Web revolution.
As I pointed out in 2018, the anti-establishment zeitgeist of the 1960s led to the dogma of “Information wants to be free.” Thus, Google, which defines search on the Web, uses advertising as its main revenue engine. Google sends users, without charge, to Web pages that link to Google Ads paid for by advertisers. But users typically want information, not Web pages. Web pages are just a means of getting information.
Now, however, I can ask a chatbot the same questions I used to ask a search engine—for instance, “How can I balance the wheels of my car?”—and get a detailed answer without visiting a Web page on wheel balancing. In fact, many Google searches now display an “AI Overview” at the top of the results page, obviating the need to visit a Web page at all.
But if users lose the motivation to visit Web pages, then advertisers lose the motivation to pay for Google Ads. If we can find information by asking GenAI, who needs the Web? While the ecosystem in which the Web thrived had one colossal flaw, namely, Surveillance Capitalism, it had a stable business model. That business model is now being threatened by GenAI.
But the threat of GenAI goes deeper than the threat to advertising-supported Web search. If users lose the motivation to visit Web pages, then Web-page developers lose the motivation to post Web pages. Yet, public Web pages are one major source of data to train the large language models (LLMs) underlying GenAI. Without public Web pages, it would be much more difficult to train LLMs.
And the risk does not go away even if Web-page developers continue to post public pages. As the biblical phrase “In the sweat of thy face shalt thou eat bread” suggests, humans do not like to work hard. Writing is also hard work, not physically, but mentally. But now we have GenAI! People are increasingly using GenAI to create text: It is so much easier than writing. For example, I asked ChatGPT “Would AI destroy the World Wide Web?” and it replied “Not likely—but it could erode its value unless regulated and used responsibly” and offered a detailed analysis.
I am less optimistic than ChatGPT. It is so easy to generate text using GenAI that people will inevitably use it to generate text for public Web pages. But, as pointed out earlier, public Web pages are the raw data for training LLMs. What happens when LLMs are trained on LLM-generated text? This question was addressed in a July 2024 Nature paper titled “AI Models Collapse when Trained on Recursively Generated Data.” “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear,” reported the authors. In other words, an LLM-generated Web is useless as data for training LLMs. A counterargument is that model collapse is not inevitable: it can be avoided if LLM-generated content is added to human-generated content rather than replacing it. Are we in an incremental-content regime or a replacement-content regime? Only time will tell.
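The collapse mechanism the Nature authors describe can be illustrated with a toy simulation (my own hypothetical sketch, not the paper's experiment): repeatedly fit a Gaussian to samples drawn from the previous generation's fitted Gaussian, the statistical analogue of training each model on its predecessor's output.

```python
# Toy illustration of model collapse (a sketch, not the Nature paper's
# setup): each "generation" is fit to finite samples drawn from the
# previous generation's model, and the distribution's tails vanish.
import random
import statistics

random.seed(0)  # deterministic run for reproducibility


def one_generation(mu, sigma, n):
    """Sample n points from N(mu, sigma), then refit mu and sigma."""
    data = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(data), statistics.stdev(data)


mu, sigma = 0.0, 1.0     # generation 0: the "human-written" distribution
history = [sigma]
for _ in range(1000):    # each generation trains on its predecessor's output
    mu, sigma = one_generation(mu, sigma, n=20)
    history.append(sigma)

print(f"initial std: {history[0]:.3f}, final std: {history[-1]:.3g}")
# The fitted standard deviation drifts toward zero: low-probability
# content (the tails) disappears first, echoing the paper's finding that
# "tails of the original content distribution disappear."
```

The collapse here is driven purely by finite-sample estimation error compounding across generations; mixing in fresh samples from the original distribution at each step (the "incremental content" regime) would arrest the drift.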
Cory Doctorow coined the term “enshittification” to describe the strategy by which online platforms first hook users with attractive products, only to degrade those products later by shifting value away from users. Is the Web undergoing enshittification? The Web became immensely useful because many people generated content, and Google surfaced the best-quality content in response to our questions. But the current quality of GenAI-generated answers is such that some argue “The Entire Internet Is Reverting to Beta.” In fact, at the bottom of each AI-generated answer Google adds the disclaimer, in a small font, “AI responses may include mistakes.” I have to teach my students to always click the link and go to the source, but I doubt many people do that.
Will AI destroy the Web? Even ChatGPT concedes that it could erode the Web's value unless regulated and used responsibly.