The Internet Archive’s Wayback Machine is one of the most valuable free services available on the web, ensuring that important sources of information are protected from the vicissitudes of fate and tech companies. Until recently, the archive was able to capture the entirety of Reddit, but that is no longer the case following new restrictions implemented by the for-profit community discussion platform …

The Internet Archive

The archive has been in operation since 1996. In its own words:

"We began in 1996 by archiving the Internet itself, a medium that was just beginning to grow in use. Like newspapers, the content published on the web was ephemeral – but unlike newspapers, no one was saving it. Today we have 28+ years of web history accessible through the Wayback Machine and we work with 1,200+ library and other partners through our Archive-It program to identify important web pages."

To date, it has archived 835 billion web pages, alongside books, audio recordings, photos, videos, and apps. It is used by millions of people a day, from researchers and historians to the general public.

Reddit blocks Wayback Machine

Engadget reports that Reddit is almost completely blocking the Wayback Machine from crawling content on the platform. The company has begun restricting what the archive site can access, a move that will significantly limit the Wayback Machine’s ability to preserve information from Reddit.

With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles, and other data.

This is despite Reddit saying last year that it would not block good-faith actors, and specifically naming the Internet Archive as one of them:

Along with our updated robots.txt file, we will continue rate-limiting and/or blocking unknown bots and crawlers from accessing reddit.com.
This update shouldn’t impact the vast majority of folks who use and enjoy Reddit. Good faith actors – like researchers and organizations such as the Internet Archive – will continue to have access to Reddit content for non-commercial use.

It all stems from monetizing user content

The restrictions are the latest step in Reddit’s push to sell access to user content while blocking free access to it, a focus on monetization driven by the company’s IPO. Google pays Reddit more than $60 million a year to access user content to help train its AI models, and Reddit struck a similar deal with OpenAI. After striking the Google deal, Reddit started blocking all other search engines.

It has been speculated that some AI companies may have been indirectly scraping Reddit content via the Wayback Machine, and that this drove the new restrictions.

Reddit had previously introduced radical API changes that killed third-party apps, prompting widespread protests by moderators and users. The company also confirmed plans for paid subreddits, though these are currently on hold.

Image: 9to5Mac modification of Reddit image