Reading Old Reddit Threads Is About to Get Harder, and AI Is to Blame

With AI increasingly popping up in Google search results lately, I’ve come to rely more and more on the magic word that makes the internet work: Reddit. It has its problems, but adding “Reddit” to a search query is still my most reliable way to get an honest opinion from a real person, which is more than I can say for some other platforms. Unfortunately, it looks like the “Reddit” trick is about to become much less useful, and AI is once again to blame.

The problem with any live forum is that its information is constantly changing: people delete old posts, and new updates break old sections of the site. There has long been a way around this problem, but that workaround is now closing.

Yes, Reddit is about to start blocking the Internet Archive. The site, run by a nonprofit dedicated to preserving the open internet, hosts the Wayback Machine, a popular way to view web pages that are no longer active or have changed significantly since they were first published. Simply type a URL into the Machine’s search bar, and you’ll be able to see snapshots of what the page looked like before, sometimes going all the way back to the 1990s.
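
If you’d rather script that lookup than type into a search bar, here is a minimal sketch in Python. It assumes the Internet Archive’s public availability endpoint (`https://archive.org/wayback/available`) and the usual `web.archive.org/web/<timestamp>/<url>` snapshot address pattern, neither of which is described in this article. The function names are hypothetical and purely for illustration. The code only builds the URLs and leaves the actual fetching to you.

```python
from urllib.parse import urlencode

# Assumption: these are the Internet Archive's public endpoints, not something
# taken from this article. Check the Archive's own documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SNAPSHOT_PREFIX = "https://web.archive.org/web"


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Build a query URL asking for the closest archived snapshot of page_url.

    timestamp is an optional string of digits such as YYYYMMDD.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build the browser address of a snapshot taken at or near timestamp."""
    return f"{SNAPSHOT_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Hypothetical example values: the Reddit homepage around January 1, 2020.
    print(availability_query("reddit.com", "20200101"))
    print(snapshot_url("https://www.reddit.com/", "20200101"))
```

Opening the second printed address in a browser should land on the capture closest to that date, if one exists.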

This is a useful way to see how a site has changed, or to access information that was thought to be long lost. In the case of Reddit, it can be used to, for example, view hotel reviews that have since been deleted. Sure, you might feel awkward reading a deliberately deleted post, but since deleting all your posts when you leave the service is a common practice, the Wayback Machine is a great way to preserve useful content for years to come and keep classic memes from becoming lost media.

Unfortunately, while Reddit says it doesn’t oppose the Wayback Machine in general, it is going to block the Internet Archive from indexing anything other than the Reddit homepage. This means that going forward, the Archive will only be able to record which Reddit content was popular on a given day. Individual subreddits and posts will be blocked.

That homepage record isn’t entirely useless if you’re an internet researcher, for example, but the block will make all future Reddit threads much more ephemeral and will certainly make digging through the internet’s past harder. If I leave a review for a hotel now and then delete my thread, users won’t be able to find it easily in a month or two. On the other hand, existing archives shouldn’t be affected by this block, at least not unless Reddit asks the Internet Archive to remove existing posts. But over time, the lack of Reddit archives will only become a bigger problem.

So why is this happening? Basically, Reddit doesn’t approve of AI companies copying content from its site, at least not without paying for it.

“The Internet Archive provides services for the open internet,” Reddit spokesperson Tim Ratschmidt told The Verge, “but we’ve seen cases of AI companies violating platform rules, including ours, by scraping data from the Wayback Machine.” In fact, Reddit has tried to strictly control which AI companies it works with (it’s already been sued over this), and has blocked most of them from accessing its site. But since some of them have moved to scraping Reddit pages that the Internet Archive has collected, the company is now going to crack down on those efforts, too. We’re essentially paying the price for a few bad actors.

Ratschmidt told The Verge that restrictions on access to the Internet Archive will begin to “tighten” today, though he wasn’t entirely sure how. I’ve reached out to Reddit for details. In the meantime, I’ve double-checked and can still access existing archives, so at least Reddit hasn’t taken the most drastic step yet.

As for future posts, all may not be lost. The Verge also spoke with Wayback Machine director Mark Graham, who said that the Internet Archive has a “long-standing relationship with Reddit” and that “discussions around this are ongoing.”
