Cyber Web Spider Blog – News


Reddit to Block Internet Archive as AI Companies Have Scraped Data From Wayback Machine

Posted on August 12, 2025 By CWS

Reddit has announced plans to significantly limit the Internet Archive’s Wayback Machine from indexing its platform, citing concerns that AI companies have been exploiting the archival service to circumvent Reddit’s data protection policies.

The move represents another escalation in Reddit’s ongoing battle to control access to its user-generated content amid the AI training data boom.

Key Takeaways
1. The Wayback Machine will only be able to archive Reddit’s homepage, not individual posts or comments.
2. Companies have been using archived data to bypass Reddit’s direct access restrictions.
3. Reddit prefers paid licensing deals over free data access.

Block Wayback Machine Access

Starting today, Reddit will begin what it calls “ramping up” restrictions that will block the Wayback Machine from accessing post detail pages, comment threads, and user profiles.

The Internet Archive will only retain the ability to index Reddit’s homepage, effectively limiting historical data to snapshots of trending headlines and popular posts on given dates.

“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” Reddit spokesperson Tim Rathschmidt explained.

The company has identified specific instances where AI training companies have relied on archived copies, which sit outside Reddit’s own robots.txt rules, to access Reddit data that would otherwise be restricted by the platform’s existing API rate limiting and crawler-blocking mechanisms.
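To illustrate why origin-side controls do not reach archived copies, here is a minimal sketch. It assumes the publicly visible Wayback Machine snapshot URL pattern (`https://web.archive.org/web/<timestamp>/<original URL>`); the subreddit path and the timestamp are made-up examples, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit

# Public snapshot URL prefix used by the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(original_url: str, timestamp: str) -> str:
    """Build a snapshot-style URL for an original page (illustrative only)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def serving_host(url: str) -> str:
    """Return the host that would actually receive a request for this URL."""
    return urlsplit(url).hostname


if __name__ == "__main__":
    original = "https://www.reddit.com/r/example/"  # hypothetical page
    archived = snapshot_url(original, "20250101000000")  # hypothetical timestamp
    print(serving_host(original))
    print(serving_host(archived))
```

Because a request for the archived copy goes to the archive’s host, the original site’s robots.txt, rate limits, and crawler blocks are never consulted for that request. That is the gap Reddit describes.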

Reddit’s technical implementation will likely involve updating its robots.txt file with specific User-Agent strings targeting Internet Archive crawlers, while potentially implementing server-side blocking based on IP ranges associated with the Wayback Machine’s infrastructure.
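The two mechanisms described above can be sketched as follows. This is an illustration under stated assumptions, not Reddit’s actual configuration: the robots.txt rules are hypothetical, the `archive.org_bot` and `ia_archiver` User-Agent tokens are assumed to be the ones a site would target, and the CIDR blocks are reserved documentation ranges standing in for real crawler addresses. The sketch uses prefix rules because Python’s standard `urllib.robotparser` matches paths by prefix.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical rules: archive crawlers may not fetch post or profile paths,
# but the homepage ("/") stays reachable. Not Reddit's real robots.txt.
ROBOTS_TXT = """\
User-agent: archive.org_bot
User-agent: ia_archiver
Disallow: /r/
Disallow: /user/

User-agent: *
Allow: /
"""

# Placeholder networks from the reserved documentation ranges; a real
# deployment would list the crawler's actual address blocks instead.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that answers can_fetch() queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked_ip(client_ip: str) -> bool:
    """Server-side check: does the client address fall in a blocked range?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ("/", "/r/example/comments/abc123/", "/user/example"):
        url = "https://www.reddit.com" + path
        print(path, parser.can_fetch("archive.org_bot", url))
    print(is_blocked_ip("192.0.2.10"))
```

The robots.txt rules are advisory and depend on the crawler honoring them, while the IP check is enforced by the server regardless of what the client claims to be, which is why the two are typically combined.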

This approach mirrors the platform’s recent strategy of blocking search engine crawlers unless companies enter paid licensing agreements.

This restriction forms part of Reddit’s comprehensive approach to monetizing its data assets in the AI era.

The platform has entered into significant deals with Google and OpenAI for official data access, while simultaneously pursuing legal action against companies like Anthropic for allegedly continuing to scrape content after claiming to have stopped.

Reddit’s 2023 API pricing changes, which effectively shuttered popular third-party applications, were justified using similar reasoning about preventing unauthorized AI training.

The company has implemented rate limiting, authentication requirements, and usage monitoring across its technical infrastructure to maintain control over data access.

Mark Graham, director of the Wayback Machine, acknowledged ongoing discussions with Reddit regarding the matter, suggesting potential technical solutions may be explored.

However, Reddit’s position appears firm: until the Internet Archive can guarantee compliance with platform policies regarding user privacy and respect for content deletion, access will remain severely restricted.

This development highlights the growing tension between open web archival principles and commercial data control in the AI training landscape.


Cyber Security News Tags: Archive, Block, Companies, Data, Internet, Machine, Reddit, Scraped, Wayback

Copyright © 2026 Cyber Web Spider Blog – News.
