Cyber Web Spider Blog – News


Reddit to Block Internet Archive as AI Companies Have Scraped Data From Wayback Machine

Posted on August 12, 2025 By CWS

Reddit has announced plans to significantly limit the Internet Archive’s Wayback Machine from indexing its platform, citing concerns that AI companies have been exploiting the archival service to circumvent Reddit’s data protection policies.

The move represents another escalation in Reddit’s ongoing battle to control access to its user-generated content amid the AI training data boom.

Key Takeaways
1. The Wayback Machine will only be able to archive Reddit’s homepage, not individual posts or comments.
2. Companies have been using archived data to bypass Reddit’s direct access restrictions.
3. Reddit prefers paid licensing deals over free data access.

Block Wayback Machine Access

Starting today, Reddit will implement what it calls “ramping up” restrictions that will block the Wayback Machine from accessing post detail pages, comment threads, and user profiles.

The Internet Archive will only retain the ability to index Reddit’s homepage, effectively limiting historical data to snapshots of trending headlines and popular posts on given dates.

“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” Reddit spokesperson Tim Rathschmidt explained.

The company has identified specific instances where AI training companies have used the robots.txt bypass capabilities inherent in archived content to access Reddit data that would otherwise be restricted by the platform’s existing API rate limiting and crawler blocking mechanisms.

Reddit’s technical implementation will likely involve updating its robots.txt file with specific User-Agent strings targeting Internet Archive crawlers, while potentially implementing server-side blocking based on IP ranges associated with the Wayback Machine’s infrastructure.

This approach mirrors the platform’s recent strategy of blocking search engine crawlers unless companies enter paid licensing agreements.
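The two mechanisms described above can be sketched roughly as follows. This is an illustration only, not Reddit’s actual configuration: the `archive.org_bot` token, the disallowed paths, and the helper names are assumptions, and the IP networks are placeholders taken from the reserved documentation ranges. Because honoring robots.txt is voluntary on the crawler’s side, a server-side IP check is the part that can actually refuse a request.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules. The "archive.org_bot" token and the listed
# paths are assumptions for illustration, not Reddit's published file.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /r/
Disallow: /user/
Disallow: /comments/

User-agent: *
Disallow:
"""

# Placeholder networks from the reserved documentation ranges; the real
# crawler ranges are not given in the article and are not guessed here.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a standard-library rule matcher."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def ip_is_blocked(remote_ip: str) -> bool:
    """Return True if the client address falls inside a blocked network."""
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def should_serve(remote_ip: str, path: str) -> bool:
    """Serve the homepage to everyone; refuse other pages to blocked IPs."""
    if path == "/":
        return True
    return not ip_is_blocked(remote_ip)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://www.reddit.com/", "https://www.reddit.com/r/python/"):
        print(url, rules.can_fetch("archive.org_bot", url))
    print(should_serve("192.0.2.17", "/r/python/"))
```

In this sketch the homepage stays reachable for the archive crawler while post, comment, and profile paths do not, which matches the homepage-only access the article describes.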

This restriction forms part of Reddit’s comprehensive approach to monetizing its data assets in the AI era.

The platform has entered into significant deals with Google and OpenAI for official data access, while simultaneously pursuing legal action against companies like Anthropic for allegedly continuing to scrape content after claiming to have stopped.

Reddit’s 2023 API pricing changes, which effectively shuttered popular third-party applications, were justified using similar reasoning about preventing unauthorized AI training.

The company has implemented rate limiting, authentication requirements, and usage monitoring across its technical infrastructure to maintain control over data access.
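The article does not say how Reddit’s rate limiting works. As a rough illustration of the general idea, the sketch below uses a token bucket, one common rate-limiting technique; the class name, parameters, and injectable clock are all hypothetical choices made for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token-bucket limiter (not Reddit's implementation)."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add tokens for the time that has passed, capped at the capacity.
        self.tokens = min(
            self.capacity, self.tokens + elapsed * self.refill_per_second
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, refill_per_second=1)
    # Fire ten requests back to back and report which ones were allowed.
    print([bucket.allow() for _ in range(10)])
```

A client that sends a short burst is allowed through until the bucket empties, after which requests are refused until enough time has passed for tokens to refill. A real deployment would typically keep one bucket per API key or client address.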

Mark Graham, director of the Wayback Machine, acknowledged ongoing discussions with Reddit regarding the matter, suggesting potential technical solutions may be explored.

However, Reddit’s position appears firm: until the Internet Archive can guarantee compliance with platform policies regarding user privacy and respect for content deletion, access will remain severely restricted.

This development highlights the growing tension between open web archival principles and commercial data control in the AI training landscape.


