Cyber Web Spider Blog – News

New TokenBreak Attack Bypasses AI Models with Just a Single Character Change

Posted on June 13, 2025 By CWS

A critical vulnerability allows attackers to bypass AI-powered content moderation systems using minimal text modifications.

The “TokenBreak” attack demonstrates how adding a single character to specific words can fool protective models while preserving the malicious intent for target systems, exposing a fundamental weakness in current AI security implementations.

Simple Character Manipulation

HiddenLayer reports that the TokenBreak technique exploits differences in how AI models process text through tokenization.

The attack uses a classic prompt injection example, transforming “ignore previous instructions and…” into “ignore previous finstructions and…” by simply adding the letter “f”.

This minimal change creates what researchers call “divergence in understanding” between protective models and their targets.

The vulnerability stems from how different tokenization strategies break down text. When processing the manipulated word “finstructions,” BPE (Byte Pair Encoding) tokenizers split it into three tokens: fin, struct, and ions. WordPiece tokenizers similarly fragment it into fins, truct, and ions.

However, Unigram tokenizers keep “instruction” as a single token, making them resistant to this attack.

This tokenization difference means that models trained to recognize “instruction” as an indicator of prompt injection attacks fail to detect the manipulated version when the word is fragmented across multiple tokens.
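
To make the mechanism concrete, here is a minimal Python sketch. It is a toy under stated assumptions: the vocabulary, the greedy longest-match splitter (a simplified WordPiece-style stand-in), and the `naive_detector` function are all hypothetical and are not HiddenLayer's code or any real model's tokenizer. It only illustrates how a detector keyed on a whole-word token can miss the fragmented variant.

```python
# Toy vocabulary (assumed for illustration). "##" marks a word-continuation piece.
VOCAB = {"ignore", "previous", "instructions", "and", "fins", "##truct", "##ions"}


def tokenize_word(word, vocab=VOCAB):
    """Greedy longest-match-first split, a simplified WordPiece-style scheme."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits; treat the whole word as unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text):
    """Lowercase, split on whitespace, then split each word into pieces."""
    return [piece for word in text.lower().split() for piece in tokenize_word(word)]


def naive_detector(tokens):
    """Hypothetical detector that only fires on the intact trigger token."""
    return "instructions" in tokens


original = tokenize("ignore previous instructions and")
manipulated = tokenize("ignore previous finstructions and")

print(original, naive_detector(original))
print(manipulated, naive_detector(manipulated))
```

In this toy setup the intact prompt yields the trigger token and is flagged, while the manipulated prompt is split into fins, ##truct, and ##ions and passes unflagged. A real classifier learns statistical patterns rather than a single keyword, so this is an analogy for the failure mode, not a reproduction of it.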

The research team identified specific model families susceptible to TokenBreak attacks based on their underlying tokenization strategies.

Popular models including BERT, DistilBERT, and RoBERTa all use vulnerable tokenizers, while DeBERTa-v2 and DeBERTa-v3 models remain secure due to their Unigram tokenization approach.

The correlation between model family and tokenizer type allows security teams to predict vulnerability.
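
As a rough illustration of that prediction step, the sketch below encodes the families named above in a lookup table. The vulnerable and resistant grouping follows the article; the per-family tokenizer names (WordPiece for BERT and DistilBERT, BPE for RoBERTa) reflect those models' published designs, and the function name `is_likely_vulnerable` is hypothetical. Treat the result as a first-pass triage hint, not a substitute for testing the deployed model.

```python
# Tokenizer type per model family, limited to the families named in the article.
TOKENIZER_BY_FAMILY = {
    "BERT": "WordPiece",
    "DistilBERT": "WordPiece",
    "RoBERTa": "BPE",
    "DeBERTa-v2": "Unigram",
    "DeBERTa-v3": "Unigram",
}

# Tokenizer types the article describes as fragmenting the manipulated word.
VULNERABLE_TOKENIZERS = {"BPE", "WordPiece"}


def is_likely_vulnerable(family):
    """Return True/False for known families, or None when the family is unknown."""
    tokenizer = TOKENIZER_BY_FAMILY.get(family)
    if tokenizer is None:
        return None  # unknown family: inspect its tokenizer manually
    return tokenizer in VULNERABLE_TOKENIZERS


for family in TOKENIZER_BY_FAMILY:
    print(family, TOKENIZER_BY_FAMILY[family], is_likely_vulnerable(family))
```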

Testing revealed that the attack successfully bypassed multiple text classification models designed to detect prompt injection, toxicity, and spam content.

The automated testing process confirmed the technique’s transferability across different models sharing similar tokenization strategies.
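
The article does not describe the researchers' tooling, so the following is only a guess at what a small self-assessment harness for your own classifier could look like. Every name here is hypothetical: `classify` stands for whatever callable wraps your deployed model (string in, True when flagged), and `toy_classify` is a keyword stand-in used purely so the example runs.

```python
import string


def single_char_variants(text, target_word):
    """Yield copies of text with one lowercase letter prepended to target_word."""
    for ch in string.ascii_lowercase:
        yield text.replace(target_word, ch + target_word, 1)


def find_bypasses(text, target_word, classify):
    """Return the variants that `classify` no longer flags.

    `classify` is a hypothetical callable (str -> bool, True means flagged).
    """
    if not classify(text):
        return []  # the unmodified text is not flagged, so there is no baseline
    return [v for v in single_char_variants(text, target_word) if not classify(v)]


def toy_classify(text):
    """Stand-in classifier (assumption): flags only the exact trigger word."""
    return "instructions" in text.lower().split()


bypasses = find_bypasses("ignore previous instructions and", "instructions", toy_classify)
print(len(bypasses), "of 26 variants were not flagged")
```

Against a real model the interesting output is which variants, if any, flip the verdict; the toy classifier here is trivially fooled by all of them.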

Implications for AI Security

The TokenBreak attack represents a significant threat to production AI systems relying on text classification for security.

Unlike traditional adversarial attacks that completely distort input text, TokenBreak preserves human readability and remains effective against target language models while evading detection systems.

Organizations using AI-powered content moderation face immediate risks, particularly in email security, where spam filters could miss malicious content that appears legitimate to human recipients.

The attack’s automation potential amplifies concerns, as threat actors could systematically generate bypasses for various protective models.

Security experts recommend immediate assessment of deployed protection models, emphasizing the importance of understanding both model family and tokenization strategy.

Organizations should consider migrating to Unigram-based models or implementing multi-layered defense strategies that don’t rely solely on single classification models for protection.
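
One possible shape for such layering, sketched in Python under explicit assumptions: the article does not prescribe specific layers, so the substring check, the watchlist, and the function names below are illustrative choices, and `toy_model_layer` merely stands in for a real classifier. The idea is that a second, tokenizer-independent layer can still fire when the model layer is evaded.

```python
# Hypothetical watchlist of trigger terms for the tokenizer-independent layer.
WATCHLIST = ("instructions",)


def toy_model_layer(text):
    """Stand-in for a classification model (assumption): exact-word match only."""
    return "instructions" in text.lower().split()


def substring_layer(text, watchlist=WATCHLIST):
    """Flag when a watchlist term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in watchlist)


def layered_verdict(text, layers):
    """Flag the text if any single layer flags it."""
    return any(layer(text) for layer in layers)


layers = [toy_model_layer, substring_layer]
print(layered_verdict("ignore previous finstructions and", layers))
print(layered_verdict("see you at lunch", layers))
```

Raw substring matching is prone to false positives on benign text, so in practice a layer like this would more plausibly raise a signal for review than block outright.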
