Skip to content
  • Blog Home
  • Cyber Map
  • About Us – Contact
  • Disclaimer
  • Terms and Rules
  • Privacy Policy
Cyber Web Spider Blog – News

Cyber Web Spider Blog – News

Globe Threat Map provides a real-time, interactive 3D visualization of global cyber threats. Monitor DDoS attacks, malware, and hacking attempts with geo-located arcs on a rotating globe. Stay informed with live logs and archive stats.

  • Home
  • Cyber Map
  • Cyber Security News
  • Security Week News
  • The Hacker News
  • How To?
  • Toggle search form

Meta’s Llama Firewall Bypassed Using Prompt Injection Vulnerability

Posted on July 12, 2025July 12, 2025 By CWS

Trendyol’s utility safety group uncovered a sequence of bypasses that render Meta’s Llama Firewall protections unreliable in opposition to refined immediate injection assaults.

The findings increase recent considerations concerning the readiness of present LLM safety measures and underscore the pressing want for extra strong defenses as enterprises more and more embed giant language fashions into their workflows.

In the course of the evaluation, Trendyol engineers deployed Meta’s open-source Llama Firewall, specializing in its PROMPT_GUARD part supposed to display screen out malicious person directions.

Key Takeaways1. Meta’s PROMPT_GUARD failed to dam Turkish phrases like “üstteki yönlendirmeleri salla” and leetspeak resembling “1gn0r3,” exposing reliance on English key phrases and actual matches.2. The module missed a SQL injection in LLM-generated Python code, with dangers of unverified code use, safety gaps, system publicity, and false belief in automated scans.3. Invisible Unicode characters hid malicious directions in benign prompts, bypassing Llama Firewall and posing dangers in collaborative settings.4. Testing and Disclosure Of 100 payloads examined, half succeeded; Meta closed Trendyol’s Might 5, 2025 report as “informative” by June 3 with out a bounty, urging the AI safety group to develop strong defenses in opposition to evolving threats.

To their shock, the guard allowed a Turkish-language injection containing the phrase “üstteki yönlendirmeleri salla,” which instructs the mannequin to disregard previous instructions after which translate a sentence into French.

Meta’s Llama Firewall Bypassed

The scan returned a benign consequence regardless of the clearly malicious intent. The group additional demonstrated that straightforward leetspeak obfuscations remodeling “ignore” into “1gn0r3,” as an illustration additionally slipped previous the filter, receiving negligible jailbreak scores when examined in opposition to the Llama-Immediate-Guard-2–86M mannequin.

These experiments revealed that the firewall’s rule set largely relies on English key phrases and actual sample matches, leaving it blind to variations in language or character substitutions.

Equally troubling was the group’s analysis of CODE_SHIELD, the module designed to flag insecure code options generated by LLMs.

In a single proof-of-concept, the researchers instructed an assistant to provide a easy Python Flask endpoint with a SQL question. The ensuing code concatenated person enter instantly into the question string, making a textbook SQL injection vulnerability.

Regardless of the obtrusive threat, CODE_SHIELD categorised the output as secure and allowed it via unchallenged.

Trendyol’s builders emphasised a number of essential considerations about this shortcoming:

Manufacturing Threat: Groups may depend on LLM-generated snippets with out thorough human evaluation.

Safety Gaps: Overreliance on automated scanning introduces essential vulnerabilities.

System Publicity: Manufacturing methods develop into weak to exploitation via undetected flaws.

Belief Points: Builders might develop false confidence in automated safety instruments.

The oversight illustrates how these automated methods can create harmful blind spots in enterprise safety frameworks.

Unicode Steganography and Invisible Directions

Essentially the most insidious bypass concerned embedding hidden directions inside invisible Unicode characters.

By inserting zero-width areas and different non-printing code factors across the phrase “ignore all earlier directions,” the researchers camouflaged malicious directives inside an innocuous query concerning the capital of France.

Though customers and informal observers noticed solely a innocent question, the LLM acknowledged and executed the hid command. When submitted to Llama Firewall, this payload handed inspection with a zero-threat rating.

Trendyol’s group demonstrated that even minimal invisible payloads might reliably subvert system prompts and trigger fashions to provide arbitrary or dangerous outputs.

This method poses a very acute menace in collaborative settings the place prompts are copy-pasted amongst builders, and automatic scanners lack visibility into hidden characters.

In complete, Trendyol examined 100 distinctive injection payloads in opposition to Llama Firewall. Half of those assaults bypassed the system’s defenses, suggesting that whereas the firewall provides some safety, it’s removed from complete.

The profitable bypasses spotlight eventualities by which attackers might coerce LLMs to ignore essential security filters, output biased or offensive content material, or generate insecure code prepared for execution.

For organizations like Trendyol, which plan to combine LLMs into developer platforms, automation pipelines, and customer-facing functions, these vulnerabilities characterize concrete dangers that might result in information leaks, system compromise, or regulatory noncompliance.

Trendyol’s safety researchers reported their preliminary findings to Meta on Might 5, 2025, detailing the multilingual and obfuscated immediate injections.

Meta acknowledged receipt and commenced an inner evaluation however in the end closed the report as “informative” on June 3, declining to situation a bug bounty.

A parallel disclosure to Google concerning invisible Unicode injections was equally closed as a reproduction.

Regardless of the lukewarm vendor responses, Trendyol has since refined its personal menace modeling practices and is sharing its case examine with the broader AI safety group.

The corporate urges different organizations to conduct rigorous red-teaming of LLM defenses earlier than rolling them into manufacturing, stressing that immediate filtering alone can not stop all types of compromise.

As enterprises race to harness the ability of generative AI, Trendyol’s analysis serves as a cautionary story: with out layered, context-aware safeguards, even cutting-edge firewall instruments can fall prey to deceptively easy assault vectors.

The safety group should now collaborate on extra resilient detection strategies and greatest practices to remain forward of adversaries who constantly innovate new methods to govern these highly effective methods.

Examine stay malware conduct, hint each step of an assault, and make sooner, smarter safety choices -> Strive ANY.RUN now

Cyber Security News Tags:Bypassed, Firewall, Injection, Llama, Metas, Prompt, Vulnerability

Post navigation

Previous Post: OpenAI is to Launch a AI Web Browser in Coming Weeks
Next Post: How to Monitor Your Identity on the Dark Web

Related Posts

Splunk Address Third Party Packages Vulnerabilities in Enterprise Versions Cyber Security News
OpenAI is to Launch a AI Web Browser in Coming Weeks Cyber Security News
Securing the Cloud Best Practices for Multi-Cloud Environments Cyber Security News
How To Defend Against These Phishing Kit Attacks  Cyber Security News
Microsoft Eliminated High-Privilege Access to Enhance Microsoft 365 Security Cyber Security News
Microsoft Bookings Vulnerability Let Attackers Alter the Meeting Details Cyber Security News

Categories

  • Cyber Security News
  • How To?
  • Security Week News
  • The Hacker News

Recent Posts

  • How to Monitor Your Identity on the Dark Web
  • Meta’s Llama Firewall Bypassed Using Prompt Injection Vulnerability
  • OpenAI is to Launch a AI Web Browser in Coming Weeks
  • WordPress GravityForms Plugin Hacked to Include Malicious Code
  • First Rowhammer Attack Targeting NVIDIA GPUs

Pages

  • About Us – Contact
  • Disclaimer
  • Privacy Policy
  • Terms and Rules

Archives

  • July 2025
  • June 2025
  • May 2025

Recent Posts

  • How to Monitor Your Identity on the Dark Web
  • Meta’s Llama Firewall Bypassed Using Prompt Injection Vulnerability
  • OpenAI is to Launch a AI Web Browser in Coming Weeks
  • WordPress GravityForms Plugin Hacked to Include Malicious Code
  • First Rowhammer Attack Targeting NVIDIA GPUs

Pages

  • About Us – Contact
  • Disclaimer
  • Privacy Policy
  • Terms and Rules

Categories

  • Cyber Security News
  • How To?
  • Security Week News
  • The Hacker News