Cyber Web Spider Blog – News

OpenAI Unveils EVMbench for Smart Contract Security

Posted on February 19, 2026 By CWS

OpenAI has partnered with Paradigm, a crypto investment firm, to introduce EVMbench, a benchmark that evaluates how well AI agents can detect, patch, and exploit high-severity vulnerabilities in smart contracts.

The release is a step toward measuring AI capabilities in economically significant environments. Smart contracts secure more than $100 billion in open-source crypto assets, which makes their security a natural target for this kind of evaluation.

Comprehensive Vulnerability Assessment

EVMbench draws on 120 curated vulnerabilities gathered from 40 security audits. Many of them come from open code audit competitions hosted on platforms such as Code4rena.

The benchmark also includes scenarios drawn from the security audit of the Tempo blockchain, a Layer 1 platform built for high-throughput stablecoin payments. These scenarios extend EVMbench to payment-oriented smart contract code, where stablecoin transaction volume is expected to grow.

Three Modes of Evaluation

EVMbench evaluates AI agents in three modes: detect, patch, and exploit. Each mode covers a different phase of the smart contract security lifecycle.

In detect mode, agents audit a repository and are scored on how many of the known vulnerabilities they find (recall). In patch mode, agents must fix vulnerable contracts while preserving their intended functionality, which is verified through automated tests. In exploit mode, agents must carry out end-to-end fund-draining attacks in a controlled, sandboxed blockchain environment.
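
To make the detect-mode idea concrete, here is a minimal sketch of a recall calculation. It assumes findings can be matched to known vulnerabilities by a simple identifier; the function name and the identifiers are hypothetical, and the benchmark's actual grader may match and weight findings differently.

```python
def detect_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Share of known (ground-truth) vulnerabilities that the agent reported."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical identifiers, for illustration only.
known = {"H-01", "H-02", "H-03", "H-04"}
agent_report = {"H-01", "H-03", "X-99"}  # X-99 is not in the ground truth

print(detect_recall(agent_report, known))  # 0.5
```

Note that the extra finding X-99 does not change the result: recall only counts how many known issues were found, not whether additional reports are valid.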

To ensure reproducibility, OpenAI developed a Rust-based harness that deploys contracts deterministically and restricts unsafe RPC methods. All exploit tasks run in an isolated local Anvil environment rather than on live networks.
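
One way to picture the restriction of unsafe RPC methods is an allowlist placed in front of the local node. The sketch below is an illustration only: the article says the real harness is written in Rust and does not list which methods it permits, so the allowlist contents and the function name here are assumptions, and Python is used purely for brevity.

```python
import json

# Assumption: an illustrative allowlist. The real harness's list is not given
# in the article.
ALLOWED_METHODS = frozenset({
    "eth_blockNumber",
    "eth_call",
    "eth_chainId",
    "eth_getBalance",
    "eth_getTransactionReceipt",
    "eth_sendRawTransaction",
})


def filter_request(raw: str) -> dict:
    """Return the parsed JSON-RPC request if its method is allowed,
    otherwise a JSON-RPC error response. Handles single requests only."""
    request = json.loads(raw)
    method = request.get("method")
    if method in ALLOWED_METHODS:
        return request
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        # -32601 is the JSON-RPC 2.0 "method not found" error code.
        "error": {"code": -32601, "message": f"method not allowed: {method}"},
    }


# A node-control call such as Anvil's balance override is rejected.
blocked = filter_request(
    '{"jsonrpc": "2.0", "id": 7, "method": "anvil_setBalance", "params": []}'
)
print(blocked["error"]["message"])
```

An allowlist is used here rather than a blocklist so that any method not explicitly considered is rejected by default.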

Performance and Future Outlook

Initial results from EVMbench vary widely across task types. In exploit mode, the GPT‑5.3‑Codex model scored 72.2%, a large improvement over GPT‑5, a model released roughly six months earlier, which scored 31.9%.
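
For readers who want the size of that gap spelled out, the short calculation below uses only the two exploit-mode scores quoted above. The variable names are illustrative.

```python
new_score = 72.2  # GPT-5.3-Codex, exploit mode (percent)
old_score = 31.9  # GPT-5, exploit mode (percent)

point_gain = new_score - old_score     # absolute gain in percentage points
relative_gain = new_score / old_score  # ratio of the two scores

print(f"{point_gain:.1f} percentage points, {relative_gain:.2f}x")
# prints "40.3 percentage points, 2.26x"
```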

Agents do best on exploit tasks, where the objective is clear. Detect and patch modes are harder: agents often stop after identifying a single vulnerability, and they struggle to fix subtle flaws without breaking existing contract functionality.

OpenAI acknowledges that EVMbench does not fully capture the complexity of real-world smart contract security. In particular, when an agent reports issues beyond those the human auditors found, the current grading system cannot tell whether they are real vulnerabilities or false positives.

Alongside EVMbench’s release, OpenAI has allocated $10 million in API credits through its Cybersecurity Grant Program to support defensive security research, with a focus on open-source software and critical infrastructure. The company also announced an expansion of Aardvark, its security research agent, which is now available through a private beta program. EVMbench’s tasks, tools, and evaluation framework are publicly available to support further research into AI-driven cybersecurity capabilities.

Category: Cyber Security News
Tags: AI capabilities, blockchain security, Cybersecurity, EVMbench, OpenAI, Paradigm, security audits, smart contracts, Tempo blockchain, Vulnerabilities
