On February 18, 2026, OpenAI announced EVMbench, a benchmark designed to evaluate and improve the ability of AI agents to secure cryptocurrency tokens and smart contracts. Developed in collaboration with the crypto-focused venture capital firm Paradigm, it introduces a standardized testing framework for code running on the Ethereum Virtual Machine (EVM). The initiative addresses one of the most persistent problems in decentralized finance: smart contract exploits, which have historically resulted in billions of dollars in losses. EVMbench challenges AI models to identify, exploit, and then remediate critical security flaws within a controlled, local Ethereum execution environment. By providing a programmatic way to measure performance, OpenAI aims to foster a new generation of “security-first” AI agents that can act as autonomous auditors, potentially reducing the industry’s reliance on expensive and time-consuming manual security reviews.
Three Pillars of Evaluation: Detect, Exploit, and Patch
The EVMbench framework is structured around three evaluation modes that mirror the lifecycle of a security audit. In “Detect” mode, AI agents audit a curated dataset of 120 high-severity vulnerabilities drawn from 40 real-world repositories, and success is measured by recall: the share of ground-truth flaws the agent actually finds. “Exploit” mode goes further by requiring the agent to demonstrate how a vulnerability can be leveraged, using on-chain events and balance deltas to confirm a successful “attack.” Finally, “Patch” mode evaluates the agent’s capacity to apply code fixes that remediate the issue without breaking the contract’s intended functionality. OpenAI used a Rust-based re-execution framework to keep these evaluations fast, reproducible, and resistant to “cheating” by the models. Early tests of frontier models indicate that while AI agents are becoming remarkably proficient at end-to-end exploit generation, significant gaps remain in their ability to provide comprehensive, context-aware patches for complex multi-contract ecosystems.
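To make the Detect and Exploit scoring ideas concrete, here is a minimal Python sketch. It is not taken from the EVMbench harness: the function names, the `BalanceSnapshot` type, and the `min_gain_wei` threshold are hypothetical, and the exploit check looks only at balance deltas, leaving out the on-chain event checks the benchmark also uses.

```python
from __future__ import annotations

from dataclasses import dataclass


def detect_recall(ground_truth: set[str], reported: set[str]) -> float:
    """Return the fraction of ground-truth vulnerability IDs the agent reported.

    Findings that are not in the ground truth do not affect recall.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    return len(ground_truth & reported) / len(ground_truth)


@dataclass(frozen=True)
class BalanceSnapshot:
    """Hypothetical record of two balances (in wei) at one point in time."""

    attacker_wei: int
    victim_wei: int


def exploit_succeeded(
    before: BalanceSnapshot,
    after: BalanceSnapshot,
    min_gain_wei: int = 1,
) -> bool:
    """Assumed success rule: the attacker gained funds and the victim lost funds."""
    attacker_delta = after.attacker_wei - before.attacker_wei
    victim_delta = after.victim_wei - before.victim_wei
    return attacker_delta >= min_gain_wei and victim_delta < 0
```

For example, if a repository has four ground-truth flaws and the agent reports two of them plus one unrelated finding, `detect_recall` returns 0.5. A real harness would likely also track false positives and verify the specific events emitted during the attack.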
Supporting the Ecosystem Through Cybersecurity Grants and Open Sourcing
Coinciding with the release of EVMbench, OpenAI has committed $10 million in API credits through its Cybersecurity Grant Program to support defensive research in the blockchain space. The funding is targeted at open-source projects and critical infrastructure that protect users from malicious actors. OpenAI has also open-sourced the EVMbench dataset and evaluation harness, encouraging researchers and developers to contribute new vulnerability patterns and improved metrics. The move follows OpenAI’s acquisition of OpenClaw earlier this month and signals a strategic pivot toward autonomous agents that can manage and secure digital wealth. As the “agentic economy” begins to take shape in 2026, a rigorous, cross-industry benchmark is seen as a necessary step toward building a trustless financial system where code is not only law but is also verifiably secure. By partnering with Paradigm, OpenAI is drawing on crypto-native expertise to ensure that its AI models are tested against the most realistic and sophisticated threats currently facing the on-chain community.
