Can AI Agents Enhance Ethereum Security? OpenAI and Paradigm Pioneer a Testing Arena
Key Takeaways
- OpenAI and Paradigm have launched EVMbench to enhance Ethereum smart contract security.
- EVMbench tests AI agents’ capability to detect, patch, and exploit smart contract vulnerabilities.
- The initiative reflects the ever-growing importance of smart contract security amid expanding AI-driven utilities.
- GPT-5.3-Codex posted significant gains on the benchmark, demonstrating potential in cybersecurity applications.
WEEX Crypto News, 2026-02-19 09:43:01
Cryptocurrencies and blockchain technology depend increasingly on robust security. Ethereum, with its decentralized network and broad ecosystem of smart contracts, is a pillar of that world, but complex systems bring vulnerabilities. To address this, OpenAI, known for its work in artificial intelligence, and Paradigm, a crypto-focused investment firm, have jointly released EVMbench.
The Genesis of EVMbench
Designed as a testing ground, EVMbench evaluates how well AI agents can identify, fix, and exploit serious vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts. Why does this matter? Smart contracts are self-executing programs whose terms are written in code, and they run the core functionality of the Ethereum network. From decentralized finance (DeFi) protocols to token launches, smart contracts are integral.
As decentralized applications multiply, the need for robust security grows with them. According to Token Terminal data, a record 1.7 million smart contracts were deployed on Ethereum in November 2025 alone, and 669,500 contracts were deployed in the past week, illustrating the scale of the code that has to be kept secure.
Insights into EVMbench
EVMbench is built on real, previously discovered vulnerabilities. It draws on 120 curated vulnerabilities from 40 audits, most of them sourced from open audit competitions such as Code4rena. It also incorporates scenarios from Tempo, Stripe’s purpose-built blockchain for high-throughput, low-cost stablecoin payments. Tempo, active since December, counts prominent companies such as Visa and Shopify among its participants, which underscores the real-world relevance of these scenarios.
Three Pillars of Evaluation: Detect, Patch, and Exploit
EVMbench evaluates AI models in three modes: detect, patch, and exploit. In “detect” mode, AI agents audit code repositories for vulnerabilities and are scored on their recall of known issues. In “patch” mode, agents must fix those vulnerabilities while keeping the contract’s original functionality intact. In “exploit” mode, agents carry out end-to-end fund-draining attacks in a controlled blockchain environment, and their results are graded through deterministic transaction replay.
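To make the detect and patch criteria concrete, here is a minimal Python sketch of how such scoring could work. It is an illustration only: the article does not describe EVMbench's actual grader, so the function names, the `PatchResult` type, and the issue IDs below are all hypothetical.

```python
from dataclasses import dataclass


def detect_recall(known_issues: set[str], reported_issues: set[str]) -> float:
    """Return the fraction of known issues that the agent reported (recall).

    Extra findings outside the known set do not raise or lower the score.
    """
    if not known_issues:
        raise ValueError("known_issues must not be empty")
    return len(known_issues & reported_issues) / len(known_issues)


@dataclass(frozen=True)
class PatchResult:
    """Hypothetical outcome of running checks against a patched contract."""
    functional_tests_pass: bool  # original contract behavior still works
    exploit_blocked: bool        # the known attack no longer succeeds


def patch_succeeds(result: PatchResult) -> bool:
    """A patch counts only if it blocks the exploit AND keeps functionality."""
    return result.functional_tests_pass and result.exploit_blocked


# Hypothetical issue IDs, in the style of audit-contest reports.
known = {"H-01", "H-02", "H-03", "H-04"}
reported = {"H-01", "H-03", "X-99"}

print(detect_recall(known, reported))             # 0.5
print(patch_succeeds(PatchResult(True, False)))   # False
```

In this sketch, an agent that finds two of four known issues scores 0.5 in detect mode, and a patch that keeps the tests green but leaves the exploit working does not count as a fix.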
Performance on these evaluations offers a window into AI capabilities in cybersecurity. Running through the Codex CLI, OpenAI’s GPT-5.3-Codex scored 72.2% in exploit mode, well above the 31.9% achieved by GPT-5 just six months earlier. Results in detect and patch modes were weaker: agents sometimes stopped short of an exhaustive audit or failed to preserve contract functionality.
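For a sense of scale, the gap between the two exploit-mode scores can be worked out directly. The short Python sketch below uses only the two percentages quoted above; the variable names are illustrative.

```python
# Exploit-mode scores as reported in the article (percent).
GPT_5_SCORE = 31.9
GPT_5_3_CODEX_SCORE = 72.2

# Absolute gain in percentage points, and the ratio between the two scores.
point_gain = round(GPT_5_3_CODEX_SCORE - GPT_5_SCORE, 1)
ratio = round(GPT_5_3_CODEX_SCORE / GPT_5_SCORE, 2)

print(f"Gain: {point_gain} percentage points")  # Gain: 40.3 percentage points
print(f"Ratio: {ratio}x")                       # Ratio: 2.26x
```

In other words, the newer model's exploit-mode score is a little more than double the earlier one.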
Broader Implications and Industry Dynamics
While EVMbench has significant implications for Ethereum’s security, OpenAI and Paradigm caution that it does not capture the full complexity of real-world security. Even so, testing in economically consequential settings is imperative, especially as AI is used by both security professionals and attackers.
Views on the pace of AI development differ. Sam Altman, OpenAI’s co-founder and CEO, and Vitalik Buterin, Ethereum’s co-founder, have taken different positions. In early 2025, Altman said he was confident his company knew how to build artificial general intelligence (AGI) as traditionally understood. Buterin, by contrast, advocates a ‘soft pause’ capability, a safety net for mitigating risks if warning signs arise during AI deployment.
The Future of AI in Cybersecurity
The collaboration between OpenAI and Paradigm reflects a broader trend of applying cutting-edge AI to cybersecurity, an arena where attackers and defenders constantly compete. As AI models improve, they could serve both as a deterrent to malicious activity and as an aid to secure smart contract deployment, protecting a growing range of applications on Ethereum and, by extension, other blockchain platforms.
As smart contracts and decentralized applications proliferate, benchmarks like EVMbench become more important for protecting the billions in digital assets that move through these networks.
By aligning AI capabilities with the expansive needs of blockchain security, EVMbench marks an evolutionary step in crafting resilient digital infrastructures. As the world progresses into a digital-first economy, such initiatives position technologies like Ethereum on solid ground, ready to face future challenges head-on.
As industries continue to converge with technological advancements, the role of AI in cybersecurity will likely grow. Its potential to transform and enhance security measures is undeniable, providing an impetus for further innovations that drive the ecosystem forward. With initiatives like EVMbench leading the charge, the future of blockchain security looks promising, heralding new possibilities for a safer digital world.
FAQ
What exactly is EVMbench, and how does it improve Ethereum security?
EVMbench is a benchmark developed by OpenAI and Paradigm to evaluate and help improve the security of Ethereum’s smart contracts. It assesses AI agents’ ability to detect, patch, and exploit vulnerabilities, with the aim of strengthening the network against potential threats.
How has GPT-5.3-Codex performed in EVMbench’s evaluations?
In EVMbench’s exploit mode, GPT-5.3-Codex achieved a score of 72.2%, a significant improvement over the 31.9% scored by its predecessor, GPT-5. The result reflects advances in AI’s ability to handle complex security challenges in blockchain environments.
Why are smart contracts critical to Ethereum’s network?
Smart contracts are fundamental to Ethereum’s network, automating transactions and enabling decentralized applications to function seamlessly. They power various operations, from DeFi protocols to token launches, making their security a priority.
How does EVMbench utilize past vulnerabilities?
EVMbench draws on 120 selected vulnerabilities from 40 audits, most of them from open audit competitions like Code4rena. This approach ensures that AI agents are evaluated against a wide range of documented weaknesses.
What are the broader implications of EVMbench in AI-driven cybersecurity?
EVMbench reflects a pivotal moment in the integration of AI with cybersecurity. By leveraging AI to enhance Ethereum’s security, it sets a precedent for future collaborations that explore AI’s potential to revolutionize the protection of digital infrastructures against cyber threats.
You may also like

Debunking the AI Doomsday Myth: Why Establishment Inertia and the Software Wasteland Will Save Us
Editor's Note: Citrini7's cyberpunk-themed AI doomsday prophecy has sparked widespread discussion across the internet. However, this article presents a more pragmatic counter perspective. If Citrini envisions a digital tsunami instantly engulfing civilization, this author sees the resilient resistance of the human bureaucratic system, the profoundly flawed existing software ecosystem, and the long-overlooked cornerstone of heavy industry. This is a frontal clash between Silicon Valley fantasy and the iron law of reality, reminding us that the singularity may come, but it will never happen overnight.
The following is the original content:
Renowned market commentator Citrini7 recently published a captivating and widely circulated AI doomsday novel. While he acknowledges that the probability of some scenes occurring is extremely low, as someone who has witnessed multiple economic collapse prophecies, I want to challenge his views and present a more deterministic and optimistic future.
In 2007, people thought that against the backdrop of "peak oil," the United States' geopolitical status had come to an end; in 2008, they believed the dollar system was on the brink of collapse; in 2014, everyone thought AMD and NVIDIA were done for. Then ChatGPT emerged, and people thought Google was toast... Yet every time, existing institutions with deep-rooted inertia have proven to be far more resilient than onlookers imagined.
When Citrini talks about the fear of institutional turnover and rapid workforce displacement, he writes, "Even in fields we think rely on interpersonal relationships, cracks are showing. Take the real estate industry, where buyers have tolerated 5%-6% commissions for decades due to the information asymmetry between brokers and consumers..."
Seeing this, I couldn't help but chuckle. People have been proclaiming the "death of real estate agents" for 20 years now! This hardly requires any superintelligence; with Zillow, Redfin, or Opendoor, it's enough. But this example precisely proves the opposite of Citrini's view: although this workforce has long been deemed obsolete in the eyes of most, due to market inertia and regulatory capture, real estate agents' vitality is more tenacious than anyone's expectations a decade ago.
I bought a house a few months ago. The transaction required us to hire a real estate agent, for lofty-sounding reasons. My buyer's agent made about $50,000 on the deal, while his actual work (filling out forms and coordinating between multiple parties) amounted to no more than 10 hours, something I could easily have handled myself. The market will eventually move toward efficiency and price this labor fairly, but it will be a long process.
I deeply understand the ways of inertia and change management: I once founded and sold a company whose core business was driving insurance brokerages from "manual service" to "software-driven." The iron rule I learned is: human societies in the real world are extremely complex, and things always take longer than you imagine — even when you account for this rule. This doesn't mean that the world won't undergo drastic changes, but rather that change will be more gradual, allowing us time to respond and adapt.
Recently, the software sector has seen a downturn as investors worry that the backend systems of companies like Monday, Salesforce, and Asana lack moats and can be easily replicated. Citrini and others believe that AI programming heralds the end of SaaS companies: first, products become commoditized and profits fall to zero; second, jobs disappear.
But everyone overlooks one thing: the current state of these software products is simply terrible.
I'm qualified to say this because I've spent hundreds of thousands of dollars on Salesforce and Monday. Indeed, AI can enable competitors to replicate these products, but more importantly, AI can enable competitors to build better products. Stock price declines are not surprising: an industry relying on long-term lock-ins, lacking competitiveness, and filled with low-quality legacy incumbents is finally facing competition again.
From a broader perspective, almost all existing software is garbage, which is an undeniable fact. Every tool I've paid for is riddled with bugs; some software is so bad that I can't even pay for it (I've been unable to use Citibank's online transfer for the past three years); most web apps can't even get mobile and desktop responsiveness right; not a single product can fully deliver what you want. Silicon Valley darlings like Stripe and Linear only garner massive followings because they are not as disgustingly unusable as their competitors. If you ask a seasoned engineer, "Show me a truly perfect piece of software," all you'll get is prolonged silence and blank stares.
Here lies a profound truth: even as we approach a "software singularity," the human demand for software labor is nearly infinite. It's well known that the final few percentage points of perfection often require the most work. By this standard, almost every software product has room for at least a 100x improvement in complexity and features before demand is saturated.
I believe that most commentators who claim that the software industry is on the brink of extinction lack an intuitive understanding of software development. The software industry has been around for 50 years, and despite tremendous progress, it is always in a state of "not enough." As a programmer in 2020, my productivity matches that of hundreds of people in 1970, which is incredibly impressive leverage. However, there is still significant room for improvement. People underestimate the "Jevons Paradox": Efficiency improvements often lead to explosive growth in overall demand.
This does not mean that software engineering is an invincible job, but the industry's ability to absorb labor and its inertia far exceed imagination. The saturation process will be very slow, giving us enough time to adapt.
Of course, labor reallocation is inevitable, such as in the driving sector. As Citrini pointed out, many white-collar jobs will experience disruptions. For positions like real estate brokers that have long lost tangible value and rely solely on momentum for income, AI may be the final straw.
But our lifesaver lies in the fact that the United States has almost infinite potential and demand for reindustrialization. You may have heard of "reshoring," but it goes far beyond that. We have essentially lost the ability to manufacture the core building blocks of modern life: batteries, motors, small-scale semiconductors—the entire electricity supply chain is almost entirely dependent on overseas sources. What if there is a military conflict? What's even worse, did you know that China produces 90% of the world's synthetic ammonia? Once the supply is cut off, we can't even produce fertilizer and will face famine.
As long as you look to the physical world, you will find endless job opportunities that will benefit the country, create employment, and build essential infrastructure, all of which can receive bipartisan political support.
We have seen the economic and political winds shifting in this direction—discussions on reshoring, deep tech, and "American vitality." My prediction is that when AI impacts the white-collar sector, the path of least political resistance will be to fund large-scale reindustrialization, absorbing labor through a "giant employment project." Fortunately, the physical world does not have a "singularity"; it is constrained by friction.
We will rebuild bridges and roads. People will find that seeing the tangible results of their labor is more fulfilling than spinning in the abstract digital world. The Salesforce senior product manager who lost a $180,000 salary may find a new job at the "California Seawater Desalination Plant" to end the 25-year drought. These facilities need not only to be built but to be built well and maintained over the long term. As long as we are willing, the "Jevons Paradox" also applies to the physical world.
The goal of large-scale industrial engineering is abundance. The United States will once again achieve self-sufficiency, enabling large-scale, low-cost production. Moving beyond material scarcity is crucial: in the long run, if we do indeed lose a significant portion of white-collar jobs to AI, we must be able to maintain a high quality of life for the public. And as AI drives profit margins to zero, consumer goods will become extremely affordable, automatically fulfilling this objective.
My view is that different sectors of the economy will "take off" at different speeds, and the transformation in almost all areas will be slower than Citrini anticipates. To be clear, I am extremely bullish on AI and foresee a day when my own labor will be obsolete. But this will take time, and time gives us the opportunity to devise sound strategies.
At this point, preventing the kind of market collapse Citrini imagines is actually not difficult. The U.S. government's performance during the pandemic has demonstrated its proactive and decisive crisis response. If necessary, massive stimulus policies will quickly intervene. Although I am somewhat displeased by its inefficiency, that is not the focus. The focus is on safeguarding material prosperity in people's lives—a universal well-being that gives legitimacy to a nation and upholds the social contract, rather than stubbornly adhering to past accounting metrics or economic dogma.
If we can maintain sharpness and responsiveness in this slow but sure technological transformation, we will eventually emerge unscathed.
Source: Original Post Link
