

AI Fixes Half of Security Bugs But Adds Risks

TL;DR: A new benchmark shows AI agents can fix about 50% of real-world security vulnerabilities. However, they sometimes introduce plausible but insecure fixes, creating new risks for development teams.

By Neeraj Dhiman, via Hacker News


Key facts

Category: AI
Impact: High
Published: just now
Source: Hacker News

Full summary

A new benchmark found AI agents successfully fix about half of real-world security bugs, but can also introduce insecure code.

A new benchmark tested how well large language model (LLM) agents fix real-world security vulnerabilities. The study used 20 known security flaws from 18 popular Python projects, including Pillow and GitPython. Researchers ran 300 tests across five AI agents, tasking each with patching the vulnerable code inside a secure, isolated environment. Each patch was then checked against hidden security tests written by the original project maintainers, so an agent could not tailor its fix to tests it had seen. On average, the agents fixed the vulnerability about half the time. The result is one of the first data-driven views of how current AI agents perform on complex, real-world security tasks.

These findings matter for developers, security teams, and CTOs. A success rate of about 50% suggests AI can automate part of the vulnerability remediation process, potentially speeding up patching and reducing manual effort. However, the study also revealed a critical risk: AI agents can produce code that looks like a correct fix but remains insecure. A human reviewer could easily overlook such a plausible but flawed patch, leaving a subtle vulnerability in the codebase. AI can assist with security work, but it cannot yet replace expert human oversight. Teams considering these tools should put rigorous testing and code review in place to validate every AI-generated patch before it reaches production.
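To make the "plausible but insecure" risk concrete, here is a minimal Python sketch of the general pattern. It is not taken from the benchmark: the path-traversal scenario, the `BASE_DIR` value, and every function name are hypothetical and chosen only for illustration. The first function looks like a reasonable fix and passes an obvious test, yet a slightly different input still escapes the base directory. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path

# Hypothetical upload directory, used only for illustration.
BASE_DIR = Path("/srv/uploads")


def resolve_plausible(user_path: str) -> Path:
    """A fix that looks right: strip '../' before joining.

    str.replace makes a single pass, so '....//' collapses back into '../'.
    """
    cleaned = user_path.replace("../", "")
    return BASE_DIR / cleaned


def resolve_checked(user_path: str) -> Path:
    """A stricter fix: normalise the joined path, then verify containment."""
    candidate = (BASE_DIR / user_path).resolve()
    if not candidate.is_relative_to(BASE_DIR.resolve()):
        raise ValueError("path escapes base directory")
    return candidate


def escapes_base(path: Path) -> bool:
    """Return True if the normalised path lies outside BASE_DIR."""
    return not path.resolve().is_relative_to(BASE_DIR.resolve())


if __name__ == "__main__":
    # The obvious attack string is neutralised, so a casual test passes.
    print(escapes_base(resolve_plausible("../etc/passwd")))
    # A less obvious variant still escapes the base directory.
    print(escapes_base(resolve_plausible("....//etc/passwd")))
```

A test suite that only tries the obvious `../` input would accept the first function. This is the sort of gap that hidden, maintainer-written tests and careful human review are meant to catch.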

The research also provides a cost-performance analysis of the different models tested, a crucial factor for business leaders. This allows organizations to weigh the price of using a particular AI model against its effectiveness at fixing security flaws. As AI capabilities continue to evolve, benchmarks like this will be essential for tracking progress and understanding the practical limitations of automated security tools. For now, the takeaway is that AI agents are a promising but imperfect assistant for cybersecurity. They can help teams draft fixes faster, but the final responsibility for code security still rests firmly with human developers who must verify every change.
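One way to frame such a cost-performance comparison is cost per successful fix: total spend divided by the number of patches that pass the security tests. The sketch below assumes that metric; the article does not say which metric the study used. The agent names, run counts, and dollar figures are placeholder values, not results from the benchmark. The 60 runs per agent assume the 300 tests were split evenly across the five agents.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    name: str
    runs: int              # patch attempts made by this agent
    fixes: int             # attempts that passed the hidden security tests
    total_cost_usd: float  # total spend across all runs

    @property
    def pass_rate(self) -> float:
        return self.fixes / self.runs

    @property
    def cost_per_fix(self) -> float:
        # An agent with no successful fixes has an unbounded cost per fix.
        return self.total_cost_usd / self.fixes if self.fixes else float("inf")


# Placeholder numbers for illustration only; not taken from the study.
results = [
    AgentResult("agent-a", runs=60, fixes=36, total_cost_usd=30.0),
    AgentResult("agent-b", runs=60, fixes=24, total_cost_usd=12.0),
]

# Rank agents from cheapest to most expensive per successful fix.
ranked = sorted(results, key=lambda r: r.cost_per_fix)

if __name__ == "__main__":
    for r in ranked:
        print(f"{r.name}: pass rate {r.pass_rate:.0%}, ${r.cost_per_fix:.2f} per fix")
```

With these made-up numbers, the agent with the lower pass rate is still cheaper per fix, which is why pass rate and cost need to be read together rather than separately.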
