It introduces a focused approach to evaluating vulnerabilities in large language models used within AI agents, highlighting critical security insights for the industry.
In a move aimed at strengthening AI agent security, cybersecurity experts and researchers have unveiled the Backbone Breaker Benchmark (b3) — an open-source evaluation framework tailored to test the security of large language models (LLMs) embedded within AI agents. Developed by Lakera, in collaboration with the UK AI Security Institute (AISI) and Check Point Software Technologies, b3 introduces a new way to analyze vulnerabilities without simulating full agent workflows.
At the core lies a concept called “threat snapshots” — targeted evaluations that zoom in on critical moments where an AI system is most likely to be exploited. This approach strips away the complexity of end-to-end simulation, focusing instead on realistic attack surfaces such as prompt exfiltration, malicious code injection, phishing link insertion, denial-of-service, and unauthorized tool calls.
The benchmark integrates 10 representative threat snapshots backed by a dataset of 19,433 adversarial attacks collected through Lakera’s gamified red-teaming platform, Gandalf: Agent Breaker. The game challenges players to hack and defend AI systems, producing rich real-world data on LLM vulnerabilities. Early tests using 31 widely used LLMs produced key findings that challenge assumptions in AI security. Models with stronger reasoning abilities demonstrated better defense against attacks, while model size showed no direct link to security performance. Interestingly, closed-source models outperformed open-weight ones, though top open models are quickly closing the gap.
Lakera’s Chief Scientist, Mateo Rojas-Carulla, noted that the benchmark gives developers a “realistic way to measure and improve their security posture,” emphasizing that “AI agents are only as secure as the LLMs that power them.”














































































